Closed
Bug 940087
Opened 11 years ago
Closed 10 years ago
Shutdown crash in _cairo_hash_table_remove during Android 4.0 Debug mochitest-2
Categories
(Core :: Graphics, defect)
Tracking
()
RESOLVED
FIXED
mozilla31
People
(Reporter: gbrown, Assigned: esawin)
References
Details
Attachments
(2 files)
2.67 KB,
patch
|
jrmuizel
:
review-
|
Details | Diff | Splinter Review |
966 bytes,
patch
|
jrmuizel
:
review+
|
Details | Diff | Splinter Review |
On Android 4.0 Debug (running on Cedar only), mochitest-2 usually completes successfully except for a crash on shutdown. From https://tbpl.mozilla.org/php/getParsedLog.php?id=30616114&tree=Cedar&full=1#error0: 17:14:36 INFO - 32866 INFO TEST-START | Shutdown 17:14:36 INFO - 32867 INFO Passed: 31746 17:14:36 INFO - 32868 INFO Failed: 0 17:14:36 INFO - 32869 INFO Todo: 489 17:14:36 INFO - 32870 INFO Slowest: 45367ms - /tests/content/html/content/test/forms/test_input_typing_sanitization.html 17:14:36 INFO - 32871 INFO SimpleTest FINISHED 17:14:36 INFO - INFO | automation.py | Application ran for: 0:09:29.475872 17:14:36 INFO - INFO | zombiecheck | Reading PID log: /tmp/tmpBJclAgpidlog 17:14:37 INFO - mozcrash INFO | Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/cedar-android-debug/1384556438/fennec-28.0a1.en-US.android-arm.crashreporter-symbols.zip 17:14:47 INFO - /data/anr/traces.txt not found 17:14:47 WARNING - PROCESS-CRASH | Shutdown | application crashed [@ _cairo_hash_table_remove] 17:14:47 INFO - Crash dump filename: /tmp/tmpMJCrH5/46711953-75d7-b9ea-2e734f1d-7baf1814.dmp 17:14:47 INFO - Operating system: Android 17:14:47 INFO - 0.0.0 Linux 3.2.0+ #2 SMP PREEMPT Thu Nov 29 08:06:57 EST 2012 armv7l pandaboard/pandaboard/pandaboard:4.0.4/IMM76I/5:eng/test-keys 17:14:47 INFO - CPU: arm 17:14:47 INFO - 2 CPUs 17:14:47 INFO - Crash reason: SIGSEGV 17:14:47 INFO - Crash address: 0x0 17:14:47 INFO - Thread 13 (crashed) 17:14:47 INFO - 0 libxul.so!_cairo_hash_table_remove [cairo-hash.c:e8087e6cbf4d : 488 + 0x2] 17:14:47 INFO - r4 = 0x635bd3c4 r5 = 0x0000002b r6 = 0x65a0b060 r7 = 0x730810c0 17:14:47 INFO - r8 = 0x63743408 r9 = 0x5be05e20 r10 = 0x635bd248 fp = 0x85bdd17b 17:14:47 INFO - sp = 0x5d466918 lr = 0x629849bf pc = 0x629849c4 17:14:47 INFO - Found by: given as instruction pointer in context 17:14:47 INFO - 1 libxul.so!_cairo_scaled_font_map_destroy [cairo-scaled-font.c:e8087e6cbf4d : 417 + 0x7] 17:14:47 INFO - r4 = 0x65c86000 r5 = 0x730810c0 r6 = 0x63743408 r7 = 0x635bebf8 17:14:47 INFO - r8 = 0x635be9a4 r9 = 0x635be984 r10 = 0x626f9f5d fp = 0x5d466c04 17:14:47 INFO - sp = 0x5d466948 pc = 0x629955c5 17:14:47 INFO - Found by: call frame info 17:14:47 INFO - 2 libxul.so!_moz_cairo_debug_reset_static_data [cairo-debug.c:e8087e6cbf4d : 64 + 0x3] 17:14:47 INFO - r4 = 0x69916000 r5 = 0x69916018 r6 = 0x69916020 r7 = 0x5be592b4 17:14:47 INFO - r8 = 0x5be595e8 r9 = 0x5d466a24 r10 = 0x626f9f5d fp = 0x5d466c04 17:14:47 INFO - sp = 0x5d466970 pc = 0x629813bb 17:14:47 INFO - Found by: call frame info 17:14:47 INFO - 3 libxul.so!gfxPlatform::~gfxPlatform() [gfxPlatform.cpp:e8087e6cbf4d : 588 + 0x3] 17:14:47 INFO - r4 = 0x69916000 r5 = 0x69916018 r6 = 0x69916020 r7 = 0x5be592b4 17:14:47 INFO - r8 = 0x5be595e8 r9 = 0x5d466a24 r10 = 0x626f9f5d fp = 0x5d466c04 17:14:47 INFO - sp = 0x5d466978 pc = 0x6276b807 17:14:47 INFO - Found by: call frame info 17:14:47 INFO - 4 libxul.so!gfxAndroidPlatform::~gfxAndroidPlatform() [gfxAndroidPlatform.cpp:e8087e6cbf4d : 127 + 0x9] 17:14:47 INFO - r4 = 0x69916000 r5 = 0x69916074 r6 = 0x6398f3c0 r7 = 0x5be592b4 17:14:47 INFO - r8 = 0x5be595e8 r9 = 0x5d466a24 r10 = 0x626f9f5d fp = 0x5d466c04 17:14:47 INFO - sp = 0x5d466988 pc = 0x62749771 17:14:47 INFO - Found by: call frame info 17:14:47 INFO - 5 libxul.so!gfxAndroidPlatform::~gfxAndroidPlatform() [gfxAndroidPlatform.cpp:e8087e6cbf4d : 127 + 0x3] 17:14:47 INFO - r4 = 0x69916000 r5 = 0x5be59550 r6 = 0x00000038 r7 = 0x5be592b4 17:14:47 INFO - r8 = 0x5be595e8 r9 = 0x5d466a24 r10 = 0x626f9f5d fp = 0x5d466c04 17:14:47 INFO - sp = 0x5d466998 pc = 0x62749785 17:14:47 INFO - Found by: call frame info 17:14:47 INFO - 6 libxul.so!gfxPlatform::Shutdown() [gfxPlatform.cpp:e8087e6cbf4d : 565 + 0x5] 17:14:47 INFO - r4 = 0x69274c70 r5 = 0x5be59550 r6 = 0x00000038 r7 = 0x5be592b4 17:14:47 INFO - r8 = 0x5be595e8 r9 = 0x5d466a24 r10 = 0x626f9f5d fp = 0x5d466c04 17:14:47 INFO - sp = 0x5d4669a0 pc = 0x6276beed 17:14:47 INFO - Found by: call frame info 17:14:47 INFO - 7 libxul.so!nsComponentManagerImpl::KnownModule::~KnownModule() [nsComponentManager.h:e8087e6cbf4d : 224 + 0x1] 17:14:47 INFO - r4 = 0x5beb3140 r5 = 0x5be59550 r6 = 0x00000038 r7 = 0x5be592b4 17:14:47 INFO - r8 = 0x5be595e8 r9 = 0x5d466a24 r10 = 0x626f9f5d fp = 0x5d466c04 17:14:47 INFO - sp = 0x5d4669b8 pc = 0x6271497d 17:14:47 INFO - Found by: call frame info 17:14:47 INFO - 8 libxul.so!nsAutoPtr<nsComponentManagerImpl::KnownModule>::~nsAutoPtr() [nsAutoPtr.h:e8087e6cbf4d : 77 + 0x5] 17:14:47 INFO - r4 = 0x5beb3140 r5 = 0x5be59550 r6 = 0x00000038 r7 = 0x5be592b4 17:14:47 INFO - r8 = 0x5be595e8 r9 = 0x5d466a24 r10 = 0x626f9f5d fp = 0x5d466c04 17:14:47 INFO - sp = 0x5d4669c0 pc = 0x627179cf 17:14:47 INFO - Found by: call frame info 17:14:47 INFO - 9 libxul.so!nsTArray_Impl<nsAutoPtr<nsComponentManagerImpl::KnownModule>, nsTArrayInfallibleAllocator>::RemoveElementsAt(unsigned int, unsigned int) [nsTArray.h:e8087e6cbf4d : 534 + 0x3] 17:14:47 INFO - r4 = 0x00000000 r5 = 0x5be59554 r6 = 0x00000038 r7 = 0x5be592b4 17:14:47 INFO - r8 = 0x5be595e8 r9 = 0x5d466a24 r10 = 0x626f9f5d fp = 0x5d466c04 17:14:47 INFO - sp = 0x5d4669d0 pc = 0x62717a2f 17:14:47 INFO - Found by: call frame info 17:14:47 INFO - 10 libxul.so!nsComponentManagerImpl::Shutdown() [nsComponentManager.cpp:e8087e6cbf4d : 808 + 0x7] 17:14:47 INFO - r4 = 0x5be59200 r5 = 0x626ed107 r6 = 0x63743408 r7 = 0x63743408 17:14:47 INFO - r8 = 0x63203f61 r9 = 0x5d466a24 r10 = 0x626f9f5d fp = 0x5d466c04 17:14:47 INFO - sp = 0x5d4669f0 pc = 0x62717b81 17:14:47 INFO - Found by: call frame info 17:14:47 INFO - 11 libxul.so!mozilla::ShutdownXPCOM(nsIServiceManager*) [nsXPComInit.cpp:e8087e6cbf4d : 778 + 0x3] 17:14:47 INFO - r4 = 0x5d466a1c r5 = 0x00000000 r6 = 0x63383be8 r7 = 0x63743408 17:14:47 INFO - r8 = 0x63203f61 r9 = 0x5d466a24 r10 = 0x626f9f5d fp = 0x5d466c04 17:14:47 INFO - sp = 0x5d466a08 pc = 0x626f10f5 17:14:47 INFO - Found by: call frame info 17:14:47 INFO - 12 libxul.so!ScopedXPCOMStartup::~ScopedXPCOMStartup() [nsAppRunner.cpp:e8087e6cbf4d : 1132 + 0x5] 17:14:47 INFO - r4 = 0x5d466a4c r5 = 0x5be3b144 r6 = 0x63743408 r7 = 0x00000000 17:14:47 INFO - r8 = 0x63743408 r9 = 0x00000000 r10 = 0x630a6a63 fp = 0x5d466c04 17:14:47 INFO - sp = 0x5d466a48 pc = 0x61a11717 17:14:47 INFO - Found by: call frame info 17:14:47 INFO - 13 libxul.so!XREMain::XRE_main(int, char**, nsXREAppData const*) [nsAppRunner.cpp:e8087e6cbf4d : 4069 + 0x5] 17:14:47 INFO - r4 = 0x5d466a94 r5 = 0x00000000 r6 = 0x5be3b144 r7 = 0x00000000 17:14:47 INFO - r8 = 0x63743408 r9 = 0x00000000 r10 = 0x630a6a63 fp = 0x5d466c04 17:14:47 INFO - sp = 0x5d466a60 pc = 0x61a161d5 17:14:47 INFO - Found by: call frame info 17:14:47 INFO - 14 libxul.so!XRE_main [nsAppRunner.cpp:e8087e6cbf4d : 4246 + 0x3] 17:14:47 INFO - r4 = 0x00000000 r5 = 0x5bc3ec90 r6 = 0x5be35148 r7 = 0x0000000c 17:14:47 INFO - r8 = 0x5d466ba8 r9 = 0x5bc3ec90 r10 = 0x630a6a63 fp = 0x5d466c04 17:14:47 INFO - sp = 0x5d466a90 pc = 0x61a16355 17:14:47 INFO - Found by: call frame info 17:14:47 INFO - 15 libxul.so!GeckoStart [nsAndroidStartup.cpp:e8087e6cbf4d : 73 + 0xf] 17:14:47 INFO - r4 = 0x5d466ba4 r5 = 0x5be53180 r6 = 0x63743408 r7 = 0x00000000 17:14:47 INFO - r8 = 0x5d466ba8 r9 = 0x5bc3ec90 r10 = 0x630a6a63 fp = 0x5d466c04 17:14:47 INFO - sp = 0x5d466ba0 pc = 0x61a101dd 17:14:47 INFO - Found by: call frame info
Updated•11 years ago
|
Assignee: nobody → snorp
Reporter | ||
Comment 1•10 years ago
|
||
Still happening, pretty consistently: https://tbpl.mozilla.org/php/getParsedLog.php?id=33013667&tree=Cedar&full=1 https://tbpl.mozilla.org/php/getParsedLog.php?id=32788613&tree=Cedar&full=1 https://tbpl.mozilla.org/php/getParsedLog.php?id=32787590&tree=Cedar&full=1 Notice that the .dmp files for these crashes are readily available: 22:32:48 INFO - (blobuploader) - INFO - TinderboxPrint: Uploaded 7886d373-c6df-adc1-338f84ea-58698ae0.dmp to http://mozilla-releng-blobs.s3.amazonaws.com/blobs/cedar/sha512/16450aec0b8c9e1b0e6641ee4e27e96e8b583a46a8a2fa4ccae569c63c0298895f3114ea1a921b261858614e0842584d190bc790b5d7330ed7b25d3327501356
Updated•10 years ago
|
Assignee: snorp → esawin
Reporter | ||
Comment 2•10 years ago
|
||
There is a similar crash in some crashtests, again, only on Android 4.0 Debug: https://tbpl.mozilla.org/php/getParsedLog.php?id=33337466&tree=Cedar&full=1#error20
Assignee | ||
Comment 3•10 years ago
|
||
The crash is reproducible on optimal builds with enabled cairo_debug_reset_static_data() at http://dxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxPlatform.cpp#599 , but it's non-deterministic: https://tbpl.mozilla.org/?tree=Try&rev=1b3596cfebec
Assignee | ||
Comment 4•10 years ago
|
||
Cache entries may expire and be removed from the hash table before the reset, so we need to check whether the entry for a given key still exists before removing it. Besides solving the bug, this patch makes _cairo_hash_table_remove more safe and idiomatic for a hash table remove-by-key function. Opt tests: https://tbpl.mozilla.org/?tree=Try&rev=608da6bc6c6b Opt (with activated cairo reset) tests: https://tbpl.mozilla.org/?tree=Try&rev=7175c6adb6e6
Attachment #8367270 -
Flags: review?(jmuizelaar)
Comment 5•10 years ago
|
||
Comment on attachment 8367270 [details] [diff] [review] cairo-hash-table-remove.patch Review of attachment 8367270 [details] [diff] [review]: ----------------------------------------------------------------- I'd rather not change the semantics of this function without also doing it upstream. Can we just fix the caller instead?
Attachment #8367270 -
Flags: review?(jmuizelaar) → review-
Assignee | ||
Comment 6•10 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #5) > Comment on attachment 8367270 [details] [diff] [review] > cairo-hash-table-remove.patch > > Review of attachment 8367270 [details] [diff] [review]: > ----------------------------------------------------------------- > > I'd rather not change the semantics of this function without also doing it > upstream. Can we just fix the caller instead? We can move the check out of the function into the caller, but that would introduce overhead due to the linear probing in the hash table lookup and leave the remove function unsafe. That leaves us with the options: 1. Move the check up into caller: additional lookup per entry overhead. 2. Move the check up into caller and add remove-by-entry function to cairo: no overhead, cairo extension. 3. Keep the check in the remove function: no overhead, safer remove function, changed semantics (silent ignore instead of crash). 4. Keep the check in the remove function and let it return whether it was successful: no overhead, safer remove function, changed semantics, changed interface, most idiomatic behavior for a remove-by-key function. I think solution 4 is the "right way", with 2 being the workaround without overhead. But since it's just the reset function, option 1 is also acceptable.
Comment 7•10 years ago
|
||
(In reply to Eugen Sawin (:esawin) <esawin@mozilla.com> from comment #6) > (In reply to Jeff Muizelaar [:jrmuizel] from comment #5) > > Comment on attachment 8367270 [details] [diff] [review] > > cairo-hash-table-remove.patch > > > > Review of attachment 8367270 [details] [diff] [review]: > > ----------------------------------------------------------------- > > > > I'd rather not change the semantics of this function without also doing it > > upstream. Can we just fix the caller instead? > > We can move the check out of the function into the caller, but that would > introduce overhead due to the linear probing in the hash table lookup and > leave the remove function unsafe. > > That leaves us with the options: > 1. Move the check up into caller: additional lookup per entry overhead. > 2. Move the check up into caller and add remove-by-entry function to cairo: > no overhead, cairo extension. > 3. Keep the check in the remove function: no overhead, safer remove > function, changed semantics (silent ignore instead of crash). > 4. Keep the check in the remove function and let it return whether it was > successful: no overhead, safer remove function, changed semantics, changed > interface, most idiomatic behavior for a remove-by-key function. > > I think solution 4 is the "right way", with 2 being the workaround without > overhead. But since it's just the reset function, option 1 is also > acceptable. In this case the caller is a debug destruction helper so doing an extra lookup is not a problem. However, one thing that's not clear to me is how fonts in the holdover table manage to not have entries in the hash table. How does that happen?
Assignee | ||
Comment 8•10 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #7) > (In reply to Eugen Sawin (:esawin) <esawin@mozilla.com> from comment #6) > > (In reply to Jeff Muizelaar [:jrmuizel] from comment #5) > > > Comment on attachment 8367270 [details] [diff] [review] > > > cairo-hash-table-remove.patch > > > > > > Review of attachment 8367270 [details] [diff] [review]: > > > ----------------------------------------------------------------- > > > > > > I'd rather not change the semantics of this function without also doing it > > > upstream. Can we just fix the caller instead? > > > > We can move the check out of the function into the caller, but that would > > introduce overhead due to the linear probing in the hash table lookup and > > leave the remove function unsafe. > > > > That leaves us with the options: > > 1. Move the check up into caller: additional lookup per entry overhead. > > 2. Move the check up into caller and add remove-by-entry function to cairo: > > no overhead, cairo extension. > > 3. Keep the check in the remove function: no overhead, safer remove > > function, changed semantics (silent ignore instead of crash). > > 4. Keep the check in the remove function and let it return whether it was > > successful: no overhead, safer remove function, changed semantics, changed > > interface, most idiomatic behavior for a remove-by-key function. > > > > I think solution 4 is the "right way", with 2 being the workaround without > > overhead. But since it's just the reset function, option 1 is also > > acceptable. > > In this case the caller is a debug destruction helper so doing an extra > lookup is not a problem. > > However, one thing that's not clear to me is how fonts in the holdover table > manage to not have entries in the hash table. How does that happen? Further investigations have revealed that the holdover table becomes inconsistent during the execution of cairo_scaled_font_destroy: https://tbpl.mozilla.org/?tree=Try&rev=e38a109ef5fc. Since the error is not deterministically reproducible, I assume this inconsistency is caused by improper mutex locking and some of the code comments do hint at such a possibility. Regarding code comments, the comment at http://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-scaled-font.c#320 does not imply a strong coupling between the hash table entries and the holdovers. This would make the check as proposed by options 3 and 4 in my previous comment necessary. Another comment at http://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-hash.c#474 does propose that the proposed solution 4 would actually implement the remove function as it was intended. Since the tests show that the errors happen within Cairo functions and not between calls, I don't see a way to avoid upstreaming and I would be inclined to go with option 4 and correct the semantics of the remove function.
Assignee | ||
Comment 9•10 years ago
|
||
This is the line, where we add a font to the holdovers, which is not necessarily contained in the hash table: http://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-scaled-font.c#1261 I think we can take the description at http://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo-scaled-font.c#320 literally and consider the holdovers and the hash table as decoupled.
Flags: needinfo?(jmuizelaar)
Comment 10•10 years ago
|
||
(In reply to Eugen Sawin (:esawin) <esawin@mozilla.com> from comment #8) > Further investigations have revealed that the holdover table becomes > inconsistent during the execution of cairo_scaled_font_destroy: > https://tbpl.mozilla.org/?tree=Try&rev=e38a109ef5fc. > Since the error is not deterministically reproducible, I assume this > inconsistency is caused by improper mutex locking and some of the code > comments do hint at such a possibility. I don't think there are any cases where we use cairo on multiple threads on Android so this shouldn't be related to mutex locking. (In reply to Eugen Sawin (:esawin) <esawin@mozilla.com> from comment #9) > This is the line, where we add a font to the holdovers, which is not > necessarily contained in the hash table: > http://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo- > scaled-font.c#1261 Under what conditions would the scaled_font not be in the hash table? From my reading, all live fonts should be in the hash table and so remain in the hash table when added to the holdover table in scaled_font_destroy. > I think we can take the description at > http://dxr.mozilla.org/mozilla-central/source/gfx/cairo/cairo/src/cairo- > scaled-font.c#320 literally and consider the holdovers and the hash table as > decoupled. I'm not convinced of this.
Flags: needinfo?(jmuizelaar)
Reporter | ||
Comment 11•10 years ago
|
||
Still happening: https://tbpl.mozilla.org/php/getParsedLog.php?id=35889326&tree=Cedar&full=1
Assignee | ||
Comment 13•10 years ago
|
||
The tests showed that we try to delete fonts (and consequently put them into the holdovers), which are not contained in the hash table. The previously proposed patch fixed that by making the hash table remove function safe. However, the real issue is that on Android, we call |cairo_debug_reset_static_data()| twice for debug builds. First time we call it at http://dxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxAndroidPlatform.cpp#136 and the second time in its base class destructor at http://dxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxPlatform.cpp#530. According to code comments, calling |cairo_debug_reset_static_data()| before |SkGraphics::Term()| is unsafe, which adds to the issue. If calling |FT_Done_Library(gPlatformFTLibrary)| before |cairo_debug_reset_static_data()| is safe, we can simply remove the cairo resetting in |~gfxAndroidPlatform()|, otherwise we need to add a preprocessor switch in |~gfxPlatform()|. Results show green for the first solution: https://tbpl.mozilla.org/?tree=Try&rev=61b9f49a980b. I will create the proper patches for both solutions for review next.
Flags: needinfo?(esawin) → needinfo?(jmuizelaar)
Assignee | ||
Comment 14•10 years ago
|
||
This patch prevents double resetting of Cairo data on debug builds by removing resetting for optimal builds.
Attachment #8392198 -
Flags: review?(jmuizelaar)
Flags: needinfo?(jmuizelaar)
Updated•10 years ago
|
Attachment #8392198 -
Flags: review?(jmuizelaar) → review+
Assignee | ||
Updated•10 years ago
|
Keywords: checkin-needed
Comment 15•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/ee01dea0ae3b
Keywords: checkin-needed
Comment 16•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/ee01dea0ae3b
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
You need to log in
before you can comment on or make changes to this bug.
Description
•