Closed Bug 1054813 Opened 5 years ago Closed 5 years ago

Intermittent test_timeoutTracing.html | application crashed [@ PL_DHashTableEnumerate(PLDHashTable*, PLDHashOperator (*)(PLDHashTable*, PLDHashEntryHdr*, unsigned int, void*), void*)] after "Assertion failure: !didRemove || aTable->recursionLevel == 1"

Categories

(Core :: Networking: JAR, defect)

ARM
Android
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla34
Tracking Status
firefox32 --- fixed
firefox33 --- fixed
firefox34 --- fixed
firefox-esr24 --- unaffected
firefox-esr31 --- wontfix

People

(Reporter: RyanVM, Assigned: aklotz)

References

Details

(Keywords: assertion, crash, intermittent-failure)

Attachments

(1 file)

Not sure if DOM: Workers is really the right component or not, but it's a start.

https://tbpl.mozilla.org/php/getParsedLog.php?id=46138610&tree=Mozilla-Inbound

Android 4.0 Panda mozilla-inbound debug test mochitest-7 on 2014-08-17 17:28:21 PDT for push a3af0d1e1662
slave: panda-0260

17:42:49     INFO -  205 INFO TEST-START | /tests/dom/workers/test/test_timeoutTracing.html
17:42:49     INFO -  INFO | automation.py | Application ran for: 0:03:02.688971
17:42:49     INFO -  INFO | zombiecheck | Reading PID log: /tmp/tmpoRbQSepidlog
17:42:49     INFO -  Contents of /data/anr/traces.txt:
17:42:52     INFO -  206 INFO mozcrash Downloading symbols from: https://ftp-ssl.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-inbound-android-debug/1408320375/fennec-34.0a1.en-US.android-arm.crashreporter-symbols.zip
17:42:56     INFO -  207 INFO mozcrash Saved minidump as /builds/panda-0260/test/build/blobber_upload_dir/195b86d1-7b26-5bf1-06d8120b-09a99afa.dmp
17:42:56     INFO -  208 INFO mozcrash Saved app info as /builds/panda-0260/test/build/blobber_upload_dir/195b86d1-7b26-5bf1-06d8120b-09a99afa.extra
17:42:56  WARNING -  PROCESS-CRASH | /tests/dom/workers/test/test_timeoutTracing.html | application crashed [@ PL_DHashTableEnumerate(PLDHashTable*, PLDHashOperator (*)(PLDHashTable*, PLDHashEntryHdr*, unsigned int, void*), void*)]
17:42:56     INFO -  Crash dump filename: /tmp/tmpkhwU5x/195b86d1-7b26-5bf1-06d8120b-09a99afa.dmp
17:42:56     INFO -  Operating system: Android
17:42:56     INFO -                    0.0.0 Linux 3.2.0+ #2 SMP PREEMPT Thu Nov 29 08:06:57 EST 2012 armv7l pandaboard/pandaboard/pandaboard:4.0.4/IMM76I/5:eng/test-keys
17:42:56     INFO -  CPU: arm
17:42:56     INFO -       2 CPUs
17:42:56     INFO -  Crash reason:  SIGSEGV
17:42:56     INFO -  Crash address: 0x0
17:42:56     INFO -  Thread 13 (crashed)
17:42:56     INFO -   0  libxul.so!PL_DHashTableEnumerate(PLDHashTable*, PLDHashOperator (*)(PLDHashTable*, PLDHashEntryHdr*, unsigned int, void*), void*) [pldhash.cpp:a3af0d1e1662 : 713 + 0x14]
17:42:56     INFO -       r4 = 0x5c3346d0    r5 = 0x5c396500    r6 = 0x00000010    r7 = 0x00000010
17:42:56     INFO -       r8 = 0x00000002    r9 = 0x00000014   r10 = 0x00000140    fp = 0x5d93fc5c
17:42:56     INFO -       sp = 0x5d93fc08    lr = 0x628c835b    pc = 0x628ca634
17:42:56     INFO -      Found by: given as instruction pointer in context
17:42:56     INFO -   1  libxul.so!nsZipReaderCache::Observe(nsISupports*, char const*, char16_t const*) [nsBaseHashtable.h:a3af0d1e1662 : 207 + 0x3]
17:42:56     INFO -       r4 = 0x5c3346a0    r5 = 0x673e5bb0    r6 = 0x673e5bb0    r7 = 0x00000000
17:42:56     INFO -       r8 = 0x00000001    r9 = 0x00000005   r10 = 0x5d940438    fp = 0xffffff88
17:42:56     INFO -       sp = 0x5d93fc40    pc = 0x62bdc8f3
17:42:56     INFO -      Found by: call frame info
17:42:56     INFO -   2  libxul.so!nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) [nsObserverList.cpp:a3af0d1e1662 : 100 + 0x9]
17:42:56     INFO -       r4 = 0x00000034    r5 = 0x00000000    r6 = 0x673e5bb0    r7 = 0x6a80a340
17:42:56     INFO -       r8 = 0x00000001    r9 = 0x00000005   r10 = 0x5d940438    fp = 0xffffff88
17:42:56     INFO -       sp = 0x5d93fcc8    pc = 0x6289c045
17:42:56     INFO -      Found by: call frame info
17:42:56     INFO -   3  libxul.so!nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) [nsObserverService.cpp:a3af0d1e1662 : 329 + 0x9]
17:42:56     INFO -       r4 = 0x673e5bb0    r5 = 0x5c3c4794    r6 = 0x00000000    r7 = 0x6a80a340
17:42:56     INFO -       r8 = 0x00000001    r9 = 0x00000005   r10 = 0x5d940438    fp = 0xffffff88
17:42:56     INFO -       sp = 0x5d93fce8    pc = 0x6289c0d7
17:42:56     INFO -      Found by: call frame info
17:42:56     INFO -   4  libxul.so!NS_InvokeByIndex [xptcinvoke_arm.cpp:a3af0d1e1662 : 164 + 0x13]
17:42:56     INFO -       r4 = 0x5d93fd20    r5 = 0x6289c055    r6 = 0x5c3c4780    r7 = 0x5d93fd28
17:42:56     INFO -       r8 = 0x00000002    r9 = 0x00000005   r10 = 0x5d940438    fp = 0xffffff88
17:42:56     INFO -       sp = 0x5d93fd08    pc = 0x628bd96d
17:42:56     INFO -      Found by: call frame info
17:42:56     INFO -   5  libxul.so!CallMethodHelper::Call() [XPCWrappedNative.cpp:a3af0d1e1662 : 2368 + 0xb]
17:42:56     INFO -       r4 = 0x5d93fd78    r5 = 0x00000003    r6 = 0x00000000    r7 = 0x00000003
17:42:56     INFO -       r8 = 0x6480fd3c    r9 = 0x00000000   r10 = 0x5d940438    fp = 0xffffff88
17:42:56     INFO -       sp = 0x5d93fd48    pc = 0x62bce19b
17:42:56     INFO -      Found by: call frame info
17:42:56     INFO -   6  libxul.so!XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) [XPCWrappedNative.cpp:a3af0d1e1662 : 1696 + 0x5]
17:42:56     INFO -       r4 = 0x5d93fe98    r5 = 0x00000000    r6 = 0x00000001    r7 = 0x00000001
17:42:56     INFO -       r8 = 0x6480fd3c    r9 = 0x00000000   r10 = 0x5d940438    fp = 0xffffff88
17:42:56     INFO -       sp = 0x5d93fd78    pc = 0x62bcf1d7
17:42:56     INFO -      Found by: call frame info
17:42:56     INFO -   7  libxul.so!XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) [XPCWrappedNativeJSOps.cpp:a3af0d1e1662 : 1288 + 0x7]
17:42:56     INFO -       r4 = 0x6c9e4340    r5 = 0x00000000    r6 = 0x00000001    r7 = 0x00000001
17:42:56     INFO -       r8 = 0x6480fd3c    r9 = 0x00000000   r10 = 0x5d940438    fp = 0xffffff88
17:42:56     INFO -       sp = 0x5d93fe48    pc = 0x62bcf5f7
17:42:56     INFO -      Found by: call frame info
17:42:56     INFO -   8  libxul.so!js::CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&) [jscntxtinlines.h:a3af0d1e1662 : 231 + 0x9]
17:42:56     INFO -       r4 = 0x6c9e4340    r5 = 0x5d93ff44    r6 = 0x62bcf4f9    r7 = 0x00000000
17:42:56     INFO -       r8 = 0x00000002    r9 = 0x00000000   r10 = 0x5d940438    fp = 0xffffff88
17:42:56     INFO -       sp = 0x5d93ff18    pc = 0x63edd281
17:42:56     INFO -      Found by: call frame info
17:42:56     INFO -   9  libxul.so!js::Invoke(JSContext*, JS::CallArgs, js::MaybeConstruct) [Interpreter.cpp:a3af0d1e1662 : 464 + 0x9]
17:42:56     INFO -       r4 = 0x5d93ff44    r5 = 0x00000000    r6 = 0x6c9e4340    r7 = 0x00000000
17:42:56     INFO -       r8 = 0x00000002    r9 = 0x00000000   r10 = 0x5d940438    fp = 0xffffff88
17:42:56     INFO -       sp = 0x5d93ff38    pc = 0x63f09587
17:42:56     INFO -      Found by: call frame info

17:43:00     INFO -  08-17 17:42:25.703 E/GeckoConsole( 2249): [JavaScript Warning: "The character encoding of a framed document was not declared. The document may appear different if viewed without the document framing it." {file: "http://mochi.test:8888/tests/SimpleTest/iframe-between-tests.html" line: 0}]
17:43:00     INFO -  08-17 17:42:25.734 I/GeckoDump( 2249): ⰲ겿{"action":"test_start","time":1408297345742,"thread":"","pid":null,"source":"mochitest","test":"/tests/dom/workers/test/test_timeoutTracing.html"}ⰲ겿
17:43:00     INFO -  08-17 17:42:25.804 I/Gecko   ( 2249): ++DOMWINDOW == 23 (0x673d7e00) [pid = 2249] [serial = 349] [outer = 0x6c9ec000]
17:43:00     INFO -  08-17 17:42:25.953 E/GeckoConsole( 2249): [JavaScript Warning: "The character encoding of a framed document was not declared. The document may appear different if viewed without the document framing it." {file: "http://mochi.test:8888/tests/dom/workers/test/test_timeoutTracing.html" line: 0}]
17:43:00     INFO -  08-17 17:42:25.976 I/Gecko   ( 2249): nsWindow::SetFocus: can't set focus without raising, ignoring aRaise = false!
17:43:00     INFO -  08-17 17:42:26.835 I/Gecko   ( 2249): --DOCSHELL 0x6e4db000 == 4 [pid = 2249] [id = 49]
17:43:00     INFO -  08-17 17:42:27.492 I/Gecko   ( 2249): [2249] WARNING: A control runnable was posted to a worker that is already shutting down!: file /builds/slave/m-in-and-d-0000000000000000000/build/dom/workers/WorkerPrivate.cpp, line 2233
17:43:00     INFO -  08-17 17:42:27.492 I/Gecko   ( 2249): [2249] WARNING: Failed to CC worker!: file /builds/slave/m-in-and-d-0000000000000000000/build/dom/workers/WorkerPrivate.cpp, line 2955
17:43:00     INFO -  08-17 17:42:27.500 I/Gecko   ( 2249):
17:43:00     INFO -  08-17 17:42:27.500 I/Gecko   ( 2249): ###!!! [Parent][OnMaybeDequeueOne] Error: Channel closing: too late to send/recv, messages will be lost
17:43:00     INFO -  08-17 17:42:27.500 I/Gecko   ( 2249):
17:43:00     INFO -  08-17 17:42:27.507 F/MOZ_Assert( 2249): Assertion failure: !didRemove || aTable->recursionLevel == 1, at /builds/slave/m-in-and-d-0000000000000000000/build/xpcom/glue/pldhash.cpp:713
17:43:00     INFO -  08-17 17:42:28.687 F/libc    ( 2249): Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)
17:43:00     INFO -  08-17 17:42:28.890 I/DEBUG   ( 1290): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
17:43:00     INFO -  08-17 17:42:28.890 I/DEBUG   ( 1290): Build fingerprint: 'pandaboard/pandaboard/pandaboard:4.0.4/IMM76I/5:eng/test-keys'
17:43:00     INFO -  08-17 17:42:28.890 I/DEBUG   ( 1290): pid: 2249, tid: 2270  >>> org.mozilla.fennec <<<
17:43:00     INFO -  08-17 17:42:28.890 I/DEBUG   ( 1290): signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 00000000
Looks hash-tabley though I'm not sure why we're hitting a null deref.
Component: DOM: Workers → XPCOM
mZips is being touched on multiple threads simultaneously, or on the same thread recursively.  I see a number of points in that file where we operate on mZips without holding mLock (or asserting that mLock is held in that function).
Component: XPCOM → Networking: JAR
Looks like bug 982087 covers similar instances of this crash.
See Also: → 982087
Taras, is there somebody who can look at this?  This crash looks pretty sketchy and there are a number of crashes in bug 982087.
Flags: needinfo?(taras.mozilla)
Maybe one of aklotz/glandium can look at this? Can you guys talk among yourselves decide who can poke at this?
Flags: needinfo?(taras.mozilla)
Comment on attachment 8474836 [details] [diff] [review]
Fix up some mutex problems with nsZipReaderCache

Review of attachment 8474836 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.

::: modules/libjar/nsJAR.cpp
@@ +1146,5 @@
>    nsCOMPtr<nsIZipReader> outerZipReader;
>    nsresult rv = GetZip(zipFile, getter_AddRefs(outerZipReader));
>    NS_ENSURE_SUCCESS(rv, rv);
>  
> +  MutexAutoLock lock(mLock);

I think you can avoid taking this lock till a bit later in the function.
Attachment #8474836 - Flags: review?(mwu) → review+
https://hg.mozilla.org/mozilla-central/rev/61602e8912a6
Assignee: nobody → aklotz
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Can we get Aurora/Beta nominations for this please? :)
Comment on attachment 8474836 [details] [diff] [review]
Fix up some mutex problems with nsZipReaderCache

Approval Request Comment
[Feature/regressing bug #]: JAR Cache
[User impact if declined]: Instability
[Describe test coverage new/current, TBPL]: Landed on central to fix intermittent failures in /tests/dom/workers/test/test_timeoutTracing.html
[Risks and why]: No risks; trivial patch adding additional mutual exclusion
[String/UUID change made/needed]: None
Attachment #8474836 - Flags: approval-mozilla-beta?
Attachment #8474836 - Flags: approval-mozilla-aurora?
Flags: needinfo?(aklotz)
Comment on attachment 8474836 [details] [diff] [review]
Fix up some mutex problems with nsZipReaderCache

I spoke with Aaron and Ryan about this. It looks like this crash may be occurring at some frequency in the wild, this will help the sheriffs, and Aaron has already confirmed that this is a low risk fix. Approved for Aurora and Beta.
Attachment #8474836 - Flags: approval-mozilla-beta?
Attachment #8474836 - Flags: approval-mozilla-beta+
Attachment #8474836 - Flags: approval-mozilla-aurora?
Attachment #8474836 - Flags: approval-mozilla-aurora+
QA Whiteboard: [qa-]
You need to log in before you can comment on or make changes to this bug.