Closed
Bug 764758
Opened 12 years ago
Closed 12 years ago
Intermittent crash during test_xhtmlManifest.xhtml [@ nsOfflineCacheUpdateItem::OnDataAvailable]
Categories
(Core :: Networking: Cache, defect)
Tracking
()
RESOLVED
FIXED
mozilla19
Tracking | Status | |
---|---|---|
b2g18 | --- | fixed |
People
(Reporter: emorley, Assigned: mayhemer)
References
Details
(Keywords: crash, intermittent-failure)
Crash Data
Attachments
(4 files)
1.06 KB,
patch
|
jduell.mcbugs
:
review+
mayhemer
:
checkin+
|
Details | Diff | Splinter Review |
1.21 KB,
patch
|
jduell.mcbugs
:
review+
|
Details | Diff | Splinter Review |
1020 bytes,
patch
|
lsblakk
:
approval-mozilla-release-
lsblakk
:
approval-mozilla-b2g18+
|
Details | Diff | Splinter Review |
1.21 KB,
patch
|
lsblakk
:
approval-mozilla-release-
lsblakk
:
approval-mozilla-b2g18+
|
Details | Diff | Splinter Review |
Rev4 MacOSX Lion 10.7 mozilla-inbound opt test mochitests-3/5 on 2012-06-13 12:26:51 PDT for push 584ae96f6c95 slave: talos-r4-lion-075 https://tbpl.mozilla.org/php/getParsedLog.php?id=12632207&tree=Mozilla-Inbound { 2441 INFO TEST-START | /tests/dom/tests/mochitest/ajax/offline/test_xhtmlManifest.xhtml TEST-UNEXPECTED-FAIL | /tests/dom/tests/mochitest/ajax/offline/test_xhtmlManifest.xhtml | Exited with code 1 during test run INFO | automation.py | Application ran for: 0:01:04.853570 INFO | automation.py | Reading PID log: /var/folders/qd/srwd5f710sj0fcl9z464lkj00000gn/T/tmp7Sfd_ypidlog Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-macosx64/1339613605/firefox-16.0a1.en-US.mac.crashreporter-symbols.zip PROCESS-CRASH | /tests/dom/tests/mochitest/ajax/offline/test_xhtmlManifest.xhtml | application crashed (minidump found) Crash dump filename: /var/folders/qd/srwd5f710sj0fcl9z464lkj00000gn/T/tmpeQN5Os/minidumps/BC32F20A-7DE5-469F-A027-84C92E87BD2E.dmp Operating system: Mac OS X 10.7.2 11C74 CPU: amd64 family 6 model 23 stepping 10 2 CPUs Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS Crash address: 0xd0 Thread 0 (crashed) 0 XUL!nsOfflineCacheUpdateItem::OnDataAvailable [nsOfflineCacheUpdate.cpp : 1882 + 0x0] rbx = 0x140fbb80 r12 = 0x0a6f1110 r13 = 0x00000000 r14 = 0x0000016a r15 = 0x00000000 rip = 0x01c48d5b rsp = 0x5fbfba10 rbp = 0x00000000 Found by: given as instruction pointer in context 1 XUL!nsStreamListenerTee::OnDataAvailable [nsStreamListenerTee.cpp : 89 + 0x1a] rbx = 0x00000000 r12 = 0x0a6f1110 r13 = 0x00000000 r14 = 0x0000016a r15 = 0x00000000 rip = 0x010590f9 rsp = 0x5fbfba40 rbp = 0x0cd21580 Found by: call frame info 2 XUL!mozilla::net::nsHttpChannel::OnDataAvailable [nsHttpChannel.cpp : 5053 + 0x16] rbx = 0x08906800 r12 = 0x0000016a r13 = 0x003d2000 r14 = 0x00000000 r15 = 0x0a6f1110 rip = 0x010d5a5c rsp = 0x5fbfbad0 rbp = 0x0000016a Found by: call frame info 3 XUL!nsInputStreamPump::OnStateTransfer [nsInputStreamPump.cpp : 486 + 0x1f] rbx = 0x00000000 r12 = 0x003d2000 r13 = 0x5fbfbb4c r14 = 0x5fbfbb30 r15 = 0x5fbfbc4f rip = 0x010385fb rsp = 0x5fbfbb20 rbp = 0x16430b70 Found by: call frame info 4 XUL!nsInputStreamPump::OnInputStreamReady [nsInputStreamPump.cpp : 370 + 0x7] rbx = 0x16430b70 r12 = 0x16430b78 r13 = 0x00307da8 r14 = 0x5fbfbbe0 r15 = 0x5fbfbc4f rip = 0x01038a58 rsp = 0x5fbfbb80 rbp = 0x003d2000 Found by: call frame info 5 XUL!nsInputStreamReadyEvent::Run [nsStreamUtils.cpp : 82 + 0x5] rbx = 0x161678e0 r12 = 0x00000000 r13 = 0x00307da8 r14 = 0x5fbfbbe0 r15 = 0x5fbfbc4f rip = 0x0202b51c rsp = 0x5fbfbba0 rbp = 0x8000ffff Found by: call frame info 6 XUL!nsThread::ProcessNextEvent [nsThread.cpp : 624 + 0x5] rbx = 0x00307d80 r12 = 0x00000000 r13 = 0x00307da8 r14 = 0x5fbfbbe0 r15 = 0x5fbfbc4f rip = 0x020400b3 rsp = 0x5fbfbbb0 rbp = 0x8000ffff Found by: call frame info 7 XUL!NS_ProcessPendingEvents_P [nsThreadUtils.cpp : 166 + 0xd] rbx = 0x00307d80 r12 = 0xe751dd1c r13 = 0x00000014 r14 = 0x5fbfbc4f r15 = 0x00306d88 rip = 0x01fff82e rsp = 0x5fbfbc40 rbp = 0x00000000 Found by: call frame info 8 XUL!nsBaseAppShell::NativeEventCallback [nsBaseAppShell.cpp : 97 + 0x13] rbx = 0x05b36480 r12 = 0x00307d80 r13 = 0x00000000 r14 = 0x05b0a458 r15 = 0x00306d88 rip = 0x01e69f19 rsp = 0x5fbfbc80 rbp = 0x00000002 Found by: call frame info 9 XUL!nsAppShell::ProcessGeckoEvents [nsAppShell.mm : 402 + 0x7] rbx = 0x05b0a440 r12 = 0x003d2000 r13 = 0x74f1dae0 r14 = 0x05b0a458 r15 = 0x00306d88 rip = 0x01e2e09f rsp = 0x5fbfbcb0 rbp = 0x05b36480 }
Reporter | ||
Comment 1•12 years ago
|
||
https://tbpl.mozilla.org/php/getParsedLog.php?id=12635604&tree=Mozilla-Inbound
Assignee | ||
Comment 2•12 years ago
|
||
Could it be that Thread 0 stack is not complete? My idea is a re-entrance. But the stack starts with CoreFoundation (in full log) that indicates it's the thread's entry point (though, I'm not a Gecko-on-MacOSX expert, however, some code in ProcessGeckoEvents suggests it may be re-entrant aware).
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Updated•12 years ago
|
Assignee: nobody → jschoenick
Status: NEW → ASSIGNED
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 20•12 years ago
|
||
John, are you working on this?
Comment 21•12 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #20) > John, are you working on this? I spent some time looking at this, and made some unsuccessful attempts to reproduce this in a debugger. As you noted it's not clear how we could be re-entering, and I didn't see any obvious issues looking over the code path. I'll look at it some more today, if you have any ideas let me know.
Assignee | ||
Comment 22•12 years ago
|
||
It's definitely a regression from one of these changesets: http://hg.mozilla.org/mozilla-central/rev/10b04214089d http://hg.mozilla.org/mozilla-central/rev/f2b089df69b3 Question is, which one. Hard to say based on the low frequency of this crash. I'd like to keep this on the radar, since it may be caused by some more serious issue hidden under, however that is not very likely (from my experience). I personally would land an investigation patch that MOZ_RUNTIME_ABORT's (or what is the name) when mUpdate is null on the crashing place with some status message about the object. It has to be smart. I don't at the moment have an idea about the cause, but some mistake in the parallelization code is very likely.
Reporter | ||
Comment 23•12 years ago
|
||
Thank you for looking at this :-)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 25•12 years ago
|
||
John, this is really something we need to take care about (we already had to, actually, since it really hides or creates more serious issue). What is your progress here? If none, then I will take this and start working on this immediately. Thanks.
Comment 26•12 years ago
|
||
Apologies, this ended up pretty low on my priority list, and I didn't get the sense that it was very critical from comment 22.
Assignee: jschoenick → honzab.moz
Assignee | ||
Comment 27•12 years ago
|
||
Jason, this review is quit urgent, please take a look soon, the patch is extremely simple. https://tbpl.mozilla.org/?tree=Try&rev=49c69c40ec59 Cause of the crash, quit racy: - important note here is that the first item in a manifest must lead to e.g. a 404 and have a lot (16+) of following, working, explicit items - we post nsOfflineCacheUpdate::Run() up to the count of the parallel limit - there may be a lot of them, up to 16 - the channel for the first item finishes with an error (like 404) before all those nsOfflineCacheUpdate::Run() gets processed - nsOfflineCacheUpdate switches to STATE_FINISHED and cancels all items that are in progress (what are most of them) - canceling an item means to a) cancel its http channel if there is such yet b) set its mState to UNINITIALIZED - then, another nsOfflineCacheUpdate::Run() gets called and searches for an item that is in state UNINITIALIZED (most of them) ; [this is the bug actually, it should just immediately return when the update is already done] - it finds it and opens another channel on it - problem is that OnStopRequest on the item posts nsOfflineCacheUpdateItem::Run() [method of the item in this case, not the update] that may get triggered later, after the previous step - call to nsOfflineCacheUpdateItem::Run() is already in the queue while we have opened a new (second) channel for which the same item is again a listener - nsOfflineCacheUpdateItem::Run() gets fired, clears mUpdate - the second channel now calls OnStopRerquest => Bang!
Attachment #670810 -
Flags: review?(jduell.mcbugs)
Assignee | ||
Comment 28•12 years ago
|
||
An additional check that should prevent this issue when we changes something in the future. This has to land only on m-c since, IMO dangerous for m-a+.
Attachment #670813 -
Flags: review?(jduell.mcbugs)
Assignee | ||
Comment 29•12 years ago
|
||
Comment on attachment 670813 [details] [diff] [review] v1 annexe for m-c only The annex has its own try run: https://tbpl.mozilla.org/?tree=Try&rev=203090a833f7
Comment 30•12 years ago
|
||
Comment on attachment 670810 [details] [diff] [review] v1 Review of attachment 670810 [details] [diff] [review]: ----------------------------------------------------------------- ::: uriloader/prefetch/nsOfflineCacheUpdate.cpp @@ +1714,5 @@ > NS_ASSERTION(mState == STATE_DOWNLOADING, > "ProcessNextURI should only be called from the DOWNLOADING state"); > > + if (mState != STATE_DOWNLOADING) { > + LOG((" after completion, ignoring")); Do you know that the only other state possible here is after completion? If not, better to use "Unexpected: mState != STATE_DOWNLOADING. Ignoring URI".
Attachment #670810 -
Flags: review?(jduell.mcbugs) → review+
Comment 31•12 years ago
|
||
Comment on attachment 670813 [details] [diff] [review] v1 annexe for m-c only Review of attachment 670813 [details] [diff] [review]: ----------------------------------------------------------------- Looks good too. Do you intend for this to bake on m-c for a while then land on aurora, or do you just want to let it ride the trains? It will miss B2G v1 if it doesn't land on aurora.
Attachment #670813 -
Flags: review?(jduell.mcbugs) → review+
Assignee | ||
Comment 32•12 years ago
|
||
(In reply to Jason Duell (:jduell) from comment #30) > Do you know that the only other state possible here is after completion? If > not, better to use "Unexpected: mState != STATE_DOWNLOADING. Ignoring URI". I changed the log to " should only be called from the DOWNLOADING state, ignoring", similar to what the assertion comment looks like. (In reply to Jason Duell (:jduell) from comment #31) > Looks good too. Do you intend for this to bake on m-c for a while then land > on aurora, or do you just want to let it ride the trains? It will miss B2G > v1 if it doesn't land on aurora. I think it is OK to just land and leave this part on m-c. It is not needed for the fix. BTW, I am also able to modify an existing test [1] to reproduce the crash on OSX instantly [2]. I may produce a patch for it too... [1] https://hg.mozilla.org/try/rev/e4ce2cbec104 [2] https://tbpl.mozilla.org/?tree=Try&rev=e4ce2cbec104
Assignee | ||
Comment 33•12 years ago
|
||
Comment on attachment 670810 [details] [diff] [review] v1 https://hg.mozilla.org/integration/mozilla-inbound/rev/7e6fcfed1b74
Attachment #670810 -
Flags: checkin+
Assignee | ||
Comment 34•12 years ago
|
||
Comment on attachment 670813 [details] [diff] [review] v1 annexe for m-c only https://hg.mozilla.org/integration/mozilla-inbound/rev/14516fa301a3
Comment 35•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/7e6fcfed1b74 https://hg.mozilla.org/mozilla-central/rev/14516fa301a3
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla19
Updated•12 years ago
|
Keywords: intermittent-failure
Updated•12 years ago
|
Whiteboard: [orange]
Assignee | ||
Updated•11 years ago
|
Attachment #670810 -
Flags: approval-mozilla-b2g18?
Assignee | ||
Updated•11 years ago
|
Attachment #670813 -
Flags: approval-mozilla-b2g18?
Assignee | ||
Comment 38•11 years ago
|
||
Assignee | ||
Comment 39•11 years ago
|
||
Assignee | ||
Comment 40•11 years ago
|
||
Comment on attachment 702488 [details] [diff] [review] v1 annexe merged for b2g18 [Approval Request Comment] Regression caused by (bug #): 744719 User impact if declined: potential null deref crash Testing completed (on m-c, etc.): fix now on beta Risk to taking this patch (and alternatives if risky): very low according the time we have been testing it on trains
Attachment #702488 -
Attachment is patch: true
Attachment #702488 -
Flags: approval-mozilla-release?
Assignee | ||
Comment 41•11 years ago
|
||
Comment on attachment 702487 [details] [diff] [review] v1 merged for b2g18 [Approval Request Comment] Same as the other part of the fix.
Attachment #702487 -
Flags: approval-mozilla-release?
Attachment #702487 -
Flags: approval-mozilla-b2g18?
Assignee | ||
Updated•11 years ago
|
Attachment #670810 -
Flags: approval-mozilla-b2g18?
Assignee | ||
Updated•11 years ago
|
Attachment #670813 -
Flags: approval-mozilla-b2g18?
Assignee | ||
Updated•11 years ago
|
Attachment #702488 -
Flags: approval-mozilla-b2g18?
Assignee | ||
Comment 42•11 years ago
|
||
According the release approval: this is a low volume crash - https://crash-stats.mozilla.com/query/query?product=Firefox&version=ALL%3AALL&range_value=3&range_unit=weeks&date=01%2F15%2F2013+21%3A18%3A07&query_search=signature&query_type=contains&query=nsOfflineCacheUpdateItem%3A%3AOnDataAvailable&reason=&build_id=&process_type=any&hang_type=any&do_query=1
Comment 43•11 years ago
|
||
> [Approval Request Comment] Please note that we believe this is causing bug 829637, a b2g18 test permaorange, which likely reflects broken appcache functionality.
Comment 44•11 years ago
|
||
Comment on attachment 702487 [details] [diff] [review] v1 merged for b2g18 Approving for b2g18 branch, no need for this on the mozilla-release branch - we're already past the deadline for 18.0.1 landings.
Attachment #702487 -
Flags: approval-mozilla-b2g18? → approval-mozilla-b2g18+
Updated•11 years ago
|
Attachment #702487 -
Flags: approval-mozilla-release? → approval-mozilla-release-
Updated•11 years ago
|
Attachment #702488 -
Flags: approval-mozilla-release?
Attachment #702488 -
Flags: approval-mozilla-release-
Attachment #702488 -
Flags: approval-mozilla-b2g18?
Attachment #702488 -
Flags: approval-mozilla-b2g18+
Assignee | ||
Comment 45•11 years ago
|
||
https://hg.mozilla.org/releases/mozilla-b2g18/rev/e20248de35e6 https://hg.mozilla.org/releases/mozilla-b2g18/rev/adb15d672c3f
Updated•11 years ago
|
status-b2g18:
--- → fixed
You need to log in
before you can comment on or make changes to this bug.
Description
•