Last Comment Bug 761228 - Startup Crash - Assertion failure: mCacheAsyncInputStream, at ../mozilla/netwerk/protocol/http/nsHttpChannel.cpp:3450
: Startup Crash - Assertion failure: mCacheAsyncInputStream, at ../mozilla/netw...
Status: RESOLVED FIXED
[startupcrash]
: crash, regression
Product: Core
Classification: Components
Component: Networking: HTTP (show other bugs)
: 15 Branch
: All All
: -- normal (vote)
: mozilla16
Assigned To: Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
:
Mentors:
: 761660 (view as bug list)
Depends on: 737615 769531
Blocks: 746018
  Show dependency treegraph
 
Reported: 2012-06-04 10:28 PDT by Mike Conley (:mconley) - (needinfo me!)
Modified: 2012-07-16 16:30 PDT (History)
13 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
unaffected
unaffected
+
disabled
+
fixed


Attachments
Debug log (16.48 KB, text/plain)
2012-06-05 10:30 PDT, Florian Quèze [:florian] [:flo]
no flags Details
test case (3.49 KB, patch)
2012-06-11 19:50 PDT, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
no flags Details | Diff | Review
test case [v2] (3.72 KB, patch)
2012-06-12 03:03 PDT, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
jduell.mcbugs: review+
Details | Diff | Review
Fix 304 response handling for custom conditional responses and clarify handling of unexpected 304 responses (3.71 KB, patch)
2012-06-12 03:05 PDT, Brian Smith (:briansmith, :bsmith, use NEEDINFO?)
jduell.mcbugs: review+
Details | Diff | Review

Description Mike Conley (:mconley) - (needinfo me!) 2012-06-04 10:28:44 PDT
On a Linux debug build of comm-central.

STR:

1. Start up Thunderbird, and open up the "What's New" tab.
2. Close Thunderbird with the "What's New" tab still open. 
3. Restart Thunderbird. The "What's New" tab should restore itself.
4. Wait a few seconds.

What happens?

Crash!

What's expected?

No crashes please.

Backtrace:

#0  0xb78ce424 in __kernel_vsyscall ()
#1  0xb7672c16 in nanosleep () at ../sysdeps/unix/syscall-template.S:82
#2  0xb7672a0f in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138
#3  0xb4428669 in ah_crap_handler (signum=6) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/xre/nsSigHandlers.cpp:87
#4  0xb442f07b in nsProfileLock::FatalSignalHandler (signo=6, info=0xbfb7297c, context=0xbfb729fc) at /media/Projects/mozilla/objdir-thunderbird-patches/mozilla/toolkit/profile/nsProfileLock.cpp:190
#5  <signal handler called>
#6  0xb78ce424 in __kernel_vsyscall ()
#7  0xb78a7dde in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:42
#8  0xb45005fe in mozilla::net::nsHttpChannel::ReadFromCache (this=0x9f116c00, alreadyMarkedValid=false) at /media/Projects/mozilla/thunderbird/mozilla/netwerk/protocol/http/nsHttpChannel.cpp:3450
#9  0xb4501538 in mozilla::net::nsHttpChannel::ProcessNotModified (this=0x9f116c00) at /media/Projects/mozilla/thunderbird/mozilla/netwerk/protocol/http/nsHttpChannel.cpp:2186
#10 0xb450475a in mozilla::net::nsHttpChannel::ProcessResponse (this=0x9f116c00) at /media/Projects/mozilla/thunderbird/mozilla/netwerk/protocol/http/nsHttpChannel.cpp:1250
#11 0xb4504b46 in mozilla::net::nsHttpChannel::OnStartRequest (this=0x9f116c00, request=0x9f13e640, ctxt=0x0) at /media/Projects/mozilla/thunderbird/mozilla/netwerk/protocol/http/nsHttpChannel.cpp:4747
#12 0xb4451e15 in nsInputStreamPump::OnStateStart (this=0x9f13e640) at /media/Projects/mozilla/thunderbird/mozilla/netwerk/base/src/nsInputStreamPump.cpp:416
#13 0xb4452568 in nsInputStreamPump::OnInputStreamReady (this=0x9f13e640, stream=0x9f19853c) at /media/Projects/mozilla/thunderbird/mozilla/netwerk/base/src/nsInputStreamPump.cpp:367
#14 0xb55fd8eb in nsInputStreamReadyEvent::Run (this=0x9f13f680) at /media/Projects/mozilla/thunderbird/mozilla/xpcom/io/nsStreamUtils.cpp:81
#15 0xb56125f7 in nsThread::ProcessNextEvent (this=0xb734cf20, mayWait=true, result=0xbfb730bf) at /media/Projects/mozilla/thunderbird/mozilla/xpcom/threads/nsThread.cpp:624
#16 0xb55ceec8 in NS_ProcessNextEvent_P (thread=<optimized out>, mayWait=true) at /media/Projects/mozilla/objdir-thunderbird-patches/mozilla/xpcom/build/nsThreadUtils.cpp:213
#17 0xb54f37b4 in mozilla::ipc::MessagePump::Run (this=0xb2619490, aDelegate=0xb735b900) at /media/Projects/mozilla/thunderbird/mozilla/ipc/glue/MessagePump.cpp:113
#18 0xb563fddc in MessageLoop::RunInternal (this=0xb735b900) at /media/Projects/mozilla/thunderbird/mozilla/ipc/chromium/src/base/message_loop.cc:208
#19 0xb563fe01 in RunHandler (this=0xb735b900) at /media/Projects/mozilla/thunderbird/mozilla/ipc/chromium/src/base/message_loop.cc:201
#20 MessageLoop::Run (this=0xb735b900) at /media/Projects/mozilla/thunderbird/mozilla/ipc/chromium/src/base/message_loop.cc:175
#21 0xb50fd715 in nsBaseAppShell::Run (this=0xb12512e0) at /media/Projects/mozilla/thunderbird/mozilla/widget/xpwidgets/nsBaseAppShell.cpp:163
#22 0xb4f42a74 in nsAppStartup::Run (this=0xb1278130) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/components/startup/nsAppStartup.cpp:256
#23 0xb4423298 in XREMain::XRE_mainRun (this=0xbfb733e0) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/xre/nsAppRunner.cpp:3786
#24 0xb4423546 in XREMain::XRE_main (this=0xbfb733e0, argc=1, argv=0xbfb746a4, aAppData=0xb73158c0) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/xre/nsAppRunner.cpp:3863
#25 0xb44237ad in XRE_main (argc=1, argv=0xbfb746a4, aAppData=0xb73158c0, aFlags=0) at /media/Projects/mozilla/thunderbird/mozilla/toolkit/xre/nsAppRunner.cpp:3939
#26 0x0804a0ee in do_main (argv=0xbfb746a4, argc=1, exePath=0xbfb735dc "/media/Projects/mozilla/objdir-thunderbird-patches/mozilla/dist/bin/") at /media/Projects/mozilla/thunderbird/mail/app/nsMailApp.cpp:111
#27 main (argc=<error reading variable: Cannot access memory at address 0x6c9>, argv=<error reading variable: Cannot access memory at address 0x6cd>) at /media/Projects/mozilla/thunderbird/mail/app/nsMailApp.cpp:200
Comment 1 David :Bienvenu 2012-06-04 10:41:24 PDT
I see this on windows as well.
Comment 2 Mike Conley (:mconley) - (needinfo me!) 2012-06-04 10:46:43 PDT
This assertion that we're failing seems to have been introduced via bug 746018.
Comment 3 :Irving Reid (No longer working on Firefox) 2012-06-04 11:01:03 PDT
I'm seeing it maybe 1/3 of the time on Mac
Comment 4 Florian Quèze [:florian] [:flo] 2012-06-05 06:10:11 PDT
This is how it looks in my terminal when it crashes:

Failed to load file:///Users/florian/buildhg/comm-central/obj-tbird/mozilla/dist/DailyDebug.app/Contents/MacOS/chrome/messenger/content/messenger/AccountManager.js
++DOMWINDOW == 28 (0x22deb3ec) [serial = 28] [outer = 0x1e7260f0]
WARNING: NS_ENSURE_TRUE(mScriptGlobalObject) failed: file /Users/florian/buildhg/comm-central/mozilla/content/xul/document/src/nsXULDocument.cpp, line 3598
WARNING: NS_ENSURE_TRUE((mCachedResponseHead->Status() / 100 != 3) || isCachedRedirect) failed: file /Users/florian/buildhg/comm-central/mozilla/netwerk/protocol/http/nsHttpChannel.cpp, line 2941
WARNING: cache check failed: file /Users/florian/buildhg/comm-central/mozilla/netwerk/protocol/http/nsHttpChannel.cpp, line 2857
--DOMWINDOW == 27 (0x22de992c) [serial = 27] [outer = 0x1e7260f0] [url = chrome://messenger/content/msgAccountCentral.xul]
--DOMWINDOW == 26 (0x22d9975c) [serial = 17] [outer = 0x228894d0] [url = about:blank]
--DOMWINDOW == 25 (0x22d7e4cc) [serial = 16] [outer = 0x2288f4b0] [url = about:blank]
--DOMWINDOW == 24 (0x22dd3e6c) [serial = 18] [outer = 0x1e7260f0] [url = about:blank]
Assertion failure: mCacheAsyncInputStream, at /Users/florian/buildhg/comm-central/mozilla/netwerk/protocol/http/nsHttpChannel.cpp:3450


And the same part of the log when it doesn't crash:
Failed to load file:///Users/florian/buildhg/comm-central/obj-tbird/mozilla/dist/DailyDebug.app/Contents/MacOS/chrome/messenger/content/messenger/AccountManager.js
++DOMWINDOW == 28 (0x24d1beac) [serial = 28] [outer = 0x1e726060]
WARNING: NS_ENSURE_TRUE(mScriptGlobalObject) failed: file /Users/florian/buildhg/comm-central/mozilla/content/xul/document/src/nsXULDocument.cpp, line 3598
WARNING: NS_ENSURE_TRUE(mCachedResponseHead) failed: file /Users/florian/buildhg/comm-central/mozilla/netwerk/protocol/http/nsHttpChannel.cpp, line 2126
--DOMWINDOW == 27 (0x22de970c) [serial = 27] [outer = 0x1e726060] [url = chrome://messenger/content/msgAccountCentral.xul]
--DOMWINDOW == 26 (0x22d9969c) [serial = 17] [outer = 0x228894d0] [url = about:blank]
--DOMWINDOW == 25 (0x22d7e40c) [serial = 16] [outer = 0x2288f4c0] [url = about:blank]
--DOMWINDOW == 24 (0x22dd3d9c) [serial = 18] [outer = 0x1e726060] [url = about:blank]
Comment 5 Florian Quèze [:florian] [:flo] 2012-06-05 06:32:20 PDT
Note: The 2 logs in comment 4 were with -purgecaches on the command line. Without it, I see an additional assertion, immediately after the "Failed to load" message:

Failed to load file:///Users/florian/buildhg/comm-central/obj-tbird/mozilla/dist/DailyDebug.app/Contents/MacOS/chrome/messenger/content/messenger/AccountManager.js
###!!! ASSERTION: Existing entry in disk StartupCache.: 'zipItem == nsnull', file /Users/florian/buildhg/comm-central/mozilla/startupcache/StartupCache.cpp, line 344
Comment 6 Florian Quèze [:florian] [:flo] 2012-06-05 06:45:54 PDT
I've just tried with a fresh profile, and some of the messages in comment 4 and 5 don't appear.

In my terminal before the crash I see:
WARNING: NS_ENSURE_TRUE((mCachedResponseHead->Status() / 100 != 3) || isCachedRedirect) failed: file /Users/florian/buildhg/comm-central/mozilla/netwerk/protocol/http/nsHttpChannel.cpp, line 2941
WARNING: cache check failed: file /Users/florian/buildhg/comm-central/mozilla/netwerk/protocol/http/nsHttpChannel.cpp, line 2857
Assertion failure: mCacheAsyncInputStream, at /Users/florian/buildhg/comm-central/mozilla/netwerk/protocol/http/nsHttpChannel.cpp:3450


And when there's no crash I see:
WARNING: NS_ENSURE_TRUE(mCachedResponseHead) failed: file /Users/florian/buildhg/comm-central/mozilla/netwerk/protocol/http/nsHttpChannel.cpp, line 2126
Comment 7 Mike Conley (:mconley) - (needinfo me!) 2012-06-05 07:11:39 PDT
I reported this yesterday before the merge, so I'm pretty sure this crash just got merged over to comm-aurora.

I'm gonna take a run at this today to see what's going on.
Comment 8 Mike Conley (:mconley) - (needinfo me!) 2012-06-05 07:48:03 PDT
Having read into the patch a bit more, I'm starting to suspect that this is indeed caused by bug 746018. I'm compiling without those patches to confirm.
Comment 9 Mike Conley (:mconley) - (needinfo me!) 2012-06-05 08:20:02 PDT
Yep, the patches in bug 746018 introduce the assertion, and our debug builds crash with them, and don't without them.

So either we've hit an edge case, or we're somehow abusing nsHttpChannel and need to change our ways. Either way, marking this as blocking 746018.
Comment 10 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-06-05 09:56:00 PDT
I filed the platform bug, bug 761660 to track the fixing of the platform bug that is causing this crash. Even if you are doing something unusual with nsHttpChannel, we should not be crashing.

It would very very useful to get a log of this, as I cannot reproduce. Please run Daily with NSPR_LOG_MODULES=nsHttp:5 and add the log to this bug or to bug 761660. Thanks!
Comment 11 Mike Conley (:mconley) - (needinfo me!) 2012-06-05 10:28:15 PDT
(In reply to Brian Smith (:bsmith) from comment #10)
> I filed the platform bug, bug 761660 to track the fixing of the platform bug
> that is causing this crash. Even if you are doing something unusual with
> nsHttpChannel, we should not be crashing.
> 

Note that this crash is due to an assertion failure, and I think we want to crash on those - so the crash, while unpleasant, is not necessarily unexpected / unwanted.

> It would very very useful to get a log of this, as I cannot reproduce.
> Please run Daily with NSPR_LOG_MODULES=nsHttp:5 and add the log to this bug
> or to bug 761660. Thanks!

Sure, posting in bug 761660.
Comment 12 Florian Quèze [:florian] [:flo] 2012-06-05 10:30:42 PDT
Created attachment 630221 [details]
Debug log

(In reply to Brian Smith (:bsmith) from comment #10)

> It would very very useful to get a log of this, as I cannot reproduce.
> Please run Daily with NSPR_LOG_MODULES=nsHttp:5 and add the log to this bug
> or to bug 761660. Thanks!

The build I used to generate this log has the
MOZ_ASSERT(mCacheAsyncInputStream); line commented out so that I could
work on something else, but I added a comment in the log at the point where the crash would occur.
Comment 13 Florian Quèze [:florian] [:flo] 2012-06-05 10:38:03 PDT
Note to Thunderbird developers annoyed by this crash:

The debug log I attached shows that we crash during the HTTP request for testpilot.mozillalabs.com/testcases/index-tb.json so I think we can workaround the crash by disabling the Test Pilot add-on.
Comment 14 :Irving Reid (No longer working on Firefox) 2012-06-07 13:06:55 PDT
(In reply to Florian Quèze from comment #13)
> Note to Thunderbird developers annoyed by this crash:
> 
> The debug log I attached shows that we crash during the HTTP request for
> testpilot.mozillalabs.com/testcases/index-tb.json so I think we can
> workaround the crash by disabling the Test Pilot add-on.

Interesting - I wonder if TP (at least under Thunderbird) is trying to initialize before all the infrastructure is set up? If so, it would be nice to get a more human-readable error message.

Is there a well defined point in the platform startup when the HTTP channel caching becomes ready to use?
Comment 15 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-06-07 13:32:53 PDT
I don't think this has to do with using the HTTP cache before it is ready to be used.

The issue seems to be related to this check failing:

   NS_ENSURE_TRUE((mCachedResponseHead->Status() / 100 != 3) || isCachedRedirect)

It looks like we wrote a 3xx response into the cache and this new stricter check I added is refusing to cache that. But, then we get a 304 response from the server and we start to read from the cache entry input stream that we didn't create.

Still haven't determined why we're sending a conditional request. I wonder if TestPilot is doing a custom conditional request; if so, the problem is probably that we do not set mCustomConditionalRequest before we do the above check.
Comment 16 :Irving Reid (No longer working on Firefox) 2012-06-08 08:17:25 PDT
Test Pilot is trying to explicitly manage this particular file download to use a local cached version if it is unable to retrieve the on-line version; see

http://mxr.mozilla.org/comm-central/source/mail/app/profile/extensions/tbtestpilot@labs.mozilla.com/modules/remote-experiment-loader.js#448

and

http://mxr.mozilla.org/comm-central/source/mail/app/profile/extensions/tbtestpilot@labs.mozilla.com/modules/remote-experiment-loader.js#85

Copy Gregg Lind on the bug in case he can help with the specifics of the test pilot experiment loader.
Comment 17 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-06-11 19:50:01 PDT
Created attachment 632111 [details] [diff] [review]
test case

Here is the test case. If you run it on an unpatched build, it will crash. With the upcoming patch, there is no crash and we correctly process the response.

make -C netwerk/test
SOLO_FILE=test_customConditionalRequest_304.js make -C netwerk/test check-one
Comment 18 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-06-11 19:53:18 PDT
Comment on attachment 632111 [details] [diff] [review]
test case

Will update the test case to include another case.
Comment 19 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-06-12 03:03:24 PDT
Created attachment 632187 [details] [diff] [review]
test case [v2]

Here's the test case that can be used to reproduce the crash.
Comment 20 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-06-12 03:05:08 PDT
Created attachment 632188 [details] [diff] [review]
Fix 304 response handling for custom conditional responses and clarify handling of unexpected 304 responses

The bug was mainly that we returned from CheckCache() before setting mCustomConditionalRequest. This patch fixes the bug by remembering that that the request is a custom conditional request before we would ever return from CheckCache.
Comment 21 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-06-12 03:07:56 PDT
*** Bug 761660 has been marked as a duplicate of this bug. ***
Comment 22 Jason Duell [:jduell] (needinfo? me) 2012-06-13 13:26:11 PDT
Comment on attachment 632187 [details] [diff] [review]
test case [v2]

Review of attachment 632187 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/test/unit/test_304_responses.js
@@ +33,5 @@
> +  response.setHeader("Returned-From-Handler", "1");
> +}
> +
> +function run_test() {
> +  evict_cache_entries();

As mentioned on IRC, we don't currently store cache entries across XPCshell tests, so no need to evict at startup.

@@ +64,5 @@
> +
> +// Test that we can cope with a 304 response that was (erroneously) stored in
> +// the cache.
> +add_test(function test_304_stored_in_cache() {
> +  asyncOpenCacheEntry(

It's really nice that we have this new ability to create cache entries in tests.  Good stuff.
Comment 23 Jason Duell [:jduell] (needinfo? me) 2012-06-13 13:45:11 PDT
Comment on attachment 632188 [details] [diff] [review]
Fix 304 response handling for custom conditional responses and clarify handling of unexpected 304 responses

Review of attachment 632188 [details] [diff] [review]:
-----------------------------------------------------------------

Irving/Gregg:

> Test Pilot is trying to explicitly manage this particular file download to use a 
> local cached version if it is unable to retrieve the on-line version

That code does it's own, custom store of the response data into a file that has nothing to do with the HTTP cache.  AFACIT from glancing over it should work OK with this patch (you seem to handle the 304 case explicitly and use your copy).
Comment 24 Mark Banner (:standard8) 2012-06-18 03:08:42 PDT
Apologies for noise, need to clear the now not-necessary Thunderbird tracking flag.
Comment 25 Alex Keybl [:akeybl] 2012-06-18 16:30:35 PDT
Bug 746018 has been in the release since FF10. Why should this be tracked for release now? Is this a high volume crash?
Comment 26 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-06-18 17:15:04 PDT
(In reply to Alex Keybl [:akeybl] from comment #25)
> Bug 746018 has been in the release since FF10. Why should this be tracked
> for release now? Is this a high volume crash?

No, that landed in FF15 and this is a regression since FF15.
Comment 27 Alex Keybl [:akeybl] 2012-06-19 11:26:48 PDT
(In reply to Brian Smith (:bsmith) from comment #26)
> (In reply to Alex Keybl [:akeybl] from comment #25)
> > Bug 746018 has been in the release since FF10. Why should this be tracked
> > for release now? Is this a high volume crash?
> 
> No, that landed in FF15 and this is a regression since FF15.

Mistook the "version" for the "target milestone". We'll track this new startup crash regression for FF15 in that case.
Comment 28 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-06-28 15:33:59 PDT
https://hg.mozilla.org/mozilla-central/rev/28d3f3778315
Comment 29 Alex Keybl [:akeybl] 2012-07-16 08:00:25 PDT
(In reply to Brian Smith (:bsmith) from comment #28)
> https://hg.mozilla.org/mozilla-central/rev/28d3f3778315

Is this good to uplift to FF15 at this point? Or does the patch carry significant risk of regression?
Comment 30 Brian Smith (:briansmith, :bsmith, use NEEDINFO?) 2012-07-16 16:30:29 PDT
(In reply to Alex Keybl [:akeybl] from comment #29)
> (In reply to Brian Smith (:bsmith) from comment #28)
> > https://hg.mozilla.org/mozilla-central/rev/28d3f3778315
> 
> Is this good to uplift to FF15 at this point? Or does the patch carry
> significant risk of regression?

This was caused by the patches for bug 746018, which got backed out of Aurora 15 in bug 722034 comment 79.

Note You need to log in before you can comment on or make changes to this bug.