Crash in mozilla::detail::RunnableFunction<nsCookieService::RebuildCorruptDB(DBState*)::<lambda()>::<lambda()> >::Run

RESOLVED FIXED in Firefox 58

Status

()

defect
P3
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: shawnjohnjr, Assigned: junior)

Tracking

unspecified
mozilla59
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox58 fixed, firefox59 fixed)

Details

(Whiteboard: [necko-triaged], crash signature)

Attachments

(2 attachments, 3 obsolete attachments)

Copied from:
https://crash-stats.mozilla.com/report/index/da7a3510-acaf-430c-b75d-61aad0171107#tab-details

0 	libxul.so 	mozilla::detail::RunnableFunction<nsCookieService::RebuildCorruptDB(DBState*)::<lambda()>::<lambda()> >::Run 	netwerk/cookie/nsCookieService.cpp:1921
1 	libxul.so 	nsThread::ProcessNextEvent(bool, bool*) 	
2 	libxul.so 	NS_ProcessNextEvent(nsIThread*, bool) 	
3 	libxul.so 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	
4 	libxul.so 	MessageLoop::Run() 	
5 	libxul.so 	nsBaseAppShell::Run 	widget/nsBaseAppShell.cpp:158
6 	libxul.so 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:288
7 	libxul.so 	XREMain::XRE_mainRun() 	
8 	libxul.so 	XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) 	
9 	libxul.so 	XRE_main 	toolkit/xre/nsAppRunner.cpp:4932
10 	firefox 	do_main(int, char**, char**) 	
11 	firefox 	main 	

I hitted crash when tapping a facebook post for "more messages". Before this crash, I have battery low and it causes my laptop shutdown directly, I guess that might create corruptted DB. After I attached power cord and boot up again, I restored sessions.
Ni? Junior who might have some idea about it.
Flags: needinfo?(juhsu)
Thanks for reporting this.

It's obviously a regression of bug 870460.
Not sure there's any relationship to the "more messages" xhr.

I put P3 since there are not many crashes here.
I bet it's a null stmtInsert, but need some time to take a look.
(I didn't change the sequence, however there's something async and I will add some assertions or check first)
Assignee: nobody → juhsu
Flags: needinfo?(juhsu)
Priority: -- → P3
Whiteboard: [necko-triaged]
Posted patch crashTest, v1 (obsolete) — Splinter Review
I can reproduce in the test with the same syndrome.
A short patch with lots of explanation:

While rebuilding, we will close the dbConn and initDbConn
after we build the new cookie database file. 

If someone access the cookie during the closing and init,
they will try to init the dbConn, fail and close everything.

In this patch, we won't try to init |dbConn|, just get/set
cookie in the hashtable and bypass all the db operations.

We already have the check to avoid accessing null dbConn like:
https://searchfox.org/mozilla-central/rev/bab833ebeef6b2202e71f81d225b968283521fd6/netwerk/cookie/nsCookieService.cpp#5162
Attachment #8927736 - Attachment is obsolete: true
Attachment #8928414 - Flags: review?(hurley)
Posted patch Part2: crashTest, v2 (obsolete) — Splinter Review
Attachment #8928415 - Flags: review?(hurley)
Comment on attachment 8928414 [details] [diff] [review]
Part1: bypassDbOpsWhileCorrupt, v1

Review of attachment 8928414 [details] [diff] [review]:
-----------------------------------------------------------------

This looks pretty good, I just have one concern and one request.

::: netwerk/cookie/nsCookieService.cpp
@@ +1728,5 @@
>  
>    CleanupDefaultDBConnection();
>  
>    mDefaultDBState = nullptr;
> +  mInitializedDBConn = false;

What's the point of moving this here? It seems much more appropriate in CleanupDefaultDBConnect (and there's multiple callers of CleanupDefaultDBConnect, so moving this out of there could cause inconsistencies).

@@ +2760,5 @@
>  nsCookieService::EnsureReadComplete(bool aInitDBConn)
>  {
>    MOZ_ASSERT(NS_IsMainThread());
>  
> +  if (!mInitializedDBStates && mDBState->corruptFlag == DBState::OK) {

Add a comment here telling why we care about DBState::OK (it can be pretty short with a pointer to this bug).
Attachment #8928414 - Flags: review?(hurley)
Comment on attachment 8928415 [details] [diff] [review]
Part2: crashTest, v2

Review of attachment 8928415 [details] [diff] [review]:
-----------------------------------------------------------------

::: extensions/cookie/test/unit/test_cookies_async_failure.js
@@ +162,5 @@
>  
>    // Check that the cookie service accepted the new cookie.
>    do_check_eq(Services.cookiemgr.countCookiesFromHost(cookie.host), 1);
>  
> +  

nit: this line has trailing whitespace

@@ +170,5 @@
> +    Services.obs.removeObserver(rebuildingObserve, "cookie-db-rebuilding");
> +  };
> +  Services.obs.addObserver(rebuildingObserve, "cookie-db-rebuilding");
> +
> +  // Crash test

This comment needs more explanation for casual readers of the code. What crash are we testing? How, exactly, would this cause the crash (if it were to return)?
Attachment #8928415 - Flags: review?(hurley)
(In reply to Nicholas Hurley [:nwgh][:hurley] (also hurley@todesschaf.org) from comment #7)
> Comment on attachment 8928414 [details] [diff] [review]
> Part1: bypassDbOpsWhileCorrupt, v1
> 
> Review of attachment 8928414 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This looks pretty good, I just have one concern and one request.
> 
> ::: netwerk/cookie/nsCookieService.cpp
> @@ +1728,5 @@
> >  
> >    CleanupDefaultDBConnection();
> >  
> >    mDefaultDBState = nullptr;
> > +  mInitializedDBConn = false;
> 
> What's the point of moving this here? It seems much more appropriate in
> CleanupDefaultDBConnect (and there's multiple callers of
> CleanupDefaultDBConnect, so moving this out of there could cause
> inconsistencies).
> 

I might have a bad naming, hence let's talk more about the idea of
|mInitializedDBStates| and |mInitializedDBConn|.

Here is the startup sequence:
Step1: Triggered in the main thread 
Step2: sync read db in the cookie thread, end with mInitializedDBStates = true
Step3: initialize the async db in the main thread, end with mInitializedDBConn = true

And we can access hash table and do operations to db after the whole sequence.
However, if we fail at Step2/3, we still _finish_ the initialization and set those variable to true.
(Although we cannot read the cookies from previous session/we cannot do db operations)

Let's say here comes the rebuilding.
We close the async db for corruption instead of closing the profile.
We don't block the client of CookieService at rebuilding with several reasons:
(a) we still have cookies in hashtable to read, providing a good experience
(b) no need to block the main thread for the db operations,
    since we will write back after we create a brand-new db
(c) before OMT, we already simply block the db operations if our dbConn is broken.

To confirm, all |CleanupDefaultDBConnection|s except |CloseDBStates| are
owing to init failure and rebuilding.

So, CleanupDefaultDBConnection doesn't mean we _need_ to initialize it next time.
It usually indicates we're closing the dbConn for whatever reason.


> @@ +2760,5 @@
> >  nsCookieService::EnsureReadComplete(bool aInitDBConn)
> >  {
> >    MOZ_ASSERT(NS_IsMainThread());
> >  
> > +  if (!mInitializedDBStates && mDBState->corruptFlag == DBState::OK) {
> 
> Add a comment here telling why we care about DBState::OK (it can be pretty
> short with a pointer to this bug).

I'd like to remove this change because rebuilding will not change the status of init.
Crash Signature: [@ mozilla::detail::RunnableFunction<T>::Run ]
Attachment #8928414 - Attachment is obsolete: true
Attachment #8928816 - Flags: review?(hurley)
Attachment #8928415 - Attachment is obsolete: true
Attachment #8928817 - Flags: review?(hurley)
(In reply to Junior[:junior] from comment #9)
> I might have a bad naming, hence let's talk more about the idea of
> |mInitializedDBStates| and |mInitializedDBConn|.

OK, I've read this (and the rest of your reply) and I'm satisfied. But we should contemplate renaming this if/when we do more major digging through this code :)
Attachment #8928816 - Flags: review?(hurley) → review+
Comment on attachment 8928817 [details] [diff] [review]
Part2: crashTest, v3

Review of attachment 8928817 [details] [diff] [review]:
-----------------------------------------------------------------

Nice comment, thanks!
Attachment #8928817 - Flags: review?(hurley) → review+
Keywords: checkin-needed
Pushed by btara@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2125c1a2272f
Part1: no db access while rebuilding, r=nwgh
https://hg.mozilla.org/integration/mozilla-inbound/rev/271e6768d94a
Part2: crash test for cookie db rebuild, r=nwgh
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/2125c1a2272f
https://hg.mozilla.org/mozilla-central/rev/271e6768d94a
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Hello Gerry,
Bug 1419548 and Bug 1411685 Comment 9 talked about this crash.
What do you think about uplift?
Flags: needinfo?(gchang)
(In reply to Junior[:junior] from comment #16)
> Hello Gerry,
> Bug 1419548 and Bug 1411685 Comment 9 talked about this crash.
> What do you think about uplift?

Moreover, this crash during rebuilding cookie database in session
would loose it's cookie as a side effect.
Hi Junior,
Let's take this in Beta58. Please request a beta uplift.
Flags: needinfo?(gchang) → needinfo?(juhsu)
Comment on attachment 8928816 [details] [diff] [review]
Part1: bypassDbOpsWhileCorrupt, v2

Approval Request Comment
[Feature/Bug causing the regression]:bug 870460
[User impact if declined]:it's one of the top crashes
[Is this code covered by automated tests?]:yes, xpcshell-test
[Has the fix been verified in Nightly?]:yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]:two patches in this bug
[Is the change risky?]:some
[Why is the change risky/not risky?]:It changes part of code logic, but I have some confident in this patch.
[String changes made/needed]:no
Flags: needinfo?(juhsu)
Attachment #8928816 - Flags: approval-mozilla-beta?
Comment on attachment 8928817 [details] [diff] [review]
Part2: crashTest, v3

Approval Request Comment
[Feature/Bug causing the regression]:bug 870460
[User impact if declined]:it's one of the top crashes
[Is this code covered by automated tests?]:yes, xpcshell-test
[Has the fix been verified in Nightly?]:yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]:two patches in this bug
[Is the change risky?]:some
[Why is the change risky/not risky?]:It changes part of code logic, but I have some confident in this patch.
[String changes made/needed]:no
Attachment #8928817 - Flags: approval-mozilla-beta?
Comment on attachment 8928816 [details] [diff] [review]
Part1: bypassDbOpsWhileCorrupt, v2

Fix a crash. Beta58+.
Attachment #8928816 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8928817 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Duplicate of this bug: 1419548
Blocks: 1420175
You need to log in before you can comment on or make changes to this bug.