Crash in mozilla::detail::RunnableFunction<nsCookieService::RebuildCorruptDB(DBState*)::<lambda()>::<lambda()> >::Run

RESOLVED FIXED in Firefox 59

Status

()

Core
Networking: Cookies
P3
normal
RESOLVED FIXED
15 days ago
5 hours ago

People

(Reporter: shawnjohnjr, Assigned: junior)

Tracking

unspecified
mozilla59
Points:
---

Firefox Tracking Flags

(firefox59 fixed)

Details

(Whiteboard: [necko-triaged], crash signature)

Attachments

(2 attachments, 3 obsolete attachments)

(Reporter)

Description

15 days ago
Copied from:
https://crash-stats.mozilla.com/report/index/da7a3510-acaf-430c-b75d-61aad0171107#tab-details

0 	libxul.so 	mozilla::detail::RunnableFunction<nsCookieService::RebuildCorruptDB(DBState*)::<lambda()>::<lambda()> >::Run 	netwerk/cookie/nsCookieService.cpp:1921
1 	libxul.so 	nsThread::ProcessNextEvent(bool, bool*) 	
2 	libxul.so 	NS_ProcessNextEvent(nsIThread*, bool) 	
3 	libxul.so 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	
4 	libxul.so 	MessageLoop::Run() 	
5 	libxul.so 	nsBaseAppShell::Run 	widget/nsBaseAppShell.cpp:158
6 	libxul.so 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:288
7 	libxul.so 	XREMain::XRE_mainRun() 	
8 	libxul.so 	XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) 	
9 	libxul.so 	XRE_main 	toolkit/xre/nsAppRunner.cpp:4932
10 	firefox 	do_main(int, char**, char**) 	
11 	firefox 	main 	

I hitted crash when tapping a facebook post for "more messages". Before this crash, I have battery low and it causes my laptop shutdown directly, I guess that might create corruptted DB. After I attached power cord and boot up again, I restored sessions.

Comment 1

14 days ago
Ni? Junior who might have some idea about it.
Flags: needinfo?(juhsu)
(Assignee)

Comment 2

14 days ago
Thanks for reporting this.

It's obviously a regression of bug 870460.
Not sure there's any relationship to the "more messages" xhr.

I put P3 since there are not many crashes here.
I bet it's a null stmtInsert, but need some time to take a look.
(I didn't change the sequence, however there's something async and I will add some assertions or check first)
Assignee: nobody → juhsu
Flags: needinfo?(juhsu)
Priority: -- → P3
Whiteboard: [necko-triaged]
(Assignee)

Comment 3

9 days ago
Created attachment 8927736 [details] [diff] [review]
crashTest, v1

I can reproduce in the test with the same syndrome.
(Assignee)

Comment 4

7 days ago
Created attachment 8928414 [details] [diff] [review]
Part1: bypassDbOpsWhileCorrupt, v1

A short patch with lots of explanation:

While rebuilding, we will close the dbConn and initDbConn
after we build the new cookie database file. 

If someone access the cookie during the closing and init,
they will try to init the dbConn, fail and close everything.

In this patch, we won't try to init |dbConn|, just get/set
cookie in the hashtable and bypass all the db operations.

We already have the check to avoid accessing null dbConn like:
https://searchfox.org/mozilla-central/rev/bab833ebeef6b2202e71f81d225b968283521fd6/netwerk/cookie/nsCookieService.cpp#5162
Attachment #8927736 - Attachment is obsolete: true
Attachment #8928414 - Flags: review?(hurley)
(Assignee)

Comment 5

7 days ago
Created attachment 8928415 [details] [diff] [review]
Part2: crashTest, v2
Attachment #8928415 - Flags: review?(hurley)
(Assignee)

Comment 6

7 days ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1c4adc29cb3a8f465e33c0ee47e045a27f4eec8d
Comment on attachment 8928414 [details] [diff] [review]
Part1: bypassDbOpsWhileCorrupt, v1

Review of attachment 8928414 [details] [diff] [review]:
-----------------------------------------------------------------

This looks pretty good, I just have one concern and one request.

::: netwerk/cookie/nsCookieService.cpp
@@ +1728,5 @@
>  
>    CleanupDefaultDBConnection();
>  
>    mDefaultDBState = nullptr;
> +  mInitializedDBConn = false;

What's the point of moving this here? It seems much more appropriate in CleanupDefaultDBConnect (and there's multiple callers of CleanupDefaultDBConnect, so moving this out of there could cause inconsistencies).

@@ +2760,5 @@
>  nsCookieService::EnsureReadComplete(bool aInitDBConn)
>  {
>    MOZ_ASSERT(NS_IsMainThread());
>  
> +  if (!mInitializedDBStates && mDBState->corruptFlag == DBState::OK) {

Add a comment here telling why we care about DBState::OK (it can be pretty short with a pointer to this bug).
Attachment #8928414 - Flags: review?(hurley)
Comment on attachment 8928415 [details] [diff] [review]
Part2: crashTest, v2

Review of attachment 8928415 [details] [diff] [review]:
-----------------------------------------------------------------

::: extensions/cookie/test/unit/test_cookies_async_failure.js
@@ +162,5 @@
>  
>    // Check that the cookie service accepted the new cookie.
>    do_check_eq(Services.cookiemgr.countCookiesFromHost(cookie.host), 1);
>  
> +  

nit: this line has trailing whitespace

@@ +170,5 @@
> +    Services.obs.removeObserver(rebuildingObserve, "cookie-db-rebuilding");
> +  };
> +  Services.obs.addObserver(rebuildingObserve, "cookie-db-rebuilding");
> +
> +  // Crash test

This comment needs more explanation for casual readers of the code. What crash are we testing? How, exactly, would this cause the crash (if it were to return)?
Attachment #8928415 - Flags: review?(hurley)
(Assignee)

Comment 9

6 days ago
(In reply to Nicholas Hurley [:nwgh][:hurley] (also hurley@todesschaf.org) from comment #7)
> Comment on attachment 8928414 [details] [diff] [review]
> Part1: bypassDbOpsWhileCorrupt, v1
> 
> Review of attachment 8928414 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This looks pretty good, I just have one concern and one request.
> 
> ::: netwerk/cookie/nsCookieService.cpp
> @@ +1728,5 @@
> >  
> >    CleanupDefaultDBConnection();
> >  
> >    mDefaultDBState = nullptr;
> > +  mInitializedDBConn = false;
> 
> What's the point of moving this here? It seems much more appropriate in
> CleanupDefaultDBConnect (and there's multiple callers of
> CleanupDefaultDBConnect, so moving this out of there could cause
> inconsistencies).
> 

I might have a bad naming, hence let's talk more about the idea of
|mInitializedDBStates| and |mInitializedDBConn|.

Here is the startup sequence:
Step1: Triggered in the main thread 
Step2: sync read db in the cookie thread, end with mInitializedDBStates = true
Step3: initialize the async db in the main thread, end with mInitializedDBConn = true

And we can access hash table and do operations to db after the whole sequence.
However, if we fail at Step2/3, we still _finish_ the initialization and set those variable to true.
(Although we cannot read the cookies from previous session/we cannot do db operations)

Let's say here comes the rebuilding.
We close the async db for corruption instead of closing the profile.
We don't block the client of CookieService at rebuilding with several reasons:
(a) we still have cookies in hashtable to read, providing a good experience
(b) no need to block the main thread for the db operations,
    since we will write back after we create a brand-new db
(c) before OMT, we already simply block the db operations if our dbConn is broken.

To confirm, all |CleanupDefaultDBConnection|s except |CloseDBStates| are
owing to init failure and rebuilding.

So, CleanupDefaultDBConnection doesn't mean we _need_ to initialize it next time.
It usually indicates we're closing the dbConn for whatever reason.


> @@ +2760,5 @@
> >  nsCookieService::EnsureReadComplete(bool aInitDBConn)
> >  {
> >    MOZ_ASSERT(NS_IsMainThread());
> >  
> > +  if (!mInitializedDBStates && mDBState->corruptFlag == DBState::OK) {
> 
> Add a comment here telling why we care about DBState::OK (it can be pretty
> short with a pointer to this bug).

I'd like to remove this change because rebuilding will not change the status of init.
Crash Signature: [@ mozilla::detail::RunnableFunction<T>::Run ]
(Assignee)

Comment 10

6 days ago
Created attachment 8928816 [details] [diff] [review]
Part1: bypassDbOpsWhileCorrupt, v2
Attachment #8928414 - Attachment is obsolete: true
Attachment #8928816 - Flags: review?(hurley)
(Assignee)

Comment 11

6 days ago
Created attachment 8928817 [details] [diff] [review]
Part2: crashTest, v3
Attachment #8928415 - Attachment is obsolete: true
Attachment #8928817 - Flags: review?(hurley)
(In reply to Junior[:junior] from comment #9)
> I might have a bad naming, hence let's talk more about the idea of
> |mInitializedDBStates| and |mInitializedDBConn|.

OK, I've read this (and the rest of your reply) and I'm satisfied. But we should contemplate renaming this if/when we do more major digging through this code :)
Attachment #8928816 - Flags: review?(hurley) → review+
Comment on attachment 8928817 [details] [diff] [review]
Part2: crashTest, v3

Review of attachment 8928817 [details] [diff] [review]:
-----------------------------------------------------------------

Nice comment, thanks!
Attachment #8928817 - Flags: review?(hurley) → review+
(Assignee)

Updated

5 days ago
Keywords: checkin-needed

Comment 14

5 days ago
Pushed by btara@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2125c1a2272f
Part1: no db access while rebuilding, r=nwgh
https://hg.mozilla.org/integration/mozilla-inbound/rev/271e6768d94a
Part2: crash test for cookie db rebuild, r=nwgh
Keywords: checkin-needed

Comment 15

4 days ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/2125c1a2272f
https://hg.mozilla.org/mozilla-central/rev/271e6768d94a
Status: NEW → RESOLVED
Last Resolved: 4 days ago
status-firefox59: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
(Assignee)

Comment 16

6 hours ago
Hello Gerry,
Bug 1419548 and Bug 1411685 Comment 9 talked about this crash.
What do you think about uplift?
Flags: needinfo?(gchang)
(Assignee)

Comment 17

6 hours ago
(In reply to Junior[:junior] from comment #16)
> Hello Gerry,
> Bug 1419548 and Bug 1411685 Comment 9 talked about this crash.
> What do you think about uplift?

Moreover, this crash during rebuilding cookie database in session
would loose it's cookie as a side effect.
Hi Junior,
Let's take this in Beta58. Please request a beta uplift.
Flags: needinfo?(gchang) → needinfo?(juhsu)
(Assignee)

Comment 19

5 hours ago
Comment on attachment 8928816 [details] [diff] [review]
Part1: bypassDbOpsWhileCorrupt, v2

Approval Request Comment
[Feature/Bug causing the regression]:bug 870460
[User impact if declined]:it's one of the top crashes
[Is this code covered by automated tests?]:yes, xpcshell-test
[Has the fix been verified in Nightly?]:yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]:two patches in this bug
[Is the change risky?]:some
[Why is the change risky/not risky?]:It changes part of code logic, but I have some confident in this patch.
[String changes made/needed]:no
Flags: needinfo?(juhsu)
Attachment #8928816 - Flags: approval-mozilla-beta?
(Assignee)

Comment 20

5 hours ago
Comment on attachment 8928817 [details] [diff] [review]
Part2: crashTest, v3

Approval Request Comment
[Feature/Bug causing the regression]:bug 870460
[User impact if declined]:it's one of the top crashes
[Is this code covered by automated tests?]:yes, xpcshell-test
[Has the fix been verified in Nightly?]:yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]:two patches in this bug
[Is the change risky?]:some
[Why is the change risky/not risky?]:It changes part of code logic, but I have some confident in this patch.
[String changes made/needed]:no
Attachment #8928817 - Flags: approval-mozilla-beta?
You need to log in before you can comment on or make changes to this bug.