Closed Bug 708248 Opened 13 years ago Closed 13 years ago

Crash in mozilla::storage::StorageSQLiteMultiReporter::CollectReports with Stylish 1.2.4 enabled

Categories

(Toolkit :: Storage, defect)

All
Windows XP
defect
Not set
critical

Tracking

()

RESOLVED FIXED

People

(Reporter: ferongr, Assigned: n.nethercote)

References

Details

(Keywords: crash, regression, reproducible)

Crash Data

Attachments

(2 files, 1 obsolete file)

Having only Stylish 1.2.4 enabled, without any userstyles active makes Nightly crash with a [@ RtlpWaitForCriticalSection | RtlEnterCriticalSection ] signature every approximately 2 minutes, regardless of what pages are open, or what action I perform. Latest crash report bellow

https://crash-stats.mozilla.com/report/index/bp-efc4daf6-6304-46fb-919d-e7b3c2111207
I can confirm this crash.
Earlier having stylish 1.2.4 enabled, I was getting a crash in about a minute or two.
But after disabling, I haven't got one since 30 minutes.
(In reply to scrapmachines from comment #1)
> I can confirm this crash.
> Earlier having stylish 1.2.4 enabled, I was getting a crash in about a
> minute or two.
> But after disabling, I haven't got one since 30 minutes.
EDIT : It still crashes , just with a larger time gap.
bp-0bbf12b4-a3df-4e8a-b51a-0571d2111207

Reproducible: Always

Steps to Reproduce:
1. Start Firefox with clean profile
2. Open URL ( https://addons.mozilla.org/ja/firefox/addon/stylish/ )
3. Install Stylish 1.2.4
4. Restart
5. Open about:memory

Actual Results:
 Browser crashes

Expected Results:
 Browser should not crash
 
Regression window(m-c)
Works;
http://hg.mozilla.org/mozilla-central/rev/338fae43b9d0
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0a1) Gecko/20111205 Firefox/11.0a1 ID:20111205235717
Crashes;
http://hg.mozilla.org/mozilla-central/rev/bf8259fcab61
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0a1) Gecko/20111206 Firefox/11.0a1 ID:20111206051818
Pushlog;
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=338fae43b9d0&tochange=bf8259fcab61


Regression window(m-i)
Works;
http://hg.mozilla.org/integration/mozilla-inbound/rev/bfb517a374fc
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0a1) Gecko/20111205 Firefox/11.0a1 ID:20111205153219
Crashes;
http://hg.mozilla.org/integration/mozilla-inbound/rev/1f093067e982
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0a1) Gecko/20111205 Firefox/11.0a1 ID:20111205192318
Pushlog;
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=bfb517a374fc&tochange=1f093067e982
Triggered by:
1f093067e982	Nicholas Nethercote — Bug 703143 - Use a memory multi-reporter for SQLite's per-connection reporting. r=sdwilsh.
Blocks: 703143
Severity: major → critical
Status: UNCONFIRMED → NEW
Crash Signature: [@ RtlpWaitForCriticalSection | RtlEnterCriticalSection] [@ sqlite3_db_status]
Component: General → Storage
Ever confirmed: true
Keywords: regression
Product: Firefox → Toolkit
QA Contact: general → storage
Crash Signature: [@ RtlpWaitForCriticalSection | RtlEnterCriticalSection] [@ sqlite3_db_status] → [@ RtlpWaitForCriticalSection | RtlEnterCriticalSection] [@ sqlite3_db_status] [@ RtlpWaitOnCriticalSection | RtlDeNormalizeProcessParams]
Keywords: crash, reproducible
Hardware: x86 → All
Summary: 20111207 Nightly build crashes with Stylish 1.2.4 enabled → Crash in mozilla::storage::StorageSQLiteMultiReporter::CollectReports with Stylish 1.2.4 enabled
Crash Signature: [@ RtlpWaitForCriticalSection | RtlEnterCriticalSection] [@ sqlite3_db_status] [@ RtlpWaitOnCriticalSection | RtlDeNormalizeProcessParams] → [@ RtlpWaitForCriticalSection | RtlEnterCriticalSection] [@ sqlite3_db_status] [@ RtlpWaitOnCriticalSection | RtlDeNormalizeProcessParams] [@ ntdll.dll@0x7e0d8] [@ RtlpWaitOnCriticalSection | CsrAllocateMessagePointer] [@ RtlGetElementGenericTable] …
I received this crash when I submitted a comment here on Bugzilla..

https://crash-stats.mozilla.com/report/index/bp-8943a2c2-7025-41ee-8130-522082111207
The MultiReporter runs on every telemetry ping, which is why some people are getting it every minute or so.  It also runs when about:memory is loaded.

I can't reproduce this on a 64-bit debug Linux build.

It's very odd that Stylish is triggering this.
Attached patch patch (obsolete) — Splinter Review
> I can't reproduce this on a 64-bit debug Linux build.

My mistake, I was running the wrong build.  I can reproduce hangs and/or assertion failures.

Stylish opens multiple DB connections and then somehow closes them such that the Connection is still present but Connection::mDBConn is NULL and so Connection::sharedDBMutex is bogus.  Looks like this could happen if Stylish calls mozIStorageConnection::close, which calls Connection::Close().

StorageSQLiteMultiReporter::CollectReports() then tries to lock Connection::sharedDBMutex, which is when the crash/hang/assert happens.

The attached patch checks if the connection is ready before getting the lock.  That's enough to avoid the hangs/assertions on my machine, but I'm not sure if it's good enough in general -- if we're really unlucky another thread might close the connection after the readiness check succeeds but before we get the lock?

Hmm, but it looks like this approach is used throughout Connection -- in BeginTransactionAs, GetTransactionInProgress, CommitTransaction, etc.  So hopefully it's good enough here.  I also added a comment saying that any callers of getConnections() need to check if the Connection is ready via mozIStorageConnection::connectionReady.

FWIW, here's some diagnostic output where I printed details about all the live Connections:

i=0, name=stylish.sqlite, mDBConn=(nil)
i=1, name=permissions.sqlite, mDBConn=0x7f59e2ee7800
i=2, name=stylish.sqlite, mDBConn=(nil)
i=3, name=stylish.sqlite, mDBConn=(nil)
i=4, name=places.sqlite, mDBConn=0x7f59ddef5000
i=5, name=stylish.sqlite, mDBConn=(nil)
i=6, name=content-prefs.sqlite, mDBConn=0x7f59dda28400
i=7, name=places.sqlite, mDBConn=0x7f59dcd5dc00
i=8, name=webappsstore.sqlite, mDBConn=0x7f59ddae4c00
i=9, name=chromeappsstore.sqlite, mDBConn=0x7f59dc32dc00
i=10, name=urlclassifier3.sqlite, mDBConn=0x7f59dc1dac00
i=11, name=stylish.sqlite, mDBConn=(nil)
i=12, name=stylish.sqlite, mDBConn=(nil)
i=13, name=stylish.sqlite, mDBConn=(nil)

(BTW, I've gotta say that I really dislike the |sqlite *| type operator in Connection, I find it just obfuscates everything horribly.  I'd much rather use GetNativeConnection().)
Assignee: nobody → nnethercote
Attachment #579961 - Flags: review?(sdwilsh)
Attachment #579961 - Attachment description: patch, probably not good enough → patch
I'm getting crashes when I post in a forum for example. And sometimes when I use the middle click.
Crash Signature: RtlGetElementGenericTable] [@ ntdll.dll@0x82825] → RtlGetElementGenericTable] [@ ntdll.dll@0x82825] [@ ntdll.dll@0x3de10] [@ RtlRaiseStatus] [@ RtlpDosPathNameToRelativeNtPathName_U] [@ RtlpOptimizeSRWLockList] [@ RtlpWaitOnCriticalSection | RtlNtStatusToDosErrorNoTeb] [@ RegisterGuidsApiCallback] …
Combining crash signatures from this bug and bug 708285, there have been 3450 crashes in 11.0a1/20111207, making it the crashiest build.
It would be fine to review the patch ASAP.
Today's build crashes a lot as well, and it is not stylish because I have it disabled and all.
(In reply to Guillermo Moya (MetalS) from comment #9)
> Today's build crashes a lot as well, and it is not stylish because I have it
> disabled and all.

Do you have any other add-ons installed?
I disabled Stylish yesterday and all of my crashes stopped. I still have Stylish disabled and haven't had a single crash with today's Nightly build. 

Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:11.0a1) Gecko/20111208 Firefox/11.0a1
Built from http://hg.mozilla.org/mozilla-central/rev/6785d3003414
Same as v1, but with a test.

If/when you give r+, I'll fold this into the patch from bug 703143 when landing.
Attachment #579961 - Attachment is obsolete: true
Attachment #579961 - Flags: review?(sdwilsh)
Attachment #580203 - Flags: review?(sdwilsh)
On WinXp Win32 2011-12-08 Nightly 11.0a1, Stylish 1.2.4 enabled, when trying to view about:memory I get this crash ID: https://crash-stats.mozilla.com/report/index/bp-78a72b63-89f7-41bd-847f-5a00a2111208
Comment on attachment 580203 [details] [diff] [review]
patch v2, with test

Review of attachment 580203 [details] [diff] [review]:
-----------------------------------------------------------------

::: storage/src/mozStorageService.cpp
@@ +197,5 @@
>          pathHead.Append(conn->getFilename());
>          pathHead.AppendLiteral("/");
>  
> +        if (!*conn.get())
> +            continue;

Can you attach a patch with more context please?  Not enough here to easily review this.
> Can you attach a patch with more context please?  Not enough here to easily
> review this.

The machine with the patch is currently turned off, but https://hg.mozilla.org/integration/mozilla-inbound/rev/1f093067e982#l3.53 should give you plenty of context.
I've reported bug 709000 (when in doubt, file) but it may just be the same issue.
I don't have the stylish add-on. I am a simple person ;)

http://crash-stats.mozilla.com/report/index/bp-6ac27dc3-d179-4d6e-9251-43a862111207

http://crash-stats.mozilla.com/report/index/bp-5a3e224a-3484-4140-a27a-7acd62111209

Is there a 64 bit beta version of Firefox?
(In reply to Nicholas Nethercote [:njn] from comment #10)
> (In reply to Guillermo Moya (MetalS) from comment #9)
> > Today's build crashes a lot as well, and it is not stylish because I have it
> > disabled and all.
> 
> Do you have any other add-ons installed?

Yes, but disabling any of them seems to work. Anyway I give you mi list:
NoScript, AdBlock Plus, FlashGot, priv3 and Compability reporter.

Browser crashed around 6 times while writting this.
huh Disabling any of them DOESN'T seem to work***
Hi, I was just added to this bug from 709517. This is what Nightly is doing on my computer. It tries to open but stops after 2 to 3 seconds. That is what I see in the Processes tab in Windows Task Manager.

This is what I have tried. First I tried starting it in safe mode. Then I reinstalled it twice and tried using 3 different profiles. I downloaded the newest version yesterday and reinstalled over the existing version then I uninstalled the program and reinstalled it. I tried the default profile on a new fresh installation, the profile I use for FF 11.0a1 and the profile I use for FF 8.

I don't have the Stylish addon. I have approx. 50 addons that I use. It is probably more than I should. I can provide a list if anyone wants me to. I don't think my issue is an addon problem. It would not start in safe mode and a test profile without addons.
This bug is fixed by the backout of bug 703143.
Download again the Nightly from http://nightly.mozilla.org/
(In reply to Scoobidiver from comment #24)
> This bug is fixed by the backout of bug 703143.
> Download again the Nightly from http://nightly.mozilla.org/

In my computer Nightly 64 bit edition crashes right after opening, actually it doesn't even open up, and it is not caught by Mozilla's Crash Reporter. 

Windows Error Reporting comes up and asks me to send additional information to Microsoft.

http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/firefox-11.0a1.en-US.win64-x86_64.installer.exe

My CPU is AMD Athlon X3 450 running Windows 7, 64 bit.

I think this was caused by Friday's Dec 9th update?
(In reply to Vic from comment #26)
> (In reply to Scoobidiver from comment #24)
> > This bug is fixed by the backout of bug 703143.
> > Download again the Nightly from http://nightly.mozilla.org/
> 
> In my computer Nightly 64 bit edition crashes right after opening, actually
> it doesn't even open up, and it is not caught by Mozilla's Crash Reporter. 
> 
> Windows Error Reporting comes up and asks me to send additional information
> to Microsoft.
> 
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/firefox-
> 11.0a1.en-US.win64-x86_64.installer.exe
> 
> My CPU is AMD Athlon X3 450 running Windows 7, 64 bit.
> 
> I think this was caused by Friday's Dec 9th update?

sorry...please my ignore previous post...found the bug was already filed here: Bug 709383
Yes it started Friday morning.
Nightly x64 is doing the same thing on my computer as Vic's.

Windows 7 x64
Intel Core 2 Duo E8400
Sorry, Bug 709383 sounds like the problems I'm having.
I can't tell if comment 15 satisfied you, so here's the patch with 30 lines of context, hopefully that's enough.  It applies on top of the patch from bug 703143, thereby fixing the crash that it causes.
Attachment #581475 - Flags: review?(sdwilsh)
(In reply to Nicholas Nethercote [:njn] from comment #30)
> I can't tell if comment 15 satisfied you, so here's the patch with 30 lines
> of context, hopefully that's enough.  It applies on top of the patch from
> bug 703143, thereby fixing the crash that it causes.
I haven't had time to look at bugzilla, sorry.  I'll get to this either tonight or tomorrow.
Comment on attachment 581475 [details] [diff] [review]
patch v2, with test (with lots of context)

Review of attachment 581475 [details] [diff] [review]:
-----------------------------------------------------------------

r=sdwilsh

::: storage/src/mozStorageService.cpp
@@ +197,5 @@
>          pathHead.Append(conn->getFilename());
>          pathHead.AppendLiteral("/");
>  
> +        if (!*conn.get())
> +            continue;

nit: brace this if please, and add a comment explaining why we check that.  It'd probably be clearer if you called `GetConnectionReady` as the method header states too.
Attachment #581475 - Flags: review?(sdwilsh) → review+
> The attached patch checks if the connection is ready before getting the
> lock.  That's enough to avoid the hangs/assertions on my machine, but I'm
> not sure if it's good enough in general -- if we're really unlucky another
> thread might close the connection after the readiness check succeeds but
> before we get the lock?
>
> Hmm, but it looks like this approach is used throughout Connection -- in
> BeginTransactionAs, GetTransactionInProgress, CommitTransaction, etc.  So
> hopefully it's good enough here.

sdwilsh, are you able to explain why this is thread-safe?  E.g. in this code:

 NS_IMETHODIMP
 Connection::BeginTransactionAs(PRInt32 aTransactionType)
 {
   if (!mDBConn) return NS_ERROR_NOT_INITIALIZED; 

   SQLiteMutexAutoLock lockedScope(sharedDBMutex);

Is it possible for another thread to close mDBConn after the test but before the lock is taken?
(In reply to Nicholas Nethercote [:njn] from comment #33)
> sdwilsh, are you able to explain why this is thread-safe?  E.g. in this code:
> 
>  NS_IMETHODIMP
>  Connection::BeginTransactionAs(PRInt32 aTransactionType)
>  {
>    if (!mDBConn) return NS_ERROR_NOT_INITIALIZED; 
> 
>    SQLiteMutexAutoLock lockedScope(sharedDBMutex);
> 
> Is it possible for another thread to close mDBConn after the test but before
> the lock is taken?
There is an invariant (enforced at http://hg.mozilla.org/mozilla-central/annotate/2cff3f89e54b/storage/src/mozStorageConnection.cpp#l820) that you can only open and close the database connections on the same thread.  Given that, I *believe* we should be OK (unless people are using one `Connection` on multiple threads which has never been well supported or tested - this doesn't count using async statements which are explicitly supported).  If we are running asynchronous statements, we are careful to not clear this out until all of our async statements are completed (see http://hg.mozilla.org/mozilla-central/annotate/2cff3f89e54b/storage/src/mozStorageConnection.cpp#l365)
Attachment #580203 - Flags: review?(sdwilsh)
why do you need to check conn after building a string you won't use? couldn't you check immediately after &conn is assigned?
(In reply to Marco Bonardo [:mak] from comment #35)
> why do you need to check conn after building a string you won't use?
> couldn't you check immediately after &conn is assigned?

I reordered the code to do this.

I landed the patch along with the original code, details in bug 703143 comment 11.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: