Closed Bug 1376874 Opened 7 years ago Closed 7 years ago

Crash in AsyncShutdownTimeout | profile-before-change | Sqlite.jsm shutdown blocker

Categories

(Toolkit :: Blocklist Implementation, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr52 --- wontfix
firefox54 --- wontfix
firefox55 --- wontfix
firefox56 --- fixed

People

(Reporter: philipp, Assigned: leplatrem)

References

Details

(Keywords: crash, regression, topcrash)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-30da927f-631a-4a75-89a1-4fb690170628.
=============================================================

these async shutdownhangs are spiking cross-platform on 56.0a1 after build 20170627030209.

Correlations for Firefox Nightly
(100.0% in signature vs 00.03% overall) abort_message = ###!!! ABORT: file resource://gre/modules/Sqlite.jsm, line 156
(100.0% in signature vs 03.62% overall) moz_crash_reason = MOZ_CRASH()
(100.0% in signature vs 05.54% overall) Module "twinapi.appcore.dll" = true
(100.0% in signature vs 05.63% overall) Module "twinapi.dll" = true
(83.33% in signature vs 07.64% overall) process_type = null [100.0% vs 06.75% if startup_crash = 0]
(100.0% in signature vs 08.29% overall) Module "explorerframe.dll" = true
(100.0% in signature vs 09.27% overall) Module "winsta.dll" = true
(100.0% in signature vs 09.40% overall) Module "FWPUCLNT.DLL" = true
(100.0% in signature vs 09.55% overall) Module "rasadhlp.dll" = true
(100.0% in signature vs 11.11% overall) shutdown_progress = profile-before-change
(83.33% in signature vs 08.50% overall) Module "ondemandconnroutehelper.dll" = true

graph from nightly: https://crash-stats.mozilla.com/signature/?release_channel=nightly&signature=AsyncShutdownTimeout%20%7C%20profile-before-change%20%7C%20Sqlite.jsm%20shutdown%20blocker&date=%3E%3D2017-03-01T00%3A00%3A00.000Z&date=%3C2017-06-28T18%3A25%3A45.000Z#graphs
This signature is ranked #5 in nightly top-crashers.
::mak, could you investigate please ?
Flags: needinfo?(mak77)
Keywords: topcrash
I wonder if it's related to bug 1376202
Depends on: 1376202
It looks like it suddenly started having 50 crashes a day (from maybe 0-1 per day) in 56 nightly on June 28.
That's interesting, but I was expecting to see a bump on 21st if it was due to bug 1371945 :(

I wonder if some extension using Sqlite released a new version on that day.

Correlations don't seem to be correct for add-ons (for example I see activity-stream in a lot of crashes, but I don't see it in correlations) and have too much noise to make any use of them.
Flags: needinfo?(mak77)
Flags: needinfo?(mak77)
it doesn't necessarily need to be non-mozilla add-ons, our add-ons use (and thus could misuse) Sqlite.
I'm not sure which between pocket, activity stream and formautofill are using Sqlite.jsm, that could be checked.

If any of these add-ons got an update around the date where the bump happened, it could likely be related.
Looks like on 28 it has been release v1.14 of Activity Stream. I'm pretty sure it uses Sqlite.jsm. If it'd register in a transaction a promise that is never resolved, or run a very long (hanging) recursive query, it may cause this kind of async shutdown crash. It may be a coincidence, but the date and correlation seems to point there.

I've also excluded FormAutofill.
This was the #3 topcrash in Window Nightly 20170630030203 build. But it doesn't show up at all in the Windows Nightly 20170630150342 build, which was the second Nightly build for that same day. I'm not sure what to make of that change...
Nan, any idea if AS 1.14 did some kind of data migration or uses Sqlite executeTransaction with some promise that may never resolve in some circumstances?
Flags: needinfo?(mak77) → needinfo?(najiang)
So, I did an aggregation on async shudown timeout and I see that most crashes are caused by

{"name":"metadata.sqlite#0: waiting for shutdown","state":{"identifier":"metadata.sqlite#0","isCloseRequested":false,"hasDbConn":true,"hasInProgressTransaction":false,"pendingStatements":0,"statementCounter":11}

{"name":"kinto.sqlite#3: waiting for shutdown","state":{"identifier":"kinto.sqlite#3","isCloseRequested":false,"hasDbConn":true,"hasInProgressTransaction":false,"pendingStatements":0,"statementCounter":10}

These 2 dbs are covering 99% of the cases...

This seems to cope with the fact part of this started with the Activity Stream add-on update. metadata.sqlite is surely from it.
I'm not sure about kinto.sqlite, who is using it? blocklists?

This is not a bug in Storage or Sqlite.jsm, it's a consumers bug.

I'm moving to AS, but I'm not sure who should take care of the kinto.sqlite problem though, since I don't know who uses it.
Component: Storage → Activity Streams: General
Product: Toolkit → Firefox
Mark, do you know anything about kinto.sqlite usage and who owns that?
Flags: needinfo?(markh)
Flags: needinfo?(mathieu)
I suspect the problem here is that we never actually invoke close() on the connection, rather than a stuck transaction.
Note that consumers may use Sqlite.shutdown.addBlocker to ensure the connection is closed properly, it will be invoked when Sqlite.jsm is shutting down. Something like:

        let conn = await Sqlite.openConnection(...);
        try {
           Sqlite.shutdown.addBlocker("My connection closing",
                                      async function() {
                                        await conn.close();
                                      });
        } catch (ex) {
          // It's too late to block shutdown, just close the connection.
          await conn.close();
          throw ex;
        }
(In reply to Marco Bonardo [::mak] from comment #11)
> Mark, do you know anything about kinto.sqlite usage and who owns that?

No, but I suspect Ethan might.
Flags: needinfo?(markh) → needinfo?(eglassercamp)
I believe kinto.sqlite itself is only used by blocklists.
kinto.sqlite is the local database for blocklists. We are transitioning blocklists from a huge XML file downloaded daily to a diff-based system. 

I can take care of fixing the blocklists clients with your snippet. I'm surprised there are so few places where `Sqlite.shutdown.addBlocker()` is used though.
Flags: needinfo?(mathieu)
(In reply to Mathieu Leplatre (:leplatrem) from comment #15)
> I can take care of fixing the blocklists clients with your snippet. I'm
> surprised there are so few places where `Sqlite.shutdown.addBlocker()` is
> used though.

You are right, I suspect it's mostly a documentation problem, we should better document the close() semantics in Sqlite.jsm
Afair, close() can be invoked even multiple times, so if you can close the connection earlier, the shutdown blocker will just be a safety fallback.
Assignee: nobody → mathieu
> You are right, I suspect it's mostly a documentation problem, we should better
> document the close() semantics in Sqlite.jsm

I added a small mention on MDN docs about Sqlite.jsm https://developer.mozilla.org/en-US/docs/Mozilla/JavaScript_code_modules/Sqlite.jsm#close()
I'm cloning the bug and creating 2 instances, one in blocklisting and one in Activity Stream, so we can handle the 2 consumers apart.
No longer depends on: 1376202
Flags: needinfo?(najiang)
Flags: needinfo?(eglassercamp)
No longer blocks: 1378091
See Also: → 1378091
Component: Activity Streams: General → Blocklisting
Product: Firefox → Toolkit
Priority: -- → P1
Comment on attachment 8883257 [details]
Bug 1376874 - Close blocklist Sqlite connection on shutdown

https://reviewboard.mozilla.org/r/154196/#review159338

LGTM, thank you for looking into it.

::: services/common/tests/unit/test_storage_adapter_shutdown.js:25
(Diff revision 1)
> +    do_check_eq("Should not succeed, connection should be closed.", false);
> +  } catch (e) {
> +    do_check_eq(e.message, "Connection is not open.");
> +  }
> +
> +  run_next_test();

if you use add_task instead of add_test, you can avoid run_next_next().
Attachment #8883257 - Flags: review?(mak77) → review+
(In reply to Marco Bonardo [::mak] from comment #4)
> That's interesting, but I was expecting to see a bump on 21st if it was due
> to bug 1371945 :(
> 
> I wonder if some extension using Sqlite released a new version on that day.
> 
> Correlations don't seem to be correct for add-ons (for example I see
> activity-stream in a lot of crashes, but I don't see it in correlations) and
> have too much noise to make any use of them.

I think activity-stream here refers to the system addon, which, unlike the one on Test Pilot, doesn't use a separate sqlite database called "metadata store".

Anyway, we have added the sqlite shutdown blocker to the Test Pilot version. Let's see what happens.
Keywords: checkin-needed
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/92b787c8d2a3
Close blocklist Sqlite connection on shutdown r=mak
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/92b787c8d2a3
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
If my blame digging is correct, this issue goes back as far as Fx53? Should we consider backporting this to Beta?
Flags: needinfo?(mathieu)
If I read crash-stats correctly, most entries are on 56 (I selected the ones with "async shutdown timeout" containing kinto, for the last month):
1 	56.0a1 	1419 	82.36 %
2 	54.0 	42 	2.44 %
3 	53.0.3 	29 	1.68 %
4 	55.0b4 	27 	1.57 %
5 	54.0b99 	24 	1.39 %
6 	54.0.1 	23 	1.33 %
7 	55.0b6 	19 	1.10 %
Technically the kinto.sqlite code was introduced for Fx45 (Oct 2015). I can't really say if the shutdown problem exists from the beginning or not. I just guess it does.

So yes, the patch should be relatively easy to backport. But to be honest, I don't know the process nor the amount of efforts it represents on the product integrity side!
Flags: needinfo?(mathieu)
I think this is probably worth a backport to Beta. I've already confirmed that it grafts cleanly, so we just need a Beta approval request on the attachment. For ESR52, I agree that the crash rate is low enough that we should probably just leave well enough alone.
Flags: needinfo?(mathieu)
Mathieu, we're running low on time this cycle. Can you please do the Beta approval request ASAP?
Comment on attachment 8883257 [details]
Bug 1376874 - Close blocklist Sqlite connection on shutdown

Approval Request Comment
[Feature/Bug causing the regression]: Kinto blocklists
[User impact if declined]: Possible crash on shutdown
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: not particularly
[Why is the change risky/not risky?]: the code is just adding a shutdown blocker to ensure closing the database connection, it won't affect other than shutdown.
[String changes made/needed]: none
Attachment #8883257 - Flags: approval-mozilla-beta?
Comment on attachment 8883257 [details]
Bug 1376874 - Close blocklist Sqlite connection on shutdown

this isn't super high volume, not a new issue, and we're out of betas.  beta55-
Attachment #8883257 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
I'm sorry for not making known my time off on Bugzilla :( Thanks for moving it forward!
Flags: needinfo?(mathieu)
Component: Blocklist Policy Requests → Blocklist Implementation
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: