Open Bug 1326309 Opened 7 years ago Updated 20 days ago

Crash in AsyncShutdownTimeout | places-will-close-connection | PlacesUtils read-only connection closing as part of Places shutdown

Categories

(Toolkit :: Places, defect, P3)

50 Branch
x86
Windows 10
defect

Tracking

()

Tracking Status
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- wontfix
firefox-esr60 --- wontfix
firefox-esr68 --- wontfix
firefox53 --- wontfix
firefox56 --- wontfix
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 - wontfix
firefox64 - wontfix
firefox65 - wontfix
firefox66 --- wontfix
firefox67 --- wontfix
firefox67.0.1 --- wontfix
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- wontfix

People

(Reporter: alex_mayorga, Unassigned)

References

(Depends on 3 open bugs)

Details

(Keywords: crash, stalled, top50)

Crash Data

Attachments

(7 files)

This bug was filed from the Socorro interface and is 
report bp-cfb6a35f-2194-4263-90fe-882912161228.
=============================================================

¡Hola!

Found about this type of crash form https://support.mozilla.org/en-US/questions/1152126

There are 246 in the past week at https://crash-stats.mozilla.com/signature/?product=Firefox&signature=AsyncShutdownTimeout%20%7C%20places-will-close-connection%20%7C%20PlacesUtils%20read-only%20connection%20closing%20as%20part%20of%20Places%20shutdown

Filing it FWIW.

¡Gracias!
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	mozglue.dll 	mozalloc_abort(char const* const) 	memory/mozalloc/mozalloc_abort.cpp:33
1 	xul.dll 	NS_DebugBreak 	xpcom/base/nsDebugImpl.cpp:436
2 	xul.dll 	nsDebugImpl::Abort(char const*, int) 	xpcom/base/nsDebugImpl.cpp:146
3 	xul.dll 	NS_InvokeByIndex 	xpcom/reflect/xptcall/md/win32/xptcinvoke_asm_x86_msvc.asm:54
4 	xul.dll 	XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) 	js/xpconnect/src/XPCWrappedNative.cpp:1361
5 	xul.dll 	XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) 	js/xpconnect/src/XPCWrappedNativeJSOps.cpp:1128
6 	xul.dll 	js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp:453
7 	xul.dll 	InternalCall 	js/src/vm/Interpreter.cpp:498
8 		@0x1
Too late for firefox 52, mass-wontfix.
Severity: critical → normal
Priority: -- → P3
Updating bug with additional signature as well as marking affected branches.
Crash Signature: [@ AsyncShutdownTimeout | places-will-close-connection | PlacesUtils read-only connection closing as part of Places shutdown] → [@ AsyncShutdownTimeout | places-will-close-connection | PlacesUtils read-only connection closing as part of Places shutdown] [@ AsyncShutdownTimeout | Places Connection shutdown | PlacesUtils read-only connection closing as part of Places shutdown ]
This is the #3 topcrash on beta 62. It's also a very high volume crash on release 61. 
Nathan, might it be worth taking another look at this?
Flags: needinfo?(nfroyd)
Component: XPCOM → Places
Product: Core → Toolkit
I will defer to asuth or mak on this one.

I'm really curious what places is doing here.  Looking through a couple of crash reports, the Mac reports have (sort of) reasonable stacks:

https://crash-stats.mozilla.com/report/index/e47b02da-92f0-4613-852a-cb55a0180820
https://crash-stats.mozilla.com/report/index/4c85d0e2-b044-4311-b14c-6ca350180820

But I can't tell what sort of SQL is being executed that's taking so long here.
Flags: needinfo?(nfroyd) → needinfo?(bugmail)
I wonder if this is somehow related to Sync? Bug 1483976 has a similar hang and I suspect bug 1483241 will too. Sadly, I don't think we can tell from the crash whether Sync is configured.
Top 50 Crashing Signatures. 7 days ago

Top Crashers for Firefox 64.0a1

4 	2.67% 	-0.18% 	AsyncShutdownTimeout | Places Connection shutdown | PlacesUtils read-only connection closing as part of Places shutdown 	111 	100 	6 	5 	93 	0 	2017-08-06 

Top Crashers for Firefox 63.0b
	
5 	1.46% 	0.5% 	AsyncShutdownTimeout | Places Connection shutdown | PlacesUtils read-only connection closing as part of Places shutdown	471 	440 	25 	6 	415 	0 	2017-08-06	

Top Crashers for Firefox 62.0.3

3 	2.58% 	-0.34% 	AsyncShutdownTimeout | Places Connection shutdown | PlacesUtils read-only connection closing as part of Places shutdown	3682 	3497 	123 	62 	2952 	0 	2017-08-06 

Top Crashers for Firefox 61.0.2

12 	1.08% 	0.22% 	AsyncShutdownTimeout | Places Connection shutdown | PlacesUtils read-only connection closing as part of Places shutdown	40 	33 	7 	0 	39 	0 	2017-08-06

Top Crashers for Firefox 60.0.2

12 	0.92% 	-0.49% 	AsyncShutdownTimeout | Places Connection shutdown | PlacesUtils read-only connection closing as part of Places shutdown	12 	12 	0 	0 	12 	0 	2017-08-06
	
Top Crashers for Firefox 60.2.2esr	
	
20 	0.47% 	0.01% 	AsyncShutdownTimeout | Places Connection shutdown | PlacesUtils read-only connection closing as part of Places shutdown	110 	105 	4 	1 	82 	0 	2017-08-06
Keywords: top50, topcrash
We should still keep an eye on this for 63. It was very high volume in 62 release.
I'll happily take a patch for this if one materializes, but I don't see much value in tracking this either.
Flags: needinfo?(bugmail)
Calling this stalled for now. It is still a top crash on release and beta.

I'm experiencing this bug about once per day so I can fill in some details for you.

The actual crash reported here happens only when you exit Firefox. But things go wrong long before that. The two symptoms of the issue are that Firefox uses 100% CPU (but never more) and your bookmarks become completely read-only. You can't add, modify, remove, or re-arrange them either with drag and drop or with the bookmarks editor. Other than this the browser operates completely normally.

When I run the places integrity test in about:support it says:

Unable to check favicons.sqlite integrity: Error: Could not open connection to /home/al/.mozilla/firefox/o65awu5u.default-1503600779742/favicons.sqlite: 2153971713

That is NS_ERROR_STORAGE_BUSY in case you are wondering.

After restarting Firefox all these problems go away, but they come back after about 24 hours.

In my case bookmark sync is enabled. In fact I only recently enabled it, and I only recently saw this start to happen. I have now disabled bookmark sync to see if it makes this problem go away.

(In reply to Alistair Buxton from comment #12)

The actual crash reported here happens only when you exit Firefox. But things go wrong long before that. The two symptoms of the issue are that Firefox uses 100% CPU (but never more) and your bookmarks become completely read-only. You can't add, modify, remove, or re-arrange them either with drag and drop or with the bookmarks editor. Other than this the browser operates completely normally.

I see a few possible reasons for this, that you could check:

  1. favicons.sqlite file is corrupt in some strange way. In this case the best path forward is to delete it and let Firefox generate a new one (you'll have to reaload pages to collect new favicons, sorry). After that, try running the Integrity check in about:support and check it reports things as sane.
  2. The filesystem where this file lives is not well supported by Sqlite. What is it? Is this profile on a network mounted fs?
  3. Some third party software keeps our databases open/locked, for some reason. If you have external app that may be running in background and access (for whatever reason) firefox profile, it may cause this. It's less likely on Linux than on other OS anyway.
Flags: needinfo?(a.j.buxton)
  1. After restarting Firefox the integrity test says the file is fine until the next time the bug happens.
  2. The filesystem is just a regular local Linux ext4 filesystem on Ubuntu.
  3. No third party software but I am using the Livemarks extension which gives the bookmark subsystem a good workout. This is probably making it happen faster for me than for other people, but it still takes about a day.

Since disabling bookmark sync two days ago I have not observed the problem happening again, but it is perhaps too early to tell yet.

Flags: needinfo?(a.j.buxton)

I also just ran PRAGMA integrity_check on the database and it found no errors. I'll retry this next time I can reproduce the error.

Are you using Nightly? Could you please check the value of the services.sync.engine.bookmarks.buffer pref in about:config?

I am using 64.0 and that key is set to false. I also noticed that services.sync.engine.bookmarks.validation.interval is set to 86400 - presumably seconds ie exactly 24 hours. Perhaps this goes some way to explaining frequency of the bug.

(In reply to Alistair Buxton from comment #17)

I am using 64.0 and that key is set to false. I also noticed that services.sync.engine.bookmarks.validation.interval is set to 86400 - presumably seconds ie exactly 24 hours. Perhaps this goes some way to explaining frequency of the bug.

FWIW, we chatted in #sync and determined that bookmark validation isn't running in that profile, so I think we can rule that out (but not necessarily rule out bookmark syncing in general)

This bug is causing Firefox 66.0.5 to crash on my Dell Inspiron 7558 several times a day. I've asked Mozilla Support about this on https://support.mozilla.org/en-US/questions/1259452

Flags: needinfo?(mak77)

Thanks for the report!
Based on the support thread, it seems to be related to very long Syncs, thus I'm forwarding to Lina to get a proper Sync log from you.

Did you try the suggestion to run Integrity Check that was posted in the support forum? Note that Refreshing Firefox doesn't run that, so it may still be relevant.

Flags: needinfo?(mak77) → needinfo?(lina)

I am also experiencing this issue. After quitting Firefox, it crashes and the bug-reporter shows up. I have hundreds of bookmarks, and I have Sync enabled. The Sync spinner in the menu will stay blue and spinning for hours at a time. Oddly, the about:support integrity check says that the places and favicons databases are sane. As with Alistair, I am using Linux' EXT4 file-system, Firefox will use an entire CPU core, and bookmarks and history entries can't be created or deleted. I have sent in the bug reports bp-75a18eeb-9fc7-4db2-a2f6-8a4cc0190618 and bp-a07cae7b-b12d-4d12-8d6d-2e6880190618.

This bug is #4 overall in 67.0.4. I see there is a need info from last month. Have we gotten a sync log yet? Thanks.

Firefox 68.0.14b here, Windows 10 version 1903 build 18362.207.

Firefox 68.0.14b here, Windows 10 version 1903 build 18362.207. First crash in the crash history is from 67.0b18. Also noticed now that i couldn't click the star button to save a bookmark. Closed browser and it crashed. Haven't noticed long syncs, but nevermind, here are some sync logs.

Actually now the issue occured again and i checked and confirmed that it was stuck at synchronising.

This is still quite high on release. I see in Comment 20 we still have an outstanding ni from Lina.

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression
Keywords: regression

Hi! I'm so sorry this slipped through the cracks, my ni? queue has been pretty full.

We usually ask folks running into issues to install the About Sync add-on, enable success and trace logging, and upload a zip file with the logs the next time they can reproduce the problem.

For bookmarks specifically, Nightly and Beta have a new sync engine that we're working to enable by default on Release. We recently landed bug 1567238 to fix slow merges, which I think is what Lyubomir ran into in comment 23 on 68 Beta. Bug 1572615 also runs a watchdog timer that aborts long-running merges, and should land in Nightly sometime this week.

The symptoms that Alistair describes in comment 12 sounds like our SQLite busy retry loop run amok. We've seen new bookmark sync affected by the same issue (see bug 1561467 for some timings, and bug 1435446 comment 8 for some digging), but haven't been able to repro...and both Alistair and Tom are on Release, which doesn't have the new engine enabled.

The best way to help us debug bookmark syncing is probably to install Nightly, enable detailed logging using About Sync, and upload logs (either in a confidential bug, or email) and crash reports as you see them. That's a pretty big ask, and we're hoping we can turn on the new engine everywhere in the 70 release cycle.

IIUC, though, this shutdown hang signature shows up if anything blocks the Places connection as we're closing the browser. It could be Sync (bookmarks, as in this bug, or history), a long-running bookmarks UI operation, saving history visits, or sanitizing. A Sync log wouldn't be enough to tell if Sync caused the issue, or if it's getting stuck behind something else that's wedging the connection. We'll keep an eye on this in 70; new bookmark sync has a crash signature that mentions SyncedBookmarksMirror.

Unfortunately, I don't have other suggestions. 😕 Andrew, can you think of anything else we can try? Would mozStorage logging help?

Flags: needinfo?(lina) → needinfo?(bugmail)

I used to get this crash very frequently - In my case the sync wheel would start spinning indefinitely, and trying to verify places in about:support would give an error. Closing the browser would cause the crash dialogue box to open and ask me to submit a report.

In my case, I managed to fix it by deleting favicons.sqlite (this cost me all my bookmark/history but they're easy to regenerate).

Since then the problem has completely disappeared.

Deleting places.sqlite may also fix it (though I can't confirm, and you'll lose your browsing history, so pick your poison)

(In reply to Luke from comment #32)

In my case, I managed to fix it by deleting favicons.sqlite (this cost me all my bookmark/history but they're easy to regenerate).

Favicons.sqlite only contains icons, it doesn't contain any history or bookmarks information.
I suspect what caused that loss was removing places.sqlite
Also, Sync doesn't sync favicons.

(In reply to Marco Bonardo [::mak] from comment #33)

(In reply to Luke from comment #32)

In my case, I managed to fix it by deleting favicons.sqlite (this cost me all my bookmark/history but they're easy to regenerate).

Favicons.sqlite only contains icons, it doesn't contain any history or bookmarks information.
I suspect what caused that loss was removing places.sqlite
Also, Sync doesn't sync favicons.

Yes that was a typo, I meant to say I lost the icons for my bookmarks/history after deleting favicons.sqlite. But it solved the bug for me.

I assume favicons.sqlite became corrupted, which led to places.sqlite hanging as well (thus sync hanging). But I could be wrong.

I didn't remove places.sqlite because I wanted to retain my browsing history - but people experiencing this bug should try deleting either favicons.sqlite or places.sqlite because likely one of those files that's causing this bug.

Attached file Troubleshooting info
@Lina, i've sent you an error log from Firefox Nightly.

... On Email.

Attached file PlacesUtils.jsm crash
Flags: needinfo?(lina)

I can definitely confirm that my Nightly has a tendency to keep 100% CPU forever, until i close it via Ctrl + Q. After that it keeps this 100% usage and its processes for about 20-40 seconds, and then crashes with the above error. Performance bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1583927

Just to let you know, a user reported the same problem;

Product Firefox
Release Channel release
Version 70.0.1
Build ID 20191030021342 (2019-10-30)
Windows 7

https://support.mozilla.org/en-US/questions/1273749#answer-1269780

Fri, 29 Nov 2019 23:18:59 GMT
bp-f6136da9-a88c-4702-9da9-eb4e20191129

Whiteboard: [fxperf]

Looking at crash stats, this crash all by itself accounts for fully a third of asyncshutdown crashes.

Marco, what's the best way to make progress here? AIUI, there is some issue here with a long-running query and the sqlite busy loop waiting for that query to complete. Can we alter the shutdown blocker to include some data about which sqlite query this is, perhaps just on nightly? There's a decent (non-0) volume even on nightly, so I'm hopeful we could make relatively quick progress if we managed to diagnose what actually tripped these hangs - but I'm also not very familiar with the bug and what has already been tried...

Flags: needinfo?(mak)

(In reply to :Gijs (he/him) from comment #40)

Looking at crash stats, this crash all by itself accounts for fully a third of asyncshutdown crashes.

To properly analyze these crashes you must go to the Aggregations part of Crash-stats and aggregate on Async shutdown timeout. But unfortunately the state object for these is not great, there's not enough information to understand what's up. Yes, adding more information to the state object would help. The SQL may help, but it really depends, if there's a deadlock it would be pointless.
Afaict we are here https://searchfox.org/mozilla-central/rev/0e09b9191c02097034e46b193930f91c45b7885d/toolkit/components/places/PlacesUtils.jsm#2017 because all the metadata reports state 1 was reached. Apparently what hangs is the conn.close() call, that goes into Sqlite.jsm. From there we lose state information, so we don't know where we are in the Sqlite.jsm shutdown process, having the Sqlite.jsm state in the client state object would be nice.

Let's see what else we can do.

Recently Bug 1643491 landed, that may have a positive effect on this.

One thing I never considered is what happens if this asyncClose call throws:
https://searchfox.org/mozilla-central/rev/0e09b9191c02097034e46b193930f91c45b7885d/toolkit/modules/Sqlite.jsm#517
we should probably catch it and invoke markAsClosed.
I filed bug 1645331 and I'll patch it. We can monitor the effect.

I also supsect some of these cases are because something starts too late, when shutdown already started, and it adds more shutdown blockers. See https://bugzilla.mozilla.org/show_bug.cgi?id=1617559#c3 where we may be adding blockers after a phase has started, that allows us to startup services too late for them to behave properly.
I can try to make the change I suggested there, and see the effect on the number of async shutdown forced crashes. Though I'm not sure how complicate it will be to write an appropriate test for it. Let me try, and eventually we can add a temporary pref to undo the behavior in case it causes unexpected problems.

Flags: needinfo?(mak)
Depends on: 1643491, 1645331, 1617559

Marking P3 due to the blocking bug #1617559 still in progress.

Whiteboard: [fxperf] → [fxperf:p3]
Depends on: 1674418

Clearing needinfo as it's been a year now with no response.

Flags: needinfo?(lina)

(Blocking bug 1617559 seems like the appropriate next step to get better AsyncShutdown messages.)

Flags: needinfo?(bugmail)

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash
Severity: critical → S2
Crash Signature: [@ AsyncShutdownTimeout | places-will-close-connection | PlacesUtils read-only connection closing as part of Places shutdown] [@ AsyncShutdownTimeout | Places Connection shutdown | PlacesUtils read-only connection closing as part of Places shutdown ] → [@ AsyncShutdownTimeout | places-will-close-connection | PlacesUtils read-only connection closing as part of Places shutdown] [@ AsyncShutdownTimeout | Places Connection shutdown | PlacesUtils read-only connection closing as part of Places shutdown] [@ …
Depends on: 1700584

We will continue to monitor the number of crashes, and based on changes, we'll determine if we have enough/more details that could help pinpoint the location of crashes.

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 20 desktop browser crashes on beta

For more information, please visit auto_nag documentation.

Keywords: topcrash

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash
Depends on: 1834046

To move on with this we must at least know which query is being executed, that means fixing Bug 1834046. We are blocked on a crash reporter rewrite atm, but it should be possible soon.
We'll probably be unable to fix all the cases though, as some are surely due to slow IO.
Lowering the severity because there's no obvious culprit here, the number of crashes is "reasonable" anyway, and we'll resume investigation once we have the necessary data.

Severity: S2 → S3
Whiteboard: [fxperf:p3]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: