Intermittent [xpcshell] <test_name> | application crashed [@ mozilla::storage::Service::Observe(nsISupports*, char const*, char16_t const*)]
Categories
(Core :: Permission Manager, defect, P3)
Tracking
()
People
(Reporter: intermittent-bug-filer, Unassigned)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
(Keywords: crash, intermittent-failure)
Crash Data
Filed by: aciure [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=292713042&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/UWApkGO8Tu-1NYvijLW0sA/runs/0/artifacts/public/logs/live_backing.log
[task 2020-03-11T17:47:02.494Z] 17:47:02 INFO - TEST-START | toolkit/components/url-classifier/tests/unit/test_bug1274685_unowned_list.js
[task 2020-03-11T17:47:03.122Z] 17:47:03 WARNING - TEST-UNEXPECTED-FAIL | toolkit/components/url-classifier/tests/unit/test_bug1274685_unowned_list.js | xpcshell return code: 1
[task 2020-03-11T17:47:03.122Z] 17:47:03 INFO - TEST-INFO took 626ms
[task 2020-03-11T17:47:03.122Z] 17:47:03 INFO - >>>>>>>
[task 2020-03-11T17:47:03.122Z] 17:47:03 INFO - PID 10492 | Couldn't convert chrome URL: chrome://branding/locale/brand.properties
[task 2020-03-11T17:47:03.122Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: Failed to get directory to cache.: file /builds/worker/checkouts/gecko/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 84
[task 2020-03-11T17:47:03.122Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: Failed to get directory to cache.: file /builds/worker/checkouts/gecko/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 84
[task 2020-03-11T17:47:03.122Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: Failed to get directory to cache.: file /builds/worker/checkouts/gecko/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 84
[task 2020-03-11T17:47:03.122Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: Failed to get directory to cache.: file /builds/worker/checkouts/gecko/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 84
[task 2020-03-11T17:47:03.122Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: Failed to get directory to cache.: file /builds/worker/checkouts/gecko/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 84
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: Couldn't get the user appdata directory. Crash events may not be produced.: file /builds/worker/checkouts/gecko/toolkit/crashreporter/nsExceptionHandler.cpp, line 2930
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: This method is lossy. Use GetCanonicalPath !: file /builds/worker/checkouts/gecko/xpcom/io/nsLocalFileWin.cpp, line 3118
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: This method is lossy. Use GetCanonicalPath !: file /builds/worker/checkouts/gecko/xpcom/io/nsLocalFileWin.cpp, line 3118
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: This method is lossy. Use GetCanonicalPath !: file /builds/worker/checkouts/gecko/xpcom/io/nsLocalFileWin.cpp, line 3118
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - (xpcshell/head.js) | test MAIN run_test pending (1)
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - (xpcshell/head.js) | test MAIN run_test finished (1)
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - exiting test
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: Could not get the program name for a cubeb stream.: 'NS_SUCCEEDED(rv)', file /builds/worker/checkouts/gecko/dom/media/CubebUtils.cpp, line 363
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: NS_ENSURE_SUCCESS_VOID(rv) failed with result 0x80004005: file /builds/worker/checkouts/gecko/extensions/permissions/nsPermissionManager.cpp, line 1665
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: 'NS_FAILED(rv)', file /builds/worker/checkouts/gecko/security/manager/ssl/DataStorage.cpp, line 333
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - PID 10492 | [10492, Cache2 I/O] WARNING: Forcing memory-only entry since CacheFileIOManager doesn't have mCacheDirectory.: file /builds/worker/checkouts/gecko/netwerk/cache2/CacheFile.cpp, line 516
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: unable to post SHUTDOWN message: file /builds/worker/checkouts/gecko/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 239
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: NS_ENSURE_TRUE(gDbBackgroundThread) failed: file /builds/worker/checkouts/gecko/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp, line 2195
[task 2020-03-11T17:47:03.124Z] 17:47:03 INFO - PID 10492 | Storage connection not closed: cookies.sqliteHit MOZ_CRASH() at /builds/worker/checkouts/gecko/storage/mozStorageService.cpp:725
[task 2020-03-11T17:47:22.298Z] 17:47:22 INFO - PID 10492 | #01: nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) [xpcom/ds/nsObserverService.cpp:295]
[task 2020-03-11T17:47:22.298Z] 17:47:22 INFO - PID 10492 | #02: mozilla::ShutdownXPCOM(nsIServiceManager*) [xpcom/build/XPCOMInit.cpp:645]
[task 2020-03-11T17:47:22.298Z] 17:47:22 INFO - PID 10492 | #03: XRE_XPCShellMain(int, char**, char**, XREShellData const*) [js/xpconnect/src/XPCShellImpl.cpp:1383]
[task 2020-03-11T17:47:22.298Z] 17:47:22 INFO - PID 10492 | #04: xpcshell.exe + 0x130b
[task 2020-03-11T17:47:22.298Z] 17:47:22 INFO - PID 10492 | #05: xpcshell.exe + 0x11e1
[task 2020-03-11T17:47:22.298Z] 17:47:22 INFO - PID 10492 | #06: xpcshell.exe + 0x74bc8
[task 2020-03-11T17:47:22.298Z] 17:47:22 INFO - PID 10492 | #07: KERNEL32.DLL + 0x13034
[task 2020-03-11T17:47:22.298Z] 17:47:22 INFO - PID 10492 | #08: ntdll.dll + 0x71461
Updated•5 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 3•5 years ago
|
||
So I made this perma-fail when attempting to land my patch on bug 1598924 because of timing.
The salient error here is:
0:01.92 pid:9436 Storage connection not closed: cookies.sqlite
Hit MOZ_CRASH() at d:/mozilla-central/storage/mozStorageService.cpp:725
Fundamentally, the following is happening:
- the test talks to the url classifier
- the url classifier talks to remote settings (quite a lot - each of the 10 classifier things (blocking + annotation features * 5) creates its own remote settings instance)
- remote settings checks the local DB. This is an xpcshell test so the local DB is empty
- it checks for a dump in a resource:/// file. Although these dumps exist in the Firefox case, I think we might not be able to read them in xpcshell tests, or something?
- it does a network request (so 10 of these happen, see (2)).
- The network requests all talk to the cookie service to ask for cookies.
Because of the multiple, async steps, we get to (6) well after the profile-before-change
notification is first sent (well, strictly speaking, while we're processing it, courtesy of RemoteSettings and various other things blocking shutdown via AsyncShutdown, and spinning the event loop, per the patch in bug 1598924 - but given this is already intermittent, we can hit this case without that patch; it's just a race). This is bad, because the cookie service tries to shut itself down when profile-before-change
happens. So we end up initializing the cookie service, and it never shuts down because when NotifyObservers
is called, it creates an array of the observers to notify (to avoid infinite loops or skipping items when they get removed), so when the cookie service tries to add an observer, it will not get notified for profile-before-change
because its new observer doesn't make it into that list which has already been created.
In some of the JS code in remote settings, we check Services.startup.isShuttingDown
, but (a) it's unclear if that'd stop this problem if the checks worked and (b) it doesn't work in this test because xpcshell tests don't set it to true when "real" Firefox builds do - xpcshell tests send the same shutdown notifications "manually" from head.js
The trivial wallpaper here is to force the cookie service to be initialized in the test. Tada, it shuts down nicely when profile-before-change
fires, everybody happy. I'm likely to use this to get the patch in bug 1598924 landed - in practice, the case in this test already exists, but is unlikely to be hit by actual Firefox use in the wild - good luck getting to profile-before-change
with a profile (and thus cookies) but without any network requests.
I'm curious what people think is the "right" solution here. Some improvements in the areas of code I can think of:
- make the url classifier only use 1 remotesettings client and one
.get()
call, instead of doing several. I could see this also avoiding loading the same dump multiple times, if these calls race in practice in Firefox. Mathieu, is there some reason that couldn't happen, if multipleRemoteSettings()
constructors +.get()
calls are made? - network should fail requests that happen after
profile-before-change
if aprofile-after-change
is ever fired, without initializing the cookie service, and/or the cookie service should fail to initialize afterprofile-before-change
fires. The requests already produce errors (NetworkError when attempting to fetch resource.
) but I'm not sure what that actually means. The wrinkle here is there's no way to tell thatprofile-before-change
has fired. Valentin / Marco, thoughts? Anyone know if we rely on these network requests still completing? For sure various other thing (like the http cache) are also no longer available at this point, so any requests are likely to be broken in "interesting" ways anyway. - remote settings code could try to catch the case where
profile-before-change
has started firing and avoid network requests afterwards. This would require some more work in RemoteSettingsClient and potentially other places, though it also has a similar issue to (2) - Doug, do we already have plans to track shutdown state somewhere centrally? - It feels like read requests to indexeddb shouldn't be blocking shutdown (because performance), but I'm unclear on whether, per bug 1545438 comment 26, we need to do that at least for now to avoid a much longer hang (I think so.). :asuth, can you clarify whether I'm right in thinking we're better off blocking the completion of
profile-before-change
on those IDB requests completing for now?
Comment 4•5 years ago
|
||
I don't know details about what the cookie service is doing with those network requests, but for sure we have various bugs and crashes due to services being initialized too late during shutdown. For example in bug 1617559 I suggested making not possible to add a blocker for a shutdown phase if that phase already started. If the service would fail to add a shutdown blocker, the service instantiation should fail as well and the service should not serve anything. It's not a perfect solution but would at least provide a barrier in async shutdown and some state tracking.
Comment 5•5 years ago
|
||
:baku is overhauling the cookie service in bug 1624146 so adding a needinfo on him related to comment 3
Comment 6•5 years ago
|
||
(In reply to :Gijs (he/him) from comment #3)
- It feels like read requests to indexeddb shouldn't be blocking shutdown (because performance), but I'm unclear on whether, per bug 1545438 comment 26, we need to do that at least for now to avoid a much longer hang (I think so.). :asuth, can you clarify whether I'm right in thinking we're better off blocking the completion of
profile-before-change
on those IDB requests completing for now?
If you're not going to wait for the reads, it's appropriate to invoke abort() on any IDBTransaction instances that have already been created and then call close(). Alternately, destroying the global they were created in will automatically abort and close, although I don't know that this is possible for JSM globals at this time.
QuotaManager shutdown will also automatically abort the transactions and close at "profile-before-change-qm", but it would be wasteful to let the operations continue to run that long. The IDB bug being discussed in bug 1597312 is due to an attempt to open a new IDB database after the "profile-before-change-qm" shutdown phase has started (as perceived by QuotaManager).
Comment 7•5 years ago
|
||
it checks for a dump in a resource:/// file. Although these dumps exist in the Firefox case, I think we might not be able to read them in xpcshell tests, or something?
That's not expected. xpcshell tests load the dumps unless a flag is set.
- make the url classifier only use 1 remotesettings client and one .get() call, instead of doing several. I could see this also avoiding loading the same dump multiple times, if these calls race in practice in Firefox. Mathieu, is there some reason that couldn't happen, if multiple RemoteSettings() constructors + .get() calls are made?
I don't think there's any concrete reason to have multiple clients. We talked about something similar in the past and we could observe that multiple syncs cannot happen at the same time. I contacted @dimi and he confirmed that we could avoid that and have only one instance for the url-classifier. He opened Bug 1624822
Comment 8•5 years ago
|
||
I filed bug 1624983 for Andrew's suggestion from comment #6
Comment hidden (Intermittent Failures Robot) |
Comment 10•5 years ago
|
||
I've done an important refactoring of the CookieService code without touching its shutdown, which is still based on the "profile-before-change" notification. We already have issues related to the shutdown (bug 1245578).
My plan is to work on that bug as soon as my refactoring patches land. Can we mark this bug as a dup of bug 1245578 ? (or a dep).
Comment 11•5 years ago
|
||
(In reply to Andrea Marchesini [:baku] from comment #10)
I've done an important refactoring of the CookieService code without touching its shutdown, which is still based on the "profile-before-change" notification. We already have issues related to the shutdown (bug 1245578).
My plan is to work on that bug as soon as my refactoring patches land. Can we mark this bug as a dup of bug 1245578 ? (or a dep).
Sure, let's mark as a dep.
I still think it'd be useful to be able to "consult" shutdown state (in other words, I think consumers want to know whether profile-before-change
and other stages of shutdown have been fired / reached yet, or not), and to consider immediately failing new network requests (that aren't for file/resource/chrome URIs) if they happen after profile-before-change
. If we're trying to shut down, we should do as little work as possible. Leaving ni for Doug and Valentin for those (x-ref comment #3).
Comment hidden (Intermittent Failures Robot) |
Updated•5 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 15•5 years ago
|
||
(In reply to :Gijs (he/him) from comment #3)
- network should fail requests that happen after
profile-before-change
if aprofile-after-change
is ever fired, without initializing the cookie service, and/or the cookie service should fail to initialize afterprofile-before-change
fires. The requests already produce errors (NetworkError when attempting to fetch resource.
) but I'm not sure what that actually means. The wrinkle here is there's no way to tell thatprofile-before-change
has fired. Valentin / Marco, thoughts? Anyone know if we rely on these network requests still completing? For sure various other thing (like the http cache) are also no longer available at this point, so any requests are likely to be broken in "interesting" ways anyway.
As far as I can tell, we do fail connections that happen after "profile-before-change":
See profile-change-net-teardown which is sent before profile-before-change
Not sure about the cookie service bits.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 19•5 years ago
|
||
Because this bug's Severity has not been changed from the default since it was filed, and it's Priority is P3
(Backlog,) indicating it has been triaged, the bug's Severity is being updated to S3
(normal.)
Comment 20•5 years ago
|
||
(In reply to Valentin Gosu [:valentin] (he/him) from comment #15)
As far as I can tell, we do fail connections that happen after "profile-before-change":
See profile-change-net-teardown which is sent beforeprofile-before-change
Not sure about the cookie service bits.
Is it possible we start a connection before profile-change-net-teardown, but end up requesting cookies afterwards, because other code (looks like AsyncOpen does some async things?) may make it take longer before we continue with the request ?
Comment 21•5 years ago
|
||
(In reply to :Gijs (he/him) from comment #20)
(In reply to Valentin Gosu [:valentin] (he/him) from comment #15)
Is it possible we start a connection before profile-change-net-teardown, but end up requesting cookies afterwards, because other code (looks like AsyncOpen does some async things?) may make it take longer before we continue with the request ?
Yes, I think that's possible.
The cookies are set synchronously by HttpBaseChannel::AddCookiesToRequest
- however, we might call it after AsyncOpen, for example from DoAuthRetry
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 33•4 years ago
|
||
Seems this has stopped, but we don't really know why? Mathieu, do you know if something's changed around when this stopped happening?
Comment 34•4 years ago
|
||
Honestly, I have no idea. No major changes were made in the last 2 months 🤷
Updated•4 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 57•3 years ago
|
||
This seems to be the same underlying issue as bug 1707963.
Comment hidden (Intermittent Failures Robot) |
Updated•3 years ago
|
Updated•3 years ago
|
Description
•