Closed Bug 1621759 Opened 5 years ago Closed 3 years ago

Intermittent [xpcshell] <test_name> | application crashed [@ mozilla::storage::Service::Observe(nsISupports*, char const*, char16_t const*)]

Categories

(Core :: Permission Manager, defect, P3)

defect

Tracking

()

RESOLVED DUPLICATE of bug 1707963

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: crash, intermittent-failure)

Crash Data

Filed by: aciure [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=292713042&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/UWApkGO8Tu-1NYvijLW0sA/runs/0/artifacts/public/logs/live_backing.log


[task 2020-03-11T17:47:02.494Z] 17:47:02 INFO - TEST-START | toolkit/components/url-classifier/tests/unit/test_bug1274685_unowned_list.js
[task 2020-03-11T17:47:03.122Z] 17:47:03 WARNING - TEST-UNEXPECTED-FAIL | toolkit/components/url-classifier/tests/unit/test_bug1274685_unowned_list.js | xpcshell return code: 1
[task 2020-03-11T17:47:03.122Z] 17:47:03 INFO - TEST-INFO took 626ms
[task 2020-03-11T17:47:03.122Z] 17:47:03 INFO - >>>>>>>
[task 2020-03-11T17:47:03.122Z] 17:47:03 INFO - PID 10492 | Couldn't convert chrome URL: chrome://branding/locale/brand.properties
[task 2020-03-11T17:47:03.122Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: Failed to get directory to cache.: file /builds/worker/checkouts/gecko/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 84
[task 2020-03-11T17:47:03.122Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: Failed to get directory to cache.: file /builds/worker/checkouts/gecko/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 84
[task 2020-03-11T17:47:03.122Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: Failed to get directory to cache.: file /builds/worker/checkouts/gecko/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 84
[task 2020-03-11T17:47:03.122Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: Failed to get directory to cache.: file /builds/worker/checkouts/gecko/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 84
[task 2020-03-11T17:47:03.122Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: Failed to get directory to cache.: file /builds/worker/checkouts/gecko/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 84
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: Couldn't get the user appdata directory. Crash events may not be produced.: file /builds/worker/checkouts/gecko/toolkit/crashreporter/nsExceptionHandler.cpp, line 2930
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: This method is lossy. Use GetCanonicalPath !: file /builds/worker/checkouts/gecko/xpcom/io/nsLocalFileWin.cpp, line 3118
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: This method is lossy. Use GetCanonicalPath !: file /builds/worker/checkouts/gecko/xpcom/io/nsLocalFileWin.cpp, line 3118
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: This method is lossy. Use GetCanonicalPath !: file /builds/worker/checkouts/gecko/xpcom/io/nsLocalFileWin.cpp, line 3118
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - (xpcshell/head.js) | test MAIN run_test pending (1)
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - (xpcshell/head.js) | test MAIN run_test finished (1)
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - exiting test
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: Could not get the program name for a cubeb stream.: 'NS_SUCCEEDED(rv)', file /builds/worker/checkouts/gecko/dom/media/CubebUtils.cpp, line 363
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: NS_ENSURE_SUCCESS_VOID(rv) failed with result 0x80004005: file /builds/worker/checkouts/gecko/extensions/permissions/nsPermissionManager.cpp, line 1665
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: 'NS_FAILED(rv)', file /builds/worker/checkouts/gecko/security/manager/ssl/DataStorage.cpp, line 333
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - PID 10492 | [10492, Cache2 I/O] WARNING: Forcing memory-only entry since CacheFileIOManager doesn't have mCacheDirectory.: file /builds/worker/checkouts/gecko/netwerk/cache2/CacheFile.cpp, line 516
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: unable to post SHUTDOWN message: file /builds/worker/checkouts/gecko/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 239
[task 2020-03-11T17:47:03.123Z] 17:47:03 INFO - PID 10492 | [10492, Main Thread] WARNING: NS_ENSURE_TRUE(gDbBackgroundThread) failed: file /builds/worker/checkouts/gecko/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp, line 2195
[task 2020-03-11T17:47:03.124Z] 17:47:03 INFO - PID 10492 | Storage connection not closed: cookies.sqliteHit MOZ_CRASH() at /builds/worker/checkouts/gecko/storage/mozStorageService.cpp:725
[task 2020-03-11T17:47:22.298Z] 17:47:22 INFO - PID 10492 | #01: nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) [xpcom/ds/nsObserverService.cpp:295]
[task 2020-03-11T17:47:22.298Z] 17:47:22 INFO - PID 10492 | #02: mozilla::ShutdownXPCOM(nsIServiceManager*) [xpcom/build/XPCOMInit.cpp:645]
[task 2020-03-11T17:47:22.298Z] 17:47:22 INFO - PID 10492 | #03: XRE_XPCShellMain(int, char**, char**, XREShellData const*) [js/xpconnect/src/XPCShellImpl.cpp:1383]
[task 2020-03-11T17:47:22.298Z] 17:47:22 INFO - PID 10492 | #04: xpcshell.exe + 0x130b
[task 2020-03-11T17:47:22.298Z] 17:47:22 INFO - PID 10492 | #05: xpcshell.exe + 0x11e1
[task 2020-03-11T17:47:22.298Z] 17:47:22 INFO - PID 10492 | #06: xpcshell.exe + 0x74bc8
[task 2020-03-11T17:47:22.298Z] 17:47:22 INFO - PID 10492 | #07: KERNEL32.DLL + 0x13034
[task 2020-03-11T17:47:22.298Z] 17:47:22 INFO - PID 10492 | #08: ntdll.dll + 0x71461

Summary: Intermittent toolkit/components/url-classifier/tests/unit/test_bug1274685_unowned_list.js | application crashed [@ mozilla::storage::Service::Observe(nsISupports*, char const*, char16_t const*)] → Intermittent [xpcshell] toolkit/components/url-classifier/tests/unit/<test_name> | application crashed [@ mozilla::storage::Service::Observe(nsISupports*, char const*, char16_t const*)]

So I made this perma-fail when attempting to land my patch on bug 1598924 because of timing.

The salient error here is:

 0:01.92 pid:9436 Storage connection not closed: cookies.sqlite
Hit MOZ_CRASH() at d:/mozilla-central/storage/mozStorageService.cpp:725

Fundamentally, the following is happening:

  1. the test talks to the url classifier
  2. the url classifier talks to remote settings (quite a lot - each of the 10 classifier things (blocking + annotation features * 5) creates its own remote settings instance)
  3. remote settings checks the local DB. This is an xpcshell test so the local DB is empty
  4. it checks for a dump in a resource:/// file. Although these dumps exist in the Firefox case, I think we might not be able to read them in xpcshell tests, or something?
  5. it does a network request (so 10 of these happen, see (2)).
  6. The network requests all talk to the cookie service to ask for cookies.

Because of the multiple, async steps, we get to (6) well after the profile-before-change notification is first sent (well, strictly speaking, while we're processing it, courtesy of RemoteSettings and various other things blocking shutdown via AsyncShutdown, and spinning the event loop, per the patch in bug 1598924 - but given this is already intermittent, we can hit this case without that patch; it's just a race). This is bad, because the cookie service tries to shut itself down when profile-before-change happens. So we end up initializing the cookie service, and it never shuts down because when NotifyObservers is called, it creates an array of the observers to notify (to avoid infinite loops or skipping items when they get removed), so when the cookie service tries to add an observer, it will not get notified for profile-before-change because its new observer doesn't make it into that list which has already been created.

In some of the JS code in remote settings, we check Services.startup.isShuttingDown, but (a) it's unclear if that'd stop this problem if the checks worked and (b) it doesn't work in this test because xpcshell tests don't set it to true when "real" Firefox builds do - xpcshell tests send the same shutdown notifications "manually" from head.js

The trivial wallpaper here is to force the cookie service to be initialized in the test. Tada, it shuts down nicely when profile-before-change fires, everybody happy. I'm likely to use this to get the patch in bug 1598924 landed - in practice, the case in this test already exists, but is unlikely to be hit by actual Firefox use in the wild - good luck getting to profile-before-change with a profile (and thus cookies) but without any network requests.

I'm curious what people think is the "right" solution here. Some improvements in the areas of code I can think of:

  1. make the url classifier only use 1 remotesettings client and one .get() call, instead of doing several. I could see this also avoiding loading the same dump multiple times, if these calls race in practice in Firefox. Mathieu, is there some reason that couldn't happen, if multiple RemoteSettings() constructors + .get() calls are made?
  2. network should fail requests that happen after profile-before-change if a profile-after-change is ever fired, without initializing the cookie service, and/or the cookie service should fail to initialize after profile-before-change fires. The requests already produce errors (NetworkError when attempting to fetch resource.) but I'm not sure what that actually means. The wrinkle here is there's no way to tell that profile-before-change has fired. Valentin / Marco, thoughts? Anyone know if we rely on these network requests still completing? For sure various other thing (like the http cache) are also no longer available at this point, so any requests are likely to be broken in "interesting" ways anyway.
  3. remote settings code could try to catch the case where profile-before-change has started firing and avoid network requests afterwards. This would require some more work in RemoteSettingsClient and potentially other places, though it also has a similar issue to (2) - Doug, do we already have plans to track shutdown state somewhere centrally?
  4. It feels like read requests to indexeddb shouldn't be blocking shutdown (because performance), but I'm unclear on whether, per bug 1545438 comment 26, we need to do that at least for now to avoid a much longer hang (I think so.). :asuth, can you clarify whether I'm right in thinking we're better off blocking the completion of profile-before-change on those IDB requests completing for now?
Flags: needinfo?(valentin.gosu)
Flags: needinfo?(mathieu)
Flags: needinfo?(mak)
Flags: needinfo?(dothayer)
Flags: needinfo?(bugmail)

I don't know details about what the cookie service is doing with those network requests, but for sure we have various bugs and crashes due to services being initialized too late during shutdown. For example in bug 1617559 I suggested making not possible to add a blocker for a shutdown phase if that phase already started. If the service would fail to add a shutdown blocker, the service instantiation should fail as well and the service should not serve anything. It's not a perfect solution but would at least provide a barrier in async shutdown and some state tracking.

Flags: needinfo?(mak)

:baku is overhauling the cookie service in bug 1624146 so adding a needinfo on him related to comment 3

Flags: needinfo?(amarchesini)

(In reply to :Gijs (he/him) from comment #3)

  1. It feels like read requests to indexeddb shouldn't be blocking shutdown (because performance), but I'm unclear on whether, per bug 1545438 comment 26, we need to do that at least for now to avoid a much longer hang (I think so.). :asuth, can you clarify whether I'm right in thinking we're better off blocking the completion of profile-before-change on those IDB requests completing for now?

If you're not going to wait for the reads, it's appropriate to invoke abort() on any IDBTransaction instances that have already been created and then call close(). Alternately, destroying the global they were created in will automatically abort and close, although I don't know that this is possible for JSM globals at this time.

QuotaManager shutdown will also automatically abort the transactions and close at "profile-before-change-qm", but it would be wasteful to let the operations continue to run that long. The IDB bug being discussed in bug 1597312 is due to an attempt to open a new IDB database after the "profile-before-change-qm" shutdown phase has started (as perceived by QuotaManager).

Flags: needinfo?(bugmail)
See Also: → 1624822

it checks for a dump in a resource:/// file. Although these dumps exist in the Firefox case, I think we might not be able to read them in xpcshell tests, or something?

That's not expected. xpcshell tests load the dumps unless a flag is set.

  1. make the url classifier only use 1 remotesettings client and one .get() call, instead of doing several. I could see this also avoiding loading the same dump multiple times, if these calls race in practice in Firefox. Mathieu, is there some reason that couldn't happen, if multiple RemoteSettings() constructors + .get() calls are made?

I don't think there's any concrete reason to have multiple clients. We talked about something similar in the past and we could observe that multiple syncs cannot happen at the same time. I contacted @dimi and he confirmed that we could avoid that and have only one instance for the url-classifier. He opened Bug 1624822

I filed bug 1624983 for Andrew's suggestion from comment #6

See Also: → 1624983

I've done an important refactoring of the CookieService code without touching its shutdown, which is still based on the "profile-before-change" notification. We already have issues related to the shutdown (bug 1245578).
My plan is to work on that bug as soon as my refactoring patches land. Can we mark this bug as a dup of bug 1245578 ? (or a dep).

Flags: needinfo?(amarchesini)

(In reply to Andrea Marchesini [:baku] from comment #10)

I've done an important refactoring of the CookieService code without touching its shutdown, which is still based on the "profile-before-change" notification. We already have issues related to the shutdown (bug 1245578).
My plan is to work on that bug as soon as my refactoring patches land. Can we mark this bug as a dup of bug 1245578 ? (or a dep).

Sure, let's mark as a dep.

I still think it'd be useful to be able to "consult" shutdown state (in other words, I think consumers want to know whether profile-before-change and other stages of shutdown have been fired / reached yet, or not), and to consider immediately failing new network requests (that aren't for file/resource/chrome URIs) if they happen after profile-before-change. If we're trying to shut down, we should do as little work as possible. Leaving ni for Doug and Valentin for those (x-ref comment #3).

Depends on: 1245578, 1624983, 1624822
Flags: needinfo?(mathieu)
See Also: 1624822, 1624983
Priority: -- → P3

(In reply to :Gijs (he/him) from comment #3)

  1. network should fail requests that happen after profile-before-change if a profile-after-change is ever fired, without initializing the cookie service, and/or the cookie service should fail to initialize after profile-before-change fires. The requests already produce errors (NetworkError when attempting to fetch resource.) but I'm not sure what that actually means. The wrinkle here is there's no way to tell that profile-before-change has fired. Valentin / Marco, thoughts? Anyone know if we rely on these network requests still completing? For sure various other thing (like the http cache) are also no longer available at this point, so any requests are likely to be broken in "interesting" ways anyway.

As far as I can tell, we do fail connections that happen after "profile-before-change":
See profile-change-net-teardown which is sent before profile-before-change

Not sure about the cookie service bits.

Flags: needinfo?(valentin.gosu)

Because this bug's Severity has not been changed from the default since it was filed, and it's Priority is P3 (Backlog,) indicating it has been triaged, the bug's Severity is being updated to S3 (normal.)

Severity: normal → S3

(In reply to Valentin Gosu [:valentin] (he/him) from comment #15)

As far as I can tell, we do fail connections that happen after "profile-before-change":
See profile-change-net-teardown which is sent before profile-before-change

Not sure about the cookie service bits.

Is it possible we start a connection before profile-change-net-teardown, but end up requesting cookies afterwards, because other code (looks like AsyncOpen does some async things?) may make it take longer before we continue with the request ?

Flags: needinfo?(valentin.gosu)

(In reply to :Gijs (he/him) from comment #20)

(In reply to Valentin Gosu [:valentin] (he/him) from comment #15)
Is it possible we start a connection before profile-change-net-teardown, but end up requesting cookies afterwards, because other code (looks like AsyncOpen does some async things?) may make it take longer before we continue with the request ?

Yes, I think that's possible.
The cookies are set synchronously by HttpBaseChannel::AddCookiesToRequest - however, we might call it after AsyncOpen, for example from DoAuthRetry

Flags: needinfo?(valentin.gosu)
See Also: → 1660115

Seems this has stopped, but we don't really know why? Mathieu, do you know if something's changed around when this stopped happening?

Flags: needinfo?(mathieu)

Honestly, I have no idea. No major changes were made in the last 2 months 🤷

Flags: needinfo?(mathieu)
Component: Safe Browsing → Storage: StorageManager
Product: Toolkit → Core
Summary: Intermittent [xpcshell] toolkit/components/url-classifier/tests/unit/<test_name> | application crashed [@ mozilla::storage::Service::Observe(nsISupports*, char const*, char16_t const*)] → Intermittent [xpcshell] <test_name> | application crashed [@ mozilla::storage::Service::Observe(nsISupports*, char const*, char16_t const*)]

This seems to be the same underlying issue as bug 1707963.

Component: Storage: StorageManager → Permission Manager
Flags: needinfo?(dothayer) → needinfo?(tihuang)
See Also: → 1707963
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → DUPLICATE
Flags: needinfo?(tihuang)
You need to log in before you can comment on or make changes to this bug.