Closed Bug 1644780 Opened 2 years ago Closed 1 year ago

Crash in [@ mozilla::dom::cache::Connection::Close]

Categories

(Core :: Storage: Cache API, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla80
Tracking Status
firefox-esr68 --- wontfix
firefox-esr78 --- wontfix
firefox78 --- wontfix
firefox79 --- wontfix
firefox80 --- fixed

People

(Reporter: sg, Assigned: sg)

References

(Blocks 1 open bug)

Details

(Keywords: crash)

Crash Data

Attachments

(3 files)

This bug is for crash report bp-91a52b77-0202-47cf-8cdb-f6fbf0200610.

Top 10 frames of crashing thread:

0 xul.dll mozilla::dom::cache::Connection::Close dom/cache/Connection.cpp:42
1 xul.dll mozilla::dom::cache::Connection::~Connection dom/cache/Connection.cpp:28
2 xul.dll mozilla::dom::cache::Connection::Release dom/cache/Connection.cpp:18
3 xul.dll mozilla::dom::cache::Context::Data::Release dom/cache/Context.cpp:82
4 xul.dll mozilla::dom::cache::Context::ActionRunnable::Run dom/cache/Context.cpp:624
5 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1236
6 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:501
7 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:332
8 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:308
9 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:290

This is happening with low frequency since Nightly build 20200409095500. The crash reason is MOZ_DIAGNOSTIC_ASSERT(false) (NS_SUCCEEDED(db::IncrementalVacuum(this))).

OS: Windows 10 → All
Hardware: Unspecified → All

Since Nightly build 20200409095500 is the build where Bug 1619965 landed, it might be that this happens in the situations that previously crashed with the signature from Bug 1619305. Not sure what happens in non-diagnostic builds.

Keywords: sec-other
See Also: → 1619965, 1619305
Group: core-security
Group: core-security → dom-core-security

Asserting that a db operation always succeeds looks too strong to me. It's also not critical if vacuuming fails, but Andrew or Tom should look at this too.

But it would be interesting to know if there's still a transaction open or it fails for some other reason.

Would adding a crash annotation with the actual nsresult value help to get insight into this?

Yes, that might help.

Keywords: leave-open
Assignee: nobody → sgiesecke
Status: NEW → ASSIGNED

I provided a patch. We need data review for that before landing.

Comment on attachment 9155879 [details]
Bug 1644780 - Include nsresult value in crash report when db::IncrementalVacuum fails. r=janv

What questions will you answer with this data?

Gain insight in the reasons for failure to perform an IncrementalVacuum.

Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements? Some example responses:

Previously, it was assumed the call never fails using MOZ_ALWAYS_SUCCEEDS. Until recently, this was probably masked by another defect in indexedDB code.

What alternative methods did you consider to answer these questions? Why were they not sufficient?

None.

Can current instrumentation answer these questions?

No.

List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories found on the Mozilla wiki.

Note that the data steward reviewing your request will characterize your data collection based on the highest (and most sensitive) category.

Measurement Description | Data Collection Category | Tracking Bug #
nsresult return value of db::IncrementalVacuum. | Category 1 | Bug 1644780

How long will this data be collected? Choose one of the following:

  • I want this data to be collected for 6 months initially (potentially renewable).

What populations will you measure?

Nightly and beta release channels (those where a MOZ_DIAGNOSTIC_ASSERT would be active)

If this data collection is default on, what is the opt-out mechanism for users?

None.

Please provide a general description of how you will analyze this data.

Determine which nsresult value(s) cause the failure. Depending on the result, further data collection might become necessary.

Where do you intend to share the results of your analysis?

This will probably result in a comment and/or new bug on Bugzilla.

Is there a third-party tool (i.e. not Telemetry) that you are proposing to use for this data collection? If so: Are you using that on the Mozilla backend? Or going directly to the third-party?

Not a third party tool, but not telemetry either, but a crash annotation.

Attachment #9155879 - Flags: data-review?(chutten)

Comment on attachment 9155879 [details]
Bug 1644780 - Include nsresult value in crash report when db::IncrementalVacuum fails. r=janv

Data reviews must be completed in a place accessible by the people subject to the collection. If this bug needs to remain private (given it's a secbug, I'm guessing that's likely), please file the data review request in its own separate bug. (and attach it as a plaintext attachment to the bug so it's easier for us Stewards to find)

Attachment #9155879 - Flags: data-review?(chutten)
Severity: -- → S3
Priority: -- → P1
Group: dom-core-security
Keywords: sec-other

Comment on attachment 9155879 [details]
Bug 1644780 - Include nsresult value in crash report when db::IncrementalVacuum fails. r=janv

The bug is now public. Sorry for the troubles.

Attachment #9155879 - Flags: data-review?(chutten)

Comment on attachment 9155879 [details]
Bug 1644780 - Include nsresult value in crash report when db::IncrementalVacuum fails. r=janv

As I mentioned earlier, data reviews should be attached to bugs. I'll take care of that in a moment.

Attachment #9155879 - Flags: data-review?(chutten)
Attached file data collection review
Attachment #9158563 - Flags: data-review?(chutten)
Comment on attachment 9158563 [details]
data collection review

PRELIMINARY NOTES:

Hm, the documentation on the crash reason annotation is kinda slim. We should look into improving that so that the inquisitive can more easily understand what sorts of things might be collected this way. Also, we should generate human-readable documentation from the CrashAnnotations yaml, maybe by exposing it through Probe Dictionary... I [filed it](https://github.com/mozilla/probe-scraper/issues/198).

DATA COLLECTION REVIEW RESPONSE:

    Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes. This collection is a Crash Annotation so is documented in its definitions file [CrashAnnotations.yaml](https://hg.mozilla.org/mozilla-central/raw-file/tip/toolkit/crashreporter/CrashAnnotations.yaml).

    Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is a crash annotation so can be controlled through Firefox's Preferences for crash reporting.

    If the request is for permanent data collection, is there someone who will monitor the data over time?

N / A.

    Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical.

    Is the data collection request for default-on or default-off?

Default on for pre-release channels only.

    Does the instrumentation include the addition of any new identifiers?

No.

    Is the data collection covered by the existing Firefox privacy notice?

Yes.

    Does there need to be a check-in in the future to determine whether to renew the data?

Yes. sgiesecke is responsible for renewing or removing the collection before 2020-12-23 (Christmas-ish).

---
Result: datareview+
Attachment #9158563 - Flags: data-review?(chutten) → data-review+

(In reply to Chris H-C :chutten from comment #11)

Comment on attachment 9155879 [details]
Bug 1644780 - Include nsresult value in crash report when db::IncrementalVacuum fails. r=janv

As I mentioned earlier, data reviews should be attached to bugs. I'll take care of that in a moment.

Sorry, I did not quite understand that part of your earlier comment as it seems. Now it's clear what you meant :) (I am still not sure why there is a data-review flag then on the attachment)

(In reply to Simon Giesecke [:sg] [he/him] from comment #14)

(In reply to Chris H-C :chutten from comment #11)

Comment on attachment 9155879 [details]
Bug 1644780 - Include nsresult value in crash report when db::IncrementalVacuum fails. r=janv

As I mentioned earlier, data reviews should be attached to bugs. I'll take care of that in a moment.

Sorry, I did not quite understand that part of your earlier comment as it seems. Now it's clear what you meant :) (I am still not sure why there is a data-review flag then on the attachment)

Oh, as why there's an option for a data-review on the phab "attachment"? When :emceeaich added the flag for us in the first place I don't know that there was an option to limit the flag to only certain mimetypes. I wonder if that's a possibility...

Pushed by sgiesecke@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c7f363de938a
Include nsresult value in crash report when db::IncrementalVacuum fails. r=janv

We have one report (https://crash-stats.mozilla.org/report/index/54855192-162d-4f47-bd71-316b60200705) now with a nsresult value of 0x8052000b, which is NS_ERROR_FILE_CORRUPTED. Jan, does this make this more actionable?

Flags: needinfo?(jvarga)

Interesting, the error is currently only handled here:
https://searchfox.org/mozilla-central/rev/c86c19bd64f8f19590a4190c282781d3a9631422/dom/cache/DBAction.cpp#230

But we already found out in LSNG that it's not sufficient to check it only after opening the database.
We need to solve this in general somehow, but in this concrete case we should just throw away entire cache ?

Flags: needinfo?(jvarga) → needinfo?(bugmail)

(In reply to Jan Varga [:janv] from comment #19)

We need to solve this in general somehow, but in this concrete case we should just throw away entire cache ?

I think the general plan is:

  1. Enhance mozStorage in bug 1125157 to provide corruption notifications.
  2. On corruption, wipe the storage bucket, which currently means the entire origin. (Currently this is specified by https://storage.spec.whatwg.org/#storage-endpoints to include LocalStorage but we can do a carve-out.) I think we've discussed having the DirectoryLock class expose a method for this which could then notify the nsIClearDataService.

Doing a localized removal of the cache (database) without doing things at the bucket/origin level is probably an okay stop-gap since the ServiceWorkerManager/ScriptLoader will notice the storage has gone away the next time it goes to try and wake up the ServiceWorker and will remove the registration.

Flags: needinfo?(bugmail)

In our team meeting we just discussed immediate removal of the assertion with a follow-up for doing the right thing.

Pushed by sgiesecke@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fc7530097680
Removed diagnostic assertion when db::IncrementalVacuum fails. r=dom-workers-and-storage-reviewers,janv

Andrew, the bug as such is now fixed, and I am not sure what exactly to file as a follow-up based on your comment https://bugzilla.mozilla.org/show_bug.cgi?id=1644780#c20. For the first bullet point, we already have a bug on file.

Blocks: 1125157
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Flags: needinfo?(bugmail)
Keywords: leave-open
Resolution: --- → FIXED

I filed bug 1652461 for point 2 of comment 20.

Flags: needinfo?(bugmail)
See Also: → 1652461
Target Milestone: --- → mozilla80
You need to log in before you can comment on or make changes to this bug.