Crash in mozilla::dom::indexedDB::FileInfo::GetFileForFileInfo

RESOLVED FIXED in Firefox 65

Status

()

defect
P2
critical
RESOLVED FIXED
Last year
4 months ago

People

(Reporter: calixte, Assigned: tt)

Tracking

(Blocks 2 bugs, {crash, regression})

59 Branch
mozilla66
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox-esr60 wontfix, firefox57 unaffected, firefox58 unaffected, firefox59 wontfix, firefox60 wontfix, firefox61 wontfix, firefox62 wontfix, firefox63 wontfix, firefox64 wontfix, firefox65 fixed, firefox66 fixed)

Details

(Whiteboard: [clouseau]DWS_NEXT, crash signature)

Attachments

(2 attachments, 1 obsolete attachment)

This bug was filed from the Socorro interface and is
report bp-a25f8c6b-f5fe-4cf1-ac16-bec5f0180121.
=============================================================

Top 10 frames of crashing thread:

0 xul.dll mozilla::dom::indexedDB::FileInfo::GetFileForFileInfo dom/indexedDB/FileInfo.cpp:263
1 xul.dll mozilla::dom::indexedDB::`anonymous namespace'::DatabaseOperationBase::GetStructuredCloneReadInfoFromExternalBlob dom/indexedDB/ActorsParent.cpp:19756
2 xul.dll mozilla::dom::indexedDB::`anonymous namespace'::DatabaseOperationBase::GetStructuredCloneReadInfoFromSource<mozIStorageStatement> dom/indexedDB/ActorsParent.cpp:19636
3 xul.dll mozilla::dom::indexedDB::`anonymous namespace'::ObjectStoreGetRequestOp::DoDatabaseWork dom/indexedDB/ActorsParent.cpp:26612
4 xul.dll mozilla::dom::indexedDB::`anonymous namespace'::TransactionDatabaseOperationBase::RunOnConnectionThread dom/indexedDB/ActorsParent.cpp:23550
5 xul.dll mozilla::dom::indexedDB::`anonymous namespace'::TransactionDatabaseOperationBase::Run dom/indexedDB/ActorsParent.cpp:23721
6 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1040
7 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:517
8 xul.dll mozilla::SpinEventLoopUntil<1, <lambda_f55e483fda022ed1e4b129d57ce45f43> > xpcom/threads/nsThreadUtils.h:323
9 xul.dll mozilla::dom::indexedDB::`anonymous namespace'::ConnectionPool::ThreadRunnable::Run dom/indexedDB/ActorsParent.cpp:13207

=============================================================

There is 1 crash in nightly 59 with buildid 20180121100439. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1422335.

[1] https://hg.mozilla.org/mozilla-central/rev/265ec11162b8
Flags: needinfo?(amarchesini)
Assignee: nobody → amarchesini
Priority: -- → P2
It seems that FileInfo::Manager() returns a nullptr, but that is hard to believe.
asuth, do you have any idea here? It seems that this is the only crash report we have.
Flags: needinfo?(amarchesini) → needinfo?(bugmail)
Andrew, want to take a look?
Fix optional for 59 and 60 affected per the triage team.
Signature report for mozilla::dom::indexedDB::FileInfo::GetFileForFileInfo
Showing results from a month ago

771 Results

edea063c-0253-4036-b8da-89efe0180402 	2018-04-02 15:16:25 	Firefox 	59.0.2 	20180323154952 	Windows NT 	EXCEPTION_ACCESS_VIOLATION_READ 	0x10 	2018-03-30 11:30:18
ba8dcf7e-f26e-4431-ad77-857270180402 	2018-04-02 14:39:56 	Firefox 	59.0.2 	20180323154952 	Mac OS X 	EXC_BAD_ACCESS / KERN_INVALID_ADDRESS 	0x20 	2018-03-27 19:47:01
c0f0af4a-0eec-4beb-8102-4cd770180402 	2018-04-02 14:38:12 	Firefox 	59.0.2 	20180323154952 	Mac OS X 	EXC_BAD_ACCESS / KERN_INVALID_ADDRESS 	0x20 	2018-03-27 19:47:01
07a54d51-5641-4a23-9fd0-601fc0180402 	2018-04-02 14:36:25 	Firefox 	59.0.2 	20180323154952 	Mac OS X 	EXC_BAD_ACCESS / KERN_INVALID_ADDRESS 	0x20 	2018-03-27 19:47:01
65ec97d6-b889-448b-b5be-78cdc0180402 	2018-04-02 14:35:10 	Firefox 	59.0.2 	20180323154952 	Windows NT 	EXCEPTION_ACCESS_VIOLATION_READ 	0x10 	2018-03-30 13:19:50
73c18106-6c42-4f04-8228-cc30f0180402 	2018-04-02 14:29:57 	Firefox 	59.0.2 	20180323154952 	Windows NT 	EXCEPTION_ACCESS_VIOLATION_READ 	0x10 	2018-04-01 11:51:30
c295eb57-2eb3-47ef-9a7d-f77550180402 	2018-04-02 12:13:34 	Firefox 	59.0.2 	20180323154952 	Mac OS X 	EXC_BAD_ACCESS / KERN_INVALID_ADDRESS 	0x20 	2018-03-28 07:09:37
63896da8-4522-4d61-9b21-cfa300180402 	2018-04-02 11:45:45 	Firefox 	59.0.2 	20180323154952 	Mac OS X 	EXC_BAD_ACCESS / KERN_INVALID_ADDRESS 	0x20 	2015-01-01 00:00:59
11a6f9aa-2079-49ea-80be-2373c0180402 	2018-04-02 09:41:12 	Firefox 	59.0.2 	20180323154952 	Windows NT 	EXCEPTION_ACCESS_VIOLATION_READ 	0x10 	2018-03-28 13:20:42
Signature report for mozilla::dom::indexedDB::FileInfo::GetFileForFileInfo
Showing results from a month ago

Windows 7 	457 	59.3%
Windows 10 	149 	19.3%
OS X 10.11 	40 	5.2%
OS X 10.10 	32 	4.2%
Windows 8.1 	30 	3.9%
OS X 10.13 	28 	3.6%
OS X 10.12 	10 	1.3%
OS X 10.9 	10 	1.3%
Windows 8 	6 	0.8%
Linux 	        5 	0.6%
Android 	4 	0.5%

Firefox 	59.0.2 	229 	72.2% 	105
Firefox 	59.0.1 	65 	20.5% 	67
Firefox 	60.0b8 	8 	2.5% 	3
Firefox 	60.0b7 	6 	1.9% 	2
Firefox 	60.0b6 	5 	1.6% 	3
FennecAndroid 	59.0.1 	3 	0.9% 	1
Firefox 	60.0b5 	1 	0.3% 	1

amd64 	489 	63.4%
x86 	278 	36.1%
arm 	4 	0.5%
OS: Windows 10 → All
Hardware: Unspecified → All
Top Crashers for Firefox 59.0.2

64 	0.19% 	new 	mozilla::dom::indexedDB::FileInfo::GetFileForFileInfo	229 	197 	32 	0 	141 	0 	2018-01-21
Reporter

Updated

11 months ago

Comment 7

10 months ago
I'm submitting additional crash reports for this bug.

This morning between 11:30 and 11:52:
1: https://crash-stats.mozilla.com/report/index/bp-4619be25-156b-49b4-b53e-099200180907
2: https://crash-stats.mozilla.com/report/index/bp-04dd986b-3f80-4db8-ad6e-0485b0180907
3: https://crash-stats.mozilla.com/report/index/bp-7092c794-0ca1-4425-bd28-b7e160180907
4: https://crash-stats.mozilla.com/report/index/bp-c87dbbc6-c2e2-4fea-ab1e-9fbc70180907
5: https://crash-stats.mozilla.com/report/index/bp-d3e8ae06-bb54-4a17-8c5e-34bf90180907

Steps to reproduce:
1: --Either--
  A: I open a new tab in my Mozilla container (I'm a container mega-user) and go to drive.google.com
or
  B: I open a new tab, type in "drive.google.com" into the address bar

2: Firefox Developer Edition 63 crashes.

It doesn't seem to matter how many tabs I have open.
It does seem to matter what drive it is: my Outlaw Soaps drive in my Outlaw Soaps container doesn't seem to have this problem
It does seem to matter if it's in a container vs not in a container. Right now, I have a tab with Mozilla's drive in it, but it is uncontained.

This didn't happen with 62 (yesterday). It has only been since I updated to 63.

I am on Slack at dvincent if you want any quick information. I can share my screen if you'd like to see me reproduce it. It's very consistent.

Updated

8 months ago
Whiteboard: [clouseau] → [clouseau]DWS_NEXT
I don't have any news about how to reproduce this crash. Marion, if you want, assign it to somebody else. Thanks!
Flags: needinfo?(mdaly)
Giving to TOM.
Flags: needinfo?(shes050117)
Flags: needinfo?(mdaly)
Flags: needinfo?(bugmail)

Updated

8 months ago
Assignee: amarchesini → shes050117
Assignee

Comment 10

8 months ago
Will start from reproducing it based on the steps in comment 7
Flags: needinfo?(shes050117)
Assignee

Comment 11

8 months ago
(In reply to Tom Tung [:tt] from comment #10)
> Will start from reproducing it based on the steps in comment 7
Unfortunately, I couldn't reproduce that I suspected it's related to the data stored in the database? Also, I couldn't find useful information from the source code.

Thus, I might change the direction to investigate the code stack and the implementation in indexedDB.
Tom, were you able to investigate this crash more? Thanks
Flags: needinfo?(shes050117)
Bug 1507316 has a similar scenario which suggests database corruption and a check we should add as raised at https://bugzilla.mozilla.org/show_bug.cgi?id=1507316#c6
See Also: → 1507316
Assignee

Comment 14

7 months ago
(In reply to Pascal Chevrel:pascalc from comment #12)
> Tom, were you able to investigate this crash more? Thanks

Sure
Flags: needinfo?(shes050117)
Assignee

Comment 15

7 months ago
IIUC, the crash reports shows that |aFileInfo| object somehow is a nullptr (the top of crashing stack is [1]). It's carried by the |StructuredCloneReadInfo| object and it's created in DeserializeStructuredCloneFiles() if |aFileIds| isn't volid in DatabaseOperationBase::GetStructuredCloneReadInfoFromExternalBlob() [2].

Notes to myself:

If the |aFileIds| is not empty in [2], it will run into [3] and it's probably the cases Andrew mentioned in comment 13.

I'm not sure what will happen if |aFileIds.Volid()| [4] is true. Will look into that more.

Also, I'm still suspecting that it's related to unexpected text/string which had been written into the disk so that some users can reproduce it a few times.

[1] https://searchfox.org/mozilla-central/rev/f997bd6bbfc4773e774fdb6cd010142370d186f9/dom/indexedDB/FileInfo.cpp#260
[2] https://searchfox.org/mozilla-central/rev/f997bd6bbfc4773e774fdb6cd010142370d186f9/dom/indexedDB/ActorsParent.cpp#19509-19583
[3] https://searchfox.org/mozilla-central/rev/f997bd6bbfc4773e774fdb6cd010142370d186f9/dom/indexedDB/ActorsParent.cpp#9704-9708
[4] https://searchfox.org/mozilla-central/rev/f997bd6bbfc4773e774fdb6cd010142370d186f9/dom/indexedDB/ActorsParent.cpp#19527
Assignee

Comment 16

7 months ago
(In reply to Tom Tung [:tt, :ttung] from comment #15)
> I'm not sure what will happen if |aFileIds.Volid()| [4] is true. Will look
> into that more.

If it happens, then it should be protected by the following index and Length() check.

I quote Andrew's suggestion in https://bugzilla.mozilla.org/show_bug.cgi?id=1507316#c6 below:

"Probably the right thing to do is to ensure that when we run into this scenario that we ensure we throw errors that propagate all the way back up to the top of the database operation, aborting/popping everything off the stack, and have us mark the database as corrupt.  And then nuke the origin as a result."

It seems to be a good way to do that. I'm thinking if it's possible to just remove the problematic table (the one contains id for FileInfo) in the database in the case like this bug? It might produce a lot of cases which need to be taken care, though.

Also, I'm still not sure how to reproduce this issue or running into cases that having a null FileInfo pointer in the hash table of FileManager. I'll figure that out...
Assignee

Comment 17

7 months ago
I still don't have much progress on hitting the issue.

However, by looking at the comments from crash reports, I suspect the "refcount" in the "file" table might less than zero because some of them said their Firefox kept crashing while visiting the same website.
Assignee

Comment 18

6 months ago
I have patches for returning an internal error and removing the database. And maybe finding a way to recover the fileInfo.

However, if the script of that website assumes some dependency between the databases and other objects, that could make the issue even worse. Probably the better way is like :asuth's suggestion --- nuke the origin directly.

I'm not sure how to "aborting/popping everything off the stack". I'll figure that out.
(In reply to Tom Tung [:tt, :ttung] from comment #18)
> I'm not sure how to "aborting/popping everything off the stack". I'll figure
> that out.

I just meant to convey that we'd want to audit the caller's call sites to verify they're all paying attention to the return value and don't attempt any further processing.  It's how things should work, but as part of any review we'd want to make sure we've done the legwork.
Assignee

Updated

6 months ago
Duplicate of this bug: 1515069
Assignee

Updated

6 months ago
Crash Signature: [@ mozilla::dom::indexedDB::FileInfo::GetFileForFileInfo] → [@ mozilla::dom::indexedDB::FileInfo::GetFileForFileInfo] [@ mozilla::dom::indexedDB::(anonymous namespace)::DatabaseOperationBase::GetStructuredCloneReadInfoFromExternalBlob ]
Assignee

Comment 21

6 months ago
In bug1515069 I found out:
I also check the table 'file' and 'object_data' table in the SQLite database and the files in the corresponding directory. Surprisingly, there are two files named: "56" and "57" in the directory, but "56" is recorded in the "file_ids" in the "object_data" table and "57" is recorded in the "id" field of the "file" table.

Note that there also is a -wal file.

I thought I might have a chance to somehow recover the table while not finding a fileInfo. (Probably somehow remove the file and its record in the database) However, in the bug 1515069, it seems that the issue is stranger than I expected.
Assignee

Comment 22

6 months ago
(In reply to Andrew Sutherland [:asuth] from comment #19)
> I just meant to convey that we'd want to audit the caller's call sites to
> verify they're all paying attention to the return value and don't attempt
> any further processing.  It's how things should work, but as part of any
> review we'd want to make sure we've done the legwork.

I see. Thanks for clarifying!
Assignee

Comment 23

6 months ago
In yesterday meeting, we are probably going to handle this issue in the following steps for the short-term:
1. Returning an error, and having telemetry to see what's happening (Note that the bug 1436188 does not cover the path for the structured clone.)
2. Having marker files on indexedDB and QuotaManager (also, probably a marker file for indicating the corrupt database or some strange error and fix that until the next upgrade)
3. Nuke the origin (Note: maybe not for the cookie)

For the long-term:
I guess we might need to figure out how could this happen
Assignee

Comment 25

6 months ago

Add bugs for step 2.
As for step 1, P1 shouldn't fix the issue, but it can at least stop the crash.

I'm going to test if:

  • Some things fail between insert the id into the table "object_data" and table "file"
  • Some things fail between remove the entry between the table "object_data" and table "file"
    to see whether these conditions might make the metadata in these two tables inconsistent.

If these conditions might create the issues, I believe at least I should add telemetry to see how often do they fail.
Note that even if these two create the issues, it might still not be able to explain why the broken databases have two different ids (one has id 56 and the other has id 57) instead of one of them contains more id than the other (one has id 56 and 57 while the other has none of them).

Depends on: 1504535, 1512750
Assignee

Updated

5 months ago
Blocks: 1519859
Assignee

Comment 26

5 months ago

Andrew, thanks for the review!

I change the warning message and the comment and just want to make sure I don't misunderstand anything. Would you mind taking the final look if it's appropriate? Thanks!

Flags: needinfo?(bugmail)
Assignee

Comment 27

5 months ago
Posted file data-request.txt (obsolete) —
Assignee

Comment 28

5 months ago

Comment on attachment 9036351 [details]
data-request.txt

:chutten, would you mind data-reviewing this? Thanks!

Attachment #9036351 - Flags: review?(chutten)

Comment on attachment 9036351 [details]
data-request.txt

It appears as though some of these answers were copied from the review for bug 1436188 (for instance, Q7 has a note about QuotaManager which appears not to be relevant?).

Attachment #9036351 - Flags: review?(chutten) → review-
Assignee

Comment 30

5 months ago

(In reply to Chris H-C :chutten from comment #29)

It appears as though some of these answers were copied from the review for bug 1436188 (for instance, Q7 has a note about QuotaManager which appears not to be relevant?).

You are right. I used that as a template file. Will correct that later. Thanks and sorry for that!

(In reply to Tom Tung [:tt, :ttung] from comment #26)

I change the warning message and the comment and just want to make sure I don't misunderstand anything. Would you mind taking the final look if it's appropriate? Thanks!

LGTM! (Making sure to address the data-review issues, of course.)

Flags: needinfo?(bugmail)
Assignee

Comment 32

5 months ago
Posted file data-request.txt

:chutten would you mind reviewing this again? Thanks!

Attachment #9036351 - Attachment is obsolete: true
Attachment #9036594 - Flags: review?(chutten)

Comment on attachment 9036594 [details]
data-request.txt

DATA COLLECTION REVIEW RESPONSE:

Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes. This collection is Telemetry so is documented in its definitions file (Scalars.yaml), the Probe Dictionary, and on telemetry.mozilla.org's Measurement Dashboards.

Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.

If the request is for permanent data collection, is there someone who will monitor the data over time?

N/A, expires in Firefox 70.

Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical.

Is the data collection request for default-on or default-off?

Default on for all release channels.

Does the instrumentation include the addition of any new identifiers?

No.

Is the data collection covered by the existing Firefox privacy notice?

Yes.

Does there need to be a check-in in the future to determine whether to renew the data?

Yes. :tt is responsible for removing or renewing this collection before it expires in Firefox 70.


Result: datareview+

Attachment #9036594 - Flags: review?(chutten) → review+
Attachment #9035307 - Attachment description: Bug 1432133 - P1 - Returning an error when FileInfo is missing as a short-term fix; → Bug 1432133 - P1 - Returning an error when FileInfo is missing as a short-term fix; r=asuth, data-review=chutten

Comment 34

5 months ago
Pushed by shes050117@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/08dccff23fbb
P1 - Returning an error when FileInfo is missing as a short-term fix; r=asuth, data-review=chutten
Assignee

Comment 35

5 months ago

Will also provide a patch for uplifting to Beta and Release after a few days.

Comment 36

5 months ago
bugherder
Status: NEW → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66

No need to worry about release at this point, but a Beta patch would be appreciated! Just FYI, we're building the final Beta of this cycle tomorrow.

Flags: needinfo?(shes050117)
Assignee

Comment 38

5 months ago

Comment on attachment 9035307 [details]
Bug 1432133 - P1 - Returning an error when FileInfo is missing as a short-term fix; r=asuth, data-review=chutten

[Beta/Release Uplift Approval Request]

Feature/Bug causing the regression: None

User impact if declined: If users run into the problematic area, then they will crash. However, note that the patch would just help them away from crashing. After applying this patch, they will get an error when running to this.

Is this code covered by automated tests?: No

Has the fix been verified in Nightly?: Yes

Needs manual test from QE?: No

If yes, steps to reproduce:

List of other uplifts needed: None

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): This patch provides an additional check to escape from using a null pointer. Note that this patch only fixes the crash by returning an error, so users won't recover from database corruption. We are working on finding the reason for causing corruption and finding the way to recover on bug 1519859.

String changes made/needed:

Flags: needinfo?(shes050117)
Attachment #9035307 - Flags: approval-mozilla-beta?
Assignee

Comment 39

5 months ago

(In reply to Ryan VanderMeulen [:RyanVM] from comment #37)

No need to worry about release at this point, but a Beta patch would be appreciated! Just FYI, we're building the final Beta of this cycle tomorrow.

Thanks for the information and notice!

Comment on attachment 9035307 [details]
Bug 1432133 - P1 - Returning an error when FileInfo is missing as a short-term fix; r=asuth, data-review=chutten

[Triage Comment]
Error out instead of crashing when encountering a corrupted database. Approved for 65.0b12.

Attachment #9035307 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Assignee

Comment 42

5 months ago

Move the blocking bugs to bug 1519859

No longer depends on: 1504535, 1512750
You need to log in before you can comment on or make changes to this bug.