Closed
Bug 1422815
Opened 7 years ago
Closed 5 years ago
browser crash when viewing https://bottender.js.org/docs/GettingStarted
Categories
(Core :: Storage: Quota Manager, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla68
People
(Reporter: tw0517tw, Assigned: janv)
References
Details
(Keywords: crash, regression, Whiteboard: DWS_NEXT)
Crash Data
Attachments
(1 file, 1 obsolete file)
1.32 KB,
patch
|
tt
:
review+
gchang
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0 Build ID: 20171130160223 Steps to reproduce: with Firefox version 58.0b8, go to https://bottender.js.org/docs/GettingStarted Actual results: browser crashes and ask for sending crash report. https://crash-stats.mozilla.com/report/index/cad08ee2-38a4-4a59-81df-f87140171202#tab-details Expected results: successfully load the website
Comment 1•7 years ago
|
||
http://bit.ly/2zMerOQ are the other crashes in 59 and 58, affecting Mac and Windows.
Status: UNCONFIRMED → NEW
Crash Signature: [@ mozilla::dom::cache::BodyMaybeUpdatePaddingSize]
status-firefox58:
--- → affected
status-firefox59:
--- → affected
Component: Untriaged → DOM
Ever confirmed: true
Keywords: crash,
regression
OS: Unspecified → All
Product: Firefox → Core
Hardware: Unspecified → All
Comment 2•7 years ago
|
||
Tom may know what's going on here (looks to be related to padding size).
Flags: needinfo?(ttung)
Priority: -- → P2
Comment 3•7 years ago
|
||
I cannot reproduce the crash on my mac with Nightly 59.0a1, but it seems to be related to the quota object. Skim the reports and I find out all the crashes happen because they failed to get the quota object[1]. It could imply the originInfo had either not been created yet or been deleted for some reasons. I'll dig more this afternoon. [1] https://hg.mozilla.org/releases/mozilla-beta/annotate/27dc691eb4a0/dom/cache/FileUtils.cpp#l300
Assignee: nobody → ttung
Status: NEW → ASSIGNED
Flags: needinfo?(ttung)
Another QuotaObject issue (both mQuotaObject and mJournalObject are nullptr) is Bug 1414737. It could be the simliar issue.
Comment 5•7 years ago
|
||
I still have no idea why we are missing quotaObject during executing cache-put-action. I suspect we fail to get the quotaObject because the originInfo is deleted due to eviction by QuotaManager. In theory, QuotaManager tries to evict the inactive origins when the increasing usage makes the overall usage greater than the overall limit [1]. Specifically, QuotaManager distinguishes whether the origin is active or not by checking if the origin is holding the directory lock[2]. I guess somehow users bump into this situation when either cache manager didn't hold a directory lock or there is something wrong while checking whether the origin is holding the directory lock. [1] https://searchfox.org/mozilla-central/source/dom/quota/ActorsParent.cpp#3097-3113 [2] https://searchfox.org/mozilla-central/source/dom/quota/ActorsParent.cpp#3464-3489 I'll keep investigating in the direction of my guess. Maybe start with a test, which makes the eviction frequently happen to hit the assertion, and run it by RR chaos mode. Until I find the root cause, I think we need a runtime check for the quotaObject to avoid from accessing the nullptr accidentally [3]. [3] https://searchfox.org/mozilla-central/source/dom/cache/FileUtils.cpp#307
Comment 6•7 years ago
|
||
Ben, this patch is preventing us from accessing quotaObject while failing to get it until I find the root cause. Could you help me to review? Thanks! Furthermore, I'm thinking to uplift this patch to beta. Does it sound reasonable?
Attachment #8934780 -
Flags: review?(bkelly)
Comment 7•7 years ago
|
||
Hi tw0517tw, It would be really nice if I could get more information from you. So that I can verify whether my guess at comment #5 is correct or not. Would you mind testing whether the Firefox is still crashing on that website after restarting it? Thanks!
Flags: needinfo?(tw0517tw)
Updated•7 years ago
|
Attachment #8934780 -
Flags: review?(bkelly) → review+
Comment 8•7 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=65f79d13d95c4e901447b0b2fec7ad4868dd4d11
Comment 9•7 years ago
|
||
(In reply to Tom Tung [:tt] from comment #6) Note that this patch won't fix the issue, it just avoids the release users from accessing the nullptr since MOZ_DIAGNOSTIC_ASSERT is only valid on Nightly and Beta.
Reporter | ||
Comment 10•7 years ago
|
||
Still crashes after restarting Firefox (Dev Edition 58.0b9). Getting this crash report: https://crash-stats.mozilla.com/report/index/6cea3765-1979-4f9c-8613-764310171206 But I found that the website can be loaded in Private Browsing mode.
Flags: needinfo?(tw0517tw)
Comment 11•7 years ago
|
||
(In reply to tw0517tw from comment #10) Thanks for the reply! > Still crashes after restarting Firefox (Dev Edition 58.0b9). Getting this > crash report: > https://crash-stats.mozilla.com/report/index/6cea3765-1979-4f9c-8613- > 764310171206 Okay, then my guess might be wrong. If you restart the Firefox, it should create an originInfo for each origin during initializing origins in QuotaManager. > But I found that the website can be loaded in Private Browsing mode. It's probably because the service worker won't work in Private Browsing mode. That website utilizes service worker to store opaque response via cache API. Thus, if the service worker doesn't be launched, you won't bump into the same condition.
Comment 12•7 years ago
|
||
Getting a copy of the profile, or just its storage directories, might be the best way to investigate. If you are willing to share them privately with us.
I tried to load that storage folder on Linux, it doesn't crash but always crash on my windows.
Comment 14•7 years ago
|
||
Thanks to tw0517tw who shared their crash profile with us privately. I get the reason why we bump into this situation. The group string get from QuotaManger is "js.org" but the group string from (dom) cache is "bottender.js.org". And, that why we fail to get the quotaObject because it cannot get the originInfo with the group's string is "bottender.js.org".
https://publicsuffix.org/list/effective_tld_names.dat See js.org. It's actually in the public suffix list. I guess we have mis-matched logic code. So we took js.org as eTLD+1.
Comment 16•7 years ago
|
||
(In reply to Tom Tung [:tt] from comment #14) Note that the group's string got from QuotaManager is read from ".metadata/.metadata-v2" file while (DOM) cache gets the group's string from the principal. Still not sure about why we write the different group's string into ".metadata/.metadata-v2".
(In reply to Shawn Huang [:shawnjohnjr] from comment #13) > I tried to load that storage folder on Linux, it doesn't crash but always > crash on my windows. Correction: I can hit the same problem on Linux.
Comment 18•7 years ago
|
||
(In reply to Tom Tung [:tt] from comment #16) > Still not sure about why we write the different group's string into > ".metadata/.metadata-v2". (In reply to Shawn Huang [:shawnjohnjr] from comment #15) > https://publicsuffix.org/list/effective_tld_names.dat > See js.org. It's actually in the public suffix list. > I guess we have mis-matched logic code. So we took js.org as eTLD+1. This might be able to explain that. Before we updated the list, we had used to treat "js.org" as a group (since "org" were eTLD) and store it into ".metadata/.metadata-v2" in the QuotaManager. After that, we started to treat "bottender.js.org" as a group (since updating the "js.org" to the list). Then, while Firefox is initializing origins, QuotaManager creates the originInfo/groupInfo/groupInfoPairs based on the record in the ".metadata/.metadata-v2". It makes the website fail to get the quota object from originInfo/groupInfo/groupInfoPairs. This scenario could happen on manifold websites.
Note: js.org is added into Aug 17. 2016.
Comment 20•7 years ago
|
||
We discuss this at today's storage meeting with Jan. I'll investigate more and consider the following options: 1. Recover group from origin -> but need to jump into the main thread - Maybe we could have a runtime recover mechanism while we are in the initializing step and we are in the main thread. 2. Only update the metadata file which their group has been changed. - Investigate what service we have for eTLD + 1 service - Check notification time via OS's functions. 3. Just delete all the metadata file -> but may lose the persisted state - Add a new table for updating eTLD list in storage.sqlite
Comment 21•7 years ago
|
||
If I understood Tom correctly, this isn't a new regression since 58 but the defect has been in our code earlier, so I modified the firefox57 status to reflect the situation properly. Please feel free to correct me if I am wrong.
status-firefox57:
--- → wontfix
Comment 22•7 years ago
|
||
Update the commit message. I'm planning to land and uplift it in this week.
Attachment #8934780 -
Attachment is obsolete: true
Attachment #8936443 -
Flags: review+
Comment 23•6 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/dfa71cd9294d9d27b8a416e6fa7aef222e3b73d4 Bug 1422815 - P1: Fix for failing to get quotaObject unexpectedly. r=bkelly
Updated•6 years ago
|
Keywords: leave-open
Comment 24•6 years ago
|
||
Comment on attachment 8936443 [details] [diff] [review] [Final] Bug 1422815 - P1: Fix for failing to get quotaObject unexpectedly. r=bkelly Land the fix for avoiding accessing nullptr first
Attachment #8936443 -
Attachment description: Bug 1422815 - P1: Fix for failing to get quotaObject unexpectedly. r=bkelly → [Final] Bug 1422815 - P1: Fix for failing to get quotaObject unexpectedly. r=bkelly
Comment 25•6 years ago
|
||
Comment on attachment 8936443 [details] [diff] [review] [Final] Bug 1422815 - P1: Fix for failing to get quotaObject unexpectedly. r=bkelly This patch is mainly to avoid the Release from accessing a null pointer. And, I can locally reproduce this in Release 57 (crash) by having the profile provided by the reporter. Approval Request Comment [Feature/Bug causing the regression]: Well, I'm not sure which specific bug causes this. There are two reasons which make the browser crash. First of all, we should be able to get a quotaObject after initializing the origin. However, in this case, we're not able to get it due to group's string mismatching. The other is that Bug 1290481 add a MOZ_DIAGNOSTIC_ASSERT for expecting quotaObject exists. Thus, the assertion is hit and crash the browser. However, the assert won't exist on Release and it makes Release may be able to access a null pointer in this condiction. Therefore, this patch is trying to avoid touch null pointer on Release 58+. [User impact if declined]: Crash while browsing a website which used indexedDB/Cache API before Firefox 57 and then visits it again after Firefox 57 while the website is storing opaque cache response. [Is this code covered by automated tests?]: No, but it just an early return. [Has the fix been verified in Nightly?]: It won't work on Nightly and Beta since it'll hit the assert before the if-statement. But, it should work on the Release since the if-statement prevent us from accessing the null pointer. [Needs manual test from QE? If yes, steps to reproduce]: No. [List of other uplifts needed for the feature/fix]: Only this one for now. [Is the change risky?]: I don't think so. [Why is the change risky/not risky?]: It just an early return. I verified it (not crash again) on my Mac by hiding the assert and applying the patch. [String changes made/needed]: No.
Attachment #8936443 -
Flags: approval-mozilla-beta?
Comment 26•6 years ago
|
||
I put the note I got after discussing with Jan, Andrew, and Shawn below. Please correct me if I put something wrong. I might need to: - Have a minor upgrade on QuotaManager, to add a column of "hash for etld.dat" on storage.sqllite and maybe update the group string for the first time. - Each time when the QM is initialing if hash for etld.dat changes, check whether QM needs to update the group string in the metadata file. Others: - Have a test to test downgrade. - Need to worry about failure during the upgrade (e.g. have .tmp file then finalize the file). Something might be useful: - https://searchfox.org/mozilla-central/source/netwerk/dns/prepare_tlds.py
Comment 27•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/dfa71cd9294d
Comment 28•6 years ago
|
||
Comment on attachment 8936443 [details] [diff] [review] [Final] Bug 1422815 - P1: Fix for failing to get quotaObject unexpectedly. r=bkelly Avoid a crash. Beta58+.
Attachment #8936443 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 29•6 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/3d30fcbdd407
Comment 30•6 years ago
|
||
The issue in this bug is the group string from eTLD service is updated, but the one in the directory file isn't. We should have a mechanism in QuotaManager to update the directory for the group. However, the problem is that it might take tons of time to initialize and check whether the directory for the group is up-to-date. Also, the current optimization for eTLD service is lived in the main thread. I guess we should consider if we try to get this service from other thread (e.g. IO thread) is appropriate or not. Besides, we need a minor upgrade in QuotaManager to update and check all the directory for the group is okay or not. I planned to utilize this chance to search the whole directories and remove the bad chrome origins which are created after QM's schema 2_1. They shouldn't break the initialization for QM, but we should remove them in some place in the future.
Assignee: ttung → nobody
Status: ASSIGNED → NEW
Updated•6 years ago
|
status-firefox60:
--- → affected
status-firefox61:
--- → ?
Comment 31•6 years ago
|
||
take a look in a few weeks
Updated•6 years ago
|
status-firefox60:
--- → fix-optional
status-firefox61:
--- → affected
status-firefox-esr52:
--- → unaffected
Updated•6 years ago
|
Assignee | ||
Comment 32•6 years ago
|
||
Tom is back, I think he should take this bug again. Basically, comment 26 is still valid, but I think we could also try to get rid of .metadata files first and store metadata info in storage.sqlite. Maintaining own file is quite expensive and we will need to store and update more and more metadata in future I guess. Andrew, what do you think ?
Flags: needinfo?(shes050117)
Flags: needinfo?(jvarga)
Flags: needinfo?(bugmail)
Updated•6 years ago
|
Assignee: nobody → shes050117
Flags: needinfo?(shes050117)
Comment 33•6 years ago
|
||
Agreed. I think we want to get to a future where on startup we check for an indication that QM had a clean shutdown. If QM had a clean shutdown, all of the info we need can come from storage.sqlite without doing an initialization sweep of PROFILE/storage. If there's indication of a dirty shutdown, we would then do the sweep. And a good first step, as you propose, would be to move the contents of the metadata files into storage.sqlite. We could maintain the existing initialization sweep, but be rid of the .metadata files. I might propose we might want to add a telemetry histogram probe that tracks the duration of the initialization sweep and land that prior to this bug. That way we can see the speed-up from the .metadata file removals. We'd also have it include whether initialization succeeded as a boolean.
Flags: needinfo?(bugmail)
Comment 34•6 years ago
|
||
(In reply to Andrew Sutherland [:asuth] from comment #33) > Agreed. I think we want to get to a future where on startup we check for an > indication that QM had a clean shutdown. If QM had a clean shutdown, all of > the info we need can come from storage.sqlite without doing an > initialization sweep of PROFILE/storage. If there's indication of a dirty > shutdown, we would then do the sweep. I'm not sure if I understand that correctly. What if files are accidentally modified (maybe be added or deleted) between a clean shutdown and the next open. For instance, an anti-virus software deletes a file under PROFILE/storage while Firefox isn't opening, though It won't happen frequently. Although it's a clean shutdown, there are a few corner cases that make files be changed. If we want to do that, we may need to consider how to make sure the record in the storage.sqlite is always correct. > And a good first step, as you propose, would be to move the contents of the > metadata files into storage.sqlite. We could maintain the existing > initialization sweep, but be rid of the .metadata files. > > I might propose we might want to add a telemetry histogram probe that tracks > the duration of the initialization sweep and land that prior to this bug. > That way we can see the speed-up from the .metadata file removals. We'd > also have it include whether initialization succeeded as a boolean. That sounds like a good idea to me, so I file the bug 1481716.
Assignee | ||
Comment 35•6 years ago
|
||
(In reply to Tom Tung [:tt] from comment #34) > (In reply to Andrew Sutherland [:asuth] from comment #33) > > Agreed. I think we want to get to a future where on startup we check for an > > indication that QM had a clean shutdown. If QM had a clean shutdown, all of > > the info we need can come from storage.sqlite without doing an > > initialization sweep of PROFILE/storage. If there's indication of a dirty > > shutdown, we would then do the sweep. > > I'm not sure if I understand that correctly. What if files are accidentally > modified (maybe be added or deleted) between a clean shutdown and the next > open. For instance, an anti-virus software deletes a file under > PROFILE/storage while Firefox isn't opening, though It won't happen > frequently. Although it's a clean shutdown, there are a few corner cases > that make files be changed. If we want to do that, we may need to consider > how to make sure the record in the storage.sqlite is always correct. That's why we haven't done this yet, it's not trivial task, all the corner cases have to be covered or mitigated at least. There will be more and more origin directories, especially once we land new LocalStorage implementation. So the performance optimization (which avoids profile/storage sweep when possible) is inevitable.
Comment 36•6 years ago
|
||
(In reply to Tom Tung [:tt] from comment #34) > I'm not sure if I understand that correctly. What if files are accidentally > modified (maybe be added or deleted) between a clean shutdown and the next > open. For instance, an anti-virus software deletes a file under > PROFILE/storage while Firefox isn't opening, though It won't happen > frequently. Although it's a clean shutdown, there are a few corner cases > that make files be changed. If we want to do that, we may need to consider > how to make sure the record in the storage.sqlite is always correct. I think it's important to keep in mind that the main goal of Quota Manager is to make sure an origin doesn't use more space than we allow it to. If an anti-virus software quarantines or outright deletes a file under PROFILE/storage, QM over-estimating the quota in use by the origin isn't that big a problem as long as we have some mechanism for correcting our numbers, such as re-running the tally when the origin is opened. And/or doing an entire counting sweep that doesn't block QM startup as part of a maintenance process like how we try and vacuum IDB databases monthly. If something is adding files under PROFILE/storage, then I'd expect them to either be: - file-manager artifacts like ".desktop" that we've been seeing, that are user intent and shouldn't be tracked for QM purposes - The anti-virus putting a file it stole back. If it puts it back before we notice, it's not a problem. If it puts it back after, well, I would suggest that if we implement what I propose, we'd probably actually want to spend a release or two where we use the database and still do an initialization counting at startup to make sure they agree and report back telemetry that indicates if in fact files are disappearing or re-appearing a lot, etc. - Anything else I think amounts to malware trying to hide files in our profile. I'm not sure there's anything we can reasonably do about this other than report to the user that their computer seems haunted. One thing that may also be worth looking into is whether we're working with file-systems that can efficiently tell us if a directory sub-tree has changed. For example, on linux, btrfs may be able to efficiently let us know whether PROFILE/STORAGE has changed at all. On something like Windows/NTFS, however, I think we might be limited to checking directory timestamps to know if their immediate contents have changed (but not the contents of their sub-directories). All that said, I think these are our 2 big goals from a performance perspective: 1) For a given origin, especially a ServiceWorker, we want to reduce the latency of initial DOM Cache opening and IDB openings. This argues against doing *blocking* origin counting when we first go to display a site for the origin. I think however we could get away with counting the origin lazily after we've already opened it. 2) We want browser startup's time to visible about:newtab (or restored site) being as short as possible. This argues against the initialization counting sweep that we currently do.
Comment 37•6 years ago
|
||
Thank Jan and Andrew for explaining! (In reply to Jan Varga [:janv] from comment #35) > That's why we haven't done this yet, it's not trivial task, all the corner > cases > have to be covered or mitigated at least. > There will be more and more origin directories, especially once we land new > LocalStorage implementation. So the performance optimization (which avoids > profile/storage sweep when possible) is inevitable. I see! (In reply to Andrew Sutherland [:asuth] from comment #36) > I think it's important to keep in mind that the main goal of Quota Manager > is to make sure an origin doesn't use more space than we allow it to. If an It's a rather good point! I'll keep that in mind > anti-virus software quarantines or outright deletes a file under > PROFILE/storage, QM over-estimating the quota in use by the origin isn't > that big a problem as long as we have some mechanism for correcting our > numbers, such as re-running the tally when the origin is opened. And/or > doing an entire counting sweep that doesn't block QM startup as part of a > maintenance process like how we try and vacuum IDB databases monthly. > > If something is adding files under PROFILE/storage, then I'd expect them to > either be: > - file-manager artifacts like ".desktop" that we've been seeing, that are > user intent and shouldn't be tracked for QM purposes > - The anti-virus putting a file it stole back. If it puts it back before we > notice, it's not a problem. If it puts it back after, well, I would suggest > that if we implement what I propose, we'd probably actually want to spend a > release or two where we use the database and still do an initialization > counting at startup to make sure they agree and report back telemetry that > indicates if in fact files are disappearing or re-appearing a lot, etc. > - Anything else I think amounts to malware trying to hide files in our > profile. I'm not sure there's anything we can reasonably do about this > other than report to the user that their computer seems haunted. > > One thing that may also be worth looking into is whether we're working with > file-systems that can efficiently tell us if a directory sub-tree has > changed. For example, on linux, btrfs may be able to efficiently let us > know whether PROFILE/STORAGE has changed at all. On something like > Windows/NTFS, however, I think we might be limited to checking directory > timestamps to know if their immediate contents have changed (but not the > contents of their sub-directories). > > All that said, I think these are our 2 big goals from a performance > perspective: > 1) For a given origin, especially a ServiceWorker, we want to reduce the > latency of initial DOM Cache opening and IDB openings. This argues against > doing *blocking* origin counting when we first go to display a site for the > origin. I think however we could get away with counting the origin lazily > after we've already opened it. > 2) We want browser startup's time to visible about:newtab (or restored site) > being as short as possible. This argues against the initialization counting > sweep that we currently do.
Updated•6 years ago
|
Assignee: shes050117 → nobody
Whiteboard: DWS_NEXT
Comment 38•5 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:hsinyi, maybe it's time to close this bug?
Flags: needinfo?(htsai)
Comment 39•5 years ago
|
||
This issue is still on QuotaManager. We just have other issues with higher priority to fix. I put the leave-open keyword because I only land a quick-fix rather than an actual fix.
I guess it's better to keep it open? Or, I can open another bug to track this issue if it's more appropriate.
Comment 40•5 years ago
|
||
Thanks Tom for the comment. I'm moving to QuotaManager component.
Component: DOM → DOM: Quota Manager
Flags: needinfo?(htsai)
Assignee | ||
Updated•5 years ago
|
Crash Signature: [@ mozilla::dom::cache::BodyMaybeUpdatePaddingSize] → [@ mozilla::dom::cache::BodyMaybeUpdatePaddingSize]
[@ mozilla::dom::quota::QuotaObject::LockedMaybeUpdateSize]
Assignee | ||
Updated•5 years ago
|
Crash Signature: [@ mozilla::dom::cache::BodyMaybeUpdatePaddingSize]
[@ mozilla::dom::quota::QuotaObject::LockedMaybeUpdateSize] → [@ mozilla::dom::cache::BodyMaybeUpdatePaddingSize]
Assignee | ||
Comment 42•5 years ago
|
||
Fixed by bug 1535995.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Updated•5 years ago
|
Assignee: nobody → shes050117
Updated•5 years ago
|
Keywords: leave-open
Updated•5 years ago
|
Assignee: shes050117 → jvarga
status-firefox66:
--- → wontfix
status-firefox67:
--- → affected
status-firefox68:
--- → fixed
Target Milestone: --- → mozilla68
Using bug 1535995 to update status-firefox67 to fixed
You need to log in
before you can comment on or make changes to this bug.
Description
•