Closed Bug 1576593 Opened 4 months ago Closed 2 months ago

Crash in [@ mozilla::dom::quota::OriginInfo::LockedDecreaseUsage]

Categories

(Core :: Storage: Quota Manager, defect, P1, critical)

70 Branch
Unspecified
All
defect

Tracking

()

RESOLVED FIXED
mozilla72
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- unaffected
firefox67 --- unaffected
firefox68 --- unaffected
firefox69 --- unaffected
firefox70 --- wontfix
firefox71 --- wontfix
firefox72 --- fixed

People

(Reporter: calixte, Assigned: tt)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(4 files, 2 obsolete files)

This bug is for crash report bp-34a7a90c-243e-4a28-9f98-22f0e0190825.

Top 10 frames of crashing thread:

0 xul.dll mozilla::dom::quota::OriginInfo::LockedDecreaseUsage dom/quota/ActorsParent.cpp:6821
1 xul.dll mozilla::dom::quota::QuotaManager::DecreaseUsageForOrigin dom/quota/ActorsParent.cpp:3667
2 xul.dll static nsresult mozilla::dom::indexedDB::`anonymous namespace'::DeleteFile dom/indexedDB/ActorsParent.cpp:8715
3 xul.dll static nsresult mozilla::dom::indexedDB::`anonymous namespace'::RemoveDatabaseFilesAndDirectory dom/indexedDB/ActorsParent.cpp:8837
4 xul.dll nsresult mozilla::dom::indexedDB::`anonymous namespace'::DeleteDatabaseOp::VersionChangeOp::Run dom/indexedDB/ActorsParent.cpp:21667
5 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1225
6 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:486
7 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:333
8 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:308
9 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:290

There are 10 crashes (from 4 installations) in nightly 70 starting with buildid 20190824215209. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1563023.

[1] https://hg.mozilla.org/mozilla-central/rev?node=8192923fd123

Flags: needinfo?(jvarga)

All these crashes are coming from the IDB quota client and MOZ_CRASH Reason is MOZ_DIAGNOSTIC_ASSERT(mIsSome).
I guess the crash happens here:
https://hg.mozilla.org/mozilla-central/annotate/1125aad05d5c39c2453c04ca6d037afaef16c296/dom/quota/ActorsParent.cpp#l6806

IDB calls DecreaseUsageForOrigin here:
https://hg.mozilla.org/mozilla-central/annotate/1125aad05d5c39c2453c04ca6d037afaef16c296/dom/indexedDB/ActorsParent.cpp#l8715

It seems IDB tries to decrease client (IDB) usage when it's null. It looks to me this bug existed before, but was hidden.

Tom, you recently touched this code in bug 1504535, can you take a look ?

Flags: needinfo?(jvarga) → needinfo?(shes050117)

Sure, I will look into this issue.

Assignee: nobody → shes050117
Flags: needinfo?(shes050117)
Priority: -- → P1

Any luck? Is this something you'd be aiming for fix for 70 release?

Flags: needinfo?(shes050117)

(In reply to Liz Henry (:lizzard) from comment #3)

Any luck? Is this something you'd be aiming for fix for 70 release?

Unfortunately, not. I'll keep working on it to see if there is something can be uplifted.

Flags: needinfo?(shes050117)
Status: NEW → ASSIGNED

STR:

  1. Create a new profile with privacy.firstparty.isolate = true.
  2. Long-press the New Tab button and create a Personal container tab.
  3. Go to https://firefox-storage-test.glitch.me in the container tab.

Regression window:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=76ed02ac9371a169ebd0e30e08127c59be2d96e6&tochange=94258b96e70dc1622ef697138b115088b13c3554

Regressed by:
Bug 1563023 - Part 3: Change client usage type to support the null value

Has Regression Range: --- → yes
Has STR: --- → yes
OS: Windows 10 → All

(In reply to Kestrel from comment #5)

Many thanks to the STR! Now I can reproduce the issue.

However, I still don't think it's regressed by Bug 1563023. I suspect it's probably regressed by the implementation for the storage part of first-party isolation. The usage on QM seems to be overflowed before Bug 1563023 - Part 3 landed. I will investigate this more.

I haven't completed the investigation, because I hit an assertion in another place [0] on the debug build. And, I believe this is related so I spend time looking into that.

I found something incorrect:
In Connection::initialize ([1] as one of the examples), we open the SQLite database file with parameters.
The way we pass the parameter is like: "file:///Users/tomtung/Work/mozilla-central/objdir/tmp/profile-default/storage/permanent/chrome/idb/3870112724rsegmnoittet-es.sqlite?persistenceType=persistent&group=chrome&origin=chrome&clientType=idb&cache=private"
Each parameter is connected with char '&'. However, for the origin attributes, we also connect them with char '&'. I suspect the implementation for parsing parameters in sqlite use token '&' to divide the parameters.

So, for the example in STR, I wonder the scenario is that:

  1. We initilaize a database file with parameter origin as: "https+++firefox-storage-test.glitch.me^userContextId=1&firstPartyDomain=firefox-storage-test.glitch.me"
  2. We expect the parameter origin to be: "https+++firefox-storage-test.glitch.me^userContextId=1&firstPartyDomain=firefox-storage-test.glitch.me". However SQLite store two parameters in this case; one is parameter origin ("https+++firefox-storage-test.glitch.me^userContextId=1") and the other is parameter firstPartyDomain ("firstPartyDomain=firefox-storage-test.glitch.me").
    [Note that I haven't checked this compeltely] 3. When we get the quotaObject via telemetryVFS, we might get the wrong one (origin:"https+++firefox-storage-test.glitch.me^userContextId=1"). When we access the quotaObject from QM, we might get the correct one ("https+++firefox-storage-test.glitch.me^userContextId=1&firstPartyDomain=firefox-storage-test.glitch.me")

I'll check if my expectation is correct or not. If it is, then one solution I can think of now is maybe to pass originAttribute as another parameter and the content is the stringify JSON format of the origin attribute when we initialize the connect. (And, get the parameter when getting the quota object)

[0] https://searchfox.org/mozilla-central/rev/dafa68a84f01897c8cfb4bbaf6829cae8df4eb57/dom/quota/ActorsParent.cpp#4469
[1] https://searchfox.org/mozilla-central/rev/d1e33e3e11f559952d7d80e722d26a6cf5dd80ac/storage/mozStorageConnection.cpp#686

Note that instead of doing additional encoding tricks, at the bottom of https://bugzilla.mozilla.org/show_bug.cgi?id=1516333#c23 where we were discussing another quota tunneling issue, I proposed just passing the QuotaObject through directly via thread-local-storage.

Yeah, we want to stop passing persistenceType, group, origin and clientType. But since there can be more than one storage connection per thread I think we need to pass a UUID, so TelemetryVFS can use that to get matching QuotaObject.

I meant the thread-local-storage would only be used during the (synchronous) Connection::initialize call. It's just a means of tunneling a pointer through a layer that only allows passing strings without stringifying a pointer and scarily converting it back. That said, UUID would be a good improvement too. (And I haven't really tried to understand the costs of allocating an additional TLS slot for such a one-off purpose...)

Attached file Bug 1576593 - Initial patch; (obsolete) —

(In reply to Kestrel from comment #5)

STR:

  1. Create a new profile with privacy.firstparty.isolate = true.
  2. Long-press the New Tab button and create a Personal container tab.
  3. Go to https://firefox-storage-test.glitch.me in the container tab.

Regression window:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=76ed02ac9371a169ebd0e30e08127c59be2d96e6&tochange=94258b96e70dc1622ef697138b115088b13c3554

Regressed by:
Bug 1563023 - Part 3: Change client usage type to support the null value

After applying the patch, I cannot reproduce the crash. But, please note that the patch needs to be polished and some implementation detail might be changed.

Duplicate of this bug: 1583191
Crash Signature: [@ mozilla::dom::quota::OriginInfo::LockedDecreaseUsage] → [@ mozilla::dom::quota::OriginInfo::LockedDecreaseUsage] [@ mozilla::dom::quota::QuotaManager::DecreaseUsageForOrigin ]
See Also: → 1581067
See Also: 1581067

We might need this for plan B

See Also: → 1252409
Depends on: 1587263
See Also: 1252409

Tom, you can now base your patch on patches in bug 1587258 and bug 1587263.

(In reply to Jan Varga [:janv] from comment #17)

Tom, you can now base your patch on patches in bug 1587258 and bug 1587263.

Ah, thanks!

Tom, do you plan on landing your patch before the merge or request an uplift in Beta 71? Thanks

Flags: needinfo?(ttung)

I'm working on a review for him, if Tom doesn't have anything else more important, he should be able to land the fix before the merge on Monday.

Attachment #9098837 - Attachment description: Bug 1576593 - Plan B UUID; → Bug 1576593 - P1 - Change the way for tunneling quota information to SQLite for indexedDB;
Attachment #9093570 - Attachment is obsolete: true

(In reply to Pascal Chevrel:pascalc from comment #19)

Tom, do you plan on landing your patch before the merge or request an uplift in Beta 71? Thanks

Hopefully, I can land this before the end of the next week. However, I don't want to uplift that to 71 immediately since it changes the usage of the mutex lock in QuotaManager. It should be safe, but I want to only uplift this after it stays in Nighty for a while without any related issue reported.

Flags: needinfo?(ttung)
Attachment #9101812 - Attachment description: Bug 1576593 - P1a - Move IntString to QuotaCommon to be used by IDB and DOM Cache; → Bug 1576593 - P1a - Move IntString to QuotaCommon and introduce IntCString to be used by IDB and DOM Cache;

I have one note/suggestion for structuring patches in this bug and similar bugs in future.

Here's what we have now:
P1 - Change the way for tunneling quota information to SQLite for indexedDB
(There are build failures after applying this patch and tests for DOM cache don't pass)
P1a - Move IntString to QuotaCommon and introduce IntCString to be used by IDB and DOM Cache
(There are no build failures after applying this patch, but tests for DOM cache don't pass)
P2 - Change for DOM Cache
Everything works

I would suggest:

Option 1 (If you don't have time for splitting patches or when reviewer is ok with that):
Just one patch since P1a and P2 don't add complexity to P1 and they are intended for the same reviewer as P1.

Option 2 (A good compromise):
P1 - Move IntString to QuotaCommon and introduce IntCString to be used by IDB and DOM Cache
P2 - Change the way for tunneling quota information to SQLite

Option 3 (If you really want to be fancy):
P1 - Move IntString to QuotaCommon and introduce IntCString to be used by IDB and DOM Cache
P2 - Add ids to directory locks
P3 - Add support for getting quota objects by directory lock id
P4 - Change the way for tunneling quota information to SQLite for IndexedDB
P5 - Change the way for tunneling quota information to SQLite for DOM Cache

Splitting patches makes sense especially when the changes are complex and it would be hard for reviewer to understand it or when multiple modules with different reviewers are involved. It's also nice when you are able to successfully build and run tests after applying each patch (this is not always possible).

These are just recommendations, it's not always possible to do it this way, but it's good to follow them when you can.

In general, patches should be split for the sake of reducing complexity in one patch.

(In reply to Jan Varga [:janv] from comment #28)

I have one note/suggestion for structuring patches in this bug and similar bugs in future.

Thanks for reminding me of this! I will do your suggestion Option 2 and try to apply this rule to other bugs in the future.

In the beginning, I wanted to do Option 3, but I was afraid it's not a good idea spitting patch during review, especially some of un-submitted comments might be lost due to file changing. So, I only split simple stuff out (P1a) and wanted to do the rest of the split after the initial review.

However, it seems that Option 2 is the simplest and best option at this moment so I would change the layout to this.

Attachment #9101812 - Attachment description: Bug 1576593 - P1a - Move IntString to QuotaCommon and introduce IntCString to be used by IDB and DOM Cache; → Bug 1576593 - P1 - Move IntString to QuotaCommon and introduce IntCString to be used by IDB and DOM Cache;
Attachment #9098837 - Attachment description: Bug 1576593 - P1 - Change the way for tunneling quota information to SQLite for indexedDB; → Bug 1576593 - P2 - Change the way for tunneling quota information to SQLite;
Attachment #9102081 - Attachment is obsolete: true

As a reminder to myself as much as anyone, mercurial has an "absorb" extension that makes working with patch stacks easier, and :gps documented it nicely at https://gregoryszorc.com/blog/2018/11/05/absorbing-commit-changes-in-mercurial-4.8/.

Filed bug 1591014 for adding a test for orphaned files. It should be fixed before this bug.

(In reply to Andrew Sutherland [:asuth] (he/him) from comment #30)

As a reminder to myself as much as anyone, mercurial has an "absorb" extension that makes working with patch stacks easier, and :gps documented it nicely at https://gregoryszorc.com/blog/2018/11/05/absorbing-commit-changes-in-mercurial-4.8/.

Thanks for the information! In this case, I used hg flod and it's quite simple in this situation (folding two patches together). I will try absorb when I have some complex situations :p

Pushed by ttung@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2cf7f9254640
P1 - Move IntString to QuotaCommon and introduce IntCString to be used by IDB and DOM Cache; r=janv
https://hg.mozilla.org/integration/autoland/rev/4980beb4ea2a
P2 - Change the way for tunneling quota information to SQLite; r=janv
Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72

In case this needs to be uplifted to beta, bug 1591394 should be uplifted as well.

The volume on beta 4 (first beta at 100% rollout) is too low to justify the risk of uplifting several patches to beta, marking 71 as wontfix.

Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.