Closed Bug 1516333 Opened 9 months ago Closed 4 months ago

Crash in mozilla::dom::quota::QuotaObject::LockedMaybeUpdateSize

Categories

(Core :: DOM: Web Storage, defect, P1, critical)

66 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox-esr60 --- unaffected
firefox64 --- unaffected
firefox65 --- unaffected
firefox66 --- unaffected
firefox67 --- disabled
firefox67.0.1 --- disabled
firefox68 + fixed
firefox69 --- fixed

People

(Reporter: philipp, Assigned: edenchuang, NeedInfo)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(3 files)

This bug was filed from the Socorro interface and is
report bp-6f50b8ea-32ee-4dc5-9d60-4b4f20181224.
=============================================================

Top 10 frames of crashing thread:

0 XUL mozilla::dom::quota::QuotaObject::LockedMaybeUpdateSize dom/quota/ActorsParent.cpp:2602
1 XUL mozilla::dom:: dom/quota/ActorsParent.cpp:2561
2 XUL mozilla::dom::PBackgroundLSDatabaseParent::OnMessageReceived ipc/ipdl/PBackgroundLSDatabaseParent.cpp:269
3 XUL mozilla::ipc::MessageChannel::DispatchMessage ipc/glue/MessageChannel.cpp:2129
4 XUL mozilla::ipc::MessageChannel::MessageTask::Run ipc/glue/MessageChannel.cpp:1935
5 XUL nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1157
6 XUL NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:468
7 XUL mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:303
8 XUL nsThread::ThreadFunc ipc/chromium/src/base/message_loop.cc:314
9 libnss3.dylib _pt_root nsprpub/pr/src/pthreads/ptthread.c:201

=============================================================

these crashes are increasing on nightly cross-platform after the next gen localstorage pref got enabled.
Priority: -- → P2
(In reply to [:philipp] from comment #0)
> these crashes are increasing on nightly cross-platform after the next gen
> localstorage pref got enabled.

FWIW due to bug 1516136, it's disabled again.

However, we should figure this out so I'll needinfo Jan so he sees it.
Flags: needinfo?(jvarga)

Small spike in this signature starting on 3-8, and in the last four nightlies there are crashes, but it appears only 1-2 installs.

(In reply to Andrew Overholt [:overholt] from comment #1)

(In reply to [:philipp] from comment #0)

these crashes are increasing on nightly cross-platform after the next gen
localstorage pref got enabled.

FWIW due to bug 1516136, it's disabled again.

That feature is re-enabled again, via bug 1517090, which is probably why the signature started appearing again per comment 2.

I personally can reproduce this 100% of the time, in Nightly on my Linux desktop machine at work, when viewing my own personal [my-own-subdomain].mozilla-iot.org page, to manage my home IOT devices remotely. Doesn't reproduce in a fresh profile, or on a different linux machine, so there may be something specific to my profile folder on this machine.

Here are my 3 crash reports:
bp-2f15d168-8e48-46ac-a1e4-344cd0190311
bp-1fcb2dd6-f4f0-40d8-a0fe-69a380190311
bp-2582b63b-d655-4654-919c-c43ce0190311

I think we get a null quota object here:
https://searchfox.org/mozilla-central/rev/8ff2cd0a27e3764d9540abdc5a66b2fb1e4e9644/dom/localstorage/ActorsParent.cpp#6191
The question is why.

I'll be working on this.

Daniel provided excellent debugging help and now we know that this bug is the same issue as in bug 1422815.

Flags: needinfo?(jvarga) → needinfo?(dholbert)
Flags: needinfo?(dholbert)

Thanks for helping me work through it!

Just for archival purposes or in case we forget: my build is taking this return nullptr statement:
https://searchfox.org/mozilla-central/rev/8ff2cd0a27e3764d9540abdc5a66b2fb1e4e9644/dom/quota/ActorsParent.cpp#3384-3386

janv pointed me at the path <profile>/storage/default/your_origin (where "your_origin" is the URL of my mozilla IOT subdomain). There's a file called .metadata-v2 in that folder, and the issue goes away if I delete that file. (It gets recreated with an adjustment that makes the issue not happen.)

The problematic version of that file is lacking the subdomain -- it starts out with some UTF junk, followed by "mozilla-iot.org". Whereas the fixed/regenerated version of that file has my subdomain in its version of that string.

Status: NEW → RESOLVED
Closed: 7 months ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1422815

Actually, we need to land a small patch that doesn't solve the problem with changed eTLD+1, but it avoids the crash at least.

Assignee: nobody → jvarga
Status: RESOLVED → REOPENED
Priority: P2 → P1
Resolution: DUPLICATE → ---
Depends on: 1535995

Fixed by bug 1535995.

Status: REOPENED → RESOLVED
Closed: 7 months ago6 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

Reflecting status-firefox67 from bug 1535995 here

It seems there are other issues.

Remaining issues may have something to do with bug 1536596.

Blocks: 1540402, 1539835
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
No longer blocks: 1517090
Blocks: 1517090

Eden, can you take this low volume crasher? Jan can fill you in on the details.

Flags: needinfo?(echuang)
Assignee: jvarga → echuang
Status: REOPENED → ASSIGNED
Flags: needinfo?(echuang)

I am working on this now. Just sync the architecture with Tom.

any updates on this issue? (the signature is currently accounting for ~2% of browser crashes in 68.0b)

Flags: needinfo?(echuang)

Eden told me he thinks some special characters are being used in the origin string and we're not sanitizing it properly (or something like that). We're trying to figure out how to get that data.

Flags: needinfo?(echuang)
Attached file test'.html

Reproduce the same signature by the attached file
https://crash-stats.mozilla.org/report/index/bf7733b8-33ea-4bc8-b825-6b7330190529

Reproduce steps:
Open the test'.html file directly.

Root cause:
Using different strings when putting and getting groupInfoPairs when the URI includes special characters.

When PrepareDatastoreOp calls QuotaManager::EnsureQuotaForOrigin() to create needed groupInfoPairs, groupInfo and originInfo in PrepareDatastoreOp::DatabaseWork(). QuotaManager::LockedGetOrCreateGroupInfo() uses the passed in group/origin string. https://searchfox.org/mozilla-central/rev/7556a400affa9eb99e522d2d17c40689fa23a729/dom/quota/ActorsParent.cpp#6343

In the case, the used group string is
file:///testidr/test'.html

However, when PrepareDatastoreOp tries to call QuotaManager::GetQuotaObject() to create datastore in PrepareDatastoreOp::GetResponse(), QuotaManager::GetQuotaObject() escapes the passed in group/origin string to get groupInfoPair. https://searchfox.org/mozilla-central/rev/7556a400affa9eb99e522d2d17c40689fa23a729/dom/quota/ActorsParent.cpp#3746.

In the case, the used group string is
file:///testdir/test%27.html

Although the bug 1111842 mentioned that it is related to the URI string passed back from SQlite, I am still not sure why we escape the string in GetQuatoObject() but not when SQlite passing back.

Flags: needinfo?(jvarga)
See Also: → 1414737

Yeah, I'll be investigating this following days. Thanks for all the gathered information.

The LocalStorage unit test for file URI with special charactor '

We can't just remove the escaping in GetQuotaObject because it's needed for IndexedDB with URIs like:
file++++test>.html

The problem is that SQLite replaces escaped characters with original characters, so we added the re-escaping to GetQuotaObject.
However, it turns out it doesn't work well in all cases. We need to fix that.

Flags: needinfo?(jvarga)

IndexedDB currently passes the group and origin string to SQLite at this line:
https://searchfox.org/mozilla-central/rev/9eb2f739c165b4e294929f7b99fbb4d90f8a396b/dom/indexedDB/ActorsParent.cpp#3902
That is then used to get a quota object for a sqlite file at this line:
https://searchfox.org/mozilla-central/rev/9eb2f739c165b4e294929f7b99fbb4d90f8a396b/storage/TelemetryVFS.cpp#296

The problem is that SQLite resolves escaped group and origin strings and we end up with a mismatch, see "3.2. Query String" section on this page:
https://www.sqlite.org/uri.html

We tried to fix this in the past by re-escaping these strings, but obviously it doesn't work well and I think it would be a night mare to get it right.

One option that would avoid these problems is to have a build option for SQLite that would keep query strings escaped.
Then we would be able to get rid of this block:
https://searchfox.org/mozilla-central/rev/9eb2f739c165b4e294929f7b99fbb4d90f8a396b/dom/quota/ActorsParent.cpp#3732

Andrew, what do you think ?

Flags: needinfo?(bugmail)

We can fix LocalStorage by adding a new argument to GetQuotaObject() to avoid the re-escaping, but IndexedDB (DOM Cache in theory too) would still have this problem. I propose to add a new argument to GetQuotaObject to mitigate this crash on beta.
Nightly can later get a better fix that would probably depend on a new SQLite version.

It sounds like we're doing the escaping backwards. We should be escaping every URL component on the way into SQLite. We should never be attempting to re-escape decoded values we access via sqlite3_uri_parameter.

I understand Eden's scenario from comment 17 and the issue to be well expressed from this JS console example:

> decodeURIComponent(encodeURIComponent("%27"))
"%27"
> encodeURIComponent(decodeURIComponent("%27"))
"'"

Encode-then-decode (right, not what we're doing), escapes the "%" into "%25" which will decode back into "%".
Decode-then-encode (wrong, what we're doing), decodes "%27" into a single quote which doesn't need to get encoded back into "%27" thanks to the rules of encoding, and so the transform is lossy.

I think the code at https://searchfox.org/mozilla-central/rev/9eb2f739c165b4e294929f7b99fbb4d90f8a396b/dom/indexedDB/ActorsParent.cpp#3901 that currently looks like:

  rv = NS_MutateURI(mutator)
           .SetQuery(NS_LITERAL_CSTRING("persistenceType=") + type +
                     NS_LITERAL_CSTRING("&group=") + aGroup +
                     NS_LITERAL_CSTRING("&origin=") + aOrigin +
                     NS_LITERAL_CSTRING("&cache=private") +
                     telemetryFilenameClause)
           .Finalize(fileUrl);

instead wants to look like:

  rv = NS_MutateURI(mutator)
           .SetQuery(NS_LITERAL_CSTRING("persistenceType=") + NS_EscapeURL(type, esc_Param) +
                     NS_LITERAL_CSTRING("&group=") + NS_EscapeURL(aGroup, esc_Param) +
                     NS_LITERAL_CSTRING("&origin=") + NS_EscapeURL(aOrigin, esc_Param) +
                     NS_LITERAL_CSTRING("&cache=private") +
                     telemetryFilenameClause)
           .Finalize(fileUrl);

And we want to remove the escaping at https://searchfox.org/mozilla-central/rev/9eb2f739c165b4e294929f7b99fbb4d90f8a396b/dom/quota/ActorsParent.cpp#3732

That said, we also could just stop trying to tunnel the information as strings through the URI and instead could tunnel any data-structure we want through thread-local-storage. But since I think we're pretty close already on the escaping, it might make sense to just correct our behavior.

Flags: needinfo?(bugmail)

Ok, that works for uri "file:///test'.html", origin directory on disk is "file++++test'.html", group (in TelemetryVFS.cpp) is "/test'.html" and origin is "file:///test'.html"
It doesn't work for uri "file:///test>.html", origin directory on disk is "file++++test%3E.html", group is "/test>.html" and origin is "file:///test>.html".

We would have to escape this stuff in QM too and much sooner. Anyway, I'm not very excited about complicating all the stuff just because SQLite resolves escaped strings. If we do escaping sooner, we can also end up with different origin directories for specific URIs and I don't think we want to implement migration for that.

So long term, I would rather do it using thread-local-storage or ask SQLite devs to not resolve query parameters.

I think SQLite is doing the only sane thing possible here for URI's. We chose a mechanism to tunnel strings through a mechanism that has explicit rules that potentially are lossy if we don't properly encode on the way in. I don't think we can blame SQLite for that, although I understand your frustration. I've been burnt by issues like this many times in webdev and I'm not expecting to have a fun time if I get to review these patches or do any of the investigation myself ;)

It doesn't work for uri "file:///test>.html", origin directory on disk is "file++++test%3E.html", group is "/test>.html" and origin is "file:///test>.html".

This might be nsStandardURL at work doing the ">" to %3E at https://searchfox.org/mozilla-central/rev/9eb2f739c165b4e294929f7b99fbb4d90f8a396b/dom/indexedDB/ActorsParent.cpp#3883. For example,

> new URL("file:///test>.html")
URL { href: "file:///test%3E.html", origin: "null", protocol: "file:", username: "", password: "", host: "", hostname: "", port: "", pathname: "/test%3E.html", search: "" }

It might be worth seeing with the debugger what is happening at each stage of the URL transformation, as there is the issue that the URL may do things during initial parsing/etc. It certainly seemed that BuildNormalizedSpec at https://searchfox.org/mozilla-central/rev/9eb2f739c165b4e294929f7b99fbb4d90f8a396b/netwerk/base/nsStandardURL.cpp#661 would potentially encode everything itself.

Maybe it would be easier to try the thread-local-storage data-passing thing right off the bat...

Yeah, I agee. Here's my proposal:

  1. Move the re-escaping from QuotaManager::GetQuotaObject to GetQuotaObjectFromNameAndParameters in TelemetryVFS.cpp
    This fixes LSNG since it doesn't use TelemetryVFS.cpp for quota tracking. This can be done rather quickly and it would be also quite safe for a beta uplift (which is necessary for shipping LSNG in 68).

  2. Implement thread-local-storage
    This will need a bit more time and it would be intended only for 69 for now.

Eden, can I take this bug ?
I now have more time and we need to fix this for LSNG by the end of this week.

Flags: needinfo?(echuang)

(In reply to Jan Varga [:janv] from comment #26)

Yeah, I agee. Here's my proposal:

  1. Move the re-escaping from QuotaManager::GetQuotaObject to GetQuotaObjectFromNameAndParameters in TelemetryVFS.cpp
    This fixes LSNG since it doesn't use TelemetryVFS.cpp for quota tracking. This can be done rather quickly and it would be also quite safe for a beta uplift (which is necessary for shipping LSNG in 68).

  2. Implement thread-local-storage
    This will need a bit more time and it would be intended only for 69 for now.

For a short-term solution, I guess we probably can move the re-escaping to the proper position.

I have already done 1. and running try now.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4040f0ac693f2a6571a81f5f3836135bd7599ef2&selectedJob=250094999

I searched QuotaManager::GetQuotaObject() on searchfox, and it is used in followings
https://searchfox.org/mozilla-central/rev/9eb2f739c165b4e294929f7b99fbb4d90f8a396b/dom/cache/FileUtils.cpp#333
https://searchfox.org/mozilla-central/rev/f8b11433159cbc9cc80500b3e579d767473fa539/dom/localstorage/ActorsParent.cpp#7350
https://searchfox.org/mozilla-central/rev/9eb2f739c165b4e294929f7b99fbb4d90f8a396b/storage/TelemetryVFS.cpp#324

to make sure the behavior is the same as before moving re-escaping, we should also add re-escaping at
https://searchfox.org/mozilla-central/rev/9eb2f739c165b4e294929f7b99fbb4d90f8a396b/dom/cache/FileUtils.cpp#333

Eden, can I take this bug ?
I now have more time and we need to fix this for LSNG by the end of this week.

sure. I uploaded the patches.

Flags: needinfo?(echuang)

(In reply to Eden Chuang[:edenchuang] from comment #28)

I searched QuotaManager::GetQuotaObject() on searchfox, and it is used in followings
https://searchfox.org/mozilla-central/rev/9eb2f739c165b4e294929f7b99fbb4d90f8a396b/dom/cache/FileUtils.cpp#333
https://searchfox.org/mozilla-central/rev/f8b11433159cbc9cc80500b3e579d767473fa539/dom/localstorage/ActorsParent.cpp#7350
https://searchfox.org/mozilla-central/rev/9eb2f739c165b4e294929f7b99fbb4d90f8a396b/storage/TelemetryVFS.cpp#324

There's also https://searchfox.org/mozilla-central/rev/9eb2f739c165b4e294929f7b99fbb4d90f8a396b/dom/quota/FileStreams.cpp#46
but that one doesn't need re-escaping, it doesn't go through SQLite VFS.

to make sure the behavior is the same as before moving re-escaping, we should also add re-escaping at
https://searchfox.org/mozilla-central/rev/9eb2f739c165b4e294929f7b99fbb4d90f8a396b/dom/cache/FileUtils.cpp#333

This also doesn't need re-escaping, the padding doesn't use SQLite IIRC, you can check with Tom since he implemented the padding in DOM Cache.

Attachment #9069903 - Attachment description: Bug 1516333 - Moving group/string re-escaping from QuotaManager::GetQuotaObject → Move group and origin re-escaping from QuotaManager::GetQuotaObject to GetQuotaObjectFromNameAndParameters in TelemetryVFS.cpp
Attachment #9069619 - Attachment description: Bug 1516333 - LS unit test for file URI with special charactor ' → Bug 1516333 - LS unit test for uri encoding edge cases
Keywords: checkin-needed

I'm going to land it.

Keywords: checkin-needed
Pushed by jvarga@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d413f4867598
Part 1: Move group and origin re-escaping from QuotaManager::GetQuotaObject to GetQuotaObjectFromNameAndParameters in TelemetryVFS.cpp; r=janv,asuth
Pushed by jvarga@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/837a81caab26
Part 2: Add LS unit test for uri encoding edge cases; r=janv
Status: ASSIGNED → RESOLVED
Closed: 6 months ago4 months ago
Resolution: --- → FIXED

Comment on attachment 9069619 [details]
Bug 1516333 - LS unit test for uri encoding edge cases

Beta/Release Uplift Approval Request

  • User impact if declined: Crash Firefox when using localstorage with special file uris.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): low risky, the patch just moves the string re-escaping into the only use case.
  • String changes made/needed: no
Attachment #9069619 - Flags: approval-mozilla-beta?
Attachment #9069903 - Flags: approval-mozilla-beta?

Comment on attachment 9069619 [details]
Bug 1516333 - LS unit test for uri encoding edge cases

lsng crash fix, approved for 68.0b9

Attachment #9069619 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9069903 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

This bug may have contributed to a sudden change in the Telemetry probe TELEMETRY_PENDING_LOAD_FAILURE_READ1 which seems to have occurred in Nightly 2019-06-072.

There was a sudden order-of-magnitude decrease in the number of pings we no longer failed when reading from disk.
This might mean that the encoding strangeness that caused this bug was also causing Telemetry to be unable to read pings from disk for some reason? Maybe there are Telemetry probes that reported the strings for the group or origin which are now encoded?

This seems to be the only likely change from that build, but I could be wrong. Either way, it's a good thing.

Flags: needinfo?(echuang)
You need to log in before you can comment on or make changes to this bug.