Closed Bug 1780054 Opened 2 years ago Closed 2 years ago

URL.createObjectURL() often creates URLs that point to a file with missing blobs

Categories

(Core :: DOM: Networking, defect, P2)

Firefox 102
defect

Tracking

()

VERIFIED FIXED
105 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox-esr102 --- verified
firefox102 --- wontfix
firefox103 --- wontfix
firefox104 --- verified
firefox105 --- verified

People

(Reporter: onurtag, Assigned: nika)

References

(Regression)

Details

(Keywords: regression, Whiteboard: [necko-triaged], [wptsync upstream])

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0

Steps to reproduce:

The bug was found using the Multithreaded Download Manager addon so (https://addons.mozilla.org/en-US/firefox/addon/multithreaded-download-manager/)
You can reproduce the bug with this method: https://github.com/jingyu9575/multithreaded-download-manager/issues/179#issuecomment-1175821990
or you can download 10 identical URLs using the addon with 4+ threads (to create more blobs)

Actual results:

3 out of 10 URLs created point to a corrupt file with missing blobs.
The results are random. More than 3 files can be corrupt or all 10 files can be complete.

Expected results:

All created URLs should point to the same complete file.

More information at the addon's github issue: https://github.com/jingyu9575/multithreaded-download-manager/issues/179
The issue is reproducible on Firefox 102.0.1 and Nightly 104.0a1 (2022-07-18) (Win10 21H2). Also reproducible on Arch Linux & Linux Mint according to some comments on the github issue.

Mozregression result: https://user-images.githubusercontent.com/7430003/177225481-089326ac-567e-47a0-856b-e5925e9ff318.png
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=69fb0f363c1889e054c8aac505a63875e6ce3099&tochange=3f4ebef6a4b1b4fb858d9d8eba17a4ccae268d1a

I should also mention that I can't reproduce the bug using this seperate test addon: https://github.com/Onurtag/bad-object-url

The Bugbug bot thinks this bug should belong to the 'Core::DOM: Networking' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → DOM: Networking
Product: Firefox → Core
Flags: needinfo?(nika)
Regressed by: 1754004

Set release status flags based on info from the regressing bug 1754004

The bug has a release status flag that shows some version of Firefox is affected, thus it will be considered confirmed.

Status: UNCONFIRMED → NEW
Ever confirmed: true

New test addon that can be used to replicate the bug (Thanks mix5003):
https://github.com/mix5003/bad-object-url
This addon uses indexedDB to store the file's blobs (like the Multithreaded Download Manager addon) and unlike my fork of the test addon above it can be used to replicate the bug.

Many thanks to both onurtag and mix5003 for the invaluable reproduction case and to onurtag for the regression identification in comment 1!

Based on the changes made to make the reproduction work, it appears like this may be an edge case related to IndexedDB file-backed Blobs when used in composite/multipart blobs. I'm going to quickly try and adapt the scenario's IDB reproduction into a WPT test that also validates FileReader and Blob.prototype.text() methods of reading the composite blob (which is the easy part compared to getting this to reproduce!) and then capture that under the pernosco debugger.

Assignee: nobody → bugmail
Status: NEW → ASSIGNED

It took some playing with the constants to get pernosco to reproduce this, but here's the trace. This corresponds exactly to the state of the WIP test which I ran with ./mach wpt --headless --no-pause-after-test --debugger=rr --debugger-args="record -h --disable-avx-512" /IndexedDB/blob-composite-blob-reads.any.html to produce this. Note that it seems like fetch/URL.createObjectURL is necessary and FileReader or the direct arrayBuffer() helper are not sufficient. But that makes sense because those will both do much simpler things to stream the Blob which also doesn't get sent over IPC, although I guess that could be added if we involve a shipped MessagePort.
https://pernos.co/debug/QJ8oNjumKn8yfhEWi7hYnA/index.html

Flags: needinfo?(nika)

(needinfo to draw attention to the pernosco trace above)

Flags: needinfo?(nika)

Set release status flags based on info from the regressing bug 1754004

Thanks a ton for this pernosco trace! Thanks to that (and quite a bit of time of trying to parse through the trace) I'm pretty sure I know what's happening here.

When serializing the stream from the content to parent process to register it in the BlobURL store the stream is serialized as a nsMultiplexInputStream consisting of 512 distinct sub-streams. Because each of these sub-streams comes from IDB, they're all RemoteLazyInputStreams. Due to the large-attachment-count, the serialization code decides to serialize a single DataPipe to contain all of the relevant data from all of these streams, starting an AsyncCopy from the nsMultiplexInputStream into the pipe. This copy ends up missing segments from 2 of the nested input streams: #385 and #469 in this case.

This happens due to a race condition and bug in nsMultiplexInputStream, where it assumes that a OnInputStreamReady notification must come from the currently selected stream, so if the passed-in stream is empty, it unconditionally advances to the next stream: https://searchfox.org/mozilla-central/rev/23bf1890e07f780ba70e075bc8f46ffb85d1128c/xpcom/io/nsMultiplexInputStream.cpp#906-908,915. What happens here is that the nsAStreamCopier is woken up by the output stream becoming available at the same time as the input stream #468 is trying to notify that it is done. We enter Read, try to read data from segment #468, and find that it is completed. Because of that we advance to segment #469, which reports that it doesn't have data available yet, returning NS_BASE_STREAM_WOULD_BLOCK. Immediately after that, the notification from input stream #468 arrives, and it is again detected that it is empty, meaning that the code attempts to advance to the next segment (#470), skipping #469 completely.

nsMultiplexInputStream likely needs to add some checks to OnInputStreamReady which makes sure that aStream is actually the currently selected stream, and discards the notification if it is not, though there's some complexity around that due to the awkward way that nsMultiplexInputStream is implemented right now, with its support for a mixture of both blocking and non-blocking streams.

Assignee: bugmail → nika
Flags: needinfo?(nika)

Previously, we could end up in situations where nsMultiplexInputStream
would receive a callback for a previously notified stream after it had
already advanced to the next stream due to the async nature of
notifications. This could cause us to skip a stream accidentally. This
changes the logic to re-call AsyncWait in that situation on the correct
stream rather than skipping streams.

Attachment #9287181 - Attachment description: WIP: Bug 1780054 - Add an IndexedDB composite blob stress test. → Bug 1780054 - Add an IndexedDB composite blob stress test. r=nika!
Severity: -- → S3
Priority: -- → P2
Whiteboard: [necko-triaged]
Pushed by bugmail@asutherland.org:
https://hg.mozilla.org/integration/autoland/rev/4179c610cd95
Add an IndexedDB composite blob stress test. r=nika
https://hg.mozilla.org/integration/autoland/rev/79b351c9c742
Part 2: Double-check notified stream in nsMultiplexInputStream, r=asuth
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/35264 for changes under testing/web-platform/tests
Whiteboard: [necko-triaged] → [necko-triaged], [wptsync upstream]
Regressions: 1782079
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 105 Branch
Upstream PR merged by moz-wptsync-bot

base on my test and original github issue. i can confirmed that bug has been fixed.

should this patch merge into beta and esr102 too?

Flags: needinfo?(nika)
Regressions: 1782181

We should probably consider upstreaming it, though it appears that there's a memory leak regression we should perhaps figure out first.

Flags: needinfo?(nika)

The patch landed in nightly and beta is affected.
:nika, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.Also, don't forget to request an uplift for the patches in the regressions caused by this fix.
  • If no, please set status-firefox104 to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(nika)

Comment on attachment 9287315 [details]
Bug 1780054 - Part 2: Double-check notified stream in nsMultiplexInputStream, r=asuth

Beta/Release Uplift Approval Request

  • User impact if declined: Large blobs can intermittently become corrupted when being combined and transferred between processes.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: see comment 0
  • List of other uplifts needed: Bug 1782181
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): This changes some details about the way that streams are handled for nsMultiplexInputStream, which is a core stream type and used in many different components. We have already encountered one memory leak issue (bug 1782181), which was difficult to diagnose, and we hope is fixed (though cannot be certain due to the intermittent nature).
  • String changes made/needed: None
  • Is Android affected?: Yes

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Broken behaviour for Blobs in some cases, leading to data corruption.
  • User impact if declined: Large blobs can intermittently become corrupted when being combined and transferred between processes.
  • Fix Landed on Version: 105
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): This changes some details about the way that streams are handled for nsMultiplexInputStream, which is a core stream type and used in many different components. We have already encountered one memory leak issue (bug 1782181), which was difficult to diagnose, and we hope is fixed (though cannot be certain due to the intermittent nature).
Flags: needinfo?(nika)
Attachment #9287315 - Flags: approval-mozilla-esr102?
Attachment #9287315 - Flags: approval-mozilla-beta?
Attachment #9287181 - Flags: approval-mozilla-beta?
Flags: qe-verify+

Comment on attachment 9287181 [details]
Bug 1780054 - Add an IndexedDB composite blob stress test. r=nika!

Approved for 104.0b6

Attachment #9287181 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9287315 [details]
Bug 1780054 - Part 2: Double-check notified stream in nsMultiplexInputStream, r=asuth

Approved for 104.0b6

Attachment #9287315 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
QA Whiteboard: [qa-triaged]

Reproduced the initial issue using an old Nightly from 2022-07-18 on MacOS and the method described in https://github.com/jingyu9575/multithreaded-download-manager/issues/179#issuecomment-1175821990. Verified that using latest Nightly 105.0a1 and Firefox 104.0b6 across platforms (Windows 10 64bit, macOS 11.6 and Ubuntu 22.04) this looks as fixed.

Comment on attachment 9287315 [details]
Bug 1780054 - Part 2: Double-check notified stream in nsMultiplexInputStream, r=asuth

Approved for 102.2esr.

Attachment #9287315 - Flags: approval-mozilla-esr102? → approval-mozilla-esr102+
Attachment #9287181 - Flags: approval-mozilla-esr102+

Also verified as fixed using latest esr102 build from treeherder across platforms (Windows 10 64bit, macOS 11.6 and Ubuntu 18.04).

Status: RESOLVED → VERIFIED
QA Whiteboard: [qa-triaged]
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: