Closed Bug 1349862 Opened 3 years ago Closed 3 years ago

XMLHttpRequest returning corrupt data for large blobs

Categories

(Core :: DOM: Core & HTML, defect, blocker)

Version: 52 Branch
Type: defect
Priority: Not set
Severity: blocker

Tracking


VERIFIED FIXED
mozilla55
Tracking Status
firefox-esr45 --- unaffected
firefox52 - wontfix
firefox-esr52 53+ verified
firefox53 + verified
firefox54 + verified
firefox55 + verified

People

(Reporter: keean, Assigned: baku)

References

Details

(4 keywords)

Attachments

(2 files, 1 obsolete file)

User Agent: Mozilla/5.0 (Linux; Android 7.0; Nexus 6 Build/NBD92D) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Mobile Safari/537.36

Steps to reproduce:

An XMLHttpRequest for large data (approx. 140MB) with response type "blob" returns a corrupt file (about 600,000 bytes, starting around 9MB into the blob, appear incorrect). This only happens in Firefox on macOS; the same code works fine in Firefox on Linux, and in other browsers (Chrome and Safari) on the Mac. Setting the response type to "moz-blob" also works around the problem in Firefox on macOS.

The test was to XMLHttpRequest the large file, SHA-256 checksum it, and compare the checksum to the one produced on other browsers/platforms. The corrupt data appears to be random(ish), as the checksum changed every time. I confirmed the bug was not in WebCrypto by using an XMLHttpRequest with responseType = 'arraybuffer' and converting the result to a blob before feeding it to the checksum code; the correct checksum was then produced on all browsers and platforms. However, loading a large file into an ArrayBuffer uses a lot more RAM than a Blob, which prevents large files from being downloaded on some platforms. This bug is present in the latest production Firefox (52).
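For reference, a minimal sketch of that workaround (the helper name is illustrative, mirroring the get_blob helper in the testcase below): download as 'arraybuffer', then wrap the result in a Blob before hashing.

```
// Workaround sketch (hypothetical helper): responseType 'arraybuffer' is
// unaffected by this bug, so wrap the buffer in a Blob before hashing.
// This costs more RAM than a file-backed blob, but produced correct
// checksums on all browsers and platforms.
function get_blob_via_arraybuffer(url) {
    return new Promise(function(succ, fail) {
        var xhr = new XMLHttpRequest();
        xhr.open('GET', url, true);
        xhr.responseType = 'arraybuffer';
        xhr.onload = function() {
            if (xhr.status === 200) {
                succ(new Blob([xhr.response]));
            } else {
                fail(xhr.statusText);
            }
        };
        xhr.onerror = fail;
        xhr.send(null);
    });
}
```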



Expected results:

The checksum should have agreed with the other browsers and platforms.
Here's a test for this bug (you will need to set the file to fetch; something around 200MB should be large enough to trigger the bug).

```
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
    <head>
        <title>Test XMLHttpRequest</title>
    </head>
    <body>
        <script type="text/javascript">
//<![CDATA[

// Download `url` as a Blob and resolve the promise with it.
function get_blob(url) {
    return new Promise(function(succ, fail) {
        var xhr = new XMLHttpRequest();
        xhr.open('GET', url, true);
        xhr.responseType = 'blob';
        xhr.onprogress = function(e) {
            console.log('.');
        };
        xhr.onreadystatechange = function() {
            if (xhr.readyState === XMLHttpRequest.DONE) {
                if (xhr.status === 200) {
                    console.log('downloaded file size: ', xhr.response.size);
                    succ(xhr.response);
                } else {
                    fail(this.statusText);
                }
            }
        };
        xhr.send(null);
    });
}

var subtle = crypto.subtle || crypto.webkitSubtle;

// Read a Blob into an ArrayBuffer; pass anything else through unchanged.
function readBlob(blob) {
    return new Promise(function blob_promise(succ, fail) {
        if (blob instanceof Blob) {
            var fileReader = new FileReader();
            fileReader.onload = function() {
                succ(this.result);
            };
            fileReader.onerror = fail;
            fileReader.readAsArrayBuffer(blob);
        } else {
            succ(blob);
        }
    });
}

// Hash the buffer with WebCrypto and hex-encode the digest.
function sha256(buf) {
    return subtle.digest('SHA-256', buf).then(function hex_encode(hash) {
        var value = new Uint32Array(hash);
        var str = '';
        for (var i = 0; i < value.length; ++i) {
            str += ('00000000' + value[i].toString(16)).slice(-8);
        }
        return str;
    });
}

get_blob('large_binary_file_200MB')
.then(readBlob)
.then(sha256)
.then(function(hex) {
    console.log(hex);
});

//]]>
        </script>
    </body>
</html>
```
As you're able to reproduce the issue on Mac OSX, could you install the tool mozregression to narrow down a regression range in FF52?
See http://mozilla.github.io/mozregression/ for details (you need Python 2.7).

Run the command "mozregression --good=51", then for each build downloaded by the tool, run the test and enter whether the build is good or bad.
After the run, copy here the final pushlog from the mozilla-inbound repository.
Component: Untriaged → DOM
Flags: needinfo?(keean)
OS: Unspecified → Mac OS X
Product: Firefox → Core
Hardware: Unspecified → x86_64
[Tracking Requested - why for this release]: regression corrupt data for large blobs

I can reproduce on Windows 10 64-bit with Firefox 52.0.1 and Nightly 55.0a1, but not with Firefox 51.0.1.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: regression
OS: Mac OS X → All
Hardware: x86_64 → All
Severity: normal → major

Regression window:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=ca17ce6a2c9a3e906c9527c1e44c98185325cabe&tochange=c634201ba01d846403e692921a44038d2e55817a

Regressed by: Bug 1202006
Blocks: 1202006
Severity: major → blocker
Flags: needinfo?(amarchesini)
Summary: XMLHttpRequest returning corrupt data for large blobs on macos-x → XMLHttpRequest returning corrupt data for large blobs
Flags: needinfo?(keean)
Has Regression Range: --- → yes
Has STR: --- → yes
Component: DOM → DOM: Core & HTML
Flags: needinfo?(bugs)
Keywords: dataloss
I assume baku will take a look at this.
Flags: needinfo?(bugs)
Assignee: nobody → amarchesini
Flags: needinfo?(amarchesini)
I need extra information:

1. Does it also happen in non-e10s mode?
2. When the operation is completed, can you check whether you have a temporary file called mozilla-temp-<something>? It will be in your temporary directory.
3. If yes, does that file have the same content as the large_binary_file_200MB?
Flags: needinfo?(alice0775)
(In reply to Andrea Marchesini [:baku] from comment #6)
> I need extra information:
> 
> 1. Does it also happen in non-e10s mode?

Yes, I can reproduce the problem in both non-e10s and e10s.

> 2. When the operation is completed, can you check whether you have a
> temporary file called mozilla-temp-<something>? It will be in your
> temporary directory.

C:\Users\[userID]\AppData\Local\Temp\mozilla-temp-files\mozilla-temp-41
I got the file after reloading Explorer.
The file is deleted automatically after a while.

> 3. If yes, does that file have the same content as the
> large_binary_file_200MB?

No, it is not the same.
Attachment #8851042 - Attachment is obsolete: true
Flags: needinfo?(kyle)
Build Identifier:
https://hg.mozilla.org/mozilla-central/rev/01d1dedf400d4be413b1a0d48090dca7acf29637
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0 ID:20170324030205



STR:
1. Save the attached "modified reporter's testcase html" to the local file system
2. Clear everything: Menu > History > Clear Recent History…
3. Just to be sure, restart the browser
4. Open the downloaded html file
5. Wait a minute or two (depending on network speed; this will download an approx. 50MB file from http://archive.mozilla.org/pub/firefox/nightly/2017/03/2017-03-01-03-02-02-mozilla-central/firefox-54.0a1.en-US.win32.zip )
   --- observe results
6. Press the F5 key to reload
   --- observe results
7. Repeat step 6 as needed

Actual Results:
On the first attempt, the hash result is incorrect.
After a reload, the result is random; sometimes the correct result is obtained.

Expected Results:
The results of hash should always be ed932fabd383a2132fdab7ff51edc8b8ed9d2857c0a73bd98f388a8a04815320 .
I'm building m-c for OS X and Windows now, will update when I verify repros. Thanks for the STR!
Ok. STR verified on macOS 10.12.3 running FF55 (almost constant failure) and Debian Stretch running FF55 (intermittent failures, maybe ~40% of the time). The only platform I could not repro on was Win64 with FF55: multiple runs of both locally compiled and distribution builds never returned an incorrect hash.
Flags: needinfo?(kyle)
That was win64 with a 64-bit FF build, btw.
Ok, actually, strike that. After many retries of 64-bit FF on Win64, it finally failed there too.
Tracking for 53 onwards. Release date for 53 is April 18. 
Marking wontfix for 52 though.
Didn't make much progress on this today outside of repro'ing, doing some binary diffs to see where the corruption happened (it usually seems to start <10MB into the file; some chunks afterward are OK again, but there's lots of corruption otherwise), and trying to figure out code paths. Debug builds seemed to show failures less often than release on macOS, but I'm not sure yet whether that's related to timing or the file system or what.
Could the I/O runnables ever run out of order? I could not find a guarantee.
nsStreamTransportService uses a thread pool, so multiple WriteRunnables can race, I think.
(In reply to Masatoshi Kimura [:emk] from comment #18)
> nsStreamTransportService uses a thread pool, so multiple WriteRunnables can
> race, I think.

Yeah, that's definitely the case.  This implementation treats nsStreamTransportService as if it behaved like AbstractThread's TaskQueue, which provides the desired ordering guarantees.


Aside: it also looks like this logic bypasses Necko streams because NS_OpenAnonymousTemporaryFile does same-thread I/O; it probably wanted to create a new class, nsTemporaryFileOutputStream, to complement the existing nsTemporaryFileInputStream, so it could use the existing Necko logic that avoids problems like these.  If this bug doesn't end up going that way and there isn't already a follow-up to clean that up, it seems like there should be.
Flags: needinfo?(amarchesini)
Attached patch xhr_bug.patchSplinter Review
Flags: needinfo?(amarchesini)
Attachment #8851492 - Flags: review?(bugmail)
Keywords: site-compat
Comment on attachment 8851492 [details] [diff] [review]
xhr_bug.patch

Review of attachment 8851492 [details] [diff] [review]:
-----------------------------------------------------------------

I think you also need to invoke BeginShutdown() if you want TaskQueue's destructor to not assert.  Please make sure you've run a test for this under a debug build before landing!  Relatedly, TaskQueue::Dispatch as used will do a MOZ_DIAGNOSTIC_ASSERT in the event of dispatch failure.  This is nice for ensuring dispatch, but unless you're sure this logic will never run during/race the stream transport service's shutdown, you might want to provide the DontAssertDispatchSuccess parameter.  (STS shuts down at xpcom-shutdown-threads, it looks like.  Not sure what constraints this code is operating under.)
Attachment #8851492 - Flags: review?(bugmail) → review+
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/40d99f377bdd
MutableBlobStorage must use a TaskQueue in order to preserve the order of runnables for I/O ops, r=asuth
https://hg.mozilla.org/mozilla-central/rev/40d99f377bdd
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Can we land an automated test for this?
Flags: needinfo?(amarchesini)
Flags: in-testsuite?
We have a test for MutableBlobStorage but I don't know if we can force a random scheduling of runnables in the thread pool.
I can write a test, but I don't know if that would cover this issue.
Andrew, do you have a better approach here?
Flags: needinfo?(amarchesini) → needinfo?(bugmail)
One possibility is creating an "adversarial" implementation of nsIStreamTransportService/nsIThreadPool/nsIEventTarget that temporarily replaces itself as the factory for the "@mozilla.org/network/stream-transport-service;1" contract.  There's some existing testing code along these lines for different contracts.  See http://searchfox.org/mozilla-central/source/browser/components/preferences/in-content/tests/browser_advanced_update.js#30 and http://searchfox.org/mozilla-central/source/testing/xpcshell/head.js#243 for examples of implementations that seem to work (but disclaimer, see later).

This isn't as crazy an idea as it seems because implementations like nsInputStreamPump access the service via CID not ContractID so we're less likely to break everything by introducing this behavior.  However, this means the above examples' idiom of unregisterFactory/registerFactory should be changed to not call unregisterFactory since it operates on the CID.  Instead, the first registerFactory for the fake implementation clobbers the ContractID but leaves the original CID intact.  Then when it's time to restore things, a second registerFactory call with the ContractID and original CID but a null factory will cause the component manager to re-establish the original mapping.

The difficult bit is ensuring that there are at least 2 write runnables and that we have a chance to shuffle them, especially because implementations like nsInputStreamPump are very clever about consolidating events.  The good news is that dom/xhr/tests/temporaryFileBlob.sjs just uses setTimeout for a timeline of: @0s: send 256 bytes, @1s: send the rest, @2s: close the connection.  That definitely should net at least 2 OnDataAvailable invocations.  So have the adversarial STS have a "reverseEventsDispatchAndBecomePassthrough()" method, do a setTimeout(1250), invoke that, unregister the fake STS, and then verify the XHR completes and the data is right.  (Also try running the test in a tree without this bug's fix and ensure that it fails.)
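Here is a hedged sketch of just the ContractID-clobbering idiom described above (all names and the CID are hypothetical, the event-buffering/reversing logic is elided, and it assumes an xpcshell-style scope where Components, Ci, and Cr are defined):

```
// Sketch only: shadow the stream transport service's ContractID with a fake
// factory while leaving the original CID registration intact.
Components.utils.import("resource://gre/modules/XPCOMUtils.jsm");

const STS_CONTRACT = "@mozilla.org/network/stream-transport-service;1";
// Hypothetical CID; a real test would generate a fresh UUID.
const FAKE_CID = Components.ID("{b2a3c8f0-9f1e-4c56-9abc-def012345678}");

const fakeSTS = {
  QueryInterface: XPCOMUtils.generateQI([Ci.nsIStreamTransportService]),
  // ... wrap the real service here, buffering dispatched write events so a
  // reverseEventsDispatchAndBecomePassthrough() method can replay them ...
};

const fakeFactory = {
  createInstance: function(aOuter, aIID) {
    if (aOuter) {
      throw Components.Exception("", Cr.NS_ERROR_NO_AGGREGATION);
    }
    return fakeSTS.QueryInterface(aIID);
  },
};

const registrar = Components.manager.QueryInterface(Ci.nsIComponentRegistrar);
// Registering under a new CID clobbers the ContractID mapping but leaves the
// original CID alone, so CID-based lookups (e.g. nsInputStreamPump) keep
// reaching the real service.
registrar.registerFactory(FAKE_CID, "Fake STS", STS_CONTRACT, fakeFactory);

// To restore the original mapping later, re-point the ContractID at the
// original CID with a null factory:
// registrar.registerFactory(ORIGINAL_STS_CID, "", STS_CONTRACT, null);
```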
Flags: needinfo?(bugmail)
Can this ride the trains or should we be tracking this for possible backport to affected branches?
Flags: needinfo?(amarchesini)
We should uplift given the severity of the problem (random data corruption).

If we do not uplift, we should at least disable file-backed blobs on branches.
Comment on attachment 8851492 [details] [diff] [review]
xhr_bug.patch

Approval Request Comment
[Feature/Bug causing the regression]: MutableBlobStorage implementation
[User impact if declined]: corrupted blob data
[Is this code covered by automated tests?]: not yet.
[Has the fix been verified in Nightly?]: not yet.
[Needs manual test from QE? If yes, steps to reproduce]: Yes. Follow the description of the bug: create a blob starting from a big resource in XHR.
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: We just introduce a TaskQueue when MutableBlobStorage dispatches runnables.
[String changes made/needed]: none
Flags: needinfo?(amarchesini)
Attachment #8851492 - Flags: approval-mozilla-beta?
Attachment #8851492 - Flags: approval-mozilla-aurora?
Comment on attachment 8851492 [details] [diff] [review]
xhr_bug.patch

Avoids data corruption; may be risky; let's try this on beta 10.
qdot, would you mind testing once this lands on beta later this week, since you already had a look for nightly?
Flags: needinfo?(kyle)
Attachment #8851492 - Flags: approval-mozilla-beta?
Attachment #8851492 - Flags: approval-mozilla-beta+
Attachment #8851492 - Flags: approval-mozilla-aurora?
Attachment #8851492 - Flags: approval-mozilla-aurora+
I'll pull beta now and get it built, and will try to test by this evening. That said, we have a manual testcase with STR in comment 10 if you want to add that to a test set as well.
Blocks: 1353629
Seems to be working fine for me on beta now. No errors on any platform.
Flags: needinfo?(kyle)
Excellent!

baku, could you request an uplift or disable file-backed blobs on ESR 52?
Flags: needinfo?(amarchesini)
Flagging this for manual testing; STR and testcase in comment 10.
Flags: qe-verify+
Comment on attachment 8851492 [details] [diff] [review]
xhr_bug.patch

[Approval Request Comment]
User impact if declined:  corrupted Blob can be returned to content
Risk to taking this patch (and alternatives if risky): none. The issue here was that the runnables were scheduled in the wrong order. We need to use a TaskQueue.
String or UUID changes made by this patch:  none.
Flags: needinfo?(amarchesini)
Attachment #8851492 - Flags: approval-mozilla-esr52?
I am unable to reproduce the initial issue on Windows 10 using the 52.0.2 build. I may be missing something, but after following the steps from comment 10, once I reach step 5 and load the testcase, nothing happens. I don't see anything particularly different in the Browser Console either (comparing 52.0.2 and 53 beta 9).

Should I pay attention to something in particular here in order to differentiate the bad build from the fixed one?
Flags: needinfo?(alice0775)
It will take a bit (the test is downloading a 50MB file in the background), but a dialog should pop up with the hash of the downloaded file.
(In reply to Bogdan Maris, QA [:bogdan_maris] from comment #39)
> I am unable to reproduce the initial issue on Windows 10 using the 52.0.2
> build. I may be missing something, but after following the steps from
> comment 10, once I reach step 5 and load the testcase, nothing happens. I
> don't see anything particularly different in the Browser Console either
> (comparing 52.0.2 and 53 beta 9).
> 
> Should I pay attention to something in particular here in order to
> differentiate the bad build from the fixed one?

I can reproduce the problem on Firefox 52.0.2 32-bit / Windows 10 Home 64-bit.
Flags: needinfo?(alice0775)
Comment on attachment 8851492 [details] [diff] [review]
xhr_bug.patch

fix a regression with large blobs on esr52
Attachment #8851492 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
(In reply to Alice0775 White from comment #41)
> (In reply to Bogdan Maris, QA [:bogdan_maris] from comment #39)
> > I am unable to reproduce the initial issue on Windows 10 using the 52.0.2
> > build. I may be missing something, but after following the steps from
> > comment 10, once I reach step 5 and load the testcase, nothing happens. I
> > don't see anything particularly different in the Browser Console either
> > (comparing 52.0.2 and 53 beta 9).
> > 
> > Should I pay attention to something in particular here in order to
> > differentiate the bad build from the fixed one?
> 
> I can reproduce the problem on Firefox 52.0.2 32-bit / Windows 10 Home 64-bit.

Hello, Alice! I didn't manage to reproduce the issue on Windows 10 x64, using 52.0.2 (20170323105023) (x86), following the steps from comment 10. The behavior is the same as in 55.0a1 (2017-04-11). As Bogdan already mentioned, there is no visible result after step 5. Moreover, the Web Console result is the same in both of the above-mentioned builds (the affected and the fixed one). Could you please mention what we should particularly focus on in order to reproduce/verify this issue?
Flags: needinfo?(alice0775)
You should wait 3-4 minutes (ADSL 9bps) at steps 5 and 6; then an alert box pops up.

I am not sure, but it may need a low-performance PC (Core2Quad @ 2.5GHz, 8GB RAM, 1TB HDD, ADSL 9bps)
Flags: needinfo?(alice0775)
(In reply to Alice0775 White from comment #45)
> You should wait 3-4 minutes (ADSL 9bps) at steps 5 and 6; then an alert box
> pops up.
> 
> I am not sure, but it may need a low-performance PC (Core2Quad @ 2.5GHz,
> 8GB RAM, 1TB HDD, ADSL 9bps)

I couldn't reproduce the issue, even using NetLimiter. I encountered the same behavior as in comment 39 and comment 44. Alice, do you think you can try verifying the fix? Thank you!
Flags: needinfo?(alice0775)
I can reproduce the issue on Firefox 52.0.2 [1].

[1]https://hg.mozilla.org/releases/mozilla-release/rev/e81854d6ce91f3174774a50c9c5813c33b9aac58
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0 ID:20170323105023

And I verified that the issue is fixed in 53 [2], 54 beta [3], and Nightly 55 [4], as well as in ESR 52.1.0 [5].

[2]https://hg.mozilla.org/releases/mozilla-release/rev/d345b657d381ade5195f1521313ac651618f54a2
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0 ID:20170413192749
[3]https://hg.mozilla.org/releases/mozilla-aurora/rev/105e456d811b8c4616de688e9afdcf0af620b80b
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0 ID:20170418004027
[4]https://hg.mozilla.org/mozilla-central/rev/c0ea5ed7f91a6be996a4a3c5ab25e2cdf6b4377e
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0 ID:20170419030223
[5]https://hg.mozilla.org/releases/mozilla-esr52/rev/3ea0e075203185d7f2d42f439455e97735bd1b20
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0 ID:20170417065206
Status: RESOLVED → VERIFIED
Flags: needinfo?(alice0775)