Closed Bug 1303999 Opened 3 years ago Closed 3 years ago

Crash in mozilla::dom::Blob::ToFile when uploading many small files

Categories

(Core :: DOM: Core & HTML, defect, critical)

46 Branch
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox49 --- wontfix
firefox50 - wontfix
firefox51 - wontfix
firefox52 + wontfix
firefox53 --- affected
firefox54 --- affected

People

(Reporter: santiago.garcia, Unassigned)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:48.0) Gecko/20100101 Firefox/48.0
Build ID: 20160823121617

Steps to reproduce:

Upload many (~1,000) small (1 KB) files. 


Actual results:

Firefox crashes. Not every time, but more often than not.

The crashdump is available at https://crash-stats.mozilla.com/report/index/3983bfec-47d8-44fd-b780-90b902160920

I have created a small project at https://github.com/sgarcialaguna-mms/FirefoxUploadCrash which is basically a stripped-down version of the client and server code with which we initially encountered the bug. The (Django) server basically does nothing other than accept some files, the most relevant port for you guys is presumably in the javascript and templates folders.


Expected results:

Firefox should not crash
Do you have self-contained testcase? Or can you explain how to deploy the testcase you provided?
Flags: needinfo?(santiago.garcia)
I updated the example at https://github.com/sgarcialaguna-mms/FirefoxUploadCrash

To reproduce the issue:

1. Clone the repo
2. Install Python 2.7
3. Run python webserver.py
4. Open testupload.html in Firefox
5. Drop a thousand small files into the dropzone
=> Crash
Severity: normal → critical
Crash Signature: [@ mozilla::dom::Blob::ToFile ]
Keywords: crash
Ok, I'm able to reproduce the crash and there is a regression.
I tested with FF33, no crash, uploading is fast and it took me only ~1 min to upload 1000 1-KB files.

In FF48, it's slow and after 400 files uploaded, the tab crashed.
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=2c86ca
757a1ec2514eb6ac2d47ada5fd29cceac3&tochange=d607fcbd12d493b9bdea164c873448d21257
3f77


Andrea Marchesini — Bug 1237595 - FormData ctor and form submission should create empty Blob/File when a input type=file is not set, r=smaug
Blocks: 1237595
Status: UNCONFIRMED → NEW
Has Regression Range: --- → yes
Has STR: --- → yes
Component: Untriaged → DOM
Ever confirmed: true
Flags: needinfo?(santiago.garcia) → needinfo?(amarchesini)
Keywords: regression
Product: Firefox → Core
Summary: Firefox crashes when uploading many small files → Crash in mozilla::dom::Blob::ToFile when uploading many small files
Version: 48 Branch → 46 Branch
I think there is another side-effect about performance, uploading 1000 files is not smooth, sometimes there are some pauses.
Flags: needinfo?(amarchesini)
NI Andrew to confirm triage here.
Baku: You were needinfo'd since it appears your bug caused this regression. Is there a reason you cleared the needinfo without commenting?
Flags: needinfo?(amarchesini)
I'll let baku comment and follow up with him off-bug.
Flags: needinfo?(overholt)
I don't know why, but my plan was to move the NI to Michael and somehow I just removed my NI flag.
Flags: needinfo?(amarchesini)
Flags: needinfo?(michael)
Michael, I tested this issue and the crash happens in DataTransferItem.cpp here:
https://dxr.mozilla.org/mozilla-central/source/dom/events/DataTransferItem.cpp#280
tracking 52+ for this reproducible crash.
(In reply to Andrea Marchesini [:baku] from comment #11)
> Michael, I tested this issue and the crash happens in DataTransferItem.cpp
> here:
> https://dxr.mozilla.org/mozilla-central/source/dom/events/DataTransferItem.
> cpp#280

I tried to reproduce this failure on win10 (32-bit) and couldn't reproduce a crash in a debug build (the upload completed successfully). I'll try again with win10 (64-bit) today probably. I also tried to reproduce it on linux64 and found a different, but as far as I can tell, unrelated issue related to how we talk to gtk and ask it for drag data.
Flags: needinfo?(michael)
It's actually quite easy to reproduce on linux:

1. create 100 files with a few bits in them.
2. drag them all into the testcase
3. of course e10s-mode.
Flags: needinfo?(michael)
I tested with 1000 1-KB files, crash at each upload.
I still can't reproduce this issue locally (although I am running into issues with dragging 1k files onto a drop area on linux being very very very slow). However, assuming you found the crash place, I inspected the code and I think I found where the issue would be. Is there any chance you could test the code with this patch and let me know if it fixes the issue?
Attachment #8794348 - Flags: review?(amarchesini)
Flags: needinfo?(michael)
Untrack 51 as the volume is very low in 51.
Assignee: nobody → michael
Have you tested the patch? it breaks the DataTransferItem completely. Nothing works with this patch, at least locally:

[Parent 11602] WARNING: 'NS_FAILED(rv) || !data', file /src/dom/events/DataTransferItem.cpp, line 190

When you were testing this issue, were the files empty or not? I don't know if it's important, but in my tests, the files were not empty. 100% reproducible.
(In reply to Andrea Marchesini [:baku] from comment #18)
> Have you tested the patch? it breaks the DataTransferItem completely.
> Nothing works with this patch, at least locally:
> 
> [Parent 11602] WARNING: 'NS_FAILED(rv) || !data', file
> /src/dom/events/DataTransferItem.cpp, line 190
> 
> When you were testing this issue, were the files empty or not? I don't know
> if it's important, but in my tests, the files were not empty. 100%
> reproducible.

That seems wrong. Nothing in this patch should break the DataTransferItem class... It definitely works for me. I am using non-empty files (They all contain the string "Random Text").

I am definitely also getting warnings such as "WARNING: Clipboard data provided by the OS does not match predicted kind:" but I believe that that is due to a different linux-specific bug related to how we read in drag data.

With my debug build of central + the attached patch I am able to consistently upload 156 files so long as I wait long enough for the cursor to change when hovering over (which is a very long time because we seem to have performance issues). I decided to profile the perf issues when uploading many files on linux (at least on my system), and it seems like we spend pretty much all of our time sleeping here: http://searchfox.org/mozilla-central/source/widget/gtk/nsDragService.cpp#1224 (we call this function 2x per file we upload at minimum). I feel like we could do a lot better with that.
I don't see any crashes with this signature on Beta50, untracked. Feel free to nominate for tracking if it spikes again in the future.
Comment on attachment 8794348 [details] [diff] [review]
Check our kind after filling in external data in GetAsFile and GetAsString

Review of attachment 8794348 [details] [diff] [review]:
-----------------------------------------------------------------

We should investigate why the upload is so slow on linux, but I cannot reproduce the crash.
Attachment #8794348 - Flags: review?(amarchesini) → review+
Pushed by michael@thelayzells.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/71ee175c9c00
Check our kind after filling in external data in GetAsFile and GetAsString, r=baku
https://hg.mozilla.org/mozilla-central/rev/71ee175c9c00
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Hi Santiago,
could you please verify this issue is fixed as expected on a latest Nightly build? Thanks!
Flags: needinfo?(santiago.garcia)
(In reply to Gerry Chang [:gchang] from comment #24)
> Hi Santiago,
> could you please verify this issue is fixed as expected on a latest Nightly
> build? Thanks!

Still crashes for me :-(
Flags: needinfo?(santiago.garcia)
michael, still crashing, can you take another look? Thanks!
Status: RESOLVED → REOPENED
Flags: needinfo?(michael)
Resolution: FIXED → ---
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #27)
> michael, still crashing, can you take another look? Thanks!

I managed to reproduce a crash in a debug build on windows (32-bit build, 64-windows). The crash seemed to occur being called from jitcode in js::CompartmentChecker::check().

This crash is due to an assertion failure (MOZ_ASSERT_IF(obj, IsInsideNursery(obj) || !obj->asTenured().isMarked(gc::GRAY))), but I imagine that the assertion failure is probably the root cause of whatever crash actually occurs.

Crash Stack:
	xul.dll!js::CompartmentChecker::check(JSObject * obj) Line 71	C++
 	xul.dll!js::CompartmentChecker::check(const JS::Value & v) Line 94	C++
 	xul.dll!js::assertSameCompartmentDebugOnly<JS::MutableHandle<JS::Value> >(js::ExclusiveContext * cx, const JS::MutableHandle<JS::Value> & t1) Line 177	C++
 	xul.dll!js::GetObjectElementOperation(JSContext * cx, JSOp op, JS::Handle<JSObject *> obj, JS::Handle<JSObject *> receiver, JS::Handle<JS::Value> key, JS::MutableHandle<JS::Value> res) Line 462	C++
 	xul.dll!js::jit::GetPropertyIC::update(JSContext * cx, JS::Handle<JSScript *> outerScript, unsigned int cacheIndex, JS::Handle<JSObject *> obj, JS::Handle<JS::Value> idval, JS::MutableHandle<JS::Value> vp) Line 2283	C++

It seems to me like it's being called from jitcode, and doesn't seem to happen until after many hundred uploads have occured. I'm inclined to think that it may be a problem in the JS engine's generation of jitcode or something along those lines.

The following is the output of DumpJSStack() at the crash site:
0 yb(a = [unavailable], b = [unavailable]) ["file:///C:/Users/mrl/FirefoxUploadCrash/node_modules/jquery/dist/jquery.min.js":4]
1 .ajaxSetup(a = [unavailable], b = [unavailable]) ["file:///C:/Users/mrl/FirefoxUploadCrash/node_modules/jquery/dist/jquery.min.js":4]
2 .ajax(b = undefined) ["file:///C:/Users/mrl/FirefoxUploadCrash/node_modules/jquery/dist/jquery.min.js":4]
    this = function (a,b){return new n.fn.init(a,b)}
3 ._onSend/send("", "success", [object Object]) ["file:///C:/Users/mrl/FirefoxUploadCrash/node_modules/blueimp-file-upload/js/jquery.fileupload.js":906]
    this = [object Object]
4 .Deferred/d.then/</</<("", "success", [object Object]) ["file:///C:/Users/mrl/FirefoxUploadCrash/node_modules/jquery/dist/jquery.min.js":2]
    this = [object Object]
5 n.Callbacks/i() ["file:///C:/Users/mrl/FirefoxUploadCrash/node_modules/jquery/dist/jquery.min.js":2]
6 n.Callbacks/j.fireWith(a = [unavailable], c = [unavailable]) ["file:///C:/Users/mrl/FirefoxUploadCrash/node_modules/jquery/dist/jquery.min.js":2]
7 .Deferred/</e[f[0]]([unavailable], [unavailable], [unavailable]) ["file:///C:/Users/mrl/FirefoxUploadCrash/node_modules/jquery/dist/jquery.min.js":2]
8 n.Callbacks/i() ["file:///C:/Users/mrl/FirefoxUploadCrash/node_modules/jquery/dist/jquery.min.js":2]
9 n.Callbacks/j.fireWith(a = [object Object], c = [object Object],,success,[object Object]) ["file:///C:/Users/mrl/FirefoxUploadCrash/node_modules/jquery/dist/jquery.min.js":2]
    this = [object Object]
10 z(b = 200, c = "OK", d = [object Object], h = "") ["file:///C:/Users/mrl/FirefoxUploadCrash/node_modules/jquery/dist/jquery.min.js":4]
    this = [object Window]
11 .send/c/<([object ProgressEvent]) ["file:///C:/Users/mrl/FirefoxUploadCrash/node_modules/jquery/dist/jquery.min.js":4]
    this = [object XMLHttpRequest]

From looking at this stack, I don't think that this is caused by DataTransfer (though I may be wrong), so I'm inclined to move the problem over to someone in the JS team who would be able to look into the problem more. liz, can you redirect a needinfo to someone who might be able to take a look at this?
Flags: needinfo?(michael) → needinfo?(lhenry)
Naveed, can you help out? Thanks!
Flags: needinfo?(lhenry) → needinfo?(nihsanullah)
Hannes can you please help us determine how to proceed with this stack?
Flags: needinfo?(nihsanullah) → needinfo?(hv1989)
Talked to Jon about this. It is an assertion in the GC that has been added recently and they currently have a bunch of similar bug reports. The grey marking is incorrect. Jonco suggested that a GC person or a DOM person that understands gray marking should look at it. Jon will look for somebody that can move this forward.
Flags: needinfo?(hv1989) → needinfo?(jcoppeard)
Blocks: 1317672
(In reply to Hannes Verschore [:h4writer] from comment #31)
> Talked to Jon about this. It is an assertion in the GC that has been added
> recently and they currently have a bunch of similar bug reports. The grey
> marking is incorrect. Jonco suggested that a GC person or a DOM person that
> understands gray marking should look at it. Jon will look for somebody that
> can move this forward.

Did you find such a person, Jon?
I looked into it myself but wasn't able to reproduce on linux or windows.
You need to create and upload a folder with 1000 small 1-kb files. It crashes randomly (after 400 or 500 files) when uploading the folder in the file dropzone of the testcase testupload.html.
Fresh CR: https://crash-stats.mozilla.com/report/index/b91984d6-ebfb-4002-92ec-610542161123
Assignee: michael → nobody
Assignee: michael → nobody
Jon, do you think bug 1335117 will have an effect here?
Status update: I'm working on these gray marking assertion failures but don't have a fix yet.  Please see the bugs I've filed under bug 1317672.
Not a huge volume, and we're reaching the end of beta52 → wontfix.
I can't find any recent crash reports for either signature.  Maybe we can close this now.
Flags: needinfo?(jcoppeard)
I tried again with Firefox 52.0.2 64 bit and couldn't reproduce the issue either.
Thanks Jon and santiago.garcia for the updates.
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → WORKSFORME
Fix range:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=b63a7e
217db7558be2f863d3e70e2649c11dd060&tochange=9b9d0cfd3fa3c93cf8a9a97c603176ece0b2
66e9

Fixed by:
Jon Coppeard — Bug 1323241 - Only report things as gray if gray marking state is valid r=sfink
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.