Firefox occasionally crashes on bandwidthplace.com, which uses HTML5

Status

Status: RESOLVED FIXED
Type: defect
Priority: --
Severity: critical
Opened: 3 years ago
Closed: 3 years ago

People

(Reporter: itiel_yn8, Assigned: erahm)

Tracking

Version: 47 Branch
Points: ---
Firefox Tracking Flags: (Not tracked)

Details

(Whiteboard: [MemShrink:P3], crash signature)

Attachments

(5 attachments)

The crash is not 100% reproducible.
A user posted on the Mozilla Israel community forum (http://community.mozilla.org.il/viewtopic.php?f=9&t=12493) complaining that Firefox has lately been slow and crashing on Firefox 47.0.1.
I suggested that he create a new profile and check whether the issues persist; after resetting Firefox he said the browser's performance improved, but it still crashes while running a speed test on http://www.bandwidthplace.com/.
His crash reports are all the same, all claiming to be OOM crashes, while it seems they're not:
https://crash-stats.mozilla.com/report/index/bp-b193a34c-6937-4475-81a8-47cf92160715
https://crash-stats.mozilla.com/report/index/bp-d82ee857-1e98-4fca-bf7e-410402160715
https://crash-stats.mozilla.com/report/index/bp-b6f8f23a-2b43-4bf8-826c-193c42160630
[@ OOM | large | NS_ABORT_OOM | NS_ConvertUTF16toUTF8::NS_ConvertUTF16toUTF8 ]
(See the Available Physical Memory, Available Page File and System Memory Use Percentage values)

I've tried speedtesting my internet connection with that website, and I also got a very similar crash:
https://crash-stats.mozilla.com/report/index/2d91608d-3f19-474b-a0ad-53b5d2160715
After Firefox crashed, I ran the test again twice without any crashes this time.

Note that the site uses HTML5 for the test.
Unable to reproduce the bug on OS X Yosemite 10.10.2 using Nightly build 50.0a1 (2016-07-18). Ran several tests at bandwidthplace.com and didn't see any crashes.
I've tried about another 20 speedtests on this site and no crashes occurred. Using Windows 10.

But doesn't it mean something that the same crash signature happened to both of us (the user and me), and only on this site?
I'm unable to crash Firefox on Windows 10 using the latest Nightly 50 or Firefox 47.0.1 by taking the speedtests from http://www.bandwidthplace.com/.

Waldo, when possible, could you please take a look at this?
Severity: normal → critical
Crash Signature: [@ OOM | large | NS_ABORT_OOM | NS_ConvertUTF16toUTF8::NS_ConvertUTF16toUTF8 | js::GetProperty ] [@ OOM | large | NS_ABORT_OOM | NS_ConvertUTF16toUTF8::NS_ConvertUTF16toUTF8 | icudt56.dll@0x3fffff ]
Flags: needinfo?(jwalden+bmo)
ItielMaN, have you experienced any crashes since then (could you try to reproduce the crash on the latest Firefox 48)?

From what I see in the Mozilla crash stats, the crashes seem specific to Firefox 47.
Flags: needinfo?(itiel_yn8)
FWIW, the stacks here are garbled nonsense, so the presence of js::GetProperty or ICU in the stack trace doesn't mean much.  Without some reproducible series of steps to investigate this, it's not really possible to figure out what went wrong here, or where it went wrong, or how to fix it.  :-(
Flags: needinfo?(jwalden+bmo)
(In reply to Simona B [:simonab] from comment #4)
> ItielMaN, have you experienced any crashes since (could you try to reproduce
> the crash on the latest Firefox 48)? 
> 
> From what I see in the mozilla crash stats, the crashes seems specific to
> Firefox 47.

I myself don't use this site often; I tested it only because the user was having trouble with it.
Anyway, I've tried speedtesting again multiple times without any crash.

I'll ask the user if Firefox is still crashing in v48 and report back here once he answers.
Flags: needinfo?(itiel_yn8)
(In reply to Simona B [:simonab] from comment #4)
> ItielMaN, have you experienced any crashes since (could you try to reproduce
> the crash on the latest Firefox 48)? 
> 
> From what I see in the mozilla crash stats, the crashes seems specific to
> Firefox 47.

Well what do you know, the user says that the crashes are gone in Firefox 48.
You can close this bug.
Status: UNCONFIRMED → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME
The same user reports that Firefox is crashing on the same website again.
The latest crash report is the same as the previous ones:
https://crash-stats.mozilla.com/report/index/f9e6e7f1-5303-40ac-92aa-7918e2160816
Status: RESOLVED → UNCONFIRMED
Resolution: WORKSFORME → ---
Eric, any idea how we'd reason about where the 48MB failed allocation from the crash in comment 8 would be coming from?
Flags: needinfo?(erahm)
Something is seriously screwy with this site. One time I managed to get a Nightly build spiking a CPU core and chewing up 4+ GB of RAM and climbing. But it doesn't reproduce every time.

I've been trying to catch things in about:memory when things are acting up. One observation from a non-broken run when I do a before/after diff is:
─54.00 MB (19.36%) -- string(length=50331648, copies=1, "987112948676987112948676987112948676987112948676987112948676987112948676987112948676987

I wonder if we're doing badly with strings that hit a pathological case at times. I still have no idea how to make this happen reliably, but I can certainly see how an OOM could occur here depending on the system.
50,331,648 bytes (the OOM allocation amount from comment 8) *exactly* matches that 54MB string showing up in about:memory. So that's apparently the allocation that's dying here.
I also see a whole lot of image memory usage while the test is running along the lines of:
│   │  │  ├───7.82 MB (01.89%) ── image(3868x3868, http://washington1.bandwidthplace.com/static/8192.jpg?1473269832664)/source

Where the numbers after the ? vary.
Not sure if that matters, but it doesn't match the OOM allocation size in this crash:
https://crash-stats.mozilla.com/report/index/b193a34c-6937-4475-81a8-47cf92160715
(but it does match the other 2 crash reports in comment 1)
What would be really nice would be to get an about:memory snapshot while the browser is in the bad state. I unfortunately missed my chance last go-around :(
I managed to hit it again. Here's an about:memory report.
The CPU spike eventually went away. Here's an about:memory report from afterwards.
(In reply to Ryan VanderMeulen [:RyanVM] from comment #18)
> Created attachment 8788979 [details]
> about:memory report from after the content process stopped hanging
> 
> The CPU spike eventually went away. Here's an about:memory report from
> afterwards.

Where in the world did this come from?!?
ghost/window(http://web.archive.org/web/*/http://psx-scene.com/*)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #19)
> (In reply to Ryan VanderMeulen [:RyanVM] from comment #18)
> > Created attachment 8788979 [details]
> > about:memory report from after the content process stopped hanging
> > 
> > The CPU spike eventually went away. Here's an about:memory report from
> > afterwards.
> 
> Where in the world did this come from?!?
> ghost/window(http://web.archive.org/web/*/http://psx-scene.com/*)

Only place I've seen that is bug 1282322.
(In reply to Eric Rahm [:erahm] from comment #20)
> Only place I've seen that is bug 1282322.

That's, umm, interesting! Also, depending on when I collect the memory report while the test is running, I can get a pretty high heap-unclassified number as well:
├───72.39 MB (35.82%) ── heap-unclassified
(In reply to Ryan VanderMeulen [:RyanVM] from comment #9)
> Eric, any idea how we'd reason about what the 48MB failed allocation from
> the crash in comment 8 on would be?

Well they're creating ginormous strings. I don't think we should be crashing though, that ought to be fallible.

(In reply to Ryan VanderMeulen [:RyanVM] from comment #12)
> I also see a whole lot of image memory usage while the test is running along
> the lines of:
> │   │  │  ├───7.82 MB (01.89%) ── image(3868x3868,
> http://washington1.bandwidthplace.com/static/8192.jpg?1473269832664)/source
> 
> Where the numbers after the ? vary.

This comes from the download portion:

>  var url = prefix + images[i] + postfix + '?' + start;
>  var item;
>  if (isiOS) {
>    item = document.createElement('iframe');
>  } else {
>    item = document.createElement('img');
>  }
>  var t = $(item).hide();
>  $('body').append(t);
>  item.onload = (function (i, start, t) {
>    return function () {
>      download.loadCallback(i, start);
>      t.remove();
>    }
>  }) (i, start, t);
>  item.src = url;
>  download.items.push(t);

To test download speed they repeatedly download an image, create an array of decoded images and use the onload callback to tell when the download has completed. They're kind enough to remove the image after loading it, but it still remains in the download.items array so it sticks around. The '?' portion is probably for cache-busting.

Then for upload we have:

>  upload.xhr = getXHR();
>  upload.xhr.open('POST', _BACKEND_SERVER + 'uploader/upload.cgi?sid=' + sid, true);
>  upload.xhr.onload = function () {
>    upload.pollxhr.abort();
>    clearTimeout(upload.timeout);
>    stopAnimation();
>    upload.first = true;
>    upload.start();
>  };
>  upload.bin = upload.bin || upload.generate();
>  upload.xhr.send(upload.bin);
> ...
> generate: function () {
>   var data = Math.floor((Math.random() * 1000000000000) + 100000000) + '';
>   var iterations = 11;
>   var i;
>   for (i = 0; i < iterations; i++) {
>     data += data.concat(data.concat(data));
>   }
>   return data;
> }

So yeah, they create a ginormous string to upload via XHR. On the plus side they only do it once, although they build it iteratively, so there will be a bunch of temp copies along the way.

Overall it's not clear to me why we're converting from UTF16 to UTF8 at all; maybe it's an XHR thing where we take our wide JS string and shrink it down when calling xhr.send?

(In reply to Ryan VanderMeulen [:RyanVM] from comment #21)
> Also, depending on when I collect the memory
> report while the test is running, I can get a pretty high heap-unclassified
> number as well:
> ├───72.39 MB (35.82%) ── heap-unclassified

There are some known issues with XHR memory reporting; I wouldn't be surprised if this is related.
Flags: needinfo?(erahm)
Whiteboard: [MemShrink]
The CPU spike would make sense for when we're building up the gigantic string (I think we're looking at something like 33 temp strings, each probably requiring growing the buffer). Then, when we send the XHRs for upload, we're looking at converting a 50MiB string from UTF16 -> UTF8 each time, which requires processing the string twice, a new allocation, and roughly making a copy. Also there might be the cost of transferring data to the parent process (not sure if that's required or not).
And the OOM itself is probably coming from the UTF16->UTF8 conversion in GetAsStream [1] when preparing to send the data. I think we could make that fallible at least.

[1] https://dxr.mozilla.org/mozilla-central/rev/91c2b9d5c1354ca79e5b174591dbb03b32b15bbf/dom/xhr/XMLHttpRequestMainThread.cpp#2271
(In reply to Eric Rahm [:erahm] from comment #22)
> So yeah they create a ginormous string to upload via xhr. On the plus side
> they only do it once, although they iteratively build it so there will be a
> bunch of temp copies doing that.

The JS engine's string implementation does do intelligent things here, so if you do |a + b|, internally there will be a "string" that just has a pointer to |a| and |b|.  So there might not be as many temp copies as you think.
Whiteboard: [MemShrink] → [MemShrink:P3]
(In reply to Nathan Froyd [:froydnj] from comment #25)
> (In reply to Eric Rahm [:erahm] from comment #22)
> > So yeah they create a ginormous string to upload via xhr. On the plus side
> > they only do it once, although they iteratively build it so there will be a
> > bunch of temp copies doing that.
> 
> The JS engine's string implementation does do intelligent things here, so if
> you do |a + b|, internally there will be a "string" that just has a pointer
> to |a| and |b|.  So there might not be as many temp copies as you think.

I'm doubtful that works even with a += a.concat(a.concat(a)), but would be impressed if it did!
When sending a string via XHR, use fallible conversion to avoid OOMing if
content provides a particularly large string.
Attachment #8789507 - Flags: review?(amarchesini)
Assignee: nobody → erahm
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Comment on attachment 8789507 [details] [diff] [review]
Use fallible unicode conversion in XHR

Review of attachment 8789507 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm, but can you do the same here as well: https://dxr.mozilla.org/mozilla-central/source/dom/xhr/XMLHttpRequestMainThread.cpp#2232

::: dom/xhr/XMLHttpRequestMainThread.cpp
@@ +2267,5 @@
>  {
>    aContentType.AssignLiteral("text/plain");
>    aCharset.AssignLiteral("UTF-8");
>  
> +  nsCString converted;

nsAutoCString
Attachment #8789507 - Flags: review?(amarchesini) → review+
(In reply to Andrea Marchesini [:baku] from comment #28)
> Comment on attachment 8789507 [details] [diff] [review]
> Use fallible unicode conversion in XHR
> 
> Review of attachment 8789507 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> lgtm, but can you do the same here as well:
> https://dxr.mozilla.org/mozilla-central/source/dom/xhr/
> XMLHttpRequestMainThread.cpp#2232

Sure I'll update that as well.

> ::: dom/xhr/XMLHttpRequestMainThread.cpp
> @@ +2267,5 @@
> >  {
> >    aContentType.AssignLiteral("text/plain");
> >    aCharset.AssignLiteral("UTF-8");
> >  
> > +  nsCString converted;
> 
> nsAutoCString

Will change before landing.
We think this will fix the crash (it's possible it'll just shift the signature though), but currently that's just speculation.

RyanVM has offered to follow up with the reporter once this hits Nightly; until then let's keep it open.
Flags: needinfo?(ryanvm)
Keywords: leave-open
(In reply to Eric Rahm [:erahm] from comment #31)
> We think this will fix the crash (it's possible it'll just shift the
> signature though), but currently that's just speculation.
> 
> RyanVM has offered to follow up with the reporter once this hits nightly,
> until then lets keep it open.

Technically I think you guys are better suited than me (and the user who reported the crash in the first place) to verify that the bug is indeed fixed, since we're not experts in about:memory and all that stuff.
But sure, just tell me when it lands in Nightly and we'll check.

BTW, could this patch theoretically also fix other crashes related to big strings?

Thanks!
Flags: needinfo?(erahm)
Depends on: 1301742
Anecdotally, Task Manager now appears to show a lower memory spike while running the test than it did previously. Would be great to have some more testing, though :)
Flags: needinfo?(ryanvm) → needinfo?(itiel_yn8)
(In reply to ItielMaN from comment #32)
> Technically I think you guys are better suited than me (and the user that
> reported the crash in the first place) to verify that the bug is indeed
> fixed, since we're not that experts in about:memory and all that stuff.
> But sure, just tell me when it'll land in Nightly and we'll check.

I wasn't able to reproduce the crash, but am fairly certain this will fix it given the crash signature. It would be great to get feedback from someone who experienced the crash, but it's not absolutely necessary.

> BTW, could this patch theoretically fix also other crashes with big-strings
> related?

It could potentially help in other situations where large strings are uploaded via XMLHttpRequest.
Flags: needinfo?(erahm)
(In reply to Eric Rahm [:erahm] from comment #35)
> I wasn't able to reproduce the crash, but am fairly certain this will fix it
> given the crash signature. It would be great to get feedback from someone
> who experienced the crash but not absolutely necessary. 

Has the patch landed on Nightly already?
Flags: needinfo?(itiel_yn8)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #37)
> Yes, comment 33.

Okay, I didn't know that meant the patch had landed.
I'll contact the crash's original reporter for testing.
Running with my patch from bug 1301742, I'm seeing better CPU performance as well.
Let's go ahead and close this; if the reporter still sees issues we can reopen.
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Keywords: leave-open
Resolution: --- → FIXED