Closed Bug 1233481 Opened 9 years ago Closed 6 years ago

Crashed while making userscript for private page.

Categories

Product/Component: Core :: JavaScript Engine (defect)
Version: 44 Branch
Platform: Unspecified
OS: Windows 7
Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED WONTFIX
Tracking Status
firefox44 --- affected
firefox45 --- affected
firefox46 --- affected
firefox47 --- affected

People

(Reporter: the.vbm, Unassigned)

References

Details

(Keywords: crash, Whiteboard: [MemShrink:P2])

Crash Data

Attachments

(2 files)

The userscript adds two iframes from the same website for quicker checking of some info.

Page 1, where the script runs, loads page 2 (shows special token) and page 3 (shows special token 2). This way I get all the info I need on the same page. (Don't ask me why they made separate pages for each token; they don't know either, and it's been like that for ages.)

For some reason the userscript leaks, which leads to these stats:
Process Name: [firefox.exe] PID: [9072] Mem Usage: [2,321.28MB] VM Size: [3,065.2MB] VM Peak: [3,092.38MB]
Crash Signature: [@ js::AutoEnterOOMUnsafeRegion::crash ]
https://crash-stats.mozilla.com/report/index/b9c777f7-5eac-4bb9-8600-956ec2151217

From the signature I presume this is an intentional crash when running out of memory while in a sensitive area of the code. 

1) Is there some public site or code that reproduces the issue?
2) You're sure this is not just the UserScript JS leaking all memory? Because that's what it looks like.

Given that the page is private I don't see what we can do here. 

This was reported on IRC, but the report was missing the information that the JS using up all the memory is in an add-on rather than in a webpage (which would presumably also work with other browsers).
Regarding the signature itself: this is just a rename from CrashAtUnhandlableOOM. We made it into an RAII guard a few months ago to help with automated testing of OOM conditions. These OOM crashes are not actually new.
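
For readers unfamiliar with the guard, here is a minimal C++ sketch of the RAII pattern being described. The class name and message format mirror the real thing, but this is an illustration only, not the actual SpiderMonkey implementation, and growBuffer is a hypothetical caller.

#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Illustrative sketch (not SpiderMonkey source): inside an OOM-unsafe region
// an allocation failure cannot be propagated to the caller, so the guard
// aborts deliberately with a reason string. That abort is what shows up as
// the js::AutoEnterOOMUnsafeRegion::crash signature in crash reports.
class AutoEnterOOMUnsafeRegion {
 public:
  [[noreturn]] void crash(const char* reason) {
    std::fprintf(stderr, "[unhandlable oom] %s\n", reason);
    std::abort();
  }
};

// Hypothetical caller: once we start mutating shared state we can no longer
// back out, so a failed allocation becomes an intentional crash.
void growBuffer(char*& buf, std::size_t newSize) {
  AutoEnterOOMUnsafeRegion oomUnsafe;
  if (char* grown = static_cast<char*>(std::realloc(buf, newSize))) {
    buf = grown;
  } else {
    oomUnsafe.crash("failed to grow buffer");
  }
}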
Vladan, this is a top crasher on Beta44. I just thought I would poke you to see if we can get any help in the investigation here.
Flags: needinfo?(vladan.bugzilla)
This crash is in GC code, so Terrence is a much better choice for this.

vBm: It would be very helpful if you could pare this down into a reproducible test case
Flags: needinfo?(vladan.bugzilla)
(In reply to Vladan Djeric (:vladan) -- please needinfo from comment #4)
> This crash is in GC code, so Terrence is a much better choice for this.
> 
> vBm: It would be very helpful if you could pare this down into a
> reproducible test case

Thanks! Terrence, this is a top crasher on 44 and I am wondering if this is something that has a potential fix that can be uplifted to Beta44.
Flags: needinfo?(terrence)
(In reply to Ritu Kothari (:ritu) from comment #5)
> (In reply to Vladan Djeric (:vladan) -- please needinfo from comment #4)
> > This crash is in GC code, so Terrence is a much better choice for this.
> > 
> > vBm: It would be very helpful if you could pare this down into a
> > reproducible test case
> 
> Thanks! Terrence, this is a top crasher on 44 and I am wondering if this is
> something that has a potential fix that can be uplifted to Beta44.

I don't think so, no. The page has a memory leak, we run out of memory, we crash. This is not something that can be addressed in the browser.
Flags: needinfo?(terrence)
I just got this crash signature when viewing a fairly heavy Google Docs spreadsheet (https://crash-stats.mozilla.com/report/index/849ec6cf-dbc5-429e-808b-a9a912160105) (scripts running). It totally crashed Firefox, but the spreadsheet remains stable in Chrome.
Johnny, Vladan, Terrence: Karen (thanks!) has provided us with a reproducible crash. This crash is ranked #5 on Beta44, I will be happy to uplift a safe fix to moz-beta for you. Could you please get a dev to investigate asap? Appreciate your help in addressing stability issues promptly.
Flags: needinfo?(vladan.bugzilla)
Flags: needinfo?(terrence)
Flags: needinfo?(jst)
(In reply to Ritu Kothari (:ritu) from comment #8)
> Johnny, Vladan, Terrence: Karen (thanks!) has provided us with a
> reproducible crash. This crash is ranked #5 on Beta44, I will be happy to
> uplift a safe fix to moz-beta for you. Could you please get a dev to
> investigate asap? Appreciate your help in addressing stability issues
> promptly.

This signature is an alias for OOM|small. If we fix this stack, the page will still crash somewhere else. I'm not sure what our policy for OOM|small is exactly, but we should do whatever it is we do for those crashes.
Flags: needinfo?(terrence)
Yeah, this does look like OOM|small; there is enough memory in the system overall.

Aaron: can you take a look at this?
Karen: can you share the document URL?
Flags: needinfo?(vladan.bugzilla)
Flags: needinfo?(krudnitski)
Flags: needinfo?(aklotz)
Sent the link to Vlad, Terrence and Aaron via the Google Doc.
Flags: needinfo?(krudnitski)
I'm knee-deep in JS code that I'm not very familiar with.
The |reason| parameter to js::AutoEnterOOMUnsafeRegion::crash is "failed to get a stable hash code", which points to https://dxr.mozilla.org/mozilla-central/source/js/src/gc/Barrier.cpp#129

There is a lot of inlining going on so it's really hard to figure out what's where. Terrence, does this information allow for any new insights?
Flags: needinfo?(aklotz) → needinfo?(terrence)
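To unpack the "failed to get a stable hash code" reason a bit (my reading of the mechanism, not the actual Barrier.cpp code): with a moving GC, an object's address can change, so when an object is used as a hash key the engine records a hash code that stays stable across moves in a side table. If that side-table insertion itself fails under memory pressure there is nothing the caller can do about it, which is exactly the kind of path that funnels into the OOM-unsafe crash. A hypothetical sketch:

#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <new>
#include <unordered_map>

struct GCObject {};  // stand-in for a movable GC thing used as a hash key

// Hypothetical side table assigning each object a hash code that survives
// the object being moved by the GC. The insertion can fail under memory
// pressure, and the caller has no way to report that failure.
class StableHashCodes {
  std::unordered_map<const GCObject*, std::uint64_t> table_;
  std::uint64_t next_ = 1;

 public:
  std::uint64_t getOrAssign(const GCObject* obj) {
    try {
      auto [it, inserted] = table_.try_emplace(obj, next_);
      if (inserted) ++next_;
      return it->second;
    } catch (const std::bad_alloc&) {
      // Mirrors the reason string reported in this comment.
      std::fprintf(stderr, "[unhandlable oom] failed to get a stable hash code\n");
      std::abort();
    }
  }
};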
Running the sheet under Windows, I'm seeing dramatically different behavior: we are frequently spiking up to 2-3GiB of heap unclassified from a baseline of 500MiB or so. I seem to have enough headroom here for the GC to kick in and fix it before crashing, but I can see why people are seeing OOMs here. The spikes are more or less instantaneous; <1 refresh cycle in the Windows process monitor.

(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #12)
> I'm knee-deep in JS code that I'm not very familiar with.
> The |reason| parameter to js::AutoEnterOOMUnsafeRegion::crash is "failed to
> get a stable hash code", which points to
> https://dxr.mozilla.org/mozilla-central/source/js/src/gc/Barrier.cpp#129
> 
> There is a lot of inlining going on so it's really hard to figure out what's
> where. Terrence, does this information allow for any new insights?

That's still a red herring. The actual problem is whatever is in that 2-3GiB of heap-unclassified.

Eric, this seems like something that should be on memshrink's radar.
Flags: needinfo?(terrence) → needinfo?(erahm)
Whiteboard: [memshrink]
(In reply to Terrence Cole [:terrence] from comment #13)
> Running the sheet under Windows, I'm seeing dramatically different behavior:
> we are frequently spiking up to 2-3GiB of heap unclassified from a baseline
> of 500MiB or so. I seem to have enough headroom here for the GC to kick in
> and fix it before crashing, but I can see why people are seeing OOMs here.
> The spikes are more or less instantaneous; <1 refresh cycle in the windows
> process monitor.
> 
> That's still a red-herring. The actual problem is whatever is in that 2-3GiB
> of heap-unclassified.
> 
> Eric, this seems like something that should be on memshrink's radar.

Terrence, any chance you can do a DMD run? It would be helpful to see a stack of the allocated memory.
Flags: needinfo?(erahm) → needinfo?(terrence)
I have a hunch this is Ion going nuts during compilation -- I filed bug 1156318 a while back after seeing large DMD stacks in that code path -- but we'll need stacks to verify.
I'm highly skeptical of that explanation: there's a big difference between 50MiB and 2GiB. Unlike C++, JS compilation overhead should be proportional to script size and not super-linear or exponential. We barely use that much memory when linking all of Firefox, and Ion doesn't even compile everything. In particular, it does not compile large functions, and it only compiles once the loop or call counters get above 10,000. This is highly unlikely to happen to large volumes of code in less than one second. Even more tellingly, very little, if any, of that code is platform-specific, so why are we seeing platform effects?

No, I think this is much more likely to be in a platform-dependent part of the engine that regularly allocates large chunks of memory. Maybe graphics?
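
For context on the counter argument above, a toy sketch of warm-up-gated compilation (the structure is illustrative and the threshold is just the figure quoted in this comment, not real Ion code): a script keeps running in the interpreter until its call/loop counter crosses the threshold, so compilation cost tracks the hot subset of the code rather than total script size.

#include <cstdint>

// Toy model of warm-up-counter gated JIT compilation: nothing gets compiled
// until it has run kCompileThreshold times, so large volumes of code that
// only run briefly never pay the optimizing-compiler cost.
struct Script {
  std::uint32_t warmUpCount = 0;
  bool compiled = false;
};

constexpr std::uint32_t kCompileThreshold = 10000;  // figure quoted above

void runOnce(Script& script) {
  if (!script.compiled && ++script.warmUpCount >= kCompileThreshold) {
    script.compiled = true;  // only now would an optimizing JIT kick in
  }
  // ...interpret, or run the compiled code...
}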
DMD is substantially slower, so I just left it to run all week and this weekend. It has managed to accumulate 800MiB of reserved and 2GiB of mapped memory in that time. Clicking on "minimize memory usage" does not affect these sizes, so this is all either in use or leaked. I'll attach the memory and DMD logs.

A quick look at the DMD logs shows that most of the memory is over in net. ~6,507 live ProxiedChannel2's; ~6,150 live nsHttpHeaderArrays; ~2,216 live net::CacheEntry::HashingKeysWithStorage; ~2,111 live nsStandardURLs. Given that this is all in one page, I'm going to guess this is a leak of some sort. Although there's only 800MiB reserved, I'm guessing that fragmentation in the C heap is absolutely murderous with all of these small allocations sitting around for long periods. The difference I'm seeing in max commit size might be down to --enable-jemalloc vs --disable-jemalloc for the DMD build.

Note: it was extremely windy this weekend in Santa Barbara, so the network was up and down a ton. I'd be particularly suspicious of leaks in our error paths here.
Flags: needinfo?(terrence) → needinfo?(erahm)
(In reply to Terrence Cole [:terrence] from comment #17)
> Created attachment 8714366 [details]
> bug1233481-dmd-report.txt.gz
> 
> DMD is substantially slower, so I just left it to run all week and this
> weekend. It has managed to accumulate 800MiB of reserved and 2GiB of mapped
> memory in that time. Clicking on "minimize memory usage" does not affect
> these sizes, so this is all either in use or leaked. I'll attach the memory
> and DMD logs.

I think we're looking at 2 (maybe 3) different issues here.
Issue 1:
- Comment 0: high memory usage with a user script, proposed to be a leak; OOM during GC
- Comment 7: maybe the same issue with a Google doc
Issue 2:
- Comment 17: high heap-unclassified during a long-lived run

> A quick look at the DMD logs shows that most of the memory is over in net.
> ~6,507 live ProxiedChannel2's; ~6,150 live nsHttpHeaderArrays; ~2,216 live
> net::CacheEntry::HashingKeysWithStorage; ~2,111 live nSStandardURL's. Given
> that this is all in one page, I'm going to guess this is a leak of some
> sort. Although there's only 800MiB reserved, I'm guessing that fragmentation
> in the C heap is absolutely murderous with all of these small allocations
> sitting around for long periods. The difference I'm seeing in max commit
> size might be down to --enable-jemalloc vs --disable-jemalloc for the DMD
> build. 

Can you file a separate memshrink bug for this? Probably against networking.

So back to the original issue, if we can get a DMD report while the memory is spiking I can look into this further. Or if I can get access to the Google doc I'll do a DMD run on Windows.
Flags: needinfo?(erahm) → needinfo?(terrence)
Nathan can you do a DMD run on Windows?
Whiteboard: [memshrink] → [MemShrink:P2]
(In reply to Eric Rahm [:erahm] from comment #20)
> Nathan can you do a DMD run on Windows?

Sorry wrong bug!
Filed bug 1248657 to track the network heap-unclassified issue. It's odd that the other issue disappears for me when I'm running under DMD. Maybe it's timing related?
Flags: needinfo?(terrence)
Keywords: crash
Status: UNCONFIRMED → NEW
Ever confirmed: true
Depends on: 1248809
I looked at a number of these crashes, and they are all over the place. This is just how the JS engine indicates an OOM failure. I filed bug 1233481 so that this single signature will be split up based on whatever the next frame is.
(In reply to Andrew McCreight [:mccr8] from comment #23)
> I filed bug 1233481 so that this single signature will be split up based on whatever the next frame is.

I think mccr8 meant bug 1248809.
FWIW, I filed bug 1249739 for the memory usage of the spreadsheet from comment 7.
Hello!

12,473 crashes in the past week per https://crash-stats.mozilla.com/report/list?product=Firefox&signature=js%3A%3AAutoEnterOOMUnsafeRegion%3A%3Acrash

Seems to affect all the way from Release to Nightly, updating tags accordingly.

Ended up here from https://crash-stats.mozilla.com/report/index/452669fe-c0b6-411f-b7cf-c84c52160217

Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	xul.dll 	js::AutoEnterOOMUnsafeRegion::crash(char const*) 	js/src/jscntxt.cpp
1 	xul.dll 	js::irregexp::InterpretedRegExpMacroAssembler::Expand() 	js/src/irregexp/RegExpMacroAssembler.cpp
2 	xul.dll 	js::irregexp::InterpretedRegExpMacroAssembler::EmitOrLink(js::jit::Label*) 	js/src/irregexp/RegExpMacroAssembler.cpp
3 	xul.dll 	js::irregexp::InterpretedRegExpMacroAssembler::CheckNotCharacterAfterAnd(unsigned int, unsigned int, js::jit::Label*) 	js/src/irregexp/RegExpMacroAssembler.cpp
4 	xul.dll 	ShortCutEmitCharacterPair 	js/src/irregexp/RegExpEngine.cpp
5 	xul.dll 	EmitAtomLetter 	js/src/irregexp/RegExpEngine.cpp
6 	xul.dll 	js::irregexp::TextNode::Emit(js::irregexp::RegExpCompiler*, js::irregexp::Trace*) 	js/src/irregexp/RegExpEngine.cpp
7 		@0x1729ff 	

Thanks!
Alex
See Also: → 1257387
Flags: needinfo?(jstenback+bmo)
Closing because no crashes have been reported in the last 12 weeks.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX