Closed Bug 1029671 Opened 10 years ago Closed 9 years ago

Large OOM in jArray<T>::newJArray with NoScript & Session Manager add-on

Categories

(Core :: DOM: HTML Parser, defect)

Platform: x86_64
OS: Windows 8.1
Type: defect
Priority: Not set
Severity: critical

Tracking


Status: RESOLVED FIXED
Target Milestone: mozilla43
Tracking Status
firefox39 + wontfix
firefox40 + wontfix
firefox41 + wontfix
firefox42 + wontfix
firefox43 --- fixed

People

(Reporter: lagu, Assigned: hsivonen)

References

Details

(Keywords: crash, crashreportid, Whiteboard: [MemShrink:P1][fixed by dependencies])

Crash Data

Attachments

(4 files)

User Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0 (Beta/Release)
Build ID: 20140623030201

Steps to reproduce:

I was visiting several Google web pages and it crashed my Firefox several times over the last few days.


Actual results:

https://crash-stats.mozilla.com/report/index/a3478cc7-04f9-4bf8-a3a2-b6e5c2140624
https://crash-stats.mozilla.com/report/index/77549744-540f-4f81-871f-69ade2140623
https://crash-stats.mozilla.com/report/index/7c994f41-ec14-42a1-acd0-f7ce82140624
Crash Signature: [@ OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | jArray<wchar_t, int>::newJArray(int) ]
Severity: normal → critical
Keywords: crashreportid
How long have these crashes been happening?
Yesterday again :(
https://crash-stats.mozilla.com/report/index/7dfb0e64-f3b3-444a-b7a8-c80642140704

I've decided to use Chrome for Google pages, because I don't like it crashing my Firefox :(
(In reply to Lars Gusowski [:lagu] from comment #3)
> Yesterday again :(

When did these crashes start? Did they start with Firefox 33 or an earlier version? 

Does the crash still happen if you run Firefox in safe mode? -> http://mzl.la/MwuO4X
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #4)
> (In reply to Lars Gusowski [:lagu] from comment #3)
> > Yesterday again :(
> 
> When did these crashes start? Did they start with Firefox 33 or an earlier
> version? 

Not sure whether it started in 32 with the activation of the new cache, or in Firefox 33.
Today https://crash-stats.mozilla.com/report/index/aafad701-4418-4d9d-900e-4a4bb2140705

The Firefox process was 1.2 GB.

I will try to get it to crash in safe mode.
So far I haven't been able to crash it in safe mode.
(In reply to Lars Gusowski [:lagu] from comment #6)
> So far I haven't been able to crash it in safe mode.

Thanks. It's likely that one or more of your installed extensions is causing this crash. To troubleshoot further, run Firefox in normal mode with all your add-ons disabled, then turn them back on one by one until the crash returns. The last add-on enabled is likely the one causing your crash.
That's what I did this morning: restarted the browser and disabled all extensions.
If an extension causes this issue, could we close the bug as WONTFIX?
Keywords: crash
If this crash is found to be caused by a particular add-on we should resolve this bug as INVALID. However knowing the add-on is useful so the add-on developer can be informed and so we can investigate potentially blocking the add-on if this problem becomes explosive.
Component: Untriaged → HTML: Parser
Product: Firefox → Core
I will try to identify the add-on.
https://crash-stats.mozilla.com/report/index/5c531a51-c159-41b0-9fae-e37f92140706

I reduced the number of extensions; if it is an extension, then it's one of the following:

Adblock Plus	2.6.3	true	{d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d}
Current Pushlog	2.0	true	jid1-WwVLR26pEIqm1A@jetpack
Firebug	2.0.1b1	true	firebug@software.joehewitt.com
Gameforge BugHelper Addon	3.2.2	true	gfbughelper@p3n.ch
Greasemonkey	1.15	true	{e4a8a97b-f2ed-450b-b12d-ee082ba24781}
KeeFox	1.4.2b3	true	keefox@chris.tomlinson
NoScript	2.6.8.32rc1	true	{73a6fe31-595d-460b-a920-fcc0f8843232}
Stylish	1.4.3	true	{46551EC9-40F0-4e47-8E18-8E5CF550CFB8}
userChromeJS	1.5	true	userChromeJS@mozdev.org
Chances are there's an add-on that fails to close an attribute value, a comment, or a quoted string in a doctype, so an excessively large part of the page ends up in a buffer meant for attribute values, comments, or doctype strings, and Firefox runs out of memory when growing that buffer.
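For illustration, injected markup like the following (a hypothetical example, not taken from any specific add-on) would trigger that failure mode; note the missing closing quote on the attribute value:

<div class="injected-by-addon>

From that point on, the tokenizer consumes the entire rest of the document (tags, text, everything) as attribute-value characters, so the attribute-value buffer keeps growing until the multi-megabyte allocation fails.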
That sounds like the "old" memory-management problem with Adblock Plus.
Yesterday I disabled the next batch of add-ons; we'll see whether the crashes continue or whether I've deactivated the bad one.

https://blog.mozilla.org/nnethercote/2014/05/14/adblock-pluss-effect-on-firefoxs-memory-usage/
OK, as far as I can see it isn't Adblock, because I disabled it days ago.
The crashes stopped after disabling NoScript.
Thanks Lars. Feel free to report this to NoScript developers.
Summary: Crash Report [@ OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | jArray<wchar_t, int>::newJArray(int) ] → Crash @ OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | jArray<wchar_t, int>::newJArray(int) with NoScript add-on
Hello Anthony!

This seems to be alive and kicking on beta...

Report ID 	Date Submitted
bp-f7ac9898-ce20-4c01-a0c4-accab2150315 	15/03/2015 01:50 p.m.

Let me know if there's anything needed from the affected system.

Thanks!
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(anthony.s.hughes)
Version: 33 Branch → 37 Branch
I disagree, Alex.

These crashes have been occurring for a really long time and have *never* gone away. However, in terms of total volume this signature accounts for just 0.08% of the crashes reported last week and has actually been in decline. If you have NoScript installed you could try disabling it; that seemed to work for Lars. Unfortunately, there is nothing we can do about this crash right now.
Flags: needinfo?(anthony.s.hughes)
Keywords: crashreportid
Version: 37 Branch → Trunk
At the moment I don't have any problems and don't get the crash, regardless of whether I browse with NoScript enabled or disabled.
(In reply to alex_mayorga from comment #16)
> Hello Anthony!
> 
> This seems to be alive and kicking on beta...
> 
> Report ID 	Date Submitted
> bp-f7ac9898-ce20-4c01-a0c4-accab2150315 	15/03/2015 01:50 p.m.
> 
> Let me know if there's anything needed from the affected system.
> 
> Thanks!

If I look at that report, NoScript is not installed.
There are Adblock Plus and Evernote, though?
It crashes for me with this crash signature on this site: http://trowbotham.com/bug_tests/iloop.php
The cause is Session Manager in my case.

https://crash-stats.mozilla.com/report/index/e5dbcb47-b9fc-47af-9c2b-3db862150508
Keywords: crashreportid
Summary: Crash @ OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | jArray<wchar_t, int>::newJArray(int) with NoScript add-on → Crash @ OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | jArray<wchar_t, int>::newJArray(int) with NoScript & Session Manager add-on
Flags: needinfo?(morac99-firefox2)
This showed up pretty high on KaiRo's Top Crash Scores for 39.0b1.

The allocations here can be pretty large, in the multi-megabytes. Our usual approach is to make those allocations fallible, but that seems difficult in this case. There's no return value checking for several layers up the stack.

Henri, is there anything we can do here?
Flags: needinfo?(hsivonen)
If fallible is not an option, perhaps we could use a 1-meg segmented array.
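A rough sketch of that idea in C++, with hypothetical names (this is not the actual jArray code): grow the buffer in fixed 1 MiB segments, so no growth step ever needs a large contiguous allocation, and report OOM to the caller instead of aborting.

#include <cstddef>
#include <cstring>
#include <memory>
#include <new>
#include <vector>

// Hypothetical segmented buffer; a sketch, not the real parser code.
class SegmentedCharBuffer {
  static const size_t kSegmentChars = (1024 * 1024) / sizeof(char16_t);  // 1 MiB per segment
  std::vector<std::unique_ptr<char16_t[]>> mSegments;
  size_t mLength = 0;

 public:
  // Returns false on OOM instead of aborting the process.
  bool Append(const char16_t* aChars, size_t aCount) {
    while (aCount > 0) {
      size_t offset = mLength % kSegmentChars;
      if (offset == 0) {
        // Each growth step allocates only one 1 MiB segment.
        char16_t* seg = new (std::nothrow) char16_t[kSegmentChars];
        if (!seg) {
          return false;  // fallible: the caller can stop the parse
        }
        mSegments.emplace_back(seg);
      }
      size_t n = kSegmentChars - offset;
      if (n > aCount) {
        n = aCount;
      }
      std::memcpy(mSegments.back().get() + offset, aChars, n * sizeof(char16_t));
      mLength += n;
      aChars += n;
      aCount -= n;
    }
    return true;
  }

  size_t Length() const { return mLength; }
};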
Whiteboard: [MemShrink]
[Tracking Requested - why for this release]: This is one of the top crashes on 39b2
#26 crash on 39b2 with 688/139712 crashes. Here's hoping we can figure out some workaround (or, if it's a specific add-on, get them to fix their issue).
FWIW, in the 39 betas I am not seeing any connection to NoScript or Session Manager. This even happens on "vanilla" installations.
715 crashes on 39b1 (~31% of total crashes), 221 crashes on 39b2 (9.7%). Also, it appears this crash was even happening as far back as Firefox 15, though only once.
Whiteboard: [MemShrink] → [MemShrink:P1]
(In reply to David Major [:dmajor] from comment #22)
> This showed up pretty high on KaiRo's Top Crash Scores for 39.0b1.
> 
> The allocations here can be pretty large, in the multi-megabytes. Our usual
> approach is to make those allocations fallible, but that seems difficult in
> this case. There's no return value checking for several layers up the stack.
> 
> Henri, is there anything we can do here?

I guess it's necessary to make these allocations fallible and then, instead of propagating the error normally, do something similar to http://mxr.mozilla.org/mozilla-central/source/parser/html/nsHtml5TreeBuilderHSupplement.h#67.

That should be relatively doable in the comment 20 and comment 21 cases, which are different from the cases that were originally reported. The new crashes are caused by giant text nodes rather than giant attribute values.

I'll try to find the time to attempt a fix later this week. Leaving needinfo uncleared so I don't forget.
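The pattern referenced at the MXR link above works roughly like this; a minimal, self-contained C++ sketch with simplified stand-in types and hypothetical names, not the real tree-builder code:

#include <cstddef>
#include <cstring>
#include <new>

// Simplified stand-ins for Gecko's nsresult so the sketch is self-contained.
// (Real NS_FAILED checks the high bit; this simplification suffices here.)
using nsresult = unsigned int;
const nsresult NS_OK = 0;
const nsresult NS_ERROR_OUT_OF_MEMORY = 0x8007000e;
inline bool NS_FAILED(nsresult aResult) { return aResult != NS_OK; }

class TreeBuilderSketch {
  nsresult mBrokenReason = NS_OK;  // NS_OK means "not broken"

 public:
  bool IsBroken() const { return NS_FAILED(mBrokenReason); }

  void MarkAsBroken(nsresult aReason) { mBrokenReason = aReason; }

  // No nsresult return value: on OOM the builder marks itself broken
  // instead of forcing every caller up the stack to check a result.
  void AppendCharacters(const char16_t* aChars, size_t aLength) {
    if (IsBroken()) {
      return;  // once broken, later calls become no-ops
    }
    char16_t* copy = new (std::nothrow) char16_t[aLength];
    if (!copy) {
      MarkAsBroken(NS_ERROR_OUT_OF_MEMORY);
      return;
    }
    std::memcpy(copy, aChars, aLength * sizeof(char16_t));
    // ... the real code would queue `copy` as a tree operation;
    // freed here only to keep the sketch leak-free.
    delete[] copy;
  }
};

The parser driver then polls IsBroken() at a safe point in each iteration and terminates the parse gracefully, so the OOM surfaces as a stopped parse rather than a crash.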
I thought about this more. The right fix is to fix bug 489820 as the first step and then to reduce the number of distinct buffers in the parser to keep the overall memory use in check.

The first step is enough to stop crashing and should be fairly easy.
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Depends on: 489820
Flags: needinfo?(hsivonen)
Crash Signature: [@ OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | jArray<wchar_t, int>::newJArray(int) ] → [@ OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | jArray<wchar_t, int>::newJArray(int) ] [@ OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | jArray<T>…
(In reply to Henri Sivonen (:hsivonen) from comment #29)
> I thought about this more. The right fix is to fix bug 489820 as the first
> step

I'm now at this point but without testing.

I've started a win32 try build with my changes:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6f60b8247e8e

When treeherder shows it's ready, the build should appear under:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/?C=M;O=D

I'd appreciate someone who has been able to reproduce the crash trying to repro the crash with that build. Thanks.

> and then to reduce the number of distinct buffers in the parser to keep
> the overall memory use in check.

This I haven't done yet.
Summary: Crash @ OOM | large | mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | jArray<wchar_t, int>::newJArray(int) with NoScript & Session Manager add-on → Large OOM in jArray<T>::newJArray with NoScript & Session Manager add-on
@ Henri Sivonen (:hsivonen) - with the STR from comment 21, I don't get any crashes with Session Manager anymore,
but new crashes appear with a different crash signature, [@ OOM | large | mozglue.dll@0x2620 ], with this build:
https://crash-stats.mozilla.com/report/index/29bb0b42-d7b2-45f4-9c66-1fb572150612
https://crash-stats.mozilla.com/report/index/7512482c-538c-41da-8441-ee8292150612
I diagnosed it further, and it looks like uBlock is now the cause.
Hi Liz,

Henri and I spoke and this patch should probably ride the trains.  Of course, if the OOM is deemed bad enough to risk the uplift, the patch should be upliftable.
Flags: needinfo?(lhenry)
Sylvestre: do you want this for 40? I am inclined to say uplift it there since this OOM comes up periodically for various add-ons. Maybe it will mitigate this one or the next one.  Your call for 40!
Flags: needinfo?(lhenry) → needinfo?(sledru)
Yep, can we have an uplift request? OOMs are a pain... Thanks
Flags: needinfo?(sledru)
Depends on: 1176681
Depends on: 1176698
Depends on: 1173818, 559303, 1176668
(In reply to Virtual_ManPL [:Virtual] from comment #31)
> @ Henri Sivonen (:hsivonen) - with the STR from comment 21, I don't get
> any crashes with Session Manager anymore,
> but new crashes appear with a different crash signature, [@ OOM | large |
> mozglue.dll@0x2620 ], with this build:
> https://crash-stats.mozilla.com/report/index/29bb0b42-d7b2-45f4-9c66-1fb572150612
> https://crash-stats.mozilla.com/report/index/7512482c-538c-41da-8441-ee8292150612
> I diagnosed it further, and it looks like uBlock is now the cause.

Thanks. How about with https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/hsivonen@mozilla.com-1864b8ee821d/ ?
Flags: needinfo?(virtual616)
So I have this queue:
Bug 1173818 (already landed but must be uplifted if the below patches are uplifted)
Bug 559303
Bug 1176668 
Bug 489820
Bug 1176681
Bug 1176698

The try run ( https://treeherder.mozilla.org/#/jobs?repo=try&revision=1864b8ee821d ) looks OK, except W(2) shows a jemalloc crash in the JS engine somewhere under /html/dom/dynamic-markup-insertion/document-write/ on different platforms (the exact test varies but seems to always involve a document that is fully document.written without having a network stream at all).

I can't reproduce this crash locally. Also, I can't figure out what in my changes to document.write could manage to corrupt the allocator state (by double-free???) to make it crash later in the JS engine. My changes to document.write are in 
nsHtml5Parser::Parse(const nsAString& aSourceBuffer,
                     void* aKey,
                     const nsACString& aContentType,
                     bool aLastCall,
                     nsDTDMode aMode) // ignored
and
nsHtml5Parser::ParseUntilBlocked()
in
https://hg.mozilla.org/try/diff/5e9ae64494be/parser/html/nsHtml5Parser.cpp and https://hg.mozilla.org/try/rev/d1fe8985b750 .

I'll rebase and re-push to try to see if the jemalloc crash is unrelated and goes away by rebasing.

I'll be away for three weeks and am running out of time here. I'll request review on my patches and leave it up to release management to decide where (if anywhere!) they'd like the patches to be landed by someone other than me while I'm away (if the patches pass review!).

I'm pretty worried about how risky this stuff is and, if there's still a bug in the new code that I have missed, it would be a shame to end the HTML5 parser's clean record of no security bugs in the portable parser code reaching the release channel. This is the sort of change that would really benefit from riding the trains. However, if release management prefers to fix the OOM crashes at the risk of potentially introducing new array-access-out-of-bounds bugs, the patches are technically upliftable (if patches for *all* the bugs mentioned at the start of this comment are uplifted in the order they are listed in).

The uplift request would be along these lines:
> [Feature/regressing bug #]:

HTML5 parser (bug 373864). The parser was designed with the assumption that the plan for "infallible malloc" would be implemented in full. However, only the first basic part of the original "infallible malloc" plan was ever implemented. Specifically, the part where the allocator subsystem would stop all active parsers (called "sinks" in the plan) when approaching OOM was never implemented.

> [User impact if declined]:

Users of 32-bit builds in particular will experience OOM crashes when a site (or extension) tries to feed an excessively large text node or attribute *value* to the HTML parser. With the patches applied, the parser will stop without an OOM crash instead. (The above patch queue does *not* address excessively large tag or attribute *names*.)

> [Describe test coverage new/current, TreeHerder]:

Bug 1173818 introduced fatal assertions that, in debug builds, identify bugs in the rest of the patches immediately if *existing tests* happen to trigger *new bugs*. (They did during the development of these patches.) There are no new tests, however, so if the existing tests don't happen to trigger newly-introduced bugs, those bugs would go unnoticed. It would be arduous to develop test cases that deliberately make buffer boundaries fall across particular markup features so as to reliably exercise all relevant buffer boundary placements.

OK try run on treeherder:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1864b8ee821d

> [Risks and why]:

The patch queue decouples buffer sizing from buffer filling: certain buffers are sized to the worst case ahead of filling, and at sizing time there's an opportunity to discover OOM gracefully. At buffer-filling time, bounds checks happen only in debug builds, as fatal assertions. (A sketch of this pattern follows at the end of this comment.) The number of code changes needed to accomplish this is large. Any failure to compute the worst case correctly, or any failure to request buffer resizing before filling at all, will result in memory-safety bugs (i.e. array access out of bounds).

Additionally, these changes may, in some circumstances, increase the total amount of memory used during a parse. Optimizing out (most of) that badness is bug 561311, but that's delicate and risky enough that it makes no sense to try it before I go away for three weeks.
 
> [String/UUID change made/needed]:

None.
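Here is the sketch promised under [Risks and why]: a minimal, self-contained C++ illustration of the sizing-then-filling pattern, with hypothetical names (this only echoes the idea; it is not the actual patch):

#include <cassert>
#include <cstddef>
#include <new>

// Sketch of "size to the worst case up front, fill without checks".
class TokenBufferSketch {
  char16_t* mBuf = nullptr;
  size_t mCapacity = 0;
  size_t mLength = 0;

 public:
  ~TokenBufferSketch() { delete[] mBuf; }

  // Called once per input chunk, BEFORE tokenizing it. The worst case
  // (e.g. every input code unit getting appended) is computable here,
  // so this is the single place where OOM can be detected gracefully.
  bool EnsureBufferSpace(size_t aWorstCaseAppend) {
    size_t needed = mLength + aWorstCaseAppend;
    if (needed <= mCapacity) {
      return true;
    }
    char16_t* bigger = new (std::nothrow) char16_t[needed];
    if (!bigger) {
      return false;  // graceful OOM: stop the parse instead of crashing
    }
    for (size_t i = 0; i < mLength; i++) {
      bigger[i] = mBuf[i];
    }
    delete[] mBuf;
    mBuf = bigger;
    mCapacity = needed;
    return true;
  }

  // Called on the hot path. No fallible work here; in debug builds a
  // fatal assertion immediately catches a miscomputed worst case.
  void AppendChar(char16_t aChar) {
    assert(mLength < mCapacity && "EnsureBufferSpace missing or undercounted");
    mBuf[mLength++] = aChar;
  }
};

Any mismatch between the worst case computed at sizing time and the actual appends is precisely the array-access-out-of-bounds risk described above, which is why the debug-only fatal assertions from bug 1173818 matter.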
(In reply to Henri Sivonen (:hsivonen) from comment #36)
> I'll rebase and re-push to try to see if the jemalloc crash is unrelated and
> goes away by rebasing.

Post-rebase try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=48155bb0c83f
Looks like I was wrong in comment 31:
the build from comment 30 crashes with the STR from comment 21 even on Firefox with a clean, fresh profile without any add-ons (extensions and plugins)
https://hg.mozilla.org/try/rev/6f60b8247e8e
https://crash-stats.mozilla.com/report/index/4f761369-d777-4d48-b65d-fc0762150624

The build from comment 35 also crashes with the STR from comment 21 even on Firefox with a clean, fresh profile without any add-ons (extensions and plugins)
https://hg.mozilla.org/try/rev/1864b8ee821d
https://crash-stats.mozilla.com/report/index/52f81a30-d6bc-496e-86a8-328ad2150624

The build from comment 38 also crashes with the STR from comment 21 even on Firefox with a clean, fresh profile without any add-ons (extensions and plugins)
https://hg.mozilla.org/try/rev/48155bb0c83f
https://crash-stats.mozilla.com/report/index/1682111e-13c7-4cd9-8fbf-403562150624


Maybe my STR from comment 21 triggers a different bug, unrelated to this one?
Flags: needinfo?(bernesb)
Comment 21 feeds the parser an infinite text node. The patches should address that case as far as the parser is concerned. The browser could still OOM on a giant text node in non-parser code. Or there could be some parser code that I haven't thought about.
(In reply to Henri Sivonen (:hsivonen) from comment #36)
> The try run (
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=1864b8ee821d ) looks
> OK, except W(2) shows a jemalloc crash in the JS engine somewhere under
> /html/dom/dynamic-markup-insertion/document-write/ on different platforms
> (the exact test varies but seems to always involve a document that is fully
> document.written without having a network stream at all).
> 
> I can't reproduce this crash locally. Also, I can't figure out what in my
> changes to document.write could manage to corrupt the allocator state (by
> double-free???) to make it crash later in the JS engine.

Restoring bounds checking in opt builds makes the crashes go away, which suggests that the patch for resizing the buffers ahead of time has a bug lurking somewhere. I have no idea why it's not caught by the fatal assertions in debug builds.

I don't have the time to investigate before I go away, so I attached a band-aid patch to bug 489820 in case release management wants to land this as a matter of urgency while I'm away. Clearly, the patch set isn't up to the proper quality level. However, since I won't be around to raise the quality level right away, I'm at least attaching something that should improve the OOM situation without causing known new crashes, so that there's something that can be landed if the OOM is deemed really bad.

Still, comment 38 suggests that it's not clear whether the patch set is effective in the sense that the OOM might just be moving elsewhere.
Andrew, this is a P1 FF40-tracked bug that hasn't gotten much attention in the past few weeks, and Henri seems to be away. Could you please help find an owner? Thanks.
Flags: needinfo?(overholt)
For what it's worth, I haven't been pushing on this bug because the fix seems rather risky, and I'd rather wait for Henri's expertise than ship a regression.
I spoke with Anthony about it yesterday, and we concur with David; it's definitely safer to wait for Henri's return.

Of course, if it's more common than we think or you disagree, Ritu, let me know and I'll see what we can do.
Flags: needinfo?(overholt)
The crash rate is in an acceptable range for 40. Based on that and comment 42 and comment 43, I'm marking 40 as wontfix.
https://crash-stats.mozilla.com/report/index/9fa0d721-f652-4256-9b11-bd2842150802

How can it crash with only 69% of total RAM used?
(With FF 3.6 on the same machine I was able to open 100 tabs; now only 16?)
I'm attaching the file created at the crash. What is the minimum RAM required by FF39?
Thanks.
Attached file memory-report.json.gz
Crash with only 820 MB of Firefox RAM. It's strange.
(Needed) add-ons:
NoScript
Adblock Plus
WOT
> https://crash-stats.mozilla.com/report/index/9fa0d721-f652-4256-9b11-bd2842150802
> How can it crash with only 69% of total RAM used?

There are several factors that combine to create a difficult situation:
- The machine has a 32-bit operating system, which limits Firefox to a 2GB pool of address space.
- Ad Block Plus has been known to use a large amount of memory. This problem will mostly go away in Firefox 41 with bug 77999.
- Large allocations require _contiguous_ free space. Over time, fragmentation can leave many small blocks of free space. It can be deceiving when it looks like the total available memory is high. In this case you had hundreds of available 1MB regions, but the parser code needed a 2MB region and couldn't find one. This bug will update the parser code to handle that situation better.
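That last point can be demonstrated with a small self-contained C++ program (a sketch; run it as a 32-bit build, since a 64-bit process has vastly more address space):

#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Fragmentation demo: plenty of free address space in total, but no
// single contiguous region large enough for a big allocation.
int main() {
  const size_t MB = 1024 * 1024;
  std::vector<char*> blocks;
  // Grab 1 MB blocks until a cap or until allocation fails.
  while (blocks.size() < 3000) {
    char* p = static_cast<char*>(malloc(MB));
    if (!p) {
      break;
    }
    p[0] = 1;  // touch the block so the allocation is real
    blocks.push_back(p);
  }
  // Free every other block: half of the memory is free again...
  for (size_t i = 0; i < blocks.size(); i += 2) {
    free(blocks[i]);
  }
  // ...but it is free in 1 MB islands separated by live blocks, so a
  // 2 MB request can fail even though many times that much is free.
  void* big = malloc(2 * MB);
  printf("~%zu MB free in 1 MB pieces; 2 MB allocation %s\n",
         blocks.size() / 2, big ? "succeeded" : "FAILED");
  return 0;
}

In a 32-bit process under address-space pressure this typically prints FAILED, which is the same shape of failure the parser was hitting here.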
- But the problem arises with 800 MB too.
With 800 MB or 1 GB it crashes; I've never seen FF working with 2 GB.

- OK about Adblock Plus, a very important add-on.

- The maximum fragmentation on my PC is 1%.

- Thanks for the good news about FF41, I appreciate it.
2GB is the theoretical maximum but in practice we will fail a lot earlier than that.

I should clarify that I'm referring to address space fragmentation. That's different from disk fragmentation, etc. If you are curious, the Sysinternals VMMap tool has a nice little viewer to help visualize the address space of a process.
Thank you warmly.
So the RAM problem with Adblock Plus is resolved in FF41,
and this bug is resolved in FF43?

Thanks.
Attached file memory-report.json.gz
Fixed by the landing of the dependencies.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Whiteboard: [MemShrink:P1] → [MemShrink:P1][fixed by dependencies]
Target Milestone: --- → mozilla43
Thanks Henri! Nightly is looking good so far.

Given the volume of changes here, I propose that this ride the trains normally. This crash is currently not in an urgently-high position on beta.
I trust David's call on this one. FF41 status is wontfix based on comment 58.
Can someone explain to me exactly what is causing my crashes?
https://crash-stats.mozilla.com/report/index/22772447-2371-43cd-9eb9-e61202151104
Thanks.
Is it due to Facebook?
Attached file memory-report.json.gz
(In reply to Yorgos from comment #60)
> Can someone explain to me exactly what is causing my crashes?
> https://crash-stats.mozilla.com/report/index/22772447-2371-43cd-9eb9-e61202151104 Thanks.

This should be fixed in Firefox 43 (shipping in Beta tomorrow). You could try getting the first Firefox 43 Beta build tomorrow or wait until Firefox 43 is released in 6 weeks.
thanks a lot, much appreciated.
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #62)
> (In reply to Yorgos from comment #60)
> > Can someone explain to me exactly what is causing my crashes?
> > https://crash-stats.mozilla.com/report/index/22772447-2371-43cd-9eb9-e61202151104 Thanks.
> 
> This should be fixed in Firefox 43 (shipping in Beta tomorrow). You could
> try getting the first Firefox 43 Beta build tomorrow or wait until Firefox
> 43 is released in 6 weeks.

I don't notice big improvements using FF43, for example when viewing and scrolling this long album: at the moment no crash, but a lot of lag at 800 MB of RAM. Strange.
https://www.facebook.com/elisadifrancisca/photos_stream
(In reply to Anthony Hughes, QA Mentor (:ashughes) [away until Jan 4, 2016] from comment #62)
> (In reply to Yorgos from comment #60)
> > Can someone explain to me exactly what is causing my crashes?
> > https://crash-stats.mozilla.com/report/index/22772447-2371-43cd-9eb9-e61202151104 Thanks.
> 
> This should be fixed in Firefox 43 (shipping in Beta tomorrow). You could
> try getting the first Firefox 43 Beta build tomorrow or wait until Firefox
> 43 is released in 6 weeks.

http://postimg.org/image/8grykxfe9/
Attached file memory-report.json.gz