Reporter

Description

•

6 years ago

This bug was filed from the Socorro interface and is
report bp-f60bc4b9-4083-4d38-89e9-246e80190110.

Top 10 frames of crashing thread:

0 xul.dll xul.dll@0x14716bc
1 xul.dll xul.dll@0x5d19f0
2 xul.dll xul.dll@0x5d003d
3 xul.dll xul.dll@0x5cf7a8
4 xul.dll xul.dll@0x3ae973
5 xul.dll xul.dll@0x5b54a4
6 xul.dll xul.dll@0x17a5d3e
7 xul.dll xul.dll@0x1523a59
8 xul.dll xul.dll@0x1522f67
9 xul.dll xul.dll@0x3c52caa

=============================================================

there's an increase in OOM | small crashes in the browser process from 32bit installations of firefox on the beta channel since 2019-01-08.

the spike is made up of reports that contain unsymbolized frames of xul.dll in their stack trace - other than that i didn't spot any obvious other correlations (happens across multiple versions of windows; no particular addons, locales or urls show up in the reports) - this query should catch most of the problem:
https://bit.ly/2Fn6029

as far as i can tell, it's only the beta channel that's affected by this spike - it started a few days after 65.0b8 got released to the beta audience but that version is also heavily affected still so i'd start from the assumption it was some sort of external change on affected systems causing the increase in crashes.

[:philipp]

Reporter

Updated

•

6 years ago

Summary: Crash in OOM | small → [@ OOM | small] browser crash spike hitting 32bit installations on beta channel since 2019-01-10

Andrew Overholt [:overholt]

Comment 1

•

6 years ago

It looks like about a five-fold increase which is potentially concerning - thanks for flagging, Philipp! Do we know of anything we turned on in that build that could lead to an increase in OOMs from small allocations?

Flags: needinfo?(erahm)

Eric Rahm [:erahm]

Comment 2

•

6 years ago

This is one of those things that's hard to track down. The crashing point for OOM small is usually the fallout of something else making large allocations. That said we should probably loop a soccoro person in to figure out why we're getting trashed stacks.

I took a look at the memory report from bp-245fdb4-073b-47ca-841d-0d9e40190110, it had a ton of heap-unclassifed and was on it's way to VM exhaustion:

2,604.36 MB (100.0%) -- explicit
├──2,452.58 MB (94.17%) ── heap-unclassified

4,095.94 MB (100.0%) -- address-space
├──3,493.80 MB (85.30%) -- commit
│  ├──2,638.82 MB (64.43%) -- private
│  │  ├──2,633.57 MB (64.30%) ── readwrite(segments=3624)
│  │  └──────5.25 MB (00.13%) -- (5 tiny)
│  │         ├──3.39 MB (00.08%) ── readwrite+stack(segments=144)
│  │         ├──1.47 MB (00.04%) ── readwrite+guard(segments=144)
│  │         ├──0.38 MB (00.01%) ── execute-read(segments=5)
│  │         ├──0.01 MB (00.00%) ── readonly(segments=2)
│  │         └──0.00 MB (00.00%) ── noaccess(segments=1)
│  ├────629.44 MB (15.37%) -- mapped
│  │    ├──598.86 MB (14.62%) ── readonly(segments=1577)
│  │    └───30.58 MB (00.75%) -- (2 tiny)
│  │        ├──30.31 MB (00.74%) ── readwrite(segments=123)
│  │        └───0.27 MB (00.01%) ── writecopy(segments=1)
│  └────225.55 MB (05.51%) -- image
│       ├──161.08 MB (03.93%) ── execute-read(segments=208)
│       ├───61.42 MB (01.50%) ── readonly(segments=440)
│       └────3.05 MB (00.07%) -- (2 tiny)
│            ├──1.80 MB (00.04%) ── readwrite(segments=225)
│            └──1.25 MB (00.03%) ── writecopy(segments=57)
├────354.11 MB (08.65%) -- reserved
│    ├──325.72 MB (07.95%) ── private(segments=1143)
│    └───28.39 MB (00.69%) ++ (2 tiny)
└────248.03 MB (06.06%) ── free(segments=1899)

2,696.71 MB ── resident-unique
3,847.91 MB ── vsize
   99.44 MB ── vsize-max-contiguous

Also I saw an RDD process in there, it looks like that was uplifted in b5 though so maybe not related.

RyanVM, is there some way we can get a list of uplifted bugs for 65.0b8 (particularly for windows)?

Flags: needinfo?(erahm) → needinfo?(ryanvm)

Ryan VanderMeulen [:RyanVM]

Comment 3

•

6 years ago

https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=FIREFOX_65_0b7_RELEASE&tochange=FIREFOX_65_0b8_RELEASE

Flags: needinfo?(ryanvm)

Andrew Overholt [:overholt]

Comment 4

•

6 years ago

Lots of potentially interesting patches in there. New expat?

Jared Wein [:jaws] (please needinfo? me)

Updated

•

6 years ago

status-firefox65: affected → fix-optional

Neha Kochar [:neha]

Comment 5

•

6 years ago

Adding ni for Peter to look into the possibility of expat upgrade (bug 1374012) being a cause.

Flags: needinfo?(peterv)

Liz Henry (:lizzard) (relman/hg->git project)

Updated

•

6 years ago

tracking-firefox66: --- → +

Andrew Overholt [:overholt]

Comment 6

•

6 years ago

Eric, would you mind taking a look at the list of pushes in comment 3 and see if anything sticks out?

A crazy idea tossed out there was to run a Shield study with 2 different versions of expat to see if the cause here was indeed expat. I know, it's a stretch for the new expat version to even be the cause. Of course, I can't see how we'd get that done super soon so that leads us to the bigger question of whether we want to hold 65 to figure this out.

Interestingly, Philipp said the overall number of crashes didn't seem to change, just that the proportion of them that are OOM|small crashes increased. I'm not sure what that means :)

Flags: needinfo?(erahm)

Neha Kochar [:neha]

Comment 7

•

6 years ago

The expat upgrade fix (bug 1374012) went into beta on Dec 31 and https://calendar.google.com/calendar/embed?src=bW96aWxsYS5jb21fZGJxODRhbnI5aTh0Y25taGFiYXRzdHY1Y29AZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ&pli=1 shows Beta 8 build on Jan 3 but the spike in OOM | small crashes have been reported to increase after beta 9 release on Jan 8 so it doesn't look like expat upgrade can be a cause of this. We should probably be looking at the patches that are new in beta 9.

Flags: needinfo?(peterv)

[:philipp]

Reporter

Comment 8

•

6 years ago

when the crashes are broken down into versions, b8 was already somewhat affected:
https://screenshots.firefox.com/BXBhm9g6OvlEMdLp/crash-stats.mozilla.com

Eric Rahm [:erahm]

Comment 9

•

6 years ago

Neha's analysis looks correct, the spike is in b9. Lets look at the patches in that:

Bug 1516426 - this is a null check, probably fine
Bug 1510204, bug 1516289 - this looks okay, but it's large enough we might want emilio to confirm
Bug 1515793 is WR related, so probably not it
Bug 1464003 is Android
Bug 1515658 seems okay, although it's assigning a uint64_t into an int64_t. I'm pretty sure this is okay in this case and worst case scenario we just end up recalculating the size which is the old behavior. Maybe ping baku to see if CalculateStreamLength has side-effects? It's marked const so in theory it shouldn't.
Bug 1515463 looks okay
Bug 1517275 is android
Bug 1517221 is fixing a stack overflow by releasing memory earlier, so probably not but maybe ping olli to see if this could cause CC related weirdness
Bug 1513304 is a null pointer check, previously we would just crash but now we bail. It's possible that exposed a leak but given the prevalence of crashes (2 per day maybe) it doesn't seem to correlate
Bug 1517710 seems okay, in theory it was linux only (?). It did remove taking a strong ref to mGraph which seems it could be an issue by probably not related to leaks. maybe ping :pehrsons (he had a few changes in this list)
Bug 1513973 this is another :pehrsons change, there are enough patches that it might be worth looking at
Bug 1516738 this is just a codegen change
Bug 1515873 :pehrsons again, but if anything it looks like it's avoiding allocating
Bug 1514192 this is VR, probably not related
Bug 1513232 this is clamping viewport sizes, probably not related

:pehrsons your patches show up the most in this beta release, can you double check to see if it's possible they'd cause an uptick in OOMs?

Olli, is there any chance bug 1517221 might be involved?

Flags: needinfo?(erahm)

Flags: needinfo?(bugs)

Flags: needinfo?(apehrson)

Andreas Pehrson [:pehrsons]

Comment 10

•

6 years ago

I don't expect any of my fixes to cause any large allocations.

That said, bug 1517711 is a followup to bug 1517710 that fixed a remaining nullptr issue but only landed in b10. If we're back to normal in b10, bug 1517711 could be the explanation.

Both those bugs fixed regressions from bug 1513973 which is in the b9 list in comment 9.

Flags: needinfo?(apehrson)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 11

•

6 years ago

Unlikely bug 1517221 to be involved. That issue has been seen only on one intranet site basically, and the bug fixed it. We end up releasing memory sooner with the fix.

Flags: needinfo?(bugs)

Jim Mathies [:jimm]

Updated

•

6 years ago

Priority: -- → P3

[:philipp]

Reporter

Comment 12

•

6 years ago

the particular crash pattern described in comment #0 has receded again. since there isn't much tangible information to go on for a potential fix, i'll just close the bug...

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → INCOMPLETE

Ryan VanderMeulen [:RyanVM]

Updated

•

6 years ago

status-firefox65: fix-optional → wontfix

status-firefox66: ? → wontfix

Bugzilla

[@ OOM | small] browser crash spike hitting 32bit installations on beta channel since 2019-01-10

Categories

(Core :: General, defect, P3)

Tracking

()

People

(Reporter: philipp, Unassigned)

References

Details

(Keywords: crash, regression, Whiteboard: [MemShrink] )

Crash Data

Security

(public)

User Story

Description

This bug was filed from the Socorro interface and is
report bp-f60bc4b9-4083-4d38-89e9-246e80190110.

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Updated

Comment 5

Updated

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Updated

Comment 12

Updated

[@ OOM | small] browser crash spike hitting 32bit installations on beta channel since 2019-01-10

This bug was filed from the Socorro interface and is report bp-f60bc4b9-4083-4d38-89e9-246e80190110.

This bug was filed from the Socorro interface and is
report bp-f60bc4b9-4083-4d38-89e9-246e80190110.