increase in crashes with EMPTY dumps in Firefox 19 and 20 cycles

RESOLVED INCOMPLETE

Status

Core
General
--
critical
RESOLVED INCOMPLETE
5 years ago
4 years ago

People

(Reporter: Robert Kaiser, Assigned: Benjamin Smedberg)

Tracking

(Depends on: 4 bugs, {crash})

Trunk
crash

Firefox Tracking Flags

(firefox19 wontfix, firefox20 wontfix, firefox21- affected, firefox22- affected)

Details

(Whiteboard: [native-crash], crash signature)

Attachments

(13 attachments, 1 obsolete attachment)

58.12 KB, image/png
12.02 KB, text/plain
25.00 KB, image/png
38.55 KB, image/png
6.57 KB, text/plain
43.72 KB, image/png
10.45 KB, text/plain
44.97 KB, image/png
10.45 KB, text/plain
2.79 KB, patch (ted: review+)
287.14 KB, text/plain
14.82 KB, application/x-zip-compressed
33.50 KB, text/x-log
(Reporter)

Description

5 years ago
This bug was filed from the Socorro interface and is 
report bp-6f488a29-cfdd-4dca-8430-68b2b2130203 .
============================================================= 

I'm filing this in Core::General for now, as we have no clue yet what could be behind those crashes. This may also be related to bug 830808.

Over the last few versions of Firefox, crashes with the "EMPTY: no crashing thread identified; corrupt dump" signature have increased. They used to be at 4-5% of all crashes; now they're at 10-20% depending on channel, and on ESR even at 45%.

We need to investigate whether we can find when the volume increased on the different channels and whether we can find a potential cause. Also, analyzing the metadata we have could give us some insight (annotations are there; things like OS or modules are not, as those live in the dump).
Depends on: 838061
STR added in Bug 830808
(Reporter)

Comment 2

5 years ago
Created attachment 711629 [details]
Crash numbers per day on Nightly channel

This graph shows the numbers of EMPTY crashes from all builds on the nightly channel by crash day (I don't have access to the by-build-day stuff right now) since Jan 1, 2012.

Unfortunately, it doesn't really paint a bull's eye on anything, as there's an up and down here. There was definitely a regression in late May and one in September/October, and both were fixed again as well, but it's hard to make out why current numbers are between 100 and 200 per day when they were between 50 and 100 in the first few months of 2012.

I'll attach a text file with the queries and raw data.
(Reporter)

Comment 3

5 years ago
Created attachment 711630 [details]
Queries and raw data for "Crash numbers per day on Nightly channel"

Comment 4

5 years ago
(In reply to MarioMi (:MarioMi) from comment #1)
> STR added in Bug 830808
Can you try your STR in a debugger and file a new bug with the crash signature and mark it as dependent of this one?
(Assignee)

Updated

5 years ago
Depends on: 840632

Updated

5 years ago
Flags: needinfo?(mariomihai22)
Sorry for the delay, Scoobi. I will try tomorrow morning and get back with results.
Flags: needinfo?(mariomihai22)
(In reply to Scoobidiver from comment #4)
> (In reply to MarioMi (:MarioMi) from comment #1)
> > STR added in Bug 830808
> Can you try your STR in a debugger and file a new bug with the crash
> signature and mark it as dependent of this one?

I tried my STR from Bug 830808 Comment 9 in a debugger, but nothing unusual happened. I only got one error in the Error Console: "Permission denied to access property 'toString'". I did these investigations on Nightly (2013-02-04).

Comment 7

5 years ago
It also spiked in absolute values for Fennec:
* 17.0 (latest week):   2.8%   0.04 crashes/100 ADU
* 18.0.2 (latest week): 6.8%   0.11 crashes/100 ADU
* 19.0 (current week):  4.8%   0.14 crashes/100 ADU
Whiteboard: [native-crash]

Comment 8

5 years ago
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #2)
> I don't have access to the by-build-day stuff right now
Do you have access now? It's interesting information to find regressions.

Updated

5 years ago
Flags: needinfo?(kairo)
(Reporter)

Comment 9

5 years ago
I don't.
Flags: needinfo?(kairo)

Comment 10

5 years ago
Created attachment 722130 [details]
Crash numbers per buildday on Android Nightly channel

Updated

5 years ago
Attachment #722130 - Attachment is obsolete: true

Comment 11

5 years ago
Created attachment 722266 [details]
Crash numbers per buildday on Android Nightly channel

Comment 12

5 years ago
The Android Nightly channel chart is misleading because it doesn't take into account fixes in Aurora and Beta channels.
The increase between 19 and 20 is low:
* 19.0 Beta (last week):     5.9%   0.12 crashes/100 ADU
* 20.0 Beta (current week):  5.5%   0.15 crashes/100 ADU

Comment 13

5 years ago
My Firefox (22.0a1 (2013-03-11)) has crashed every morning for the last 3 days with empty data:
https://crash-stats.mozilla.com/report/index/bp-7d532715-cd12-4587-9326-3ebd92130312

Comment 14

5 years ago
(In reply to Henrik Gemal from comment #13)
> My Firefox (22.0a1 (2013-03-11)) crash every mornings the last 3 days with
> empty data:
> https://crash-stats.mozilla.com/report/index/bp-7d532715-cd12-4587-9326-
> 3ebd92130312
If it's a recent issue in Nightly, it's not this bug, which is about a slight increase between two Release versions.
Please file a new bug after getting a valid stack trace (see https://developer.mozilla.org/docs/How_to_get_a_stacktrace_with_WinDbg). Try Safe Mode first to rule out a faulty extension (see https://support.mozilla.org/kb/troubleshoot-firefox-issues-using-safe-mode).
The only potential smoking gun I see in Henrik's crash report is: "IsGarbageCollecting": "1"

Henrik: if you can catch your crash in a debugger and get a stack (using the link Scoobidiver gave above) that would be really helpful.
(Reporter)

Comment 16

5 years ago
Created attachment 725214 [details]
Crash numbers per build date on Nightly channel

I now have access to the by-build-date numbers from Socorro. Unfortunately, those aren't available as far back as I'd like, as that feature only came around in August. In the available range, we don't really see the regression nicely. :(
(Reporter)

Comment 17

5 years ago
Created attachment 725216 [details]
Queries and raw data for "Crash numbers per build date on Nightly channel"
(Reporter)

Comment 18

5 years ago
Created attachment 725220 [details]
Crash numbers per day on Beta channel

So, Nightly and Aurora don't show the regression as nicely apparently. Beta and Release do, so I'm attaching data/graphs for "all of the Beta channel per crash date" and "all of the Release channel per crash date" as well.
(Reporter)

Comment 19

5 years ago
Created attachment 725221 [details]
Queries and raw data for "Crash numbers per day on Beta channel"
(Reporter)

Comment 20

5 years ago
Created attachment 725222 [details]
Crash numbers per day on Release channel
(Reporter)

Comment 21

5 years ago
Created attachment 725224 [details]
Queries and raw data for "Crash numbers per day on Release channel"
(Reporter)

Comment 22

5 years ago
And actually, both Beta and Release data point to a definite increase of EMPTY crashes in the 19 cycle (went to Beta in the second week of January 2013, released on February 19).
(Reporter)

Comment 23

5 years ago
Oh, and 20 on Beta seems to be even worse. Nightly 20 started on Nov 19, and it looks like there was an external issue that made us spike across most channels from the start of September to the end of October; after that, we have a few Nightly values down at the level predating that spike. So I think we need to investigate the Nightly time frame between Nov 1 and Nov 20 for the 19 regression; from the attachment 711629 [details] graph we can actually narrow that down to between Nov 5 and the Aurora uplift of 19.
For the additional 20 regression, the same Nightly graph makes me suspect somewhere between Nov 25 and Dec 10.
(Reporter)

Comment 24

5 years ago
Oh, and from those ranges, we can try to narrow down further by using the by-build-date attachment 725214 [details] but I'm too tired to do that today.

Comment 25

5 years ago
It's odd that MemShrink reduces memory usage while these crashes, which are likely OOM (e.g. bug 834667 was the cause of a spike in 17.0.2esr), increase.
(Reporter)

Comment 26

5 years ago
Scoobidiver, what has MemShrink to do with the bug here?

Comment 27

5 years ago
Is this memory spike due to ongoing effort of Paris Bindings?

Comment 28

5 years ago
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #26)
> Scoobidiver, what has MemShrink to do with the bug here?
OOM = Out-of-memory.
(Reporter)

Comment 29

5 years ago
Scoobidiver, I know what OOM is, and that's not what I asked about.
It's not even clear that this increase in bugs with empty dumps is an increase of OOM crashes, as there are clearly cases where we are not OOM where we hit empty dumps (even if we don't really know how that happens). And there is absolutely no correlation with MemShrink from what I can see, unless some very significant MemShrink work landed in the ranges I found in comment #23 and you can somehow correlate that work with a higher likelihood of OOM (though I'd suspect the reverse), or you can otherwise paint a clear picture of how that work would relate to those empty dump crash increases.


henryfhchan:
Which memory spike? This is about crashes with empty dumps, not about memory per se. It's unclear if there is any relation to memory at all.

Comment 30

5 years ago
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #29)
> It's not even clear that this increase in bugs with empty dumps is an
> increase of OOM crashes, as there are clearly cases where we are not OOM
> where we hit empty dumps (even if we don't really know how that happens).
It's only a theory, and I thought the MemShrink team should be aware of this issue in case it rings a bell for them. They can also find new variables to monitor in Telemetry based on that.
You can determine the ratio of real OOMs with the OOMAllocationSize field in crash headers.
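
A minimal sketch of that ratio, assuming the crash annotations are available as Python dicts. Only the `OOMAllocationSize` field name comes from this bug; the sample records and the `oom_ratio` helper are made up for illustration and are not real Socorro data or API.

```python
def oom_ratio(crash_annotations):
    """Return the fraction of reports that carry an OOMAllocationSize annotation."""
    total = len(crash_annotations)
    if total == 0:
        return 0.0
    oom = sum(1 for annotations in crash_annotations
              if "OOMAllocationSize" in annotations)
    return oom / total


# Hypothetical annotation sets for four EMPTY-signature reports.
sample = [
    {"ProductName": "Firefox", "OOMAllocationSize": "1048576"},
    {"ProductName": "Firefox"},
    {"ProductName": "Firefox", "OOMAllocationSize": "65536"},
    {"ProductName": "Firefox"},
]

print(f"{oom_ratio(sample):.0%} of the sample reports are annotated as OOM")
```

Reports without the annotation are not proven to be non-OOM; they only lack a recorded failed allocation size.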
As Kairo said in the channel meeting, it's probably too late for the FF20 cycle given how far into it we are, but perhaps we want to keep this on our radar for FF21.
tracking-firefox21: --- → ?

Updated

5 years ago
status-firefox19: --- → wontfix
status-firefox20: --- → wontfix
status-firefox21: --- → affected
tracking-firefox21: ? → +
(Reporter)

Comment 32

5 years ago
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #23)
> the 19 regression, from the attachment 711629 [details] graph we can
> actually narrow down to Nov 5 and Aurora uplift of 19.

Given attachment 725214 [details] I would look at the start of that period, even at what landed for the Nightly build of Nov 4, as that looks high already there.


> For the additional 20 regression, the same Nightly graph makes me suspect
> somewhere between Nov 25 and Dec 10.

While the Nov 27 build has something, it looks more likely to be between Dec 2 and 9 in builds. I'd actually look at what landed for the Dec 2 one first.
(Reporter)

Comment 33

5 years ago
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #32)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #23)
> > the 19 regression, from the attachment 711629 [details] graph we can
> > actually narrow down to Nov 5 and Aurora uplift of 19.
> 
> Given attachment 725214 [details] I would look at the start of that period,
> even at what landed for the Nightly build of Nov 4, as that looks high
> already there.

Not that http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2012-11-03+00%3A00%3A00&enddate=2012-11-06+04%3A00%3A00 would point to anything particularly bad/big, though.

> > For the additional 20 regression, the same Nightly graph makes me suspect
> > somewhere between Nov 25 and Dec 10.
> 
> While the Nov 27 build have something, it looks more likely to be between
> Dec 2 and 9 in builds, I'd actually look at what landed for the Dec 2 one
> first.

http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2012-12-01+00%3A00%3A00&enddate=2012-12-02+04%3A00%3A00 already has quite a list of checkins, an NSS update, CC cleanup, and the following days all kinds of additional fun, like JIT fixes, and a ton of other things. Hard to point to anything specific there.
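
For anyone who wants to look at other build-date windows, here is a small sketch that builds pushlog URLs of the same shape as the two above. It only reuses the `startdate` and `enddate` query parameters visible in those links; the `pushlog_url` helper name is hypothetical.

```python
from urllib.parse import urlencode

PUSHLOG = "http://hg.mozilla.org/mozilla-central/pushloghtml"


def pushlog_url(start, end):
    """Build a pushloghtml URL for checkins between two 'YYYY-MM-DD HH:MM:SS' timestamps."""
    # urlencode turns spaces into '+' and ':' into '%3A', as in the links above.
    return PUSHLOG + "?" + urlencode({"startdate": start, "enddate": end})


# The Dec 2 Nightly window discussed above.
print(pushlog_url("2012-12-01 00:00:00", "2012-12-02 04:00:00"))
```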
Assigning to bsmedberg as engineering POC given his thoughts on a path forward:

"Short-term, there are some things we could try:

* reduce the minidump data request even further: don't ask for the stack memory, just ask for a list of loaded modules and memory mappings.
* annotate just the crash reason and address separately from the minidump"

Firefox 21 would be the first version where we'd make the crash changes, since we're only a week from FF20's release.
Assignee: nobody → benjamin
status-firefox22: --- → affected
tracking-firefox22: --- → +
(Reporter)

Comment 35

5 years ago
Ted commented on bug 724046, which might help us get better data from those crashes.

Comment 36

5 years ago
Hi. I don't know if what I have to say is new, or has some value.
I have a lot of these crashes. I would say more than 50% of my crashes have empty dumps. BUT my Firefox rarely crashes like 'BOOM' (when everything closes and I get that window to send the crash report). This 'BOOM' type of crash never has an empty dump (I think).
I don't know when I get these empty-dump crashes. Sometimes I just type 'about:crashes' and I see that there's a crash that I didn't submit (and don't remember having a 'BOOM' crash). Then I try to submit it, and it's an empty one.

Comment 37

5 years ago
(In reply to Guilherme Lima from comment #36)
> I have a lot of this crashes. I would say more than 50% of my crashes have
> empty dumps. BUT my Firefox rarely crashes like 'BOOM' (when everything
> closes and I get that window to send the crash report). This 'BOOM' type of
> crash never has empty dump (I think).
They are probably Flash crashes or hangs. 
https://crash-analysis.mozilla.com/bsmedberg/flash-summary.html shows an increase of crashes and hangs by about 15% between November 2012 (Flash 11.4.402.287, Firefox 16) and January 2013 (Flash 11.5.502.135, Firefox 18). That might explain a part of the empty dump increase.

Comment 38

5 years ago
The hangs that don't automatically submit are usually plugin hangs. They are denoted with bp-hr- and always return a 404 when I click them. Therefore I doubt that these are the cause of the increase in the number of empty dumps.

FYI, the empty crashes I have do not seem to be caused by Flash (e.g. they happen when clicking on non-Flash websites).

Comment 39

5 years ago
Get a stack trace with WinDbg (see comment 14), and you may have something useful.
(Reporter)

Comment 40

5 years ago
(In reply to Scoobidiver from comment #37)
> They are probably Flash crashes or hangs. 
> [...]
> That might explain a part of the empty dump increase.

No, I strongly disagree. My analysis in here cleanly demonstrates that there were two regressions in our code, in the 19 and 20 cycles. And those "empty dump" crashes are actually not plugin crashes. Flash crashes and hangs regressed, but that's something completely different, and even in different time periods.

Comment 41

5 years ago
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #33)
> Not that
> http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2012-11-
> 03+00%3A00%3A00&enddate=2012-11-06+04%3A00%3A00 would point to anything
> particularly bad/big, though.
I suspect bug 778993 in that range.
(In reply to Scoobidiver from comment #41)
> I suspect bug 778993 in that range.

As far as I know, that code isn't present on any branch. It was backed out everywhere.
Benjamin asked a few questions in email:

"Have you looked through the URLs and comments to see if there is anything interesting? Especially limited to nightly and aurora, where the comments are often much more helpful. Do these reports typically have a list of extensions or not? If so have we run extension correlation reports?"
Flags: needinfo?(kairo)
(Reporter)

Comment 44

5 years ago
Top URLs:

3963 	about:blank
3330 	https://www.facebook.com/
3127 	http://www.facebook.com/
899 	about:sessionrestore
885 	http://www.tumblr.com/dashboard
728 	about:home
549 	http://assets.tumblr.com/analytics.html?748b075014045cae7cd6ac4429aded74
348 	http://www.facebook.com/?ref=tn_tnmn
273 	http://vk.com/feed
253 	https://mail.google.com/mail/u/0/?shva=1#inbox
242 	about:newtab
215 	https://www.facebook.com/?ref=tn_tnmn
193 	http://vk.com/audio
182 	https://twitter.com/
142 	http://www.facebook.com/?ref=logo
138 	https://www.facebook.com/dialog/oauth?client_id=130402594779&response_type=token%2Csigned_request%2Ccode&display=none&domain=www.kingdomsofcamelot.com&origin=5&redirect_uri=https%3A%2F%2Fs-static.ak.facebook.com%2Fconnect%2Fxd_arbiter.php%3Fversion%3D19%2
137 	https://mail.google.com/mail/?shva=1#inbox
119 	https://www.facebook.com/?ref=logo
105 	http://www.facebook.com/home.php
104 	https://www.facebook.com/login.php?login_attempt=1
104 	https://www.google.com/

Comments are not painting anything close to a clear picture. Most users are confused about why it crashes, and many complain that they are seeing a lot of crashes.

That's for the general population of all versions.


For Nightly, a lot of comments talk about OOM, some about loading PDFs or image-heavy pages. See comments tab on https://crash-stats.mozilla.com/report/list?range_value=7&range_unit=days&date=2013-03-28&signature=EMPTY%3A%20no%20crashing%20thread%20identified%3B%20corrupt%20dump&version=Firefox%3A22.0a1

Top URLs for Nightly:
80 	about:blank
34 	http://www.tumblr.com/dashboard
31 	http://www.facebook.com/
26 	https://www.facebook.com/
24 	http://www.songbanc.com/photos
11 	http://planet.mozilla.org/
9 	about:newtab
8 	http://www.icefilms.info/
8 	http://www.dpreview.com/
8 	http://movies.netflix.com/WiHome


We apparently don't have many reports with modules: correlation reports are only available for beta and release, and they list very few reports that the data is taken from:

2013-03-25_Firefox_20.0-interesting-modules:
  EMPTY: no crashing thread identified|EXCEPTION_ACCESS_VIOLATION_READ (22 crashes)
     91% (20/22) vs.   6% (2271/40644) credssp.dll
     95% (21/22) vs.  14% (5550/40644) schannel.dll
     82% (18/22) vs.   5% (2026/40644) FlashPlayerPlugin_11_6_602_180.exe
     95% (21/22) vs.  32% (13001/40644) Wldap32.dll
    100% (22/22) vs.  44% (17827/40644) mpr.dll
    100% (22/22) vs.  44% (17937/40644) sspicli.dll
    100% (22/22) vs.  44% (18054/40644) ntmarta.dll
    100% (22/22) vs.  49% (20002/40644) comdlg32.dll
    100% (22/22) vs.  54% (22043/40644) profapi.dll
    100% (22/22) vs.  57% (23046/40644) sechost.dll
    100% (22/22) vs.  57% (23047/40644) CRYPTBASE.dll
    100% (22/22) vs.  57% (23047/40644) KERNELBASE.dll
     45% (10/22) vs.   9% (3695/40644) BrowserProtect.dll
    100% (22/22) vs.  64% (25884/40644) secur32.dll
     91% (20/22) vs.  56% (22759/40644) apphelp.dll
     95% (21/22) vs.  61% (24949/40644) dwmapi.dll
    100% (22/22) vs.  67% (27186/40644) msctf.dll
    100% (22/22) vs.  71% (28811/40644) winspool.drv
     95% (21/22) vs.  69% (28054/40644) lpk.dll
    100% (22/22) vs.  78% (31799/40644) iertutil.dll
    100% (22/22) vs.  82% (33226/40644) urlmon.dll
     36% (8/22) vs.  19% (7712/40644) snxhk.dll
     82% (18/22) vs.  68% (27482/40644) normaliz.dll
     18% (4/22) vs.   6% (2489/40644) api-ms-win-downlevel-ole32-l1-1-0.dll
     18% (4/22) vs.   7% (2651/40644) api-ms-win-downlevel-normaliz-l1-1-0.dll
     18% (4/22) vs.   7% (2651/40644) api-ms-win-downlevel-version-l1-1-0.dll
     18% (4/22) vs.   7% (2651/40644) api-ms-win-downlevel-user32-l1-1-0.dll
     18% (4/22) vs.   7% (2654/40644) api-ms-win-downlevel-shlwapi-l1-1-0.dll
     18% (4/22) vs.   7% (2655/40644) api-ms-win-downlevel-advapi32-l1-1-0.dll
      9% (2/22) vs.   0% (29/40644) FlashPlayerPlugin_11_5_502_135.exe
    100% (22/22) vs.  92% (37235/40644) wininet.dll
    109% (24/22) vs. 101% (40993/40644) comctl32.dll
      9% (2/22) vs.   2% (872/40644) RocketDock.dll

2013-03-25_Firefox_19.0.2-interesting-modules.txt.gz
  EMPTY: no crashing thread identified|EXCEPTION_ACCESS_VIOLATION_READ (66 crashes)
     98% (65/66) vs.  14% (21149/147984) schannel.dll
     89% (59/66) vs.   5% (8107/147984) credssp.dll
     80% (53/66) vs.   4% (6515/147984) FlashPlayerPlugin_11_6_602_180.exe
     95% (63/66) vs.  36% (52664/147984) mpr.dll
     91% (60/66) vs.  33% (48215/147984) Wldap32.dll
     95% (63/66) vs.  41% (61086/147984) ntmarta.dll
    100% (66/66) vs.  57% (83692/147984) comdlg32.dll
     86% (57/66) vs.  47% (69849/147984) sspicli.dll
    100% (66/66) vs.  62% (91926/147984) secur32.dll
     86% (57/66) vs.  56% (83287/147984) profapi.dll
     91% (60/66) vs.  61% (90964/147984) apphelp.dll
     86% (57/66) vs.  59% (86759/147984) CRYPTBASE.dll
     86% (57/66) vs.  59% (86759/147984) sechost.dll
     86% (57/66) vs.  59% (86759/147984) KERNELBASE.dll
    100% (66/66) vs.  74% (109111/147984) winspool.drv
     98% (65/66) vs.  77% (113648/147984) msctf.dll
     91% (60/66) vs.  71% (105686/147984) lpk.dll
    118% (78/66) vs. 101% (149611/147984) comctl32.dll
    100% (66/66) vs.  85% (126405/147984) urlmon.dll
     21% (14/66) vs.   7% (10166/147984) api-ms-win-downlevel-ole32-l1-1-0.dll
     21% (14/66) vs.   7% (10543/147984) api-ms-win-downlevel-normaliz-l1-1-0.dll
     21% (14/66) vs.   7% (10543/147984) api-ms-win-downlevel-version-l1-1-0.dll
     21% (14/66) vs.   7% (10543/147984) api-ms-win-downlevel-user32-l1-1-0.dll
     21% (14/66) vs.   7% (10549/147984) api-ms-win-downlevel-shlwapi-l1-1-0.dll
     21% (14/66) vs.   7% (10550/147984) api-ms-win-downlevel-advapi32-l1-1-0.dll
     98% (65/66) vs.  85% (125868/147984) iertutil.dll
     20% (13/66) vs.   7% (10911/147984) BrowserProtect.dll
     80% (53/66) vs.  68% (100908/147984) dwmapi.dll
     83% (55/66) vs.  75% (110959/147984) normaliz.dll
     12% (8/66) vs.   4% (6292/147984) sahook.dll
    100% (66/66) vs.  93% (138136/147984) wininet.dll
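
To rank modules in reports like the two above, one option is to compute how over-represented each module is in the signature compared to all crashes. The sketch below parses lines in the format shown and sorts by that ratio. The regular expression, the `parse_module_line` helper and the "lift" measure are my own assumptions, not part of the correlation tooling, and with only 22 or 66 reports the numbers should be read with caution.

```python
import re

# Matches e.g. "  91% (20/22) vs.   6% (2271/40644) credssp.dll"
LINE = re.compile(r"\((\d+)/(\d+)\)\s+vs\.\s+\d+%\s+\((\d+)/(\d+)\)\s+(\S+)")


def parse_module_line(line):
    """Return (module, lift) for one correlation line, or None if it does not match."""
    match = LINE.search(line)
    if match is None:
        return None
    sig_hits, sig_total, all_hits, all_total = map(int, match.groups()[:4])
    if all_hits == 0:
        return match.group(5), float("inf")
    # Lift: share of signature crashes with the module / share of all crashes with it.
    lift = (sig_hits / sig_total) / (all_hits / all_total)
    return match.group(5), lift


# A few lines copied from the Firefox 20.0 report above.
report = """
     91% (20/22) vs.   6% (2271/40644) credssp.dll
     95% (21/22) vs.  14% (5550/40644) schannel.dll
     82% (18/22) vs.   5% (2026/40644) FlashPlayerPlugin_11_6_602_180.exe
    100% (22/22) vs.  92% (37235/40644) wininet.dll
"""

parsed = [parse_module_line(line) for line in report.splitlines()]
ranked = sorted((p for p in parsed if p is not None),
                key=lambda pair: pair[1], reverse=True)
for module, lift in ranked:
    print(f"{lift:6.1f}x  {module}")
```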


I also gathered the number of distinct installations affected:

breakpad=> SELECT version,COUNT(*) as crashes,COUNT(DISTINCT client_crash_date - install_age  * interval '1 second') as installations FROM reports WHERE product='Firefox' AND signature='EMPTY: no crashing thread identified; corrupt dump' AND utc_day_is(date_processed, '2013-03-27') GROUP BY version;
  version   | crashes | installations 
------------+---------+---------------
[...]
 18.0       |     323 |           206
 18.0.1     |     208 |           184
 18.0.2     |     248 |           230
 18.0a1     |       3 |             3
 18.0a2     |       4 |             4
 19.0       |     733 |           566
 19.0.1     |      17 |            16
 19.0.2     |   20021 |         18061
 19.0a1     |       4 |             2
 19.0a2     |       5 |             5
 20.0       |    7198 |          5271
 20.0a1     |       2 |             2
 20.0a2     |      23 |            13
 21.0a1     |       9 |             7
 21.0a2     |     504 |           388
 22.0a1     |    1256 |           826
[...]
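
The query above treats each distinct value of crash time minus install age as one installation. Here is a rough sketch of the same idea outside the database, assuming each report is a (client_crash_date, install_age in seconds) pair. The sample reports are made up; the three table rows are copied from the output above.

```python
from datetime import datetime, timedelta


def count_installations(reports):
    """Count distinct install times, where install time = crash time - install age."""
    install_times = {crash - timedelta(seconds=age) for crash, age in reports}
    return len(install_times)


# Hypothetical reports: the first two resolve to the same install time (09:00).
reports = [
    (datetime(2013, 3, 27, 10, 0, 0), 3600),
    (datetime(2013, 3, 27, 12, 0, 0), 10800),
    (datetime(2013, 3, 27, 12, 0, 0), 600),
]
print(count_installations(reports), "installations for", len(reports), "crashes")

# Crashes per installation for three rows of the table above.
table = {"19.0.2": (20021, 18061), "20.0": (7198, 5271), "22.0a1": (1256, 826)}
for version, (crashes, installs) in table.items():
    print(f"{version}: {crashes / installs:.2f} crashes per installation")
```

This is only a proxy: two installations made in the same second would be counted as one, and clock changes on the client could split one installation into several.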
Flags: needinfo?(kairo)
This isn't surprising, modules come from the minidump, and these reports have empty minidumps.
(Assignee)

Updated

5 years ago
Depends on: 829954

Comment 46

5 years ago
What do I do now if I have a 40 MB log from WinDbg?
(Assignee)

Comment 47

5 years ago
henryfhchan, please put it up on Dropbox or Google Drive so I can read it.
(Assignee)

Updated

5 years ago
Depends on: 859955
(Assignee)

Comment 48

5 years ago
Created attachment 736953 [details] [diff] [review]
Reserve VM space for breakpad, rev. 1
Attachment #736953 - Flags: review?(ted)
Comment on attachment 736953 [details] [diff] [review]
Reserve VM space for breakpad, rev. 1

Review of attachment 736953 [details] [diff] [review]:
-----------------------------------------------------------------

Have you tested that this is sufficient to fix the issue when we run out of VM space? I assume it's not terribly hard to write something to exhaust VM.

::: toolkit/crashreporter/nsExceptionHandler.cpp
@@ +734,5 @@
> +
> +/**
> + * Reserve some VM space. In the event that we crash because VM space is
> + * being leaked without leaking memory, freeing this space before taking
> + * the minidump will allow us to collect a minidump.

So we don't expect this to help in real OOM, right? Just out-of-VM-space?
Attachment #736953 - Flags: review?(ted) → review+
(Assignee)

Comment 50

5 years ago
Correct, I don't want to commit actual memory because 12MB seems like a lot, and if we're running out of actual memory we can know it from the crash metadata. This is only going to help the cases where we're running out of VM space.
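
To illustrate the reasoning, here is a toy model of the idea and not the patch itself (the real change is in toolkit/crashreporter/nsExceptionHandler.cpp). It treats the address space as a simple counter, assumes the dump writer needs the same 12MB that is reserved, and shows why freeing a reservation right before the dump helps when address space, rather than memory, is exhausted. All names and sizes other than the 12MB figure are hypothetical.

```python
class AddressSpace:
    """Toy model of a process's virtual address space (sizes in MB)."""

    def __init__(self, capacity_mb):
        self.free_mb = capacity_mb

    def reserve(self, size_mb):
        if size_mb > self.free_mb:
            raise MemoryError("out of VM space")
        self.free_mb -= size_mb
        return size_mb  # in this toy model the handle is just the size

    def release(self, handle_mb):
        self.free_mb += handle_mb


DUMP_NEEDS_MB = 12  # assumption: the dump writer needs as much as we reserve


def write_minidump(space):
    """Return True if the dump writer could get the address space it needs."""
    try:
        scratch = space.reserve(DUMP_NEEDS_MB)
    except MemoryError:
        return False  # this is the case that would show up as an EMPTY dump
    space.release(scratch)
    return True


def crash_with_exhausted_vm(use_reserve):
    space = AddressSpace(capacity_mb=2048)
    emergency = space.reserve(DUMP_NEEDS_MB) if use_reserve else None
    space.reserve(space.free_mb)  # a leak eats all remaining address space
    if emergency is not None:
        space.release(emergency)  # free the reservation just before dumping
    return write_minidump(space)


print("dump written without reserve:", crash_with_exhausted_vm(False))
print("dump written with reserve:   ", crash_with_exhausted_vm(True))
```

The model says nothing about real OOM, where committed memory is gone; as noted above, the reservation is not expected to help there.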
(Assignee)

Comment 51

5 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/ef802a6418f2
Whiteboard: [native-crash] → [native-crash][leave open]
(Reporter)

Comment 52

5 years ago
I guess that means that this patch is different from what bug 724046 would be targeting?
(Assignee)

Comment 53

5 years ago
Kinda, yes. Although I'd say that bug may be WONTFIX if this one shows us that many/most of the existing EMPTY DUMP crashes are in fact the VM-exhaustion thing I'm seeing.
https://hg.mozilla.org/mozilla-central/rev/ef802a6418f2

Comment 55

5 years ago
I don't see any improvement compared to previous Nightly builds at the same time of the day, about 25 crashes.
(Assignee)

Comment 56

5 years ago
Seth, given that bug 859377 part 4 so dramatically improved the empty dump situation on Nightly, does any of what you did there apply to the situation on Aurora?
(Assignee)

Updated

5 years ago
Flags: needinfo?(seth)
By the way, has anyone considered the possibility that these empty crash reports are caused by stack exhaustion crashes (like you get from infinite recursion)?

The crashes André Reinard's been seeing at bug 865702, which triggered empty crash dumps, are stack exhaustion crashes.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #56)
> Seth, given that bug 859377 part 4 so dramatically improved the empty dump
> situation on Nightly, does any of what you did there apply to the situation
> on Aurora?

That's a good question, and I don't know the answer. There may be more than one cause at work here.

Just to document it clearly, the change I made in bug 859377 part 4 was to stop using hardware surfaces on Windows. Allocating hardware surfaces seems to be extremely wasteful for small images on Windows and my code was allocating a lot of them.

If the situation on Aurora is caused by the same thing, it couldn't be originating from the ClippedImage code being worked on in bug 859377 (since that isn't in Aurora), but it could be caused by other things that use hardware surfaces on Windows. These include, to my knowledge, raster images of other sorts and the layers subsystem.
Flags: needinfo?(seth)

Comment 59

5 years ago
I don't know if this will help you guys out but I am able to replicate this crash 100% of the time on Win7 x64 SP1 using the following method:

1. Visit www.speed-battle.com
2. Download and open Autoclicker 1.0.0.2 and set "Miliseconds" to at least 3000, "Number of Clicks to Automate" to 0 (which is infinity), and click the "L" button on the far right, then click "Pick Location" and click the location on speed-battle to be "Repeat Test", then click "Ok" to continue.
3. Click the "Start Clicking" button and navigate back to the already loaded speed-battle page and Autoclicker should begin clicking on the "Repeat Test" link every 3 seconds.

It crashes for me after around 200-400 clicks.

Comment 60

5 years ago
(In reply to Arthur K. from comment #59)
> I don't know if this will help you guys out but I am able to replicate this
> crash 100% of the time on Win7 x64 SP1 using the following method:
Crashes with empty dump are a collection of hundreds of unrelated bugs. This bug is only to figure out why it has increased. So please file a new bug with a valid stack trace (see https://developer.mozilla.org/docs/How_to_get_a_stacktrace_for_a_bug_report)

Comment 61

5 years ago
There has been a big improvement since 21.0: 20.5% in 20.0.1, 13.6% in 21.0, 12.2% in 22.0b1, 9.8% in 23.0a2, and 7.6% in 24.0a1.

Comment 62

5 years ago
In the end, after a few days, it's not as good as in comment 61: 17.4% in 21.0 (62% > 1H), 14.2% in 22.0b1 (65% > 1H), 12.5% in 23.0a2 (71% > 1H), and 7% in 24.0a1 (70% > 1H), but still promising.
There's no silver bullet for this bug, or any clear regression between releases. This will have to be investigated in an ongoing fashion.
tracking-firefox21: + → -
tracking-firefox22: + → -
Someone should follow up what I said in comment #57.

Whether or not all the empty dumps happen with infinite-recursion crashes, all the infinite-recursion crashes I've had since I made that comment have had empty dumps.

So this bug appears to be reproducible.
(Assignee)

Comment 65

5 years ago
Since these crashes are not primarily on mac, I don't think we need to immediately follow up on the mac issue.
Here's a bunch of crash reports from a user that had both a bunch of empty crashes as well as some in nsSupportsStringImpl::SetData.

https://support.mozilla.org/en-US/questions/953285

(Originally posted by :John99 in bug 767343)
> Since these crashes are not primarily on mac,

I wasn't speaking specifically of the Mac, though so far that's the only platform I've tested on.

I expect an infinite-recursion crash will lead to an empty dump on all platforms.
We handle stack overflow crashes fine on Windows, AFAIK. I've tested this in the past, and seen a number of them in crash-stats.
> I've tested this in the past

How far in the past? :-)
How about today?
https://crash-stats.mozilla.com/report/index/bp-42400927-e0c4-4089-9904-c199e2130604
Fair enough.

Is this with the current version of your crashme extension?  If so I'll try it on the Mac and let you know my results.  (It's possible that, even on the Mac, only *some* infinite recursion crashes trigger an empty dump.)
Yes, although there's some other bug with crashme on Mac that makes the UI non-functional. :-( You can manually crash after installing it by opening the browser console and executing:

Cu.import("resource://crashme/modules/Crasher.jsm");
Crasher.crash(Crasher.CRASH_STACK_OVERFLOW);
Thanks!

Using the same version (0.4) of your crashme extension on OS X 10.7.5 in today's mozilla-central nightly and your STR from comment #72, I also don't get an empty dump -- though the main thread isn't displayed nearly as nicely as in your Windows example:

bp-709018a7-a4c0-4490-9d5e-181742130604

I'm still convinced that infinite recursion crashes are likely to be the key to figuring out how to reproduce this bug.  But I don't know when I'm going to have the time to do the work to confirm or deny this.
Bug 865702, which was fixed in the 2013-05-24 mozilla-central nightly, had crashes that (for me) always resulted in empty Socorro dumps.  These were infinite recursion crashes, and were reproducible (in 2013-05-23 and earlier m-c nightlies) using the following STR (from bug 865702 comment #41):

1) Plug in an external monitor to your MacBook Pro and arrange it on top of your laptop's display.
2) Visit a page in bugzilla (this bug will do).
3) Move that page to the external monitor, if it doesn't open there.
4) Make the page just narrow enough for the horizontal scrollbar to disappear.
5) Scroll down to the bottom of the page.
6) Press Cmd-b to open the Bookmarks sidebar.
7) Click on the Status button -- its combobox should open.
8) Press Cmd-b again to close the Bookmarks sidebar.
9) Press Cmd-b again to open the Bookmarks sidebar.
10) Click on the Status button again ... and crash.
(In reply to Steven Michaud from comment #73)
> I'm still convinced that infinite recursion crashes are likely to be the key
> to figuring out how to reproduce this bug.  But I don't know when I'm going
> to have the time to do the work to confirm or deny this.

As Benjamin said, the vast majority of empty dump crashes are on Windows (as are the vast majority of most of our crashes). We don't have any evidence to show that it's a huge problem on OS X.
Nonetheless, if we can reliably reproduce the problem on OS X (or any specific platform, for that matter), it will likely be a lot easier to figure out the problem on all platforms.
We know what the problem is on Windows: crashes as a result of OOM (or virtual memory fragmentation causing OOM) frequently cause minidump writing to fail because Microsoft's minidump writer is not memory-safe.
> We know what the problem is on Windows: crashes as a result of OOM
> (or virtual memory fragmentation causing OOM) ...

Then why didn't the "Reserve VM space for breakpad" patch fix it?
(Reporter)

Comment 79

5 years ago
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #65)
> Since these crashes are not primarily on mac

To be fair, we don't even know how many are on which OS, even though we are pretty sure that most are on Windows. Bug 838061 would help a lot to shed more light on which actual OS those crashes happen with.

Comment 80

5 years ago
(In reply to Seth Fowler [:seth] from comment #58)
> (In reply to Benjamin Smedberg  [:bsmedberg] from comment #56)
> > Seth, given that bug 859377 part 4 so dramatically improved the empty dump
> > situation on Nightly, does any of what you did there apply to the situation
> > on Aurora?
> 
> That's a good question, and I don't know the answer. There may be more than
> one cause at work here.
> 
> Just to document it clearly, the change I made in bug 859377 part 4 was to
> stop using hardware surfaces on Windows. Allocating hardware surfaces seems
> to be extremely wasteful for small images on Windows and my code was
> allocating a lot of them.
> 
> If the situation on Aurora is caused by the same thing, it couldn't be
> originating from the ClippedImage code being worked on in bug 859377 (since
> that isn't in Aurora), but it could be caused by other things that use
> hardware surfaces on Windows. These include, to my knowledge, raster images
> of other sorts and the layers subsystem.

In my experience, my daily crashes have been during periods of long usage or lots of image activity. I finally put FF under windbg and found that the crash appears to be, indeed, related to hardware surfaces (at least that time) combined with memory pressure. 

Here is the stack trace; hopefully I stripped out enough, since windbg "helpfully" pulled symbols from our servers...
# ChildEBP RetAddr  
00 00feba80 76f0d85e KERNELBASE!RaiseException(
			unsigned long dwExceptionCode = 0xe06d7363, 
			unsigned long dwExceptionFlags = 1, 
			unsigned long nNumberOfArguments = 3, 
			unsigned long * lpArguments = 0x00febaac)+0x6c [windows file]
01 00febab8 735b0687 msvcrt!_CxxThrowException(
			void * pExceptionObject = 0x00febac8, 
			struct _s__ThrowInfo * pThrowInfo = 0x73567f04)+0x48 [windows file]
02 00febadc 735c05a0 d3d11!ThrowFailure+0x7ba4a [windows file]
03 00febb38 69e47ca6 d3d11!NDXGI::CDevice::DeallocateCB+0x8e483
WARNING: Stack unwind information not available. Following frames may be wrong.
04 00febb80 69e6934e igd10umd32!OpenAdapter10+0xb046
05 00febbcc 69c4ac4e igd10umd32!OpenAdapter10+0x2c6ee
06 00febbf0 69c48843 igd10umd32+0xac4e
07 00febc04 69c43f3b igd10umd32+0x8843
08 00febc30 69c42eb3 igd10umd32+0x3f3b
09 00febc4c 69e4f0a4 igd10umd32+0x2eb3
0a 00febc70 69e40dd5 igd10umd32!OpenAdapter10+0x12444
0b 00febc94 73531286 igd10umd32!OpenAdapter10+0x4175
0c 00febd6c 7352f685 d3d11!CResource<ID3D11Texture2D>::CLS::FinalConstruct(
			class CContext * pC = 0x00000000, 
			struct D3D11DDIARG_CREATERESOURCE * pDDICreateResource = 0x00feca58, 
			struct SD3D11SharedResourceCreationArgs * pShared = 0x00fecfc4, 
			struct SD3D11CrossLayerData * pCrossLayerData = <Value unavailable error>, 
			struct D3D10DDI_HRTRESOURCE pRtHandle = struct D3D10DDI_HRTRESOURCE)+0x189 [windows file]
0d (Inline) -------- d3d11!CTexture2D::CLS::FinalConstruct+0x33 [windows file]
0e 00febd90 7352fe1c d3d11!TCLSWrappers<CTexture2D>::CLSFinalConstructFn(
			struct CTexture2D::CLS * pCLS = 0x2b23a73c, 
			class CContext * pContext = 0x00000000, 
			struct CTexture2D::TConstructorArgs * pArgs = 0x00fec15c)+0x38 [windows file]
0f (Inline) -------- d3d11!CLayeredObjectWithCLS<CTexture2D>::FinalConstruct+0x8c [windows file]
10 (Inline) -------- d3d11!CLayeredObjectWithCLS<CTexture2D>::{ctor}+0x100 [windows file]
11 (Inline) -------- d3d11!CLayeredObjectWithCLS<CTexture2D>::CreateInstance+0x142 [windows file]
12 00fecce8 7352b618 d3d11!CDevice::CreateLayeredChild(
			unsigned int ChildType = 2, 
			void * pLayeredChildArgs = 0x00fecda4, 
			struct ID3D11LayeredUseCounted * pOuterUnk = 0x2b23a630, 
			struct _GUID * iid = 0x735214b8 {8b21025d-3e48-4fe9-9800-acc2fd8d6541}, 
			void ** ppUnk = 0x2b23a660)+0x645 [windows file]
13 00fecd00 735306e9 d3d11!CBridgeImpl<ID3D11LayeredDevice,ID3D11LayeredDevice,CLayeredObject<CDevice> >::CreateLayeredChild(
			unsigned int a = 2, 
			void * b = 0x00fecda4, 
			unsigned long c = 0x30, 
			struct ID3D11LayeredUseCounted * d = 0x2b23a630, 
			struct _GUID * e = 0x735214b8 {8b21025d-3e48-4fe9-9800-acc2fd8d6541}, 
			void ** f = 0x2b23a660)+0x1f [windows file]
14 (Inline) -------- d3d11!CD3D11LayeredChild<ID3D11DeviceChild,NDXGI::CDevice,64>::FinalConstruct+0x21 [windows file]
15 00fecd3c 7353060e d3d11!NDXGI::CDeviceChild<IDXGIResource1>::FinalConstruct(
			ED3D11DeviceChildType eDeviceChildType = e_D3D11Texture2D (0n2), 
			struct SLayeredArgs * pLArgs = 0x00fecda4, 
			unsigned long uiArgSize = 0x30, 
			struct ID3D11LayeredUseCounted * pOutmstLyrIface = 0x2b23a630)+0x2d [windows file]
16 00fecd84 73530494 d3d11!NDXGI::CResource::FinalConstruct(
			struct NDXGI::CResource::TConstructorArgs * args = 0x00fecda0)+0x29 [windows file]
17 (Inline) -------- d3d11!CLayeredObject<NDXGI::CResource>::{ctor}+0x49fa [windows file]
18 (Inline) -------- d3d11!CLayeredObject<NDXGI::CResource>::CreateInstance+0x49fa [windows file]
19 00fece2c 7352b254 d3d11!NDXGI::CDevice::CreateLayeredChild(
			unsigned int ChildType = <Value unavailable error>, 
			void * pLayeredChildArgs = 0x00fece50, 
			unsigned long uiArgSize = <Value unavailable error>, 
			struct ID3D11LayeredUseCounted * pOuterUnk = 0x2b23a630, 
			struct _GUID * iid = 0x735214b8 {8b21025d-3e48-4fe9-9800-acc2fd8d6541}, 
			void ** ppUnk = 0x2b23a648)+0x2ea [windows file]
1a (Inline) -------- d3d11!CBridgeImpl<ID3D11LayeredDevice,ID3D11LayeredDevice,CLayeredObject<NDXGI::CDevice> >::CreateLayeredChild+0x24 [windows file]
1b (Inline) -------- d3d11!NOutermost::CDeviceChild::FinalConstruct+0x32 [windows file]
1c (Inline) -------- d3d11!CUseCountedObject<NOutermost::CDeviceChild>::{ctor}+0x9b [windows file]
1d (Inline) -------- d3d11!CUseCountedObject<NOutermost::CDeviceChild>::CreateInstance+0xa9 [windows file]
1e 00fecf30 7353092e d3d11!NOutermost::CDevice::CreateLayeredChild(
			unsigned int ChildType = 2, 
			void * pLayeredChildArgs = 0x00fecf78, 
			unsigned long uiArgSize = 0x30, 
			struct ID3D11LayeredUseCounted * pOuterUnk = 0x00000000, 
			struct _GUID * iid = 0x73522f58 {6f15aaf2-d208-4e89-9ab4-489535d34f9c}, 
			void ** ppUnk = 0x00fed0fc)+0x1e2 [windows file]
1f (Inline) -------- d3d11!CDevice::CreateAndRecreateLayeredChild+0x90 [windows file]
20 00fed0a0 73532ec1 d3d11!CDevice::CreateTexture2D_Worker(
			struct D3D11_TEXTURE2D_DESC * pDesc = 0x00fed100, 
			struct D3D11_SUBRESOURCE_DATA * pInitialData = 0x00000000, 
			int DWMException = 0n0, 
			struct ID3D11Texture2D ** ppTexture2D = 0x00fed0fc, 
			struct SD3D11SharedResourceCreationArgs * pSResArgs = 0x00000000, 
			bool bCalledFromD3D10 = true)+0x21a [windows file]
21 00fed130 0362cf60 d3d11!CDevice::ID3D10Device1_CreateTexture2D_(
			struct ID3D10Device1 * pIFace = 0x08f0a73c, 
			struct D3D10_TEXTURE2D_DESC * pDesc = 0x00fed15c, 
			struct D3D10_SUBRESOURCE_DATA * pInitialData = 0x00000000, 
			struct ID3D10Texture2D ** ppTexture2D = 0x21db0d70)+0xc7 [windows file]
22 00fed18c 038ac8a1 xul!mozilla::services::_external_GetChromeRegistryService+0x5daac
23 00fed2b8 038ad227 xul!XRE_InitEmbedding2+0x2699f
24 00fed328 038ad231 xul!XRE_InitEmbedding2+0x27325
25 00fed390 038acc05 xul!XRE_InitEmbedding2+0x2732f
26 00fedd50 038ad102 xul!XRE_InitEmbedding2+0x26d03
27 00feddac 02d94765 xul!XRE_InitEmbedding2+0x27200
28 00fedec4 02cfa2c9 xul!webrtc::VoEAudioProcessing::DriftCompensationSupported+0x6395
29 00fee37c 02da4206 xul!mozilla::scache::PathifyURI+0x256b9
2a 00fee4d4 02da453c xul!webrtc::VoEAudioProcessing::DriftCompensationSupported+0x15e36
2b 00fee530 02cea611 xul!webrtc::VoEAudioProcessing::DriftCompensationSupported+0x1616c
2c 00fee53c 02cea727 xul!mozilla::scache::PathifyURI+0x15a01
2d 00fee548 02d36f1e xul!mozilla::scache::PathifyURI+0x15b17
2e 00fee550 02ce1365 xul!NS_CycleCollectorSuspect2_P+0x909e
2f 00fee560 02cfedc8 xul!mozilla::scache::PathifyURI+0xc755
30 00fee680 02db109d xul!mozilla::scache::PathifyURI+0x2a1b8
31 00fee6e0 02d08256 xul!webrtc::VoEAudioProcessing::DriftCompensationSupported+0x22ccd
32 00fee7c8 02cff64e xul!mozilla::scache::PathifyURI+0x33646
33 00fee7d0 02d2777a xul!mozilla::scache::PathifyURI+0x2aa3e
34 00fee7ec 77a3a643 xul!mozilla::scache::PathifyURI+0x52b6a
35 00fee818 77a3a593 ntdll!RtlpEnterCriticalSectionContended(
			struct _RTL_CRITICAL_SECTION * CriticalSection = 0x00000000)+0x148 [windows file]
36 00fee824 67c02a6e ntdll!RtlEnterCriticalSection(
			struct _RTL_CRITICAL_SECTION * CriticalSection = 0x02d64c0d)+0x43 [windows file]
37 00fee83c 67c02ba8 nspr4!PR_Lock+0x2e
38 00fee88c 0306d551 nspr4!PR_Unlock+0x38
39 00fee8c4 0306d605 xul!NS_InvokeByIndex_P+0x5e36
3a 00fee8e4 02e3fc71 xul!NS_InvokeByIndex_P+0x5eea
3b 00fee9d8 02bbc04e xul!XRE_main+0x53a5
3c 00fee9fc 02e3a8fc xul!xpc::Base64Decode+0x43df
3d 00feeb14 0106157e xul!XRE_main+0x30
3e 01064230 2f2f3a73 firefox+0x157e
3f 01064234 73617263 0x2f2f3a73
40 (Inline) -------- d3d11!CLayeredObjectRoot<ID3D11LayeredDevice>::CondObjectLock::{dtor}+0x9 [windows file]
41 (Inline) -------- d3d11!CDevice::CondObjectLock::{dtor}+0x9 [windows file]
42 01064238 65722d68 d3d11!CContext::ID3D11DeviceContext1_Map_<2>(
			struct ID3D11DeviceContext1 * pIFace = 0x045300e0, 
			struct ID3D11Resource * pResource = <Memory access error>, 
			unsigned int Subresource = <Memory access error>, 
			D3D11_MAP MapType = <Memory access error>, 
			unsigned int MapFlags = <Memory access error>, 
			struct D3D11_MAPPED_SUBRESOURCE * pMappedSubresource = <Memory access error>)+0x52 [windows file]
43 01064254 3d64693f 0x65722d68
44 01064258 3863657b 0x3d64693f
45 0106425c 66303330 0x3863657b
46 01064260 32632d37 explorerframe!`string'+0x4
47 01064264 342d6130 0x32632d37
48 01064268 2d663436 0x342d6130
49 0106426c 65306239 0x2d663436
4a 01064270 6133312d 0x65306239
4b 01064274 65396133 0x6133312d
4c 01064278 38333739 0x65396133
4d 0106427c 76267d34 0x38333739
4e 01064280 69737265 shell32!ntdll_NULL_THUNK_DATA+0x1a08
4f 01064284 323d6e6f 0x69737265
50 01064288 26302e31 0x323d6e6f
51 0106428c 6c697562 0x26302e31
52 01064290 3d646964 0x6c697562
53 01064294 33313032 0x3d646964
54 01064298 31313530 0x33313032
55 0106429c 38303231 0x31313530
56 010642a0 00000000 0x38303231

Comment 81

5 years ago
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #56)
> given that bug 859377 part 4 so dramatically improved the empty dump situation on
> Nightly
It was a red herring. The crash ratio is 13.3% in 22.0b6 and 16.5% in 23.0a2.

Comment 82

4 years ago
If 5% of OOM crashes have a signature and 95% have the empty dump crash signature then we should fix OOM crashes like bug 767343 (1422 crashes in 22.0) and bug 764342 (927 crashes in 22.0). These two bugs would account for 43% of empty dump crashes in 22.0.
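As a back-of-the-envelope check of that reasoning, here is a minimal sketch. The 5%/95% split is only the hypothetical stated above, not a measured figure, and the variable names are made up for illustration.

```python
# Hypothetical split from the comment above: 5% of OOM crashes keep a
# signature, 95% end up with the EMPTY dump signature.
signed_crashes = 1422 + 927  # bug 767343 + bug 764342 in 22.0

# Empty-dump crashes implied by a 5:95 split (integer arithmetic, exact here).
implied_empty = signed_crashes * 95 // 5

# Total empty-dump volume that would make those crashes 43% of the total.
implied_total_empty = round(implied_empty / 0.43)

print(signed_crashes, implied_empty, implied_total_empty)
```

If the real split differs from 5%/95%, the implied numbers change proportionally, so this only shows what the assumption would mean, not what actually happens.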
I crashed twice with an empty stack today; both times I was creating routes or using Street View in the new Google Maps. I'm not sure it was an OOM crash, though.
These are my non-stacks:
https://crash-stats.mozilla.com/report/index/39ee4d84-2fd4-430c-9c4c-115d52130719
https://crash-stats.mozilla.com/report/index/8b12a70b-b213-4582-862f-b002e2130720

Comment 84

4 years ago
(In reply to Marco Bonardo [:mak] from comment #83)
> https://crash-stats.mozilla.com/report/index/39ee4d84-2fd4-430c-9c4c-
> 115d52130719
This one is bug 802152 based on the abort message in App Notes.

Comment 85

4 years ago
It accounts for 18% in 22.0, 14.7% in 23.0b6, 15.8% in 24.0a2, and 10.2% in 25.0a1.

(In reply to Scoobidiver from comment #82)
> If 5% of OOM crashes have a signature and 95% have the empty dump crash
> signature
Instead of assuming, I used https://crash-analysis.mozilla.com/crash_analysis/20130719/20130719-pub-crashdata.csv.gz. The breakdown per abort or error message in 22.0 is as follows:
Abort or error message	Bug	Count	% of total
Total		16787	
OOM: file e:\builds\...\xpcom\string\src\nsTSubstring.cpp, line 533	Bug 802152	1675	9.98%
OOM: file e:\builds\...\xpcom\string\src\nsTSubstring.cpp, line 348	Bug 767343	1398	8.33%
Failed to create temporary texture in system memory. Error code: 2147942414	Bug 793126	1224	7.29%
ThebesLayerD3D10::Validate(): Failed to create texture Error code: 2147942414		527	3.14%
OOM: file e:\builds\...\xpcom\string\src\nsTSubstring.cpp, line 291	Bug 869294	333	1.98%
Attempt to create unsupported SourceSurface fromnon-image surface.: file e:/builds/.../gfx/thebes/gfxPlatform.cpp, line 655	Bug 844819	204	1.22%
out of memory: file e:/builds/.../layout/base/nsDisplayList.cpp, line 867		127	0.76%
OOM: file e:/builds/.../xpcom/string/src/nsReadableUtils.cpp, line 160	Bug 858791	111	0.66%
out of memory: file e:/builds/.../layout/base/nsPresArena.cpp, line 362		89	0.53%
bug836263: file e:/builds/.../modules/libpref/src/nsPrefBranch.cpp, line 330	Bug 836263	52	0.31%
OOM: file e:\builds\...\xpcom\string\src\nsTSubstring.cpp, line 393		37	0.22%
OOM: file e:\builds\...\obj-firefox\dist\include\nsTHashtable.h, line 99		27	0.16%
OOM: file e:/builds/.../layout/generic/nsLineLayout.cpp, line 584		18	0.11%
file e:/builds/.../build/ipc/ch/src/base/pickle.cc, line 60		18	0.11%
Can't allocate mozilla::ReentrantMonitor: file e:\builds\...\obj-firefox\dist\include\mozilla/ReentrantMonitor.h, line 49		14	0.08%
Can't allocate mozilla::Mutex: file e:\builds\...\obj-firefox\dist\include\mozilla/Mutex.h, line 51		8	0.05%
OOM: file e:\builds\...\obj-firefox\dist\include\nsTSubstring.h, line 132		6	0.04%
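The percentages in the table can be re-derived from the counts and the total of 16787. This is a minimal sketch that only re-uses the numbers listed above; it does not read the CSV file itself.

```python
# Counts copied from the table above, in the same row order.
TOTAL = 16787
counts = [1675, 1398, 1224, 527, 333, 204, 127, 111, 89, 52, 37, 27,
          18, 18, 14, 8, 6]

# Per-row share of the total, in percent, rounded as in the table.
shares = [round(100 * c / TOTAL, 2) for c in counts]

# How much of the total the listed rows cover together.
listed = sum(counts)
listed_share = round(100 * listed / TOTAL, 2)

for count, share in zip(counts, shares):
    print(f"{count:5d}  {share:5.2f}%")
print(f"listed rows: {listed} of {TOTAL} ({listed_share}%)")
```

The rows listed add up to roughly a third of the total, so most of the reports counted here do not fall under any of the messages in the table.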
Depends on: 802152, 767343, 793126
(In reply to Scoobidiver from comment #84)
> (In reply to Marco Bonardo [:mak] from comment #83)
> > https://crash-stats.mozilla.com/report/index/39ee4d84-2fd4-430c-9c4c-
> > 115d52130719
> This one is bug 802152 based on the abort message in App Notes.

It would probably be interesting to make this code path annotate the OOMAllocationSize like the mozalloc-OOM case does, so we could see how much memory is being allocated here.

mak: if you can reproduce this, it would be pretty interesting to attach a debugger and find out where this is actually crashing.
Also, from one of mak's crashes:
Available Virtual Memory	
370790400

That's not particularly large, and if it's fragmented I could certainly see him hitting an OOM.
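To illustrate why fragmentation matters more than the raw number, here is a small sketch. The 1 MiB region size and the 2 MiB request are made-up values for illustration, not taken from the crash report.

```python
MIB = 1 << 20

# Value from the crash report above, in bytes (about 353.6 MiB).
available = 370_790_400

# Hypothetical worst case: the free space is split into 1 MiB holes.
free_regions = [MIB] * (available // MIB)

def can_allocate(regions, request):
    """A contiguous allocation succeeds only if one free region is big enough."""
    return any(size >= request for size in regions)

request = 2 * MIB  # hypothetical contiguous allocation
print(f"total free: {sum(free_regions) / MIB:.0f} MiB")
print(f"largest block: {max(free_regions) / MIB:.0f} MiB")
print(f"2 MiB request fits: {can_allocate(free_regions, request)}")
```

In this made-up layout there are hundreds of MiB free in total, yet a 2 MiB contiguous request fails, which is the kind of OOM being described.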

Comment 88

4 years ago
I'm a simple user and I come with this information, maybe it helps.
The following crash report was sent by me:
https://crash-stats.mozilla.com/report/index/47d26d8a-aa3b-4f22-99b9-581a42130914
I have seen the link of this page in the "Related bugs" under the description.
All I can say is that Firefox often crashes after I open too many tabs. I have 2 GB of memory and it crashes when memory is about 70-80% full. This is the first time I could send the crash report; before, the crash reporter dialog appeared, but after I clicked "Send report" it said that the report could not be sent, even though I was connected to the internet.

Comment 89

4 years ago
One of my empty crashes:
http://crash-stats.mozilla.com/report/index/b5105c81-5298-468b-b25f-717ac2131016

How could this possibly be debugged if there is no dump?

Comment 90

4 years ago
User Dderss - you have to launch firefox in debugger and reproduce. Since every one of these crashes I've seen is due to memory pressure (OOM in physical space or fragmentation), that means you have to run in debugger for a while. It can be a pain.

Comment 91

4 years ago
Thanks for the reply, Timothy. Do you mean "safe" mode? The problem with it is that the crash might be core/video related (my guess), and it happens because I watch a lot of YouTube videos. In safe mode all extensions are turned off, so I will not be able to watch YouTube videos and will not be able to build up the pressure on FF resources needed to actually generate the crash.

Or is there an actual debugger that I can download and install, which would run in real time in parallel with the FF process and record what goes on, writing it to disk on the fly, so that no matter how abrupt the crash is, and whether or not the dump is corrupt, something could still be traced?

Comment 92

4 years ago
(In reply to Tim from comment #90)
> User Dderss - you have to launch firefox in debugger and reproduce. Since
> every one of these crashes I've seen is due to memory pressure (OOM in
> physical space or fragmentation), that means you have to run in debugger for
> a while. It can be a pain.

If he were to run, say Fx 25.0b8, wouldn't debug mode be enabled by default?

Comment 93

4 years ago
https://developer.mozilla.org/en-US/docs/How_to_get_a_stacktrace_with_WinDbg

Comment 94

4 years ago
Thanks; I finally managed to catch the crash, though it took two days of slow browsing under the debugger. However, because it took so long, the log file is giant: 565 MB. How much can I cut from it so that the log is light enough for easy upload, so developers can see what happened?

(The way it goes is I open a bunch of YouTube videos, which becomes unbearable for FF so it dies.)

Also, just in case it is needed, I have made a "minidump", which is not really mini since it takes 3.5 GB. But I would prefer not to upload it -- unless it becomes absolutely necessary -- since it has personal information.

Comment 95

4 years ago
The personal information is minimally useful in a dump, though if someone were interested I'm sure an ill-intentioned person could get something from it. You could start with just getting the stack trace following the above instructions.

Comment 96

4 years ago
Created attachment 819129 [details]
Since the debug log file is huge (565 MB), I am uploading only the part from the exception onward (stack et cetera)

If the part before the exception is needed, please let me know how much, and I will cut out an appropriately sized chunk of the log.
Attachment #819129 - Flags: review+
Attachment #819129 - Flags: feedback+
Attachment #819129 - Attachment mime type: text/x-log → text/plain
Thanks, this is useful! The stack of the crashing thread here is:
#136  Id: 51e0.74a4 Suspend: 1 Teb: ffe20000 Unfrozen "Media Decode"
ChildEBP RetAddr  
ea1df480 583e1218 mozalloc!mozalloc_abort(char * msg = 0xea1df498 "out of memory: 0x0000000000151800 bytes requested")+0x2a [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\memory\mozalloc\mozalloc_abort.cpp @ 30]
ea1df4d0 583e10a2 mozalloc!mozalloc_handle_oom(unsigned int size = 0x151800)+0x5f [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\memory\mozalloc\mozalloc_oom.cpp @ 50]
ea1df4e0 105e28ab mozalloc!moz_xmalloc(unsigned int size = 0x151800)+0x1b [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\memory\mozalloc\mozalloc.cpp @ 56]
ea1df4f8 106235c9 xul!mozilla::layers::BufferRecycleBin::GetBuffer(unsigned int aSize = 0x9c4cf9a4)+0x52 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\gfx\layers\imagecontainer.cpp @ 111]
ea1df504 10623536 xul!mozilla::layers::PlanarYCbCrImage::AllocateBuffer(unsigned int aSize = 0x151800)+0x10 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\gfx\layers\imagecontainer.cpp @ 427]
ea1df518 1064b315 xul!mozilla::layers::PlanarYCbCrImage::CopyData(struct mozilla::layers::PlanarYCbCrImage::Data * aData = 0xea1df53c)+0x2f [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\gfx\layers\imagecontainer.cpp @ 463]
ea1df524 10a44a41 xul!mozilla::layers::PlanarYCbCrImage::SetData(struct mozilla::layers::PlanarYCbCrImage::Data * aData = 0xea1df53c)+0xa [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\gfx\layers\imagecontainer.cpp @ 485]
ea1df598 10a44e5f xul!mozilla::VideoData::Create(class mozilla::VideoInfo * aInfo = 0x3a0feae0, class mozilla::layers::ImageContainer * aContainer = 0x554c6fb0, class mozilla::layers::Image * aImage = 0xea1df5e0, int64 aOffset = 0n487161, int64 aTime = 0n300300, int64 aEndTime = 0n333666, struct mozilla::VideoData::YCbCrBuffer * aBuffer = 0xea1df668, bool aKeyframe = false, int64 aTimecode = 0n-1, struct nsIntRect aPicture = struct nsIntRect)+0x266 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\content\media\mediadecoderreader.cpp @ 252]
ea1df5e0 1007ad2a xul!mozilla::VideoData::Create(class mozilla::VideoInfo * aInfo = 0x3a0feae0, class mozilla::layers::ImageContainer * aContainer = 0x554c6fb0, int64 aOffset = 0n487161, int64 aTime = 0n300300, int64 aEndTime = 0n333666, struct mozilla::VideoData::YCbCrBuffer * aBuffer = 0xea1df668, bool aKeyframe = false, int64 aTimecode = 0n-1, struct nsIntRect aPicture = struct nsIntRect)+0x3a [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\content\media\mediadecoderreader.cpp @ 266]
ea1df6b4 1007b06f xul!mozilla::WMFReader::CreateBasicVideoFrame(struct IMFSample * aSample = 0x264b0208, int64 aTimestampUsecs = 0n300300, int64 aDurationUsecs = 0n33366, int64 aOffsetBytes = 0n487161, class mozilla::VideoData ** aOutVideoData = 0xea1df724)+0x1d1 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\content\media\wmf\wmfreader.cpp @ 856]
ea1df714 10a47885 xul!mozilla::WMFReader::DecodeVideoFrame(bool * aKeyframeSkip = 0xea1df7d3, int64 aTimeThreshold = 0n333666)+0x1e1 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\content\media\wmf\wmfreader.cpp @ 987]
ea1df7dc 10a489e2 xul!mozilla::MediaDecoderStateMachine::DecodeLoop(void)+0x248 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\content\media\mediadecoderstatemachine.cpp @ 905]
ea1df7f4 10589423 xul!mozilla::MediaDecoderStateMachine::DecodeThreadRun(void)+0x9f [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\content\media\mediadecoderstatemachine.cpp @ 507]
ea1df7f8 0fdc7a51 xul!nsRunnableMethodImpl<void (void)+0xe [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\obj-firefox\dist\include\nsthreadutils.h @ 351]
ea1df86c 0fe1b1b8 xul!nsThread::ProcessNextEvent(bool mayWait = true, bool * result = 0xea1df89c)+0x221 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\xpcom\threads\nsthread.cpp @ 632]
ea1df894 5176e927 xul!nsThread::ThreadFunc(void * arg = 0x532d0201)+0x98 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\xpcom\threads\nsthread.cpp @ 264]
ea1df8b4 5177329d nss3!_PR_NativeRunThread(void * arg = 0x223a5860)+0x167 [e:\builds\moz2_slave\rel-m-rel-w32_bld-000000000000\build\nsprpub\pr\src\threads\combined\pruthr.c @ 419]
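For scale, the failed request in the stack above is 0x151800 bytes. A quick sketch of the arithmetic follows; the 1280x720 frame size and the 4:2:0 layout are assumptions made for illustration, since the log does not state the video dimensions.

```python
# Size from "out of memory: 0x0000000000151800 bytes requested".
requested = 0x151800

# Assumed frame geometry (not stated in the log): 1280x720, planar YCbCr 4:2:0,
# i.e. one full-resolution luma plane plus two quarter-resolution chroma planes.
width, height = 1280, 720
luma = width * height
chroma = 2 * (width // 2) * (height // 2)
frame_bytes = luma + chroma

print(requested, frame_bytes, requested == frame_bytes)
print(f"{requested / (1 << 20):.2f} MiB")
```

Under that assumption the request matches a single decoded 720p frame, so the allocation that failed here is small (about 1.3 MiB) rather than an unusually large one.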

Comment 98

4 years ago
Does this stack mean that this crash/bug might be related to this one?

http://bugzilla.mozilla.org/show_bug.cgi?id=887968
Dderss, no, bug 887968 is unlikely to be your issue. I opened a clean bug for your specific case. The number is bug 930797.
Created attachment 826311 [details]
WinDbg trace. I have cut the trace post access violation. If previous content is needed then let me know. It is more than 100MB in size.

Another WinDbg trace, in case it helps. This is on Firefox 25. The steps to reproduce are to load many tabs simultaneously (~200). The attached file is the trace from the point where the access violation occurred. I also have a minidump and the full trace saved. Let me know if those are needed too.
(In reply to hitesh.seth@yahoo.co.in from comment #100)
> Created attachment 826311 [details]
> WinDbg trace. I have cut the trace post access violation. If previous
> content is needed then let me know. It is more than 100MB in size.
> 
> Another Windbg trace see if it helps. This is on Firefox 25. Steps to
> reproduce is to load multiple tabs simultaneously(~200) The attached file is
> trace since Access Violation occurred. I also have minidump and full trace
> saved.   Let me know if that is needed too.

There's something wrong with this log, it doesn't have symbols loaded for xul.dll, which makes it very hard to get useful info out. Also, when you first hit an exception in WinDBG, you can just enter the command to get the stack. Trying to continue at that point doesn't help much.

Comment 102

4 years ago
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #101)
> There's something wrong with this log, it doesn't have symbols loaded for
> xul.dll

Maybe because http://symbols.mozilla.org/firefox is currently giving error 404.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #101)
> (In reply to hitesh.seth@yahoo.co.in from comment #100)
> > Created attachment 826311 [details]
> > WinDbg trace. I have cut the trace post access violation. If previous
> > content is needed then let me know. It is more than 100MB in size.
> > 
> > Another Windbg trace see if it helps. This is on Firefox 25. Steps to
> > reproduce is to load multiple tabs simultaneously(~200) The attached file is
> > trace since Access Violation occurred. I also have minidump and full trace
> > saved.   Let me know if that is needed too.
> 
> There's something wrong with this log, it doesn't have symbols loaded for
> xul.dll, which makes it very hard to get useful info out. Also, when you
> first hit an exception in WinDBG, you can just enter the command to get the
> stack. Trying to continue at that point doesn't help much.

For loading symbols I followed instructions given at :
https://developer.mozilla.org/en/docs/How_to_get_a_stacktrace_with_WinDbg

If there are new instructions available to give a stack trace then please let me know.

Thanks for that tip about WinDbg. I had a question there: how should I differentiate between an exception that can be handled (by continuing) and an exception that will lead to a crash in Firefox?

Comment 104

4 years ago
Created attachment 826322 [details]
firefox-debug_16ec_2013-11-02_18-10-49-220.log

Here's a log of a crash after the usual browsing I do, Firefox 24.1.0ESR.
Just browsing on image-heavy sites, and other sites known to be big on RAM (Slashdot, Amazon).
Is it normal that the crash-reporter isn't showing when WinDbg is used?

I hope I have done everything properly, at least I saw WinDbg loading xul.pdb, so I think that's covered.
(In reply to elbart from comment #102)
> Maybe because http://symbols.mozilla.org/firefox is currently giving error
> 404.

It always gives a 404, it's not designed to be human-browsable. It should return proper responses for paths to symbols.
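For reference, debuggers ask a symbol server for files using the usual symbol-store layout of file name, then debug identifier, then file name again. The sketch below builds such a path; the function name and the debug identifier are made up for illustration, and whether this server stores the compressed `.pd_` form is an assumption.

```python
def symbol_url(base, pdb_name, debug_id, compressed=True):
    """Build a symbol-store style URL: <base>/<pdb>/<debug id>/<pdb or pd_>."""
    # Compressed symbol files replace the last character of the name with "_".
    leaf = pdb_name[:-1] + "_" if compressed else pdb_name
    return f"{base.rstrip('/')}/{pdb_name}/{debug_id}/{leaf}"

# Placeholder identifier (GUID plus age), not a real Firefox build.
EXAMPLE_ID = "0123456789ABCDEF0123456789ABCDEF1"

url = symbol_url("http://symbols.mozilla.org/firefox", "xul.pdb", EXAMPLE_ID)
print(url)
```

So a 404 on the bare directory URL says nothing about whether the symbols for a specific build are present.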

(In reply to elbart from comment #104)
> Is it normal that the crash-reporter isn't showing when WinDbg is used?

Yes, the Mozilla crash reporter is fired from a last-chance exception handler, which doesn't get invoked when you have a debugger attached.
(In reply to hitesh.seth@yahoo.co.in from comment #103)
> (In reply to Ted Mielczarek [:ted.mielczarek] from comment #101)
> > (In reply to hitesh.seth@yahoo.co.in from comment #100)
> > > Created attachment 826311 [details]
> > > WinDbg trace. I have cut the trace post access violation. If previous
> > > content is needed then let me know. It is more than 100MB in size.
> > > 
> > > Another Windbg trace see if it helps. This is on Firefox 25. Steps to
> > > reproduce is to load multiple tabs simultaneously(~200) The attached file is
> > > trace since Access Violation occurred. I also have minidump and full trace
> > > saved.   Let me know if that is needed too.
> > 
> > There's something wrong with this log, it doesn't have symbols loaded for
> > xul.dll, which makes it very hard to get useful info out. Also, when you
> > first hit an exception in WinDBG, you can just enter the command to get the
> > stack. Trying to continue at that point doesn't help much.
> 
> For loading symbols I followed instructions given at :
> https://developer.mozilla.org/en/docs/How_to_get_a_stacktrace_with_WinDbg
> 
> If there are new instructions available to give a stack trace then please
> let me know.
> 
> Thanks for that tip about WinDbg. I had a doubt there, how should I
> differentiate between exception that can be handled(by continuing) versus
> exception that will lead to crash in firefox?


I was going through WinDbg log from start and I found this:

"
0:000> .sympath SRV*c:\symbols*http://symbols.mozilla.org/firefox;SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
Symbol search path is: SRV*c:\symbols*http://symbols.mozilla.org/firefox;SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
Expanded Symbol search path is: srv*c:\symbols*http://symbols.mozilla.org/firefox;srv*c:\symbols*http://msdl.microsoft.com/download/symbols
0:000> .symfix+ c:\symbols
0:000> .reload /f
Reloading current modules
...*** WARNING: Unable to verify checksum for G:\Program Files\Alwil Software\Avast5\snxhk.dll
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for G:\Program Files\Alwil Software\Avast5\snxhk.dll - 
........
"



From the warning it seems symbol files were not found at that location. But it reports the error for one of my antivirus's DLL files, not for Firefox. Is that normal? Or were the symbol files truly not found, and that's why the stack doesn't have symbols? Please advise.
I got another crash just now. Interestingly, it is not an empty crash, but the steps to reproduce were the same: load multiple tabs (~200). This crash also refers to xul.dll, the same DLL referred to by the WinDbg trace.

https://crash-stats.mozilla.com/report/index/3a99d4ff-0ea2-4a22-8077-ffd402131102
Can somebody look at my bug and see whether it could be causing some of these crashes?

Bug 937651 - Replace the sessionstore.js with an sessionstore.sqlite
(Assignee)

Updated

4 years ago
Depends on: 939141
(In reply to hitesh.seth@yahoo.co.in from comment #107)
> I got another crash just now. Interestingly it is not Empty crash but step
> to reproduce it was same-- load multiple tabs (~200) Also, this crash also
> refers to xul.dll, the same dll referred by WinDbg trace.
> 
> https://crash-stats.mozilla.com/report/index/3a99d4ff-0ea2-4a22-8077-
> ffd402131102

The stack here is:
Thread 0 (crashed)
 0  xul.dll!mozilla::WebGLContext::PresentScreenBuffer() [WebGLContext.cpp:d86ad7db1de3 : 1379 + 0x3]
    eip = 0x10c89796   esp = 0x001cbec8   ebp = 0x001cbf08   ebx = 0x00000000
    esi = 0x28554ec0   edi = 0x308c6900   eax = 0x28554ec4   ecx = 0x00000000
    edx = 0x05100048   efl = 0x00210202
    Found by: recovered by external stack walker
...

This seems like it's already filed as bug 881311.
(Reporter)

Updated

4 years ago
Crash Signature: [@ EMPTY: no crashing thread identified; corrupt dump] → [@ EMPTY: no crashing thread identified; corrupt dump] [@ EMPTY: no crashing thread identified; ERROR_NO_MINIDUMP_HEADER]

Comment 110

4 years ago
The same crash as Robert's:
https://crash-stats.mozilla.com/report/index/75b6fe4d-28bf-416c-ae60-a0cfe2131120

I debugged similar crash recently:
http://bugzilla.mozilla.org/show_bug.cgi?id=930797
(Reporter)

Comment 111

4 years ago
(In reply to User Dderss from comment #110)
> The same crash as Robert's:
> https://crash-stats.mozilla.com/report/index/75b6fe4d-28bf-416c-ae60-
> a0cfe2131120

I didn't have one of these; I only added that new signature here because reporting has changed to put it on those reports. That is purely a change in our tools that process crash reports from users, not a change in what those crashes would be.

> I debugged similar crash recently:
> http://bugzilla.mozilla.org/show_bug.cgi?id=930797

Thanks. Please keep the individual debugging of that in the bug there. It would be good if we found specific cases of how one can reproducibly run into those issues, as that may help developers find out what code causes them and possibly how to improve the situation.

Comment 112

4 years ago
The debugging information is already there; thanks.

Is this bug the same as https://bugzilla.mozilla.org/show_bug.cgi?id=711568?
(Reporter)

Comment 113

4 years ago
(In reply to User Dderss from comment #112)
> Is this bug is the same as
> https://bugzilla.mozilla.org/show_bug.cgi?id=711568?

That one is the generic meta bug for those crashes. The one here is specifically about the regression in volume we have seen with those in the Firefox 19 and 20 cycles.
Summary: increase in crashes with EMPTY dumps → increase in crashes with EMPTY dumps in Firefox 19 and 20 cycles
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #111)
> Thanks, please leave individual debugging of that in the bug there, would be
> good if we find specific cases of how one can reproducibly run into those
> issues, as that may help developers find out what code causes it and
> possibly how to improve the situation.

OK, I think I can easily give you a scenario where FF does a lot of I/O, needs a lot of memory, slows down, and crashes a lot.

At the moment there is Bug 934935, so it is really easy to make sessionstore.js big ... ;-)
I think this 'bug' is caused by the Facebook 'switching advertisement system', so you don't have to write much on the message page ... ;-)

- First, configure FF to use no plugins and to restore the previously open tabs when FF starts again.
- Now open a page that doesn't take up much space in sessionstore.js, e.g. Bugzilla.
- Now log in to FB and open some message pages, let's say 10.
- Wait a few minutes and watch how your sessionstore.js (ss.js) grows.
- Let's say you run the first test with an ss.js of 15 MB.
- Now go back to your Bugzilla page and close FF.
- Restart FF and don't load the message pages.
- Try to work.
-> You will now see that it slows down, does a lot of I/O, ... Maybe it starts to become unstable ...

- Let's open some more message pages. (Don't reload the old ones!)
- Grow your ss.js to 30 MB.
- Restart.
- Don't reload the message pages!
- Try to work.
-> Slower, more I/O, more unstable.

- Now do the same again and go to 50 MB.

- ...

-> FF crashes more and more!

Please look at Bug 937651 for more info.
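
To follow the file size between the steps above without checking by hand, something like the following could be used. This is only a rough sketch: the function name `sessionstore_size_mb` is made up for illustration, and the profile directory has to be supplied by whoever runs it, since its location differs per system and profile.

```python
import os
import sys

def sessionstore_size_mb(profile_dir, filename="sessionstore.js"):
    """Return the size of the session file in MB, or None if it is missing.

    profile_dir is the Firefox profile folder (an assumption: the caller
    knows where it is; this sketch does not try to locate it).
    """
    path = os.path.join(profile_dir, filename)
    if not os.path.isfile(path):
        return None
    return os.path.getsize(path) / (1024 * 1024)

if __name__ == "__main__":
    # Usage: python ss_size.py /path/to/firefox/profile
    if len(sys.argv) != 2:
        sys.exit("usage: ss_size.py PROFILE_DIR")
    size = sessionstore_size_mb(sys.argv[1])
    if size is None:
        print("sessionstore.js not found in that profile")
    else:
        print("sessionstore.js is %.1f MB" % size)
```

Running it before each restart gives the 15 MB, 30 MB, and 50 MB checkpoints mentioned in the steps.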
(Reporter)

Comment 116

4 years ago
Yes, the Facebook thing is known, but it is not something that shipped partly with Firefox 19 and partly with Firefox 20, and this bug is explicitly about why the number of OOM crashes of this kind increased in those two cycles, as depicted by the graphs in the attachments.

BTW, the patch from comment #48 seems to not have helped significantly here.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #116)
> Yes, the Facebook thing is known, but it was not shipped to some part with
> Firefox 19 and to some part with Firefox 20, and this bug is explicitely
> about why we have increased the number of OOM crashes of this kind in those
> two cycles, as depicted by the graphs in the attachments.
> 
> BTW, the patch from comment #48 seems to not have helped significantly here.

Sorry, I thought you needed a case of an EMPTY crash.
FB (seems to be fixed now) was just an example of how to grow the sessionstore ...
... but this is not limited to FF19 & FF20.
Sorry!

Updated

4 years ago
Depends on: 943051

Comment 118

4 years ago
I believe bug #903842 is related to this one, because the symptoms for me are the same: many open tabs, then browser windows start turning black. If you create a new window from such a tab, the whole window turns white and the browser soon crashes.
Duplicate of this bug: 527095

Comment 120

4 years ago
Today I again got a crash with the same symptoms as I described in bug #903842, but with no crashing thread identified: https://crash-stats.mozilla.com/report/index/d566010c-15a0-40a9-8493-6867a2131210
(Assignee)

Comment 121

4 years ago
I'm working on spinning up data collection around memory usage but this bug is no longer useful.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → INCOMPLETE

Updated

4 years ago
Whiteboard: [native-crash][leave open] → [native-crash]