Closed Bug 1095289 Opened 10 years ago Closed 10 years ago

crash in OOM | large | NS_ABORT_OOM(unsigned int) | nsACString_internal::Replace(unsigned int, unsigned int, char const*, unsigned int)

Categories

(Core :: General, defect)

36 Branch
x86
Windows NT
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla36
Tracking Status
firefox35 + verified
firefox36 + verified

People

(Reporter: away, Assigned: milan)

References

Details

(Keywords: crash, topcrash-win)

Crash Data

Attachments

(1 file, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-b5f7e031-0b02-406d-8d9a-026b32141031.
=============================================================

Oh the irony. gfxCriticalError spammed AnnotateCrashReport until we OOM crashed. 

Not sure if this should be addressed on the gfx side or crash reporter. Ted/Nical?
Flags: needinfo?(ted)
Flags: needinfo?(nical.bugzilla)
[Tracking Requested - why for this release]: #13 crash on aurora

There are some hits on beta but not with this annotation pattern. It might be other issues.
We should probably put a cap on the string for AppendAppNotesToCrashReport as a backstop here, but we should also fix that gfx code. We'd just cap the string somewhere around here:
http://hg.mozilla.org/mozilla-central/annotate/17e190839086/toolkit/crashreporter/nsExceptionHandler.cpp#l1856
Flags: needinfo?(ted)
I think it should be addressed by gfx since we wrote the little thing that is spamming the reporter.

We talked about limiting the gfxCriticalError at some point with something like keeping the first one or two errors, and the last N ones (and perhaps count the number of errors that we dropped).

This is a good incentive to put that in place.
Flags: needinfo?(nical.bugzilla)
Bug 1074952 limits gfxCriticalError to keeping the very first one, and the last N-1 where N is determined by a preference gfx.logging.crash.length.
Depends on: 1074952
Assignee: nobody → milan
However... if we are considering uplifting the fix to this, we would not want the bug 1074952 - it is too large for that.  So, it'd be good to know.
We should stop seeing these as of 2014/11/19 nightly, at least with this signature.
Here's one from 20141119030200, or did you mean starting from the nightly after that?
https://crash-stats.mozilla.com/report/index/a8dd6d71-bbcd-4a37-a792-ea6262141120
Flags: needinfo?(milan)
The one after, this landed on central on 2014/11/19.
Flags: needinfo?(milan)
Looks like it worked.

Is it possible carve bug 1074952 into something safe for uplift?
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(milan)
Resolution: --- → FIXED
Only thing I can think of is either removing gfxCriticalError in http://dxr.mozilla.org/mozilla-central/source/gfx/layers/d3d11/TextureD3D11.cpp#291 or replacing it with gfxWarning, but, I'd like to hear if :nical thinks that would reduce the useful information we get from (other) crash reports.
Flags: needinfo?(milan) → needinfo?(nical.bugzilla)
If it's specifically this line that spams the appnotes, maybe a good compromise would be to replace 

if (FAILED(hr)) {
  gfxCriticalError() << "[D3D11] CreateTexture2D failure " << mSize << " Code: " << gfx::hexa(hr);
  return;
}

by

if (FAILED(hr)) {
  static int sCount = 0;
  if (sCount <= 5) {
    gfxCriticalError() << "[D3D11] CreateTexture2D failure " << mSize << " Code: " << gfx::hexa(hr);
  }
  if (sCount = 5) {
    gfxCriticalError() << "The error above was reported 5 times, dropping next occurrences."
  }
  sCount++;
  return;
}
Flags: needinfo?(nical.bugzilla)
Taking note that interestingly, the logging involved a lot more allocations than what one would think is required to a allocate a 16*16 rgba texture. Looks like small texture allocations reserve significantly more memory than the amount they use to store actual pixels.
:nical, something like this?
Attachment #8529844 - Flags: review?(nical.bugzilla)
Comment on attachment 8529844 [details] [diff] [review]
Aurora patch.  Kludge to stop the annotation overflow.

Review of attachment 8529844 [details] [diff] [review]:
-----------------------------------------------------------------

Sure
Attachment #8529844 - Flags: review?(nical.bugzilla) → review+
Comment on attachment 8529844 [details] [diff] [review]
Aurora patch.  Kludge to stop the annotation overflow.

Review of attachment 8529844 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/d3d11/TextureD3D11.cpp
@@ +293,5 @@
>      if (FAILED(hr)) {
> +      // Temporary patch; see bug 1095289 before bug 1074952 is fixed
> +      static int limitDueTo1095289 = 0;
> +      if (limitDueTo1095289 < 5) {
> +        gfx::gfxCriticalError() << "[D3D11] CreateTexture2D failure (1) " << aSize << " Code: " << gfx::hexa(hr);

mSize

@@ +295,5 @@
> +      static int limitDueTo1095289 = 0;
> +      if (limitDueTo1095289 < 5) {
> +        gfx::gfxCriticalError() << "[D3D11] CreateTexture2D failure (1) " << aSize << " Code: " << gfx::hexa(hr);
> +      } else if (limitDueTo1095289 == 5) {
> +        gfx::gfxCriticalError() << "[D3D11] CreateTexture2D failure (1-stopping further reports after 5+ occurences) " << aSize << " Code: " << gfx::hexa(hr);

mSize
Comment on attachment 8529922 [details] [diff] [review]
Aurora patch.  Kludge to stop the annotation overflow.  Carry r=nical

Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]: topcrash
[Describe test coverage new/current, TBPL]:
[Risks and why]: Low
[String/UUID change made/needed]:
Attachment #8529922 - Flags: approval-mozilla-aurora?
Comment on attachment 8529922 [details] [diff] [review]
Aurora patch.  Kludge to stop the annotation overflow.  Carry r=nical

We merged in between. cf comment #18 for uplift the request.
Attachment #8529922 - Flags: approval-mozilla-aurora? → approval-mozilla-beta?
Attachment #8529922 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: qe-verify+
A Socorro [1] report for the past 4 weeks shows:
- 35 Beta = 127 crashes
   * 35.0b2 = 65 crashes
   * 35.0b3 = 39 crashes
   * 35.0b4 = 17 crashes
   * 35.0b5 = 6 crashes
- 36 Aurora = 4 crashes

I'm not sure whether the expectation here was for this to not show anymore, or just show at a lower level. Milan, can you provide your insight on this?

[1] - https://crash-stats.mozilla.com/report/list?product=Firefox&range_unit=days&range_value=28&signature=OOM+%7C+large+%7C+NS_ABORT_OOM%28unsigned+int%29+%7C+nsACString_internal%3A%3AReplace%28unsigned+int%2C+unsigned+int%2C+char+const%2A%2C+unsigned+int%29
Flags: needinfo?(milan)
The patch here only fixes one instance of this crash signature, the other crash reports in 36 look to be different issues. We should probably add nsACString_Internal::Replace to the skiplist to differentiate.
Right - this was just the crash that included CrashReporter::AnnotateCrashReport in it, or more to the point, mozilla::gfx::CriticalLogger::OutputMessage, so I think these are the crashes we weren't going after in this bug.  Probably need a new one?

Speaking of which, is there a way to see a raw .extra file from a crash report?
Flags: needinfo?(milan)
Not precisely, since that data isn't stored untouched, but you can get pretty close with:
https://crash-stats.mozilla.com/api/ProcessedCrash/?datatype=processed&crash_id=xxx
Thank you for clarifying this! Going through the crashes I see there are still a few crashes with CrashReporter::AnnotateCrashReport (and mozilla::gfx::CriticalLogger::OutputMessage):
- 35 Beta = 12 crashes
   * 35.0b2 = 4 crashes
      ** https://crash-stats.mozilla.com/report/index/4c6d6bf6-3a2b-4c32-9da4-5e0652141212
      ** https://crash-stats.mozilla.com/report/index/b66ded76-2102-4395-840f-6a7e22141211
      ** https://crash-stats.mozilla.com/report/index/581c31a2-a5de-4a34-a054-424a22141212
      ** https://crash-stats.mozilla.com/report/index/1d9509cf-87a9-43c6-ba83-1809f2141211
   * 35.0b3 = 5 crashes
      ** https://crash-stats.mozilla.com/report/index/2c8e57f1-3f32-4c0c-868e-fc1b22141218
      ** https://crash-stats.mozilla.com/report/index/24cb9973-2748-482e-87f5-612902141215
      ** https://crash-stats.mozilla.com/report/index/e734942a-ae34-40d1-bfbc-a20c42141214
      ** https://crash-stats.mozilla.com/report/index/01762f71-e2a7-44b2-9493-e1f722141217
      ** https://crash-stats.mozilla.com/report/index/9e0dffc5-f762-495b-9992-55abf2141216
   * 35.0b4 = 2 crashes 
      ** https://crash-stats.mozilla.com/report/index/3f4147bb-9553-4463-b129-404982141220
      ** https://crash-stats.mozilla.com/report/index/bc5f245f-2300-44db-83fb-0d3722141221
   * 35.0b5 = 1 crash 
      ** https://crash-stats.mozilla.com/report/index/a5baf082-b14d-4543-9eb4-3b2a72141220
- 36 Aurora = 0 crashes

These remaining crashes look similar to the original one, but the low rate of occurrence I think is enough to close this.

Also I'm not sure we want to pursue the other crashes as their number is not very high.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: