Closed Bug 1028802 Opened 10 years ago Closed 10 years ago

crash in mozilla::gfx::AlphaBoxBlur::Blur(unsigned char*)

Categories

Product/Component: Core :: Graphics: Layers
Type: defect
Hardware: All
OS: Android
Priority: Not set
Severity: critical

Tracking

Status: VERIFIED FIXED
Target Milestone: mozilla33

Tracking Status
firefox32 --- wontfix
firefox33 - verified
firefox34 - verified
firefox35 - verified
b2g-v2.0 --- fixed
b2g-v2.0M --- fixed
b2g-v2.1 --- fixed
b2g-v2.2 --- fixed
fennec 32+ ---

People

(Reporter: emorley, Assigned: jwatt)

References

Details

(Keywords: crash, topcrash-android-armv7)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-b5ad273f-5386-482a-a8af-80d662140623.
=============================================================

The crash occurred on Bugzilla after I checked the "add me to CC" box (which was very slow to visually update) and clicked "Submit changes" (which I had to press multiple times, since the UI didn't visually register the click the first time), on Firefox for Android Nightly 2014-06-22 on a Nexus 5.
This is the top crash on Android Nightly at this time.
This started happening on Android Nightly on 2014-06-17, across a large number of devices (led by the Nexus 5 and Nexus 7) with API levels 18 and 19 (Android 4.3 and 4.4), mostly on Adreno 330 and 320, and some Mali-T628 graphics chips.
See more reports and data on https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Agfx%3A%3AAlphaBoxBlur%3A%3ABlur%28unsigned%20char%2A%29
Matt, Jonathan, possibly related to the recent blur work?
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(jwatt)
Looks like we're getting a nullptr crash on this line: http://hg.mozilla.org/mozilla-central/annotate/366b5c0c02d3/gfx/2d/Tools.h#l154

That doesn't make much sense to me since I can't see us dereferencing anything there.

A regression window would be useful.
Flags: needinfo?(matt.woodrow)
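For context, here is a rough sketch of what an aligned-allocation helper like the one in gfx/2d/Tools.h typically looks like (a simplification for illustration only - not the actual Tools.h code; the AlignedBuffer name is made up). The storage is allocated with a fallible new (std::nothrow), so an OOM returns nullptr instead of throwing, and the allocation line itself never dereferences anything, which is why a crash reported on it is so puzzling:

  #include <cstddef>
  #include <cstdint>
  #include <new>

  // Simplified stand-in for the AlignedArray helper in gfx/2d/Tools.h.
  template <typename T, size_t Alignment = 16>
  struct AlignedBuffer {
    explicit AlignedBuffer(size_t aCount) { Realloc(aCount); }
    ~AlignedBuffer() { delete[] mStorage; }

    void Realloc(size_t aCount) {
      delete[] mStorage;
      // Over-allocate so the pointer can be rounded up to the alignment.
      size_t bytes = aCount * sizeof(T) + (Alignment - 1);
      mStorage = new (std::nothrow) uint8_t[bytes];  // fallible: no throw
      if (!mStorage) {
        mPtr = nullptr;  // callers must null-check before using mPtr
        return;
      }
      uintptr_t raw = reinterpret_cast<uintptr_t>(mStorage);
      mPtr = reinterpret_cast<T*>((raw + Alignment - 1) &
                                  ~uintptr_t(Alignment - 1));
    }

    uint8_t* mStorage = nullptr;
    T* mPtr = nullptr;
  };

A null return from the std::nothrow new would only blow up later, when a caller writes through mPtr without checking it, so the crash address would normally point at the user of the buffer rather than at the allocation itself.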
Can reproduce by visiting either of the URLs below, scrolling up and down the page a few times, and then waiting 10 or so seconds. If there is no crash, repeat the scrolling.

http://www.karmicsangoma.co.za/2013/05/putio-honest-review.html 
http://www.xxxconnect.com/dating/allaboutsex

working on a regression range
Any luck with that regression range, Kevin?
Flags: needinfo?(jwatt) → needinfo?(kbrosnan)
m-c regression range
Last good revision: 18c21532848a (2014-06-11)
First bad revision: 9e8e3e903484 (2014-06-12)
Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=18c21532848a&tochange=9e8e3e903484

inbound regression range
Last good revision: 177db1927db2
First bad revision: 9a38c2b23536
Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=177db1927db2&tochange=9a38c2b23536

Of the bugs in that range, Bug 1016680 seems the most likely. Otherwise it would be the set of Bug 1022874, Bug 1022873, and Bug 1022872, but Android should not be affected by Loop.
Flags: needinfo?(kbrosnan)
jwatt, this topcrash is believed to be caused by bug 1016680 - could you please take a look? :-)
Blocks: 1016680
Flags: needinfo?(jwatt)
(In reply to Ed Morley [:edmorley UTC+0] from comment #8)
> jwatt, this topcrash is believed to be caused by bug 1016680 - could you
> please take a look? :-)

Daniel, jwatt's display name says he's mostly away until July 28th - as the reviewer of the commits in comment 7, would you mind taking a look? (A backout didn't apply cleanly.) Thank you :-)
Flags: needinfo?(dholbert)
I'm here until the end of the week. My display name change is to head off review requests since it's better that I prioritize regression cleanup before heading off.

How did we manage to get a regression range without a means of reproducing this?
Flags: needinfo?(jwatt)
Flags: needinfo?(dholbert)
This is all being confused by the fact that there are multiple different stacks containing AlphaBoxBlur::Blur being reported, some of which have been fixed (such as bug 1027107).

Clicking through on the crash report link in comment 0 and then clicking on "More Reports", the page says there have been 66 crashes in the last 7 days, 95% of them on Linux. Clicking on the "Reports" tab only shows three reports though, and all on Windows...
I had the Android variant occur twice more this morning.
Pretty sure those are bug 1027107. In fact this bug may just be a dup.
(In reply to Kevin Brosnan [:kbrosnan] from comment #7)
> Of the bugs in that range Bug 1016680 seems the most likely? Else it is the
> set of Bug 1022874, Bug 1022873, and Bug 1022872 but Android should not be
> affected by Loop.

My commits there can't have caused this, since that code is only run when you open about:memory.

The other changes also don't look like they could have caused this, since they seem to just be about un-minifying some JS.
No longer blocks: 1016680
(In reply to Jonathan Watt [mostly offline until July 28] from comment #14)
> Pretty sure those are bug 1027107. In fact this bug may just be a dup.

The fix for that bug should be in today's nightly - so I'm happy to call my issue fixed by that until proven otherwise.

The stacks in comment 0 are the same as in comment 13, so the issue as I filed it is fixed. If other issues are being lumped in here and are still occurring, let's file a new bug for them.
Status: NEW → RESOLVED
Closed: 10 years ago
Depends on: 1027107
Resolution: --- → FIXED
(In reply to Ed Morley [:edmorley UTC+0] from comment #16)
> The fix for that bug should be in today's nightly - so I'm happy to call my
> issue fixed by that until proven otherwise.

To be clearer: bug 1027107 has only just been marked fixed so actually hasn't been in a nightly yet, but will be in today's (when I first read comment 11 I thought my comment 13 crashes had occurred even with the patch in bug 1027107).

Thank you :-)
Assignee: nobody → matt.woodrow
Target Milestone: --- → mozilla33
(In reply to Jonathan Watt [mostly offline until July 28] from comment #14)
> Pretty sure those are bug 1027107. In fact this bug may just be a dup.

Bug 1027107 comment #11 says it's NOT a dupe.

(In reply to Jonathan Watt [mostly offline until July 28] from comment #11)
> Clicking through on the crash report link in comment 0 and then clicking on
> "More Reports", the page says there have been 66 crashes in the last 7 days,
> 95% of them on Linux. Clicking on the "Reports" tab only shows three reports
> though, and all on Windows...

You're coming to the wrong conclusion here. "Linux" probably means Android there (we rewrite Android to Linux due to some internal Socorro limitations), and you are looking at a page limited to the "Firefox" product, so you are not seeing Fennec in the reports tab.
Looking at https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Agfx%3A%3AAlphaBoxBlur%3A%3ABlur%28unsigned+char%2A%29 (or clicking "View ALL products and versions for this signature." in the "Signature Summary" tab) instead will list the FennecAndroid ones.
And looking at the "graph" tab on https://crash-stats.mozilla.com/report/list?product=FennecAndroid&range_unit=days&range_value=28&signature=mozilla%3A%3Agfx%3A%3AAlphaBoxBlur%3A%3ABlur(unsigned+char*) makes it look like the first appearances of this signature on Android trunk were in the 6/17 build. This also matches what I stated in comment #2 based on my explosiveness reports, and that is the source of the regression range given here.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I've just crashed again, even with that fix in today's nightly:
bp-90686422-d9f4-4aac-9204-72fe22140710
Yes, I didn't expect bug 1027107 to fix the crash here, given it's older than the 6/17 Nightly (e.g. the OOM signature appears in 30 in significant volume while the one here doesn't - and the signature in this bug also has a sharp point where it started appearing on Nightly).

Ed, thanks for confirming that this is not a dupe.
Status: REOPENED → NEW
(So comment 7 is not in fact the regression range for this bug here, then, correct?

Do we know if the URLs in comment 5 actually trigger this bug? [i.e. will they still reproduce this, in builds new or old enough to be unaffected by bug 1027107]  And if not, do we have other STR here?)
I can try re-running the range. I'm quite sure about the m-c range stated. Does anything stand out from that?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #18)
> (In reply to Jonathan Watt [mostly offline until July 28] from comment #11)
> > Clicking through on the crash report link in comment 0 and then clicking on
> > "More Reports", the page says there have been 66 crashes in the last 7 days,
> > 95% of them on Linux. Clicking on the "Reports" tab only shows three reports
> > though, and all on Windows...
> 
> You're getting to a wrong conclusion here. "Linux" probably means Android...

I wasn't drawing any conclusions from that. I was noting that the links seem broken (the page says there are 66 crashes, mostly on Linux, but then only shows three Windows crashes when you ask to see them).
(In reply to Kevin Brosnan [:kbrosnan] from comment #22)
> I can try re-running the range. Quite sure about the m-c range stated.
> Anything stand out from that?

Yes:

39371ca856c4	Matt Woodrow — Bug 940845 - Part 6: Cache blurs when possible. r=Bas

(The m-i range is almost certainly wrong.)
(In reply to Daniel Holbert [:dholbert] from comment #21)
> (So comment 7 is not in fact the regression range for this bug here, then,
> correct?

From crash-stats data, it looks like the regression started between the 6/16 and 6/17 Nightly builds for Android, which conflicts with Kevin's regression range in comment #7, that's true.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #25)
> (In reply to Daniel Holbert [:dholbert] from comment #21)
> > (So comment 7 is not in fact the regression range for this bug here, then,
> > correct?
> 
> From crash-stats data, it looks like the regression started between the 6/16
> and 6/17 Nightly builds for Android, which conflicts with Kevin's regression
> range in comment #7, that's true.

(OK. Sorry, I was mistakenly suspecting that the regression range (& the STR used to determine it) might've in fact been for bug 1027107. But on closer inspection of bug 1027107, that bug is nearly a month older than comment 7's regression range, so I was off on that suspicion.)

In any case, it sounds like comment 24 is on to something worth investigating, which may or may not be associated with the 6/17 crash-stats spike, but is worth following up on.
Flags: needinfo?(matt.woodrow)
FWIW I tried to reproduce this in a debug build, but whereas I can get nightly builds to crash, I could not get the debug build to do so.
Attached patch Avoid nullptr crash — — Splinter Review
I can't see any way we could crash at the line specified in the crash reports, but this fixes a potential crash nearby. Fingers crossed!
Attachment #8454220 - Flags: review?(jwatt)
Flags: needinfo?(matt.woodrow)
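To illustrate the shape of such a fix (a hypothetical sketch, not the contents of attachment 8454220; the BlurSketch name and buffer choice are made up): bail out early when a fallible scratch allocation fails, rather than letting later code write through a null pointer.

  #include <cstddef>
  #include <cstdint>
  #include <cstring>
  #include <new>

  // Hypothetical example of null-checking a fallible scratch buffer.
  void BlurSketch(uint8_t* aData, size_t aWidth, size_t aHeight) {
    size_t pixels = aWidth * aHeight;
    uint32_t* integralImage = new (std::nothrow) uint32_t[pixels];
    if (!integralImage) {
      return;  // OOM: skip the blur instead of crashing
    }
    std::memset(integralImage, 0, pixels * sizeof(uint32_t));
    // ... compute the blur into aData using integralImage ...
    (void)aData;
    delete[] integralImage;
  }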
Comment on attachment 8454220 [details] [diff] [review]
Avoid nullptr crash

Indeed.
Attachment #8454220 - Flags: review?(jwatt) → review+
https://hg.mozilla.org/mozilla-central/rev/dba4eb858d74
Status: NEW → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
I experienced this crash on 33b1.  Here's the crash report: https://crash-stats.mozilla.com/report/index/42a7748b-365c-418a-8b23-f5c0e2140906
[Tracking Requested - why for this release]: top crash and we have a regression in crash rate
tracking-fennec: --- → ?
Actually, I see reports with this signature in 35.0a1 and 34.0a2 and the patch landed in 33 Nightly, so let's reopen this and mark the younger trains as affected.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Matt, NI to make you aware that this has been reopened and is the #1 top crash on beta.
tracking-fennec: ? → 33+
Flags: needinfo?(matt.woodrow)
I don't think I'll get time to look at this before I go on PTO. Jet is going to try to find someone else to take this in the meantime.
Flags: needinfo?(matt.woodrow)
Jet, any update here?
I'm looking into this.
Assignee: matt.woodrow → jwatt
Going back to the stack again, it would seem to be:

AlignedArray<uint32_t,16>::Realloc
http://hg.mozilla.org/releases/mozilla-beta/annotate/9502b0a5c5f1/gfx/2d/Tools.h#l154
AlignedArray<uint32_t,16>::AlignedArray(size_t aCount)
http://hg.mozilla.org/releases/mozilla-beta/annotate/9502b0a5c5f1/gfx/2d/Tools.h#l110
AlphaBoxBlur::Blur
http://hg.mozilla.org/releases/mozilla-beta/annotate/9502b0a5c5f1/gfx/2d/Blur.cpp#l542
gfxAlphaBoxBlur::DoBlur
http://hg.mozilla.org/releases/mozilla-beta/annotate/9502b0a5c5f1/gfx/thebes/gfxBlur.cpp#l117

The middle two frames are not in the crash reports, but are extrapolated from the other two.

I concur with comment 4. I really don't see how we can crash here. We know that |storageByteCount>=15|, and by stepping through in the debugger and forcing storageByteCount to be very large I can confirm that the |operator new| with std::nothrow invokes moz_malloc and returns a null pointer without throwing.
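A standalone way to sanity-check that behaviour (the size here is arbitrary, just assumed to be unallocatable; outside of Gecko this goes through the system allocator rather than moz_malloc):

  #include <cstddef>
  #include <cstdio>
  #include <new>

  int main() {
    const size_t huge = size_t(1) << 46;  // ~64 TiB, assumed to fail
    unsigned char* p = new (std::nothrow) unsigned char[huge];
    if (!p) {
      std::puts("nothrow new returned nullptr without throwing");
    } else {
      std::puts("allocation unexpectedly succeeded");
      delete[] p;
    }
    return 0;
  }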
Depends on: 1067998
I managed to crash with a related stack here:

http://hg.mozilla.org/releases/mozilla-beta/annotate/9502b0a5c5f1/gfx/thebes/gfxBlur.cpp#l64

with the stack:

  mozalloc_abort
  mozalloc_handle_oom
  moz_xmalloc
  operator new [] (size=17683123) at ../../dist/include/mozilla/mozalloc.h:213
  gfxAlphaBoxBlur::Init 
  gfxAlphaBoxBlur::BlurRectangle
  nsContextBoxBlur::BlurRectangle

Even though the stacks reported in the crash reports for this bug are a bit off, I don't think that is the same crash, so I filed it as bug 1067998.
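For anyone unfamiliar with the two allocation paths, here is a rough sketch of the distinction the stack above hinges on (assumed and simplified - not the real mozalloc code): Gecko's default operator new[] goes through an "infallible" allocator that aborts the process on OOM (hence mozalloc_handle_oom / mozalloc_abort at the top of the stack), whereas a fallible allocation returns nullptr and leaves recovery to the caller.

  #include <cstddef>
  #include <cstdio>
  #include <cstdlib>

  // Simplified stand-in for the infallible path: abort on OOM.
  void* infallible_malloc(std::size_t aSize) {
    void* p = std::malloc(aSize);
    if (!p) {
      std::fputs("out of memory, aborting\n", stderr);
      std::abort();  // roughly mozalloc_handle_oom -> mozalloc_abort
    }
    return p;
  }

  // The fallible path: the caller owns the null check.
  void* fallible_malloc(std::size_t aSize) {
    return std::malloc(aSize);
  }

  int main() {
    // A fallible request can be handled gracefully...
    void* p = fallible_malloc(std::size_t(1) << 46);  // assumed to fail
    if (!p) {
      std::puts("fallible allocation failed; caller can recover");
    } else {
      std::free(p);
    }
    // ...whereas the same request through infallible_malloc would abort
    // the whole process, which is what the crash report shows.
    return 0;
  }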
I just landed the fix for bug 1067998 on Beta branch, and there's now a .apk build up at:

http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-beta-android/1411036441/

It would be helpful for anyone who's been able to reproduce crashes to test that build and see if they can trigger any more crash reports.
tracking-fennec: 33+ → 32+
Comment on attachment 8454220 [details] [diff] [review]
Avoid nullptr crash


[Approval Request Comment]
Uplifting this bug (after bug 1015785) will make it possible to uplift bug 1063733.
Attachment #8454220 - Flags: approval-mozilla-b2g32?
Jonathan, how do we look on this bug? Thanks
Flags: needinfo?(jwatt)
See Also: → 1067018
Sylvestre, not great. Given bug 1067018, it seems very unlikely that bug 1067998 will fix this bug. I'm a bit stumped, so I'm hoping that some of my needinfos on bug 1067018 will give us some ideas.
Flags: needinfo?(jwatt)
Sylvestre: over in bug 1067018 there is now progress in understanding what's going on. I'd suggest moving all the blocking/tracking flags over there and marking this bug as a dupe (this bug has more noise, less progress, and the wrong stack frame is being blamed).
As Jonathan proposed, we are going to track bug 1067018 instead of this one. Thanks
In case you need a way to reproduce:

My mobile Firefox Beta version 33 crashes all the time on this URL:
http://hanneksverden.blogspot.dk/2010/01/lasagne-med-oksekd-og-spinat.html?m=1

Wait for the page to load, then scroll up and down a bit.
Attachment #8454220 - Flags: approval-mozilla-b2g32? → approval-mozilla-b2g32+
https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/63ee9fa3966b

Given that there isn't any work being done in this bug anymore, maybe we should just close it out for saner tracking?
Flags: needinfo?(jwatt)
Yes. I'm not sure what the preferred option is nowadays though (dupe of bug 1067018, or fixed by it).
Flags: needinfo?(jwatt)
Let's just call it fixed with a dep :)
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Depends on: 1067018
Resolution: --- → FIXED
I only see one crash in the last week for Firefox 35.0a1 and it's with a build from September 21st. Hence, marking this bug verified fixed.
Status: RESOLVED → VERIFIED
Just checking other Fennec versions...

Fennec 34.0a2 has 1 crash in the last week but it's with a build from September 16.

Fennec 33.0b7 has 940 crashes, but Fennec 33.0b8 and 33.0b10 don't have any. I'm going to guess this landed for Beta after 33.0b7.

Please fix the flags if my assumptions are incorrect.
Issue is resolved - removing old/stale qa-wanted keywords.
Hi Josh,
    Please provide steps to reproduce the bug and videos, which would help with our verification.
Flags: needinfo?(jocheng)
Flags: needinfo?(jocheng) → needinfo?(hlu)
(In reply to Coler from comment #55)
> Hi, Josh
>     Please provide the bug reproduce steps and videos, which could help with
> our verification.

It's a null-check; there's nothing to be done here other than checking the crash volume on http://crash-stats.mozilla.com, which has been done in comment #52 already.
Flags: needinfo?(hlu)