Bug 736421 - crash in mozilla::AndroidLayerRendererFrame::DrawForeground @ CgDrv_Create on MB860, LG-P990 and LG-P999 (Tegra2 + ICS)
Status: RESOLVED FIXED
Whiteboard: [native-crash][gfx]
Keywords: crash, regression, topcrash
Product: Core
Classification: Components
Component: Widget: Android
Version: 14 Branch
Platform: ARM Android
Importance: -- critical
Target Milestone: mozilla16
Assigned To: Vladimir Vukicevic [:vlad] [:vladv]
QA Contact:
Mentors:
Depends on: 703056
Blocks:
Reported: 2012-03-16 03:34 PDT by Scoobidiver (away)
Modified: 2012-09-15 23:45 PDT (History)
CC List: 14 users
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
wontfix
fixed
fixed
.N+


Attachments
maybe like this? (1.78 KB, patch)
2012-06-04 12:24 PDT, Vladimir Vukicevic [:vlad] [:vladv]
snorp: review+
better (2.68 KB, patch)
2012-06-05 11:35 PDT, Vladimir Vukicevic [:vlad] [:vladv]
snorp: review+
blassey.bugs: approval-mozilla-aurora+
blassey.bugs: approval-mozilla-beta-

Description Scoobidiver (away) 2012-03-16 03:34:19 PDT
It first appeared in 14.0a1/20120315:
bp-5ed33483-218c-4419-b1f1-cce432120316

It's currently #3 top crasher in 14.0a1.

More reports at:
https://crash-stats.mozilla.com/report/list?signature=CgDrv_Create
Comment 1 Chris Peterson [:cpeterson] 2012-03-19 11:56:14 PDT
ajuma, do you have any recommendations for this topcrash? It looks like we're crashing in the Atrix MB860's GL driver.
Comment 2 Ali Juma [:ajuma] 2012-03-19 12:07:33 PDT
We need to get our hands on the GL driver from this device so we can grab symbols and find out where in the driver this is happening.

Knowing the URLs where this crash happens would be helpful too.
Comment 3 Scoobidiver (away) 2012-03-19 12:33:21 PDT
There's one crash on LG-P990 CyanogenMod: bp-3a0f9dab-423b-4345-9823-5c33f2120318.
Comment 4 Marcia Knous [:marcia - use ni] 2012-03-23 10:25:12 PDT
Only two URLs came up in the search:

     17 \N
      1 http://www.bbc.co.uk/news/10318089
      1 http://www.abovetopsecret.com/forum/thread819897/pg1
Comment 5 Naoki Hirata :nhirata (please use needinfo instead of cc) 2012-04-05 09:33:45 PDT
The crash seemed to have died down. Last crash from build: 20120331031108

Found only one other URL:
http://www.ouest-france.fr/actu/actuDet_-Orelsan-programme-a-La-Reunion-la-Region-retire-sa-subvention_39382-2060296_actu.Htm
Comment 6 Scoobidiver (away) 2012-04-12 02:26:24 PDT
(In reply to Naoki Hirata :nhirata from comment #5)
> The crash seemed to have died down. Last crash from build: 20120331031108
There are still recent crashes, but it's no longer a top crasher.
Comment 7 Scoobidiver (away) 2012-05-08 23:45:25 PDT
There's a spike in crashes from two users of these devices, which are unsupported on native Fennec but supported on XUL Fennec.
Comment 8 Naoki Hirata :nhirata (please use needinfo instead of cc) 2012-05-14 07:44:26 PDT
It's within the top 10 crashers now... not adding the topcrash keyword due to comment 7.
Comment 9 Scoobidiver (away) 2012-05-14 08:27:43 PDT
It affects 11 users over the last week.
Comment 10 Scoobidiver (away) 2012-05-15 05:39:53 PDT
Should we consider MB860 (2.3% of Android devices - source [1]) and LG-P99* as unsupported?

[1]: https://docs.google.com/spreadsheet/ccc?key=0ArpSb7XMTvzydDhVNWFXbXRkQ3VoQW4yaWptTjJjY3c&authkey=COzTgpEG&authkey=COzTgpEG
Comment 11 Vladimir Vukicevic [:vlad] [:vladv] 2012-05-15 11:36:23 PDT
Well, I have an MB860/Atrix... would be nice to figure out what's going on :)
Comment 12 Scoobidiver (away) 2012-05-16 00:14:47 PDT
It's #6 top crasher in the first days of 14.0b1.
It occurs also after the fix of bug 748531.
It would be interesting to have recent URLs now that 14.0b1 is released.
Comment 13 Naoki Hirata :nhirata (please use needinfo instead of cc) 2012-05-16 11:39:47 PDT
Scoobidiver, what's the 14.0b1 that you are looking at?  We have some bad data for 14b1 graphs.  Yesterday was the first official release of 14b1 Native; 14b1 XUL is not released.
Comment 14 Joe Drew (not getting mail) 2012-05-16 11:40:25 PDT
FWIW, those LG phones and the Atrix are all Tegra devices, and CgDrv_Create makes me think NVIDIA too.
Comment 15 Scoobidiver (away) 2012-05-16 23:48:55 PDT
(In reply to Naoki Hirata :nhirata from comment #13)
> Scoobidiver, what's the 14.0b1 that you are looking at?  We have some bad
> data for 14b1 graphs.  Yesterday was the first official release of 14b1
> Native; 14b1 XUL is not released.
This 14.0b1: https://crash-stats.mozilla.com/query/query?product=FennecAndroid&version=FennecAndroid%3A14.0b1&range_value=1&range_unit=weeks&query_search=signature&query_type=contains&query=&do_query=1, which is updated continuously, unlike https://crash-stats.mozilla.com/topcrasher/byversion/FennecAndroid/14.0b1, which is updated once a day.
Comment 16 Robert Kaiser 2012-05-17 05:40:44 PDT
Yes, /query (advanced search) runs live searches against what's in the database, while topcrasher is generated once a day for the most recent UTC day.
Comment 17 Naoki Hirata :nhirata (please use needinfo instead of cc) 2012-05-18 11:21:06 PDT
(In reply to Scoobidiver from comment #15)
> (In reply to Naoki Hirata :nhirata from comment #13)
> > Scoobidiver, what's the 14.0b1 that you are looking at?  We have some bad
> > data for 14b1 graphs.  Yesterday was the first official release of 14b1
> > Native; 14b1 XUL is not released.
> This 14.0b1:
> https://crash-stats.mozilla.com/query/query?product=FennecAndroid&version=FennecAndroid%3A14.0b1&range_value=1&range_unit=weeks&query_search=signature&query_type=contains&query=&do_query=1
> which is updated continuously, unlike
> https://crash-stats.mozilla.com/topcrasher/byversion/FennecAndroid/14.0b1
> which is updated once a day.

Thanks Scoobi for the link.  Would this link qualify the public beta (release date of 5/15) better?
https://crash-stats.mozilla.com/query/query?product=FennecAndroid&version=FennecAndroid%3A14.0b1&range_value=1&range_unit=weeks&date=05%2F22%2F2012+18%3A19%3A56&query_search=signature&query_type=contains&query=&reason=&build_id=&process_type=any&hang_type=any&do_query=1
Comment 18 Scoobidiver (away) 2012-05-18 11:33:13 PDT
(In reply to Naoki Hirata :nhirata from comment #17)
> Thanks Scoobi for the link.  Would this link qualify the public beta
> (release date of 5/15) better?
My link is shorter because, with the date field, new crashes are not taken into account, and the reason, build_id, process_type and hang_type fields have their default values.
Comment 19 Naoki Hirata :nhirata (please use needinfo instead of cc) 2012-05-18 12:38:39 PDT
Ah. I was referring to adding the date of 5/22/2012 in the search field so that the one-week range would start from the official release date of 5/15.
Comment 20 Vladimir Vukicevic [:vlad] [:vladv] 2012-05-23 13:07:04 PDT
These seem to have disappeared since the 0516 nightly build; I'm guessing that this is another dup of bug 752368. Also, given that most of the crashes are from CM users, I'm not sure this needs to block. Keeping this open until we can verify crash numbers when the next beta goes out.
Comment 21 Scoobidiver (away) 2012-05-23 23:30:43 PDT
(In reply to Vladimir Vukicevic (:vlad) from comment #20)
> These seem to have disappeared since the 0516 nightly build
Nightly is not statistically representative. There are still recent crashes in Aurora after the fix of bug 752368: bp-6a5085ee-1414-474f-bb31-542132120523.
Comment 22 Vladimir Vukicevic [:vlad] [:vladv] 2012-05-24 06:45:17 PDT
True, I guess 3 crashes are enough to say "It's still happening", though it's much lower on the list. It could still be bug 752368 in a different situation, but as I said, let's keep this open until the beta ships and see what happens.
Comment 24 Jeff Muizelaar [:jrmuizel] 2012-05-28 08:14:46 PDT
Bug 757944 might help here.
Comment 25 Vladimir Vukicevic [:vlad] [:vladv] 2012-05-28 11:18:12 PDT
I'd be surprised if bug 757944 helps, but maybe!
Comment 26 JP Rosevear [:jpr] 2012-05-29 05:31:34 PDT
Bug 757944 has stalled - do we need to unstick it?
Comment 27 Jeff Muizelaar [:jrmuizel] 2012-06-01 13:31:22 PDT
(In reply to JP Rosevear [:jpr] from comment #26)
> Bug 757944 has stalled - do we need to unstick it?

Bug 757944 has now landed on inbound. I'm also going to try to get symbols for these crashes, so that we can get a better idea of what might be causing this.
Comment 28 Jeff Muizelaar [:jrmuizel] 2012-06-01 14:06:10 PDT
FWIW, the entry point into the driver is unsurprisingly glCompileShader()
Comment 29 Jeff Muizelaar [:jrmuizel] 2012-06-01 14:14:31 PDT
(In reply to Jeff Muizelaar [:jrmuizel] from comment #28)
> FWIW, the entry point into the driver is unsurprisingly glCompileShader()

And it crashes because the compiler does a division by 0. It would be nice to know which shader is causing the crash.
Comment 30 Jeff Muizelaar [:jrmuizel] 2012-06-04 05:47:09 PDT
Interestingly, there haven't been any of these on FF15 for quite a while.
Comment 31 Jeff Muizelaar [:jrmuizel] 2012-06-04 08:25:45 PDT
Does anyone have one of the devices mentioned?
Comment 32 Vladimir Vukicevic [:vlad] [:vladv] 2012-06-04 08:59:19 PDT
We suspect that this might be flash related; it would be helpful if we could get a module correlation for all of these crashes, to see if libflashplayer.so appears in all of them.
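A module correlation of this kind boils down to counting, across crash reports, how many loaded the module in question and which builds of it they loaded. The following Java sketch assumes each report has already been reduced to a list of (filename, debug id) pairs; the `ModuleCorrelation` class and its data shape are hypothetical, not a crash-stats API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: correlate crash reports with a loaded module.
final class ModuleCorrelation {
    // One loaded module in a crash report (hypothetical shape).
    static final class Module {
        final String filename;
        final String debugId;

        Module(String filename, String debugId) {
            this.filename = filename;
            this.debugId = debugId;
        }
    }

    // Counts reports that loaded moduleName, keyed by that module's debug id.
    static Map<String, Integer> countByDebugId(List<List<Module>> reports, String moduleName) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<Module> modules : reports) {
            for (Module m : modules) {
                if (m.filename.equals(moduleName)) {
                    counts.merge(m.debugId, 1, Integer::sum);
                    break; // count each report at most once
                }
            }
        }
        return counts;
    }

    // Fraction of all reports that loaded moduleName at all.
    static double fractionWithModule(List<List<Module>> reports, String moduleName) {
        if (reports.isEmpty()) {
            return 0.0;
        }
        int total = 0;
        for (int n : countByDebugId(reports, moduleName).values()) {
            total += n;
        }
        return (double) total / reports.size();
    }
}
```

A fraction of 1.0 for libflashplayer.so, with a single dominant debug id, would be the pattern the next comments go on to describe.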
Comment 33 Vladimir Vukicevic [:vlad] [:vladv] 2012-06-04 09:30:40 PDT
So looking more at this, these all look to be Tegra 2 devices that are running some form of ICS ROM (almost all non-vendor-provided, e.g. CM9). The flash version is likely to be the same one that shipped with gingerbread on these phones (which is the stock vendor-provided one), so we're most likely hitting some form of bug 703056, but we're not being blocked because the devices are running ICS.

I think we need a finer-grained block here somehow.
Comment 34 Vladimir Vukicevic [:vlad] [:vladv] 2012-06-04 10:17:30 PDT
All of the crashes in b4/b5 are with 1) homebrew ICS; 2) libflashplayer.so with a 'debug id' of 79B73C212164CC7699C039D8B6C646570. Some of the online reviews of CM9 on some of these devices say that flash does not work with the built-in browser.

That 'debug id' is generated by taking the first 4096 bytes of the text section of the .so and XORing GUID-sized (16-byte) chunks together, as seen here: http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/common/linux/file_id.cc#144
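For illustration, a minimal Java sketch of the debug-id computation described above, written from this description rather than copied from Breakpad. It assumes the caller has already extracted the bytes of the .text section (ELF parsing is not shown), treats a short final chunk as zero-padded, and assumes the GUID-style display of a little-endian host (first int and the next two shorts byte-swapped) followed by an age of "0", as in the ids quoted in this bug. The class and method names are hypothetical.

```java
// Hypothetical sketch of a Breakpad-style ELF "debug id".
final class BreakpadDebugId {
    static final int GUID_SIZE = 16;          // bytes in a GUID
    static final int MAX_HASHED_BYTES = 4096; // only the start of .text is hashed

    // XORs GUID-sized chunks of the first 4096 bytes of the text section.
    static byte[] hashTextSection(byte[] text) {
        byte[] id = new byte[GUID_SIZE];
        int limit = Math.min(text.length, MAX_HASHED_BYTES);
        for (int i = 0; i < limit; i++) {
            // A short final chunk behaves as if padded with zero bytes.
            id[i % GUID_SIZE] ^= text[i];
        }
        return id;
    }

    // Formats the hash in GUID order (int, short, short, then 8 raw bytes),
    // assuming a little-endian host, and appends an age of "0".
    static String toDebugId(byte[] id) {
        int[] order = {3, 2, 1, 0, 5, 4, 7, 6, 8, 9, 10, 11, 12, 13, 14, 15};
        StringBuilder sb = new StringBuilder();
        for (int idx : order) {
            sb.append(String.format("%02X", id[idx] & 0xFF));
        }
        return sb.append('0').toString();
    }

    public static void main(String[] args) {
        byte[] text = new byte[16];
        for (int i = 0; i < text.length; i++) {
            text[i] = (byte) i;
        }
        // Prints 030201000504070608090A0B0C0D0E0F0
        System.out.println(toDebugId(hashTextSection(text)));
    }
}
```

The byte reordering in `toDebugId` is why XORing the raw bytes by hand does not directly match the string shown on crash-stats.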

So maybe we block flash on Tegra 2 devices that have libflashplayer.so that has that signature?
Comment 35 Vladimir Vukicevic [:vlad] [:vladv] 2012-06-04 10:51:18 PDT
The latest libflashplayer.so that's served by the market is:

SDK version 8..14 (downloaded on my Atrix with gingerbread): 
   79 b7 31 21 14 20 cc 76 15 a7 39 d8 b6 c6 46 57 0

SDK version 15 onward (downloaded on my Galaxy Nexus with ICS):
   79 b7 3c 21 cc 76 21 64 d8 39 c0 99 57 46 c6 b6 0

(Note, if anyone repeats this exercise: you can XOR the bytes together, but the version that's displayed on breakpad is the GUID form (int, short, short, then bytes), so you have to rearrange the bytes appropriately.) The flash version that we're crashing with is the ICS one, which makes sense: that's what they'd get from the market, given that they'll be presenting an ICS SDK version.

So we have:
   1) Homebrew ICS
   2) Tegra 2
   3) ICS Flash 11
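Taken together, the three conditions suggest a block rule along these lines. This is a hypothetical Java sketch, not the landed patch: the caller supplies whether the firmware looks aftermarket and whether the SoC is Tegra 2 (how to detect either is not shown), and the "ICS Flash 11" condition is approximated by the SDK version, using the "15 onward" range from this comment for which the market serves the ICS Flash build.

```java
// Hypothetical sketch of the block condition summarized above.
final class FlashBlockRule {
    // Per the ranges observed above, SDK 15 and later receive the ICS Flash build.
    static boolean getsIcsFlash(int sdkInt) {
        return sdkInt >= 15;
    }

    // All three conditions must hold; stock firmware or non-Tegra-2 devices
    // are not matched by this rule.
    static boolean shouldBlockFlash(boolean homebrewFirmware, boolean tegra2, int sdkInt) {
        return homebrewFirmware && tegra2 && getsIcsFlash(sdkInt);
    }
}
```

Keeping the inputs as plain parameters makes the rule easy to test in isolation from the device probing.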
Comment 36 Vladimir Vukicevic [:vlad] [:vladv] 2012-06-04 11:59:27 PDT
Renominating. This only seems to happen with aftermarket Android firmware; it's most common with CM9 but occurs with others as well. I suggest we either minus and ignore this crash for now, or add some filtering based on kernel version and block some common kernel strings (CM9, cyanogen, nova, a few others). I may just write up that version anyway, since we should just be able to look at /proc/version.
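A minimal sketch of the kernel-string filter, assuming the check is a case-insensitive substring match on the contents of /proc/version. Only the markers named in this comment are included; the list in the actual patch is not shown in this bug, so `CUSTOM_KERNEL_MARKERS` and the class name are hypothetical.

```java
import java.util.Locale;

// Hypothetical sketch of matching aftermarket-firmware kernel strings.
final class KernelVersionFilter {
    // Only the markers named in comment 36; "a few others" are not listed.
    private static final String[] CUSTOM_KERNEL_MARKERS = {"cm9", "cyanogen", "nova"};

    // True if the /proc/version text mentions a known aftermarket marker.
    static boolean looksLikeCustomKernel(String procVersion) {
        if (procVersion == null) {
            return false;
        }
        String lower = procVersion.toLowerCase(Locale.ROOT);
        for (String marker : CUSTOM_KERNEL_MARKERS) {
            if (lower.contains(marker)) {
                return true;
            }
        }
        return false;
    }
}
```

Plain substring matching is coarse: "nova" would also match a word like "innovation", and, as comment 49 notes, kernels built without a meaningful name slip through entirely.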
Comment 37 Vladimir Vukicevic [:vlad] [:vladv] 2012-06-04 12:24:18 PDT
Created attachment 629877 [details] [diff] [review]
maybe like this?

Sort of band-aid-y, but should work to get rid of this crash.  It would be nice to have a way to override our blacklisting of Flash, but meh.
Comment 38 James Willcox (:snorp) (jwillcox@mozilla.com) 2012-06-04 14:18:31 PDT
Comment on attachment 629877 [details] [diff] [review]
maybe like this?

Review of attachment 629877 [details] [diff] [review]:
-----------------------------------------------------------------

Should probably put vreader.close() in a finally clause, since readLine() and other calls can throw. Looks fine otherwise.
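The structure the review asks for, sketched in Java under the assumption that the patch reads a single line from /proc/version; the class and method names are hypothetical. The reader is closed in a finally block, so it is released even when readLine() throws.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical sketch: read the first line of a file such as /proc/version.
final class ProcVersionReader {
    static String readFirstLine(String path) throws IOException {
        // If the FileReader constructor throws, there is nothing to close yet.
        BufferedReader reader = new BufferedReader(new FileReader(path));
        try {
            return reader.readLine(); // null for an empty file
        } finally {
            reader.close(); // runs whether readLine() returns or throws
        }
    }
}
```

On Java 7 and later, try-with-resources expresses the same thing more compactly, but Android code only gained it at API level 19, above the devices discussed here.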
Comment 39 Vladimir Vukicevic [:vlad] [:vladv] 2012-06-05 11:35:41 PDT
Created attachment 630239 [details] [diff] [review]
better

Better; adds some logging as well, just in case.
Comment 40 James Willcox (:snorp) (jwillcox@mozilla.com) 2012-06-05 11:37:50 PDT
Comment on attachment 630239 [details] [diff] [review]
better

Review of attachment 630239 [details] [diff] [review]:
-----------------------------------------------------------------

Might as well put the bug number in the log message too :)
Comment 41 Vladimir Vukicevic [:vlad] [:vladv] 2012-06-05 11:53:44 PDT
Pushed to inbound with bug # in log message:

https://hg.mozilla.org/integration/mozilla-inbound/rev/18c5fc416d4a
Comment 42 Scoobidiver (away) 2012-06-06 00:13:24 PDT
Based on the affected devices, I'm adding the kill | raise signature: https://crash-stats.mozilla.com/report/list?signature=kill+|+raise
Comment 43 Vladimir Vukicevic [:vlad] [:vladv] 2012-06-06 07:14:04 PDT
It's definitely the same set of devices, but it might not be the same issue.  I wouldn't necessarily expect it to go away with this patch.
Comment 44 Ed Morley [:emorley] 2012-06-06 08:42:41 PDT
https://hg.mozilla.org/mozilla-central/rev/18c5fc416d4a
Comment 45 Naoki Hirata :nhirata (please use needinfo instead of cc) 2012-06-07 14:08:34 PDT
Removing QA wanted. Not going to root a phone with CM specifically for this. Will monitor in Socorro.
Comment 46 Brad Lassey [:blassey] (use needinfo?) 2012-06-14 11:40:57 PDT
Vlad, please nominate this for aurora and beta
Comment 47 Vladimir Vukicevic [:vlad] [:vladv] 2012-06-14 13:03:54 PDT
Comment on attachment 630239 [details] [diff] [review]
better

[Approval Request Comment]
User impact if declined: ugly crashes with flash plugin on some devices, though largely ones with CyanogenMod and other custom firmware
Testing completed (on m-c, etc.): on m-c
Risk to taking this patch (and alternatives if risky): none; just adds some more checks to our existing flash blacklisting code
String or UUID changes made by this patch: none
Comment 49 Vladimir Vukicevic [:vlad] [:vladv] 2012-06-18 07:09:34 PDT
Yup, those are all on platforms where we didn't block it -- "p990-ics" is one of them, because it wasn't clear whether that was a custom firmware or an early build of ics for the p990.  The third one there could be a normal motorola kernel; they have a bad habit of building some random git revision without giving it any useful name.  Regardless, none of those crashes would have been stopped by the patch here, but the overall number should have gone down.
Comment 50 Vladimir Vukicevic [:vlad] [:vladv] 2012-07-06 06:35:40 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/b31e3ad7fc16
Comment 51 Scoobidiver (away) 2012-07-16 00:51:53 PDT
Target Milestone is for m-c.
Comment 52 Scoobidiver (away) 2012-09-15 23:45:34 PDT
There are still crashes: https://crash-stats.mozilla.com/report/list?signature=CgDrv_Create
