Closed Bug 740778 Opened 12 years ago Closed 12 years ago

Random whole-device lock-ups on Adreno-chipset devices

Categories

(Firefox for Android Graveyard :: General, defect, P1)

ARM
Android
defect

Tracking

(blocking-fennec1.0 beta+)

RESOLVED DUPLICATE of bug 741984
Tracking Status
blocking-fennec1.0 --- beta+

People

(Reporter: cwiiis, Assigned: cwiiis)

Attachments

(1 file)

At some point this week, Fennec has regressed so that devices with dodgy Adreno drivers (most pre-ICS HTC devices, amongst others) lock up unrecoverably at random points.

Steps to reproduce are hard to describe: sometimes it takes several minutes, even hours, and other times it happens very quickly.

I seem to crash going to this link quite often: https://wiki.mozilla.org/Mobile/Notes/28-Mar-2012#Chris_Lord_.28cwiiis.29 and I've crashed on other URLs too.

Possible STR:

1. Go to https://wiki.mozilla.org/Mobile/Notes/28-Mar-2012#Chris_Lord_.28cwiiis.29
2. Wait

The crash manifests as the screen no longer updating (you can sometimes lock the device afterwards, but you cannot unlock it, and it eventually reboots itself), and quite often corruption appears on the screen (photo attached).


I can reproduce this on the current Nightly, and it seems to have appeared somewhere around the 28th/29th in my local builds.
Possible duplicate of bug 739584?
blocking-fennec1.0: --- → ?
I was able to reproduce this issue on the latest Nightly build by following the steps in comment #0, so it seems we have at least one reliable way to reproduce it. The device locks up completely, so I cannot extract the logs. It seems it would stay like that forever, and my only option is to pull out the battery.

I also tried to reproduce it on a Samsung Captivate: sometimes an OOM occurs, but the device remains usable.

--
Firefox 14.0a1 (2012-03-30)
Device: HTC Desire Z
OS: Android 2.3.3
I couldn't reproduce this issue using the build from 03/29, but it's pretty easy to reproduce it using the Nightly from 03/30 by performing the steps from comment #0.

At first I thought it might be caused by the merge from 03/14, but it seems I was wrong.

A possible regression range is: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=1965a2c89d61&tochange=92fe907ddac8
Getting a tighter regression range and better steps to reproduce are likely the only way we can make progress here.
blocking-fennec1.0: ? → beta+
Nicolae, Kbrosnan: can you look for a tighter regression range to assist this bug?  As always, tighter repro also.
(In reply to Tony Chung [:tchung] from comment #5)
> Nicolae, Kbrosnan: can you look for a tighter regression range to assist
> this bug?  As always, tighter repro also.

I can always reproduce this bug by performing the following steps:
1. Go to http://goo.gl/50QyT
2. Zoom in to the maximum level
3. Perform a short pan
4. Wait

Using the tinderbox builds, I narrowed the regression range down from 03/30 and found a new one. It seems this issue has occurred since 03/29:

- good build:
20120329031156
http://hg.mozilla.org/mozilla-central/rev/1965a2c89d61

- bad build:
20120329083924
http://hg.mozilla.org/mozilla-central/rev/ff3521bc6559

Possible regression range: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=1965a2c89d61&tochange=ff3521bc6559

I uploaded a video while I was reproducing this bug on the Nightly from 03/29 (http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android/1333035564/): http://youtu.be/HZiDIEnWcyk?hd=1

Sometimes, just before the device locks up, this error appears in the console:
04-04 17:25:44.290: E/HtcEbdLog(1259): [write_kernel_log] Need Rotate Log! g_outByteCount = 1048645

In general, when the device locks up, the console simply stops showing new events, and it is impossible to get any logs at that point. Please let me know if I can help further with this bug.
blocking-fennec1.0: beta+ → ?
(In reply to Cristian Nicolae (:xti) from comment #6)
> (In reply to Tony Chung [:tchung] from comment #5)
> Using the tinderbox builds, I narrowed the regression range down from 03/30
> and found a new one. It seems this issue has occurred since 03/29:
> 
> - good build:
> 20120329031156
> http://hg.mozilla.org/mozilla-central/rev/1965a2c89d61
> 
> -bad build:
> 20120329083924
> http://hg.mozilla.org/mozilla-central/rev/ff3521bc6559
> 
> Possible regression range:
> http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=1965a2c89d61&tochange=ff3521bc6559

This is useful, but it is still quite a large range (99 commits); is there any chance of narrowing it down further?
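Narrowing the range further amounts to a bisection over the pushes in the pushlog. The minimal Python sketch below is not the process QA actually used. It shows that isolating one culprit among 99 candidate commits takes at most 7 test builds. It assumes that the lock-up reproduces reliably with the steps from comment 6, and that every push after the culprit is also bad. The names `bisect_regression`, `is_bad` and `max_builds_to_test` are hypothetical. In practice, `is_bad` would mean installing that build and running the STR.

```python
import math
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def bisect_regression(changesets: Sequence[T],
                      is_bad: Callable[[T], bool]) -> Tuple[T, int]:
    """Binary-search for the first bad changeset (hypothetical helper).

    Assumes changesets[0] is known good, changesets[-1] is known bad, and
    that once a changeset is bad every later one is bad too.
    Returns (first_bad_changeset, number_of_builds_tested).
    """
    good, bad = 0, len(changesets) - 1
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1  # one build installed and checked with the STR
        if is_bad(changesets[mid]):
            bad = mid
        else:
            good = mid
    return changesets[bad], tested


def max_builds_to_test(candidate_commits: int) -> int:
    """Worst-case number of builds needed to isolate one of N candidates."""
    if candidate_commits <= 1:
        return 0
    return math.ceil(math.log2(candidate_commits))
```

For example, with a known-good build followed by the 99 commits, `max_builds_to_test(99)` is 7. Because the lock-up can take minutes or hours to appear, a build should only be marked good after the STR has been run for a while. A false "good" verdict sends the search into the wrong half of the range.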

> Sometimes, just before the device locks up, this error appears in the console:
> 04-04 17:25:44.290: E/HtcEbdLog(1259): [write_kernel_log] Need Rotate Log! g_outByteCount = 1048645
> 
> In general, when the device locks up, the console simply stops showing new
> events, and it is impossible to get any logs at that point. Please let me
> know if I can help further with this bug.

From some googling, g_outByteCount appears to be a global variable in logcat.cpp that stores how many bytes have been written to the log; once it exceeds a certain size (16k by default), logcat rotates the log files. This may not be useful, but judging by the output, there seems to have been a single massive (roughly 1 megabyte) write to the log. Perhaps this is causing the issue? This is an incredibly long shot, however.

One way to test this might be a patch that overrides the formatted print functions so they do nothing (?).
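The "single massive write" inference can be checked with a little arithmetic. The sketch below assumes the stock logcat rule described above. Under that rule, the byte counter is checked after every write, the logs rotate once it reaches 16 KiB, and the counter then resets to zero. Whether HtcEbdLog follows the same rule is an unverified assumption. The function names are hypothetical.

```python
# Assumption: stock logcat behaviour, where the counter is checked after
# every write and reset to zero when the logs rotate. The default threshold
# "16k" is taken to mean 16 * 1024 bytes.
DEFAULT_ROTATE_KBYTES = 16


def should_rotate(out_byte_count: int,
                  rotate_kbytes: int = DEFAULT_ROTATE_KBYTES) -> bool:
    """Rotation check performed after each write (0 disables rotation)."""
    return rotate_kbytes > 0 and out_byte_count // 1024 >= rotate_kbytes


def min_last_write(count_at_rotation: int,
                   rotate_kbytes: int = DEFAULT_ROTATE_KBYTES) -> int:
    """Smallest possible size of the write that triggered rotation.

    Before that write the counter had not triggered a rotation, so it held
    at most rotate_kbytes * 1024 - 1 bytes.
    """
    largest_count_before = rotate_kbytes * 1024 - 1
    return max(count_at_rotation - largest_count_before, 1)
```

Under the 16 KiB assumption, `min_last_write(1048645)` is 1032262 bytes, which matches the "single ~1 MB write" reading. If HTC's logger instead rotates at 1 MiB, `min_last_write(1048645, rotate_kbytes=1024)` is only 70. In that case the counter value alone would not point to one huge write, which fits the caution that this is a long shot.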
In that range I see bugs 735230 and 739604.
I hope it isn't bug 739604, but if it is, it ought to have been fixed by bug 741984 (though if so, we are most likely triggering a driver bug...).

I'm not sure what in bug 735230 would cause such a thing. We'll know more once we've narrowed down the range.
blocking-fennec1.0: ? → beta+
Assignee: nobody → chrislord.net
Depends on: 741984
Cristian narrowed down the regression range to http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=ec441303e32e&tochange=2f0536c9c497, and also found that the issue is resolved in the latest inbound build.

Based on this, the most plausible explanation is the one suggested by Chris in Comment 9: this was caused by Bug 739604, and fixed by Bug 741984.

This means that on Adreno, calling TexImage2D very frequently (which is what happened with Bug 739604, and what was fixed by Bug 741984) can lock up the device.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE
Product: Firefox for Android → Firefox for Android Graveyard