Closed Bug 1030803 Opened 5 years ago Closed 5 years ago

crash in @0x0 | MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&)

Categories

(Core :: Panning and Zooming, defect, critical)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla34
blocking-b2g 2.0+
Tracking Status
firefox32 --- wontfix
firefox33 --- wontfix
firefox34 --- fixed
b2g-v1.4 --- unaffected
b2g-v2.0 --- verified
b2g-v2.1 --- verified

People

(Reporter: astole, Assigned: botond)

Details

(Keywords: crash, regression, reproducible, Whiteboard: [osrestartcrash][b2g-crash])

Crash Data

Attachments

(4 files)

Attached file logcat
This bug was filed from the Socorro interface and is 
report bp-ede849dc-e266-48a4-b466-3fd502140626.
=============================================================
While viewing an image in a pop up window, the device will crash when zooming in and out.

Repro Steps:
1) Update a Flame to BuildID: 20140626040205
2) Go to a website with images that open via a pop up window
3) Tap on image to open the pop up window
4) Zoom in and out a few times while viewing the image

Actual:
Device crashes while zooming in and out

Expected:
Device does not crash while zooming in and out

2.1 Environmental Variables:
Device: Flame 2.1
BuildID: 20140626040205
Gaia: 87a7746568ac5708e828026160c0732ba252300f
Gecko: c43be7e4ec49
Version: 33.0a1
Firmware Version: v122

Repro frequency: 100%
See attached: Video, logcat
Attaching video and adding qawanted to check the other devices and builds.

User Agent: Mozilla/5.0 (Mobile; rv:33.0) Gecko/33.0 Firefox/33.0

Additional information to the repro steps:
The website used in Step 2 is luxology.com (same website used in the video).
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmitchell)
Keywords: qawanted
nomming as 2.1 blocker - Crash with a 100% repro
blocking-b2g: --- → 2.1?
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmitchell)
Component: Gaia::Browser → DOM: Content Processes
Product: Firefox OS → Core
QA Contact: rpribble
This issue reproduces on the Buri v2.1 MOZ ril, Flame v2.0 MOZ ril, and Buri v2.0 MOZ ril.

v2.1 Environmental Variables:
Device: Buri v2.1 MOZ ril
BuildID: 20140626040205
Gaia: 87a7746568ac5708e828026160c0732ba252300f
Gecko: c43be7e4ec49
Version: 33.0a1
Firmware Version: v1.2-device.cfg
User Agent: Mozilla/5.0 (Mobile; rv:33.0) Gecko/33.0 Firefox/33.0

v2.0 Environmental Variables:
Device: Flame v2.0 MOZ
BuildID: 20140627000201
Gaia: 8df02268fcd7e80c5fab8c1ec099772e37f8659d
Gecko: 731a5e8831e6
Version: 32.0a2
Firmware Version: v121-2
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

v2.0 Environmental Variables:
Device: Buri v2.0 MOZ
BuildID: 20140626000202
Gaia: 6a1373340b40fcfe901336bc9e80676e5f2ba979
Gecko: 82ef9bf64d87
Version: 32.0a2
Firmware Version: v1.2-device.cfg
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

A crash occurs when a pop up image is zoomed in, panned back and forth, and zoomed out repeatedly.

----------------------------------------------------------------------

I am unable to reproduce the crash on the Flame v1.4 MOZ ril, or the Buri v1.4 MOZ ril.

v1.4 Environmental Variables:
Device: Flame v1.4 MOZ
BuildID: 20140625000201
Gaia: c9416de14acf9e94ab006619cd2418c768422fcb
Gecko: cddf88f78632
Version: 30.0
Firmware Version: v121-2
User Agent: Mozilla/5.0 (Mobile; rv:30.0) Gecko/30.0 Firefox/30.0

v1.4 Environmental Variables:
Device: Buri v1.4 MOZ
BuildID: 20140626000203
Gaia: a054aa221df20ca550e6c67f3fbb20d0ad086598
Gecko: e054ac1b5408
Version: 30.0
Firmware Version: v1.2-device.cfg
User Agent: Mozilla/5.0 (Mobile; rv:30.0) Gecko/30.0 Firefox/30.0

Lag is seen when a pop up image is zoomed in, panned back and forth, and zoomed out repeatedly, and the image is very slow to load in correctly, but no crash occurs.
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmitchell)
Keywords: qawanted
switching my nom from 2.1 to 2.0
blocking-b2g: 2.1? → 2.0?
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmitchell)
blocking-b2g: 2.0? → 2.0+
Keywords: reproducible
Striking regression-window unless specifically requested. The repro rate is a bit deceptive: repros can take 10+ minutes of aggressive zooming in and out. The time required for a regression-window seems prohibitive here.
QA Whiteboard: [QAnalyst-Triage+]
(In reply to Joshua Mitchell from comment #6)
> striking regression-window unless specifically requested. The repro rate is
> a bit deceptive, and repro's can take 10+ minutes of zooming in and out
> aggressively. The time required for a regression-window seems prohibitive
> here.

Andrew - Is the information given here enough to go on to fix this?
Flags: needinfo?(overholt)
(In reply to Jason Smith [:jsmith] from comment #7)
> (In reply to Joshua Mitchell from comment #6)
> > striking regression-window unless specifically requested. The repro rate is
> > a bit deceptive, and repro's can take 10+ minutes of zooming in and out
> > aggressively. The time required for a regression-window seems prohibitive
> > here.
> 
> Andrew - Is the information given here enough to go off to fix this?

Probably not but it's a good start.  Is there a chance of catching this with a debug build?  Can it be reproduced with other pages?  Is there any way we can get a reduced test case that reproduces the problem?

kats is probably interested from an APZC standpoint.
Flags: needinfo?(overholt)
The STR needs to be spelled out here more. What site is being used for testing here?
Keywords: qawanted
Repro Steps:
1) Open Browser
2) Go to http://luxology.com/
3) Tap on image to open the pop up window
4) Zoom in and out while viewing the image
Keywords: qawanted
Whiteboard: [osrestartcrash][b2g-crash]
(In reply to Joshua Mitchell (Joshua_M) from comment #10)
> Repro Steps:
> 1) Open Browser
> 2) Go to http://luxology.com/
> 3) Tap on image to open the pop up window
> 4) Zoom in and out while viewing the image

Can we try seeing if this reproduces more easily with an etherpad? Etherpad has been known to be a stress point for zoom in & out, so that might make this bug easier to reproduce.
QA Whiteboard: [QAnalyst-Triage+]
Keywords: qawanted
QA Contact: rpribble → jmitchell
Attached file logcat.txt
(In reply to Jason Smith [:jsmith] from comment #11)

> Can we try seeing if this reproduces more easily with an etherpad? Etherpad
> has been known to be a stress point for zoom in & out, so that might make
> this bug easier to reproduce.

I WAS able to repro this with etherpad, BUT I was not able to generate a crash report to compare with this bug; instead, the device would just reboot. I am attaching a crash-log from this.

I was able to repeat this 3 times and the average repro time was 3 minutes.

If this is determined to be a legitimate repro of this bug, then it IS feasible to get a regression-window using this method (though it is still time-consuming).
Keywords: qawanted
Sorry - I meant to say I attached a logcat from this, and NOT a crash-log.
I'll see if I can reproduce this w/ gdb attached.
Flags: needinfo?(erahm)
After about 25 minutes of testing I managed to get a crash, but GDB failed to intercept it. The assert I got was in APZ in the b2g process:

> F/MOZ_Assert( 2147): Assertion failure: false (In an OVERSCROLL_BOTH condition in ScaleWillOverscrollAmount), at ../../../gecko/gfx/layers/apz/src/Axis.cpp:280

Hopefully that's a good starting point. I'm not sure if it's your original issue, but it's definitely a real issue that would manifest in the wild.

kats, any thoughts on this?
Flags: needinfo?(erahm) → needinfo?(bugmail.mozilla)
I would need a bunch more information in order to figure out what's going on. As far as I know that assertion should never trip, so we would need to figure out why it gets tripped and what the right fix is. Unless we have reliable STR it's going to be pretty hard.
Flags: needinfo?(bugmail.mozilla)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #17)
> I would need a bunch more information in order to figure out what's going
> on. As far as I know that assertion should never trip, so we would need to
> figure out why it gets tripped and what the right fix is. Unless we have
> reliable STR it's going to be pretty hard.

Could that assertion imply that this is a regression potentially from bug 1020045?
Flags: needinfo?(bugmail.mozilla)
Not necessarily. The concept of overscroll that the assertion is referring to, and the assertion itself, predate bug 1020045 by quite a bit. It's more likely to be some sort of rounding error resulting from the zooming in/out.
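
To illustrate the kind of rounding error meant here, a minimal sketch follows. It is not APZ code: `RoundTripDrift` and `DriftWithin` are hypothetical names, and the model (multiply by a zoom factor, then divide by it again) is only an assumption about how repeated zooming in and out could accumulate error in a single-precision scale.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not from Gecko): zooms in by `factor` and back out
// again `rounds` times, then reports how far the scale has drifted from
// its starting value.
float RoundTripDrift(float startScale, float factor, int rounds) {
  assert(factor > 0.0f);  // a zero or negative zoom factor makes no sense here
  float scale = startScale;
  for (int i = 0; i < rounds; ++i) {
    scale *= factor;  // zoom in
    scale /= factor;  // zoom out: mathematically a no-op, but each step rounds
  }
  return scale - startScale;
}

// True if the accumulated drift stays within `tolerance` of zero.
bool DriftWithin(float startScale, float factor, int rounds, float tolerance) {
  return std::fabs(RoundTripDrift(startScale, factor, rounds)) <= tolerance;
}
```

Whether the drift is exactly zero depends on the factor and the number of rounds, which is why code that assumes the scale returns to its exact starting value can misbehave only occasionally and after long sessions.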
Flags: needinfo?(bugmail.mozilla)
I was 'testing the waters' to see the feasibility of a regression window using the etherpad repro and I was able to crash it with a crash report. 
https://crash-stats.mozilla.com/report/index/5657c5fe-7bb6-42e8-8c80-5b0fe2140710

This report indicates that it is related to bug 1031355 and did not point to this bug so it seems that the Etherpad method does not repro this crash. I'll go update that bug with the information that I have.

Environmental Variables:
Device: Flame Master
Build ID: 20140522193023
Gaia: b61129780e085636d09406f2a46e922d0f8b9757
Gecko: e9b2b72f4e6c
Version: 32.0a1 (Master)
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0
QA Whiteboard: [QAnalyst-Triage+][lead-review+]
To start, let's assume that the assertion is where the problem lies (rather than something putting us in a bad state, with the assertion being a side effect further down the line, which is also a possibility).
Component: DOM: Content Processes → Panning and Zooming
Assignee: nobody → botond
It's not clear whether the assertion mentioned in comment 16 is related to the original problem (the crash is in DeferOrRunPendingTask()).

The likely cause of the assertion is floating-point error. I notice that we are not using any EPSILON in comparisons leading up to the failing condition. 

I will post a patch that introduces some EPSILONs to this code. That should fix the assertion, and then I suppose we can see if the original problem persists.
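
As a rough sketch of the idea (this is not the patch itself, and not the code in Axis.cpp): the names `kEpsilon`, `FuzzyGreater`, `Overscroll` and `ClassifyOverscroll` below are hypothetical, and the tolerance value is an arbitrary assumption. The point is only that an extent which overshoots a bound by less than the tolerance is not treated as overscrolling.

```cpp
#include <cassert>

// Hypothetical tolerance, chosen for illustration only.
constexpr float kEpsilon = 1e-4f;

// True only if `a` exceeds `b` by more than `epsilon`.
inline bool FuzzyGreater(float a, float b, float epsilon = kEpsilon) {
  assert(epsilon >= 0.0f);
  return a - b > epsilon;
}

enum class Overscroll { None, Minus, Plus, Both };

// Classifies the 1-D extent [origin, origin + length] against the page
// bounds [pageStart, pageEnd], ignoring overshoots smaller than kEpsilon.
Overscroll ClassifyOverscroll(float origin, float length,
                              float pageStart, float pageEnd) {
  const bool minus = FuzzyGreater(pageStart, origin);
  const bool plus = FuzzyGreater(origin + length, pageEnd);
  if (minus && plus) return Overscroll::Both;
  if (minus) return Overscroll::Minus;
  if (plus) return Overscroll::Plus;
  return Overscroll::None;
}
```

With exact comparisons, an extent that pokes out of both edges by roughly 1e-5 because of rounding would be reported as overscrolling in both directions; with the tolerance above it is classified as `Overscroll::None`.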
Attachment #8458186 - Flags: review?(bugmail.mozilla) → review+
(In reply to Botond Ballo [:botond] from comment #24)
> Try push: https://tbpl.mozilla.org/?tree=Try&rev=4bd32b9efe38

This Try push does not seem to have worked. Trying again: https://tbpl.mozilla.org/?tree=Try&rev=13b939bfdece
Landed assertion fix: https://hg.mozilla.org/integration/mozilla-inbound/rev/30169511da15

Marking bug as leave-open, so we can verify whether this fixes the original issue before closing.
Keywords: leave-open
Rachel, can you reproduce this manually on 2.1 with the latest nightly? Andrew, are there any crash reports with the latest nightly (I know it's early :)?
Flags: needinfo?(rpribble)
Flags: needinfo?(astole)
Please feel free to reassign to me if there is a remaining issue and it looks APZ-related.
Assignee: botond → nobody
I'd rather mark this closed and then reopen it (or better, file a follow-up) if it didn't fix the original issue.
Status: NEW → RESOLVED
Closed: 5 years ago
Keywords: leave-open
Resolution: --- → FIXED
(Although we probably don't want to uplift this until it's verified)
Whiteboard: [osrestartcrash][b2g-crash] → [osrestartcrash][b2g-crash][NO_UPLIFT]
Assignee: nobody → botond
Target Milestone: --- → mozilla34
Josh - We'll need to verify this before it can be uplifted. Can you have someone from your team verify this patch on trunk?
Flags: needinfo?(jmitchell)
Will do! I'll get someone on it soon
Flags: needinfo?(jmitchell)
Tested it myself - I aggressively zoomed and panned until my fingers were numb (about 30-35 minutes) and could not get this crash. I would say this issue is indeed fixed.

Environmental Variables:
Device: Flame Master
Build ID: 20140722065443
Gaia: e423c3be8d19c9a8a5ae2571f499c36dc6b0df89
Gecko: 14a98501048c
Version: 34.0a1 (Master)
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0
QA Whiteboard: [QAnalyst-Triage+][lead-review+] → [QAnalyst-Triage?][lead-review+]
Flags: needinfo?(rpribble)
Flags: needinfo?(pbylenga)
Flags: needinfo?(astole)
Thanks!
Whiteboard: [osrestartcrash][b2g-crash][NO_UPLIFT] → [osrestartcrash][b2g-crash]
QA Whiteboard: [QAnalyst-Triage?][lead-review+] → [QAnalyst-Triage+][lead-review+]
Flags: needinfo?(pbylenga)
This issue has been successfully verified on Flame 2.1.
See attachment: verified_v2.1.MP4.
Reproduce rate: 0/5

Steps:
1) Open Browser
2) Go to http://luxology.com/
3) Tap on image to open the pop up window
4) Zoom in and out while viewing the image, many times.

Flame 2.1 versions:
Gaia-Rev        ccb49abe412c978a4045f0c75abff534372716c4
Gecko-Rev       https://hg.mozilla.org/releases/mozilla-b2g34_v2_1/rev/18fb67530b22
Build-ID        20141130001203
Version         34.0
Device-Name     flame
FW-Release      4.4.2
FW-Incremental  eng.cltbld.20141130.034738
FW-Date         Sun Nov 30 03:47:49 EST 2014
Bootloader      L1TC00011880
This issue has been successfully verified on Flame 2.0.
Reproduce rate: 0/5

Flame 2.0 version:
Gaia-Rev        8d1e868864c8a8f1e037685f0656d1da70d08c06
Gecko-Rev       https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/c756bd8bf3c3
Build-ID        20141201000201
Version         32.0
Device-Name     flame
FW-Release      4.4.2
FW-Incremental  eng.cltbld.20141201.034308
FW-Date         Mon Dec  1 03:43:18 EST 2014
Bootloader      L1TC00011880