Closed Bug 1187056 Opened 4 years ago Closed 4 years ago

test_gallery_edit_photo.py: "Timed out after 10.3 seconds"

Categories

(Core :: Layout, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

()

VERIFIED FIXED
mozilla42
blocking-b2g 2.5+
Tracking Status
firefox42 --- fixed
b2g-v2.2 --- unaffected
b2g-v2.5 --- verified
b2g-master --- verified

People

(Reporter: onelson, Assigned: kats)

References

Details

(Keywords: regression)

Attachments

(1 file, 1 obsolete file)

Description:
test_gallery_edit_photo test is failing after opening the gallery app and selecting a photo to embiggen. While on large photo view, test taps the edit button but no action resolves, leaving the test to timeout. Saw this issue reproduce once manually but have not discovered STR at this time.

* http://jenkins1.qa.scl3.mozilla.com/view/UI/job/flame-kk-319.mozilla-central.nightly.ui.functional.non-smoke.2.bitbar/181/HTML_Report/
* http://jenkins1.qa.scl3.mozilla.com/job/flame-kk.ui.adhoc.bitbar/146/HTML_Report/

PreReq:
* picture in gallery app
Repro Steps:
1) Update phone to 20150723010205
2) Open Gallery app
3) Tap picture
4) Tap Edit button

Actual:
Phone does not transition to edit view; test times out

Expected: 
Phone transitions to edit view; test continues editting

Traceback (most recent call last):
File "/var/lib/jenkins/jobs/flame-kk-319.mozilla-central.nightly.ui.functional.non-smoke.2.bitbar/workspace/.env/lib/python2.7/site-packages/marionette_client-0.16-py2.7.egg/marionette/marionette_test.py", line 296, in run
testMethod()
File "/var/lib/jenkins/jobs/flame-kk-319.mozilla-central.nightly.ui.functional.non-smoke.2.bitbar/workspace/tests/python/gaia-ui-tests/gaiatest/tests/functional/gallery/test_gallery_edit_photo.py", line 28, in test_gallery_edit_photo
edit_image = image.tap_edit_button()
File "/var/lib/jenkins/jobs/flame-kk-319.mozilla-central.nightly.ui.functional.non-smoke.2.bitbar/workspace/tests/python/gaia-ui-tests/gaiatest/apps/gallery/regions/fullscreen_image.py", line 92, in tap_edit_button
return EditPhoto(self.marionette)
File "/var/lib/jenkins/jobs/flame-kk-319.mozilla-central.nightly.ui.functional.non-smoke.2.bitbar/workspace/tests/python/gaia-ui-tests/gaiatest/apps/gallery/regions/edit_photo.py", line 27, in __init__
*self._canvas_locator))))
File "/var/lib/jenkins/jobs/flame-kk-319.mozilla-central.nightly.ui.functional.non-smoke.2.bitbar/workspace/.env/lib/python2.7/site-packages/marionette_driver-0.9-py2.7.egg/marionette_driver/wait.py", line 143, in until
cause=last_exc)
TimeoutException: TimeoutException: Timed out after 10.3 seconds


Environmental Variables:
Device firmware (base) 	L1TC000118D0
Device firmware (date) 	23 Jul 2015 08:24:55
Device firmware (incremental) 	eng.cltbld.20150723.042443
Device firmware (release) 	4.4.2
Device identifier 	flame
Device memory 	219772 kB
Device serial 	356cd044
Device uptime 	0 days 0 hours 2 minutes 8 seconds
Gaia date 	22 Jul 2015 18:44:09
Gaia revision 	f04fdbfa1943
Gecko build 	20150723010205
Gecko revision 	2ddec2dedced
Gecko version 	42.0a1

Reproducible manually: Once, cannot reproduce at this time

Repro frequency: 3/6
Ok, I can reproduce this manually too, when you after the Gallery is open, you quickly tap on the image to open it as a fullscreen image.

So this looks like a regression to me. We should find out when this regressed.
Component: Gaia::UI Tests → Gaia::Gallery
Confirmed comment 1. It seems that the bug will only reproduce if I tap into a picture while Gallery is still loading.

Flame 2.2 prevented this behavior. If a certain picture is not ready to enter Edit mode, Edit button is grayed out until it's ready.

Working on getting the window.
QA Contact: pcheng
I cannot reproduce this issue using shallow flashing whatsoever. That means I'll have to full flash to find the window and won't be able to do swaps to find out whether it's Gecko or Gaia issue.
This is also a problem with test_gallery_delete_image.py.
And test_gallery_switch_to_camera.py.
I don't understand why ./flash_pvt.py makes this reproduce, but when I do ./flash_pvt.py -d flame-kk -v mozilla-central --eng -g -G -b 20150624080416 it doesn't.
Flags: needinfo?(pcheng)
Fortunately I don't have to do swaps for this one.

mozilla-inbound regression window:

Last Working
Device: Flame
BuildID: 20150721075041
Gaia: 4fe0507781f3ed56c8ae5e66dd9489165d1ff68e
Gecko: 49ae0961591e
Gonk: 41d3e221039d1c4486fc13ff26793a7a39226423
Version: 42.0a1 (2.5 Master) 
Firmware Version: v18D
User Agent: Mozilla/5.0 (Mobile; rv:42.0) Gecko/42.0 Firefox/42.0

First Broken
Device: Flame
BuildID: 20150721075241
Gaia: 4fe0507781f3ed56c8ae5e66dd9489165d1ff68e
Gecko: fee45cd9a4d9
Gonk: 41d3e221039d1c4486fc13ff26793a7a39226423
Version: 42.0a1 (2.5 Master) 
Firmware Version: v18D
User Agent: Mozilla/5.0 (Mobile; rv:42.0) Gecko/42.0 Firefox/42.0

Gaia is the same so it's a Gecko issue.

Gecko pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=49ae0961591e&tochange=fee45cd9a4d9

This issue is likely caused by changes made in Bug 1178847.
Blocks: 1178847
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(pcheng) → needinfo?(ktucker)
Duplicate of this bug: 1188007
[Blocking Requested - why for this release]:

Broken functionality and a regression so nominating this 2.5?

Kartikaya, can you take a look at this please? This looks to have been caused by the changes made for bug 1178847.
blocking-b2g: --- → 2.5?
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker) → needinfo?(bugmail.mozilla)
Thanks, I'll take a look.
Assignee: nobody → bugmail.mozilla
Flags: needinfo?(bugmail.mozilla)
I can reproduce this behaviour very intermittently. When I do, I see in WebIDE that the buttons in question don't have their onclick listeners registered, which seems very odd. I don't see how my change could have caused that behaviour, since the onclick listeners are registered unconditionally in frames.js. I'll keep digging.
Duplicate of this bug: 1188226
So I'm unable to reproduce the problem manually any more. I also can't reproduce it while running test_gallery_edit_photo.py; instead it just times out while waiting for files to load, even though visually I can see on the device that the files are done loading. This is the command I tried:

GAIATEST_SKIP_WARNING=1 GAIATEST_ACKNOWLEDGED_RISKS=1 $MARIONETTE_ENV/bin/gaiatest --address=localhost:2828 --testvars gaiatest/testvars_template.json --restart --timeout 120000 gaiatest/tests/functional/gallery/test_gallery_edit_photo.py

Without a way to reproduce this problem I can't really do much else. I don't think this test/suite has enough documentation on how to properly run it on a local dev machine.
Assignee: bugmail.mozilla → nobody
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #13)
> Without a way to reproduce this problem I can't really do much else. I don't
> think this test/suite has enough documentation on how to properly run it on
> a local dev machine.

What do you mean? You were able to run it, no?
So I locally ran the test using today's nightly build, and I can repro the error.  Basically it times out after tapping the edit button, and waiting for the canvas view.  (Because canvas view doesn't come up).  I tapped the edit button manually after the test failed, and it was still unresponsive.  

Kats, if you need me to try something out here, let me know.  It's strange that this happens with low repro rate when I do it manually.
(In reply to Martijn Wargers [:mwargers] (QA) from comment #14)
> What do you mean? You were able to run it, no?

Yes, but the environment on the device is not specified in enough detail. It loads pictures off the SD card and in my case I have around 80-100 pictures that I took yesterday while trying to reproduce this bug. It seems to me that the test is not expecting that, which is why it's timing out for me. I don't know what set of pictures the test is expecting, and it doesn't appear to be documented anywhere.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #16)
> (In reply to Martijn Wargers [:mwargers] (QA) from comment #14)
> > What do you mean? You were able to run it, no?
> 
> Yes, but the environment on the device is not specified in enough detail. It
> loads pictures off the SD card and in my case I have around 80-100 pictures
> that I took yesterday while trying to reproduce this bug. It seems to me
> that the test is not expecting that, which is why it's timing out for me. I
> don't know what set of pictures the test is expecting, and it doesn't appear
> to be documented anywhere.

Oh, I see. That sounds like a bug in the framework. It should actually delete those pictures and only use the picture that was added in the test.
So what I found out was that I can only repro the bug once for each cold startup.  I have to shutdown and power up the phone and run the script to repro the bug, and after that, I cannot reproduce this unless I turn off the phone again.  Also I am on 319MB config as well.  mwargers is investigating 512MB setup.
(In reply to No-Jun Park [:njpark] from comment #18)
>  Also I am on 319MB config as well.  mwargers is investigating
> 512MB setup.

Yeah, I couldn't reproduce after 20 retries, but I could reproduce it 2 times out of 10 using 319MB.
If one would want to reproduce manually, I would recommend trying with 256MB.
(In reply to Martijn Wargers [:mwargers] (QA) from comment #17)
> Oh, I see. That sounds like a bug in the framework. It should actually
> delete those pictures and only use the picture that was added in the test.

It's actually strange that it doesn't do that, because in bug 1044019 this was done supposedly.
Test blocker.
blocking-b2g: 2.5? → 2.5+
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage+][memory-failure]
Ok, I tried on a flame-kk and I can reproduce the test failure there. For some reason it doesn't repro on Aries. Anyway I can flash a local gecko build on top of a stock pvtbuild and still repro so I do have a chance at debugging this now. I tried flashing a full local build but that made the issue go away. Haven't tried flashing just gaia yet.
So what appears to be happening is that the gallery.js code is loaded before the meta-viewport tag has taken effect. This results in ScreenLayout.getCurrentLayout('tiny') returning false (because the innerWidth is 980) and so the fullscreenButtons array gets populated with empty things. This causes the test to fail later on.

I'm not really sure why the meta-viewport tag is getting processed so late, since it's right at the top of the HTML file, before the <script> tags. Need to dig some more.
Assignee: nobody → bugmail.mozilla
Attached patch Patch (obsolete) — Splinter Review
Ok, so the problem is the async-ness of the DOMMetaAdded dispatch at [1]. The meta viewport tag is parsed before the JS is run, but the DOMMetaAdded event itself isn't dispatched until after, so the MobileViewportManager doesn't update the viewport until after.

I'm not entirely sure how this was working prior to bug 1178847 since it seems like it should have had the same problem. Nonetheless I think the correct fix is to add a synchronous notification of the meta-viewport tag so that we can update the screen properties immediately. The patch also has the side benefit of not running MobileViewportManager code with random non-viewport meta tags are added, which is nice.

However I'm not entirely sure it's ok to be running a reflow synchronously from the DOM building code, if it isn't then I don't see a way to fix this in gecko and it will have to be fixed in the gallery app. Smaug, do you know if this is ok?

Try push in progress at https://treeherder.mozilla.org/#/jobs?repo=try&revision=341db37de415

[1] http://mxr.mozilla.org/mozilla-central/source/dom/html/HTMLMetaElement.cpp?rev=bd079aadd3fe#97
Attachment #8641169 - Flags: feedback?(bugs)
Comment on attachment 8641169 [details] [diff] [review]
Patch

This is unfortunately most probably not safe.
We won't reflow at that point, since there are script blockers on stack, but
doing anything like MetaViewportChanged() during BindToTree looks
scary to me. One would need to explicitly prove that MetaViewportChanged call may not run scripts.

So either prove that and assert somewhere hard (RELEASE_ASSERT) that script aren't run, nor events dispatched etc.
 or
Would it help if HTMLMetaElement::CreateAndDispatchEvent didn't use 
asyncDispatcher->PostDOMEvent(); but 
asyncDispatcher->RunDOMEventWhenSafe() ?
Attachment #8641169 - Flags: feedback?(bugs) → feedback-
Attached patch Patch v2Splinter Review
Yeah using RunDOMEventWhenSafe also fixes it, thanks for the tip! Try push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=fe342ec1f1f9
Attachment #8641169 - Attachment is obsolete: true
Attachment #8641210 - Flags: review?(bugs)
Comment on attachment 8641210 [details] [diff] [review]
Patch v2

r+, naturally assuming tryserver results look good
Attachment #8641210 - Flags: review?(bugs) → review+
Try looks good. There's a frequent M-3 failure on B2G but looking at the "similar jobs" pane shows a lot of other pushes from the same time hit that too. More retriggers on it are coming up green, so I assume some glitch in the environment caused those.

I'll land it once inbound reopens.
Component: Gaia::Gallery → Layout
Product: Firefox OS → Core
adding qawanted to test out with gaia ui test once it lands.
Keywords: qawanted
QA Contact: pcheng
https://hg.mozilla.org/mozilla-central/rev/8fa22ffa2da2
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
I just tested locally test_gallery_edit_photo.py and repeated 10 times. I didn't get any failure. Normally, I would definitely see this failure within 5 times, so I think this is really fixed.

Thanks for fixing this, Kats!
Keywords: qawanted
This issue is verified as fixed on Flame and Aries. Ran the 'test_gallery_edit_photo.py' locally 10 times on each device/branch and they all passed without issue.

Device: Aries 2.6
BuildID: 20151210122424
Gaia: 7e962276913bd4da7ce5fa7540767107ce322c78
Gecko: 412e4d7ce98ca4dbc37de133d0f26d7e1a59946f
Gonk: a19052e4389c3ae2d8fc3e7a74a475401baacc56
Version: 45.0a1 (2.6) 
Firmware Version: D5803_23.1.A.1.28_NCB.ftf
User Agent: Mozilla/5.0 (Mobile; rv:45.0) Gecko/45.0 Firefox/45.0

Device: Flame 2.6
BuildID: 20151210030222
Gaia: 961528f4391668bc89ec0be14fa367cea099b588
Gecko: b40ba117fa757861c9caa660ae989122718b494b
Gonk: 205ac4204bbbb2098a8046444acba551ba5dc75a
Version: 45.0a1 (2.6) 
Firmware Version: v18Dv4
User Agent: Mozilla/5.0 (Mobile; rv:45.0) Gecko/45.0 Firefox/45.0

Device: Flame 2.5
BuildID: 20151209170211
Gaia: 7ca639a7bb0bacf27f548841c52617bfc0e3b21f
Gecko: a35e8eb98969970d1af28b265bf99a9edd11e9c2
Gonk: 205ac4204bbbb2098a8046444acba551ba5dc75a
Version: 44.0a2 (2.5) 
Firmware Version: v18Dv4
User Agent: Mozilla/5.0 (Mobile; rv:44.0) Gecko/44.0 Firefox/44.0
Status: RESOLVED → VERIFIED
QA Whiteboard: [QAnalyst-Triage+][memory-failure] → [QAnalyst-Triage?][memory-failure]
Flags: needinfo?(jmercado)
Flags: needinfo?(jmercado)
QA Whiteboard: [QAnalyst-Triage?][memory-failure] → [QAnalyst-Triage+][memory-failure]
You need to log in before you can comment on or make changes to this bug.