Closed
Bug 800102
Opened 13 years ago
Closed 13 years ago
Emulator sometimes stalls while Marionette is installing a new gecko in it resulting in a "Timeout exceeded for _checkCmd call"
Categories
(Testing :: Marionette Client and Harness, defect)
Testing
Marionette Client and Harness
Tracking
(firefox18 fixed, firefox19 fixed)
RESOLVED
FIXED
mozilla19
People
(Reporter: jgriffin, Assigned: ahal)
References
Details
(Keywords: intermittent-failure)
Attachments
(3 files, 3 obsolete files)
892 bytes,
patch
|
ahal
:
review+
bajaj
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
4.33 KB,
patch
|
ahal
:
review+
|
Details | Diff | Splinter Review |
3.62 KB,
patch
|
jgriffin
:
review+
|
Details | Diff | Splinter Review |
When passing --gecko-path to Marionette, the "adb push" call which copies a new gecko to the emulator sometimes stalls for an indefinite period. It usually can be woken up again by doing a few simple adb calls in another terminal, e.g., "adb shell".
I have no idea why this happens, but I have a patch that fixes it on my machine.
Reporter | ||
Comment 1•13 years ago
|
||
I have no idea why this works, but I haven't seen this issue at all with this patch.I didn't do a more aggressive 'rm -rf *' because it deletes files that aren't part of the separate gecko builds...I'm guessing they're unpacked during the initial boot.
Attachment #670091 -
Flags: review?(ahalberstadt)
Reporter | ||
Updated•13 years ago
|
Assignee: nobody → jgriffin
Assignee | ||
Comment 2•13 years ago
|
||
Comment on attachment 670091 [details] [diff] [review]
Delete *.so files before updating gecko in the emulator
Review of attachment 670091 [details] [diff] [review]:
-----------------------------------------------------------------
r+. I was hoping to test it out, but by murphy's law I stopped being able to reproduce the issue when I went to do so. But this change shouldn't hurt anything at the least so if it works for you then lgtm.
Attachment #670091 -
Flags: review?(ahalberstadt) → review+
Reporter | ||
Comment 3•13 years ago
|
||
Target Milestone: --- → mozilla18
Reporter | ||
Comment 4•13 years ago
|
||
Comment on attachment 670091 [details] [diff] [review]
Delete *.so files before updating gecko in the emulator
[Approval Request Comment]
Bug caused by (feature/regressing bug #): N/A
User impact if declined: We won't be able to run Marionette tests on TBPL on aurora without this patch.
Testing completed (on m-c, etc.): N/A
Risk to taking this patch (and alternatives if risky): None
String or UUID changes made by this patch: N/A
Attachment #670091 -
Flags: approval-mozilla-aurora?
Comment 5•13 years ago
|
||
Comment on attachment 670091 [details] [diff] [review]
Delete *.so files before updating gecko in the emulator
Approving , as this is much needed for tests
Attachment #670091 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Reporter | ||
Comment 6•13 years ago
|
||
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 7•13 years ago
|
||
Sadly, this appears not be fixed, as it's showing up on cedar.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Comment 8•13 years ago
|
||
Will land this on cedar to see if it helps.
Reporter | ||
Comment 9•13 years ago
|
||
Comment 10•13 years ago
|
||
Reporter | ||
Comment 11•13 years ago
|
||
Reporter | ||
Comment 12•13 years ago
|
||
Pushed another attempted fix to cedar: https://hg.mozilla.org/projects/cedar/rev/0705e84d547d
Reporter | ||
Comment 13•13 years ago
|
||
Reporter | ||
Updated•13 years ago
|
Attachment #673030 -
Attachment is obsolete: true
Reporter | ||
Comment 14•13 years ago
|
||
Reporter | ||
Comment 15•13 years ago
|
||
This is a modified version of a patch which has been running successfully on cedar for about 5 days, with no occurrences of this problem.
The 'fix' (really workaround) is to first copy the files to the /data partition, then copy them from there to /system/b2g using 'dd'. Additionally, the initial copy must be performed *before* b2g is shut down. The reason is that 'adb push' is much slower while b2g is running, and this seems to help prevent the deadlock.
Attachment #676710 -
Flags: review?(ahalberstadt)
Reporter | ||
Updated•13 years ago
|
Attachment #674700 -
Attachment is obsolete: true
Reporter | ||
Updated•13 years ago
|
Attachment #675663 -
Attachment is obsolete: true
Assignee | ||
Comment 16•13 years ago
|
||
Comment on attachment 676710 [details] [diff] [review]
Work around adb deadlock when installing gecko in emulator,
Review of attachment 676710 [details] [diff] [review]:
-----------------------------------------------------------------
r+, keeping in mind that these changes will need to be upstreamed to the mozdevice emulator.py
::: testing/marionette/client/marionette/emulator.py
@@ +387,5 @@
> + # partition.
> + for root, dirs, files in os.walk(gecko_path):
> + for filename in files:
> + data_local_file = os.path.join('/data/local', filename)
> + print 'pushing', data_local_file
Were these print statements left on purpose? Won't this generate a lot of redundant logs?
Attachment #676710 -
Flags: review?(ahalberstadt) → review+
Reporter | ||
Comment 17•13 years ago
|
||
I left the print statements in so if it hangs on a TPBL job, we can at least see where it hangs (e.g., which copying stage and which file).
I've filed bug 807119 to port this (and other changes) to mozdevice's emulator classes.
Reporter | ||
Comment 18•13 years ago
|
||
Since this touches setup.py, I'm taking the precaution of running it through try: https://tbpl.mozilla.org/?tree=Try&rev=a6be101c6d42&noignore=1
Reporter | ||
Comment 19•13 years ago
|
||
try run 100% green
Reporter | ||
Comment 20•13 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED
Whiteboard: [automation-needed-in-aurora]
Reporter | ||
Comment 21•13 years ago
|
||
Updated•13 years ago
|
status-firefox19:
--- → fixed
Target Milestone: mozilla18 → mozilla19
Assignee | ||
Comment 23•13 years ago
|
||
I saw this happen once again on Cedar with reftests:
https://tbpl.mozilla.org/php/getParsedLog.php?id=16759480&tree=Cedar
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Comment 24•13 years ago
|
||
cc'ing sheriffs since it's possible this will occur in the wild before it's fixed.
The output in the log when this happens is something like:
11:53:55 INFO - pushing /data/local/omni.ja
11:53:55 INFO - Traceback (most recent call last):
11:53:55 INFO - File "runreftestb2g.py", line 563, in <module>
11:53:55 INFO - sys.exit(main())
11:53:55 INFO - File "runreftestb2g.py", line 483, in main
11:53:55 INFO - marionette = Marionette(**kwargs)
11:53:55 INFO - File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/marionette/marionette.py", line 148, in __init__
11:53:55 INFO - self.emulator.install_gecko(self.gecko_path, self)
11:53:55 INFO - File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/marionette/emulator.py", line 392, in install_gecko
11:53:55 INFO - self.dm.pushFile(os.path.join(root, filename), data_local_file)
11:53:55 INFO - File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/mozdevice/devicemanagerADB.py", line 180, in pushFile
11:53:55 INFO - self._checkCmd(["push", os.path.realpath(localname), destname])
11:53:55 INFO - File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/mozdevice/devicemanagerADB.py", line 769, in _checkCmd
11:53:55 INFO - raise DMError("Timeout exceeded for _checkCmd call")
11:53:55 INFO - mozdevice.devicemanager.DMError: Timeout exceeded for _checkCmd call
Comment 25•13 years ago
|
||
Blocks: 438871
Whiteboard: [automation-needed-in-aurora] → [automation-needed-in-aurora][orange]
Comment 26•13 years ago
|
||
Summary: Emulator sometimes stalls while Marionette is installing a new gecko in it → Emulator sometimes stalls while Marionette is installing a new gecko in it resulting in a "Timeout exceeded for _checkCmd call"
Comment 27•13 years ago
|
||
Comment 28•13 years ago
|
||
Comment 29•13 years ago
|
||
Comment 30•13 years ago
|
||
Comment 31•13 years ago
|
||
Comment 32•13 years ago
|
||
Assignee | ||
Comment 33•13 years ago
|
||
I'll take a look at this, but I'm having trouble reproducing it locally. I may just have to implement a system that will retry the install-gecko step if it fails.
Assignee: jgriffin → ahalberstadt
Status: REOPENED → ASSIGNED
Comment 34•13 years ago
|
||
Comment 35•13 years ago
|
||
Assignee | ||
Comment 36•13 years ago
|
||
Pushed a fix to cedar: https://tbpl.mozilla.org/?tree=Cedar&rev=1bc7d82bebf2
I'll re-trigger a bunch of times to try and reproduce. I'm pretty sure this will get around the problem (though I can't say I'm super happy about it).
Assignee | ||
Comment 37•13 years ago
|
||
Attachment #678858 -
Flags: review?(jgriffin)
Reporter | ||
Comment 38•13 years ago
|
||
Comment on attachment 678858 [details] [diff] [review]
Patch 1.0 - Retry adb push if it times out when installing gecko
Review of attachment 678858 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks. Third time's the charm!
Attachment #678858 -
Flags: review?(jgriffin) → review+
Reporter | ||
Comment 39•13 years ago
|
||
Comment 40•13 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 13 years ago → 13 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Reporter | ||
Comment 41•13 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/c1e430ff299e
https://hg.mozilla.org/releases/mozilla-aurora/rev/7e9e4acbf58b
status-firefox18:
--- → fixed
Whiteboard: [automation-needed-in-aurora][orange] → [orange]
Comment 42•13 years ago
|
||
(In reply to Andrew Halberstadt [:ahal] from comment #36)
> Pushed a fix to cedar: https://tbpl.mozilla.org/?tree=Cedar&rev=1bc7d82bebf2
>
> I'll re-trigger a bunch of times to try and reproduce. I'm pretty sure this
> will get around the problem (though I can't say I'm super happy about it).
So ahal, this on cedar somehow caused an additional "head" on "default" with hg.
Two options which I'd like to do one of for us:
(a) Merge this change to cedar's tip, along with a current m-c
(b) Backout this change, *then* merge the head down to tip [effectively doing a no-op -- I think, would verify]
So to summarize, "do we still want this changeset on cedar"?
This does block me asking IT to [make sure and] enable the single-head hook.
Flags: needinfo?(ahalberstadt)
Reporter | ||
Comment 43•13 years ago
|
||
Yes, we still want this on cedar, so I think 1) is the way to go.
Assignee | ||
Comment 44•13 years ago
|
||
(In reply to Justin Wood (:Callek) from comment #42)
> So to summarize, "do we still want this changeset on cedar"?
Yes, sorry about that
Flags: needinfo?(ahalberstadt)
Comment 45•13 years ago
|
||
Just to round back, we talked about this on IRC. the patch landed on cedar conflicts with what is currently on cedar's tip (which likely includes the patch that was pushed to m-c)
So we'll just merge this head away, and keep what is/was on m-c instead. I'll take care of that.
Flags: needinfo?(bugspam.Callek)
Comment 46•13 years ago
|
||
hg heads
hg up -r 112389
hg commit --close-branch -m "rogue head"
hg up -r default
hg ident
hg pull http://hg.mozilla.org/mozilla-central
hg merge
hg commit
hg push
Rogue head closed.
Flags: needinfo?(bugspam.Callek)
Comment 47•13 years ago
|
||
(In reply to Aki Sasaki [:aki] from comment #46)
> hg heads
> hg up -r 112389
> hg commit --close-branch -m "rogue head"
> hg up -r default
> hg ident
> hg pull http://hg.mozilla.org/mozilla-central
> hg merge
> hg commit
> hg push
>
> Rogue head closed.
This extra head will need to be dummy merged rather than closed, if cedar ever needs to merge to mozilla-central (due to the single head hook). Or is cedar just a testing-ground and accepted changes going to be transplanted/cherry-picked to m-c?
Reporter | ||
Comment 48•13 years ago
|
||
For B2G we won't need to merge cedar to m-c; I can't say about other users.
Comment 49•13 years ago
|
||
I've only got a single changed file for mozharness talos; pretty easy to cherry pick that one when I'm ready to merge.
Updated•13 years ago
|
Keywords: intermittent-failure
Updated•13 years ago
|
Whiteboard: [orange]
Updated•2 years ago
|
Product: Testing → Remote Protocol
Comment 50•2 years ago
|
||
Moving bug to Testing::Marionette Client and Harness component per bug 1815831.
Component: Marionette → Marionette Client and Harness
Product: Remote Protocol → Testing
You need to log in
before you can comment on or make changes to this bug.
Description
•