Closed
Bug 904839
Opened 12 years ago
Closed 12 years ago
B2G emulator tests still timing out after 1200s, even with the mozprocess-timeout code
Categories
(Release Engineering :: General, defect)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jgriffin, Assigned: jgriffin)
References
Details
Attachments
(2 files, 2 obsolete files)
|
827 bytes,
patch
|
ahal
:
review+
|
Details | Diff | Splinter Review |
|
3.13 KB,
patch
|
k0scist
:
review+
|
Details | Diff | Splinter Review |
B2G emulator unittests are still being killed by buildbot after 1200s with no output, despite the fact that we are running them with mozprocess and an output_timeout of 1000, which should cause them to be killed by mozharness instead, and thus allow a dump of the logcat to occur.
sample log: https://tbpl.mozilla.org/php/getParsedLog.php?id=26160870&full=1&branch=mozilla-central
| Assignee | ||
Comment 1•12 years ago
|
||
This adds some logging to the mozprocess section of run_command; hopefully this will help us figure out why the problem described in this bug's summary is occurring.
Attachment #793563 -
Flags: review?(ahalberstadt)
| Assignee | ||
Updated•12 years ago
|
Assignee: nobody → jgriffin
Comment 2•12 years ago
|
||
Comment on attachment 793563 [details] [diff] [review]
Add some mozprocess logging,
Review of attachment 793563 [details] [diff] [review]:
-----------------------------------------------------------------
A print statement in the on_timeout method might also be helpful.
Attachment #793563 -
Flags: review?(ahalberstadt) → review+
| Assignee | ||
Comment 3•12 years ago
|
||
I added your suggestion and landed: https://hg.mozilla.org/build/mozharness/rev/a4e6f54010d3
I'll trigger a run on cedar to make sure nothing bad happens, then push to production.
Comment 4•12 years ago
|
||
Merged to production branch of mozharness.
Live as of now.
| Assignee | ||
Comment 5•12 years ago
|
||
In all cases this logging shows that we are using mozprocess to run the command, but in some cases we're not catching the timeout. I can't tell why.
Comment 6•12 years ago
|
||
In bug 917252 I noticed that the messages in the on_timeout handler can get interleaved with the process output somehow. Did you ctrl-f for "mozprocess timed out"? Because it could just be hidden among some random line of output above.
| Assignee | ||
Comment 7•12 years ago
|
||
That doesn't appear to be the case here; see a recent log: https://tbpl.mozilla.org/php/getParsedLog.php?id=28100440&full=1&branch=mozilla-inbound
| Assignee | ||
Comment 9•12 years ago
|
||
WIP. This seems to resolve this issue on try, but I need to do a full try run to make sure it doesn't break anything else.
Without this patch: https://tbpl.mozilla.org/?tree=Try&rev=2471fcda8961
With this patch: https://tbpl.mozilla.org/?tree=Try&rev=fc8f06309cfa
| Assignee | ||
Comment 10•12 years ago
|
||
full try run: https://tbpl.mozilla.org/?tree=Try&rev=6dcdc9b326cb
| Assignee | ||
Comment 11•12 years ago
|
||
This patch seems to work well for B2G mochitests/reftests, but fails for linux tests and for B2G marionette. :(
| Assignee | ||
Comment 12•12 years ago
|
||
v2..handle the case where os.read returns only one or more newlines. try: https://tbpl.mozilla.org/?tree=Try&rev=6f94f56145c8
| Assignee | ||
Updated•12 years ago
|
Attachment #815961 -
Attachment is obsolete: true
| Assignee | ||
Comment 13•12 years ago
|
||
(In reply to Jonathan Griffin (:jgriffin) from comment #12)
> Created attachment 816074 [details] [diff] [review]
> Use os.read instead of readline in readWithTimeout,
>
> v2..handle the case where os.read returns only one or more newlines. try:
> https://tbpl.mozilla.org/?tree=Try&rev=6f94f56145c8
This is much better; the only problem left is that we're somehow messing up Windows builds. Will investigate a little more.
| Assignee | ||
Comment 14•12 years ago
|
||
v3: locally fixes the Windows build problems
Attachment #816074 -
Attachment is obsolete: true
| Assignee | ||
Comment 15•12 years ago
|
||
pushed to try: https://tbpl.mozilla.org/?tree=Try&rev=247cd269d6fe
| Assignee | ||
Comment 16•12 years ago
|
||
Comment on attachment 816650 [details] [diff] [review]
mozprocess.diff
Review of attachment 816650 [details] [diff] [review]:
-----------------------------------------------------------------
This is looking good on try, so I'm putting up for review.
Attachment #816650 -
Flags: review?(jhammel)
Comment 17•12 years ago
|
||
Comment on attachment 816650 [details] [diff] [review]
mozprocess.diff
Review of attachment 816650 [details] [diff] [review]:
-----------------------------------------------------------------
Nice. Looks good.
::: testing/mozbase/mozprocess/mozprocess/processhandler.py
@@ +695,5 @@
> lineReadTimeout = outputTimeout
>
> + (lines, self.didTimeout) = self.readWithTimeout(logsource, lineReadTimeout)
> + while lines != "" and not self.didTimeout:
> + for line in lines.split('\n'):
why not splitlines? Is this from universal newlines? (do we enforce that?)
@@ +793,5 @@
> + self.read_buffer += output
> + if '\n' not in self.read_buffer:
> + time.sleep(0.01)
> + continue
> + tmp = self.read_buffer.split('\n')
again, '\n' v splitlines
Attachment #816650 -
Flags: review?(jhammel) → review+
| Assignee | ||
Comment 18•12 years ago
|
||
split('\n') seemed more precise. Here's a try run using splitlines() instead:
https://tbpl.mozilla.org/?tree=Try&rev=8b2d8b3c7af7
| Assignee | ||
Comment 19•12 years ago
|
||
(In reply to Jonathan Griffin (:jgriffin) from comment #18)
> split('\n') seemed more precise. Here's a try run using splitlines()
> instead:
>
> https://tbpl.mozilla.org/?tree=Try&rev=8b2d8b3c7af7
It turns out this doesn't quite work, at least for the second case. In that case, we want to differentiate between 'test\nfoo\n' and 'test\nfoo', which is a little clumsier with splitlines(). So, I'll keep split('\n') in that case.
| Assignee | ||
Comment 20•12 years ago
|
||
(In reply to Jonathan Griffin (:jgriffin) from comment #19)
> (In reply to Jonathan Griffin (:jgriffin) from comment #18)
> > split('\n') seemed more precise. Here's a try run using splitlines()
> > instead:
> >
> > https://tbpl.mozilla.org/?tree=Try&rev=8b2d8b3c7af7
>
> It turns out this doesn't quite work, at least for the second case. In that
> case, we want to differentiate between 'test\nfoo\n' and 'test\nfoo', which
> is a little clumsier with splitlines(). So, I'll keep split('\n') in that
> case.
And here's a green try run which shows this works ok: https://hg.mozilla.org/try/rev/1c31fd8992c5
| Assignee | ||
Comment 21•12 years ago
|
||
https://github.com/mozilla/mozbase/commit/576b5285c981b6ba938cb1f8716aa96cd7e3b046
We'll have to mirror to m-c before we can tell if this fixes this problem, so I'll leave this bug open until then.
| Assignee | ||
Comment 22•12 years ago
|
||
(In reply to Jonathan Griffin (:jgriffin) from comment #21)
> https://github.com/mozilla/mozbase/commit/
> 576b5285c981b6ba938cb1f8716aa96cd7e3b046
>
> We'll have to mirror to m-c before we can tell if this fixes this problem,
> so I'll leave this bug open until then.
Totally green! After the versionbump patch is r+'d, I'll land that in github and do another try run with the result, since it will include a few more packages than the above.
| Assignee | ||
Comment 23•12 years ago
|
||
(In reply to Jonathan Griffin (:jgriffin) from comment #22)
> (In reply to Jonathan Griffin (:jgriffin) from comment #21)
> > https://github.com/mozilla/mozbase/commit/
> > 576b5285c981b6ba938cb1f8716aa96cd7e3b046
> >
> > We'll have to mirror to m-c before we can tell if this fixes this problem,
> > so I'll leave this bug open until then.
>
> Totally green! After the versionbump patch is r+'d, I'll land that in
> github and do another try run with the result, since it will include a few
> more packages than the above.
wrong bug, sorry !
| Assignee | ||
Comment 24•12 years ago
|
||
This appears to be fixed by some recent changes to mozrunner and/or mozprocess. I'll re-open if I see this again.
Getting stack traces appropriately is being handled under separate bugs, like bug 919353.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 25•12 years ago
|
||
this is causing talos to fail if I run with mozprocess 0.13. I get a never ending stream of __metrics (a line printed at the end of the logfile for the warmup run). I really need another fix that is in 0.13 and not in 0.11.
Comment 26•12 years ago
|
||
I added a \n to the last __metrics line printed out here: http://hg.mozilla.org/build/talos/file/tip/talos/getInfo.html#l66, and the problem is solved. We really need to fix this in the mozprocess code as this is not handling generic process output.
| Assignee | ||
Comment 27•12 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #26)
> I added a \n to the last __metrics line printed out here:
> http://hg.mozilla.org/build/talos/file/tip/talos/getInfo.html#l66, and the
> problem is solved. We really need to fix this in the mozprocess code as
> this is not handling generic process output.
Agreed. Is this problem you're seeing on Windows, or everywhere?
Comment 28•12 years ago
|
||
oh, bad bug reporting. this was seen on all linux and osx platforms, which was easily seen here:
https://github.com/mozilla/mozbase/blob/master/mozprocess/mozprocess/processhandler.py#L779
windows worked great.
| Assignee | ||
Comment 29•12 years ago
|
||
Thanks, I'll try to write a test to reproduce this.
Updated•8 years ago
|
Component: General Automation → General
You need to log in
before you can comment on or make changes to this bug.
Description
•