Closed Bug 1205694 Opened 9 years ago Closed 7 years ago

Increase timeout for unittests on B2G emulator

Categories

(Testing :: General, defect, P2)


Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: tzimmermann, Assigned: tzimmermann)

References

Details

(Whiteboard: [b2g-build-support])

Attachments

(1 file)

Some B2G emulator unit tests constantly fail because their timeouts are too short (e.g., ICS X2) [1]. Increasing the timeouts fixes this problem.

[1] https://treeherder.allizom.org/#/jobs?repo=try&revision=c4a11f18422f 
[2] https://treeherder.allizom.org/#/jobs?repo=try&revision=eb9b82b38cdb
[2] has a fix applied.
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #2)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=d4cff28cc602

(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #3)
> https://treeherder.allizom.org/#/jobs?repo=try&revision=cb2b79bc0d28

These tests seem to fail randomly. Fixing timeouts is not enough. :(
Depends on bug 1188330: we first have to use Gecko's mozharness for B2G.
Depends on: 1188330
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #0)
> Some B2G emulator unit tests constantly fail because their timeouts are too
> short (e.g., ICS X2) [1]. Increasing the timeouts fixes this problem.

Hi Thomas, just sharing some information with you about the timeouts in the xpcshell tests. We found that having busybox configured can improve them [1].

[1] Please see bug 1192135 comment 4 and bug 1192135 comment 5.

> 
> [1] https://treeherder.allizom.org/#/jobs?repo=try&revision=c4a11f18422f 
> [2] https://treeherder.allizom.org/#/jobs?repo=try&revision=eb9b82b38cdb
Whiteboard: [tc-build-support]
Whiteboard: [tc-build-support] → [b2g-build-support]
Hi!

Two things I noticed:

1) I have not yet been able to pin down the location of these errors. It looks like sometimes some files in the loop at [1] do not have the 'name' key. This leads to an IndexError [2] and, consequently, to bug 1194299. The logs then contain '1194299 Intermittent wpt INFO - IndexError: list index out of range on startup'. Any idea what could be the problem, or who might know?


2) I just looked at the log at [3] and saw that the emulator cannot open the host GL library. Is this a known problem? Can you fix it?


[1] https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/testbase.py#269
[2] https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/testbase.py#285
[3] https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/K8fyt0OxQtqmDkI2QBvwNg/0/public/test_info//qemu.log
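The IndexError described in 1) could be avoided by skipping artifact entries that lack the 'name' key instead of letting a later lookup blow up on startup. A minimal sketch, assuming a list-of-dicts artifact structure like the one iterated in testbase.py (the function name and fields here are illustrative, not the real mozharness code):

```python
# Hypothetical sketch: tolerate artifact entries without a 'name' key
# instead of raising on startup. Field names are assumptions.

def find_artifact_urls(artifacts, suffix):
    """Return URLs of artifacts whose name ends with the given suffix,
    skipping malformed entries that have no 'name' key."""
    urls = []
    for artifact in artifacts:
        name = artifact.get('name')  # .get() avoids KeyError on bad entries
        if name is None:
            continue  # log-and-skip rather than crash the whole job
        if name.endswith(suffix):
            urls.append(artifact.get('url'))
    return urls
```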
Flags: needinfo?(garndt)
re: #1 I'm not entirely sure, including Wander to see if he knows (or knows who we could ping).

re: #2, do you know if that just requires a particular package to be available in the task container?  We install these:
  mesa-libGL-devel
  mesa-libGL-devel.i686
Flags: needinfo?(garndt) → needinfo?(wcosta)
(In reply to Greg Arndt [:garndt] from comment #16)
> re: #1 I'm not entirely sure, including Wander to see if he knows (or knows
> who we could ping).
> 
> re: #2, do you know if that just requires a particular package to be
> available in the task container?  We install these:
>   mesa-libGL-devel
>   mesa-libGL-devel.i686

The package names look like you're using Fedora or something similar. I attached my list of required Debian packages. The packages seem correct, but maybe you need a 'libGL.so' symlink pointing to the actual library file.

Gabriele, you're on Fedora, right? Do you know if the emulator needs additional symlinks to find libGL.so?
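The symlink idea above could be checked with a small helper: if only the versioned library (e.g. libGL.so.1) is present, create the unversioned name the emulator's dlopen would look for. The paths and behavior below are a sketch under that assumption, not something the task container actually runs:

```python
# Hypothetical fix-up: some builds dlopen("libGL.so") directly, which
# fails when the distro only ships the versioned libGL.so.1.
import os


def ensure_libgl_symlink(libdir):
    """Create libGL.so -> libGL.so.1 if only the versioned file exists.

    Returns True if a symlink was created, False if nothing was needed."""
    unversioned = os.path.join(libdir, 'libGL.so')
    versioned = os.path.join(libdir, 'libGL.so.1')
    if os.path.exists(versioned) and not os.path.exists(unversioned):
        os.symlink(versioned, unversioned)
        return True
    return False
```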
Flags: needinfo?(gsvelto)
(In reply to Greg Arndt [:garndt] from comment #16)
> re: #1 I'm not entirely sure, including Wander to see if he knows (or knows
> who we could ping).
> 

Maybe jlund, redirecting to him.

> re: #2, do you know if that just requires a particular package to be
> available in the task container?  We install these:
>   mesa-libGL-devel
>   mesa-libGL-devel.i686
Flags: needinfo?(wcosta) → needinfo?(jlund)
> Two things I noticed:
> 
> 1) I have not yet been able to pin down the location of these errors. It
> looks like sometimes some files in the loop at [1] do not have the 'name'
> key. And this leads to an IndexError [2], and consequentially to bug
> 1194299. The logs then contain '1194299 Intermittent wpt INFO - IndexError:
> list index out of range on startup'. Any idea what could be the problem, or
> who might know?
>

can you point me to a log of this happening? specifically one that makes it to https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/testbase.py#285 or a job that says you are hitting https://bugzil.la/1194299 ? If you are hitting line 285 that would suggest somehow you made it here: https://dxr.mozilla.org/mozilla-central/rev/acdb22976ff86539dc10413c5f366e1fb429a680/testing/mozharness/mozharness/mozilla/testing/testbase.py#309

which I don't think we should reach from taskcluster-based jobs; that path is reserved for buildbot-based ones.
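The split jlund describes could be expressed as a guard: prefer the artifact URLs passed on the mozharness command line (the taskcluster path), and only fall back to the buildbot-changes lookup when actually running under buildbot. This is a sketch of that logic; the function and key names are assumptions, not the real testbase.py API:

```python
# Sketch of the guard: taskcluster jobs pass --installer-url and
# --test-packages-url on the CLI and must never hit the buildbot path.

def resolve_artifact_urls(config, buildbot_config):
    """Prefer CLI-provided URLs; use buildbot changes only as a
    buildbot-job fallback."""
    installer = config.get('installer_url')
    test_packages = config.get('test_packages_url')
    if installer and test_packages:
        return installer, test_packages  # taskcluster path
    if buildbot_config is None:
        raise RuntimeError('no artifact URLs and not a buildbot job')
    # buildbot-only path, e.g. find_artifacts_from_buildbot_changes()
    return buildbot_config['installer'], buildbot_config['test_packages']
```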


As a side note, I took a look at some of your try pushes (and all TC tests for that matter) e.g. https://treeherder.allizom.org/logviewer.html#?job_id=11963230&repo=try

We are highlighting "We have not been able to determine which artifacts to use in order to run the tests."[1] as an error. That should just be a red herring as that code is trying to determine the installer and test url location. But that is not expected to succeed yet. IIUC, we rely on the cli args of the mozharness call for those artifacts still:

" 08:24:56     INFO - Run as ./mozharness/scripts/b2g_emulator_unittest.py ... more cli ... --installer-url https://queue.taskcluster.net/v1/task/UBZa_3zOT5W3wtX2nY9eyA/artifacts/public/build/emulator.tar.gz --test-packages-url https://queue.taskcluster.net/v1/task/UBZa_3zOT5W3wtX2nY9eyA/artifacts/public/build/test_packages.json
"

Armen, I believe bug 1203085 is intended for BBB. Maybe we should silence the error raised in [1] so it doesn't look like a reason for failures? Or am I missing something?


[1] https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/testbase.py#313
Flags: needinfo?(jlund) → needinfo?(armenzg)
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #18)
> The package names look like you're using Fedora or something similar. I
> attached my list of required Debian packages. The packages seem correct, but
> maybe you need a symlink 'libGL.so' to the actual file.
> 
> Gabriele, you're on Fedora, right? Do you know if the emulator needs
> additional symlinks to find libGL.so?

I haven't built the ARM ICS emulator in a while, I'm using the KitKat version of the x86 emulator for my personal testing, and that one doesn't need any special symlinks. I can try building it and report back if it needs some special dependencies. I'll leave the NI for now.
I will deal with the misleading error in bug 1203085. My apologies for the red-herring. (UnCCing myself)
Flags: needinfo?(armenzg)
Hi

(In reply to Jordan Lund (:jlund) from comment #20)
> > Two things I noticed:
> > 
> > 1) I have not yet been able to pin down the location of these errors. It
> > looks like sometimes some files in the loop at [1] do not have the 'name'
> > key. This leads to an IndexError [2] and, consequently, to bug
> > 1194299. The logs then contain '1194299 Intermittent wpt INFO - IndexError:
> > list index out of range on startup'. Any idea what could be the problem, or
> > who might know?
> >
> 
> can you point me to a log of this happening? specifically one that makes it
> to
> https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/
> mozilla/testing/testbase.py#285 or a job that says you are hitting
> https://bugzil.la/1194299 ? If you are hitting line 285 that would suggest
> somehow you made it here:
> https://dxr.mozilla.org/mozilla-central/rev/
> acdb22976ff86539dc10413c5f366e1fb429a680/testing/mozharness/mozharness/
> mozilla/testing/testbase.py#309
> 
> which I don't think we should from taskcluster based jobs. that is reserved
> for buildbot based.

I suspect that this is what happens, because the logs and error messages indicate it, but as I said, I've not yet been able to pin down the exact error. Debugging on try is slow and cumbersome. For example, I looked at B2G ICS debug X1 [1][2] and into the patches with the debug noise. It appears that |postflight_read_buildbot_config| calls |find_artifacts_from_buildbot_changes|, which throws the exception.

[1] https://treeherder.allizom.org/#/jobs?repo=try&revision=1c4f0b36e26d
[2] https://treeherder.allizom.org/logviewer.html#?job_id=11900067&repo=try
Hi

> can you point me to a log of this happening? specifically one that makes it
> to
> https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/
> mozilla/testing/testbase.py#285 or a job that says you are hitting
> https://bugzil.la/1194299 ?

I pushed a patch set to try that will report a specific message when this line is reached.

https://treeherder.allizom.org/#/jobs?repo=try&revision=6975ce322d84
I've double-checked the ARM emulator, and on my machine I've got a symlink that points from /usr/lib/libGL.so to /usr/lib/libGL.so.1.2.0. This was generated by the package manager, though, so it should work out of the box. I also tried removing the library, and the emulator still started up just fine, so I'm unsure whether it's still a requirement.

BTW we had bug 897727 for a similar issue but again, it doesn't seem like this should be causing problems on a Fedora-based distro, installing the mesa-libGL.i686 package should be enough (the emulator is a 32-bit executable IIRC).
Flags: needinfo?(gsvelto)
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #24)
> Hi
> 
> > can you point me to a log of this happening? specifically one that makes it
> > to
> > https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/
> > mozilla/testing/testbase.py#285 or a job that says you are hitting
> > https://bugzil.la/1194299 ?
> 
> I pushed a patch set to try that will report a specific message when this
> line is reached.
> 
> https://treeherder.allizom.org/#/jobs?repo=try&revision=6975ce322d84

It doesn't look like we're triggering bug 1194299 any longer. Maybe something changed in the test scripts? But 'adb push' operations fail more often now.
I don't know what to do about these errors with adb push:

 * I tried increasing the partition size, but that didn't help.
 * I tried increasing the timeouts, but that didn't help either.

What else could be the problem?
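Since the failing 'adb push' calls turned out to be intermittent rather than a timeout problem, one common workaround is a bounded retry around each push. A minimal sketch under that assumption (the command shape and retry counts are illustrative, not what the harness actually does):

```python
# Illustrative retry wrapper for flaky 'adb push' calls. The 'runner'
# parameter exists so the logic can be exercised without a real device.
import subprocess
import time


def push_with_retry(src, dst, attempts=3, delay=5, runner=subprocess.call):
    """Retry an adb push a few times before giving up.

    Returns True on the first zero exit status, False if all
    attempts fail."""
    for _ in range(attempts):
        if runner(['adb', 'push', src, dst]) == 0:
            return True
        time.sleep(delay)  # brief pause before retrying
    return False
```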
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #27)
> I don't know what to do about these errors with adb push:
> 
>  * I tried increasing the partition size, but that didn't help.
>  * I tried increasing the timeouts, but that didn't help either.
> 
> What else could be the problem?

Hi Thomas, sharing some information with you.
We found an issue regarding the emulator's adb [1]. With the fix I can pass most of the xpcshell tests on emulator-x86-kk, which usually got stuck pushing the necessary files [2].

The xpcshell tests running on taskcluster don't specify the busybox options [3], so the script won't zip all the necessary files and push a single archive; instead it pushes the files one by one, which makes it much more likely to hit the adb issue.

The tests running on buildbot should have busybox configured, but I don't know why they push files separately on your try runs.

Anyway, I believe the adb issue you met is related to bug 1207039.

Thank you.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1207039
[2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=9d3a99e8f8d2&group_state=expanded
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1200928
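The difference Edgar describes can be made concrete: with busybox available the harness performs a single push of one archive, while without it every file becomes its own adb push, multiplying the chances of hitting the adb bug. A sketch of that planning step (function name, paths, and archive name are illustrative, not the real harness code):

```python
# Sketch: busybox lets the harness zip everything and push once;
# without it, each file is a separate (and separately flaky) push.

def plan_pushes(files, have_busybox):
    """Return the list of (source, destination) push operations the
    harness would perform for the given test files."""
    if have_busybox:
        # One archive, one push; unpacked on-device via busybox.
        return [('tests.zip', '/data/local/tests.zip')]
    # One push per file: N chances to hit the adb issue instead of one.
    return [(f, '/data/local/' + f) for f in files]
```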
Oh, that's interesting. Thanks, Edgar!

Here's a try push with the fix in qemu:

 https://treeherder.allizom.org/#/jobs?repo=try&revision=d4a812314922
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #29)
> Oh, that's interesting. Thanks, Edgar!
> 
> Here's a try push with the fix in qemu
> 
>  https://treeherder.allizom.org/#/jobs?repo=try&revision=d4a812314922

Nice! :) This patch improved the situation significantly.
Priority: -- → P2
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX