Closed Bug 1612070 Opened 5 years ago Closed 5 years ago

Perma fenix: Critical: Tests failed to finish! Application timed out

Categories

(Testing :: General, defect, P1)

Version 3
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: alexandrui, Unassigned)

References

Details

Attachments

(1 file)

The build from yesterday, jan 28 passed, but the one from today, jan 29 fails.
Fenix team that did the latest commits on fenix (jan 28) had their tests passed, also the code doesn't really seem to relate to the timeout error.
I assume there something missing here. A (fenix) infra not logged into https://changelog.dev.mozaws.net/ ?

See Also: → 1600704

I checked a bunch of logcat files and searched for differences.
Here are some of the conclusions:

[1]

the main difference in the logcat logs is this:

FAIL:
01-29 01:24:10.791 7952 7952 D GeckoSession: handleMessage GeckoView:OnLoadRequest uri=about:blank
01-29 01:24:10.805 7952 7952 D GeckoSession: handleMessage GeckoView:OnLoadRequest uri=about:blank

SUCCESS:
01-28 01:19:14.424 7679 7679 D GeckoSession: handleMessage GeckoView:OnLoadRequest uri=about:blank
01-28 01:19:14.454 7679 7679 D GeckoSession: handleMessage GeckoView:OnLoadRequest uri=about:blank
01-28 01:19:46.516 7679 7679 D GeckoSession: handleMessage GeckoView:OnLoadRequest uri=https://www.amazon.com/
01-28 01:20:10.865 8128 8128 D GeckoSession: handleMessage GeckoView:OnLoadRequest uri=about:blank
01-28 01:20:10.879 8128 8128 D GeckoSession: handleMessage GeckoView:OnLoadRequest uri=about:blank
01-28 01:20:42.906 8128 8128 D GeckoSession: handleMessage GeckoView:OnLoadRequest uri=https://www.amazon.com/

  • on perma fail jobs, the OnLoadRequest doesn't call the website URL, only the about:blank

[2]

"[raptor-pageloadjs] pageloadjs raptorContentHandler!"

  • did not appear on failures

https://searchfox.org/mozilla-central/rev/ad6234148892bc29bf08d736604ac71925138040/testing/raptor/webext/raptor/pageload.js#54-55,372
The function from 54 - 55 is used at 372 inside a
window.addEventListener that was not triggered on load.

There seems to be a loading issue for the web pages.

We also talked to the people from #fenix:mozilla.org and waiting for some details.

(In reply to Alexandru Ionescu :alexandrui (needinfo me) from comment #0)

On central it looks like an infra: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&group_state=expanded&searchStr=Rap-fenix&tochange=e3b1a0aa6b1a55687e5aadcc5a419def3d76f326&fromchange=df59b74d33d74c9e4fda377616b20c40da832a71

I think we are on the same page now, but just to clarify: This pattern -- a test passes on a push, but later fails when retriggered on the same push -- is usually a good indicator of a regression due to infra changes. But in the case of the Fenix raptor tests, which normally run via cron, the retriggers can actually run a different build.

Test logs show what's happening:
First run (pass):

[task 2020-01-28T03:04:37.065Z] 03:04:32     INFO - Downloading https://firefox-ci-tc.services.mozilla.com/api/index/v1/task/project.mobile.fenix.v2.performance-test.latest/artifacts/public/build/armeabi-v7a/geckoNightly/target.apk to /builds/task_1580180639/workspace/build/target.apk
[task 2020-01-28T03:04:37.065Z] 03:04:32     INFO - retry: Calling _download_file with args: (), kwargs: {'url': u'https://firefox-ci-tc.services.mozilla.com/api/index/v1/task/project.mobile.fenix.v2.performance-test.latest/artifacts/public/build/armeabi-v7a/geckoNightly/target.apk', 'file_name': u'/builds/task_1580180639/workspace/build/target.apk'}, attempt #1
[task 2020-01-28T03:04:37.065Z] 03:04:35     INFO - Downloaded 68138092 bytes.

Retrigger (fail):

[task 2020-01-29T10:02:45.689Z] 10:02:42     INFO - Downloading https://firefox-ci-tc.services.mozilla.com/api/index/v1/task/project.mobile.fenix.v2.performance-test.latest/artifacts/public/build/armeabi-v7a/geckoNightly/target.apk to /builds/task_1580292127/workspace/build/target.apk
[task 2020-01-29T10:02:45.689Z] 10:02:42     INFO - retry: Calling _download_file with args: (), kwargs: {'url': u'https://firefox-ci-tc.services.mozilla.com/api/index/v1/task/project.mobile.fenix.v2.performance-test.latest/artifacts/public/build/armeabi-v7a/geckoNightly/target.apk', 'file_name': u'/builds/task_1580292127/workspace/build/target.apk'}, attempt #1
[task 2020-01-29T10:02:45.689Z] 10:02:43     INFO - Downloaded 68154942 bytes.

Note that both runs downloaded the apk from the same index url, but the apk's have different sizes: the index was updated between runs.

I don't know what the problem is, but it could be infra OR it could be the builds.

Hi Geoff,

Thanks for the details.

The tests are running fine for now, the last cron job is green:
https://treeherder.mozilla.org/#/jobs?repo=fenix&searchStr=rap&group_state=expanded

It seems that it was a corrupted apk.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED

Post-mortem:
Petru helped me with generating a new build from the fenix revision that is failing locally with just the same error. Here you ca see a video recording with the crash and attached the local log.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: