Closed Bug 1776751 Opened 3 years ago Closed 2 years ago

Permanent condprof TEST-UNEXPECTED-TIMEOUT | automation.py | application timed out after 370 seconds with no output when version increase hasn't been merged to autoland yet

Categories

(Testing :: Performance, defect, P5)

defect

Tracking

(firefox-esr91 unaffected, firefox-esr102 unaffected, firefox102 unaffected, firefox103 disabled, firefox111 fixed)

RESOLVED INCOMPLETE
Tracking Status
firefox-esr91 --- unaffected
firefox-esr102 --- unaffected
firefox102 --- unaffected
firefox103 --- disabled
firefox111 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: jmaher)

References

(Regression)

Details

(Keywords: intermittent-failure, regression, Whiteboard: [stockwell unknown])

Attachments

(2 files)

Filed by: archaeopteryx [at] coole-files.de
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=382653011&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/CTf_roecQL6-SKZSN9GgGQ/runs/0/artifacts/public/logs/live_backing.log


mozilla-central got the version increased from 103 to 104 and a new conditioned profile artifact got created. On autoland, the version is still 103 (merge back is scheduled after the next merge from autoland to mozilla-central, also merging autoland to mozilla-central could be blocked for older, good merge candidates).

The tests with conditioned profile fail on autoland because the application has a lower version number and rejects to launch with the profile with a higher version number.

These didn't show up beginning with this range, looks like they're fixed.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 104 Branch

:jmaher, this is very likely to come back again. One idea to resolve this could be to somehow perform a profile upgrade in condprof just before the test. This would be similar to having it upgraded on mozilla-central.

:sparky, I am leaning towards a versioned profile in the index- then we can retrigger older stuff. Do you have thoughts on versioned profiles vs upgrade in place?

Flags: needinfo?(jmaher)

Good idea :jmaher! If you can trigger condprof to run on version changes in autoland, that should resolve this issue (assuming version changes are the only thing that impact the profile compatibility).

oh, I wasn't thinking that- I might consider that. I was thinking of just calling the artifact profile-v104-linux64-full.tgz or something like that. I guess if there are other needs to upgrade the profile then we are sort of screwed. I have a mostly completed patch to add versions with a fallback to existing profiles- not sure if it makes sense to continue on with that, or investigate ways to upload artifacts to an index from autoland and trigger as needed

Is there someone we could ask about whether version changes are the only culprit or if there are others?

This is why I suggested upgrading the profile with condprof before using it in the test. I'm not suggesting a full re-run of the conditioning scenarios, but just opening the profile, allowing an upgrade, then using it in the test. It's relatively simple to use condprof within the test harnesses also, here's where we make use of it in raptor: https://searchfox.org/mozilla-central/source/testing/raptor/raptor/perftest.py#274

If you set force_new to false, then it'll take the latest archive produced on m-c and update it. You'd need to add some code to skip running the scenarios though: https://searchfox.org/mozilla-central/source/testing/condprofile/condprof/creator.py#180

that seems straightforward, launch the profile, let it upgrade, then use it for copies for tests that need it.

in a scenario where we need to downgrade, we are sort of screwed. I think we need to take both approaches:

  1. add an artifact which includes the version number so we can always reference it. This will allow for "downgrades" or older try pushes to work
  2. add a profile upgrade step after downloading a profile, basically download, run firefox and let it upgrade, then copy the profile around as needed for tests.

Good point! I agree, using both approaches solves the issue. The only question now is if the version is enough.

Assignee: nobody → jmaher

A patch has been attached on this bug, which was already closed. Filing a separate bug will ensure better tracking. If this was not by mistake and further action is needed, please alert the appropriate party.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/72650713a83e use version with conditioned profile. r=sparky
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
See Also: → 1781133

going to fix this to update the INDEX with a version, not just the artifact name.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 104 Branch → ---
Whiteboard: [stockwell needswork:owner]

There have been 43 total failures in the last 7 days, recent failure log.
Affected platforms are:

  • linux1804-64-shippable-qr
  • windows10-64-2004-qr
  • windows10-64-2004-shippable-qr
[task 2022-09-21T19:00:54.750Z] 19:00:54     INFO - TEST-INFO | started process GECKO(824)
[task 2022-09-21T19:00:55.214Z] 19:00:55     INFO - GECKO(824) | 1663786855226	Marionette	INFO	Marionette enabled
[task 2022-09-21T19:00:55.228Z] 19:00:55     INFO - GECKO(824) | 1663786855228	Marionette	TRACE	Received observer notification final-ui-startup
[task 2022-09-21T19:00:55.230Z] 19:00:55     INFO - GECKO(824) | 1663786855229	Marionette	INFO	Listening on port 2828
[task 2022-09-21T19:00:55.231Z] 19:00:55     INFO - GECKO(824) | 1663786855229	Marionette	DEBUG	Marionette is listening
[task 2022-09-21T19:03:55.735Z] 19:03:55     INFO - runtests.py | Waiting for browser...
[task 2022-09-21T19:07:05.246Z] 19:07:05     INFO - Buffered messages finished
[task 2022-09-21T19:07:05.248Z] 19:07:05    ERROR - TEST-UNEXPECTED-TIMEOUT | automation.py | application timed out after 370 seconds with no output
[task 2022-09-21T19:07:05.248Z] 19:07:05    ERROR - Force-terminating active process(es).
[task 2022-09-21T19:07:05.249Z] 19:07:05     INFO - Determining child pids from psutil...
[task 2022-09-21T19:07:05.253Z] 19:07:05     INFO - [8264]
[task 2022-09-21T19:07:05.254Z] 19:07:05     INFO - ==> process 8264 launched child process 2360 ("Z:\task_166378584991358\build\application\firefox\firefox.exe" -contentproc --channel="8264.0.787247084\2020129246" -parentBuildID 20220921180701 -prefsHandle 1900 -prefMapHandle 1892 -prefsLen 21845 -prefMapSize 255114 -appDir "Z:\task_166378584991358\build\application\firefox\browser" - {c1a50efb-7386-47b1-ae34-f73b13038234} 8264 "\\.\pipe\gecko-crash-server-pipe.8264" 1980 158a2df8258 gpu)
[task 2022-09-21T19:07:05.254Z] 19:07:05     INFO - Found child pids: {8264, 2360}
[task 2022-09-21T19:07:05.255Z] 19:07:05     INFO - Killing process: 8264
[task 2022-09-21T19:07:05.261Z] 19:07:05     INFO - TEST-INFO | started process screenshot
[task 2022-09-21T19:07:05.335Z] 19:07:05     INFO - TEST-INFO | screenshot: exit 0
Flags: needinfo?(jmaher)

another issue with bumping firefox versions- one of these days I need to fix this.

Flags: needinfo?(jmaher)
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → INCOMPLETE
Flags: needinfo?(jmaher)

we never did get a fix for condprof tasks and the version change. My simple half day of hacking didn't work. The good news is this seems to be the last remaining condprof issue (we had a lot of small issues when we first turned these on)

Flags: needinfo?(jmaher)
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

Joel, could you have another go at this? Each release cycle we're hitting this again and it's not really pleasant. Thank you.

Flags: needinfo?(jmaher)

we will know in a few weeks if this fix works reliably.

Flags: needinfo?(jmaher)
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/54c6a2aa9f2e allow condprof tests to fallback to previous version. r=sparky
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 111 Branch

I will reopen this because it is happening again on autoland only for now. On linux/windows https://treeherder.mozilla.org/logviewer?job_id=411863229&repo=autoland / https://treeherder.mozilla.org/logviewer?job_id=411860037&repo=autoland

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 111 Branch → ---

this failed again, what happened was:

so this is an edge case- really the only case I saw in the last 6 weeks.

:sparky- any thoughts on what to do here? leave it alone, fix something?

Flags: needinfo?(gmierz2)

I agree, I don't think we'll see this happen too often either so we can leave it alone.

Flags: needinfo?(gmierz2)
Status: REOPENED → RESOLVED
Closed: 3 years ago2 years ago
Resolution: --- → INCOMPLETE
See Also: → 1855669
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: