Open Bug 1609505 Opened 6 years ago Updated 3 years ago

GeckoView tests take too long to run for try builds requested by GeckoView developers

Categories

(Firefox Build System :: Task Configuration, defect)

Unspecified
Android
defect

Tracking

(Not tracked)

People

(Reporter: bugzilla, Unassigned)

References

Details

(Whiteboard: [geckoview:p2])

Those of us working on GeckoView often have to wait a pretty long time for GV tests to run on our try builds. A straw poll during a team meeting indicated that a gv-junit test might sit queued up for 4 - 6 hours before finally running.

IMHO this is not acceptable for those of us actually working on GeckoView.

I realize that we have limited resources to run mobile tests, and autoland and the release repos obviously need high priority, but starving the GV team of being able to run our own tests on try in a timely fashion is a bad side effect that we should try to mitigate.

  • Could we increase the priority of try jobs for Android tests when they are submitted by a GV committer with appropriate perms?
  • Alternatively, could we modify the prioritization so that, even though other trees are higher priority, we don't completely starve try runs of Android testing machines during the North/South American business day?

:wcosta - Can you update us on the transition from packet.net to aws? Is that happening soon? Will it alleviate these issues?

Flags: needinfo?(wcosta)

:aerickson - Unless aws is right around the corner, can you add more packet.net capacity as a short-term fix?

Flags: needinfo?(aerickson)

We are investigating the availability of metal instances in the spot market. That is the last issue blocking the transition. In the meantime, I can increase the number of machines, with coop's approval.

Flags: needinfo?(wcosta) → needinfo?(coop)

A turnaround time of 50 minutes seems to be the best we can do, per https://treeherder.mozilla.org/#/jobs?repo=try&revision=983add1938072148cc014a989ed5ebd9e5135454. The gv-junit tests seem fast (<10 minutes), but the jobs don't run right away because they depend on a build, which seems to take around 40 minutes. The build job started immediately when it was scheduled on the gecko-1/b-linux cluster.

In the last 30 days, the android-em/packet.net queue has had at most 700 jobs, and we average 21 jobs (but other jobs also use the android-em/packet.net queue). We have 60 workers that can each run 4 jobs at a time. If we were only running gv-junit, we'd need 85 workers to clear 700 jobs in an hour.
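As a rough illustration of the capacity arithmetic above, here is a minimal sketch that assumes every job takes the same time and every job slot stays busy for the whole hour. The function name and the per-job durations are assumptions for illustration; the comment does not state the duration behind the 85-worker figure.

```python
import math


def workers_needed(jobs, minutes_per_job, slots_per_worker=4, window_minutes=60):
    """Estimate workers required to finish `jobs` within `window_minutes`.

    Simplified model (an assumption, not how the queue is actually sized):
    all jobs take `minutes_per_job`, and every slot is busy the whole window.
    """
    slot_minutes_needed = jobs * minutes_per_job
    slot_minutes_per_worker = slots_per_worker * window_minutes
    return math.ceil(slot_minutes_needed / slot_minutes_per_worker)


peak_jobs = 700  # peak queue size reported for the last 30 days

# Hypothetical per-job durations; 10 minutes is the upper bound quoted for gv-junit.
for minutes in (10, 29):
    print(minutes, "min/job ->", workers_needed(peak_jobs, minutes), "workers")
```

Under this model, 10-minute jobs would need about 30 workers, while a duration of roughly 29 minutes per job gives 85. The estimate is very sensitive to the per-job time, which presumably also includes setup overhead beyond the test run itself.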

I don't think TC has a way of allowing a few lower priority tasks through, but it seems possible to increase the priority of tasks from certain users.

:tomprince, what do you think about increasing the priority of gv-junit jobs for certain users?

:gbrown, I can't increase the packet.net instances. TC team still controls that.

Flags: needinfo?(aerickson) → needinfo?(mozilla)
See Also: → 1597871

(In reply to Wander Lairson Costa from comment #3)

We are investigating the availability of metal instances in the spot market. That is the last issue blocking the transition. In the meantime, I can increase the number of machines, with coop's approval.

Yes, this is fine. Even a small increase (5 instances) would probably solve this due to concurrency (5 instances == 20 workers).

Flags: needinfo?(coop)

I added 10 new instances.

:aklotz How is the situation now for GeckoView tests and the time they take to run? I ask this because jbonisteel pinged me about a series of Android build/test/related issues and this is one of them.

Flags: needinfo?(aklotz)

It has significantly improved since the number of instances was bumped.

Flags: needinfo?(aklotz)

:aklotz

Is it within an hour, as referenced here?

https://bugzilla.mozilla.org/show_bug.cgi?id=1597871#c13

Flags: needinfo?(aklotz)

I would say so, but note that my results are based on gv-junit jobs. I cannot comment on the state of mochitests or other job types.

Flags: needinfo?(aklotz)
See Also: → 1578460
Whiteboard: [geckoview] → [geckoview:p2]

:tomprince, what do you think about increasing the priority of gv-junit jobs for certain users?

We don't really have a good way to control priority of jobs for specific users.

Flags: needinfo?(mozilla)
Severity: normal → S3