Open Bug 1155879 Opened 7 years ago Updated 7 years ago

Linux reftest runtimes on trunk have doubled since early April


(Testing :: Reftest, defect)



(firefox40 affected)



(Reporter: RyanVM, Unassigned)



We've been seeing a large number of timeouts lately on Linux reftests (see bug 1073442, where the majority have been getting starred). Looking at a graph of ASAN reftest runtimes over the last week, the spike appears to have started in early April.

Aurora runtimes are consistently 60-70min for the same job and a very similar number of tests.

As of now, this is causing the majority of our Linux reftests to fall short of our visibility standards.
Some changes under suspicion:

I also want to see how many stacks are in the log.
Depends on: 1156426
I get permissions errors when accessing the failure log here:

Can you post the log elsewhere? A log from an earlier successful run would help too.
Flags: needinfo?(ryanvm)
TBPL was decommissioned last month. Use Treeherder instead.

Failing run:

Green run:
Flags: needinfo?(ryanvm)
Failing run: fails after 7200s at 94%
Green Run: passes after 6836s at 100%

It seems we're running rather close to the limit here, even when we're passing. Do we have green logs from early April, when the runtimes reportedly doubled?
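For context, the margin in those two runs is thin. A quick back-of-the-envelope check, using the figures quoted above and assuming the 7200 s figure is the job's two-hour timeout:

```python
# Sketch: how much headroom the "green" run leaves before the timeout.
# Numbers are taken from the logs quoted above; the 7200 s limit is
# assumed to be the 120-minute job timeout.
TIMEOUT_S = 7200          # assumed job limit (2 hours)
green_runtime_s = 6836    # runtime of the passing run

headroom_s = TIMEOUT_S - green_runtime_s
headroom_pct = 100 * headroom_s / TIMEOUT_S
print(f"headroom: {headroom_s}s ({headroom_pct:.1f}% of the limit)")
```

So even a passing run leaves only about 5% of slack, which is well inside the reported noise.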
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #7)
> Aurora is pretty representative.
> aurora

Thanks, Ryan!

The elapsed time for layout/reftests/invalidation/ went from 18s (aurora) to 32s (inbound). Not quite doubled, but it points to bug 994541.

Flags: needinfo?(nical.bugzilla)
With OMTC, reading back from the X server (which reftests do a lot) has gotten much more expensive. If reftests are timing out too often, we should split them into two chunks until we can get rid of our dependency on XRender (which requires that we switch from GTK2 to GTK3).
Flags: needinfo?(nical.bugzilla)
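The splitting proposed here is the usual harness-level chunking (in the spirit of the reftest harness's --total-chunks/--this-chunk options). A simplified slice-based sketch of the idea, with placeholder test names (the real harness additionally tries to keep related directories together):

```python
# Sketch: splitting a flat test list into N roughly equal chunks.
# Simplified illustration only; real reftest chunking also groups
# tests by directory so related tests stay in the same chunk.
def chunk(tests, total_chunks, this_chunk):
    """Return the 1-based `this_chunk` of `tests` split into `total_chunks`."""
    per_chunk = -(-len(tests) // total_chunks)  # ceiling division
    start = (this_chunk - 1) * per_chunk
    return tests[start:start + per_chunk]

tests = [f"reftest-{i}.html" for i in range(5)]
print(chunk(tests, 2, 1))  # first chunk takes the extra test
print(chunk(tests, 2, 2))
```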
The OMTC landing was one of the points at which the times increased, but not the biggest.
I had a closer look at the actual Treeherder data for the inbound pushes. It seems like the noise in the test VMs is ±20 minutes, which makes the graphs rather useless. It's quite likely that we've been inching up towards the 120-minute ceiling over a longer period than reported. I compiled a report with my findings here:

Given these findings, I agree that splitting the ASAN tests across 2 machines is the way forward here. I'll post more comments in bug 1156426.
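One way to see a slow trend through ±20 minutes of per-push noise is to smooth the runtimes before graphing them. A minimal moving-average sketch, with made-up runtime values for illustration:

```python
# Sketch: smoothing noisy per-push runtimes with a moving average so a
# gradual upward trend is visible through large VM-induced noise.
# The runtime values below are hypothetical, not from the actual report.
def moving_average(values, window):
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

runtimes_min = [95, 118, 92, 121, 99, 125, 103, 128]  # hypothetical minutes
print(moving_average(runtimes_min, 4))
```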