Closed Bug 1246947 Opened 6 years ago Closed 6 years ago

Cache workspace, /tmp on testers in hopes it's faster

Categories

(Testing :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Assigned: dustin)

Attachments

(1 file)

If tests are being placed under /tmp, we might be getting slower tests because of the filesystem.
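One rough way to check this hypothesis is to time a small-file workload in /tmp and in the cached workspace, both inside the same container. The sketch below is our own assumption, not a tool used in this bug. `time_small_file_writes` is a hypothetical helper, and the default file count and size are arbitrary.

```python
import os
import shutil
import tempfile
import time


def time_small_file_writes(directory: str, count: int = 500, size: int = 4096) -> float:
    """Seconds spent creating, fsync-ing and deleting `count` files of `size` bytes in `directory`."""
    payload = b"x" * size
    # Use a fresh scratch subdirectory so existing files are never touched.
    workdir = tempfile.mkdtemp(prefix="fsbench-", dir=directory)
    try:
        start = time.perf_counter()
        for i in range(count):
            path = os.path.join(workdir, f"f{i}")
            with open(path, "wb") as fh:
                fh.write(payload)
                fh.flush()
                os.fsync(fh.fileno())  # push the write through to the filesystem layer
            os.remove(path)
        elapsed = time.perf_counter() - start
    finally:
        shutil.rmtree(workdir, ignore_errors=True)  # clean up even if a write fails
    return elapsed
```

Call it once with "/tmp" and once with the workspace directory. Only the ratio between the two timings means much, and it will vary from run to run and from instance to instance.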
Assignee: nobody → dustin
Attachment #8717462 - Flags: review?(armenzg) → review+
Comment on attachment 8717462
MozReview Request: Bug 1246947: cache test workspaces to get SSD/ext4 performance; r?armenzg

https://reviewboard.mozilla.org/r/34179/#review30861
We haven't actually determined that this is faster, but anyway it doesn't hurt.

In fact, we should probably cache /tmp too, since tests do drop files in there (e.g., with mktemp)
Summary: Determine reason for slow tests under docker → Cache workspace, /tmp on testers in hopes it's faster
The testing profile, for example, is always in /tmp. Also for xpcshell tests any files created during the test are in /tmp.
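Mounting a cache at /tmp is one option. Another is to point TMPDIR at a directory that already lives on the cached volume. That only helps for code that asks the environment for its temp directory, for example Python's `tempfile` or GNU `mktemp` with no template. It would not help programs that hard-code /tmp. The sketch below assumes the harness and its child processes use such TMPDIR-aware APIs. `redirect_tempdir` and the example path are hypothetical.

```python
import os
import tempfile


def redirect_tempdir(new_root: str) -> str:
    """Point TMPDIR (and Python's tempfile module) at `new_root` for this process and its children."""
    os.makedirs(new_root, exist_ok=True)
    # Child processes inherit this; GNU mktemp(1) without a template also uses $TMPDIR.
    os.environ["TMPDIR"] = new_root
    # tempfile caches its choice of directory; clear it so TMPDIR is re-read.
    tempfile.tempdir = None
    return tempfile.gettempdir()
```

For example, `redirect_tempdir("/home/worker/workspace/tmp")` (a hypothetical path on the cached volume) would make later `tempfile.mkdtemp()` calls land there instead of in /tmp.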
Comment on attachment 8717462
MozReview Request: Bug 1246947: cache test workspaces to get SSD/ext4 performance; r?armenzg

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34179/diff/1-2/
Oh great! I am not sure this will solve all our differences, but I can imagine this would make a noticeable dent! Thanks for driving on this.
Is this ready to be landed?
Caching /tmp doesn't work:

https://tools.taskcluster.net/task-inspector/#IDCd_8HZTUuF31BYph_fBg/0
+ pulseaudio --fail --daemonize --start
E: [pulseaudio] client-conf-x11.c: xcb_connection_has_error() returned true
E: [autospawn] core-util.c: Failed to create random directory /tmp/pulse-9SeiEhAG7w1f: Permission denied
W: [autospawn] lock-autospawn.c: Cannot access autospawn lock.
E: [pulseaudio] main.c: Failed to acquire autospawn lock
cleanup
+ cleanup
+ [[ -s /home/worker/.xsession-errors ]]
+ '[' -n '' ']'

I don't think we should pursue /tmp further until we're confident it's helpful.
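The failure is an unprivileged process getting "Permission denied" when it creates a directory under /tmp. One plausible cause, which this bug does not confirm, is that the volume mounted over /tmp lacks the usual /tmp mode 1777: world-writable with the sticky bit set. The hypothetical helper below checks for that mode.

```python
import os
import stat


def usable_as_tmp(path: str) -> bool:
    """True if `path` is a directory with the conventional /tmp mode (world-writable + sticky, i.e. 1777)."""
    mode = os.stat(path).st_mode
    world_writable = bool(mode & stat.S_IWOTH)
    sticky = bool(mode & stat.S_ISVTX)  # stops users from deleting each other's files
    return stat.S_ISDIR(mode) and world_writable and sticky
```

If that check fails on a mounted /tmp, running `chmod 1777 /tmp` as root before dropping to the worker user might be enough, though we haven't verified that against docker-worker's cache mounts.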
Fair enough.
Let's land the first patch for now. It helped with some of the mochitest timeouts.
I only backed out the /tmp part of the second patch, the result being equivalent to the first patch.
Thanks \o/
(In reply to Dustin J. Mitchell [:dustin] from comment #15)
> Caching /tmp doesn't work:

Is it worth adding a new feature to docker-worker that can mount non-aufs volumes to certain places (like the caching mechanism), but treats them as ephemeral, not an actual cache?
Not immediately -- they would still share space with other caches, and docker-worker will remove these caches with an LRU mechanism, so the effect would be very similar.
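To illustrate the point: an "ephemeral" volume on the same disk would just be one more entry competing for space under least-recently-used eviction. Below is a toy LRU model under that assumption. `CacheVolumes` and its sizes are hypothetical, and the code does not reflect docker-worker's actual implementation.

```python
from collections import OrderedDict


class CacheVolumes:
    """Toy model: caches share one disk and are evicted least-recently-used first."""

    def __init__(self, capacity_gb: float):
        self.capacity_gb = capacity_gb
        self._sizes = OrderedDict()  # cache name -> size in GB, least recently used first

    def use(self, name: str, size_gb: float) -> list:
        """Mount (or re-mount) cache `name`; return the names evicted to make room."""
        self._sizes[name] = size_gb
        self._sizes.move_to_end(name)  # mark as most recently used
        evicted = []
        # Evict the oldest caches until we fit, but never the one just mounted.
        while sum(self._sizes.values()) > self.capacity_gb and len(self._sizes) > 1:
            victim, _ = self._sizes.popitem(last=False)
            evicted.append(victim)
        return evicted
```

Whether a volume is labelled "cache" or "ephemeral", it gets evicted the same way once the disk fills up, which is why the two would behave so similarly.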
What should we do next?

We're prepping to time the harnesses this weekend.
Should we try to time a change for /tmp as well to compare?
No, let's see how the timing for this change works.
Armen, I think there was some work to take timing measurements over the weekend -- any results?
Flags: needinfo?(armenzg)
We decided not to go for it as we want to clear some more issues before that.
We will be doing it this coming weekend.
Flags: needinfo?(armenzg)
We did some timing work this weekend, and it looks good.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Without caching /tmp, we're about 10% slower than buildbot instead of 30% slower.

Caching /tmp would help us gain that 10% back, but I won't insist unless we start being asked how to reduce costs by reducing run times.
My calculation had us 0.9% faster.

Don't assume that /tmp is the right fix :)
Comparing m1.medium to m1.medium, we still have a 12% slowdown with docker/taskcluster vs. buildbot. The 1% improvement includes all the xlarge instances for wpt and gtests, which show large speed improvements (as they should, given the faster hardware).
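A 12% slowdown on one instance type can coexist with an overall speedup of about 1%. The overall figure compares total runtimes, so it equals the per-suite changes weighted by each suite's share of baseline runtime. Large speedups on the xlarge suites can outweigh the m1.medium slowdown. The sketch below makes that arithmetic concrete. `relative_change` is a hypothetical helper, and the numbers in the test are invented, not this bug's measurements.

```python
def relative_change(baseline: dict, candidate: dict) -> dict:
    """Runtime change of `candidate` vs `baseline`, per suite and overall (positive = slower).

    The overall value is total(candidate) / total(baseline) - 1, which equals the
    per-suite changes weighted by each suite's share of baseline runtime.
    """
    per_suite = {k: candidate[k] / baseline[k] - 1 for k in baseline}
    overall = sum(candidate[k] for k in baseline) / sum(baseline.values()) - 1
    return {"per_suite": per_suite, "overall": overall}
```

With invented runtimes of 100 vs. 112 minutes on one instance class and 100 vs. 86 on another, the per-suite changes are +12% and -14%. The overall change is -1%, a net speedup.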
What were the filesystem differences between docker and the host? Asking for posterity.
Docker uses aufs; the base OS is ext4 (and would be bind-mounted into the container).
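To check which filesystem a given path actually sits on inside a container, one can find the longest matching mount point in /proc/mounts. Each line of that file has the form "device mountpoint fstype options dump pass", with spaces and similar characters in paths written as octal escapes. The helper below is a hypothetical sketch, and the mount table in its test is illustrative, not captured from a real worker.

```python
import os
import re


def fs_type(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the mount containing `path`, given the text of /proc/mounts."""
    path = os.path.abspath(path)
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # Decode octal escapes such as \040 (space) in the mount point.
        mount_point = re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), fields[1])
        inside = path == mount_point or path.startswith(mount_point.rstrip("/") + "/")
        # Longest mount point wins; on ties the later line wins (it is mounted on top).
        if inside and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fields[2]
    return best_type
```

Inside a container, pass it the contents of /proc/mounts. With the setup described here, we'd expect /tmp to report aufs and a bind-mounted cache directory to report ext4.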