Closed Bug 1409876 Opened 7 years ago Closed 7 years ago

Intermittent abort: cannot enable sparse profile on existing non-sparse checkout

Categories

(Firefox Build System :: General, defect, P5)

defect

Tracking

(firefox58 fixed)

RESOLVED FIXED
mozilla58
Tracking Status
firefox58 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: gps)

References

Details

(Keywords: intermittent-failure)

Attachments

(3 files)

Filed by: archaeopteryx [at] coole-files.de https://treeherder.mozilla.org/logviewer.html#?job_id=137982094&repo=mozilla-inbound https://queue.taskcluster.net/v1/task/DUdG48iOTMGuI8-_A9bdIw/runs/0/artifacts/public/logs/live_backing.log [taskcluster 2017-10-18 20:49:02.672Z] Task ID: DUdG48iOTMGuI8-_A9bdIw [taskcluster 2017-10-18 20:49:02.673Z] Worker ID: i-06052b53a3146031e [taskcluster 2017-10-18 20:49:02.673Z] Worker Group: us-west-2 [taskcluster 2017-10-18 20:49:02.673Z] Worker Node Type: m3.xlarge [taskcluster 2017-10-18 20:49:02.673Z] Worker Type: gecko-3-decision [taskcluster 2017-10-18 20:49:02.673Z] Public IP: 34.212.24.156 [taskcluster 2017-10-18 20:49:02.673Z] using cache "level-3-checkouts-sparse-v1" -> /builds/worker/checkouts [taskcluster 2017-10-18 20:49:34.265Z] === Task Starting === [setup 2017-10-18T20:49:34.505269Z] run-task started [cache 2017-10-18T20:49:34.507680Z] cache /builds/worker/checkouts exists; requirements: gid=1000 uid=1000 version=1 [setup 2017-10-18T20:49:34.507863Z] running as worker:worker [vcs 2017-10-18T20:49:34.507982Z] executing ['hg', 'robustcheckout', '--sharebase', '/builds/worker/checkouts/hg-store', '--purge', '--upstream', 'https://hg.mozilla.org/mozilla-unified', '--sparseprofile', 'build/sparse-profiles/taskgraph', '--revision', '8c7e214c7a5c20315be9f4d3cc087fec3a6268d6', 'https://hg.mozilla.org/integration/mozilla-inbound', '/builds/worker/checkouts/gecko'] [vcs 2017-10-18T20:49:34.571182Z] (using Mercurial 4.3.1) [vcs 2017-10-18T20:49:34.573158Z] ensuring https://hg.mozilla.org/integration/mozilla-inbound@8c7e214c7a5c20315be9f4d3cc087fec3a6268d6 is available at /builds/worker/checkouts/gecko [vcs 2017-10-18T20:49:34.573206Z] abort: cannot enable sparse profile on existing non-sparse checkout [vcs 2017-10-18T20:49:34.573243Z] (use a separate working directory to use sparse) [taskcluster 2017-10-18 20:49:34.762Z] === Task Finished === [taskcluster 2017-10-18 20:49:34.860Z] Unsuccessful task run with exit code: 255 completed in 32.19 seconds
It is good this is failing: that means the verification code in run-task is detecting "bad" state. The bad state in this instance is a task that is using a cache with the "sparse" flag enabled but isn't actually using a sparse checkout. The question now becomes who is "poisoning" the caches. Given that this was on the gecko-decision worker pool (which is pretty much shared across the world), my guess is someone was hacking about with something (possibly on maple) and inadvertently used a bad combination of taskgraph settings. Let's add some more forensics and error checking to make debugging these failures easier...
Assignee: nobody → gps
Status: NEW → ASSIGNED
Comment on attachment 8922143 [details] Bug 1409876 - Record cache history to aid failure analysis; https://reviewboard.mozilla.org/r/193140/#review198620 This is great, just pick one name for it :) ::: taskcluster/docker/recipes/run-task:329 (Diff revision 1) > b'gid=%d' % gid, > } > > + def write_audit_entry(path, msg): > + now = datetime.datetime.utcnow().isoformat() > + with open(path, 'ab') as fh: why bytes? ::: taskcluster/docker/recipes/run-task:340 (Diff revision 1) > print('error: cache %s is not a directory; this should never ' > 'happen' % cache) > return 1 > > requires_path = os.path.join(cache, '.cacherequires') > + forensics_path = os.path.join(cache, '.cachelog') audit or forensics? ::: taskcluster/docker/recipes/run-task:410 (Diff revision 1) > + write_audit_entry(forensics_path, > + 'requirements mismatch; wanted: %s' % > + ', '.join(sorted(our_requirements))) > + > + print('') > + print('activity log:') ..or activity log?
Attachment #8922143 - Flags: review?(dustin) → review+
Comment on attachment 8922144 [details] Bug 1409876 - Refuse to perform a VCS checkout into a cache root; https://reviewboard.mozilla.org/r/193142/#review198624
Attachment #8922144 - Flags: review?(dustin) → review+
Comment on attachment 8922145 [details] Bug 1409876 - Print warning when vcs operations aren't running on a cache or volume; https://reviewboard.mozilla.org/r/193144/#review198626
Attachment #8922145 - Flags: review?(dustin) → review+
Comment on attachment 8922143 [details] Bug 1409876 - Record cache history to aid failure analysis; https://reviewboard.mozilla.org/r/193140/#review198620 > why bytes? Because you write bytes to files :)
Pushed by gszorc@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/994704c5337e Record cache history to aid failure analysis; r=dustin https://hg.mozilla.org/integration/autoland/rev/2f54b783114c Refuse to perform a VCS checkout into a cache root; r=dustin https://hg.mozilla.org/integration/autoland/rev/92b6898ef97b Print warning when vcs operations aren't running on a cache or volume; r=dustin
Product: Core → Firefox Build System
No. But we should get the decision tasks running on a newer version of run-task with the fixes from this bug. I'll file a bug.
Blocks: 1456326
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: