Closed Bug 1409876 Opened 4 years ago Closed 4 years ago

Intermittent abort: cannot enable sparse profile on existing non-sparse checkout

Categories

(Firefox Build System :: General, defect, P5)

defect

Tracking

(firefox58 fixed)

RESOLVED FIXED
mozilla58
Tracking Status
firefox58 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: gps)

References

Details

(Keywords: intermittent-failure)

Attachments

(3 files)

Filed by: archaeopteryx [at] coole-files.de

https://treeherder.mozilla.org/logviewer.html#?job_id=137982094&repo=mozilla-inbound

https://queue.taskcluster.net/v1/task/DUdG48iOTMGuI8-_A9bdIw/runs/0/artifacts/public/logs/live_backing.log

[taskcluster 2017-10-18 20:49:02.672Z] Task ID: DUdG48iOTMGuI8-_A9bdIw
[taskcluster 2017-10-18 20:49:02.673Z] Worker ID: i-06052b53a3146031e
[taskcluster 2017-10-18 20:49:02.673Z] Worker Group: us-west-2
[taskcluster 2017-10-18 20:49:02.673Z] Worker Node Type: m3.xlarge
[taskcluster 2017-10-18 20:49:02.673Z] Worker Type: gecko-3-decision
[taskcluster 2017-10-18 20:49:02.673Z] Public IP: 34.212.24.156
[taskcluster 2017-10-18 20:49:02.673Z] using cache "level-3-checkouts-sparse-v1" -> /builds/worker/checkouts

[taskcluster 2017-10-18 20:49:34.265Z] === Task Starting ===
[setup 2017-10-18T20:49:34.505269Z] run-task started
[cache 2017-10-18T20:49:34.507680Z] cache /builds/worker/checkouts exists; requirements: gid=1000 uid=1000 version=1
[setup 2017-10-18T20:49:34.507863Z] running as worker:worker
[vcs 2017-10-18T20:49:34.507982Z] executing ['hg', 'robustcheckout', '--sharebase', '/builds/worker/checkouts/hg-store', '--purge', '--upstream', 'https://hg.mozilla.org/mozilla-unified', '--sparseprofile', 'build/sparse-profiles/taskgraph', '--revision', '8c7e214c7a5c20315be9f4d3cc087fec3a6268d6', 'https://hg.mozilla.org/integration/mozilla-inbound', '/builds/worker/checkouts/gecko']
[vcs 2017-10-18T20:49:34.571182Z] (using Mercurial 4.3.1)
[vcs 2017-10-18T20:49:34.573158Z] ensuring https://hg.mozilla.org/integration/mozilla-inbound@8c7e214c7a5c20315be9f4d3cc087fec3a6268d6 is available at /builds/worker/checkouts/gecko
[vcs 2017-10-18T20:49:34.573206Z] abort: cannot enable sparse profile on existing non-sparse checkout
[vcs 2017-10-18T20:49:34.573243Z] (use a separate working directory to use sparse)
[taskcluster 2017-10-18 20:49:34.762Z] === Task Finished ===
[taskcluster 2017-10-18 20:49:34.860Z] Unsuccessful task run with exit code: 255 completed in 32.19 seconds
It is good this is failing: that means the verification code in run-task is detecting "bad" state. The bad state in this instance is a task that is using a cache with the "sparse" flag enabled but isn't actually using a sparse checkout.

The question now becomes who is "poisoning" the caches. Given that this was on the gecko-decision worker pool (which is pretty much shared across the world), my guess is someone was hacking about with something (possibly on maple) and inadvertently used a bad combination of taskgraph settings.

Let's add some more forensics and error checking to make debugging these failures easier...
Assignee: nobody → gps
Status: NEW → ASSIGNED
Comment on attachment 8922143 [details]
Bug 1409876 - Record cache history to aid failure analysis;

https://reviewboard.mozilla.org/r/193140/#review198620

This is great, just pick one name for it :)

::: taskcluster/docker/recipes/run-task:329
(Diff revision 1)
>          b'gid=%d' % gid,
>      }
>  
> +    def write_audit_entry(path, msg):
> +        now = datetime.datetime.utcnow().isoformat()
> +        with open(path, 'ab') as fh:

why bytes?

::: taskcluster/docker/recipes/run-task:340
(Diff revision 1)
>              print('error: cache %s is not a directory; this should never '
>                    'happen' % cache)
>              return 1
>  
>          requires_path = os.path.join(cache, '.cacherequires')
> +        forensics_path = os.path.join(cache, '.cachelog')

audit or forensics?

::: taskcluster/docker/recipes/run-task:410
(Diff revision 1)
> +                write_audit_entry(forensics_path,
> +                                  'requirements mismatch; wanted: %s' %
> +                                  ', '.join(sorted(our_requirements)))
> +
> +                print('')
> +                print('activity log:')

..or activity log?
Attachment #8922143 - Flags: review?(dustin) → review+
Comment on attachment 8922144 [details]
Bug 1409876 - Refuse to perform a VCS checkout into a cache root;

https://reviewboard.mozilla.org/r/193142/#review198624
Attachment #8922144 - Flags: review?(dustin) → review+
Comment on attachment 8922145 [details]
Bug 1409876 - Print warning when vcs operations aren't running on a cache or volume;

https://reviewboard.mozilla.org/r/193144/#review198626
Attachment #8922145 - Flags: review?(dustin) → review+
Comment on attachment 8922143 [details]
Bug 1409876 - Record cache history to aid failure analysis;

https://reviewboard.mozilla.org/r/193140/#review198620

> why bytes?

Because you write bytes to files :)
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/994704c5337e
Record cache history to aid failure analysis; r=dustin
https://hg.mozilla.org/integration/autoland/rev/2f54b783114c
Refuse to perform a VCS checkout into a cache root; r=dustin
https://hg.mozilla.org/integration/autoland/rev/92b6898ef97b
Print warning when vcs operations aren't running on a cache or volume; r=dustin
Product: Core → Firefox Build System
No. But we should get the decision tasks running on a newer version of run-task with the fixes from this bug. I'll file a bug.
Blocks: 1456326
You need to log in before you can comment on or make changes to this bug.