Closed Bug 1292071 Opened 4 years ago Closed 3 years ago

lint jobs show errors on later pushes, not on the one which landed the problematic code

Categories

(Release Engineering :: General, defect, P1, major)

Tracking

(firefox52 fixed)

RESOLVED FIXED
Tracking Status
firefox52 --- fixed

People

(Reporter: aryx, Assigned: gps)

References

Details

(Keywords: regression)

Attachments

(2 files)

The test has been fixed in the next commit: https://treeherder.mozilla.org/#/jobs?repo=fx-team&revision=41072825cb0d9bb0914bd211c54bd6dc51a917fb

Kept an eye on this and noticed that Eslint passed each time after that.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
So this happened again: Eslint continued to pass tests when it should have failed. Because Eslint is like a canary for broken JavaScript, the undetected issues can cause tree closures later.

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=eslint&fromchange=4b9944879c9a60a9aba4a744a7401bc38e0f39c4
has the first Eslint run failing for the merge from fx-team which contains this change to an unrelated .eslintignore file:
https://hg.mozilla.org/mozilla-central/rev/41cee0dc469fcfd233279fc2756045fe583e048f

The changes which caused the eslint failures:

https://hg.mozilla.org/integration/autoland/rev/e0ef5898308b#l4.259
https://hg.mozilla.org/integration/autoland/rev/23887d112341#l3.85

Both landed earlier and Eslint didn't catch them. Does a cached Eslint output gets evaluated after Eslint runs and does this only get invalidated after either the enabled rules or the file exclusion rules change?

CC Fallen, maybe he knows more.
Severity: normal → major
Status: RESOLVED → REOPENED
Component: Buildduty → General Automation
QA Contact: bugspam.Callek → catlee
Resolution: FIXED → ---
A local test running |mach lint path/to/file.js| detects a change which violates eslint rules, so this might be a releng issue.
Huh, this is weird. For people following along you can this locally:
$ hg update 1de5e07bed66
$ ./mach lint -l eslint devtools/client/sourceeditor/editor.js
/home/ahal/hg/mozilla-central/devtools/client/sourceeditor/editor.js
  1296:9  error  'cm' is defined but never used.  no-unused-vars (eslint)

✖ 1 problem (1 error, 0 warnings)

That revision corresponds to this push which has a green ES task:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=1de5e07bed6601d2647e88ed2690b3f30418d8aa&filter-searchStr=ES
So I tried to reproduce this with an interactive loaner, I got errors when trying to update the repo like this:
abort: No such file or directory: '/home/worker/checkouts/gecko/media/webrtc/signaling/src'

The file it aborted on was different each time, and then finally the update worked. This might explain why |mach lint| passed (if the update aborts, files will be missing). The only problem is this error doesn't show up in the log [1]. Is it somehow possible the log isn't displaying stderr?

Gps, any ideas? Could this be vcs related?

[1] https://public-artifacts.taskcluster.net/BpIOCGC7Rxu7d1LvrtSqJQ/0/public/logs/live_backing.log
Flags: needinfo?(gps)
This is not more than a gut feeling. When fiddling around with eslint upgrades I was running eslint and getting no errors or warnings. I didn't suspect a problem until it wasn't finding an obvious issue. I don't remember what I did then, it may have been an eslint configuration or dependency issue, but afterwards it was finding the issues as expected and also showed a few more that were hidden before.

I suspect also that stderr is not displayed or otherwise swallowed. I recently filed/fixed bug 1303725 which may be marginally related, but the error case there is much more obvious.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=555269567efe&selectedJob=28820699 is a try push of mozilla-aurora, being given an eslint failure in a file which does not exist on aurora, a failure which did not exist when the test file landed a week ago, so I was apparently given the results from someone's random try push based on another branch ten or fifteen days ago.

Having a hard time believing this is a suite that should be visible.
This is almost certainly VCS related. This is definitely a P1. I have some ideas what's going on. I'll look around.
Priority: -- → P1
(In reply to Andrew Halberstadt [:ahal] from comment #6)
> So I tried to reproduce this with an interactive loaner, I got errors when
> trying to update the repo like this:
> abort: No such file or directory:
> '/home/worker/checkouts/gecko/media/webrtc/signaling/src'
> 
> The file it aborted on was different each time, and then finally the update
> worked.

That's *very* interesting. Do you have exact steps to reproduce? I'd appreciate knowing which bundle file it is downloading and which revision(s) you are updating to/between.

> The only problem is this error doesn't show up in
> the log [1]. Is it somehow possible the log isn't displaying stderr?

stderr should be captured by run-task. Not sure why you would be seeing warnings in the interactive worker but not in the TC task log.
Flags: needinfo?(gps) → needinfo?(ahalberstadt)
STR were something like:
1) Request interactive loaner of ES task
2) Copy paste the exact "hg robustcheckout" command from the log..  that failed on abort
3) Run "hg update" manually to the push revision, update failed again
4) Tried again, update worked

I don't have time to try reproducing again now (and tomorrow is a Canadian holiday), but if I had to guess I don't think it is very reproducible. There is also a very good chance that this is completely unrelated to the ES problem at hand, and instead related to environment differences when using one-click loaner.
Flags: needinfo?(ahalberstadt)
Yeah, tried and wasn't able to reproduce again. But for posterity, after connecting to [1], I ran:
> /usr/bin/hg --config hostsecurity.hg.mozilla.org:fingerprints=sha256:8e:ad:f7:6a:eb:44:06:15:ed:f3:e4:69:a6:64:60:37:2d:ff:98:88:37:bf:d7:b8:40:84:01:48:9c:26:ce:d9 robustcheckout --sh
arebase /home/worker/hg-shared --purge --upstream https://hg.mozilla.org/mozilla-unified --revision 1de5e07bed6601d2647e88ed2690b3f30418d8aa https://hg.mozilla.org/mozilla-central/ /home/worker/checkouts/gecko

[1] https://tools.taskcluster.net/one-click-loaner/connect/#Unf8_WdiQw-FT31vJ6rVsg
This also hits flake8:

https://treeherder.mozilla.org/logviewer.html#?job_id=5000148&repo=autoland

It's only orange on that push not before or after. jmaher says that the file is missing from the repo and ahal removed it last week or so.

Failure messages:
TEST-UNEXPECTED-ERROR | /home/worker/checkouts/gecko/testing/mochitest/runtestsb2g.py:222:100 | line too long (113 > 99 characters) (E501)
TEST-UNEXPECTED-ERROR | /home/worker/checkouts/gecko/testing/mochitest/runtestsb2g.py:315:100 | line too long (114 > 99 characters) (E501)
Yeah, if this is a VCS problem (which seems likely), then it'll affect all lint jobs. Maybe as a band-aid fix we can get the lint jobs to run an `hg purge` before running.

So to clarify, is files being linted that don't exist anymore the only issue here? I.e, we've never seen a file that still exists but was checked out at the wrong revision?
Previous eslint failures have all been for already existing or new files, not for ones which got removed.
Hm, in that case a purge wouldn't be good enough, I'm not sure how to debug this further. Gps, you said you might have some ideas in comment 9?
Flags: needinfo?(gps)
Just a heads up, aside from possible VCS issues see bug 1309963. mach lint is swallowing some errors, so in case something else is borked running the linter it won't show up as an error.
Thanks for bringing that to my attention, but I don't think bug 1309963 is related as we've seen this with flake8 and the wpt linter as well.
Flake8 issue similar to the one from yesterday: https://treeherder.mozilla.org/logviewer.html#?job_id=5054746&repo=autoland
Summary: eslint job shows errors on later pushes, not on the one which landed the problematic code → lint jobs show errors on later pushes, not on the one which landed the problematic code
Now that I'm back from traveling I can look into this.
Assignee: nobody → gps
Status: REOPENED → ASSIGNED
Flags: needinfo?(gps)
I suspect this has to do with mixing and matching Mercurial store and checkout caches on worker instances due to workers running multiple tasks concurrently. (The lint tasks allow up to 4 simultaneous tasks whereas most other worker types only allow running a single task.)

I'll try to (dis)prove this within a few hours.
eslint and flake8 hidden on all trees, we can unhide them when they actually run on the revision they claim and when someone has cleaned up the current bustage and when someone has cleaned up all the bustage they'll pick up between now and then.
Blocks: 1311248
Comment on attachment 8802361 [details]
Bug 1292071 - Verify VCS checkout is pristine;

https://reviewboard.mozilla.org/r/86776/#review86244
Attachment #8802361 - Flags: review?(dustin) → review+
Comment on attachment 8802362 [details]
Bug 1292071 - Put Mercurial store on same cache as checkout;

https://reviewboard.mozilla.org/r/86778/#review86246
Attachment #8802362 - Flags: review?(dustin) → review+
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4d50a677b989
Verify VCS checkout is pristine; r=dustin
https://hg.mozilla.org/integration/autoland/rev/332a08725ed0
Put Mercurial store on same cache as checkout; r=dustin
Blocks: 1311791
https://hg.mozilla.org/mozilla-central/rev/4d50a677b989
https://hg.mozilla.org/mozilla-central/rev/332a08725ed0
Status: ASSIGNED → RESOLVED
Closed: 4 years ago3 years ago
Resolution: --- → FIXED
Blocks: 1314981
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.