1292071 - lint jobs show errors on later pushes, not on the one which landed the problematic code

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Reporter

Description

•

8 years ago

2016-08-03 01:00 +0000
Bug 1180944 landed as https://hg.mozilla.org/integration/fx-team/rev/474f38fc48be9f97b49b084ebf0b59293b81cf16
Eslint passes: https://treeherder.mozilla.org/#/jobs?repo=fx-team&revision=474f38fc48be&filter-searchStr=eslint

2016-08-03 01:53 +0000
Bug 1285863 landed as https://hg.mozilla.org/integration/fx-team/rev/4bd1976fbb4a
Eslint passes: https://treeherder.mozilla.org/#/jobs?repo=fx-team&revision=555caa0082c3&filter-searchStr=eslint

With the next merge 13 hours later, eslint returns errors for these changes: https://treeherder.mozilla.org/#/jobs?repo=fx-team&revision=4f072e4cc25ac1b0d68887e58f3ee2d5912a8173&filter-searchStr=eslint

So does ESlint lint outdated files? Tomcat mentioned he observed this behavior already one day earlier.

Alin Selagea [:aselagea]

Comment 1

•

8 years ago

The test has been fixed in the next commit: https://treeherder.mozilla.org/#/jobs?repo=fx-team&revision=41072825cb0d9bb0914bd211c54bd6dc51a917fb

Kept an eye on this and noticed that Eslint passed each time after that.

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → FIXED

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Reporter

Comment 2

•

8 years ago

So this happened again: Eslint continued to pass tests when it should have failed. Because Eslint is like a canary for broken JavaScript, the undetected issues can cause tree closures later.

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=eslint&fromchange=4b9944879c9a60a9aba4a744a7401bc38e0f39c4
has the first Eslint run failing for the merge from fx-team which contains this change to an unrelated .eslintignore file:
https://hg.mozilla.org/mozilla-central/rev/41cee0dc469fcfd233279fc2756045fe583e048f

The changes which caused the eslint failures:

https://hg.mozilla.org/integration/autoland/rev/e0ef5898308b#l4.259
https://hg.mozilla.org/integration/autoland/rev/23887d112341#l3.85

Both landed earlier and Eslint didn't catch them. Does a cached Eslint output gets evaluated after Eslint runs and does this only get invalidated after either the enabled rules or the file exclusion rules change?

CC Fallen, maybe he knows more.

Severity: normal → major

Status: RESOLVED → REOPENED

Component: Buildduty → General Automation

QA Contact: bugspam.Callek → catlee

Resolution: FIXED → ---

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Reporter

Comment 3

•

8 years ago

A local test running |mach lint path/to/file.js| detects a change which violates eslint rules, so this might be a releng issue.

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Reporter

Comment 4

•

8 years ago

This issue also affects Try runs:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ab943d749cb9aadba84e24d10105585067cd4c23
should have warned about
https://hg.mozilla.org/try/rev/7c17fe4f669d0bc8c904475d2f03d0a3754d50c1#l2.162

Andrew Halberstadt [:ahal]

Comment 5

•

8 years ago

Huh, this is weird. For people following along you can this locally:
$ hg update 1de5e07bed66
$ ./mach lint -l eslint devtools/client/sourceeditor/editor.js
/home/ahal/hg/mozilla-central/devtools/client/sourceeditor/editor.js
  1296:9  error  'cm' is defined but never used.  no-unused-vars (eslint)

✖ 1 problem (1 error, 0 warnings)

That revision corresponds to this push which has a green ES task:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=1de5e07bed6601d2647e88ed2690b3f30418d8aa&filter-searchStr=ES

Andrew Halberstadt [:ahal]

Comment 6

•

8 years ago

So I tried to reproduce this with an interactive loaner, I got errors when trying to update the repo like this:
abort: No such file or directory: '/home/worker/checkouts/gecko/media/webrtc/signaling/src'

The file it aborted on was different each time, and then finally the update worked. This might explain why |mach lint| passed (if the update aborts, files will be missing). The only problem is this error doesn't show up in the log [1]. Is it somehow possible the log isn't displaying stderr?

Gps, any ideas? Could this be vcs related?

[1] https://public-artifacts.taskcluster.net/BpIOCGC7Rxu7d1LvrtSqJQ/0/public/logs/live_backing.log

Flags: needinfo?(gps)

Philipp Kewisch [:Fallen] [:📆][:🧩]

Comment 7

•

8 years ago

This is not more than a gut feeling. When fiddling around with eslint upgrades I was running eslint and getting no errors or warnings. I didn't suspect a problem until it wasn't finding an obvious issue. I don't remember what I did then, it may have been an eslint configuration or dependency issue, but afterwards it was finding the issues as expected and also showed a few more that were hidden before.

I suspect also that stderr is not displayed or otherwise swallowed. I recently filed/fixed bug 1303725 which may be marginally related, but the error case there is much more obvious.

Phil Ringnalda (:philor)

Comment 8

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=555269567efe&selectedJob=28820699 is a try push of mozilla-aurora, being given an eslint failure in a file which does not exist on aurora, a failure which did not exist when the test file landed a week ago, so I was apparently given the results from someone's random try push based on another branch ten or fifteen days ago.

Having a hard time believing this is a suite that should be visible.

Gregory Szorc [:gps]

Assignee

Comment 9

•

8 years ago

This is almost certainly VCS related. This is definitely a P1. I have some ideas what's going on. I'll look around.

Priority: -- → P1

Gregory Szorc [:gps]

Assignee

Comment 10

•

8 years ago

(In reply to Andrew Halberstadt [:ahal] from comment #6)
> So I tried to reproduce this with an interactive loaner, I got errors when
> trying to update the repo like this:
> abort: No such file or directory:
> '/home/worker/checkouts/gecko/media/webrtc/signaling/src'
> 
> The file it aborted on was different each time, and then finally the update
> worked.

That's *very* interesting. Do you have exact steps to reproduce? I'd appreciate knowing which bundle file it is downloading and which revision(s) you are updating to/between.

> The only problem is this error doesn't show up in
> the log [1]. Is it somehow possible the log isn't displaying stderr?

stderr should be captured by run-task. Not sure why you would be seeing warnings in the interactive worker but not in the TC task log.

Flags: needinfo?(gps) → needinfo?(ahalberstadt)

Andrew Halberstadt [:ahal]

Comment 11

•

8 years ago

STR were something like:
1) Request interactive loaner of ES task
2) Copy paste the exact "hg robustcheckout" command from the log..  that failed on abort
3) Run "hg update" manually to the push revision, update failed again
4) Tried again, update worked

I don't have time to try reproducing again now (and tomorrow is a Canadian holiday), but if I had to guess I don't think it is very reproducible. There is also a very good chance that this is completely unrelated to the ES problem at hand, and instead related to environment differences when using one-click loaner.

Flags: needinfo?(ahalberstadt)

Andrew Halberstadt [:ahal]

Comment 12

•

8 years ago

Yeah, tried and wasn't able to reproduce again. But for posterity, after connecting to [1], I ran:
> /usr/bin/hg --config hostsecurity.hg.mozilla.org:fingerprints=sha256:8e:ad:f7:6a:eb:44:06:15:ed:f3:e4:69:a6:64:60:37:2d:ff:98:88:37:bf:d7:b8:40:84:01:48:9c:26:ce:d9 robustcheckout --sh
arebase /home/worker/hg-shared --purge --upstream https://hg.mozilla.org/mozilla-unified --revision 1de5e07bed6601d2647e88ed2690b3f30418d8aa https://hg.mozilla.org/mozilla-central/ /home/worker/checkouts/gecko

[1] https://tools.taskcluster.net/one-click-loaner/connect/#Unf8_WdiQw-FT31vJ6rVsg

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Reporter

Comment 13

•

8 years ago

This also hits flake8:

https://treeherder.mozilla.org/logviewer.html#?job_id=5000148&repo=autoland

It's only orange on that push not before or after. jmaher says that the file is missing from the repo and ahal removed it last week or so.

Failure messages:
TEST-UNEXPECTED-ERROR | /home/worker/checkouts/gecko/testing/mochitest/runtestsb2g.py:222:100 | line too long (113 > 99 characters) (E501)
TEST-UNEXPECTED-ERROR | /home/worker/checkouts/gecko/testing/mochitest/runtestsb2g.py:315:100 | line too long (114 > 99 characters) (E501)

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Reporter

Comment 14

•

8 years ago

This has been removed a week ago (see bug 1308202): https://hg.mozilla.org/integration/mozilla-inbound/rev/5c28890906c3

Andrew Halberstadt [:ahal]

Comment 15

•

8 years ago

Yeah, if this is a VCS problem (which seems likely), then it'll affect all lint jobs. Maybe as a band-aid fix we can get the lint jobs to run an `hg purge` before running.

So to clarify, is files being linted that don't exist anymore the only issue here? I.e, we've never seen a file that still exists but was checked out at the wrong revision?

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Reporter

Comment 16

•

8 years ago

Previous eslint failures have all been for already existing or new files, not for ones which got removed.

Andrew Halberstadt [:ahal]

Comment 17

•

8 years ago

Hm, in that case a purge wouldn't be good enough, I'm not sure how to debug this further. Gps, you said you might have some ideas in comment 9?

Flags: needinfo?(gps)

Philipp Kewisch [:Fallen] [:📆][:🧩]

Comment 18

•

8 years ago

Just a heads up, aside from possible VCS issues see bug 1309963. mach lint is swallowing some errors, so in case something else is borked running the linter it won't show up as an error.

Andrew Halberstadt [:ahal]

Comment 19

•

8 years ago

Thanks for bringing that to my attention, but I don't think bug 1309963 is related as we've seen this with flake8 and the wpt linter as well.

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Reporter

Comment 20

•

8 years ago

Flake8 issue similar to the one from yesterday: https://treeherder.mozilla.org/logviewer.html#?job_id=5054746&repo=autoland

Summary: eslint job shows errors on later pushes, not on the one which landed the problematic code → lint jobs show errors on later pushes, not on the one which landed the problematic code

Gregory Szorc [:gps]

Assignee

Comment 21

•

8 years ago

Now that I'm back from traveling I can look into this.

Assignee: nobody → gps

Status: REOPENED → ASSIGNED

Flags: needinfo?(gps)

Gregory Szorc [:gps]

Assignee

Comment 22

•

8 years ago

I suspect this has to do with mixing and matching Mercurial store and checkout caches on worker instances due to workers running multiple tasks concurrently. (The lint tasks allow up to 4 simultaneous tasks whereas most other worker types only allow running a single task.)

I'll try to (dis)prove this within a few hours.

Comment hidden (mozreview-request)

The working directory state in automation is somehow incorrect. To
help debug what is doing on, we increase the verbosity of the
`hg robustcheckout` command and perform an explicit `hg status`
after `hg robustcheckout` and verify that the working directory has
no differences from the pristine checkout state.

`hg status` can be expensive. So this is meant as a temporary
stop-gap until we can figure out the underlying problem.

Review commit: https://reviewboard.mozilla.org/r/86776/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/86776/

Comment hidden (mozreview-request)

We were seeing issues with the Mercurial working directory not being
pristine. While I can't reproduce this, I have a hunch it is due to
mixing and matching stores and checkouts in TaskCluster. For example,
if a worker supports running concurrent tasks and 2 tasks arrive at
the same time, the caches for the store and checkout may look like:

  (store0, checkout0)
  (store1, checkout1)

However, the next task may get:

  (store1, checkout0)

This may confuse Mercurial.

This commit eliminates the "hg-shared" cache and places the shared
stores as a sibling directory of the checkout.

Review commit: https://reviewboard.mozilla.org/r/86778/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/86778/

Phil Ringnalda (:philor)

Comment 25

•

8 years ago

eslint and flake8 hidden on all trees, we can unhide them when they actually run on the revision they claim and when someone has cleaned up the current bustage and when someone has cleaned up all the bustage they'll pick up between now and then.

Phil Ringnalda (:philor)

Updated

•

8 years ago

Blocks: 1311248

Comment hidden (mozreview-request)

Comment on attachment 8802362 [details]
Bug 1292071 - Put Mercurial store on same cache as checkout;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/86778/diff/1-2/

Dustin J. Mitchell [:dustin] (he/him)

Comment 27

•

8 years ago

mozreview-review

Comment on attachment 8802361 [details]
Bug 1292071 - Verify VCS checkout is pristine;

https://reviewboard.mozilla.org/r/86776/#review86244

Attachment #8802361 - Flags: review?(dustin) → review+

Dustin J. Mitchell [:dustin] (he/him)

Comment 28

•

8 years ago

mozreview-review

Comment on attachment 8802362 [details]
Bug 1292071 - Put Mercurial store on same cache as checkout;

https://reviewboard.mozilla.org/r/86778/#review86246

Attachment #8802362 - Flags: review?(dustin) → review+

Comment hidden (mozreview-request)

Comment on attachment 8802361 [details]
Bug 1292071 - Verify VCS checkout is pristine;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/86776/diff/1-2/

Comment hidden (mozreview-request)

Comment on attachment 8802362 [details]
Bug 1292071 - Put Mercurial store on same cache as checkout;

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/86778/diff/2-3/

Pulsebot

Comment 31

•

8 years ago

Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4d50a677b989
Verify VCS checkout is pristine; r=dustin
https://hg.mozilla.org/integration/autoland/rev/332a08725ed0
Put Mercurial store on same cache as checkout; r=dustin

Gregory Szorc [:gps]

Assignee

Updated

•

8 years ago

Blocks: 1311791

Carsten Book [:Tomcat]

Comment 32

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/4d50a677b989
https://hg.mozilla.org/mozilla-central/rev/332a08725ed0

Status: ASSIGNED → RESOLVED

Closed: 8 years ago → 8 years ago

status-firefox52: --- → fixed

Resolution: --- → FIXED

Gregory Szorc [:gps]

Assignee

Updated

•

8 years ago

Blocks: 1314981

Nobody; OK to take it and work on it

Updated

•

6 years ago

Component: General Automation → General

Bug 1292071 - Verify VCS checkout is pristine; 8 years ago Gregory Szorc [:gps] 58 bytes, text/x-review-board-request	dustin : review+	Details
Bug 1292071 - Put Mercurial store on same cache as checkout; 8 years ago Gregory Szorc [:gps] 58 bytes, text/x-review-board-request	dustin : review+	Details