Closed Bug 1871425 Opened 7 months ago Closed 5 months ago

Can't land D195419 because of (wrong) lint error (wpt lint broken in lando?)

Categories

(Conduit :: Lando, defect)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emilio, Assigned: sheehan)

Details

(Keywords: leave-open)

Attachments

(8 files)

Tried to land https://lando.services.mozilla.com/D195419/, got twice:

Your request to land D195419 failed.

See https://lando.services.mozilla.com/D195419/ for details.

Reason:
Lando failed to format your patch for conformity with our formatting policy. Please see the details below.

error: problem with lint setup, skipping android-api-lint, android-checkstyle, android-format, android-javadoc, android-lint, android-test
/repos/mozilla-central/tools/lint/wpt.yml
  0   error  Lint process exited with return code 1  (wpt)
  0   error  Lint process exited with return code 1  (wpt)
  0   error  Lint process exited with return code 1  (wpt)

✖ 3 problems (3 errors, 4 warnings, 1 fixed)

However applying the patch locally and running ./mach lint doesn't show any error, even if I lint the whole tree:

$ mach lint -l wpt .
No specific files specified, running the full wpt lint (this is slow)
✖ 0 problems (0 errors, 0 warnings, 0 fixed)

so something doesn't add up. Probably wpt lint failing incorrectly somehow?

I think what happens is that lando somehow applied the whole stack even though I only wanted to land the first patch?

There are some wptlint failures in https://phabricator.services.mozilla.com/D197052 for example. Is that expected?

(In reply to Emilio Cobos Álvarez (:emilio) from comment #1)

I think what happens is that lando somehow applied the whole stack even though I only wanted to land the first patch?

There are some wptlint failures in https://phabricator.services.mozilla.com/D197052 for example. Is that expected?

That's definitely not expected, but it's the behaviour I'm seeing as well. I'm looking into this now.

Assignee: nobody → sheehan

Even after removing the child dependency, it still seems to error.

I ran some queries in the database and it looks like Lando is only associating one revision with the landing job, and the patch looks as expected. Given your removal of the child revision didn't fix the issue, I don't think this bug is due to Lando erroneously trying to land more than one revision in the stack.

I'm not sure what else could be causing this problem at the moment, it could be something related to mach lint or some other issue in Lando. Before patches are applied we remove all the .rej files, run hg --quiet revert --no-backup --all, hg purge, and then hg strip --no-backup -r "not public()", so if some state in mach lint is kept around that isn't removed by that process that may be affected by other patches landing, that could be the cause of the problem. To run the linters we do ./mach lint --fix --outgoing, which should select only the locally applied patches for linting.

Beyond that I'm not entirely sure what could be going on here.

FYI. I also have a similar issue to land https://lando.services.mozilla.com/D197084/, but there is no wpt lint issue in the patch https://phabricator.services.mozilla.com/D197084. Also, I've tried to rebase my patch to latest m-c, but it doesn't seem to help.

:jgraham, :ahal, any idea what might be causing these WPT lint failures in Lando, in patches where there shouldn't be WPT linter issues? In each case it shows the file with unfixable errors as tools/lint/wpt.yml, and the error itself is simply error Lint process exited with return code 1 (wpt). Is it possible to make those errors more detailed in some way? Or perhaps the linter itself is failing to start, and I could add something to Lando's mach lint call to make the output more verbose so we could debug this further?

In the meantime I think we should disable Lando's automated code formatting to unblock landings, given we're in the holiday code freeze and deploying to Lando is a no-go. :)

Flags: needinfo?(james)
Flags: needinfo?(ahal)

Disable automated code formatting in Lando while we debug issues with
the WPT linter failing on seemingly unrelated patches.

Pushed by cosheehan@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/dc2a0bb70097
disable Lando's automated code formatting r=TYLin DONTBUILD
Keywords: leave-open

The error is coming from:
https://searchfox.org/mozilla-central/source/tools/lint/wpt/wpt.py#54

And yeah, that linter doesn't log the subprocess output, so it's impossible to tell why it returned 1. I don't think we actually want to treat a non-zero return code of that subprocess as a lint issue though. If that return code indicates that there were unfixed warnings, we should ignore it. Or if it indicates a legit infrastructure failure, then we should return 1 from that function ourselves (instead of appending it to the list of issues).

Flags: needinfo?(ahal)

wpt lint doesn't really have the concept of non-fatal warnings, it's simple pass/fail. But that makes it surprising that we're seeing the return code 1 without any actual failures; the only place where we set that explicitly is https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/lint/lint.py#1037-1039 which depends on having a non-zero error count. The first and most obvious thing to do would be to change https://searchfox.org/mozilla-central/source/tools/lint/wpt/wpt.py#25 to output any line that can't be parsed as JSON, rather than just ignoring such errors.

Flags: needinfo?(james)

When I implemented that, I locally got the output:

warning: linting the entire repo takes a long time, using --outgoing and --workdir instead. If you want to lint the entire repo, run `./mach lint .`
Got non-JSON output: Traceback (most recent call last):
Got non-JSON output:   File "/home/jgraham/develop/gecko/testing/web-platform/tests/wpt", line 10, in <module>
Got non-JSON output:     wpt.main()
Got non-JSON output:   File "/home/jgraham/develop/gecko/testing/web-platform/tests/tools/wpt/wpt.py", line 210, in main
Got non-JSON output:     venv = setup_virtualenv(main_args.venv, main_args.skip_venv_setup, props)
Got non-JSON output:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Got non-JSON output:   File "/home/jgraham/develop/gecko/testing/web-platform/tests/tools/wpt/wpt.py", line 168, in setup_virtualenv
Got non-JSON output:     venv.start()
Got non-JSON output:   File "/home/jgraham/develop/gecko/testing/web-platform/tests/tools/wpt/virtualenv.py", line 141, in start
Got non-JSON output:     self.create()
Got non-JSON output:   File "/home/jgraham/develop/gecko/testing/web-platform/tests/tools/wpt/virtualenv.py", line 49, in create
Got non-JSON output:     shutil.rmtree(self.path)
Got non-JSON output:   File "/usr/lib/python3.11/shutil.py", line 732, in rmtree
Got non-JSON output:     _rmtree_safe_fd(fd, path, onerror)
Got non-JSON output:   File "/usr/lib/python3.11/shutil.py", line 660, in _rmtree_safe_fd
Got non-JSON output:     _rmtree_safe_fd(dirfd, fullname, onerror)
Got non-JSON output:   File "/usr/lib/python3.11/shutil.py", line 660, in _rmtree_safe_fd
Got non-JSON output:     _rmtree_safe_fd(dirfd, fullname, onerror)
Got non-JSON output:   File "/usr/lib/python3.11/shutil.py", line 660, in _rmtree_safe_fd
Got non-JSON output:     _rmtree_safe_fd(dirfd, fullname, onerror)
Got non-JSON output:   [Previous line repeated 2 more times]
Got non-JSON output:   File "/usr/lib/python3.11/shutil.py", line 683, in _rmtree_safe_fd
Got non-JSON output:     onerror(os.unlink, fullname, sys.exc_info())
Got non-JSON output:   File "/usr/lib/python3.11/shutil.py", line 681, in _rmtree_safe_fd
Got non-JSON output:     os.unlink(entry.name, dir_fd=topfd)
Got non-JSON output: FileNotFoundError: [Errno 2] No such file or directory: '__init__.py'
/home/jgraham/develop/gecko/tools/lint/wpt.yml
  0   error  Lint process exited with return code 1  (wpt)

Which could plausibly be what's going on here, although I'm not exactly sure why it would have changed recently. I added a patch to skip errors when cleaning up the virtualenv.

Pushed by james@hoppipolla.co.uk:
https://hg.mozilla.org/integration/autoland/rev/7766178aef3f
Log non-JSON output from wpt lint, r=ahal
https://hg.mozilla.org/integration/autoland/rev/f3216cc886f4
Ignore errors when cleaning up old wpt virtualenv, r=ahal
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/43850 for changes under testing/web-platform/tests

Thanks, :jgraham. I'm going to try re-enabling Lando autoformatting and we will see if the patches resolved this problem.

Pushed by cosheehan@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a643828dc873
re-enable Lando autoformatting after WPT linter fixes r=zeid DONTBUILD
Upstream PR merged by moz-wptsync-bot

This just happened again to me in https://lando.services.mozilla.com/D197778/

I'm not sure where to look for the actual output of this task?

Flags: needinfo?(sheehan)

In Lando we're just running mach lint --fix --outgoing after the patch stack is applied, so this is all the output that's available. I could add --verbose if that would make this easier to debug.

Flags: needinfo?(sheehan) → needinfo?(james)

We should try with --verbose I guess, but I think it's only going to affect the frontend in the sense that that flag doesn't seem to actually be passed down to wpt lint (and wouldn't change the output if it was).

If it's not super clear what's going on with --verbose I think we'll need to re-disable the autolint so that we can actually land stuff.

Flags: needinfo?(james) → needinfo?(sheehan)

Okay, I've posted a patch to add --verbose to mach lint. I'll try and deploy it first thing next week, then we can re-queue your patch and see if the added output helps us resolve this.

Flags: needinfo?(sheehan)

FWIW, this also happened to me with https://phabricator.services.mozilla.com/D196957

Blocks: 1765615

:jgraham, I added --verbose to the mach lint call in Lando. Can you try and land your patch and let me know if this helps debug the issue any further?

Flags: needinfo?(james)
Your request to land D197778 failed.

See https://lando.services.mozilla.com/D197778/ for details.

Reason:
Lando failed to format your patch for conformity with our formatting policy. Please see the details below.

error: problem with lint setup, skipping android-api-lint, android-checkstyle, android-format, android-javadoc, android-lint, android-test
/repos/mozilla-central/tools/lint/wpt.yml
  0   error  Lint process exited with return code 1  (wpt)
  0   error  Lint process exited with return code 1  (wpt)
  0   error  Lint process exited with return code 1  (wpt)

✖ 3 problems (3 errors, 1 warning, 0 fixed)

So that doesn't seem to have helped, unfortunately.

Flags: needinfo?(james) → needinfo?(sheehan)

This was preventing me from landing a patch as of yesterday, too; I think maybe this is blocking landing of all patches that touch WPT, maybe?

Could we re-disable the linter as a stopgap?

I have queued a patch to disable the autoformatting. Thanks :TYLin for the quick review!

I suppose we have a few paths forward:

  • jgraham or someone with more knowledge of the WPT linter fixes the underlying issue.
  • Add code to Lando to avoid running the WPT linter. We could run mach lint --list to get all linters, then filter out wpt and add the remaining linters to the mach lint call (ie mach lint -l black,clang-format... etc).
  • Resolve bug 1807711 so we can use mach format in Lando (bug 1807712).
Flags: needinfo?(sheehan)
Pushed by cosheehan@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e47d9424990f
disable Lando autoformatting r=TYLin DONTBUILD

I'd like to fix the underlying issue, but unless we're able to reproduce it in a way that allows collecting more information about what's failing it's hard to see how we're going to do that. Given that the lint seems to work both locally and on taskcluster, I think there's likely something specific to the lando setup that's causing the problem. Do you have a lando test environment, or similar, which could be used to reproduce the issue?

No longer blocks: 1765615

What's the plan for re-enabling this? Could we enable just clang-format?

My patches are starting to get unrelated clang-format fixes mixed in which is kind of annoying.

Flags: needinfo?(sheehan)

(In reply to Jan de Mooij [:jandem] from comment #37)

What's the plan for re-enabling this? Could we enable just clang-format?

My patches are starting to get unrelated clang-format fixes mixed in which is kind of annoying.

I started trying to fix this by implementing bug 1807712, but then I encountered bug 1881295.

You're right though, we should re-enable this. I will update Lando to explicitly run a specific subset of linters, avoiding the issue with the WPT lint.

Flags: needinfo?(sheehan)

(In reply to Connor Sheehan [:sheehan] from comment #38)

I started trying to fix this by implementing bug 1807712, but then I encountered bug 1881295.

You're right though, we should re-enable this. I will update Lando to explicitly run a specific subset of linters, avoiding the issue with the WPT lint.

Awesome, thank you!

Pushed by cosheehan@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1da547604167
re-enable autoformatting after disabling WPT linter r=zeid DONTBUILD

I've updated Lando to run a specific subset of formatting-related linters at landing time, and re-enabled autoformatting by landing D203191 just a few minutes ago.

This should resolve the issues with the WPT linter on the Lando side, so I'm going to close this out.

Status: NEW → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: