Closed Bug 1354232 Opened 3 years ago Closed Last year

Enable LeakSanitizer for web-platform-tests

Categories

(Testing :: web-platform-tests, enhancement)

Version 3
enhancement
Not set

Tracking

(firefox63 fixed)

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: mccr8, Assigned: jgraham)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink:P2][wptsync upstream])

Attachments

(19 files)

59 bytes, text/x-review-board-request
ahal
: review+
Details
59 bytes, text/x-review-board-request
ahal
: review+
mccr8
: review+
Details
59 bytes, text/x-review-board-request
ahal
: review+
Details
59 bytes, text/x-review-board-request
ahal
: review+
mccr8
: review+
Details
59 bytes, text/x-review-board-request
maja_zf
: review+
Details
59 bytes, text/x-review-board-request
maja_zf
: review+
ato
: review+
Details
59 bytes, text/x-review-board-request
ato
: review+
Details
59 bytes, text/x-review-board-request
ato
: review+
Details
59 bytes, text/x-review-board-request
maja_zf
: review+
Details
59 bytes, text/x-review-board-request
ato
: review+
Details
59 bytes, text/x-review-board-request
maja_zf
: review+
Details
59 bytes, text/x-review-board-request
maja_zf
: review+
Details
59 bytes, text/x-review-board-request
maja_zf
: review+
Details
59 bytes, text/x-review-board-request
maja_zf
: review+
Details
59 bytes, text/x-review-board-request
maja_zf
: review+
Details
59 bytes, text/x-review-board-request
maja_zf
: review+
Details
59 bytes, text/x-review-board-request
maja_zf
: review+
Details
59 bytes, text/x-review-board-request
maja_zf
: review+
Details
59 bytes, text/x-review-board-request
maja_zf
: review+
Details
Once XPCOM leak checking is enabled for web-platform-tests, and AddressSanitizer builds are running them, too, we should make sure the LeakSanitizer is enabled. It catches some kind of low level buffer leaks that the XPCOM leak checker doesn't find.
Depends on: 1354230
Whiteboard: [MemShrink]
Whiteboard: [MemShrink] → [MemShrink:P2]
I just pushed a review for this; I'm not absoutely sure it all works, as I just discovered I forgot to make treeherder/mozharness notice the lsan_* log actions. But particularly given that Maja's going to be away and the code at least nearly works, I think it's ready for review.

The background here is that when we import new web-platform-tests we need a way to ensure that we can make the tree green even though there could be new problems (in this case LSAN) failures. So the approach is to add LSAN to structured logging (via lsan_leak and lsan_summary) actions, and to redirect all the things that can cause failures through there rather than logging strings directly. Then I added the possibility to provide an additional list of stackframes that should cause the leak to be marked as "allowed" (or expected) and added some code in web-platform-tests to update the list of allowed leaks based on the results of a try run. I don't know if there's some better hueristic for picking the best frame from a stack to use as the ignore frame other than just the top one that isn't excluded by the existing exclusions of low-level frames in the LSAN handler.

mccr8 - If this works, how important is the XPCOM leak checking? That works, but probably to enable it I need to repeat the same process of allowing a dynamically updated list of allowed leaks (which is not that hard to do, but of course it's easier to do nothing if LSAN is a pure superset).
Thanks for working on this!

I was going to say that white listing things at the harness level wasn't ideal, but it looks like there are only a handful of failures, and the way that you have them marked looks more finegrained than is available in LSan itself.

Would you like me to file bugs on these leaks? I'm surprised this DNS one shows up in a number of places in WPT, but it isn't a problem elsewhere. It looks like there's only that leak, and some kind of bad worker leak in the file reader directory.

What happens if a leak goes away? Does the list get automatically updated? Also, is there any notification if a new leak shows up?

(In reply to James Graham [:jgraham] from comment #20)
> mccr8 - If this works, how important is the XPCOM leak checking? That works,
> but probably to enable it I need to repeat the same process of allowing a
> dynamically updated list of allowed leaks (which is not that hard to do, but
> of course it's easier to do nothing if LSAN is a pure superset).

We still need XPCOM leak checking because LSan is only run on Linux64. Well, maybe it is run on Windows now, too. LSan will also not report leaks via things held in static variables, while the XPCOM leak checker will.
Also, I'd be interested in a link to a try run, so I could look at what the actual LSan output is for these leaks. Thanks.
> What happens if a leak goes away? Does the list get automatically updated? Also, is there any notification if a new leak shows up?

If a leak goes away we don't update the list because of intermittents. That's not ideal, but I don't know what a good solution looks like.

I'll add notification for this in the wpt sync bugs; likely we will extend that to file bugs for specific failures in the second half of this year.

> Also, I'd be interested in a link to a try run, so I could look at what the actual LSan output is for these leaks.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=182fdd5c694a3ad68a9865b2e5a3ab7123761ca6

Is the one I just pushed after fixing the make-the-tree-turn-orange stuff (so hopefully it's orange)

> We still need XPCOM leak checking because LSan is only run on Linux64. Well, maybe it is run on Windows now, too. LSan will also not report leaks via things held in static variables, while the XPCOM leak checker will.

OK. I tried turning that on and didn't see anything, so either I was holding it wrong or I'm going to need a hint on how to add a leak to a custom build so I can write the metadata update parts for that case.
Comment on attachment 8984482 [details]
Bug 1354232 - Support skipping output lines in handlers,

https://reviewboard.mozilla.org/r/250312/#review256606
Attachment #8984482 - Flags: review?(ato) → review+
Comment on attachment 8984483 [details]
Bug 1354232 - Add group_metadata to test group metadata and pass it through to the browser,

https://reviewboard.mozilla.org/r/250314/#review256608
Attachment #8984483 - Flags: review?(ato) → review+
Comment on attachment 8984485 [details]
Bug 1354232 - Always try to process the leak log whenever we stop firefox,

https://reviewboard.mozilla.org/r/250318/#review256614

::: commit-message-00d0e:5
(Diff revision 1)
> +Bug 1354232 - Always try to process the leak log whenever we stop firefox, r=ato
> +
> +The `stop` method is always called to shutdown firefox, but the
> +cleanup method is only called at the end of a test run. Therefore we
> +need all the leak processing stuff ot happen in stop().

to
Attachment #8984485 - Flags: review?(ato) → review+
(In reply to James Graham [:jgraham] from comment #23)
> If a leak goes away we don't update the list because of intermittents.
> That's not ideal, but I don't know what a good solution looks like.

That's fine. People will just have to manually edit the lists, which is the same as any other harness.

> https://treeherder.mozilla.org/#/
Thanks.

> OK. I tried turning that on and didn't see anything, so either I was holding
> it wrong or I'm going to need a hint on how to add a leak to a custom build
> so I can write the metadata update parts for that case.

You can delete this line from ipc/chromium/src/chrome/common/ipc_message.cc
  MOZ_COUNT_DTOR(IPC::Message);
and it should register every IPC message as leaked.
Oh, I guess there are more leaks in the final patch besides the worker one and the DNS one.

I also see another failure or two in the logs I'm looking at that don't seem to be captured by the existing white lists. Maybe they are intermittent.
Comment on attachment 8984477 [details]
Bug 1354232 - Add support for LSAN to mozlog,

https://reviewboard.mozilla.org/r/250302/#review256714

I only looked at the lsan.py changes, but those look reasonable to me.
Attachment #8984477 - Flags: review?(continuation) → review+
Comment on attachment 8984479 [details]
Bug 1354232 - Enable LSAN Leak detection in wpt,

https://reviewboard.mozilla.org/r/250306/#review256728
Attachment #8984479 - Flags: review?(continuation) → review+
I just noticed this in one of the logs you linked to:
INFO | LeakSanitizer | SUMMARY: (expected) AddressSanitizer: %d byte(s) leaked in %d allocation(s).
The %d's should be numbers.
"mozilla::BackgroundHangThread::ReportHang"

Leaks like this show up in other LSan tests, so I actually have a patch up for review in bug 1467549 to disable BHR in ASan builds to "fix" this leak.
Comment on attachment 8984480 [details]
Bug 1354232 - Allow wpt manifest files to specify LSAN errors to ignore,

https://reviewboard.mozilla.org/r/250308/#review257132
Attachment #8984480 - Flags: review?(mjzffr) → review+
Comment on attachment 8984484 [details]
Bug 1354232 - Add support for updating LSAN data in wpt-update

https://reviewboard.mozilla.org/r/250316/#review257152

::: commit-message-9a8af:14
(Diff revision 1)
> +
> +The rules themselves are generated by taking the topmost frame of any
> +stack that's reported as unexpectedly leaking, and adding that to the
> +list of permitted frames in the lsan-allowed property. We never remove
> +entries from this list since intermittents may be present which won't
> +appear on a specific run. Instead we rely on humand fixing the issues

humans
Attachment #8984484 - Flags: review?(mjzffr) → review+
Comment on attachment 8984486 [details]
Bug 1354232 - Support creating ConditionalValue objects containing a list,

https://reviewboard.mozilla.org/r/250320/#review257156
Attachment #8984486 - Flags: review?(mjzffr) → review+
Comment on attachment 8984487 [details]
Bug 1354232 - Support a __dir__.ini file in the metadata root,

https://reviewboard.mozilla.org/r/250322/#review257158
Attachment #8984487 - Flags: review?(mjzffr) → review+
Comment on attachment 8984492 [details]
Bug 1354232 - Run web-platform-tests on linux64-asan,

https://reviewboard.mozilla.org/r/250332/#review257160
Attachment #8984492 - Flags: review?(mjzffr) → review+
Comment on attachment 8984494 [details]
Bug 1354232 - Update metadata for lsan failures,

https://reviewboard.mozilla.org/r/250336/#review257162
Attachment #8984494 - Flags: review?(mjzffr) → review+
Comment on attachment 8984493 [details]
Bug 1354232 - Chunk by dir more for wpt on asan,

https://reviewboard.mozilla.org/r/250334/#review257164
Attachment #8984493 - Flags: review?(mjzffr) → review+
Comment on attachment 8984491 [details]
Bug 1354232 - Don't store the result of update_expected,

https://reviewboard.mozilla.org/r/250330/#review257166
Attachment #8984491 - Flags: review?(mjzffr) → review+
Comment on attachment 8984488 [details]
Bug 1354232 - Fix updating assert count when there's an exising value,

https://reviewboard.mozilla.org/r/250324/#review257168

::: testing/web-platform/tests/tools/wptrunner/wptrunner/manifestupdate.py:475
(Diff revision 1)
>          if old_value is not None:
>              old_value = self.value_type(old_value)
> -        if old_value and new_value < old_value:
> +        if old_value is not None and new_value < old_value:
>              return 0
>          if old_value is None:
>              # If we are getting some asserts for the first time, set the minimum to 0

This comment is incorrect now?
Attachment #8984488 - Flags: review?(mjzffr) → review+
Comment on attachment 8984489 [details]
Bug 1354232 - Refactor data storage in metadata.py,

https://reviewboard.mozilla.org/r/250326/#review257240

::: testing/web-platform/tests/tools/wptrunner/wptrunner/metadata.py:65
(Diff revision 1)
> -            for tree in expected_map.itervalues():
> -                for test in tree.iterchildren():
> +            for tree in expected:
> +                if not tree.modified:
> +                    continue
> +                for test in expected.iterchildren():
>                      for subtest in test.iterchildren():
>                          if subtest.new_disabled:

I'm not sure about the relationship between `tree`, `test` and `expected`. Seems like the changes result in visiting each test many times, once per modified tree, whereas previously each test was visited only once.

::: testing/web-platform/tests/tools/wptrunner/wptrunner/metadata.py:112
(Diff revision 1)
>      for item in changes:
>          rv[item[1]] = status_keys[item[0]]
>      return rv
>  
>  
>  def unexpected_changes(manifests, change_data, files_changed):

Looks like you can remove this function now.

::: testing/web-platform/tests/tools/wptrunner/wptrunner/metadata.py:331
(Diff revision 1)
>  def create_test_tree(metadata_path, test_manifest, property_order=None,
>                       boolean_properties=None):
> -    expected_map = {}
> +    """Create a map of expectation manifests for all tests in test_manifest,
> +    reading existing manifests under manifest_path
> +
> +    :returns: A map of test_id to (manifest, test, expectation_data)

Misleading comment? It seems to actually be test_id to (manifest, expectation_data)... and you get retrieve the test from expectation_data.
Attachment #8984489 - Flags: review?(mjzffr) → review+
Comment on attachment 8984490 [details]
Bug 1354232 - Fix, enable, and add to, the manifest update tests,

https://reviewboard.mozilla.org/r/250328/#review257296
Attachment #8984490 - Flags: review?(mjzffr) → review+
Comment on attachment 8984476 [details]
Bug 1354232 - Copy LSANLeaks to mozleak,

https://reviewboard.mozilla.org/r/250300/#review257300
Attachment #8984476 - Flags: review?(ahal) → review+
Comment on attachment 8984477 [details]
Bug 1354232 - Add support for LSAN to mozlog,

https://reviewboard.mozilla.org/r/250302/#review257302

::: testing/mozbase/mozlog/mozlog/formatters/machformatter.py:228
(Diff revision 1)
>          return rv
>  
> +    def lsan_leak(self, data):
> +        allowed = data.get("allowed_match")
> +        if allowed:
> +            prefix = self.term.yellow("FAIL")

Should this be more obvious that the failure is allowed? Not really sure what `allowed_match` signifies.

::: testing/mozbase/mozlog/mozlog/formatters/tbplformatter.py:282
(Diff revision 1)
>  
> +    def lsan_leak(self, data):
> +        frames = data.get("frames")
> +        allowed_match = data.get("allowed_match")
> +        frame_list = ", ".join(frames)
> +        prefix = "TEST-UNEXPECTED-FAIL" if not allowed_match else "TEST-FAIL"

Ditto. Should this be TEST-KNOWN-FAIL or similar?
Attachment #8984477 - Flags: review?(ahal) → review+
Comment on attachment 8984478 [details]
Bug 1354232 - Ensure lsan failures cause treeherder jobs to fail,

https://reviewboard.mozilla.org/r/250304/#review257306
Attachment #8984478 - Flags: review?(ahal) → review+
Comment on attachment 8984479 [details]
Bug 1354232 - Enable LSAN Leak detection in wpt,

https://reviewboard.mozilla.org/r/250306/#review257308
Attachment #8984479 - Flags: review?(ahal) → review+
(In reply to Andrew Halberstadt [:ahal] from comment #47)
> Comment on attachment 8984477 [details]
> Bug 1354232 - Add support for LSAN to mozlog,
> 
> https://reviewboard.mozilla.org/r/250302/#review257302
> 
> ::: testing/mozbase/mozlog/mozlog/formatters/machformatter.py:228
> (Diff revision 1)
> >          return rv
> >  
> > +    def lsan_leak(self, data):
> > +        allowed = data.get("allowed_match")
> > +        if allowed:
> > +            prefix = self.term.yellow("FAIL")
> 
> Should this be more obvious that the failure is allowed? Not really sure
> what `allowed_match` signifies.

I wasn't sure what wording to use. The content here is the topmost frame in the stack that matched a rule indicating that the leak is expected. So I'm very happy to hear alternative suggestions, but this was the best I had.

> 
> ::: testing/mozbase/mozlog/mozlog/formatters/tbplformatter.py:282
> (Diff revision 1)
> >  
> > +    def lsan_leak(self, data):
> > +        frames = data.get("frames")
> > +        allowed_match = data.get("allowed_match")
> > +        frame_list = ", ".join(frames)
> > +        prefix = "TEST-UNEXPECTED-FAIL" if not allowed_match else "TEST-FAIL"
> 
> Ditto. Should this be TEST-KNOWN-FAIL or similar?

We don't use that in other places (except in assertion checks, for some reason). So I guess it's not harmful, but it isn't what I would consider the precedent set by the existing behaviour.
Pushed by james@hoppipolla.co.uk:
https://hg.mozilla.org/integration/mozilla-inbound/rev/05441f0a29b3
Copy LSANLeaks to mozleak, r=ahal
https://hg.mozilla.org/integration/mozilla-inbound/rev/6eda80370f59
Add support for LSAN to mozlog, r=ahal, mccr8
https://hg.mozilla.org/integration/mozilla-inbound/rev/6d28db898070
Ensure lsan failures cause treeherder jobs to fail, r=ahal
https://hg.mozilla.org/integration/mozilla-inbound/rev/e0d106c2112d
Enable LSAN Leak detection in wpt, r=ahal,mccr8
https://hg.mozilla.org/integration/mozilla-inbound/rev/b8da0ef2695d
Allow wpt manifest files to specify LSAN errors to ignore, r=maja_zf
https://hg.mozilla.org/integration/mozilla-inbound/rev/5b9ba77b7316
Log run-by-dir setting, r=maja_zf
https://hg.mozilla.org/integration/mozilla-inbound/rev/7c976baa7194
Support skipping output lines in handlers, r=ato
https://hg.mozilla.org/integration/mozilla-inbound/rev/07049ac67119
Add group_metadata to test group metadata and pass it through to the browser, r=ato
https://hg.mozilla.org/integration/mozilla-inbound/rev/9e8a66385f66
Add support for updating LSAN data in wpt-update r=maja_zf
https://hg.mozilla.org/integration/mozilla-inbound/rev/8cc292a22975
Always try to process the leak log whenever we stop firefox, r=ato
https://hg.mozilla.org/integration/mozilla-inbound/rev/c1ebcf01626a
Support creating ConditionalValue objects containing a list, r=maja_zf
https://hg.mozilla.org/integration/mozilla-inbound/rev/0e161374b583
Support a __dir__.ini file in the metadata root, r=maja_zf
https://hg.mozilla.org/integration/mozilla-inbound/rev/0f2a7154454c
Fix updating assert count when there's an exising value, r=maja_zf
https://hg.mozilla.org/integration/mozilla-inbound/rev/6bf4a20193b8
Refactor data storage in metadata.py, r=maja_zf
https://hg.mozilla.org/integration/mozilla-inbound/rev/5028820efd9e
Fix, enable, and add to, the manifest update tests, r=maja_zf
https://hg.mozilla.org/integration/mozilla-inbound/rev/004b079b76d0
Don't store the result of update_expected, r=maja_zf
https://hg.mozilla.org/integration/mozilla-inbound/rev/9d5ed234247c
Run web-platform-tests on linux64-asan, r=maja_zf
https://hg.mozilla.org/integration/mozilla-inbound/rev/a7dec081ead2
Chunk by dir more for wpt on asan, r=maja_zf
https://hg.mozilla.org/integration/mozilla-inbound/rev/7a5720fa6da9
Update metadata for lsan failures, r=maja_zf
Assignee: nobody → james
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/12073 for changes under testing/web-platform/tests
Whiteboard: [MemShrink:P2] → [MemShrink:P2][wptsync upstream]
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b9193e72be19
Annotate html/infrastructure/urls/resolving-urls/query-encoding/navigation.sub.html as passing on Linux asan. r=RyanVM on IRC
Pushed by james@hoppipolla.co.uk:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f271db900a41
[wpt PR 12073] - [Gecko Bug 1354232] Enable LSAN Leak detection in wpt, a=testonly
You need to log in before you can comment on or make changes to this bug.