Failing to produce a leak log does not cause a test failure

NEW
Unassigned

Status

defect
P2
normal
4 months ago
16 days ago

People

(Reporter: mccr8, Unassigned)

Tracking

(Depends on 1 bug, Blocks 1 bug, {regression})

Version 3
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox64 unaffected, firefox65 wontfix, firefox66 wontfix, firefox67 wontfix, firefox68 affected)

Details

(Whiteboard: [MemShrink:P2])

Attachments

(4 attachments, 1 obsolete attachment)

Reporter

Description

4 months ago

I did a try push with a patch that makes it so that it never prints out a leak log, and this does not cause an orange on TreeHerder:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=222757782&repo=try&lineNumber=5844

Note that you do get a FAIL if you run the tests locally through mach.

Bug 1518115 is about how we print out the "missing output line" message even if the failure is expected, while this is how the test job does not turn orange if we actually hit that failure.

This is probably not hiding any leaks because the most common cause of this error is a crash, but we should make sure of that.

Reporter

Comment 1

4 months ago

I'm not exactly sure when this code is run, but it only handles the leaking failure case from mozleak_total, and not the missing log case, so maybe that's the reason for not failing? https://hg.mozilla.org/mozilla-central/rev/d40ebdfc91a0

Reporter

Comment 2

4 months ago
Here's the patch I used to disable leak logging.
Reporter

Comment 3

4 months ago

James, can you take a look at this? Thanks.

Flags: needinfo?(james)
Reporter

Comment 4

4 months ago

I think I have a handle on how to fix this. I think the logic for mozleak_total in statushandler.py needs to be extended to handle this case.

Flags: needinfo?(james)
Reporter

Updated

4 months ago
Assignee: nobody → continuation
Reporter

Comment 5

4 months ago

In order to test the test harness's handling of a process failing to
produce a leak log, add a special function that disables the bloat log
output.

Reporter

Comment 7

4 months ago

This is needed to test the test harness's handling of a negative leak
being reported by the XPCOM leak checker.

Depends on D17535

Reporter

Updated

4 months ago
Attachment #9037691 - Attachment is obsolete: true

Comment 9

4 months ago
Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d6ce4b187195
part 1 - Add method to disable dump statistics. r=froydnj
https://hg.mozilla.org/integration/autoland/rev/38d3da4804d7
part 2 - Ensure missing leak logs cause mozharness to fail. r=ahal
https://hg.mozilla.org/integration/autoland/rev/3782d011cc9f
part 3 - Add a function to create a negative leak. r=froydnj
https://hg.mozilla.org/integration/autoland/rev/138e162d2778
part 4 - Add a test for producing an error if there's a negative leak. r=ahal
Reporter

Comment 11

4 months ago

There's a leak in the webrtc/ directory, which is hitting the "NSS is still alive" fatal assertion, so we crash and don't produce a leak log. That assertion should probably be changed because "crash on leak" is not helpful. Anyways, I'm not sure if this leak is new or I just failed to do a try push with these patches.

Reporter

Updated

4 months ago
Flags: needinfo?(continuation)
Reporter

Comment 12

4 months ago

There's a number of issues here:

  • I think this WebRTC leak is a regression, so that should get tracked down.
  • The "NSS_Shutdown failed" crash should be turned into a leak, not a crash.
  • There needs to be a way to specify in a dir.ini file that there is no leak for a process type, and that needs to be updated on import. I don't see anything like that right now, but maybe I'm just missing it. I meant to look at this before I landed this, but I forgot about it.
Reporter

Comment 13

4 months ago

I found my try push for these patches:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=42f5f6be042d6a915f37bc3943cf3d11ee9411ba

It does contain the NSS_Shutdown crash, but it is still somehow green. I think this implies that:

  1. The WebRTC leak isn't a regression.
  2. My patch somehow doesn't actually detect a missing leak log for the WebRTC leak.
  3. There's some other crash that has started happening?

The logs don't seem to contain the pids of things that crash which makes it hard to figure out what is going on.

In fact, with my try push, there are a bunch of other crashes in WebRTC that show up in the "failure summary" but didn't cause the test to go orange, which is peculiar:
PROCESS-CRASH | /webrtc/RTCPeerConnection-removeTrack.https.html | application crashed [@ libxul.so + 0xd1a42f]

Reporter

Comment 14

4 months ago

Oh, I see, bug 1519862 recently landed and made the leak checker ignore missing logs for content processes. The orange in my attempted landed was from a main process crash. That at least explains that.

Reporter

Updated

4 months ago
Blocks: 1517309
Reporter

Updated

4 months ago
Depends on: 1523445
Reporter

Updated

4 months ago
Duplicate of this bug: 1520654
Whiteboard: [MemShrink]
Whiteboard: [MemShrink] → [MemShrink:P2]
Priority: -- → P2
Reporter

Comment 16

16 days ago

I'm not going to be able to get to this any time soon.

Assignee: continuation → nobody
You need to log in before you can comment on or make changes to this bug.