Open Bug 1521191 Opened 5 years ago Updated 2 years ago

Failing to produce a leak log does not cause a test failure

Categories

(Testing :: Mochitest, defect, P2)

Version 3
defect

Tracking

(firefox-esr60 unaffected, firefox64 unaffected, firefox65 wontfix, firefox66 wontfix, firefox67 wontfix, firefox68 wontfix, firefox69 fix-optional, firefox70 fix-optional)

Tracking Status
firefox-esr60 --- unaffected
firefox64 --- unaffected
firefox65 --- wontfix
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- wontfix
firefox69 --- fix-optional
firefox70 --- fix-optional

People

(Reporter: mccr8, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: regression, Whiteboard: [MemShrink:P2])

Attachments

(4 files, 1 obsolete file)

I did a try push with a patch that makes it so that it never prints out a leak log, and this does not cause an orange on TreeHerder:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=222757782&repo=try&lineNumber=5844

Note that you do get a FAIL if you run the tests locally through mach.

Bug 1518115 is about how we print out the "missing output line" message even if the failure is expected, while this is how the test job does not turn orange if we actually hit that failure.

This is probably not hiding any leaks because the most common cause of this error is a crash, but we should make sure of that.

I'm not exactly sure when this code is run, but it only handles the leaking failure case from mozleak_total, and not the missing log case, so maybe that's the reason for not failing? https://hg.mozilla.org/mozilla-central/rev/d40ebdfc91a0

Here's the patch I used to disable leak logging.

James, can you take a look at this? Thanks.

Flags: needinfo?(james)

I think I have a handle on how to fix this. I think the logic for mozleak_total in statushandler.py needs to be extended to handle this case.

Flags: needinfo?(james)
Assignee: nobody → continuation

In order to test the test harness's handling of a process failing to
produce a leak log, add a special function that disables the bloat log
output.

This is needed to test the test harness's handling of a negative leak
being reported by the XPCOM leak checker.

Depends on D17535

Attachment #9037691 - Attachment is obsolete: true
Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d6ce4b187195
part 1 - Add method to disable dump statistics. r=froydnj
https://hg.mozilla.org/integration/autoland/rev/38d3da4804d7
part 2 - Ensure missing leak logs cause mozharness to fail. r=ahal
https://hg.mozilla.org/integration/autoland/rev/3782d011cc9f
part 3 - Add a function to create a negative leak. r=froydnj
https://hg.mozilla.org/integration/autoland/rev/138e162d2778
part 4 - Add a test for producing an error if there's a negative leak. r=ahal

There's a leak in the webrtc/ directory, which is hitting the "NSS is still alive" fatal assertion, so we crash and don't produce a leak log. That assertion should probably be changed because "crash on leak" is not helpful. Anyways, I'm not sure if this leak is new or I just failed to do a try push with these patches.

Flags: needinfo?(continuation)

There's a number of issues here:

  • I think this WebRTC leak is a regression, so that should get tracked down.
  • The "NSS_Shutdown failed" crash should be turned into a leak, not a crash.
  • There needs to be a way to specify in a dir.ini file that there is no leak for a process type, and that needs to be updated on import. I don't see anything like that right now, but maybe I'm just missing it. I meant to look at this before I landed this, but I forgot about it.

I found my try push for these patches:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=42f5f6be042d6a915f37bc3943cf3d11ee9411ba

It does contain the NSS_Shutdown crash, but it is still somehow green. I think this implies that:

  1. The WebRTC leak isn't a regression.
  2. My patch somehow doesn't actually detect a missing leak log for the WebRTC leak.
  3. There's some other crash that has started happening?

The logs don't seem to contain the pids of things that crash which makes it hard to figure out what is going on.

In fact, with my try push, there are a bunch of other crashes in WebRTC that show up in the "failure summary" but didn't cause the test to go orange, which is peculiar:
PROCESS-CRASH | /webrtc/RTCPeerConnection-removeTrack.https.html | application crashed [@ libxul.so + 0xd1a42f]

Oh, I see, bug 1519862 recently landed and made the leak checker ignore missing logs for content processes. The orange in my attempted landed was from a main process crash. That at least explains that.

Blocks: 1517309
Depends on: 1523445
Whiteboard: [MemShrink]
Whiteboard: [MemShrink] → [MemShrink:P2]
Priority: -- → P2

I'm not going to be able to get to this any time soon.

Assignee: continuation → nobody

Marking fix-optional across all versions to remove this from weekly regression triage.

Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: