Closed Bug 1507912 Opened 6 years ago Closed 6 years ago

Intermittent awsy\test_base_memory_usage.py TestMemoryUsage.test_open_tabs | AssertionError: heap-unclassified was negative

Categories

(Core :: Graphics: WebRender, defect, P5)

Unspecified
Windows
defect

RESOLVED WORKSFORME

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure)

Filed by: kgupta [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=212275682&repo=try

https://queue.taskcluster.net/v1/task/d7n8AZk5STakGHF-fgOzhg/runs/0/artifacts/public/logs/live_backing.log

I see this failure in try pushes occasionally. From the python stack, it looks like we're producing a negative heap-unclassified number?
OS: Unspecified → Windows
Summary: Intermittent awsy\test_base_memory_usage.py TestMemoryUsage.test_open_tabs | AssertionError → Intermittent awsy\test_base_memory_usage.py TestMemoryUsage.test_open_tabs | AssertionError (Windows 10 x64 QuantumRender)
I just landed bug 1507914 which should allow us to distinguish the "negative heap unclassified" failures from everything else.
Summary: Intermittent awsy\test_base_memory_usage.py TestMemoryUsage.test_open_tabs | AssertionError (Windows 10 x64 QuantumRender) → Intermittent awsy\test_base_memory_usage.py TestMemoryUsage.test_open_tabs | AssertionError: heap-unclassified was negative
:bholley, any thoughts on the "negative heap unclassified" failures? These only show up on the awsy-base-dmd jobs, which only run on try. But they're probably indicative of some miscounting going on.

https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2018-11-27&endday=2018-11-28&tree=all&bug=1507912 has some links to logs
Flags: needinfo?(bobbyholley)
This is presumably because something is being double-counted. It happened on non-qr as well, so it's unlikely to be related to WR. It probably happens more on WR because WR may decrease the total heap-unclassified in one of the processes (parent or content), leaving less room for double-counting.

That said, it would still be good to fix. This is failing on the DMD runs, and the reports produced by those runs are exactly what we need to figure out what's being counted twice. Unfortunately, the assertion prevents those reports from being generated, or at least from being uploaded as artifacts.

Eric/Nick, is this something you could fix?
Flags: needinfo?(n.nethercote)
Flags: needinfo?(erahm)
Flags: needinfo?(bobbyholley)
(In reply to Bobby Holley (:bholley) from comment #5)
> This is presumably because something is being double-counted. It happened on
> non-qr as well, so it's unlikely to be related to WR. It probably happens
> more on WR because WR may decrease the total heap-unclassified in one of the
> processes (parent or content), leaving less room for double-counting.
> 
> That said, it would still be good to fix. This is failing on the DMD runs,
> and the reports produced by those runs are exactly what we need to figure
> out what's being counted twice. Unfortunately, the assertion prevents those
> reports from being generated, or at least from being uploaded as artifacts.
> 
> Eric/Nick, is this something you could fix?

Sure, can you file a bug in Testing :: AWSY? The issue is that the assertion causes us to skip moving the files to the artifact upload directory [1]. I believe we could catch the AssertionError and use logger.error instead, so that the test still fails but at least finishes.

[1] https://searchfox.org/mozilla-central/rev/f2028b4c38bff2a50ed6aa1763f6dc5ee62b0cc4/testing/awsy/awsy/awsy_test_case.py#84-87,97
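
As a rough illustration of that idea (not the actual AWSY code), the sketch below catches the assertion failure, logs it, and still moves the report files before returning a failing result. The function name `run_and_collect`, its parameters, and the `*.json.gz` pattern are all hypothetical and only stand in for whatever awsy_test_case.py really does.

```python
import logging
import shutil
from pathlib import Path

logger = logging.getLogger("awsy-sketch")  # hypothetical logger name


def run_and_collect(check, report_dir, upload_dir):
    """Run `check`, always move the reports, return True only if `check` passed.

    `check` is any callable that raises AssertionError on a bad measurement.
    """
    passed = True
    try:
        check()
    except AssertionError as exc:
        # Record the failure instead of aborting, so the cleanup below still runs.
        logger.error("memory check failed: %s", exc)
        passed = False

    # Move every report into the upload directory, even after a failed check.
    upload = Path(upload_dir)
    upload.mkdir(parents=True, exist_ok=True)
    for report in sorted(Path(report_dir).glob("*.json.gz")):
        shutil.move(str(report), str(upload / report.name))

    return passed
```

The caller would still have to turn a `False` result into a test failure; the only point here is that the reports get moved first.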
Flags: needinfo?(erahm)
(In reply to Bobby Holley (:bholley) from comment #7)
> Filed bug 1510829.

This has been fixed, so we should now get the DMD reports we need. Unfortunately, I can't seem to find any pushes associated with this bug. When we have one, we can probably diagnose this easily.
Flags: needinfo?(n.nethercote)

I downloaded the DMD logs from the try run failure in comment 13 and ran them through dmd.py. Only one of the files (dmd-TabsOpenForceGC-0-12128.json.gz) had twice-reported entries, and those amounted to a total of 448 bytes, while the failure message was complaining about heap-unclassified being -3152. So either the DMD logs are not catching the relevant twice-reported entries, or we're actually reporting a negative number somewhere. I'm not sure which it is.
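
To spell out the arithmetic: the sketch below assumes heap-unclassified is computed as heap-allocated minus the sum of reported heap memory, so each twice-reported byte lowers it by exactly one byte, and that both figures are in the same unit. Under that assumption, undoing the 448 twice-reported bytes still leaves the value negative. The function name is made up for this example.

```python
def unclassified_without_double_counting(observed_unclassified, twice_reported):
    """Add back memory that was subtracted a second time.

    Assumes every twice-reported byte reduced heap-unclassified by one byte.
    """
    return observed_unclassified + twice_reported


observed = -3152      # value from the failure message
twice_reported = 448  # total twice-reported size found by dmd.py

corrected = unclassified_without_double_counting(observed, twice_reported)
print(corrected)  # -2704
```

So the twice-reported entries in the logs account for only part of the gap, which is consistent with either of the two explanations above.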

Also, I guess the failure in comment 13 is not from a -qr build. There haven't been any failures on -qr builds for a while (the previous two comments were on try pushes in December). So from a QR point of view this is effectively WFM. I'm going to close it. If it happens on non-try builds, somebody should file a new bug (or reopen this one if the builds are QR builds).

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME