Fix and re-enable the GPUAdapterReporter

RESOLVED FIXED in mozilla31

Status

()

RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: njn, Assigned: Hughman)

Tracking

unspecified
mozilla31
x86
Windows 7
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 2 obsolete attachments)

(Reporter)

Description

5 years ago
Bug 917496 will disable the GPUAdapterReporter because the "gpu-shared" and "gpu-dedicated" values are often bogus (i.e. huge negative values) on Win7 debug builds, which triggers an assertion in about:memory.  They seems to be ok on other Windows builds.

toolkit/components/aboutmemory/tests/test_aboutmemory5.xul (which bug 929797 will add) demonstrates the problem -- it fails more than half the time with a message like this:

  ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/chrome/toolkit/components/aboutmemory/tests/test_aboutmemory5.xul | Assertion count 1 is greater than expected range 0-0 assertions.

See https://tbpl.mozilla.org/?tree=Try&rev=b9a80850a174 for an example try trun showing the test failures (though note that this link will expire in a few days or maybe weeks).

I suspect that there is a problem with the structs in gfx/thebes/d3dkmtQueryStatistics.h, but I could be wrong.
(Reporter)

Updated

5 years ago
OS: Mac OS X → Windows 7
(Reporter)

Updated

5 years ago
No longer blocks: 929797
(Assignee)

Comment 1

5 years ago
Finally got together a build on a Win7 machine. I seem to be able to always have bad values from child processes but not the main process.

I am also regularly hitting what appears to be a some sort of race condition when running this test. Everything just halts with the child windows open for about a minute until a stack trace is output in the console. Other times there is just a stack trace from the child process as it was closing.

As for the bogus values, if I make the test leave open the child processes and re-measure the memory moments later there are no bogus values.
(Assignee)

Comment 2

5 years ago
Created attachment 8355996 [details] [diff] [review]
WIP

I have found what looks to be a hack around this problem in the process hacker code. BytesCommitted on segment info needs to be recast to ULONG from its normal ULONGLONG which seem to make the value 0 all the time for Win7 sub-processes. I am not sure why this is needed but it the same values were seen in both process hacker and process explorer so I believe it should be ok.

Still WIP as I am going to check its all still working on Win8 later then remove the 'reporter disabled' comments.

njn: Would you mind checking that this does in fact fix the problem and/or send it to try?
Assignee: nobody → hughnougher
Status: NEW → ASSIGNED
Flags: needinfo?(n.nethercote)
(Reporter)

Comment 3

5 years ago
I don't have a Windows machine suitable for this. And I believe you have try access now? It'd be quicker and simpler for everyone if you did the try run.
Flags: needinfo?(n.nethercote)
(In reply to Hugh Nougher [:Hughman] from comment #2)
> BytesCommitted on segment info needs to be recast to ULONG from
> its normal ULONGLONG which seem to make the value 0 all the time for Win7
> sub-processes.
I had a hard time parsing this sentence. Do you mean this fix makes the value always 0 for all sub-processes, instead of some really high bogus value? In other words, the high 32 bits get set to some bogus value for sub-processes, and by casting you're setting them to zero? Or do you mean they're always zero *without* the fix? If the former, I personally think |bytesCommitted &= 0xffffffffULL;| would be clearer than the cast :)
Oh, and does this limit the amount of memory that can be reported for the main process to 32-bits as well? That seems like it could be bad on 64-bit builds (though those aren't tier1 right now) in a few years if someone gets a bad leak.
(Assignee)

Comment 6

5 years ago
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #4)
> Do you mean this fix makes the value always 0 for all
> sub-processes, instead of some really high bogus value?
> In other words, the high 32 bits get set to some bogus value for
> sub-processes, and by casting you're setting them to zero? Or do you mean
> they're always zero *without* the fix?

The result of the cast is that 0 has appeared every time for both gpu-dedicated and gpu-shared on sub-processes.

From my investigation over 3 machine I have access to I found Win8 never showed any problems and Win7 only showed it on one of my configurations. No idea why. The machine that was able to reproduce showed about 50% of the time a segment reported a value over 2^32 and the other times it was zero. Given there seemed to be an average 3 segments per process, some interesting numbers showed up.

I never saw any problems with the main process leading me to believe it must be something to do with how Firefox shares surface(s) from the main process to the children.

> I personally think
> |bytesCommitted &= 0xffffffffULL;| would be clearer than the cast :)

If that does the same as casting then I agree that it is clearer.
I was just going with what was done at 
http://sourceforge.net/p/processhacker/code/HEAD/tree/2.x/trunk/plugins/ExtendedTools/gpumon.c#l434

(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #5)
> Oh, and does this limit the amount of memory that can be reported for the
> main process to 32-bits as well? That seems like it could be bad on 64-bit
> builds (though those aren't tier1 right now) in a few years if someone gets
> a bad leak.

The fix does not seem to be needed in Win8 as it always gives a proper ULL value so future 64bit will be OK (so far).

(In reply to Nicholas Nethercote [:njn] from comment #3)
> If the former, I believe you have try access now?
Hmm. I have not heard anything about this. Unless it comes with editbugs access? I will attempt it tomorrow in any case.

PS: I'm going to attempt to see if changing the segment info struct could give other results tomorrow but my hopes are not high as the main process is giving proper results.
(Reporter)

Comment 7

5 years ago
> > If the former, I believe you have try access now?
> Hmm. I have not heard anything about this. Unless it comes with editbugs
> access? I will attempt it tomorrow in any case.

Oh, I mixed up editbugs and try access. We should definitely give you level 1 (try) access. Details are at https://www.mozilla.org/hacking/committer/. File a bug to get the access. Bug 710590 is an example you can copy.
(Assignee)

Comment 8

4 years ago
Created attachment 8409268 [details] [diff] [review]
Patch

I didn't realise so much time had passed since the last time I had a patch.

Anyway, new patch. Main changes compared to WIP is there is not an explicit cast to ULONG any more as that logic was moved to the stuct making naming obvious. I have to conclude that this strange anomaly only effects some configurations of Win7 and most likely versions of drivers.

Tested on try a few times with minor changes, the most recent being
https://tbpl.mozilla.org/?tree=Try&rev=1175fb96eca3
Attachment #8355996 - Attachment is obsolete: true
Attachment #8409268 - Flags: review?(n.nethercote)
(Reporter)

Comment 9

4 years ago
Comment on attachment 8409268 [details] [diff] [review]
Patch

Review of attachment 8409268 [details] [diff] [review]:
-----------------------------------------------------------------

Apologies for the review delay.

I don't really have any idea about these magic Windows structs, but if try server says it's getting reasonable results, that's good enough for me. Thanks.
Attachment #8409268 - Flags: review?(n.nethercote) → review+
(Assignee)

Comment 10

4 years ago
Created attachment 8413319 [details] [diff] [review]
Patch Final

Fixed commit message, carried review forward.
Attachment #8409268 - Attachment is obsolete: true
Attachment #8413319 - Flags: review+
Attachment #8413319 - Flags: checkin?
(Assignee)

Updated

4 years ago
Keywords: checkin-needed
Comment on attachment 8413319 [details] [diff] [review]
Patch Final

Please just use checkin-needed :)
Attachment #8413319 - Flags: checkin? → checkin+
https://hg.mozilla.org/mozilla-central/rev/1f037080962f
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
You need to log in before you can comment on or make changes to this bug.