Closed Bug 934783 Opened 11 years ago Closed 10 years ago

Fix and re-enable the GPUAdapterReporter

Categories

(Core :: Graphics, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla31

People

(Reporter: n.nethercote, Assigned: Hughman)

References

Details

Attachments

(1 file, 2 obsolete files)

Bug 917496 will disable the GPUAdapterReporter because the "gpu-shared" and "gpu-dedicated" values are often bogus (i.e. huge negative values) on Win7 debug builds, which triggers an assertion in about:memory.  They seems to be ok on other Windows builds.

toolkit/components/aboutmemory/tests/test_aboutmemory5.xul (which bug 929797 will add) demonstrates the problem -- it fails more than half the time with a message like this:

  ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/chrome/toolkit/components/aboutmemory/tests/test_aboutmemory5.xul | Assertion count 1 is greater than expected range 0-0 assertions.

See https://tbpl.mozilla.org/?tree=Try&rev=b9a80850a174 for an example try trun showing the test failures (though note that this link will expire in a few days or maybe weeks).

I suspect that there is a problem with the structs in gfx/thebes/d3dkmtQueryStatistics.h, but I could be wrong.
OS: Mac OS X → Windows 7
No longer blocks: 929797
Finally got together a build on a Win7 machine. I seem to be able to always have bad values from child processes but not the main process.

I am also regularly hitting what appears to be a some sort of race condition when running this test. Everything just halts with the child windows open for about a minute until a stack trace is output in the console. Other times there is just a stack trace from the child process as it was closing.

As for the bogus values, if I make the test leave open the child processes and re-measure the memory moments later there are no bogus values.
Attached patch WIP (obsolete) — Splinter Review
I have found what looks to be a hack around this problem in the process hacker code. BytesCommitted on segment info needs to be recast to ULONG from its normal ULONGLONG which seem to make the value 0 all the time for Win7 sub-processes. I am not sure why this is needed but it the same values were seen in both process hacker and process explorer so I believe it should be ok.

Still WIP as I am going to check its all still working on Win8 later then remove the 'reporter disabled' comments.

njn: Would you mind checking that this does in fact fix the problem and/or send it to try?
Assignee: nobody → hughnougher
Status: NEW → ASSIGNED
Flags: needinfo?(n.nethercote)
I don't have a Windows machine suitable for this. And I believe you have try access now? It'd be quicker and simpler for everyone if you did the try run.
Flags: needinfo?(n.nethercote)
(In reply to Hugh Nougher [:Hughman] from comment #2)
> BytesCommitted on segment info needs to be recast to ULONG from
> its normal ULONGLONG which seem to make the value 0 all the time for Win7
> sub-processes.
I had a hard time parsing this sentence. Do you mean this fix makes the value always 0 for all sub-processes, instead of some really high bogus value? In other words, the high 32 bits get set to some bogus value for sub-processes, and by casting you're setting them to zero? Or do you mean they're always zero *without* the fix? If the former, I personally think |bytesCommitted &= 0xffffffffULL;| would be clearer than the cast :)
Oh, and does this limit the amount of memory that can be reported for the main process to 32-bits as well? That seems like it could be bad on 64-bit builds (though those aren't tier1 right now) in a few years if someone gets a bad leak.
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #4)
> Do you mean this fix makes the value always 0 for all
> sub-processes, instead of some really high bogus value?
> In other words, the high 32 bits get set to some bogus value for
> sub-processes, and by casting you're setting them to zero? Or do you mean
> they're always zero *without* the fix?

The result of the cast is that 0 has appeared every time for both gpu-dedicated and gpu-shared on sub-processes.

From my investigation over 3 machine I have access to I found Win8 never showed any problems and Win7 only showed it on one of my configurations. No idea why. The machine that was able to reproduce showed about 50% of the time a segment reported a value over 2^32 and the other times it was zero. Given there seemed to be an average 3 segments per process, some interesting numbers showed up.

I never saw any problems with the main process leading me to believe it must be something to do with how Firefox shares surface(s) from the main process to the children.

> I personally think
> |bytesCommitted &= 0xffffffffULL;| would be clearer than the cast :)

If that does the same as casting then I agree that it is clearer.
I was just going with what was done at 
http://sourceforge.net/p/processhacker/code/HEAD/tree/2.x/trunk/plugins/ExtendedTools/gpumon.c#l434

(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #5)
> Oh, and does this limit the amount of memory that can be reported for the
> main process to 32-bits as well? That seems like it could be bad on 64-bit
> builds (though those aren't tier1 right now) in a few years if someone gets
> a bad leak.

The fix does not seem to be needed in Win8 as it always gives a proper ULL value so future 64bit will be OK (so far).

(In reply to Nicholas Nethercote [:njn] from comment #3)
> If the former, I believe you have try access now?
Hmm. I have not heard anything about this. Unless it comes with editbugs access? I will attempt it tomorrow in any case.

PS: I'm going to attempt to see if changing the segment info struct could give other results tomorrow but my hopes are not high as the main process is giving proper results.
> > If the former, I believe you have try access now?
> Hmm. I have not heard anything about this. Unless it comes with editbugs
> access? I will attempt it tomorrow in any case.

Oh, I mixed up editbugs and try access. We should definitely give you level 1 (try) access. Details are at https://www.mozilla.org/hacking/committer/. File a bug to get the access. Bug 710590 is an example you can copy.
Attached patch Patch (obsolete) — Splinter Review
I didn't realise so much time had passed since the last time I had a patch.

Anyway, new patch. Main changes compared to WIP is there is not an explicit cast to ULONG any more as that logic was moved to the stuct making naming obvious. I have to conclude that this strange anomaly only effects some configurations of Win7 and most likely versions of drivers.

Tested on try a few times with minor changes, the most recent being
https://tbpl.mozilla.org/?tree=Try&rev=1175fb96eca3
Attachment #8355996 - Attachment is obsolete: true
Attachment #8409268 - Flags: review?(n.nethercote)
Comment on attachment 8409268 [details] [diff] [review]
Patch

Review of attachment 8409268 [details] [diff] [review]:
-----------------------------------------------------------------

Apologies for the review delay.

I don't really have any idea about these magic Windows structs, but if try server says it's getting reasonable results, that's good enough for me. Thanks.
Attachment #8409268 - Flags: review?(n.nethercote) → review+
Attached patch Patch FinalSplinter Review
Fixed commit message, carried review forward.
Attachment #8409268 - Attachment is obsolete: true
Attachment #8413319 - Flags: review+
Attachment #8413319 - Flags: checkin?
Keywords: checkin-needed
Comment on attachment 8413319 [details] [diff] [review]
Patch Final

Please just use checkin-needed :)
Attachment #8413319 - Flags: checkin? → checkin+
https://hg.mozilla.org/mozilla-central/rev/1f037080962f
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
You need to log in before you can comment on or make changes to this bug.