Closed Bug 917496 Opened 6 years ago Closed 6 years ago

Bogus "gpu-shared" and "gpu-dedicated" memory reports

Categories

(Core :: Graphics, defect)

x86
macOS
defect
Not set

RESOLVED FIXED
mozilla28

People

(Reporter: njn, Assigned: njn)

Attachments

(1 file)

While working on bug 904321 I triggered the following try run:
https://tbpl.mozilla.org/php/getParsedLog.php?id=27957762&tree=Try&full=1#error0.
I got numerous assertions for "gpu-dedicated" and "gpu-shared", e.g.:

02:25:13     INFO -  [Parent 380] ###!!! ASSERTION: Invalid value for gpu-dedicated: 'false', file aboutMemory.js, line 0

02:25:13     INFO -  [Parent 380] ###!!! ASSERTION: Invalid value for gpu-shared: 'false', file aboutMemory.js, line 0


I added code to about:memory to print the invalid values, and did another try run:
https://tbpl.mozilla.org/php/getParsedLog.php?id=27968514&full=1&branch=try.

I didn't get any complaints about "gpu-dedicated" that time, but I did get several like this:

05:51:07     INFO -  [Parent 2440] ###!!! ASSERTION: Invalid value (-544102958321631200 / -544102958321631200) for gpu-shared: 'false', file aboutMemory.js, line 0

05:51:08     INFO -  [Parent 2440] ###!!! ASSERTION: Invalid value (-8490265031173210000 / -8490265031173210000) for gpu-shared: 'false', file aboutMemory.js, line 0


(Ignore the fact that the values are printed twice; that's irrelevant in this case.)

These numbers are close enough to the bounds of an int64_t (9223372036854775807 and -9223372036854775808) that I wonder if some kind of overflow is happening.
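
To make the overflow suspicion concrete, here is a minimal C++ sketch of two effects that could produce numbers like the ones logged above. It assumes the reporter's amount ends up in an int64_t and is then printed by JavaScript in aboutMemory.js. The helper names are hypothetical and are not taken from the Gecko source.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret the bits of an unsigned 64-bit counter
// as a signed value, which is how a huge or garbage uint64_t would appear
// once it is stored in an int64_t.
int64_t ReinterpretAsSigned(uint64_t raw) {
  int64_t out;
  static_assert(sizeof(out) == sizeof(raw), "both types must be 64 bits wide");
  std::memcpy(&out, &raw, sizeof(out));
  return out;
}

// JavaScript numbers are IEEE-754 doubles, which represent every integer
// exactly only up to 2^53 in magnitude. Beyond that, a logged value may be
// rounded, so the printed digits need not match the original int64_t.
bool ExceedsExactDoubleRange(int64_t value) {
  constexpr int64_t kMaxExact = int64_t{1} << 53;
  return value > kMaxExact || value < -kMaxExact;
}

// Small self-check showing how the helpers behave.
void SelfCheck() {
  // Any bit pattern with the top bit set reads back as a negative number.
  assert(ReinterpretAsSigned(0x8000000000000001ull) < 0);
  // One of the values from the log is far outside the exact-double range.
  assert(ExceedsExactDoubleRange(-544102958321631200LL));
}
```

If this is what is happening, the negative sign only says that the top bit of the 64-bit value was set, and the trailing zeros in the log are likely rounding in the printout rather than part of the original value.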


(BTW, the silver lining to this bug is that these assertion failures caused me to discover bug 917489.)
Hmm. Since this is most likely from the child processes (happening in test_memoryReporters2.xul), I wonder if it has something to do with Windows error codes?

I'll try to have a look at this sometime, using that test as the testcase (got to get my dev env working again).
(In reply to Hugh Nougher [:Hughman] from comment #1)
> (got to get my dev env working again).

This is where I am failing. Stupid IA2Marshal.
This is blocking bug 929797.  In that bug's patch I add a new test which loads about:memory, and on Windows 7 debug builds I'm regularly getting assertion failures due to gpu-shared and gpu-dedicated having these bogus values.

I suspect that some of the structs defined in gfx/thebes/d3dkmtQueryStatistics.h don't match reality.  For example, there's this:

typedef union _D3DKMTQS_RESULT
{
    D3DKMTQS_ADAPTER_INFO AdapterInfo;
    D3DKMTQS_SEGMENT_INFO_WIN7 SegmentInfoWin7;
    D3DKMTQS_SEGMENT_INFO_WIN8 SegmentInfoWin8;
    D3DKMTQS_PROCESS_INFO ProcessInfo;
    D3DKMTQS_PROCESS_SEGMENT_INFO ProcessSegmentInfo;
} D3DKMTQS_RESULT;

which doesn't match http://sourceforge.net/p/processhacker/code/HEAD/tree/2.x/trunk/plugins/ExtendedTools/d3dkmt.h, which d3dkmtQueryStatistics.h was purportedly based on:

typedef union _D3DKMT_QUERYSTATISTICS_RESULT
{
    D3DKMT_QUERYSTATISTICS_ADAPTER_INFORMATION AdapterInformation;
    D3DKMT_QUERYSTATISTICS_SEGMENT_INFORMATION_V1 SegmentInformationV1; // WIN7
    D3DKMT_QUERYSTATISTICS_SEGMENT_INFORMATION SegmentInformation; // WIN8
    D3DKMT_QUERYSTATISTICS_NODE_INFORMATION NodeInformation;
    D3DKMT_QUERYSTATISTICS_VIDPNSOURCE_INFORMATION VidPnSourceInformation;
    D3DKMT_QUERYSTATISTICS_PROCESS_INFORMATION ProcessInformation;
    D3DKMT_QUERYSTATISTICS_PROCESS_ADAPTER_INFORMATION ProcessAdapterInformation;
    D3DKMT_QUERYSTATISTICS_PROCESS_SEGMENT_INFORMATION ProcessSegmentInformation;
    D3DKMT_QUERYSTATISTICS_PROCESS_NODE_INFORMATION ProcessNodeInformation;
    D3DKMT_QUERYSTATISTICS_PROCESS_VIDPNSOURCE_INFORMATION ProcessVidPnSourceInformation;
} D3DKMT_QUERYSTATISTICS_RESULT;

That's just one discrepancy; there may be more, though I haven't checked beyond that.
These structs are certainly complex enough.

Given that, I basically don't trust GPUAdapterReporter at all, and I think it should be disabled.
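
As an illustration of why a missing union member could matter, here is a minimal C++ sketch. Every type in it is a made-up stand-in with arbitrary sizes, not the real D3DKMT layout, and whether the real headers differ in this particular way has not been checked. The point is only that a union is as large as its largest member, so dropping members can shrink it and shift every field that follows it in an enclosing struct.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in payloads with arbitrary sizes (hypothetical, for illustration only).
struct SmallInfo { uint64_t value; };
struct LargeInfo { uint64_t values[32]; };

// A "trimmed" union that omits the large member, and a "full" one that keeps it.
union TrimmedResult { SmallInfo small; };
union FullResult { SmallInfo small; LargeInfo large; };

// Hypothetical query structs: a field placed after the union moves when the
// union's size changes.
struct TrimmedQuery { uint32_t type; TrimmedResult result; uint32_t trailing; };
struct FullQuery { uint32_t type; FullResult result; uint32_t trailing; };

static_assert(sizeof(FullResult) > sizeof(TrimmedResult),
              "dropping the largest member shrinks the union");

// Runtime check of the same idea for the field that follows the union.
void CheckLayouts() {
  assert(offsetof(FullQuery, trailing) > offsetof(TrimmedQuery, trailing));
}
```

If a caller built against the trimmed layout handed such a struct to code expecting the full layout, the two sides would disagree about where fields live and how much memory may be written, which is one plausible route to garbage values.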
Blocks: 929797
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
https://tbpl.mozilla.org/?tree=Try&rev=b9a80850a174 is the try run showing the frequent assertion failure -- 4 out of 6 runs asserted.
Before review, I assume the test is also being run on Win 7 Opt and Win 8. Have you ever seen the assertion happen on anything other than Win 7 Debug?

If it hasn't, then I disagree with the comment blaming the structs. Since the reporter itself does not differ between DEBUG and OPT, Win7 Opt should at minimum be asserting just as often. I suspect some other, currently unknown factor is the cause.
Flags: needinfo?(n.nethercote)
Correct, it only happens on Win7 debug -- see the try link.

Even if the structs aren't to blame, something is wrong, and it's blocking bug 929797, and I think disabling it until the cause is known is the best course of action.
Flags: needinfo?(n.nethercote)
Also, we don't know the values are correct on the other Windows configurations, just that they're not sufficiently bogus (e.g. negative) to trigger an assertion.
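
A minimal C++ sketch of that point: a check that only rejects negative values lets a positive garbage value through. The classifier and its 1 TiB upper bound are hypothetical choices for illustration. This is not the check that about:memory actually performs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical plausibility ceiling for a GPU memory figure: 1 TiB.
constexpr int64_t kMaxPlausibleBytes = int64_t{1} << 40;

enum class Verdict { Ok, Negative, ImplausiblyLarge };

// Classify a reported byte count. A sign check alone would accept anything
// that is not Negative, including values far beyond any real GPU memory size.
Verdict ClassifyByteCount(int64_t bytes) {
  if (bytes < 0) {
    return Verdict::Negative;
  }
  if (bytes > kMaxPlausibleBytes) {
    return Verdict::ImplausiblyLarge;
  }
  return Verdict::Ok;
}

// Small self-check: the same magnitude is caught or missed by a sign-only
// test depending purely on its sign.
void SelfCheck() {
  assert(ClassifyByteCount(-544102958321631200LL) == Verdict::Negative);
  assert(ClassifyByteCount(544102958321631200LL) == Verdict::ImplausiblyLarge);
}
```

Under these assumptions, a configuration that never asserts could still be reporting values in the ImplausiblyLarge bucket, or wrong values that happen to look plausible.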
Comment on attachment 826674 [details] [diff] [review]
Disable GPUAdapterReporter because its values are frequently bogus.

Review of attachment 826674 [details] [diff] [review]:
-----------------------------------------------------------------

I can agree with that. r=me

Could you file a bug for re-enabling this that says which test is failing and/or includes reproduction details? That would be helpful.
Well... I do not seem to have permission to change the review flag, so maybe not?
Comment on attachment 826674 [details] [diff] [review]
Disable GPUAdapterReporter because its values are frequently bogus.

Review of attachment 826674 [details] [diff] [review]:
-----------------------------------------------------------------

You must not have the edit_bugs privilege.  I've marked the r+ myself.  And I filed bug 934783 about re-enabling the reporter.

Thanks!
Attachment #826674 - Flags: review?(hughnougher) → review+
> You must not have the edit_bugs privilege.

Turns out it's the "canconfirm" privilege that you lacked, and jdm just gave it to you :)
And if you want to test, you could try r+'ing the attached patch.
(In reply to Nicholas Nethercote [:njn] from comment #13)
> > You must not have the edit_bugs privilege.
> 
> Turns out it's the "canconfirm" privilege that you lacked, and jdm just gave
> it to you :)

No, it was most likely edit_bugs. I requested it yesterday morning and got it last night. The flags are enabled now, so I should be good for next time.
https://hg.mozilla.org/mozilla-central/rev/cea7b6880788
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28