Closed
Bug 764660
Opened 12 years ago
Closed 12 years ago
Collect and report on the CPU utilization of a hung plugin process
Categories
(Core Graveyard :: Plug-ins, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
mozilla17
People
(Reporter: benjamin, Assigned: gfritzsche)
References
Details
Attachments
(2 files, 1 obsolete file)
15.02 KB,
patch
|
bbondy
:
review+
|
Details | Diff | Splinter Review |
62.68 KB,
text/plain
|
Details |
One of the things that we came up with this week for better classifying plugin hang reports: we should report a "profile" of what the plugin process is doing while we're waiting for it. Actual profiling isn't really possible without frame pointers, but we should be able to collect information about whether the plugin process is using CPU. I think it should be pretty interesting to classify plugin crashes according to deadlocks, idle, or active processes.
Reporter | ||
Comment 1•12 years ago
|
||
Oh, and we really only care about this on Windows. The other platforms are not worth doing.
Comment 2•12 years ago
|
||
We could use the performance counters API for this: http://msdn.microsoft.com/en-us/library/aa373078%28v=vs.85%29.aspx
Comment 3•12 years ago
|
||
An easier option for Vista and up: QueryProcessCycleTime http://msdn.microsoft.com/en-us/library/ms684929
Assignee | ||
Updated•12 years ago
|
Assignee: nobody → georg.fritzsche
OS: Mac OS X → Windows 7
Assignee | ||
Comment 4•12 years ago
|
||
Assignee | ||
Updated•12 years ago
|
Attachment #641065 -
Flags: review?(netzen)
Assignee | ||
Comment 5•12 years ago
|
||
(In reply to Georg Fritzsche [:gfritzsche] from comment #4) > Created attachment 641065 [details] [diff] [review] Worth pointing out that i ended up using GetProcessTimes() as the performance counters API is a little more involved: http://gfritzsche.blogspot.de/2012/06/pdh-api-gotchas.html
Status: NEW → ASSIGNED
Comment 6•12 years ago
|
||
Sorry for the delay in reviewing this. I started reviewing this today, I'll try to finish it up over the weekend.
Comment 7•12 years ago
|
||
Comment on attachment 641065 [details] [diff] [review] Collect and report plugin CPU utilization + test Review of attachment 641065 [details] [diff] [review]: ----------------------------------------------------------------- Nice work on the tests, thanks for including them :) ::: dom/plugins/ipc/PluginModuleParent.cpp @@ +173,4 @@ > notes.Put(CS("HangID"), NS_ConvertUTF16toUTF8(hangID)); > + if (mPluginCpuUsageOnHang >= 0) { > + notes.Put(CS("PluginCpuUsage"), > + nsPrintfCString("%.2f", mPluginCpuUsageOnHang)); Since this value is divided by the number of processors, I think it's of critical importance that we also have the number of processors reported here via PR_GetNumberOfProcessors(). @@ +173,5 @@ > notes.Put(CS("HangID"), NS_ConvertUTF16toUTF8(hangID)); > + if (mPluginCpuUsageOnHang >= 0) { > + notes.Put(CS("PluginCpuUsage"), > + nsPrintfCString("%.2f", mPluginCpuUsageOnHang)); > + } Do we need to reset the value of mPluginCpuUsageOnHang to -1 after this? I don't think so but just wanted to point it out in case. @@ +205,4 @@ > Close(); > } > > +namespace { nit: put this inside the #ifdef XP_WIN @@ +208,5 @@ > +namespace { > + > +#ifdef XP_WIN > + > +PRUint64 FileTimeToUTC(const FILETIME& ftime) PRUint64 on the previous line. Also put static before the return type. Also use ULARGE_INTEGER since we're dealing with unsigned int64 here @@ +216,5 @@ > + li.HighPart = ftime.dwHighDateTime; > + return li.QuadPart; > +} > + > +bool nit: Put static before the return type since this function isn't used anywhere else. @@ +236,5 @@ > + sampleTimes[0] = FileTimeToUTC(currentTime); > + cpuTimes[0] = FileTimeToUTC(kernelTime) + FileTimeToUTC(userTime); > + > + // we already hung for a while, a little bit longer won't matter > + ::Sleep(50); Please check with bsmedberg whether a sleep on the main thread here for 50ms is OK. I'm not very familiar with the plugin code, so I can't confirm this myself. @@ +257,5 @@ > +} > + > +#endif // #ifdef XP_WIN > + > +} // anonymous namespace nit: put this inside the #endif ::: dom/plugins/ipc/PluginModuleParent.h @@ +307,4 @@ > nsString mPluginDumpID; > nsString mBrowserDumpID; > nsString mHangID; > + float mPluginCpuUsageOnHang; nit: Surround this with #ifdef XP_WIN ::: dom/plugins/test/mochitest/hang_test.js @@ +80,5 @@ > + let extraData = parseKeyValuePairsFromFile(pluginExtraFile); > + ok("PluginCpuUsage" in extraData, "got extra field for plugin cpu usage"); > + let cpuUsage = parseFloat(extraData["PluginCpuUsage"].replace(',', '.')); > + if (this.idleHang) { > + ok(cpuUsage == 0, "plugin cpu usage is 0%"); I think this may be susceptible to intermittent failures. Could you push to try and then re-run the chrome mochitest that contains this test a bunch of times on each platform? If it is we could try using <= some smaller value instead of explicitly 0.
Attachment #641065 -
Flags: review?(netzen)
Comment 8•12 years ago
|
||
Ignore the static before return type comments, I forgot momentarily when writing that, that the functions were inside an anonymous namespace.
Assignee | ||
Comment 9•12 years ago
|
||
Sorry for the delay in getting back to this. Addressed review comments and limited a few parts to Windows that i had overlooked.
Attachment #641065 -
Attachment is obsolete: true
Assignee | ||
Comment 10•12 years ago
|
||
(In reply to Brian R. Bondy [:bbondy] from comment #7) > @@ +236,5 @@ > > + sampleTimes[0] = FileTimeToUTC(currentTime); > > + cpuTimes[0] = FileTimeToUTC(kernelTime) + FileTimeToUTC(userTime); > > + > > + // we already hung for a while, a little bit longer won't matter > > + ::Sleep(50); > > Please check with bsmedberg whether a sleep on the main thread here for 50ms > is OK. I'm not very familiar with the plugin code, so I can't confirm this > myself. I checked this with him and it's ok as the browser already hung for a while (default timeout is 45 seconds). > ::: dom/plugins/test/mochitest/hang_test.js > @@ +80,5 @@ > > + let extraData = parseKeyValuePairsFromFile(pluginExtraFile); > > + ok("PluginCpuUsage" in extraData, "got extra field for plugin cpu usage"); > > + let cpuUsage = parseFloat(extraData["PluginCpuUsage"].replace(',', '.')); > > + if (this.idleHang) { > > + ok(cpuUsage == 0, "plugin cpu usage is 0%"); > > I think this may be susceptible to intermittent failures. Could you push to > try and then re-run the chrome mochitest that contains this test a bunch of > times on each platform? If it is we could try using <= some smaller value > instead of explicitly 0. Will do try tests later.
Comment 11•12 years ago
|
||
Comment on attachment 645731 [details] [diff] [review] Collect and report plugin CPU utilization + test Review of attachment 645731 [details] [diff] [review]: ----------------------------------------------------------------- Looks good, given that the try tests mentioned earlier passes.
Attachment #645731 -
Flags: review+
Assignee | ||
Comment 12•12 years ago
|
||
Try builds in progress: https://tbpl.mozilla.org/?tree=Try&rev=5e40194ac943
Assignee | ||
Comment 13•12 years ago
|
||
The tests look fine, ran the chrome-tests 6 times on Win/WinXP debug/opt. No failures from this test, only 2 unrelated oranges.
Keywords: checkin-needed
Comment 14•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/d2805aefa5b7
Flags: in-testsuite+
Keywords: checkin-needed
Comment 15•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/d2805aefa5b7
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla17
Reporter | ||
Comment 16•12 years ago
|
||
Georg, here is a data dump of all the reports with PluginCpuUsage numbers. I think that we're probably going to need to try collecting the CPU for the other Flash processes on Windows as well in order to make this useful. Can you file and take the followup for that?
Reporter | ||
Comment 17•12 years ago
|
||
Also, I think we have locale-specific formatting sneaking in: some of the numbers are '00.00' and some are '00,00'
Comment 18•12 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] [away 27-July until 7-Aug] from comment #16) > Created attachment 649806 [details] > Data dump, tab-delimited > > Georg, here is a data dump of all the reports with PluginCpuUsage numbers. I > think that we're probably going to need to try collecting the CPU for the > other Flash processes on Windows as well in order to make this useful. Can > you file and take the followup for that? Seems like ~90% of these are cross process message hangs vs cpu spikes. I see a lot of zeros, and a lot of calls I know process events in the background. Some of these we might be able to hook, like NtUserSetWindowPos, which seems pretty common.
Assignee | ||
Comment 19•12 years ago
|
||
Hm, looks like the occasional CPU spikes for stacks that are basically waiting on some synchronisation etc. might skew the picture. Also, from the naming i would have expected "fastzero_I | _VEC_memzero" and "ycc_rgb_convert" to not be idle. Otherwise the data looks sensible to me so far. (In reply to Benjamin Smedberg [:bsmedberg] [away 27-July until 7-Aug] from comment #16) > I think that we're probably going to need to try collecting the CPU for the > other Flash processes on Windows as well in order to make this useful. Can > you file and take the followup for that? Filed bug 781124. (In reply to Benjamin Smedberg [:bsmedberg] [away 27-July until 7-Aug] from comment #17) > Also, I think we have locale-specific formatting sneaking in: some of the > numbers are '00.00' and some are '00,00' Oh, i fixed the locale issue in the test instead of where the .extra fields are written - filed bug 781133.
Updated•2 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•