715774 - Simple jank diagnosis

Reporter

Description

•

14 years ago

I whipped up a simple profiler to diagnose the janks reported by the basic jank meter. The profiler just does a StackWalk64 on the Firefox main thread every T ms (100 at first, but 10 would probably be more informative) and logs the stacks to a file with timestamps (from GetTickCount out of laziness). Then I tweaked the jank meter a bit to also print out timestamps. A python script merges the two, printing stack walk samples for the long events. (By the way, before I did that, I tried an approach of having Firefox send off IPC messages when it starts and finishes handling events, and taking a sample if 100ms passed after the start event without finishing. The idea was to sample only when there was jank, just to be simple and have only a small amount of data. But it didn't work: janks detected in browser were not correlated with janks detected in the profiling server, probably because of queueing/IO delays.)

David Mandelin [:dmandelin]

Reporter

Comment 1

•

14 years ago

Attached patch Updated jank meter patch — Details — Splinter Review

This patch is a total mess, and has a lot of stuff from my failed approach. The important part is just printing the timestamps on events, although the SendPid is used with the new profile server (optionally).

David Mandelin [:dmandelin]

Reporter

Comment 2

•

14 years ago

Attached file Profiling server code — Details

Profiling server code. If Firefox is running when it starts, it will profile that process. Otherwise, it waits for a pid from Firefox starting up (with the other patch).

David Mandelin [:dmandelin]

Reporter

Comment 3

•

14 years ago

Attached file Results from one run — Details

In this run, I started with a 20-tab profile, clicked on tabs a bit, opened about:memory, and ran a GC. The first part of the file shows the detected janks over 100 ms, from longest to shortest, with the (top 8 frames from) samples collected during that jank. At the end is a summary of all samples that were collected during janks over 100 ms. Two of the 3s janks were from calling FileWrite from stacks like this: ZwWriteFile WriteFile _PR_MD_WRITE FileWrite nsDiskCacheBlockFile::Write nsDiskCacheBlockFile::WriteBlocks nsDiskCacheMap::WriteDataCacheBlocks nsDiskCacheStreamIO::Flush nsDiskCacheStreamIO::CloseOutputStreamInternal nsDiskCacheStreamIO::CloseOutputStream nsDiskCacheOutputStream::Close nsCacheEntryDescriptor::nsOutputStreamWrapper::Close nsCacheEntryDescriptor::nsOutputStreamWrapper::~nsOutputStreamWrapper nsCacheEntryDescriptor::nsOutputStreamWrapper::`scalar deleting destructor' nsCacheEntryDescriptor::nsOutputStreamWrapper::Release The other one was from a lock call like this: NtWaitForSingleObject RtlIntegerToUnicodeString PR_Lock nsCacheService::Lock nsCacheEntryDescriptor::GetMetaDataElement nsHttpChannel::CheckCache nsHttpChannel::Connect nsHttpChannel::OnNormalCacheEntryAvailable nsHttpChannel::OnCacheEntryAvailable nsCacheListenerEvent::Run nsThread::ProcessNextEvent Not sure what that's about, or if it's really significant. I quickly classified the jank samples as: 58 WriteFile 30 PR_Lock 11 JS execution 6 Stylesheets 5 JS parsing 5 layout/reflow 3 GetFontData 3 gfx 3 waiting for event (glitch/bug in tools) 2 about:memory 2 ReadFile 1 GetFileInfo Summarized more: 61 IO 30 PR_Lock 16 JS 11 HTML 3 gfx 3 GetFontData 5 bogus What I see in this run: - Big pauses (>1s) generally from IO or locks, seems like we do have issues - Medium pauses (100-500ms) mostly from content. GetFontData shows up a bit, but maybe that's just startup? - GC/CC don't show up here because they are <70ms in this run, and I was just looking at over 100ms, because that was my profiler resolution. GC/CC can certainly be longer, although I did observe that they are a relatively small fraction of all long events in the previous results. Of course, this run is very limited: I didn't test apps (Zimbra etc), awesome bar, autocomplete, etc. I can try that later, maybe with a better profiler resolution.

(not currently active) Ted Mielczarek

Comment 4

•

14 years ago

Benoit is working on something similar in bug 653703.

Benoit Girard (:BenWa)

Comment 5

•

14 years ago

We've been working in a built in profiler. See bug 713227 and https://developer.mozilla.org/en/Performance/Profiling_with_the_Built-in_Profiler for more information. We're adding some platform code to log the information and making it available for extension to preprocess with a sample profiling extension being developed. I'm currently working on adding stack walking on Mac, it would be useful if you could help with the windows changed to use StackWalk64 properly. Ehsan landed a change to NS_StackWalk (wrapping StackWalk64) to allow a thread handle to be specified. See bug 713240 for the windows changes.

(no longer active)

Comment 6

•

14 years ago

(In reply to Benoit Girard (:BenWa) from comment #5) > Ehsan > landed a change to NS_StackWalk (wrapping StackWalk64) to allow a thread > handle to be specified. This is bug 713278, which allows calling NS_StackWalk from one thread on another thread.

(no longer active)

Updated

•

14 years ago

Attachment #586326 - Attachment mime type: application/octet-stream → text/plain

(no longer active)

Updated

•

14 years ago

Attachment #586326 - Attachment mime type: text/plain → application/x-tar

(no longer active)

Comment 7

•

14 years ago

Hmm, looking at the server code, seems like it contains the code that we need in order to symbolicate in Benoit's add-on, which means that we can get it to work on Windows, I think! I'll try to spend some time on this. Thanks David for doing this work! :-)

David Mandelin [:dmandelin]

Reporter

Comment 8

•

14 years ago

(In reply to Ehsan Akhgari [:ehsan] from comment #7) > Hmm, looking at the server code, seems like it contains the code that we > need in order to symbolicate in Benoit's add-on, which means that we can get > it to work on Windows, I think! I'll try to spend some time on this. > Thanks David for doing this work! :-) Cool. I didn't even know NS_StackWalk existed. By the way, I was looking at that file, and IIUC, NS_StackWalk (on Windows) actually just posts a message to a "stack walk thread" which then wakes up and does the capture. Is there a particular reason for that design (which I assume is from long ago)? It seems pretty inflexible, and based on my failed attempt above, I'm concerned that the event sync could make the stack walk not happen when you want it to happen. It sounds like you're covering the integration of the piece Benoit wants, but let me know if there's anything else you want from me. Otherwise, I'm going to continue with prototyping and benchtop experiments.

Benoit Girard (:BenWa)

Comment 9

•

14 years ago

(In reply to David Mandelin from comment #8) > (In reply to Ehsan Akhgari [:ehsan] from comment #7) > > Hmm, looking at the server code, seems like it contains the code that we > > need in order to symbolicate in Benoit's add-on, which means that we can get > > it to work on Windows, I think! I'll try to spend some time on this. > > Thanks David for doing this work! :-) > > Cool. I didn't even know NS_StackWalk existed. > > By the way, I was looking at that file, and IIUC, NS_StackWalk (on Windows) > actually just posts a message to a "stack walk thread" which then wakes up > and does the capture. Is there a particular reason for that design (which I > assume is from long ago)? It seems pretty inflexible, and based on my failed > attempt above, I'm concerned that the event sync could make the stack walk > not happen when you want it to happen. I'm not sure why that is, maybe dbaron can explain? It does seem like it's not optimal for our use case. At the very least it looks like it would slow down sampling needlessly. We've been using it because we want to have the least stalk walking code possible in the code base. We should try to fix up NS_StackWalk or maybe introduce NS_StackWalk_profiling for high performance signal safe stack walking.

Michal Novotny [:michal]

Updated

•

14 years ago

Depends on: 716146

Michal Novotny [:michal]

Updated

•

14 years ago

Depends on: 716293

Brendan Eich [:brendan]

Comment 10

•

14 years ago

Hate when that bad old cache code does that! Josh, who is working on that code these days? /be

Josh Aas

Comment 11

•

14 years ago

(In reply to Brendan Eich [:brendan] from comment #10) > Josh, who is working on that code these days? Michal Novotny and Nick Hurley are working on the cache right now. CC'd them.

Andrew McCreight [:mccr8]

Updated

•

14 years ago

Whiteboard: [Snappy]

(no longer active)

Comment 12

•

14 years ago

(In reply to David Mandelin from comment #8) > (In reply to Ehsan Akhgari [:ehsan] from comment #7) > > Hmm, looking at the server code, seems like it contains the code that we > > need in order to symbolicate in Benoit's add-on, which means that we can get > > it to work on Windows, I think! I'll try to spend some time on this. > > Thanks David for doing this work! :-) > > Cool. I didn't even know NS_StackWalk existed. > > By the way, I was looking at that file, and IIUC, NS_StackWalk (on Windows) > actually just posts a message to a "stack walk thread" which then wakes up > and does the capture. Is there a particular reason for that design (which I > assume is from long ago)? It seems pretty inflexible, and based on my failed > attempt above, I'm concerned that the event sync could make the stack walk > not happen when you want it to happen. At least one reason why you would want to do things that way if you want to walk the stack of a thread on which NS_StackWalk is called is that on Windows, it is possible for the code to raise a Win32 structured exception, which results in the stack to be unwound, and you don't want that to happen while you're walking it. ;-) But none of that is necessary if you're walking the stack of *another* thread, which is what I implemented in bug 713278. We should probably hack to code to avoid all of that overhead in that case. > It sounds like you're covering the integration of the piece Benoit wants, > but let me know if there's anything else you want from me. Otherwise, I'm > going to continue with prototyping and benchtop experiments. Please continue with your awesome experiments. My plan is to call NS_StackWalk from the built-in profiler to get a stack trace, and in Benoit's add-on, use the ideas in your code via js-ctypes in order to symbolicate the addresses that NS_StackWalk comes up with. When I would have time to do this stuff is a whole different story. :/ (In reply to Benoit Girard (:BenWa) from comment #9) > We've been using it because we want to have the least stalk walking code > possible in the code base. We should try to fix up NS_StackWalk or maybe > introduce NS_StackWalk_profiling for high performance signal safe stack > walking. Once we do what I laid out above, we should be fine with just calling NS_StackWalk.

(dormant account)

Comment 13

•

14 years ago

Dave, can you assign a snappy priority to this or remove [snappy] tag if you feel you are done with this.

David Mandelin [:dmandelin]

Reporter

Comment 14

•

14 years ago

(In reply to Taras Glek (:taras) from comment #13) > Dave, can you assign a snappy priority to this or remove [snappy] tag if you > feel you are done with this. I'll just remove it--I don't have any immediate plans to work on it and I think the built-in profiler is just about ready on Windows anyway.

Whiteboard: [Snappy]

Vladan Djeric (:vladan)

Comment 15

•

11 years ago

I'm going to mark this wontfix, since we have a proper profiler now (bug 713227)

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → WONTFIX

Updated jank meter patch 14 years ago David Mandelin [:dmandelin] 9.35 KB, patch		Details \| Diff \| Splinter Review
Profiling server code 14 years ago David Mandelin [:dmandelin] 4.44 KB, application/x-tar		Details
Results from one run 14 years ago David Mandelin [:dmandelin] 22.40 KB, text/plain		Details