Closed Bug 715774 Opened 10 years ago Closed 7 years ago

Simple jank diagnosis

Categories

(Firefox :: General, defect)

x86_64
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: dmandelin, Unassigned)

References

(Depends on 1 open bug)

Details

Attachments

(3 files)

I whipped up a simple profiler to diagnose the janks reported by the basic jank meter. The profiler just does a StackWalk64 on the Firefox main thread every T ms (100 at first, but 10 would probably be more informative) and logs the stacks to a file with timestamps (from GetTickCount out of laziness). Then I tweaked the jank meter a bit to also print out timestamps. A python script merges the two, printing stack walk samples for the long events.

(By the way, before I did that, I tried an approach of having Firefox send off IPC messages when it starts and finishes handling events, and taking a sample if 100ms passed after the start event without finishing. The idea was to sample only when there was jank, just to be simple and have only a small amount of data. But it didn't work: janks detected in browser were not correlated with janks detected in the profiling server, probably because of queueing/IO delays.)
This patch is a total mess, and has a lot of stuff from my failed approach. The important part is just printing the timestamps on events, although the SendPid is used with the new profile server (optionally).
Attached file Profiling server code
Profiling server code. If Firefox is running when it starts, it will profile that process. Otherwise, it waits for a pid from Firefox starting up (with the other patch).
Attached file Results from one run
In this run, I started with a 20-tab profile, clicked on tabs a bit, opened about:memory, and ran a GC. The first part of the file shows the detected janks over 100 ms, from longest to shortest, with the (top 8 frames from) samples collected during that jank. At the end is a summary of all samples that were collected during janks over 100 ms.

Two of the 3s janks were from calling FileWrite from stacks like this:

    ZwWriteFile
    WriteFile
    _PR_MD_WRITE
    FileWrite
    nsDiskCacheBlockFile::Write
    nsDiskCacheBlockFile::WriteBlocks
    nsDiskCacheMap::WriteDataCacheBlocks
    nsDiskCacheStreamIO::Flush
    nsDiskCacheStreamIO::CloseOutputStreamInternal
    nsDiskCacheStreamIO::CloseOutputStream
    nsDiskCacheOutputStream::Close
    nsCacheEntryDescriptor::nsOutputStreamWrapper::Close
    nsCacheEntryDescriptor::nsOutputStreamWrapper::~nsOutputStreamWrapper
    nsCacheEntryDescriptor::nsOutputStreamWrapper::`scalar deleting destructor'
    nsCacheEntryDescriptor::nsOutputStreamWrapper::Release

The other one was from a lock call like this:

    NtWaitForSingleObject
    RtlIntegerToUnicodeString
    PR_Lock
    nsCacheService::Lock
    nsCacheEntryDescriptor::GetMetaDataElement
    nsHttpChannel::CheckCache
    nsHttpChannel::Connect
    nsHttpChannel::OnNormalCacheEntryAvailable
    nsHttpChannel::OnCacheEntryAvailable
    nsCacheListenerEvent::Run
    nsThread::ProcessNextEvent

Not sure what that's about, or if it's really significant. I quickly classified the jank samples as:

  58  WriteFile
  30  PR_Lock
  11  JS execution
   6  Stylesheets
   5  JS parsing
   5  layout/reflow
   3  GetFontData
   3  gfx
   3  waiting for event (glitch/bug in tools)
   2  about:memory
   2  ReadFile
   1  GetFileInfo

Summarized more:

  61  IO
  30  PR_Lock

  16  JS
  11  HTML
   3  gfx
   3  GetFontData

   5  bogus

What I see in this run:

 - Big pauses (>1s) generally from IO or locks, seems like we do have issues
 - Medium pauses (100-500ms) mostly from content. GetFontData shows up a bit, but maybe that's just startup?
 - GC/CC don't show up here because they are <70ms in this run, and I was just looking at over 100ms, because that was my profiler resolution. GC/CC can certainly be longer, although I did observe that they are a relatively small fraction of all long events in the previous results.

Of course, this run is very limited: I didn't test apps (Zimbra etc), awesome bar, autocomplete, etc. I can try that later, maybe with a better profiler resolution.
Benoit is working on something similar in bug 653703.
We've been working in a built in profiler. See bug 713227 and https://developer.mozilla.org/en/Performance/Profiling_with_the_Built-in_Profiler for more information. We're adding some platform code to log the information and making it available for extension to preprocess with a sample profiling extension being developed.

I'm currently working on adding stack walking on Mac, it would be useful if you could help with the windows changed to use StackWalk64 properly. Ehsan landed a change to NS_StackWalk (wrapping StackWalk64) to allow a thread handle to be specified. See bug 713240 for the windows changes.
(In reply to Benoit Girard (:BenWa) from comment #5)
> Ehsan
> landed a change to NS_StackWalk (wrapping StackWalk64) to allow a thread
> handle to be specified.

This is bug 713278, which allows calling NS_StackWalk from one thread on another thread.
Attachment #586326 - Attachment mime type: application/octet-stream → text/plain
Attachment #586326 - Attachment mime type: text/plain → application/x-tar
Hmm, looking at the server code, seems like it contains the code that we need in order to symbolicate in Benoit's add-on, which means that we can get it to work on Windows, I think!  I'll try to spend some time on this.  Thanks David for doing this work!  :-)
(In reply to Ehsan Akhgari [:ehsan] from comment #7)
> Hmm, looking at the server code, seems like it contains the code that we
> need in order to symbolicate in Benoit's add-on, which means that we can get
> it to work on Windows, I think!  I'll try to spend some time on this. 
> Thanks David for doing this work!  :-)

Cool. I didn't even know NS_StackWalk existed.

By the way, I was looking at that file, and IIUC, NS_StackWalk (on Windows) actually just posts a message to a "stack walk thread" which then wakes up and does the capture. Is there a particular reason for that design (which I assume is from long ago)? It seems pretty inflexible, and based on my failed attempt above, I'm concerned that the event sync could make the stack walk not happen when you want it to happen.

It sounds like you're covering the integration of the piece Benoit wants, but let me know if there's anything else you want from me. Otherwise, I'm going to continue with prototyping and benchtop experiments.
(In reply to David Mandelin from comment #8)
> (In reply to Ehsan Akhgari [:ehsan] from comment #7)
> > Hmm, looking at the server code, seems like it contains the code that we
> > need in order to symbolicate in Benoit's add-on, which means that we can get
> > it to work on Windows, I think!  I'll try to spend some time on this. 
> > Thanks David for doing this work!  :-)
> 
> Cool. I didn't even know NS_StackWalk existed.
> 
> By the way, I was looking at that file, and IIUC, NS_StackWalk (on Windows)
> actually just posts a message to a "stack walk thread" which then wakes up
> and does the capture. Is there a particular reason for that design (which I
> assume is from long ago)? It seems pretty inflexible, and based on my failed
> attempt above, I'm concerned that the event sync could make the stack walk
> not happen when you want it to happen.

I'm not sure why that is, maybe dbaron can explain? It does seem like it's not optimal for our use case. At the very least it looks like it would slow down sampling needlessly.

We've been using it because we want to have the least stalk walking code possible in the code base. We should try to fix up NS_StackWalk or maybe introduce NS_StackWalk_profiling for high performance signal safe stack walking.
Depends on: 716146
Depends on: 716293
Hate when that bad old cache code does that!

Josh, who is working on that code these days?

/be
(In reply to Brendan Eich [:brendan] from comment #10)

> Josh, who is working on that code these days?

Michal Novotny and Nick Hurley are working on the cache right now. CC'd them.
Whiteboard: [Snappy]
(In reply to David Mandelin from comment #8)
> (In reply to Ehsan Akhgari [:ehsan] from comment #7)
> > Hmm, looking at the server code, seems like it contains the code that we
> > need in order to symbolicate in Benoit's add-on, which means that we can get
> > it to work on Windows, I think!  I'll try to spend some time on this. 
> > Thanks David for doing this work!  :-)
> 
> Cool. I didn't even know NS_StackWalk existed.
> 
> By the way, I was looking at that file, and IIUC, NS_StackWalk (on Windows)
> actually just posts a message to a "stack walk thread" which then wakes up
> and does the capture. Is there a particular reason for that design (which I
> assume is from long ago)? It seems pretty inflexible, and based on my failed
> attempt above, I'm concerned that the event sync could make the stack walk
> not happen when you want it to happen.

At least one reason why you would want to do things that way if you want to walk the stack of a thread on which NS_StackWalk is called is that on Windows, it is possible for the code to raise a Win32 structured exception, which results in the stack to be unwound, and you don't want that to happen while you're walking it.  ;-)

But none of that is necessary if you're walking the stack of *another* thread, which is what I implemented in bug 713278.  We should probably hack to code to avoid all of that overhead in that case.

> It sounds like you're covering the integration of the piece Benoit wants,
> but let me know if there's anything else you want from me. Otherwise, I'm
> going to continue with prototyping and benchtop experiments.

Please continue with your awesome experiments.  My plan is to call NS_StackWalk from the built-in profiler to get a stack trace, and in Benoit's add-on, use the ideas in your code via js-ctypes in order to symbolicate the addresses that NS_StackWalk comes up with.

When I would have time to do this stuff is a whole different story.  :/

(In reply to Benoit Girard (:BenWa) from comment #9)
> We've been using it because we want to have the least stalk walking code
> possible in the code base. We should try to fix up NS_StackWalk or maybe
> introduce NS_StackWalk_profiling for high performance signal safe stack
> walking.

Once we do what I laid out above, we should be fine with just calling NS_StackWalk.
Dave, can you assign a snappy priority to this or remove [snappy] tag if you feel you are done with this.
(In reply to Taras Glek (:taras) from comment #13)
> Dave, can you assign a snappy priority to this or remove [snappy] tag if you
> feel you are done with this.

I'll just remove it--I don't have any immediate plans to work on it and I think the built-in profiler is just about ready on Windows anyway.
Whiteboard: [Snappy]
I'm going to mark this wontfix, since we have a proper profiler now (bug 713227)
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.