Closed Bug 1428076 Opened 6 years ago Closed 6 years ago

The gecko profiler produces an invalid JSON when a thread ends without emitting any marker and/or sample

Categories

(Core :: Gecko Profiler, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox59 --- wontfix
firefox60 --- fixed

People

(Reporter: julienw, Assigned: julienw)

References

Details

Attachments

(4 files)

STR:
0. Requirements: have the profiler addon installed.
1. Configure the profile to capture workers (add ,Worker in the list of threads to profile)
2. Start the profiler.
3. Open the debugger
4. Close the devtools
5. Capture
Here is an example.

You can see errors with markers data at line 6696:

   "markers": {
    "schema": {"name": 0, "time": 1, "data": 2},
    "data": [
     
,
     [
      8,
...

or line 7569:

   "samples": {
    "schema": {"stack": 0, "time": 1, "responsiveness": 2, "rss": 3, "uss": 4},
    "data": [
     
,
     [11, 34995.826937],
     [14, 34995.900643],
     [26, 34996.907289],
     [32, 34997.905775],
     [52, 34998.905071]
    ]
   },

So errors (the empty space before the comma) are both with markers and data.
Comment on attachment 8942133 [details]
Bug 1428076 - Fix bad JSON format when profiling workers that do nothing

Hey Markus,

This fixes the issue for me, but I have some questions :)

1. I'm not sure of when "FlushSamplesAndMarkers" is really called. I looked at all its call sites but it's still blurry to me. Especially what are the cases where this is called but the worker is still running ? This would be interesting to be able to reproduce with a simpler worker than the Debugger, and write a test.

2. I wonder if I can use ChunkedJSONWriteFunc::IsEmpty [1] to replace my "wroteSomething" booleans but I'm not sure if this is really equivalent.

[1] https://searchfox.org/mozilla-central/rev/7fb999d1d39418fd331284fab909df076b967ac6/tools/profiler/core/ProfileJSONWriter.h#32

I know I left some "printf", this is for you to try it. Here is my fastest STR. Your gecko profiler needs to be configured to capture only the threads "Worker" (you can remove everything).

1. Launch firefox.
2. Ctrl shift 1 => start the profiler.
3. ctrl shift s => start the debugger.
4. ctrl shift s => stop the debugger.
  => you'll see the printf logs in the console
5. ctrl shift 2 => capture.

Thanks
Attachment #8942133 - Flags: feedback?(mstange)
The lifetime of a worker thread is as follows:
 - Thread starts up
 - Thread runs a given worker script: (this can happen multiple times during the lifetime of the OS thread)
   - Thread is registered with the profiler
   - Thread tells the profiler about its JSContext
   - Thread runs until the worker is done.
   - Thread tells the profiler to clear its JSContext
   - Thread unregisters itself from the profiler
 - Thread shuts down

FlushSamplesAndMarkers is called when the JSContext is cleared but before the thread is unregistered from the profiler.

Here's a profile I captured with your patch applied: https://perfht.ml/2r93ozD
The 27ms gap in the selected thread is probably the point at which this thread's existing data gets serialized to mSavedStreamedSamples and mSavedStreamedMarkers.
However, you can see that none of the other threads have any data for the duration before the selected thread's gap. This is most likely due to bug 1429904: We accidentally discard data from all threads instead of just the flushed thread.
Then, when one of the other threads clears its JS context, we want to store its data inside its mSavedStreamedSamples and mSavedStreamedMarkers, but because that thread's data has accidentally been flushed, it doesn't find any data and mSavedStreamedSamples and mSavedStreamedMarkers are empty.
(In reply to Julien Wajsberg [:julienw] from comment #3)
> 2. I wonder if I can use ChunkedJSONWriteFunc::IsEmpty [1] to replace my
> "wroteSomething" booleans but I'm not sure if this is really equivalent.

It doesn't seem equivalent. ChunkedJSONWriteFunc::IsEmpty looks like it's currently unused, and I think it can never return false because AllocChunk is called in the constructor, and AllocChunk sets mChunkPtr to something non-null.
Comment on attachment 8942133 [details]
Bug 1428076 - Fix bad JSON format when profiling workers that do nothing

https://reviewboard.mozilla.org/r/212392/#review218682

::: tools/profiler/core/ThreadInfo.cpp:271
(Diff revision 1)
> -                                  mContext, *mUniqueStacks);
> +                                          mContext, *mUniqueStacks);
>      }
> +    printf(">>>>>>>>>>> Samples: wrote is %s for thread %d (%s)\n", wrote ? "true" : "false", mThreadId, Name());
>      b.EndBareList();
> +    if (wrote) {
> -    mSavedStreamedSamples = b.WriteFunc()->CopyData();
> +      mSavedStreamedSamples = b.WriteFunc()->CopyData();

I think we should set it to null explicitly when nothing was written, in order to clear out any old data.

So something like:
mSavedStreamedSamples = haveSamples ? b.WriteFunc()->CopyData() : nullptr;

::: tools/profiler/core/ThreadInfo.cpp:289
(Diff revision 1)
>      b.EndBareList();
> +
> +    // https://bugzilla.mozilla.org/show_bug.cgi?id=1428076
> +    // Because it happens than some threads output no markers at all, we 
> +    if (wrote) {
> -    mSavedStreamedMarkers = b.WriteFunc()->CopyData();
> +      mSavedStreamedMarkers = b.WriteFunc()->CopyData();

Same comment here.
Summary: The gecko profiler produces an invalid JSON when a thread ends without emitting any marker → The gecko profiler produces an invalid JSON when a thread ends without emitting any marker and/or sample
(In reply to Markus Stange [:mstange] from comment #4)
> The lifetime of a worker thread is as follows:
>  - Thread starts up
>  - Thread runs a given worker script: (this can happen multiple times during
> the lifetime of the OS thread)
>    - Thread is registered with the profiler
>    - Thread tells the profiler about its JSContext
>    - Thread runs until the worker is done.
>    - Thread tells the profiler to clear its JSContext
>    - Thread unregisters itself from the profiler
>  - Thread shuts down
> 
> FlushSamplesAndMarkers is called when the JSContext is cleared but before
> the thread is unregistered from the profiler.

I see it's used:
* https://searchfox.org/mozilla-central/source/dom/workers/RuntimeService.cpp#2927
  This is what you explain above.
* https://searchfox.org/mozilla-central/source/js/xpconnect/src/XPCJSContext.cpp#911
  Not sure.
* https://searchfox.org/mozilla-central/source/xpcom/build/XPCOMInit.cpp#1000
  I think it's at shutdown.

> 
> Here's a profile I captured with your patch applied:
> https://perfht.ml/2r93ozD

=> I can't load it as it uses gecko profile version v10, which doesn't seem to be parsed by perf-html.io.

> The 27ms gap in the selected thread is probably the point at which this
> thread's existing data gets serialized to mSavedStreamedSamples and
> mSavedStreamedMarkers.
> However, you can see that none of the other threads have any data for the
> duration before the selected thread's gap. This is most likely due to bug
> 1429904: We accidentally discard data from all threads instead of just the
> flushed thread.

Ouch, I can look into this other bug too.
I was also wondering what happens if we call Flush several times for one thread (say: we restart a worker several times over the course of a capture). I think we lose the first samples and markers.

> Then, when one of the other threads clears its JS context, we want to store
> its data inside its mSavedStreamedSamples and mSavedStreamedMarkers, but
> because that thread's data has accidentally been flushed, it doesn't find
> any data and mSavedStreamedSamples and mSavedStreamedMarkers are empty.

(In reply to Markus Stange [:mstange] from comment #5)
> (In reply to Julien Wajsberg [:julienw] from comment #3)
> > 2. I wonder if I can use ChunkedJSONWriteFunc::IsEmpty [1] to replace my
> > "wroteSomething" booleans but I'm not sure if this is really equivalent.
> 
> It doesn't seem equivalent. ChunkedJSONWriteFunc::IsEmpty looks like it's
> currently unused, and I think it can never return false because AllocChunk
> is called in the constructor, and AllocChunk sets mChunkPtr to something
> non-null.

OK thanks! Now I see what you say :)
Comment on attachment 8942133 [details]
Bug 1428076 - Fix bad JSON format when profiling workers that do nothing

(some bad comments, pushing a new version soon)
Attachment #8942133 - Flags: review?(mstange)
Attached file testcase-worker.html
I reproduce the error 100% with this file.
I use a data-url as worker source so that it's self-contained. But the code doesn't even have the time to run as I call "terminate" right after creating it.
Comment on attachment 8942133 [details]
Bug 1428076 - Fix bad JSON format when profiling workers that do nothing

https://reviewboard.mozilla.org/r/212392/#review219410

::: tools/profiler/core/ProfileBufferEntry.cpp:703
(Diff revision 3)
>  //     DynamicStringFragment("aaa44d75")
>  //     DynamicStringFragment("a800.bun")
>  //     DynamicStringFragment("dle.js:2")
>  //     DynamicStringFragment("5)")
>  //
> -void
> +/* This method returns true if it wrote anything to the writer */

//

::: tools/profiler/core/ProfileBufferEntry.cpp:916
(Diff revision 3)
>  
> +  return wroteSomething;
>    #undef ERROR_AND_CONTINUE
>  }
>  
> -void
> +/* returns true if it wrote anything to the writer */

Please use // and start with a capital letter and end with a full stop.

::: tools/profiler/core/ProfileBufferEntry.cpp:927
(Diff revision 3)
>                                     UniqueStacks& aUniqueStacks) const
>  {
>    EntryGetter e(*this);
>  
>    int currentThreadID = -1;
> +  bool wroteSomething = false;

Let's call this haveMarkers

::: tools/profiler/core/ThreadInfo.cpp:262
(Diff revision 3)
>    mUniqueStacks.emplace(mContext);
>  
>    {
>      SpliceableChunkedJSONWriter b;
>      b.StartBareList();
> +    bool wrote = false;

call this haveSamples

::: tools/profiler/core/ThreadInfo.cpp:269
(Diff revision 3)
> +      if (mSavedStreamedSamples) {
> +        b.Splice(mSavedStreamedSamples.get());
> +        wrote = true;
> +      }
> +
> +      bool hadNewSamples =

no need for this variable, I think you can just || the return value into haveSamples directly

::: tools/profiler/core/ThreadInfo.cpp:277
(Diff revision 3)
> +    // https://bugzilla.mozilla.org/show_bug.cgi?id=1428076
> +    // Because it may happen than some threads output no samples at all, we
> +    // need to prevent an invalid JSON by saving something only when needed.

// If we don't have any data, keep mSavedStreamSamples set to null. That // way we won't try to splice it into the JSON later on, which would
// result in invalid region due to stray commas.
Attachment #8942133 - Flags: review?(mstange)
s/invalid region/invalid JSON/
Comment on attachment 8945071 [details]
Bug 1428076 - Add a mochitest

Removing the r? request until try runs. I think I'll have to disable the debug until bug 1428072 is fixed.
Attachment #8945071 - Flags: review?(mstange)
Here is a try on central with only this mochitest to show it permafails: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d3b625273f1429fee05e19999ad74a4b0909b681

Initially I had a shorter timeout before stopping the profiler, but it wasn't permafailing in that case.

Here is a try with the patch + this mochitest: https://treeherder.mozilla.org/#/jobs?repo=try&revision=39301760962d934ad3fe0bf9fdb71fa1c6bbfc71

I'm still waiting for the "debug" runs to know if I should disable the test in debug builds until bug 1428072 is fixed.
The test seems to run on a debug build. So I think we can leave it enabled and I can figure it out later in a test for bug 1428072.
Attachment #8945071 - Flags: review?(mstange)
Now I scheduled a new more complete try.
Comment on attachment 8942133 [details]
Bug 1428076 - Fix bad JSON format when profiling workers that do nothing

https://reviewboard.mozilla.org/r/212392/#review221094
Attachment #8942133 - Flags: review?(mstange) → review+
Comment on attachment 8945071 [details]
Bug 1428076 - Add a mochitest

https://reviewboard.mozilla.org/r/215286/#review221098
Attachment #8945071 - Flags: review?(mstange) → review+
Pushed by jwajsberg@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fcf73947d244
Fix bad JSON format when profiling workers that do nothing r=mstange
https://hg.mozilla.org/integration/autoland/rev/167ef8dc0c60
Add a mochitest r=mstange
https://hg.mozilla.org/mozilla-central/rev/fcf73947d244
https://hg.mozilla.org/mozilla-central/rev/167ef8dc0c60
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: