Closed
Bug 1428076
Opened 7 years ago
Closed 7 years ago
The gecko profiler produces an invalid JSON when a thread ends without emitting any marker and/or sample
Categories
(Core :: Gecko Profiler, defect)
Core
Gecko Profiler
Tracking
()
RESOLVED
FIXED
mozilla60
People
(Reporter: julienw, Assigned: julienw)
References
Details
Attachments
(4 files)
STR:
0. Requirements: have the profiler addon installed.
1. Configure the profile to capture workers (add ,Worker in the list of threads to profile)
2. Start the profiler.
3. Open the debugger
4. Close the devtools
5. Capture
Comment hidden (mozreview-request) |
Assignee | ||
Comment 2•7 years ago
|
||
Here is an example.
You can see errors with markers data at line 6696:
"markers": {
"schema": {"name": 0, "time": 1, "data": 2},
"data": [
,
[
8,
...
or line 7569:
"samples": {
"schema": {"stack": 0, "time": 1, "responsiveness": 2, "rss": 3, "uss": 4},
"data": [
,
[11, 34995.826937],
[14, 34995.900643],
[26, 34996.907289],
[32, 34997.905775],
[52, 34998.905071]
]
},
So errors (the empty space before the comma) are both with markers and data.
Assignee | ||
Comment 3•7 years ago
|
||
Comment on attachment 8942133 [details]
Bug 1428076 - Fix bad JSON format when profiling workers that do nothing
Hey Markus,
This fixes the issue for me, but I have some questions :)
1. I'm not sure of when "FlushSamplesAndMarkers" is really called. I looked at all its call sites but it's still blurry to me. Especially what are the cases where this is called but the worker is still running ? This would be interesting to be able to reproduce with a simpler worker than the Debugger, and write a test.
2. I wonder if I can use ChunkedJSONWriteFunc::IsEmpty [1] to replace my "wroteSomething" booleans but I'm not sure if this is really equivalent.
[1] https://searchfox.org/mozilla-central/rev/7fb999d1d39418fd331284fab909df076b967ac6/tools/profiler/core/ProfileJSONWriter.h#32
I know I left some "printf", this is for you to try it. Here is my fastest STR. Your gecko profiler needs to be configured to capture only the threads "Worker" (you can remove everything).
1. Launch firefox.
2. Ctrl shift 1 => start the profiler.
3. ctrl shift s => start the debugger.
4. ctrl shift s => stop the debugger.
=> you'll see the printf logs in the console
5. ctrl shift 2 => capture.
Thanks
Attachment #8942133 -
Flags: feedback?(mstange)
Comment 4•7 years ago
|
||
The lifetime of a worker thread is as follows:
- Thread starts up
- Thread runs a given worker script: (this can happen multiple times during the lifetime of the OS thread)
- Thread is registered with the profiler
- Thread tells the profiler about its JSContext
- Thread runs until the worker is done.
- Thread tells the profiler to clear its JSContext
- Thread unregisters itself from the profiler
- Thread shuts down
FlushSamplesAndMarkers is called when the JSContext is cleared but before the thread is unregistered from the profiler.
Here's a profile I captured with your patch applied: https://perfht.ml/2r93ozD
The 27ms gap in the selected thread is probably the point at which this thread's existing data gets serialized to mSavedStreamedSamples and mSavedStreamedMarkers.
However, you can see that none of the other threads have any data for the duration before the selected thread's gap. This is most likely due to bug 1429904: We accidentally discard data from all threads instead of just the flushed thread.
Then, when one of the other threads clears its JS context, we want to store its data inside its mSavedStreamedSamples and mSavedStreamedMarkers, but because that thread's data has accidentally been flushed, it doesn't find any data and mSavedStreamedSamples and mSavedStreamedMarkers are empty.
Comment 5•7 years ago
|
||
(In reply to Julien Wajsberg [:julienw] from comment #3)
> 2. I wonder if I can use ChunkedJSONWriteFunc::IsEmpty [1] to replace my
> "wroteSomething" booleans but I'm not sure if this is really equivalent.
It doesn't seem equivalent. ChunkedJSONWriteFunc::IsEmpty looks like it's currently unused, and I think it can never return false because AllocChunk is called in the constructor, and AllocChunk sets mChunkPtr to something non-null.
Comment 6•7 years ago
|
||
mozreview-review |
Comment on attachment 8942133 [details]
Bug 1428076 - Fix bad JSON format when profiling workers that do nothing
https://reviewboard.mozilla.org/r/212392/#review218682
::: tools/profiler/core/ThreadInfo.cpp:271
(Diff revision 1)
> - mContext, *mUniqueStacks);
> + mContext, *mUniqueStacks);
> }
> + printf(">>>>>>>>>>> Samples: wrote is %s for thread %d (%s)\n", wrote ? "true" : "false", mThreadId, Name());
> b.EndBareList();
> + if (wrote) {
> - mSavedStreamedSamples = b.WriteFunc()->CopyData();
> + mSavedStreamedSamples = b.WriteFunc()->CopyData();
I think we should set it to null explicitly when nothing was written, in order to clear out any old data.
So something like:
mSavedStreamedSamples = haveSamples ? b.WriteFunc()->CopyData() : nullptr;
::: tools/profiler/core/ThreadInfo.cpp:289
(Diff revision 1)
> b.EndBareList();
> +
> + // https://bugzilla.mozilla.org/show_bug.cgi?id=1428076
> + // Because it happens than some threads output no markers at all, we
> + if (wrote) {
> - mSavedStreamedMarkers = b.WriteFunc()->CopyData();
> + mSavedStreamedMarkers = b.WriteFunc()->CopyData();
Same comment here.
Assignee | ||
Updated•7 years ago
|
Summary: The gecko profiler produces an invalid JSON when a thread ends without emitting any marker → The gecko profiler produces an invalid JSON when a thread ends without emitting any marker and/or sample
Assignee | ||
Comment 7•7 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #4)
> The lifetime of a worker thread is as follows:
> - Thread starts up
> - Thread runs a given worker script: (this can happen multiple times during
> the lifetime of the OS thread)
> - Thread is registered with the profiler
> - Thread tells the profiler about its JSContext
> - Thread runs until the worker is done.
> - Thread tells the profiler to clear its JSContext
> - Thread unregisters itself from the profiler
> - Thread shuts down
>
> FlushSamplesAndMarkers is called when the JSContext is cleared but before
> the thread is unregistered from the profiler.
I see it's used:
* https://searchfox.org/mozilla-central/source/dom/workers/RuntimeService.cpp#2927
This is what you explain above.
* https://searchfox.org/mozilla-central/source/js/xpconnect/src/XPCJSContext.cpp#911
Not sure.
* https://searchfox.org/mozilla-central/source/xpcom/build/XPCOMInit.cpp#1000
I think it's at shutdown.
>
> Here's a profile I captured with your patch applied:
> https://perfht.ml/2r93ozD
=> I can't load it as it uses gecko profile version v10, which doesn't seem to be parsed by perf-html.io.
> The 27ms gap in the selected thread is probably the point at which this
> thread's existing data gets serialized to mSavedStreamedSamples and
> mSavedStreamedMarkers.
> However, you can see that none of the other threads have any data for the
> duration before the selected thread's gap. This is most likely due to bug
> 1429904: We accidentally discard data from all threads instead of just the
> flushed thread.
Ouch, I can look into this other bug too.
I was also wondering what happens if we call Flush several times for one thread (say: we restart a worker several times over the course of a capture). I think we lose the first samples and markers.
> Then, when one of the other threads clears its JS context, we want to store
> its data inside its mSavedStreamedSamples and mSavedStreamedMarkers, but
> because that thread's data has accidentally been flushed, it doesn't find
> any data and mSavedStreamedSamples and mSavedStreamedMarkers are empty.
(In reply to Markus Stange [:mstange] from comment #5)
> (In reply to Julien Wajsberg [:julienw] from comment #3)
> > 2. I wonder if I can use ChunkedJSONWriteFunc::IsEmpty [1] to replace my
> > "wroteSomething" booleans but I'm not sure if this is really equivalent.
>
> It doesn't seem equivalent. ChunkedJSONWriteFunc::IsEmpty looks like it's
> currently unused, and I think it can never return false because AllocChunk
> is called in the constructor, and AllocChunk sets mChunkPtr to something
> non-null.
OK thanks! Now I see what you say :)
Comment hidden (mozreview-request) |
Assignee | ||
Comment 9•7 years ago
|
||
Comment on attachment 8942133 [details]
Bug 1428076 - Fix bad JSON format when profiling workers that do nothing
(some bad comments, pushing a new version soon)
Attachment #8942133 -
Flags: review?(mstange)
Comment hidden (mozreview-request) |
Assignee | ||
Comment 11•7 years ago
|
||
I reproduce the error 100% with this file.
I use a data-url as worker source so that it's self-contained. But the code doesn't even have the time to run as I call "terminate" right after creating it.
Comment 12•7 years ago
|
||
mozreview-review |
Comment on attachment 8942133 [details]
Bug 1428076 - Fix bad JSON format when profiling workers that do nothing
https://reviewboard.mozilla.org/r/212392/#review219410
::: tools/profiler/core/ProfileBufferEntry.cpp:703
(Diff revision 3)
> // DynamicStringFragment("aaa44d75")
> // DynamicStringFragment("a800.bun")
> // DynamicStringFragment("dle.js:2")
> // DynamicStringFragment("5)")
> //
> -void
> +/* This method returns true if it wrote anything to the writer */
//
::: tools/profiler/core/ProfileBufferEntry.cpp:916
(Diff revision 3)
>
> + return wroteSomething;
> #undef ERROR_AND_CONTINUE
> }
>
> -void
> +/* returns true if it wrote anything to the writer */
Please use // and start with a capital letter and end with a full stop.
::: tools/profiler/core/ProfileBufferEntry.cpp:927
(Diff revision 3)
> UniqueStacks& aUniqueStacks) const
> {
> EntryGetter e(*this);
>
> int currentThreadID = -1;
> + bool wroteSomething = false;
Let's call this haveMarkers
::: tools/profiler/core/ThreadInfo.cpp:262
(Diff revision 3)
> mUniqueStacks.emplace(mContext);
>
> {
> SpliceableChunkedJSONWriter b;
> b.StartBareList();
> + bool wrote = false;
call this haveSamples
::: tools/profiler/core/ThreadInfo.cpp:269
(Diff revision 3)
> + if (mSavedStreamedSamples) {
> + b.Splice(mSavedStreamedSamples.get());
> + wrote = true;
> + }
> +
> + bool hadNewSamples =
no need for this variable, I think you can just || the return value into haveSamples directly
::: tools/profiler/core/ThreadInfo.cpp:277
(Diff revision 3)
> + // https://bugzilla.mozilla.org/show_bug.cgi?id=1428076
> + // Because it may happen than some threads output no samples at all, we
> + // need to prevent an invalid JSON by saving something only when needed.
// If we don't have any data, keep mSavedStreamSamples set to null. That // way we won't try to splice it into the JSON later on, which would
// result in invalid region due to stray commas.
Attachment #8942133 -
Flags: review?(mstange)
Comment 13•7 years ago
|
||
s/invalid region/invalid JSON/
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 16•7 years ago
|
||
Comment on attachment 8945071 [details]
Bug 1428076 - Add a mochitest
Removing the r? request until try runs. I think I'll have to disable the debug until bug 1428072 is fixed.
Attachment #8945071 -
Flags: review?(mstange)
Assignee | ||
Comment 17•7 years ago
|
||
Here is a try on central with only this mochitest to show it permafails: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d3b625273f1429fee05e19999ad74a4b0909b681
Initially I had a shorter timeout before stopping the profiler, but it wasn't permafailing in that case.
Here is a try with the patch + this mochitest: https://treeherder.mozilla.org/#/jobs?repo=try&revision=39301760962d934ad3fe0bf9fdb71fa1c6bbfc71
I'm still waiting for the "debug" runs to know if I should disable the test in debug builds until bug 1428072 is fixed.
Assignee | ||
Comment 18•7 years ago
|
||
The test seems to run on a debug build. So I think we can leave it enabled and I can figure it out later in a test for bug 1428072.
Assignee | ||
Updated•7 years ago
|
Attachment #8945071 -
Flags: review?(mstange)
Comment hidden (mozreview-request) |
Assignee | ||
Comment 20•7 years ago
|
||
Now I scheduled a new more complete try.
Comment 21•7 years ago
|
||
mozreview-review |
Comment on attachment 8942133 [details]
Bug 1428076 - Fix bad JSON format when profiling workers that do nothing
https://reviewboard.mozilla.org/r/212392/#review221094
Attachment #8942133 -
Flags: review?(mstange) → review+
Comment 22•7 years ago
|
||
mozreview-review |
Comment on attachment 8945071 [details]
Bug 1428076 - Add a mochitest
https://reviewboard.mozilla.org/r/215286/#review221098
Attachment #8945071 -
Flags: review?(mstange) → review+
Comment 23•7 years ago
|
||
Pushed by jwajsberg@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fcf73947d244
Fix bad JSON format when profiling workers that do nothing r=mstange
https://hg.mozilla.org/integration/autoland/rev/167ef8dc0c60
Add a mochitest r=mstange
Comment 24•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/fcf73947d244
https://hg.mozilla.org/mozilla-central/rev/167ef8dc0c60
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox60:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Updated•7 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•