Incorrect lcov output for xbl sources

RESOLVED FIXED

Status

RESOLVED FIXED
3 years ago
a year ago

People

(Reporter: chmanchester, Assigned: marco)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 1 obsolete attachment)

(Reporter)

Description

3 years ago
When running a directory of browser-chrome mochitests I get strange lcov output for xbl sources. An excerpt from a resulting lcov file:

SF:chrome://global/content/bindings/browser.xml
SF:chrome://browser/content/tabbrowser.xml
FN:816,top-level
FNDA:1,top-level
FNF:1
FNH:1
BRF:0
BRH:0
DA:817,1
LF:1
LH:1
end_of_record
SF:chrome://browser/content/tabbrowser.xml
FN:111,top-level
FNDA:1,top-level
FNF:1
FNH:1
BRF:0
BRH:0
DA:112,1
LF:1
LH:1
end_of_record
SF:chrome://browser/content/tabbrowser.xml
SF:chrome://global/content/bindings/browser.xml
FN:114,top-level
FN:825,top-level
FNDA:1,top-level
FNDA:1,top-level
FNF:2
FNH:2
BRF:0
BRH:0
DA:115,1
LF:1
LH:1
end_of_record


The data for tabbrowser.xml/browser.xml is split up and intermingled. This could have something to do with the inheritance hierarchy between them, but I'm not really sure what's going on here.
Xbl is a weird beast.  From what I recall it uses XDR decode function to inject code.  Thus, I guess any function which is being loaded this way will appear with its own source, and its own script, making a lot of duplicates.

Unfortunately I don't think there is much we can do from the JS engine point of view, as it associate sources by their objects and not by their names.  Doing it by names would imply that we would have to merge data instead of collecting them, while this can be done off-line.

For the function names, I guess I should investigate if we have ways to recover the proper function name.
(Reporter)

Comment 2

3 years ago
Thank you for the info. I'm ok with merging records after the fact (I have patches for bug 1224691 that will make these kinds of transformations straightforward). Recovering the proper function name would be great.

I'm still not sure how to handle the multiple 'SF:' entries. I guess we'd find the actual code in one or the other of them.
Looking at the code which produces these SF statements, it looks like we have some concurrency issues.
At least concurrent write is the only explanation I can find to explain why we would have 2 SF entries following each others knowing that the same function produce the SF entry as well as the end_of_record.

This makes me wonder if if we have more than one JSRuntime per process and per thread, as currently the file which is produced assume that we only have a single one.
(Thanks bz)

The problem is not a concurrency issue, but the fact that we use the ScriptSourceObject pointer as a key to associate the scripts with their sources.

What I did not planned was that the ScriptSourceObject got reclaimed and reused for another ScriptSourceObject at the same address.  Which explains why we see 2 SF:… entries, and sometimes 2 FNDA:1,top-level.

I guess we could solve this issue by assuming that the ScriptSourceObject can only be garbage collected if all the scripts which are holding a reference to it are also garbage collected, thus, at the end of the sweeping cycle, we should "finalize" all LCovSource which have a registered  ScriptSourceObject.
Flags: needinfo?(nicolas.b.pierron)
Created attachment 8709132 [details] [diff] [review]
Code Coverage: Reset ScriptSourceObject pointers at the end of the finalize phase.

Code Coverage: Reset ScriptSourceObject pointers at the end of the finalize phase.

The problem is described in comment 4.

Due to the asynchronous nature of the finalize phase, we cannot assume that
all JSScript are finalized before the ScriptSourceObject, but (I guess) we
can assume that the ScriptSourceObject would only be finalized iff all the
JSScript which are holding a reference to it would at least be finalized by
the end of the sweeping phase.

This patch does 2 things, it moves the JSObject* weak pointer to the
LCovCompartment class and rename the LCovSourceVector to LCovSourceMap.

The second thing, is that it link the LCovCompartment with each others in a
list held alive by the LCovRuntime.  When we finishined the finialization,
and before we delete the JSCompartment, we reset all the JSObject weak
pointers which are corresponding to finalized ScriptSourceObject.

This way, we would be able to collect all JSScript and the
ScriptSourceObject in the LCovSource without mixing different sources in the
same LCovSource.
Attachment #8709132 - Flags: review?(terrence)
Comment on attachment 8709132 [details] [diff] [review]
Code Coverage: Reset ScriptSourceObject pointers at the end of the finalize phase.

Review of attachment 8709132 [details] [diff] [review]:
-----------------------------------------------------------------

I was able to reproduce the bug and the patch did not seem to remove the two 'SF' entries or the two 'FN' entries. I'll try to attach the two files that I am looking at and
try this again with a fresh build later, just in case.

I used |mach mochitest --setenv JS_CODE_COVERAGE_OUTPUT_DIR=<path/to/created/directory> browser/base/content/test/general|
to find the bug (within a reasonable amount of time) and test the patch. I found the bug in the largest file, before the patch it's at around line ~80,500 and after it is at line ~101,500.
Created attachment 8709569 [details]
Before applying the patch.
Created attachment 8709572 [details]
After applying the patch.
Greg, This is great to see that you are able to reproduce this.  Just to confirm, did you apply the patch, rebuild, then rerun the test?
Flags: needinfo?(gmierz2)
For your information, here is a simple command which will extract lcov entries which have more than one SF entry per record:

 $ gawk '/^SF:/ { numSF++; } { summary = summary "\n" $0; } /^end_of_record/ { if(numSF != 1) print summary; numSF=0; summary=""; }'

Also note that debug builds should have an assertion which should catch such cases.

I am also stun by the fact that we have one ScriptSourceObject for each function.  This sounds like a big waste of resources.
(In reply to Nicolas B. Pierron [:nbp] from comment #10)
> I am also stun by the fact that we have one ScriptSourceObject for each
> function.  This sounds like a big waste of resources.

I think there's a single ScriptSourceObject per compilation unit (file, eval/Function string, etc).
(In reply to Joel Maher (:jmaher) from comment #9)
> Greg, This is great to see that you are able to reproduce this.  Just to
> confirm, did you apply the patch, rebuild, then rerun the test?

I applied the patch but did not rebuild after, sorry. I'm going to rebuild it today and try again, hopefully we won't have duplicates after that.

And thank you for the command Nicolas, it'll definitely save me some time!
Flags: needinfo?(gmierz2)
Created attachment 8710103 [details]
After applying the patch (and rebuilding).

This file shows no duplicates anymore. I ran the same tests with the patch applied and did not find any for 'SF:' or 'FN:'. I'd say that this patch solved the problem. I am hoping that someone else could run the test with the patch and see if this is correct.
Attachment #8709572 - Attachment is obsolete: true
(clear n? based on comment 13)
Flags: needinfo?(nicolas.b.pierron)
Attachment #8709132 - Attachment description: sc# Attachment to Bug 1227735 - Incorrect lcov output for xbl sources → Code Coverage: Reset ScriptSourceObject pointers at the end of the finalize phase.
I see that we have other bugs which are dealing with using the Debugger API to do the code coverage analysis.  Do we intent to use this method to profile the start-up or workers?

I do not honestly know all the details but I recall that people had issues debugging the JS code of the start-up with the Debugger, maybe the devtools developers might help to figure out if this is mandatory.
Flags: needinfo?(cmanchester)
(Reporter)

Comment 16

3 years ago
It would be good to pin down the exact limitations for working with the JS debugger, but I think we're willing to work within the two you mentioned for now. The debugger is helpful for our experiments now, because we can see the difference between coverage at various points to get per-test coverage, and because we can choose what granularity of data to output while we're determining what's realistic to store.
Flags: needinfo?(cmanchester)
Comment on attachment 8709132 [details] [diff] [review]
Code Coverage: Reset ScriptSourceObject pointers at the end of the finalize phase.

Review of attachment 8709132 [details] [diff] [review]:
-----------------------------------------------------------------

Actually looking closer at all this code, you seem to be re-creating a second GC on top of our first GC in order to get everything printed in the right order. I think what we really want here is for the LCov entries to themselves be GC things held by the script & source objects and vice-versa. That way we are guaranteed that the whole graph will get collected at once and we can easily set the finalizers up to run in the correct order. Talk to Shu, Jon, or myself if you need advice on how to achieve this.

::: js/src/jsgc.cpp
@@ +5563,5 @@
> +     * as these pointers might be reused, we have to neuter them at the end of
> +     * the finalize phase, and before the sweeping of the compartment.
> +     */
> +    if (rt->lcovOutput.isEnabled())
> +        rt->lcovOutput.neuterScriptObjectSourcePointers();

Actually, you almost certainly want this to be under beginSweepingZoneGroup: right after we have discovered the full reachability graph and /before/ we actually do any sweeping. Otherwise a barrier fired when manipulating this table could revivify one of its pointers after it's swept and give us a UAF.

Secondly, every table should be assorted per-zone: otherwise we have to visit the full table for every zone group, which adds a tremendous amount of overhead. See how we sweep the watchpoint map for an example. I think this should actually go right next to that.

If you do assort it by Zone, you can just stuff the structure in a JS::WeakCache and not need any code directly in the GC at all, *and* have the table swept in parallel with other tables.

::: js/src/vm/CodeCoverage.cpp
@@ +416,5 @@
> +    // The ScriptSourceObject is being finalized, which implies that all the
> +    // JSScript from this source would be finalized at the end of the sweeping
> +    // phase. Thus, at the end of of the sweeping phase, we will neuter the
> +    // source pointer so that we do not mix different code coverage reports if
> +    // the ScriptSourceObject is reused for a new source.

This function should additionally, at a minimum:
1) assert IsAboutToBeFinalized(sso)
2) assert IsAboutToBeFinalized( all scripts from sso )
3) take a FreeOp* so that is clearly only called from finalizers
Attachment #8709132 - Flags: review?(terrence) → review-
(In reply to Terrence Cole [:terrence] from comment #17)
> Actually looking closer at all this code, you seem to be re-creating a
> second GC on top of our first GC in order to get everything printed in the
> right order.

I do not think so, I am just aggregating data, indexed by the value of the ScriptSourceObject.

The problem of this bug is that we were aggregating data from multiple sources because the ScriptSourceObject got re-used for another ScriptSourceObject.  This patch is hooks into the current GC to reset the ScriptSourceObject pointer-value that we use as an index, when the sweeping is complete, such that we do not aggregate multiple sources info after a free of the ScriptSourceObject.

> I think what we really want here is for the LCov entries to
> themselves be GC things held by the script & source objects and vice-versa.
> That way we are guaranteed that the whole graph will get collected at once
> and we can easily set the finalizers up to run in the correct order. Talk to
> Shu, Jon, or myself if you need advice on how to achieve this.

I am not completely sure, as currently the LCov structures outlive the GC, and the Runtime owns these LCov structures while they are gathering data through the finalizers.

On the other hand, I guess we could have a small LCovFinalizer structure which holds the JSScript pointers and ScriptSourceObject, such that we can fill the current runtime LCov data, and have a single point of entry before any other finalizers.

> ::: js/src/vm/CodeCoverage.cpp
> @@ +416,5 @@
> > +    // The ScriptSourceObject is being finalized, which implies that all the
> > +    // JSScript from this source would be finalized at the end of the sweeping
> > +    // phase. Thus, at the end of of the sweeping phase, we will neuter the
> > +    // source pointer so that we do not mix different code coverage reports if
> > +    // the ScriptSourceObject is reused for a new source.
> 
> This function should additionally, at a minimum:
> 1) assert IsAboutToBeFinalized(sso)
> 2) assert IsAboutToBeFinalized( all scripts from sso )
> 3) take a FreeOp* so that is clearly only called from finalizers

The testing infrastructure which is already in place prevents us from doing so:
http://searchfox.org/mozilla-central/rev/591354c5e5521cf845bf6b6ef44e8b3b0aeda17d/js/src/jsopcode.cpp#2067

But I can move that in its caller.

Thanks for the review.
Flags: needinfo?(nicolas.b.pierron)
I'm seeing weird information in XUL files (e.g. https://codecov.io/gh/marco-c/gecko-dev/src/master/browser/base/content/browser.xul). Is it the same problem?
(In reply to Marco Castelluccio [:marco] from comment #19)
> I'm seeing weird information in XUL files (e.g.
> https://codecov.io/gh/marco-c/gecko-dev/src/master/browser/base/content/
> browser.xul). Is it the same problem?

At first sight, this sounds likely.
Flags: needinfo?(nicolas.b.pierron)
Flags: needinfo?(jcoppeard)
It may be that I'm misunderstanding some of what's going on here. But it sounds to me like this coverage table is keying off a non-unique pointer value, and while you *could* detect when the pointer value changes, it seems like it would be a lot easier if you didn't use that as a key.

The most straightforward option would be to add an autoincrementing unique id to SSO, storing it in a slot. But it looks like it has 4 slots right now, so I guess that would probably bump it up to the next higher size class? That would increase memory usage even if coverage was not enabled, which doesn't sound great.

The next easier thing would be to use the existing unique ID mechanism: zone->getUniqueId(sso, &id). (And zone->getHashCode(sso, &hash) if you need a hash value from it.) That'll shift the memory usage to the unique id table, and slow down sweeping proportional to the number of SSOs, but the sweep is already done in parallel so I'd guess it should be fine.

Both of those are a heck of a lot easier than making a new GC thing or carefully orchestrating finalization. I don't think we've ever gotten finalization interdependencies right the first time.

Clearing ni? for jonco, though of course he's free to disagree or add something. It may be that you need the finalization trigger to know when to output results or something, so what I'm saying here is wrong? I didn't look closely at the existing code.
Flags: needinfo?(jcoppeard)
I tried to call getUniqueId in the SSO finalizer (the SSO class has the JSCLASS_FOREGROUND_FINALIZE flag), but it leads to a crash in a JS helper thread:

#0  0x00007fffe2d44bd7 in js::gc::IsInsideNursery(js::gc::Cell const*) (cell=0xfffe4b4b4b4b4b4b)
    at /home/marco/Documenti/FD/mozilla-unified/obj-x86_64-pc-linux-gnu/dist/include/js/HeapAPI.h:387
#1  0x00007fffe2d44bd7 in js::gc::Cell::isTenured() const (this=0xfffe4b4b4b4b4b4b) at /home/marco/Documenti/FD/mozilla-unified/js/src/gc/Heap.h:251
#2  0x00007fffe2d44bd7 in js::gc::TenuredCell::arena() const (this=0xfffe4b4b4b4b4b4b) at /home/marco/Documenti/FD/mozilla-unified/js/src/gc/Heap.h:1288
#3  0x00007fffe3b5dc03 in js::gc::TenuredCell::zoneFromAnyThread() const (this=<optimized out>) at /home/marco/Documenti/FD/mozilla-unified/js/src/gc/Heap.h:1317
#4  0x00007fffe3b5dc03 in JSObject::zoneFromAnyThread() const (this=<optimized out>) at /home/marco/Documenti/FD/mozilla-unified/js/src/jsobj.h:314
#5  0x00007fffe3b5dc03 in CheckIsMarkedThing<JSObject*>(JSObject**) (thingp=thingp@entry=0x7fffad402b88)
    at /home/marco/Documenti/FD/mozilla-unified/js/src/gc/Marking.cpp:3095
#6  0x00007fffe3b5dced in IsAboutToBeFinalizedInternal<JSObject>(JSObject**) (thingp=thingp@entry=0x7fffad402b88)
    at /home/marco/Documenti/FD/mozilla-unified/js/src/gc/Marking.cpp:3177
#7  0x00007fffe3b742c1 in js::gc::IsAboutToBeFinalizedUnbarriered<JSObject*>(JSObject**) (thingp=thingp@entry=0x7fffad402b88)
    at /home/marco/Documenti/FD/mozilla-unified/js/src/gc/Marking.cpp:3239
#8  0x00007fffe34d0c16 in IsAboutToBeFinalizedFunctor::operator()<JSObject>(js::gc::Cell**) (this=<synthetic pointer>, t=0x7fffad402b88)
    at /home/marco/Documenti/FD/mozilla-unified/js/src/jsgc.cpp:3449
#9  0x00007fffe34d0c16 in JS::DispatchTraceKindTyped<IsAboutToBeFinalizedFunctor, js::gc::Cell**&>(IsAboutToBeFinalizedFunctor, JS::TraceKind, js::gc::Cell**&) (f=..., traceKind=<optimized out>, args#0=@0x7fffd0bf4de8: 0x7fffad402b88) at /home/marco/Documenti/FD/mozilla-unified/obj-x86_64-pc-linux-gnu/dist/include/js/TraceKind.h:192
#10 0x00007fffe34ea646 in js::gc::UniqueIdGCPolicy::needsSweep(js::gc::Cell**, unsigned long*) (cell=<optimized out>)
    at /home/marco/Documenti/FD/mozilla-unified/js/src/jsgc.cpp:3460
#11 0x00007fffe34ea646 in JS::GCHashMap<js::gc::Cell*, unsigned long, js::PointerHasher<js::gc::Cell*, 3ul>, js::SystemAllocPolicy, js::gc::UniqueIdGCPolicy>::sweep() (this=this@entry=0x7fffcfeea048) at /home/marco/Documenti/FD/mozilla-unified/obj-x86_64-pc-linux-gnu/dist/include/js/GCHashTable.h:81
#12 0x00007fffe348bdb9 in JS::Zone::sweepUniqueIds(js::FreeOp*) (fop=0x7fffd0bf4e90, this=<optimized out>)
    at /home/marco/Documenti/FD/mozilla-unified/js/src/jsgc.cpp:3466
#13 0x00007fffe348bdb9 in SweepUniqueIds(JSRuntime*) (runtime=<optimized out>) at /home/marco/Documenti/FD/mozilla-unified/js/src/jsgc.cpp:5120
#14 0x00007fffe34c9fa6 in js::gc::AutoRunParallelTask::run() (this=<optimized out>) at /home/marco/Documenti/FD/mozilla-unified/js/src/jsgc.cpp:3650
#15 0x00007fffe3701c11 in js::GCParallelTask::runFromHelperThread(js::AutoLockHelperThreadState&) (this=0x7fffffffb830, locked=...)
    at /home/marco/Documenti/FD/mozilla-unified/js/src/vm/HelperThreads.cpp:1316
#16 0x00007fffe372597a in js::HelperThread::handleGCParallelWorkload(js::AutoLockHelperThreadState&) (this=this@entry=0x7fffd6abfe00, locked=...)
    at /home/marco/Documenti/FD/mozilla-unified/js/src/vm/HelperThreads.cpp:1348
#17 0x00007fffe37303fd in js::HelperThread::threadLoop() (this=this@entry=0x7fffd6abfe00) at /home/marco/Documenti/FD/mozilla-unified/js/src/vm/HelperThreads.cpp:2137
#18 0x00007fffe37306b0 in js::HelperThread::ThreadMain(void*) (arg=0x7fffd6abfe00) at /home/marco/Documenti/FD/mozilla-unified/js/src/vm/HelperThreads.cpp:1640
#19 0x00007fffe373d6ca in js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::callMain<0ul>(mozilla::IndexSequence<0ul>) (this=0x7fffd69ff0f0)
    at /home/marco/Documenti/FD/mozilla-unified/js/src/threading/Thread.h:234
#20 0x00007fffe373d6ca in js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::Start(void*) (aPack=0x7fffd69ff0f0)
    at /home/marco/Documenti/FD/mozilla-unified/js/src/threading/Thread.h:227
#21 0x00007ffff7bc06da in start_thread (arg=0x7fffd0bf6700) at pthread_create.c:456
#22 0x00007ffff6e4ed7f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
This should be fixed by bug 1229277. I will verify once bug 1229277 lands.
Assignee: nobody → mcastelluccio
Status: NEW → ASSIGNED
I've locally verified that this is fixed after bug 1229277's landing.
Status: ASSIGNED → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.