Closed Bug 928476 Opened 6 years ago Closed 6 years ago

Do some telemetry to figure out how frequently UAs do cross-global adopts

Categories

(Core :: DOM: Core & HTML, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla27

People

(Reporter: bholley, Assigned: bholley)

References

Details

Attachments

(1 file, 1 obsolete file)

In [1], some of the chromium folks are hypothesizing that cross-global adopts are uncommon enough on the web that we can afford to leak a scope each time they happen. I'm personally skeptical of this, but I think telemetry would be useful here.

I think we basically want to tag a bit on the document whenever one of its nodes is adopted into another global (since these are the documents that we'd leak). We could then get a percentage by sampling telemetry in each Document destructor and reporting 0 or 1 depending on whether the node was adopted.

Does this sound good? It would also be nice to know the raw count of such adopt-involved documents per session, but that might be harder if I understand the telemetry machinery correctly.

[1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=20567
I think rather than per session, you want to have a breakdown of how many distinct domains this affects. E.g. if this happens all the time on google.com the number could end up being very high, but if only google.com is affected it seems like a fixable thing. Not entirely sure how to make that anonymous.
(In reply to Anne (:annevk) from comment #1)
> I think rather than per session, you want to have a breakdown of how many
> distinct domains this affects. E.g. if this happens all the time on
> google.com the number could end up being very high, but if only google.com
> is affected it seems like a fixable thing. Not entirely sure how to make
> that anonymous.

That's not possible, for the privacy reasons you mentioned. And given that this is a very old feature, I'm skeptical that it will be in the sweet spot of widespread impact but extremely narrow scope (so that it would be fixable with evangelism). I think the long tail is really our enemy here.

I'm mostly interested in the question of: "if we leak, how does this impact user session?", which this telemetry should answer. But while Telemetry might tell us "we have to support cross-global adopts", I don't think that it can tell us "we shouldn't support cross-global adopts", since there are a number of factors that would play into the latter.
Attachment #819685 - Flags: review?(mrbkap)
Comment on attachment 819685 [details] [diff] [review]
Add telemetry to measure cross-global adopts. v1

Each time we destroy a compartment, we sample 3 data points:

(1) Whether this compartment adopted a node from another compartment (and thus, without reparents, would likely to be holding that compartment alive).

(2) Whether another compartment adopted a node from this compartment (and thus, without reparents, this compartment is would be leaked).

(3) The number of compartments for which (1) is true that are alive when the sample is taken. This gives us a fuzzy lower bound on the number of compartments that we'd be leaking at the moment without reparents.
Attachment #819685 - Flags: review?(nfroyd)
Attachment #819685 - Flags: feedback?(annevk)
Comment on attachment 819685 [details] [diff] [review]
Add telemetry to measure cross-global adopts. v1

Review of attachment 819685 [details] [diff] [review]:
-----------------------------------------------------------------

f+'ing because everything but COMPARTMENT_LIVING_ADOPTERS looks good; if you can convince me that that measurement has the semantics that you want, then r+.

::: js/xpconnect/src/XPCJSRuntime.cpp
@@ +284,5 @@
>      MOZ_COUNT_DTOR(xpc::CompartmentPrivate);
> +
> +    Telemetry::Accumulate(Telemetry::COMPARTMENT_ADOPTED_NODE, adoptedNode);
> +    Telemetry::Accumulate(Telemetry::COMPARTMENT_DONATED_NODE, donatedNode);
> +    Telemetry::Accumulate(Telemetry::COMPARTMENT_LIVING_ADOPTERS, kLivingAdopters);

This measurement looks a little weird.  Let's say you have 9 compartments live, and they've all adopted nodes.  Then if you do something akin to:

for (i = 0; i < BIG_NUMBER; i++) {
  createAndDestroyCompartmentPrivate();
}

you're going to flood COMPARTMENT_LIVING_ADOPTERS with measurements of 9 (or 10, if your newly created compartments adopt nodes themselves).  Essentially, you're doing a lot of double-counting of live compartments, which seems wrong.  IIUC, you're counting this to measure how much stuff one might be leaking if you ignore nodes adopted across globals.  Is that correct?  Could you just count the number of nodes adopted by a compartment and have a histogram of that instead?
Attachment #819685 - Flags: review?(nfroyd) → feedback+
(In reply to Nathan Froyd (:froydnj) from comment #5)
> you're going to flood COMPARTMENT_LIVING_ADOPTERS with measurements of 9 (or
> 10, if your newly created compartments adopt nodes themselves). 
> Essentially, you're doing a lot of double-counting of live compartments,
> which seems wrong.  IIUC, you're counting this to measure how much stuff one
> might be leaking if you ignore nodes adopted across globals.  Is that
> correct?  Could you just count the number of nodes adopted by a compartment
> and have a histogram of that instead?

No, we want to measure the number of live compartments with adopted nodes in them - the number of nodes isn't relevant. Ideally, we'd sample this over time, but since that seems like a pain, it seems like we can fake the time sampling by just sampling when compartments are destroyed. But that may not be great, because they're generally destroyed in a big bunch. Is there some easy way to just report this value when the telemetry ping is generated? When are they generated? Other suggestions?
Flags: needinfo?(nfroyd)
(In reply to Bobby Holley (:bholley) from comment #6)
> (In reply to Nathan Froyd (:froydnj) from comment #5)
> > you're going to flood COMPARTMENT_LIVING_ADOPTERS with measurements of 9 (or
> > 10, if your newly created compartments adopt nodes themselves). 
> > Essentially, you're doing a lot of double-counting of live compartments,
> > which seems wrong.  IIUC, you're counting this to measure how much stuff one
> > might be leaking if you ignore nodes adopted across globals.  Is that
> > correct?  Could you just count the number of nodes adopted by a compartment
> > and have a histogram of that instead?
> 
> No, we want to measure the number of live compartments with adopted nodes in
> them - the number of nodes isn't relevant. Ideally, we'd sample this over
> time, but since that seems like a pain, it seems like we can fake the time
> sampling by just sampling when compartments are destroyed. But that may not
> be great, because they're generally destroyed in a big bunch. Is there some
> easy way to just report this value when the telemetry ping is generated?
> When are they generated? Other suggestions?

The gather-telemetry notification is sent before we send off pings.  You could record kLivingAdopters there, though every ~24hrs might be a bit coarse for your measurements.

Set a 1hr repeating timer to record kLivingAdopters when it goes off? ;)
Flags: needinfo?(nfroyd)
Ok, let's do it once per GC.
Attachment #819746 - Flags: review?(mrbkap)
Attachment #819746 - Flags: review?(nfroyd)
Attachment #819685 - Attachment is obsolete: true
Attachment #819685 - Flags: review?(mrbkap)
Attachment #819685 - Flags: feedback?(annevk)
Comment on attachment 819746 [details] [diff] [review]
Add telemetry to measure cross-global adopts. v2

Review of attachment 819746 [details] [diff] [review]:
-----------------------------------------------------------------

Seems reasonable enough.

::: toolkit/components/telemetry/Histograms.json
@@ +37,5 @@
> +    "description": "When a compartment is destroyed, we record whether it had ever adopted a node from another compartment"
> +  },
> +  "COMPARTMENT_LIVING_ADOPTERS": {
> +    "kind": "enumerated",
> +    "n_values": 10,

You really expect this to be <= 10?
Attachment #819746 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd (:froydnj) from comment #9)
> You really expect this to be <= 10?

I consider an average greater than 0.1 to be QED that this is too common for us to just leak in this case.
Comment on attachment 819746 [details] [diff] [review]
Add telemetry to measure cross-global adopts. v2

Review of attachment 819746 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/xpconnect/src/XPCJSRuntime.cpp
@@ +677,5 @@
>  XPCJSRuntime::CustomGCCallback(JSGCStatus status)
>  {
> +    // Record kLivingAdopters once per GC to approximate time sampling during
> +    // periods of activity.
> +    if (status == JSGC_BEGIN)

Nit: braces.

::: js/xpconnect/src/xpcprivate.h
@@ +3734,5 @@
>      // enablePrivilege. Once set, this value is never unset (i.e., it doesn't follow
>      // the old scoping rules of enablePrivilege). Using it is inherently unsafe.
>      bool universalXPConnectEnabled;
>  
> +    // for telemetry.

Perhaps either point to this bug or give a short description of what these mean?
Attachment #819746 - Flags: review?(mrbkap) → review+
https://hg.mozilla.org/mozilla-central/rev/a00ba6f64d80
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
Some early results are in. The most useful statistic here is COMPARTMENT_LIVING_ADOPTERS. This is a measure of the number of live globals that have, at some point, adopted at least one node from another compartment. Assuming the node remained in the document and was not subsequently GCed, each of those globals would presumably be holding at least one other global alive without reparenting.

http://telemetry.mozilla.org/#path=nightly/27/COMPARTMENT_LIVING_ADOPTERS

From the submissions we've received so far, it looks like the mean value here is somewhere around 2.5 (75th percentile is ~3, 50th percentils is ~1.5).

These numbers may change somewhat as the telemetry propagates to different testing audiences, but this is an interesting start.
This issue was resolved, so I'm going to remove the telemetry. For posterity, the current data for Firefox 31 (Release) is:

start,>~end,>~count
0,>~1,>~9409167671
1,>~2,>~4534808701
2,>~3,>~2295459915
3,>~4,>~1387008765
4,>~5,>~948870608
5,>~6,>~656986184
6,>~7,>~514614912
7,>~8,>~382024011
8,>~9,>~309483642
9,>~10,>246246309
10,>26.568668215107216,>1661304984
Blocks: 1056332
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.