Closed Bug 1159003 Opened 10 years ago Closed 7 years ago

The buffer for user timing api should not have an upper limitation

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox56 --- fixed

People

(Reporter: hiro, Assigned: baku)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 2 obsolete files)

Current implementation uses an upper bound value for resource timing api. But there is no description about the limitation in spec.
Maybe there should be? Otherwise it seems like an easy way to leak memory....
I thought user should care about the buffer using clearMarks and clearMeasures in case of user timing api.
User Timing is using the same buffer as resource timing [1][2]. Raptor relies on receiving marks from the homescreen in order to relaunch applications, and because of this buffer will not mark application launches after the 144th time (144 launches + 5 homescreen-specific marks makes 149). See bug 1175230. [1] https://dxr.mozilla.org/mozilla-central/source/dom/base/nsPerformance.cpp#877 [2] https://dxr.mozilla.org/mozilla-central/source/dom/base/nsPerformance.h#358
Blocks: 1235445
(In reply to Boris Zbarsky [:bz] (work week until 5/26) (if a patch has no decent message, automatic r-) from comment #1) > Maybe there should be? Otherwise it seems like an easy way to leak > memory.... Authors have all kind of ways to "leak memory" if they want... I don't think it is a convincing reason to set such limitation.
> Authors have all kind of ways to "leak memory" if they want... The difference is between "leak memory if you want, but you kinda have to try" and "leak memory if you use the API the way it's meant to be used".
(In reply to Boris Zbarsky [:bz] (work week until 5/26) (if a patch has no decent message, automatic r-) from comment #5) > The difference is between "leak memory if you want, but you kinda have to > try" and "leak memory if you use the API the way it's meant to be used". What is the way it's meant to be used? Recording forever? In that case, it is not implemented in a way for that in Gecko, because we only record the first several records and stop as far as the buffer is full, which is not quite helpful. Anyway, the spec doesn't agree with you [1], and we have different behavior than other browsers which makes authors have to add workaround for Firefox [2] because of this. And it seems authors think the behavior from the spec (and other browsers) is better. [1] https://github.com/w3c/resource-timing/issues/89#issuecomment-283463782 [2] e.g. https://github.com/tootsuite/mastodon/commit/33d73387793674d955d8ec244ca099ba5a9ef97e
> Recording forever? Measuring things one after another, probably on every user action or off timers, etc. If the intent is that consumers of the API are supposed to do clearMarks and clearMeasures consistently, then my question is: do they actually do this? I understand that our current limit doesn't really make this work very well; this is mostly because the limit is quite low. I don't have a problem with raising it quite a bit, in case that wasn't clear. I also think that the fact that setResourceTimingBufferSize affects this limit is clearly buggy; it shouldn't do that.
And again, if people have really thought this through and the "no limit" behavior is reasonable, I'm OK with that too. I just want people to have really thought this through, not assumed that just because a spec (especially a webperf spec) says something that automatically means the spec is correct.
Is the concern about memory leaks in production or in development? I'd be ok with a solution that increases the default limit (since 150 is really low) assuming we also allowed DevTools to set the limit to Infinity once DevTools was opened. In general some kind of solution that by default works better and then for developers it works as expected with other browsers and yet the limit protects non-developers from poor production code.
The concern would be production. Conditioning this on devtools usage would be a bit weird, since these APIs are meant to be used for telemetry in the wild, when devtools aren't open at all. Again, just to be clear, I have no problem with us removing the limit, as long as we're doing that in a considered way, not just because everyone else is doing it.
Related, Bug 1373086 is making the API faster, making raising the limit easier.
I think we need to remove this limit. We should be concerned about the production mem leaks that can occur without a limit, however when weighing that against webcompat issue where sites won't be getting reliable perf telemetry from Firefox makes me think we need to change. Many sites are already using more than our limit allows, see: https://github.com/w3c/resource-timing/issues/89#issuecomment-278214137 To combat the mem issues that come along with this DevTools can try to ensure people use clearMarks and clearMeasures consistently. I've filed bug 1374775 to handle this. We can also do some better Docs and DevRel outreach but I think the tools are likely the best place to warn people. We may want to add some telemetry to understand the number of marks / measures (if we don't already) users are seeing so we can better quantify the perf / memory targets we need to test for. Also setResourceTimingBufferSize shouldn't be working with User Timings I've filed bug 1374777 for that issue.
Attached patch perfomance.patch (obsolete) — Splinter Review
GetAsISupports() is just because nsISupports is an ambiguous class for PerformanceMainThread. I don't do an extra QI.
Assignee: hikezoe → amarchesini
Attachment #8887626 - Flags: review?(bzbarsky)
Comment on attachment 8887626 [details] [diff] [review] perfomance.patch I guess it's not clear to me why we cast to nsIObserver* to get to nsISupports* instead of casting to Performance*... If there is a reason, please document, else probably better to cast to Performance* both places. >@@ -2001,17 +2001,20 @@ nsGlobalWindow::CleanUp() What ensures that GetPerformance() will not be called after this, or at least not create a performance object? r=me with that ensured.
Attachment #8887626 - Flags: review?(bzbarsky) → review+
Pushed by amarchesini@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/b7d81fea0b33 setResourceTimingBufferSize shouldn't affect user timing, but we should clean user markers if we have memory pressure, r=bz
OK, so the patch that landed actually makes QI and GetAsISupports() return _different_ nsISupports pointers. Why? That seems broken to me and really needs to be fixed. Also, why was the boolean not placed next to the other boolean members to pack better?
Flags: needinfo?(amarchesini)
Flags: needinfo?(amarchesini)
Attachment #8888345 - Flags: review?(bzbarsky)
Comment on attachment 8888345 [details] [diff] [review] performance2.patch r=me
Attachment #8888345 - Flags: review?(bzbarsky) → review+
Depends on: 1383040
Depends on: 1383070
I see a perf win from this bug: == Change summary for alert #8178 (as of July 20 2017 10:57 UTC) == Improvements: 9% quantum_pageload_facebook summary windows10-64 pgo e10s 1,133.46 -> 1,032.58 8% quantum_pageload_facebook summary windows10-64 opt e10s 1,373.98 -> 1,260.21 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=8178
Depends on: 1383553
Bug 1383553 is a bad memory leak and we are near the branch merge day. Can we back this out until the problem can be fixed?
Flags: needinfo?(amarchesini)
Status: RESOLVED → REOPENED
Flags: needinfo?(amarchesini)
Resolution: FIXED → ---
Target Milestone: mozilla56 → ---
(In reply to Ben Kelly [:bkelly] from comment #24) > Bug 1383553 is a bad memory leak and we are near the branch merge day. Can > we back this out until the problem can be fixed? The fix patch has already been reviewed by bz. I was planning to land it today.
Backout cancelled regressions: == Change summary for alert #8290 (as of July 26 2017 01:50 UTC) == Improvements: 8% Images summary windows7-32 opt 6,383,995.28 -> 5,842,480.07 8% Resident Memory summary windows10-64 opt 537,625,077.38 -> 494,286,302.08 8% Resident Memory summary windows7-32 pgo 383,352,045.51 -> 353,332,399.89 8% Resident Memory summary windows7-32 opt 389,532,786.94 -> 359,121,542.35 8% JS summary windows7-32 pgo 107,934,079.55 -> 99,802,179.05 7% JS summary windows10-64 pgo 145,607,084.09 -> 134,697,663.96 7% Resident Memory summary windows10-64 pgo 526,052,940.35 -> 486,648,881.76 7% Explicit Memory summary windows7-32 pgo 284,566,835.14 -> 263,363,442.22 7% Explicit Memory summary windows10-64 opt 361,002,635.23 -> 334,246,145.39 7% JS summary windows10-64 opt 145,598,700.07 -> 135,039,887.35 7% Explicit Memory summary windows10-64 pgo 361,506,278.08 -> 335,797,819.14 7% JS summary windows7-32 opt 108,417,526.32 -> 101,084,010.10 7% Explicit Memory summary windows7-32 opt 282,131,227.70 -> 263,571,893.08 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=8290
(In reply to Andrea Marchesini [:baku] from comment #26) > The fix patch has already been reviewed by bz. I was planning to land it > today. Sorry I didn't see the patch in the other bug. It does look like it still has an open review though. In theory it should be straightforward to reland this with that new patch when its ready? Sorry again.
Attached file performance.patch (obsolete) —
Attachment #8887626 - Attachment is obsolete: true
Attachment #8890513 - Flags: review?(bzbarsky)
Attached patch perfomance.patchSplinter Review
Forgot the patch flag.
Attachment #8890513 - Attachment is obsolete: true
Attachment #8890513 - Flags: review?(bzbarsky)
Attachment #8890515 - Flags: review?(bzbarsky)
Comment on attachment 8890515 [details] [diff] [review] perfomance.patch r=me
Attachment #8890515 - Flags: review?(bzbarsky) → review+
Pushed by amarchesini@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/51e745426b9a setResourceTimingBufferSize shouldn't affect user timing, but we should clean user markers if we have memory pressure, r=bz https://hg.mozilla.org/integration/mozilla-inbound/rev/ea02734a4e8b Remove Performance::GetAsISupports(), r=bz
The backout from comment 25 left us with one regression: == Change summary for alert #8289 (as of July 26 2017 01:50 UTC) == Regressions: 7% quantum_pageload_facebook summary windows10-64 opt e10s stylo 1,291.65 -> 1,382.83 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=8289 Unfortunately, I cannot say if this was expected by the backout or not, as I no longer have data points older than July 22nd.
Hopefully, push from comment 34 will cancel this quantum test regression.
That's extremely unlikely, if the backout didn't affect it. Chances are, it's due to something else. Bobby, do we have a known facebook pageload regression on stylo?
Flags: needinfo?(bobbyholley)
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
(In reply to Boris Zbarsky [:bz] from comment #37) > That's extremely unlikely, if the backout didn't affect it. Chances are, > it's due to something else. Bobby, do we have a known facebook pageload > regression on stylo? I don't think so. I'm not sure of the full context of this bug, but it seems like the stylo and non-stylo builds got slower by about the same amount on Jul 25th? https://treeherder.mozilla.org/perf.html#/graphs?timerange=5184000&series=%5Bmozilla-inbound,7082fd4e57363151313c6215e6bb77b68fe4b775,1,1%5D&series=%5Bmozilla-inbound,56f971bde973a3edee958d7c4d975062f646d3a9,1,1%5D&series=%5Bmozilla-inbound,6ec57e077dec594917859dbe5526923fffbb1ad6,1,1%5D&series=%5Bmozilla-inbound,b8d73ce351c36fb3f136064b85a051b2c393decd,1,1%5D&zoom=1501001507103.5183,1501071544043.9783,594.0296685517724,1847.7610118353546
Flags: needinfo?(bobbyholley)
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: