Closed Bug 631571 Opened 14 years ago Closed 13 years ago

Measure browser responsiveness in Talos

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: ted, Assigned: jmaher)

References

Details

Attachments

(3 files, 3 obsolete files)

have talos support 'responsiveness' test option (WIP) 13 years ago alice nodelman [:alice] [:anode] 4.48 KB, patch		Details \| Diff \| Splinter Review
add tp5_responsiveness to graph server db 13 years ago alice nodelman [:alice] [:anode] 613 bytes, patch	jmaher : review+	Details \| Diff \| Splinter Review
add tp5_responsiveness to graph server db (take 2) 13 years ago alice nodelman [:alice] [:anode] 897 bytes, patch	jmaher : review+	Details \| Diff \| Splinter Review
have talos support 'responsiveness' test option (WIP - take 2) 13 years ago alice nodelman [:alice] [:anode] 17.69 KB, patch		Details \| Diff \| Splinter Review
have talos support 'responsiveness' test option (WIP) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 17.56 KB, patch		Details \| Diff \| Splinter Review
have talos support 'responsiveness' test option (1.0) 13 years ago Joel Maher ( :jmaher ) (UTC -8) 17.67 KB, patch	wlach : review+	Details \| Diff \| Splinter Review

(not currently active) Ted Mielczarek

Reporter

Description

•

14 years ago

We'd like to measure browser responsiveness so we can find spots where we're currently not very responsive, and ensure that we improve the metric. In addition, we want to make sure that large architectural changes like e10s don't harm responsiveness of common tasks.

The scope of this work will be to get some measure of responsiveness out of the browser (an estimate of how long we're taking to service events), and have Talos exercise various functionality while sampling that measurement, and then have this all running on Tinderbox and reporting something useful.

We discussed this work with Blizzard a few months ago, I think Sheila took some notes. I'll have to dig through my email to see if I can find them.

(not currently active) Ted Mielczarek

Reporter

Updated

•

14 years ago

Depends on: 606574

(not currently active) Ted Mielczarek

Reporter

Comment 1

•

13 years ago

The patches in bug 606574 are currently written to fire a tracing event every 50ms (so 20/s), and report how long that event took to get processed. Right now the output just looks like:
MOZ_EVENT_INTERVAL <duration>

where duration is in integer milliseconds. We may need to tweak that to be more useful. The output goes to stdout by default, but can be redirected to a file by setting MOZ_INSTRUMENT_EVENT_LOOP_OUTPUT to a filename.

alice nodelman [:alice] [:anode]

Comment 2

•

13 years ago

I should also have this in my email, I'll go hunting.

alice nodelman [:alice] [:anode]

Comment 3

•

13 years ago

From the summary of the last meeting here's the top 7 items requested:

1. Startup 
a. Time between startup and visible window
b. Time between visible window appearing and the browser is completely responsive (type in a url) 
2. Switching tabs
3. Opening a new window
4. Quit the browser
5. Awesome bar search results (how fast it returns).
6. Right click menu.
7. Responsiveness in foreground tab during page load.

Doug Turner (:dougt)

Comment 4

•

13 years ago

Good list, but mobile needs at least two more:

1) What about scrolling time?
2) What about zoom-in zoom-out time?

I think everyone would benefit if we started measure these.

(not currently active) Ted Mielczarek

Reporter

Comment 5

•

13 years ago

The instrumentation I'm adding in bug 606574 won't let us measure "how long does it take to do X?", I figured the existing Talos harness was good enough for that (Tpan, etc). What it will let us measure is "how responsive is the browser while doing X?", which we currently have no data on.

Specifically, it will tell you how long events take to be serviced at 20Hz samples, so we can perform an action (load a page), and determine roughly how responsive we are until the action finishes.

Dietrich Ayala (:dietrich)

Comment 6

•

13 years ago

The UX team is helping to come up with a list of UI events that we should prioritize the measurement and optimization of. I'll post it here once it's been gathered.

Re comment #5: We should do some manual testing of common events (new tab, scroll on large page) to verify that attempting to correlate responsiveness via the event loop logging is useful for these higher-level user scenarios.

(not currently active) Ted Mielczarek

Reporter

Comment 7

•

13 years ago

I've run through a few non-scientific tests while developing the patch, and it's definitely producing measurable data. You can watch the event servicing time spike when you load a new page or do other intensive tasks. I've got somewhat-outdated try server builds here that you can try out if you'd like to see what it produces:
http://people.mozilla.com/~tmielczarek/event-loop-instrument/

Dietrich Ayala (:dietrich)

Comment 8

•

13 years ago

That's fantastic, thanks Ted!

The next bit would be to find a way to make the correlation, either in the harness, or maybe by doing post-processing of the event loop logs and a log we spit out from the high-level tests, possibly via timestamp?

(not currently active) Ted Mielczarek

Reporter

Comment 9

•

13 years ago

Yeah, I hadn't quite worked that out yet. I was worried about getting timestamps that actually line up, given how finicky system clocks can be. I'm using mozilla::TimeStamp internally for interval measurement, but it doesn't actually let you get a wallclock time, only intervals between points in time.

(not currently active) Ted Mielczarek

Reporter

Comment 10

•

13 years ago

As a note, we're punting on Talos measurement for this quarter. We're going to figure out how to measure something useful on developer machines first, then we'll figure out what we want to automate.

(not currently active) Ted Mielczarek

Reporter

Updated

•

13 years ago

Assignee: ted.mielczarek → anodelman

alice nodelman [:alice] [:anode]

Comment 11

•

13 years ago

From discussions in email/irc the initial tests we are going measure responsiveness during are:

- window open
- tab open
- tp5

alice nodelman [:alice] [:anode]

Comment 12

•

13 years ago

As a proof of concept flipped on ted's responsiveness measurement while running a single cycle of tdhtml (a test that cycles through a few dhtml heavy web pages).

Here's the output from the instrumentation:

MOZ_EVENT_TRACE start 1311194309792
MOZ_EVENT_TRACE sample 1311194310055 109
MOZ_EVENT_TRACE sample 1311194310158 51
MOZ_EVENT_TRACE sample 1311194311994 152
MOZ_EVENT_TRACE sample 1311194315275 67
MOZ_EVENT_TRACE sample 1311194320416 244
MOZ_EVENT_TRACE sample 1311194322645 87
MOZ_EVENT_TRACE stop 1311194326678

We can see that we exceeded the responsiveness measure of the instrumentation several times - but I don't know how to correlate that to exactly what the browser was doing at the time.  Ted - do you have any advice here?

(not currently active) Ted Mielczarek

Reporter

Comment 13

•

13 years ago

If we're just cycling through pages, maybe we can get load start times for each page? Then developers would be able to see which pages were the cause of the unresponsive periods.

cmtalbert

Comment 14

•

13 years ago

I was going to ask the same question w.r.t. tp5 as mentioned in Comment 11.  How are you going to measure responsiveness during the Tp5 test?  Is that a measure of time before we load the page? i.e. from "starting page n load" to "page n load finished"? Is that a measure of time from the event of "page n loaded" to "page n+1 loaded"?

(not currently active) Ted Mielczarek

Reporter

Comment 15

•

13 years ago

The numbers in comment 12 are simply samples from when the event loop took longer than 50ms to respond to an event. You can correlate them to whatever you want providing you have timestamps, the third field is a unix timestamp in milliseconds.

But most of our Talos tests don't try to drill down into too much detail, do they? They simply give you a number.

alice nodelman [:alice] [:anode]

Comment 16

•

13 years ago

Here's the mixed together results for tdhtml for comparison:

NOISE: MOZ_EVENT_TRACE start 1311278181023
NOISE: MOZ_EVENT_TRACE sample 1311278181283 106
NOISE: MOZ_EVENT_TRACE sample 1311278181453 67
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/colorfade.html (next: http://localhost/page_load_test/dhtml/diagball.html)
NOISE: MOZ_EVENT_TRACE sample 1311278183256 219
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/diagball.html (next: http://localhost/page_load_test/dhtml/fadespacing.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/fadespacing.html (next: http://localhost/page_load_test/dhtml/imageslide.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/imageslide.html (next: http://localhost/page_load_test/dhtml/layers1.html)
NOISE: MOZ_EVENT_TRACE sample 1311278186559 90
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/layers1.html (next: http://localhost/page_load_test/dhtml/layers2.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/layers2.html (next: http://localhost/page_load_test/dhtml/layers4.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/layers4.html (next: http://localhost/page_load_test/dhtml/layers5.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/layers5.html (next: http://localhost/page_load_test/dhtml/layers6.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/layers6.html (next: http://localhost/page_load_test/dhtml/meter.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/meter.html (next: http://localhost/page_load_test/dhtml/movingtext.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/movingtext.html (next: http://localhost/page_load_test/dhtml/mozilla.html)
NOISE: MOZ_EVENT_TRACE sample 1311278191699 240
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/mozilla.html (next: http://localhost/page_load_test/dhtml/replaceimages.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/replaceimages.html (next: http://localhost/page_load_test/dhtml/scrolling.html)
NOISE: MOZ_EVENT_TRACE sample 1311278193913 71
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/scrolling.html (next: http://localhost/page_load_test/dhtml/slidein.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/slidein.html (next: http://localhost/page_load_test/dhtml/slidingballs.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/slidingballs.html (next: http://localhost/page_load_test/dhtml/textslide.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/textslide.html (next: http://localhost/page_load_test/dhtml/zoom.html)
NOISE: Cycle 1: loaded http://localhost/page_load_test/dhtml/zoom.html (next: http://localhost/page_load_test/dhtml/colorfade.html)


I think that if I added a timestamp to the pageload information we might be able to do a proper correlation.

alice nodelman [:alice] [:anode]

Comment 17

•

13 years ago

Still I'm not sure what metric we want to be monitoring - should we do something like "amount of unresponsive events during test"?  Then you'd still need a human to look over the results to see where things got worse.

alice nodelman [:alice] [:anode]

Comment 18

•

13 years ago

And again with more timestamps:

NOISE: MOZ_EVENT_TRACE start 1311279480976
NOISE: MOZ_EVENT_TRACE sample 1311279481244 114
NOISE: 1311279482448: Cycle 1: loaded http://localhost/page_load_test/dhtml/colorfade.html (next: http://localhost/page_load_test/dhtml/diagball.html)
NOISE: MOZ_EVENT_TRACE sample 1311279483176 147
NOISE: 1311279483773: Cycle 1: loaded http://localhost/page_load_test/dhtml/diagball.html (next: http://localhost/page_load_test/dhtml/fadespacing.html)
NOISE: 1311279485150: Cycle 1: loaded http://localhost/page_load_test/dhtml/fadespacing.html (next: http://localhost/page_load_test/dhtml/imageslide.html)
NOISE: 1311279485852: Cycle 1: loaded http://localhost/page_load_test/dhtml/imageslide.html (next: http://localhost/page_load_test/dhtml/layers1.html)
NOISE: MOZ_EVENT_TRACE sample 1311279486457 68
NOISE: 1311279486567: Cycle 1: loaded http://localhost/page_load_test/dhtml/layers1.html (next: http://localhost/page_load_test/dhtml/layers2.html)
NOISE: 1311279487112: Cycle 1: loaded http://localhost/page_load_test/dhtml/layers2.html (next: http://localhost/page_load_test/dhtml/layers4.html)
NOISE: 1311279487657: Cycle 1: loaded http://localhost/page_load_test/dhtml/layers4.html (next: http://localhost/page_load_test/dhtml/layers5.html)
NOISE: 1311279488342: Cycle 1: loaded http://localhost/page_load_test/dhtml/layers5.html (next: http://localhost/page_load_test/dhtml/layers6.html)
NOISE: 1311279488885: Cycle 1: loaded http://localhost/page_load_test/dhtml/layers6.html (next: http://localhost/page_load_test/dhtml/meter.html)
NOISE: 1311279489824: Cycle 1: loaded http://localhost/page_load_test/dhtml/meter.html (next: http://localhost/page_load_test/dhtml/movingtext.html)
NOISE: 1311279490634: Cycle 1: loaded http://localhost/page_load_test/dhtml/movingtext.html (next: http://localhost/page_load_test/dhtml/mozilla.html)
NOISE: MOZ_EVENT_TRACE sample 1311279491566 212
NOISE: 1311279492340: Cycle 1: loaded http://localhost/page_load_test/dhtml/mozilla.html (next: http://localhost/page_load_test/dhtml/replaceimages.html)
NOISE: 1311279493135: Cycle 1: loaded http://localhost/page_load_test/dhtml/replaceimages.html (next: http://localhost/page_load_test/dhtml/scrolling.html)
NOISE: MOZ_EVENT_TRACE sample 1311279493730 64
NOISE: 1311279493743: Cycle 1: loaded http://localhost/page_load_test/dhtml/scrolling.html (next: http://localhost/page_load_test/dhtml/slidein.html)
NOISE: 1311279495274: Cycle 1: loaded http://localhost/page_load_test/dhtml/slidein.html (next: http://localhost/page_load_test/dhtml/slidingballs.html)
NOISE: 1311279495934: Cycle 1: loaded http://localhost/page_load_test/dhtml/slidingballs.html (next: http://localhost/page_load_test/dhtml/textslide.html)
NOISE: 1311279496723: Cycle 1: loaded http://localhost/page_load_test/dhtml/textslide.html (next: http://localhost/page_load_test/dhtml/zoom.html)
NOISE: 1311279497428: Cycle 1: loaded http://localhost/page_load_test/dhtml/zoom.html (next: http://localhost/page_load_test/dhtml/colorfade.html)

(not currently active) Ted Mielczarek

Reporter

Comment 19

•

13 years ago

(In reply to comment #17)
> Still I'm not sure what metric we want to be monitoring - should we do
> something like "amount of unresponsive events during test"?  Then you'd
> still need a human to look over the results to see where things got worse.

This is the crux of the issue. bsmedberg at one point suggested reporting something like the max/median/mean values, but the latter two probably aren't quite right since we're reporting only samples >50ms. I could change the code to report all samples again, but it's a lot (20 samples per second at most).

Benjamin Smedberg

Comment 20

•

13 years ago

I don't think we should treat responsiveness the same way we treat speed performance numbers: there isn't a single number to report, in most cases. 

Instead, I suspect it almost needs to be like "responsiveness mochitesting": we have a series of actions that we know do or could or used to cause the event loop to go away for more than 50ms. If we've fixed the problem, any latency over 50ms would cause the test to fail/go orange/whatever. If we haven't yet fixed the problem, we measure the latency and make sure it doesn't get worse.

There are some issues with this, mostly the random pauses which we can't control yet (GC/CC), so perhaps we need a way to blacklist/measure those separately from the main test item.

(not currently active) Ted Mielczarek

Reporter

Comment 21

•

13 years ago

I filed bug 674606 on bsmedberg's "responsiveness mochitest" idea, after discussion we're going to continue to work on this as well, since it should produce useful data in the short term.

alice nodelman [:alice] [:anode]

Updated

•

13 years ago

Depends on: 679427

Andrew Halberstadt [:ahal]

Updated

•

13 years ago

Blocks: 674606

alice nodelman [:alice] [:anode]

Comment 22

•

13 years ago

Attached patch have talos support 'responsiveness' test option (WIP) (obsolete) — Details — Splinter Review

alice nodelman [:alice] [:anode]

Comment 23

•

13 years ago

Attached patch add tp5_responsiveness to graph server db — Details — Splinter Review

Graph server patch already applied to graphs-stage for testing.

Attachment #560439 - Flags: review?(jmaher)

alice nodelman [:alice] [:anode]

Comment 24

•

13 years ago

Currently missing:
- ability to parametrize time between pings to event loop/threshold for recording slow pings as already described in bug 679427 (waiting on checkin, deployment)
- determination of the 'alpha' variable to be used in calculating the responsiveness metric
- further testing to ensure that the responsiveness test option doesn't wobble the regular tp5 results (this would make roll out a lot simpler)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Updated

•

13 years ago

Attachment #560439 - Flags: review?(jmaher) → review+

cmtalbert

Comment 25

•

13 years ago

(In reply to alice nodelman [:alice] [:anode] from comment #24)
> Currently missing:
> - ability to parametrize time between pings to event loop/threshold for
> recording slow pings as already described in bug 679427 (waiting on checkin,
> deployment)
> - determination of the 'alpha' variable to be used in calculating the
> responsiveness metric
> - further testing to ensure that the responsiveness test option doesn't
> wobble the regular tp5 results (this would make roll out a lot simpler)

Is this the full set of things you see standing between the talos user responsiveness test and rolling it out to staging?

How do you determine the alpha variable?  I'm not clear on what that step entails. (sorry if you explained it already and I missed it).

Version: unspecified → Trunk

alice nodelman [:alice] [:anode]

Comment 26

•

13 years ago

There's currently a discussion occurring over email between jlebar and bz about determining a way to calculate a useful metric from the responsiveness data that we are collecting.

Ted has checked in the ability to set the time between event loop pings/threshold for recording pings to moz-central, so that is done.

I will be working to complete the further testing to determine if adding the responsiveness test will perturb the general tp5 results - that will decide if we can roll the test out as part of standard tp5 or if it needs to be split off into a separate test run (which would be expensive, as tp has our longest test run time).

alice nodelman [:alice] [:anode]

Updated

•

13 years ago

Depends on: 690203

alice nodelman [:alice] [:anode]

Comment 27

•

13 years ago

Attached patch add tp5_responsiveness to graph server db (take 2) — Details — Splinter Review

Adding correct names to graph server db, now that mozAfterPaint tests are standard.  Already applied to graphs-stage.

Attachment #563464 - Flags: review?(jmaher)

alice nodelman [:alice] [:anode]

Comment 28

•

13 years ago

Attached patch have talos support 'responsiveness' test option (WIP - take 2) (obsolete) — Details — Splinter Review

Attachment #560438 - Attachment is obsolete: true

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Updated

•

13 years ago

Attachment #563464 - Flags: review?(jmaher) → review+

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 29

•

13 years ago

as a note the sql pages haven't landed on the graph server.  I am not sure if this bug is ready for those.  Please let me know if we need to add more to the server or just land these and I can land them to the graphs repository.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Updated

•

13 years ago

Depends on: 693679

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 30

•

13 years ago

Attached patch have talos support 'responsiveness' test option (WIP) (obsolete) — Details — Splinter Review

updated patch with latest responsiveness calculation and for bitrot.

Assignee: anodelman → jmaher

Attachment #563467 - Attachment is obsolete: true

Status: NEW → ASSIGNED

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Updated

•

13 years ago

Depends on: 696059

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 31

•

13 years ago

Attached patch have talos support 'responsiveness' test option (1.0) — Details — Splinter Review

updated patch, passes on talos staging.  The numbers for tp5 do not change, so we don't need to side by side stage this.

Attachment #567451 - Attachment is obsolete: true

Attachment #568714 - Flags: review?(wlachance)

William Lachance (:wlach)

Comment 32

•

13 years ago

Comment on attachment 568714 [details] [diff] [review]
have talos support 'responsiveness' test option (1.0)

Overall looks like a nice straightforward addition to Talos. My only comment:

+def responsiveness_Metric(val_list):

Should probably be responsiveness_metric, going by PEP8, which is endorsed by the Mozilla style guidelines (https://developer.mozilla.org/En/Mozilla_Coding_Style_Guide#Python_practices). I know we have 3(!) different styles in this file but we might as well settle on the "right" one for new code.

Attachment #568714 - Flags: review?(wlachance) → review+

William Lachance (:wlach)

Comment 33

•

13 years ago

Comment on attachment 568714 [details] [diff] [review]
have talos support 'responsiveness' test option (1.0)

Oh, one other minor comment: do we actually need to specify responsiveness: False for most of the tests in the config file? It looks like we just assume the value is false if it's not set...

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 34

•

13 years ago

landed on build-talos:
http://hg.mozilla.org/build/talos/rev/f9abfefec8aa

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Updated

•

13 years ago

Depends on: 696810

Lawrence Mandel [:lmandel] (use needinfo)

Comment 35

•

13 years ago

Glad to see that this has been resolved. Has this update been deployed or when is it expected to be deployed?

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 36

•

13 years ago

this is deployed, so when builds start running again, we will see results :)  I will post a link in this bug to a graph server view to show the responsiveness data.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 37

•

13 years ago

We had to back this out due to OSX 10.5.8 and 10.6.2 failing to terminate the browser and Fedora crashing.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Updated

•

13 years ago

Depends on: 697555

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Updated

•

13 years ago

Depends on: 697769

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Updated

•

13 years ago

Depends on: 702351

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 38

•

13 years ago

ok, ready to deploy again in bug 702351

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 39

•

13 years ago

Here are the data points:
http://graphs-new.mozilla.org/graph.html#tests=[[130,1,1],[130,1,12],[130,1,13],[130,1,21],[130,1,22]]&sel=1322091900253,1322167344714&displayrange=7&datatype=running

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Lawrence Mandel [:lmandel] (use needinfo)

Comment 40

•

13 years ago

Thanks Joel. I added a link to the project Snappy wiki [1]. We'll review during next week's call.

[1]https://wiki.mozilla.org/Performance/Snappy#Current_Infrastructure

Matt Brubeck (:mbrubeck)

Updated

•

11 years ago

Depends on: 904642

You need to log in before you can comment on or make changes to this bug.