Switch to using event listener instead of overloading window.onload in runner/measure.js

RESOLVED FIXED in Firefox 68

Status

enhancement
RESOLVED FIXED
3 months ago
3 months ago

People

(Reporter: alexandru.ionescu, Assigned: alexandru.ionescu, Mentored)

Tracking

(Blocks 1 bug)

Version 3
mozilla68
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox68 fixed)

Details

Attachments

(1 attachment)

Assignee

Description

3 months ago

Switch from overloading window.onload to attaching an event listener, due to a race condition.
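For context, the race here is a plain JavaScript one: assigning to window.onload silently replaces any handler the page itself installed, while addEventListener lets both coexist. A minimal sketch, using stand-in objects for window (this is illustrative harness code, not Raptor's):

```javascript
// Assigning onload clobbers the page's own handler; addEventListener doesn't.
// `pageWin` / `listenerWin` are stand-ins for window (not Raptor code).
const fired = [];

const pageWin = {};                            // simulate onload-property style
pageWin.onload = () => fired.push("page");     // page installs its handler
pageWin.onload = () => fired.push("raptor");   // harness overwrites it - the race
pageWin.onload();                              // only "raptor" runs

const listenerWin = new EventTarget();         // simulate listener style
listenerWin.addEventListener("load", () => fired.push("page"));
listenerWin.addEventListener("load", () => fired.push("raptor"));
listenerWin.dispatchEvent(new Event("load"));  // both handlers run, in order
```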

Assignee

Updated

3 months ago
Assignee: nobody → alexandru.ionescu

Comment 3

3 months ago

I see the 100% failure on try.

Looking back at the revisions I made for my load event listener change most of the tp6 tests succeeded:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2d500a6f51ebbc4dd65d87d7dec86e2c19c2f43e

If you follow the revision history in that change I also removed our clearing of the window.onload (window.onload = null) and, more importantly I think, the web extension is running at document_end:
https://hg.mozilla.org/try/rev/52ba91b92b3e

Assignee

Comment 4

3 months ago

Sorry, what 100% failure on try do you see?
I also removed the clearing of window.onload.
I didn't get to see the improvements of running at document_end, but it makes sense.

Assignee

Updated

3 months ago
Flags: needinfo?(acreskey)
Assignee

Updated

3 months ago
Blocks: 1533304

Comment 5

3 months ago

(In reply to Alexandru Ionescu from comment #4)

Sorry, what 100% failure on try do you see?

Oh I meant that the tp6 jobs all failed on that push.

I also removed the clearing of window.onload.
I didn't get to see the improvements of running at document_end, but it makes sense.

I wish I knew why raptor-tp6-google-firefox (hero) on Linux64 and raptor-tp6-imgur-firefox (fcp, fnb paint, dcf, ttfi) on OSX are failing with this push https://treeherder.mozilla.org/#/jobs?repo=try&revision=2d500a6f51ebbc4dd65d87d7dec86e2c19c2f43e
because it works locally.
But I'm curious as to what you see.

I take it raptor is designed to run on the load event so that it doesn't interfere with the loading of the page?

I'm just brainstorming here... but if we can't find a universal solution like this... perhaps we could start runner.js and measure.js at document_start and poll until we have all of the performance/DOM APIs that we need.

Flags: needinfo?(acreskey)

Comment 6

3 months ago

(In reply to Andrew Creskey from comment #5)

Interesting, thanks Andrew.

The hero element is the only measurement that we do without using an API, i.e. we use an observer callback for when the element is 100% visible and then we 'mark' the time [0].

I was thinking about this last night - what if we start the runner and measure *after* the page has finished loading? Since Raptor is grabbing API values and not doing any of its own actual timing. Is that possible? I *think* that would work for all our measurements except hero element.

Comment 8

3 months ago

(In reply to Robert Wood [:rwood] from comment #6)

I was thinking about this last night - what if we start the runner and measure *after* the page has finished loading? Since Raptor is grabbing API values and not doing any of its own actual timing. Is that possible? I *think* that would work for all our measurements except hero element.

Maybe that would be the best solution Rob - let the page do its work and then query the results.

Completely coincidentally today I bumped into how webpagetest is effectively measuring object visibility, using the performance resource api:

https://github.com/WPO-Foundation/RUM-SpeedIndex/blob/master/src/rum-speedindex.js#L129

{
  "name": "https://sites.google.com/a/webpagetest.org/docs/_/rsrc/1472780159135/config/customLogo.gif?revision=1",
  "entryType": "resource",
  "startTime": 538,
  "duration": 46,
  "initiatorType": "img",
...
  "fetchStart": 538,
...
  "connectStart": 538,
  "connectEnd": 538,
  "secureConnectionStart": 0,
  "requestStart": 538,
  "responseStart": 573,
  "responseEnd": 584,
  "transferSize": 0,
...
}

Maybe we could look up hero visibility that way instead of marking time?
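If it helps, the lookup Andrew describes can be sketched as a pure function over the Resource Timing entries; responseEnd then approximates when the hero image's bytes arrived (not when it was painted). `heroResponseEnd` and the example URLs are illustrative names, not existing Raptor code:

```javascript
// Approximate hero-image availability from Resource Timing data, in the
// spirit of WPT's rum-speedindex. Illustrative only - rendering and
// composition are still unaccounted for, as noted later in this bug.
function heroResponseEnd(entries, heroUrl) {
  const entry = entries.find(
    (e) =>
      e.entryType === "resource" &&
      e.initiatorType === "img" &&
      e.name === heroUrl
  );
  return entry ? entry.responseEnd : null;
}
```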

Assignee

Comment 9

3 months ago

(In reply to Andrew Creskey from comment #5)

Oh I meant that the tp6 jobs all failed on that push.

Thanks. I was looking for the results here: https://treeherder.mozilla.org/testview.html?repo=try&revision=25523d804d97648c8c5387d6e16b6aa4669fd325 - that's why I was seeing all of them pass. After the last pull from central I encountered some issues with my local Raptor (not sure where they came from). I am trying to figure them out and then run the failing tests locally to see if I get the same thing.

I wish I knew why raptor-tp6-google-firefox (hero) on Linux64 and raptor-tp6-imgur-firefox (fcp, fnb paint, dcf, ttfi) on OSX are failing with this push https://treeherder.mozilla.org/#/jobs?repo=try&revision=2d500a6f51ebbc4dd65d87d7dec86e2c19c2f43e
because it works locally.
But I'm curious as to what you see.

I take it raptor is designed to run on the load event so that it doesn't interfere with the loading of the page?

Yeah, according to my understanding, this push was meant to avoid interfering with the loading of the page.

(In reply to Andrew Creskey from comment #8)

Maybe we could look up hero visibility that way instead of marking time?

Oh cool! That might be worth trying. Although, if we are going to invest time in hero element, I think we need to ensure it is still worthwhile measuring it. Very few of the page load tests have hero elements. If they aren't really worthwhile or required then we should just not measure them.

Comment 12

3 months ago

(In reply to Robert Wood [:rwood] from comment #11)

(In reply to Andrew Creskey from comment #8)

Maybe we could look up hero visibility that way instead of marking time?

Oh cool! That might be worth trying. Although, if we are going to invest time in hero element, I think we need to ensure it is still worthwhile measuring it. Very few of the page load tests have hero elements. If they aren't really worthwhile or required then we should just not measure them.

That's true. I think that neither our current intersectionObserver hero element detection nor the above method used by WebPageTest can actually record when the element would be visible to the user, since they don't know about rendering and composition. So it might not be worth investing in.

Assignee

Comment 13

3 months ago

acreskey, in the following sheet you can find the progress with the intermittents on tp6m-6 (bbc):
https://docs.google.com/spreadsheets/d/1NcKPjWAl7roU04o8j7tZBCM2-_-hRl2vlPWi2P_Lbos/edit#gid=0

Comment 14

3 months ago

I think the hero tests will fail if the hero element isn't found in the dom on first check:
https://searchfox.org/mozilla-central/source/testing/raptor/webext/raptor/measure.js#125

For the other measurements we set a timeout and try again. So this would make hero very sensitive to timing, especially if we've moved the execution of measure.js to document_end.

These are some choices that I see:
1 - move measure.js to run at document start and keep running all the individual measure tests until they have what they need
2 - move measure.js to run at document end but change how we measure hero so it's not timing dependent
3 - drop hero?
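Options 1 and 2 both amount to retrying a probe until it yields a value instead of failing on the first miss. A hedged sketch of such a poll loop (`pollFor` is an illustrative helper, not the actual measure.js API):

```javascript
// Poll getValue() until it returns something non-null, retrying up to
// `retries` times at `intervalMs` spacing; reject if the budget runs out.
// Illustrative helper, not Raptor's real implementation.
function pollFor(getValue, retries = 10, intervalMs = 100) {
  return new Promise((resolve, reject) => {
    const attempt = (left) => {
      const value = getValue();
      if (value !== null && value !== undefined) {
        resolve(value);
      } else if (left > 0) {
        setTimeout(() => attempt(left - 1), intervalMs);
      } else {
        reject(new Error("measurement not available before retry budget ran out"));
      }
    };
    attempt(retries);
  });
}
```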

Assignee

Comment 15

3 months ago

In the sheet above I used document_start and document_end on an already-failing test, so we can't really see their real effect there (talking about "60s Timeout 30s Startup Delay run_at document_start" and "60s Timeout 30s Startup Delay run_at document_end"). These days I ran some other tests with document_start and document_end to see the difference, and I noticed that document_start increases the failure occurrence. Unfortunately I didn't note this down, but I am curious to run those 2 in comparison on a passing test, to see if document_start is as risky as I tend to believe. I will try to do that tomorrow and come back with the results.

Comment 16

3 months ago

Thanks Alexandru.
I think we'll have to make some other changes to measure.js to have it work on document_start -- as I mentioned above I think hero will fail because it won't be found. And I suspect that other objects like window.performance.timing.timeToNonBlankPaint may or may not be defined at the start.

(In reply to Robert Wood [:rwood] from comment #6)

(In reply to Andrew Creskey from comment #5)

Interesting, thanks Andrew.

The hero element is the only measurement that we do without using an API i.e. we use an observer callback for when the element is 100% visible and then we 'mark' the time [0].

I was thinking about this last night - what if we start the runner and measure *after* the page has finished loading? Since Raptor is grabbing API values and not doing any of its own actual timing. Is that possible? I *think* that would work for all our measurements except hero element.

For the record, we should totally do this -- thanks for bringing it up. This is how browsertime works; they run the scripts in this directory after Web Driver says the page is loaded. It works very well.

browsertime claims to do some hero element things but I do not know how it is implemented.

Question: why don't we make Raptor do exactly what browsertime is doing, and basically use the same scripts? It's very flexible and turns Raptor into a more general purpose harness.

Assignee

Comment 18

3 months ago

Switching from overloading window.onload to an event listener made tp6-5 (microsoft-chrome) and tp6-3 (imgur-firefox) fail. https://treeherder.mozilla.org/#/jobs?repo=try&resultStatus=testfailed%2Cbusted%2Cexception&classifiedState=unclassified&revision=208dc666a1320214bfd869aff200ae725087ad0e&selectedJob=235172479
Raised the page timeout by 10 secs; tp6-5 (microsoft-chrome) passed but tp6-3 (imgur-firefox) still failed. https://treeherder.mozilla.org/#/jobs?repo=try&revision=5bb20b75e6e41b8c4821ec73f30ee0e9bf70112c&selectedJob=235481532

Comment 19

3 months ago

:nalexander
I think the more we can share, the better. From what I've read, both browsertime and WebPageTest use their hero elements in conjunction with their visual metrics - they verify from the recordings that the element has been rendered.

Browsertime:
https://github.com/sitespeedio/browsertime/pull/629/files#diff-ee1994f15e9c5ab9045a537be5b259b0R4

WebPageTest hero element:
https://github.com/WPO-Foundation/webpagetest/blob/master/docs/Metrics/HeroElements.md

Since we don't have visual recordings yet, I did try at least their dom visibility detection algorithm:
https://github.com/sitespeedio/browsertime/pull/629/files#diff-ee1994f15e9c5ab9045a537be5b259b0R32
Interestingly enough the hero1 element on youtube-firefox was returning an empty getBoundingClientRect(), so this was failing the visibility test. And it's not visible on screen via the dev tools inspector.
We would also need to ensure that image resource was loaded... and that the viewport had been rendered, composited, and buffers swapped. I can see why they prefer visual metrics for this sort of thing!

So I'm not sure what would be the ideal next step for hero element detection.

Comment 20

3 months ago

I did kick off a test with the following changes to raptor:

-web extension scripts run at document_end
-we add a load event listener instead of replacing window.onload
-hero element is periodically polled for instead of searched for once (todo: check visible & resource downloaded)
-extended some of the retry counts for other tests (we are still limited by the overall test timeout anyway)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b25a2d9a6d38c6751dc3fe70a807c7359b55eb8e&selectedJob=235867393

Comment 21

3 months ago

^^ That's all green so far, including tp6m, so I think it's at least a good argument for running raptor at the end and polling until everything is available.

Updated

3 months ago
Blocks: 1533059

Wow, that looks promising Andrew. I'll add some re-triggers to your try push in comment 20 and also add the benchmarks (since they use the main runner.js too) and Chromium jobs.

Comment 23

3 months ago

Thanks Rob -- I do see a failure on raptor-tp6-imgur-firefox for OSX, so I'll try to repro that locally and see what's going on.

Assignee

Comment 24

3 months ago

acreskey, tp6-3 is not failing due to hero:
10:22:43 INFO - raptor-control-server received webext_status: begin pagecycle 5
10:22:44 INFO - 127.0.0.1 - - [25/Mar/2019 10:22:44] "POST / HTTP/1.1" 200 -
10:22:44 INFO - raptor-control-server received webext_status: update tab 1
10:22:44 INFO - 127.0.0.1 - - [25/Mar/2019 10:22:44] "POST / HTTP/1.1" 200 -
10:22:44 INFO - raptor-control-server received webext_status: test tab updated 1
10:23:43 INFO - 127.0.0.1 - - [25/Mar/2019 10:23:43] "POST / HTTP/1.1" 200 -
10:23:43 INFO - raptor-control-server received webext_raptor-page-timeout: [u'raptor-tp6-imgur-firefox', u'https://imgur.com/gallery/m5tYJL6', {u'fcp': True, u'hero': False, u'dcf': True, u'fnb paint': True, u'ttfi': True, u'load time': False}]

Neither are tp6m-3 and tp6m-8.

Assignee

Comment 25

3 months ago

:acreskey, what if we don't use a retry count for metrics, but instead look for them in a logical sequence (like FNBPaint, which would always be the first one if I understood the documentation correctly) and disable the retry flag after the last of them is found? Otherwise I see a discordance between the page timeout and the retry counter, at least from this point of view.
For example, in the sheet below I colored the measurement times from dark green to dark red relative to the page timeout (60s). And guess what? They apparently never exceed half the timeout value.
https://docs.google.com/spreadsheets/d/1NcKPjWAl7roU04o8j7tZBCM2-_-hRl2vlPWi2P_Lbos/edit#gid=241022032

Flags: needinfo?(acreskey)

I really don't believe the method of getting the measurements is the issue here - the page timeouts are legit issues with the pages not loading / measurement API values not being available. We need to be careful when thinking about changing this, as it is the main engine of Raptor.

The Raptor webext measure.js just polls for all required measurements and, as each one becomes available, tells the main Raptor runner.js. The main runner.js checks every x ms whether all measurement types were received, and if they weren't all received by the page_timeout value, it calls a page_timeout and fails the test.

There might be an issue when, for example, the time allocated for a measurement's retries (i.e. ttfi) itself exceeds the overall page_timeout - so the main runner.js times out first while the retries are still in progress. I'm not sure if that's an issue or not - and if so, the page_timeout should always be longer than the longest time allocated for any of the measurement retries.

My two cents anyway. :)
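Rob's last point can be checked mechanically: the worst-case retry budget per metric (retries × interval) must stay under page_timeout. A sketch with made-up numbers, which are not Raptor's real configuration:

```javascript
// Sanity-check that no metric's retry budget exceeds the overall
// page_timeout; otherwise runner.js times out first while measure.js is
// still retrying. All numbers here are illustrative.
const pageTimeoutMs = 60000;
const retryBudgets = {
  ttfi: { retries: 25, intervalMs: 3000 }, // 75s - would exceed page_timeout
  fnbpaint: { retries: 10, intervalMs: 100 },
  hero: { retries: 10, intervalMs: 100 },
};
const worstBudgetMs = Math.max(
  ...Object.values(retryBudgets).map((b) => b.retries * b.intervalMs)
);
const misconfigured = worstBudgetMs > pageTimeoutMs;
```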

Comment 27

3 months ago

I agree that the overall page timeout is important -- as Rob said that's a legitimate problem.

I do find the shared gRetryCounter to be difficult to fully understand/predict.
e.g. for FNBPaint we only retry 10 times, but other metrics are also increasing this retry count:
https://searchfox.org/mozilla-central/source/testing/raptor/webext/raptor/measure.js#182

But I agree with you Alexandru - I was thinking about the discordance independently yesterday, and I put up a patch where I was also trying to ensure that we wouldn't hit the retry limit (increasing the polling interval and also increasing the retry limit):

For some reason not all the tests have completed yet (maybe because I broke the retry logic?? :)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e1cebecba4cf012041799d628c0d070809a0ce1c

Another issue that I found is with ttfi and the intermittent failures.
We're not guaranteed to reliably get what I assume to be the problematic aspect of ttfi (not being blocked for more than 50ms) on a lot of sites, especially on low-end hardware. I disabled the metric for the problematic sites in that last push.
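One hedged way around the shared gRetryCounter ambiguity Andrew describes would be a counter per metric, so one metric's retries can't consume another's budget. Names and limits below are illustrative, not the real measure.js:

```javascript
// Track retries per metric instead of sharing one global counter, so a
// slow metric (e.g. ttfi) cannot eat into fnbpaint's retry budget.
// Limits are illustrative; they are not Raptor's real values.
const retryLimits = { fnbpaint: 10, dcf: 10, ttfi: 25, hero: 10 };
const retryCounts = { fnbpaint: 0, dcf: 0, ttfi: 0, hero: 0 };

function shouldRetry(metric) {
  retryCounts[metric] += 1;
  return retryCounts[metric] <= retryLimits[metric];
}
```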

Flags: needinfo?(acreskey)

Comment 28

3 months ago

Thanks all. Generally speaking, I don't think we are going to find a solution that works with all of our test sites. It seems no matter what we try (i.e. injecting content at different times, using onload before/after existing sites' onload, using a load event listener instead of onload, different retry timeouts, etc.) we fix a couple of intermittents but the new solution introduces a couple of new ones, sometimes just on one platform, sometimes on all.

I'm absolutely all for improving Raptor however everyone sees fit, as it can always be made better! I think we should take the solution that appears to fix the majority of intermittent failures (i.e. your patch in comment 20 maybe?) and then accept the fact that some test sites just won't work well with mitmproxy, and pull those out of the suites. Also, I really think we need to consider dropping ttfi altogether, since it is not always available and is quite noisy; and we should consider not measuring the hero element as well.

/rant over ;)

Assignee

Comment 29

3 months ago

Andrew, Rob, would it be a good idea to write down which solution fixes which failure and breaks which tests? Maybe this way we'd have a clearer big picture of it.

Flags: needinfo?(rwood)
Flags: needinfo?(acreskey)

Probably a good idea, maybe a google sheet?

Flags: needinfo?(rwood)

Comment 31

3 months ago

(In reply to Robert Wood [:rwood] from comment #28)

Thanks all. Generally speaking, I don't think we are going to find a solution that works with all of our test sites. It seems no matter what we try (i.e. injecting content at different times, using onload before/after existing sites' onload, using a load event listener instead of onload, different retry timeouts, etc.) we fix a couple of intermittents but the new solution introduces a couple of new ones, sometimes just on one platform, sometimes on all.

I'm absolutely all for improving Raptor however everyone sees fit, as it can always be made better! I think we should take the solution that appears to fix the majority of intermittent failures (i.e. your patch in comment 20 maybe?) and then accept the fact that some test sites just won't work well with mitmproxy, and pull those out of the suites. Also, I really think we need to consider dropping ttfi altogether, since it is not always available and is quite noisy; and we should consider not measuring the hero element as well.

/rant over ;)

Well said!
I would also argue for the removal of hero since, to the best of my knowledge, we don't have an accurate way of measuring it without screen recordings and visual metrics.

For TTFI, in an ideal world I would always attempt to measure it but never fail a test if it didn't occur. It's not guaranteed to happen in the given time interval, or at all.
(In reply to Alexandru Ionescu from comment #29)

Andrew, Rob, would it be a good idea to write down which solution fixes which failure and breaks which tests? Maybe this way we'd have a clearer big picture of it.

Yes, personally I'm fuzzy on what is exactly broken and what is a longstanding intermittent failure.

Flags: needinfo?(acreskey)
Assignee

Comment 32

3 months ago

(In reply to Andrew Creskey from comment #31)

For TTFI, in an ideal world I would always attempt to measure it but never fail a test if it didn't occur. It's not guaranteed to happen in the given time interval, or at all.

Maybe pass the tests if the failed attempts for this are not over 50%?

Yes, personally I'm fuzzy on what is exactly broken and what is a longstanding intermittent failure.

I could write down what's apparently fixing one failure while breaking others. After looking over all of them, we could refine what doesn't fit our "movie". I don't want to unnecessarily push this idea, though. Dave?

Assignee

Updated

3 months ago
Flags: needinfo?(dave.hunt)

(In reply to Alexandru Ionescu from comment #32)

(In reply to Andrew Creskey from comment #31)

For TTFI, in an ideal world I would always attempt to measure it but never fail a test if it didn't occur. It's not guaranteed to happen in the given time interval, or at all.

Maybe pass the tests if the failed attempts for this are not over 50%?

I don't think that's what :acreskey is saying. Ideally we would try to measure ttfi but never fail a test when we're unable to measure it. When we're unable to measure it, it will simply be missing from our results and not cause the test to fail.
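In code form, Dave's reading might look like this: required metrics gate pass/fail, optional ones (like ttfi) are reported only when present. The function and list names are illustrative, not Raptor's actual API:

```javascript
// Fail only on missing *required* metrics; optional ones (e.g. ttfi) are
// included when measured and silently omitted otherwise. Illustrative sketch.
function evaluateResults(measured, required, optional) {
  const missingRequired = required.filter((m) => !(m in measured));
  const results = {};
  for (const m of [...required, ...optional]) {
    if (m in measured) results[m] = measured[m];
  }
  return { pass: missingRequired.length === 0, results };
}
```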

Yes, personally I'm fuzzy on what is exactly broken and what is a longstanding intermittent failure.

I could write down what's apparently fixing one failure while breaking others. After looking over all of them, we could refine what doesn't fit our "movie". I don't want to unnecessarily push this idea, though. Dave?

I'm not following this suggestion, could you elaborate? I would propose examining the intermittent failures, and categorising them by the cause for the failure (such as timeout). We can combine this with the unstable tests that we have disabled to give us an idea of the current state. Then, we can identify how many of these are apparently resolved by these proposed changes, and how many new issues are introduced.

Flags: needinfo?(dave.hunt)
Assignee

Comment 34

3 months ago

(In reply to Dave Hunt [:davehunt] [he/him] ⌚️UTC from comment #33)

I'm not following this suggestion, could you elaborate? I would propose examining the intermittent failures, and categorising them by the cause for the failure (such as timeout). We can combine this with the unstable tests that we have disabled to give us an idea of the current state. Then, we can identify how many of these are apparently resolved by these proposed changes, and how many new issues are introduced.

I was just suggesting putting them together, to be able to see how progress on an individual test adds up for the whole.

Comment 35

3 months ago

Yes, my opinion is that failure to find ttfi is interesting, but not something a test should fail on.

I redid the push with longer setTimeout intervals on the polling and also removed ttfi from cases where it failed:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e2aa2839bddc08f6ae31625288431ae43618f96c

What's the ./mach try command to give me good coverage?

Assignee

Comment 37

3 months ago

Try build with document_end, event listener and removing TTFI for mobile sites:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=48c3a46e4d4ba511952274a55701540b2b34c40c

Assignee

Comment 38

3 months ago

:rwood, in the attached diff you have just the document_end and event listener updates.

(In reply to Alexandru Ionescu from comment #36)

Try build with document_end & event listener:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6dd975d1dd01bbb44f2d4f34665862963d6ea72c&selectedJob=236553901

I've added the Raptor desktop Firefox and Chromium jobs (Linux 64) to your try push, although I believe they were tested previously with this change on other bugs/try pushes. When those are done you can lando this! :)

Comment 40

3 months ago
Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/253bab8dbd5a
Switch to using event listener instead of overloading window.onload in runner/measure.js r=acreskey

Updated

3 months ago
Blocks: 1532545
Assignee

Comment 41

3 months ago

(In reply to Robert Wood [:rwood] from comment #39)

(In reply to Alexandru Ionescu from comment #36)

Try build with document_end & event listener:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6dd975d1dd01bbb44f2d4f34665862963d6ea72c&selectedJob=236553901

I've added the Raptor desktop Firefox and Chromium jobs (Linux 64) to your try push, although I believe they were tested previously with this change on other bugs/try pushes. When those are done you can lando this! :)

Yes they were tested.

Comment 42

3 months ago
Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68