Investigate and fix raptor-tp6-facebook-firefox failure

Status: ASSIGNED
Type: defect, Priority: P2, Severity: normal
Opened 4 months ago, updated 2 months ago

People: Reporter: rwood, Assigned: rwood
Tracking: Depends on 1 bug, {leave-open}; Version 3; Firefox Tracking Flags: not tracked
Attachments: 2 attachments

Assignee

Description

4 months ago

In Bug 1506936 we temporarily disabled raptor-tp6-facebook-firefox because it started failing permanently (permafail) for an unknown reason. Figure out why it is failing, fix it, and re-enable the test.

Assignee

Comment 1

4 months ago

I cannot reproduce the tp6-facebook-firefox failure locally on OSX with latest inbound; pushing to try to see the status in production:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=764d42c8c6f368ff97b9d6d52068fed78ef6cfa2

Assignee

Comment 2

4 months ago

After a bunch of retriggers on autoland, it looks like Bug 1528055 is the cause.

When I was working on Bug 1528055 I did see this failure on try, but then I updated the patch to also not override 'onload' in the main raptor runner.js. That try run was green, but it looks like the failure was not actually fixed; it was only intermittent.

https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=raptor%2Ctp6-1%2Cwindows&tochange=dd22940c7ae9cbfd06c076f29e27b4753f3add04&fromchange=708d79591ac5e4ce8b71667f491cb859545c3587

Blocks: 1528055
Assignee

Comment 3

4 months ago

Hey Andrew, do you have any suggestions as to how the patch in Bug 1528055 (not overriding the page content's 'onload') could break tp6-facebook? FB must be doing something different; the other tp6 sites seem to be fine.

Flags: needinfo?(acreskey)
Assignee

Comment 4

4 months ago

Interesting, :davehunt, :acreskey, check this out - from the Firefox devtools console while raptor-tp6-facebook is running.

Flags: needinfo?(dave.hunt)
Assignee

Comment 5

4 months ago

(In reply to Robert Wood [:rwood] from comment #4)

> Created attachment 9046455 [details]
> tp6-raptor-firefox-browser-console.png
>
> Interesting, :davehunt, :acreskey, check this out - from the Firefox devtools console while raptor-tp6-facebook is running.

This is probably just standard FB behavior, but I wonder whether FB is somehow blocking the Raptor content script measure.js from being injected; perhaps it's a race condition.

Comment 7

4 months ago

Hmm... the failure is a timeout in those try jobs, right?
(And even in production it's intermittent.)

Is there a way to find out which particular subtest failed? Is it the load event?

I was thinking that, in their load event handler, facebook may have been loading a new resource that wouldn't have been captured in the initial mitmproxy recording.
But if it's succeeding for you locally then I guess that's not the case.

Flags: needinfo?(acreskey)
Assignee

Comment 8

4 months ago

(In reply to Andrew Creskey from comment #7)

When the intermittent happens, facebook is always timing out on the very first page load. It looks to me like the raptor measure.js may not even be injected, or the measure.js raptorContentHandler isn't being invoked.

I'll push to try with reduced combinations (i.e. only measure first-contentful-paint) and see if that makes a difference, but my theory at this point is that it is more likely the content handler not being invoked at all.

Assignee

Comment 9

4 months ago

Try run with tp6-raptor-facebook-firefox set to only measure first-non-blank-paint:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2ad0c7d7b0f64ddafe73fda9a4a4950f2f6df2b4

Another possibility is that the URL is changing somehow (maybe a race condition), so measure.js isn't being injected. However, as long as the URL matches *.facebook.com it would still be injected [0].

[0] https://searchfox.org/mozilla-central/rev/b10ae6b7a50d176a813900cbe9dc18c85acd604b/testing/raptor/webext/raptor/manifest.json#21
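
The matching behavior described above can be sketched as follows. This is only a simplified approximation of the host part of a WebExtension match pattern, not the browser's actual implementation; the helper name `hostMatchesWildcard` and the sample URLs are hypothetical.

```javascript
// Simplified host check: a leading "*." matches the base domain and any subdomain.
// Hypothetical helper for illustration; real match patterns also cover scheme and path.
function hostMatchesWildcard(hostPattern, url) {
  const host = new URL(url).hostname;
  if (!hostPattern.startsWith('*.')) {
    return host === hostPattern;
  }
  const base = hostPattern.slice(2);
  return host === base || host.endsWith('.' + base);
}

console.log(hostMatchesWildcard('*.facebook.com', 'https://www.facebook.com/'));  // true
console.log(hostMatchesWildcard('*.facebook.com', 'https://m.facebook.com/home')); // true
console.log(hostMatchesWildcard('*.facebook.com', 'https://notfacebook.com/'));   // false
```

Under this reading, a redirect between facebook.com subdomains would not by itself prevent injection; only a navigation to a different domain would.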

Comment 10

4 months ago

Maybe it's a race condition on the window.onload handler... if we set the raptor content handler first and then somewhere in their loading they replace window.onload, the raptor handler would be lost.

If you want an experiment, you could try this:

// Register alongside any page handler instead of assigning window.onload
if (window.addEventListener) {
  window.addEventListener('load', raptorContentHandler);
}

replacing this:
https://searchfox.org/mozilla-central/source/testing/raptor/webext/raptor/measure.js#327

At least for a trivial function it worked for me.
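
The suspected race can be illustrated outside the browser. The sketch below is a minimal model that runs under Node.js (15 or later, for the global `EventTarget` and `Event`); `FakeWindow`, `simulate`, and the stand-in handlers are hypothetical and are not Raptor code. It only shows that a handler assigned to `onload` is lost when the page assigns its own afterwards, while one registered with `addEventListener` survives.

```javascript
// Minimal stand-in for a browser window (hypothetical, for illustration only).
class FakeWindow extends EventTarget {
  constructor() {
    super();
    this.onload = null;
  }

  // Fire 'load': call the single onload property handler, then registered listeners.
  fireLoad() {
    const event = new Event('load');
    if (typeof this.onload === 'function') {
      this.onload(event);
    }
    this.dispatchEvent(event);
  }
}

// Register the harness handler first, then let the "page" replace window.onload.
function simulate(register) {
  const win = new FakeWindow();
  const calls = [];
  const raptorContentHandler = () => calls.push('raptor'); // stand-in handler
  register(win, raptorContentHandler);
  win.onload = () => calls.push('page'); // page script overwrites the property
  win.fireLoad();
  return calls;
}

const viaProperty = simulate((win, handler) => { win.onload = handler; });
const viaListener = simulate((win, handler) => { win.addEventListener('load', handler); });

console.log(viaProperty); // [ 'page' ]
console.log(viaListener); // [ 'page', 'raptor' ]
```

The model does not reproduce the real ordering of handlers in a browser; it only demonstrates which registrations survive an overwrite.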

Assignee

Comment 11

4 months ago

Thanks Andrew. IIRC I tried that (addEventListener) and it didn't work with Chromium (?), but I can always revisit and try again. First, here are a bunch of try pushes with the existing code but an extended page_timeout for facebook (in case it's just taking longer), plus try pushes measuring each individual measurement type (fnbpaint, fcp, hero, dcf, ttfi, loadtime) on its own:

https://treeherder.mozilla.org/#/jobs?repo=try&author=rwood%40mozilla.com&fromchange=2ad0c7d7b0f64ddafe73fda9a4a4950f2f6df2b4&tochange=d75b25d7ddb60534f5d51351aa341810365a4e0f

Comment 12

4 months ago

Good stuff.
I did do a quick test and was able to add an eventListener for 'load' in Chrome, but I didn't try in raptor.

I took a quick look at the pushes -- this one I'm not sure if it's correct, or if I missed the intention:
https://hg.mozilla.org/try/rev/9030f3013dc35939be5f04ac6dda1b92f17afc98

Assignee

Comment 13

4 months ago

(In reply to Andrew Creskey from comment #12)

> Good stuff.
> I did do a quick test and was able to add an eventListener for 'load' in Chrome, but I didn't try in raptor.
>
> I took a quick look at the pushes -- this one I'm not sure if it's correct, or if I missed the intention:
> https://hg.mozilla.org/try/rev/9030f3013dc35939be5f04ac6dda1b92f17afc98

Thanks. The purpose of that particular try run was to increase the page_timeout to a huge number (2 min), just to see whether FB was simply taking a very long time to load. It still timed out, so that's not the issue.

Assignee

Comment 14

4 months ago

Looks like it *may* be an issue measuring the hero element; try push with the existing code and all measurements except hero:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2e0404222a168e86d34493a779dd38a46ee4fe90
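
Why a single missing measurement could produce a timeout can be sketched as below. This assumes the harness waits for every requested measurement before it considers a page load finished; that assumption, the helper `pendingMeasurements`, and the sample values are hypothetical and not taken from the Raptor source.

```javascript
// Measurement names as listed in this bug.
const ALL_MEASUREMENTS = ['fnbpaint', 'fcp', 'hero', 'dcf', 'ttfi', 'loadtime'];

// Return the requested measurements that have not reported a value yet.
function pendingMeasurements(requested, received) {
  return requested.filter((name) => !(name in received));
}

// Hypothetical results for one page load in which the hero element is never found.
const received = { fnbpaint: 210, fcp: 260, dcf: 300, ttfi: 520, loadtime: 640 };

const pendingWithHero = pendingMeasurements(ALL_MEASUREMENTS, received);
const pendingWithoutHero = pendingMeasurements(
  ALL_MEASUREMENTS.filter((name) => name !== 'hero'),
  received
);

console.log(pendingWithHero);    // [ 'hero' ] -> still waiting, so the page load would time out
console.log(pendingWithoutHero); // []         -> nothing pending, the page load can complete
```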

Assignee

Updated

4 months ago
Flags: needinfo?(dave.hunt)
Assignee

Updated

4 months ago
Keywords: leave-open

Comment 19

4 months ago

Interesting -- so measure.js is being injected but even with a 60 delay it's not finding the hero...

Assignee

Comment 20

4 months ago

(In reply to Andrew Creskey from comment #19)

> Interesting -- so measure.js is being injected but even with a 60 delay it's not finding the hero...

Yes, it looks that way, and possibly for amazon also. Going to do some more retriggers before landing.

Comment 21

4 months ago
Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/faa8822a034d
Temporarily stop measuring hero element in tp6-1 to prevent intermittent failures; r=acreskey,davehunt

This updated some of our Raptor baselines:

== Change summary for alert #19778 (as of Tue, 05 Mar 2019 14:23:28 GMT) ==

Improvements:

 11%  raptor-tp6-amazon-firefox    linux64 pgo                  443.10 -> 392.28
 11%  raptor-tp6-amazon-firefox    linux64-pgo-qr opt           466.97 -> 414.85
 11%  raptor-tp6-amazon-firefox    windows10-64-pgo-qr opt      393.69 -> 352.16
 10%  raptor-tp6-amazon-firefox    windows7-32 pgo              399.30 -> 357.79
  9%  raptor-tp6-facebook-firefox  osx-10-10 opt                949.93 -> 867.24
  9%  raptor-tp6-amazon-firefox    windows10-64 pgo             396.42 -> 362.15
  5%  raptor-tp6-facebook-firefox  linux64 pgo                  321.91 -> 305.03
  5%  raptor-tp6-facebook-firefox  linux64-pgo-qr opt           340.10 -> 324.80

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=19778
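
The percentages in the summary appear to be the relative drop from the old value to the new one. A minimal sketch of that arithmetic follows, assuming simple rounding; `percentImprovement` is a hypothetical name, and rows close to a rounding boundary may differ by one because the displayed values are themselves rounded.

```javascript
// Relative improvement (lower is better) from the old value to the new one, in percent.
function percentImprovement(before, after) {
  return ((before - after) / before) * 100;
}

console.log(Math.round(percentImprovement(443.10, 392.28))); // 11
console.log(Math.round(percentImprovement(321.91, 305.03))); // 5
```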

Assignee

Comment 24

3 months ago

Thanks Ionut, yep that makes sense: we're (temporarily) not measuring the hero element anymore in those tests, so the geomean would be affected.
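
The effect on the geomean can be sketched with made-up numbers. The values below are hypothetical, not taken from the alert, and the hero value is assumed to be the largest purely for illustration; the point is only that removing one component changes the geometric mean of the remaining ones.

```javascript
// Geometric mean via logarithms, which avoids overflow from multiplying many values.
function geometricMean(values) {
  const logSum = values.reduce((sum, value) => sum + Math.log(value), 0);
  return Math.exp(logSum / values.length);
}

// Hypothetical per-measurement values in ms; the last entry stands in for hero.
const withHero = [200, 250, 300, 500, 600, 900];
const withoutHero = withHero.slice(0, -1);

console.log(geometricMean(withHero) > geometricMean(withoutHero)); // true
```

With these numbers the summary value drops once hero is excluded, which would show up as an improvement in the alert even though nothing about the browser changed.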

Assignee

Updated

2 months ago
Type: enhancement → defect
Priority: P1 → P2