Closed Bug 1533059 Opened 5 years ago Closed 4 years ago

raptor-tp6m-amazon-search-fennec, raptor-tp6m-wikipedia-fennec, and raptor-tp6m-espn-fennec timeouts

Categories

(Testing :: Raptor, defect, P2)

Version 3
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rwood, Assigned: rwood)

References

Details

The following Raptor tests are intermittently (quite often) not working on Fennec:

raptor-tp6m-amazon-search-fennec
raptor-tp6m-wikipedia-fennec
raptor-tp6m-espn-fennec

I tried several things including increasing the page-load timeout to a huge number like 3 minutes; changing the 'onload' to 'addEventListener'; and changing what measurements are being taken; but none of those had any effect.

I do notice that if you manually select the Raptor test tab on fennec and then manually choose to refresh the tab in fennec, then the page loads fine and the measurements are retrieved and the rest of the page-cycles work great.

Without that manual refresh it looks like for some reason the Raptor extension measure.js is not being injected into the test tab. This to me points to an issue with fennec itself, not loading the page on the first attempt perhaps.

Also tried running Raptor on fennec in the default tab (instead of opening a new tab; this is how the tests run on the geckoview example app and reference browser, just in the default tab) but the URL won't load on the fennec default tab because of the fennec start page.

Note that another issue in Fennec is the create tab API doesn't make the new tab active by default. Even tried passing in 'active = True' to the tabs.create API (which shouldn't be required as it is supposed to be active by default) but no effect. So when running the tests locally if you want to see them you need to manually select the 2nd tab in Fennec just FYI.

:acreskey, would you or someone on your team have a chance to look at this? I believe it is some problem with Fennec on first load of the test pages, but just a theory at this point.

Flags: needinfo?(acreskey)
See Also: → 1533064

:rwood, I can certainly try to isolate the problem.
I got raptor running on geckoview with this command, can you provide the arguments for fennec?
./mach raptor-test --test raptor-tp6m-1 --app=geckoview --binary="org.mozilla.geckoview_example"

(In reply to Andrew Creskey from comment #2)

Hey Andrew, thank you, much appreciated, and yes:

./mach raptor-test --test raptor-tp6m-amazon-search-fennec --app fennec --binary org.mozilla.fennec_aurora

That's using Fennec (target.apk) from treeherder's Android 4.0 API 16+ opt build task.

Thanks Rob -- but I must be missing something, I only see the speedometer test for fennec:
https://searchfox.org/mozilla-central/search?q=fennec&path=testing%2Fraptor%2Fraptor%2Ftests%2F*

(In reply to Andrew Creskey from comment #4)

Thanks Rob -- but I must be missing something, I only see the speedometer test for fennec:
https://searchfox.org/mozilla-central/search?q=fennec&path=testing%2Fraptor%2Fraptor%2Ftests%2F*

Ah yes sorry it hasn't landed yet - you'll need to apply the patch here first:

https://bugzilla.mozilla.org/show_bug.cgi?id=1532545#c1

Thanks Andrew

Thanks, I can now reproduce the problem locally (it's 100% on my emulator).

It looks like measure.js doesn't get run until after the page is loaded, so the window.onload event has already fired before we attached to it.

But this same logic works for raptor-tp6m-google-fennec ... strange.
I'll debug more tomorrow.

I think I found the issue:

The raptor webextension injects the js at the default time which is "document_idle".
See the run_at property of the content scripts:
https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/manifest.json/content_scripts

I changed the run_at to document_end and I'm seeing the measure.js run correctly on raptor-tp6m-amazon-search-fennec and other fennec tp6 sites.

i.e. added "run_at": "document_end" here
https://searchfox.org/mozilla-central/source/testing/raptor/webext/raptor/manifest.json#43

If this holds it could explain some other problematic sites on other platforms - document_idle may arrive after the load event has fired.

Flags: needinfo?(acreskey)

Wow, that's really interesting thanks Andrew! Ok, I'll put a patch together trying out switching to "run_at: document_end" and see how that looks.

Assignee: nobody → rwood
Status: NEW → ASSIGNED
Blocks: 1533454

(In reply to Robert Wood [:rwood] from comment #9)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=fbb404f8e25af6bee2562bce3cf12c68d1029693

The initial try run with this change (as to when Raptor content is injected) appears to have caused issues with tp6-10 (paypal). Re-based and pushed to try again:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=8aadfb82450b4332b81fae71c8b417e3b6e63be9

It looks like the initial page load is timing out waiting for any of metrics. I can reproduce this locally, even with a fresh mitmproxy recording. Running with --debug shows "awaiting results..." in the console and then nothing else. Is it possible that document_end can also be after the timing events?

Flags: needinfo?(acreskey)

Interesting -- and it looks like paypal is the only site where this is failing, right?

I can run a simple webextension with "run_at": "document_end" on https://www.paypal.com/myaccount/summary/ and it behaves correctly, catching the load event.

But let me rebuild and try this in raptor.

By the way, I find that the source documentation for run_at is more descriptive than the web docs:
https://searchfox.org/mozilla-central/source/dom/chrome-webidl/WebExtensionContentScript.webidl#120

Hmm, I pulled the latest for OSX and I'm seeing this error when trying to run ./mach raptor-test --test raptor-tp6-paypal-firefox

https://paste.rs/cI6
Any ideas? Like the configuration doesn't have a recordings field?

I can run other tests like ./mach raptor-test --test raptor-speedometer

^^ A clean build removed that error.

So on PayPal it looks like the window.onload function is being set after measure.js is executed, so the custom function that's being assigned is lost. (Looking at the PayPal source they do indeed assign a function to window.onload.)
So it looks like a timing issue with the site.

I refactored measure.js to add raptorContentHandler as load event listener (as we discussed in Bug 1528055), and similarly for runner.js on load function.

This is working locally, so I pushed it to try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2d500a6f51ebbc4dd65d87d7dec86e2c19c2f43e

This does feel a bit like a whack-a-mole, but at least personally I think the above solution is correct.
We'll have to see what the try server says....

Flags: needinfo?(acreskey)

Hmmm..

And that push times out on raptor-tp6-google-firefox (hero) on Linux64 and raptor-tp6-imgur-firefox (fcp, fnb paint, dcf, ttfi) on OSX.

Locally (OSX) those tests both succeed.

Bug 1536758 is tracking switching to the event listener.

Push to try with run_at: document_start (might as well try out the last available run_at option:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=23082e496f74c0a1bad8fb9b1ab78b32e93d3620

I made a test for document_start vs document_end on geckoview and seems like document_end has generally better times: https://docs.google.com/spreadsheets/d/1NcKPjWAl7roU04o8j7tZBCM2-_-hRl2vlPWi2P_Lbos/edit#gid=1990777356

Deferring any changes here to Bug 1536758 - Switch to using event listener instead of overloading window.onload in runner/measure.js

Depends on: 1536758
No longer blocks: 1532545
Depends on: 1546323

Hi Alex, once you get the Fennec fix landed (Bug 1546323) can you please verify that this issue is resolved / these tests work on Fennec? Thanks!

Flags: needinfo?(alexandru.ionescu)

Hi Rob, sure!

Flags: needinfo?(alexandru.ionescu)
Priority: P1 → P2

:alex is this done?

Flags: needinfo?(alexandru.ionescu)

I did a fresh try to make sure all is good.

Flags: needinfo?(alexandru.ionescu)

(In reply to Alexandru Ionescu :alexandrui (needinfo me) from comment #24)

I did a fresh try to make sure all is good.

It appears that the results are no longer available. Can you push to try again :alexandrui? It's a good idea to leave needinfo open if you need to wait for results so that the bug is not forgotten.

Flags: needinfo?(aionescu)

No intermittent fail occurred anymore. Search also through the currently open intermittents and found non for f68. Closing this.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Flags: needinfo?(aionescu)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.