Closed Bug 1601401 Opened 5 years ago Closed 4 years ago

Odd results testing ./mach browsertime with fission

Categories

(Core :: Networking, defect, P3)

defect

Tracking

()

RESOLVED FIXED
Fission Milestone M7

People

(Reporter: jesup, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [fission][necko-triaged])

When testing browsertime with fission (via --firefox.preference fission.autostart:true), you get odd results. BackEndTime is always negative; most other measurements are better than non-fission (though I don't see that in hand testing), but visualMetrics shows considerable regression.

I see this with CNN (including visual metrics), and Netflix (I didn't have them working for this test).

Not sure why; perhaps the negative BackEndTime is a clue to what goes wrong here.

Blocking the main browsertime meta since I assume it affects CI and local; feel free to move to blocking local and (I assume) CI use.

May be related to bug 1417308

From a run:
[2019-12-05 07:52:27] INFO: https://www.cnn.com/ BackEndTime: -337 DomInteractiveTime: 540 DomContentLoadedTime: 552 FirstPaint: 535 PageLoadTime: 1951
[2019-12-05 07:52:27] INFO: VisualMetrics FirstVisualChange: 2320 SpeedIndex: 3662 PerceptualSpeedIndex: 4964 ContentfulSpeedIndex: 3422 VisualComplete85: 5720 LastVisualChange: 6480

and the json recorded by browsertime from our APIs:
"navigationTiming": {
"connectStart": -373,
"domComplete": 1951,
"domContentLoadedEventEnd": 643,
"domContentLoadedEventStart": 552,
"domInteractive": 540,
"domainLookupEnd": -373,
"domainLookupStart": -394,
"duration": 2291,
"fetchStart": -402,
"loadEventEnd": 2291,
"loadEventStart": 1951,
"redirectEnd": 0,
"redirectStart": 0,
"requestStart": -349,
"responseEnd": -311,
"responseStart": -337,
"secureConnectionStart": -365,
"startTime": 0,
"unloadEventEnd": 0,
"unloadEventStart": 0,
"workerStart": 0
},
"pageTimings": {
"backEndTime": -337,
"domContentLoadedTime": 552,
"domInteractiveTime": 540,
"domainLookupTime": 21,
"frontEndTime": 2262,
"pageDownloadTime": 26,
"pageLoadTime": 1951,
"redirectionTime": 0,
"serverConnectionTime": 24,
"serverResponseTime": 38
},

Component: Performance → Networking
Product: Testing → Core
See Also: → 1417308
Whiteboard: [fission]
Version: Version 3 → Trunk

Tracking for Fission Nightly (M6)

Fission Milestone: --- → M6
Priority: -- → P3
Whiteboard: [fission] → [fission][necko-triaged]

:jesup, quite a few things have changed since you reported this. Could you please verify whether it's still happening in latest Nightly? This would help us triage this bug for M6. Thanks!

Flags: needinfo?(rjesup)
Severity: normal → S3
Fission Milestone: M6 → M7

I can confirm the issue still persists with the latest nightly, and I am looking into this

Flags: needinfo?(rjesup)

So I think the issue isn't strictly tied to Fission, however, Fission magnified it.

The negative timing is indeed caused by the inconsistency between navigationStart and other network-related timings. navigationStart is set when the docShell in the content process is created, and other network timings are created in the main thread and send to the content process https://searchfox.org/mozilla-central/rev/598e50d2c3cd81cd616654f16af811adceb08f9f/netwerk/protocol/http/HttpChannelParent.cpp#1532-1535. Calculations are done here.

This is even an issue without Fission if the navigation creates a new content process. It's quite reproducible with e10s by starting the browser in a fresh state and navigating to a site.

It is not an issue in Browsertime with e10s because the navigation always doesn't create a new content process, it will reuse the existing one. And it starts to become an issue with Fission because of the obvious reason, Fission creates more content processes.

So I think the fundamental problem we want to solve is the inconsistent timing for navigations that create a new content process.

Not sure what a proper solution is, however I'd guess it could be as simple as passing the navigationStart from the origin process to the newly created content process?

Nhi, may I ask to loop this issue back to the necko traige queue to get some thoughts from necko engineers?

Flags: needinfo?(nhnguyen)

Honza, could you weigh in on this?

Flags: needinfo?(nhnguyen) → needinfo?(honzab.moz)

There's also a concern about whether our navigationStart violates the spec.

Per spec, navigationStart has two cases

  1. Return the time immediately after the user agent finishes prompting to unload the previous document.
  2. If there is no previous document, this attribute must return the time the current document is created.

The case I described above is for case 2). The new document gets created in the newly created content process, which matches the spec, however this creates a timing after things like connectStart.

Looking at the processing modal, maybe this is correct...?

This seems to be fixed by Matt in Bug 1623461.

I have verified it myself and I don't see the odd results anymore.

Randell, mind give it a try? We can close this bug if you also don't see it anymore.

Flags: needinfo?(honzab.moz) → needinfo?(rjesup)
Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(rjesup)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.