Closed Bug 1670190 Opened 4 years ago Closed 3 years ago

Increased redirectEnd metric in Firefox 80 when connecting to Jira

Categories

(Core :: Performance, defect, P3)

80 Branch
defect

Tracking

()

RESOLVED WORKSFORME
Performance Impact high

People

(Reporter: mbiniek, Unassigned)

Details

(Keywords: perf:pageload, regressionwindow-wanted)

Attachments

(5 files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36

Steps to reproduce:

Go to Jira Cloud [domain]/jira/your-work and compare responseStart results between Firefox 79 and 80. I'm happy to give access to a test instance.

Actual results:

The responseStart metric increased by ~300ms at p90 in the Firefox 80 release. I checked the other navigation metrics, and it turned out that redirectEnd is also impacted, seemingly by the same amount.

Expected results:

Navigation metrics should stay the same between Firefox 79 and 80.

Setting a component for this issue in order to get the dev team involved.
If you feel it's an incorrect one please feel free to change it to a more appropriate one.

Component: Untriaged → Widget: Cocoa
Product: Firefox → Core

I believe we need to get some clarification before we can assign this to the right component. Are these metrics some kind of performance metrics for Jira? What exactly do these metrics measure? If you can reproduce reliably it would be great to get a regression range. Could you run mozregression[1] to see when this started happening? If you have never run mozregression before, simply run these three commands in a Terminal window:

sudo easy_install pip
sudo pip install -U mozregression --ignore-installed
mozregression --good 2017-01-01

A number of Firefox versions will open in succession to narrow down when this started occurring. Simply type "good" or "bad" in Terminal based on whether or not a build reproduces the bug. Once finished, please post the output from the last run. It should give a last good and first bad revision as well as a link to look at the changesets in that range. Thank you!

[1] https://mozilla.github.io/mozregression/

Component: Widget: Cocoa → General
Flags: needinfo?(mbiniek)
Product: Core → Firefox

Hi Stephen,
we aggregate some performance metrics (navigation metrics such as fetchStart/redirectStart/responseStart) for selected experiences.
We have noticed that these metrics have been significantly worse since Firefox 80 (and this continues in 81).
I tried to run mozregression to narrow down the version, but I can't reproduce it reliably on my machine. I was able to find a version which returned much higher requestStart/responseStart values, but the results differ every time I run it.

Flags: needinfo?(mbiniek)

Could you try getting a profile where you see the regression? Start recording, then load the page. Ideally, could you also record a profile in 79, where there was no regression?

https://profiler.firefox.com/

Flags: needinfo?(mbiniek)
Whiteboard: [fxperf]

Hi, sorry for the late reply, but due to the COVID situation I have some trouble getting access to other environments.
I cannot reproduce it on Mac, so I tried on Windows, but no luck there either.
We got the signal from RUM (real user metrics), so we don't have a particular stack trace; however, the aggregated data is pretty stable and shows a constant regression (~300ms for p75 and p90 of fetchStart).
I think I have narrowed the problem down to redirectEnd, which seems to be non-zero when a user logs in to Jira.
Are you aware of whether Firefox 80 introduced a different way of calculating these values? Maybe it isn't a real regression, just a reporting problem.
Are you aware of any regressions with login using a Google Account?
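
For readers unfamiliar with the p75/p90 figures quoted in this comment, the following is a minimal sketch of how such percentiles can be derived from raw samples. It assumes the nearest-rank method; the thread does not say which method the reporter's RUM pipeline uses. The function name and the sample values are hypothetical.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. This method is an assumption;
// the RUM pipeline discussed in this bug may aggregate differently.
function percentile(samples, p) {
  if (samples.length === 0) {
    throw new Error("percentile of an empty sample set is undefined");
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Hypothetical fetchStart samples in milliseconds (not data from this bug).
const fetchStartSamples = [3, 5, 6, 7, 8, 9, 12, 250, 480, 1100];

const p75 = percentile(fetchStartSamples, 75);
const p90 = percentile(fetchStartSamples, 90);
console.log({ p75, p90 });
```

Because a percentile is driven by the slowest share of page loads, a change that affects only some sessions (for example, only loads that follow a login) can move p75 or p90 a lot while staying hard to reproduce on a single machine.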

Flags: needinfo?(mbiniek)

redirectEnd and fetchStart are both browser-reported navigation metrics: https://developer.mozilla.org/en-US/docs/Web/Performance/Navigation_and_resource_timings

Michal, to help us prioritize, what is the baseline for p75 and p90 of fetchStart? I see that it regressed 300ms, but what's the percentage increase? And am I reading correctly that it seems to be a fixed increase of around 300ms for both p75 and p90, and not a proportional increase (i.e., it's not smaller for p75 than it is for p90)?

Whiteboard: [fxperf] → [qf]
Component: General → Performance
Product: Firefox → Core

ni for mbiniek in comment #7

Flags: needinfo?(mbiniek)
Assignee: nobody → acreskey

This is an area where we may not have coverage - I'm going to see if I can reproduce this in Browsertime.

This may have been caused by Bug 1660890 which has since been fixed.
Michal, are you still seeing this issue?

Doug, Kim, I’m just waiting to hear back from legal to make sure there are no concerns with sharing this data externally.
Andrew, yes, unfortunately newer versions of Firefox seem to have stabilised at the higher level since 80.

Flags: needinfo?(mbiniek)

Sorry for the delay.
Here are the baseline numbers for redirectEnd and fetchStart. It seems that the difference introduced in redirectEnd propagates to all later metrics (fetchStart/responseStart). The p90 of redirectStart is ~1ms for all versions of Firefox.

fetchStart p75

browser     date    value [ms]
Firefox 79  21 Aug  7
Firefox 79  24 Aug  6
Firefox 80  8 Sep   486
Firefox 80  14 Sep  539
Firefox 82  5 Nov   441

fetchStart p90

browser     date    value [ms]
Firefox 79  21 Aug  531
Firefox 79  24 Aug  544
Firefox 80  8 Sep   1077
Firefox 80  14 Sep  1114
Firefox 82  5 Nov   1058

redirectEnd p75

browser     date    value [ms]
Firefox 79  21 Aug  0
Firefox 79  24 Aug  0
Firefox 80  8 Sep   477
Firefox 80  14 Sep  531
Firefox 82  5 Nov   429

redirectEnd p90

browser     date    value [ms]
Firefox 79  21 Aug  519
Firefox 79  24 Aug  536
Firefox 80  8 Sep   1060
Firefox 80  14 Sep  1101
Firefox 82  5 Nov   1038

Interestingly, the Firefox 79 values stayed pretty much the same for the next few weeks (with much smaller volume). At the same time, Firefox 80 showed similar values a few weeks before its release (also with much lower volume).
Please let me know if you need any extra data.
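
The earlier question about absolute versus percentage increase can be worked through from the figures in this comment. The sketch below averages the two Firefox 79 samples and the two Firefox 80 samples for each metric and percentile, then reports the absolute and relative change. Averaging two daily percentile values is a simplification made for this illustration, and the names `metrics`, `mean` and `summarize` are hypothetical.

```javascript
// Values in milliseconds, copied from the tables in this comment.
// Only the Firefox 79 and Firefox 80 rows are compared here.
const metrics = {
  fetchStart: {
    p75: { ff79: [7, 6], ff80: [486, 539] },
    p90: { ff79: [531, 544], ff80: [1077, 1114] },
  },
  redirectEnd: {
    p75: { ff79: [0, 0], ff80: [477, 531] },
    p90: { ff79: [519, 536], ff80: [1060, 1101] },
  },
};

const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

// Returns, per percentile, the averaged baseline, the averaged new value,
// the absolute delta, and the relative increase (null for a zero baseline).
function summarize(metricName) {
  const result = {};
  for (const [pct, { ff79, ff80 }] of Object.entries(metrics[metricName])) {
    const before = mean(ff79);
    const after = mean(ff80);
    result[pct] = {
      before,
      after,
      deltaMs: after - before,
      relative: before > 0 ? (after - before) / before : null,
    };
  }
  return result;
}

console.log(summarize("fetchStart"));
console.log(summarize("redirectEnd"));
```

On these few data points the absolute deltas for p75 and p90 come out close to each other, while the relative increase differs widely because the p75 baseline is near zero. That pattern would fit a roughly fixed added cost rather than a proportional slowdown, though four samples per metric are too few to conclude much.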

Thank you for sharing those results Michal.

> Steps to reproduce:
>
> Go to Jira Cloud [domain]/jira/your-work and compare responseStart results between 79 and 80. I'm happy to give access to test instance.

Can you give the specific URL that you're using?
I've been trying this one locally, since we use Jira internally:
https://jira.mozilla.com/secure/Dashboard.jspa

But so far on my Mac this is not leading to any time spent in redirection.

I'll run tests on a few sites to get a better view of redirection time between Firefox 79 and Firefox 80.

Flags: needinfo?(mbiniek)

Some results from my local device (MacBook Pro) testing a handful of sites with Browsertime.

There are intentional redirects from the target url in all cases except https://www.mozilla.org/en-US/ (the one with 0 redirectEnd time).

So far I haven't been able to reproduce a regression between FF79 and FF80.

I don't see telemetry that points to more HTTP requests being redirected in Firefox 80 vs 79.

What I'm really missing from our telemetry is a report on time spent in redirection -- let me look further into that.

I'm afraid https://jira.mozilla.com/secure/Dashboard.jspa is a server version of Jira. The data I've provided is from the cloud version.
Let me create a new instance and invite you.
https://firefox-regression-investigation.atlassian.net/jira/your-work
The only way I was able to reproduce "higher" redirectEnd values was to log out and log in to that page again. Suspiciously, redirectCount is still 0. Example from the instance above:
redirectCount: 0
redirectEnd: 1809
redirectStart: 0

For comparison, Chrome reports 0 for all redirect metrics:
redirectCount: 0
redirectEnd: 0
redirectStart: 0
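
The mismatch described above (a non-zero redirectEnd alongside redirectCount: 0) can be detected mechanically in collected data. The following is a small sketch of such a check, using the two samples from this comment. The function name `checkRedirectFields` is hypothetical, and treating the combination as suspicious is an assumption of this sketch rather than a statement about what any specification requires.

```javascript
// Flags navigation timing samples whose redirect fields disagree: no
// redirects counted, yet a redirect timestamp is reported.
function checkRedirectFields(entry) {
  const { redirectCount, redirectStart, redirectEnd } = entry;
  return {
    // Naive duration; only meaningful when both timestamps are reported.
    durationMs: redirectEnd - redirectStart,
    suspicious:
      redirectCount === 0 && (redirectStart !== 0 || redirectEnd !== 0),
  };
}

// Samples copied from this comment.
const firefoxSample = { redirectCount: 0, redirectStart: 0, redirectEnd: 1809 };
const chromeSample = { redirectCount: 0, redirectStart: 0, redirectEnd: 0 };

console.log(checkRedirectFields(firefoxSample));
console.log(checkRedirectFields(chromeSample));
```

Running a filter like this over the RUM events would show what share of the Firefox 80 samples carry the mismatch, which is one way to separate a reporting change from a real slowdown.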

Flags: needinfo?(mbiniek)
Severity: -- → S3
Priority: -- → P3
Whiteboard: [qf] → [qf:p1:pageload]
QA Whiteboard: [qa-regression-triage]

Thank you for making me that account Michal.

I haven't been able to reproduce a difference between FF79 and FF80 in that workflow locally.
(In both cases I see ~250ms redirection time midway through the log-in process).

But I haven't been able to use our performance framework for this because of the 2FA login.

How are you getting your redirection times?
Via the Web Console and window.performance.timing.redirectEnd - window.performance.timing.redirectStart?

We use:
performance.getEntriesByType('navigation')[0]
which returns time relative to timeOrigin.
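
The two ways of reading redirect timings mentioned in this exchange differ in their time base. The legacy performance.timing fields are epoch timestamps in milliseconds, while the navigation entry fields are offsets from the time origin. The sketch below shows how the two can be made comparable. The helper names and the sample numbers are hypothetical, and the plain objects stand in for the real browser objects so that the snippet also runs outside a browser.

```javascript
// Converts a legacy performance.timing field (epoch milliseconds) into an
// offset from navigationStart. A value of 0 means "not reported" and is kept.
function toRelativeMs(timing, field) {
  const value = timing[field];
  return value === 0 ? 0 : value - timing.navigationStart;
}

// Works for both shapes, because the time base cancels out in a difference.
function redirectDurationMs(source) {
  return source.redirectEnd - source.redirectStart;
}

// Hypothetical values, shaped like window.performance.timing.
const legacyTiming = {
  navigationStart: 1600000000000,
  redirectStart: 1600000000010,
  redirectEnd: 1600000000260,
  fetchStart: 1600000000262,
};

// Hypothetical values, shaped like performance.getEntriesByType('navigation')[0].
const navigationEntry = { redirectStart: 10, redirectEnd: 260, fetchStart: 262 };

console.log(toRelativeMs(legacyTiming, "redirectEnd"));
console.log(redirectDurationMs(legacyTiming));
console.log(redirectDurationMs(navigationEntry));

// In a browser, the same helper can be applied to the live navigation entry.
if (typeof window !== "undefined" && window.performance) {
  const [entry] = window.performance.getEntriesByType("navigation");
  if (entry) {
    console.log(redirectDurationMs(entry), entry.redirectCount);
  }
}
```

A monitoring pipeline that reports redirectEnd directly from the navigation entry records an offset from the time origin, not a redirect duration, so the two only agree when redirectStart is 0 or close to it.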

Unfortunately, I tried to reproduce it on both macOS and Windows, and the variance was too big to narrow down the browser version with the regression.
Our monitoring didn't catch the regression earlier because the volume of beta versions is significantly smaller than that of a major release; however, worse redirectEnd numbers were visible before the official release of FF80.

I'm attaching a few charts:

  • relative distribution between Firefox 79 and 80 over Aug/Sep
  • p75 redirectEnd (cyan line FF79, blue line FF80)
  • p90 redirectEnd (cyan line FF79, blue line FF80)

Interesting, that certainly looks like they are related.

The only thing I've been able to do is capture a profile with a redirection chain in Firefox 80.
The redirects can be seen by hovering over the network requests.
https://share.firefox.dev/32IXghB

It looks like it starts with the response from this request.
https://atlassian.net/SetCST?cst=eyJjd ...

However it's certainly not consistently reproducible.

Something of a guess, but perhaps whatever interaction between Firefox and the Jira backend that makes this flow occur is now more likely with FX80.

Hello, I have tried reproducing the issue with various sites, but unfortunately I wasn't able to do so. If you could provide a site where the issue is reproducible, I will gladly continue to look for the regression.

Flags: needinfo?(mbiniek)

Negritas, as I wrote previously, I'm not able to reproduce this locally. Unfortunately, I don't have access to more information (like the OS), so I can't narrow down whether this is caused by the browser on a particular platform.
Nevertheless, is redirectEnd expected to be non-zero when redirectCount is 0? My guess was that the reporting of these values changed somehow between versions; however, it seems that FF79 also reported non-zero values. Please take a look at the distribution of the events reported between 29 Jul and 2 Oct (the period of main usage of FF79 and 80).

Flags: needinfo?(mbiniek)
Assignee: acreskey → nobody

Hi Michal,

Is this still looking similar in the latest versions of Firefox?

Flags: needinfo?(mbiniek)

Dragana, do you have any ideas here by any chance?

Flags: needinfo?(dd.mozilla)

I do not know of any change. This could be a lot of things. DocumentChannel was enabled shortly before. Maybe it is something related to that work.

Flags: needinfo?(nika)
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(dd.mozilla)

I could definitely believe that something with DocumentChannel caused an impact on redirect numbers, as we do an internal redirect to complete the process switch into the correct process, but I would hope that internal redirects aren't included for these performance numbers, as that would likely also cause issues with other features, like ServiceWorkers.

The timeline for DocumentChannel also doesn't line up perfectly. It appears as though DocumentChannel rode to release in bug 1596682, which landed in Firefox 73, many releases earlier than this regression in Firefox 80.

I did an extremely non-scientific search of changes mentioning "channel" in the rough timeframe, which turned up a few potential candidates, though I don't quite see how they'd do it. For example, it might be related in some way to bug 1633935 (which changes how OnStartRequest is sent), bug 1646899 (which might've changed things if the page uses object/embed tags?), and a few others.

Unfortunately I don't know enough about these navigation/resource timings to know how they're measured, so my intuition about what could impact them isn't great.

Flags: needinfo?(nika)

There was a bug about redirectEnd in Firefox. According to the spec, redirectEnd is supposed to return 0 when there's a cross-origin redirect in the redirect chain, and Firefox didn't do that. In Firefox, as long as the redirect had a Timing-Allow-Origin (TAO) header specified, it would return a non-zero value instead of 0.

Michal, I wonder if it's possible that a TAO header was introduced to the redirects in the redirect chain around that time, which caused the spike? Also, this bug has been fixed in Firefox 94 (so it now returns 0 even if the TAO header is specified). Do you have any early data to see if this changes anything?
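
The difference between the two behaviours described in this comment can be expressed as a small model. This is only a sketch of the comment's description, not a rendering of the specification's algorithm or of Firefox's implementation. The function name, the `behaviour` labels and the shape of the redirect records are all hypothetical.

```javascript
// Simplified model of how redirectEnd is exposed to the page, following the
// description in this comment only.
//   redirects: array of { crossOrigin: boolean, allowsTiming: boolean },
//              where allowsTiming stands for a passing TAO header check.
//   measuredRedirectEndMs: the value the browser measured internally.
//   behaviour: "spec" (as the comment describes the spec and Firefox 94+)
//              or "pre94" (as the comment describes older Firefox).
function reportedRedirectEnd(redirects, measuredRedirectEndMs, behaviour) {
  if (redirects.length === 0) {
    return 0; // no redirects, nothing to report
  }
  const crossOrigin = redirects.filter((r) => r.crossOrigin);
  if (crossOrigin.length === 0) {
    return measuredRedirectEndMs; // same-origin chain is always exposed
  }
  if (behaviour === "spec") {
    return 0; // any cross-origin redirect hides the value
  }
  // "pre94": exposed as long as the cross-origin redirects carried TAO.
  return crossOrigin.every((r) => r.allowsTiming) ? measuredRedirectEndMs : 0;
}

// Hypothetical login chain: one cross-origin hop that sends a TAO header.
const loginChain = [{ crossOrigin: true, allowsTiming: true }];

console.log(reportedRedirectEnd(loginChain, 477, "pre94"));
console.log(reportedRedirectEnd(loginChain, 477, "spec"));
```

Under this model, a login flow that passes through a third-party identity provider sending a TAO header would report a non-zero redirectEnd in older Firefox and 0 in Firefox 94 and later, without any change in how long the redirects actually take.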

Flags: needinfo?(matt.woodrow)

Hi, thanks for the follow up.

> I wonder if it's possible that a TAO header had been introduced to the redirects in the redirect chain around that time

I don't think anything new was added, but the login redirect chain can be outside our control (with 3rd-party integrations).

I have just rerun the queries to check the results from the last month, and they are much better (constant 0 at p75).
I am running the query over a longer period of time, but it will take a while to execute.

Flags: needinfo?(mbiniek)

Hey Michal, it looks like the TAO check might have been the root cause here, in which case Firefox >= 94 will solve it. So, if you can check with that version and report back, it would be useful to us. Also, you can see which headers are served and parsed by Firefox by enabling developer tools and using Inspector/Network/All to look at network activity. I no longer have access to your test site, but if you check it, do you see a TAO header? Thanks for your help tracking this down.

Flags: needinfo?(mbiniek)

Hi,
sadly, we haven't been able to reproduce this in a local environment or on our own computers. We have only noticed it in the reported performance data, which may be related to login pages or redirects from outside Jira that expose a TAO header. The Jira page does send a TAO header, but that didn't change around the FF79/80 release.
It seems that Firefox 82 didn't show that regression, so it must have been fixed quite some time ago.

Flags: needinfo?(mbiniek)

> It seems that Firefox 82 didn't show that regression so it has had to be fixed quite some time ago.

iiuc, it sounds like this issue was resolved in FF82 so I'm going to close the bug. If I misunderstood or you see this issue again, please reopen it. Michal, thanks for all of your help debugging this issue!

Status: UNCONFIRMED → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME
Performance Impact: --- → P1
Keywords: perf:pageload
Whiteboard: [qf:p1:pageload]