Closed Bug 1139831 Opened 5 years ago Closed 5 years ago

Resource Timing API - responseEnd is before requestStart and responseStart for HTTP 404

Categories

(Core :: DOM: Core & HTML, defect)

36 Branch
x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla40
Tracking Status
firefox37 - wontfix
firefox38 + fixed
firefox39 + fixed
firefox38.0.5 --- fixed
firefox40 + fixed

People

(Reporter: ad, Assigned: valentin, NeedInfo)

References

(Blocks 1 open bug)

Details

(Keywords: perf, regression)

Attachments

(1 file)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.76 Safari/537.36

Steps to reproduce:

Open http://andydavies.github.io/rt-tests/http-404-failure.html in Firefox Nightly

Page includes a request for an image that doesn't exist, waits for onload and then extracts RT entries for image


Actual results:

responseEnd is equal to fetchStart and so before both requestStart and responseStart

[
  {
    "name": "http://andydavies.github.io/image.png",
    "entryType": "resource",
    "startTime": 708.7386409999999,
    "duration": 0,
    "initiatorType": "img",
    "redirectStart": 0,
    "redirectEnd": 0,
    "fetchStart": 708.7386409999999,
    "domainLookupStart": 708.7386409999999,
    "domainLookupEnd": 708.7386409999999,
    "connectStart": 708.7386409999999,
    "connectEnd": 708.7386409999999,
    "secureConnectionStart": 0,
    "requestStart": 713.88463,
    "responseStart": 944.016219,
    "responseEnd": 708.7386409999999
  }
]


Expected results:

responseEnd should be after responseStart
I cant reprouce it with FF38/39 on Win 7:
"responseStart": 359.11138241202576,
"responseEnd": 360.70304419259867

"responseStart": 143.82428982077184,
"responseEnd": 144.28969583062258"

"responseStart": 138.37685303433116,
"responseEnd": 138.85029097915535
Component: Untriaged → DOM
Product: Firefox → Core
This is 39a01 (2015-03-04) on 64 bit Win 7

[
  {
    "name": "http://andydavies.github.io/image.png",
    "entryType": "resource",
    "startTime": 211.95061104134743,
    "duration": 0,
    "initiatorType": "img",
    "redirectStart": 0,
    "redirectEnd": 0,
    "fetchStart": 211.95061104134743,
    "domainLookupStart": 263.26376676365294,
    "domainLookupEnd": 263.8425414403227,
    "connectStart": 263.8425414403227,
    "connectEnd": 263.8425414403227,
    "secureConnectionStart": 0,
    "requestStart": 264.3251446762088,
    "responseStart": 379.5530577210232,
    "responseEnd": 211.95061104134743
  }
]
Flags: needinfo?(valentin.gosu)
Andy, I am unable to reproduce the issue. Does it still reproduce for you?
Do you know of any other webpage where this happens? I do believe there's a problem here, but I'm not sure how it manifests.
Flags: needinfo?(ad)
I've not seen it on any other pages but this was one of a few test cases I put together to compare what PerformanceEntries Chrome, IE and FF's created when a resource failed to be fetched (https://github.com/w3c/resource-timing/issues/12)

I've just re-run the test URL in both HTTP and HTTPS, and I only see this behaviour for HTTP.

The same case with HTTPS appears to produce sane results

[
  {
    "name": "https://andydavies.github.io/image.png",
    "entryType": "resource",
    "startTime": 768.7048589999999,
    "duration": 227.90829499999995,
    "initiatorType": "img",
    "redirectStart": 0,
    "redirectEnd": 0,
    "fetchStart": 768.7048589999999,
    "domainLookupStart": 768.7048589999999,
    "domainLookupEnd": 768.7048589999999,
    "connectStart": 768.7048589999999,
    "connectEnd": 768.7048589999999,
    "secureConnectionStart": 0,
    "requestStart": 769.38515,
    "responseStart": 988.238554,
    "responseEnd": 996.6131539999999
  }
]

I can reproduce the HTTP case using the WebPageTest Dulles test agents too, and will check HTTPS tomorrow.

I'll also have a look for other 404 cases to see if they reproduce it too
I was not able to reproduce this using http://andydavies.github.io/rt-tests/http-404-failure.html. Here are my results:

Firefox 39.0a1 2015-03-17 > "responseStart": 381.859878, "responseEnd": 387.282743
Firefox 38.0a2 2015-03-17 > "responseStart": 273.217345, "responseEnd": 273.275276
Firefox 37.0b5 2015-03-12 > "responseStart": 361.085042, "responseEnd": 361.144751
Firefox 36.0.1 2015-03-05 > "responseStart": 212.137468, "responseEnd": 212.203263
QA Whiteboard: [triaged]
Did you have DevTools open or closed?

I've just tried it with DevTools open and I don't get the problem, but if DevTools is closed then I do.

This was in 39.0a1 2015-03-17
Flags: needinfo?(ad)
(In reply to Andy Davies from comment #6)
> Did you have DevTools open or closed?

I tested this on a new profile for each version without opening developer tools. I simply start Firefox with a new profile and load the URL.
Wonder what's different then as I can reproduce this at will…

This is FF Nightly tested in WebPageTest http://www.webpagetest.org/custom_metrics.php?test=150318_3V_6ddee2933b90f1f2af85f95c2b76e587&run=2&cached=0

I set this custom metric to pull the RT entry for the missing image:

[RT]
var url = document.getElementById("test-img");  
resourceEntries = window.performance.getEntriesByName(url.src);

return JSON.stringify(resourceEntries, null, 2);
Hey, I'm a performance engineer at Facebook and our logging indicates that this bug is extremely widespread on all Facebook pages. In fact only 25% of Firefox samples report "sane" performance metrics. It reproduces every time for me on FF36/Win7:

var p = performance.timing, x = {}; for (i in p) {x[i] = p[i] - p.navigationStart}; console.log(x); (for easier reading)

"connectEnd":1
"connectStart":1
"domainLookupEnd":1
"domainLookupStart":1
"domContentLoadedEventEnd":1993
"domContentLoadedEventStart":1993
"domInteractive":1948
"domLoading":98
"fetchStart":1
"navigationStart":0
"redirectEnd":1
"redirectStart":1
"requestStart":1
"responseEnd":1
"responseStart":97 <-- how is responseStart greater than responseEnd?
"unloadEventEnd":106
"unloadEventStart":98

Facebook pages use SPDY (not sure if it's related).
(In reply to Andrey Sukhachev from comment #9)
> Hey, I'm a performance engineer at Facebook and our logging indicates that
> this bug is extremely widespread on all Facebook pages.

Does this happen in a previous Firefox version or does this problem start with Firefox 36?
Flags: needinfo?(andrey)
I see a similar issue with FF 35. FF 34 seems to be better (it has its own issues, but that particular issue is not longer prominent).
Flags: needinfo?(andrey)
Thanks Andrey. I think you're right about the 25% correctness. It was due to using persistent connections. I believe it was fixed by Bug 1123920.
Do you have any data for Firefox 39a1?
Flags: needinfo?(andrey)
(In reply to Andrey Sukhachev from comment #11)
> I see a similar issue with FF 35. FF 34 seems to be better (it has its own
> issues, but that particular issue is not longer prominent).

Thanks Andrey, would you be willing to narrow this down further with Firefox Nightly builds using our mozregression tool? http://mozilla.github.io/mozregression/install.html

If it is a regression introduced in Firefox 35, you'll find it between 2014-09-03 and 2014-10-13.
(In reply to Valentin Gosu [:valentin] from comment #12)
> Thanks Andrey. I think you're right about the 25% correctness. It was due to
> using persistent connections. I believe it was fixed by Bug 1123920.
> Do you have any data for Firefox 39a1?

We don't have many samples for v39 (about 1% of v37 samples), but I don't see that particular error in our logs (which could be because we have so few samples, but more likely the error is fixed as you said).
Flags: needinfo?(andrey)
Thanks Andrey. Please let us know if you find any errors in the future.
Flags: needinfo?(valentin.gosu)
Assignee: nobody → valentin.gosu
(In reply to Valentin Gosu [:valentin] from comment #15)
> Thanks Andrey. Please let us know if you find any errors in the future.

Okay.. Looks like the issue is still not fixed, even in the 40.0a1 (2015-03-31 nightly). responseStart is no longer greater than responseEnd, but they all are set to the same value:

"requestStart": 150
"responseStart": 150
"responseEnd": 150

this is incorrect, as our pages render progressively and should take at least a hundred ms to finish.
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #13)
> (In reply to Andrey Sukhachev from comment #11)
> > I see a similar issue with FF 35. FF 34 seems to be better (it has its own
> > issues, but that particular issue is not longer prominent).
> 
> Thanks Andrey, would you be willing to narrow this down further with Firefox
> Nightly builds using our mozregression tool?
> http://mozilla.github.io/mozregression/install.html
> 
> If it is a regression introduced in Firefox 35, you'll find it between
> 2014-09-03 and 2014-10-13.

I tried doing it yesterday, but the range seems incorrect, I think I ended up with the [2014-10-13, 2014-10-22] range, but then I hit a different issue and had to backtrack. Do you still want me to narrow it down?
(In reply to Andrey Sukhachev from comment #17)
> Do you still want me to narrow it down?

This has to be narrowed down if we have any hope for a fix. Normally I would do it but I cannot reproduce this issue.
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #18)
> (In reply to Andrey Sukhachev from comment #17)
> > Do you still want me to narrow it down?
> 
> This has to be narrowed down if we have any hope for a fix. Normally I would
> do it but I cannot reproduce this issue.

That's very interesting.. When you load a facebook page, what numbers do you get? What can be different between our browsers (or browser configurations)? IPv4 vs IPv6? SPDY version (I get 3.1)? I specifically reset the browser settings and it reproes for 75%$ of our users..
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #18)
> (In reply to Andrey Sukhachev from comment #17)
> > Do you still want me to narrow it down?
> 
> This has to be narrowed down if we have any hope for a fix. Normally I would
> do it but I cannot reproduce this issue.

Okay, I narrowed it down to [2014-10-21, 2014-10-22]:

 7:41.68 LOG: MainThread Bisector INFO Narrowed nightly regression window from [2014-10-20, 2014-10-22] (2 days) to [2014-10-21, 2014-10-22] (1 days) (~0 steps left)
 7:41.68 LOG: MainThread Bisector INFO Got as far as we can go bisecting nightlies...
 7:41.68 LOG: MainThread Bisector INFO Last good revision: 29fbfc1b31aa
 7:41.68 LOG: MainThread Bisector INFO First bad revision: ae4d9b4ff2ee
 7:41.68 LOG: MainThread Bisector INFO Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=29fbfc1b31aa&tochange=ae4d9b4ff2ee

 7:41.68 LOG: MainThread Bisector INFO ... attempting to bisect inbound builds (starting from 4 days ago, to make sure no inbound revision is missed)
 7:42.35 LOG: MainThread Bisector INFO Getting inbound builds between ae1dfa192faf and ae4d9b4ff2ee
 7:50.58 LOG: MainThread Bisector INFO No inbound data found.
 7:50.58 LOG: MainThread Bisector INFO There are no build dirs on inbound for these changesets.
It looks like this range contains Valentin Gosu's revisions that touch the code around the perf timings:
https://hg.mozilla.org/mozilla-central/rev/949644ee3fdd
https://hg.mozilla.org/mozilla-central/rev/7b6c3c0e7ea6

(just a guess, I don't know much about mozilla's codebase)
Thanks, that's as good a place as any to start.

Valentin, can you please look into this?
Blocks: 1064706
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(valentin.gosu)
Keywords: regression
[Tracking Requested - why for this release]: based on the pushlog above this is a regression from Firefox 36.
Version: 38 Branch → 36 Branch
(In reply to Andrey Sukhachev from comment #16)
> (In reply to Valentin Gosu [:valentin] from comment #15)
> > Thanks Andrey. Please let us know if you find any errors in the future.
> 
> Okay.. Looks like the issue is still not fixed, even in the 40.0a1
> (2015-03-31 nightly). responseStart is no longer greater than responseEnd,
> but they all are set to the same value:
> 
> "requestStart": 150
> "responseStart": 150
> "responseEnd": 150
> 
> this is incorrect, as our pages render progressively and should take at
> least a hundred ms to finish.

These values should be floats. Are you sure you are looking at window.performance.getEntries() ? Also, could you tell me for which resource and/or subdomain you get these values?
Flags: needinfo?(valentin.gosu) → needinfo?(andrey)
(In reply to Valentin Gosu [:valentin] from comment #24)
> These values should be floats. Are you sure you are looking at
> window.performance.getEntries() ? Also, could you tell me for which resource
> and/or subdomain you get these values?

Oh, sorry. Our issue is actually with the parent document's performance.timing (not resource timings). Looks like I totally missed that this task was about resources, not about the navigation. We should probably move this discussion to a different task. Sorry for the confusion. 

That said, the issue we're experiencing is from the top-level pages on www.facebook.com, the absolute values are:
"requestStart" 1427837805196
"responseStart" 1427837805284 <- greater than "responseEnd"
"responseEnd" 1427837804974 

I just substracted performance.timing.navigationStart to make the numbers easier to read.
Flags: needinfo?(andrey)
Tracking for 38+ since this is a regression.  
Valentin it sounds like you are investigating. 

Vladan is this something you might want to be aware of or could help with?
Flags: needinfo?(vdjeric)
Keywords: perf
Hi Liz, thank you for the heads up. My team hasn't done any work with the resource timing API yet so I don't think we are in a position to contribute any unique insights
Flags: needinfo?(vdjeric)
Valentin are you still working on this or do you have other things that are more of a priority? I'm checking in since this is a tracked bug. How much impact do we this has on general Firefox users? Is this more something that exclusively affects developers?
Flags: needinfo?(valentin.gosu)
Sorry... I forgot to update the bug with my latest findings.
I've tested extensively last week, and I was able to reproduce the bug, but only on OSX, curiously. I was able to get it to reproduce on the testcase in comment 0, about 2/10 times.
I have a fair idea of where the problem might be. I'll post a try build link for our reporters to confirm it works.
Flags: needinfo?(valentin.gosu)
Firefox 37.0.1 has been released. I'm marking 37 as wontfix. We can consider a fix for 38 if one is available in time.
Hey Andy,

I think I managed to sort out the problem. Could you download the build below and see if you can reproduce the problem? Thanks!

https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/valentin.gosu@gmail.com-b86fae4abf34/try-macosx64/firefox-40.0a1.en-US.mac.dmg
Flags: needinfo?(ad)
Hmm, it improves things

For example I can often get values that make sense:

[
  {
    "name": "http://andydavies.github.io/image.png",
    "entryType": "resource",
    "startTime": 4354.575197,
    "duration": 121.99504499999966,
    "initiatorType": "img",
    "redirectStart": 0,
    "redirectEnd": 0,
    "fetchStart": 4354.575197,
    "domainLookupStart": 4354.575197,
    "domainLookupEnd": 4354.575197,
    "connectStart": 4354.575197,
    "connectEnd": 4354.575197,
    "secureConnectionStart": 0,
    "requestStart": 4355.264456999999,
    "responseStart": 4475.732024,
    "responseEnd": 4476.570242
  }
]

But I also still get some odd results just by hitting refresh (it's not consistent):

[
  {
    "name": "http://andydavies.github.io/image.png",
    "entryType": "resource",
    "startTime": 389.586001,
    "duration": 0,
    "initiatorType": "img",
    "redirectStart": 0,
    "redirectEnd": 0,
    "fetchStart": 389.586001,
    "domainLookupStart": 389.586001,
    "domainLookupEnd": 389.586001,
    "connectStart": 389.586001,
    "connectEnd": 389.586001,
    "secureConnectionStart": 0,
    "requestStart": 390.253483,
    "responseStart": 580.6297030000001,
    "responseEnd": 389.586001
  }
]
Thank you, Andy. In that case I can work around the issue, and have it return the same value as responseStart, which would at least lead to a sane entry :)

However, this wouldn't fix the underlying cause. Would you mind reproducing the issue and sending us the logs?
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

Also, are you using a proxy, or have a weird network? I find it much more difficult to reproduce the problem.
This is a temporary fix until we manage to resolve bug 1155008. If the end timestamp is null, we return the start value, so the resource timing entry at least has the proper order.
Attachment #8593138 - Flags: review?(amarchesini)
Blocks: 1141370
Comment on attachment 8593138 [details] [diff] [review]
end timestamps are before start timestamps

Review of attachment 8593138 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/nsPerformance.cpp
@@ +292,5 @@
>  nsPerformanceTiming::ConnectEndHighRes()
>  {
>    if (!nsContentUtils::IsPerformanceTimingEnabled() || !IsInitialized()) {
>      return mZeroTime;
>    }

Can it make sense to do:

if (mConnectEnd.IsNull()) {
 mChannel->GetConnectEnd(&mConnectEnd);
}

return mConnectEnd.IsNull() ? ConnectStartHighRes()
                            : TimeStampToDOMHighRes(mConnectEnd);

where mChannel is what we receive in InitializeTiminigInfo() ?
Attachment #8593138 - Flags: review?(amarchesini) → review+
(In reply to Andrea Marchesini (:baku) from comment #35)
> Comment on attachment 8593138 [details] [diff] [review]
> end timestamps are before start timestamps
> 
> Review of attachment 8593138 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/base/nsPerformance.cpp
> @@ +292,5 @@
> >  nsPerformanceTiming::ConnectEndHighRes()
> >  {
> >    if (!nsContentUtils::IsPerformanceTimingEnabled() || !IsInitialized()) {
> >      return mZeroTime;
> >    }
> 
> Can it make sense to do:
> 
> if (mConnectEnd.IsNull()) {
>  mChannel->GetConnectEnd(&mConnectEnd);
> }
> 
> return mConnectEnd.IsNull() ? ConnectStartHighRes()
>                             : TimeStampToDOMHighRes(mConnectEnd);
> 
> where mChannel is what we receive in InitializeTiminigInfo() ?

We could technically do that, however, we don't keep a reference to the channel past the constructor because that would mean keeping the channel alive for a long time, and for pages with a lot of subresources that's quite a lot of memory being used.

I will try to figure out why the end timings are missing in bug 1155008.
Meanwhile, we should land the patch, and hopefully uplift it to aurora&beta.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/89896e13a0e9
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Comment on attachment 8593138 [details] [diff] [review]
end timestamps are before start timestamps

Approval Request Comment
[Feature/regressing bug #]:
Preexisting bug.

[User impact if declined]:
Resource timing entries might have out-of-order end-timestamps.

[Describe test coverage new/current, TreeHerder]:
web-platform-tests (which were failing intermittently before this patch - bug 1141370)

[Risks and why]: 
Low risk. We just return the valid start timestamp whenever the end timestamp is null.

[String/UUID change made/needed]:
None
Attachment #8593138 - Flags: approval-mozilla-beta?
Attachment #8593138 - Flags: approval-mozilla-aurora?
Comment on attachment 8593138 [details] [diff] [review]
end timestamps are before start timestamps

Looks like this is a band aid until we can fix bug 1155008. I take it this is a short term improvement for the user. Beta+ Aurora+
Attachment #8593138 - Flags: approval-mozilla-beta?
Attachment #8593138 - Flags: approval-mozilla-beta+
Attachment #8593138 - Flags: approval-mozilla-aurora?
Attachment #8593138 - Flags: approval-mozilla-aurora+
Comment on attachment 8593138 [details] [diff] [review]
end timestamps are before start timestamps

[Triage Comment]
We merged m-b into m-r
Attachment #8593138 - Flags: approval-mozilla-beta+ → approval-mozilla-release+
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.