Closed Bug 1379762 Opened 7 years ago Closed 7 years ago

Window onload event fires twice when srcset used

Categories

(Core :: DOM: Navigation, defect)

54 Branch
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla56
Tracking Status
platform-rel --- ?
firefox55 --- verified
firefox56 --- verified

People

(Reporter: jdlrobson, Assigned: bzbarsky)

References

(Blocks 1 open bug)

Details

(Whiteboard: [platform-rel-Wikipedia])

Attachments

(4 files, 1 obsolete file)

Attached file index.html
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:54.0) Gecko/20100101 Firefox/54.0
Build ID: 20170628075643

Steps to reproduce:

Download the attached file and host locally. The file has an img tag with a srcset attribute. When removing that img tag, I cannot seem to replicate this problem so I believe it is linked.

Simulate a user trying to cancel an accidental link click:
Click the link to navigate away from the page, but using Cmd+left - navigate back as soon as the address bar changes.

(It should be noted the same problem can be replicated by clicking forward as well as back)

It's expected that the window.onload event should only be emitted once per page view - "The load event fires at the end of the document loading process. " [1] 
In this situation, we are presumably loading from cache as the existing HTML is still there so there is no further loading to do. We'd expect window.onload event to only ever fire once.

[1] https://developer.mozilla.org/en-US/docs/Web/API/GlobalEventHandlers/onload


Actual results:

I noticed in this scenario that the window.onload event actually fires on every back and forward button press.


Expected results:

The load event should only fire once in a given browsing context.
I'd expect the window.onload event not to fire on the back button.
The bug was discovered while investigating a bug with the Wikipedia website that is blocking a new product launch. Some background is provided here: https://phabricator.wikimedia.org/T170018

The bug means our analytics are logging multiple duplicate events in this circumstance. A short term fix seems to be to change our window.onload event to only fire once but this doesn't.
Here's a GIF demonstrating what I'm seeing using the MTC:
https://phabricator.wikimedia.org/T170018
Here's a GIF demonstrating what I'm seeing using the MTC:
https://phabricator.wikimedia.org/F8695090
(this time with valid link)
Flags: needinfo?(josh)
I confirm that the load event fires each time I return to the page when there is a srcset attribute present (even if it is the empty string), and it does not fire otherwise.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Component: Untriaged → DOM
Flags: needinfo?(josh)
Product: Firefox → Core
Whiteboard: [platform-rel-Wikipedia]
The reason that srcset makes a difference is that the image element is added to the list of responsive content in the document. When we go back in history, nsDocShell::RestoreFromHistory calls nsPresContext::SetOverrideDPPX, which invokes nsPresContext::MediaFeatureValuesChanged. This invokes nsDocument::MediaFeatureValuesChanged, which notifies all responsive content via HTMLImageElement::MediaFeatureValuesChanged.

This causes us to queue an ImageLoadTask, which calls nsDocument::BlockOnload in its constructor and nsDocument::UnblockOnload in its destructor. I have not yet been able to figure out why this ends up behaving differently than adding and removing an image element after a page has already loaded.
I figured it out - nsDocShell::BeginRestore calls nsLoadGroup::AddRequest with the document channel. This means that nsDocLoader::OnStartRequest sees that a document is being loaded and sets up the necessary flags. Now, in a document that does not contain responsive content, when nsDocShell::RestoreFromHistory calls nsDocShell::FinishRestore (first setting mIsRestoringDocument to true), we would call nsLoadGroup::RemoveRequest, which would invoke nsDocLoader::OnStopRequest and eventually reach nsDocShell::EndPageLoad (via nsDocShell::OnStateChange). Here we invoke nsDocumentViewer::LoadComplete, and if nsDocShell::GetRestoringDocument is true, we skip dispatching the load event.

In the case when there _is_ responsive content, everything acts a little bit differently. nsDocShell::BeginRestore calls nsLoadGroup::AddRequest like before, but before we reach nsDocShell::FinishRestore, we first hit the nsPresContext::SetOverrideDPPX that I mentioned in comment 4. This causes us to call nsLoadGroup::AddRequest for the image's load blocker, which means that when nsDocShell::FinishRestore calls nsLoadGroup::RemoveRequest, nsDocLoader::IsBusy returns true so the document loader skips the rest of nsDocLoader::DocLoaderIsEmpty. At this point, nsDocShell::FinishRestore finishes running and sets mIsRestoringDocument to false. When the image element's async loading task completes, it calls nsDocument::UnblockOnload, which invokes nsLoadGroup::RemoveRequest. When we reach nsDocLoader::DocLoaderIsEmpty this time, nsDocLoader::IsBusy return false, so we continue on with the rest of the postponed load notification. When we reach nsDocShell::EndPageLoad (via nsDocShell::OnStateChange), we invoke nsDocumentViewer::LoadComplete and check nsDocShell::GetRestoringDocument. Since it's false, we go on and dispatch the load event.
I don't really know what the right course of action is here, but given how this is tied up with the docshell/document viewer/content viewer, I guess Document Navigation is a more appropriate component. Hopefully someone else can figure out how to address this.
Component: DOM → Document Navigation
Reproduced using https://codepen.io/Krinkle/debug/pwQZKW on:
* Firefox 54 on OSX 10.11
* Firefox 53 on macOS Sierra (via BrowserStack)
* Firefox 52 on macOS Sierra
* Firefox 51 on macOS Sierra

Not reproducible on:
* Firefox 50 on macOS Sierra
* Firefox 49 on macOS Sierra
* Firefox 46 on macOS Sierra

So looks like this was a regression in Firefox 51.
platform-rel: --- → ?
> In this situation, we are presumably loading from cache as the existing HTML is still there so there
> is no further loading to do. 

During history navigation (say "back"), there are two things that can happen:

1) The entire DOM, layout, and script global of the page being navigated to is cached (the "bfcache" case).  In this case the "load" event will not fire on the navigation.

2) Only the HTML is cached; it is reparsed, scripts are re-executed, etc.  This is the normal history navigation case, and will fire the "load" event.

Both cases are spec-compliant.  I'm pretty sure most non-Firefox browsers only have case 2 happen.

Comment 5 sounds like we start doing case 1 and then switch to thinking we're doing case 2 and incorrectly re-fire load on a document that we're actually restoring from bfcache.  That would in fact be wrong.  We shouldn't be messing with async onload blockers in the bfcache restoration case.

I'll think a bit about how to fix this in a more or less foolproof way.
Flags: needinfo?(bzbarsky)
IIRC, Safari (at least mobile one) has (1) too.
So there are at least two bugs here:

1)  Our management of the "are we restoring?" state is really fragile.  Docshell sets it very narrowly around the things it does.  In particular, it is _not_ set around the prescontext calls comment 4 mentions.  We could set it there and do something about ignoring load blocking attempts while it's set, but it's still pretty fragile.

2)  We're doing unnecessary work under SetOverrideDPPX: the value hasn't changed, but we're doing MediaFeaturesChanged, etc.

Just fixing #2 will fix these specific steps to reproduce.

I'm also going to make the document viewer code a bit more robust in terms of detecting whether it should actually fire the "load" event by not relying on the "are we restoring?" state.  At that point that state will only be consumed by nsDocLoader::OnStopRequest and only for a performance optimization, effectively (or at least the only impact of getting it wrong will be a slight performance hit on restoration).

Arguably there is also performance bug #3: we're messing around with onload blockers when we know we've already fired onload.

Jon, thank you very much for the bug report and the excellent testcase.  Josh, thank you for hunting down the problem!
Assignee: nobody → bzbarsky
Flags: needinfo?(bzbarsky)
Unfortunately, GetRestoringDocument can be false by the time we reach
LoadComplete, if part of the restoration process managed to set up and then
remove onload blockers.  If that happens, we still don't want to fire a load
event for a document that has already has one fired.

Note that we could also use a boolean on the document to record whether we've
fired a load event, as long as we were careful to unset it when the readyState
transitions backwards from COMPLETE (e.g. document.open).  It's not clear which
approach is more robust.
Attachment #8886341 - Flags: review?(bugs)
Attachment #8886342 - Flags: review?(bugs)
I suspect that first change might be safe enough to backport to beta and hence get at least a bandaid fix out earlier.  We'll see what try says: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ee1ec85f91a0b76c8799dcf1ec277ab3dd486139
Comment on attachment 8886340 [details] [diff] [review]
part 1.  Don't call MediaFeaturesChanged if our override device pixel ratio is set to its current value

If it's still bad after the other patches to call MediaFeatureValuesChanged during load, should there be assertions against that?  Or do the other patches fully fix that problem?

I presume you've tested that the test fails without the patches?

Perhaps file_bug1379762.html should be added to the file-specific support-files list rather than being added for the entire directory?  (And maybe others that are specific to test_sessionhistory.html should be moved there?)
Attachment #8886340 - Flags: review?(dbaron) → review+
Comment on attachment 8886342 [details] [diff] [review]
part 3.  Don't mess about with load blockers if our document is already in the COMPLETE readyState

This is fine, but I wonder how document.open after load should be handled, or whether it is working fine.
Attachment #8886342 - Flags: review?(bugs) → review+
Comment on attachment 8886341 [details] [diff] [review]
part 2.  Use a more reliable test to figure out when we can skip firing onload in nsDocumentViewer::LoadComplete

I think I'd like to see some test involving document.write (no document.close) after load and back and forward and checking what kinds of events we get before reviewing this change.
Attachment #8886341 - Flags: review?(bugs)
> Or do the other patches fully fix that problem?

I believe they do, yes.

> I presume you've tested that the test fails without the patches?

Yes, and that each of the three patches individually fixes it.

> Perhaps file_bug1379762.html should be added to the file-specific support-files

Yeah, probably a good idea.

> This is fine, but I wonder how document.open after load should be handled

document.open does two things that are relevant here:

1)  It calls EnsureOnloadBlocker(), which will make sure that if our blocker count is nonzero then we have the load blocker request in the loadgroup.  This does not look at the readyState.

2)  It resets readyState back to LOADING.

I'll try to write some tests around this stuff.
Attachment #8886479 - Flags: review?(bugs)
Attachment #8886341 - Attachment is obsolete: true
Attachment #8886479 - Flags: review?(bugs) → review+
Pushed by bzbarsky@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/68dccd8c1468
part 1.  Don't call MediaFeaturesChanged if our override device pixel ratio is set to its current value.  r=dbaron
https://hg.mozilla.org/integration/mozilla-inbound/rev/aa65fd52fbcc
part 2.  Use a more reliable test to figure out when we can skip firing onload in nsDocumentViewer::LoadComplete.  r=smaug
https://hg.mozilla.org/integration/mozilla-inbound/rev/5fc778386eb1
part 3.  Don't mess about with load blockers if our document is already in the COMPLETE readyState.  r=smaug
Comment on attachment 8886340 [details] [diff] [review]
part 1.  Don't call MediaFeaturesChanged if our override device pixel ratio is set to its current value

Approval Request Comment
[Feature/Bug causing the regression]: Unknown, but this regressed in Firefox 51
   per comment 7.  I could try bisecting if this is important.
[User impact if declined]:  Some websites won't work right.  Wikimedia-based
   ones, for example, unless they put in a workaround for this.
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: Not yet.
[Needs manual test from QE? If yes, steps to reproduce]: See comment 7 and
   comment 0.
[List of other uplifts needed for the feature/fix]:  None.
[Is the change risky?]: Moderate.
[Why is the change risky/not risky?]: It seems like a no-brainer safe fix,
   except it's never clear whether something depends on the thing that should
   be a no-op actually doing something.  In this case, I'm fairly sure this is
   safe, and _very_ sure it's the safest of the three patches in this bug. 
   Luckily, it's enough on its own to fix the issue in the common case.  So
   only this one changeset should be uplifted if this is approved.
[String changes made/needed]: None.
Attachment #8886340 - Flags: approval-mozilla-beta?
Whiteboard: [platform-rel-Wikipedia] → If this is approved, only uplift https://hg.mozilla.org/integration/mozilla-inbound/rev/68dccd8c1468 [platform-rel-Wikipedia]
https://hg.mozilla.org/mozilla-central/rev/68dccd8c1468
https://hg.mozilla.org/mozilla-central/rev/aa65fd52fbcc
https://hg.mozilla.org/mozilla-central/rev/5fc778386eb1
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Comment on attachment 8886340 [details] [diff] [review]
part 1.  Don't call MediaFeaturesChanged if our override device pixel ratio is set to its current value

stop sending duplicated events, let's take this one patch in 55.0b11
Attachment #8886340 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
https://hg.mozilla.org/releases/mozilla-beta/rev/30e0585ef763
Whiteboard: If this is approved, only uplift https://hg.mozilla.org/integration/mozilla-inbound/rev/68dccd8c1468 [platform-rel-Wikipedia] → [platform-rel-Wikipedia]
Flags: qe-verify+
Depends on: 1382079
Part 3 got backed out to fix bug 1382079.  I filed bug 1383301 to reland it.  This bug is still fixed, because either of the other two patches fixed it on its own; part 3 was mostly a performance optimization.
Blocks: 1383301
More precisely, part 3 did.  Which is not impossible, since performance optimization was the whole point of part 3.
Reproduced on Firefox 54.0.1 under Win 10 64-bit and Mac OS X 10.12.
Verified as fixed using Firefox 55 beta 12 and latest Nightly 56.0a1 2017-07-26.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Blocks: 1340502
You need to log in before you can comment on or make changes to this bug.