frameworker reload failure

RESOLVED FIXED in Firefox 17

Status

()

Firefox
SocialAPI
P1
blocker
RESOLVED FIXED
5 years ago
5 years ago

People

(Reporter: mixedpuppy, Assigned: Gavin)

Tracking

17 Branch
Firefox 19
Points:
---
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox17+ fixed, firefox18+ fixed, firefox19+ fixed)

Details

(Whiteboard: [qa-])

Attachments

(2 attachments, 4 obsolete attachments)

(Reporter)

Description

5 years ago
Facebook reported that the worker reload seemed to be happening twice, and that the second load caused the ports to fail.

A quick look verifies that the first reload does happen twice.  It then seems the second reload happens once, and after that reload no longer works correctly.  I have not looked into the ports having an issue yet.
(Reporter)

Comment 1

5 years ago
The unload handler is removing the worker from the global list of workers, then closing all ports.  This caused two issues:

- The next call to getFrameWorkerHandle created a new frameworker instance, we would end up with multiple frameworkers per provider
- the closed ports cannot be reused (portid is set to null, never reset, then you cannot postmessage across the port)

The reason for the change over number of reloads seems to be that the worker can no longer send a reload message over the ports.
status-firefox17: --- → affected
status-firefox18: --- → affected
status-firefox19: --- → affected
tracking-firefox17: --- → +
tracking-firefox18: --- → +
tracking-firefox19: --- → +
(Reporter)

Comment 2

5 years ago
Created attachment 680822 [details] [diff] [review]
reload fix

The first part of the patch is code removal.  

Unfortunately the diff doesn't show enough context, so you need to look at the unload handler to see that we handle what I am removing later in the handler.

The second part of the patch causes the unload handler to run during termination.  If we do not close ports here, we run into the cause of our "re activation" bug because the worker was not getting the port-close message on termination.  We could live without the second part for 17 of the patch if we must.

I think that the code I removed may have been in the terminate function prior to adding the unload handler, but I didn't dig into history to verify that.
Attachment #680822 - Flags: feedback?(mhammond)
Attachment #680822 - Flags: feedback?(gavin.sharp)
Attachment #680822 - Flags: feedback?(felipc)
(Reporter)

Comment 3

5 years ago
Created attachment 680826 [details] [diff] [review]
reload fix

better fix after better understanding of what is happening there.
Attachment #680826 - Flags: feedback?(mhammond)
Attachment #680826 - Flags: feedback?(gavin.sharp)
Attachment #680826 - Flags: feedback?(felipc)
(Reporter)

Updated

5 years ago
Attachment #680822 - Attachment is obsolete: true
Attachment #680822 - Flags: feedback?(mhammond)
Attachment #680822 - Flags: feedback?(gavin.sharp)
Attachment #680822 - Flags: feedback?(felipc)
Created attachment 680845 [details] [diff] [review]
patch per IRC discussion

This doesn't fix the "close ports on terminate()" issue, but we can followup with that.
Attachment #680845 - Flags: review?(mixedpuppy)
Attachment #680845 - Flags: review?(mhammond)
Attachment #680826 - Attachment is obsolete: true
Attachment #680826 - Flags: feedback?(mhammond)
Attachment #680826 - Flags: feedback?(gavin.sharp)
Attachment #680826 - Flags: feedback?(felipc)
Comment on attachment 680826 [details] [diff] [review]
reload fix

I also omitted the "ports = []" change in my patch, since I think that line should just be removed (reload() already clears .ports). We need more tests here.
(Reporter)

Comment 6

5 years ago
bug 811134 created for the terminate fix
(Reporter)

Updated

5 years ago
Attachment #680845 - Flags: review?(mixedpuppy) → review+
(Reporter)

Comment 7

5 years ago
https://tbpl.mozilla.org/?tree=Try&rev=8fcb227ed3ee
Created attachment 680889 [details] [diff] [review]
updated patch with test
Attachment #680889 - Flags: review?(mhammond)
Attachment #680845 - Attachment is obsolete: true
Attachment #680845 - Flags: review?(mhammond)

Updated

5 years ago
Attachment #680889 - Flags: review?(mhammond) → review+
Created attachment 680898 [details] [diff] [review]
updated patch with additional test

This other test is from Mark, for the ".loaded" change.
Attachment #680889 - Attachment is obsolete: true
https://hg.mozilla.org/integration/mozilla-inbound/rev/85b3e7667edc
Flags: in-testsuite+
OS: Mac OS X → All
Hardware: x86 → All
Target Milestone: --- → Firefox 19
Assignee: nobody → mixedpuppy
Comment on attachment 680898 [details] [diff] [review]
updated patch with additional test

[Triage Comment]
Try looks good, and Shane got confirmation from Facebook engineers that this addresses their problem, so I'm going to land this on beta.
Attachment #680898 - Flags: approval-mozilla-beta+
Attachment #680898 - Flags: approval-mozilla-aurora+
https://hg.mozilla.org/releases/mozilla-beta/rev/731c85bd8e9e
https://hg.mozilla.org/releases/mozilla-aurora/rev/ea619b3b2d3c
status-firefox17: affected → fixed
status-firefox18: affected → fixed
followup test fix:
https://hg.mozilla.org/releases/mozilla-beta/rev/a5ec489a0ab9
https://hg.mozilla.org/releases/mozilla-aurora/rev/3b91e7e2c30a
https://hg.mozilla.org/integration/mozilla-inbound/rev/2e23d71714b4
https://hg.mozilla.org/integration/mozilla-inbound/rev/cab9c764a898
Created attachment 680950 [details] [diff] [review]
followup leak fix #2

First followup leak fix wasn't enough. Removing the deletion of the entry in the workerCache from the unload handler exposed a previously-existing problem: we were leaking the workers from the tests for bug 804910 until shutdown via the workerCache. This didn't come up until now because that line in the unload handler would clear the reference during shutdown, when the hidden window was unloaded.
Assignee: mixedpuppy → gavin.sharp
Status: NEW → ASSIGNED
Attachment #680950 - Flags: review?(mhammond)

Updated

5 years ago
Attachment #680950 - Flags: review?(mhammond) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/a06a34a84955
https://hg.mozilla.org/releases/mozilla-aurora/rev/56e3981e015f
https://hg.mozilla.org/releases/mozilla-beta/rev/be03b1063c9a
Not my night tonight:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a1e3296bcd21
https://hg.mozilla.org/releases/mozilla-aurora/rev/0fd81b763e50
https://hg.mozilla.org/releases/mozilla-beta/rev/9f4f7fb0216d
(thankfully all of the followup landings are test-only issues - no need to worry about them for b6)
https://hg.mozilla.org/mozilla-central/rev/85b3e7667edc
https://hg.mozilla.org/mozilla-central/rev/2e23d71714b4
https://hg.mozilla.org/mozilla-central/rev/cab9c764a898
https://hg.mozilla.org/mozilla-central/rev/a06a34a84955
https://hg.mozilla.org/mozilla-central/rev/a1e3296bcd21
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
status-firefox19: affected → fixed
Resolution: --- → FIXED
Is there a way to test this? Any exploratory testing QA can do in Beta 6 to either verify this is fixed or check for potential regressions?
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #19)
> Is there a way to test this? Any exploratory testing QA can do in Beta 6 to
> either verify this is fixed or check for potential regressions?

Not really - the "reload" feature is not exposed via the UI (ie, there is nothing a user can do to invoke a reload).  The tests should now reliably cover this though.
Okay, thanks Mark. Flagging this qa- for verification.
Whiteboard: [qa-]
You need to log in before you can comment on or make changes to this bug.