Last Comment Bug 811102 - frameworker reload failure
: frameworker reload failure
Status: RESOLVED FIXED
[qa-]
:
Product: Firefox
Classification: Client Software
Component: SocialAPI (show other bugs)
: 17 Branch
: All All
: P1 blocker (vote)
: Firefox 19
Assigned To: :Gavin Sharp [email: gavin@gavinsharp.com]
:
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-11-12 14:24 PST by Shane Caraveo (:mixedpuppy)
Modified: 2012-11-13 15:37 PST (History)
6 users (show)
gavin.sharp: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed
+
fixed
+
fixed


Attachments
reload fix (2.48 KB, patch)
2012-11-12 15:03 PST, Shane Caraveo (:mixedpuppy)
no flags Details | Diff | Splinter Review
reload fix (3.21 KB, patch)
2012-11-12 15:19 PST, Shane Caraveo (:mixedpuppy)
no flags Details | Diff | Splinter Review
patch per IRC discussion (2.03 KB, patch)
2012-11-12 15:55 PST, :Gavin Sharp [email: gavin@gavinsharp.com]
mixedpuppy: review+
Details | Diff | Splinter Review
updated patch with test (3.38 KB, patch)
2012-11-12 17:25 PST, :Gavin Sharp [email: gavin@gavinsharp.com]
markh: review+
Details | Diff | Splinter Review
updated patch with additional test (5.22 KB, patch)
2012-11-12 17:52 PST, :Gavin Sharp [email: gavin@gavinsharp.com]
gavin.sharp: approval‑mozilla‑aurora+
gavin.sharp: approval‑mozilla‑beta+
Details | Diff | Splinter Review
followup leak fix #2 (3.54 KB, patch)
2012-11-12 22:20 PST, :Gavin Sharp [email: gavin@gavinsharp.com]
markh: review+
Details | Diff | Splinter Review

Description Shane Caraveo (:mixedpuppy) 2012-11-12 14:24:42 PST
Facebook reported that the worker reload seemed to be happening twice, and that the second load caused the ports to fail.

A quick look verifies that the first reload does happen twice.  It then seems the second reload happens once, and after that reload no longer works correctly.  I have not looked into the ports having an issue yet.
Comment 1 Shane Caraveo (:mixedpuppy) 2012-11-12 14:52:56 PST
The unload handler is removing the worker from the global list of workers, then closing all ports.  This caused two issues:

- The next call to getFrameWorkerHandle created a new frameworker instance, we would end up with multiple frameworkers per provider
- the closed ports cannot be reused (portid is set to null, never reset, then you cannot postmessage across the port)

The reason for the change over number of reloads seems to be that the worker can no longer send a reload message over the ports.
Comment 2 Shane Caraveo (:mixedpuppy) 2012-11-12 15:03:23 PST
Created attachment 680822 [details] [diff] [review]
reload fix

The first part of the patch is code removal.  

Unfortunately the diff doesn't show enough context, so you need to look at the unload handler to see that we handle what I am removing later in the handler.

The second part of the patch causes the unload handler to run during termination.  If we do not close ports here, we run into the cause of our "re activation" bug because the worker was not getting the port-close message on termination.  We could live without the second part for 17 of the patch if we must.

I think that the code I removed may have been in the terminate function prior to adding the unload handler, but I didn't dig into history to verify that.
Comment 3 Shane Caraveo (:mixedpuppy) 2012-11-12 15:19:54 PST
Created attachment 680826 [details] [diff] [review]
reload fix

better fix after better understanding of what is happening there.
Comment 4 :Gavin Sharp [email: gavin@gavinsharp.com] 2012-11-12 15:55:33 PST
Created attachment 680845 [details] [diff] [review]
patch per IRC discussion

This doesn't fix the "close ports on terminate()" issue, but we can followup with that.
Comment 5 :Gavin Sharp [email: gavin@gavinsharp.com] 2012-11-12 15:58:10 PST
Comment on attachment 680826 [details] [diff] [review]
reload fix

I also omitted the "ports = []" change in my patch, since I think that line should just be removed (reload() already clears .ports). We need more tests here.
Comment 6 Shane Caraveo (:mixedpuppy) 2012-11-12 16:10:38 PST
bug 811134 created for the terminate fix
Comment 7 Shane Caraveo (:mixedpuppy) 2012-11-12 16:22:45 PST
https://tbpl.mozilla.org/?tree=Try&rev=8fcb227ed3ee
Comment 8 :Gavin Sharp [email: gavin@gavinsharp.com] 2012-11-12 17:25:49 PST
Created attachment 680889 [details] [diff] [review]
updated patch with test
Comment 9 :Gavin Sharp [email: gavin@gavinsharp.com] 2012-11-12 17:52:14 PST
Created attachment 680898 [details] [diff] [review]
updated patch with additional test

This other test is from Mark, for the ".loaded" change.
Comment 10 :Gavin Sharp [email: gavin@gavinsharp.com] 2012-11-12 17:52:39 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/85b3e7667edc
Comment 11 :Gavin Sharp [email: gavin@gavinsharp.com] 2012-11-12 18:29:00 PST
Comment on attachment 680898 [details] [diff] [review]
updated patch with additional test

[Triage Comment]
Try looks good, and Shane got confirmation from Facebook engineers that this addresses their problem, so I'm going to land this on beta.
Comment 14 :Gavin Sharp [email: gavin@gavinsharp.com] 2012-11-12 22:20:56 PST
Created attachment 680950 [details] [diff] [review]
followup leak fix #2

First followup leak fix wasn't enough. Removing the deletion of the entry in the workerCache from the unload handler exposed a previously-existing problem: we were leaking the workers from the tests for bug 804910 until shutdown via the workerCache. This didn't come up until now because that line in the unload handler would clear the reference during shutdown, when the hidden window was unloaded.
Comment 17 :Gavin Sharp [email: gavin@gavinsharp.com] 2012-11-12 23:33:32 PST
(thankfully all of the followup landings are test-only issues - no need to worry about them for b6)
Comment 19 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-11-13 11:40:41 PST
Is there a way to test this? Any exploratory testing QA can do in Beta 6 to either verify this is fixed or check for potential regressions?
Comment 20 Mark Hammond [:markh] 2012-11-13 13:50:25 PST
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #19)
> Is there a way to test this? Any exploratory testing QA can do in Beta 6 to
> either verify this is fixed or check for potential regressions?

Not really - the "reload" feature is not exposed via the UI (ie, there is nothing a user can do to invoke a reload).  The tests should now reliably cover this though.
Comment 21 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-11-13 15:37:43 PST
Okay, thanks Mark. Flagging this qa- for verification.

Note You need to log in before you can comment on or make changes to this bug.