Last Comment Bug 791011 - When hitting the back or forward button socket connections are being dropped
: When hitting the back or forward button socket connections are being dropped
Status: RESOLVED FIXED
[qa-]
: regression
Product: Core
Classification: Components
Component: Document Navigation (show other bugs)
: 16 Branch
: x86_64 Windows 7
: -- normal (vote)
: mozilla18
Assigned To: Justin Lebar (not reading bugmail)
:
Mentors:
Depends on:
Blocks: CVE-2012-3992
  Show dependency treegraph
 
Reported: 2012-09-13 09:54 PDT by Zach Aller
Modified: 2014-02-12 15:19 PST (History)
18 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
verified
+
verified
+
verified
16+
fixed


Attachments
Patch, v1 (1.79 KB, patch)
2012-09-20 11:48 PDT, Justin Lebar (not reading bugmail)
bzbarsky: review+
akeybl: approval‑mozilla‑aurora+
akeybl: approval‑mozilla‑beta+
akeybl: approval‑mozilla‑esr10+
Details | Diff | Splinter Review

Description Zach Aller 2012-09-13 09:54:02 PDT
We have a single page web app using Director (https://github.com/flatiron/director#client-side) for client side routing along with NodeJS and NowJS (https://github.com/Flotype/now) which uses socket.io (https://github.com/LearnBoost/socket.io) and when we hit the back or forward buttons its dropping the socket connection which in turn makes our NowJS RPC calls fail this works as expected in FF15 but is not working in FF16
Comment 1 Kyle Huey [:khuey] (khuey@mozilla.com) 2012-09-13 10:08:00 PDT
Olli, Patrick, can one of you look at this?
Comment 2 Patrick McManus [:mcmanus] 2012-09-13 10:33:47 PDT
zach do you have a url where I can see the decsribed behavior?
Comment 3 Zach Aller 2012-09-14 08:38:26 PDT
Ok was able to get this setup for you its related to the forum only so just click around in there a little then try using the back button

username: moz
password: moz

also this is using some software running on my dev box so it might not be up/stable all the time.

http://dev.asn.und.edu/learn.aero.und.edu/users.asp?Tools=7&url=/learn.aero.und.edu/forum.asp?ForumID=61
Comment 4 Patrick McManus [:mcmanus] 2012-09-14 12:25:32 PDT
websockets!

jason is probably best prepared to comment on the changes in FF16

Timestamp: 09/14/2012 03:16:45 PM
Error: The connection to wss://node1.aero.und.edu/socket.io/1/websocket/rFcQwRmPQR_2Wvt2nhvP was interrupted while the page was loading.
Source File: https://node1.aero.und.edu/socket.io/socket.io.js
Line: 2371

there is no similar error on ff14 (and probly 15, but i didn't have it handy to check.. and indeed the log above is from ff18 which I also had handy, but the reporter cites 16 as the bisect point.)
Comment 5 Patrick McManus [:mcmanus] 2012-09-14 12:26:38 PDT
worth pointing out that socket.io is a very common way of deploying websockets, so if we've got a bug late in beta this should be a highish priority to look at.
Comment 6 Lukas Blakk [:lsblakk] use ?needinfo 2012-09-14 15:58:43 PDT
Thanks for bringing this to our attention Patrick.  Jason do you have enough to go on here or is there any other help QA could provide to give you more visibility into the issue?
Comment 7 Paul Silaghi, QA [:pauly] 2012-09-18 05:48:25 PDT
Setting this to NEW based on comment 4.
Comment 8 Josh Aas 2012-09-19 06:46:15 PDT
Smaug or Andrea - can you look into this?

We should really resolve this before Firefox 16 ships and we think it might be fallout from new bindings, or a DOM issue in general.
Comment 9 Jason Duell [:jduell] (needinfo me) 2012-09-19 06:48:02 PDT
Yes--given that the DOM bindings for websockets have completely changed recently, and we haven't changed a thing AFAIK in the /netwerk code, I think the DOM folks are the first logical people to look into this one.
Comment 10 Kyle Huey [:khuey] (khuey@mozilla.com) 2012-09-19 06:48:33 PDT
Why do you think it's related to new bindings?  Bug 775368 is only on mozilla-central.
Comment 11 Jason Duell [:jduell] (needinfo me) 2012-09-19 07:58:34 PDT
Zach:

I see this error already when I *first* connect to the URI in comment 3:

  The connection to wss://node1.aero.und.edu/socket.io/1/websocket/MOix9j6U-H2Ze30Vnhwq was interrupted while the page was loading. @ https://node1.aero.und.edu/socket.io/socket.io.js:2371

I should be seeing that only once I hit back/forward, right?

I'm also a bit unclear about the exact bug here.  Hitting back/forward is *supposed* to cancel existing websockets on a page, IIUC.  So is the bug that the websockets on the page being navigated *to* are not working?
Comment 12 Randell Jesup [:jesup] 2012-09-19 07:59:18 PDT
Were there other DOM changes from 15-16?  Perhaps changes to the events or ordering - websockets drops connections on navigation on purpose, and has a bunch of code to watch for 'ghost' connections that are opened while navigating and close them.  (jduell and smaug were involved in that)
Comment 13 Zach Aller 2012-09-19 08:19:14 PDT
Jason:

hmm... yea the initial load error didn't not see before had to turn on persist errors to catch that one so not really sure what that is about, however the sockets do work fine until back and forward are pressed on the "FORUM" pages only sockets are not used else where. You mentioned that its suppose to drop connection on navigation and I would agree with that generally however the form navigation never performs a for lack of better word refresh its just hash tag changes using client side routing that is why I don't think it should drop the socket connection.
Comment 14 Olli Pettay [:smaug] 2012-09-19 09:27:48 PDT
We need regression range here.
Comment 15 Olli Pettay [:smaug] 2012-09-19 09:31:21 PDT
Zach, do you think you could try to find the last Nightly build which works and the first one
which doesn't and report the changesets. 
about:buildconfig tells the changeset.

http://harthur.github.com/mozregression/ can be useful tool, but finding the regression range shouldn't
be hard even without it (may just take some time)
Comment 16 Zach Aller 2012-09-19 10:13:49 PDT
Last good nightly: 2012-08-25
First bad nightly: 2012-08-26

Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f077de66e52d&tochange=b3cce81fef1a

if i did everything correct
Comment 17 Kyle Huey [:khuey] (khuey@mozilla.com) 2012-09-19 10:18:04 PDT
Aha!  I bet this is a regression from Bug 775009.
Comment 18 Justin Lebar (not reading bugmail) 2012-09-19 10:26:37 PDT
Oh, I bet that's right.  Here's the cset, for those who can't see the bug:

  https://hg.mozilla.org/mozilla-central/rev/b431f498a9ba

Sigh.

bz, should we be passing STOP_CONTENT, or is that the wrong thing?
Comment 19 Kyle Huey [:khuey] (khuey@mozilla.com) 2012-09-19 10:27:26 PDT
itym https://hg.mozilla.org/mozilla-central/rev/7ef5b8b2c2c7
Comment 20 Boris Zbarsky [:bz] 2012-09-19 10:37:08 PDT
STOP_CONTENT won't stop network loads.

What we should really do is one of two things, perhaps.  Either only call Stop() if we already have a full-page navigation in flight, or perhaps even better just cancel the full-page navigation in question and nothing else...
Comment 21 Jason Duell [:jduell] (needinfo me) 2012-09-19 10:52:05 PDT
So this sounds like some change in the navigation that didn't use to send Cancel() to websockets for navigation between anchors on the same page, but now does.  Moving component...

BTW The "ghost connection" fix landed in FF 13 (see bug 696085), so while it's possible that it regressed something it doesn't seem likely.
Comment 22 Jason Duell [:jduell] (needinfo me) 2012-09-19 10:53:42 PDT
hot potato!
Comment 23 Justin Lebar (not reading bugmail) 2012-09-20 11:48:29 PDT
Created attachment 663092 [details] [diff] [review]
Patch, v1

Like so?
Comment 24 Boris Zbarsky [:bz] 2012-09-20 12:14:30 PDT
Comment on attachment 663092 [details] [diff] [review]
Patch, v1

Oh, nice.  This is even simpler than I thought!

r=me
Comment 25 Justin Lebar (not reading bugmail) 2012-09-20 12:22:16 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/882794d0063c

I'm pretty sure this will fix your problem, Zach, but if you could use a nightly build and verify that the fix, that would be very much appreciated!

This change will make it in to tomorrow's or the day after's nightly build, depending on when it gets merged into mozilla-central.
Comment 26 Justin Lebar (not reading bugmail) 2012-09-20 12:26:32 PDT
Comment on attachment 663092 [details] [diff] [review]
Patch, v1

[Approval Request Comment]
Regression caused by (bug #): bug 775009
User impact if declined: This bug.
Testing completed (on m-c, etc.): Pushed to m-c.
Risk to taking this patch (and alternatives if risky): This could regress the security fix bug 775009 in some way.  But we should either take this plus bug 775009 or neither patch, because this is a serious regression.
Comment 27 Ryan VanderMeulen [:RyanVM] 2012-09-20 18:32:27 PDT
https://hg.mozilla.org/mozilla-central/rev/882794d0063c
Comment 28 Zach Aller 2012-09-21 07:50:53 PDT
Yup seems to be working fine again.
Comment 29 Alex Keybl [:akeybl] 2012-09-21 16:06:22 PDT
Comment on attachment 663092 [details] [diff] [review]
Patch, v1

[Triage Comment]
Given the positive verification, approving for all branches in support of keeping bug 775009 in the build.
Comment 31 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-10-16 16:33:31 PDT
Zach, can you check to see if this is now fixed for you? It should be working in Firefox 16.0.1, 17.0b1, 18.0a2, and 10.0.9esr. Thanks.
Comment 32 Justin Lebar (not reading bugmail) 2012-10-16 17:10:49 PDT
> Zach, can you check to see if this is now fixed for you?

Comment 28?  But I'm not going to ask Zach to verify in four different builds, unless he wants to.
Comment 33 Zach Aller 2012-10-17 09:56:49 PDT
Was able to test 16.0.1, 17 (what ever version is on the beta channel), and 18.0a2 (aurora), it worked on all of those, didn't have a 10.0.9esr to test on however.
Comment 34 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-10-17 11:47:29 PDT
Thanks a lot for your help Zach. You'll find the ESR build here:
ftp://ftp.mozilla.org/pub/firefox/releases/10.0.9esr/win32/en-US/Firefox%20Setup%2010.0.9esr.exe

Thanks

Note You need to log in before you can comment on or make changes to this bug.