Last Comment Bug 687295 - ThinVNC screen not updating with Firefox due to use of deflate-stream in websockets; works with Chrome and Opera
: ThinVNC screen not updating with Firefox due to use of deflate-stream in webs...
Status: VERIFIED FIXED
inbound
: dev-doc-complete, regression
Product: Core
Classification: Components
Component: Networking: WebSockets (show other bugs)
: Trunk
: x86 Windows XP
: -- major (vote)
: mozilla8
Assigned To: Patrick McManus [:mcmanus]
:
Mentors:
Depends on:
Blocks: 663096
  Show dependency treegraph
 
Reported: 2011-09-17 16:45 PDT by IU
Modified: 2011-11-07 14:15 PST (History)
12 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed


Attachments
test case: Corrupt image (88 bytes, image/png)
2011-09-19 06:41 PDT, IU
no flags Details
deflate-stream-off 1 (2.20 KB, patch)
2011-09-20 08:32 PDT, Patrick McManus [:mcmanus]
bzbarsky: review+
bugzilla: approval‑mozilla‑aurora+
bugzilla: approval‑mozilla‑beta-
Details | Diff | Review

Description IU 2011-09-17 16:45:16 PDT
Firefox's image handling makes breaks functionality for ThinVNC browser-based client to update the screen representation of the remote computer.  Instead the following appears in error console and the screen remains static:

Error: Image corrupt or truncated: http://192.168.1.20:8080/css/images/ui-bg_inset-hard_100_fcfdfd_1x100.png
Source File: http://192.168.1.20:8080/css/images/ui-bg_inset-hard_100_fcfdfd_1x100.png
Line: 0

Chrome, Opera, and I suspect also IE 9 (if I was running Windows 7) work without issue.

ThinVNC is a VNC implementation that does not require a separate client -- just an HTML5 capable browser -- to manage a remote computer.

STR:

1. Visit http://www.thinvnc.com/thinvnc/html5-vnc.html
2. Download and install ThinVNC 2.0 on another computer or a VM.
3. Verify that the ThinVNC service is started.  You will see an icon in the System Tray / Notification area.
4. Configure ThinVNC as necessary, noting what port and protocol you must connect with
5. Use the browser to connect to the ThinVNC server.
6. Notice that the initial screen displays, but nothing happens beyond that, with mouse movement.  Note the relevant error message(s) in Error Console.
7. If the ThinVNC server host is within view and has a display, glance at it and note that it is indeed responsive.
8. Compare with any of the other HTML5-capable browsers.  Note that they work without fuss.
Comment 2 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-09-19 05:34:00 PDT
Can you upload one of these corrupt images to bugzilla (or somewhere else) so that I can test?
Comment 3 IU 2011-09-19 06:41:54 PDT
Created attachment 560888 [details]
test case: Corrupt image

The image is corrupt; however, the other browsers appear to be able able to skip it somehow and continue processing the rest of the page, whereas Firefox simply stop displaying anymore updates.

I believe you really need to install ThinVNC, as indicated in comment 0, to be able to test this properly.
Comment 4 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-09-19 10:02:16 PDT
Bug 661045 seems pretty unlikely to cause this.

Bug 663194 on the other hand, seems like it could well have this effect....

Do you see the error message about corrupt or truncated images in the June 11 nightly too?  Or just in the June 12 and later ones?
Comment 5 IU 2011-09-19 10:20:16 PDT
(In reply to Boris Zbarsky (:bz) from comment #4)
> Do you see the error message about corrupt or truncated images in the June
> 11 nightly too?  Or just in the June 12 and later ones?

Error message does also appear in June 11 nightly.
Comment 6 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-09-19 10:37:08 PDT
Then yeah, I bet the issue is bug 663194.

Can you build yourself, or do you want me to set up a try build with that patch backed out?
Comment 7 IU 2011-09-19 10:58:33 PDT
Would likely be a lot easier if you created the build.  Thanks.
Comment 8 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-09-19 13:40:21 PDT
Worth keeping in mind that we have less than 48 hours to diagnose and fix this for 7.
Comment 9 christian 2011-09-19 14:29:27 PDT
1. How common is this?
2. Does it still exist if we back out 663194?
3. What's the downside of backing out bug 663194?
Comment 10 IU 2011-09-19 14:46:42 PDT
(In reply to Christian Legnitto [:LegNeato] from comment #9)
> 2. Does it still exist if we back out 663194?

If somebody creates a try build with bug 663194 backed out, I could confirm.
Comment 11 Jonas Sicking (:sicking) PTO Until July 5th 2011-09-19 16:20:36 PDT
I'd be fine with backing out yes. I'll leave to others what to do as far as raising this on WHATWG and seeing what other browsers do.
Comment 12 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-09-19 18:58:20 PDT
OK, there's a test build at https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bzbarsky@mozilla.com-a3e94cf95589/try-win32/ which is basically trunk with bug 663194 backed out.
Comment 13 IU 2011-09-19 19:33:42 PDT
(In reply to Boris Zbarsky (:bz) from comment #12)
> OK, there's a test build...

Tested and still does not work.  Definitely not bug 663194.  Thanks.
Comment 14 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-09-19 19:39:27 PDT
OK, I'm going to spin up several builds spread over the regression range in comment 1 to see whether we can narrow it down (several because the latency is so high, so a pure binary search would take too long).

I should have the links to the builds tomorrow morning....
Comment 16 IU 2011-09-20 01:20:37 PDT
Works with the first build, 55359cf599c8, does not work with any of the others.

Thanks
Comment 17 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-09-20 06:49:52 PDT
If you flip the "network.websocket.extensions.stream-deflate" preference to "false" in about:config (and possibly restart), does the problem go away?
Comment 18 IU 2011-09-20 07:11:00 PDT
Yes it does go away when that is set to "false" (no restart required).
Comment 19 Patrick McManus [:mcmanus] 2011-09-20 08:11:03 PDT
I would guess this means that Thin VNC has a broken implementation of the websockets stream-deflate extension that they have opted in to during the handshake.

We are lucky in this case that the path forward is easy: disable our support for that extension by default. We should do this because recently (in an ietf draft later than we support) that extension has been deprecated - so we know now that it likely has no future. Time spent on determining exactly where the error is won't pay off down the road. This is an extension, so turning it off shouldn't really impact anything.

There currently is no other compression extension to replace it with. I argued that was a mistake in the IETF, but cest la vie. We'll go compressionless for now rather than spreading this around the net when it has already been deprecated.
Comment 20 Patrick McManus [:mcmanus] 2011-09-20 08:32:55 PDT
Created attachment 561199 [details] [diff] [review]
deflate-stream-off 1

this patch just flips the pref and adjusts the test case which expected the pref to be on. It has been sent to try.
Comment 21 Jonas Sicking (:sicking) PTO Until July 5th 2011-09-20 09:01:17 PDT
Disabling this before we have an alternative on the implementation side does mean that some websites will have to create their own compression logic on top of WebSockets.

It does seem like a shame to disable the protocol just because of one buggy implementation, but I'll leave that decision to you.
Comment 22 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-09-20 09:15:47 PDT
Comment on attachment 561199 [details] [diff] [review]
deflate-stream-off 1

r=me, but it's interesting that this works in Chrome.  Does it not support deflate-stream?
Comment 23 Patrick McManus [:mcmanus] 2011-09-20 09:20:40 PDT
(In reply to Boris Zbarsky (:bz) from comment #22)

> r=me, but it's interesting that this works in Chrome.  Does it not support
> deflate-stream?

right. we were all alone on that front. and now it has been removed :(
Comment 24 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-09-20 11:11:58 PDT
Per conversation with johnath it looks like this fix is going to miss 7.
Comment 25 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-09-20 11:21:10 PDT
I think that's really bad.  The result will be us being the only ones to ship this compression method, and then we'll drop support in the next release, never reinstate it.... and no one else will ever ship it.  So it's pure loss, not only for us but for the web (which then has to deal with the fact that this code got shipped sometime, and probably has to do so in perpetuity).

I assume we have really good reasons to not take this pref change, to outweigh those negatives...
Comment 26 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-09-20 11:24:31 PDT
johnath told me to email release-drivers if I thought this needed to be taken.  I'll let Boris or Patrick do that.
Comment 27 Patrick McManus [:mcmanus] 2011-09-20 11:58:00 PDT
(In reply to Boris Zbarsky (:bz) from comment #25)
> I think that's really bad.  The result will be us being the only ones to
> ship this compression method, and then we'll drop support in the next
> release, never reinstate it.... and no one else will ever ship it.  So it's
> pure loss, not only for us but for the web (which then has to deal with the
> fact that this code got shipped sometime, and probably has to do so in
> perpetuity).

normally I would full throated-ly agree. but in this case the change is really going to be transparent to just about every situation. Anyone who wanted to treat it non-transparently was going to have to do feature detection with it anyhow by looking at the extensions attribute (because other clients didn't implement anything) and when we pull it back the feature detection will still work fine. So I don't see one release of it followed by disabling as something servers will need to support forever (or at all - the server always has to opt into the extension at handshake time.)

OTOH, the deflate-stream extension has no future. If we've identified even one site that doesn't operate with it then yanking it is still preferred imo,.

I'll check into inbound and set the various triage flags after the try run completes.
Comment 28 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-09-20 12:55:17 PDT
I mailed release-drivers.

Apparently final builds of Fx7 have already been created.  If we end up respinning for a stop-ship issue sometime in the next week, we will take this fix, but we're not treating this bug as a stop-ship.

I mailed the ThinVNC folks about the problem, on the off chance that they can fix whatever is broken on their end...
Comment 29 Jonas Sicking (:sicking) PTO Until July 5th 2011-09-20 13:55:04 PDT
I have to say that I'm not quite as optimistic about the transparent-ness as you are. This bug is an excellent example of the extension not being transparent, right?
Comment 30 Patrick McManus [:mcmanus] 2011-09-20 14:07:19 PDT
Comment on attachment 561199 [details] [diff] [review]
deflate-stream-off 1

beta-7 approval request acking that it won't trigger a respin
Comment 31 Mozilla RelEng Bot 2011-09-20 14:11:27 PDT
Try run for 005bb98b23bb is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=005bb98b23bb
Results (out of 63 total builds):
    success: 56
    warnings: 7
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mcmanus@ducksong.com-005bb98b23bb
Comment 32 Patrick McManus [:mcmanus] 2011-09-20 14:26:05 PDT
(In reply to Jonas Sicking (:sicking) from comment #29)
> I have to say that I'm not quite as optimistic about the transparent-ness as
> you are. This bug is an excellent example of the extension not being
> transparent, right?

I see an implementation bug, sure.

By transparent I mean that shipping the extension and then later removing it or changing to a different one is not going to make applications running on compliant clients and servers break.
Comment 33 Patrick McManus [:mcmanus] 2011-09-20 14:27:02 PDT
(In reply to Mozilla RelEng Bot from comment #31)
> Try run for 005bb98b23bb is complete.

when you account for random-orange and the perma breakage of android on try, this is actually green.
Comment 34 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-09-20 14:28:34 PDT
I think the assumption that clients and servers on the web are compliant is more or less unwarranted....
Comment 35 Jonas Sicking (:sicking) PTO Until July 5th 2011-09-20 14:52:00 PDT
Indeed, the distinction between compliant and non-compliant servers and clients isn't really meaningful. What matters are the servers and clients that are popular enough on the web.

Again, the bug is a good case in point, the server here is clearly not compliant, but we still decided to spend engineering hours to dealing with it due to its popularity.
Comment 36 Patrick McManus [:mcmanus] 2011-09-20 15:00:49 PDT
(In reply to Jonas Sicking (:sicking) from comment #35)

> Again, the bug is a good case in point

please summarize the case in point. would it be fair to say you mean "all negotiable protocol extensions are inherently bad"?

It doesn't seem any different than feature-detection driven html 5 to me.
Comment 37 Jonas Sicking (:sicking) PTO Until July 5th 2011-09-20 15:21:48 PDT
The main point is that the distinction between compliant and non-compliant implementations doesn't make much sense. As far as I know we've never taken that into account when triaging bugs. What we care about is their popularity. This has been the case in all areas of Firefox for as long as it has existed.

This is somewhat simplified. Of course we always push non-compliant implementations into fixing their stuff. But many many times we've simply had to deal anyway.

The other is "Any disparities between UAs are bad". (Restricted to disparities exposed to webpages/servers).

Sure, some disparities will always happen as we move forward with the web platform as not all browsers implement features at the same time. That is bad, but we really don't have an option short of halting development or eliminating all competition.

Feature detection is far from a silver bullet. There are lots of websites out there that forget to do it or do it wrong. For example apple.com used to render almost blank when we implemented CSS transitions since they did the feature detection wrong and detected our support but then ended up using either a webkit-prefixed CSS property or a wrongly prefixed moz property, i forget which.

Short term we'll absolutely need to have disparities between UAs as we develop new features, there simply is no alternative. However long term we should always AIM for having as much alignment between UAs as possible.
Comment 38 Marco Bonardo [::mak] 2011-09-21 02:59:03 PDT
https://hg.mozilla.org/mozilla-central/rev/6bcc3a001bb3
Comment 39 Boris Zbarsky [:bz] (Out June 25-July 6) 2011-09-21 11:23:26 PDT
The ThinVNC folks have confirmed that they can reproduce the problem and that disabling deflate-stream on their end also fixes things.  They're looking into it from their end as well.
Comment 40 Patrick McManus [:mcmanus] 2011-09-23 10:58:26 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/c84ac4a98e07
Comment 41 Eric Shepherd [:sheppy] 2011-09-30 12:46:10 PDT
Documentation updated:

https://developer.mozilla.org/en/WebSockets#Gecko_notes

Mentioned on Firefox 8 for developers.
Comment 42 IU 2011-10-03 09:43:19 PDT
Verified fixed: http://hg.mozilla.org/mozilla-central/rev/90575e23ea93

By the way, the latest build of ThinVNC 2.0.0.16 has fixed the websockets issue.  I'm not sure if that means they've fixed their implementation or simply removed support for deflate-stream, as I don't know how to test for it; however, enabling deflate-stream in Firefox no longer breaks functionality.

Note You need to log in before you can comment on or make changes to this bug.