Closed
Bug 716841
Opened 13 years ago
Closed 13 years ago
EventSource::GetInterface goes into infinite loop if EventSource hits an HTTP redirect
Categories
(Core :: Networking: WebSockets, defect)
Tracking
()
RESOLVED
FIXED
mozilla13
Tracking | Status | |
---|---|---|
firefox11 | + | fixed |
firefox12 | + | fixed |
firefox-esr10 | --- | unaffected |
People
(Reporter: scoobidiver, Assigned: jduell.mcbugs)
References
Details
(Keywords: crash, regression, Whiteboard: [qa-])
Crash Data
Attachments
(1 file)
5.87 KB,
patch
|
bzbarsky
:
review+
lsblakk
:
approval-mozilla-aurora+
lsblakk
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
It's #58 top browser crasher in 11.0a2 and #107 in 12.0a1.
It first appeared in 11.0a1/20111209. The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6785d3003414&tochange=9e7239c0f557
Signature nsEventSource::GetInterface(nsID const&, void**) More Reports Search
UUID 9ee107dd-b569-4147-8fc3-a68c42120108
Date Processed 2012-01-08 17:11:51.776646
Uptime 23100
Last Crash 17.5 hours before submission
Install Age 6.4 hours since version was first installed.
Install Time 2012-01-08 19:47:45
Product Firefox
Version 12.0a1
Build ID 20120108031024
Release Channel nightly
OS Windows NT
OS Version 5.1.2600 Service Pack 3
Build Architecture x86
Build Architecture Info GenuineIntel family 6 model 15 stepping 11
Crash Reason EXCEPTION_STACK_OVERFLOW
Crash Address 0x16e539b
App Notes
AdapterVendorID: 0x10de, AdapterDeviceID: 0x0606, AdapterSubsysID: 23351682, AdapterDriverVersion: 6.14.12.8558
D3D10 Layers? D3D10 Layers-
D3D9 Layers? D3D9 Layers+
Processor Notes This dump is too long and has triggered the automatic truncation routine
EMCheckCompatibility False
Crashing Thread
Frame Module Signature [Expand] Source
0 xul.dll nsEventSource::GetInterface content/base/src/nsEventSource.cpp:767
1 xul.dll nsCORSListenerProxy::GetInterface content/base/src/nsCrossSiteListenerProxy.cpp:651
2 xul.dll nsEventSource::GetInterface content/base/src/nsEventSource.cpp:782
3 xul.dll nsCORSListenerProxy::GetInterface content/base/src/nsCrossSiteListenerProxy.cpp:651
4 xul.dll nsEventSource::GetInterface content/base/src/nsEventSource.cpp:782
5 xul.dll nsCORSListenerProxy::GetInterface content/base/src/nsCrossSiteListenerProxy.cpp:651
...
More reports at:
https://crash-stats.mozilla.com/report/list?signature=nsEventSource%3A%3AGetInterface%28nsID%20const%26%2C%20void**%29
Reporter | ||
Updated•13 years ago
|
Summary: Crash @ nsEventSource::GetInterface(nsID const&, void**) → Crash @ nsEventSource::GetInterface
Comment 1•13 years ago
|
||
In the regression window, bug 692067 seems the likely culprit.
Blocks: 692067
tracking-firefox11:
--- → ?
tracking-firefox12:
--- → ?
Summary: Crash @ nsEventSource::GetInterface → Re-entrant calls to nsCORSListenerProxy::GetInterface leads to stack overflow and Crash @ nsEventSource::GetInterface
Updated•13 years ago
|
Component: General → Networking: WebSockets
QA Contact: general → networking.websockets
Comment 2•13 years ago
|
||
The crash also shows up as [@ nsCORSListenerProxy::GetInterface(nsID const&, void**) ]
so it should rank slightly higher on the top crash list.
https://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=contains&query=nsCORSListenerProxy%3A%3AGetInterface&reason_type=contains&date=01%2F14%2F2012%2016%3A52%3A53&range_value=4&range_unit=weeks&hang_type=any&process_type=any&do_query=1&signature=nsCORSListenerProxy%3A%3AGetInterface%28nsID%20const%26%2C%20void**%29
Crash Signature: [@ nsEventSource::GetInterface(nsID const&, void**)] → [@ nsEventSource::GetInterface(nsID const&, void**)]
[@ nsCORSListenerProxy::GetInterface(nsID const&, void**) ]
Comment 3•13 years ago
|
||
ccing kairo, because the correlations stuff in soccoro seems to be broken and it would be really handy right now...
I'm not sure why the content policy bit would be relevant here, but the websocket code's notification callbacks handling seems completely busted in the face of redirects. Maybe the only reason content policy matters is because it does a getInterface for an IID that the CORS listener proxy and nsEventSource don't provide? Or maybe it triggers a redirect where we didn't have one before? Looking at a few crashes manually (see broken correlations bit), it looks like there's no particular extension the users have; in fact some have no extensions at all.
In any case, the reason why I think redirect handling is busted is this. nsEventSource::OnRedirectVerifyCallback calls SetupHttpChannel(), which does this:
871 nsCOMPtr<nsIInterfaceRequestor> notificationCallbacks;
872 mHttpChannel->GetNotificationCallbacks(getter_AddRefs(notificationCallbacks));
873 if (notificationCallbacks != this) {
874 mNotificationCallbacks = notificationCallbacks;
875 mHttpChannel->SetNotificationCallbacks(this);
876 }
But mHttpChannel at that point is the _post_redirect_ channel. And HttpBaseChannel::SetupReplacementChannel does
1571 newChannel->SetNotificationCallbacks(mCallbacks);
before any redirect observers are notified. So in the code above |notificationCallbacks| are whatever the callbacks were on the pre-redirect channel.
And what are those? Well, websocket code sets up the channel in nsEventSource::InitChannelAndRequestEventSource as follows:
917 rv = SetupHttpChannel();
918 NS_ENSURE_SUCCESS(rv, rv);
919
920 nsCOMPtr<nsIStreamListener> listener =
921 new nsCORSListenerProxy(this, mPrincipal, mHttpChannel,
922 mWithCredentials, &rv);
923 NS_ENSURE_SUCCESS(rv, rv);
After line 917, the notificationCallbacks on the channel are the nsEventSource. After the nsCORSListenerProxy ctor runs, the notificationCallbacks are the nsCORSListenerProxy, chaining back to the nsEventSource via its mOuterNotificationCallbacks member.
So when we hit the redirect, the notificationCallbacks are the nsCORSListenerProxy, which is != this, so we set |this| as the notificationCallbacks again, chaining back to the nsCORSListenerProxy. And now we have a loop in the notificationCallbacks chain.
We should try moving the code that sets the notification callbacks out of SetupHTTPChannel and into nsEventSource::InitChannelAndRequestEventSource and see if that fixes things. But it'd also be nice to understand why/how this redirect stuff can get triggered and why content policy affects it.
Comment 4•13 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #3)
> ccing kairo, because the correlations stuff in soccoro seems to be broken
> and it would be really handy right now...
I don't know about it being broken, but it doesn't run for all versions due to its current hacky implementation (improvements are planned), and it's acting in fashions that are not always trivial to understand without being told about them. What didn't look right for you? What version(s) did you try them for?
Comment 5•13 years ago
|
||
Hmm. Maybe I just misunderstood the UI? When I look _under_ the "add-ons" bit it says "Loaded Correlation Data, but none available for this signature nsEventSource::GetInterface(nsID const&, void**)". Under Add-ons by versions, clicking "Load" just complains about undefined indices. See https://crash-stats.mozilla.com/report/index/c13e9360-4325-454b-ba40-be22c2120113 for an example.
Reporter | ||
Comment 6•13 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #5)
> Hmm. Maybe I just misunderstood the UI? When I look _under_ the "add-ons"
> bit it says "Loaded Correlation Data, but none available for this signature
> nsEventSource::GetInterface(nsID const&, void**)".
Correlations for a version are only available when there is enough data, i.e. for 9.0.1 and 10.0b4 (about two weeks before the release). There are no correlations for Aurora, Nightly or Fennec 9.0.
See: https://crash-analysis.mozilla.com/crash_analysis/20120115/
Reporter | ||
Updated•13 years ago
|
Crash Signature: [@ nsEventSource::GetInterface(nsID const&, void**)]
[@ nsCORSListenerProxy::GetInterface(nsID const&, void**) ] → [@ nsEventSource::GetInterface(nsID const&, void**)]
[@ nsCORSListenerProxy::GetInterface(nsID const&, void**)]
Comment 7•13 years ago
|
||
(In reply to Scoobidiver from comment #6)
> Correlations for a version are only available when there is enough data,
> i.e. for 9.0.1 and 10.0b4 (about two weeks before the release). There are no
> correlations for Aurora, Nightly or Fennec 9.0.
> See: https://crash-analysis.mozilla.com/crash_analysis/20120115/
Now that FF11 is on Beta, are the correlation reports of more use?
Reporter | ||
Comment 8•13 years ago
|
||
(In reply to Alex Keybl [:akeybl] from comment #7)
> Now that FF11 is on Beta, are the correlation reports of more use?
The FF10 correlations are still broken (see bug 724799).
The FF11 correlations are not statistically representative (20 crashes a days) and give:
nsEventSource::GetInterface(nsID const&, void**)|EXCEPTION_STACK_OVERFLOW (20 crashes)
75% (15/20) vs. 45% (7284/16009) firefox-hotfix@mozilla.org
20% (4/20) vs. 2% (267/16009) {a0d7ccb3-214d-498b-b4aa-0e8fda9a7bf7} (WOT, https://addons.mozilla.org/addon/3456)
15% (3/20) vs. 0% (5/16009) fastdialfx6@rouing3.addons.mozilla.org
15% (3/20) vs. 0% (14/16009) xpirftoolbar@roboform.com
15% (3/20) vs. 1% (196/16009) firebug@software.joehewitt.com (Firebug, https://addons.mozilla.org/addon/1843)
15% (3/20) vs. 1% (196/16009) ffxtlbr@Facemoods.com
15% (3/20) vs. 2% (259/16009) {1018e4d6-728f-4b20-ad56-37578a4de76b} (Flagfox, https://addons.mozilla.org/addon/5791)
15% (3/20) vs. 2% (359/16009) {23fcfd51-4958-4f00-80a3-ae97e717ed8b}
15% (3/20) vs. 3% (407/16009) {D4DD63FA-01E4-46a7-B6B1-EDAB7D6AD389} (Download Statusbar, https://addons.mozilla.org/addon/26)
15% (3/20) vs. 5% (731/16009) wrc@avast.com
10% (2/20) vs. 0% (8/16009) doudehou@gmail.com (StatusbarEx, https://addons.mozilla.org/addon/3271)
10% (2/20) vs. 0% (21/16009) toolbar@alexa.com (Alexa Sparky, https://addons.mozilla.org/addon/5362)
10% (2/20) vs. 0% (26/16009) info@djzig.com (LavaFox V1, https://addons.mozilla.org/addon/11861)
10% (2/20) vs. 0% (31/16009) quickstores@quickstores.de
10% (2/20) vs. 0% (60/16009) cacaoweb@cacaoweb.org
10% (2/20) vs. 1% (106/16009) vshare@toolbar
10% (2/20) vs. 1% (131/16009) 64ffxtbr@TelevisionFanatic.com
20% (4/20) vs. 11% (1742/16009) {d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d} (Adblock Plus, https://addons.mozilla.org/addon/1865)
10% (2/20) vs. 2% (249/16009) {EB9394A3-4AD6-4918-9537-31A1FD8E8EDF}
10% (2/20) vs. 2% (265/16009) {99079a25-328f-4bd4-be04-00955acaa0a7}
10% (2/20) vs. 3% (532/16009) {e4a8a97b-f2ed-450b-b12d-ee082ba24781} (Greasemonkey, https://addons.mozilla.org/addon/748)
nsEventSource::GetInterface(nsID const&, void**)|EXCEPTION_STACK_OVERFLOW (20 crashes)
35% (7/20) vs. 12% (1895/16009) GrooveShellExtensions.dll (MS Office)
5% (1/20) vs. 5% (770/16009) 12.0.4518.1014
10% (2/20) vs. 6% (921/16009) 12.0.6421.1000
20% (4/20) vs. 1% (187/16009) 12.0.6500.5000
25% (5/20) vs. 8% (1302/16009) Aavm4h.dll (Avast AV)
15% (3/20) vs. 1% (133/16009) 6.0.1289.0
10% (2/20) vs. 7% (1163/16009) 6.0.1367.0
15% (3/20) vs. 2% (360/16009) RocketDock.dll (RocketDock)
15% (3/20) vs. 3% (547/16009) DropboxExt.14.dll (1.0.0.14) (DropBox)
10% (2/20) vs. 0% (50/16009) 64radio.dll (1.0.0.9) (RadioControl)
10% (2/20) vs. 4% (687/16009) mgAdaptersProxy.dll (3.6.0.2) (SweetIM)
Reporter | ||
Comment 9•13 years ago
|
||
With combined signatures, it's #48 top browser crasher in 11.0b1.
Here are fresh 11 correlations:
nsEventSource::GetInterface(nsID const&, void**)|EXCEPTION_STACK_OVERFLOW (25 crashes)
44% (11/25) vs. 6% (1253/22290) datamngr.dll (SearchQu Toolbar)
40% (10/25) vs. 5% (1060/22290) 1.0.0.1
4% (1/25) vs. 0% (21/22290) 3.0.0.46593
32% (8/25) vs. 9% (2080/22290) GrooveUtil.DLL (MS Office)
8% (2/25) vs. 4% (945/22290) 12.0.4518.1014
4% (1/25) vs. 0% (44/22290) 12.0.6550.5004
16% (4/25) vs. 3% (719/22290) 12.0.6562.5000
4% (1/25) vs. 1% (269/22290) 12.0.6606.1000
20% (5/25) vs. 0% (8/22290) PCIHOOKS.DLL (NetSupport Manager)
20% (5/25) vs. 0% (5/22290) 10.60.0.391
16% (4/25) vs. 0% (28/22290) APSHook.dll (Cognizance Identity Manager)
16% (4/25) vs. 0% (12/22290) 2.0.0.15
16% (4/25) vs. 1% (121/22290) DpOSet.dll (HP ProtectTools Security Manager)
16% (4/25) vs. 0% (17/22290) 5.1.0.334
16% (4/25) vs. 5% (1128/22290) GROOVEEX.DLL (MS Office 2010)
16% (4/25) vs. 3% (733/22290) 14.0.6106.5000
12% (3/25) vs. 1% (321/22290) GoogleDesktopCommon.dll (Google Desktop)
12% (3/25) vs. 1% (290/22290) 5.9.1005.12335
Comment 10•13 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #5)
> Hmm. Maybe I just misunderstood the UI? When I look _under_ the "add-ons"
> bit it says "Loaded Correlation Data, but none available for this signature
> nsEventSource::GetInterface(nsID const&, void**)". Under Add-ons by
> versions, clicking "Load" just complains about undefined indices. See
> https://crash-stats.mozilla.com/report/index/c13e9360-4325-454b-ba40-
> be22c2120113 for an example.
Hi Boris, were you looking for the correlation reports to get a better idea of the types of 3rd party software that may be triggering this regression? Or would you like QA to help attempt to reproduce? If you'd like QA's help, any tips on what they should focus on when trying to reproduce (other than the above add-ons) would be greatly appreciated. Thanks!
Comment 11•13 years ago
|
||
There weren't any correlation reports that I could find.
But in any case, comment 3 explains why this is happening and what we need to do to fix it.... The only thing that's unclear is why these add-ons trigger it.
Has QA tried using the relevant addons while loading websocket testcases?
Assignee | ||
Updated•13 years ago
|
Assignee: nobody → jduell.mcbugs
Comment 12•13 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #11)
> Has QA tried using the relevant addons while loading websocket testcases?
Adding qawanted, but I'm sure they'll ask to be pointed at applicable testcases.
Let's install WOT, fastdialfx6, and Roboform when performing the testing.
Keywords: qawanted
Comment 13•13 years ago
|
||
I will try and get someone in QA to help - Marcia is away for a few days.
Assignee | ||
Comment 14•13 years ago
|
||
bz is probably right about this being caused by a loop where nsEventSource and nsCORSListenerProxy wind up being each other's callbacks. But I don't think this has anything to do with websockets (other than that it seems likely that the websockets Content Policy patch for bug 692067 somehow causes this to happen: I don't see any other likely candidates in the regression window).
Specifically, the scenario in comment 3 doesn't happen for websockets as far as i can tell (I've been trying to repro in the debugger, and don't see it).
> Well, websocket code sets up the channel in
> nsEventSource::InitChannelAndRequestEventSource as follows:
No it doesn't. nsEventSource isn't involved in websocket setup at all. I've been running ws tests through the debugger and we don't ever call any nsEventSource functions during them. Actually, we don't seem to call nsEventSource functions for anything in the browser--I surfed nytimes.com, facebook, etc w/o ever hitting them. I assume that one needs to hit a page with JS that explicitly instantiates an "EventSource" object to actually invoke any of this code?
Assignee | ||
Comment 15•13 years ago
|
||
Are we sure that this line in the patch for bug 692067 was right?
+ csp._MAPPINGS[cp.TYPE_WEBSOCKET] = cspr_sd.XHR_SRC;
The only other theory I've got from looking at the patch is that the NS_CheckContentLoadPolicy call in WebSocketChannel:Init somehow messes up things for EventSource objects--seems like a stretch, but I don't know this code much. Tomorrow I'll try running the existing mochitest for Eventsource (test_bug338583.html), perhaps adding a websocket to it. Unless someone has a better idea.
Assignee | ||
Comment 16•13 years ago
|
||
cc-ing the patch author/reviewers of bug 692067, since this seems to be the likely culprit.
Assignee | ||
Comment 17•13 years ago
|
||
khuey/smaug,
I guess you were already cc'd on this. Could you look over comment 3, 14/15 and see if there's anything obvious in your patch that might be causing this?
Comment 18•13 years ago
|
||
hey jason, is the content policy triggering any kind of redirect? It's totally posislbe that is busted in websockets because normal redirects are prohibited by websockets so that code isn't run. (but for example, I think STS could run it.)
Comment 19•13 years ago
|
||
Some content policies (HTTPS everywhere, noscript) definitely trigger redirects by hand.
Comment 20•13 years ago
|
||
I have Fx11b4(build1) on Windows 7 running with WOT, NoScript, Adblock Plus, HTTPS Everywhere and Fast Dial 4.0 and I have been trying to crash the application by going to different sites like Google properties, loging in, checking/sending mail, and such. I've also tried testing with websockets test suites that I found on google, the kind that ask you to load a page and it runs a script with a pass/fail result at the end. I've also tried other demos.
I haven't been able to crash yet.
Assignee | ||
Comment 21•13 years ago
|
||
juan,
It looks like this issue will require that a web page use EventSource (which is fairly rare AFAICT: I can't point you at any I know of). And possibly also websockets? Not clear yet. I'm going to try to write a mochitest that exposes the crash.
Assignee | ||
Comment 22•13 years ago
|
||
So this is indeed an EventSource-only bug, not websockets.
The fix is as bz specified in comment 3: just move setting of callbacks.
Test is also attached.
It appears that bug 692067 actually has nothing to do with this. I tried backing that patch out and running my test, and we get the same infinite loop.
AFACIT redirects for EventSource were just broken, and we started encountering them in the wild just as we landed 692067. That's my best guess, at least.
This is a simple fix, and I'd recommend we land it on aurora/beta (not setting approval flags 'cause I don't have review yet: bz feel free to do the honors.)
Attachment #600241 -
Flags: review?(bzbarsky)
Comment 23•13 years ago
|
||
awesome jason.
Comment 24•13 years ago
|
||
Comment on attachment 600241 [details] [diff] [review]
Fix and mochitest
r=me
Attachment #600241 -
Flags: review?(bzbarsky) → review+
Reporter | ||
Updated•13 years ago
|
Status: NEW → ASSIGNED
Component: Networking: WebSockets → Event Handling
QA Contact: networking.websockets → events
Assignee | ||
Comment 25•13 years ago
|
||
Assignee: jduell.mcbugs → nobody
Status: ASSIGNED → NEW
Component: Event Handling → Networking: WebSockets
QA Contact: events → networking.websockets
Summary: Re-entrant calls to nsCORSListenerProxy::GetInterface leads to stack overflow and Crash @ nsEventSource::GetInterface → EventSource::GetInterface goes into infinite loop if EventSource hits an HTTP redirect
Assignee | ||
Comment 26•13 years ago
|
||
Comment on attachment 600241 [details] [diff] [review]
Fix and mochitest
I'd suggest we take this on esr too--how do I nominate for that?
Regression caused by (bug #): Appears to have been in the source from the beginning, but is only now being hit by websites (as we see the EventSource API being picked up more widely on the web)
User impact if declined: Any site that uses an EventSource with a URI that redirects will cause browser to go into infinite loop then crash.
Testing completed (on m-c, etc.): mochitest
Risk to taking this patch (and alternatives if risky): very low risk.
String changes made by this patch: no
Attachment #600241 -
Flags: approval-mozilla-beta?
Attachment #600241 -
Flags: approval-mozilla-aurora?
Reporter | ||
Comment 27•13 years ago
|
||
(In reply to Jason Duell (:jduell) from comment #26)
> I'd suggest we take this on esr too--how do I nominate for that?
> Regression caused by (bug #): Appears to have been in the source from the
> beginning, but is only now being hit by websites (as we see the EventSource API
> being picked up more widely on the web)
I see only one crash in 10.0 and one in 10.0.2 while there are 178 crashes in 11.0b3 for combined signatures, so there's a patch in the regression range that make it more hittable.
Assignee: nobody → jduell.mcbugs
Status: NEW → ASSIGNED
status-firefox11:
--- → affected
status-firefox12:
--- → affected
Target Milestone: --- → mozilla13
Comment 28•13 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 29•13 years ago
|
||
This changeset is almost the culprit (added the CORS listener that is needed for the infinite loop)
changeset: 81500:ed783dfd8179
user: Wellington Fernando de Macedo <wfernandom2004@gmail.com>
date: Mon Dec 05 21:02:42 2011 -0200
summary: Bug 664179 - Allow Cross-Origin URLs in EventSource (Server-Sent Events). r=sicking
It's in aurora/beta, but not in esr.
Depends on: 664179
Assignee | ||
Comment 30•13 years ago
|
||
s/almost/almost certainly/ :)
Updated•13 years ago
|
Updated•13 years ago
|
status-firefox-esr10:
--- → unaffected
Comment 31•13 years ago
|
||
Comment on attachment 600241 [details] [diff] [review]
Fix and mochitest
[Triage Comment]
Accepted based on low risk and no crash reports since it landed
Attachment #600241 -
Flags: approval-mozilla-beta?
Attachment #600241 -
Flags: approval-mozilla-beta+
Attachment #600241 -
Flags: approval-mozilla-aurora?
Attachment #600241 -
Flags: approval-mozilla-aurora+
Assignee | ||
Comment 32•13 years ago
|
||
Reporter | ||
Updated•13 years ago
|
Comment 33•13 years ago
|
||
Removing qawanted since this is marked fixed. Will be verified in Beta6.
Keywords: qawanted
Comment 34•13 years ago
|
||
Given that QA was never able to reproduce this, what can we do to help verify the fix?
Whiteboard: [qa?]
Assignee | ||
Comment 35•13 years ago
|
||
I added a mochitest as part of my fix that clearly repro's the issue. I'd consider it verified.
You need to log in
before you can comment on or make changes to this bug.
Description
•