Crashes [@ _de_casteljau ] due to infinite recursion of [@ _cairo_spline_decompose_into] without using Cisco VPN

RESOLVED WORKSFORME

Status

defect
P3
critical
9 years ago
4 days ago

People

(Reporter: scoobidiver, Assigned: jrmuizel)

Tracking

({crash, regression})

Trunk
x86
All

Firefox Tracking Flags

(blocking2.0 .x+)

Details

(Whiteboard: [softblocker][approved-patches-landed][tbird crash][gfx-noted], crash signature)

Attachments

(5 attachments, 3 obsolete attachments)

Starting with 4.0b10pre/20110114, there is a spike in crashes.
It has been the #16 top crasher in 4.0b10pre over the last 3 days.

According to some comments, it is related to Panorama:
"crash switching from panorama to a tab group"
"Click on the group button just after closed a tab with the mouse middle button."
"crash switching into panorama (1st time since startup)"

Signature	_de_casteljau
UUID	3030c9d2-bf8b-44ac-855a-5f8472110118
Time 	2011-01-18 19:58:04.264771
Uptime	333
Last Crash	515998 seconds (6.0 days) before submission
Install Age	333 seconds (5.5 minutes) since version was first installed.
Product	Firefox
Version	4.0b10pre
Build ID	20110118030327
Branch	2.0
OS	Windows NT
OS Version	6.1.7600
CPU	x86
CPU Info	AuthenticAMD family 15 model 104 stepping 1
Crash Reason	EXCEPTION_STACK_OVERFLOW
Crash Address	0x5affbec1
App Notes 	AdapterVendorID: 1002, AdapterDeviceID: 791f

Processor Notes 	This dump is too long and has triggered the automatic truncation routine

Frame 	Module 	Signature 	Source
0 	xul.dll 	_de_casteljau 	gfx/cairo/cairo/src/cairo-spline.c:103
1 	xul.dll 	_cairo_spline_decompose_into 	gfx/cairo/cairo/src/cairo-spline.c:195
2 	xul.dll 	_cairo_spline_decompose_into 	gfx/cairo/cairo/src/cairo-spline.c:197
3 	xul.dll 	_cairo_spline_decompose_into 	gfx/cairo/cairo/src/cairo-spline.c:197
...

The regression range for the spike is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=54184cfa6f0e&tochange=9f412256da4c

More reports at:
https://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=exact&query=&range_value=4&range_unit=weeks&hang_type=any&process_type=any&plugin_field=&plugin_query_type=&plugin_query=&do_query=1&admin=&signature=_de_casteljau
blocking2.0: --- → ?
See discussion of the problem in bug 435756.
Depends on: 435756
This is almost certainly the same problem as bug 435756, which we have no hope of fixing unfortunately.
blocking2.0: ? → -
Joe, I think we can easily wallpaper this crash as discussed in bug 435756.
And given the crash frequency I think it's worth doing.
It is the #11 top crasher in 4.0b10.
I'd be surprised if this is the same problem as bug 435756. It seems unlikely that people would begin using old cisco software again. I expect this infinite recursion is caused by some problem elsewhere. I'd like to add some instrumentation to try to figure out what's going wrong here.
blocking2.0: - → ?
It seems like these crashes are triggered by Panorama.
Summary: Spike in crashes [@ _de_casteljau ] due to infinite recursion of [@ _cairo_spline_decompose_into] → Spike in crashes [@ _de_casteljau ] due to infinite recursion of [@ _cairo_spline_decompose_into] Panorama
Posted patch: Try to detect Cisco VPN (obsolete)
For now, we'll call this a hardblocker, but this might turn into not-a-blocker if it's actually Cisco VPN-related.
Assignee: nobody → jmuizelaar
blocking2.0: ? → final+
Whiteboard: [hardblocker]
(In reply to comment #8)
> For now, we'll call this a hardblocker, but this might turn into not-a-blocker
> if it's actually Cisco VPN-related.

I'm skeptical that this is the cause; my gut feeling tells me this is related to the relatively extreme minification levels used by Panorama. But that's just a hunch...
Posted patch: Try to detect Cisco VPN v2 (obsolete)
This version should actually work.
Attachment #507607 - Attachment is obsolete: true
Attachment #508652 - Flags: review?(ehsan)
Comment on attachment 508652 [details] [diff] [review]
Try to detect Cisco VPN v2

>+/* Cisco's VPN software can cause corruption of the floating point state.
>+ * Make a note of this in our crash reports so that some weird crashes
>+ * make more sense */
>+static void
>+CheckForCiscoVPN() {
>+#if defined(MOZ_CRASHREPORTER) && defined(MOZ_ENABLE_LIBXUL)

This function is only ever called from AddCrashReportAnnotations, so please move it all inside the #if block here.

>+  LONG result;
>+  HKEY key;
>+  /* This will give false positives, but hopefully no false negatives */
>+  result = RegOpenKeyExW(HKEY_LOCAL_MACHINE, L"Software\\Cisco Systems\\VPN Client", 0, KEY_QUERY_VALUE, &key);
>+  if (result == ERROR_SUCCESS) {
>+    CrashReporter::AppendAppNotesToCrashReport(NS_LITERAL_CSTRING("Cisco VPN"));

And there you leak one handle!  Please close the returned key here.

r=me with those comments addressed.
Attachment #508652 - Flags: review?(ehsan) → review+
Whiteboard: [hardblocker] → [hardblocker][has patch]
Actually, I don't think this qualifies as "has patch", since it's really a debugging patch, not something that solves the problem.  We shouldn't be counting it as any indication that this bug "will be fixed soon".
sorry about that. removed the whiteboard update.
Whiteboard: [hardblocker][has patch] → [hardblocker]
Final version
Attachment #508652 - Attachment is obsolete: true
Attachment #508774 - Flags: review+
(In reply to comment #14)
> Created attachment 508774 [details] [diff] [review]
> Try to detect Cisco VPN v3
> 
> Final version

This was landed as <http://hg.mozilla.org/mozilla-central/rev/0a74956ae143>.
Attachment #508876 - Flags: review?(ehsan)
Comment on attachment 508876 [details] [diff] [review]
Try to find out the inputs to infinite recursion

The code looks fine to me, r=me given that Jeff tests it before landing.
Attachment #508876 - Flags: review?(ehsan) → review+
Depends on: 630444
I tried to test it and land it, but it seems like this patch is based on another patch which makes it not apply on mozilla-central...
Comment on attachment 509155 [details] [diff] [review]
Fix up depth counting

I landed this patch on the beta11 relbranch as well:

http://hg.mozilla.org/mozilla-central/rev/36f4e8a8b953
So far there has been only one crash in beta 12:
https://crash-stats.mozilla.com/report/index/7e4efd36-0f4b-412f-9051-582712110206

It has the Cisco VPN tag.
FWIW, after seeing daily+ panorama-related crashes between Jan 15 and 30th, 2011,
I've seen zero since. 
I'm guessing they went away when panorama transitions were converted to use css.
AdapterVendorID: 1002, AdapterDeviceID: 5e4f, AdapterDriverVersion: 8.552.0.0
curve 40e4c00000000000 c090000000000000, 40e54d4000000000 c090000000000000, 40e5c00000000000 c05a800000000000, 40e5c00000000000 4090000000000000
crv-f: 42496,000000 -1024,000000, 43626,000000 -1024,000000, 44544,000000 -106,000000, 44544,000000 1024,000000
I just realized that the types passed to StoreSpline don't actually match the types passed in. So the data might take some interpretation here.
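One way to read the `curve` line above, assuming each 16-digit hex word is the raw IEEE 754 bit pattern of a double, is to reinterpret the bits directly. The helper name `bits_to_double` is hypothetical and is not part of the landed patch; this is only a sketch for decoding the annotation by hand.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 64-bit pattern as an IEEE 754 double.
 * memcpy is used instead of a pointer cast to avoid strict-aliasing issues. */
double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Under that assumption, 40e4c00000000000 decodes to 42496 and c090000000000000 decodes to -1024, which agrees with the first point on the `crv-f` line.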
(In reply to comment #29)
> I just realized that the types passed to StoreSpline don't actually match the
> types passed in. So the data might take some interpretation here.

The actual inputs seem to be something like:
(166., -4.) (170.410625., -4.) (174., -0.4140625) (173.95703125, 4)

I don't see anything particularly interesting about those co-ordinates.
This fixes the type problem and adds a bit more debugging info.
Attachment #510674 - Flags: review?(jdaggett)
Attachment #510674 - Flags: review?(ehsan)
Attachment #510674 - Flags: review?(ehsan) → review+
There have only been two crashes (both with Cisco VPN) on beta 11 so far, so I'm demoting this to a softblocker.
Whiteboard: [hardblocker] → [softblocker]
Attachment #510674 - Flags: review?(jdaggett) → approval2.0+
(In reply to comment #30)
> (In reply to comment #29)
> > I just realized that the types passed to StoreSpline don't actually match the
> > types passed in. So the data might take some interpretation here.
> 
> The actual inputs seem to be something like:
> (166., -4.) (170.410625., -4.) (174., -0.4140625) (173.95703125, 4)
> 
> I don't see anything particularly interesting about those co-ordinates.

Given that this version of the code went out with beta11, what's the calculation for determining the correct coordinates from the debug output included in the crashdump?  Or did you directly analyze the crashdump file?

As of 02/09/2011 16:51:28, beta11 has 8 crashreports, all Cisco VPN.
(In reply to comment #34)
> (In reply to comment #30)
> > (In reply to comment #29)
> > > I just realized that the types passed to StoreSpline don't actually match the
> > > types passed in. So the data might take some interpretation here.
> > 
> > The actual inputs seem to be something like:
> > (166., -4.) (170.410625., -4.) (174., -0.4140625) (173.95703125, 4)
> > 
> > I don't see anything particularly interesting about those co-ordinates.
> 
> Given that this version of the code went out with beta11, what's the
> calculation for determining the correct coordinates from the debug output
> included in the crashdump?  Or did you directly analyze the crashdump file?

The values in beta11 are cairo's fixed point integers converted to doubles.
You can convert back to the original doubles by taking the value and dividing it by 256 (the fixed point representation has an 8-bit fractional part).
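The conversion described here can be sketched as follows. This is a minimal illustration assuming the 8-bit fractional part stated above; `fixed_to_double` and `FIXED_FRAC_BITS` are hypothetical names chosen for this example, not identifiers taken from the patch.

```c
#include <stdint.h>

/* Assumption from the comment above: 8 fractional bits, so one unit = 1/256. */
#define FIXED_FRAC_BITS 8

/* Hypothetical helper: convert a fixed-point value back to a double. */
double fixed_to_double(int32_t f)
{
    return (double) f / (double) (1 << FIXED_FRAC_BITS);
}
```

Applied to the `crv-f` values from the beta11 report (42496, -1024, 43626, -1024, 44544, -106, 44544, 1024), this gives (166, -4), (170.4140625, -4), (174, -0.4140625), (174, 4).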
Looks like there are other ways the tolerance value is getting whacked apart from Cisco VPN libs:

https://crash-stats.mozilla.com/report/index/62aebcb3-4e7a-4e47-af72-1403f2110210

curve 41250c9c 6a7eec8, 5 5, 2952ac0 0, 0 0
crv-crash(0,000000): 41250c9c 6a7eec8, 41250c9b 6a7eec7, 41250c9a 6a7eec6, 41250c99 6a7eec5
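To illustrate why a bad tolerance matters, here is a minimal sketch of recursive spline flattening with a depth guard. It is not cairo's code: the types, the error metric, and every function name are hypothetical stand-ins. It assumes the value in parentheses on the `crv-crash` line is the tolerance. In this sketch, recursion stops only when `error < tolerance` holds, which is never true for a tolerance of 0 (or NaN), so without the guard the subdivision would never terminate.

```c
typedef struct { double x, y; } point_t;
typedef struct { point_t a, b, c, d; } spline_t;

point_t midpoint(point_t p, point_t q)
{
    point_t m = { (p.x + q.x) / 2.0, (p.y + q.y) / 2.0 };
    return m;
}

/* de Casteljau subdivision of a cubic Bezier at t = 1/2. */
void split_half(const spline_t *s, spline_t *left, spline_t *right)
{
    point_t ab = midpoint(s->a, s->b);
    point_t bc = midpoint(s->b, s->c);
    point_t cd = midpoint(s->c, s->d);
    point_t abbc = midpoint(ab, bc);
    point_t bccd = midpoint(bc, cd);
    point_t mid = midpoint(abbc, bccd);

    left->a = s->a;  left->b = ab;    left->c = abbc; left->d = mid;
    right->a = mid;  right->b = bccd; right->c = cd;  right->d = s->d;
}

/* Squared distance from p to the segment a-d. */
double dist_sq_to_segment(point_t p, point_t a, point_t d)
{
    double dx = d.x - a.x, dy = d.y - a.y;
    double px = p.x - a.x, py = p.y - a.y;
    double len_sq = dx * dx + dy * dy;

    if (len_sq > 0.0) {
        double t = (px * dx + py * dy) / len_sq;
        if (t < 0.0) t = 0.0;
        if (t > 1.0) t = 1.0;
        px -= t * dx;
        py -= t * dy;
    }
    return px * px + py * py;
}

/* Simple flatness estimate: worst control-point distance from the chord. */
double error_sq(const spline_t *s)
{
    double e1 = dist_sq_to_segment(s->b, s->a, s->d);
    double e2 = dist_sq_to_segment(s->c, s->a, s->d);
    return e1 > e2 ? e1 : e2;
}

/* Returns the number of line segments emitted. Sets *hit_limit to 1 if the
 * depth guard, rather than the tolerance test, ended any branch. */
long flatten(const spline_t *s, double tol_sq, int depth_left, int *hit_limit)
{
    spline_t left, right;

    if (error_sq(s) < tol_sq)
        return 1;              /* flat enough: emit one segment */
    if (depth_left <= 0) {
        *hit_limit = 1;        /* guard: stop instead of overflowing the stack */
        return 1;
    }
    split_half(s, &left, &right);
    return flatten(&left, tol_sq, depth_left - 1, hit_limit)
         + flatten(&right, tol_sq, depth_left - 1, hit_limit);
}
```

With a sane tolerance the guard is never reached for the curve reported earlier in this bug; with a tolerance of 0 every branch runs until the guard stops it, so `hit_limit` doubles as a cheap signal that the tolerance was corrupted.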
In reply to comment 26
> I'm guessing they went away when panorama transitions were converted to use
> css.
You're right: it is now the #170 top crasher in 4.0b11 and the #167 top crasher in 3.6.13.
I think it can be closed as WORKSFORME, as there is no longer a spike.
The only applicable bug is now bug 435756.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Sorry, this is still causing crashes in situations where Cisco VPNs are not involved.  Until we can prove that this is equivalent to other bugs I think this should stay open.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Summary: Spike in crashes [@ _de_casteljau ] due to infinite recursion of [@ _cairo_spline_decompose_into] Panorama → Crashes [@ _de_casteljau ] due to infinite recursion of [@ _cairo_spline_decompose_into] without using Cisco VPN
Whiteboard: [softblocker] → [softblocker][approved-patches-landed]
I'm not sure tagging "approved patches landed" is right here. Those patches provide better debugging and do not constitute any change that will fix/prevent/reduce the problem, so metrics that assume "patches landed" == "almost fixed" will not be correct.
In this case, [approved-patches-landed] just means "stay out of beltzner's query." ;)
** PRODUCT DRIVERS PLEASE NOTE **

This bug is one of 7 automatically changed from blocking2.0:final+ to blocking2.0:.x during the endgame of Firefox 4 for the following reasons:

 - it was marked as a soft blocking issue without a requirement for beta coverage
blocking2.0: final+ → .x+
(In reply to comment #26)
> FWIW, after seeing daily+ panorama-related crashes between Jan 15 and 30th,
> 2011,
> I've seen zero since. 

Got a couple crashes overnight with the b13pre equivalent of 4.0RC1.
Crash Signature: [@ _de_casteljau ] [@ _cairo_spline_decompose_into]
We're still getting a lot of crashes from this. We should continue the work.
I see this crash on a regular basis in Thunderbird trunk on Linux (Fedora 17 x86_64) on a locally connected display, so I am changing the platform to "All": this is not Windows-specific. I also see it periodically in Firefox on the same platform.
OS: Windows 7 → All
(In reply to Benoit Girard (:BenWa) from comment #44)
> We're still getting a lot of crashes from this. We should continue the work.

FWIW, afaict for thunderbird
- _cairo_spline_decompose_into doesn't exist in current releases
- _de_casteljau  65% of crashes in past month are two most recent releases + ESR https://crash-stats.mozilla.com/report/list?product=Thunderbird&query_search=signature&query_type=exact&query=_de_casteljau&reason_type=contains&date=07%2F13%2F2012%2013%3A55%3A11&range_value=4&range_unit=weeks&hang_type=any&process_type=all&do_query=1&signature=_de_casteljau
Whiteboard: [softblocker][approved-patches-landed] → [softblocker][approved-patches-landed][tbird crash]
(In reply to Wayne Mery (:wsmwk) from comment #46)
> (In reply to Benoit Girard (:BenWa) from comment #44)
> > We're still getting a lot of crashes from this. We should continue the work.
> 
> FWIW, afaict for thunderbird
> - _cairo_spline_decompose_into doesn't exist in current releases

Um, I just got a crash in _cairo_spline_decompose_into this morning in a Thunderbird trunk build dated July 9.
(In reply to Jonathan Kamens from comment #47)
> (In reply to Wayne Mery (:wsmwk) from comment #46)
> > (In reply to Benoit Girard (:BenWa) from comment #44)
> > > We're still getting a lot of crashes from this. We should continue the work.
> > 
> > FWIW, afaict for thunderbird
> > - _cairo_spline_decompose_into doesn't exist in current releases
> 
> Um, I just got a crash in _cairo_spline_decompose_into this morning in a Thunderbird
> trunk build dated July 9.

Thanks. In that case, it's just not showing in crash-stats. 

_cairo_spline_decompose_into is rare for Thunderbird on crash-stats:
* only 18 crashes in 4 months
* half are 3.x release
* none recorded for version 12 or newer.

v6 bp-44b28371-9a24-4dd5-acd8-d064b2120517
v7 bp-26e595f4-c090-4bed-b044-b3bcb2120523
v11 bp-2d1d4831-a90a-40c5-beae-176152120326
This still gets reported with Firefox, Fennec, and Thunderbird but at extremely low volume.
Whiteboard: [softblocker][approved-patches-landed][tbird crash] → [softblocker][approved-patches-landed][tbird crash][gfx-noted]

Closing because no crashes reported for 12 weeks.

Status: REOPENED → RESOLVED
Closed: 9 years ago → 4 days ago
Resolution: --- → WORKSFORME