Closed Bug 626994 Opened 14 years ago Closed 6 years ago

Crashes [@ _de_casteljau ] due to infinite recursion of [@ _cairo_spline_decompose_into] without using Cisco VPN

Categories

(Core :: Graphics, defect, P3)

x86
All
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
blocking2.0 --- .x+

People

(Reporter: scoobidiver, Assigned: jrmuizel)

References

Details

(Keywords: crash, regression, Whiteboard: [softblocker][approved-patches-landed][tbird crash][gfx-noted])

Crash Data

Attachments

(5 files, 3 obsolete files)

From 4.0b10pre/20110114, there is a spike in crashes. It is #16 top crasher in 4.0b10pre for the last 3 days.
According to some comments, it is related to Panorama:
"crash switching from panorama to a tab group"
"Click on the group button just after closed a tab with the mouse middle button."
"crash switching into panorama (1st time since startup)"
Signature _de_casteljau
UUID 3030c9d2-bf8b-44ac-855a-5f8472110118
Time 2011-01-18 19:58:04.264771
Uptime 333
Last Crash 515998 seconds (6.0 days) before submission
Install Age 333 seconds (5.5 minutes) since version was first installed.
Product Firefox
Version 4.0b10pre
Build ID 20110118030327
Branch 2.0
OS Windows NT
OS Version 6.1.7600
CPU x86
CPU Info AuthenticAMD family 15 model 104 stepping 1
Crash Reason EXCEPTION_STACK_OVERFLOW
Crash Address 0x5affbec1
App Notes AdapterVendorID: 1002, AdapterDeviceID: 791f
Processor Notes This dump is too long and has triggered the automatic truncation routine
Frame Module Signature Source
0 xul.dll _de_casteljau gfx/cairo/cairo/src/cairo-spline.c:103
1 xul.dll _cairo_spline_decompose_into gfx/cairo/cairo/src/cairo-spline.c:195
2 xul.dll _cairo_spline_decompose_into gfx/cairo/cairo/src/cairo-spline.c:197
3 xul.dll _cairo_spline_decompose_into gfx/cairo/cairo/src/cairo-spline.c:197
...
The regression range for the spike is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=54184cfa6f0e&tochange=9f412256da4c
More reports at:
https://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=exact&query=&range_value=4&range_unit=weeks&hang_type=any&process_type=any&plugin_field=&plugin_query_type=&plugin_query=&do_query=1&admin=&signature=_de_casteljau
blocking2.0: --- → ?
See discussion of the problem in bug 435756.
Depends on: 435756
This is almost certainly the same problem as bug 435756, which we have no hope of fixing unfortunately.
blocking2.0: ? → -
Joe, I think we can easily wallpaper this crash as discussed in bug 435756. And given the crash frequency I think it's worth doing.
It is #11 top crasher in 4.0b10.
I'd be surprised if this is the same problem as bug 435756. It seems unlikely that people would begin using old cisco software again. I expect this infinite recursion is caused by some problem elsewhere. I'd like to add some instrumentation to try to figure out what's going wrong here.
blocking2.0: - → ?
It seems like these crashes are triggered by Panorama.
Summary: Spike in crashes [@ _de_casteljau ] due to infinite recursion of [@ _cairo_spline_decompose_into] → Spike in crashes [@ _de_casteljau ] due to infinite recursion of [@ _cairo_spline_decompose_into] Panorama
Attached patch Try to detect Cisco VPN (obsolete)
For now, we'll call this a hardblocker, but this might turn into not-a-blocker if it's actually Cisco VPN-related.
Assignee: nobody → jmuizelaar
blocking2.0: ? → final+
Whiteboard: [hardblocker]
(In reply to comment #8)
> For now, we'll call this a hardblocker, but this might turn into not-a-blocker
> if it's actually Cisco VPN-related.
I'm skeptical that this is the cause; my gut feeling tells me this is related to the relatively extreme minification levels used by Panorama. But that's just a hunch...
Attached patch Try to detect Cisco VPN v2 (obsolete)
This version should actually work.
Attachment #507607 - Attachment is obsolete: true
Attachment #508652 - Flags: review?(ehsan)
Comment on attachment 508652 [details] [diff] [review]
Try to detect Cisco VPN v2
>+/* Cisco's VPN software can cause corruption of the floating point state.
>+ * Make a note of this in our crash reports so that some weird crashes
>+ * make more sense */
>+static void
>+CheckForCiscoVPN() {
>+#if defined(MOZ_CRASHREPORTER) && defined(MOZ_ENABLE_LIBXUL)
This function is only ever called from AddCrashReportAnnotations, so please move it all inside the #if block here.
>+  LONG result;
>+  HKEY key;
>+  /* This will give false positives, but hopefully no false negatives */
>+  result = RegOpenKeyExW(HKEY_LOCAL_MACHINE, L"Software\\Cisco Systems\\VPN Client", 0, KEY_QUERY_VALUE, &key);
>+  if (result == ERROR_SUCCESS) {
>+    CrashReporter::AppendAppNotesToCrashReport(NS_LITERAL_CSTRING("Cisco VPN"));
And there you leak one handle! Please close the returned key here.
r=me with those comments addressed.
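For readers following the review, the handle-leak point can be illustrated with a minimal sketch. This is not the patch that landed: `cisco_vpn_key_present` is a hypothetical helper name, the crash-report annotation call is left out, and the non-Windows branch is only there so the snippet compiles elsewhere. The registry path and the `RegOpenKeyExW` arguments are taken from the quoted patch.

```c
#include <stdbool.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: returns true if the Cisco VPN Client registry key
 * exists. As in the reviewed patch, this can give false positives, for
 * example when the key is left behind after an uninstall. */
bool cisco_vpn_key_present(void)
{
#ifdef _WIN32
    HKEY key;
    LONG result = RegOpenKeyExW(HKEY_LOCAL_MACHINE,
                                L"Software\\Cisco Systems\\VPN Client",
                                0, KEY_QUERY_VALUE, &key);
    if (result != ERROR_SUCCESS)
        return false;

    /* The open succeeded, so we own a handle: close it before returning. */
    RegCloseKey(key);
    return true;
#else
    /* No registry on non-Windows platforms. */
    return false;
#endif
}
```

The only difference from the quoted hunk that matters here is the `RegCloseKey` call on the success path, which is what the review asks for.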
Attachment #508652 - Flags: review?(ehsan) → review+
Whiteboard: [hardblocker] → [hardblocker][has patch]
Actually, I don't think this qualifies as "has patch", since it's really a debugging patch, not something that solves the problem. We shouldn't be counting it as any indication that this bug "will be fixed soon".
sorry about that. removed the whiteboard update.
Whiteboard: [hardblocker][has patch] → [hardblocker]
Attached patch Try to detect Cisco VPN v3
Final version
Attachment #508652 - Attachment is obsolete: true
Attachment #508774 - Flags: review+
(In reply to comment #14)
> Created attachment 508774 [details] [diff] [review]
> Try to detect Cisco VPN v3
>
> Final version
This was landed as <http://hg.mozilla.org/mozilla-central/rev/0a74956ae143>.
Attachment #508876 - Flags: review?(ehsan)
Comment on attachment 508876 [details] [diff] [review]
Try to find out the inputs to infinite recursion
The code looks fine to me, r=me given that Jeff tests it before landing.
Attachment #508876 - Flags: review?(ehsan) → review+
Attachment #508876 - Attachment is obsolete: true
Depends on: 630444
I tried to test it and land it, but it seems like this patch is based on another patch which makes it not apply on mozilla-central...
Upload the correct one.
Attached patch Fix up depth counting
Comment on attachment 509155 [details] [diff] [review]
Fix up depth counting
I landed this patch on the beta11 relbranch as well: http://hg.mozilla.org/mozilla-central/rev/36f4e8a8b953
So far there has been only one crash in beta 12: https://crash-stats.mozilla.com/report/index/7e4efd36-0f4b-412f-9051-582712110206 It has the Cisco VPN tag.
FWIW, after seeing daily+ panorama-related crashes between Jan 15 and 30th, 2011, I've seen zero since. I'm guessing they went away when panorama transitions were converted to use css.
AdapterVendorID: 1002, AdapterDeviceID: 5e4f, AdapterDriverVersion: 8.552.0.0
curve 40e4c00000000000 c090000000000000, 40e54d4000000000 c090000000000000, 40e5c00000000000 c05a800000000000, 40e5c00000000000 4090000000000000
crv-f: 42496,000000 -1024,000000, 43626,000000 -1024,000000, 44544,000000 -106,000000, 44544,000000 1024,000000
I just realized that the types passed to StoreSpline don't actually match the types passed in. So the data might take some interpretation here.
(In reply to comment #29)
> I just realized that the types passed to StoreSpline don't actually match the
> types passed in. So the data might take some interpretation here.
The actual inputs seem to be something like:
(166., -4.) (170.410625., -4.) (174., -0.4140625) (173.95703125, 4)
I don't see anything particularly interesting about those co-ordinates.
Attached patch Improve debug logging
This fixes the type problem and adds a bit more debugging info.
Attachment #510674 - Flags: review?(jdaggett)
Attachment #510674 - Flags: review?(ehsan)
Attachment #510674 - Flags: review?(ehsan) → review+
There have only been two crashes (both with Cisco VPN) on beta 11 so far, so I'm demoting this to a softblocker.
Whiteboard: [hardblocker] → [softblocker]
Attachment #510674 - Flags: review?(jdaggett) → approval2.0+
(In reply to comment #30)
> (In reply to comment #29)
> > I just realized that the types passed to StoreSpline don't actually match the
> > types passed in. So the data might take some interpretation here.
>
> The actual inputs seem to be something like:
> (166., -4.) (170.410625., -4.) (174., -0.4140625) (173.95703125, 4)
>
> I don't see anything particularly interesting about those co-ordinates.
Given that this version of the code went out with beta11, what's the calculation for determining the correct coordinates from the debug output included in the crashdump? Or did you directly analyze the crashdump file?
As of 02/09/2011 16:51:28, beta11 has 8 crashreports, all Cisco VPN.
(In reply to comment #34)
> (In reply to comment #30)
> > (In reply to comment #29)
> > > I just realized that the types passed to StoreSpline don't actually match the
> > > types passed in. So the data might take some interpretation here.
> >
> > The actual inputs seem to be something like:
> > (166., -4.) (170.410625., -4.) (174., -0.4140625) (173.95703125, 4)
> >
> > I don't see anything particularly interesting about those co-ordinates.
>
> Given that this version of the code went out with beta11, what's the
> calculation for determining the correct coordinates from the debug output
> included in the crashdump? Or did you directly analyze the crashdump file?
The values in beta11 are cairo's fixed-point integers converted to doubles. You can convert back to the original doubles by taking the value and dividing it by 256 (the 8-bit fractional part of the fixed-point representation).
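As a worked version of that conversion, here is a small sketch. It assumes the 64-bit hex words on the earlier "curve" line are raw IEEE-754 bit patterns of the logged doubles (they do match the "crv-f" values when decoded that way) and that the scale is 256 as stated above. The names `bits_to_double`, `fixed_to_coord`, and `FIXED_FRAC_BITS` are made up for illustration and are not cairo identifiers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption from the comment above: 8 fractional bits, i.e. a scale of 256. */
#define FIXED_FRAC_BITS 8

/* Reinterpret a 64-bit hex word from the annotation as an IEEE-754 double. */
double bits_to_double(uint64_t bits)
{
    double value;
    assert(sizeof value == sizeof bits);
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Undo the fixed-point scaling to get back the original coordinate. */
double fixed_to_coord(double fixed)
{
    return fixed / (double)(1 << FIXED_FRAC_BITS);
}
```

For instance, `bits_to_double(0x40e4c00000000000ULL)` decodes to 42496, and `fixed_to_coord(42496.0)` is 166. By the same arithmetic 43626 maps to 170.4140625 and 44544 maps to exactly 174, so the figures quoted from comment #30 look like approximations of those values.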
Looks like there are other ways the tolerance value is getting whacked apart from Cisco VPN libs:
https://crash-stats.mozilla.com/report/index/62aebcb3-4e7a-4e47-af72-1403f2110210
curve 41250c9c 6a7eec8, 5 5, 2952ac0 0, 0 0
crv-crash(0,000000): 41250c9c 6a7eec8, 41250c9b 6a7eec7, 41250c9a 6a7eec6, 41250c99 6a7eec5
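To make the failure mode concrete, here is a minimal sketch of why a bad tolerance would produce this kind of stack. It is a simplified model and not cairo's actual code: `point_t`, `spline_t`, `split_spline`, `error_squared`, `decompose`, and `MAX_DEPTH` are all hypothetical names, it works on doubles rather than fixed-point values, and the flatness estimate is a simple stand-in. The recursion only stops once the error drops below the squared tolerance, so a tolerance of zero or NaN can never be satisfied. The sketch rejects such values up front and also caps the depth. If the value in parentheses after "crv-crash" is the tolerance (a guess on my part), it would be the zero case.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified types; the real code uses fixed-point knots. */
typedef struct { double x, y; } point_t;
typedef struct { point_t a, b, c, d; } spline_t;

#define MAX_DEPTH 32 /* hypothetical recursion cap */

point_t midpoint(point_t p, point_t q)
{
    point_t m = { (p.x + q.x) / 2.0, (p.y + q.y) / 2.0 };
    return m;
}

/* Split s at t = 0.5 (de Casteljau): s1 is the first half, s2 the second. */
void split_spline(const spline_t *s, spline_t *s1, spline_t *s2)
{
    point_t ab = midpoint(s->a, s->b);
    point_t bc = midpoint(s->b, s->c);
    point_t cd = midpoint(s->c, s->d);
    point_t abbc = midpoint(ab, bc);
    point_t bccd = midpoint(bc, cd);
    point_t mid = midpoint(abbc, bccd);

    s1->a = s->a; s1->b = ab;   s1->c = abbc; s1->d = mid;
    s2->a = mid;  s2->b = bccd; s2->c = cd;   s2->d = s->d;
}

/* Squared distance between p and the point a + t * (d - a) on the chord. */
double chord_dist_sq(point_t p, point_t a, point_t d, double t)
{
    double dx = p.x - (a.x + t * (d.x - a.x));
    double dy = p.y - (a.y + t * (d.y - a.y));
    return dx * dx + dy * dy;
}

/* Simple flatness estimate: how far b and c sit from the chord's
 * 1/3 and 2/3 points. */
double error_squared(const spline_t *s)
{
    double eb = chord_dist_sq(s->b, s->a, s->d, 1.0 / 3.0);
    double ec = chord_dist_sq(s->c, s->a, s->d, 2.0 / 3.0);
    return eb > ec ? eb : ec;
}

/* Counts the line segments needed; returns false instead of recursing
 * forever when the tolerance is unusable or the depth cap is hit. */
bool decompose(const spline_t *s, double tolerance_sq, int depth, int *segments)
{
    assert(s != NULL && segments != NULL);

    /* A zero, negative, or NaN tolerance can never be satisfied. */
    if (isnan(tolerance_sq) || tolerance_sq <= 0.0)
        return false;
    if (depth > MAX_DEPTH)
        return false;

    if (error_squared(s) < tolerance_sq) {
        (*segments)++;
        return true;
    }

    spline_t s1, s2;
    split_spline(s, &s1, &s2);
    return decompose(&s1, tolerance_sq, depth + 1, segments)
        && decompose(&s2, tolerance_sq, depth + 1, segments);
}
```

Without the two early returns, a call with a zero or NaN tolerance keeps splitting until the stack overflows, which would match the repeated `_cairo_spline_decompose_into` frames ending in `_de_casteljau`. A depth cap like this is the "wallpaper" kind of fix: it stops the crash without explaining how the tolerance got corrupted.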
In reply to comment 26
> I'm guessing they went away when panorama transitions were converted to use
> css.
You're right, because it is now the #170 top crasher in 4.0b11 and the #167 top crasher in 3.6.13. I think it can be closed as WORKSFORME, as there is no longer a spike. The only applicable bug is now bug 435756.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
Sorry, this is still causing crashes in situations where Cisco VPNs are not involved. Until we can prove that this is equivalent to other bugs I think this should stay open.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Summary: Spike in crashes [@ _de_casteljau ] due to infinite recursion of [@ _cairo_spline_decompose_into] Panorama → Crashes [@ _de_casteljau ] due to infinite recursion of [@ _cairo_spline_decompose_into] without using Cisco VPN
Whiteboard: [softblocker] → [softblocker][approved-patches-landed]
I'm not sure tagging "approved patches landed" is right here; those patches provide better debugging and do not constitute any change that will fix/prevent/reduce the problem, so metrics that assume "patches landed" == "almost fixed" will not be correct.
In this case, [approved-patches-landed] just means "stay out of beltzner's query." ;)
** PRODUCT DRIVERS PLEASE NOTE ** This bug is one of 7 automatically changed from blocking2.0:final+ to blocking2.0:.x during the endgame of Firefox 4 for the following reasons: - it was marked as a soft blocking issue without a requirement for beta coverage
blocking2.0: final+ → .x+
(In reply to comment #26)
> FWIW, after seeing daily+ panorama-related crashes between Jan 15 and 30th,
> 2011,
> I've seen zero since.
Got a couple crashes overnight with the b13pre equivalent of 4.0RC1.
Crash Signature: [@ _de_casteljau ] [@ _cairo_spline_decompose_into]
Crash Signature: [@ _de_casteljau ] [@ _cairo_spline_decompose_into] → [@ _de_casteljau ] [@ _cairo_spline_decompose_into]
We're still getting a lot of crashes from this. We should continue the work.
I see this crash on a regular basis in Thunderbird trunk on Linux (Fedora 17 x86_64) on a locally connected display, so I am changing the platform to "All" -- this is not Windows specific. I also see it periodically in Firefox on the same platform.
OS: Windows 7 → All
(In reply to Benoit Girard (:BenWa) from comment #44)
> We're still getting a lot of crashes from this. We should continue the work.
FWIW, afaict for thunderbird
- _cairo_spline_decompose_into doesn't exist in current releases
- _de_casteljau 65% of crashes in past month are two most recent releases + ESR
https://crash-stats.mozilla.com/report/list?product=Thunderbird&query_search=signature&query_type=exact&query=_de_casteljau&reason_type=contains&date=07%2F13%2F2012%2013%3A55%3A11&range_value=4&range_unit=weeks&hang_type=any&process_type=all&do_query=1&signature=_de_casteljau
Whiteboard: [softblocker][approved-patches-landed] → [softblocker][approved-patches-landed][tbird crash]
(In reply to Wayne Mery (:wsmwk) from comment #46)
> (In reply to Benoit Girard (:BenWa) from comment #44)
> > We're still getting a lot of crashes from this. We should continue the work.
>
> FWIW, afaict for thunderbird
> - _cairo_spline_decompose_into doesn't exist in current releases
Um, I just got a crash in _cairo_spline_decompose_into this morning in a Thunderbird trunk build dated July 9.
(In reply to Jonathan Kamens from comment #47)
> (In reply to Wayne Mery (:wsmwk) from comment #46)
> > (In reply to Benoit Girard (:BenWa) from comment #44)
> > > We're still getting a lot of crashes from this. We should continue the work.
> >
> > FWIW, afaict for thunderbird
> > - _cairo_spline_decompose_into doesn't exist in current releases
>
> Um, I just got a crash in _cairo_spline_decompose_into this morning in a
> Thunderbird trunk build dated July 9.
Thanks. In that case, it's just not showing in crash-stats. _cairo_spline_decompose_into is rare for Thunderbird on crash-stats:
* only 18 crashes in 4 months
* half are 3.x release
* none recorded for version 12 or newer.
v6 bp-44b28371-9a24-4dd5-acd8-d064b2120517
v7 bp-26e595f4-c090-4bed-b044-b3bcb2120523
v11 bp-2d1d4831-a90a-40c5-beae-176152120326
This still gets reported with Firefox, Fennec, and Thunderbird but at extremely low volume.
Whiteboard: [softblocker][approved-patches-landed][tbird crash] → [softblocker][approved-patches-landed][tbird crash][gfx-noted]

Closing because no crashes reported for 12 weeks.

Status: REOPENED → RESOLVED
Closed: 14 years ago → 6 years ago
Resolution: --- → WORKSFORME