Bug 435756 - Crash [@ _de_casteljau] due to infinite recursion of [@ _cairo_spline_decompose_into] when using cisco vpn
Status: REOPENED
Whiteboard: [sg:dos]
Keywords: crash, user-doc-needed
Product: Core
Classification: Components
Component: Graphics
Version: 1.9.1 Branch
Platform: x86 Windows XP
Importance: -- critical with 11 votes
Target Milestone: ---
Assigned To: Jeff Muizelaar [:jrmuizel]
Duplicates: 433609 458130 465344 471846 502955
Depends on: 515197
Blocks: 626994
Reported: 2008-05-26 06:46 PDT by Mats Palmgren (:mats)
Modified: 2015-10-22 13:54 PDT (History)
CC List: 32 users
dveditz: wanted1.9.0.x+

Attachments
Upstream patch to fix this problem (773 bytes, patch)
2009-07-23 10:45 PDT, Jeff Muizelaar [:jrmuizel]
joe: review+
debug log (2.12 KB, text/plain)
2009-09-24 11:28 PDT, huomenta
no flags
patch, detect and log tolerance == NaN occurrences (20.48 KB, patch)
2009-12-21 18:54 PST, John Daggett (:jtd)
no flags
Debug log output from jdaggett@mozilla.com-badtolerancelogging2 tryserver build (31.94 KB, text/plain)
2009-12-22 01:55 PST, Tobias Kunze
no flags
Zip of debugevents.out (41.05 KB, application/zip)
2010-01-13 07:21 PST, David Dillard
no flags
Another trace for John (40.87 KB, application/zip)
2010-01-14 07:11 PST, David Dillard
no flags

Description Mats Palmgren (:mats) 2008-05-26 06:46:50 PDT
Crash [@ _de_casteljau] due to infinite recursion of
[@ _cairo_spline_decompose_into].

Currently at #82 on the "Top Crashers for Firefox 3.0" list, with 284
crashes in the past 2 weeks.  All those crashes are on Windows.
(it's #64 for Windows).

Example crash reports:
bp-e902c5b8-2769-11dd-a5b8-0013211cbf8a
bp-0cb6bee3-25ba-11dd-bca0-0013211cbf8a
bp-6fda3d66-2a4a-11dd-93ee-001a4bd46e84
bp-2a222bc7-25b3-11dd-9514-0013211cbf8a

Stack:
_de_casteljau                 mozilla/gfx/cairo/cairo/src/cairo-spline.c:167
_cairo_spline_decompose_into  mozilla/gfx/cairo/cairo/src/cairo-spline.c:255 
_cairo_spline_decompose_into  mozilla/gfx/cairo/cairo/src/cairo-spline.c:257
          ... repeat a few thousand times ...
_cairo_spline_decompose_into  mozilla/gfx/cairo/cairo/src/cairo-spline.c:257
_cairo_spline_decompose_into  mozilla/gfx/cairo/cairo/src/cairo-spline.c:261
_cairo_spline_decompose       mozilla/gfx/cairo/cairo/src/cairo-spline.c:278
_cairo_filler_curve_to        mozilla/gfx/cairo/cairo/src/cairo-path-fill.c:132
_cairo_path_fixed_interpret   mozilla/gfx/cairo/cairo/src/cairo-path-fixed.c:524
_cairo_path_fixed_fill_to_traps	mozilla/gfx/cairo/cairo/src/cairo-path-fill.c:185
_cairo_surface_fallback_fill  mozilla/gfx/cairo/cairo/src/cairo-surface-fallback.c:898
_cairo_surface_fill           mozilla/gfx/cairo/cairo/src/cairo-surface.c:1626
_cairo_gstate_fill            mozilla/gfx/cairo/cairo/src/cairo-gstate.c:1015
_moz_cairo_fill_preserve      mozilla/gfx/cairo/cairo/src/cairo.c:2177
gfxContext::Fill              mozilla/gfx/thebes/src/gfxContext.cpp:136
FillFastBorderPath            mozilla/layout/base/nsCSSRendering.cpp:1574
DrawBorderSides               mozilla/layout/base/nsCSSRendering.cpp:2209
DrawBorders                   mozilla/layout/base/nsCSSRendering.cpp:2629
nsCSSRendering::PaintBorder   mozilla/layout/base/nsCSSRendering.cpp:2836 
...


http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/gfx/cairo/cairo/src/cairo-spline.c&rev=1.14&mark=255,257,261#261

http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/gfx/cairo/cairo/src/cairo-spline.c&rev=1.14&root=/cvsroot&mark=154,161#153
Comment 1 timeless 2008-05-27 03:02:25 PDT
bug 433609 has a reporter.
Comment 2 Mats Palmgren (:mats) 2008-05-28 06:39:04 PDT
I filed a bug at freedesktop.org too, in case it affects more than Firefox:
https://bugs.freedesktop.org/show_bug.cgi?id=16116
Comment 3 timeless 2008-05-28 08:08:26 PDT
*** Bug 433609 has been marked as a duplicate of this bug. ***
Comment 4 chris hofmann 2008-10-01 15:30:47 PDT
*** Bug 458130 has been marked as a duplicate of this bug. ***
Comment 5 chris hofmann 2008-10-01 15:31:35 PDT
looks like it has moved up to #37 in early fx 3.0.3 data
Comment 6 Vladimir Vukicevic [:vlad] [:vladv] 2008-10-01 15:42:48 PDT
Any idea what sites this is showing up on?
Comment 7 Martijn Wargers [:mwargers] (not working for Mozilla) 2008-10-01 16:38:15 PDT
See comments: http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&version=Firefox%3A3.0.3&signature=_de_casteljau
Nothing really stands out, it seems.
Comment 8 Mats Palmgren (:mats) 2008-10-08 08:47:55 PDT
It's been fixed upstream for Cairo 1.8 it seems:
http://bugs.freedesktop.org/show_bug.cgi?id=16116#c1
Comment 9 timeless 2008-11-24 05:44:17 PST
*** Bug 465344 has been marked as a duplicate of this bug. ***
Comment 10 Mats Palmgren (:mats) 2008-12-25 17:23:32 PST
Based on http://bugs.freedesktop.org/show_bug.cgi?id=16116#c1
I'm guessing it's this that fixed it:
http://cgit.freedesktop.org/cairo/commit/?id=8c0ff8b5856a8a7cb61dffaad7d72ed2dcdb5cf3
which we have in trunk and 1.9.1, but not in 1.9.0

-> WORKSFORME
Comment 11 timeless 2009-01-02 02:01:31 PST
*** Bug 471846 has been marked as a duplicate of this bug. ***
Comment 12 timeless 2009-01-02 02:06:02 PST
given that we know what fixed this (cairo 1.8) and could find the exact fix, i think we should consider trying to get this into a 1.9.0.x release
Comment 13 Marcus Christie 2009-01-05 08:54:43 PST
I'm still getting this crash on Firefox 3.1b2 on Windows

http://crash-stats.mozilla.com/report/index/6ec4cc3e-4cec-4fcf-8221-61b122090105
Comment 15 Dave Anderson 2009-04-30 09:41:17 PDT
This problem is making Firefox just about useless for me. I can't depend on FF running for a whole day at work, and have gone back to FF 3.0.5 since it seems to crash about once per day instead of several times per day. I have not sent every possible crash report but I intend to do so in the future, and enter this bug number in the comments. I have tried running in Safe Mode, and I've created a new profile, copying over only saved passwords and bookmarks (and run that profile in safe mode as well). Most if not all of the crashes below are due to this problem.

64ca1335-3357-4190-a87d-2b1c22090429 4/29/2009 4:57 PM
a9c38fda-dd5e-44eb-991a-c61292090428 4/28/2009 10:55 AM
6567d6d8-16d1-4466-b2cf-cb8082090424 4/24/2009 1:23 AM
52b79fff-1432-48ba-ae65-f21ff2090422 4/22/2009 9:29 AM
28f59dbd-e531-47bc-8298-227c92090421 4/21/2009 9:39 AM
236d282c-7f57-401c-a727-871062090421 4/21/2009 8:39 AM
239d28a2-6172-48d8-b22a-3f5aa2090417 4/17/2009 10:28 AM
c8aba42e-92b9-4906-ad7d-987d22090417 4/17/2009 10:23 AM
b314cb64-073f-4621-9a2f-9ff8f2090417 4/17/2009 8:49 AM
e0bd7e93-818a-4a36-97eb-b12b62090416 4/16/2009 11:24 PM
baf8010f-535b-4feb-bd32-1799e2090416 4/16/2009 9:48 PM
6cc862d9-bb24-4185-a9da-1a1102090415 4/15/2009 2:51 PM
0a1ff274-9f78-4c13-a077-2f29f2090415 4/15/2009 2:46 PM
a768f2d9-90bb-4513-a2a1-3b9ed2090415 4/15/2009 1:54 PM
5b09b697-df51-4aa1-adcb-95a042090415 4/15/2009 1:37 PM
96ce5775-1658-40da-a22f-6322c2090414 4/14/2009 8:29 AM
063468c3-a83d-408a-9eb6-d1ada2090410 4/10/2009 11:39 AM
ab670b5b-d9dc-4d4a-aba4-909ef2090408 4/8/2009 10:21 AM
89160ca0-91bd-4ad5-9c52-870422090408 4/8/2009 9:28 AM
0d57bd96-e59c-42f6-b135-f0cc82090407 4/7/2009 12:09 AM
f8ef229c-4c62-499a-8024-388762090406 4/6/2009 6:42 PM
b2e8cd27-6aa1-437a-906f-0c4752090406 4/6/2009 6:42 PM
5698ca58-d786-41cc-983a-43b722090406 4/6/2009 3:37 PM
67486e22-973c-4cdc-863b-0a2f72090401 4/1/2009 9:34 AM
3548681d-5a99-4d6c-9550-bfa172090331 3/31/2009 8:38 PM
36bd4d1b-ad60-4f9b-aff0-69ed12090331 3/31/2009 8:37 PM
21976e21-1ce8-4c9b-9b45-435b12090328 3/28/2009 7:41 PM
3fa648e2-8010-4d67-8cbc-1f1c72090328 3/28/2009 7:37 PM
113174eb-3ed8-48b7-8308-8f98d2090328 3/28/2009 7:16 PM
cb184be5-2bcb-4c83-b6e3-9b08e2090327 3/27/2009 10:15 AM
d5dbbec2-152b-4eec-8247-3cc072090327 3/27/2009 9:23 AM
3c5d9cd5-a770-4a97-8f73-8b1cd2090326 3/26/2009 9:13 AM
bfcf4e6f-7567-4b53-bf46-6dcaa2090325 3/25/2009 11:13 PM
056a77d2-d5cf-4ef1-b616-f21972090325 3/25/2009 3:08 PM
77447c95-27da-4faf-b4b8-d03d22090325 3/25/2009 10:26 AM
5078e138-138a-4eab-be08-3f3992090325 3/25/2009 9:58 AM
245dfbac-0cc0-4b6e-9582-54ef12090324 3/24/2009 11:28 PM
d430b9a9-920c-41fc-b60d-b12ac2090318 3/18/2009 3:02 PM
1113fa1f-5d22-438d-8bff-292f62090318 3/18/2009 8:06 AM
6b16a46b-ab4b-483f-ae74-70bfd2090317 3/17/2009 11:02 AM
e89f4b2a-ddca-44b0-ab56-1ef612090317 3/17/2009 11:01 AM
bf21b47b-5bb7-410b-9f05-98fc82090317 3/17/2009 9:35 AM
8c68e83e-6a0a-4026-a4d3-125252090311 3/11/2009 6:14 PM
ec81274f-bbb8-4b9e-8bf3-882172090311 3/11/2009 2:24 PM
f8f30f3c-d2f0-4001-b0ec-db8f42090311 3/11/2009 2:23 PM
5841a55a-4444-4f8e-8f30-d0e3a2090310 3/10/2009 8:19 PM
68054a7c-609e-41f7-906e-7de232090310 3/10/2009 7:56 PM
139247cd-661d-41cb-9187-1554f2090310 3/10/2009 7:44 PM
Comment 16 timeless 2009-04-30 17:34:48 PDT
dave: i'm sorry, but your paste wasn't helpful. we never claimed it was fixed for 3.0.x, there was a fix for what would be 3.5, however it's clear that it isn't fixed. note that we generally don't fix things on branches until after we've fixed them on trunk.

bp-10a88457-e723-42f2-b9c8-5c7f92090420
Firefox 3.6a1pre Crash Report [@ _de_casteljau ]
Crash Reason	EXCEPTION_STACK_OVERFLOW

mats: could you please rebug the freedesktop people? thanks.
Comment 17 timeless 2009-07-07 15:39:06 PDT
*** Bug 502955 has been marked as a duplicate of this bug. ***
Comment 18 Mikel Waxler 2009-07-23 09:07:29 PDT
I am going to have to mirror Dave's comments. My IT is still working on rolling out something newer than IE6, so having access to FF is crucial for me. Here are my crash logs; as you can see, this happens several times a day.

Is there any way this can get bumped up? Everyone at my office just deals with this all day long. 

Thanks for all the hard work.

8cd02f9a-35f7-42d7-a90b-046452090723	7/23/2009	11:46 AM
bb3a8254-9bb8-4730-86da-be5b92090723	7/23/2009	11:35 AM
aa3657ef-abd0-429d-85b8-1ee5a2090723	7/23/2009	9:45 AM
e3422aab-b897-4854-9fe6-0aa952090722	7/22/2009	2:56 PM
2609a8d4-727f-41fc-bd9d-5c2e92090722	7/22/2009	2:05 PM
481d743d-08a3-4929-ac64-1da2e2090722	7/22/2009	10:35 AM
(vacation)
d93e2210-1f74-4544-836d-5c0d72090710	7/10/2009	12:09 PM
65812348-7ae9-4596-a9a7-3b8172090709	7/9/2009	1:34 PM
8c6b4591-f44a-4529-b975-cba132090707	7/7/2009	10:06 AM
0268800a-e8cc-4686-9856-a6b612090702	7/2/2009	7:57 PM
4e4cda7a-a6a9-4d39-8793-f5bcc2090702	7/2/2009	4:42 PM
0907c489-6b67-400b-adfc-582572090702	7/2/2009	10:37 AM
03f20086-da2f-4cd2-937e-3b7df2090630	6/30/2009	2:48 PM
a34c950f-1742-40fb-aad7-5bc9d2090629	6/29/2009	7:15 PM
345eeda8-b6b1-4628-af04-2456f2090629	6/29/2009	7:09 PM
c407efc4-ef2e-4dd0-a766-b5bb72090629	6/29/2009	6:29 PM
6e2899f9-973f-4f73-afe8-cac752090629	6/29/2009	6:29 PM
d7fd4118-c100-4d7c-8349-5775b2090629	6/29/2009	6:29 PM
d2a21d8d-7949-4989-8082-8c3d32090626	6/26/2009	10:46 PM
daaeb74d-e829-4293-9dc0-0850b2090626	6/26/2009	4:25 PM
bddafff6-dc50-43ef-a3b6-99db62090626	6/26/2009	3:16 PM
e12b0f94-262a-448c-a511-7dde52090626	6/26/2009	12:08 PM
64038b51-9414-4951-b86a-c43932090626	6/26/2009	11:25 AM
6661f9cb-5569-4fd4-b667-0236b2090626	6/26/2009	11:24 AM
ac7a5341-6959-406e-8522-99ee22090623	6/23/2009	1:15 PM
96bbafb1-f77e-49fb-88a5-70ad02090619	6/19/2009	11:24 PM
e68f7cce-dc1a-4772-bcc3-e88432090619	6/19/2009	2:56 PM
678ab82b-ecfe-4366-a9e3-f647a2090619	6/19/2009	1:51 PM
3d1281bf-a8a6-4eb7-81db-991262090618	6/18/2009	5:26 PM
d6d45efe-f124-4cc5-9496-b470d2090618	6/18/2009	3:45 PM
0c745165-7f4c-42ee-8799-9e52a2090618	6/18/2009	12:51 PM
73b2176c-5e3c-4670-ae35-481272090617	6/17/2009	12:06 PM
188f10a1-0f39-4559-9bd3-180652090617	6/17/2009	11:21 AM
a8def83a-c155-4d37-8392-2577e2090612	6/12/2009	3:00 PM
b3e9c8e9-b10a-473e-80aa-7bbc12090612	6/12/2009	2:57 PM
b1db3cb2-4496-40c5-a7a8-736412090612	6/12/2009	10:16 AM
6a73ab1c-cb0d-418e-a5c5-d6a442090608	6/8/2009	3:24 PM
81404735-804c-4b2e-b7db-627d82090608	6/8/2009	3:01 PM
ed076222-440d-4851-8d68-4fc942090608	6/8/2009	3:01 PM
5471962c-45ce-481f-a3c6-000df2090605	6/5/2009	2:06 PM
f731dd44-4f0f-47a0-b64c-bfca62090605	6/5/2009	1:19 PM
72a62a43-4f13-499e-b363-2c8422090601	6/1/2009	5:41 PM
b25520ae-7ec3-4ae0-a4e5-2f0202090529	5/29/2009	4:01 PM
e2cf1556-866e-4591-8128-d235f2090529	5/29/2009	1:03 PM
dd0b1517-196b-4e1d-abfc-978202090529	5/29/2009	1:02 PM
f574133c-3a15-4f72-8964-e8e092090529	5/29/2009	11:49 AM
ecddf680-f86f-4dcf-b812-204702090529	5/29/2009	11:48 AM
ba6d897c-3236-48e9-9072-516052090528	5/28/2009	12:01 PM
e46cd668-cac5-4af3-914b-46cbf2090528	5/28/2009	12:01 PM
a434e742-90d7-44e3-8ec9-d94182090526	5/26/2009	2:47 PM
e0208f1c-e22f-42c8-ac8b-7f09c2090526	5/26/2009	11:20 AM
2a8fb500-f8d4-4847-8c33-9a4f52090522	5/22/2009	3:50 PM
66d3dcc7-adb3-4f0b-ac20-bff982090521	5/21/2009	4:50 PM
f96b8a1d-b138-4e74-8b36-140b42090521	5/21/2009	2:04 PM
7e62852d-250e-4c68-880c-8d73f2090521	5/21/2009	11:43 AM
24315ba9-e11c-4dd6-af06-1fe752090520	5/20/2009	12:00 PM
643db6c6-ac03-4f46-b92f-116902090518	5/18/2009	3:48 PM
ac681783-5ac0-497e-876a-764bd2090518	5/18/2009	11:24 AM
8bdec44d-9a92-406d-a059-f67592090512	5/12/2009	2:50 PM
ca6136a6-463e-4f58-a703-8160f2090511	5/11/2009	5:00 PM
4b3fb2c5-0ed2-4138-a128-ee14b2090511	5/11/2009	4:37 PM
e7bafabf-575f-4750-8c83-9f8cc2090507	5/7/2009	9:55 AM
1047792b-a948-42c6-a80b-d9daa2090506	5/6/2009	2:23 PM
744d86c1-abc9-4c19-aed2-1403d2090504	5/4/2009	2:52 PM
d191dc54-0f9d-496f-a47f-bc05b2090504	5/4/2009	2:48 PM
975c31e2-1fc5-4845-b0e8-c5f622090504	5/4/2009	2:47 PM
6e0e7b32-aa94-45dc-8b3e-145eb2090504	5/4/2009	9:49 AM
f36321db-2c17-44a7-a746-4b40b2090429	4/29/2009	3:15 PM
afdb438d-e8a6-435b-b1a7-617252090429	4/29/2009	11:01 AM
5880f10c-584d-4144-85f0-d83aa2090428	4/28/2009	3:10 PM
bd1624c4-91fd-4e2b-87e9-515882090428	4/28/2009	12:11 PM
e68910c9-9ddd-4538-8831-00e722090428	4/28/2009	12:10 PM
6e167878-14a6-41d0-9df9-cc5bb2090427	4/27/2009	4:59 PM
7ce41cc9-bc58-479c-9dd0-266532090427	4/27/2009	3:01 PM
25f6b455-d95e-47d5-88e1-434c02090427	4/27/2009	2:36 PM
57e31331-2711-4b11-a7a8-cd4442090427	4/27/2009	9:50 AM
5289c39c-0cdb-4973-ba29-d8c1f2090427	4/27/2009	9:22 AM
bdbda4e5-d676-43de-a7ea-2b9962090424	4/24/2009	1:04 PM
6640ae8e-dc3a-46a8-bb81-cbc602090423	4/23/2009	6:27 PM
eecffa71-d4e2-49a5-ac59-773f82090423	4/23/2009	10:44 AM
5176f143-2c15-4199-8977-dd4212090423	4/23/2009	9:28 AM
13ed8c3b-081a-4507-811d-2c8c82090423	4/23/2009	9:11 AM
c50dbc97-5e53-4930-8655-5d5592090422	4/22/2009	4:25 PM
45a4f160-ccc0-4ed7-b756-d92cf2090422	4/22/2009	4:25 PM
954c07f1-4681-449d-8d28-180232090417	4/17/2009	12:48 PM
d43dea93-065e-4e69-b9b8-e252f2090417	4/17/2009	12:02 PM
d1400766-d644-4d56-b5b0-b962c2090417	4/17/2009	11:20 AM
38c2a9ce-8621-4800-a0ee-a918f2090417	4/17/2009	11:13 AM
0b5d001a-c1db-4c7d-91be-aded92090416	4/16/2009	12:42 PM
c7cde4a6-a817-48fb-9e40-a13092090415	4/15/2009	4:56 PM
82721973-c7fa-4813-a75f-ad89b2090413	4/13/2009	12:39 PM
1583a1be-44df-4f4b-99f8-b5bab2090410	4/10/2009	4:12 PM
93d31bb8-3bfd-4003-b1f4-fc9122090410	4/10/2009	3:02 PM
83e8875e-3131-49e8-a643-2104c2090410	4/10/2009	2:13 PM
f325bf48-043a-4464-bfdd-2e12f2090410	4/10/2009	2:00 PM
319fc578-afa3-4292-b42e-f647a2090410	4/10/2009	1:51 PM
1158f58d-6042-42f2-b114-751b52090410	4/10/2009	12:55 PM
1afd7fa1-5e2f-44ff-a059-c73182090410	4/10/2009	9:28 AM
20526fbb-3acc-4c92-a57b-a6f552090409	4/9/2009	5:08 PM
480c806f-9253-40a3-90c0-a0d622090409	4/9/2009	3:57 PM
c9351580-1a47-460f-80af-357fa2090409	4/9/2009	3:10 PM
11f0a631-9962-41ee-80ab-e83b62090409	4/9/2009	10:38 AM
68089eac-b090-4bce-87ce-346b32090409	4/9/2009	10:29 AM
1c9d02d2-843f-4649-ac39-7dfd02090409	4/9/2009	10:27 AM
588a561f-5564-40cc-999c-6aca22090409	4/9/2009	9:20 AM
71c60e7b-5b6a-4955-b8f3-77eb22090409	4/9/2009	9:01 AM
29d25dfd-fc15-4df0-9aee-8ed372090408	4/8/2009	10:37 AM
473d088a-79fc-4238-8cfb-208c12090408	4/8/2009	10:20 AM
0851cddb-9436-49fd-9be6-18c072090407	4/7/2009	11:47 AM
4b9c9556-d96a-4051-a9b4-7adc02090407	4/7/2009	11:47 AM
fdeb0b83-edd0-4bc7-a0c4-d33d02090407	4/7/2009	9:54 AM
6f6d1f88-47f8-4f8e-85fc-07f902090406	4/6/2009	4:37 PM
10f406a2-082f-41b4-8d50-ea7422090406	4/6/2009	1:54 PM
2f684b66-ceac-468e-af8b-07efe2090406	4/6/2009	1:52 PM
6a6be241-80b3-40d2-9d5d-9c86c2090406	4/6/2009	9:34 AM
9e1149e2-bf76-4f20-8b6f-8c9ef2090406	4/6/2009	9:33 AM
10a321e3-a0e7-4512-81dc-3114d2090406	4/6/2009	9:33 AM
be973d32-b54e-43a4-8898-c11a52090403	4/3/2009	2:27 PM
bbee63a2-77b9-43b0-8c0d-b6a112090403	4/3/2009	2:23 PM
e89cdb0c-bdac-4ff6-baa3-9ce082090403	4/3/2009	12:59 PM
a61f4558-570d-4898-bca9-850792090403	4/3/2009	12:56 PM
727d0c86-fac4-436b-949b-e69522090403	4/3/2009	11:12 AM
461a841a-aedf-44b7-8566-cba9f2090402	4/2/2009	2:10 PM
ae522093-5162-4e59-8355-765432090402	4/2/2009	2:01 PM
ccd40fcf-1b5d-4aed-bafb-c037b2090402	4/2/2009	11:51 AM
c87aa3a6-5b06-4c50-8526-4d1172090402	4/2/2009	11:51 AM
07d4a11e-2ff0-491d-b91d-fb6232090402	4/2/2009	10:30 AM
Comment 19 Jeff Muizelaar [:jrmuizel] 2009-07-23 10:45:42 PDT
Created attachment 390265
Upstream patch to fix this problem
Comment 20 Daniel Veditz [:dveditz] 2009-08-31 15:16:53 PDT
Is the "version" field on this bug correct? If this affects trunk are we going to land this fix there as well, or are we planning on upgrading the entire libcairo? Normally we'd like things fixed on trunk and verified before taking them on the stable branches, unless they're not applicable.
Comment 21 chris hofmann 2009-08-31 15:56:59 PDT
not sure I'm getting trunk data in my crash analysis report yet, but here are the versions we got related crash on during 2009 08 30

distribution of all versions where the _de_casteljau crash was found on 20090830-crashdata.csv
  24 Firefox 3.5.2
  18 Firefox 3.0.13
   2 Firefox 3.5
   2 Firefox 3.0.1
   1 Firefox 3.0.11

distribution of all versions where the _cairo_spline_decompose_into crash was found on 20090830-crashdata.csv
   3 Firefox 3.5.2
   1 Firefox 3.0.13
Comment 22 Daniel Veditz [:dveditz] 2009-09-02 15:57:49 PDT
Comment on attachment 390265
Upstream patch to fix this problem

needs answer to comment 20 before we can approve this for the 1.9.1 branch.
Comment 23 Jeff Muizelaar [:jrmuizel] 2009-09-08 08:49:10 PDT
(In reply to comment #20)
> Is the "version" field on this bug correct? If this affects trunk are we going
> to land this fix there as well, or are we planning on upgrading the entire
> libcairo? Normally we'd like things fixed on trunk and verified before taking
> them on the stable branches, unless they're not applicable.

The bug was already fixed on trunk with an upgrade of libcairo, so the problem only occurs on 1.9.0 and 1.9.1.
Comment 24 Mats Palmgren (:mats) 2009-09-08 09:40:57 PDT
But the patch is already present on the 1.9.1 branch:
http://mxr.mozilla.org/mozilla1.9.1/source/gfx/cairo/cairo/src/cairo.c

AFAICT, it was added 2008-11-05 as part of the upgrade to cairo 1.8.2:

changeset:   21379:ce976e0708ab
user:        Vladimir Vukicevic <vladimir@pobox.com>
date:        Wed Nov 05 23:48:23 2008 -0800
summary:     b=462938, Upgrade cairo to 1.8.2 and pixman to 0.12.0 [cairo piece]

So it should be fixed in Firefox 3.5(.x), but since we're still
getting crash reports it seems cairo 1.8.2 didn't fix it?

FWIW, a crash in 3.5:
bp-5296cef3-67bd-4f99-9de9-430c72090904
and in 3.5.1:
bp-44cc3795-79db-49bf-aae5-b0a272090907
and in 3.5.3 (a candidate build presumably):
bp-93d17ac9-2152-48af-940a-00e3a2090908
Comment 25 Jeff Muizelaar [:jrmuizel] 2009-09-08 10:30:37 PDT
(In reply to comment #24)
> But the patch is already present on the 1.9.1 branch:
> http://mxr.mozilla.org/mozilla1.9.1/source/gfx/cairo/cairo/src/cairo.c
> 
> AFAICT, it was added 2008-11-05 as part of the upgrade to cairo 1.8.2:
> 
> changeset:   21379:ce976e0708ab
> user:        Vladimir Vukicevic <vladimir@pobox.com>
> date:        Wed Nov 05 23:48:23 2008 -0800
> summary:     b=462938, Upgrade cairo to 1.8.2 and pixman to 0.12.0 [cairo
> piece]
> 
> So it should be fixed in Firefox 3.5(.x), but since we're still
> getting crash reports it seems cairo 1.8.2 didn't fix it?
>

Indeed. So perhaps there's a different problem here.
Comment 26 Mats Palmgren (:mats) 2009-09-08 10:37:56 PDT
I reopened the Cairo bug:
http://bugs.freedesktop.org/show_bug.cgi?id=16116#c4
Comment 27 Mikel Waxler 2009-09-08 10:39:34 PDT
Here is a crash report for 3.5.2:
http://crash-stats.mozilla.com/report/index/bbdd74dd-458d-44d3-8e00-3380f2090904

Incidentally, this seems to happen exclusively when connected via VPN. I have not had any trouble since 7/30, but the first day I worked from home and used my VPN, I had 4 crashes in 1 day. I hope that helps.

bbdd74dd-458d-44d3-8e00-3380f2090904	9/4/2009	5:41 PM
0b4af0e7-bc72-4759-a7a8-1953c2090904	9/4/2009	5:40 PM
566aaa0b-a8dd-4fa6-9291-83bf32090904	9/4/2009	12:42 PM
521e93c3-9eba-4b75-9e17-b1dc72090904	9/4/2009	12:42 PM
b5933090-7f0a-4348-a0e5-2f0822090904	9/4/2009	10:14 AM
4735d72f-b692-417d-8b2b-1cf2a2090730	7/30/2009	11:22 AM
Comment 28 Jeff Muizelaar [:jrmuizel] 2009-09-08 10:59:00 PDT
(In reply to comment #27)
> Here is a crash report for 3.5.2:
> http://crash-stats.mozilla.com/report/index/bbdd74dd-458d-44d3-8e00-3380f2090904
> 

These crash reports don't have useful stacks. I've filed bug 515197 to get that fixed. In the meantime, is there a particular web page that causes the crashes?
Comment 29 Mikel Waxler 2009-09-08 11:24:06 PDT
No, it is definitely not always Google Reader, but I would say that that page is one of the most frequent offenders.
Comment 30 David Dillard 2009-09-08 11:46:54 PDT
There's definitely more to it than just the web site.  I get this crash mostly on an internal web site, but it doesn't happen every time or even most times.  I don't think I've ever had it crash on Google Reader and I use that application many times every day (on three different OSs).
Comment 31 Jeff Muizelaar [:jrmuizel] 2009-09-10 18:15:53 PDT
The processor has been fixed to give more useful results. So if someone can link to a crash more recent than now, that would be helpful.
Comment 32 David Dillard 2009-09-19 09:01:37 PDT
Jeff, take a look at this crash: http://crash-stats.mozilla.com/report/index/6af176d3-15e9-4b42-8a1b-3984f2090908

It might have happened after the issue with the processor was fixed.
Comment 33 huomenta 2009-09-21 14:57:45 PDT
I've upgraded to Namoroka (3.6a2) to get rid of this crash. It used to crash several times per day with 3.0.x and 3.5.x. Now I'm getting pretty much the same crashing pattern with Namoroka. The signature is in _cairo_spline_error_squared, but infinite recursion is in the same _cairo_spline_decompose_into.

http://crash-stats.mozilla.com/report/index/bp-1785696e-7367-43c6-a221-d22e72090921
http://crash-stats.mozilla.com/report/index/bp-ec15a481-c89e-4ccb-83f6-c325f2090921
http://crash-stats.mozilla.com/report/index/bp-c14407b2-b97f-43d9-9189-d7bda2090921

Looks like it was not fixed in 1.9.2.

Also, once I saw comment https://bugzilla.mozilla.org/show_bug.cgi?id=435756#c29, it occurred to me that here too it crashes most of the time (if not always, I have not tracked it long enough) when connected via Cisco VPN (4.8.00.0440).
Comment 34 Jeff Muizelaar [:jrmuizel] 2009-09-21 15:10:35 PDT
Does anyone have a repeatably crashing webpage or instruction for how to repeatedly reproduce the crash? Having the actual curve data that's causing the infinite recursion would be tremendously helpful.
Comment 35 Mats Palmgren (:mats) 2009-09-21 17:42:00 PDT
We could make a special build with logging of the needed data and
kindly ask people to use it?  Several people in this bug report
frequent crashes and might be interested in volunteering for that.
Comment 36 huomenta 2009-09-22 00:22:35 PDT
(In reply to comment #35)
> We could make a special build with logging of the needed data and
> kindly ask people to use it?  Several people in this bug reports
> frequent crashes and might be interested in volunteering for that.

I'm willing to try it if you make one.
Comment 37 huomenta 2009-09-22 00:36:05 PDT
(In reply to comment #34)
> Does anyone have a repeatably crashing webpage or instruction for how to
> repeatedly reproduce the crash? Having the actual curve data that's causing the
> infinite recursion would be tremendously helpful.

I've tried to isolate such a webpage, but could not. However, it is certain that once it has crashed the first time, it usually keeps crashing during session restore. See #27 and #33. One way to get rid of the crash is a clean start. The other is to let the browser rest for a while, not touching the mouse or keyboard in the browser window until all restored tabs are fully loaded. This usually helps to avoid an immediate crash on restore.
Comment 38 David Dillard 2009-09-22 07:24:53 PDT
(In reply to comment #35)
> We could make a special build with logging of the needed data and
> kindly ask people to use it?  Several people in this bug reports
> frequent crashes and might be interested in volunteering for that.

I'm willing to try as well.
Comment 39 pfaffflo 2009-09-23 03:52:37 PDT
(In reply to comment #35)
> We could make a special build with logging of the needed data and
> kindly ask people to use it?  Several people in this bug reports
> frequent crashes and might be interested in volunteering for that.

I'm willing to try as well. 

Had about 4-5 crashes only today. btw I am also using the Cisco VPN (4.8.00.0440) mentioned above (comment 33)
Comment 40 Mats Palmgren (:mats) 2009-09-23 18:33:50 PDT
Here's Firefox 3.6x builds with some logging:
https://build.mozilla.org/tryserver-builds/mpalmgren@mozilla.com-435756-192-log/
Instructions:
1. start the build from a console window
2. make it crash
3. save the console output to a file and attach the file to this bug
Comment 41 Jeff Muizelaar [:jrmuizel] 2009-09-23 19:49:05 PDT
I also have a trunk build at https://build.mozilla.org/tryserver-builds/jmuizelaar@mozilla.com-spline-decompose-log/ that should write out a "spline-deep.log" file just before crashing that people can try out.
Comment 42 David Dillard 2009-09-24 10:35:12 PDT
Here's the trace data from an apparent crash using the build specified in comment #40:

lvl=1 cairo_spline_error_squared=1.371582e+000
s(0012E538)={
    x       y
a: -1280 -1024
b: -1704 -1024
c: -2048 -680
d: -2048 -256
}
lvl=2 cairo_spline_error_squared=1.004943e-001
s(0012E538)={
    x       y
a: -1280 -1024
b: -1492 -1024
c: -1684 -938
d: -1823 -799
}
lvl=3 cairo_spline_error_squared=6.636616e-003
s(0012E538)={
    x       y
a: -1280 -1024
b: -1386 -1024
c: -1487 -1003
d: -1579 -964
}
lvl=4 cairo_spline_error_squared=4.027062e-004
s(0012E538)={
    x       y
a: -1280 -1024
b: -1333 -1024
c: -1385 -1019
d: -1435 -1009
}
lvl=5 cairo_spline_error_squared=1.601807e-005
s(0012E538)={
    x       y
a: -1280 -1024
b: -1307 -1024
c: -1333 -1023
d: -1359 -1021
}
lvl=6 cairo_spline_error_squared=0.000000e+000
s(0012E538)={
    x       y
a: -1280 -1024
b: -1294 -1024
c: -1307 -1024
d: -1320 -1024
}
lvl=7 cairo_spline_error_squared=0.000000e+000
s(0012E538)={
    x       y
a: -1280 -1024
b: -1287 -1024
c: -1294 -1024
d: -1301 -1024
}
lvl=8 cairo_spline_error_squared=0.000000e+000
s(0012E538)={
    x       y
a: -1280 -1024
b: -1284 -1024
c: -1288 -1024
d: -1292 -1024
}
lvl=9 cairo_spline_error_squared=0.000000e+000
s(0012E538)={
    x       y
a: -1280 -1024
b: -1282 -1024
c: -1284 -1024
d: -1286 -1024
}
lvl=10 cairo_spline_error_squared=0.000000e+000
s(0012E538)={
    x       y
a: -1280 -1024
b: -1281 -1024
c: -1282 -1024
d: -1283 -1024
}
lvl=11 cairo_spline_error_squared=0.000000e+000
s(0012E538)={
    x       y
a: -1280 -1024
b: -1281 -1024
c: -1282 -1024
d: -1283 -1024
}

// Continues on with same values to:

lvl=9872 tolerance_squared=-1.#IND00e+000
lvl=9873 cairo_spline_error_squared=0.000000e+000
s(0012E538)={
    x       y
a: -1280 -1024
b: -1281 -1024
c: -1282 -1024
d: -1283 -1024
}
lvl=9873 tolerance_squared=-1.#IND00e+000
lvl=9874 cairo_spline_error_squared=0.000000e+000
s(0012E538)={
    x       y
a: -1280 -1024
b: -1281 -1024
c: -1282 -1024
d: -1283 -1024
}
lvl=9874 tolerance_squared=-1.#IND00e+000


I have the full trace file if it's useful.

I'll try the build in comment #41 now.
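
The trace above stops changing at lvl=10: the sub-spline passed down to the next level is identical to the one that came in, so further subdivision cannot make progress. A minimal sketch of how that can happen, assuming the midpoints are computed on integer fixed-point coordinates with rounding toward negative infinity (the type and function names below are hypothetical and are not cairo's own code):

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a fixed-point point and a cubic spline. */
typedef struct { int x, y; } point_t;
typedef struct { point_t a, b, c, d; } spline_t;

/* Halve v, rounding toward negative infinity (assumed rounding mode).
 * Written without >> so it does not depend on how the compiler shifts
 * negative integers. */
static int floor_half(int v)
{
    return (v >= 0) ? v / 2 : -((-v + 1) / 2);
}

/* Point halfway from p to q, using the rounding above. */
static point_t lerp_half(point_t p, point_t q)
{
    point_t r;
    r.x = p.x + floor_half(q.x - p.x);
    r.y = p.y + floor_half(q.y - p.y);
    return r;
}

/* First half of one de Casteljau split at t = 1/2. */
static spline_t first_half(spline_t s)
{
    point_t ab   = lerp_half(s.a, s.b);
    point_t bc   = lerp_half(s.b, s.c);
    point_t cd   = lerp_half(s.c, s.d);
    point_t abbc = lerp_half(ab, bc);
    point_t bccd = lerp_half(bc, cd);
    point_t mid  = lerp_half(abbc, bccd);

    spline_t h;
    h.a = s.a;
    h.b = ab;
    h.c = abbc;
    h.d = mid;
    return h;
}

/* True when all four control points match. */
static bool same_spline(spline_t p, spline_t q)
{
    return p.a.x == q.a.x && p.a.y == q.a.y &&
           p.b.x == q.b.x && p.b.y == q.b.y &&
           p.c.x == q.c.x && p.c.y == q.c.y &&
           p.d.x == q.d.x && p.d.y == q.d.y;
}
```

Under that rounding assumption, feeding the lvl=9 control points to first_half gives the lvl=10 points, and feeding the lvl=10 points gives the same points back. Once the recursion reaches that spline, only the termination test can stop it.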
Comment 43 huomenta 2009-09-24 10:41:44 PDT
(In reply to comment #40)
> Here's Firefox 3.6x builds with some logging:
> https://build.mozilla.org/tryserver-builds/mpalmgren@mozilla.com-435756-192-log/
> Instructions:
> 1. start the build from a console window
> 2. make it crash
> 3. save the console output to a file and attach the file to this bug

I've tried your build. It crashed with VPN, but no console output was produced.
There were several crashes in session restore but no output to the console window.
The familiar crash reporter dialog was not shown either. Instead, the Windows
application crash dialog was displayed. I don't know how to extract any crash
info from it.

Just for reference, I've reproduced this crash using session manager restore in
my nightly build of Namoroka. Not surprisingly, the crash was in the same place.

http://crash-stats.mozilla.com/report/index/bp-c85e583c-4ec1-4a86-8dcb-f403f2090924

Then I disconnected the VPN and restored the session several times in the nightly
build and in your test build. No crash.
Comment 44 huomenta 2009-09-24 11:01:19 PDT
(In reply to comment #41)
> I also have a trunk build at
> https://build.mozilla.org/tryserver-builds/jmuizelaar@mozilla.com-spline-decompose-log/
> that should write out a "spline-deep.log" file just before crashing that people
> can try out.

Reproduced crash from my comment #43, now with build from comment #41. It crashed 5 times all restored tabs were loaded. Again no output to console and no crash reporter was activated, instead windows crash dialog displayed.

The log was small, so I'm including it inline (with comments).


crash #1
deep: fffff700 ffffff00, fffff700 fffffccb, fffff8cb fffffb00, fffffb00 fffffb00

crash #2
deep: fffff700 ffffff00, fffff700 fffffccb, fffff8cb fffffb00, fffffb00 fffffb00

crash #3
deep: 0 100, 8d 100, 100 173, 100 200
deep: 0 100, 8d 100, 100 173, 100 200
deep: 0 100, 8d 100, 100 173, 100 200

crash #4
deep: 0 100, 8d 100, 100 173, 100 200
deep: 0 100, 8d 100, 100 173, 100 200
deep: 0 100, 8d 100, 100 173, 100 200

crash #5
deep: 0 100, 8d 100, 100 173, 100 200
deep: 0 100, 8d 100, 100 173, 100 200
deep: 0 100, 8d 100, 100 173, 100 200
Comment 45 huomenta 2009-09-24 11:28:22 PDT
Created attachment 402616
debug log

More crashes, full spline-deep.log attached.
Comment 46 Mats Palmgren (:mats) 2009-09-24 18:49:26 PDT
(In reply to comment #42)

Thanks David, this is really helpful.

> tolerance_squared=-1.#IND00e+000

-1.#IND00e+000 is what you get if you printf a NaN with %e on VC/win32,
on Linux it prints as "nan".  I'm pretty sure the error is that the
'tolerance' given to _cairo_spline_decompose is NaN.
tolerance_squared = tolerance * tolerance  which in that case is also a NaN.
The test in _cairo_spline_decompose_into:
    if (_cairo_spline_error_squared (s1) < tolerance_squared) {
will always be false if tolerance_squared is NaN => infinite recursion.

I'm making a new test build with better logging of 'tolerance'...
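
The reasoning above can be sketched in a few lines of C. This is a toy model only, not cairo's code: the function names are hypothetical, the error is simply divided by 16 at each level as an illustrative assumption, and a depth cap is added so the demonstration stays finite. The guarded variant shows one possible mitigation and is not meant to represent the fix applied upstream.

```c
#include <math.h>
#include <stdbool.h>

/* Termination test with the same shape as the check quoted above.
 * Any ordered comparison involving NaN is false, so this never
 * returns true when tolerance_squared is NaN. */
static bool flat_enough_naive(double error_squared, double tolerance_squared)
{
    return error_squared < tolerance_squared;
}

/* Hypothetical guarded variant: a NaN tolerance stops the recursion. */
static bool flat_enough_guarded(double error_squared, double tolerance_squared)
{
    if (isnan(tolerance_squared))
        return true;
    return error_squared < tolerance_squared;
}

/* Toy recursion: returns the depth at which subdivision stops.
 * max_depth exists only to keep the demonstration finite. */
static int subdivide_depth(double error_squared, double tolerance,
                           int depth, int max_depth, bool guarded)
{
    double tolerance_squared = tolerance * tolerance; /* NaN * NaN is NaN */
    bool stop = guarded
        ? flat_enough_guarded(error_squared, tolerance_squared)
        : flat_enough_naive(error_squared, tolerance_squared);

    if (stop || depth >= max_depth)
        return depth;

    /* Assumed shrink factor per subdivision level, for illustration. */
    return subdivide_depth(error_squared / 16.0, tolerance,
                           depth + 1, max_depth, guarded);
}
```

With a tolerance of 0.1 the toy recursion stops after a couple of levels. With a NaN tolerance the naive test never succeeds, even once the error reaches 0.0, and only the depth cap ends the recursion. The real code has no such cap, so it runs until the stack overflows.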
Comment 47 Mats Palmgren (:mats) 2009-09-24 19:42:13 PDT
https://build.mozilla.org/tryserver-builds/mpalmgren@mozilla.com-435756-wip3-192/
(Windows and OSX builds should be ready soon)
Comment 48 Jeff Muizelaar [:jrmuizel] 2009-09-24 20:42:50 PDT
Likewise, I have a build with additional logging at:
https://build.mozilla.org/tryserver-builds/jmuizelaar@mozilla.com-spline-decompose-log3/

The tolerance being NaN feels like the problem might be memory corruption. Afaik, we don't change the tolerance from its default value in Firefox. This may also explain why the problem doesn't seem to be attached to a particular page.
Comment 49 pfaffflo 2009-09-29 01:32:57 PDT
Had two crashes using the build from comment #48 while working over VPN. No crashes when I worked directly in the office.

Crash reporter did not open but found entries in spline-deep.log

deep(0x1,47ae14p-7): f100 fffffe00, f100 fffffce5, f1e5 fffffc00, f300 fffffc00 -- s1: f100 fffffe00, f100 fffffdb9, f10e fffffd75, f127 fffffd37
deep(0x1,47ae14p-7): f100 fffffe00, f100 fffffce5, f1e5 fffffc00, f300 fffffc00 -- s1: f127 fffffd37, f141 fffffcfa, f167 fffffcc3, f195 fffffc95
deep(0x1,47ae14p-7): f100 fffffe00, f100 fffffce5, f1e5 fffffc00, f300 fffffc00 -- s1: f195 fffffc95, f1f2 fffffc39, f272 fffffc00, f300 fffffc00
deep(0x1,47ae14p-7): f100 fffffe00, f100 fffffce5, f1e5 fffffc00, f300 fffffc00 -- s1: f100 fffffe00, f100 fffffdb9, f10e fffffd75, f127 fffffd37
deep(0x1,47ae14p-7): f100 fffffe00, f100 fffffce5, f1e5 fffffc00, f300 fffffc00 -- s1: f127 fffffd37, f141 fffffcfa, f167 fffffcc3, f195 fffffc95
deep(0x1,47ae14p-7): f100 fffffe00, f100 fffffce5, f1e5 fffffc00, f300 fffffc00 -- s1: f195 fffffc95, f1f2 fffffc39, f272 fffffc00, f300 fffffc00
deep(0x1,47ae14p-7): ed00 fffffb00, ef35 fffffb00, f100 fffffccb, f100 ffffff00 -- s1: ed00 fffffb00, ee1a fffffb00, ef1a fffffb72, efd3 fffffc2b
deep(0x1,47ae14p-7): ed00 fffffb00, ef35 fffffb00, f100 fffffccb, f100 ffffff00 -- s1: f0af fffffd70, f0c9 fffffdad, f0dd fffffded, f0ea fffffe30
deep(0x1,47ae14p-7): ed00 fffffb00, ef35 fffffb00, f100 fffffccb, f100 ffffff00 -- s1: f0ea fffffe30, f0f8 fffffe73, f100 fffffeb9, f100 ffffff00
deep(0x1,47ae14p-7): fffff700 ffffff00, fffff700 fffffccb, fffff8cb fffffb00, fffffb00 fffffb00 -- s1: fffff700 ffffff00, fffff700 fffffccb, fffff8cb fffffb00, fffffb00 fffffb00
Comment 50 Jeff Muizelaar [:jrmuizel] 2009-09-29 12:15:45 PDT
The results of the last test don't really make sense to me. Here's another build to see if we can get more information:

https://build.mozilla.org/tryserver-builds/jmuizelaar@mozilla.com-spline-decompose-log4/
Comment 51 huomenta 2009-09-29 15:31:50 PDT
(In reply to comment #50)
> The results of the last test don't really make sense to me. Here's another
> build to see if we can get more information:
> 
> https://build.mozilla.org/tryserver-builds/jmuizelaar@mozilla.com-spline-decompose-log4/

Here are 3 crashes with your build and VPN connected (3 log lines per crash).

deep(998:-0x1,#IND00p+0)=0,000000: e500 fffffb00, e735 fffffb00, e900 fffffccb, e900 ffffff00 -- s1: e500 fffffb00, e500 fffffb00, e500 fffffb00, e500 fffffb00
deep(999:-0x1,#IND00p+0)=0,000000: e500 fffffb00, e735 fffffb00, e900 fffffccb, e900 ffffff00 -- s1: e500 fffffb00, e500 fffffb00, e500 fffffb00, e500 fffffb00
deep(1000:-0x1,#IND00p+0)=0,000000: e500 fffffb00, e735 fffffb00, e900 fffffccb, e900 ffffff00 -- s1: e500 fffffb00, e500 fffffb00, e500 fffffb00, e500 fffffb00
deep(998:-0x1,#IND00p+0)=0,000000: 35d00 fffffc00, 35b58 fffffc00, 35a00 fffffd58, 35a00 ffffff00 -- s1: 35d00 fffffc00, 35cff fffffc00, 35cfe fffffc00, 35cfd fffffc00
deep(999:-0x1,#IND00p+0)=0,000000: 35d00 fffffc00, 35b58 fffffc00, 35a00 fffffd58, 35a00 ffffff00 -- s1: 35d00 fffffc00, 35cff fffffc00, 35cfe fffffc00, 35cfd fffffc00
deep(1000:-0x1,#IND00p+0)=0,000000: 35d00 fffffc00, 35b58 fffffc00, 35a00 fffffd58, 35a00 ffffff00 -- s1: 35d00 fffffc00, 35cff fffffc00, 35cfe fffffc00, 35cfd fffffc00
deep(998:-0x1,#IND00p+0)=0,000000: 2cf00 fffffe00, 2cf00 fffffd73, 2cf73 fffffd00, 2d000 fffffd00 -- s1: 2cf4a fffffd4a, 2cf4a fffffd49, 2cf4a fffffd48, 2cf4a fffffd47
deep(999:-0x1,#IND00p+0)=0,000000: 2cf00 fffffe00, 2cf00 fffffd73, 2cf73 fffffd00, 2d000 fffffd00 -- s1: 2cf4a fffffd4a, 2cf4a fffffd49, 2cf4a fffffd48, 2cf4a fffffd47
deep(1000:-0x1,#IND00p+0)=0,000000: 2cf00 fffffe00, 2cf00 fffffd73, 2cf73 fffffd00, 2d000 fffffd00 -- s1: 2cf4a fffffd4a, 2cf4a fffffd49, 2cf4a fffffd48, 2cf4a fffffd47
Comment 52 David Dillard 2009-10-07 07:42:04 PDT
Finally got another crash.  To be honest, I don't remember which build I was running.  I'm guessing you guys can figure it out based on the output.

Most of the time there are short stretches like this:

lvl=1 cairo_spline_error_squared=2.435493e+000
lvl=2 cairo_spline_error_squared=1.776404e-001
lvl=3 cairo_spline_error_squared=1.155082e-002
lvl=4 cairo_spline_error_squared=7.046496e-004
lvl=4 tolerance_squared=1.000000e-002 tolerance=1.000000e-001

When it goes off the rails, so to speak, we see this:

lvl=1 cairo_spline_error_squared=6.110306e-001
lvl=2 cairo_spline_error_squared=4.436425e-002
lvl=3 cairo_spline_error_squared=2.483176e-003
lvl=4 cairo_spline_error_squared=1.211542e-004
lvl=4 tolerance_squared=-1.#IND00e+000 tolerance=1.000000e-001
lvl=5 cairo_spline_error_squared=1.356052e-005
lvl=5 tolerance_squared=-1.#IND00e+000 tolerance=1.000000e-001
lvl=6 cairo_spline_error_squared=3.584446e-006
lvl=6 tolerance_squared=-1.#IND00e+000 tolerance=1.000000e-001
lvl=7 cairo_spline_error_squared=3.788389e-006
lvl=7 tolerance_squared=-1.#IND00e+000 tolerance=1.000000e-001
lvl=8 cairo_spline_error_squared=0.000000e+000
lvl=8 tolerance_squared=-1.#IND00e+000 tolerance=1.000000e-001
lvl=9 cairo_spline_error_squared=0.000000e+000
lvl=9 tolerance_squared=-1.#IND00e+000 tolerance=1.000000e-001
lvl=10 cairo_spline_error_squared=0.000000e+000
lvl=10 tolerance_squared=-1.#IND00e+000 tolerance=1.000000e-001

...

lvl=5357 cairo_spline_error_squared=0.000000e+000
lvl=5357 tolerance_squared=-1.#IND00e+000 tolerance=1.000000e-001


The entire file is around 55MB, I can zip it up and upload if you want.
Comment 53 Jeff Muizelaar [:jrmuizel] 2009-11-03 07:15:47 PST
So the cause of this seems to be memory corruption of some sort...
The fact that it only happens during VPN usage is very strange. Can I get more information on the VPN software you're using?
Comment 54 David Dillard 2009-11-03 07:18:23 PST
I've definitely seen occasions where this happened and VPN was not in use (though it is installed).

The VPN I'm currently using is Cisco Systems VPN Client Version 4.8.02.0010
Comment 55 Jeff Muizelaar [:jrmuizel] 2009-11-03 07:23:12 PST
I've seen some reports that upgrading to version 5 of the vpn software helps...
Comment 56 Jeff Muizelaar [:jrmuizel] 2009-11-03 07:24:42 PST
Also, if there's anyone who doesn't have the vpn software installed and sees these crashes, that would be good to hear about too.
Comment 57 huomenta 2009-11-03 08:00:07 PST
(In reply to comment #54)
> I've definitely seen occasions where this happened and VPN was not in use
> (though it is installed).
> 
> The VPN I'm currently using is Cisco Systems VPN Client Version 4.8.02.0010


For me it happened only with Cisco VPN 4.8.00.0440 *connected*.
Comment 58 Mikel Waxler 2009-11-03 08:01:01 PST
Cisco Systems VPN Client Version 4.8.01.0300

I had not seen this issue in weeks until I worked via VPN the other day. FF crashed within minutes of using it. I am going to go ahead and say this issue ONLY occurs for me when on the Cisco VPN.
Comment 59 Dave Anderson 2009-11-03 10:37:01 PST
I had the same experience last week while working remotely on the east coast. Constant crashes, even in safe mode. Interestingly this happened while I was in PA logged into Pittsburg HQ, but not when I was in NC logged into Raleigh HQ. If there is some way I can query those VPN servers for details of interest WRT this bug let me know. Last crash before that was end of July, as you can see from about:crashes info posted below. Cisco VPN 4.6.02.0011, IPSEC over TCP. I'll check with IT and see if I can get a later version.

bp-8593c996-0c7d-4647-a863-269072091029 10/29/2009 2:40 PM
bp-103623d2-75cd-4c16-a152-d64b22091029 10/29/2009 11:00 AM
bp-c259f730-19f9-4a95-a0ed-3c79d2091029 10/29/2009 9:08 AM
bp-5824f5ac-9475-47e7-ae4c-a27de2091029 10/29/2009 9:08 AM
bp-c62529e4-7009-4c9c-abba-9d6a92091029 10/29/2009 9:04 AM
bp-08150ed5-11fd-47b7-b500-40c862091029 10/29/2009 8:57 AM
bp-ff526a86-8a47-44d8-8956-f986a2091029 10/29/2009 8:56 AM
bp-f08029f2-1844-4152-b772-bbc892091028 10/28/2009 6:48 AM
bp-ab71e498-9131-4d51-a29a-048742091027 10/27/2009 9:56 PM
bp-21d3379b-74aa-45fa-ad62-42b8b2091027 10/27/2009 9:53 PM
bp-1f28d64d-6d0c-486f-bb56-d89642091027 10/27/2009 9:45 PM
bp-c76e4283-29a6-4660-a89e-59bb72091027 10/27/2009 8:58 PM
bp-246bde6a-4dfc-411d-afb7-bca252091027 10/27/2009 8:51 PM
bp-cc72321a-7110-458e-9351-31a9d2091027 10/27/2009 7:54 PM
bp-6c57c230-bd17-4c69-8028-aa2972090731 7/31/2009 12:04 PM
Comment 60 huomenta 2009-11-03 11:58:57 PST
(In reply to comment #55)
> I've seen some reports that upgrading to version 5 of the vpn software helps...

Actually I've fixed crashes by upgrading to Cisco VPN 5 (5.0.04.0300). No crashes since for 3 weeks already.
Comment 62 Jeremy 2009-12-02 14:29:44 PST
This problem has been plaguing me for months. Please let me know if I can help test any builds or anything. I'm desperate for this to be resolved. 

I'm also using Cisco VPN v4.8.

Just a few of my recent crashes:
http://crash-stats.mozilla.com/report/index/8a352d6a-283b-4ee0-b45d-d58822091201
http://crash-stats.mozilla.com/report/index/c4ddb1d9-6453-48df-995c-b56ae2091201
http://crash-stats.mozilla.com/report/pending/09de59b6-876c-45a9-96f0-44f922091130
http://crash-stats.mozilla.com/report/pending/cca385cd-fe37-4931-8aab-9b15d2091130
Comment 63 David Dillard 2009-12-02 18:48:34 PST
And another crash with these debugging results:

lvl=1 cairo_spline_error_squared=2.435493e+000
lvl=2 cairo_spline_error_squared=1.776404e-001
lvl=3 cairo_spline_error_squared=1.155082e-002
lvl=4 cairo_spline_error_squared=8.028778e-004
lvl=4 tolerance_squared=-1.#IND00e+000 tolerance=1.000000e-001
lvl=5 cairo_spline_error_squared=5.578715e-005
lvl=5 tolerance_squared=-1.#IND00e+000 tolerance=1.000000e-001
lvl=6 cairo_spline_error_squared=5.755620e-006
lvl=6 tolerance_squared=-1.#IND00e+000 tolerance=1.000000e-001
lvl=7 cairo_spline_error_squared=5.528138e-006
lvl=7 tolerance_squared=-1.#IND00e+000 tolerance=1.000000e-001
lvl=8 cairo_spline_error_squared=0.000000e+000
lvl=8 tolerance_squared=-1.#IND00e+000 tolerance=1.000000e-001
lvl=9 cairo_spline_error_squared=0.000000e+000
lvl=9 tolerance_squared=-1.#IND00e+000 tolerance=1.000000e-001
lvl=10 cairo_spline_error_squared=0.000000e+000
lvl=10 tolerance_squared=-1.#IND00e+000 tolerance=1.000000e-001
lvl=11 cairo_spline_error_squared=0.000000e+000
lvl=11 tolerance_squared=-1.#IND00e+000 tolerance=1.000000e-001
lvl=12 cairo_spline_error_squared=0.000000e+000
lvl=12 tolerance_squared=-1.#IND00e+000 tolerance=1.000000e-001
lvl=13 cairo_spline_error_squared=0.000000e+000
lvl=13 tolerance_squared=-1.#IND00e+000 tolerance=1.000000e-001
lvl=14 cairo_spline_error_squared=0.000000e+000
lvl=14 tolerance_squared=-1.#IND00e+000 tolerance=1.000000e-001

...

lvl=5357 cairo_spline_error_squared=0.000000e+000
lvl=5357 tolerance_squared=-1.#IND00e+000 tolerance=1.000000e-001


The VPN client was loaded and active when this crash occurred.
Comment 64 John Daggett (:jtd) 2009-12-17 00:37:11 PST
This infinite recursion occurs here:

static cairo_status_t
_cairo_spline_decompose_into (cairo_spline_knots_t *s1, double tolerance_squared, cairo_spline_t *result)
{
    cairo_spline_knots_t s2;
    cairo_status_t status;

    if (_cairo_spline_error_squared (s1) < tolerance_squared)
	return _cairo_spline_add_point (result, &s1->a);

    _de_casteljau (s1, &s2);

    status = _cairo_spline_decompose_into (s1, tolerance_squared, result);
    if (unlikely (status))
	return status;

    return _cairo_spline_decompose_into (&s2, tolerance_squared, result);
}

If tolerance_squared is NaN this code will stack overflow, independent of the value of s1.  I examined several recent minidumps for this bug, in all cases the value of tolerance_squared was:

  fff8 0000 0000 0000 (== NaN)

Adding '|| isnan(tolerance_squared)' to the recursion test would eliminate the stack overflow (and crash) but wouldn't fix the underlying problem.  Mozilla code always uses the default value of 0.1 for tolerance, so some piece of code is somehow whacking this value, probably in the gstate struct.
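
Roughly what that guard could look like, as a standalone sketch rather than the real cairo code. The sketch_* names are hypothetical, and the bit-pattern helper assumes IEEE 754 binary64 doubles with the same byte order as 64-bit integers (true on x86):

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Recursion test with the extra NaN check: stop subdividing when the
 * spline is flat enough OR when the tolerance can never be satisfied. */
static int
sketch_should_stop (double error_squared, double tolerance_squared)
{
    return error_squared < tolerance_squared || isnan (tolerance_squared);
}

/* Reinterpret a raw 64-bit pattern (as read from a minidump) as a double. */
static double
sketch_double_from_bits (uint64_t bits)
{
    double d;

    memcpy (&d, &bits, sizeof d);
    return d;
}
```

The pattern fff8 0000 0000 0000 decodes to a NaN under those assumptions, and sketch_should_stop returns 1 for it no matter what the error is. As said above, this only avoids the stack overflow; it doesn't explain where the NaN comes from.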
Comment 65 John Daggett (:jtd) 2009-12-17 05:31:43 PST
http://people.mozilla.com/crash_analysis/20091217/20091217_Firefox_3.5.5-core-counts.txt.gz

 _de_casteljau|EXCEPTION_STACK_OVERFLOW (104 crashes)
      0% (0/104) vs.   1% (511/78679) x86 with 0 cores
     21% (22/104) vs.  40% (31837/78679) x86 with 1 cores
     78% (81/104) vs.  54% (42730/78679) x86 with 2 cores
      0% (0/104) vs.   0% (267/78679) x86 with 3 cores
      1% (1/104) vs.   4% (3035/78679) x86 with 4 cores
      0% (0/104) vs.   0% (1/78679) x86 with 5 cores
      0% (0/104) vs.   0% (296/78679) x86 with 8 cores
      0% (0/104) vs.   0% (2/78679) x86 with 16 cores

Looks like threading may also be a factor here.
Comment 66 John Daggett (:jtd) 2009-12-17 22:51:41 PST
From http://people.mozilla.com/crash_analysis/20091215/20091215_Firefox_3.5.5-interesting-addons.txt.gz

  _de_casteljau|EXCEPTION_STACK_OVERFLOW (120 crashes)
     61% (73/120) vs.  35% (37575/106812) jqs@sun.com (Java Quick Starter, http://java.sun.com/javase/downloads/)
     34% (41/120) vs.  23% (24898/106812) {CAFEEFAC-0016-0000-0017-ABCDEFFEDCBA}
     20% (24/120) vs.   9% (9825/106812) {CAFEEFAC-0016-0000-0007-ABCDEFFEDCBA} (Java Console, http://java.sun.com/javase/downloads/)
     14% (17/120) vs.   4% (4220/106812) {77b819fa-95ad-4f2c-ac7c-486b356188a9} (IE Tab, https://addons.mozilla.org/addon/1419)
     27% (32/120) vs.  19% (20131/106812) {CAFEEFAC-0016-0000-0015-ABCDEFFEDCBA} (Java Console, http://java.sun.com/javase/downloads/)
     18% (21/120) vs.  10% (10418/106812) {CAFEEFAC-0016-0000-0011-ABCDEFFEDCBA} (Java Console, http://java.sun.com/javase/downloads/)
     16% (19/120) vs.   9% (9587/106812) {d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d} (Adblock Plus, https://addons.mozilla.org/addon/1865)

I pulled down the addon info files for the past 15 days and the totals for Java Quick Starter are

jqs@sun.com (Java Quick Starter, http://java.sun.com/javase/downloads/)
This crash: 54.2339% (807/1488) All crashes: 34.9939% (518336/1481220)

Could be a red herring but worth checking out.

http://www.java.com/en/download/help/quickstarter.xml

"JQS is enabled by default in Windows XP and Windows 2000 operating systems and is not necessary on Windows Vista as Vista offers its own pre-loading mechanisms. A process called jqs.exe will run in the background in order to allow quick statup. jqs.exe will be loaded after a Windows restart. Instructions on how to disable the JQS and the jqs.exe process are below."

Note that almost all of the stack traces for this bug are Windows XP-only.
Comment 67 chris hofmann 2009-12-18 08:54:42 PST
> Note that almost all of the stack traces for this bug are Windows XP-only.

looking at dec1-17 here are the stats I get.

 129 _cairo_spline_decompose_into 5.1.2600 Service Pack 3
  75 _cairo_spline_decompose_into 5.1.2600 Service Pack 2
   3 _cairo_spline_decompose_into 5.1.2600 Szervizcsomag 3
   1 free | _cairo_spline_decompose_into 5.1.2600 Service Pack 3

  14 _cairo_spline_decompose_into 6.1.7600
   7 _cairo_spline_decompose_into 6.0.6002 Service Pack 2
   2 _cairo_spline_decompose_into 6.0.6000
   1 _cairo_spline_decompose_into 6.1.7100
   1 _cairo_spline_decompose_into 6.0.6001 Service Pack 1

here are the links to 25 crash reports not windows xp

_cairo_spline_decompose_into 6.0.6000 http://crash-stats.mozilla.com/report/index/1957526c-4bdc-4e89-867e-9e9832091205
_cairo_spline_decompose_into 6.0.6000 http://crash-stats.mozilla.com/report/index/927bbdc1-eda0-45cb-a76d-c1ac32091206
_cairo_spline_decompose_into 6.0.6001 Service Pack 1 http://crash-stats.mozilla.com/report/index/c29040d3-5137-4799-ae74-a17422091206
_cairo_spline_decompose_into 6.0.6002 Service Pack 2 http://crash-stats.mozilla.com/report/index/0954d775-edf6-4d12-862a-490f82091209
_cairo_spline_decompose_into 6.0.6002 Service Pack 2 http://crash-stats.mozilla.com/report/index/28ac75d0-8a96-48ee-9c95-dcddb2091202
_cairo_spline_decompose_into 6.0.6002 Service Pack 2 http://crash-stats.mozilla.com/report/index/6424fd62-866c-43ca-a981-974172091203
_cairo_spline_decompose_into 6.0.6002 Service Pack 2 http://crash-stats.mozilla.com/report/index/6e1e0301-6738-49fb-9bca-165242091205
_cairo_spline_decompose_into 6.0.6002 Service Pack 2 http://crash-stats.mozilla.com/report/index/803715d1-fcd2-4531-840f-f45612091210
_cairo_spline_decompose_into 6.0.6002 Service Pack 2 http://crash-stats.mozilla.com/report/index/91ce384f-766e-4274-adf8-6ceab2091217
_cairo_spline_decompose_into 6.0.6002 Service Pack 2 http://crash-stats.mozilla.com/report/index/bb3edc52-c308-4935-b129-a322c2091205
_cairo_spline_decompose_into 6.1.7100 http://crash-stats.mozilla.com/report/index/09900dd3-856e-4e18-a9d1-d2d572091205
_cairo_spline_decompose_into 6.1.7600 http://crash-stats.mozilla.com/report/index/0b1104d5-ec2f-44a9-b2c1-f11602091214
_cairo_spline_decompose_into 6.1.7600 http://crash-stats.mozilla.com/report/index/13888c84-f0d3-4e79-9618-ba8772091207
_cairo_spline_decompose_into 6.1.7600 http://crash-stats.mozilla.com/report/index/1c24d41a-c756-479c-9697-45d692091206
_cairo_spline_decompose_into 6.1.7600 http://crash-stats.mozilla.com/report/index/43642b76-92af-4760-acb4-ac2fc2091206
_cairo_spline_decompose_into 6.1.7600 http://crash-stats.mozilla.com/report/index/52e55cab-8e52-42b2-a421-41af02091214
_cairo_spline_decompose_into 6.1.7600 http://crash-stats.mozilla.com/report/index/52f84841-7a18-4544-b0f9-96de62091213
_cairo_spline_decompose_into 6.1.7600 http://crash-stats.mozilla.com/report/index/68ebcfd4-4748-4862-885e-5729d2091201
_cairo_spline_decompose_into 6.1.7600 http://crash-stats.mozilla.com/report/index/6a0505ac-9a8e-4f98-acbe-4e2dc2091213
_cairo_spline_decompose_into 6.1.7600 http://crash-stats.mozilla.com/report/index/8ab7d2ef-69a3-4350-8113-5866c2091214
_cairo_spline_decompose_into 6.1.7600 http://crash-stats.mozilla.com/report/index/a1d442fc-b64e-451b-a985-df0a62091212
_cairo_spline_decompose_into 6.1.7600 http://crash-stats.mozilla.com/report/index/c39bc44d-b615-4464-bbf8-d22d62091210
_cairo_spline_decompose_into 6.1.7600 http://crash-stats.mozilla.com/report/index/cd39d140-c7a5-4fc4-b2c2-ff44a2091206
_cairo_spline_decompose_into 6.1.7600 http://crash-stats.mozilla.com/report/index/d3a636a8-d5d4-467f-8edc-dd8c22091207
_cairo_spline_decompose_into 6.1.7600 http://crash-stats.mozilla.com/report/index/fef84f39-f3bb-4fe7-ab09-2a36c2091207
Comment 68 John Daggett (:jtd) 2009-12-21 18:54:50 PST
Created attachment 418769 [details] [diff] [review]
patch, detect and log tolerance == NaN occurrences

Patch for use in debugging this problem.  The patch catches tolerance == NaN problems and will benignly error out.  It constantly checks the tolerance value to try and narrow down the place where the value is getting whacked.  When a NaN is detected, it dumps out recent validity checks to a logging file 'debugevents.out' in the same directory as firefox.exe.

I set up a tryserver build for this patch (Windows only):

https://build.mozilla.org/tryserver-builds/jdaggett@mozilla.com-badtolerancelogging2/

If those experiencing this problem could try out this build that would be a huge help.  Note that the previous crash won't occur so you'll need to manually check for the existence of the debug logging file.  This is built on top of the 1.9.1 branch, so it's very close to 3.5.6.
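
For anyone curious about the general approach, here is a much-simplified sketch of the idea: keep a small ring buffer of recent checkpoints and dump it when a NaN tolerance is seen. This is not the code in the attached patch; the type, function names and buffer size are all made up for illustration.

```c
#include <math.h>
#include <stdio.h>

#define SKETCH_LOG_CAPACITY 8   /* hypothetical; keeps only the last 8 events */

typedef struct {
    const char *events[SKETCH_LOG_CAPACITY];
    unsigned int next;          /* total number of events recorded so far */
} sketch_event_log_t;

/* Remember a checkpoint name, overwriting the oldest entry once full. */
static void
sketch_log_event (sketch_event_log_t *elog, const char *name)
{
    elog->events[elog->next % SKETCH_LOG_CAPACITY] = name;
    elog->next++;
}

/* Record a checkpoint and validate the tolerance.  On NaN, write the
 * retained events to 'out' (oldest first) and return 1; otherwise return 0. */
static int
sketch_check_tolerance (sketch_event_log_t *elog, const char *checkpoint,
                        double tolerance, FILE *out)
{
    unsigned int count, first, i;

    sketch_log_event (elog, checkpoint);
    if (!isnan (tolerance))
        return 0;

    sketch_log_event (elog, "tolerance_is_NaN");
    count = elog->next < SKETCH_LOG_CAPACITY ? elog->next : SKETCH_LOG_CAPACITY;
    first = elog->next - count;
    for (i = 0; i < count; i++)
        fprintf (out, "%s\n", elog->events[(first + i) % SKETCH_LOG_CAPACITY]);
    return 1;
}
```

The caller would sprinkle sketch_check_tolerance calls around save/restore/fill and similar operations, so the last few lines of the dump bracket the point where the value went bad.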
Comment 69 John Daggett (:jtd) 2009-12-21 18:58:25 PST
Example debug logging output (simulated by programmatically setting the tolerance value to NaN at random points):

2009-11-21 09:40:04.342393 UTC - Debug Events BEGIN version:1
2009-11-21 09:40:02.205451 UTC t:0080a930 obj:01207e00 context_create
2009-11-21 09:40:02.206154 UTC t:0080a930 obj:01207e00 context_destroy
2009-11-21 09:40:02.272727 UTC t:0080a930 obj:01222c00 context_create
2009-11-21 09:40:02.303114 UTC t:0080a930 obj:01222c00 context_destroy
  .
  .
  .
2009-11-21 09:40:04.342006 UTC t:0080a930 obj:01613800 check_css_paint_border_begin
2009-11-21 09:40:04.342006 UTC t:0080a930 obj:01613800 check_save_begin
2009-11-21 09:40:04.342006 UTC t:0080a930 obj:01613800 check_save_end
2009-11-21 09:40:04.342010 UTC t:0080a930 obj:01613800 check_fill_preserve_begin
2009-11-21 09:40:04.342018 UTC t:0080a930 obj:01613800 check_fill_preserve_end
2009-11-21 09:40:04.342018 UTC t:0080a930 obj:01613800 tolerance_is_NaN
2009-11-21 09:40:04.342393 UTC - Debug Events END
Comment 70 Tobias Kunze 2009-12-22 01:55:19 PST
Created attachment 418810 [details]
Debug log output from jdaggett@mozilla.com-badtolerancelogging2 tryserver build

Debug log file created using John Daggett's patched tryserver build.
Comment 71 John Daggett (:jtd) 2009-12-22 07:39:09 PST
From Tobias's log:

29996	2009-11-22 09:12:15.489000 UTC t:00728140 obj:068b1400 check_restore_begin
29997	2009-11-22 09:12:15.489000 UTC t:00728140 obj:068b1400 check_restore_end
29998	2009-11-22 09:12:15.489000 UTC t:00728140 obj:068b1400 check_restore_begin
29999	2009-11-22 09:12:15.489000 UTC t:00728140 obj:068b1400 check_restore_end
30000	2009-11-22 09:12:15.489000 UTC t:00728140 obj:068b1400 tolerance_is_NaN
30001	2009-11-22 09:12:15.489000 UTC - Debug Events END

So the tolerance value is getting corrupted during a gstate restore.

Looking at the sequence of calls up to this:

29854	2009-11-22 09:12:15.489000 UTC t:00728140 obj:068b1400 context_create
29859	2009-11-22 09:12:15.489000 UTC t:00728140 obj:068b1400 check_layout_paint_frame_begin
29860	2009-11-22 09:12:15.489000 UTC t:00728140 obj:068b1400 check_layout_paint_frame_before_paint
29861	2009-11-22 09:12:15.489000 UTC t:00728140 obj:068b1400 check_css_paint_background_begin
29864	2009-11-22 09:12:15.489000 UTC t:00728140 obj:068b1400 check_css_paint_background_end
29865	2009-11-22 09:12:15.489000 UTC t:00728140 obj:068b1400 check_css_paint_background_begin
29868	2009-11-22 09:12:15.489000 UTC t:00728140 obj:068b1400 check_css_paint_background_end
29873	2009-11-22 09:12:15.489000 UTC t:00728140 obj:068b1400 check_css_paint_background_begin
29876	2009-11-22 09:12:15.489000 UTC t:00728140 obj:068b1400 check_css_paint_background_end
29881	2009-11-22 09:12:15.489000 UTC t:00728140 obj:068b1400 check_css_paint_border_begin
29886	2009-11-22 09:12:15.489000 UTC t:00728140 obj:068b1400 check_css_paint_border_end
29893	2009-11-22 09:12:15.489000 UTC t:00728140 obj:068b1400 check_css_paint_border_begin

Within the call to nsCSSRendering::PaintBorder, the context is saved 13 times, and on the 12th restore a NaN tolerance value is set.
Comment 72 David Dillard 2010-01-13 07:21:47 PST
Created attachment 421447 [details]
Zip of debugevents.out

John, here's a zipped log file for you to take a look at.
Comment 73 Jeff Muizelaar [:jrmuizel] 2010-01-13 13:58:18 PST
Can anyone reproduce this problem with only a single core?
Comment 74 David Dillard 2010-01-14 07:06:30 PST
My old laptop was single core and I had this crash on it.
Comment 75 David Dillard 2010-01-14 07:11:21 PST
Created attachment 421628 [details]
Another trace for John

Here's another trace for you John (still multicore)
Comment 76 Jeff Muizelaar [:jrmuizel] 2010-01-19 07:49:58 PST
I've reproduced this. I'll try to debug it next.
Comment 77 Jeff Muizelaar [:jrmuizel] 2010-01-19 14:02:06 PST
I've been able to reproduce this in a recording vm with a release build. I should be able to make slow progress now.
Comment 78 Jeff Muizelaar [:jrmuizel] 2010-01-20 18:08:11 PST
It looks like this is caused by corruption of fpu state during a context switch.

This page has some people running into similar problems:
http://blog.excastle.com/2007/08/28/delphi-bug-of-the-day-fpu-stack-leak/

I don't know if there will be anything we can do about this, but I'll investigate more.
Comment 79 John Daggett (:jtd) 2010-01-20 18:36:38 PST
(In reply to comment #78)
> It looks like this is caused by corruption of fpu state during a context
> switch.
> 
> This page has some people running into similar problems:
> http://blog.excastle.com/2007/08/28/delphi-bug-of-the-day-fpu-stack-leak/
> 
> I don't know if there will be anything we can do about this, but I'll
> investigate more.

Can you explain a little more about the sequence of steps that take place when this happens?  The FPU stack is somehow trashed which causes memory in random places to get whacked?  Is this a wider problem or somehow it only affects this codepath?

Can you spot the actual piece of Cisco code that's fiddling with the FPU stack?  The offending code is in a driver?  Are there loader flags that indicate 'save my FPU stack on context switch'?  Do we know if this is a problem known to the Cisco folks?

Also, why does this only affect _cairo_spline_decompose_into?  Would a recursion limit in that routine make sense, conditionally compiled if other Cairo folks object?  Or just fix the tolerance_squared value, since Mozilla never changes that.  Not really a solution but it seems like we should be able to avoid crashing, one way or another.
Comment 80 Jeff Muizelaar [:jrmuizel] 2010-01-21 13:07:57 PST
(In reply to comment #79)
> (In reply to comment #78)
> > It looks like this is caused by corruption of fpu state during a context
> > switch.
> > 
> > This page has some people running into similar problems:
> > http://blog.excastle.com/2007/08/28/delphi-bug-of-the-day-fpu-stack-leak/
> > 
> > I don't know if there will be anything we can do about this, but I'll
> > investigate more.
> 
> Can you explain a little more about the sequence of steps that take place when
> this happens?  The FPU stack is somehow trashed which causes memory in random
> places to get whacked?  Is this a wider problem or somehow it only affects this
> codepath?

In the crash that I looked at there was no memory corruption. The bad values were only on the stack. However, the problem could easily occur anytime there is a floating point store. This could happen during gstate initialization which could explain some of the differences people were seeing in the crashes.

> 
> Can you spot the actual piece of Cisco code that's fiddling with the FPU stack?
>  The offending code is in a driver?  Are there loader flags that indicate 'save
> my FPU stack on context switch'?  Do we know if this is a problem known to the
> Cisco folks?

Nope, I haven't been able to see any cisco code yet. I suspect it is the kernel driver and I don't know if you can debug those with replay debugging. I haven't talked to Cisco folks yet. I'll try to reach out to them.

> 
> Also, why does this only affect _cairo_spline_decompose_into?  Would a
> recursion limit in that routine make sense, conditionally compiled if other
> Cairo folks object?  Or just fix the tolerance_squared value, since Mozilla
> never changes that.  Not really a solution but it seems like we should be able
> to avoid crashing, one way or another.

I don't think _cairo_spline_decompose_into is the only code affected by this, just that it's one of the more likely to show up as a crash. If my theory is correct bad floating point results could show up anywhere.

It's tough for me to try to work around this by putting a recursion limit on the routine. Doing so is sort of like trying to work around bad ram or other hardware problems. If we patch this problem, we'll just run into harder to explain bugs. I'll see if there's anything we can do to detect the problem and let users know about it.
Comment 81 David Dillard 2010-01-21 19:08:31 PST
I wonder if this might be the problem:

Kernel-mode drivers for Windows 2000 and its successors can use floating-point instructions when IRQL is less than or equal to DISPATCH_LEVEL, but must explicitly preserve the processor's floating-point state so that the caller's floating-point context is not changed. Driver functions must call KeSaveFloatingPointState before performing any floating-point operations, and must call KeRestoreFloatingPointState before returning to the caller. These functions are documented in the Windows 2000 Driver Development Kit (DDK).

http://support.microsoft.com/kb/102555

Even if the calls are in place, if a structured exception occurred prior to the restore call and no try/finally was in place then the restore would never be done.



But if this is the cause, why do we (or at least I) not see weird behavior from other apps?
Comment 82 Jeff Muizelaar [:jrmuizel] 2010-01-21 20:24:21 PST
(In reply to comment #81)
> I wonder if this might be the problem:
> 
> Kernel-mode drivers for Windows 2000 and its successors can use floating-point
> instructions when IRQL is less than or equal to DISPATCH_LEVEL, but must
> explicitly preserve the processor's floating-point state so that the caller's
> floating-point context is not changed. Driver functions must call
> KeSaveFloatingPointState before performing any floating-point operations, and
> must call KeRestoreFloatingPointState before returning to the caller. These
> functions are documented in the Windows 2000 Driver Development Kit (DDK).
> 
> http://support.microsoft.com/kb/102555
> 
> Even if the calls are in place, if a structured exception occurred prior to the
> restore call and no try/finally was in place then the restore would never be
> done.

I had a quick look at what functions the drivers import and none of them imported KeSaveFloatingPointState/KeRestoreFloatingPointState. I also did a quick disassembly of the drivers and saw a few floating point instructions but I'd need to take a closer look to confirm this.

> But if this is the cause, why do we (or at least I) not see weird behavior from
> other apps?

If you search the internet for 'cisco vpn floating point' there are some reports similar to this. I think one of the reasons that it shows up more in firefox is because firefox uses floating point while doing a large amount of network io. Further, the problem shows up as a crash because the recursion limit depends on proper floating point; this isn't as common an idiom, so most programs will only see floating point corruption and will not crash.

I was discussing the problem with Ehsan today, and we concluded that the best solution might be to add a recursion limit and somehow inform users that the vpn software interferes with the proper operation of Firefox. I'm also going to try to talk to Cisco and see if we can get confirmation of the problem.
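
A rough sketch of what a recursion limit could look like, using simplified stand-ins rather than the real cairo types. The depth cap of 32, the status codes and the flatness estimate are all assumptions for illustration; the estimate in particular is not _cairo_spline_error_squared.

```c
#include <math.h>

#define SKETCH_MAX_DEPTH 32     /* hypothetical cap on subdivision depth */

typedef enum { SKETCH_STATUS_SUCCESS = 0, SKETCH_STATUS_TOO_DEEP } sketch_status_t;
typedef struct { double x, y; } sketch_point_t;
typedef struct { sketch_point_t a, b, c, d; } sketch_knots_t;

static sketch_point_t
sketch_midpoint (sketch_point_t p, sketch_point_t q)
{
    sketch_point_t m = { (p.x + q.x) / 2, (p.y + q.y) / 2 };
    return m;
}

/* Split at t = 0.5: s1 keeps the first half, s2 receives the second half. */
static void
sketch_de_casteljau (sketch_knots_t *s1, sketch_knots_t *s2)
{
    sketch_point_t ab = sketch_midpoint (s1->a, s1->b);
    sketch_point_t bc = sketch_midpoint (s1->b, s1->c);
    sketch_point_t cd = sketch_midpoint (s1->c, s1->d);
    sketch_point_t abbc = sketch_midpoint (ab, bc);
    sketch_point_t bccd = sketch_midpoint (bc, cd);
    sketch_point_t mid = sketch_midpoint (abbc, bccd);

    s2->a = mid; s2->b = bccd; s2->c = cd; s2->d = s1->d;
    s1->b = ab; s1->c = abbc; s1->d = mid;
}

/* Simplified flatness estimate: squared distance of the inner control
 * points from where they would sit on a straight line from a to d. */
static double
sketch_error_squared (const sketch_knots_t *s)
{
    double bx = s->b.x - (2 * s->a.x + s->d.x) / 3;
    double by = s->b.y - (2 * s->a.y + s->d.y) / 3;
    double cx = s->c.x - (s->a.x + 2 * s->d.x) / 3;
    double cy = s->c.y - (s->a.y + 2 * s->d.y) / 3;
    double eb = bx * bx + by * by;
    double ec = cx * cx + cy * cy;

    return eb > ec ? eb : ec;
}

/* Same shape as _cairo_spline_decompose_into, plus a depth argument.
 * Hitting the cap returns an error that propagates up and aborts the
 * whole decomposition instead of overflowing the stack. */
static sketch_status_t
sketch_decompose (sketch_knots_t *s1, double tolerance_squared,
                  int depth, int *n_points)
{
    sketch_knots_t s2;
    sketch_status_t status;

    if (sketch_error_squared (s1) < tolerance_squared) {
        (*n_points)++;          /* stands in for _cairo_spline_add_point */
        return SKETCH_STATUS_SUCCESS;
    }
    if (depth >= SKETCH_MAX_DEPTH)
        return SKETCH_STATUS_TOO_DEEP;

    sketch_de_casteljau (s1, &s2);
    status = sketch_decompose (s1, tolerance_squared, depth + 1, n_points);
    if (status != SKETCH_STATUS_SUCCESS)
        return status;
    return sketch_decompose (&s2, tolerance_squared, depth + 1, n_points);
}
```

Returning an error at the cap, rather than silently emitting a point, matters here: with a NaN tolerance every branch would otherwise run to the full depth and the call tree would grow exponentially. The error return is also a natural place to hang whatever user notification we end up with.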
Comment 83 David Dillard 2010-01-22 05:58:16 PST
Yeah, I see a couple of places where people complain about getting unexpected NaNs and have traced the issue to Cisco VPN software.  Sadly, they go back to 2005.  Scary that this problem (assuming it really is a Cisco problem) has been around that long.

Let me know if you don't get anywhere with Cisco and I'll try to raise this issue with them through our IT org.
Comment 84 David Dillard 2010-01-22 09:53:39 PST
I disassembled the Cisco driver I'm using and I found four floating point instructions:

- fisub
- fsub
- fyl2xp1
- fdiv

And there are no imports of the save/restore floating point state Windows kernel functions.
Comment 85 David Dillard 2010-01-25 06:52:47 PST
I take it back.  While those instructions do show up in the disassembly, it appears that they're not really used as instructions.  Instead they're data for jump tables.
Comment 86 PBM 2010-03-18 09:38:43 PDT
(In reply to comment #60)
> (In reply to comment #55)
> > I've seen some reports that upgrading to version 5 of the vpn software helps...
> 
> Actually I've fixed crashes by upgrading to Cisco VPN 5 (5.0.04.0300). No
> crashes since for 3 weeks already.

Same here. After upgrading from Cisco VPN v4.7.00.0533 to v5.0.02.0090, I'm no longer experiencing the almost twice daily crashes.
Comment 87 Richard D 2010-03-24 11:42:50 PDT
I had the same problem using Firefox 3.5.7 through to 3.6.2, with several crashes per day.  Solved by upgrading to Cisco VPN Client version 5.0.05.00280.

This is reported quite a lot, e.g. http://support.mozilla.com/en-US/forum/1/278730 and http://www.google.com/search?hl=en&q=firefox+crash+"cisco+vpn" - should probably be added to the Mozilla support knowledge base, as it's the last solution I would have thought of.
Comment 88 Richard D 2010-03-26 03:29:04 PDT
(In reply to comment #82)
> I was discussing the problem with Ehsan today, and we concluded that the best
> solution might be to add a recursion limit and somehow inform users that the
> vpn software interferes with the proper operation of Firefox. 

This sounds like a great idea.  Cisco VPN software is very common in enterprises - for just one large company I'm aware of, this would affect tens of thousands of users.  Simply highlighting that this crash is probably Cisco VPN related could really improve Firefox's stability and improve perceptions of enterprise readiness.

> I'm also going to try to talk to Cisco and see if we can get confirmation of the problem.

See http://support.mozilla.com/tiki-view_forum_thread.php?locale=hu&comments_parentId=125937&forumId=1#threadId208708 for what seems like a related Cisco bug fix.
Comment 89 Richard D 2010-03-29 00:57:36 PDT
Sorry for more bug spam, but it might be good to make the recursion limit and specific error message conditional on a registry key check for Cisco VPN software - testing existence of HKEY_LOCAL_MACHINE\SOFTWARE\Cisco Systems\VPN Client should be enough.  

It seems that Cisco has fixed the bug in multiple versions, e.g. 5.0.02 and 5.0.04, so a version check would be difficult and not elegant anyway.

Hoping that other people can have this crash bug solved more quickly by prompting a Cisco VPN upgrade - it's been dogging me since late last year.
Comment 90 Mats Palmgren (:mats) 2010-04-10 17:59:21 PDT
There were 1680 /reported/ crashes in the past week.
I think we should add a wallpaper to prevent this crash.
John, will you hack one up?
Comment 91 :Ehsan Akhgari 2010-04-10 19:10:09 PDT
Would it help to keep a backup of the tolerance value in a global volatile variable and switch to using that when we see that tolerance_squared is nan?

This won't protect against all crashes of this sort, but it should at least decrease the likelihood of a user crashing.

I wrote this patch to demonstrate what I mean, but it's probably too stupid to even be attached to the bug!

diff --git a/gfx/cairo/cairo/src/cairo-spline.c b/gfx/cairo/cairo/src/cairo-spline.c
--- a/gfx/cairo/cairo/src/cairo-spline.c
+++ b/gfx/cairo/cairo/src/cairo-spline.c
@@ -171,22 +171,26 @@ _cairo_spline_error_squared (const cairo
     berr = bdx * bdx + bdy * bdy;
     cerr = cdx * cdx + cdy * cdy;
     if (berr > cerr)
 	return berr;
     else
 	return cerr;
 }
 
+static volatile double tolerance_squared_backup;
+
 static cairo_status_t
 _cairo_spline_decompose_into (cairo_spline_knots_t *s1, double tolerance_squared, cairo_spline_t *result)
 {
     cairo_spline_knots_t s2;
     cairo_status_t status;
 
+    if (unlikely (isnan (tolerance_squared)))
+        tolerance_squared = tolerance_squared_backup;
     if (_cairo_spline_error_squared (s1) < tolerance_squared)
 	return _cairo_spline_add_point (result, &s1->a);
 
     _de_casteljau (s1, &s2);
 
     status = _cairo_spline_decompose_into (s1, tolerance_squared, result);
     if (unlikely (status))
 	return status;
@@ -197,16 +201,17 @@ _cairo_spline_decompose_into (cairo_spli
 cairo_status_t
 _cairo_spline_decompose (cairo_spline_t *spline, double tolerance)
 {
     cairo_spline_knots_t s1;
     cairo_status_t status;
 
     s1 = spline->knots;
     spline->last_point = s1.a;
+    tolerance_squared_backup = tolerance * tolerance;
     status = _cairo_spline_decompose_into (&s1, tolerance * tolerance, spline);
     if (unlikely (status))
 	return status;
 
     return _cairo_spline_add_point (spline, &spline->knots.d);
 }
 
 /* Note: this function is only good for computing bounds in device space. */
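
The patch above depends on a property of IEEE 754 floating point: every ordered comparison involving NaN is false, so `error < tolerance_squared` can never terminate the recursion once the tolerance is NaN. The following standalone sketch shows that behaviour together with a slightly more defensive fallback. The helper names and the `last_resort` parameter are hypothetical and are not part of cairo; the extra fallback exists because the backup could itself have been computed from a NaN.

```c
#include <math.h>

/* Termination test in the style of the patch: stop when flat enough.
 * Returns 0 for every error value when tolerance_squared is NaN. */
int flat_enough (double error_squared, double tolerance_squared)
{
    return error_squared < tolerance_squared;
}

/* Hypothetical helper (not in cairo): pick a usable squared tolerance.
 * Prefers the passed value, then the backup, then a caller-supplied
 * constant, skipping any candidate that is NaN. */
double usable_tolerance_squared (double tolerance_squared,
                                 double backup,
                                 double last_resort)
{
    if (!isnan (tolerance_squared))
        return tolerance_squared;
    if (!isnan (backup))
        return backup;
    return last_resort;
}
```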


Also, if we want to take the approach of reporting the problem to the user, I can probably sit down with Jeff one day at the office and try to find out a way to detect the buggy driver, and then attempt to warn the user in one way or another, preferably with a hint of how to fix the problem as well (upgrading the VPN software.)
Comment 92 Richard D 2010-04-11 00:29:52 PDT
To detect the Cisco VPN software, you can just look in the registry for "HKEY_LOCAL_MACHINE\SOFTWARE\Cisco Systems\VPN Client" (contains several keys).  Checking for a specific version will be more work, as there are multiple release trains in which this has been fixed, as mentioned above.
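
A minimal sketch of such a registry check, assuming the key path quoted above is correct: the function name is made up for illustration, and the code deliberately does not read a version value or handle registry redirection on 64-bit Windows. On non-Windows builds it simply reports that the client is absent.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: returns 1 if the Cisco VPN Client registry key
 * exists, 0 otherwise (always 0 when not built for Windows). */
int cisco_vpn_client_present (void)
{
#ifdef _WIN32
    HKEY key;
    LONG rc = RegOpenKeyExA (HKEY_LOCAL_MACHINE,
                             "SOFTWARE\\Cisco Systems\\VPN Client",
                             0, KEY_READ, &key);
    if (rc != ERROR_SUCCESS)
        return 0;           /* key missing or not readable */
    RegCloseKey (key);
    return 1;
#else
    return 0;
#endif
}
```

A caller could use the result only to decide whether to mention the VPN client in an error message, since the presence of the key says nothing about whether the installed version is a fixed one.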

It is probably enough to say something like "this crash may be related to a known bug in some versions of the Cisco VPN Client - please consider upgrading to a more recent version, and specifically one that fixes Cisco bug reference CSCse77792"

Cisco bug description from comment 88 link: "Application crashes when the VPN Client is connected. AES encryption is configured for the VPN Client connection and a VPN Client version of 4.6.01.0019 or higher is in use. The issue is an optimization for the MMX processor introduced by RSA that introduces an error when used with AES."

FWIW I was using AES on the VPN connection when I had Firefox crashes.

I haven't had this crash in weeks since upgrading, and in fact can't remember the last Firefox crash.
Comment 93 John Daggett (:jtd) 2010-04-11 17:43:56 PDT
Passing recursion limit patch over to Jeff...
Comment 94 Martin Girschick 2010-07-28 10:19:00 PDT
It seems that Cisco isn't planning to release a bug fix for release 4 of Cisco VPN. Due to long rollout cycles in enterprises, this issue probably will not go away within the next 1-2 years. Therefore a workaround from Firefox's side would be much appreciated. As the patch was handed off three months ago, I thought I'd give it a mild push by adding a comment ;-).
Comment 95 st.shadow.by 2010-08-06 06:04:11 PDT
I confirm that switching Cisco VPN from version 4.8 to version 5.0 resolves the issue.
Comment 96 Matthew Middleton (:zzxc) 2010-08-18 14:07:21 PDT
According to a user on SUMO, disabling McAfee SiteAdvisor Enterprise 3.0.0.539 stopped it from crashing when logging into a cisco vpn.  Reports:

bp-eefb857f-a5b2-4a68-95bb-ca5dd2100818
bp-0690f4e1-ca21-42de-a55b-698e12100818
Comment 97 aja+bugzilla 2011-01-15 12:29:37 PST
On win/xp with latest m-c nightly, got
http://crash-stats.mozilla.com/report/index/bp-25f27f9d-d2c3-4735-95c9-57d2b2110115
No VPN involved, nor McAfee SiteAdvisor.
No STR at this point, though.
Comment 98 Mats Palmgren (:mats) 2011-01-19 08:36:26 PST
Bug 626994 reports a recent spike of crashes with this signature.
Comment 99 Martin Klocke 2011-08-31 03:00:48 PDT
Hi,

FF crashes on me frequently. Sometimes 2-3 times in 10 minutes, sometimes once in an hour (which I consider "stable" at the moment).
It is so bad that the crash reporter usually cannot send the crash reports.
I get this message in the log:
"Crash report submission failed: Der Vorgang wurde erfolgreich beendet." (German for "The operation completed successfully.")
So most of the reports end up in "pending".
It seems that this crashy behaviour only happens with Cisco VPN and does not happen without it. Unfortunately, I am working in a big company and cannot change the client (4.8.01.0300).
If tests are needed, I will try to help.
Right now I HAVE to use IE in order to get work done.
Comment 100 alex_mayorga 2012-09-08 10:55:13 PDT
This is from 2 days ago on 14.0.1 https://crash-stats.mozilla.com/report/index/fe59a92f-de3c-4796-bb6b-ca8702120904 FWIW.
