Closed Bug 446169 Opened 17 years ago Closed 17 years ago

Workaround for false failures of __kernel_cmpxchg on some ARM platforms

Categories

(Core :: JavaScript Engine, defect)

x86
Maemo
defect
Not set
critical

Tracking

()

RESOLVED FIXED

People

(Reporter: igor, Assigned: romaxa)

References

()

Details

(Keywords: mobile)

Attachments

(3 files, 8 obsolete files)

+++ This bug was initially created as a clone of Bug #445818 +++ The changes introduced in bug 444076 and bug 440184 triggered a GCC bug on N810 (arm) when the compiler improperly expands inline functions. See bug 445818 for relevant details.
Attached patch Proposed fix (obsolete) — Splinter Review
Attachment #330371 - Flags: review?(igor)
Comment on attachment 330371 [details] [diff] [review] Proposed fix >diff -r 8f3ff1995383 configure.in >--- a/configure.in Sat Jul 19 12:38:25 2008 +0200 >+++ b/configure.in Sat Jul 19 15:11:03 2008 +0300 >@@ -379,6 +379,9 @@ if test "$GNU_CC"; then > if `$CC -print-prog-name=ld` -v 2>&1 | grep -c GNU >/dev/null; then > GCC_USE_GNU_LD=1 > fi >+fi >+if test "`echo | $CXX_VERSION -v 2>&1 | grep -c CodeSourcery`" != "0"; then >+ AC_DEFINE(CSGCC) > fi It is not possible to deduce the presence of CodeSourcery from the C preprocessor alone, is it? I am asking because when building a standalone JS shell the code uses separated Makefile.ref meaning that the workaround would fix only the browser. >+#if defined(CSGCC) && \ >+ (__GNUC__ == 3 && __GNUC_MINOR__ == 4 && __GNUC_PATCHLEVEL__ == 4) >+__attribute__ ((noinline)) >+#else >+JS_INLINE >+#endif If the help from the configure script is required, it would be better to have something like: #if HAS_CSGCC_INLINE_BUG __attribute__ ((noinline)) #else JS_INLINE #endif and push to the configure script the job of defining HAS_CSGCC_INLINE_BUG.
Attachment #330371 - Flags: review?(igor)
Attached file echo | gcc -g3 -E -
> It is not possible to deduce the presence of CodeSourcery from the C > preprocessor alone, is it? No, at least I did not found any special flag in gcc defines, see attachment > and push to the configure script the job of defining HAS_CSGCC_INLINE_BUG. > Ok, I will fix it.
Attachment #330377 - Attachment mime type: application/octet-stream → text/plain
Comment on attachment 330377 [details] echo | gcc -g3 -E - >#define __VERSION__ "3.4.4 (release) (CodeSourcery ARM 2005q3-2)" If C preprocessor would support string matching, that would give a preprocessor-only solution. But in the real world the configure changes are required.
Attached patch Proposed fix r2 (obsolete) — Splinter Review
Attachment #330371 - Attachment is obsolete: true
Comment on attachment 330379 [details] [diff] [review] Proposed fix r2 >-static JS_INLINE int >+static int >+#if HAS_CSGCC_INLINE_BUG >+__attribute__ ((noinline)) >+#else >+JS_INLINE >+#endif > NativeCompareAndSwap(jsword *w, jsword ov, jsword nv) Nit: the patch changes the canonical ordering style inline_attribute return_type into return_type inline_attribute. r+ with that fixed.
Comment on attachment 330379 [details] [diff] [review] Proposed fix r2 Asking for a review of configure changes: they are required to workaround a bug in the compiler.
Attachment #330379 - Flags: review?(ted.mielczarek)
Attached patch ordering inline style fix (obsolete) — Splinter Review
> return_type into return_type inline_attribute. r+ with that fixed. > I guess it is ok?
Seems only one noinline not enough fixing that crash.
Attachment #330380 - Attachment is obsolete: true
Attachment #330440 - Flags: review?(igor)
Attachment #330440 - Flags: review?(igor) → review+
Attachment #330379 - Attachment is obsolete: true
Attachment #330379 - Flags: review?(ted.mielczarek)
Attachment #330440 - Flags: review?(ted.mielczarek)
Unfortunately even with noinline for all functions it still crashes... Seems problem not in compiler and inlines.
I have revert commits ed28977f2452 and 215bf1ed8e2e, and only then crashes disappears.
(In reply to comment #11) > I have revert commits ed28977f2452 and 215bf1ed8e2e, and only then crashes > disappears. Would the crash still happen with reverting just ed28977f2452 and keeping 215bf1ed8e2e? The former should not change any code on ARM with KUSER defined.
(In reply to comment #12) > (In reply to comment #11) > > I have revert commits ed28977f2452 and 215bf1ed8e2e, and only then crashes > > disappears. > > Would the crash still happen with reverting just ed28977f2452 and keeping > 215bf1ed8e2e? The former should not change any code on ARM with KUSER defined. I meant the latter, 215bf1ed8e2e. It is only ed28977f2452 that introduced the substantial changes.
I have check more and found that commits ed28977f2452 and 215bf1ed8e2e, also not reasons for crash. It still crashes if all that commits reverted I need to test a bit more, but looks like crash disappears with next simple fix: --- mozilla/js/src/jslock.h.orig<-->2008-07-20 23:39:31.000000000 +0300 +++ mozilla/js/src/jslock.h>2008-07-20 23:42:11.000000000 +0300 @@ -56,11 +56,10 @@ JS_BEGIN_EXTERN_C ............ - defined(USE_ARM_KUSER) || \ ............ seems something wrong with jslock implementation.
Comment on attachment 330440 [details] [diff] [review] noinline all JS_INLINE in jslock.cpp As reported the bug is unrelated to inline changes.
Attachment #330440 - Attachment is obsolete: true
Attachment #330440 - Flags: review?(ted.mielczarek)
(In reply to comment #14) > I have check more and found that commits ed28977f2452 and 215bf1ed8e2e, also > not reasons for crash. The crashes happens on the real hardware, not on some emulator or development board, right? And only with non-debug builds? I wish I would still have my N800 to try - but it is "borrowed" by one of the relatives ;) > seems something wrong with jslock implementation. The only difference with other platforms is compare-and-swap implementation. On x86 and x86-64 the code was tested with embeddings that use heavy multithreading, not just the browser. So it is unlikely that the bug in platform-independent code.
Yep, this is real N810, and non-debug build Ok, seems before I was building thread safe js without USE_ARM_KUSER... and it was not crashes... Now I tried the same testcase again with thread-safe jsshell compiled with USE_ARM_KUSER define: var array = [{}, {}, {}, {}]; function test() { for (var i = 0; i != 42*42*42; ++i) { var obj = array[i % array.length]; obj["a"+i] = 1; var tmp = {}; tmp["a"+i] = 2; } } scatter([test, test, test, test]); And it is 99% reproducible crash. Here is the output: js> scatter([test, test, test, test]); Assertion failure: Thin_GetWait(tl->owner), at jslock.cpp:1070 Aborted (core dumped) #1 0x401e6450 in abort () from /lib/libc.so.6 #2 0x000f4890 in JS_Assert (s=0x14095c "Thin_GetWait(tl->owner)", file=0x14077c "jslock.cpp", ln=1070) at jsutil.cpp:63 #3 0x00087690 in ThinUnlock (tl=0x172900, me=1585152) at jslock.cpp:1070 #4 0x00087a24 in js_UnlockTitle (cx=0x17e5f0, title=0x1728fc) at jslock.cpp:1176 #5 0x00087e6c in js_UnlockObj (cx=0x17e5f0, obj=0x175000) at jslock.cpp:1287 #6 0x00125b80 in js_Interpret (cx=0x17e5f0) at jsinterp.cpp:5041 #7 0x0007f2c4 in js_Invoke (cx=0x17e5f0, argc=0, vp=0x193088, flags=0) at jsinterp.cpp:1331 #8 0x0007f61c in js_InternalInvoke (cx=0x17e5f0, obj=0x0, fval=1528640, flags=0, argc=0, argv=0x0, rval=0x17d7bc) at jsinterp.cpp:1387 #9 0x0002ac6c in JS_CallFunctionValue (cx=0x17e5f0, obj=0x0, fval=1528640, argc=0, argv=0x0, rval=0x17d7bc) at jsapi.cpp:5084 #10 0x00010560 in DoScatteredWork (cx=0x17e5f0, td=0x17d054) at js.cpp:2574 #11 0x00010640 in RunScatterThread (arg=0x17d054) at js.cpp:2604 ...........
Can you also try in a debug shell the test case below as it should provide a better match to the original problem and would show the problem in simpler to analyze settings. var array = []; for (var i = 0; i != 100*1000; i += 2) { array[i] = i; array[i+1] = i; } var src = array.join(';x'); function f() { new Function(src); } scatter([f, f, f, f]);
Hmm I have receive one more core-dump.... #2 0x000f4890 in JS_Assert (s=0x14095c "Thin_GetWait(tl->owner)", file=0x14077c "jslock.cpp", ln=1070) at jsutil.cpp:63 #3 0x00087690 in ThinUnlock (tl=0x15aad4, me=1449456) at jslock.cpp:1070 #4 0x00087604 in js_Unlock (cx=0x161c58, tl=0x15aad4) at jslock.cpp:1090 #5 0x000383f4 in js_AtomizeString (cx=0x161c58, str=0xbeb3b1c8, flags=8) at jsatom.cpp:704 #6 0x00038630 in js_AtomizeChars (cx=0x161c58, chars=0x3e1bd0, length=5, flags=0) at jsatom.cpp:756 #7 0x000dac14 in js_GetToken (cx=0x161c58, ts=0xbeb3b8f8) at jsscan.cpp:1261 #8 0x000d9a04 in js_PeekToken (cx=0x161c58, ts=0xbeb3b8f8) at jsscan.cpp:928 #9 0x000b6330 in Statements (cx=0x161c58, ts=0xbeb3b8f8, tc=0xbeb3b7d0) at jsparse.cpp:1492 #10 0x000b4a6c in FunctionBody (cx=0x161c58, ts=0xbeb3b8f8, tc=0xbeb3b7d0) at jsparse.cpp:896 #11 0x000b4d38 in js_CompileFunctionBody (cx=0x161c58, fun=0x17a888, principals=0x0, chars=0x4055e008, length=688888, filename=0x179f99 "typein", lineno=10) at jsparse.cpp:977 #12 0x0007236c in Function (cx=0x161c58, obj=0x17a888, argc=1, argv=0x17b080, rval=0xbeb3bc4c) at jsfun.cpp:2013 #13 0x0007f230 in js_Invoke (cx=0x161c58, argc=1, vp=0x17b078, flags=1) at jsinterp.cpp:1314 ---Type <return> to continue, or q <return> to quit--- #14 0x00080d08 in js_InvokeConstructor (cx=0x161c58, argc=1, vp=0x17b078) at jsinterp.cpp:1886 #15 0x0011e868 in js_Interpret (cx=0x161c58) at jsinterp.cpp:3903 #16 0x0007f2c4 in js_Invoke (cx=0x161c58, argc=0, vp=0x17b070, flags=0) at jsinterp.cpp:1331 #17 0x0007f61c in js_InternalInvoke (cx=0x161c58, obj=0x0, fval=1528448, flags=0, argc=0, argv=0x0, rval=0x1728d8) at jsinterp.cpp:1387 #18 0x0002ac6c in JS_CallFunctionValue (cx=0x161c58, obj=0x0, fval=1528448, argc=0, argv=0x0, rval=0x1728d8) at jsapi.cpp:5084 #19 0x00010560 in DoScatteredWork (cx=0x161c58, td=0x3e1810) at js.cpp:2574 #20 0x00010c34 in Scatter (cx=0x161c58, argc=1, vp=0x17b054) at js.cpp:2732 #21 0x00125248 in js_Interpret (cx=0x161c58) at jsinterp.cpp:4928 #22 0x0007fc50 in js_Execute (cx=0x161c58, chain=0x175000, script=0x3e1520, down=0x0, flags=0, result=0xbeb3d500) at jsinterp.cpp:1553 #23 0x0002a370 in JS_ExecuteScript (cx=0x161c58, obj=0x175000, script=0x3e1520, rval=0xbeb3d500) at jsapi.cpp:4925 #24 0x0000a8f8 in Process (cx=0x161c58, obj=0x175000, filename=0x0, forceTTY=0) at js.cpp:310 #25 0x0000b368 in ProcessArgs (cx=0x161c58, obj=0x175000, argv=0xbeb3d738, argc=0) at js.cpp:558 #26 0x000125f0 in main (argc=0, argv=0xbeb3d738, envp=0xbeb3d73c)
(In reply to comment #19) > Hmm I have receive one more core-dump.... > > #2 0x000f4890 in JS_Assert (s=0x14095c "Thin_GetWait(tl->owner)", > file=0x14077c "jslock.cpp", ln=1070) at jsutil.cpp:63 > #3 0x00087690 in ThinUnlock (tl=0x15aad4, me=1449456) at jslock.cpp:1070 With this core-dump at the frame 3 in ThinUnlock what is the value of me and tl->owner ?
#2 0x000f4890 in JS_Assert (s=0x14095c "Thin_GetWait(tl->owner)", file=0x14077c "jslock.cpp", ln=1070) at jsutil.cpp:63 #3 0x00087690 in ThinUnlock (tl=0x15aaec, me=1106878472) at jslock.cpp:1070 (gdb) p tl $1 = (JSThinLock *) 0x15aaec (gdb) p *tl $2 = {owner = 1106878472, fat = 0x0} (gdb)
This version of NativeCompareAndSwap works much better and without crashes... At least testcase works always fine, without crashes
Attachment #330623 - Flags: review?(igor)
This should be more correct
Attachment #330623 - Attachment is obsolete: true
Attachment #330626 - Flags: review?(igor)
Attachment #330623 - Flags: review?(igor)
Blocks: 429387
Comment on attachment 330626 [details] [diff] [review] Right arm NativeCompareAndSwap version 2 >diff -r f4c88d2be5bd js/src/jslock.cpp >--- a/js/src/jslock.cpp Mon Jul 21 12:15:18 2008 -0700 >+++ b/js/src/jslock.cpp Mon Jul 21 22:36:49 2008 +0300 >@@ -187,8 +187,12 @@ static JS_INLINE int Add comments here to explain that __kernel_cmpxchg is broken on N810 as it may fail when in fact "w" points to the old value and cite this bug number. > static JS_INLINE int > NativeCompareAndSwap(jsword *w, jsword ov, jsword nv) > { >- volatile int *vp = (volatile int*)w; >- return !__kernel_cmpxchg(ov, nv, vp); >+ volatile PRInt32 *vp = (volatile int*)w; That cast should be (volatile PRInt32*) w. >+ PRInt32 failed = 1; Existing nit: SM style requires a blank line after local declarations. >+ do { >+ failed = __kernel_cmpxchg(ov, nv, vp); Nit: the indent unit for SM is 4 spaces. >+ } while (failed && *vp == ov); >+ return !failed; Nit: add JS_ASSERT(ov != nv)before the loop as the conditional is essential for the loop termination.
>Add comments here to explain that __kernel_cmpxchg is broken on N810 as it may >fail when in fact "w" points to the old value and cite this bug number I don't think that it is the bug on N810... If you will grep google for __kernel_cmpxchg, then you will find a lot of implementations similar to this. Current NativeCompareAndSwap implementation just wrong.
Attachment #330626 - Attachment is obsolete: true
Attachment #330650 - Flags: review?(igor)
Attachment #330626 - Flags: review?(igor)
Comment on attachment 330650 [details] [diff] [review] Fixed style and JS_ASSERT I can push this to mozilla-central if required.
Attachment #330650 - Flags: review?(igor) → review+
(In reply to comment #26) > (From update of attachment 330650 [details] [diff] [review]) > I can push this to mozilla-central if required. > I think it is good idea... As I remember I have only physical rights for committing...
I changed the title to reflect the real nature of this bug: it is precisely the problem raised in the bug 429387 comment 1 item 2).
Assignee: igor → romaxa
No longer blocks: 440184, 444076
Summary: Workaround for GCC bugs with inline functions on N810 (arm). → Workaround for false failures of __kernel_cmpxchg on some ARM platforms
(In reply to comment #28) > I changed the title to reflect the real nature of this bug: it is precisely the > problem raised in the bug 429387 comment 1 item 2). That should be bug 429387 comment 0 item 2)
Comment on attachment 330650 [details] [diff] [review] Fixed style and JS_ASSERT Asking for an extra review just to be sure.
Attachment #330650 - Flags: review?(doug.turner)
Comment on attachment 330650 [details] [diff] [review] Fixed style and JS_ASSERT > static JS_INLINE int > NativeCompareAndSwap(jsword *w, jsword ov, jsword nv) > { >- volatile int *vp = (volatile int*)w; >- return !__kernel_cmpxchg(ov, nv, vp); >+ volatile PRInt32 *vp = (volatile PRInt32*)w; >+ PRInt32 failed = 1; >+ >+ JS_ASSERT(ov != nv); >+ do { >+ failed = __kernel_cmpxchg(ov, nv, vp); >+ } while (failed && *vp == ov); >+ return !failed; So the bug that's being worked around is that __kernel_cmpxchg is returning failure (nonzero) even though *vp == ov, i.e., that the exchange should've happened? (And if that happens, keep trying until it either succeeds or until ov becomes stale.) If so, that looks fine, though it's a pretty odd failure.
confirmed. i have seen this in todays build of fennec. Assertion failure: Thin_GetWait(tl->owner), at /home/dougt/builds/mozilla/src/js/src/jslock.cpp:1062
Keywords: mobile
Attachment #330650 - Flags: review?(doug.turner) → review?(brendan)
Comment on attachment 330650 [details] [diff] [review] Fixed style and JS_ASSERT igor, this looks right. it might be good to included a comment why we have to loop here.
Attachment #330650 - Flags: review?(brendan)
Comment on attachment 330717 [details] [diff] [review] Add comment about loop and this bug. >diff -r f4c88d2be5bd js/src/jslock.cpp >--- a/js/src/jslock.cpp Mon Jul 21 12:15:18 2008 -0700 >+++ b/js/src/jslock.cpp Tue Jul 22 08:26:41 2008 +0300 >@@ -187,8 +187,17 @@ static JS_INLINE int > static JS_INLINE int > NativeCompareAndSwap(jsword *w, jsword ov, jsword nv) > { >- volatile int *vp = (volatile int*)w; >- return !__kernel_cmpxchg(ov, nv, vp); >+ volatile PRInt32 *vp = (volatile PRInt32*)w; >+ PRInt32 failed = 1; >+ >+ JS_ASSERT(ov != nv); >+ do { >+ /* Loop until a __kernel_cmpxchg succeeds. >+ * See https://bugzilla.mozilla.org/show_bug.cgi?id=446169 >+ */ Nit: The style for multiline comments in SM is /* * line1 * ... * lineN */ But here there is no need to write bugzilla URL, just "See bug 446169." is enough and the whole comment would fit one line with /* Text. */ style.
Comment on attachment 330717 [details] [diff] [review] Add comment about loop and this bug. Couple more nits: There are comments before the function, it is better add the comments clarifying __kernel_cmpxchg behaviur there while fixing the comment style. > NativeCompareAndSwap(jsword *w, jsword ov, jsword nv) > { >- volatile int *vp = (volatile int*)w; >- return !__kernel_cmpxchg(ov, nv, vp); >+ volatile PRInt32 *vp = (volatile PRInt32*)w; Pre-existing nit: SM style requires a space after * that follow the type even inside the cast and the cast should have a space after it like in (foo *) bar. But the cast is not necessary if jsword would be used as the type for vp delegating the job of verifying that jsword is int32 to the compiler when __kernel_cmpxchg is invoked. That would also allow to remove the static assert above the function.
Attached patch Some fixes for comment 35 (obsolete) — Splinter Review
Attachment #330717 - Attachment is obsolete: true
(In reply to comment #37) > Created an attachment (id=330746) [details] > Some fixes for comment 35 Would also consider the comment 36?
> Would also consider the comment 36? > I did not get understand properly what you mean about jsword and int32 casting? If I will remove casting then I have compiler error: error: invalid conversion from `jsword*' to `volatile int*' --- volatile int *vp = (volatile int *)w; +++ volatile int *vp = (volatile int *) w; ? it is ok?
(In reply to comment #39) > > Would also consider the comment 36? > > > I did not get understand properly what you mean about jsword and int32 casting? > If I will remove casting then I have compiler error: > error: invalid conversion from `jsword*' to `volatile int*' Right, on ARM jsword is long, not int, so the cast is necessary. > --- volatile int *vp = (volatile int *)w; > +++ volatile int *vp = (volatile int *) w; > ? it is ok? That will do.
Attached patch Style fix cm 36Splinter Review
> +++ volatile int *vp = (volatile int *) w;
Attachment #330746 - Attachment is obsolete: true
(In reply to comment #41) > Created an attachment (id=330765) [details] > Style fix cm 36 > > > +++ volatile int *vp = (volatile int *) w; Thanks, I will push the patch.
Attachment #330765 - Flags: review+
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
/cvsroot/mozilla/js/tests/js1_8/extensions/regress-446169-01.js,v <-- regress-446169-01.js initial revision: 1.1 /cvsroot/mozilla/js/tests/js1_8/extensions/regress-446169-02.js,v <-- regress-446169-02.js initial revision: 1.1 mozilla-central: changeset: 16464:890388d172ba
Flags: in-testsuite+
Flags: in-litmus-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: