Closed
Bug 446169
Opened 17 years ago
Closed 17 years ago
Workaround for false failures of __kernel_cmpxchg on some ARM platforms
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: igor, Assigned: romaxa)
References
()
Details
(Keywords: mobile)
Attachments
(3 files, 8 obsolete files)
|
2.46 KB,
text/plain
|
Details | |
|
590 bytes,
patch
|
igor
:
review+
|
Details | Diff | Splinter Review |
|
620 bytes,
patch
|
igor
:
review+
|
Details | Diff | Splinter Review |
+++ This bug was initially created as a clone of Bug #445818 +++
The changes introduced in bug 444076 and bug 440184 triggered a GCC bug on N810 (arm) when the compiler improperly expands inline functions. See bug 445818 for relevant details.
| Assignee | ||
Comment 1•17 years ago
|
||
Attachment #330371 -
Flags: review?(igor)
| Reporter | ||
Comment 2•17 years ago
|
||
Comment on attachment 330371 [details] [diff] [review]
Proposed fix
>diff -r 8f3ff1995383 configure.in
>--- a/configure.in Sat Jul 19 12:38:25 2008 +0200
>+++ b/configure.in Sat Jul 19 15:11:03 2008 +0300
>@@ -379,6 +379,9 @@ if test "$GNU_CC"; then
> if `$CC -print-prog-name=ld` -v 2>&1 | grep -c GNU >/dev/null; then
> GCC_USE_GNU_LD=1
> fi
>+fi
>+if test "`echo | $CXX_VERSION -v 2>&1 | grep -c CodeSourcery`" != "0"; then
>+ AC_DEFINE(CSGCC)
> fi
It is not possible to deduce the presence of CodeSourcery from the C preprocessor alone, is it? I am asking because when building a standalone JS shell the code uses separated Makefile.ref meaning that the workaround would fix only the browser.
>+#if defined(CSGCC) && \
>+ (__GNUC__ == 3 && __GNUC_MINOR__ == 4 && __GNUC_PATCHLEVEL__ == 4)
>+__attribute__ ((noinline))
>+#else
>+JS_INLINE
>+#endif
If the help from the configure script is required, it would be better to have something like:
#if HAS_CSGCC_INLINE_BUG
__attribute__ ((noinline))
#else
JS_INLINE
#endif
and push to the configure script the job of defining HAS_CSGCC_INLINE_BUG.
Attachment #330371 -
Flags: review?(igor)
| Assignee | ||
Comment 3•17 years ago
|
||
> It is not possible to deduce the presence of CodeSourcery from the C
> preprocessor alone, is it?
No, at least I did not found any special flag in gcc defines, see attachment
> and push to the configure script the job of defining HAS_CSGCC_INLINE_BUG.
>
Ok, I will fix it.
| Reporter | ||
Updated•17 years ago
|
Attachment #330377 -
Attachment mime type: application/octet-stream → text/plain
| Reporter | ||
Comment 4•17 years ago
|
||
Comment on attachment 330377 [details]
echo | gcc -g3 -E -
>#define __VERSION__ "3.4.4 (release) (CodeSourcery ARM 2005q3-2)"
If C preprocessor would support string matching, that would give a preprocessor-only solution. But in the real world the configure changes are required.
| Assignee | ||
Comment 5•17 years ago
|
||
Attachment #330371 -
Attachment is obsolete: true
| Reporter | ||
Comment 6•17 years ago
|
||
Comment on attachment 330379 [details] [diff] [review]
Proposed fix r2
>-static JS_INLINE int
>+static int
>+#if HAS_CSGCC_INLINE_BUG
>+__attribute__ ((noinline))
>+#else
>+JS_INLINE
>+#endif
> NativeCompareAndSwap(jsword *w, jsword ov, jsword nv)
Nit: the patch changes the canonical ordering style inline_attribute return_type into return_type inline_attribute. r+ with that fixed.
| Reporter | ||
Comment 7•17 years ago
|
||
Comment on attachment 330379 [details] [diff] [review]
Proposed fix r2
Asking for a review of configure changes: they are required to workaround a bug in the compiler.
Attachment #330379 -
Flags: review?(ted.mielczarek)
| Assignee | ||
Comment 8•17 years ago
|
||
> return_type into return_type inline_attribute. r+ with that fixed.
>
I guess it is ok?
| Assignee | ||
Comment 9•17 years ago
|
||
Seems only one noinline not enough fixing that crash.
Attachment #330380 -
Attachment is obsolete: true
Attachment #330440 -
Flags: review?(igor)
| Reporter | ||
Updated•17 years ago
|
Attachment #330440 -
Flags: review?(igor) → review+
| Reporter | ||
Updated•17 years ago
|
Attachment #330379 -
Attachment is obsolete: true
Attachment #330379 -
Flags: review?(ted.mielczarek)
| Reporter | ||
Updated•17 years ago
|
Attachment #330440 -
Flags: review?(ted.mielczarek)
| Assignee | ||
Comment 10•17 years ago
|
||
Unfortunately even with noinline for all functions it still crashes...
Seems problem not in compiler and inlines.
| Assignee | ||
Comment 11•17 years ago
|
||
I have revert commits ed28977f2452 and 215bf1ed8e2e, and only then crashes disappears.
| Reporter | ||
Comment 12•17 years ago
|
||
(In reply to comment #11)
> I have revert commits ed28977f2452 and 215bf1ed8e2e, and only then crashes
> disappears.
Would the crash still happen with reverting just ed28977f2452 and keeping 215bf1ed8e2e? The former should not change any code on ARM with KUSER defined.
| Reporter | ||
Comment 13•17 years ago
|
||
(In reply to comment #12)
> (In reply to comment #11)
> > I have revert commits ed28977f2452 and 215bf1ed8e2e, and only then crashes
> > disappears.
>
> Would the crash still happen with reverting just ed28977f2452 and keeping
> 215bf1ed8e2e? The former should not change any code on ARM with KUSER defined.
I meant the latter, 215bf1ed8e2e. It is only ed28977f2452 that introduced the substantial changes.
| Assignee | ||
Comment 14•17 years ago
|
||
I have check more and found that commits ed28977f2452 and 215bf1ed8e2e, also not reasons for crash.
It still crashes if all that commits reverted
I need to test a bit more, but looks like crash disappears with next simple fix:
--- mozilla/js/src/jslock.h.orig<-->2008-07-20 23:39:31.000000000 +0300
+++ mozilla/js/src/jslock.h>2008-07-20 23:42:11.000000000 +0300
@@ -56,11 +56,10 @@ JS_BEGIN_EXTERN_C
............
- defined(USE_ARM_KUSER) || \
............
seems something wrong with jslock implementation.
| Reporter | ||
Comment 15•17 years ago
|
||
Comment on attachment 330440 [details] [diff] [review]
noinline all JS_INLINE in jslock.cpp
As reported the bug is unrelated to inline changes.
Attachment #330440 -
Attachment is obsolete: true
Attachment #330440 -
Flags: review?(ted.mielczarek)
| Reporter | ||
Comment 16•17 years ago
|
||
(In reply to comment #14)
> I have check more and found that commits ed28977f2452 and 215bf1ed8e2e, also
> not reasons for crash.
The crashes happens on the real hardware, not on some emulator or development board, right? And only with non-debug builds? I wish I would still have my N800 to try - but it is "borrowed" by one of the relatives ;)
> seems something wrong with jslock implementation.
The only difference with other platforms is compare-and-swap implementation. On x86 and x86-64 the code was tested with embeddings that use heavy multithreading, not just the browser. So it is unlikely that the bug in platform-independent code.
| Assignee | ||
Comment 17•17 years ago
|
||
Yep, this is real N810, and non-debug build
Ok, seems before I was building thread safe js without USE_ARM_KUSER...
and it was not crashes...
Now I tried the same testcase again with thread-safe jsshell compiled with USE_ARM_KUSER define:
var array = [{}, {}, {}, {}];
function test() {
for (var i = 0; i != 42*42*42; ++i) {
var obj = array[i % array.length];
obj["a"+i] = 1;
var tmp = {};
tmp["a"+i] = 2;
}
}
scatter([test, test, test, test]);
And it is 99% reproducible crash.
Here is the output:
js> scatter([test, test, test, test]);
Assertion failure: Thin_GetWait(tl->owner), at jslock.cpp:1070
Aborted (core dumped)
#1 0x401e6450 in abort () from /lib/libc.so.6
#2 0x000f4890 in JS_Assert (s=0x14095c "Thin_GetWait(tl->owner)",
file=0x14077c "jslock.cpp", ln=1070) at jsutil.cpp:63
#3 0x00087690 in ThinUnlock (tl=0x172900, me=1585152) at jslock.cpp:1070
#4 0x00087a24 in js_UnlockTitle (cx=0x17e5f0, title=0x1728fc)
at jslock.cpp:1176
#5 0x00087e6c in js_UnlockObj (cx=0x17e5f0, obj=0x175000) at jslock.cpp:1287
#6 0x00125b80 in js_Interpret (cx=0x17e5f0) at jsinterp.cpp:5041
#7 0x0007f2c4 in js_Invoke (cx=0x17e5f0, argc=0, vp=0x193088, flags=0)
at jsinterp.cpp:1331
#8 0x0007f61c in js_InternalInvoke (cx=0x17e5f0, obj=0x0, fval=1528640,
flags=0, argc=0, argv=0x0, rval=0x17d7bc) at jsinterp.cpp:1387
#9 0x0002ac6c in JS_CallFunctionValue (cx=0x17e5f0, obj=0x0, fval=1528640,
argc=0, argv=0x0, rval=0x17d7bc) at jsapi.cpp:5084
#10 0x00010560 in DoScatteredWork (cx=0x17e5f0, td=0x17d054) at js.cpp:2574
#11 0x00010640 in RunScatterThread (arg=0x17d054) at js.cpp:2604
...........
| Reporter | ||
Comment 18•17 years ago
|
||
Can you also try in a debug shell the test case below as it should provide a better match to the original problem and would show the problem in simpler to analyze settings.
var array = [];
for (var i = 0; i != 100*1000; i += 2) {
array[i] = i;
array[i+1] = i;
}
var src = array.join(';x');
function f() {
new Function(src);
}
scatter([f, f, f, f]);
| Assignee | ||
Comment 19•17 years ago
|
||
Hmm I have receive one more core-dump....
#2 0x000f4890 in JS_Assert (s=0x14095c "Thin_GetWait(tl->owner)",
file=0x14077c "jslock.cpp", ln=1070) at jsutil.cpp:63
#3 0x00087690 in ThinUnlock (tl=0x15aad4, me=1449456) at jslock.cpp:1070
#4 0x00087604 in js_Unlock (cx=0x161c58, tl=0x15aad4) at jslock.cpp:1090
#5 0x000383f4 in js_AtomizeString (cx=0x161c58, str=0xbeb3b1c8, flags=8)
at jsatom.cpp:704
#6 0x00038630 in js_AtomizeChars (cx=0x161c58, chars=0x3e1bd0, length=5,
flags=0) at jsatom.cpp:756
#7 0x000dac14 in js_GetToken (cx=0x161c58, ts=0xbeb3b8f8) at jsscan.cpp:1261
#8 0x000d9a04 in js_PeekToken (cx=0x161c58, ts=0xbeb3b8f8) at jsscan.cpp:928
#9 0x000b6330 in Statements (cx=0x161c58, ts=0xbeb3b8f8, tc=0xbeb3b7d0)
at jsparse.cpp:1492
#10 0x000b4a6c in FunctionBody (cx=0x161c58, ts=0xbeb3b8f8, tc=0xbeb3b7d0)
at jsparse.cpp:896
#11 0x000b4d38 in js_CompileFunctionBody (cx=0x161c58, fun=0x17a888,
principals=0x0, chars=0x4055e008, length=688888,
filename=0x179f99 "typein", lineno=10) at jsparse.cpp:977
#12 0x0007236c in Function (cx=0x161c58, obj=0x17a888, argc=1, argv=0x17b080,
rval=0xbeb3bc4c) at jsfun.cpp:2013
#13 0x0007f230 in js_Invoke (cx=0x161c58, argc=1, vp=0x17b078, flags=1)
at jsinterp.cpp:1314
---Type <return> to continue, or q <return> to quit---
#14 0x00080d08 in js_InvokeConstructor (cx=0x161c58, argc=1, vp=0x17b078)
at jsinterp.cpp:1886
#15 0x0011e868 in js_Interpret (cx=0x161c58) at jsinterp.cpp:3903
#16 0x0007f2c4 in js_Invoke (cx=0x161c58, argc=0, vp=0x17b070, flags=0)
at jsinterp.cpp:1331
#17 0x0007f61c in js_InternalInvoke (cx=0x161c58, obj=0x0, fval=1528448,
flags=0, argc=0, argv=0x0, rval=0x1728d8) at jsinterp.cpp:1387
#18 0x0002ac6c in JS_CallFunctionValue (cx=0x161c58, obj=0x0, fval=1528448,
argc=0, argv=0x0, rval=0x1728d8) at jsapi.cpp:5084
#19 0x00010560 in DoScatteredWork (cx=0x161c58, td=0x3e1810) at js.cpp:2574
#20 0x00010c34 in Scatter (cx=0x161c58, argc=1, vp=0x17b054) at js.cpp:2732
#21 0x00125248 in js_Interpret (cx=0x161c58) at jsinterp.cpp:4928
#22 0x0007fc50 in js_Execute (cx=0x161c58, chain=0x175000, script=0x3e1520,
down=0x0, flags=0, result=0xbeb3d500) at jsinterp.cpp:1553
#23 0x0002a370 in JS_ExecuteScript (cx=0x161c58, obj=0x175000,
script=0x3e1520, rval=0xbeb3d500) at jsapi.cpp:4925
#24 0x0000a8f8 in Process (cx=0x161c58, obj=0x175000, filename=0x0, forceTTY=0)
at js.cpp:310
#25 0x0000b368 in ProcessArgs (cx=0x161c58, obj=0x175000, argv=0xbeb3d738,
argc=0) at js.cpp:558
#26 0x000125f0 in main (argc=0, argv=0xbeb3d738, envp=0xbeb3d73c)
| Reporter | ||
Comment 20•17 years ago
|
||
(In reply to comment #19)
> Hmm I have receive one more core-dump....
>
> #2 0x000f4890 in JS_Assert (s=0x14095c "Thin_GetWait(tl->owner)",
> file=0x14077c "jslock.cpp", ln=1070) at jsutil.cpp:63
> #3 0x00087690 in ThinUnlock (tl=0x15aad4, me=1449456) at jslock.cpp:1070
With this core-dump at the frame 3 in ThinUnlock what is the value of me and tl->owner ?
| Assignee | ||
Comment 21•17 years ago
|
||
#2 0x000f4890 in JS_Assert (s=0x14095c "Thin_GetWait(tl->owner)", file=0x14077c "jslock.cpp", ln=1070) at jsutil.cpp:63
#3 0x00087690 in ThinUnlock (tl=0x15aaec, me=1106878472) at jslock.cpp:1070
(gdb) p tl
$1 = (JSThinLock *) 0x15aaec
(gdb) p *tl
$2 = {owner = 1106878472, fat = 0x0}
(gdb)
| Assignee | ||
Comment 22•17 years ago
|
||
This version of NativeCompareAndSwap works much better and without crashes...
At least testcase works always fine, without crashes
Attachment #330623 -
Flags: review?(igor)
| Assignee | ||
Comment 23•17 years ago
|
||
This should be more correct
Attachment #330623 -
Attachment is obsolete: true
Attachment #330626 -
Flags: review?(igor)
Attachment #330623 -
Flags: review?(igor)
| Reporter | ||
Comment 24•17 years ago
|
||
Comment on attachment 330626 [details] [diff] [review]
Right arm NativeCompareAndSwap version 2
>diff -r f4c88d2be5bd js/src/jslock.cpp
>--- a/js/src/jslock.cpp Mon Jul 21 12:15:18 2008 -0700
>+++ b/js/src/jslock.cpp Mon Jul 21 22:36:49 2008 +0300
>@@ -187,8 +187,12 @@ static JS_INLINE int
Add comments here to explain that __kernel_cmpxchg is broken on N810 as it may fail when in fact "w" points to the old value and cite this bug number.
> static JS_INLINE int
> NativeCompareAndSwap(jsword *w, jsword ov, jsword nv)
> {
>- volatile int *vp = (volatile int*)w;
>- return !__kernel_cmpxchg(ov, nv, vp);
>+ volatile PRInt32 *vp = (volatile int*)w;
That cast should be (volatile PRInt32*) w.
>+ PRInt32 failed = 1;
Existing nit: SM style requires a blank line after local declarations.
>+ do {
>+ failed = __kernel_cmpxchg(ov, nv, vp);
Nit: the indent unit for SM is 4 spaces.
>+ } while (failed && *vp == ov);
>+ return !failed;
Nit: add JS_ASSERT(ov != nv)before the loop as the conditional is essential for the loop termination.
| Assignee | ||
Comment 25•17 years ago
|
||
>Add comments here to explain that __kernel_cmpxchg is broken on N810 as it may
>fail when in fact "w" points to the old value and cite this bug number
I don't think that it is the bug on N810... If you will grep google for __kernel_cmpxchg, then you will find a lot of implementations similar to this.
Current NativeCompareAndSwap implementation just wrong.
Attachment #330626 -
Attachment is obsolete: true
Attachment #330650 -
Flags: review?(igor)
Attachment #330626 -
Flags: review?(igor)
| Reporter | ||
Comment 26•17 years ago
|
||
Comment on attachment 330650 [details] [diff] [review]
Fixed style and JS_ASSERT
I can push this to mozilla-central if required.
Attachment #330650 -
Flags: review?(igor) → review+
| Assignee | ||
Comment 27•17 years ago
|
||
(In reply to comment #26)
> (From update of attachment 330650 [details] [diff] [review])
> I can push this to mozilla-central if required.
>
I think it is good idea... As I remember I have only physical rights for committing...
| Reporter | ||
Comment 28•17 years ago
|
||
I changed the title to reflect the real nature of this bug: it is precisely the problem raised in the bug 429387 comment 1 item 2).
| Reporter | ||
Comment 29•17 years ago
|
||
(In reply to comment #28)
> I changed the title to reflect the real nature of this bug: it is precisely the
> problem raised in the bug 429387 comment 1 item 2).
That should be bug 429387 comment 0 item 2)
| Reporter | ||
Comment 30•17 years ago
|
||
Comment on attachment 330650 [details] [diff] [review]
Fixed style and JS_ASSERT
Asking for an extra review just to be sure.
Attachment #330650 -
Flags: review?(doug.turner)
Comment on attachment 330650 [details] [diff] [review]
Fixed style and JS_ASSERT
> static JS_INLINE int
> NativeCompareAndSwap(jsword *w, jsword ov, jsword nv)
> {
>- volatile int *vp = (volatile int*)w;
>- return !__kernel_cmpxchg(ov, nv, vp);
>+ volatile PRInt32 *vp = (volatile PRInt32*)w;
>+ PRInt32 failed = 1;
>+
>+ JS_ASSERT(ov != nv);
>+ do {
>+ failed = __kernel_cmpxchg(ov, nv, vp);
>+ } while (failed && *vp == ov);
>+ return !failed;
So the bug that's being worked around is that __kernel_cmpxchg is returning failure (nonzero) even though *vp == ov, i.e., that the exchange should've happened? (And if that happens, keep trying until it either succeeds or until ov becomes stale.) If so, that looks fine, though it's a pretty odd failure.
Comment 32•17 years ago
|
||
confirmed. i have seen this in todays build of fennec.
Assertion failure: Thin_GetWait(tl->owner), at /home/dougt/builds/mozilla/src/js/src/jslock.cpp:1062
Updated•17 years ago
|
Attachment #330650 -
Flags: review?(doug.turner) → review?(brendan)
Comment 33•17 years ago
|
||
Comment on attachment 330650 [details] [diff] [review]
Fixed style and JS_ASSERT
igor, this looks right. it might be good to included a comment why we have to loop here.
Attachment #330650 -
Flags: review?(brendan)
| Assignee | ||
Comment 34•17 years ago
|
||
| Reporter | ||
Comment 35•17 years ago
|
||
Comment on attachment 330717 [details] [diff] [review]
Add comment about loop and this bug.
>diff -r f4c88d2be5bd js/src/jslock.cpp
>--- a/js/src/jslock.cpp Mon Jul 21 12:15:18 2008 -0700
>+++ b/js/src/jslock.cpp Tue Jul 22 08:26:41 2008 +0300
>@@ -187,8 +187,17 @@ static JS_INLINE int
> static JS_INLINE int
> NativeCompareAndSwap(jsword *w, jsword ov, jsword nv)
> {
>- volatile int *vp = (volatile int*)w;
>- return !__kernel_cmpxchg(ov, nv, vp);
>+ volatile PRInt32 *vp = (volatile PRInt32*)w;
>+ PRInt32 failed = 1;
>+
>+ JS_ASSERT(ov != nv);
>+ do {
>+ /* Loop until a __kernel_cmpxchg succeeds.
>+ * See https://bugzilla.mozilla.org/show_bug.cgi?id=446169
>+ */
Nit: The style for multiline comments in SM is
/*
* line1
* ...
* lineN
*/
But here there is no need to write bugzilla URL, just "See bug 446169." is enough and the whole comment would fit one line with /* Text. */ style.
| Reporter | ||
Comment 36•17 years ago
|
||
Comment on attachment 330717 [details] [diff] [review]
Add comment about loop and this bug.
Couple more nits:
There are comments before the function, it is better add the comments clarifying __kernel_cmpxchg behaviur there while fixing the comment style.
> NativeCompareAndSwap(jsword *w, jsword ov, jsword nv)
> {
>- volatile int *vp = (volatile int*)w;
>- return !__kernel_cmpxchg(ov, nv, vp);
>+ volatile PRInt32 *vp = (volatile PRInt32*)w;
Pre-existing nit: SM style requires a space after * that follow the type even inside the cast and the cast should have a space after it like in (foo *) bar. But the cast is not necessary if jsword would be used as the type for vp delegating the job of verifying that jsword is int32 to the compiler when __kernel_cmpxchg is invoked. That would also allow to remove the static assert above the function.
| Assignee | ||
Comment 37•17 years ago
|
||
Attachment #330717 -
Attachment is obsolete: true
| Reporter | ||
Comment 38•17 years ago
|
||
(In reply to comment #37)
> Created an attachment (id=330746) [details]
> Some fixes for comment 35
Would also consider the comment 36?
| Assignee | ||
Comment 39•17 years ago
|
||
> Would also consider the comment 36?
>
I did not get understand properly what you mean about jsword and int32 casting?
If I will remove casting then I have compiler error:
error: invalid conversion from `jsword*' to `volatile int*'
--- volatile int *vp = (volatile int *)w;
+++ volatile int *vp = (volatile int *) w;
? it is ok?
| Reporter | ||
Comment 40•17 years ago
|
||
(In reply to comment #39)
> > Would also consider the comment 36?
> >
> I did not get understand properly what you mean about jsword and int32 casting?
> If I will remove casting then I have compiler error:
> error: invalid conversion from `jsword*' to `volatile int*'
Right, on ARM jsword is long, not int, so the cast is necessary.
> --- volatile int *vp = (volatile int *)w;
> +++ volatile int *vp = (volatile int *) w;
> ? it is ok?
That will do.
| Assignee | ||
Comment 41•17 years ago
|
||
> +++ volatile int *vp = (volatile int *) w;
Attachment #330746 -
Attachment is obsolete: true
| Reporter | ||
Comment 42•17 years ago
|
||
(In reply to comment #41)
> Created an attachment (id=330765) [details]
> Style fix cm 36
>
> > +++ volatile int *vp = (volatile int *) w;
Thanks, I will push the patch.
| Reporter | ||
Updated•17 years ago
|
Attachment #330765 -
Flags: review+
| Reporter | ||
Comment 43•17 years ago
|
||
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Comment 44•17 years ago
|
||
/cvsroot/mozilla/js/tests/js1_8/extensions/regress-446169-01.js,v <-- regress-446169-01.js
initial revision: 1.1
/cvsroot/mozilla/js/tests/js1_8/extensions/regress-446169-02.js,v <-- regress-446169-02.js
initial revision: 1.1
mozilla-central: changeset: 16464:890388d172ba
Flags: in-testsuite+
Flags: in-litmus-
You need to log in
before you can comment on or make changes to this bug.
Description
•