Closed Bug 946072 Opened 12 years ago Closed 12 years ago

Assertion failure: 0 == rv, at nsprpub/pr/src/pthreads/ptthread.c

Categories

(Core :: JavaScript Engine, defect)

x86_64
Linux
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: gkw, Assigned: wtc)

References

Details

(Keywords: assertion, testcase)

Attachments

(1 file, 2 obsolete files)

Attached file stack
With unreliable and basically unreproducible testcases, js debug shell sometimes asserts on m-c changeset c93cfe704487 at Assertion failure: 0 == rv, at nsprpub/pr/src/pthreads/ptthread.c My configure flags are: AR=ar sh ./configure --enable-optimize --enable-debug --enable-profiling --enable-gczeal --enable-debug-symbols --enable-methodjit --enable-type-inference --disable-tests --enable-more-deterministic --enable-exact-rooting --enable-elf-hack --enable-stdcxx-compat --enable-warnings-as-errors --enable-signmar --disable-elf-hack --enable-js-diagnostics --with-intl-api=build --enable-ctypes --disable-shared-js --enable-jemalloc --with-ccache --enable-threadsafe <other NSPR flags> There are possible dupes as bug 942476 or bug 775080. Filed this because this can be quite annoying, and I now have a core file for anyone willing to take a look.
Wan-Teh, any ideas on what caused this or how to move forward?
Flags: needinfo?(wtc)
It's weird that we're asserting in NSPR. According to the stack, the failed assertion is on line 848, which is this one: static void _pt_thread_death(void *arg) { ... if (NULL == thred) { rv = pthread_setspecific(pt_book.key, NULL); PR_ASSERT(0 == rv); } } Race condition on shutdown, maybe?
Gary: I wonder if this is a regression introduced by the NSPR upstream change https://hg.mozilla.org/projects/nspr/rev/08840926f193. It would be nice if we can get the value of |rv| (an error code) when the assertion failed. I guess it is EINVAL ("The key value is invalid."). Does this assertion start to fail in mozilla-central after 2013-11-22?
Flags: needinfo?(wtc)
> Does this assertion start to fail in mozilla-central after 2013-11-22? Likely, bug 942476 was filed on 2013-11-23.
Flags: needinfo?(wtc)
Gary, Jason: I'm interested in getting to the bottom of this assertion failure. If you tell me how to run the test case by hand, I can try to reproduce it. Unfortunately you seemed to say that you don't have a reliable way to reproduce the assertion failure. This area of the NSPR code is very easy to do right. If I can't reproduce this assertion failure, I'll see if I can find an alternative fix for NSPR bug 749849.
Flags: needinfo?(wtc)
Gary, I will back out that NSPR patch as an experiment. If the assertion stops failing, we will know the assertion failure is caused by that patch.
Assignee: general → wtc
Status: NEW → ASSIGNED
Attachment #8346862 - Flags: review?(gary)
Fixed a typo in the README file.
Attachment #8346862 - Attachment is obsolete: true
Attachment #8346862 - Flags: review?(gary)
Attachment #8346864 - Flags: review?(gary)
Wan-Teh, I'm not sure I can review here. However I can definitely test this patch, and you can set feedback? from me.
Comment on attachment 8346864 [details] [diff] [review] Back out an NSPR patch to see if it introduced the assertion failure, v2 Gary: Thanks for helping me test this patch. Does that mean you can reproduce the assertion failure reliably? Regarding the code review, let me make my request official. I am an NSPR module owner (please verify that at https://wiki.mozilla.org/Modules/All#NSPR). I'd like to ask you to review this patch to back out an NSPR checkin temporarily. This Mozilla-private patch will be properly documented in the nsprpub/patches directory. I also sent you a feedback request, if you still don't feel comfortable reviewing this patch.
Attachment #8346864 - Flags: feedback?(gary)
> Thanks for helping me test this patch. Does that mean you can reproduce > the assertion failure reliably? No, I still cannot reproduce it reliably, but I can throw a fuzzing run at it to see if it comes back. > Regarding the code review, let me make my request official. I understand. I also know you are an NSPR module owner. It's just that I don't really write in C so I don't feel like the best person to review. jorendorff might be a better person - although both of us might not know much about NSPR, he sure knows C code! > I also sent you a feedback request, if you still don't feel comfortable > reviewing this patch. This works. So I'll clear the review? request for now.
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #10) > > No, I still cannot reproduce it reliably, but I can throw a fuzzing run at > it to see if it comes back. Good. Please test both with and without this patch. Thanks.
QA Contact: general
Comment on attachment 8346864 [details] [diff] [review] Back out an NSPR patch to see if it introduced the assertion failure, v2 After testing this for almost a week, I have not yet seen this come back, but I can't yet say for sure.
Attachment #8346864 - Flags: feedback?(gary) → feedback+
Flags: needinfo?(wtc)
Comment on attachment 8346864 [details] [diff] [review] Back out an NSPR patch to see if it introduced the assertion failure, v2 Gary: thank you for the status update. Did you also run fuzz testing *without* this patch? We need the results of that testing as a comparison.
Flags: needinfo?(wtc)
> Did you also run fuzz testing *without* this patch? We need the > results of that testing as a comparison. Yes, I also did, but due to its rarity, I didn't also hit the assert. I've spun up more cores to try and hit it (without the patch).
> Yes, I also did, but due to its rarity, I didn't also hit the assert. I've > spun up more cores to try and hit it (without the patch). Unfortunately, without the patch I have also been unable to reproduce the assert. So this bug is probably WFM.
If I understand correctly: - you saw this assertion a few times in the past - recently, you haven't seen the assertion again - you had tested with and without the proposed backout patch, and regardless of this change, you haven't seen the assertion again recently - we still don't know if the change from bug 749849 is related Maybe the problems that you had experienced were caused by an incomplete rebuild, i.e. the compiler might not have rebuilt some parts.
Wan-Teh, I propose to undo the experimental backout, to check if anyone reports the assertion again.
(In reply to Kai Engert (:kaie) from comment #16) > If I understand correctly: Yes, your summary sounds about right. I re-checked the machine and it seems that I don't see this assertion (with and without the patch in comment 7) > Wan-Teh, I propose to undo the experimental backout, to check if anyone > reports the assertion again. Was the backout ever landed? I thought it was only a patch here (comment 7) in bugzilla, unless I'm mistaken. (which I probably am)
Marked the bug WORKSFORME. I didn't land the experimental backout patch. I will mark the patch obsolete.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Attachment #8346864 - Attachment is obsolete: true
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: