Intermittent Assertion failure: 0 == rv, nsprpub/pr/src/pthreads/ptthread.c:288, PROCESS-CRASH | Main app process exited normally | application crashed [@ mozalloc_abort(char const*)]

RESOLVED FIXED in Firefox -esr52

Status

()

Core
Security: PSM
P3
normal
RESOLVED FIXED
3 years ago
12 days ago

People

(Reporter: nigelb, Assigned: mrbkap)

Tracking

({intermittent-failure})

Trunk
mozilla55
intermittent-failure
Points:
---

Firefox Tracking Flags

(firefox52 wontfix, firefox-esr52 fixed, firefox53 fixed, firefox54 fixed, firefox55 fixed)

Details

(Whiteboard: [psm-intermittent] [e10s-multi:+][stockwell fixed:product])

Attachments

(2 attachments)

(Reporter)

Updated

3 years ago
Component: General → Security: PSM
Product: Firefox → Core

Comment 6

3 years ago
13 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 7
* fx-team: 5
* b2g-inbound: 1

Platform breakdown:
* osx-10-6: 13

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2015-10-12&endday=2015-10-18&tree=all

Comment 7

3 years ago
9 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* fx-team: 5
* mozilla-inbound: 4

Platform breakdown:
* osx-10-6: 9

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2015-10-19&endday=2015-10-25&tree=all

Comment 8

3 years ago
5 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 3
* mozilla-central: 1
* fx-team: 1

Platform breakdown:
* osx-10-6: 5

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2015-10-26&endday=2015-11-01&tree=all
Blocks: 1211080
Blocks: 1211082
Blocks: 1219986
Blocks: 1242305
Blocks: 1202325
Blocks: 1202044
Mass whiteboard change to annotate PSM intermittent test failures as [psm-intermittent]. Filter on 31b932bd-1aad-4e29-9f4b-4cd864a3ffdc if that's important to you.
Whiteboard: [psm-intermittent]

Comment 10

2 years ago
5 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* autoland: 3
* mozilla-inbound: 2

Platform breakdown:
* osx-10-10: 5

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2016-06-27&endday=2016-07-03&tree=all

Comment 11

2 years ago
6 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 4
* fx-team: 1
* autoland: 1

Platform breakdown:
* osx-10-10: 6

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2016-07-04&endday=2016-07-10&tree=all

Comment 12

2 years ago
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
8 failures in 715 pushes (0.011 failures/push) were associated with this bug in the last 7 days.  

Repository breakdown:
* autoland: 4
* mozilla-inbound: 2
* mozilla-central: 1
* mozilla-aurora: 1

Platform breakdown:
* linux64: 4
* osx-10-10: 3
* linux32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2016-11-14&endday=2016-11-20&tree=all
6 failures in 694 pushes (0.009 failures/push) were associated with this bug in the last 7 days.  

Repository breakdown:
* mozilla-inbound: 3
* autoland: 3

Platform breakdown:
* linux64: 4
* osx-10-10: 1
* linux32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2016-11-28&endday=2016-12-04&tree=all
7 failures in 289 pushes (0.024 failures/push) were associated with this bug in the last 7 days.  

Repository breakdown:
* mozilla-inbound: 4
* autoland: 2
* mozilla-aurora: 1

Platform breakdown:
* linux64: 5
* linux32: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2016-12-05&endday=2016-12-11&tree=all
5 failures in 526 pushes (0.01 failures/push) were associated with this bug in the last 7 days.  

Repository breakdown:
* mozilla-central: 4
* mozilla-inbound: 1

Platform breakdown:
* osx-10-10: 4
* linux64: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2016-12-12&endday=2016-12-18&tree=all
8 failures in 722 pushes (0.011 failures/push) were associated with this bug in the last 7 days.  

Repository breakdown:
* mozilla-inbound: 5
* autoland: 2
* mozilla-central: 1

Platform breakdown:
* osx-10-10: 5
* linux64: 2
* linux32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2017-01-09&endday=2017-01-15&tree=all
6 failures in 690 pushes (0.009 failures/push) were associated with this bug in the last 7 days.  

Repository breakdown:
* mozilla-inbound: 3
* autoland: 2
* mozilla-central: 1

Platform breakdown:
* osx-10-10: 3
* linux64: 3

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2017-01-16&endday=2017-01-22&tree=all
6 failures in 749 pushes (0.008 failures/push) were associated with this bug in the last 7 days.  

Repository breakdown:
* mozilla-inbound: 3
* mozilla-beta: 1
* mozilla-aurora: 1
* autoland: 1

Platform breakdown:
* osx-10-10: 5
* linux32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2017-01-23&endday=2017-01-29&tree=all
7 failures in 733 pushes (0.01 failures/push) were associated with this bug in the last 7 days.  

Repository breakdown:
* mozilla-inbound: 4
* autoland: 2
* mozilla-aurora: 1

Platform breakdown:
* osx-10-10: 7

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2017-01-30&endday=2017-02-05&tree=all
(Assignee)

Comment 21

a year ago
I'm investigating this for e10s-multi (4 processes) as this appears to happen much more frequently with 4 processes.
Assignee: nobody → mrbkap
Whiteboard: [psm-intermittent] → [psm-intermittent] [e10s-multi:?]
Summary: Intermittent Assertion failure: 0 == rv, nsprpub/pr/src/pthreads/ptthread.c:288 → Intermittent Assertion failure: 0 == rv, nsprpub/pr/src/pthreads/ptthread.c:288, PROCESS-CRASH | Main app process exited normally | application crashed [@ mozalloc_abort(char const*)]
(In reply to Blake Kaplan (:mrbkap) from comment #21)
> I'm investigating this for e10s-multi (4 processes) as this appears to
> happen much more frequently with 4 processes.

Are you sure that this is the same crash as we see on ash? Then we should dupe Bug 1340512 over this one, but to me the two crashes look a bit different (I might be missing something though).
(Assignee)

Comment 23

a year ago
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #22)
> Are you sure that this is the same crash as we see on ash? Then we should
> dupe Bug 1340512 over this one, but to me the two crashes look a bit
> different (I might be missing something though).

It looks like there are at least two crashes. I've seen this one a few times as well.
11 failures in 812 pushes (0.014 failures/push) were associated with this bug in the last 7 days.  
Repository breakdown:
* mozilla-inbound: 6
* autoland: 2
* mozilla-esr52: 1
* mozilla-central: 1
* mozilla-aurora: 1

Platform breakdown:
* osx-10-10: 11

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2017-02-20&endday=2017-02-26&tree=all
Iteration: --- → 54.3 - Mar 6
(Assignee)

Updated

a year ago
Blocks: 1304546
Whiteboard: [psm-intermittent] [e10s-multi:?] → [psm-intermittent] [e10s-multi:+]
21 failures in 783 pushes (0.027 failures/push) were associated with this bug in the last 7 days.  
Repository breakdown:
* autoland: 11
* mozilla-inbound: 4
* try: 2
* mozilla-aurora: 2
* mozilla-central: 1
* graphics: 1

Platform breakdown:
* osx-10-10: 21

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2017-02-27&endday=2017-03-05&tree=all
Iteration: 54.3 - Mar 6 → 55.1 - Mar 20
(Assignee)

Comment 26

a year ago
This appears to be due to a thread shutdown happening way too late in application shutdown -- in all of the instances of this that I've seen, the main thread is late in shutdown (oftentimes in ~nsStringStats, other times running atexit-registered functions). My current strategy is to see if there is a specific thread that we are leaking too late on OSX and to fix it if so.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7299921eb98150d00b78fdfb2107a790456c97d8
Glad to see work already in progress here. Do let me know if I can help with try runs, bisecting data, or looking for patterns.
Whiteboard: [psm-intermittent] [e10s-multi:+] → [psm-intermittent] [e10s-multi:+][stockwell needswork]
37 failures in 790 pushes (0.047 failures/push) were associated with this bug in the last 7 days. 

This is the #48 most frequent failure this week.  

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. ** 

Repository breakdown:
* autoland: 21
* mozilla-inbound: 11
* mozilla-central: 2
* try: 1
* mozilla-esr52: 1
* graphics: 1

Platform breakdown:
* osx-10-10: 37

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2017-03-06&endday=2017-03-12&tree=all
:mrbkap, it has been 6 days since your try push; do you have any updates? Luckily this hasn't increased in frequency, but it is still something we consider high frequency and would like to get fixed soon.
Flags: needinfo?(mrbkap)
(Assignee)

Comment 30

a year ago
I've been working on this pretty much full time. I've been pushing to try and debugging locally. If my current try run [1] doesn't shed more light, I'll probably try to get my hands on a loaner try machine to debug there.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=01a73b27d7a119d09da4de3fc2347a52391248bf
Flags: needinfo?(mrbkap)
(Assignee)

Comment 31

a year ago
What's happening here is that we have a thread and an associated nsThread (but the thread wasn't started via nsThread::Init!) that lasts through shutdown. At shutdown, the OS kills all threads, forcing us to clear out the thread-private data, and we apparently don't join the thread before unloading NSPR. Because of this, releasing the nsThread (and its related data) ends up causing us to try to re-initialize NSPR thread data, which eventually fails, leading to a fatal assertion.

The trick is to figure out which thread it is that we're leaking so late into shutdown and to make sure that we wait for it properly so it has a chance to shut down before the main thread.

I hope.
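
For readers unfamiliar with thread-private data, here is a minimal sketch of why joining matters. It uses plain POSIX thread-specific data rather than NSPR, and every name in it (g_key, DestroyThreadData, ThreadMain, JoinThenCountDestructors) is hypothetical and not taken from the tree. The point it illustrates is only this: a key's destructor runs on the exiting thread, and once pthread_join has returned, that cleanup is already finished, so nothing is left to run after the main thread starts tearing things down. The rv checks imitate the style of the assertion in the summary; what ptthread.c:288 actually checks is not stated in this bug.

```cpp
#include <pthread.h>

#include <atomic>
#include <cassert>

// Hypothetical names throughout; this is not NSPR or Gecko code.
static pthread_key_t g_key;
static std::atomic<int> g_destructor_runs{0};

// Called on the exiting thread for any non-null value stored under g_key.
static void DestroyThreadData(void* data) {
  delete static_cast<int*>(data);
  g_destructor_runs.fetch_add(1);
}

// The worker stores some thread-private data and then exits.
static void* ThreadMain(void*) {
  int rv = pthread_setspecific(g_key, new int(42));
  assert(rv == 0);
  (void)rv;
  return nullptr;
}

// Returns how many destructor calls had completed by the time
// pthread_join returned.
int JoinThenCountDestructors() {
  g_destructor_runs.store(0);

  int rv = pthread_key_create(&g_key, DestroyThreadData);
  assert(rv == 0);

  pthread_t thread;
  rv = pthread_create(&thread, nullptr, ThreadMain, nullptr);
  assert(rv == 0);

  // Block until the worker has terminated, including its
  // thread-specific data cleanup.
  rv = pthread_join(thread, nullptr);
  assert(rv == 0);

  int runs = g_destructor_runs.load();

  rv = pthread_key_delete(g_key);
  assert(rv == 0);
  (void)rv;
  return runs;
}
```

Without the pthread_join call, the main thread could go on to tear down shared state while the worker's destructor is still pending, which is the general shape of the race described above.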
(Assignee)

Updated

a year ago
Attachment #8847676 - Flags: review?(wmccloskey)
Attachment #8847677 - Flags: review?(wmccloskey)
Comment on attachment 8847676 [details]
Bug 1208957 - Join the watchdog thread to avoid shutdown races.

https://reviewboard.mozilla.org/r/120592/#review122678
Attachment #8847676 - Flags: review?(wmccloskey) → review+
Comment on attachment 8847677 [details]
Bug 1208957 - No need for a condvar for thread shutdown.

https://reviewboard.mozilla.org/r/120594/#review122680
Attachment #8847677 - Flags: review?(wmccloskey) → review+

Comment 37

a year ago
Pushed by mrbkap@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/10a3d094cfc1
Join the watchdog thread to avoid shutdown races. r=billm
https://hg.mozilla.org/integration/autoland/rev/9ba55f98e3bf
No need for a condvar for thread shutdown. r=billm
Whiteboard: [psm-intermittent] [e10s-multi:+][stockwell needswork] → [psm-intermittent] [e10s-multi:+][stockwell fixed]

Comment 38

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/10a3d094cfc1
https://hg.mozilla.org/mozilla-central/rev/9ba55f98e3bf
Status: NEW → RESOLVED
Last Resolved: a year ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
*bows before mrbkap* Can we please get this across all active branches? OSX debug xpcshell hits this 20-30% of the time, so it would be *fantastic* to see it uplifted around.
status-firefox44: affected → ---
status-firefox52: --- → wontfix
status-firefox53: --- → affected
status-firefox54: --- → affected
status-firefox-esr52: --- → affected
(Assignee)

Comment 40

a year ago
Comment on attachment 8847676 [details]
Bug 1208957 - Join the watchdog thread to avoid shutdown races.

Approval Request Comment
[Feature/Bug causing the regression]: n/a (I suspect that bug 1323100 might have "caused" this by registering the watchdog thread with the profiler and therefore forcing creation of an nsThread for it but I haven't tested to be sure).
[User impact if declined]: None! This bug should only show up in debug builds (and probably mostly on Treeherder).
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: 
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: n/a
[Is the change risky?]: Despite dealing with threads, this change should not be too risky -- it's moving from a manual condvar to one in the system in order to wait for a thread to clean up after itself.
[String changes made/needed]: n/a
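
As a rough illustration of the "join instead of a manual condvar" idea, the sketch below uses std::thread rather than NSPR, and the Watchdog class here is a hypothetical stand-in, not the XPConnect watchdog. A condition variable is still used to wake the thread early, but shutdown completion is observed by joining the thread rather than by having the thread signal "I'm done" shortly before it has actually exited.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a watchdog thread owner.
class Watchdog {
 public:
  Watchdog() : thread_([this] { Run(); }) {}
  ~Watchdog() { Shutdown(); }

  // Ask the thread to stop, then block until it has fully exited.
  // Safe to call more than once.
  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    wakeup_.notify_one();
    if (thread_.joinable()) {
      thread_.join();
    }
    assert(!thread_.joinable());
  }

  bool CleanedUp() const { return cleaned_up_.load(); }

 private:
  void Run() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (!stop_) {
      // A real watchdog would do its periodic check after each wake-up.
      wakeup_.wait_for(lock, std::chrono::milliseconds(10));
    }
    lock.unlock();
    // Stand-in for whatever per-thread cleanup happens on the way out.
    cleaned_up_.store(true);
  }

  std::mutex mutex_;
  std::condition_variable wakeup_;
  bool stop_ = false;
  std::atomic<bool> cleaned_up_{false};
  std::thread thread_;  // Declared last so the members above exist first.
};

// Returns true if the thread's cleanup had finished when Shutdown() returned.
bool ShutdownWaitsForCleanup() {
  Watchdog watchdog;
  watchdog.Shutdown();
  return watchdog.CleanedUp();
}
```

With a hand-rolled "done" flag and condvar, the waiter can be released while the thread still has exit work left to do; a join only returns once the thread is really gone.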
Attachment #8847676 - Flags: approval-mozilla-beta?
Attachment #8847676 - Flags: approval-mozilla-aurora?
28 failures in 777 pushes (0.036 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 20
* mozilla-inbound: 3
* try: 1
* mozilla-central: 1
* mozilla-beta: 1
* mozilla-aurora: 1
* graphics: 1

Platform breakdown:
* osx-10-10: 26
* linux64-stylo: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2017-03-13&endday=2017-03-19&tree=all
Hi :mrbkap,
According to comment #41, is that OK?
Flags: needinfo?(mrbkap)
Doesn't seem to apply to aurora:
    grafting 386317:9ba55f98e3bf "Bug 1208957 - No need for a condvar for thread shutdown. r=billm"
    merging js/xpconnect/src/XPCJSContext.cpp
    warning: conflicts while merging js/xpconnect/src/XPCJSContext.cpp! (edit, then use 'hg resolve --mark')
    abort: unresolved conflicts, can't continue
    (use 'hg resolve' and 'hg graft --continue')
(Assignee)

Comment 44

a year ago
(In reply to Gerry Chang [:gchang] from comment #42)
> Hi :mrbkap,
> According to comment #41, is that OK?

Yes, the data shows that the last failure due to this bug was on the 16th (except for a single failure on mozilla-beta on the 17th). There will be some number of failures coming from that branch until this fix eventually merges there.

(I'm leaving the ni on me to fix the merge to Aurora.)
(Assignee)

Comment 45

a year ago
Sylvestre, it appears that these patches apply cleanly to Aurora and Beta. I wonder, though, if maybe you didn't apply them in the right order. They need to be applied in the same order as they appear in comment 38 (that is: "Join the watchdog thread..." followed by "No need for a condvar...").
Flags: needinfo?(mrbkap) → needinfo?(sledru)
(Assignee)

Updated

a year ago
Attachment #8847677 - Flags: approval-mozilla-beta?
Attachment #8847677 - Flags: approval-mozilla-aurora?
Comment on attachment 8847676 [details]
Bug 1208957 - Join the watchdog thread to avoid shutdown races.

Fix an intermittent failure. Aurora54+ & Beta53+.
Attachment #8847676 - Flags: approval-mozilla-beta?
Attachment #8847676 - Flags: approval-mozilla-beta+
Attachment #8847676 - Flags: approval-mozilla-aurora?
Attachment #8847676 - Flags: approval-mozilla-aurora+
Attachment #8847677 - Flags: approval-mozilla-beta?
Attachment #8847677 - Flags: approval-mozilla-beta+
Attachment #8847677 - Flags: approval-mozilla-aurora?
Attachment #8847677 - Flags: approval-mozilla-aurora+
OK, we tried with a bot. We should handle the order correctly, sorry.
Flags: needinfo?(sledru)
Comment on attachment 8847676 [details]
Bug 1208957 - Join the watchdog thread to avoid shutdown races.

This is an extremely frequent issue on ESR52 with OSX debug xpcshell, so it would be wonderful to get it backported there as well.
Attachment #8847676 - Flags: approval-mozilla-esr52?
Attachment #8847677 - Flags: approval-mozilla-esr52?
Comment on attachment 8847676 [details]
Bug 1208957 - Join the watchdog thread to avoid shutdown races.

fix a race on shutdown, esr52+
Attachment #8847676 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Attachment #8847677 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+

Updated

10 months ago
Whiteboard: [psm-intermittent] [e10s-multi:+][stockwell fixed] → [psm-intermittent] [e10s-multi:+][stockwell fixed:product]

Comment 53

10 months ago
1 failures in 892 pushes (0.001 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 1

Platform breakdown:
* osx-10-10: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2017-06-19&endday=2017-06-25&tree=all

Comment 54

9 months ago
1 failures in 1008 pushes (0.001 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-inbound: 1

Platform breakdown:
* osx-10-10: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2017-07-24&endday=2017-07-30&tree=all

Comment 55

8 months ago
2 failures in 949 pushes (0.002 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 2

Platform breakdown:
* osx-10-10: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2017-08-14&endday=2017-08-20&tree=all

Comment 56

8 months ago
1 failures in 939 pushes (0.001 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 1

Platform breakdown:
* osx-10-10: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2017-08-28&endday=2017-09-03&tree=all

Comment 57

6 months ago
1 failures in 824 pushes (0.001 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* try: 1

Platform breakdown:
* macosx64-stylo-disabled: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2017-10-02&endday=2017-10-08&tree=all

Comment 58

6 months ago
1 failures in 857 pushes (0.001 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* mozilla-inbound: 1

Platform breakdown:
* osx-10-10: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2017-10-30&endday=2017-11-05&tree=all

Comment 59

5 months ago
2 failures in 849 pushes (0.002 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* mozilla-central: 2

Platform breakdown:
* macosx64-stylo-disabled: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1208957&startday=2017-11-06&endday=2017-11-12&tree=all
Duplicate of this bug: 1202325
Duplicate of this bug: 1202044
Duplicate of this bug: 1203927
Duplicate of this bug: 1211080
Duplicate of this bug: 1219986
Duplicate of this bug: 1211082
Duplicate of this bug: 1242305