Closed
Bug 1208957
Opened 9 years ago
Closed 8 years ago
Intermittent Assertion failure: 0 == rv, nsprpub/pr/src/pthreads/ptthread.c:288, PROCESS-CRASH | Main app process exited normally | application crashed [@ mozalloc_abort(char const*)]
Categories
(Core :: Security: PSM, defect, P3)
Core
Security: PSM
Tracking
()
People
(Reporter: nigelb, Assigned: mrbkap)
References
Details
(Keywords: intermittent-failure, Whiteboard: [psm-intermittent] [e10s-multi:+][stockwell fixed:product])
Attachments
(2 files)
59 bytes, text/x-review-board-request | billm: review+, gchang: approval-mozilla-aurora+, gchang: approval-mozilla-beta+, jcristau: approval-mozilla-esr52+ | Details
59 bytes, text/x-review-board-request | billm: review+, gchang: approval-mozilla-aurora+, gchang: approval-mozilla-beta+, jcristau: approval-mozilla-esr52+ | Details
No description provided.
Reporter | ||
Updated•9 years ago
|
Component: General → Security: PSM
Product: Firefox → Core
Comment 9•9 years ago
|
||
Mass whiteboard change to annotate PSM intermittent test failures as [psm-intermittent]. Filter on 31b932bd-1aad-4e29-9f4b-4cd864a3ffdc if that's important to you.
Whiteboard: [psm-intermittent]
Comment 12•8 years ago
|
||
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
Assignee | ||
Comment 21•8 years ago
|
||
I'm investigating this for e10s-multi (4 processes) as this appears to happen much more frequently with 4 processes.
Assignee: nobody → mrbkap
Whiteboard: [psm-intermittent] → [psm-intermittent] [e10s-multi:?]
Updated•8 years ago
|
Summary: Intermittent Assertion failure: 0 == rv, nsprpub/pr/src/pthreads/ptthread.c:288 → Intermittent Assertion failure: 0 == rv, nsprpub/pr/src/pthreads/ptthread.c:288, PROCESS-CRASH | Main app process exited normally | application crashed [@ mozalloc_abort(char const*)]
Comment 22•8 years ago
|
||
(In reply to Blake Kaplan (:mrbkap) from comment #21)
> I'm investigating this for e10s-multi (4 processes) as this appears to
> happen much more frequently with 4 processes.
Are you sure that this is the same crash as we see on ash? Then we should dupe Bug 1340512 over this one, but to me the two crashes look a bit different (I might be missing something though).
Assignee | ||
Comment 23•8 years ago
|
||
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #22)
> Are you sure that this is the same crash as we see on ash? Then we should
> dupe Bug 1340512 over this one, but to me the two crashes look a bit
> different (I might be missing something though).
It looks like there are at least two crashes. I've seen this one a few times as well.
Updated•8 years ago
|
Iteration: --- → 54.3 - Mar 6
Assignee | ||
Updated•8 years ago
|
Blocks: e10s-multi-aurora
Whiteboard: [psm-intermittent] [e10s-multi:?] → [psm-intermittent] [e10s-multi:+]
Updated•8 years ago
|
Iteration: 54.3 - Mar 6 → 55.1 - Mar 20
Assignee | ||
Comment 26•8 years ago
|
||
This appears to be due to a thread shutdown happening way too late in application shutdown. In all of the instances of this that I've seen, the main thread is late in shutdown (oftentimes in ~nsStringStats, other times running atexit-registered functions). My current strategy is to see if there is a specific thread that we are leaking too late on OSX and to fix it if so.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7299921eb98150d00b78fdfb2107a790456c97d8
Comment 27•8 years ago
|
||
Glad to see work already in progress here. Do let me know if I can help with doing try runs, bisecting data, or looking for patterns.
Whiteboard: [psm-intermittent] [e10s-multi:+] → [psm-intermittent] [e10s-multi:+][stockwell needswork]
Comment 29•8 years ago
|
||
:mrbkap, it has been 6 days since your try push, do you have more updates? Luckily this hasn't increased in frequency, but it is still something we consider high frequency and would like to get fixed soon.
Flags: needinfo?(mrbkap)
Assignee | ||
Comment 30•8 years ago
|
||
I've been working on this pretty much full time. I've been pushing to try and debugging locally. If my current try run [1] doesn't shed more light, I'll probably try to get my hands on a loaner try machine to debug there.
[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=01a73b27d7a119d09da4de3fc2347a52391248bf
Flags: needinfo?(mrbkap)
Assignee | ||
Comment 31•8 years ago
|
||
What's happening here is that we have a thread and an associated nsThread (but the thread wasn't started via nsThread::Init!) that is lasting through shutdown. At shutdown, the OS is killing all threads, forcing us to clear out the thread private data, and we're apparently not joining on the thread before unloading NSPR. Because of this, releasing the nsThread (and its related data) ends up causing us to try to re-initialize NSPR thread data, which eventually fails, leading to a fatal assertion.
The trick is to figure out which thread it is that we're leaking so late into shutdown and to make sure that we wait for it properly so it has a chance to shut down before the main thread.
I hope.
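The "wait for it properly" idea above can be sketched in portable C++. This is only an illustration under stated assumptions, not the code that landed: the `Watchdog` class and its members are hypothetical names, and `std::thread` stands in for the NSPR thread used in the real tree. The point it shows is that shutdown asks the thread to stop and then joins it, so the thread has fully exited before the owner continues tearing things down.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a watchdog thread owner (not the XPConnect code).
class Watchdog {
 public:
  Watchdog() : thread_([this] { Run(); }) {}

  // Ask the thread to stop, then block until it has actually exited.
  // Safe to call more than once.
  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    wakeup_.notify_one();
    if (thread_.joinable()) {
      thread_.join();
    }
    // After join() the thread function has returned.
    assert(exited_.load());
  }

  ~Watchdog() { Shutdown(); }

  bool HasExited() const { return exited_.load(); }

 private:
  void Run() {
    std::unique_lock<std::mutex> lock(mutex_);
    while (!stop_) {
      // Periodic wakeup; a real watchdog would do its checks here.
      wakeup_.wait_for(lock, std::chrono::milliseconds(50));
    }
    exited_.store(true);
  }

  std::mutex mutex_;
  std::condition_variable wakeup_;
  bool stop_ = false;
  std::atomic<bool> exited_{false};
  std::thread thread_;  // Declared last so the members above exist before Run() starts.
};
```

Calling `Shutdown()` (or simply letting the destructor run) before the rest of process teardown means no thread-private cleanup for this thread can happen after the main thread has moved on.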
Assignee | ||
Comment 32•8 years ago
|
||
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Updated•8 years ago
|
Attachment #8847676 -
Flags: review?(wmccloskey)
Attachment #8847677 -
Flags: review?(wmccloskey)
Comment 35•8 years ago
|
||
mozreview-review |
Comment on attachment 8847676 [details]
Bug 1208957 - Join the watchdog thread to avoid shutdown races.
https://reviewboard.mozilla.org/r/120592/#review122678
Attachment #8847676 -
Flags: review?(wmccloskey) → review+
Comment 36•8 years ago
|
||
mozreview-review |
Comment on attachment 8847677 [details]
Bug 1208957 - No need for a condvar for thread shutdown.
https://reviewboard.mozilla.org/r/120594/#review122680
Attachment #8847677 -
Flags: review?(wmccloskey) → review+
Comment 37•8 years ago
|
||
Pushed by mrbkap@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/10a3d094cfc1
Join the watchdog thread to avoid shutdown races. r=billm
https://hg.mozilla.org/integration/autoland/rev/9ba55f98e3bf
No need for a condvar for thread shutdown. r=billm
Updated•8 years ago
|
Whiteboard: [psm-intermittent] [e10s-multi:+][stockwell needswork] → [psm-intermittent] [e10s-multi:+][stockwell fixed]
Comment 38•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/10a3d094cfc1
https://hg.mozilla.org/mozilla-central/rev/9ba55f98e3bf
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment 39•8 years ago
|
||
*bows before mrbkap* Can we please get this across all active branches? OSX debug xpcshell hits this 20-30% of the time, so it would be *fantastic* to see it uplifted around.
status-firefox44:
affected → ---
status-firefox52:
--- → wontfix
status-firefox53:
--- → affected
status-firefox54:
--- → affected
status-firefox-esr52:
--- → affected
Assignee | ||
Comment 40•8 years ago
|
||
Comment on attachment 8847676 [details]
Bug 1208957 - Join the watchdog thread to avoid shutdown races.
Approval Request Comment
[Feature/Bug causing the regression]: n/a (I suspect that bug 1323100 might have "caused" this by registering the watchdog thread with the profiler and therefore forcing creation of an nsThread for it but I haven't tested to be sure).
[User impact if declined]: None! This bug should only show up in debug builds (and probably mostly on Treeherder).
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]:
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: n/a
[Is the change risky?]: Despite dealing with threads, this change should not be too risky -- it's moving from a manual condvar to one in the system in order to wait for a thread to clean up after itself.
[String changes made/needed]: n/a
Attachment #8847676 -
Flags: approval-mozilla-beta?
Attachment #8847676 -
Flags: approval-mozilla-aurora?
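The difference between a manual condvar handshake and a join, as described in the risk assessment above, can be sketched as follows. This is a simplified illustration in standard C++ rather than the NSPR calls used in the patch; `Handshake`, `Worker`, `WaitByJoin`, and `WaitByCondvar` are hypothetical names. The worker signals "done" and then still has teardown left to do, which stands in for thread-private data cleanup.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

struct Handshake {
  std::mutex mutex;
  std::condition_variable cv;
  bool signalled = false;
};

// Worker body: signals completion, then continues with teardown work.
inline void Worker(Handshake& hs, std::atomic<bool>& teardown_finished) {
  {
    std::lock_guard<std::mutex> lock(hs.mutex);
    hs.signalled = true;
  }
  hs.cv.notify_one();
  // Anything after the signal races with a waiter that relies on the condvar alone.
  teardown_finished.store(true);
}

// Waits with join(): the thread has fully exited when the wait returns.
inline bool WaitByJoin() {
  Handshake hs;
  std::atomic<bool> teardown_finished{false};
  std::thread t(Worker, std::ref(hs), std::ref(teardown_finished));
  t.join();
  assert(teardown_finished.load());
  return teardown_finished.load();
}

// Waits on the condvar only: the returned value may be true or false,
// because the worker can still be running its teardown when we wake up.
inline bool WaitByCondvar() {
  Handshake hs;
  std::atomic<bool> teardown_finished{false};
  std::thread t(Worker, std::ref(hs), std::ref(teardown_finished));
  bool observed = false;
  {
    std::unique_lock<std::mutex> lock(hs.mutex);
    hs.cv.wait(lock, [&] { return hs.signalled; });
    observed = teardown_finished.load();
  }
  t.join();  // Joined here only so the sketch itself exits cleanly.
  return observed;
}
```

`WaitByJoin()` always returns true, while the result of `WaitByCondvar()` depends on scheduling. That window after the signal is the kind of shutdown race that joining is meant to close.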
Comment 43•8 years ago
|
||
Doesn't seem to apply to aurora:
grafting 386317:9ba55f98e3bf "Bug 1208957 - No need for a condvar for thread shutdown. r=billm"
merging js/xpconnect/src/XPCJSContext.cpp
warning: conflicts while merging js/xpconnect/src/XPCJSContext.cpp! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
(use 'hg resolve' and 'hg graft --continue')
Assignee | ||
Comment 44•8 years ago
|
||
(In reply to Gerry Chang [:gchang] from comment #42)
> Hi :mrbkap,
> According to comment #41, is that OK?
Yes, the data shows that the last failure due to this bug was on the 16th (except for a single failure on mozilla-beta on the 17th). There will be some number of failures coming from that branch until this fix eventually merges there.
(I'm leaving the ni on me to fix the merge to Aurora.)
Assignee | ||
Comment 45•8 years ago
|
||
Sylvestre, it appears that these patches apply cleanly to Aurora and Beta. I wonder, though, if maybe you didn't apply them in the right order. They need to be applied in the same order as they appear in comment 38 (that is: "Join the watchdog thread..." followed by "No need for a condvar...").
Flags: needinfo?(mrbkap) → needinfo?(sledru)
Assignee | ||
Updated•8 years ago
|
Attachment #8847677 -
Flags: approval-mozilla-beta?
Attachment #8847677 -
Flags: approval-mozilla-aurora?
Comment 46•8 years ago
|
||
Comment on attachment 8847676 [details]
Bug 1208957 - Join the watchdog thread to avoid shutdown races.
Fix an intermittent failure. Aurora54+ & Beta53+.
Attachment #8847676 -
Flags: approval-mozilla-beta?
Attachment #8847676 -
Flags: approval-mozilla-beta+
Attachment #8847676 -
Flags: approval-mozilla-aurora?
Attachment #8847676 -
Flags: approval-mozilla-aurora+
Updated•8 years ago
|
Attachment #8847677 -
Flags: approval-mozilla-beta?
Attachment #8847677 -
Flags: approval-mozilla-beta+
Attachment #8847677 -
Flags: approval-mozilla-aurora?
Attachment #8847677 -
Flags: approval-mozilla-aurora+
Comment 47•8 years ago
|
||
bugherder uplift |
Comment 48•8 years ago
|
||
bugherder uplift |
Comment 49•8 years ago
|
||
Ok, we tried with a bot, we should manage the order correctly, sorry
Flags: needinfo?(sledru)
Comment 50•8 years ago
|
||
Comment on attachment 8847676 [details]
Bug 1208957 - Join the watchdog thread to avoid shutdown races.
This is an extremely frequent issue on ESR52 with OSX debug xpcshell, so it would be wonderful to get it backported there as well.
Attachment #8847676 -
Flags: approval-mozilla-esr52?
Updated•8 years ago
|
Attachment #8847677 -
Flags: approval-mozilla-esr52?
Comment 51•8 years ago
|
||
Comment on attachment 8847676 [details]
Bug 1208957 - Join the watchdog thread to avoid shutdown races.
fix a race on shutdown, esr52+
Attachment #8847676 -
Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Updated•8 years ago
|
Attachment #8847677 -
Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Comment 52•8 years ago
|
||
bugherder uplift |
Updated•8 years ago
|
Whiteboard: [psm-intermittent] [e10s-multi:+][stockwell fixed] → [psm-intermittent] [e10s-multi:+][stockwell fixed:product]