[e10s] Random crashes of Firefox while running Marionette tests | application crashed [@ mozalloc_abort | PR_GetCurrentThread]

RESOLVED WORKSFORME

Status

NSPR
NSPR
--
critical
RESOLVED WORKSFORME
a year ago
4 months ago

People

(Reporter: Treeherder Bug Filer, Unassigned)

Tracking

({crash, intermittent-failure})

other
crash, intermittent-failure

Firefox Tracking Flags

(firefox51 affected, firefox52 affected)

Details

(crash signature)

(Reporter)

Description

a year ago
treeherder
Filed by: philringnalda [at] gmail.com

https://treeherder.mozilla.org/logviewer.html#?job_id=1846231&repo=autoland

https://queue.taskcluster.net/v1/task/AvN5-7cYTK-sPSi4Otf_8Q/runs/0/artifacts/public%2Flogs%2Flive_backing.log
This is a crash which happens a lot with all kinds of Marionette tests. 

Crash stack:

 00:30:54     INFO -  Crash reason:  SIGSEGV
 00:30:54     INFO -  Crash address: 0x0
 00:30:54     INFO -  Thread 3 (crashed)
 00:30:54     INFO -   0  plugin-container!mozalloc_abort [mozalloc_abort.cpp:aeadc64780e1 : 33 + 0x0]
 00:30:54     INFO -      rbx = 0x00007ff006788828   r12 = 0x00007ff00d4c3899
 00:30:54     INFO -      r13 = 0x00007ff00d4c7945   r14 = 0x00007feff52fea58
 00:30:54     INFO -      r15 = 0x00000000000f226d   rip = 0x000000000040947c
 00:30:54     INFO -      rsp = 0x00007feff52fe940   rbp = 0x00007feff52fe950
 00:30:54     INFO -      Found by: given as instruction pointer in context
 00:30:54     INFO -   1  plugin-container!abort [mozalloc_abort.cpp:aeadc64780e1 : 80 + 0x4]
 00:30:54     INFO -      rbx = 0x00007ff006788828   r12 = 0x00007ff00d4c3899
 00:30:54     INFO -      r13 = 0x00007ff00d4c7945   r14 = 0x00007feff52fea58
 00:30:54     INFO -      r15 = 0x00000000000f226d   rip = 0x000000000040942d
 00:30:54     INFO -      rsp = 0x00007feff52fe960   rbp = 0x00007feff52fe960
 00:30:54     INFO -      Found by: call frame info
 00:30:54     INFO -   2  libnspr4.so!PR_Assert [prlog.c:aeadc64780e1 : 553 + 0x4]
 00:30:54     INFO -      rbx = 0x00007ff006788828   r12 = 0x00007ff00d4c3899
 00:30:54     INFO -      r13 = 0x00007ff00d4c7945   r14 = 0x00007feff52fea58
 00:30:54     INFO -      r15 = 0x00000000000f226d   rip = 0x00007ff00d4a9283
 00:30:54     INFO -      rsp = 0x00007feff52fe970   rbp = 0x00007feff52fe9a0
 00:30:54     INFO -      Found by: call frame info
 00:30:54     INFO -   3  libnspr4.so!PR_GetCurrentThread [ptthread.c:aeadc64780e1 : 292 + 0x1b]
 00:30:54     INFO -      rbx = 0x00007fefe49a8380   r12 = 0x0000000000000000
 00:30:54     INFO -      r13 = 0x00000000000003e8   r14 = 0x00007feff52fea58
 00:30:54     INFO -      r15 = 0x00000000000f226d   rip = 0x00007ff00d4c0bd7
 00:30:54     INFO -      rsp = 0x00007feff52fe9b0   rbp = 0x00007feff52fe9c0
 00:30:54     INFO -      Found by: call frame info
 00:30:54     INFO -   4  libnspr4.so!PR_WaitCondVar [ptsynch.c:aeadc64780e1 : 363 + 0x4]
 00:30:54     INFO -      rbx = 0x00007feff54ff2c0   r12 = 0x00007feff54036a0
 00:30:54     INFO -      r13 = 0x00000000000003e8   r14 = 0x00007feff52fea58
 00:30:54     INFO -      r15 = 0x00000000000f226d   rip = 0x00007ff00d4bf0b1
 00:30:54     INFO -      rsp = 0x00007feff52fe9d0   rbp = 0x00007feff52fe9f0
 00:30:54     INFO -      Found by: call frame info
 00:30:54     INFO -   5  libxul.so!Watchdog::Sleep [XPCJSRuntime.cpp:aeadc64780e1 : 1090 + 0x7]
 00:30:54     INFO -      rbx = 0x00007feff54e02f0   r12 = 0x00007feff54036a0
 00:30:54     INFO -      r13 = 0x00007feff54e02f0   r14 = 0x00007feff52fea58
 00:30:54     INFO -      r15 = 0x00000000000f226d   rip = 0x00007ff008677662
 00:30:54     INFO -      rsp = 0x00007feff52fea00   rbp = 0x00007feff52fea20
 00:30:54     INFO -      Found by: call frame info
 00:30:54     INFO -   6  libxul.so!WatchdogMain [XPCJSRuntime.cpp:aeadc64780e1 : 1294 + 0xe]
 00:30:54     INFO -      rbx = 0x00007feff54e02e0   r12 = 0x00007feff54036a0
 00:30:54     INFO -      r13 = 0x00007feff54e02f0   r14 = 0x00007feff52fea58
 00:30:54     INFO -      r15 = 0x00000000000f226d   rip = 0x00007ff00868bf8f
 00:30:54     INFO -      rsp = 0x00007feff52fea30   rbp = 0x00007feff52fea90
 00:30:54     INFO -      Found by: call frame info

It looks like we fail the assertion in Watchdog::Sleep():

http://searchfox.org/mozilla-central/rev/9ec085584d7491ddbaf6574d3732c08511709172/js/xpconnect/src/XPCJSRuntime.cpp#1090

Blake or Bobby, does this crash sound familiar to you?
Severity: normal → critical
Crash Signature: [@ mozalloc_abort] [@ Watchdog::Sleep]
Flags: needinfo?(mrbkap)
Flags: needinfo?(bobbyholley)
Keywords: crash
Summary: Intermittent test_about_pages.py TestAboutPages.test_navigate_non_remote_about_pages | application crashed [@ mozalloc_abort] → Intermittent test_about_pages.py TestAboutPages.test_navigate_non_remote_about_pages | application crashed [@ mozalloc_abort] [@ Watchdog::Sleep]
(In reply to Henrik Skupin (:whimboo) from comment #1)
> [..]
> 
> As it looks like we fail the assertion in Watchdog::Sleep():
> 
> http://searchfox.org/mozilla-central/rev/9ec085584d7491ddbaf6574d3732c08511709172/js/xpconnect/src/XPCJSRuntime.cpp#1090

I don't think that's quite right. The stack shows the crash happening _within_ PR_GetCurrentThread, not in Watchdog::Sleep itself. In particular, the assertion that's firing seems to be here:

http://searchfox.org/mozilla-central/rev/9ec085584d7491ddbaf6574d3732c08511709172/nsprpub/pr/src/pthreads/ptthread.c#292

Which gets called from here:

http://searchfox.org/mozilla-central/rev/9ec085584d7491ddbaf6574d3732c08511709172/nsprpub/pr/src/pthreads/ptthread.c#656

That seems to suggest that the NSPR thread data is missing from the slot, which might indicate that this is running after shutdown or something? I can't really tell because the main thread isn't symbolicated.
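That reading of the stack can be checked mechanically. The following is a minimal Python sketch, not part of any Mozilla tooling: it pulls the frame header lines out of a log excerpt in the format shown in the description and reports the first frame that is not pure failure reporting. The names `parse_frames`, `asserting_frame` and `ABORT_PLUMBING` are hypothetical, and the set of frames treated as abort plumbing is an assumption made for this illustration only.

```python
import re

# Matches frame header lines such as:
#   " 00:30:54     INFO -   3  libnspr4.so!PR_GetCurrentThread [ptthread.c:aeadc64780e1 : 292 + 0x1b]"
# Register dumps and "Found by" lines do not match and are skipped.
FRAME_RE = re.compile(
    r"INFO -\s+(?P<index>\d+)\s+(?P<module>[^!\s]+)!(?P<function>\S+)"
    r"\s+\[(?P<file>[^:\]]+):[^:]*:\s*(?P<line>\d+)\s*\+"
)

# Assumption for illustration: frames that only report a failure
# and therefore say nothing about where the failure originated.
ABORT_PLUMBING = {"mozalloc_abort", "abort", "PR_Assert"}

SAMPLE_STACK = """\
 00:30:54     INFO -  Thread 3 (crashed)
 00:30:54     INFO -   0  plugin-container!mozalloc_abort [mozalloc_abort.cpp:aeadc64780e1 : 33 + 0x0]
 00:30:54     INFO -      rbx = 0x00007ff006788828   r12 = 0x00007ff00d4c3899
 00:30:54     INFO -      Found by: given as instruction pointer in context
 00:30:54     INFO -   1  plugin-container!abort [mozalloc_abort.cpp:aeadc64780e1 : 80 + 0x4]
 00:30:54     INFO -   2  libnspr4.so!PR_Assert [prlog.c:aeadc64780e1 : 553 + 0x4]
 00:30:54     INFO -   3  libnspr4.so!PR_GetCurrentThread [ptthread.c:aeadc64780e1 : 292 + 0x1b]
 00:30:54     INFO -   4  libnspr4.so!PR_WaitCondVar [ptsynch.c:aeadc64780e1 : 363 + 0x4]
 00:30:54     INFO -   5  libxul.so!Watchdog::Sleep [XPCJSRuntime.cpp:aeadc64780e1 : 1090 + 0x7]
"""


def parse_frames(log_text):
    """Return one dict per stack frame header found in log_text."""
    frames = []
    for raw in log_text.splitlines():
        match = FRAME_RE.search(raw)
        if match:
            frames.append({
                "index": int(match.group("index")),
                "module": match.group("module"),
                "function": match.group("function"),
                "file": match.group("file"),
                "line": int(match.group("line")),
            })
    return frames


def asserting_frame(frames):
    """Return the first frame that is not abort plumbing, or None."""
    for frame in frames:
        if frame["function"] not in ABORT_PLUMBING:
            return frame
    return None


if __name__ == "__main__":
    frame = asserting_frame(parse_frames(SAMPLE_STACK))
    print(frame["function"], "%s:%d" % (frame["file"], frame["line"]))
```

On the excerpt above this points at frame 3, which is the ptthread.c line linked in this comment, with Watchdog::Sleep only appearing two frames further down as a caller.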
Flags: needinfo?(bobbyholley)
A regression window would help.
Crash Signature: [@ mozalloc_abort] [@ Watchdog::Sleep] → [@ mozalloc_abort] [@ PR_GetCurrentThread]
Summary: Intermittent test_about_pages.py TestAboutPages.test_navigate_non_remote_about_pages | application crashed [@ mozalloc_abort] [@ Watchdog::Sleep] → Intermittent test_about_pages.py TestAboutPages.test_navigate_non_remote_about_pages | application crashed [@ mozalloc_abort] [@ PR_GetCurrentThread]
The gecko.log actually has an assertion shown which mentions something about shutdown:

https://public-artifacts.taskcluster.net/AvN5-7cYTK-sPSi4Otf_8Q/0/public/test_info//gecko.log

WARNING: YOU ARE LEAKING THE WORLD (at least one JSRuntime and everything alive inside it, that is) AT JS_ShutDown TIME.  FIX THIS!
[Child 1228] ###!!! ASSERTION: Component Manager being held past XPCOM shutdown.: 'cnt == 0', file /home/worker/workspace/build/src/xpcom/build/XPCOMInit.cpp, line 1020
Leaked URLs:
  file:///home/worker/workspace/build/application/firefox/omni.ja
  file:///home/worker/workspace/build/application/firefox/browser/omni.ja
  x:///chrome/pdfjs/content/
  file:///home/worker/workspace/build/application/firefox/omni.ja
  x:///modules/services-crypto/
  file:///home/worker/workspace/build/application/firefox/omni.ja
  x:///chrome/toolkit/res/
  file:///home/worker/workspace/build/application/firefox/omni.ja
  x:///modules/services-common/
  chrome://browser/locale/searchplugins/
  file:///home/worker/workspace/build/application/firefox/browser/omni.ja
  x:///chrome/devtools/modules/devtools/
  file:///home/worker/workspace/build/application/firefox/omni.ja
  x:///modules/services-sync/
  chrome://pluginproblem/content/pluginFinderBinding.css
  chrome://pluginproblem/content/pluginProblemBinding.css
  resource://gre-resources/counterstyles.css
  chrome://global/content/minimal-xul.css
  resource://gre-resources/quirk.css
  resource://gre/res/svg.css
  chrome://global/content/xul.css
  chrome://global/skin/scrollbars.css
  resource://gre-resources/number-control.css
  resource://gre-resources/forms.css
  resource://gre-resources/noscript.css
  resource://gre-resources/html.css
  resource://gre-resources/ua.css
  http://127.0.0.1:51431/javascriptPage.html
  http://127.0.0.1:51431/javascriptPage.html
  http://127.0.0.1:51431/javascriptPage.html
  http://127.0.0.1:51431/javascriptPage.html
  http://127.0.0.1:51431/map.png
  chrome://global/content/bindings/scrollbar.xml#scrollbar
  chrome://global/content/bindings/scrollbar.xml
  x:///chrome/toolkit/content/global/bindings/scrollbar.xml
  chrome://global/content/bindings/scrollbar.xml
  chrome://global/content/bindings/scrollbar.xml#thumb
  chrome://global/content/bindings/scrollbar.xml
  chrome://global/content/bindings/scrollbar.xml#scrollbar-base
  chrome://global/content/bindings/scrollbar.xml#scrollbar
  chrome://global/content/bindings/scrollbar.xml#scrollbar-base
  chrome://global/skin/scrollbar/slider.gif
  chrome://global/content/bindings/scrollbar.xml#thumb
  chrome://global/content/bindings/scrollbar.xml#scrollbar-base
  http://127.0.0.1:51431/javascriptPage.html#throwing-mouseover
  http://127.0.0.1:51431/javascriptPage.html#
  http://127.0.0.1:51431/javascriptPage.html#
  http://127.0.0.1:51431/javascriptPage.html#
  http://127.0.0.1:51431/javascriptPage.html#
  http://127.0.0.1:51431/javascriptPage.html#
  chrome://global/content/platformHTMLBindings.xml#inputFields
  chrome://global/content/platformHTMLBindings.xml
  x:///chrome/toolkit/content/global/platformHTMLBindings.xml
  chrome://global/content/platformHTMLBindings.xml
  chrome://global/content/platformHTMLBindings.xml#inputFields
  chrome://global/content/platformHTMLBindings.xml
  chrome://global/content/platformHTMLBindings.xml#textAreas
  chrome://global/content/platformHTMLBindings.xml#browser
  chrome://global/content/platformHTMLBindings.xml#editor
  chrome://global/content/platformHTMLBindings.xml#textAreas
  chrome://global/content/bindings/resizer.xml#resizer
  chrome://global/content/bindings/resizer.xml
  x:///chrome/toolkit/content/global/bindings/resizer.xml
  chrome://global/content/bindings/resizer.xml
  chrome://global/content/bindings/resizer.xml#resizer
  chrome://global/content/bindings/resizer.xml
  chrome://global/skin/resizer.css
  chrome://global/skin/icons/resizer.png
  resource://gre-resources/arrow.gif
  resource://gre-resources/loading-image.png
  resource://gre-resources/broken-image.png
  resource://gre-resources/loading-image.png
  resource://gre-resources/broken-image.png
  http://127.0.0.1:51431/javascriptPage.html
  http://127.0.0.1:51431/javascriptPage.html
  http://127.0.0.1:51431/javascriptPage.html
  http://127.0.0.1:51431/javascriptPage.html
  http://127.0.0.1:51431/map.png
  http://127.0.0.1:51431/javascriptPage.html#
  http://127.0.0.1:51431/javascriptPage.html#
  http://127.0.0.1:51431/javascriptPage.html#
  http://127.0.0.1:51431/javascriptPage.html#throwing-mouseover
  http://127.0.0.1:51431/javascriptPage.html#
  http://127.0.0.1:51431/javascriptPage.html#
[Child 1228] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 171
[Child 1228] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 171
nsStringStats

Then we hit another assertion and abort:

Assertion failure: 0 == rv, at /home/worker/workspace/build/src/nsprpub/pr/src/pthreads/ptthread.c:292
Redirecting call to abort() to mozalloc_abort

Hit MOZ_CRASH() at /home/worker/workspace/build/src/memory/mozalloc/mozalloc_abort.cpp:33

The "0 == rv" assertion looks to be the same as what we have for bug 1252222.
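To compare logs from the related bugs for the same assertion, something like the following Python sketch could be used. It is only an illustration based on the two line formats visible in this gecko.log excerpt; the function name `find_assertions` and the returned tuple layout are hypothetical, and other assertion formats in Gecko logs are not covered.

```python
import posixpath
import re

# NSPR-style line, as seen in the log above:
#   Assertion failure: 0 == rv, at /path/to/ptthread.c:292
NSPR_ASSERT_RE = re.compile(
    r"Assertion failure: (?P<expr>.+?), at (?P<path>\S+):(?P<line>\d+)$"
)

# XPCOM-style line, as seen in the log above:
#   [Child 1228] ###!!! ASSERTION: <message>: '<expr>', file <path>, line <n>
XPCOM_ASSERT_RE = re.compile(
    r"###!!! ASSERTION: (?P<message>.*?): '(?P<expr>[^']*)', "
    r"file (?P<path>\S+), line (?P<line>\d+)"
)

SAMPLE_LOG = """\
WARNING: YOU ARE LEAKING THE WORLD (at least one JSRuntime and everything alive inside it, that is) AT JS_ShutDown TIME.  FIX THIS!
[Child 1228] ###!!! ASSERTION: Component Manager being held past XPCOM shutdown.: 'cnt == 0', file /home/worker/workspace/build/src/xpcom/build/XPCOMInit.cpp, line 1020
nsStringStats
Assertion failure: 0 == rv, at /home/worker/workspace/build/src/nsprpub/pr/src/pthreads/ptthread.c:292
Redirecting call to abort() to mozalloc_abort
"""


def find_assertions(log_text):
    """Return (kind, expression, file name, line) tuples in log order."""
    hits = []
    for raw in log_text.splitlines():
        text = raw.strip()
        for kind, pattern in (("xpcom", XPCOM_ASSERT_RE), ("nspr", NSPR_ASSERT_RE)):
            match = pattern.search(text)
            if match:
                hits.append((
                    kind,
                    match.group("expr"),
                    # Build paths differ per machine, so keep the file name only.
                    posixpath.basename(match.group("path")),
                    int(match.group("line")),
                ))
                break
    return hits


if __name__ == "__main__":
    for hit in find_assertions(SAMPLE_LOG):
        print(hit)
```

Reducing the path to its file name makes hits comparable between the taskcluster and buildbot logs, whose build directories differ.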

Maybe the above information helps? So far the crash volume is very low (it may even have happened only once), so OrangeFactor doesn't track it yet.
Crashstats lists three crashes so far for users (in case those are related):

https://crash-stats.mozilla.com/signature/?signature=PR_GetCurrentThread&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&page=1#reports
Component: Marionette → NSPR
Product: Testing → NSPR
Version: Version 3 → other
Summary: Intermittent test_about_pages.py TestAboutPages.test_navigate_non_remote_about_pages | application crashed [@ mozalloc_abort] [@ PR_GetCurrentThread] → Intermittent test_about_pages.py TestAboutPages.test_navigate_non_remote_about_pages, test_favicon_in_autocomplete.py TestFaviconInAutocomplete.test_favicon_in_autocomplete | application crashed [@ mozalloc_abort] [@ PR_GetCurrentThread]
See Also: → bug 1290294
See Also: → bug 1285749
It looks like a test (or something in the Marionette harness?) could be causing us to leak and then run JS too late in shutdown. Outside of that, it'll be hard to debug this with only the information we have at hand.
Flags: needinfo?(mrbkap)
Is there any kind of logging which could be enabled to get more information about this?
Summary: Intermittent test_about_pages.py TestAboutPages.test_navigate_non_remote_about_pages, test_favicon_in_autocomplete.py TestFaviconInAutocomplete.test_favicon_in_autocomplete | application crashed [@ mozalloc_abort] [@ PR_GetCurrentThread] → Intermittent test_about_pages.py TestAboutPages.test_navigate_non_remote_about_pages, test_favicon_in_autocomplete.py TestFaviconInAutocomplete.test_favicon_in_autocomplete | application crashed [@ mozalloc_abort | PR_GetCurrentThread]
Duplicate of this bug: 1285749
Blocks: 1285749
Blocks: 1304257
See Also: → bug 1308760
See Also: → bug 1308624
See Also: → bug 1309085
See Also: → bug 1309061
Bobby, we are getting this crash way more often now. Please see all the related see also links to other bugs. Given that it is not crashing permanently it's hard to find a regression window. Would there be a chance for someone to have a look at it?
Flags: needinfo?(bobbyholley)
(In reply to Henrik Skupin (:whimboo) from comment #9)
> Bobby, we are getting this crash way more often now. Please see all the
> related see also links to other bugs. Given that it is not crashing
> permanently it's hard to find a regression window. Would there be a chance
> for someone to have a look at it?

I personally don't have cycles to investigate this either now or in the near future.

I also don't think we have any particular reason to believe this is XPConnect-related (though I could be wrong). To make progress here we need a platform hacker with time to investigate.
Flags: needinfo?(bobbyholley)
Bobby, do you have a recommendation who might know about that or at least could help with finding another person?
I actually found a crash for a debug build which might be helpful:
https://treeherder.mozilla.org/logviewer.html#?job_id=3813325&repo=mozilla-aurora

In the corresponding gecko.log I can see an assertion which is then followed by the crash:

http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-aurora/sha512/f638a9b4731b84881b60164c9c40bb5860302c694af626a4c8503e239635209df5055fb5d39684ad071a969a18df6e1b3a1904a22e4e84fc14c4c406c7c010a5

[Child 1978] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file /builds/slave/m-aurora-lx-d-0000000000000000/build/src/xpcom/threads/nsThread.cpp, line 920
[..]
WARNING: YOU ARE LEAKING THE WORLD (at least one JSRuntime and everything alive inside it, that is) AT JS_ShutDown TIME.  FIX THIS!
[Child 1978] ###!!! ASSERTION: Component Manager being held past XPCOM shutdown.: 'cnt == 0', file /builds/slave/m-aurora-lx-d-0000000000000000/build/src/xpcom/build/XPCOMInit.cpp, line 1050
[Child 1978] WARNING: '!compMgr', file /builds/slave/m-aurora-lx-d-0000000000000000/build/src/xpcom/glue/nsComponentManagerUtils.cpp, line 63
Assertion failure: 0 == rv, at /builds/slave/m-aurora-lx-d-0000000000000000/build/src/nsprpub/pr/src/pthreads/ptthread.c:292
Redirecting call to abort() to mozalloc_abort
Hit MOZ_CRASH() at /builds/slave/m-aurora-lx-d-0000000000000000/build/src/memory/mozalloc/mozalloc_abort.cpp:33

The assertion is interestingly the same as what we have seen on bug 1252222 and others.

Maybe that helps a bit?
(In reply to Henrik Skupin (:whimboo) from comment #12)
> 
> Maybe that helps a bit?

That certainly lends credence to comment 6.

(In reply to Henrik Skupin (:whimboo) from comment #11)
> Bobby, do you have a recommendation who might know about that or at least
> could help with finding another person?

People on njn's team, overholt's team, or various other low-level gecko hackers (like mccr8) would fit the bill. Given that it doesn't fall into anyone's particular domain, I would recommend getting manager buy-in from someone (like njn or overholt) about investigating it, and let them help you resource it.

Finding a consistent way to reproduce the bug would help a lot.
See Also: → bug 1311603
See Also: → bug 1311583
Crash Signature: [@ mozalloc_abort] [@ PR_GetCurrentThread] → [@ mozalloc_abort | PR_GetCurrentThread]
Summary: Intermittent test_about_pages.py TestAboutPages.test_navigate_non_remote_about_pages, test_favicon_in_autocomplete.py TestFaviconInAutocomplete.test_favicon_in_autocomplete | application crashed [@ mozalloc_abort | PR_GetCurrentThread] → Random crashes of Firefox while running Marionette tests | application crashed [@ mozalloc_abort | PR_GetCurrentThread]
I checked the OrangeFactor links for all dependent bugs and it looks like this is an e10s only issue, which seems to go back to Firefox 51.0a1.

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #13)
> People on njn's team, overholt's team, or various other low-level gecko
> hackers (like mccr8) would fit the bill. Given that it doesn't fall into
> anyone's particular domain, I would recommend getting manager buy-in from
> someone (like njn or overholt) about investigating it, and let them help you
> resource it.

Nicholas or Andrew, is there anything we can do for this particular crasher? It hits Marionette-related test suites randomly but fairly often. Maybe you can give some hints or suggestions about what else we could enable to get more detailed output.
status-firefox51: --- → affected
status-firefox52: --- → affected
Flags: needinfo?(overholt)
Flags: needinfo?(n.nethercote)
Summary: Random crashes of Firefox while running Marionette tests | application crashed [@ mozalloc_abort | PR_GetCurrentThread] → [e10s] Random crashes of Firefox while running Marionette tests | application crashed [@ mozalloc_abort | PR_GetCurrentThread]
I don't have any particular suggestions, sorry.
Flags: needinfo?(n.nethercote)
Sorry for the delay. I've thought about this and asked around and my guesses are needinfo'd.
Flags: needinfo?(ted)
Flags: needinfo?(overholt)
Flags: needinfo?(jld)
Flags: needinfo?(amarchesini)
Could this be happening after a thread has already called exit() and started to run destructors, so that PR_GetCurrentThread is racing _PR_Fini and the pthread specific key is already destroyed?
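The ordering in that theory can be written down as a toy model. The Python sketch below is not NSPR or libc code: `ToyKeyTable` and `toy_get_current_thread` are hypothetical stand-ins for the pthread-specific key and for the lookup-or-attach behaviour discussed in this bug. The model assumes that storing into a deleted key reports EINVAL; POSIX leaves the use of a deleted key undefined, so real platforms may behave differently.

```python
import errno


class ToyKeyTable:
    """Toy stand-in for a single pthread-specific key (hypothetical)."""

    def __init__(self):
        self.key_alive = True
        self.slots = {}  # thread name -> per-thread data

    def get_specific(self, thread):
        # A deleted key has no data for any thread.
        if not self.key_alive:
            return None
        return self.slots.get(thread)

    def set_specific(self, thread, value):
        # Assumption of this model: a deleted key is rejected with EINVAL.
        if not self.key_alive:
            return errno.EINVAL
        self.slots[thread] = value
        return 0

    def delete_key(self):
        # Models shutdown code tearing the key down.
        self.key_alive = False
        self.slots.clear()


def toy_get_current_thread(table, thread):
    """Look up per-thread data, attaching the thread if the slot is empty."""
    data = table.get_specific(thread)
    if data is None:
        data = {"name": thread}
        rv = table.set_specific(thread, data)
        assert rv == 0, "0 == rv"  # mirrors the assertion text in gecko.log
    return data


if __name__ == "__main__":
    table = ToyKeyTable()
    toy_get_current_thread(table, "watchdog")      # normal operation
    table.delete_key()                             # shutdown wins the race
    try:
        toy_get_current_thread(table, "watchdog")  # watchdog wakes up late
    except AssertionError as exc:
        print("Assertion failure:", exc)
```

In this model the lookup only fails when a still-running thread asks for its thread data after the key has been deleted, which would fit a watchdog thread that is still waiting on its condition variable while the process is already exiting.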
Flags: needinfo?(jld)
That sounds like as plausible a theory as anything. I have bug 1282862 open in another tab, which has a similarly inscrutable crash that seems to be happening at shutdown, which might have the same root cause.
Flags: needinfo?(ted)
Blocks: 1314877
I just got an instance of this on a Try run, and this time the crash report actually shows a thread in the middle of exit() running destructors: https://treeherder.mozilla.org/logviewer.html#?job_id=30866804&repo=try#L4090
Blocks: 1276662
Blocks: 1252222
Blocks: 1320601
Blocks: 1291973
All the reported test failures marked as dependent on this bug have gone away. There have been no more crashes in the past couple of months, so this issue should have been fixed by some other patch that has already landed. Closing as WFM.
Status: NEW → RESOLVED
Last Resolved: 9 months ago
Flags: needinfo?(amarchesini)
Resolution: --- → WORKSFORME
Blocks: 1299179