Closed Bug 845190 Opened 11 years ago Closed 11 years ago

Intermittent test_645970.js,test_logging.js,test_startup_session_sync.js,test_startup_nosession_sync.js | application crashed [@ mozalloc_abort(char const*)] from ASSERTION: thread pool wasn't shutdown

Categories

(Core :: DOM: Workers, defect)

x86
All
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla27
Tracking Status
firefox25 --- wontfix
firefox26 --- wontfix
firefox27 --- fixed
firefox-esr24 --- wontfix

People

(Reporter: philor, Assigned: Yoric)

References

(Depends on 1 open bug)

Details

(Keywords: assertion, crash, intermittent-failure)

Attachments

(1 file, 16 obsolete files)

2.06 KB, patch
Yoric
: review+
Details | Diff | Splinter Review
https://tbpl.mozilla.org/php/getParsedLog.php?id=20090551&tree=Mozilla-Inbound
Ubuntu 12.04 mozilla-inbound debug test xpcshell on 2013-02-25 20:36:43 PST for push 70814e7dbcb8
slave: tst-linux32-ec2-056

21:46:56     INFO -  TEST-PASS | /builds/slave/test/build/tests/xpcshell/tests/toolkit/components/search/tests/xpcshell/test_645970.js | [run_test : 23] [xpconnect wrapped nsISearchEngine @ 0x9e367e8 (native @ 0x9e365b8)] != null
21:46:56     INFO -  *** Search: SRCH_SVC_saveSortedEngineList: starting
21:46:56     INFO -  TEST-INFO | (xpcshell/head.js) | test 1 finished
21:46:56     INFO -  TEST-INFO | (xpcshell/head.js) | exiting test
21:46:56     INFO -  TEST-PASS | (xpcshell/head.js) | 1 (+ 1) check(s) passed
21:46:56     INFO -  TEST-INFO | (xpcshell/head.js) | 0 check(s) todo
21:46:56     INFO -  WARNING: nsExceptionService ignoring thread destruction after shutdown: file ../../../xpcom/base/nsExceptionService.cpp, line 167
21:46:56     INFO -  ###!!! ASSERTION: thread pool wasn't shutdown: '!mPool', file ../../../../netwerk/base/src/nsStreamTransportService.cpp, line 435
21:46:56     INFO -  UNKNOWN [/builds/slave/test/build/application/firefox/libxul.so +0x003425C9]
21:46:56     INFO -  UNKNOWN [/builds/slave/test/build/application/firefox/libxul.so +0x00342674]
21:46:56     INFO -  UNKNOWN [/builds/slave/test/build/application/firefox/libxul.so +0x003F948F]
21:46:56     INFO -  UNKNOWN [/builds/slave/test/build/application/firefox/libxul.so +0x015C46E6]
21:46:56     INFO -  UNKNOWN [/builds/slave/test/build/application/firefox/libxul.so +0x01608089]
21:46:56     INFO -  UNKNOWN [/builds/slave/test/build/application/firefox/libxul.so +0x016050B1]
21:46:56     INFO -  UNKNOWN [/builds/slave/test/build/application/firefox/libxul.so +0x015CA3AF]
21:46:56     INFO -  UNKNOWN [/builds/slave/test/build/application/firefox/libxul.so +0x01605B47]
21:46:56     INFO -  UNKNOWN [/builds/slave/test/build/application/firefox/libxul.so +0x01605BB4]
21:46:56     INFO -  UNKNOWN [/builds/slave/test/build/application/firefox/libxul.so +0x015CFB67]
21:46:56     INFO -  NS_ShutdownXPCOM+0x0000001A [/builds/slave/test/build/application/firefox/libxpcom.so +0x00001DD6]
21:46:56     INFO -  UNKNOWN [/builds/slave/test/build/application/firefox/xpcshell +0x00009388]
21:46:56     INFO -  ###!!! ASSERTION: thread pool wasn't shutdown: '!mPool', file ../../../../netwerk/base/src/nsStreamTransportService.cpp, line 435
21:46:56     INFO -  <<<<<<<
21:47:16     INFO -  PROCESS-CRASH | /builds/slave/test/build/tests/xpcshell/tests/toolkit/components/search/tests/xpcshell/test_645970.js | application crashed [@ mozalloc_abort(char const*)]
21:47:16     INFO -  Crash dump filename: /builds/slave/test/build/tests/xpcshell/tests/toolkit/components/search/tests/xpcshell/24d47b23-07e3-f49d-4b785bfa-18f81c59.dmp
21:47:16     INFO -  Operating system: Linux
21:47:16     INFO -                    0.0.0 Linux 3.2.0-23-generic-pae #36-Ubuntu SMP Tue Apr 10 22:19:09 UTC 2012 i686
21:47:16     INFO -  CPU: x86
21:47:16     INFO -       GenuineIntel family 6 model 26 stepping 5
21:47:16     INFO -       1 CPU
21:47:16     INFO -  Crash reason:  SIGSEGV
21:47:16     INFO -  Crash address: 0x0
21:47:16     INFO -  Thread 0 (crashed)
21:47:16     INFO -   0  libmozalloc.so!mozalloc_abort(char const*) [mozalloc_abort.cpp : 30 + 0x0]
21:47:16     INFO -      eip = 0xb7722f0d   esp = 0xbfa13ec0   ebp = 0xbfa13ed8   ebx = 0xb7724120
21:47:16     INFO -      esi = 0xb4868d9c   edi = 0xbfa13f08   eax = 0x0000000a   ecx = 0xffffffff
21:47:16     INFO -      edx = 0xb48698ac   efl = 0x00210282
21:47:16     INFO -      Found by: given as instruction pointer in context
21:47:16     INFO -   1  libxul.so!NS_DebugBreak_P [nsDebugImpl.cpp:70814e7dbcb8 : 422 + 0x5]
21:47:16     INFO -      eip = 0xb5ffd692   esp = 0xbfa13ee0   ebp = 0xbfa14318
21:47:16     INFO -      Found by: previous frame's frame pointer
21:47:16     INFO -   2  libxul.so!nsStreamTransportService::~nsStreamTransportService [nsStreamTransportService.cpp:70814e7dbcb8 : 435 + 0x23]
21:47:16     INFO -      eip = 0xb4d265c9   esp = 0xbfa14320   ebp = 0xbfa14348
21:47:16     INFO -      Found by: previous frame's frame pointer
21:47:16     INFO -   3  libxul.so!nsStreamTransportService::Release() [nsStreamTransportService.cpp:70814e7dbcb8 : 457 + 0x5]
21:47:16     INFO -      eip = 0xb4d26674   esp = 0xbfa14350   ebp = 0xbfa14378
21:47:16     INFO -      Found by: previous frame's frame pointer
21:47:16     INFO -   4  libxul.so!nsCOMPtr_base::assign_assuming_AddRef(nsISupports*) [nsCOMPtr.h:70814e7dbcb8 : 442 + 0x7]
21:47:16     INFO -      eip = 0xb4ddd48f   esp = 0xbfa14380   ebp = 0xbfa143a8
21:47:16     INFO -      Found by: previous frame's frame pointer
21:47:16     INFO -   5  libxul.so!nsCOMPtr_base::assign_with_AddRef(nsISupports*) [nsCOMPtr.cpp:70814e7dbcb8 : 49 + 0x8]
21:47:16     INFO -      eip = 0xb5fa86e6   esp = 0xbfa143b0   ebp = 0xbfa143c8
21:47:16     INFO -      Found by: previous frame's frame pointer
21:47:16     INFO -   6  libxul.so!FreeFactoryEntries [nsCOMPtr.h:70814e7dbcb8 : 947 + 0x9]
21:47:16     INFO -      eip = 0xb5fec089   esp = 0xbfa143d0   ebp = 0xbfa143e8
21:47:16     INFO -      Found by: previous frame's frame pointer
21:47:16     INFO -   7  libxul.so!nsBaseHashtable<nsIDHashKey, nsFactoryEntry*, nsFactoryEntry*>::s_EnumReadStub(PLDHashTable*, PLDHashEntryHdr*, unsigned int, void*) [nsBaseHashtable.h:70814e7dbcb8 : 400 + 0x2]
21:47:16     INFO -      eip = 0xb5fe90b1   esp = 0xbfa143f0   ebp = 0xbfa14408
21:47:16     INFO -      Found by: previous frame's frame pointer
21:47:16     INFO -   8  libxul.so!PL_DHashTableEnumerate [pldhash.cpp:70814e7dbcb8 : 717 + 0x11]
21:47:16     INFO -      eip = 0xb5fae3af   esp = 0xbfa14410   ebp = 0xbfa14458
21:47:16     INFO -      Found by: previous frame's frame pointer
josh, jduell: this is one of our most frequent intermittent orange bugs. We're tripping an assertion in nsStreamTransportService:
###!!! ASSERTION: thread pool wasn't shutdown: '!mPool', file ../../../../netwerk/base/src/nsStreamTransportService.cpp, line 435

in this xpcshell test:
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/search/tests/xpcshell/test_645970.js

That test is manually sending out a "quit-application" observer service notification, so I wonder if it's screwing up nsStreamTransportService's expectations around shutdown ordering?

Can you take a look at this (or find someone else who can)?
Josh: can you get someone to take a look at this? See comment 103 for my brief synopsis.
Flags: needinfo?(joshmoz)
Test disabled for Linux debug for now, until we get get responses to comment 103 / comment 115):
https://hg.mozilla.org/integration/mozilla-inbound/rev/509b300bda68
Whiteboard: [test disabled on Linux debug][leave open]
Looking into an owner now.
Flags: needinfo?(joshmoz)
Flags: needinfo?(joshmoz)
Flags: needinfo?(joshmoz)
nsStreamTransportService gets first instantiated after NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID.


When doing check-interactive of the failing test, after test has run and I type quit() I'm getting the following stack (windows):

 	xul.dll!nsStreamTransportService::Init()  Line 441	C++
 	xul.dll!nsStreamTransportServiceConstructor(nsISupports * aOuter=0x00000000, const nsID & aIID={...}, void * * aResult=0x0034f568)  Line 67 + 0x79 bytes	C++
 	xul.dll!mozilla::GenericFactory::CreateInstance(nsISupports * aOuter=0x00000000, const nsID & aIID={...}, void * * aResult=0x0034f568)  Line 16 + 0x14 bytes	C++
 	xul.dll!nsComponentManagerImpl::CreateInstance(const nsID & aClass={...}, nsISupports * aDelegate=0x00000000, const nsID & aIID={...}, void * * aResult=0x0034f568)  Line 1009 + 0x25 bytes	C++
 	xul.dll!nsComponentManagerImpl::GetService(const nsID & aClass={...}, const nsID & aIID={...}, void * * result=0x0034f5c0)  Line 1257 + 0x34 bytes	C++
 	xul.dll!CallGetService(const nsID & aCID={...}, const nsID & aIID={...}, void * * aResult=0x0034f5c0)  Line 52	C++
 	xul.dll!nsGetServiceByCIDWithError::operator()(const nsID & aIID={...}, void * * aInstancePtr=0x0034f5c0)  Line 234 + 0x13 bytes	C++
 	xul.dll!nsCOMPtr<nsIStreamTransportService>::assign_from_gs_cid_with_error(const nsGetServiceByCIDWithError & gs={...}, const nsID & aIID={...})  Line 1193 + 0xf bytes	C++
 	xul.dll!nsCOMPtr<nsIStreamTransportService>::nsCOMPtr<nsIStreamTransportService>(const nsGetServiceByCIDWithError & gs={...})  Line 581	C++
 	xul.dll!nsInputStreamPump::AsyncRead(nsIStreamListener * listener=0x0427aa98, nsISupports * ctxt=0x00000000)  Line 307	C++
 	xul.dll!nsBaseChannel::BeginPumpingData()  Line 223 + 0x3a bytes	C++
 	xul.dll!nsBaseChannel::AsyncOpen(nsIStreamListener * listener=0x04a6c988, nsISupports * ctxt=0x02b61480)  Line 600 + 0xb bytes	C++
>	xul.dll!`anonymous namespace'::ScriptLoaderRunnable::RunInternal()  Line 324 + 0x34 bytes	C++
 	xul.dll!`anonymous namespace'::ScriptLoaderRunnable::Run()  Line 159 + 0xb bytes	C++
 	xul.dll!nsThread::ProcessNextEvent(bool mayWait=false, bool * result=0x0034f7ff)  Line 627 + 0x19 bytes	C++
 	xul.dll!NS_ProcessPendingEvents_P(nsIThread * thread=0x01d10750, unsigned int timeout=4294967295)  Line 188 + 0x14 bytes	C++
 	xul.dll!mozilla::ShutdownXPCOM(nsIServiceManager * servMgr=0x00000000)  Line 564	C++
 	xul.dll!NS_ShutdownXPCOM_P(nsIServiceManager * servMgr=0x00000000)  Line 519 + 0x9 bytes	C++
 	xpcom.dll!NS_ShutdownXPCOM(nsIServiceManager * svcMgr=0x00000000)  Line 134 + 0xa bytes	C++
 	xpcshell.exe!NS_internal_main(int argc=24, char * * argv=0x01c4f6bc, char * * envp=0x01c431b8)  Line 1973 + 0x8 bytes	C++
 	xpcshell.exe!wmain(int argc=31, wchar_t * * argv=0x01cfbc68)  Line 109 + 0x16 bytes	C++


This is just before NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID is fired.  I can also see the following stack (events fired from inside call to NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID notification:



>	xul.dll!`anonymous namespace'::ScriptLoaderRunnable::RunInternal()  Line 243	C++
 	xul.dll!`anonymous namespace'::ScriptLoaderRunnable::Run()  Line 159 + 0xb bytes	C++
 	xul.dll!nsThread::ProcessNextEvent(bool mayWait=true, bool * result=0x0034f53b)  Line 627 + 0x19 bytes	C++
 	xul.dll!NS_ProcessNextEvent_P(nsIThread * thread=0x01d10750, bool mayWait=true)  Line 238 + 0x17 bytes	C++
 	xul.dll!mozilla::dom::workers::RuntimeService::Cleanup()  Line 1192 + 0xe bytes	C++
 	xul.dll!mozilla::dom::workers::RuntimeService::Observe(nsISupports * aSubject=0x00000000, const char * aTopic=0x556e7cdc, const wchar_t * aData=0x00000000)  Line 1417	C++
 	xul.dll!nsObserverList::NotifyObservers(nsISupports * aSubject=0x00000000, const char * aTopic=0x556e7cdc, const wchar_t * someData=0x00000000)  Line 100	C++
 	xul.dll!nsObserverService::NotifyObservers(nsISupports * aSubject=0x00000000, const char * aTopic=0x556e7cdc, const wchar_t * someData=0x00000000)  Line 164	C++
 	xul.dll!mozilla::ShutdownXPCOM(nsIServiceManager * servMgr=0x00000000)  Line 570	C++
 	xul.dll!NS_ShutdownXPCOM_P(nsIServiceManager * servMgr=0x00000000)  Line 519 + 0x9 bytes	C++
 	xpcom.dll!NS_ShutdownXPCOM(nsIServiceManager * svcMgr=0x00000000)  Line 134 + 0xa bytes	C++
 	xpcshell.exe!NS_internal_main(int argc=24, char * * argv=0x01c4f6bc, char * * envp=0x01c431b8)  Line 1973 + 0x8 bytes	C++
 	xpcshell.exe!wmain(int argc=31, wchar_t * * argv=0x01cfbc68)  Line 109 + 0x16 bytes	C++


If there is some specific change in order of observers to NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID on Ubuntu or it's just a timing issue of some kind, I don't know.

I don't see any bug that would be landed around the time this started to be so intensive, maybe except Bug 822399 (cc Olli).  That is a wild theory only.


This is definitely not caused by some network change.
(In reply to Honza Bambas (:mayhemer) from comment #180)
> When doing check-interactive of the failing test, after test has run and I
> type quit() I'm getting the following stack (windows):

I've seen an issue which looked imilar to this one while working on a different bug. That issue was fixed in bug 835730 and boiled down to this: nsBrowserGlue was tearing down some stuff when receiving an xpcom-shutdown event; those components were initialized during profile loading and in some tests this wasn't happening. In the end various shutdown functions were called on components that had never been initialized leading to an error very much like this one when manually calling _quit() interactively in the affected xpcshell test.

Since the problem might be something similar I'll have a look later today and see if can track it down.
Thanks, Gabriele.
I've gathered a few more clues about this failure and they point in a direction similar my previous experience:

- The assertion is triggered by executing a script (osfile_async_worker.js) as part of an xpcom-shutdown observer after ShutdownXPCOM() has been called

- This is causing a ScriptLoaderRunnable class to be instanced three times, it is from here that the nsStreamTransportService code is being called. The first time it runs, the |mPool| field is non-null and everything goes fine, this is during ShutdownXPCOM(). The second and third times it's in the "xpcom-shutdown" observer and that's where the assertion is being triggered

- In every case this is happening in a DOM worker thread, this is created here as part of the test:

https://mxr.mozilla.org/mozilla-central/source/toolkit/components/search/nsSearchService.js#2708

      let promise = OS.File.writeAtomic(path, data, { tmpPath: path + ".tmp"});

So my diagnose is that some code is hooking the "xpcom-shutdown" event and running some code that has already been run once cleaning up after itself so the following runs are causing the issue. The culprit seems to lie somewhere in the OS.File module so that's where I'll be looking next.
https://tbpl.mozilla.org/php/getParsedLog.php?id=21505860&tree=Mozilla-Inbound
Summary: Intermittent test_645970.js | application crashed [@ mozalloc_abort(char const*)] from ASSERTION: thread pool wasn't shutdown → Intermittent test_645970.js,test_logging.js | application crashed [@ mozalloc_abort(char const*)] from ASSERTION: thread pool wasn't shutdown
It looks like enabling the ECMAScript Internationalization API (part of SpiderMonkey, uses the big third-party ICU library) may make this failure permanent instead of intermittent on Windows - I see two complete sets of failures on two try runs:
April 9: https://tbpl.mozilla.org/?tree=Try&rev=4cc524a3f8ae
April 8: https://tbpl.mozilla.org/?tree=Try&rev=1fc6411a138d

I don't see these failures on earlier try runs:
April 4: https://tbpl.mozilla.org/?tree=Try&rev=7aa110e2dc60
so something else must have happened between April 4 and April 8 (the only differences on my side are restructured configure/make files).

To reproduce the failures, apply patches b2470ed032d1 and edfeec4def28 of the first-listed try run to current m-c.
(In reply to Gabriele Svelto [:gsvelto] from comment #183)
> I've gathered a few more clues about this failure and they point in a
> direction similar my previous experience:
> 
> - The assertion is triggered by executing a script (osfile_async_worker.js)
> as part of an xpcom-shutdown observer after ShutdownXPCOM() has been called
> 
> - This is causing a ScriptLoaderRunnable class to be instanced three times,
> it is from here that the nsStreamTransportService code is being called. The
> first time it runs, the |mPool| field is non-null and everything goes fine,
> this is during ShutdownXPCOM(). The second and third times it's in the
> "xpcom-shutdown" observer and that's where the assertion is being triggered
> 
> - In every case this is happening in a DOM worker thread, this is created
> here as part of the test:
> 
> https://mxr.mozilla.org/mozilla-central/source/toolkit/components/search/
> nsSearchService.js#2708
> 
>       let promise = OS.File.writeAtomic(path, data, { tmpPath: path +
> ".tmp"});
> 
> So my diagnose is that some code is hooking the "xpcom-shutdown" event and
> running some code that has already been run once cleaning up after itself so
> the following runs are causing the issue. The culprit seems to lie somewhere
> in the OS.File module so that's where I'll be looking next.

There is some code hooking web-workers-shutdown in OS.File: 
http://dxr.mozilla.org/mozilla-central/toolkit/components/osfile/osfile_async_front.jsm#l152

Could this be the cause?
Blocks: 853301
(In reply to Norbert Lindenberg from comment #192)
> It looks like enabling the ECMAScript Internationalization API (part of
> SpiderMonkey, uses the big third-party ICU library) may make this failure
> permanent instead of intermittent on Windows [...]

Interesting, I hadn't tried on Windows yet but I could reproduce it fairly consistently on Linux already.

(In reply to David Rajchenbach Teller [:Yoric] from comment #193)
> There is some code hooking web-workers-shutdown in OS.File: 
> http://dxr.mozilla.org/mozilla-central/toolkit/components/osfile/
> osfile_async_front.jsm#l152
> 
> Could this be the cause?

It might, thanks for the heads up.
It seems to me that there are two separate bugs - the failure logs above are always either Linux/test_645970.js or Windows/test_logging.js, never Linux/test_logging.js or Windows/test_645970.js. The ones I've seen in my try runs were always of the Windows/test_logging.js variant. Note that test_logging.js is a new test which reached mozilla-central only on April 5.
Partial fix.
Attachment #736684 - Flags: review?(jonathan.laver)
Comment on attachment 736684 [details] [diff] [review]
Made test_logging.js correctly ignore messages it wants to ignore

Review of attachment 736684 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.
Comment on attachment 736684 [details] [diff] [review]
Made test_logging.js correctly ignore messages it wants to ignore

Chiming in: rabbidrabbit doesn't have review rights on this part of the code, although he authored it.
Attachment #736684 - Flags: review?(jonathan.laver) → feedback?(jonathan.laver)
Comment on attachment 736684 [details] [diff] [review]
Made test_logging.js correctly ignore messages it wants to ignore

Review of attachment 736684 [details] [diff] [review]:
-----------------------------------------------------------------

Patch looks like it will fix something. I do not understand the cause of the crash well enough to know whether this will be sufficient, though.
Attachment #736684 - Flags: review+
(In reply to Gabriele Svelto [:gsvelto] from comment #183)
> - In every case this is happening in a DOM worker thread, this is created
> here as part of the test:
> 
> https://mxr.mozilla.org/mozilla-central/source/toolkit/components/search/
> nsSearchService.js#2708
> 
>       let promise = OS.File.writeAtomic(path, data, { tmpPath: path +
> ".tmp"});

Do I understand correctly that this code is executed upon shutdown?
If so, we have a weird race condition, because this is code that is meant to be executed during initialization.
Flags: needinfo?(jwalden+bmo)
(In reply to David Rajchenbach Teller [:Yoric] from comment #202)
> Do I understand correctly that this code is executed upon shutdown?
> If so, we have a weird race condition, because this is code that is meant to
> be executed during initialization.

No, it's more complicated and re-reading my comment I didn't describe the mechanism very well :-) This code only triggers the creation of a DOM Worker thread; then during shutdown this thread goes on to execute another script which is causing the problem. I was able to analyze the problem better and I found where the whole chain starts, it's here in RuntimeService::Cleanup():

https://mxr.mozilla.org/mozilla-central/source/dom/workers/RuntimeService.cpp#1127

Which is being called by an "xpcom-shutdown" observer (which is what I first saw when testing):

https://mxr.mozilla.org/mozilla-central/source/dom/workers/RuntimeService.cpp#1419

So for some reason the following is happening:

- OS.File starts up, creates a DOM worker and registers the "web-workers-shutdown" event
- During shutdown the "xpcom-shutdown" event is dispatched
- RuntimeService picks it up, calls Cleanup() which sends "web-workers-shutdown"
- osfile_async_front.jsm:webWorkersShutdownObserver() is invoked in response to the event
- This sends a "System_shutdown" message to the worker
- The shutdown code in osfile_async_worker.js is called, this is what's triggering the execution of ScriptLoaderRunnable

Now, the first time this runs it's fine; for some reason however it seems that the last four steps are repeated more than once causing the web worker to be shutdown more than once and that trips the assertion.
Gabriele, if your analysis is correct (and it looks correct), this should fix the bug.
Attachment #737865 - Flags: review?(nfroyd)
Attachment #737865 - Flags: feedback?(yura.zenevich)
Flags: needinfo?(jwalden+bmo)
Attachment #737865 - Flags: feedback?(yura.zenevich) → feedback+
Comment on attachment 737865 [details] [diff] [review]
Unregistering the shutdown observer after the first "web-workers-shutdown"

Review of attachment 737865 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/osfile/osfile_async_front.jsm
@@ +183,5 @@
> +  if (aTopic == WEB_WORKERS_SHUTDOWN_TOPIC) {
> +    // Failing to unregister can cause crashes in some tests
> +    // (probably not in production), see bug 845190
> +    Services.obs.removeObserver(webWorkersShutdownObserver, WEB_WORKERS_SHUTDOWN_TOPIC);
> +  }

I don't think you really need the commit; unregistering observers is pretty standard.

But why not unregister TEST_WEB_WORKERS_SHUTDOWN_TOPIC?  Actually, why is *registering* the observer on the test topic even done unconditionally in this file?

@@ +203,5 @@
>    });
>  };
>  
>  // Attaching an observer listening to the "web-workers-shutdown".
> +// The observer is removed once "web-workers-shutdown" is received.

No comment necessary.  The previous one could go too.
Attachment #737865 - Flags: review?(nfroyd) → feedback+
(In reply to Nathan Froyd (:froydnj) from comment #208)
> Comment on attachment 737865 [details] [diff] [review]
> Unregistering the shutdown observer after the first "web-workers-shutdown"
> 
> Review of attachment 737865 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: toolkit/components/osfile/osfile_async_front.jsm
> @@ +183,5 @@
> > +  if (aTopic == WEB_WORKERS_SHUTDOWN_TOPIC) {
> > +    // Failing to unregister can cause crashes in some tests
> > +    // (probably not in production), see bug 845190
> > +    Services.obs.removeObserver(webWorkersShutdownObserver, WEB_WORKERS_SHUTDOWN_TOPIC);
> > +  }
> 
> I don't think you really need the commit; unregistering observers is pretty
> standard.

Ok with s/commit/comment/.

> But why not unregister TEST_WEB_WORKERS_SHUTDOWN_TOPIC?

Well, because I do not want to enforce the fact that we are only ever going to run a single test.

> Actually, why is
> *registering* the observer on the test topic even done unconditionally in
> this file?

Because, at this stage, we don't really have a way to determine whether we are being executed in a testing setting. I guess we could wrap this in a preference, but I would rather do this as a followup, if that's ok with you.

> 
> @@ +203,5 @@
> >    });
> >  };
> >  
> >  // Attaching an observer listening to the "web-workers-shutdown".
> > +// The observer is removed once "web-workers-shutdown" is received.
> 
> No comment necessary.  The previous one could go too.

Ok.
Comment on attachment 737865 [details] [diff] [review]
Unregistering the shutdown observer after the first "web-workers-shutdown"

(In reply to David Rajchenbach Teller [:Yoric] from comment #209)
> (In reply to Nathan Froyd (:froydnj) from comment #208)
> > I don't think you really need the commit; unregistering observers is pretty
> > standard.
> 
> Ok with s/commit/comment/.

Right, thank you!

> > But why not unregister TEST_WEB_WORKERS_SHUTDOWN_TOPIC?
> 
> Well, because I do not want to enforce the fact that we are only ever going
> to run a single test.
> 
> > Actually, why is
> > *registering* the observer on the test topic even done unconditionally in
> > this file?
> 
> Because, at this stage, we don't really have a way to determine whether we
> are being executed in a testing setting. I guess we could wrap this in a
> preference, but I would rather do this as a followup, if that's ok with you.

You know how much I like followup bugs. :)  That's fine.

r=me with the changes.
Attachment #737865 - Flags: feedback+ → review+
The combination of the two patches available so far still does not fix the perma-orange on Windows with the internationalization API enabled:
https://tbpl.mozilla.org/?tree=Try&rev=e9a5f999214f
OS: Linux → All
I've tried applying the patch and the test passes correctly however now I see this when running on Linux:

 0:00.40 file:///home/gsvelto/projects/build/firefox-debug/dist/bin/components/nsHandlerService.js:891: NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIProperties.get]
 0:00.40 JS Component Loader: ERROR /home/gsvelto/projects/mozilla-central/testing/xpcshell/head.js:941
 0:00.40                      TypeError: Components is undefined
 0:00.40 file:///home/gsvelto/projects/build/firefox-debug/dist/bin/components/nsHandlerService.js:891: NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIProperties.get]
 0:00.40 JS Component Loader: ERROR /home/gsvelto/projects/mozilla-central/testing/xpcshell/head.js:944
 0:00.40                      TypeError: Components is undefined
 0:00.40 LoadPlugin() /home/gsvelto/projects/build/firefox-debug/dom/plugins/test/testplugin/libnptest.so returned 7fb2a3bde8b0
 0:00.40 LoadPlugin() /home/gsvelto/projects/build/firefox-debug/dom/plugins/test/testplugin/secondplugin/libnpsecondtest.so returned 7fb2a7faad30
 0:00.40 WARNING: NS_ENSURE_TRUE(mPool) failed: file /home/gsvelto/projects/mozilla-central/netwerk/base/src/nsStreamTransportService.cpp, line 465
 0:00.40 WARNING: unable to post continuation event: file /home/gsvelto/projects/mozilla-central/xpcom/io/nsStreamUtils.cpp, line 438
 0:00.40 WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0xC1F30001: file /home/gsvelto/projects/mozilla-central/dom/workers/ScriptLoader.cpp, line 323
 0:00.40 WORKER ERROR DURING SETUP Error: Failed to load script: resource://gre/modules/osfile.jsm (nsresult = 0xc1f30001)
 0:00.40 WORKER ERROR DETAIL @resource://gre/modules/osfile/osfile_async_worker.js:16
 0:00.40 @resource://gre/modules/osfile/osfile_async_worker.js:392

Breaking-in with GDB shows that the issue is different, yet a I can still see a worker thread been spun up during shutdown which is then torn down and triggers the warning.
No longer blocks: 862570
Depends on: 862570
The patches are being collapsed into bug 862570.
Attachment #736684 - Flags: checkin?(jwalden+bmo)
Comment on attachment 736684 [details] [diff] [review]
Made test_logging.js correctly ignore messages it wants to ignore

https://hg.mozilla.org/integration/mozilla-inbound/rev/8a2536988191
Attachment #736684 - Flags: checkin?(jwalden+bmo)
Cc'ing Taras and Dougt for any insights, also for point of view on disabling this test and opening a bug to diagnose and re-enable. That would let Waldo land the JS Intl code, so we can make fwd progress there.

/be
We expected that bug 862570 would fix the issue. It seems that this is not the case.
And this explosion of instances is from yet another JS patch, https://hg.mozilla.org/integration/mozilla-inbound/rev/75699d465ba0 from bug 864085, making it permaorange on Windows.

I might be making the wrong call, but I think it's time to disable it.
Disabled on Windows debug in https://hg.mozilla.org/integration/mozilla-inbound/rev/65463f2f7ba9
Whiteboard: [test disabled on Linux debug][leave open] → [test disabled on Linux debug][test_logging.js disabled on Windows debug][leave open]
I'm currently neck deep in bug 838577, but as soon as I have some respite (hopefully this week), I'll resume investigating this bug.
I'm also stuck fixing hard-to-reproduce B2G crashes but as soon as I'm done I'll be trying to further the investigation I was doing in comment 213. I'm still convinced that the cause is some code that is spawning a worker thread while shutdown is already undergoing. Last time I hit an issue like this it was because of a non-initialized subsystem being shut down but this time it looks different.
I've had some time to investigate this further and I can confirm two things:

- The issue is the same on Windows and Linux so this is not platform-specific

- The assertion is triggered by the following chain: ShutdownXPCOM() sends an xpcom-shutdown event to all observers and then spins the event loop with NS_ProcessPendingEvents() to let them run. During this process a new worker thread is spawned; afterwards the xpcom-shutdown-threads event is sent and at this stage RuntimeService::Cleanup() spins the event loop again causing the worker to run and fail with the assertion.
Note: Comment 289 onwards was caused by the landing of bug 860965, which has since been backed out.
Ok, I think that I got it, thanks to comment 274.

I believe that the problem is due to the lazy initialization of worker threads. In a few situations, we end up sending our first message to the OS.File worker thread during shutdown.

In both cases, we import osfile.jsm (which registers a shutdown observer which sends a message to the worker thread to collect information on resource leaks) but never actually perform any I/O (hence never initialize the worker thread before shutdown).

This patch does two things:
- skip this reporting if the worker thread has never been initialized (which is semantically correct - a worker that has never been initialized cannot leak file descriptors);
- throw an error if we attempt to send any other message during shutdown to a yet uninitialized OS.File worker thread.
Attachment #736684 - Attachment is obsolete: true
Attachment #738449 - Attachment is obsolete: true
Attachment #736684 - Flags: feedback?(jonathan.laver)
Attachment #748808 - Flags: review?(nfroyd)
Comment on attachment 748808 [details] [diff] [review]
Better handling of attempts to initialize the OS.File worker thread during shutdown.

Review of attachment 748808 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/osfile/osfile_async_front.jsm
@@ +241,5 @@
>      if (msg) {
>        LOG("WARNING: File descriptors leaks detected.\n" + msg);
>      }
>    });
> +  gShuttingDown = true;

Setting gShuttingDown should be moved further up, before the check of Scheduler.initialized, right?  Otherwise you're never going to hit the error case you added in Scheduler.post.  Or maybe it should even be moved into the conditional for WEB_WORKERS_SHUTDOWN_TOPIC; I don't really understand why this function would have code specially for WEB_WORKERS_SHUTDOWN_TOPIC and then separate code for every other topic.
(In reply to David Rajchenbach Teller [:Yoric] from comment #311)
> Ok, I think that I got it, thanks to comment 274.
> 
> I believe that the problem is due to the lazy initialization of worker
> threads. In a few situations, we end up sending our first message to the
> OS.File worker thread during shutdown.

From what I could see the problem is even deeper. It seems that by the time we hit ShutdownXPCOM() in this test we haven't even yet loaded the worker's JS code. Armed with a lot of patience and stepping carefully through the code I managed to unravel this chain of events:

- nsSearchService.js tries to do an atomic write immediately after the test has executed:

https://mxr.mozilla.org/mozilla-central/source/toolkit/components/search/nsSearchService.js#2703

- This causes the file handling code to spawn a new DOM Worker in _PromiseWorker.jsm, this is done by posting a runnable to via RuntimeService::ScheduleWorker() (this all happens in the main thread)

https://mxr.mozilla.org/mozilla-central/source/toolkit/components/osfile/_PromiseWorker.jsm#82

- The test finishes and then the WorkerThreadRunnable gets a chance to run and creates the worker. The worker now runs and tries to load the script it needs to execute by invoking LoadWorkerScript() this in turn sends a ScriptLoaderRunnable to execute in the main thread.

- Before the ScriptLoaderRunnable runnable gets a chance to run, the test quits and calls ShutdownXPCOM() which notifies all xpcom-shutdown observers and then spins the event loop here:

https://mxr.mozilla.org/mozilla-central/source/xpcom/build/nsXPComInit.cpp#583

- At this stage the ScriptLoaderRunnable starts to run and tries to load the worker's JS code but it's now already too late. It will fail in multiple places which you can see in the log if you run the test, this is the first related error that crops up:

file:///home/gsvelto/projects/build/firefox-debug/dist/bin/components/nsHandlerService.js:895: NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIProperties.get]
JS Component Loader: ERROR /home/gsvelto/projects/mozilla-central/testing/xpcshell/head.js:957
                     TypeError: Components is undefined

What's causing it is the system trying to establish the MIME type of the source file which is being loaded for the worker. Things basically go south from here but the actual problem is that the worker's JS code has not been loaded yet before we hit ShutdownXPCOM(), so this is just a side-effect.
(In reply to Ed Morley [:edmorley UTC+1] from comment #310)
> Note: Comment 289 onwards was caused by the landing of bug 860965, which has
> since been backed out.

I'm not too familiar with the XPCOM side of things -- can someone shed light on why landing a patch that only shuffles some self-hosted JavaScript code around causes this particular xpcshell failure? I suppose it changed event loop timings?
(In reply to Shu-yu Guo [:shu] from comment #314)
> I'm not too familiar with the XPCOM side of things -- can someone shed light
> on why landing a patch that only shuffles some self-hosted JavaScript code
> around causes this particular xpcshell failure? I suppose it changed event
> loop timings?

Yes, it likely changes the order in which posted events get executed. I'm preparing a patch that tries to drain the event queue more thoroughly before shutting down to see if it fixes the problem.
(In reply to Gabriele Svelto [:gsvelto] from comment #313)
> (In reply to David Rajchenbach Teller [:Yoric] from comment #311)
> > Ok, I think that I got it, thanks to comment 274.
> > 
> > I believe that the problem is due to the lazy initialization of worker
> > threads. In a few situations, we end up sending our first message to the
> > OS.File worker thread during shutdown.
> 
> From what I could see the problem is even deeper. It seems that by the time
> we hit ShutdownXPCOM() in this test we haven't even yet loaded the worker's
> JS code. Armed with a lot of patience and stepping carefully through the
> code I managed to unravel this chain of events:

Yes, that's what I meant with "lazy initialization initialization of worker threads." – namely OS.File's IO worker thread. My patch should raise an asynchronous error whenever this takes place in OS.File.
Let's simplify that code a little.
Attachment #748808 - Attachment is obsolete: true
Attachment #748808 - Flags: review?(nfroyd)
Attachment #748965 - Flags: review?(nfroyd)
Blocks: 860965
(In reply to David Rajchenbach Teller [:Yoric] from comment #316)
> Yes, that's what I meant with "lazy initialization initialization of worker
> threads." – namely OS.File's IO worker thread. My patch should raise an
> asynchronous error whenever this takes place in OS.File.

Yes but the problem here is that the post() command happens before shutdown commences so it doesn't trigger your additional check. The creation of the worker is what's triggering the problem as it fires the ScriptLoaderRunnable that will then run after shutdown has commenced.

I've been looking through the code under dom/workers and I can't seem to find a simple solution to prevent the problem from happening. I think we need to modify RuntimeService::Cleanup() (which runs in response to xpcom-shutdown) in such a way that it ensures that all loader runnables have finished executing before returning. It might that this would also fail if other xpcom-shutdown listeners have already shut themselves down in which case we would need to cancel all ScriptLoaderRunnables before they get to run.
More debugging and more pinpointing... turns out that I had partially misinterpreted the sequence of events. The osfile_async_workers.js source is loaded properly *before* shutdown has initiated but gets executed *afterwards*. Once it starts being executed it will try to import more JS source code here:

https://mxr.mozilla.org/mozilla-central/source/toolkit/components/osfile/osfile_async_worker.js#16

This in turn creates a new ScriptLoaderRunnable to load the code which then fails.
Mmmh...
This wouldn't be the first time that ScriptLoaderRunnable causes mysterious errors. In this case, my patch probably won't help.
(In reply to David Rajchenbach Teller [:Yoric] from comment #322)
> Mmmh...
> This wouldn't be the first time that ScriptLoaderRunnable causes mysterious
> errors. In this case, my patch probably won't help.

No, and the bigger issue here is that a worker thread can invoke importScripts() recursively so there's no way to ensure that loading a piece of code won't trigger loading another one unless the worker is completely stopped/canceled/killed. Ultimately one of the runnables might end up beyond the ShutdownXPCOM() call and wreak havoc. Unfortunately I'm not sufficiently familiar with this code to come up with a way to change the shutdown procedure and prevent this; I think we need a DOM worker guru here.
Is there a workaround at least, or do I have to wait for this to be fixed until I try to reland my stuff?
Component: Search → DOM: Workers
Product: Firefox → Core
It's not clear to me what the problem is here. From comment 323:

> Ultimately one of the runnables might end up beyond the ShutdownXPCOM() call
> and wreak havoc.

This shouldn't be possible as RuntimeService::Cleanup won't return until all worker code is finished and all worker threads joined (and a full flush of the main thread's event queue as well).
Comment on attachment 748965 [details] [diff] [review]
Better handling of attempts to initialize the OS.File worker thread during shutdown, v2

Clearing review in light of the discussion; please reset if this looks like the right fix after all.
Attachment #748965 - Flags: review?(nfroyd)
(In reply to ben turner [:bent] from comment #326)
> This shouldn't be possible as RuntimeService::Cleanup won't return until all
> worker code is finished and all worker threads joined (and a full flush of
> the main thread's event queue as well).

That's what I also thought but somehow it isn't the case here. I can try to break in RuntimeService::Cleanup() and see what's going on there. Is it possible that only the ScriptLoaderRunnable might have "survived" the shutdown procedure while the actual worker was shut down?
No, not really. The worker can't complete until all pending ScriptLoaderRunnables are finished.
(In reply to ben turner [:bent] from comment #329)
> No, not really. The worker can't complete until all pending
> ScriptLoaderRunnables are finished.

I see... that sounds reasonable. I'll go through the test one more time with GDB and I'll post a full trace of what I'm seeing in the hope that it sheds some light on what's going on here.
(In reply to ben turner [:bent] from comment #329)
> No, not really. The worker can't complete until all pending
> ScriptLoaderRunnables are finished.

Your comment pointed me in the right direction. I've checked and while that ScriptLoaderRunnable is running in the DOM worker thread the main thread is waiting in RuntimeService::Cleanup() for the worker to stop so that part is completely fine.

I then went ahead and checked the order of shutdown of the various components and found out that nsStreamTransportService is shut down before that ScriptLoaderRunnable gets a chance to run in response to an xpcom-shutdown-threads event, see here:

https://mxr.mozilla.org/mozilla-central/source/netwerk/base/src/nsStreamTransportService.cpp#513

I assume this does not happen in general, but in this specific test the nsStreamTransportService gets the xpcom-shutdown-threads before it reaches RuntimeService and thus the latter will clean up after itself "too late". Considering that I'm seeing more errors besides the assertion it might be that other important subsystems have already been shut down by the time the worker thread gets to run its code.
Hm. That's an interesting theory, but then if the stream transport is cleaning itself up I don't see how we could be hitting the assertion that this bug was filed for...
For these specific crashes, I could put a wallpaper bug that initializes OS.File in the test early enough that we don't see the crash. Would this be any useful?
(In reply to David Rajchenbach Teller [:Yoric] from comment #352)
> For these specific crashes, I could put a wallpaper bug that initializes
> OS.File in the test early enough that we don't see the crash. Would this be
> any useful?

I think that might fix this specific problem; a lot of other similar tests use OS.File and work just fine. We should then keep track in another bug about the root issue (web workers shutdown sequence).
I still don't think we understand the root problem here...

If what you said in comment 331 is true ("nsStreamTransportService is shut down before that ScriptLoaderRunnable gets a chance to run") then we shouldn't see the assertion that we're apparently hitting here ("ASSERTION: thread pool wasn't shutdown").
(In reply to ben turner [:bent] from comment #356)
> I still don't think we understand the root problem here...
> 
> If what you said in comment 331 is true ("nsStreamTransportService is shut
> down before that ScriptLoaderRunnable gets a chance to run") then we
> shouldn't see the assertion that we're apparently hitting here ("ASSERTION:
> thread pool wasn't shutdown").

The problem is that something instantiates (GetService'es) the stream transport service AFTER or INSIDE xpcom-shutdown-threads.  The stream transport service then cannot shutdown, it can never get xpcom-shutdown-threads notification, since it's behind it.
As a side-note, this is the kind of situation for which we first discussed replacing observer events by promises in some cases: being able to watch for an event after it had occurred.
(In reply to Honza Bambas (:mayhemer) from comment #359)
> The problem is that something instantiates (GetService'es) the stream
> transport service AFTER or INSIDE xpcom-shutdown-threads.  The stream
> transport service then cannot shutdown, it can never get
> xpcom-shutdown-threads notification, since it's behind it.

Yes, that's happening in RuntimeService::Cleanup() which is waiting for workers to terminate. Since the OS.File worker hasn't even been loaded yet it spins up and starts requesting services while inside xpcom-shutdown-threads which fails.
I'm not 100% sure but the worker code should be able to gracefully fail if the StreamTransportService is not available. It sounds to me like the real fix here should be to make sure that the StreamTransportService does not try to resurrect itself after it has already been shut down.
A gentle reminder that this blocks me from landing self-hosted JS code, and so we should fix this soon. :)
Attached patch Wallpaper patch (obsolete) — Splinter Review
Since this prevents other work from landing, here's a wallpaper patch.

Try: https://tbpl.mozilla.org/?tree=Try&rev=e88535999673
Attachment #756933 - Flags: review?(nfroyd)
Comment on attachment 756933 [details] [diff] [review]
Wallpaper patch

Review of attachment 756933 [details] [diff] [review]:
-----------------------------------------------------------------

I don't think this is the right fix.  Ben's solution from comment 362 sounds reasonable and should be ~10 lines of code, maybe less.
Attachment #756933 - Flags: review?(nfroyd)
> I don't think this is the right fix.  Ben's solution from comment 362 sounds
> reasonable and should be ~10 lines of code, maybe less.

Ok, this will be a larger wallpaper fix. I'll give it a try.
Is that what you had in mind, Ben?

Try: https://tbpl.mozilla.org/?tree=Try&rev=4e4faf4b5945
Attachment #756933 - Attachment is obsolete: true
Attachment #757107 - Flags: feedback?(bent.mozilla)
Comment on attachment 757107 [details] [diff] [review]
Preventing nsStreamTransportService from re-initializing

Review of attachment 757107 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/base/src/nsStreamTransportService.cpp
@@ +514,5 @@
>  
>    if (mPool) {
>      mPool->Shutdown();
>      mPool = nullptr;
> +    mHasBeenShutdown = true;

I think you want to set this variable unconditionally.
Good point.
Attachment #757107 - Attachment is obsolete: true
Attachment #757107 - Flags: feedback?(bent.mozilla)
Attachment #757126 - Flags: feedback?(bent.mozilla)
Comment on attachment 757126 [details] [diff] [review]
Preventing nsStreamTransportService from re-initializing, v2

Review of attachment 757126 [details] [diff] [review]:
-----------------------------------------------------------------

Wouldn't a resurrected nsStreamTransportService actually be a new instance? You have to keep track of the shutdown flag statically.
Attachment #757126 - Flags: feedback?(bent.mozilla) → feedback-
Good point, too.
Attachment #757126 - Attachment is obsolete: true
Attachment #757152 - Flags: review?(cbiesinger)
Sorry, fixed typo.
Attachment #757152 - Attachment is obsolete: true
Attachment #757152 - Flags: review?(cbiesinger)
Attachment #757178 - Flags: review?
Attachment #757178 - Flags: review? → review?(cbiesinger)
Comment on attachment 757178 [details] [diff] [review]
Preventing nsStreamTransportService from re-initializing, v4

>diff --git a/netwerk/base/src/nsStreamTransportService.cpp b/netwerk/base/src/nsStreamTransportService.cpp

> nsStreamTransportService::Init()
> {
>+    if (mHasBeenShutdown) {

(you meant sHasBeenShutdown)
Ah, thanks.
I guess that's what I get for writing patches instead of sleeping.
Attachment #757178 - Attachment is obsolete: true
Attachment #757178 - Flags: review?(cbiesinger)
Attachment #757302 - Flags: review?(cbiesinger)
Comment on attachment 757302 [details] [diff] [review]
Preventing nsStreamTransportService from re-initializing, v5

Review of attachment 757302 [details] [diff] [review]:
-----------------------------------------------------------------

I don't think this is going to work.

Problem is that stream transport service is apparently instantiated AFTER the "xpcom-shutdown-threads", so the change in Init() method will see your new flag as false anyway.

You needs this flag in XPCOM and let the stream transport service check it.  Problem is - what impact is this going to have on OS.File?  Is there a chance of user data loose during shutdown?
(In reply to Honza Bambas (:mayhemer) from comment #379)
> I don't think this is going to work.
> 
> Problem is that stream transport service is apparently instantiated AFTER
> the "xpcom-shutdown-threads", so the change in Init() method will see your
> new flag as false anyway.

Ah, it was my understanding that it was *re*instantiated. However, as you, I fear that my patch is too fragile.

The only robust way I can think of is to make nsIComponentManager aware of where we are in the startup/shutdown process, possibly with two methods such as:

  /**
   * Determine if the process has reached a given stage.
   *
   * @param stage One of the startup or shutdown notifications detailed in 
   * https://developer.mozilla.org/en-US/docs/Observer_Notifications
   * @return |true| if |setIsReached(stage)| has ever been called.
   */
  bool isReached(wstring stage);
  [noscript] bool setIsReached(wstring stage);

I believe that this would be useful for more than this bug.

> You needs this flag in XPCOM and let the stream transport service check it. 
> Problem is - what impact is this going to have on OS.File?  Is there a
> chance of user data loose during shutdown?

In real use scenario, OS.File is instantiated during startup, so we shouldn't have any issue. In a non-Firefox context, this might be different.
Attachment #757302 - Attachment is obsolete: true
Attachment #757302 - Flags: review?(cbiesinger)
We can force the stream transport service alive in RuntimeService::Init(), but that seems fragile too. It seems like Necko should be handling this somehow.
Well, back to plan whatever-it-was: wallpaper the patch until we can make the necessary changes to Necko or XPCOM itself, so as to unblock bug 853301 and bug 860965.
(In reply to Honza Bambas (:mayhemer) from comment #379)
> Comment on attachment 757302 [details] [diff] [review]
> Preventing nsStreamTransportService from re-initializing, v5
> 
> Review of attachment 757302 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I don't think this is going to work.
> 
> Problem is that stream transport service is apparently instantiated AFTER
> the "xpcom-shutdown-threads", so the change in Init() method will see your
> new flag as false anyway.
> 
> You needs this flag in XPCOM and let the stream transport service check it. 
> Problem is - what impact is this going to have on OS.File?  Is there a
> chance of user data loose during shutdown?

I don't know how intermittent the problem is, but https://tbpl.mozilla.org/?tree=Try&rev=5551a5bf965c doesn't have xpcshell failures (there's a build failure, but doesn't seem related to my patch). I'll relaunch tests a few times, but I believe that we can try and land this patch – and revert it if it doesn't actually fix the issue.
Honza, does that strategy sound ok to you?
Flags: needinfo?(honzab.moz)
I personally don't believe the patch will change stuff, but you've spent more time investigating this then me.  So, go forward, we can back it out any time later.
Flags: needinfo?(honzab.moz)
Attachment #748965 - Flags: review?(cbiesinger)
Comment on attachment 748965 [details] [diff] [review]
Better handling of attempts to initialize the OS.File worker thread during shutdown, v2

     let promise = worker.post.apply(worker, args);
+    if (!this.initialized && gShuttingDown) {
+      return Promise.reject(new Error("Attempting to initialize OS.File thread during shutdown"));
+    }

shouldn't the return be before the "let promise" line?
Attachment #748965 - Flags: review?(cbiesinger) → review+
Comment on attachment 757302 [details] [diff] [review]
Preventing nsStreamTransportService from re-initializing, v5

Mmmhhh... It seems that I obsoleted v5 in reaction to an issue on v4 that v5 actually fixed. I must be tired.
Also, sorry, I managed to r? the wrong patch.
Attachment #757302 - Attachment is obsolete: false
Attachment #757302 - Flags: review?(cbiesinger)
Attachment #757302 - Flags: review?(cbiesinger) → review+
Ok, let's see if this fixes the issue.
Keywords: checkin-needed
Attachment #748965 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/f92a6ad158c7

What's the plan for re-enabling the tests on the platforms they were disabled on?
The patch should re-enable the tests.
The idea is to revert the patch (hence re-disabling the tests) if we don't solve the issue.
I don't see any test re-enabling in that changeset...
Attached patch Re-enabling test suite. (obsolete) — Splinter Review
My bad.
Attachment #760563 - Flags: review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/14d35bcbc52f

Fingers crossed.
Keywords: checkin-needed
Whiteboard: [test disabled on Linux debug][test_logging.js disabled on Windows debug][leave open] → [leave open]
Looks like that didn't work.
Reverted.
https://hg.mozilla.org/integration/mozilla-inbound/rev/8fe099852dc4
Whiteboard: [leave open] → [test_645970.js disabled on Linux debug][test_logging.js disabled on Windows debug][leave open]
I'm guessing from this sudden deluge of failures that the load-bearing wallpaper didn't work out?
Same objective, more robust implementation.
Attachment #757302 - Attachment is obsolete: true
Attachment #767930 - Flags: review?(cbiesinger)
Here's a patch exposing the current xpcom startup/shutdown status, so that clients can take informed decisions, such as not initializing during shutdown.

I have put the code in nsXPComInit.cpp, but it could move
Attachment #767935 - Flags: feedback?(doug.turner)
Comment on attachment 767930 [details] [diff] [review]
2. Preventing nsStreamTransportService from re-initializing, v6

this is missing a return statement...?
Attachment #767930 - Flags: review?(cbiesinger) → review-
Comment on attachment 768100 [details] [diff] [review]
2. Preventing nsStreamTransportService from re-initializing, v7

this is a storage patch, not stream transport service?
Sorry about that. Too much multitasking, too much jetlag, not enough sleep.
Attachment #768100 - Attachment is obsolete: true
Attachment #768100 - Flags: review?(cbiesinger)
Attachment #768517 - Flags: review?(cbiesinger)
Comment on attachment 768517 [details] [diff] [review]
2. Preventing nsStreamTransportService from re-initializing, v8

 {
+    if (NS_GetAppStatus() >= NS_APPSTATUS_QUIT_APPLICATION_GRANTED) {
+      // Do not initialize if we are currently shutting down.
+      // See bug 845190.
+     return NS_ERROR_ILLEGAL_DURING_SHUTDOWN;
+    }

please fix the inconsistent indentation
Attachment #768517 - Flags: review?(cbiesinger) → review+
Attachment #767935 - Attachment is obsolete: true
Attachment #767935 - Flags: feedback?(doug.turner)
Comment on attachment 768517 [details] [diff] [review]
2. Preventing nsStreamTransportService from re-initializing, v8

Applied feedback and moved this to bug 888347.
Attachment #768517 - Attachment is obsolete: true
Disabled test_logging on Linux as well.
https://hg.mozilla.org/mozilla-central/rev/4ffb23062b3b
Whiteboard: [test_645970.js disabled on Linux debug][test_logging.js disabled on Windows debug][leave open] → [test_645970.js disabled on Linux debug][test_logging.js disabled on Linux and Windows debug][leave open]
This is affecting test_startup_session_sync.js on Linux now as well.

https://tbpl.mozilla.org/php/getParsedLog.php?id=25251485&tree=Fx-Team
https://tbpl.mozilla.org/php/getParsedLog.php?id=25280419&tree=Fx-Team
https://tbpl.mozilla.org/php/getParsedLog.php?id=25280093&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=25292826&tree=Mozilla-Inbound
Summary: Intermittent test_645970.js,test_logging.js | application crashed [@ mozalloc_abort(char const*)] from ASSERTION: thread pool wasn't shutdown → Intermittent test_645970.js,test_logging.js,test_startup_session_sync.js | application crashed [@ mozalloc_abort(char const*)] from ASSERTION: thread pool wasn't shutdown
Yoric, what's the status of this bug? I recall you telling me in person this was waiting on a review from bent (which I don't see). In any case, he's on PTO right now so I suggest getting a review from someone else, like jlebar, if you still need one.
Shu, sorry, I should have followed up with you about this yesterday: the patch that is supposed to fix this problem is attached to Bug 888347 (which is listed as blocking this bug), as well as some recent dialogue about reviewing the patch.
https://tbpl.mozilla.org/php/getParsedLog.php?id=25604532&tree=Mozilla-Central
Summary: Intermittent test_645970.js,test_logging.js,test_startup_session_sync.js | application crashed [@ mozalloc_abort(char const*)] from ASSERTION: thread pool wasn't shutdown → Intermittent test_645970.js,test_logging.js,test_startup_session_sync.js,test_startup_nosession_sync.js | application crashed [@ mozalloc_abort(char const*)] from ASSERTION: thread pool wasn't shutdown
Bug 853301 landed on inbound and in the process turned Windows debug xpcshell near perma-orange in either test_startup_nosession_sync.js or test_startup_session_sync.js. Also, more Linux failures were also occurring post-landing. Rather than continuing to hold random JS patches hostage because they just so happened to tickle this bug, I'll just start disabling the flaky tests instead. And maybe some day we can get this fixed for real and be done with it already.
https://hg.mozilla.org/integration/mozilla-inbound/rev/130edeb784f4
Whiteboard: [test_645970.js disabled on Linux debug][test_logging.js disabled on Linux and Windows debug][leave open] → [test_645970.js disabled on Linux debug][test_logging.js disabled on Linux and Windows debug][test_startup_nosession_sync.js disabled on Linux and Windows debug][test_startup_session_sync.js disabled on Linux and Windows debug][leave open]
Bug 827805 inadvertently re-enabled test_logging.js on Linux, hence the last two failures. I've re-disabled it.
Flags: needinfo?(ryanvm)
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #553)
> Disabled test_logging on Linux as well.
> https://hg.mozilla.org/mozilla-central/rev/4ffb23062b3b

Pushed to beta.
https://hg.mozilla.org/releases/mozilla-beta/rev/f83885316c05
Flags: needinfo?(ryanvm)
So we should finally be able to revert the test disablings.
Give me a link to a Try push with lots of green retriggers and I'd be happy to.
Oh, well, it seems that the error is still there, just morphed to a new form:

02:05:05  WARNING -  TEST-UNEXPECTED-FAIL | /builds/slave/talos-slave/test/build/tests/xpcshell/tests/toolkit/components/osfile/tests/xpcshell/test_logging.js | Test timed out
[...]
02:05:05     INFO -  WORKER ERROR DURING SETUP NS_ERROR_NOT_INITIALIZED: Component not initialized
02:05:05     INFO -  WORKER ERROR DETAIL undefined
02:05:05     INFO -  WARNING: A phase completion condition is taking too long to complete. Condition: OS.File: flush pending requests, warn about unclosed files, shut down service. Phase: web-workers-shutdown

Bug 888347 converts the crash into a much cleaner NS_ERROR_NOT_INITIALIZED.
Bug 913899 adds the warning.

I'll try and prepare a patch.
Re-enabling all these deactivated tests.
Try: https://tbpl.mozilla.org/?tree=Try&rev=943184491620
Attachment #760563 - Attachment is obsolete: true
Attachment #809118 - Flags: review+
https://hg.mozilla.org/integration/fx-team/rev/579f2c394ccb

woowoo!
Assignee: nobody → dteller
Flags: in-testsuite+
Keywords: checkin-needed
Whiteboard: [test_645970.js disabled on Linux debug][test_logging.js disabled on Linux and Windows debug][test_startup_nosession_sync.js disabled on Linux and Windows debug][test_startup_session_sync.js disabled on Linux and Windows debug][leave open] → [fixed-in-fx-team]
https://hg.mozilla.org/mozilla-central/rev/579f2c394ccb
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-fx-team]
Target Milestone: --- → mozilla27
Looks like there's still something going on here?
Flags: needinfo?(dteller)
:(
Looking at it.
Flags: needinfo?(dteller)
(In reply to Wes Kocher (:KWierso) from comment #634)
> Just backed out bug 927116 for this assert. :(

Actually, this could come in quite handy if it gives us a way to 100% reproduce the issue on Try.
Looks like it was bug 915233 that made this perma-fail on Windows.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: