Closed Bug 1006695 Opened 11 years ago Closed 11 years ago

Preserving system sources fails if source compression is used

Categories

(Core :: JavaScript Engine, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla32

People

(Reporter: jryans, Assigned: laszio.bugzilla)

References

Details

(Keywords: regression)

Attachments

(2 files, 5 obsolete files)

In bug 990353, a feature was added to stop saving system sources, which is enabled by default for B2G. For B2G developers, we'd like to return to the previous behavior so that certified apps can be debugged again, and so bug 1001348 attempted go back to previous behavior when we're already debugging certified apps by setting "javascript.options.discardSystemSource" to "false" at runtime. Setting this pref to false (preserving system sources once again) works correctly on single core devices, like the Keon, but causes issues (breaks the Gaia UI) on multi-core devices, like the Nexus 4. Using the discardSystemSource = false setting causes issues when set at both runtime or boot time (in the profile). :mccr8 hypothesized that it could be related to source compression (which is only done on multi-core). I manually disabled source compression in a local Nexus 4 build, and now preserving system sources works correctly. So, there appears to be strange interaction between using the discardSystemSource = false and source compression.
Is anyone looking into this? We really need the debugger back for certified bugs
Is this something you could look at Brian? It seems related to things you've worked on. Thanks.
Flags: needinfo?(bhackett1024)
Brian - As context, this is something that's blocking a lot of Gaia people from getting work done, and that we should fix quickly. I would really like to look at it, but I just have too many things on my plate right now (which, incidentally, is why I'm behind on that review I owe you).
Naveed/Brian, This bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1001348 is blocking our team for making progress on madai related features (camera) that only work on nexus4. The root of which is tracked here. Please help address this issue. Thanks a bunch! Hema
Flags: needinfo?(nihsanullah)
I really don't know this code well. The only thing I've done with source compression was changing it to happen on JS helper threads rather than on its own thread.
Flags: needinfo?(bhackett1024)
(In reply to Brian Hackett (:bhackett) from comment #5) > I really don't know this code well. The only thing I've done with source > compression was changing it to happen on JS helper threads rather than on > its own thread. FWIW, nobody knows it all that well. The person who wrote it doesn't work at Mozilla IIUC. We just need somebody to jump in and figure it out.
OK but I don't understand the urgency here. If turning off source compression works then why not just do that until a better fix is in place? Is the problem in this bug with setting discardSystemSource to false dynamically, or with it ever being set to false? If the latter, does function source accessing work correctly for content scripts (where presumably off thread compression will be used)? Also, are there any STR for desktop b2g?
A custom-prefs.js file with the two lines below causes gaia not to boot on Nexus 4. No homescreen, just a black screen: user_pref("devtools.debugger.forbid-certified-apps", false); user_pref("javascript.options.discardSystemSource", false);
(In reply to Brian Hackett (:bhackett) from comment #7) > OK but I don't understand the urgency here. If turning off source > compression works then why not just do that until a better fix is in place? Maybe this would be okay if there was an "easy" way to turn it off for Gaia developers, such as a pref or something... Currently it requires editing C++ source, unless you want to disable JS_THREADSAFE entirely, which is more like a sledgehammer. Also, doing this would make the developer setup diverge further from a real production device, which is not great, but I suppose it's workable if everyone commits to it being temporary. > Is the problem in this bug with setting discardSystemSource to false > dynamically, or with it ever being set to false? The problem seems to manifest if it is ever set to false. As in, if it is set to false in the profile during b2g startup, we get to the failure scenario. So, somehow setting it to false at boot time is not the same as reverting the patch from bug 990353 entirely (when source compression is in effect). > If the latter, does function source accessing work correctly for content scripts (where > presumably off thread compression will be used)? I am not sure of the answer here. I can attempt to find out. > Also, are there any STR for desktop b2g? I originally tested setting discardSystemSource to false in the b2g simulator (which uses b2g desktop), and it seemed to work fine there. However, I don't know at the moment whether source compression is active on b2g desktop. I can try to determine this too.
Should bug 990353 just be reverted until this bug is fixed? Does source compression work for non-system sources? Related to B2G source compression, bug 837110 suggests replacing zlib compression with a faster algorithm like lz or snappy might make source compression viable on single-core devices.
The issue is not exclusive to Nexus 4. Sources don't show up either on my hamachi: Gaia 870a5c518742665d36b17e7e88c2ab07d440b94c Gecko https://hg.mozilla.org/mozilla-central/rev/417acde736e7 BuildID 20140507040203 Version 32.0a1 ro.build.version.incremental=324 ro.build.date=Thu Dec 19 14:04:55 CST 2013
(In reply to Diego Marcos [:dmarcos] from comment #11) > The issue is not exclusive to Nexus 4. Sources don't show up either on my > hamachi: > > Gaia 870a5c518742665d36b17e7e88c2ab07d440b94c > Gecko https://hg.mozilla.org/mozilla-central/rev/417acde736e7 > BuildID 20140507040203 > Version 32.0a1 > ro.build.version.incremental=324 > ro.build.date=Thu Dec 19 14:04:55 CST 2013 To clarify this slightly, Diego was running in the default configuration of discardSystemSource = true in this case, so it's expected that the sources are missing here.
(In reply to Chris Peterson (:cpeterson) from comment #10) > Should bug 990353 just be reverted until this bug is fixed? Does source > compression work for non-system sources? We're basically committed to that bug for Tarako at this point. It's critical for our entire FFOS strategy. > Related to B2G source compression, bug 837110 suggests replacing zlib > compression with a faster algorithm like lz or snappy might make source > compression viable on single-core devices. I don't think we need to turn on source compression. It's almost certainly just some dumb logic bug in the JS engine that somebody with a few cycles needs to go in and fix.
(In reply to J. Ryan Stinnett [:jryans] from comment #9) > (In reply to Brian Hackett (:bhackett) from comment #7) > > Also, are there any STR for desktop b2g? > > I originally tested setting discardSystemSource to false in the b2g > simulator (which uses b2g desktop), and it seemed to work fine there. > However, I don't know at the moment whether source compression is active on > b2g desktop. I can try to determine this too. I've confirmed that b2g desktop does appear to use source compression, but I cannot get it to fail when discardSystemSource = false like I can with a multi-core device build.
Luke: bhackett thought you might have some insight into this B2G source compression bug because you reviewed (some of) the discardSystemSource patches in bug 990353.
Flags: needinfo?(luke)
Priority: -- → P1
I rescanned bug 990353 and the only thing that looks fishy is: http://hg.mozilla.org/mozilla-central/diff/95391b25bc11/js/xpconnect/loader/mozJSComponentLoader.cpp since the logic in the comment would seem to be invalidated by setting setDiscardSource(false) on b2g.
Flags: needinfo?(luke) → needinfo?(bobbyholley)
(In reply to Luke Wagner [:luke] (on partial PTO) from comment #16) > I rescanned bug 990353 and the only thing that looks fishy is: > > http://hg.mozilla.org/mozilla-central/diff/95391b25bc11/js/xpconnect/loader/ > mozJSComponentLoader.cpp > since the logic in the comment would seem to be invalidated by setting > setDiscardSource(false) on b2g. Yeah, but that just means we lose an optimization. It shouldn't impact correctness. This code is just saying that we can't enable lazy source retrieval for functions in the b2g Frankenstein compartment, because the source hook doesn't know how to deal with that. Ordinarily, turning off lazy source retrieval would be a memory regression, because we'd have to store the source. But in this case, it doesn't matter in production, because we just throw away all the source on b2g.
Flags: needinfo?(bobbyholley)
Attached patch jsworkers.nuwa.diff (obsolete) — Splinter Review
It looks like that the worker thread for compression is not known by Nuwa so Nuwa blocked (by chances?) when starting. This is not a bug of discarding sources; It just hides the real problem. @jryans, would you mind to try the attached patch on Nexus 4? I don't have a Nexus 4 and can only test by unconditionally enabling source compression on a single core device.
Attachment #8419248 - Flags: feedback?(cyu)
Flags: needinfo?(jryans)
Comment on attachment 8419248 [details] [diff] [review] jsworkers.nuwa.diff This works for me! After adding this patch, I am able to use the Nexus 4 with source compression (as it is by default since it's multi-core) and discardSystemSource = false. Great work! :D
Attachment #8419248 - Flags: feedback+
Flags: needinfo?(jryans)
(In reply to Ting-Yuan Huang from comment #18) > It looks like that the worker thread for compression is not known by Nuwa so > Nuwa blocked (by chances?) when starting. This is not a bug of discarding > sources; It just hides the real problem. Woah, I'm surprised that doesn't mess more stuff up! Maybe we usually don't start up many JS helpers that early.
We should've catch bugs like this in mochitest, but this case escaped from our safety net because our emulator runs only with once virtual core.
This patch makes js worker threads, only in the Nuwa process when booting, wait for Nuwa to be ready. NuwaFreezeCurrentThread() is required because anything, including IPC, can happen in the threads. This can be suboptimal that worker stuffs won't be shared among process but is much safer. I'll update try results once they are finished.
Attachment #8419248 - Attachment is obsolete: true
Attachment #8419248 - Flags: feedback?(cyu)
Attachment #8419961 - Flags: review?(luke)
Attachment #8419961 - Flags: review?(khuey)
Same as part.1 but for workers in thread pool.
Attachment #8419965 - Flags: review?(luke)
Attachment #8419965 - Flags: review?(khuey)
Comment on attachment 8419961 [details] [diff] [review] Bug 1006695 - Mark JS worker threads to be known by Nuwa. Review of attachment 8419961 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jsworkers.cpp @@ +23,5 @@ > #include "jsobjinlines.h" > #include "jsscriptinlines.h" > > +#ifdef MOZ_NUWA_PROCESS > +#include "ipc/Nuwa.h" Style nit: can you indent 'include' one space (# include "ipc/Nuwa.h") here and in the other patch?
Attachment #8419961 - Flags: review?(khuey) → review+
Attachment #8419965 - Flags: review?(luke) → review+
Comment on attachment 8419961 [details] [diff] [review] Bug 1006695 - Mark JS worker threads to be known by Nuwa. Oops, wrong box.
Attachment #8419961 - Flags: review?(luke) → review?(khuey)
Nice catch Ting. Assigning it to you since you have the patches up.
Assignee: nobody → tchou
Flags: needinfo?(nihsanullah)
I guess you meant to assign to Ting-Yuan Huang.
Assignee: tchou → thuang
NuwaFreezeCurrentThread() freezed Nuwa in the tests...sigh. Some functions Nuwa preloads/calls require the worker threads to work before Nuwa forks. @Kyle and @Jason, do you think it's safe to go without freezing those workers in Nuwa? There are similar things in gc helper threads and no NuwaFreezeCurrentThread() called. I think, if those workers get no jobs/IPCs after Nuwa stabilized, then it should be OK. The following tries showed me some confidence: (The patch without NuwaFreeCurrentThread()) https://tbpl.mozilla.org/?tree=Try&rev=a28f8cbaea1b (The patch without NuwaFreeCurrentThread() + don't discard script source + force compression) https://tbpl.mozilla.org/?tree=Try&rev=57108aff0a86
Flags: needinfo?(khuey)
Flags: needinfo?(jorendorff)
(In reply to Ting-Yuan Huang from comment #28) > NuwaFreezeCurrentThread() freezed Nuwa in the tests...sigh. Some functions Hmm, I made the suggestion to freeze the js worker in its entry point. It turns out that this freezes the Nuwa process and effectively breaks the whole device except things residing in the chrome process. It freezes the Nuwa process because the js worker is used during XPCOM initialization, and the main thread is waiting for a frozen thread: #0 __futex_syscall3 () at bionic/libc/arch-arm/bionic/atomics_arm.S:182 #1 0x400c955c in __pthread_cond_timedwait_relative (cond=0x40304514, mutex=0x40315c90, reltime=0x0) at bionic/libc/bionic/pthread.c:1477 #2 0x400c9610 in __pthread_cond_timedwait (cond=0x40304514, mutex=0x40315c90, abstime=0x0, clock=0) at bionic/libc/bionic/pthread.c:1500 #3 0x40084616 in __wrap_pthread_cond_wait (cond=<value optimized out>, mtx=<value optimized out>) at mozglue/build/Nuwa.cpp:1011 #4 0x4227db60 in PR_WaitCondVar (cvar=0x40304510, timeout=4294967295) at nsprpub/pr/src/pthreads/ptsynch.c:385 #5 0x417070ee in js::GlobalWorkerThreadState::wait (this=0xbee917d4) at js/src/jsworkers.cpp:517 #6 js::SourceCompressionTask::complete (this=0xbee917d4) at js/src/jsworkers.cpp:960 #7 0x417d2568 in CompileFunctionBody (cx=0x40310f20, fun=..., options=<value optimized out>, formals=..., srcBuf=...) at js/src/frontend/BytecodeCompiler.cpp:657 #8 js::frontend::CompileFunctionBody (cx=0x40310f20, fun=..., options=<value optimized out>, formals=..., srcBuf=...) at js/src/frontend/BytecodeCompiler.cpp:668 #9 0x41680394 in JS::CompileFunction (cx=0x40310f20, obj=..., options=<value optimized out>, name=0x0, nargs=0, argnames=0x0, srcBuf=...) at js/src/jsapi.cpp:4822 #10 0x41680534 in CompileFunction (cx=0x40310f20, obj=..., options=..., name=0x0, nargs=0, argnames=0x0, bytes=0x43191000 "/* This Source Code Form is subject to the terms of the Mozilla Public\n * License, v. 2.0. If a copy of the MPL was not distributed with this file,\n * You can obtain one at http://mozilla.org/MPL/2.0/"..., length=8445) at js/src/jsapi.cpp:4841 #11 CompileFunction (cx=0x40310f20, obj=..., options=..., name=0x0, nargs=0, argnames=0x0, bytes=0x43191000 "/* This Source Code Form is subject to the terms of the Mozilla Public\n * License, v. 2.0. If a copy of the MPL was not distributed with this file,\n * You can obtain one at http://mozilla.org/MPL/2.0/"..., length=8445) at js/src/jsapi.cpp:4857 #12 0x40d2c350 in mozJSComponentLoader::ObjectForLocation (this=0x43034690, aComponentFile=<value optimized out>, aURI=<value optimized out>, aObject=..., aTableScript=..., aLocation=0x4320c8c0, aPropagateExceptions=false, aException=...) at js/xpconnect/loader/mozJSComponentLoader.cpp:952 #13 0x40d2cf88 in mozJSComponentLoader::LoadModule (this=0x43034690, aFile=<value optimized out>) at js/xpconnect/loader/mozJSComponentLoader.cpp:424 #14 0x406f2894 in nsComponentManagerImpl::KnownModule::Load (this=0x403f5860) at xpcom/components/nsComponentManager.cpp:741 #15 0x406f28ec in nsFactoryEntry::GetFactory (this=0x403afd60) at xpcom/components/nsComponentManager.cpp:1774 #16 0x406f29f8 in nsComponentManagerImpl::CreateInstanceByContractID (this=<value optimized out>, aContractID=0x403ad460 "@mozilla.org/browser/directory-provider;1", aDelegate=0x0, aIID=..., aResult=0xbee91cc4) at xpcom/components/nsComponentManager.cpp:1075 #17 0x406f3d84 in nsComponentManagerImpl::GetServiceByContractID (this=0x40375400, aContractID=0x403ad460 "@mozilla.org/browser/directory-provider;1", aIID=..., result=<value optimized out>) at xpcom/components/nsComponentManager.cpp:1435 #18 0x406c3b9a in CallGetService (aContractID=0x403ad460 "@mozilla.org/browser/directory-provider;1", aIID=..., aResult=0x0) at xpcom/glue/nsComponentManagerUtils.cpp:64 #19 0x406c3bca in nsGetServiceByContractID::operator() (this=<value optimized out>, aIID=..., aInstancePtr=0x0) at xpcom/glue/nsComponentManagerUtils.cpp:252 #20 0x406c315a in nsCOMPtr_base::assign_from_gs_contractid (this=0xbee91d94, gs=..., iid=<value optimized out>) at xpcom/glue/nsCOMPtr.cpp:92 #21 0x406e93d2 in nsCOMPtr (this=0x403143c0) at ../../dist/include/nsCOMPtr.h:658 #22 nsDirectoryService::RegisterCategoryProviders (this=0x403143c0) at xpcom/io/nsDirectoryService.cpp:477 #23 0x406ca9bc in NS_InitXPCOM2 (result=<value optimized out>, binDirectory=0x40324110, appFileLocationProvider=0x40301bb0) at xpcom/build/nsXPComInit.cpp:655 #24 0x4132d038 in XRE_InitEmbedding2 (aLibXULDirectory=<value optimized out>, aAppDirectory=0x40324110, aAppDirProvider=0x0) at toolkit/xre/nsEmbedFunctions.cpp:150 #25 0x40841550 in mozilla::ipc::ScopedXREEmbed::Start (this=0x4033f268) at ipc/glue/ScopedXREEmbed.cpp:99 #26 0x40cc0c86 in mozilla::dom::ContentProcess::Init (this=0x4033f000) at dom/ipc/ContentProcess.cpp:28 #27 0x4132ce96 in XRE_InitChildProcess (aArgc=-1092015912, aArgv=0xbee9284c, aProcess=1077145600) at toolkit/xre/nsEmbedFunctions.cpp:509 #28 0x000087a0 in main (argc=8, argv=0xbee928d4) at ipc/app/MozillaRuntimeMain.cpp:149 I think we need to remove the NuwaFreezeCurrentThread() and let the worker freeze after we have finished XPCOM initialization.
Retest with NuwaFreezeCurrentThread() call removed from both patches: https://tbpl.mozilla.org/?tree=Try&rev=774b10720b22
Luke, do you think removing the freeze is OK? The situation of js workers should be similar to gc helpers. I don't worry about web workers; They should not run in such early stages.
Attachment #8419961 - Attachment is obsolete: true
Attachment #8419965 - Attachment is obsolete: true
Attachment #8419961 - Flags: review?(khuey)
Attachment #8419965 - Flags: review?(khuey)
Attachment #8422241 - Flags: review?(luke)
Attachment #8422241 - Flags: review?(khuey)
Attachment #8422244 - Flags: review?(luke)
Attachment #8422244 - Flags: review?(khuey)
Comment on attachment 8422241 [details] [diff] [review] Bug 1006695 - Mark JS worker threads to be known by Nuwa. v2 I can't think of a problem (but I'm not that familiar with Nuwa).
Attachment #8422241 - Flags: review?(luke) → review+
Attachment #8422244 - Flags: review?(luke) → review+
What's the ETA for this?
If we are not waiting for kyle's review, we should land be able to land this. There will be a problem only when we freeze a worker thread and then uses it in the Nuwa process. Since we don't run PreloadSlowThings() in the Nuwa process, I can't think of any situation where we might need to use it.
Flags: needinfo?(jorendorff)
Keywords: checkin-needed
also had to backout the other cset for test failures like https://tbpl.mozilla.org/php/getParsedLog.php?id=39997995&tree=Mozilla-Inbound
Hi Luke, would you mind to review again? Sorry I only tried on the B2G platforms last time, now I pushed full tries: https://tbpl.mozilla.org/?tree=Try&rev=2d81378204ef https://tbpl.mozilla.org/?tree=Try&rev=e0b4aa70c756 The changes to the previous patches (v2) are replacing header inclusions with prototype declarations. This consists with what exists in jsgc.c and seems to be an acceptable compromise (to introducing new or refactoring inclusion paths).
Attachment #8422241 - Attachment is obsolete: true
Attachment #8422244 - Attachment is obsolete: true
Attachment #8425482 - Flags: review?(luke)
Attachment #8425482 - Flags: review?(luke) → review+
Attachment #8425484 - Flags: review?(luke) → review+
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
I can confirm that this resolves the issue on my Nexus 4.
Status: RESOLVED → VERIFIED
Do you think we can uplift this to 31? Bug 990353 regressed devtools so that we can't see sources without crashing gaia on Fx31.
Blocks: 1018385
Yes, the patch only depends on Nuwa which is landed in 28 or 29.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: