Open Bug 1506263 Opened 6 years ago Updated 2 years ago

Local builds don't update BUILDID resulting in cache corruption (XDR)

Categories

(Core :: JavaScript Engine, defect, P2)

defect

Tracking

()

People

(Reporter: kats, Unassigned)

References

(Depends on 1 open bug)

Details

Attachments

(1 obsolete file)

I just built a debug GeckoView example app locally on m-c tip [1] and it's crashing on startup. It wasn't doing this when I was using an m-c from a few days ago, I just updated today. The crash is on line 449 of JSScript.cpp which is the assertion at [2]. The line number doesn't match on searchfox because this file was just updated today in bug 1505784 so presumably that's what caused the regression. lldb in Android studio tells me that scriptBits == 204801 == 0x32001. sourceObjectArg is non-null (js::HandleScriptSourceObject with inner ptr 0xa1c06b88). It doesn't know what OwnSource is. [1] use a Fennec mozconfig, then "./mach build && ./mach package && ./mach android install-geckoview_example" to build and install it to a device [2] https://searchfox.org/mozilla-central/rev/a3894a4dcfb5d42f2e6eee6cb9faf7141879ef1a/js/src/vm/JSScript.cpp#529
Flags: needinfo?(jdemooij)
In case it matters, the GeckoView example app is e10s-enabled, and this crash is happening in the content process. Note that to successfully debug the content process in Android Studio, you need to add these lines to your ~/.lldbinit: settings append target.exec-search-paths /home/kats/zspace/mozilla-fennec/obj-android-debug/toolkit/library settings append target.exec-search-paths /home/kats/zspace/mozilla-fennec/obj-android-debug/mozglue/build (Replace paths to point to your objdir as appropriate)
Oh also, crash stack below. This is on the "JS Helper" thread. mozilla::Result<mozilla::Ok, JS::TranscodeResult> js::XDRScript<(js::XDRMode)1>(js::XDRState<(js::XDRMode)1>*, JS::Handle<js::Scope*>, JS::Handle<js::ScriptSourceObject*>, JS::Handle<JSFunction*>, JS::MutableHandle<JSScript*>) JSScript.cpp:449 js::XDRState<(js::XDRMode)1>::codeScript(JS::MutableHandle<JSScript*>) Xdr.cpp:226 js::ScriptDecodeTask::parse(JSContext*) HelperThreads.cpp:567 js::HelperThread::handleParseWorkload(js::AutoLockHelperThreadState&) HelperThreads.cpp:2249 js::HelperThread::threadLoop() HelperThreads.cpp:2578 js::HelperThread::ThreadMain(void*) HelperThreads.cpp:2024 js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::Start(void*) Thread.h:236 __pthread_start(void*) 0x00000000b6d98140 __start_thread 0x00000000b6d96084
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #0) > lldb in Android studio tells me that scriptBits == 204801 == OwnSource. > sourceObjectArg is non-null (js::HandleScriptSourceObject with inner ptr > 0xa1c06b88). It doesn't know what OwnSource is. Hm so after my changes, scriptBits should only contain these bits: enum ScriptBits { NeedsArgsObj, OwnSource, HasLazyScript, }; It looks like you have XDR data that matched the previous build: scriptBits = 0x32001 before my changes meant (NoScriptRval | OwnSource | HasSingleton | TreatAsRunOnce) which looks more reasonable. XDR caches should get invalidated based on the build id. Are you sure the build id changed in your case?
Flags: needinfo?(jdemooij) → needinfo?(kats)
(In reply to Jan de Mooij [:jandem] from comment #3) > (In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #0) > > lldb in Android studio tells me that scriptBits == 204801 == OwnSource. Sorry I managed to misquote this somehow. s/OwnSource/0x32001/
I don't know if the buildid changed - what is the buildid for local builds, and what causes it to update?
Flags: needinfo?(kats)
Apparently we don't force buildid.h to regenerate on local builds [1]. There is some reasoning in [2] that suggests this is intentional for avoid re-linking libxul all the time, but [3] claims the opposite. [1] https://searchfox.org/mozilla-central/rev/4e094f66ced333d69b24cd49273789e3a1173dfc/moz.build#113 [2] https://searchfox.org/mozilla-central/rev/4e094f66ced333d69b24cd49273789e3a1173dfc/Makefile.in#34-36 [3] https://searchfox.org/mozilla-central/rev/4e094f66ced333d69b24cd49273789e3a1173dfc/js/src/shell/js.cpp#10032-10040
At any rate, I worked around the problem by uninstalling and reinstalling the app, and it seems like a one-time problem so I don't really care about this bug anymore. Feel free to close if it's not worth dealing with.
This bug is worth fixing in order to avoid issues related to dirty network caches after a local-rebuild of a different version. I think we should involve some build peers to know if there is a way to trash the cache between 2 different local builds, without relying on the build-id. Could we fake the update of the build-id string by changing the binary post-linking ?
Flags: needinfo?(gps)
Priority: -- → P3
Summary: Assertion failure in JSScript.cpp in GeckoView example app → Local builds don't update BUILDID resulting in cache corruption (XDR)
Redirecting to glandium.
Flags: needinfo?(gps) → needinfo?(mh+mozilla)
This sounds like a result from the move to moz.build.
Flags: needinfo?(mh+mozilla) → needinfo?(cmanchester)
Yes, this is almost certainly related to bug 1298328. For other caches we have a post-build step that does a bit of work to try to invalidate them after the build runs, here: https://searchfox.org/mozilla-central/rev/c0b26c40769a1e5607a1ae8be37fe64df64fc55e/python/mozbuild/mozbuild/backend/base.py#206 Would something similar be feasible for the XDR cache? Is there a directory we can remove after each build or something like that?
Flags: needinfo?(cmanchester)
I notice that that code doesn't support android to begin with. It might be nice to have a solution that covers android as well (Comment 0 covers geckoview). There are at least a few caches that float around in profiles using BUIlDIDs. - startupcache - preloadcache - Necko JS bytecode cache - WASM/asm.js cache The startupcache uses .purgecaches (when supported). There are possibly some other mechanisms in place for invalidation, but they are not clear to me. Preload cache must be invalidated even in artifact builds. (I think it already does.. somehow?)
Looking at where .purgecache is used: https://searchfox.org/mozilla-central/rev/c0b26c40769a1e5607a1ae8be37fe64df64fc55e/toolkit/xre/nsAppRunner.cpp#4351-4360 https://searchfox.org/mozilla-central/rev/c0b26c40769a1e5607a1ae8be37fe64df64fc55e/toolkit/xre/nsAppRunner.cpp#2802-2803 It seems that it does clean-up the startup cache, but I do not see it clean the Necko cache (JS bytecode / WASM), nor the persistent-data (ASM.js cache / WASM)

This is going to be a blocker for Bug 1535138 since XDR format will be updated multiple times and it is already busting people's local builds.

Blocks: 1535138

SpiderMonkey cache files rely on BUILDID to invalidate old files.
Unfortunately the BUILDID is intentionally not updated on local
incremental builds which leads to profile corruption. This patch forces
an update in this case when SpiderMonkey changes are pulled without a
clobber build. The MOZ_BUILDID env variable is still respected for
reproducible builds and if SpiderMonkey is not changed, there is no
forced update.

Comment on attachment 9063587 [details]
Bug 1506263 - Regenerate buildid.h when SpiderMonkey is updated

I'm wondering if something like this would even remotely be considered acceptable. It solves the problem pretty directly but it is quite gross.

Attachment #9063587 - Flags: feedback?(core-build-config-reviews)
Attachment #9063587 - Flags: feedback?(core-build-config-reviews)
Attachment #9063587 - Attachment is obsolete: true

(In reply to Ted Campbell [:tcampbell] from comment #16)

Comment on attachment 9063587 [details]
Bug 1506263 - Regenerate buildid.h when SpiderMonkey is updated

I'm wondering if something like this would even remotely be considered acceptable. It solves the problem pretty directly but it is quite gross.

You've abandoned the revision; can you explain what you saw? Because scanning the patch, this is basically the "right" approach: in some way, you need to wire a dep into buildid.h. That's Hard for a bunch of reasons, not least that buildid.h is magic and handled specially many places in the build. It's also produced very early, probably before anything in SM is actually built, so you'll have a hard time actually making a dep happen in time. But there may be ways to achieve what you need, and a build peer (glandium, in all likelihood) can provide guidance.

In the interim, what did you witness? Presumably, you found that the cross-directory dependency wasn't built in time and didn't work well in practice?

Flags: needinfo?(tcampbell)

The first problem was as you expected that I couldn't make a cross-directory dependency actually trigger a build so clobber builds wouldn't work. The second issue I ran into in a few variants I tried was that it would take two |./mach build| to get the world in sync since XUL wasn't always rebuilt.

In the short term, I'll add back XDR_VERSION and a phabricator/herald rule so there is at least any mechanism to save things (outside of touching CLOBBER). Jan has an interesting suggestion of computing the hash of certain key files as a build step (similar to any of our other SpiderMonkey GENERATED_FILES steps). One wrinkle there is I might want to normalize text so things like newline encoding don't result in different binaries.

Flags: needinfo?(tcampbell)
Depends on: 1550486

(In reply to Ted Campbell [:tcampbell] from comment #18)

The first problem was as you expected that I couldn't make a cross-directory dependency actually trigger a build so clobber builds wouldn't work. The second issue I ran into in a few variants I tried was that it would take two |./mach build| to get the world in sync since XUL wasn't always rebuilt.

That makes sense, thanks for confirming.

In the short term, I'll add back XDR_VERSION and a phabricator/herald rule so there is at least any mechanism to save things (outside of touching CLOBBER). Jan has an interesting suggestion of computing the hash of certain key files as a build step (similar to any of our other SpiderMonkey GENERATED_FILES steps). One wrinkle there is I might want to normalize text so things like newline encoding don't result in different binaries.

This is what I was going to suggest: you can do a lot of things with GENERATED_FILES and get good dependency trees. Avoiding buildid.h entirely will be much better for this long-term.

Glad to see you have ideas about how to solve your problem!

See Also: → 1545981

The GENERATED_FILES approach is having a number limitations (it is awkward to take a dependency on mozconfig, for example) and probably needs first-class build system support to not be full of holes. I've filed Bug 1551639 to see if we can get the build system to provide an alternate build id.

Depends on: 1551639
Priority: P3 → P2

(In reply to Ted Campbell [:tcampbell] from comment #20)

The GENERATED_FILES approach is having a number limitations (it is awkward to take a dependency on mozconfig, for example) and probably needs first-class build system support to not be full of holes. I've filed Bug 1551639 to see if we can get the build system to provide an alternate build id.

Show your WIP? The build system has deep support for depending on mozconfigs (and all bits of the build system itself); you shouldn't need to do really think about this at all. That is, GENERATED_FILES and its ilk will accommodate changes to mozconfigs that change defines, substs, etc.

Depends on: 1557215
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: