Closed Bug 1545981 Opened 6 years ago Closed 5 years ago

Assertion failure: false (Expected script), at js/src/vm/HelperThreads.cpp:1753

Categories

(Core :: JavaScript Engine, defect, P2)

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox68 --- fix-optional

People

(Reporter: violet.bugreport, Assigned: tcampbell)

References

Details

(Keywords: assertion, testcase)

STR:

Just go to JSFiddle (https://jsfiddle.net/) using the latest revision (https://hg.mozilla.org/mozilla-central/rev/72be82c6809e); the assertion failure will occur.

It must be a recent regression.

Type: task → defect

Stack from gdb:

#0 0x00007f0d047a09d0 in __GI___nanosleep (requested_time=requested_time@entry=0x7ffe569e4ec0, remaining=remaining@entry=0x7ffe569e4ec0) at ../sysdeps/unix/sysv/linux/nanosleep.c:28
#1 0x00007f0d047a08aa in sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
#2 0x00007f0cf56a8714 in ah_crap_handler(int) (signum=11) at toolkit/xre/nsSigHandlers.cpp:95
#3 0x00007f0cf56a87ed in child_ah_crap_handler(int) (signum=11) at toolkit/xre/nsSigHandlers.cpp:105
#4 0x00007f0cf65c4a87 in WasmTrapHandler(int, siginfo_t*, void*) (signum=11, info=0x7ffe569e50f0, context=<optimized out>) at js/src/wasm/WasmSignalHandlers.cpp:962
#5 0x00007f0d05602890 in <signal handler called> () at /lib/x86_64-linux-gnu/libpthread.so.0
#6 0x00007f0cf5a12b1d in js::GlobalHelperThreadState::finishSingleParseTask(JSContext*, js::ParseTaskKind, JS::OffThreadToken*) (this=<optimized out>, cx=0x7f0cec024000, kind=<optimized out>, token=<optimized out>) at js/src/vm/HelperThreads.cpp:1753
#7 0x00007f0cf5a13336 in js::GlobalHelperThreadState::finishScriptDecodeTask(JSContext*, JS::OffThreadToken*) (this=0x7f0d04aa8680 <_IO_2_1_stderr_>, cx=0x7f0d04aa98b0 <_IO_stdfile_2_lock>, token=0x556113a66960 <gMozCrashReason>) at js/src/vm/HelperThreads.cpp:1818
#8 0x00007f0cf2b85322 in nsJSUtils::ExecutionContext::JoinDecode(JS::OffThreadToken**) (this=0x7ffe569e57b8, aOffThreadToken=0x7f0d042b3f70) at dom/base/nsJSUtils.cpp:289
#9 0x00007f0cf42d04da in mozilla::dom::ScriptLoader::EvaluateScript(mozilla::dom::ScriptLoadRequest*) (this=0x7f0ced39c310, aRequest=<optimized out>) at dom/script/ScriptLoader.cpp:2658
#10 0x00007f0cf42ce453 in mozilla::dom::ScriptLoader::ProcessRequest(mozilla::dom::ScriptLoadRequest*) (this=0x7f0ced39c310, aRequest=0x7f0d042b3f20) at dom/script/ScriptLoader.cpp:2222
#11 0x00007f0cf42ceee5 in mozilla::dom::ScriptLoader::ProcessOffThreadRequest(mozilla::dom::ScriptLoadRequest*) (this=0x7f0ced39c310, aRequest=0x7f0d042b3f20) at dom/script/ScriptLoader.cpp:1944
#12 0x00007f0cf42d6aed in mozilla::dom::(anonymous namespace)::NotifyOffThreadScriptLoadCompletedRunnable::Run() (this=<optimized out>) at dom/script/ScriptLoader.cpp:1971
#13 0x00007f0cf186feda in mozilla::SchedulerGroup::Runnable::Run() (this=0x7f0ce9ab4420) at xpcom/threads/SchedulerGroup.cpp:295
#14 0x00007f0cf1884307 in nsThread::ProcessNextEvent(bool, bool*) (this=0x7f0cee181390, aMayWait=<optimized out>, aResult=0x7ffe569e60c7) at xpcom/threads/nsThread.cpp:1180
#15 0x00007f0cf1886244 in NS_ProcessNextEvent(nsIThread*, bool) (aThread=0x7f0d04aa8680 <_IO_2_1_stderr_>, aMayWait=false) at xpcom/threads/nsThreadUtils.cpp:486
#16 0x00007f0cf1e815da in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (this=0x7f0d0429a470, aDelegate=0x7ffe569e6300) at ipc/glue/MessagePump.cpp:88
#17 0x00007f0cf1e148dd in MessageLoop::RunInternal() (this=0x7ffe569e6300) at ipc/chromium/src/base/message_loop.cc:315
#18 0x00007f0cf1e14836 in MessageLoop::RunHandler() (this=0x7ffe569e6300) at ipc/chromium/src/base/message_loop.cc:308

I can't reproduce with a new profile, so it must also be related to my current profile, with which I can reproduce reliably.

:violet.bugreport, could you try to find a regression window using mozregression?

Flags: needinfo?(violet.bugreport)

I've seen this on one of my debug builds, when accidentally visiting https://google.com from the searchbar. Unfortunately, it looks like I'm not able to repro it anymore, and I wasn't running under rr when I saw it :(

I also experienced this on my debug build while visiting https://www.reddit.com. It was 100% reproducible, but I can no longer reproduce it after a clobber build.

What's happening (most likely) is that XDR decoding fails but doesn't report an exception. When we finish a parse/decode task we then expect there to be either an exception or a script. In this case we have neither.

That would mean we have to check the XDRResult in ScriptDecodeTask::parse and MultiScriptsDecodeTask::parse.
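To make the invariant concrete, here is a minimal toy model of it. None of these types or functions are SpiderMonkey's; `Context`, `DecodeTask`, `decode` and `finishInvariantHolds` are hypothetical stand-ins. The sketch only illustrates the idea that a decode failure has to be turned into a reported error, so that finishing the task always sees either a script or an exception.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are NOT SpiderMonkey types.
struct Script { std::string source; };

struct Context {
    std::optional<std::string> pendingException;
    void reportError(const std::string& msg) { pendingException = msg; }
};

enum class DecodeStatus { Ok, Failed };

// Hypothetical decoder: fails on an unexpected version byte and, on its own,
// reports nothing on the context (the situation described above).
DecodeStatus decode(const std::vector<unsigned char>& buf,
                    unsigned char expectedVersion, Script& out) {
    if (buf.empty() || buf[0] != expectedVersion) {
        return DecodeStatus::Failed;
    }
    out.source.assign(buf.begin() + 1, buf.end());
    return DecodeStatus::Ok;
}

struct DecodeTask {
    std::optional<Script> script;

    // Checks the decode result and converts a failure into a reported error,
    // instead of silently leaving the task with no script and no exception.
    void parse(Context& cx, const std::vector<unsigned char>& buf,
               unsigned char expectedVersion) {
        Script s;
        if (decode(buf, expectedVersion, s) == DecodeStatus::Failed) {
            cx.reportError("cached bytecode could not be decoded");
            return;
        }
        script = std::move(s);
    }
};

// The condition the finishing code relies on: a script or an exception.
bool finishInvariantHolds(const Context& cx, const DecodeTask& task) {
    return task.script.has_value() || cx.pendingException.has_value();
}
```

In this model, dropping the `reportError` call in `parse` reproduces the "neither a script nor an exception" state when the version byte does not match.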

Has anyone here seen this on an official or clobber build? I'm suspicious this is a form of Bug 1506263. The crash stack is the content bytecode cache.

Based on comment 6, I'll take this bug. Due to Bug 1506263 there are XDR decoding errors and due to this bug they are handled incorrectly.

Assignee: nobody → tcampbell

I never saw it in a clobber build, so suspecting bug 1506263 would make sense.

I can reproduce this when visiting https://www.youtube.com/ or https://appear.in/ in my debug build. These two sites crash immediately. Visiting https://google.com is fine on my build.

Ted - this can mess up developers (and cause a lot of wasted time). Are you planning on fixing it in 68, or are we going to have someone handling Mozilla Build deal with it? If so, we should ping them on it.

Flags: needinfo?(tcampbell)

I've now proposed an ugly hack on Bug 1506263 to try and do something about this. When the next round of XDR changes happens, I'll probably also add back XDR_VERSION from years past to throw up more barriers against profile corruption on incremental builds.

Flags: needinfo?(tcampbell)

Chris, I'm wondering if it is reasonable for './mach build' to check if the vcs rev has changed since the last time it ran and if so to clobber buildid.h.

I was hoping that adding a private version number inside spidermonkey would solve the problem but it is full of holes. Almost any bytecode optimization or script representation changes we do would require a version bump. It would be better if BuildID was actually useful on local builds.
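As a rough illustration of why a stale BuildID matters, here is a toy sketch of a cache whose entries are tagged with the build ID that wrote them. The entry layout and the names `encodeEntry` and `decodeEntry` are assumptions made for this example; they do not describe the actual bytecode cache format.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical cache entry layout for illustration: "<buildId>\n<payload>".
std::string encodeEntry(const std::string& buildId, const std::string& payload) {
    return buildId + "\n" + payload;
}

// Returns the payload only if the entry was written by a build with the
// same ID; otherwise the entry is treated as stale and rejected.
std::optional<std::string> decodeEntry(const std::string& currentBuildId,
                                       const std::string& entry) {
    const std::size_t sep = entry.find('\n');
    if (sep == std::string::npos) {
        return std::nullopt;  // malformed entry, no build ID tag
    }
    if (entry.compare(0, sep, currentBuildId) != 0) {
        return std::nullopt;  // written by a different build
    }
    return entry.substr(sep + 1);
}
```

The check can only reject data when the ID actually changes. If an incremental local build alters the bytecode format but keeps the same build ID, a stale entry passes this check and reaches the decoder.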

Flags: needinfo?(cmanchester)

I would be hesitant to add that to every './mach build' invocation. Also it seems like that could still be a problem for people with local changes that aren't committed.

This was discussed a bit in bug 1506263, but I think we need to come up with another way to invalidate the xdr cache. What was the issue with the approach posted in that bug? If touching js is potentially enough to trigger this problem I think refreshing the buildid in that case is probably ok although it would mean unnecessarily re-linking libxul some of the time.

Flags: needinfo?(cmanchester)

The concern is that many types of SpiderMonkey changes can subtly invalidate the data (e.g., we make changes to how bytecode is emitted for the same piece of JS). I'm trying out an alternative strategy of using the 'spidermonkey_checks' step as an input to the buildid.h step. This will restrict the re-linking to just when SpiderMonkey changes happen.

Whether JS-only changes cause problems is a good question. That would be a detail of the respective caches (necko, startup, preload). The current focus is on SpiderMonkey changes because that is what is causing people the most trouble (e.g., questions like 'why did firefox allocate 30GB of memory when I loaded reddit').

(In reply to Ted Campbell [:tcampbell] from comment #12)

When the next round of XDR changes happen I'll probably also add back XDR_VERSION from years past [...]

For what it's worth, we could generate an XDRVersion.h file at compile time that contains a hash based on, say, vm/Opcodes.h and frontend/*. This would be similar to how we generate MOpcodes.h/LOpcodes.h from the MIR/LIR files...
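A minimal sketch of that idea, assuming the generator is handed the contents of the relevant source files as strings. The macro name `XDR_VERSION_HASH` and the helper names are hypothetical, and FNV-1a is only one possible choice of hash, picked here because it is short.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// 32-bit FNV-1a over a byte string, continuing from a running hash value.
std::uint32_t fnv1a(const std::string& data, std::uint32_t hash = 2166136261u) {
    for (unsigned char c : data) {
        hash ^= c;
        hash *= 16777619u;
    }
    return hash;
}

// Hashes the contents of all inputs in order. Inputs are chained without a
// separator, so this toy version does not distinguish file boundaries.
std::uint32_t hashSources(const std::vector<std::string>& contents) {
    std::uint32_t h = 2166136261u;
    for (const auto& c : contents) {
        h = fnv1a(c, h);
    }
    return h;
}

// Produces the text of a generated header; XDR_VERSION_HASH is a
// hypothetical macro name used only for this illustration.
std::string makeVersionHeader(const std::vector<std::string>& contents) {
    return "#define XDR_VERSION_HASH " + std::to_string(hashSources(contents)) + "u\n";
}
```

Any edit to a hashed file changes the generated value, so no one has to remember to bump a version number by hand. Changes outside the hashed set would still go unnoticed.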

Bug 1506263 looks into the incremental build issue. There is also the matter of the actual assert that is failing due to missing mechanisms for reporting decode errors on version changes. I'll open a bug for that.

Flags: needinfo?(violet.bugreport)
See Also: → 1506263
Depends on: 1550854
Priority: -- → P2

We've fixed the major buildid issues so this should hopefully be resolved.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME