Closed Bug 1640445 Opened 5 years ago Closed 5 years ago

Intermittent SUMMARY: ThreadSanitizer: data race /builds/worker/checkouts/gecko/js/src/gc/GC.cpp:2117:13 in updateEdge<js::BaseScript>

Categories

(Core :: XPConnect, defect, P5)

defect

Tracking

()

RESOLVED FIXED
85 Branch
Tracking Status
firefox85 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: mccr8)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Filed by: nbeleuzu [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=303497691&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/WSxpqqoCSvWzHm4ZPiRb7A/runs/0/artifacts/public/logs/live_backing.log


[task 2020-05-23T23:10:15.326Z] 23:10:15 INFO - GECKO(2217) | #9 mozilla::(anonymous namespace)::TraceOp(JSTracer*, void*) /builds/worker/checkouts/gecko/js/xpconnect/loader/ScriptPreloader.cpp:203:14 (libxul.so+0x190fc9e)
[task 2020-05-23T23:10:15.326Z] 23:10:15 INFO - GECKO(2217) | #10 traceEmbeddingBlackRoots /builds/worker/checkouts/gecko/js/src/gc/RootMarking.cpp:438:5 (libxul.so+0x6d03992)
[task 2020-05-23T23:10:15.326Z] 23:10:15 INFO - GECKO(2217) | #11 js::gc::GCRuntime::traceRuntimeCommon(JSTracer*, js::gc::GCRuntime::TraceOrMarkRuntime) /builds/worker/checkouts/gecko/js/src/gc/RootMarking.cpp:421:5 (libxul.so+0x6d03992)
[task 2020-05-23T23:10:15.326Z] 23:10:15 INFO - GECKO(2217) | #12 js::gc::GCRuntime::traceRuntimeForMajorGC(JSTracer*, js::gc::AutoGCSession&) /builds/worker/checkouts/gecko/js/src/gc/RootMarking.cpp:284:3 (libxul.so+0x6d02fea)
[task 2020-05-23T23:10:15.327Z] 23:10:15 INFO - GECKO(2217) | #13 js::gc::GCRuntime::updateRuntimePointersToRelocatedCells(js::gc::AutoGCSession&) /builds/worker/checkouts/gecko/js/src/gc/GC.cpp:2533:3 (libxul.so+0x6cb1b79)
[task 2020-05-23T23:10:15.327Z] 23:10:15 INFO - GECKO(2217) | #14 js::gc::GCRuntime::compactPhase(JS::GCReason, js::SliceBudget&, js::gc::AutoGCSession&) /builds/worker/checkouts/gecko/js/src/gc/GC.cpp:6280:5 (libxul.so+0x6cca345)
[task 2020-05-23T23:10:15.327Z] 23:10:15 INFO - GECKO(2217) | #15 js::gc::GCRuntime::incrementalSlice(js::SliceBudget&, mozilla::Maybe<JSGCInvocationKind> const&, JS::GCReason, js::gc::AutoGCSession&) /builds/worker/checkouts/gecko/js/src/gc/GC.cpp:6731:13 (libxul.so+0x6ccc2b8)
[task 2020-05-23T23:10:15.327Z] 23:10:15 INFO - GECKO(2217) | #16 js::gc::GCRuntime::gcCycle(bool, js::SliceBudget, mozilla::Maybe<JSGCInvocationKind> const&, JS::GCReason) /builds/worker/checkouts/gecko/js/src/gc/GC.cpp:7086:3 (libxul.so+0x6ccdae6)
[task 2020-05-23T23:10:15.327Z] 23:10:15 INFO - GECKO(2217) | #17 js::gc::GCRuntime::collect(bool, js::SliceBudget, mozilla::Maybe<JSGCInvocationKind> const&, JS::GCReason) /builds/worker/checkouts/gecko/js/src/gc/GC.cpp:7297:9 (libxul.so+0x6ccefc0)
[task 2020-05-23T23:10:15.328Z] 23:10:15 INFO - GECKO(2217) | #18 js::gc::GCRuntime::gcSlice(JS::GCReason, long) /builds/worker/checkouts/gecko/js/src/gc/GC.cpp:7388:3 (libxul.so+0x6ccf8e0)
[task 2020-05-23T23:10:15.328Z] 23:10:15 INFO - GECKO(2217) | #19 JS::IncrementalGCSlice(JSContext*, JS::GCReason, long) /builds/worker/checkouts/gecko/js/src/gc/GC.cpp:8212:21 (libxul.so+0x6cd31c7)
[task 2020-05-23T23:10:15.331Z] 23:10:15 INFO - GECKO(2217) | #20 nsJSContext::GarbageCollectNow(JS::GCReason, nsJSContext::IsIncremental, nsJSContext::IsShrinking, long) /builds/worker/checkouts/gecko/dom/base/nsJSEnvironment.cpp:1173:5 (libxul.so+0x26c982a)
[task 2020-05-23T23:10:15.331Z] 23:10:15 INFO - GECKO(2217) | #21 InterSliceGCRunnerFired(mozilla::TimeStamp, void*) /builds/worker/checkouts/gecko/dom/base/nsJSEnvironment.cpp:1743:3 (libxul.so+0x26cfc72)
[task 2020-05-23T23:10:15.331Z] 23:10:15 INFO - GECKO(2217) | #22 operator() /builds/worker/checkouts/gecko/dom/base/nsJSEnvironment.cpp:2338:22 (libxul.so+0x26dc0bf)
[task 2020-05-23T23:10:15.331Z] 23:10:15 INFO - GECKO(2217) | #23 std::_Function_handler<bool (mozilla::TimeStamp), DOMGCSliceCallback(JSContext*, JS::GCProgress, JS::GCDescription const&)::$_18>::_M_invoke(std::_Any_data const&, mozilla::TimeStamp&&) /builds/worker/fetches/clang/bin/../lib/gcc/x86_64-unknown-linux-gnu/7.4.0/../../../../include/c++/7.4.0/bits/std_function.h:301:9 (libxul.so+0x26dc0bf)
[task 2020-05-23T23:10:15.332Z] 23:10:15 INFO - GECKO(2217) | #24 operator() /builds/worker/fetches/clang/bin/../lib/gcc/x86_64-unknown-linux-gnu/7.4.0/../../../../include/c++/7.4.0/bits/std_function.h:706:14 (libxul.so+0xae2c1e)
[task 2020-05-23T23:10:15.333Z] 23:10:15 INFO - GECKO(2217) | #25 mozilla::IdleTaskRunner::Run() /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:54:14 (libxul.so+0xae2c1e)
[task 2020-05-23T23:10:15.333Z] 23:10:15 INFO - GECKO(2217) | #26 nsThread::ProcessNextEvent(bool, bool*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:1211:14 (libxul.so+0xb05248)
[task 2020-05-23T23:10:15.334Z] 23:10:15 INFO - GECKO(2217) | #27 NS_ProcessNextEvent(nsIThread*, bool) /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp:501:10 (libxul.so+0xb09c42)
[task 2020-05-23T23:10:15.334Z] 23:10:15 INFO - GECKO(2217) | #28 mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) /builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp:87:21 (libxul.so+0x13c5349)
[task 2020-05-23T23:10:15.334Z] 23:10:15 INFO - GECKO(2217) | #29 RunInternal /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:315:10 (libxul.so+0x135223c)
[task 2020-05-23T23:10:15.335Z] 23:10:15 INFO - GECKO(2217) | #30 RunHandler /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:308:3 (libxul.so+0x135223c)
[task 2020-05-23T23:10:15.335Z] 23:10:15 INFO - GECKO(2217) | #31 MessageLoop::Run() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:290:3 (libxul.so+0x135223c)
[task 2020-05-23T23:10:15.336Z] 23:10:15 INFO - GECKO(2217) | #32 nsBaseAppShell::Run() /builds/worker/checkouts/gecko/widget/nsBaseAppShell.cpp:137:27 (libxul.so+0x4840523)
[task 2020-05-23T23:10:15.337Z] 23:10:15 INFO - GECKO(2217) | #33 nsAppStartup::Run() /builds/worker/checkouts/gecko/toolkit/components/startup/nsAppStartup.cpp:271:30 (libxul.so+0x63254f5)
[task 2020-05-23T23:10:15.337Z] 23:10:15 INFO - GECKO(2217) | #34 XREMain::XRE_mainRun() /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:4651:22 (libxul.so+0x64411aa)
[task 2020-05-23T23:10:15.338Z] 23:10:15 INFO - GECKO(2217) | #35 XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:4793:8 (libxul.so+0x6441d93)
[task 2020-05-23T23:10:15.338Z] 23:10:15 INFO - GECKO(2217) | #36 XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:4847:21 (libxul.so+0x64420f4)
[task 2020-05-23T23:10:15.339Z] 23:10:15 INFO - GECKO(2217) | #37 mozilla::BootstrapImpl::XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/checkouts/gecko/toolkit/xre/Bootstrap.cpp:45:12 (libxul.so+0x644cca2)
[task 2020-05-23T23:10:15.340Z] 23:10:15 INFO - GECKO(2217) | #38 do_main /builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp:217:22 (firefox+0xca433)
[task 2020-05-23T23:10:15.341Z] 23:10:15 INFO - GECKO(2217) | #39 main /builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp:331:16 (firefox+0xca433)
[task 2020-05-23T23:10:15.470Z] 23:10:15 INFO - GECKO(2217) | #40 XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/checkouts/gecko/toolkit/xre/nsAppRunner.cpp:4847:21 (libxul.so+0x64420f4)
[task 2020-05-23T23:10:15.470Z] 23:10:15 INFO - GECKO(2217) | #41 mozilla::BootstrapImpl::XRE_main(int, char**, mozilla::BootstrapConfig const&) /builds/worker/checkouts/gecko/toolkit/xre/Bootstrap.cpp:45:12 (libxul.so+0x644cca2)
[task 2020-05-23T23:10:15.470Z] 23:10:15 INFO - GECKO(2217) | #42 do_main /builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp:217:22 (firefox+0xca433)
[task 2020-05-23T23:10:15.470Z] 23:10:15 INFO - GECKO(2217) | #43 main /builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp:331:16 (firefox+0xca433)
[task 2020-05-23T23:10:15.470Z] 23:10:15 INFO - GECKO(2217) | SUMMARY: ThreadSanitizer: data race /builds/worker/checkouts/gecko/js/src/gc/GC.cpp:2117:13 in updateEdge<js::BaseScript>

This is a race in the script preloader.

ScriptPreloader::WriteCache() runs off main thread and iterates through the scripts in the cache, reading the mScript field. These are also traced as roots by the GC, which can update this field at the same time.

Component: JavaScript: GC → XPConnect

Previous read of size 8 at 0x7b280008b8d0 by thread T65 (mutexes: write M481739268680787328, write M481457793704076576):
#0 operator bool /builds/worker/workspace/obj-build/dist/include/js/RootingAPI.h:356:56 (libxul.so+0x1912ca0)
#1 mozilla::ScriptPreloader::WriteCache() /builds/worker/checkouts/gecko/js/xpconnect/loader/ScriptPreloader.cpp:709:11 (libxul.so+0x1912ca0)
#2 mozilla::ScriptPreloader::Run() /builds/worker/checkouts/gecko/js/xpconnect/loader/ScriptPreloader.cpp:736:12 (libxul.so+0x19133af)
#3 non-virtual thunk to mozilla::ScriptPreloader::Run() /builds/worker/checkouts/gecko/js/xpconnect/loader/ScriptPreloader.cpp (libxul.so+0x1914119)

The log also has ==2320==WARNING: Symbolizer buffer too small in it a bunch of times, which I guess is why it isn't symbolicated. I'm not sure if that's fixable.

The most straightforward way to fix this would be to acquire the lock in ScriptPreloader::Trace. I'm not sure if we run JS while holding the lock, because that could then obviously cause a deadlock.

I can try that. I'm not very familiar with the preloader, but hopefully any obvious issues would show up on try.

Assignee: nobody → continuation

Jon, does that sound reasonable from the perspective of the GC? I don't know how other code might handle this. It can't be a very common issue or we would have seen it before. (Or maybe TSan just doesn't test the preloader very well.)

Flags: needinfo?(jcoppeard)

That sounds like it would work. The GC doesn't hold any locks during tracing.

We do want to avoid pauses during GC though. Do you know how likely it is we would block the GC on this and for how long? I guess it only happens once, so maybe that's not too bad.

Flags: needinfo?(jcoppeard)

We intentionally delay the first browser-triggered GC on startup, to avoid interfering with loading, so maybe that's why it usually doesn't happen.

Pushed by amccreight@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/91d9671c44b5 Hold the lock while tracing mScripts. r=kmag

kmag pointed out that we can end up running the GC while holding the loader lock, so taking the lock while in the GC will not work.

Attachment #9154990 - Attachment is obsolete: true
Blocks: tsan

I'm taking another crack at this.

See Also: → 1672230
Attachment #9154990 - Attachment description: Bug 1640445 - Hold the lock while tracing mScripts. → Bug 1640445 - Avoid a race between the GC and WriteCache() with an extra field.
Attachment #9154990 - Attachment is obsolete: false
Pushed by amccreight@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/43705136290d Avoid a race between the GC and WriteCache() with an extra field. r=kmag,decoder
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → 85 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: