Perma ThreadSanitizer: data race /builds/worker/workspace/build/src/js/xpconnect/loader/ScriptPreloader.cpp:930:17 in mozilla::ScriptPreloader::OffThreadDecodeCallback(JS::OffThreadToken*, void*) when Gecko 75 merges to Beta on 2020-03-09
Categories
(Core :: XPConnect, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr68 | --- | unaffected |
firefox73 | --- | unaffected |
firefox74 | --- | unaffected |
firefox75 | + | verified |
People
(Reporter: noemi_erli, Assigned: kmag)
References
(Blocks 1 open bug)
Details
(Keywords: regression)
Attachments
(3 files, 1 obsolete file)
[Tracking Requested - why for this release]:
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=288692316&repo=try&lineNumber=2196
[task 2020-02-13T11:20:50.946Z] 11:20:50 INFO - Buffered messages finished
[task 2020-02-13T11:20:50.947Z] 11:20:50 ERROR - TEST-UNEXPECTED-FAIL | automation.py | application terminated with exit code -6
[task 2020-02-13T11:20:50.948Z] 11:20:50 INFO - runtests.py | Application ran for: 0:03:00.081722
[task 2020-02-13T11:20:50.948Z] 11:20:50 INFO - zombiecheck | Reading PID log: /tmp/tmpbS_28opidlog
[task 2020-02-13T11:20:50.949Z] 11:20:50 INFO - ==> process 1949 launched child process 1966
[task 2020-02-13T11:20:50.950Z] 11:20:50 INFO - zombiecheck | Checking for orphan process with PID: 1966
[task 2020-02-13T11:20:50.950Z] 11:20:50 INFO - Traceback (most recent call last):
[task 2020-02-13T11:20:50.951Z] 11:20:50 INFO - File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 2852, in doTests
[task 2020-02-13T11:20:50.951Z] 11:20:50 INFO - e10s=options.e10s
[task 2020-02-13T11:20:50.952Z] 11:20:50 INFO - File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 2371, in runApp
[task 2020-02-13T11:20:50.952Z] 11:20:50 INFO - raise exc(value).with_traceback(tb)
[task 2020-02-13T11:20:50.953Z] 11:20:50 INFO - AttributeError: 'timeout' object has no attribute 'with_traceback'
[task 2020-02-13T11:20:50.954Z] 11:20:50 ERROR - Automation Error: Received unexpected exception while running application
[task 2020-02-13T11:20:50.954Z] 11:20:50 ERROR -
[task 2020-02-13T11:20:50.955Z] 11:20:50 INFO - Stopping web server
[task 2020-02-13T11:20:50.962Z] 11:20:50 INFO - Stopping web socket server
[task 2020-02-13T11:20:50.984Z] 11:20:50 INFO - Stopping ssltunnel
[task 2020-02-13T11:20:51.003Z] 11:20:51 WARNING - leakcheck | refcount logging is off, so leaks can't be detected!
Reporter | ||
Updated•5 years ago
|
Comment 1•5 years ago
|
||
I'm not too familiar with TSan reports, but it looks like we're accessing mToken in MaybeFinishOffThreadDecode() without holding the lock? I don't know why this would only be failing on beta.
Comment 2•5 years ago
|
||
This race sounds familiar, we likely fixed this on central in the last cycle but didn't uplift (I can't find the bug right now, can look later again). In general, there might be quite a few races that have been fixed on central only without an uplift. So we probably need a general strategy here. Maybe for now, we shouldn't run TSan on beta at all until this has settled a bit.
Comment 3•5 years ago
|
||
(In reply to Christian Holler (:decoder) from comment #2)
This race sounds familiar, we likely fixed this on central in the last cycle but didn't uplift (I can't find the bug right now, can look later again). In general, there might be quite a few races that have been fixed on central only without an uplift. So we probably need a general strategy here. Maybe for now, we shouldn't run TSan on beta at all until this has settled a bit.
Isn't this for Gecko-75-as-beta? So it should contain all of the fixes on m-c.
Comment 4•5 years ago
|
||
Bug 1608462 is in the same callback, but a different error.
Comment 5•5 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #3)
(In reply to Christian Holler (:decoder) from comment #2)
This race sounds familiar, we likely fixed this on central in the last cycle but didn't uplift (I can't find the bug right now, can look later again). In general, there might be quite a few races that have been fixed on central only without an uplift. So we probably need a general strategy here. Maybe for now, we shouldn't run TSan on beta at all until this has settled a bit.
Isn't this for Gecko-75-as-beta? So it should contain all of the fixes on m-c.
Yes, it is.
Comment 6•5 years ago
|
||
Updated•5 years ago
|
Comment 7•5 years ago
|
||
This also appeared intermittently on autoland now although I am not sure it is exactly the same issue. The problem here is that one stack is consistently missing (most likely caused by a suppression).
We can suppress ScriptPreloader::OffThreadDecodeCallback
to make this work on beta, but I suspect we are hiding one or more bugs here that we are not aware of. It could be masked by suppressions for some of the other outstanding JS bugs.
Updated•5 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment 9•5 years ago
|
||
Bugbug thinks this bug is a regression, but please revert this change in case of error.
Comment 10•5 years ago
|
||
(In reply to Christian Holler (:decoder) from comment #7)
This also appeared intermittently on autoland now although I am not sure it is exactly the same issue. The problem here is that one stack is consistently missing (most likely caused by a suppression).
We can suppress
ScriptPreloader::OffThreadDecodeCallback
to make this work on beta, but I suspect we are hiding one or more bugs here that we are not aware of. It could be masked by suppressions for some of the other outstanding JS bugs.
I've used this for classifying on autoland too. Should we file another bug for that?
Comment 11•5 years ago
|
||
No, I can add the suppression in this bug. But I am afraid of adding such a generic suppression and if we can resolve this bug in time, I would prefer doing that.
Kris, I've looked into the one stack we have here and this is part of the code involved in the race:
if (cache->mToken && !cache->mFinishDecodeRunnablePending) {
cache->mFinishDecodeRunnablePending = true; // <--- Race Report here
NS_DispatchToMainThread(
NewRunnableMethod("ScriptPreloader::DoFinishOffThreadDecode", cache,
&ScriptPreloader::DoFinishOffThreadDecode));
}
The only other location where we write to mFinishDecodeRunnablePending
is in ScriptPreloader::DoFinishOffThreadDecode()
where we set it to false and that method is not guarded by any locks.
When those two methods are racing, a runnable can be dispatched and mFinishDecodeRunnablePending
be immediately set to false
afterwards, which is probably undesired.
Updated•5 years ago
|
Updated•5 years ago
|
Comment 12•5 years ago
|
||
Updated•5 years ago
|
Comment 13•5 years ago
|
||
Comment 14•5 years ago
|
||
bugherder |
Comment 15•5 years ago
|
||
Christian,
Should we mark this as verify fixed since this is not happening anymore in beta simulations?
Comment 16•5 years ago
|
||
Verified fixed in latest beta sims:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f4b39d095b171278401890819d2d91dece81f055&group_state=expanded
Updated•5 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (off-topic) |
Comment 19•5 years ago
|
||
kmag says he will make the racing field atomic. The race should be harmless because it is just an optimization.
P3
Comment 20•5 years ago
|
||
Tentatively assigning to kmag because he said he will make the racing field atomic.
Comment 21•4 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:peterv, maybe it's time to close this bug?
Updated•4 years ago
|
Comment 22•4 years ago
|
||
Depends on D94856
Comment 23•4 years ago
|
||
Comment 24•4 years ago
|
||
bugherder |
Comment 25•4 years ago
|
||
oops, didn't need to be leave-open anymore
Updated•4 years ago
|
Comment 26•4 years ago
|
||
I accidentally got these two mixed up!!
Comment 27•4 years ago
|
||
Comment 28•4 years ago
|
||
bugherder |
Updated•4 years ago
|
Description
•