Closed Bug 1522861 Opened 5 years ago Closed 5 years ago

Permafail Assertion failure: curThreadData, at /builds/worker/workspace/build/src/xpcom/build/IOInterposer.cpp:505 when Gecko 66 merges to Beta 2019-01-21

Categories

(Core :: Gecko Profiler, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla66
Tracking Status
firefox-esr60 --- unaffected
firefox64 --- unaffected
firefox65 --- unaffected
firefox66 + verified

People

(Reporter: RaulG, Assigned: florian)

References

Details

(Keywords: regression)

Attachments

(1 file, 1 obsolete file)

[Tracking Requested - why for this release]:

Central as Beta simulation: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d86a8f66f46ca3221d538868788a0c52fe14d5a1&searchStr=linux%2Cx64%2Cquantumrender%2Cdebug%2Ctest-linux64-qr%2Fdebug-gtest%2C%28gtest%29&selectedJob=224032521

Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=224032521&repo=try&lineNumber=1208

Log snippet:

[task 2019-01-25T14:55:57.866Z] 14:55:57 INFO - Copy/paste: /builds/worker/workspace/build/venv/bin/mozinstall -h
[task 2019-01-25T14:55:58.078Z] 14:55:58 INFO - Reading from file tmpfile_stdout
[task 2019-01-25T14:55:58.079Z] 14:55:58 INFO - Output received:
[task 2019-01-25T14:55:58.079Z] 14:55:58 INFO - Usage: mozinstall [options] installer
[task 2019-01-25T14:55:58.080Z] 14:55:58 INFO - Options:
[task 2019-01-25T14:55:58.080Z] 14:55:58 INFO - -h, --help show this help message and exit
[task 2019-01-25T14:55:58.080Z] 14:55:58 INFO - -d DEST, --destination=DEST
[task 2019-01-25T14:55:58.083Z] 14:55:58 INFO - Directory to install application into. [default:
[task 2019-01-25T14:55:58.083Z] 14:55:58 INFO - "/builds/worker/workspace"]
[task 2019-01-25T14:55:58.083Z] 14:55:58 INFO - --app=APP Application being installed. [default: firefox]
[task 2019-01-25T14:55:58.083Z] 14:55:58 INFO - mkdir: /builds/worker/workspace/build/application
[task 2019-01-25T14:55:58.083Z] 14:55:58 INFO - Getting output from command: ['/builds/worker/workspace/build/venv/bin/mozinstall', '/builds/worker/workspace/build/installer.tar.bz2', '--destination', '/builds/worker/workspace/build/application']
[task 2019-01-25T14:55:58.083Z] 14:55:58 INFO - Copy/paste: /builds/worker/workspace/build/venv/bin/mozinstall /builds/worker/workspace/build/installer.tar.bz2 --destination /builds/worker/workspace/build/application
[task 2019-01-25T14:56:17.681Z] 14:56:17 INFO - Reading from file tmpfile_stdout
[task 2019-01-25T14:56:17.681Z] 14:56:17 INFO - Output received:
[task 2019-01-25T14:56:17.682Z] 14:56:17 INFO - /builds/worker/workspace/build/application/firefox/firefox
[task 2019-01-25T14:56:17.682Z] 14:56:17 INFO - Running post-action listener: _resource_record_post_action
[task 2019-01-25T14:56:17.683Z] 14:56:17 INFO - [mozharness: 2019-01-25 14:56:17.682988Z] Finished install step (success)
[task 2019-01-25T14:56:17.683Z] 14:56:17 INFO - [mozharness: 2019-01-25 14:56:17.683266Z] Running stage-files step.
[task 2019-01-25T14:56:17.683Z] 14:56:17 INFO - Running pre-action listener: _resource_record_pre_action
[task 2019-01-25T14:56:17.683Z] 14:56:17 INFO - Running main action method: stage_files
[task 2019-01-25T14:56:17.684Z] 14:56:17 INFO - Moving /builds/worker/workspace/build/tests/bin/plugins/gmp-clearkey to /builds/worker/workspace/build/application/firefox
[task 2019-01-25T14:56:17.685Z] 14:56:17 ERROR - shutil error: Destination path '/builds/worker/workspace/build/application/firefox/gmp-clearkey' already exists
[task 2019-01-25T14:56:17.685Z] 14:56:17 INFO - Moving /builds/worker/workspace/build/tests/bin/plugins/gmp-fake to /builds/worker/workspace/build/application/firefox
[task 2019-01-25T14:56:17.685Z] 14:56:17 INFO - Moving /builds/worker/workspace/build/tests/bin/plugins/gmp-fakeopenh264 to /builds/worker/workspace/build/application/firefox

It looks like these assertions are being hit off the main thread in a Gecko profiler unit test that calls functions on a background thread, so maybe this is due to the profiler. I think that the assert means that some IOInterposer TLS hasn't been set up on these threads, but I have no idea why this would start breaking.

Component: XPCOM → Gecko Profiler

Markus, any ideas? Thanks.

Flags: needinfo?(mstange)

Regression from bug 1522497? This is a beta simulation and the final merge of central to beta (for Gecko 66) is on Monday.

Florian, please investigate.

Blocks: 1522497
Component: Gecko Profiler → XPCOM
Flags: needinfo?(florian)
Keywords: regression
Attached patch Tentative patchSplinter Review
This patch adds ifdefs in the profiler code to match the behavior at https://searchfox.org/mozilla-central/rev/465dbfe030dfec7756b9b523029e90d48dd5ecce/xpcom/build/IOInterposer.h#259-261
Attachment #9039144 - Flags: review?(mstange)
Comment on attachment 9039144 [details] [diff] [review]
Tentative patch

Review of attachment 9039144 [details] [diff] [review]:
-----------------------------------------------------------------

Interesting! I think I've understood what's happening here.

On Nightly, IOInterposer::Init is always called before any thread starts that calls IOInterposer::RegisterCurrentThread. That means that IOInterposer::RegisterCurrentThread always succeeds and IOInterposer::UnregisterCurrentThread always finds data for the current thread.
However, if IOInterposer::Init is not called before a thread calls IOInterposer::RegisterCurrentThread, then IOInterposer::RegisterCurrentThread will fail. Now, two things can happen:
 - Nothing calls IOInterposer::Init until the thread shuts down. Then IOInterposer::UnregisterCurrentThread won't do anything and all is fine.
 - Something calls IOInterposer::Init before the thread shuts down. Then IOInterposer::UnregisterCurrentThread sees that the interposer has been initialized, but it won't find any data for the current thread, and it'll assert.

The profiler is now calling IOInterposer::Init. On Nightly, this happens:
 - If we're startup profiling, we will call IOInterposer::Init before any thread has called IOInterposer::RegisterCurrentThread. All is fine.
 - If we're not startup profiling, then IOInterposerInit has already called IOInterposer::Init. The profiler's call to IOInterposer::Init does nothing. Things are fine as well.

But on release and beta, in the second case, IOInterposer::Init has not been called! So the profiler's call will initialize it, and we get into the state described above where the IOInterposer has been initialized between calls to IOInterposer::RegisterCurrentThread and IOInterposer::UnregisterCurrentThread.

I think this is a problem with the IOInterposer API and not with the profiler. So we shouldn't work around it in the profiler. I'll attach an alternative patch.
Attachment #9039144 - Flags: review?(mstange) → review-

However, this does mean that bug 1522497 had the unintended effect of actually giving us IO information on Beta + Release. We now actually make use of the IOInterposer on those channels, if the user starts the profiler, and has the "Main Thread IO" checkbox checked, whereas we just wouldn't have gotten any information on those channels in the past.

Comment on attachment 9039144 [details] [diff] [review]
Tentative patch

Review of attachment 9039144 [details] [diff] [review]:
-----------------------------------------------------------------

Actually yes, let's land this as a quick workaround while the other patch gets reviewed. It fixes the problem and is harmless.
Attachment #9039144 - Flags: review- → review+

requesting checkin-needed for attachment 9039144 [details] [diff] [review], not for the phabricator patch.

Flags: needinfo?(mstange)
Flags: needinfo?(florian)
Keywords: checkin-needed

Pushed by ccoroiu@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/21211736651c
ifdef out the profiler calls to IOInterposer on release and beta to match the IOInterposerInit behavior, r=mstange.

Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
Assignee: nobody → florian

Needinfo :mstange to make sure you don't forget to open a new bug for your other patch :)

Flags: needinfo?(mstange)
Attachment #9039273 - Attachment is obsolete: true

It seems like a similar patch to the other patch eventually landed in bug 1705579.

Flags: needinfo?(mstange.moz)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: