<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Assignee

Updated

•

6 years ago

Flags: needinfo?(ehsan)

Comment 1

•

6 years ago

Yes, looks like it does main-thread IO.

The common way to off-load these types of IO to a background thread is to use the "STS thread", aka the stream transport service. This is a thread used by Necko for the delivery messages of network notifications internally and is commonly used by various components in Gecko to run a small piece of code off the main thread.

There are two general techniques for doing so:

a) A fire and forget technique, like how we save the spell checking custom dictionary (when the user adds a custom word to the spell checking dictionary here: https://searchfox.org/mozilla-central/rev/7556a400affa9eb99e522d2d17c40689fa23a729/extensions/spellcheck/src/mozPersonalDictionary.cpp#347. We grab the STS thread event target and post a runnable to it, and the runnable's Run() method is responsible to do the IO. That function will run off the main thread. The WaitForSave() function uses a monitor to optionally block on an ongoing pending save where needed (e.g. when kicking off another save operation, you don't wanna race with a previously running one.)

b) A fire and wait for completion technique, like how we save the preferences file (https://searchfox.org/mozilla-central/rev/7556a400affa9eb99e522d2d17c40689fa23a729/modules/libpref/Preferences.cpp#4166). This is very similar to the above except for the setup used to fire the runnable. Here we use a "sync runnable" (see the call to SyncRunnable::DispatchToThread()). That dispatches the runnable to a background thread, but blocks the main thread until that runnable has been processed on the background thread! But while doing so, instead of actually blocking the main thread's event queue, we'd be processing the incoming messages, so the main thread will remain responsive, but we just won't make progress running more code from our function (in other words, we're spinning a nested event loop here.)

Which one of these solutions is appropriate here is really up to Dana since she's the domain expert here, I'll let her speak to that part. :-)

Flags: needinfo?(ehsan)

Comment 2

•

6 years ago

In PSM we generally subclass CryptoTask like so: https://searchfox.org/mozilla-central/rev/7556a400affa9eb99e522d2d17c40689fa23a729/security/manager/ssl/ContentSignatureVerifier.cpp#41
The downside of this approach is that it creates (and destroys when it's done) a new thread for every task created. Our webcrypto implementation uses the same framework but has a thread pool, which presumably is much more efficient: https://searchfox.org/mozilla-central/rev/7556a400affa9eb99e522d2d17c40689fa23a729/dom/crypto/WebCryptoThreadPool.h#18 (I don't know why we didn't just make all CryptoTasks use a thread pool when we implemented that, but oh well).

Flags: needinfo?(dkeeler)

Comment 3

•

6 years ago

Hmm, looks like usage of the transient thread is optional? https://searchfox.org/mozilla-central/rev/7556a400affa9eb99e522d2d17c40689fa23a729/security/manager/ssl/CryptoTask.cpp#43 Presumably one could just create a CryptoTask and submit it to the STS thread, no?

Comment 4

•

6 years ago

I suppose so but is it a good idea to be using the socket thread for unrelated tasks?

Comment 5

•

6 years ago

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #4)

I suppose so but is it a good idea to be using the socket thread for unrelated tasks?

Not the socket thread, the stream transport thread. This is a common source of confusion, see this thread for example: https://groups.google.com/d/msg/mozilla.dev.platform/v2MePAm0YFQ/RgMxVgDLe4YJ. The Necko team definitely thinks this is OK and actually invites people to use this thread for these types of tasks. As far as I understand other types of activities that can happen on this thread are also IO in nature.

Comment 6

•

6 years ago

Ohhhh that makes a lot more sense. Thanks!
Looking at how the cert override service works, I imagine we could get away with fire-and-forget using the stream transport service in most cases (since we update in-memory behind a lock and the writing is only for saving state). For the initial read, again it would be nice to do that on a background thread, but in that case we'd essentially need to block any lookups until the read had completed (a bit like DataStorage: https://searchfox.org/mozilla-central/rev/7556a400affa9eb99e522d2d17c40689fa23a729/security/manager/ssl/DataStorage.cpp#629 )

Comment 7

•

6 years ago

That makes sense. Maybe we should file a follow-up bug to move some of those cases to the stream transport service?

(For Nihanth's original purpose, I hope what I said in comment 3 makes sense?)

BugBot [:suhaib / :marco/ :calixte]

Comment 8

•

6 years ago

Sure - I filed bug 1555854.

Comment 9

•

6 years ago

The priority flag is not set for this bug.
:keeler, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(dkeeler)

Comment 10

•

6 years ago

Nihanth, are you planning on working on this?

Flags: needinfo?(dkeeler) → needinfo?(nhnt11)

Assignee

Comment 11

•

6 years ago

•

Edited

Sure, sorry I didn't get a chance to pick this up already. Thank you both for already starting the discussion to figure out the best way to do this! Here's what I understood from reading the conversation above and looking at the links:

Dana already landed a patch to make CryptoTasks use the stream transport thread, so that's done.
We should create a subclass of CryptoTask to handle the I/O. It needs to override CalculateResult to do the actual I/O, and CallCallback which can be used to set a ready state on the override service and to propagate errors. Either the constructor can accept a boolean indicating read or write, or we can just have two subclasses.
Reads should set a ready state to false, and lookups should wait until the state is true.
Writes can be a "fire and forget".
I need to double-check if the stream transport service uses a thread or a thread-pool. If it's a pool, maybe writes should block until ready too, to ensure they execute in the right order.

Thanks! I'll start working on a patch. Let me know if I'm missing something.

Assignee

Updated

•

6 years ago

Assignee: nobody → nhnt11

Status: NEW → ASSIGNED

Flags: needinfo?(nhnt11)

Assignee

Comment 12

•

6 years ago

I need some help to continue. I started working on making a CryptoTask subclass for writing, and realized I had two options:

Iterate on mSettingsTable and build all the data to be written into a string, and pass this to the Task to write.
Figure out the right way to access mSettingsTable from the Task.

What's the right approach here? If 1, is there an example of string building that I should follow or can I just use an ostringstream? 2 feels very unsafe and wrong, but if there's a good way to do it, I'm all ears.

Flags: needinfo?(ehsan)

Flags: needinfo?(dkeeler)

Comment 13

•

6 years ago

This is really a PSM specific question so I'll leave it to Dana to decide how she prefers this to be handled...

Flags: needinfo?(ehsan)

Comment 14

•

6 years ago

I would go with option 1. Here's an example of marshaling a bunch of data into a string and then creating a task that writes it out: https://searchfox.org/mozilla-central/rev/b3b401254229f0a26f7ee625ef5f09c6c31e3949/security/manager/ssl/DataStorage.cpp#895 (I would take that implementation with a grain of salt, though - it's a bit dated and uses some paradigms that maybe aren't best practice any longer).

Flags: needinfo?(dkeeler)

Priority: -- → P1

Whiteboard: [psm-assigned]

Assignee

Comment 15

•

6 years ago

Attached file Bug 1555557 - Do cert override file writes off the main thread. r=keeler! — Details

Assignee

Comment 16

•

6 years ago

Thanks Dana! I posted a proof-of-concept patch that moves the writing off the main thread by implementing a CryptoTask subclass as a private inner class of nsCertOverrideService and passing it the string to write. It works for me locally, but I haven't run any tests. I'm posting the patch for initial feedback before implementing the reading part.

Comment 17

•

6 years ago

Nihanth, is this still something you're available to do?

Flags: needinfo?(nhnt11)

Assignee

Comment 18

•

6 years ago

Apologies for being away from this bug, had to prioritize Skyline stuff. I will be picking this back up soon. Leaving the needinfo for tracking.

Flags: needinfo?(nhnt11)

Updated

•

5 years ago

Attachment #9072908 - Attachment description: Bug 1555557 - Do cert override file writes off the main thread. r=keeler → Bug 1555557 - Do cert override file writes off the main thread. r=keeler!,baku

Updated

•

5 years ago

Severity: normal → S4

Updated

•

5 years ago

Attachment #9072908 - Attachment description: Bug 1555557 - Do cert override file writes off the main thread. r=keeler!,baku → Bug 1555557 - Do cert override file writes off the main thread. r=keeler!

Comment 19

•

5 years ago

Pushed by nhnt11@gmail.com: https://hg.mozilla.org/integration/autoland/rev/7fbc8bcf6859 Do cert override file writes off the main thread. r=keeler

Atila Butkovits

Comment 20

•

5 years ago

Backed out as requested.

Backout link: https://hg.mozilla.org/integration/autoland/rev/1a9b9ba8a0c9dc3f46e11cc821b4c42afb244814

Comment 21

•

5 years ago

Pushed by nhnt11@gmail.com: https://hg.mozilla.org/integration/autoland/rev/6a93e7a5e237 Do cert override file writes off the main thread. r=keeler

Atila Butkovits

Comment 22

•

5 years ago

Backed outfor causing build bustage on nsCertOverrideService.cpp.

Backout link: https://hg.mozilla.org/integration/autoland/rev/f1bba90e7986517a2802dc2a4a6b74451f6406c0

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=6a93e7a5e2378be1878c44383fccc4c26ed4736d

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=307596273&repo=autoland&lineNumber=39823

Atila Butkovits

Updated

•

5 years ago

Flags: needinfo?(nhnt11)

Assignee

Comment 23

•

5 years ago

(In reply to Atila Butkovits from comment #22)

Backed outfor causing build bustage on nsCertOverrideService.cpp.

Backout link: https://hg.mozilla.org/integration/autoland/rev/f1bba90e7986517a2802dc2a4a6b74451f6406c0

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=6a93e7a5e2378be1878c44383fccc4c26ed4736d

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=307596273&repo=autoland&lineNumber=39823

Build failure was due to GetCurrentThreadSerialEventTarget() getting renamed in bug 1637500.

Flags: needinfo?(nhnt11)

Comment 24

•

5 years ago

Pushed by nhnt11@gmail.com: https://hg.mozilla.org/integration/autoland/rev/7e9307a4a744 Do cert override file writes off the main thread. r=keeler

Cristina Coroiu [:ccoroiu]

Assignee

Comment 25

•

5 years ago

Attached file Bug 1555557 - Do cert override file writes off the main thread. r=keeler! (obsolete) — Details

Comment 26

•

5 years ago

Backed out changeset 7e9307a4a744 (bug 1555557) for talos failures

Backout: https://hg.mozilla.org/integration/autoland/rev/f43be4059409ec108f2ea366144107be89b1a7ba

Failure push: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=7e9307a4a7441dc00fa8f53472b463adc3063e0d

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=307666556&repo=autoland&lineNumber=2722

Flags: needinfo?(nhnt11)

Assignee

Comment 27

•

5 years ago

At first glance I have no idea how that failure is related to anything in this patch, but I'll look into it....

Flags: needinfo?(nhnt11)

~~https://treeherder.mozilla.org/#/jobs?repo=try&revision=178ad1f5ab6e721b14d800a7c10f123f4a6e0904~~

Assignee

Comment 28

•

5 years ago

•

Edited

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f88ed7e0d85eacfa46a44c233ba0401787fb157d

Assignee

Comment 29

•

5 years ago

•

Edited

(In reply to Nihanth Subramanya [:nhnt11] from comment #27)

At first glance I have no idea how that failure is related to anything in this patch, but I'll look into it....

The log shows a JS error which threw me off - there's a crash in nsCertOverrideService::Init() because we MOZ_RELEASE_ASSERT that adding the AsyncShutdown blocker succeeded, and apparently, in some cases, it doesn't.

I need to figure out the answers to:

When does this happen?
Is this a real issue that needs to be fixed/worked around or is MOZ_RELEASE_ASSERT just unnecessary here?

So far I've found:

GetShutdownBarrier definitely worked because there's an assert there too that isn't being hit.
In m-c, there are some consumers that do the MOZ_RELEASE_ASSERT on the AddBlocker result and some that don't.
I THINK AddBlocker might be failing in cases when we're somehow already shutting down and/or past the target shutdown phase.

Assignee

Comment 30

•

5 years ago

(In reply to Nihanth Subramanya [:nhnt11] from comment #29)

(In reply to Nihanth Subramanya [:nhnt11] from comment #27)
3. I THINK AddBlocker might be failing in cases when we're somehow already shutting down and/or past the target shutdown phase.

Based on the stack trace in the log, it does indeed appear that we are hitting PSM init after the ScopedXPCOMStartup destructor has been called, and we are crashing because we're trying to add a blocker to a shutdown phase that's already in progress (or maybe even finished).

I think that it should be sufficient here to WARN and return instead of MOZ_RELEASE_ASSERT.

Dana, do you have any input on this diagnosis+solution? It appears there's an XHR during shutdown which is triggering PSM init.

Here's the stack trace: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=307741501&repo=try&lineNumber=3640

Flags: needinfo?(dkeeler)

Assignee

Comment 31

•

5 years ago

(In reply to Nihanth Subramanya [:nhnt11] from comment #30)

(In reply to Nihanth Subramanya [:nhnt11] from comment #29)

(In reply to Nihanth Subramanya [:nhnt11] from comment #27)
3. I THINK AddBlocker might be failing in cases when we're somehow already shutting down and/or past the target shutdown phase.

I think that it should be sufficient here to WARN and return instead of MOZ_RELEASE_ASSERT.

Perhaps in addition to this, we should remember this state and return early in Write() if we failed to add the shutdown blocker. Just in case.

~~https://treeherder.mozilla.org/#/jobs?repo=try&revision=572389b4e8c119376f1c204c9dafef969f03a4b7~~

Assignee

Comment 32

•

5 years ago

•

Edited

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a8c7cd6fda8ad17f4c7d5921edf5bbdb44064cf9

Comment 33

•

5 years ago

I feel like if we non-fatally fail in a function called from nsCertOverrideService::Init() we should propagate the error (that way, the override service will never be created and no writers will be dispatched. It would also be interesting to know what is causing PSM to be initialized when we're shutting down.

Flags: needinfo?(dkeeler)

Assignee

Comment 34

•

5 years ago

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #33)

I feel like if we non-fatally fail in a function called from nsCertOverrideService::Init() we should propagate the error (that way, the override service will never be created and no writers will be dispatched. It would also be interesting to know what is causing PSM to be initialized when we're shutting down.

Sounds good, in that case we should try adding the blocker before adding any observers so we don't leak. I'll update the patch.

Updated

•

5 years ago

Attachment #9159692 - Attachment description: Bug 1555557 - Do cert override file writes off the main thread - follow-up cleanup. r=jya! → Bug 1555557 - Do cert override file writes off the main thread. r=keeler!

Updated

•

5 years ago

Attachment #9159692 - Attachment is obsolete: true