Move crypto off the main thread

RESOLVED WONTFIX

Status

()

defect
RESOLVED WONTFIX
9 years ago
11 months ago

People

(Reporter: philikon, Assigned: philikon)

Tracking

unspecified
Points:
---
Dependency tree / graph
Bug Flags:
blocking-fx-sync1.4 -

Firefox Tracking Flags

(blocking2.0 beta8+, fennec2.0b2+)

Details

(Whiteboard: [qa-])

Attachments

(1 attachment, 4 obsolete attachments)

No description provided.
- split nss declaration to a separate script that will set .nss .nss_t on a given object
-- need to get the nss file path first because the worker won't have access to xpcom

- move the generateKeypair code to a worker script that imports nss declaration and calls keypair generation

- need to use nsIWindowMediator to get a window object that has ChromeWorker object

- decide if we want a new async generateKeypair API or internally wrap with sync/async
Flags: blocking-fx-sync1.4+
Target Milestone: --- → 1.4
Sounds like we're going to make a generateKeypairAsync IWeaveCrypto2 interface.

base_records/keys.js:PubKeyManager.createKeypair will also need to be made async and choose between calling the sync vs async version depending on if the crypto module implements Crypto2.

service.js:Service:_remoteSetup will need to be made async as well to handle PubKeys.createKeypair being async
Assignee: nobody → philipp
Depends on: 570858
Worth mentioning that eventually WeaveCrypto should become a .jsm instead of being an xpcom component, that could be accelerated if wedging an async API into IWeaveCrypto2 or whatever is inconvenient.
Assignee: philipp → nobody
Component: Sync → Crypto
QA Contact: sync → crypto
We're blocked on bug 570858 for now, moving out.

dolske: we could accelerate it, but trying to be conservative until we drop 3.5/3.6 support.
Assignee: nobody → philipp
Flags: blocking-fx-sync1.4+ → blocking-fx-sync1.4-
Target Milestone: 1.4 → 2.0
Using ChromeWorker isn't as straight-forward as it seems. While there's access to ctypes, it's lacking essentials such as atob/btoa. Also, the fact that ChromeWorkers are hung off a DOM window rather than JSM namespaces scares me a bit.

I have an implementation of this using nsIThread which has proven to be much more reliable than the ChromeWorker prototype.
Summary: Move keygen off the main thread → Move crypto off the main thread
Convert WeaveCrypto.js from an XPCOM component to a JSM.

For some reason this breaks one test (test_crypto_crypt.js) because it used an improper base64 string for the 'badiv' variable (trailing =); the failure origins from bug 439276. Making this a proper base64 string makes the tests pass but fail for the binary component.
Attachment #469607 - Flags: review?(mconnor)
Provide a threaded implementation of IWeaveCrypto. Yes, it keeps the synchronous API. All hail Sync.js.
Attachment #469609 - Flags: review?(mconnor)
Comment on attachment 469607 [details] [diff] [review]
Part 1 (v1): Convert WeaveCrypto.js into a JSM

>+  delete Svc.Crypto;
>+  return Svc.Crypto = cryptoSvc;
>+});
>+
>+

nit: random extra newline?
Attachment #469607 - Flags: review?(mconnor) → review+
Remove extraneous line. Ensure services-crypto alias for m-c integration.
Attachment #469607 - Attachment is obsolete: true
Attachment #469619 - Flags: review?(mconnor)
Comment on attachment 469609 [details] [diff] [review]
Part 2 (v1): Threaded implementation of IWeaveCrypto

this hurts my head, and my soul, a little.  But it works, and doesn't feel too evil.  Getting sdwilsh to review as well as the threading stuff is not my forte.
Attachment #469609 - Flags: review?(sdwilsh)
Attachment #469609 - Flags: review?(mconnor)
Attachment #469609 - Flags: review+
Attachment #469619 - Flags: review?(mconnor) → review+
Attachment #469609 - Attachment description: Part 2 (v2): Threaded implementation of IWeaveCrypto → Part 2 (v1): Threaded implementation of IWeaveCrypto
Comment on attachment 469609 [details] [diff] [review]
Part 2 (v1): Threaded implementation of IWeaveCrypto

>+++ b/services/crypto/modules/threaded.js
nit: license block is not formatted per http://www.mozilla.org/MPL/boilerplate-1.1/

>+/*
>+ * Implementation of IWeaveCrypto that defers method calls to another thread
>+ * but keeps the synchronous API. (Don't ask...)
>+ */
I know it says don't ask, but...care to explain to your reviewer?

>+function ThreadedCryptoService() {
>+  let threadManager = Cc["@mozilla.org/thread-manager;1"].getService();
>+  this.currentThread = threadManager.currentThread;
You only ever mean for this to be on the main thread, right?  At least, that's where I'm betting you want your callbacks to go.  threadManager.mainThread is what you want here.

Although, thinking about that even more, you can just always dispatch to the main thread when you get your answer, and not have to track it at all.  More details here would be appreciated.


This also seems to be lacking in tests, which is always a good thing to have when you start throwing threads into the mix.  Care to write some (would also maybe help explain why you are keeping the sync API semantics)

r- mostly for tests, but also due to mainThread stuff mentioned above.
Attachment #469609 - Flags: review?(sdwilsh) → review-
Thanks for reviewing this, Shawn.

(In reply to comment #11)
> >+/*
> >+ * Implementation of IWeaveCrypto that defers method calls to another thread
> >+ * but keeps the synchronous API. (Don't ask...)
> >+ */
> I know it says don't ask, but...care to explain to your reviewer?

Sure! Since the crypto API is used in the guts of the SyncEngine object (underlying object to every engine), changing this to an async API would mean rewriting a whole lot of Sync's APIs to be async. This has been avoided so many times before (the asynchronous network calls mock a sync API too, for instance) that it would now constitute a pretty big rewrite.

By keeping the sync API we can just drop this threaded version of WeaveCrypto in and everything continues to work just fine.

> >+function ThreadedCryptoService() {
> >+  let threadManager = Cc["@mozilla.org/thread-manager;1"].getService();
> >+  this.currentThread = threadManager.currentThread;
> You only ever mean for this to be on the main thread, right?  At least, that's
> where I'm betting you want your callbacks to go.  threadManager.mainThread is
> what you want here.

Yeah, you're right. Holding on to the current thread like that seems wrong. It might dispatch to the wrong thread if one of the methods isn't called from whatever this.currentThread points. Although Sync.js's magic should mitigate that, I think I'll just have it dispatch to mainThread every time.

> This also seems to be lacking in tests, which is always a good thing to have
> when you start throwing threads into the mix.  Care to write some (would also
> maybe help explain why you are keeping the sync API semantics)

WeaveCrypto has lots of tests in services/sync/tests/unit (yeah, those should be moved over to services/crypto eventually) and they all pass with the threaded component!
(In reply to comment #12)
> By keeping the sync API we can just drop this threaded version of WeaveCrypto
> in and everything continues to work just fine.
Alright, then can you explain to me how the multithreaded version of this still works?  What magic am I missing?

> Yeah, you're right. Holding on to the current thread like that seems wrong. It
> might dispatch to the wrong thread if one of the methods isn't called from
> whatever this.currentThread points. Although Sync.js's magic should mitigate
> that, I think I'll just have it dispatch to mainThread every time.
Should probably throw if you aren't on the main thread when calling these too for good measure.

> WeaveCrypto has lots of tests in services/sync/tests/unit (yeah, those should
> be moved over to services/crypto eventually) and they all pass with the
> threaded component!
Do this all pass, assertion free, in debug builds?
Comment on attachment 469609 [details] [diff] [review]
Part 2 (v1): Threaded implementation of IWeaveCrypto

(In reply to comment #13)
> (In reply to comment #12)
> > By keeping the sync API we can just drop this threaded version of WeaveCrypto
> > in and everything continues to work just fine.
> Alright, then can you explain to me how the multithreaded version of this still
> works?  What magic am I missing?

>+ThreadedCryptoService.deferToThread = function deferToThread(methodname) {
>+    let [exec, execCb] = Sync.withCb(function(callback) {
>+      let runner = new CallbackRunner(this.crypto[methodname], this.crypto,
>+                                      args, callback, this.currentThread);
>+      this.backgroundThread.dispatch(runner, Ci.nsIThread.DISPATCH_NORMAL);
>+    }, this);
>+
>+    let [returnval, error] = exec(execCb);
This could be written as just

let [retval, error] = Sync(function(callback) ...)();

Sync.withCb is to explicitly get a separate callback. Otherwise, a plain Sync() call will pass in the callback to the first argument of the provided function.
See http://hg.mozdev.org/jsmodules/file/tip/test/unit/test_Sync.js#l47

for other uses of Sync() instead of Sync.withCb()
(In reply to comment #13)
> (In reply to comment #12)
> > By keeping the sync API we can just drop this threaded version of WeaveCrypto
> > in and everything continues to work just fine.
> Alright, then can you explain to me how the multithreaded version of this still
> works?  What magic am I missing?

The magic of Sync.js. See http://mxr.mozilla.org/services-central/source/fx-sync/services/sync/modules/ext/Sync.js

  let [syncFunc, callback] = Sync.withCb(someAsyncFunc);

turns an asynchronous function to a synchronous one (syncFunc). You can then call syncFunc(callback) or however the original API of someAsyncFunc was. Sync.js will dispatch the async function and periodically check for the function to dispatch the callback, at which point it returns to the caller.

> > Yeah, you're right. Holding on to the current thread like that seems wrong. It
> > might dispatch to the wrong thread if one of the methods isn't called from
> > whatever this.currentThread points. Although Sync.js's magic should mitigate
> > that, I think I'll just have it dispatch to mainThread every time.
> Should probably throw if you aren't on the main thread when calling these too
> for good measure.

Good idea!

> > WeaveCrypto has lots of tests in services/sync/tests/unit (yeah, those should
> > be moved over to services/crypto eventually) and they all pass with the
> > threaded component!
> Do this all pass, assertion free, in debug builds?

Will check (so far developing in add-on space). Let me get back to you on that.
(In reply to comment #14)
> This could be written as just
> 
> let [retval, error] = Sync(function(callback) ...)();
> 
> Sync.withCb is to explicitly get a separate callback. Otherwise, a plain Sync()
> call will pass in the callback to the first argument of the provided function.

Good to know, will incorporate that. Thanks!
Comment on attachment 469609 [details] [diff] [review]
Part 2 (v1): Threaded implementation of IWeaveCrypto

>+ThreadedCryptoService.deferToThread = function deferToThread(methodname) {
>+  return function threadMethod() {
>+    let [exec, execCb] = Sync.withCb(function(callback) ..);
>+    let [returnval, error] = exec(execCb);
>+    if (error)
>+      throw error;
>+    return returnval;
Actually, this should be able to be simplified to just..
return Sync(function(callback) ..)();

With some modifications to Runner:

>+Runner.prototype = {
>+  run: function run() {
>+    this.func.apply(this.thisObj, this.args);
if (success)
  this.func.call(this.thisObj, successVal);
else
  this.func.throw(exception);

The Sync/async callback function has a .throw method that will throw at the synchronous wrapper level of the async function. See:
http://hg.mozdev.org/jsmodules/file/tip/test/unit/test_Sync.js#l76
(In reply to comment #16)
> > Do this all pass, assertion free, in debug builds?
> 
> Will check (so far developing in add-on space). Let me get back to you on that.

Hmm, they don't :( I'm getting these assertion errors:

###!!! ASSERTION: nsExceptionManager not thread-safe: '_mOwningThread.GetThread() == PR_GetCurrentThread()', file /Users/philipp/dev/mozilla-weave/mc/xpcom/base/nsExceptionService.cpp, line 99
nsExceptionManager::Release()+0x000000A7 [/Users/philipp/dev/mozilla-weave/mc/obj-ff-dbg/dist/bin/XUL +0x01619E09]
nsExceptionService::DoDropThread(nsExceptionManager*)+0x00000085 [/Users/philipp/dev/mozilla-weave/mc/obj-ff-dbg/dist/bin/XUL +0x0161902D]
nsExceptionService::DropAllThreads()+0x0000002F [/Users/philipp/dev/mozilla-weave/mc/obj-ff-dbg/dist/bin/XUL +0x01619069]
nsExceptionService::Shutdown()+0x0000002C [/Users/philipp/dev/mozilla-weave/mc/obj-ff-dbg/dist/bin/XUL +0x01619324]
nsExceptionService::Observe(nsISupports*, char const*, unsigned short const*)+0x00000011 [/Users/philipp/dev/mozilla-weave/mc/obj-ff-dbg/dist/bin/XUL +0x01619379]
nsObserverList::NotifyObservers(nsISupports*, char const*, unsigned short const*)+0x00000061 [/Users/philipp/dev/mozilla-weave/mc/obj-ff-dbg/dist/bin/XUL +0x015B2F91]
nsObserverService::NotifyObservers(nsISupports*, char const*, unsigned short const*)+0x0000014C [/Users/philipp/dev/mozilla-weave/mc/obj-ff-dbg/dist/bin/XUL +0x015B3FE2]
mozilla::ShutdownXPCOM(nsIServiceManager*)+0x000001ED [/Users/philipp/dev/mozilla-weave/mc/obj-ff-dbg/dist/bin/XUL +0x01596FBD]
NS_ShutdownXPCOM_P+0x00000011 [/Users/philipp/dev/mozilla-weave/mc/obj-ff-dbg/dist/bin/XUL +0x01597584]
NS_ShutdownXPCOM+0x00000011 [/Users/philipp/dev/mozilla-weave/mc/obj-ff-dbg/dist/bin/libxpcom.dylib +0x00001A47]
main+0x00000E1C [/Users/philipp/dev/mozilla-weave/mc/obj-ff-dbg/dist/bin/xpcshell +0x000057AA]
start+0x00000036 [/Users/philipp/dev/mozilla-weave/mc/obj-ff-dbg/dist/bin/xpcshell +0x000010FE]
###!!! ASSERTION: nsExceptionManager not thread-safe: '_mOwningThread.GetThread() == PR_GetCurrentThread()', file /Users/philipp/dev/mozilla-weave/mc/xpcom/base/nsExceptionService.cpp, line 99
(In reply to comment #16)
> turns an asynchronous function to a synchronous one (syncFunc). You can then
> call syncFunc(callback) or however the original API of someAsyncFunc was.
> Sync.js will dispatch the async function and periodically check for the
> function to dispatch the callback, at which point it returns to the caller.
Oh gosh oh gosh oh gosh.  You folks are still spinning the event loop :(

Bug 529380 should get fixed sooner rather than later.  You should really stop adding new things on top of that bad behavior if at all possible.

> Will check (so far developing in add-on space). Let me get back to you on that.
I'm asking for this because there are some thread safety checks that we don't run in opt builds.
Blocks: 592372
Blocks: 592375
(In reply to comment #20)
> Bug 529380 should get fixed sooner rather than later.  You should really stop
> adding new things on top of that bad behavior if at all possible.

I concur. We need to kill Sync.js entirely, so we should not be adding any new users.
Ensure there are no assertions when running on a trunk debug build:
* Make sure to kill the background thread before XPCOM shuts down
* Don't use Components.Exception (apparently not thread safe)

Also incorporated Mardak's feedback and simplified call to Sync().
Attachment #469609 - Attachment is obsolete: true
Attachment #478877 - Flags: review?(sdwilsh)
(In reply to comment #21)
> We need to kill Sync.js entirely, so we should not be adding any new users.

Nothing would make me happier to kill Sync.js, but at this point all of our APIs are synchronous. Crypto sits at the core of our code so making that an async API all of a sudden would mean a large rewrite. We're eventually going to have to bite this bullet and I'm fine with that, but if we want to do crypto-off-the-main thread in a realistic time frame, I don't see another way.
I concur on the need to do this in the short-term, and I think sdwilsh grudgingly agrees, from IRC conversations.  We're going to fix this the right way, but it's a pretty major refactoring that is impractical in the Fx4 timeframe.  I expect that we'll do this in Q1, FWIW.

Filed Bug 600059 about doing this properly.
Try builds of Part 1 (v2) + Part 2 (v2) + fixes to Makefiles and XPCOM manifests are available here: http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/pweitershausen@mozilla.com-3c5b46c4a09f/

Tests were successful modulo the usual three or four intermittent oranges.
blocking2.0: --- → beta8+
Comment on attachment 478877 [details] [diff] [review]
Part 2 (v2): Threaded implementation of IWeaveCrypto

I feel like makeException should maybe be a global function, but then I also don't see the point of using it over Components.exception (maybe shorter?).  At the very least, if you are going to do it, you should adjust the call site up so it's throwing at the call site to this methods (at least for the public facing APIs).  Although, looking closely, I think you are making it on the object so you magically get the ThreadedCryptoService version of makeException, which is some JS magic that needs to be better documented.

>+  // Make sure to kill the thread before XPCOM shuts down.
>+  Services.obs.addObserver(this, "profile-before-change", true);
You actually want xpcom-shutdown-threads here.

>+  observe: function observe() {
>+    Services.obs.removeObserver(this, "profile-before-change");
No need for this since you are a weak reference anyway.

r=sdwilsh, with the understanding that once fx-4 ships, getting off of sync.js needs to be a very top-priority task, if not the top priority task.
Attachment #478877 - Flags: review?(sdwilsh) → review+
(In reply to comment #26)
> I feel like makeException should maybe be a global function, but then I also
> don't see the point of using it over Components.exception (maybe shorter?).  At
> the very least, if you are going to do it, you should adjust the call site up
> so it's throwing at the call site to this methods (at least for the public
> facing APIs).  Although, looking closely, I think you are making it on the
> object so you magically get the ThreadedCryptoService version of makeException,
> which is some JS magic that needs to be better documented.

Yes, I factored the exception instantiation out to its own method so that I can override in the ThreadedCryptoService "subclass". Not sure what's magical about this, but I'll add another comment to be safe.

> >+  // Make sure to kill the thread before XPCOM shuts down.
> >+  Services.obs.addObserver(this, "profile-before-change", true);
> You actually want xpcom-shutdown-threads here.

Aha, didn't know about this one. Problem is, it's sent even after xpcom-shutdown and I'm getting assertions like

  ###!!! ASSERTION: nsExceptionManager not thread-safe: '_mOwningThread.GetThread() == PR_GetCurrentThread()', file /Users/philipp/dev/mozilla-weave/mcthread/xpcom/base/nsExceptionService.cpp, line 99

even if I use xpcom-will-shutdown, not to mention anything later than that.

> >+  observe: function observe() {
> >+    Services.obs.removeObserver(this, "profile-before-change");
> No need for this since you are a weak reference anyway.
> 
> r=sdwilsh, with the understanding that once fx-4 ships, getting off of sync.js
> needs to be a very top-priority task, if not the top priority task.

Understood.
(In reply to comment #27)
> Yes, I factored the exception instantiation out to its own method so that I can
> override in the ThreadedCryptoService "subclass". Not sure what's magical about
> this, but I'll add another comment to be safe.
It's magical because the inheritance isn't exactly obvious (or at least wasn't to me).

> Aha, didn't know about this one. Problem is, it's sent even after
> xpcom-shutdown and I'm getting assertions like
> 
>   ###!!! ASSERTION: nsExceptionManager not thread-safe:
> '_mOwningThread.GetThread() == PR_GetCurrentThread()', file
> /Users/philipp/dev/mozilla-weave/mcthread/xpcom/base/nsExceptionService.cpp,
> line 99
Sure, but that assertion shouldn't be related to you using a different observer topic.  That seems indicative of a bug somewhere.  Set a breakpoint on it, and get the JS stack and the c++ stack and see what's up.  xpcom-shutdown-threads is the right topic to use here.

(see also https://wiki.mozilla.org/XPCOM_Shutdown)
Depends on: 562431
Comment on attachment 469619 [details] [diff] [review]
Part 1 (v2): Convert WeaveCrypto.js into a JSM

Part 1 is in fact bug 562431, marking this as obsolete.
Attachment #469619 - Attachment is obsolete: true
Duplicate of this bug: 603386
From my experimentation, moving the observer-topic to something as late as "xpcom-shutdown-threads" causes problems because nsExceptionService (specifically the per-thread managers it creates) observe "xpcom-shutdown" (which occurs before "xpcom-shutdown-threads") causing it to complain when it is shutdown (presumably because a manager is still being held by our thread?).  I think the original observer-topic of "profile-before-change" is correct enough for our purposes, unless someone else can explain/understand the exception-service issue better.
(In reply to comment #31)
> From my experimentation, moving the observer-topic to something as late as
> "xpcom-shutdown-threads" causes problems because nsExceptionService
> (specifically the per-thread managers it creates) observe "xpcom-shutdown"
> (which occurs before "xpcom-shutdown-threads") causing it to complain when it
> is shutdown (presumably because a manager is still being held by our thread?). 

Filed bug 603509 for this.
(In reply to comment #32)
> Filed bug 603509 for this.

... which is a dupe of bug 450812. We could (and probably should) take another stab at this. Bug 429988's usage of a much earlier observer topic (cf bug 450913) than xpcom-shutdown-threads is more evidence of the problem not being related to our use of threads but rooted in nsExceptionService.

However, in the mean time I don't think we should block on that. Moving crypto off the main thread is a high priority for Fennec perf. Using an earlier observer notification than xpcom-shutdown (profile-before-change being the next one up) clearly solves the issues at hand, and as bug 429988 demonstrates it has worked before.

So unless there are any objections, I propose landing this as is.
I think that makes sense.  We can push on that bug harder, and we should open a followup to Do The Right Thing, depending on that bug, but we want this ASAP.
Posted patch v3Splinter Review
Address sdwilsh's review comments, except for the observer notification which still is 'profile-before-change' for the aforementioned reasons.

Also fix a the definition of makeException in threaded.js to actually make sense and rebase on bug 562431 (formerly part 1).
Attachment #478877 - Attachment is obsolete: true
tracking-fennec: --- → 2.0b2+
Pushed v3 to fx-sync: http://hg.mozilla.org/services/fx-sync/rev/fb506072dca8
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Blocks: 603489
What is the timeline/process for getting this to m-c?
(In reply to comment #37)
> What is the timeline/process for getting this to m-c?

It's landed already (bug 603388)! For some reason it was no longer blocking that... weird.
Blocks: 603388
Whiteboard: [qa-]
Depends on: 608109
Backed out due to crashes (bug 604756): http://hg.mozilla.org/services/fx-sync/rev/2fad8545bcb8

Marking this as WONTFIX, filed bug 608156 to run WeaveCrypto in Chrome Worker.
Resolution: FIXED → WONTFIX
Blocks: 604603
i am confused.  Does this problem still block b2?
(In reply to comment #40)
> i am confused.  Does this problem still block b2?

The problem of crypto blocking the main thread still persists, but we're going to tackle it differently. That's what bug 608156 is about, so probably that should block now.
Component: Firefox Sync: Crypto → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.