Closed Bug 529380 Opened 15 years ago Closed 13 years ago

weave spins the event loop which is BAD

Categories

(Cloud Services :: General, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 600059
Future

People

(Reporter: dietrich, Unassigned)

References

Details

spinning the event loop can trigger a crash in gtk (bug 519438). sdwilsh said weave does this. until bug 519438 is closed, weave should avoid doing this.
OS: Windows NT → Linux
This might be bug 526391?
Or one instance of our spinning the loop and triggering a crash.
FYI, we spin the event loop whenever we do any network requests.
Depends on: 526391
The core bugs seem to be fixed, but this still isn't the best solution out there.  Marking Future, since we should eventually fix this, but we don't have a targeted release.
Target Milestone: --- → Future
Underlying issue was fixed that could cause the crash, however this is a bad bug.  From bug 496019 comment 6:
It's bad because storage js can cause a nested event loop at any time it feels
like it even though we know there are times where a nested event loop is
unsafe.

Also note that spinning the event loop has caused security bugs in the past (see bug 477432 comment 0).  The static analysis there would have flagged this as a bad behavior.
Summary: weave can potentially trigger a crash via bug 519438 → weave spins the event loop which is BAD
What's on the stack when Weave spins the event loop?
(In reply to comment #6)
> What's on the stack when Weave spins the event loop?
Should be chrome code
Last I checked, the add-on code would setTimeout-wrap the calls into the service, which implements its own lock/mutex mechanism. So the top of the stack would be the setTimeout.
(In reply to comment #8)
> Last I checked, the add-on code would setTimeout-wrap the calls into the
> service, which implements its own lock/mutex mechanism. So the top of the stack
> would be the setTimeout.
I clearly saw a call to nsIThread::processNextEvent in the code I was shown earlier.
(In reply to comment #9)
> I clearly saw a call to nsIThread::processNextEvent in the code I was shown
> earlier.

http://mxr.mozilla.org/services-central/source/fx-sync/services/sync/modules/ext/Sync.js#142
Weave is spinning event loops on more than just network requests.

Here it is spinning it to implement a 15-second quasi-sleep:
http://mxr.mozilla.org/mozilla-central/source/services/sync/modules/service.js#993

Here it spins it for 0 duration sleeps, likely to clear out the event queue, but without any comments as to why it is doing that:
http://mxr.mozilla.org/mozilla-central/source/services/sync/modules/engines.js#495
http://mxr.mozilla.org/mozilla-central/source/services/sync/modules/engines.js#703

Here is a helper function that spins the event loop for "asynchronous" database queries:
http://mxr.mozilla.org/mozilla-central/source/services/sync/modules/util.js#159

That helper function appears to be used a number of times in non-test code:
http://mxr.mozilla.org/mozilla-central/ident?i=queryAsync

The good news is that as long as each piece of software in the system that decides to spin the event loop keeps itself to one such event loop, the worst-case scenario is likely to be performance degradation (from JS having to spin the event loop) and failure of the software as its control flow is starved out by other event loop spinners.

The bad news is that it appears many Labs projects use the Sync.js module and the Resource.js module which relies on it and not all of them adopted locks.  For example, Contacts/People does it too but without guards.  (In fact, its publicly exposed API may potentially be vulnerable to a stack-blowing attack; I haven't worked it through.)

I would strongly encourage moving away from event loop spinning as soon as possible.  If the code would become too unpleasant with a direct transition to callbacks, our JS implementation does provide generators which allow for suspension of a single JS function while maintaining all state.  Unfortunately, that is just for the one function and not the whole JS stack.  This can be worked around by introducing library code to maintain a stack of pending generators at the expense of intuitiveness.  (Thunderbird's global database does this if you are looking for an example.)
(In reply to comment #11)
> Weave is spinning event loops on more than just network requests.

Since none of Weave's APIs are async, it basically uses Sync.js on anything that is (network requests, async SQL queries, off-thread crypto, etc.)

> The good news is that as long as each piece of software in the system that
> decides to spin the event loop keeps itself to one such event loop, the
> worst-case scenario is likely to be performance degradation (from JS having to
> spin the event loop) and failure of the software as its control flow is starved
> out by other event loop spinners.

It might actually be worse, as bug 604565 demonstrates (though that might be a pathological case since it only occurs on OSX debug builds.)

> The bad news is that it appears many Labs projects use the Sync.js module and
> the Resource.js module which relies on it and not all of them adopted locks. 
> For example, Contacts/People does it too but without guards.

I think the fact that Sync.js kills babies has been widely recognised now, but it can't hurt to reiterate it indeed. I can't speak for Labs projects, but going forward in Sync all new code is going to provide async APIs. We've recently added an async version of Resource (bug 603301) and there are even plans to bring a non-Weave specific version of this to netwerk (bug 581560) for other projects (e.g. the ones from Labs) to use.

> I would strongly encourage moving away from event loop spinning as soon as
> possible.

Absolutely agreed. We have bug 600059 to track this effort, but it's not going to be quick or easy.

> If the code would become too unpleasant with a direct transition to
> callbacks, our JS implementation does provide generators which allow for
> suspension of a single JS function while maintaining all state.  Unfortunately,
> that is just for the one function and not the whole JS stack.  This can be
> worked around by introducing library code to maintain a stack of pending
> generators at the expense of intuitiveness.

Using 'yield' to write async calls in a sync manner is neat (I'm familiar with it from the Twisted Python framework), but it's just syntactic sugar over the fact that the APIs need to be async. I'd like all of our APIs to take callbacks. Then we're free to use whatever mechanism on top of that to make our code look sane.
yield's neat, that's what we did before Sync.js... not 100% sure why we switched, but see bug 496297.
I'm just going to dupe this to bug 600059, since that's what we actually need to do here.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.