Last Comment Bug 718121 - (supersnappy) Allow chrome and content to run on different threads
(supersnappy)
: Allow chrome and content to run on different threads
Status: RESOLVED INCOMPLETE
[Snappy]
: perf
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: All All
: -- enhancement with 59 votes (vote)
: ---
Assigned To: Nobody; OK to take it and work on it
:
Mentors:
: 327240 795761 (view as bug list)
Depends on: 742497 js-unwind 761872 772820
Blocks: 384323 thread
  Show dependency treegraph
 
Reported: 2012-01-13 17:30 PST by Brian Hackett (:bhackett)
Modified: 2015-04-26 23:30 PDT (History)
126 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
WIP prototype (2676737a824c) (279.17 KB, patch)
2012-02-06 13:30 PST, Brian Hackett (:bhackett)
no flags Details | Diff | Review
WIP prototype (2676737a824c) (368.87 KB, patch)
2012-02-12 15:32 PST, Brian Hackett (:bhackett)
no flags Details | Diff | Review
WIP prototype (2676737a824c) (457.31 KB, patch)
2012-02-23 11:36 PST, Brian Hackett (:bhackett)
no flags Details | Diff | Review
WIP prototype (2676737a824c) (465.21 KB, patch)
2012-02-24 07:10 PST, Brian Hackett (:bhackett)
no flags Details | Diff | Review
WIP prototype (2676737a824c) (635.86 KB, patch)
2012-03-05 17:13 PST, Brian Hackett (:bhackett)
no flags Details | Diff | Review
WIP prototype (2676737a824c) (646.89 KB, patch)
2012-03-06 12:42 PST, Brian Hackett (:bhackett)
no flags Details | Diff | Review
WIP prototype (2676737a824c) (775.78 KB, patch)
2012-03-26 15:00 PDT, Brian Hackett (:bhackett)
no flags Details | Diff | Review
WIP prototype (2e7a23be3310) (849.73 KB, patch)
2012-03-30 09:30 PDT, Brian Hackett (:bhackett)
no flags Details | Diff | Review
prototype (2e7a23be3310) (876.91 KB, patch)
2012-04-02 18:10 PDT, Brian Hackett (:bhackett)
no flags Details | Diff | Review
prototype (2e7a23be3310) (957.45 KB, patch)
2012-04-12 09:51 PDT, Brian Hackett (:bhackett)
no flags Details | Diff | Review
prototype (2e7a23be3310) (1.10 MB, patch)
2012-04-23 11:59 PDT, Brian Hackett (:bhackett)
no flags Details | Diff | Review

Description Brian Hackett (:bhackett) 2012-01-13 17:30:47 PST
After going through bug 384412 and having some extended discussions with luke, I've been looking for a somewhat minimal extension to the event/threads model which avoids locking the browser on content stalls.  For now this is an experimental bug to help me learn some of our events/threads/etc. code and see if I can prototype these changes.

Definition: a |constellation| is either all of chrome or a collection of content data (JS compartments, windows/tabs) that is disconnected from all other content constellations.

Use a pool of threads with a single, shared event queue (as now).  As new events come in, threads may pick them up and start executing, under the following constraints:

1. All events start out in chrome, and may switch among constellations as they execute.

2. Constellations may be locked exclusively.  A thread must lock a constellation in order to access any of its data, and will generally hold the lock until switching to another constellation.  There is no multithreaded access to any data.

3. Chrome is locked only when in active use by a thread, and is unlocked as the thread tries to enter content.  Additionally, once a thread starts executing a content script, it holds the lock on that content constellation until the event finishes.  This is only done for the first such content constellation, to keep weird addons from deadlocking, and otherwise ensures run-to-completion for content.

The main idea here is that even if content is stalled, it will not keep chrome locked and chrome scripts can continue to execute.  Even if those chrome scripts block when trying to enter the stalled content, their threads will unlock chrome first and other threads can then start executing other pending events.

Now, a few problems:

4. If many events require access to content, all threads in the pool could end up blocked trying to enter the content.

5. This seems to really break run-to-completion for chrome, with unknown effects on the browser and addons.

So, a refinement:

6. Only keep two threads in the pool, a main thread and a reserve thread.  Normally, only the main thread runs, and behaves as it does now (since it is the only running thread, there is no lock contention).

7. When a "high priority" UI input event comes in (e.g. a mouse click) activate a watchdog and if small X ms pass without the event being serviced, give the event to the reserve thread.  The reserve thread starts executing (ignoring events further up in the queue), running chrome handlers and scheduling another event to call any content handlers once content wakes back up.  The reserve thread should avoid touching content, though an addon could break this --- the reserve thread would then block, oh well.

Points 1-3 do not require a notion of a high priority event and are cleaner, but due to 4. would really need per-constellation event queues to scale well.  But doing that is more complicated, and due to 5. may not work well in practice, so going with 6. and 7. for experimenting, even if that is not the permanent solution, seems a good way to go.
Comment 1 Andreas Gal :gal 2012-01-18 20:46:44 PST
> the reserve thread starts executing (ignoring events further up in the queue),

This re-ordering of event processing can cause all sorts of confusion in chrome, right? We could maybe enforce that certain classes of events are processed in-order internally, but out-of-order w.r.t. to other classes, but that is not a guarantee that we won't break stuff in subtle ways either, agreed?

> running chrome handlers and scheduling another event to call any content handlers once content wakes back up.

Thats often now how we interact with content, right? Doesn't event bubbling require here that content gets to process the event first and then we decide whether chrome gets a stab at it or not? (think multi-touch events, if content grabs them and prevents the default we don't want to do scrolling, otherwise we do want to deliver it to chrome and do the chrome event handling).

> The reserve thread should avoid touching content, though an addon could break this --- the reserve thread would then block, oh well.

50% of people use addons and many addons touch content, so thats a big oh well :-/

I think we need some strong invariants here. If we can come up with those, we can see if we can hammer the code into shape to comply with them (and we enforce them).
Comment 2 Olli Pettay [:smaug] 2012-01-19 06:51:34 PST
(In reply to Andreas Gal :gal from comment #1)
> > running chrome handlers and scheduling another event to call any content handlers once content wakes back up.
> 
> Thats often now how we interact with content, right? Doesn't event bubbling
> require here that content gets to process the event first and then we decide
> whether chrome gets a stab at it or not?

Currently (DOM) events dispatched to the content are handled so that first event target chain is created.
The chain is usually:
(1)chrome window -> (2)chrome document -> (3)chrome elements -> (4)message manager -> (5)content window -> (6)content document -> (7)content elements up to the event target
Event is then handled by the event target chain items, first capture phase 1->7, then bubble phase 7->1
(and then additional system/default handling 1->7->1)

Chrome handles currently some content events in capture phase and some events in bubble phase.
Handling in bubble phase requires usually that content has processed the event first.
This is the case for example with 'click'.
I would assume that doing any thread locking during event handling would significantly slow down event handling
(by default I don't for example accept any extra nsCOMPtr usage in event handling, since that shows up badly in performance profiles.)
We could move the chrome event handling which needs to handle content events to message manager, but then we're in
E10s setup (and we should then just finally implement rest of E10s to get it working).
Comment 3 Luke Wagner [:luke] 2012-01-19 16:07:56 PST
(In reply to Andreas Gal :gal from comment #1)
> This re-ordering of event processing can cause all sorts of confusion in
> chrome, right? We could maybe enforce that certain classes of events are
> processed in-order internally, but out-of-order w.r.t. to other classes, but
> that is not a guarantee that we won't break stuff in subtle ways either,
> agreed?

There is no question that the proposed experiment could possibly break chrome JS.  But the question is: does it?  how bad?  are there simple techniques that fix most of it?  It seems that most discussions on this subject get paralyzed by all these unknown possibilities of breakage; the goal here is to look and see.  This seems like an approach that would be near and dear to your heart ;-)

> > The reserve thread should avoid touching content, though an addon could break this --- the reserve thread would then block, oh well.
> 
> 50% of people use addons and many addons touch content, so thats a big oh
> well :-/

Yes, but do all those addons like to touch content from high-priority events?  "High-priority event" is not even well-defined yet, so clearly we don't know, but it's worth experimenting.  (Recall that the "oh well" case here is "behave like FF does today".)

(In reply to Olli Pettay [:smaug] from comment #2)
> I would assume that doing any thread locking during event handling would
> significantly slow down event handling

What if the synchronization was only used when actually calling content JS?  I don't know if the event code is structured in such a way as to make this possible, but, assuming it is, calling JS definitely seems expensive enough to amortize acquiring/releasing an uncontended lock.
Comment 4 Andreas Gal :gal 2012-01-20 05:55:33 PST
Luke, don't get me wrong. I am not trying to discourage this work. Quite to the contrary. This is very important, and we have been working on a closely related proposal over the last weeks (green threads instead of real threads, but similar idea), and I want this work to happen. We are just trying to enumerate all the potential and likely problems we have to solve.
Comment 5 Brian Hackett (:bhackett) 2012-02-06 13:30:57 PST
Created attachment 594791 [details] [diff] [review]
WIP prototype (2676737a824c)

Barely working prototype to allow the browser to service UI events and remain semi-usable while content JS is ilooping.  Only tested on OS X x64.

- Splits work across two execution threads: the main thread services native events (mouse activity etc.) as before, and all gecko events are punted to a second thread which runs the gecko event loop.

- Adds two locks, one protecting all JS content compartments (content lock), one protecting everything else (chrome lock).

- Fixes lots of places (by no means all) where code expected to be running on the main thread, and could instead be running on another thread with the chrome lock held.

- Allows event handling JS to avoid blocking on content in some cases, by posting an exception when using content data if content cannot be quickly locked.

The next things this needs are a better boundary between content and chrome (such that e.g. DOM structures for content windows are protected by a content lock) and the ability to distinguish between different content constellations.
Comment 6 Olli Pettay [:smaug] 2012-02-06 13:43:09 PST
(In reply to Brian Hackett (:bhackett) from comment #5)
> Barely working prototype to allow the browser to service UI events and
> remain semi-usable while content JS is ilooping.  Only tested on OS X x64.
Very interesting!

> The next things this needs are a better boundary between content and chrome
> (such that e.g. DOM structures for content windows are protected by a
> content lock) and the ability to distinguish between different content
> constellations.
This is the part which I think is hard. How will addref/release work?
Currently cycle collectable objects must be addrefed/released only in the mainthread.
It could be possible to have separate purple buffers for chrome and content, but that might slow down addref/release.
Comment 7 Olli Pettay [:smaug] 2012-02-06 13:52:36 PST
(I know, this is probably too early to even ask)
Does the patch affect badly to event handling speed. It does add
code to the hot path in EventListenerManager.

I think that code should live in EventDispatcher where locking would happen only
when event dispatching crosses content-chrome boundary.
Comment 8 Brian Hackett (:bhackett) 2012-02-06 14:59:46 PST
With this patch addref/release can happen off the main thread, but no matter where they happen the chrome lock must be held, which should avoid racing on purple buffer accesses.  When refcounted structures can be protected by the content lock there will be a need for either multiple purple buffers or an additional lock which protects the purple buffer itself.

I suspect that doing the latter will be good enough.  I need to understand the purple buffer code and the performance pinchpoints better, but refcounted structures should still be able to test for membership in the purple buffer lock free, so the lock would only need to be taken when adding or removing an object from the buffer.

I don't know yet about the event handling overhead, and there will be I think a lot more changes necessary before it can be measured.  Unless this code is white hot I hope that taking locks will not be a big problem; PR_Lock is wrapping pthreads mutexes, which are pretty fast if uncontended (I just measured a pthread lock;unlock sequence at 20ns on my macbook).  The gecko event loop already takes locks (the event list is thread safe) and maybe these could be merged with the chrome lock.
Comment 9 Olli Pettay [:smaug] 2012-02-07 00:27:52 PST
(In reply to Brian Hackett (:bhackett) from comment #8)
> With this patch addref/release can happen off the main thread, but no matter
> where they happen the chrome lock must be held, which should avoid racing on
> purple buffer accesses.  When refcounted structures can be protected by the
> content lock there will be a need for either multiple purple buffers or an
> additional lock which protects the purple buffer itself.
> 
> I suspect that doing the latter will be good enough. 
Since Addref/release happens so often and is *extremely* performance critical, doing the latter
doesn't sound possible.

> I need to understand
> the purple buffer code and the performance pinchpoints better, but
> refcounted structures should still be able to test for membership in the
> purple buffer lock free, so the lock would only need to be taken when adding
> or removing an object from the buffer.
This happens all the time.

> I don't know yet about the event handling overhead, and there will be I
> think a lot more changes necessary before it can be measured.  Unless this
> code is white hot I hope that taking locks will not be a big problem;
> PR_Lock is wrapping pthreads mutexes,
PR_Lock shows up for example in the XPConnect profiles all the time. We should try to get rid of it when possible.
Perhaps locking could happen similarly to current cx pushing, where some real work is done only if
cx is changing.
Comment 10 Luke Wagner [:luke] 2012-02-07 10:28:30 PST
For DOM C++ objects, we could make the mapping from object to purple buffer efficient by doing something like we do with js::Cell::compartment.  Perhaps we could even literally allocate DOM objects in a compartment.  Extending the notion of 'compartment' to DOM has come up as a useful thing several times now in different contexts.  In the general case, of course, we just have some random non-DOM cycle-collected C++ object, but perhaps we can convert all such cases that are touched from content threads to be allocated like DOM objects?

An even further stretch is GC'ing the DOM which may remove enough hot add/rem refs that the single-shared-locked purple buffer wouldn't be killer.  (We can dream, right?)
Comment 11 Boris Zbarsky [:bz] 2012-02-07 10:39:10 PST
None of that is feasible in the short term.  At the moment DOM nodes can and do move between compartments, and making them not be nsISupports would be nice, but requires a full rewrite of the editor and a lot of Gecko first.
Comment 12 Olli Pettay [:smaug] 2012-02-07 11:15:59 PST
(In reply to Luke Wagner [:luke] from comment #10)
> For DOM C++ objects, we could make the mapping from object to purple buffer
> efficient by doing something like we do with js::Cell::compartment.  Perhaps
> we could even literally allocate DOM objects in a compartment. 
It was, IIRC, before FF3 when we tried to allocate each DOM Node in a document
from same memory arena. It gave some performance boost but caused extra memory usage.


> An even further stretch is GC'ing the DOM which may remove enough hot
> add/rem refs that the single-shared-locked purple buffer wouldn't be killer.
> (We can dream, right?)
AFAIK, in theory a well working, properly optimized CC can be faster than GC, since it doesn't
need to handle whole heap, only possible garbage.
But sure, we might be able to utilize GC better with DOM.
bsmedberg might have some comments about that. XPCOMGC wasn't very successful project.
Comment 13 Luke Wagner [:luke] 2012-02-07 14:06:59 PST
(In reply to Boris Zbarsky (:bz) from comment #11)
> None of that is feasible in the short term.

Dreaming, remember ;-)  Srsly, though, if an issue comes up enough, it seems worth considering a medium-term investment.

(In reply to Olli Pettay [:smaug] from comment #12)
> It was, IIRC, before FF3 when we tried to allocate each DOM Node in a
> document from same memory arena. It gave some performance boost but caused extra
> memory usage.

Waving hands here, but: all the recent/planned GC improvements should help with those issues you raised.  Was the old DOM allocator even as sophisticated as the old JS GC?

> XPCOMGC wasn't very successful project.

MDN indicates XPCOMGC was a conservative GC that replaced all ref-counting.  I wasn't suggesting a conservative GC or replacing all ref-counting, just a DOM subset with exact rooting (and hand-waving, I know).
Comment 14 Olli Pettay [:smaug] 2012-02-07 14:36:20 PST
> Waving hands here, but: all the recent/planned GC improvements should help
> with those issues you raised.  Was the old DOM allocator even as
> sophisticated as the old JS GC?
It was just a memory arena. Not GC. I'm sure we could do better now, but
there are obvious problems with such approach (DOM nodes may move from one document to another).
Though, I guess compartments have similar problems. I don't know how they are handled.

> I wasn't suggesting a conservative GC or replacing all ref-counting, just a
> DOM subset with exact rooting (and hand-waving, I know).
Not sure what "a DOM subset with exact rooting" means, but if you can come up with a 
better JS<->C++ memory handling model, I'm all for it :)


(In reply to Luke Wagner [:luke] from comment #10)
> For DOM C++ objects, we could make the mapping from object to purple buffer
> efficient by doing something like we do with js::Cell::compartment.
Hmm, maybe this could work. Having basically mapping from memory location to a purple buffer.
I wonder how slow that would be. There would need to be some thread checks too, and locking in
slow/wrong-thread paths.
I wish we had spare bits in nsCycleCollectingAutoRefCnt...
Comment 15 Brian Hackett (:bhackett) 2012-02-12 15:32:14 PST
Created attachment 596527 [details] [diff] [review]
WIP prototype (2676737a824c)

Somewhat more functional.

Instead of using a single content zone and associated thread, splits content into multiple zones (these are 1:1 with outer windows, for now) and gives each one a thread which spins a gecko event loop.  Then while content associated with one thread is spinning, gecko events associated with chrome or with other zones can still run.  (The two thread model in the previous patch breaks down quickly when trying to actually interact with other content windows).

This is functional enough that, while one tab is ilooping in JS, other tabs can be switched to, closed, interacted with, etc., at least for simple examples.  The content locks still only protect JS compartments, and the next steps are to start moving C++ structures under these locks so that lock transitions only happen when switching between chrome and content.
Comment 16 Brian Hackett (:bhackett) 2012-02-23 11:36:32 PST
Created attachment 600111 [details] [diff] [review]
WIP prototype (2676737a824c)

Start moving ownership of content DOM structures under the corresponding content lock.  For now just nsGlobalWindow and nsDocument are checked, but it probably isn't a huge leap to do things for the rest of the DOM too.  With an ilooping tab the browser starts up and is partially usable, still needs some more work (can't switch between tabs and the tab titles and 'X' aren't being updated).
Comment 17 Brian Hackett (:bhackett) 2012-02-24 07:10:04 PST
Created attachment 600384 [details] [diff] [review]
WIP prototype (2676737a824c)

Fix issues with rendering tab titles and allowing tab usage.  The problem earlier was just that some gecko events were blocked on accessing the ilooping content.  When those events are running on the chrome thread then most gecko events will stall.  One important invariant that isn't yet in place is that any time content is accessed either (a) a non-blocking trylock is used, or (b) the event is on the loop for that zone's content thread.  This should prevent slow content from stalling other parts of the browser.
Comment 18 Igor Bukanov 2012-02-29 04:29:23 PST
Do we really need two threads here? I.e. why not to use the existing operation callback implementation to interrupt a long-running scripts and just process high priority events from the operation callback?
Comment 19 Brian Hackett (:bhackett) 2012-02-29 06:54:39 PST
The main problem is that interrupting would break run to completion, the model that scripts run in a single threaded world and should not be able to see effects from other stuff running in parallel.  High priority events can have side effects on e.g. DOM stuff that are visible to the script that was interrupted, and if those events run and the long script resumed, it could behave incorrectly.

The second problem is that this is intended to help responsiveness in more cases than just long running scripts, but rather any long event in the browser which can be isolated to content.  e.g. bug 725920, a gigantic JS app that takes 11 seconds to parse, or bug 627635, synchronous main thread I/O.
Comment 20 Igor Bukanov 2012-02-29 07:11:47 PST
(In reply to Brian Hackett (:bhackett) from comment #19)
> The main problem is that interrupting would break run to completion, the
> model that scripts run in a single threaded world and should not be able to
> see effects from other stuff running in parallel.

I do not see how having other stuff running on a parallel thread avoids this issue. From a script point of view the operation callback invocation is completely transparent yet it avoids any synchronization issues that arise with parallel threads.

>  High priority events can
> have side effects on e.g. DOM stuff that are visible to the script that was
> interrupted, and if those events run and the long script resumed, it could
> behave incorrectly.

Right, but I do not see how this can be an argument using an operation callback.

> The second problem is that this is intended to help responsiveness in more
> cases than just long running scripts, but rather any long event in the
> browser which can be isolated to content.  e.g. bug 725920, a gigantic JS
> app that takes 11 seconds to parse.

Our parser/emitter can also call the operation callback periodically. 

> or bug 627635, synchronous main thread
> I/O.

This is an example of something that operational callback cannot address. Yet thread synchronization can be nasty so avoiding it can bring benefits faster.
Comment 21 Brian Hackett (:bhackett) 2012-02-29 07:26:43 PST
The multithreaded model in this bug preserves run to completion by using a) locks for each zone in which content scripts can observe effects (window + other windows opened by the original), and b) separate event queues and threads for each of those zones.  When a content script is running, it locks the zone until the script finishes, preventing side effects from events running in parallel.  That script is also running on the thread and event queue associated with the zone, so that only future activity in the zone is stalled while the script runs.

The browser can already run nested event loops in the middle of content scripts in some cases, e.g. when an alert dialog is up.  This is a bug, and breaks script behavior in observable ways, see bug 637264.  Allowing this to happen at any time we invoke the operation callback would I think make the breakage widespread.
Comment 22 Brian Hackett (:bhackett) 2012-03-05 17:13:48 PST
Created attachment 603117 [details] [diff] [review]
WIP prototype (2676737a824c)

Lots more changes, mostly to increase robustness of the locking and allow some websites to be usable without busting asserts (tested google.com and en.wikipedia.org).  Have only tested usability when running these sites alone, next thing is to get things working when loading the sites with ilooping content in the background.

The major new change this patch makes is modifying nsISupports to add a GetZone() method to fetch the zone/lock protecting the object.  This change was mainly made for XPConnect, which frequently wraps arbitrary nsISupports objects and needs to know the lock protecting those objects is held when the objects are later accessed via JS.  I don't know if this change will end up being necessary (and still not sure if it is even possible to stick a change to nsISupports), but there's definitely a need for this functionality and a new method on nsISupports is the clean/fast way to go about it.
Comment 23 Brian Hackett (:bhackett) 2012-03-06 12:42:31 PST
Created attachment 603418 [details] [diff] [review]
WIP prototype (2676737a824c)

Replace enough locks with trylocks to allow browsing wikipedia with an ilooping content tab in the background.
Comment 24 Brian Hackett (:bhackett) 2012-03-26 15:00:32 PDT
Created attachment 609502 [details] [diff] [review]
WIP prototype (2676737a824c)

Updated prototype, works pretty well on most websites (still some flakiness, but few crashes).  Needs a rebase next.
Comment 25 Patrick Walton (:pcwalton) 2012-03-26 18:04:36 PDT
What does it mean to stick content in this patch?
Comment 26 Brian Hackett (:bhackett) 2012-03-26 18:43:03 PDT
(In reply to Patrick Walton (:pcwalton) from comment #25)
> What does it mean to stick content in this patch?

Sticking a content zone's lock means that the content will remain locked until the current gecko or native event finishes.  There is an exception to this, if a different content zone is locked during the event --- to preserve lock ordering constraints the first one may be unlocked and then relocked after the second zone is taken.  But if an event running content script only touches a single content zone (as should always be the case, though there are current exceptions for e.g. nested event loops) then run to completion will be preserved for those content scripts.
Comment 27 Brian Hackett (:bhackett) 2012-03-30 09:30:25 PDT
Created attachment 610910 [details] [diff] [review]
WIP prototype (2e7a23be3310)

Rebased.
Comment 28 Brian Hackett (:bhackett) 2012-04-02 18:10:58 PDT
Created attachment 611676 [details] [diff] [review]
prototype (2e7a23be3310)

Some more tweaks, works pretty well now for browsing most websites, with or without an ilooping tab in the background (rm the WIP).  Still busts asserts in some cases and other races to fix.

This also adds some profiling bits to count the number of lock attempts.  I opened up 10 tabs (using gwagner's membench, so an assortment of websites) and got the numbers below.  These distinguish lock attempts on chrome vs. content zones, blocking locks vs. trylocks, and new lock acquisitions vs. reacquiring a lock already held (relocking it).

Lock LockContent: 6916
Lock LockChrome: 103010
Lock RelockContent: 61058
Lock RelockChrome: 2302889
Lock TryLockContent: 6494
Lock TryLockChrome: 0
Lock TryRelockContent: 230644
Lock TryRelockChrome: 0

For both chrome and content, in the great majority of cases lock attempts are being made on locks which are already held.  Rather than try to shift lock acquires around to try to reduce this %, I think it would be better to just make relocking very fast (it's currently almost as slow as the initial acquisition).  With some stack pointer (black) magic it should be possible to usually determine that the current thread already has a lock held, even without knowing the current PRThread or accessing any TLS.  Will play around with this.

But estimating the locking overhead for an initial acquire (of an uncontended lock) at 20ns and a relock at 2ns (I think these are both high, but no hard data other than measuring pthreads lock/unlock pairs at 10ns on my MBP) this would point to locking overhead of less than 10ms to open and load 10 tabs.  So even if these estimates turn out low I don't think there's much reason to worry about the locking overhead.
Comment 29 Ted Mielczarek [:ted.mielczarek] 2012-04-03 05:26:01 PDT
I think it'd also be worthwhile to run some smoketests with some of the most popular add-ons installed, like AdBlock for example. It's possible that there are patterns of chrome/content access that show up more often in add-ons than the core Firefox code.
Comment 30 Brian Hackett (:bhackett) 2012-04-03 07:37:22 PDT
(In reply to Ted Mielczarek [:ted] from comment #29)
> I think it'd also be worthwhile to run some smoketests with some of the most
> popular add-ons installed, like AdBlock for example. It's possible that
> there are patterns of chrome/content access that show up more often in
> add-ons than the core Firefox code.

Definitely, but my biggest concern with addons is more about whether they will work correctly.  This patch doesn't (well, shouldn't) affect the semantics of content scripts at all, but chrome script behavior can change at the point where it first accesses a content zone.  At this point it needs to lock the content, and two things happen:

- The chrome lock is released before taking the content lock, to preserve lock ordering and avoid deadlocks.  It is reacquired after the content is locked, but releasing the lock at all breaks run-to-completion for the chrome code --- some other thread could have modified chrome-owned structures.

- Acquiring content is generally done as a trylock, which can fail if the content lock's owner is stalled doing some long-running operation.  If the trylock fails, an exception is thrown in the calling chrome code.

If the second causes problems it could be changed to a blocking lock if the calling script is known to be an addon, though doing that would be a problem for responsiveness if the chrome event loop gets stalled.  What I'm hoping (knowing next to nothing about how addons actually hook in to the browser) is that we'll generally know the content zone being accessed before the addon script even starts executing, e.g. the addon does stuff in response to a DOM event firing.  Then the lock would already be held when the addon script starts and it would see no change in behavior.

This is of course speculative, and addon compatibility is certainly the most important piece left to understand and address.
Comment 31 Brendan Eich [:brendan] 2012-04-10 19:33:36 PDT
From IRC:

[2:59pm] brendan: any way to detect (as in a DRD) chrome hazards?
[2:59pm] bhackett: DRD?
[2:59pm] bhackett: oh, data races
[2:59pm] brendan: yup
[2:59pm] brendan: since you are changing the execution model for some add-ons
[2:59pm] bhackett: well, proper data races in C++ are still just being caught via assertions
[2:59pm] bhackett: but there isn't anything in place for add-ons
[3:00pm] brendan: right, i'm thinking JS races
[3:00pm] bhackett: I think that the way that would work would be to keep track of places where chrome addon code accesses a content zone for the first time
[3:00pm] bhackett: that is the only place where new interleavings could come in, since the chrome lock is released when the content lock is taken
[3:01pm] bhackett: are you thinking about how to prevent these, or how to convey them to addon developers for debugging?
[3:09pm] brendan: the latter
[3:10pm] bhackett: that should be straightforward
[3:11pm] bhackett: we can just record the JS stack at the points where the chrome lock is released, and convey them to Firebug / other tools

/be
Comment 32 Brian Hackett (:bhackett) 2012-04-12 09:51:52 PDT
Created attachment 614416 [details] [diff] [review]
prototype (2e7a23be3310)

More fixes and stabilizing, with some more improvements like letting the cycle collector run (it had been inadvertently disabled before) and allowing the browser to shut down cleanly.  I've tried out adblock and noscript, and both work fine.
Comment 33 Olli Pettay [:smaug] 2012-04-12 10:11:31 PDT
Comment on attachment 614416 [details] [diff] [review]
prototype (2e7a23be3310)

>@@ -936,46 +939,55 @@ public:
>         nsPurpleBufferEntry *e = mFreeList;
>         mFreeList = (nsPurpleBufferEntry*)
>             (PRUword(mFreeList->mNextInFreeList) & ~PRUword(1));
>         return e;
>     }
> 
>     nsPurpleBufferEntry* Put(nsISupports *p)
>     {
>+        PR_Lock(mLock);
>+
>         nsPurpleBufferEntry *e = NewEntry();
>         if (!e) {
>+            PR_Unlock(mLock);
>             return nsnull;
>         }
> 
>         ++mCount;
> 
>         e->mObject = p;
> 
> #ifdef DEBUG_CC
>         mNormalObjects.PutEntry(p);
> #endif
> 
>+        PR_Unlock(mLock);
>+
>         // Caller is responsible for filling in result's mRefCnt.
>         return e;
>     }
> 
>     void Remove(nsPurpleBufferEntry *e)
>     {
>+        PR_Lock(mLock);
>+
>         NS_ASSERTION(mCount != 0, "must have entries");
> 
> #ifdef DEBUG_CC
>         mNormalObjects.RemoveEntry(e->mObject);
> #endif
> 
>         e->mNextInFreeList =
>             (nsPurpleBufferEntry*)(PRUword(mFreeList) | PRUword(1));
>         mFreeList = e;
> 
>         --mCount;
>+
>+        PR_Unlock(mLock);
>     }

This part is extremely hot, so we may need to figure out something which doesn't lock.
Put and Remove show up already badly in the profiles, and we really don't want to make them any slower.
Comment 34 Justin Lebar (not reading bugmail) 2012-04-12 10:14:11 PDT
> This part is extremely hot, so we may need to figure out something which doesn't lock.

Could you use TLS here?
Comment 35 Andrew McCreight [:mccr8] 2012-04-12 10:18:20 PDT
If each object is only used on one thread, then each thread could probably have its own purple buffer.  I don't know what you'd do in situations where one thread decrefs an object, putting it in its own purple buffer, then another thread addrefs it, which should cause it to get ejected from the purple buffer, if that's a possibility.  You could either lock in that situation, and hope it doesn't happen much, or just not eject the object from the purple buffer, and hope that our CC optimizations are good enough to deal with it without causing too much trouble.
Comment 36 Brian Hackett (:bhackett) 2012-04-12 10:24:32 PDT
I don't think using TLS would be significantly faster than using locks, but I don't know that for sure.  An alternative is to have separate purple buffers for each zone, and ensure the thread has locked an nsISupport's zone before inc/decref'ing it (the patch already pretty much does this).  Put and Remove still need to figure out which zone a given nsISupports is in, which is an indirect call (via the shiny new nsISupports::GetZone) that is no worse than the original IncRef/DecRef and could probably be optimized effectively by a compiler doing PGO.
Comment 37 Brian Hackett (:bhackett) 2012-04-12 10:29:44 PDT
(In reply to Andrew McCreight [:mccr8] from comment #35)
> If each object is only used on one thread, then each thread could probably
> have its own purple buffer.  I don't know what you'd do in situations where
> one thread decrefs an object, putting it in its own purple buffer, then
> another thread addrefs it, which should cause it to get ejected from the
> purple buffer, if that's a possibility.  You could either lock in that
> situation, and hope it doesn't happen much, or just not eject the object
> from the purple buffer, and hope that our CC optimizations are good enough
> to deal with it without causing too much trouble.

Most previously-main-thread-only objects can be accessed on multiple different threads, with the chrome/content locks ensuring those threads don't race each other.  Generally, content objects will be accessed either by the main thread or by the content thread associated with the object's zone, and chrome objects may be accessed by the main thread or by any of the content zones.
Comment 38 Willy_ Foo_Foo 2012-04-15 01:45:20 PDT
This will be implemented when?
& windows & Linux will also be beneficial?
Comment 39 Brian Hackett (:bhackett) 2012-04-15 06:23:30 PDT
I'm actively working on implementing this, but don't know when it will make it into a release.  And yes, this will be implemented for all platforms.  (Though currently it only works on mac).
Comment 40 CruNcher 2012-04-21 04:34:19 PDT
Do you think this is a good testcase for it http://www.notebookcheck.net/NVIDIA-GeForce-GTX-580M-SLI.56637.0.html it stalls Chrome completely on my System for 2s until it's done rendering, though this is with very extensive extensions (Ghostery ABP and Flagfox running @ the same time.
Comment 41 CruNcher 2012-04-21 04:35:46 PDT
User on Lower End systems even seem to experience on Windows "Not Responding Situations from Chrome for several seconds"
Comment 42 Ksec 2012-04-21 04:42:07 PDT
@ CruNcher I see no problem with that page at all. Why would it stall chrome?
Comment 43 CruNcher 2012-04-21 05:25:54 PDT
It seems their is some heavy overhead in the html5 loop going on and that stalls everything it's highly red in the profiler as well.
Comment 44 CruNcher 2012-04-21 05:30:38 PDT
this only stalls chrome due loading after loading everything is fine 
html5.offmainthread = true doesn't help
Comment 45 Leman Bennett [Omega] 2012-04-21 06:01:51 PDT
Please Cruncher, don't post one sentence at a time. You'll spam the bug causing the developers to move the work to an unlinked and quiet bug.
Comment 46 CruNcher 2012-04-21 06:19:30 PDT
Yeah sorry i have to further look into this as well first i have 2 pages now to test why this happens and what actually stalls Chrome on them and why in this particular setup :)
Comment 47 Brian Hackett (:bhackett) 2012-04-23 11:59:00 PDT
Created attachment 617581 [details] [diff] [review]
prototype (2e7a23be3310)

Mostly passes reftests.  No crashes, but some unexpected failures and tests which use plugins are broken.
Comment 48 Brendan Eich [:brendan] 2012-06-04 12:04:08 PDT
Any updates? I think I chatted about this last month and there was good progress. Save your rebased patches here, please!

/be
Comment 49 Brendan Eich [:brendan] 2012-06-04 12:59:34 PDT
WTFbugzilla!
Comment 50 Brian Hackett (:bhackett) 2012-06-04 13:37:00 PDT
A couple weeks ago I moved development to the Maple branch and have been working there mostly on fixing test failures.  The main new thing is beefed up assertions that when inc/dec ref'ing nsISupports objects that the specific lock specified by GetZone() is held.  Previously we just kept intersecting the held locks on each inc/dec ref and failed if that ever became empty.  This is good for a couple reasons:

- XPConnect is the main consumer of GetZone() calls, so having those actually return the appropriate zone gives good guarantees things should work when wrapping up arbitrary nsISupports things and accessing them from chrome JS.

- Allows for new extensions that leverage GetZone() to find races.  I want to write a compiler extension (for Clang or gcc 4.6, don't know yet) that checks that the lock is held for GetZone() whenever calling any method on an nsISupports object (with exceptions, such as GetZone() itself) or accessing global variables / static members.  This should give very strong race checking for this stuff, and shouldn't hurt debug build perf too much once some speed enhancements to owned-lock-checking go in.
Comment 51 Brian Hackett (:bhackett) 2012-06-04 13:38:37 PDT
Also, pretty soon (probably not this week or next, what with PLDI) I want to move development to the JM branch and get nightlies spinning.
Comment 52 Dão Gottwald [:dao] 2012-10-02 06:29:17 PDT
*** Bug 795761 has been marked as a duplicate of this bug. ***
Comment 53 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-10-12 02:19:30 PDT
How does painting work here? What happens when we need to paint and a visible tab's content zone is busy? It looks like we do a try-lock, but what happens if that fails?

It would be really helpful if nsThreadUtils.h had some documentation for what all the locks do, what invariants need to govern them, and when each should be used.
Comment 54 Till Schneidereit [:till] 2013-02-22 06:17:39 PST
*** Bug 327240 has been marked as a duplicate of this bug. ***
Comment 55 Caspy7 2013-06-12 18:48:35 PDT
Ping.
Hadn't seen any work/updates on this in some time and was wondering if there was progress to report or if perhaps supersnappy is being relegated for some other effort?

There was also mention of moving development to the JM branch, but JM is going (has gone?) away.  I'm wondering if there's a branch where development is happening.
Comment 56 Brian Hackett (:bhackett) 2013-06-12 18:56:10 PDT
This work is on indefinite hiatus.  Electrolysis work is (once again) underway which should get most of the benefits of this approach.

https://wiki.mozilla.org/Electrolysis

Note You need to log in before you can comment on or make changes to this bug.