Closed
Bug 1323372
Opened 8 years ago
Closed 8 years ago
stylo: StyleSharingCandidateCache lifetime is too long in sequential traversal and too short in parallel traversal
Categories
(Core :: CSS Parsing and Computation, defect)
Core
CSS Parsing and Computation
Tracking
()
RESOLVED
FIXED
People
(Reporter: bholley, Assigned: bholley)
References
Details
I just got a UAF crash in the StyleSharingCandidate crash reloading stylo. As it turns out, the cache (which lives in TLS) never gets cleared in sequential traversal, which means that the UnsafeNodes it stores can point to garbage memory. Looks like this all changed around in https://github.com/servo/servo/pull/12668
Servo presumably doesn't hit this because we're likely only using the parallel traversal there.
So the sequential traversal needs to clear the TLS before returning, which is easy enough. The parallel traversal seems suboptimal though. Right now we clear it each time we process a WorkUnit:
https://github.com/servo/servo/blob/master/components/style/parallel.rs#L110
As soon as we set CHUNK_SIZE=1 on desktop like we plan to, the cache won't be doing anything at all.
So we need one of the following:
(a) A way to dispatch some kind of finalization task to each thread in the rayon thread pool that processes tasks, so that we can clear the TLS when the traversal completes.
(b) Some non-TLS way to pass thread-local data down a rayon traversal.
(b) Would certain be nicer, since we could potentially get rid of the UnsafeNode usage which allowed me to UAF here.
Assignee | ||
Comment 2•8 years ago
|
||
Thinking about this a little bit more, I think we can do some variant of (b) by passing an immutable borrow of a shared data structure down the rayon stack. This struct would have N slots of Option<LocalContext> (one for each worker), and provide callers mutable access to the entry at the index corresponding with WorkerThread::current().index().
This effectively gives us TLS without the static scoping, which is exactly what we want here. It scopes the lifetime of the cache the way we want and, with any luck, should allow us to remove the UnsafeNode stuff.
I'll have a look at this tomorrow.
Flags: needinfo?(nmatsakis) → needinfo?(bobbyholley)
Comment 3•8 years ago
|
||
Some comments,
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #0)
> I just got a UAF crash in the StyleSharingCandidate crash reloading stylo.
> As it turns out, the cache (which lives in TLS) never gets cleared in
> sequential traversal, which means that the UnsafeNodes it stores can point
> to garbage memory.
I think that diagnostic is wrong. We do clear it in sequential traversal, just because of that reason:
https://github.com/servo/servo/blob/master/components/style/sequential.rs#L56
> Looks like this all changed around in
> https://github.com/servo/servo/pull/12668
>
> Servo presumably doesn't hit this because we're likely only using the
> parallel traversal there.
That is also wrong, servo is now using the sequential traversal depending on DOM size:
https://github.com/servo/servo/pull/12862
> So the sequential traversal needs to clear the TLS before returning, which
> is easy enough. The parallel traversal seems suboptimal though. Right now we
> clear it each time we process a WorkUnit:
>
> https://github.com/servo/servo/blob/master/components/style/parallel.rs#L110
>
> As soon as we set CHUNK_SIZE=1 on desktop like we plan to, the cache won't
> be doing anything at all.
Note that this is not before processing a work unit, but before sending the _children_ of an element, that may spawn multiple work units. The theory is that that is fine since the cache only matches siblings.
That being said, what may explain what you describe (during parallel traversal though, not sequential) is the following:
In servo the root element never matches, so we clear the cache right after doing its stuff. In Gecko we style subtrees that may test the cache before clearing it before, which may be wrong. Also work stealing may make us access a stale cache?
Any chance you can get a stack of that so we can better diagnose what's what went on there?
Comment 4•8 years ago
|
||
Oh, I see what you mean now when you say it's short in the parallel one, and you're right, with CHUNK_SIZE=1 its behavior would be pretty bad, we only share styles among siblings in the same chunk (which is ok if the chunk is 64, but not if it's 1).
That being said, given we clear it even if there are no more children to process, I doubt we leave stale elements in the cache at all during parallel traversal. So I'm really curious to see what's going on now. I guess Gecko doesn't do anything crazy like getting rid of elements while we're styling right? I'm _really_ confused right now :)
That being said, I just wake up, probably I should grab some coffee.
Comment 5•8 years ago
|
||
To capture IRC conversation:
08:30 <bholley> emilio: tried a few times, can't reproduce
08:30 <bholley> emilio: so not sure what it is
08:30 <bholley> emilio: either way, it seems like the proposal from https://bugzilla.mozilla.org/show_bug.cgi?id=1323372#c2 would solve any safety issues and get us better cache sharing
08:30 <firebot> Bug 1323372 — NEW, bobbyholley@gmail.com — stylo: StyleSharingCandidateCache lifetime is too long in sequential traversal and too short in para
08:30 <bholley> emilio: so seems like we should do that
08:31 <bholley> emilio: gotta run to bed though. gnite!
08:31 <emilio> bholley: I recall thinking hard about it when rewriting the cache, so I was surprised that in all this time the only crash would be in gecko :)
08:32 <emilio> bholley: Sounds good, good night! :)
I've also doing a try run at https://github.com/servo/servo/pull/14583, to check if I'm correct.
Assignee | ||
Comment 6•8 years ago
|
||
I looked at this for a bit this afternoon, and wrote some patches to add a scoped_tls.rs module, which implements the approach from comment 2. It generally seems like it's going to work, though will require a PR to Rayon to expose the index of the current worker thread.
I still need to hook it up to all the context classes, which may require some refactoring. I'll see if I can get it finished up tomorrow.
Assignee | ||
Comment 7•8 years ago
|
||
Emilio figured out the UAF along with double-borrows on the cache, which led to https://github.com/servo/servo/pull/14599
Hopefully we can fix the footgun in this bug.
Assignee | ||
Comment 8•8 years ago
|
||
Corresponding (incomplete) Servo PR: https://github.com/servo/servo/pull/14610
Flags: needinfo?(bobbyholley)
Comment 9•8 years ago
|
||
Looks like you all have made good progress here. =) I... think I'm fine with exposing the index of the current worker thread in Rayon, though I could imagine it being a bit troublesome later if we evolve the thread-pool to dynamically adjust the number of workers. I wonder though if it would be better to add this "scoped TLS" data structure to Rayon itself as a utility?
I can also imagine other schemes that might help to throw data out of the cache when it is no longer needed (crossbeam's epochs come to mind). But is this "scoped TLS" actually the *right* approach you would want anyhow?
Assignee | ||
Comment 10•8 years ago
|
||
(In reply to Niko Matsakis from comment #9)
> Looks like you all have made good progress here. =) I... think I'm fine with
> exposing the index of the current worker thread in Rayon, though I could
> imagine it being a bit troublesome later if we evolve the thread-pool to
> dynamically adjust the number of workers. I wonder though if it would be
> better to add this "scoped TLS" data structure to Rayon itself as a utility?
Adding to Rayon is fine with me. I've PRed both approaches - take your pick. I don't care which we do, just as long as I can use it without delay. :-)
https://github.com/nikomatsakis/rayon/pull/185
https://github.com/nikomatsakis/rayon/pull/186
> I can also imagine other schemes that might help to throw data out of the
> cache when it is no longer needed (crossbeam's epochs come to mind). But is
> this "scoped TLS" actually the *right* approach you would want anyhow?
I think so, yeah. The limitation that the TLS be Send might be troublesome, but for the Style System case we must drop the data immediately after the traversal to avoid UAF. So we either need to drop on the main thread (requiring Send), or we need to reliably dispatch a task to each worker, which seems like unacceptable overhead, unless you have clever ideas.
Assignee | ||
Comment 11•8 years ago
|
||
This is fixed in https://github.com/servo/servo/pull/14642/
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•