Stylo accesses the preference service from multiple threads.

NEW
Unassigned

Status

()

enhancement
P3
normal
11 months ago
10 months ago

People

(Reporter: kmag, Unassigned, NeedInfo)

Tracking

(Blocks 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

Reporter

Description

11 months ago
Bug 1351200 added a special case to the preference service's IsMainThread assertions to allow fetching preferences off-main-thread during stylo traversal, on the premise that stylo would guarantee that pref getters were never called reentrantly. That's not the case, though. They wind up being called reentrantly on a regular basis:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e18b78e2ab42964f3a3c4e47a4c5542c5d54181d&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable

This has mostly worked so far, since preference lookups generally don't modify any data other than access counts. But bug 1471025 changes access count statistics to require a hashtable GetOrCreate operation, which asserts when called reentrantly. And when we migrate to Robin Hood hashtables, every lookup will potentially modify the hashtable, which *will* cause heap corruption if it happens reentrantly.
Flags: needinfo?(manishearth)
Reporter

Updated

11 months ago
Blocks: 1402910
It's not re-entrant, it's accessing prefs with multiple threads, right? I don't see any reentrancy there unless I'm misunderstanding.

They're only reads, so IMO they should be made safe. Can we just either limit the statistics to the main thread, or guard them with a lock? They're debug-only after all.
Flags: needinfo?(kmaglione+bmo)
Reporter

Comment 2

11 months ago
(In reply to Emilio Cobos Álvarez (:emilio) from comment #1)
> It's not re-entrant, it's accessing prefs with multiple threads, right? I
> don't see any reentrancy there unless I'm misunderstanding.

I don't understand what you mean. The accesses are definitely re-entrant. We repeatedly crash on this assertion:

https://hg.mozilla.org/try/rev/1ef2dd64ef5f791b649b61d3c3811f7f5edcd8a5#l1.13

Which means we enter one Lookup call before another has finished.

> They're only reads, so IMO they should be made safe. Can we just either
> limit the statistics to the main thread, or guard them with a lock? They're
> debug-only after all.

I'm going to limit the statistics to the main thread as a short term work-around, but that only fixes part of the problem. There are plans to migrate our hash tables to Robin Hood hashing, which means every look-up has a possibility of modifying the hashtable.

No, we can't guard them with a lock. We used to do that, but it was too expensive, because preference getters are hot code, so we limited them to the main thread. For the off-thread access that we can't avoid, we have atomic var caches.

The reason that Stylo was allowed this exemption in the first place is that it was supposed to do the locking and guarantee the safety.
Flags: needinfo?(kmaglione+bmo)
> Which means we enter one Lookup call before another has finished.

(This is a terminology issue, this situation can be caused by two threads attempting to call the same function, and one thread calling the function whilst it is already running -- emilio is using the term to only mean the latter (this is how I've learnt it too), whereas you seem to be using it to mean both (a not uncommon definition))

> because preference getters are hot code

I think the suggestion is to force *stylo* parallel traversal pref getters to use a mutex -- make Gecko_GetBaseSize and similar calls use a mutex (and modify the assert to check that the mutex is locked if in parallel traversal, perhaps? that may be expensive?)

Atomic var caches can't really work for the lang font prefs system; it's a lot of data. That said there is already a cache here, which I think we skip in parallel traversal, but it could be RW locked instead, to relieve pressure from the prefs service.

But I feel like that shouldn't be necessary, there are very few prefs we're fetching off main thread (it's basically just the lang prefs and maybe some zoom things) and it feels like we can afford some locking here only in the stylo case.
Flags: needinfo?(manishearth)
Reporter

Comment 4

11 months ago
(In reply to Manish Goregaokar [:manishearth] from comment #3)
> > because preference getters are hot code
> 
> I think the suggestion is to force *stylo* parallel traversal pref getters
> to use a mutex -- make Gecko_GetBaseSize and similar calls use a mutex (and
> modify the assert to check that the mutex is locked if in parallel
> traversal, perhaps? that may be expensive?)

Well, something has to do locking, and since Stylo is the only code that access preference getters off the main thread, I think that's the right place to do it, yes.

And, yes, it would be nice to modify the assert to make sure the right mutex is held by the current thread. It might be expensive, but it would only need to be done in debug builds, so it should probably be fine.
We could also fix how all the lang font group stuff works, I remember Jonathan wanting to change this. It'd certainly be nice, because dynamic pref getters are also pretty expensive...
Summary: Stylo accesses the preference service reentrantly → Stylo accesses the preference service from multiple threads.
Priority: -- → P3
Reporter

Comment 6

10 months ago
In practice, this is going to block bug 1402910, which is qf:p1 for Firefox 64. I don't think P3 is an appropriate priority unless that bug is also re-prioritized.
Flags: needinfo?(mreavy)
I can try to take a look at this next week.
Flags: needinfo?(emilio)
Is the plan to not make concurrent hashtable reads work? I think that's a no-go, we use a bunch of other hash tables that we populate on the main thread and lookup off-main-thread, like ServoElementSnapshotTable. I think that needs to just work.

Now, in terms of the DEBUG-only _writes_ that the pref service does, it's not clear to me why that can't use an internal lock given it's not used in release builds, but I can add a lock on Stylo's side. I think it's slightly unfortunate though.
Flags: needinfo?(emilio) → needinfo?(kmaglione+bmo)
Reporter

Comment 9

10 months ago
(In reply to Emilio Cobos Álvarez (:emilio) from comment #8)
> Is the plan to not make concurrent hashtable reads work? I think that's a
> no-go, we use a bunch of other hash tables that we populate on the main
> thread and lookup off-main-thread, like ServoElementSnapshotTable. I think
> that needs to just work.

I imagine it may be possible to disable Robin Hood hashing on certain hash tables. Maybe comment on bug 1402910. As for the preference hash tables, I don't think that's an option. They really do need to be as small and fast as possible, and concurrent reads are not possible in Robin Hood hash tables.

> Now, in terms of the DEBUG-only _writes_ that the pref service does, it's
> not clear to me why that can't use an internal lock given it's not used in
> release builds, but I can add a lock on Stylo's side. I think it's slightly
> unfortunate though.

Like I said, this isn't about the statistics hash table. I already have a workaround for that by only updating it on the main thread. The real issue is that we need to update our hash tables to use Robin Hood hashing, which makes all look-ups potential write operations, and isn't safe if we have multiple concurrent reads on different threads.
Flags: needinfo?(kmaglione+bmo)
(In reply to Kris Maglione [:kmag] from comment #9)
> (In reply to Emilio Cobos Álvarez (:emilio) from comment #8)
> > Now, in terms of the DEBUG-only _writes_ that the pref service does, it's
> > not clear to me why that can't use an internal lock given it's not used in
> > release builds, but I can add a lock on Stylo's side. I think it's slightly
> > unfortunate though.
> 
> Like I said, this isn't about the statistics hash table. I already have a
> workaround for that by only updating it on the main thread. The real issue
> is that we need to update our hash tables to use Robin Hood hashing, which
> makes all look-ups potential write operations, and isn't safe if we have
> multiple concurrent reads on different threads.

I know bug 1402910 makes it sound like things are faster with Robin Hood hashing, but after writing (I think) better benchmarks, my recollection is that the picture is a little less clear that Robin Hood hashing would be good, particularly at higher load factors.  (It's also possible that our hash functions are in need of help...)  I need to get back to the benchmarking and get better numbers via cachegrind or similar.
I suspect having writes in lookup operations for hash tables would be a real nightmare, not just for Stylo...
Reporter

Comment 12

10 months ago
(In reply to Emilio Cobos Álvarez (:emilio) from comment #11)
> I suspect having writes in lookup operations for hash tables would be a real
> nightmare, not just for Stylo...

It's a common feature of modern hash table implementations, and they do it for good reason.

It's entirely possible that we may need to disable it for hash tables that need concurrent read access without locking, but the preference hash tables definitely to not fall into that category. It's only meant to ever be accessed from a single thread, and the exception granted to stylo is entirely based on the premise that stylo will do the work of preventing concurrent access. The comments from Nathan and Bobby in bug 1351200 were very clear on that point.
Hi Sean -- I wanted to put this on your radar to assess if/when this will block bug 1402910.  Thanks!
Flags: needinfo?(mreavy) → needinfo?(svoisen)
You need to log in before you can comment on or make changes to this bug.