mozilla::unicode::GetNaked is not thread-safe
Categories
(Core :: Find Backend, defect)
Tracking
()
People
(Reporter: alexhenrie24, Assigned: alexhenrie24, NeedInfo)
References
(Regressed 1 open bug)
Details
Attachments
(1 obsolete file)
As Jonathan noted at https://phabricator.services.mozilla.com/D81875#2541722:
On looking at this again, I think André is right that there could be a thread-safety concern here. The issue is the use of GetNaked from Places code, which runs off-main-thread; AFAICT there's nothing to protect us from this happening at the same time as nsFind is using it from the main thread. So we really should address that.
Assignee | ||
Comment 1•4 years ago
|
||
Updated•4 years ago
|
Comment 2•4 years ago
|
||
I wonder if it'd be feasible to reduce the amount/scope of locking involved here. Some thoughts:
ToNaked()
is thread-safe for ASCII characters, as it just does an early return; this will be a very common case. It seems unfortunate to always lock around code that is callingToNaked
but may well be dealing with safe, ASCII-only text.- Over time, most calls to
GetNaked()
will tend to be satisfied from the cache; it would be nice if we could read it without locking. Indeed, HashMap offers areadonlyThreadsafeLookup
method which sounds nice... - However, that wouldn't work if another thread happened to update the cache at the same time.
- Maybe we could do all cache updates from the main thread? So if a secondary thread comes up with a new entry, instead of modifying the hashmap directly, it'd post a runnable to the main thread to perform the update.
- Then the main thread would be allowed to read the cache without locking. We'd need to hold the lock in two cases:
- (a) when updating the cache (happens on the main thread only); and
- (b) when reading the cache from a secondary thread.
- In particular, this means that main-thread usage that is either ASCII-only or satisfied by existing cache entries never takes the lock.
- The Places code would also avoid locking when dealing with ASCII-only text, which is probably fairly common.
So overall, we'd have something like:
- At the beginning of
GetNaked
, lock only if!NS_IsMainThread()
. - When about to insert a new cache entry, if
NS_IsMainThread()
then lock and continue, else post aCacheNakedMappingRunnable
to the main thread.- Note that the runnable would need to use
put
rather thanputNew
to update the cache, as there could be multiple attempts to add the same mapping.
- Note that the runnable would need to use
WDYT - would something along these lines be workable? Does it sound like on balance it'd be a win?
Comment 3•4 years ago
|
||
A possibly easier alternative... Make the cache thread-local?
Comment 4•4 years ago
|
||
Possible, but it looks like Places may call this from multiple secondary threads, so that could lead to quite a bit of duplication. There's less benefit from the cache if each thread has to populate its own copy, and the overall memory footprint increases.
(Ideally, it'd be nice to have a single naked-char cache shared across all processes, not just threads, but that's a different level of effort, as we wouldn't want lookups to involve IPC. So I'm not proposing we pursue that for now.)
Comment 5•4 years ago
|
||
Actually, let's take a step back here; I think there's a totally different strategy we should consider.
Instead of computing GetNaked mappings at runtime from decomposition and transliteration rules, and caching the result for performance (which introduces the thread-safety issue), we should just precompute them at build time (or before build time, at update-to-new-Unicode-version time) and store a simple table as constant data.
There are only 1400 non-identity mappings at present (bug 1649187 will increase this to 1906, and future Unicode updates may create a few more, but it's unlikely there will ever be much larger numbers), so the total amount of data involved won't be too much -- and we'll get to simplify the runtime code and avoid adding ICU transliterator code and data to subset we ship.
We could store the mappings in a multi-level lookup table similar to others in nsUnicodePropertyData, though even just binary-searching a sorted array would probably be sufficiently performant for our needs here.
Assignee | ||
Comment 6•4 years ago
|
||
I think creating the table of character mappings at build time is a great idea :-)
Comment 7•4 years ago
|
||
The severity field is not set for this bug.
:mikedeboer, could you have a look please?
For more information, please visit auto_nag documentation.
Updated•4 years ago
|
Assignee | ||
Comment 8•4 years ago
|
||
The fix for Bug 1649187 also fixed the thread-safety issue:
https://hg.mozilla.org/integration/autoland/rev/d61a3c845eb6
Description
•