Closed Bug 1652910 Opened 4 years ago Closed 4 years ago

mozilla::unicode::GetNaked is not thread-safe

Categories

(Core :: Find Backend, defect)

defect

Tracking

()

RESOLVED FIXED
88 Branch

People

(Reporter: alexhenrie24, Assigned: alexhenrie24, NeedInfo)

References

(Regressed 1 open bug)

Details

Attachments

(1 obsolete file)

As Jonathan noted at https://phabricator.services.mozilla.com/D81875#2541722:

On looking at this again, I think André is right that there could be a thread-safety concern here. The issue is the use of GetNaked from Places code, which runs off-main-thread; AFAICT there's nothing to protect us from this happening at the same time as nsFind is using it from the main thread. So we really should address that.

Assignee: nobody → alexhenrie24
Status: NEW → ASSIGNED

I wonder if it'd be feasible to reduce the amount/scope of locking involved here. Some thoughts:

  • ToNaked() is thread-safe for ASCII characters, as it just does an early return; this will be a very common case. It seems unfortunate to always lock around code that is calling ToNaked but may well be dealing with safe, ASCII-only text.
  • Over time, most calls to GetNaked() will tend to be satisfied from the cache; it would be nice if we could read it without locking. Indeed, HashMap offers a readonlyThreadsafeLookup method which sounds nice...
  • However, that wouldn't work if another thread happened to update the cache at the same time.
  • Maybe we could do all cache updates from the main thread? So if a secondary thread comes up with a new entry, instead of modifying the hashmap directly, it'd post a runnable to the main thread to perform the update.
  • Then the main thread would be allowed to read the cache without locking. We'd need to hold the lock in two cases:
    • (a) when updating the cache (happens on the main thread only); and
    • (b) when reading the cache from a secondary thread.
  • In particular, this means that main-thread usage that is either ASCII-only or satisfied by existing cache entries never takes the lock.
  • The Places code would also avoid locking when dealing with ASCII-only text, which is probably fairly common.

So overall, we'd have something like:

  • At the beginning of GetNaked, lock only if !NS_IsMainThread().
  • When about to insert a new cache entry, if NS_IsMainThread() then lock and continue, else post a CacheNakedMappingRunnable to the main thread.
    • Note that the runnable would need to use put rather than putNew to update the cache, as there could be multiple attempts to add the same mapping.

WDYT - would something along these lines be workable? Does it sound like on balance it'd be a win?

A possibly easier alternative... Make the cache thread-local?

Possible, but it looks like Places may call this from multiple secondary threads, so that could lead to quite a bit of duplication. There's less benefit from the cache if each thread has to populate its own copy, and the overall memory footprint increases.

(Ideally, it'd be nice to have a single naked-char cache shared across all processes, not just threads, but that's a different level of effort, as we wouldn't want lookups to involve IPC. So I'm not proposing we pursue that for now.)

Actually, let's take a step back here; I think there's a totally different strategy we should consider.

Instead of computing GetNaked mappings at runtime from decomposition and transliteration rules, and caching the result for performance (which introduces the thread-safety issue), we should just precompute them at build time (or before build time, at update-to-new-Unicode-version time) and store a simple table as constant data.

There are only 1400 non-identity mappings at present (bug 1649187 will increase this to 1906, and future Unicode updates may create a few more, but it's unlikely there will ever be much larger numbers), so the total amount of data involved won't be too much -- and we'll get to simplify the runtime code and avoid adding ICU transliterator code and data to subset we ship.

We could store the mappings in a multi-level lookup table similar to others in nsUnicodePropertyData, though even just binary-searching a sorted array would probably be sufficiently performant for our needs here.

I think creating the table of character mappings at build time is a great idea :-)

The severity field is not set for this bug.
:mikedeboer, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(mdeboer)
Attachment #9163687 - Attachment is obsolete: true

The fix for Bug 1649187 also fixed the thread-safety issue:

https://hg.mozilla.org/integration/autoland/rev/d61a3c845eb6

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 88 Branch
Regressions: 1707842
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: