Closed Bug 653895 Opened 13 years ago Closed 2 years ago

Cached text and selection indexes in parent process aren't always synced

Categories

(Core :: DOM: Content Processes, defect, P5)

defect

Tracking

()

RESOLVED INVALID

People

(Reporter: mwu, Unassigned)

References

Details

(Whiteboard: [e10s])

Attachments

(1 file, 1 obsolete file)

Sometimes we can end up with cached text and selection indexes that are out of sync with each other. The most common scenario is while adding text - the selection is correct but the text hasn't updated, so the selection is out of bounds. This caused bug 617298.
Does anyone understand well why we have inconsistent indices/text?  Is the inconsistent data in the java stuff or gecko?  (I think gecko, right?)  In theory I don't think any IMEs should be able to observe us "cheating" if we play our cards right.
There is at least one inconsistency (which this patch https://bug617298.bugzilla.mozilla.org/attachment.cgi?id=526345 attempts to fix) where async calls from chrome to content set state and the getters that read that state get processed before all the callbacks that update the cached state  (mIMESelectionAnchor, mIMESelectionFocus and mIMECacheText) get called.

That patch seemed to fix the problem of things getting out of sync when adding text, but then we were getting bit occasionally when deleting text with the swift keyboard, so either this is an incomplete fix or there is some other state inconsistency.
OK, that makes more sense.  I'm still not 100% clear about the java/gecko interaction here --- if you've got this information, do you mind diagramming the information flow between java/chrome/content?  It sounds like this is a race condition, and nailing down where that's happening would help in designing a fix.
Brad and I chatted about this a bit on IRC.  It looks like we have two classes of race conditions that are leading to race conditions, which cause us to present inconsistent data to IMEs: first, our Java thread forwards "set selection" and "set text" and other IME state changes to chrome as the Java thread finds out about them.  These state changes are forwarded from chrome to content as they come in, and content responds to them by updating the cached chrome state as it can process them.  Because state queries from the java thread can come in at any point during the chrome<->content updates, the java thread and IMEs can observe inconsistent cached values.  This problem was causing the crash in SwiftKey.

To fix this, I think we need
 (1) Treat IME sets/queries as "transactions" on the Java thread, in order to present consistent state to IMEs.  This requires
   - grab a consistent snapshot of IME state in chrome upon first IME action on the java thread
   - present a consistent view of IME state to IMEs during subsequent actions within the same java-event-loop iteration on the java thread
   - at the end of the java-event-loop iteration, forward a changeset encapsulating proposed state changes to the chrome thread
 (2) Resolve conflicting IME changesets atomically on the chrome thread, forward them atomically to content, and have content update chrome atomically after resolution

If we do this, it will be possible for IMEs to observe stale state, but that state will be consistent while the IME is doing stuff.

I think we can achieve this through the following mechanisms
 (i) observe (somehow) when the java thread starts processing events, and observe when it drops back into the event loop.  Brad tells me that we can implement our own event loop.  That should work here.
 (ii) when the java thread notices IME action starting, grab a consistent snapshot of all IME state from chrome.  This could be called the start of an IME transaction. (We may or may not want to block content on the end of the IME transaction.  Unclear atm.)
 (iii) when the java-thread IME transaction ends, forward the updated state atomically to chrome
 (iv) in chrome, resolve conflicts in content updates somehow.  This part isn't clear.  What we do now may be sufficient.
 (v) if the java-thread state changes resolve, forward them to content atomically
 (vi) process state changes atomically in content
 (vii) forward state updates from content->chrome atomically

(i) isn't clear to me.  Brad, how would we implement our own java event loop?
The heuristics for (iv) aren't clear, but should be easy to twiddle.
(v)-(vii) can be done relatively easy in our existing code.

Brad also implied in our chat that IMEs actually only have well-known interactions with our java thread, so fully recorded transactions may not be needed.  But if that's easy enough to implement, that seems to be the way to go IMHO.

Did I understand correctly?  How does this proposal sound?
Attached patch WIP patch (obsolete) — Splinter Review
so the idea with this patch is to remove the getting GeckoEvents for text and selection and to always have the results of setter events return synchronously. Since the java main thread isn't the gecko main thread, we can legally block it waiting for the child process to respond.

Having gone through this less intrusive alternative is also possible in that the getters are called synchronously on the java thread (i.e. not using events) and the getter functions can block the java main thread on the child process. I'm going to go through and implement that approach next and we can see which one is less messy.
Attachment #539813 - Flags: feedback?(mwu)
Attachment #539813 - Flags: feedback?(jones.chris.g)
Comment on attachment 539813 [details] [diff] [review]
WIP patch

I don't feel comfortable enough with the java code to provide feedback here, sorry.

How does this patch get us closer to the model in comment 4?  (If that's not the model we want, why not?)

Could this change cause us to block the IME on the content process when a chrome-process UI element has focus?  E.g., through a race condition.  I don't think that's going to be acceptable, because then content can DoS the URL bar, e.g.
Attachment #539813 - Flags: feedback?(jones.chris.g)
(In reply to comment #6)
> Comment on attachment 539813 [details] [diff] [review] [review]
> WIP patch
> 
> I don't feel comfortable enough with the java code to provide feedback here,
> sorry.
> 
> How does this patch get us closer to the model in comment 4?  
It gets us there, except for the fact that the consistent snapshot is only mutated on the java main thread where all IME transactions take place. The model you spelled out in comment 4 would seem imply that we'd only mutate state in the chrome thread.

> (If that's not
> the model we want, why not?)
> 
> Could this change cause us to block the IME on the content process when a
> chrome-process UI element has focus?  E.g., through a race condition.  I
> don't think that's going to be acceptable, because then content can DoS the
> URL bar, e.g.

I not sure that particular scenario is possible, but with this patch the java main process can block on the content process. If we go this route, we'll need the chrome process to monitor and correct for deadlock.

Also, the alternative impl I mentioned in comment 5 keeps crashing on my atrix, haven't been able to nail down why.
Comment on attachment 539813 [details] [diff] [review]
WIP patch

I've tried this patch to see if it will help with SwiftKey (bug 672661, but it's more than just the reported issue).
The implementation kind of works, but seems like only because this way we don't provide a proper feedback to SwiftKey. It does not detect the actual changes in the edit box, so its auto-complete feature just does not work as supposed, making the fundamental functionality of SwiftKey useless.
I'll do more debugging to see what could be improved here.
Some stuff is working with this approach, but now it looks like I hit a SynchronousQueue deadlock condition (bug 680311). Have to incorporate those changes.
Unfortunately using the approach from the bug 680311 defeats the purpose of this implementation, as the whole idea is to have synchronous requests. Need to come up with an alternative solution.
Assignee: nobody → alexp
Blocks: 691163
Blocks: 691237
Attached patch WIP Patch 2Splinter Review
Fixes some non-working bits from the first WIP patch, uses deadlock fix from the bug 680311, implements the text and selection caching in Java layer in attempt to have consistent state all the time. This approach fixes the problems with SwiftKey X, improves work with Swype in some cases, but there are still issues to be fixed.
Attachment #539813 - Attachment is obsolete: true
Attachment #539813 - Flags: feedback?(mwu)
This bug is postponed for now, as no sync issues have been noticed yet with a new native UI.
Whiteboard: [e10s]
According to my comment on bug 691163 https://bugzilla.mozilla.org/show_bug.cgi?id=691163#c7 I still have problem with Firefox 8.0 

(In reply to Alex Pakhotin (:alexp) from comment #12)
> This bug is postponed for now, as no sync issues have been noticed yet with
> a new native UI.
Assignee: alexp → nobody
Component: IPC → DOM: Content Processes
https://bugzilla.mozilla.org/show_bug.cgi?id=1472046

Move all DOM bugs that haven’t been updated in more than 3 years and has no one currently assigned to P5.

If you have questions, please contact :mdaly.
Priority: -- → P5

If we still had similar problems with Fenix we certainly would have a newer bug for it.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INVALID
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: