Closed Bug 666722 Opened 13 years ago Closed 13 years ago

Make sure DOM cookies act correctly with multiple content processes

Categories

(Core :: Networking: Cookies, defect)

Platform: x86 Windows 7
Severity: normal
Status: RESOLVED WONTFIX

People

(Reporter: benjamin, Unassigned)

Details

Multiple content processes are a primary goal for desktop Firefox, and we need to at least audit that cookie access and modification is correctly atomic or serialized, both from the DOM API (document.cookie) as well as at the necko level. I believe that Chromium does not make any guarantees about cookie synchronization, and synchronization was removed from the HTML spec, but roc disagreed and thought it was necessary and possible to lock cookie access across processes.
This was discussed on m.d.t.dom a while ago: http://groups.google.com/group/mozilla.dev.tech.dom/browse_thread/thread/56ca2e673c78f1b3/31dc22187a4a6c6a?lnk=gst&q=cookies#31dc22187a4a6c6a .

If memory serves, the three proposals were
 - don't do anything; new races can be observed by content (i.e. inconsistent state within a single event-loop iteration).  This may be what Chrome does.
 - run-to-completion consistency: behaviors observable by content are the same as in single-process, but inconsistency between event-loop iterations can be caused by multiple content processes in addition to server interaction.
 - "global mutex" in HTML5 spec
Cookie synchronization was not removed from the spec.
http://www.whatwg.org/specs/web-apps/current-work/complete/fetching-resources.html#fetching-resources

I hold to my position that it is feasible and valuable to do this synchronization. I think the storage mutex can probably be implemented per-domain (and if it can't, we should try changing the spec so it can).

I believe strongly enough in this that I'd be willing to do the work myself if I could find the time.
So basically what I would do is implement a storage mutex per domain and ensure that a process is only allowed to hold the storage mutex for one domain at a time. If we accidentally try to get the storage mutex for a domain while we're holding the storage mutex for another domain, that's a bug: we'd assert, and possibly even file a telemetry report with a stack trace (and we would recover by dropping the storage mutex for the first domain). The one-domain-at-a-time restriction means you can't get a deadlock across content processes, at least if the waits-for edges are all for storage mutexes. I don't know what other kinds of waits-for edges we might have among content processes.

To avoid deadlocks when chrome-process scripts access cookies and localStorage, I would simply make those NOT get the storage mutex. If chrome needs a consistent view of all cookies or localStorage it will have to explicitly bring all content processes to a safe point in their event loops.
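A rough sketch of the one-domain-at-a-time rule above (all names here are hypothetical, and threads stand in for content processes; the real thing would live in the IPC layer, not Python):

```python
import threading

class StorageMutexManager:
    """Sketch of per-domain storage mutexes with the one-domain-at-a-time
    invariant: a 'process' (thread here) may hold at most one domain's mutex.
    Acquiring a second domain's mutex is the bug case described above; we
    report it and recover by dropping the first mutex."""

    def __init__(self):
        self._registry_lock = threading.Lock()
        self._domain_mutexes = {}          # domain -> threading.Lock
        self._held = threading.local()     # which domain this "process" holds

    def acquire(self, domain):
        held = getattr(self._held, "domain", None)
        if held is not None and held != domain:
            # Bug per the proposal: assert/telemetry in the real browser;
            # recover by dropping the mutex for the first domain.
            print(f"BUG: already holding storage mutex for {held}")
            self.release()
        with self._registry_lock:
            mutex = self._domain_mutexes.setdefault(domain, threading.Lock())
        mutex.acquire()
        self._held.domain = domain

    def release(self):
        domain = getattr(self._held, "domain", None)
        if domain is None:
            return
        self._domain_mutexes[domain].release()
        self._held.domain = None
```

Because no process ever waits on a second storage mutex while holding a first one, a waits-for cycle among storage mutexes alone is impossible.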
roc/jst, can I get you to find the right person to work on this/mentor it?
(In reply to comment #3)
> So basically what I would do is implement a storage mutex per domain and
> ensure that a process is only allowed to hold the storage mutex for one
> domain at a time. If we accidentally try to get the storage mutex for a
> domain while we're holding the storage mutex for another domain, that's a
> bug: we'd assert, and possibly even file a telemetry report with a stack
> trace

This can happen with nested event loops and plugin interaction, no?  How can we assert() when web apps are just using available (albeit deprecated) web interfaces, or when plugins start reaching across content processes?

It might be feasible to whack-a-mole all these cases away, but where's the motivation for doing this vs. something simpler like run-to-completion consistency?
I see http://www.whatwg.org/specs/web-apps/current-work/complete/webappapis.html#storage-mutex answers comment 5.

If we ensured all pages from the same domain ran in the same process (not sure if we want that, but for the sake of argument), and we kept the policy that chrome-process scripts needed to roll-their-own-cookie-consistency, what would be left to do here?
We can't ensure all pages from the same domain run in the same process unless we allow IFRAMEs to run in a different process from the outer page, which would be nice, but hard. Right? Chrome doesn't do it.

There are other issues too, like cookies being sent with cross-origin image loads.
Yeah.

So,

(In reply to comment #5)
> where's the
> motivation for doing this vs. something simpler like run-to-completion
> consistency?
I don't know what "run-to-completion consistency" actually means, in particular how it's different from the storage mutex.
I thought that was linked from comment 1, but I see it's not.  Maybe I pasted the wrong link ...  Anywho,

When document.cookie is first accessed in an event-loop iteration (or nested iterations), the process "checks out" a copy of the master version held by the chrome process.  During the event loop iteration (nested iterations), the content process modifies its checkout.  Requests made while the cookie is "checked out" use the local copy.  At the end of the loop (nested loops), the checked-out copy is sent back to the chrome process.  The chrome process could commit the checkin, merge it, or drop it depending on whatever heuristics we want.

As I recall, the conclusion of the earlier discussion was that there are no new behaviors introduced by this scheme as compared to single-process except through plugin interaction, but the storage mutex would have the same problem since we'd have to drop it when calling into plugins.

The possible advantages of this scheme are that it should be simpler, should perform better, and wouldn't need whack-a-mole deadlock avoidance.

This is stronger consistency than Chrome provides.
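The check-out/check-in scheme above can be sketched roughly like this (hypothetical names; the commit policy shown is "stomp with whatever's latest", one of the options mentioned, and threads stand in for the chrome/content processes):

```python
import threading

class ChromeCookieService:
    """Sketch of the check-out scheme: the chrome process holds the master
    cookie jar; a content process checks out a copy at the start of an
    event-loop iteration, mutates it locally, and checks it back in at the
    end, so all of that iteration's writes become visible atomically."""

    def __init__(self):
        self._lock = threading.Lock()
        self._master = {}   # domain -> {cookie name: value}

    def checkout(self, domain):
        # First document.cookie access in an event-loop iteration.
        with self._lock:
            return dict(self._master.get(domain, {}))

    def checkin(self, domain, local_copy):
        # End of the event-loop iteration; commit policy here: stomp.
        with self._lock:
            self._master[domain] = dict(local_copy)

# One event-loop iteration in a content process:
svc = ChromeCookieService()
jar = svc.checkout("example.com")   # local copy for this iteration
jar["session"] = "abc"              # document.cookie writes hit the copy
jar["theme"] = "dark"               # requests mid-iteration use `jar`
svc.checkin("example.com", jar)     # both cookies become visible together
```

Note how this gives the "both cookies or none" property for other processes: a request issued elsewhere between the two local writes still reads the master jar, which only ever changes at check-in.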
(In reply to comment #10)
> The possible advantages of this scheme are that it should be simpler, should
> perform better, and wouldn't need whack-a-mole deadlock avoidance.

Should note: this is assuming we don't implement the storage mutex for localStorage.
I want to implement the storage mutex for local storage.

The chrome process can't always merge the changes in a reasonable way.

I don't think that approach would deal with the situation where a script sets a cookie, spins for a bit, then sets another cookie, and some other process tries to issue a network request between the cookie sets (it should send both cookies or none).
(In reply to comment #12)
> I want to implement the storage mutex for local storage.

OK, maybe we should discuss in that bug first :).

> The chrome process can't always merge the changes in a reasonable way.

Yep.  Changes could be stomped or ignored.

> I don't think that approach would deal with the situation where a script
> sets a cookie, spins for a bit, then sets another cookie, and some other
> process tries to issue a network request between the cookie sets (it should
> send both cookies or none).

Correct, it would not.  That's not a new behavior though.
(In reply to comment #12)
> I don't think that approach would deal with the situation where a script
> sets a cookie, spins for a bit, then sets another cookie, and some other
> process tries to issue a network request between the cookie sets (it should
> send both cookies or none).

Wait sorry, I misread --- the scheme in comment 10 would indeed result in the request from the other process only sending both cookies or none.  Maybe I misunderstand your scenario?
No, you're right, it would.

The main problem is that it breaks down when merging is not possible. It can also lead to weird errors even when merging appears to be possible, because you're not tracking dependencies (even if you were, you couldn't do anything if you found a conflict).
It also sounds quite hard to implement to me; implementing "check out" and "merge" in a performant way sounds hard.
We can just stomp with whatever's latest and reduce to current behavior, if I remember the discussion correctly.

What do you expect to be hard about implementing it, in particular what would be harder than implementing storage mutex performantly?
"check out" suggests that you have to make a copy of all cookies the script might get access to. That could be a significant amount of data.
Correct.  memcpy() is cheap though.  How much data are we talking?
I don't know. Don't cookies live in a database?

(In reply to comment #17)
> We can just stomp with whatever's latest and reduce to current behavior, if
> I remember the discussion correctly.

Under what conditions do cookies get stomped in desktop Firefox today?
(In reply to comment #20)
> I don't know. Don't cookies live in a database?

In-memory cached DB, I thought ...  if not, then no one will notice the memcpy() after the disk IO.

> (In reply to comment #17)
> > We can just stomp with whatever's latest and reduce to current behavior, if
> > I remember the discussion correctly.
> 
> Under what conditions do cookies get stomped in desktop Firefox today?

Lots!  Any time { HTTP response, tab } B sets after { HTTP response, tab } A.  It doesn't matter if the get/set is atomic to A and B, A still gets stomped.

From the earlier discussion I remember that it's easy for servers to observe that we're using this scheme.  But I thought of a couple of ways script can observe it too, though I think they're pretty edge-case-y.  So it's not true that we wouldn't introduce new behaviors, I retract that.
(In reply to comment #21)
> (In reply to comment #20)
> > I don't know. Don't cookies live in a database?
> 
> In-memory cached DB, I thought ...  if not, then no one will notice the
> memcpy() after the disk IO.
During run-time they are in a hash table, but they are also written out to disk for persistence (with the exception of session cookies).
So, if we wanted to implement this we'd need the following:

- A "halfsync" IPC call that blocks the child but not the parent.  On the parent side there needs to be a way to tell the IPC code when it's ok to process the message.
- A way to either eliminate nested event loops or drop the storage mutex in every nested-event-loop situation that can run script.  E.g. on Linux, accessing dataTransfer during a drag and drop can spin up a nested GTK event loop.  If you do that while holding the storage mutex and then mouse over a cross-domain iframe, we lose.
- Logic to tell when a process dies so we can terminate any holds it has on a storage mutex.  Presumably a lot of this infrastructure exists for plugins already.

And then of course the actual implementation.
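The "halfsync" requirement from the first bullet can be illustrated with a small sketch (hypothetical names; threads stand in for the child and parent processes): the child blocks waiting for its reply, while the parent merely queues the message and answers from its event loop whenever it decides it is safe to do so.

```python
import queue
import threading

class HalfSyncChannel:
    """Sketch of a 'halfsync' IPC call: synchronous from the child's point
    of view, but the parent is never blocked -- it processes the queued
    message at a moment of its own choosing."""

    def __init__(self):
        self._pending = queue.Queue()

    # --- child side: blocks until the parent replies ---
    def call(self, message):
        reply_slot = queue.Queue(maxsize=1)
        self._pending.put((message, reply_slot))
        return reply_slot.get()          # child blocks here

    # --- parent side: non-blocking; invoked from the parent event loop
    # whenever the IPC code says it's OK to process the message ---
    def process_one(self, handler):
        try:
            message, reply_slot = self._pending.get_nowait()
        except queue.Empty:
            return False
        reply_slot.put(handler(message))
        return True
```

The parent-side hook corresponds to "a way to tell the IPC code when it's ok to process the message": nothing happens until the parent calls process_one().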
(In reply to comment #23)
> - To eliminate the nested event loop or drop the storage mutex on every
> nested event loop situation that can run script.  e.g. on Linux accessing
> dataTransfer on a drag and drop can spin up a nested gtk event loop.  If you
> do that while holding the storage mutex and then mouse over a cross-domain
> iframe we lose.

In situations like that, it's a bug to allow script to run in the nested event loop anyway. Event handlers running in the same page where a script is currently accessing dataTransfer could easily cause page breakage.
(In reply to comment #24)
> (In reply to comment #23)
> > - To eliminate the nested event loop or drop the storage mutex on every
> > nested event loop situation that can run script.  e.g. on Linux accessing
> > dataTransfer on a drag and drop can spin up a nested gtk event loop.  If you
> > do that while holding the storage mutex and then mouse over a cross-domain
> > iframe we lose.
> 
> In situations like that, it's a bug to allow script to run in the nested
> event loop anyway. Event handlers running in the same page where a script is
> currently accessing dataTransfer could easily cause page breakage.

The events are being fired at a different window here.  You're right that we have plenty of bugs where we break run-to-completion semantics, but that's not what I'm talking about here, I don't think.
They are in that case, but in the general case you can get back to the original window, e.g. if they're same-origin. All such cases are bugs.

I say that these nested event loops that allow script reentry are bugs whatever we do, so eliminating those bugs is a good thing even if we don't want to do the storage mutex. And if we decide to tolerate those bugs for the time being, we can auto-drop the outer storage mutex and we'll be no worse off than if we hadn't implemented it (and still better off in the vast majority of cases).
At the risk of a cascade of new comments: roc, are there any particular bugs stemming from your last comment that you'd like to see fixed to call this "done enough" for the initial E10S stand-up?  It would be good to clarify that and get that work going along with the rest of the E10S platform work.
I don't think this needs to block other E10S platform work.
It doesn't need to block, but it would be good if we could scope this work and get it on the radar at https://wiki.mozilla.org/Electrolysis/Short_Terms_Goals
I think the decision at the content planning meeting was that we weren't going to do this.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #30)
> I think the decision at the content planning meeting was that we weren't
> going to do this.

Which one is "this", multiple content processes, or what the bug was filed for?
What this bug was filed for, namely implementing the "storage mutex".