Closed Bug 666722 Opened 13 years ago Closed 13 years ago

Make sure DOM cookies act correctly with multiple content processes

Categories

(Core :: Networking: Cookies, defect)

Platform: x86 Windows 7
Severity: normal
Status: RESOLVED WONTFIX

People

(Reporter: benjamin, Unassigned)

Details

Multiple content processes are a primary goal for desktop Firefox, and we need to at least audit that cookie access and modification is correctly atomic or serialized, both from the DOM API (document.cookie) as well as at the necko level. I believe that Chromium does not make any guarantees about cookie synchronization, and synchronization was removed from the HTML spec, but roc disagreed and thought it was necessary and possible to lock cookie access across processes.
This was discussed on m.d.t.dom a while ago: http://groups.google.com/group/mozilla.dev.tech.dom/browse_thread/thread/56ca2e673c78f1b3/31dc22187a4a6c6a?lnk=gst&q=cookies#31dc22187a4a6c6a .

If memory serves, the three proposals were
 - don't do anything; new races can be observed by content (i.e. inconsistent state within a single event-loop iteration).  This may be what Chrome does.
 - run-to-completion consistency: behaviors observable by content are the same as in single-process, but inconsistency between event-loop iterations can be caused by multiple content processes in addition to server interaction.
 - "global mutex" in HTML5 spec
Cookie synchronization was not removed from the spec.
http://www.whatwg.org/specs/web-apps/current-work/complete/fetching-resources.html#fetching-resources

I hold to my position that it is feasible and valuable to do this synchronization. I think the storage mutex can probably be implemented per-domain (and if it can't, we should try changing the spec so it can).

I believe strongly enough in this that I'd be willing to do the work myself if I could find the time.
So basically what I would do is implement a storage mutex per domain and ensure that a process is only allowed to hold the storage mutex for one domain at a time. If we accidentally try to get the storage mutex for a domain while we're holding the storage mutex for another domain, that's a bug: we'd assert, and possibly even file a telemetry report with a stack trace (and we would recover by dropping the storage mutex for the first domain). The one-domain-at-a-time restriction means you can't get a deadlock across content processes, at least if the waits-for edges are all for storage mutexes. I don't know what other kinds of waits-for edges we might have among content processes.

To avoid deadlocks when chrome-process scripts access cookies and localStorage, I would simply make those NOT get the storage mutex. If chrome needs a consistent view of all cookies or localStorage it will have to explicitly bring all content processes to a safe point in their event loops.
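A rough sketch of the one-domain-at-a-time rule above (all names here are hypothetical, and threads stand in for content processes; the real thing would live in the IPC layer, not Python):

```python
import threading

class StorageMutexManager:
    """Sketch of per-domain storage mutexes with the one-domain-at-a-time
    invariant: a 'process' (thread here) may hold at most one domain's mutex.
    Acquiring a second domain's mutex is the bug case described above; we
    report it and recover by dropping the first mutex."""

    def __init__(self):
        self._registry_lock = threading.Lock()
        self._domain_mutexes = {}          # domain -> threading.Lock
        self._held = threading.local()     # which domain this "process" holds

    def acquire(self, domain):
        held = getattr(self._held, "domain", None)
        if held is not None and held != domain:
            # Bug per the proposal: assert/telemetry in the real browser;
            # recover by dropping the mutex for the first domain.
            print(f"BUG: already holding storage mutex for {held}")
            self.release()
        with self._registry_lock:
            mutex = self._domain_mutexes.setdefault(domain, threading.Lock())
        mutex.acquire()
        self._held.domain = domain

    def release(self):
        domain = getattr(self._held, "domain", None)
        if domain is None:
            return
        self._domain_mutexes[domain].release()
        self._held.domain = None
```

Because no process ever waits on a second storage mutex while holding a first one, a waits-for cycle among storage mutexes alone is impossible.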
roc/jst, can I get you to find the right person to work on this/mentor it?
(In reply to comment #3)
> So basically what I would do is implement a storage mutex per domain and
> ensure that a process is only allowed to hold the storage mutex for one
> domain at a time. If we accidentally try to get the storage mutex for a
> domain while we're holding the storage mutex for another domain, that's a
> bug: we'd assert, and possibly even file a telemetry report with a stack
> trace

This can happen with nested event loops and plugin interaction, no?  How can we assert() when web apps are just using available (albeit deprecated) web interfaces, or when plugins start reaching across content processes?

It might be feasible to whack-a-mole all these cases away, but where's the motivation for doing this vs. something simpler like run-to-completion consistency?
I see http://www.whatwg.org/specs/web-apps/current-work/complete/webappapis.html#storage-mutex answers comment 5.

If we ensured all pages from the same domain ran in the same process (not sure if we want that, but for the sake of argument), and we kept the policy that chrome-process scripts needed to roll-their-own-cookie-consistency, what would be left to do here?
We can't ensure all pages from the same domain run in the same process unless we allow IFRAMEs to run in a different process from the outer page, which would be nice, but hard. Right? Chrome doesn't do it.

There are other issues too, like cookies being sent with cross-origin image loads.
Yeah.

So,

(In reply to comment #5)
> where's the
> motivation for doing this vs. something simpler like run-to-completion
> consistency?
I don't know what "run-to-completion consistency" actually means, in particular how it's different from the storage mutex.
I thought that was linked from comment 1, but I see it's not.  Maybe I pasted the wrong link ...  Anywho,

When document.cookie is first accessed in an event-loop iteration (or nested iterations), the process "checks out" a copy of the master version held by the chrome process.  During the event loop iteration (nested iterations), the content process modifies its checkout.  Requests made while the cookie is "checked out" use the local copy.  At the end of the loop (nested loops), the checked-out copy is sent back to the chrome process.  The chrome process could commit the checkin, merge it, or drop it depending on whatever heuristics we want.

As I recall, the conclusion of the earlier discussion was that there are no new behaviors introduced by this scheme as compared to single-process except through plugin interaction, but the storage mutex would have the same problem since we'd have to drop it when calling into plugins.

The possible advantages of this scheme are that it should be simpler, should perform better, and wouldn't need whack-a-mole deadlock avoidance.

This is stronger consistency than Chrome provides.
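The check-out/check-in scheme above can be sketched roughly like this (hypothetical names; the commit policy shown is "stomp with whatever's latest", one of the options mentioned, and threads stand in for the chrome/content processes):

```python
import threading

class ChromeCookieService:
    """Sketch of the check-out scheme: the chrome process holds the master
    cookie jar; a content process checks out a copy at the start of an
    event-loop iteration, mutates it locally, and checks it back in at the
    end, so all of that iteration's writes become visible atomically."""

    def __init__(self):
        self._lock = threading.Lock()
        self._master = {}   # domain -> {cookie name: value}

    def checkout(self, domain):
        # First document.cookie access in an event-loop iteration.
        with self._lock:
            return dict(self._master.get(domain, {}))

    def checkin(self, domain, local_copy):
        # End of the event-loop iteration; commit policy here: stomp.
        with self._lock:
            self._master[domain] = dict(local_copy)

# One event-loop iteration in a content process:
svc = ChromeCookieService()
jar = svc.checkout("example.com")   # local copy for this iteration
jar["session"] = "abc"              # document.cookie writes hit the copy
jar["theme"] = "dark"               # requests mid-iteration use `jar`
svc.checkin("example.com", jar)     # both cookies become visible together
```

Note how this gives the "both cookies or none" property for other processes: a request issued elsewhere between the two local writes still reads the master jar, which only ever changes at check-in.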
(In reply to comment #10)
> The possible advantages of this scheme are that it should be simpler, should
> perform better, and wouldn't need whack-a-mole deadlock avoidance.

Should note: this is assuming we don't implement the storage mutex for localStorage.
I want to implement the storage mutex for local storage.

The chrome process can't always merge the changes in a reasonable way.

I don't think that approach would deal with the situation where a script sets a cookie, spins for a bit, then sets another cookie, and some other process tries to issue a network request between the cookie sets (it should send both cookies or none).
(In reply to comment #12)
> I want to implement the storage mutex for local storage.

OK, maybe we should discuss in that bug first :).

> The chrome process can't always merge the changes in a reasonable way.

Yep.  Changes could be stomped or ignored.

> I don't think that approach would deal with the situation where a script
> sets a cookie, spins for a bit, then sets another cookie, and some other
> process tries to issue a network request between the cookie sets (it should
> send both cookies or none).

Correct, it would not.  That's not a new behavior though.
(In reply to comment #12)
> I don't think that approach would deal with the situation where a script
> sets a cookie, spins for a bit, then sets another cookie, and some other
> process tries to issue a network request between the cookie sets (it should
> send both cookies or none).

Wait sorry, I misread --- the scheme in comment 10 would indeed result in the request from the other process only sending both cookies or none.  Maybe I misunderstand your scenario?
No, you're right, it would.

The main problem is that it breaks down when merging is not possible. It can also lead to weird errors even when merging appears to be possible, because you're not tracking dependencies (even if you were, you couldn't do anything if you found a conflict).
It also sounds quite hard to implement to me; implementing "check out" and "merge" in a performant way sounds hard.
We can just stomp with whatever's latest and reduce to current behavior, if I remember the discussion correctly.

What do you expect to be hard about implementing it, in particular what would be harder than implementing storage mutex performantly?
"check out" suggests that you have to make a copy of all cookies the script might get access to. That could be a significant amount of data.
Correct.  memcpy() is cheap though.  How much data are we talking?
I don't know. Don't cookies live in a database?

(In reply to comment #17)
> We can just stomp with whatever's latest and reduce to current behavior, if
> I remember the discussion correctly.

Under what conditions do cookies get stomped in desktop Firefox today?
(In reply to comment #20)
> I don't know. Don't cookies live in a database?

In-memory cached DB, I thought ...  if not, then no one will notice the memcpy() after the disk IO.

> (In reply to comment #17)
> > We can just stomp with whatever's latest and reduce to current behavior, if
> > I remember the discussion correctly.
> 
> Under what conditions do cookies get stomped in desktop Firefox today?

Lots!  Any time { HTTP response, tab } B sets after { HTTP response, tab } A.  It doesn't matter if the get/set is atomic to A and B, A still gets stomped.

From the earlier discussion I remember that it's easy for servers to observe that we're using this scheme.  But I thought of a couple of ways script can observe it too, though I think they're pretty edge-case-y.  So it's not true that we wouldn't introduce new behaviors, I retract that.
(In reply to comment #21)
> (In reply to comment #20)
> > I don't know. Don't cookies live in a database?
> 
> In-memory cached DB, I thought ...  if not, then no one will notice the
> memcpy() after the disk IO.
During run-time they are in a hash table, but they are also written out to disk for persistence (with the exception of session cookies).
So, if we wanted to implement this we'd need the following:

- A "halfsync" IPC call that blocks the child but not the parent.  On the parent side there needs to be a way to tell the IPC code when it's ok to process the message.
- A way to either eliminate nested event loops or drop the storage mutex in every nested-event-loop situation that can run script.  E.g. on Linux, accessing dataTransfer during a drag and drop can spin up a nested GTK event loop.  If you do that while holding the storage mutex and then mouse over a cross-domain iframe, we lose.
- Logic to tell when a process dies so we can terminate any holds it has on a storage mutex.  Presumably a lot of this infrastructure exists for plugins already.

And then of course the actual implementation.
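The "halfsync" requirement from the first bullet can be illustrated with a small sketch (hypothetical names; threads stand in for the child and parent processes): the child blocks waiting for its reply, while the parent merely queues the message and answers from its event loop whenever it decides it is safe to do so.

```python
import queue
import threading

class HalfSyncChannel:
    """Sketch of a 'halfsync' IPC call: synchronous from the child's point
    of view, but the parent is never blocked -- it processes the queued
    message at a moment of its own choosing."""

    def __init__(self):
        self._pending = queue.Queue()

    # --- child side: blocks until the parent replies ---
    def call(self, message):
        reply_slot = queue.Queue(maxsize=1)
        self._pending.put((message, reply_slot))
        return reply_slot.get()          # child blocks here

    # --- parent side: non-blocking; invoked from the parent event loop
    # whenever the IPC code says it's OK to process the message ---
    def process_one(self, handler):
        try:
            message, reply_slot = self._pending.get_nowait()
        except queue.Empty:
            return False
        reply_slot.put(handler(message))
        return True
```

The parent-side hook corresponds to "a way to tell the IPC code when it's ok to process the message": nothing happens until the parent calls process_one().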
(In reply to comment #23)
> - To eliminate the nested event loop or drop the storage mutex on every
> nested event loop situation that can run script.  e.g. on Linux accessing
> dataTransfer on a drag and drop can spin up a nested gtk event loop.  If you
> do that while holding the storage mutex and then mouse over a cross-domain
> iframe we lose.

In situations like that, it's a bug to allow script to run in the nested event loop anyway. Event handlers running in the same page where a script is currently accessing dataTransfer could easily cause page breakage.
(In reply to comment #24)
> (In reply to comment #23)
> > - To eliminate the nested event loop or drop the storage mutex on every
> > nested event loop situation that can run script.  e.g. on Linux accessing
> > dataTransfer on a drag and drop can spin up a nested gtk event loop.  If you
> > do that while holding the storage mutex and then mouse over a cross-domain
> > iframe we lose.
> 
> In situations like that, it's a bug to allow script to run in the nested
> event loop anyway. Event handlers running in the same page where a script is
> currently accessing dataTransfer could easily cause page breakage.

The events are being fired at a different window here.  You're right that we have plenty of bugs where we break run-to-completion semantics, but that's not what I'm talking about here, I don't think.
They are in that case, but in the general case you can get back to the original window, e.g. if they're same-origin. All such cases are bugs.

I say that these nested event loops that allow script reentry are bugs whatever we do, so eliminating those bugs is a good thing even if we don't want to do the storage mutex. And if we decide to tolerate those bugs for the time being, we can auto-drop the outer storage mutex and we'll be no worse off than if we hadn't implemented it (and still better off in the vast majority of cases).
At the risk of a cascade of new comments: roc, are there any particular bugs stemming from your last comment that you'd like to see fixed to call this "done enough" for the initial E10S stand-up?  It would be good to clarify that and get that work going along with the rest of the E10S platform work.
I don't think this needs to block other E10S platform work.
It doesn't need to block, but it would be good if we could scope this work and get it on the radar at https://wiki.mozilla.org/Electrolysis/Short_Terms_Goals
I think the decision at the content planning meeting was that we weren't going to do this.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #30)
> I think the decision at the content planning meeting was that we weren't
> going to do this.

Which one is "this", multiple content processes, or what the bug was filed for?
What this bug was filed for, namely implementing the "storage mutex".