Bug 1315757 (Open) - Opened 8 years ago, Updated 2 years ago

Share atoms across processes

Categories

(Core :: JavaScript Engine, defect, P3)

Tracking

Tracking Status
firefox52 --- wontfix

People

(Reporter: bhackett1024, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MemShrink:P2][e10s-multi:+])

Atoms, script data and script sources account for a lot of memory, and when using multiple content processes a fair amount of that data is allocated redundantly.  I measured memory usage on an old memory benchmark that opens 10 different domains in separate tabs in a single content process, and got these values for that process:

460,926,976 B (100.0%) -- explicit
├───57,904,984 B (12.56%) -- js-non-window
│   ├──29,667,016 B (06.44%) -- runtime
│   │  ├──10,682,544 B (02.32%) ── script-data
│   │  ├───4,300,864 B (00.93%) ── atoms-table
│   │  ├─────179,152 B (00.04%) -- script-sources

I then increased the maximum number of content processes allowed and loaded each of those pages in separate tabs and content processes, and got these totals across those content processes:

explicit: 746,020 KB
js-non-window: 188,469 KB
runtime: 75,327 KB
script-data: 20,285 KB
atoms-table: 12,703 KB
script-sources: 455 KB

Out of the 286 MB of overhead that the separate content processes add (the multiprocess total minus the single-process value), 130 MB is from js-non-window, 45 MB of that is from the runtime, and 18 MB of that is from script-data, atoms-table, and script-sources.

Unfortunately, I didn't realize it when I did these measurements, but atoms-table counts only the table itself, not the atoms.  Looking at some content processes, the atoms data is consistently at least as large as the table itself.  If I apply that 2x fudge factor to the numbers above, the duplicate-data overhead of the atoms grows from 8.4 MB to 16.8 MB, and the total data/atoms/sources overhead grows to 27 MB, or 9.3% of the total multiprocess memory overhead.
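For the record, here is that arithmetic spelled out, using the numbers above (rounded; treating 1 KB as 1,000 B, which is what makes the quoted totals line up):

explicit:        746,020 KB - 460,927 KB ≈ 285 MB   (the ~286 MB total overhead)
js-non-window:   188,469 KB -  57,905 KB ≈ 131 MB
runtime:          75,327 KB -  29,667 KB ≈  46 MB
script-data:      20,285 KB -  10,683 KB ≈ 9.6 MB
atoms-table:      12,703 KB -   4,301 KB ≈ 8.4 MB   (x2 fudge: 16.8 MB)
script-sources:      455 KB -     179 KB ≈ 0.3 MB

data/atoms/sources overhead: 9.6 + 16.8 + 0.3 ≈ 27 MB, or about 9.3% of the total.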

It would be really nice to remove this overhead.  This bug will focus on fixing the atoms overhead; the machinery it adds should be reusable for script data and script sources.  Below is a description of how atoms can be shared across processes.

---

Let's say there is a shared atoms table somewhere in the chrome process.  This table is not affiliated with any particular runtime, though all clients of the table (see (A) below), in both the chrome process and its (transitive) child processes, have read-only access to the table and to the atoms themselves.  Child processes access the table and atoms through read-only shared memory blocks.  The following also hold:

1. Each client maintains a mark bitmap of all the shared atoms that it is using (one bit per atom, ideally).  This bitmap needs to be readable by the chrome process, so for child processes it will need to be in a read/write shared memory block.

2. Each client also maintains a local atoms table.  When the client wants to atomize a string, it first looks in the local table, then in the shared table (setting a mark bit if found), and if both miss it adds an entry to the local table.  Strings in the local table take precedence over those in the shared table, ensuring that each atom has a canonical representation within a given client.  When an entry is added to the local table, that string can be queued up to be requested in (3).  (A sketch of this lookup path appears after this list.)

3. Clients may send IPDL messages that asynchronously request new entries for the shared table.

4. The chrome process has an OS thread (separate from the main thread) to process the new-entry IPDL messages and to sweep table entries that are no longer used --- those not marked in any client's bitmap.

5. Synchronization on the table can happen with a multiple-readers-single-writer lock in shared memory.  Since clients will, I think, need to busy-wait while a write finishes, writes should be kept as short as possible.  All writes happen on the OS thread in (4).  (A sketch of such a lock appears at the end of this comment.)

6. When a client GCs it has the option of keeping track of which shared atoms it encounters in its traversal.  If it does, it can build up a new mark bitmap of the shared atoms that are in use, and then memcpy it over the shared mark bitmap when the GC is finished.  (A sketch of this also appears after this list.)

7. When a client GCs it also has the option of replacing references to local atoms with references to shared atoms, if a corresponding shared atom exists.  This might be best done as part of a compacting GC.  Local atoms replaced this way can be removed from the local table and collected.  This will require fixing the GC / Gecko so that atoms can move, which I think mainly requires handling for pinned atoms that the GC does not trace (it seems best to just remove pinning entirely).
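To make (2) and (3) concrete, here is a minimal standalone C++ sketch of the lookup path.  It is a model only: every type and function name below is invented for illustration, and a real implementation would use SpiderMonkey's own structures and the actual IPDL machinery.

#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

struct Atom { uint32_t id; };  // stand-in for a real JSAtom

// Read-only view of the shared table, plus this client's read/write
// mark bitmap from (1).
struct SharedAtomsView {
  std::unordered_map<std::string, Atom> entries;  // really in shmem
  std::vector<uint8_t>* markBitmap;               // read/write shmem

  std::optional<Atom> lookup(const std::string& s) {
    auto it = entries.find(s);
    if (it == entries.end()) {
      return std::nullopt;
    }
    // Mark the atom as in use so the chrome-side sweep in (4)
    // doesn't collect it out from under us.
    (*markBitmap)[it->second.id / 8] |= uint8_t(1u << (it->second.id % 8));
    return it->second;
  }
};

struct Client {
  std::unordered_map<std::string, Atom> localTable;
  SharedAtomsView shared;
  uint32_t nextLocalId = 0;

  Atom atomize(const std::string& s) {
    // Local atoms take precedence, so each atom keeps one canonical
    // representation within this client.
    if (auto it = localTable.find(s); it != localTable.end()) {
      return it->second;
    }
    if (auto atom = shared.lookup(s)) {
      return *atom;
    }
    // Miss in both tables: add a local entry and queue an async
    // request for a shared entry, per (3).
    Atom atom{nextLocalId++};
    localTable.emplace(s, atom);
    queueSharedEntryRequest(s);  // stub for the async IPDL message
    return atom;
  }

  void queueSharedEntryRequest(const std::string&) {}
};

The important property is the lookup order: a shared entry that arrives after a client already atomized a string locally doesn't change that client's canonical atom; the switch to the shared atom only happens during the GC pass in (7).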
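Similarly, a sketch of the bitmap rebuild in (6) together with the matching chrome-side sweep test from (4).  Again, everything here is hypothetical; in particular, how the per-client bitmaps are laid out in shared memory is an open design question.

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Client side: during GC, collect marks into a private scratch
// bitmap, then publish it with one memcpy when the GC finishes, so
// the shared bitmap never holds a half-built state.
struct MarkBitmapRebuild {
  std::vector<uint8_t> scratch;  // process-local
  uint8_t* sharedBitmap;         // this client's read/write shmem block
  size_t bytes;

  void beginGC() { scratch.assign(bytes, 0); }
  void noteSharedAtom(uint32_t id) {
    scratch[id / 8] |= uint8_t(1u << (id % 8));
  }
  void endGC() { std::memcpy(sharedBitmap, scratch.data(), bytes); }
};

// Chrome side, on the OS thread from (4): an entry is dead only if
// no client has its bit set.
static bool isDead(uint32_t id,
                   const std::vector<const uint8_t*>& clientBitmaps) {
  for (const uint8_t* bitmap : clientBitmaps) {
    if (bitmap[id / 8] & (1u << (id % 8))) {
      return false;  // some client still uses this atom
    }
  }
  return true;  // unmarked everywhere; sweepable, modulo (B) below
}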

Some other issues:

A. What is a client?  The two options are 'runtime' and 'zone'.  With 'runtime', the handling of atoms in a runtime doesn't change all that much: there is still a canonical runtime-wide representation of a given atom, and most of the new changes come in during atomization and GC.  With 'zone', there is no longer a canonical runtime-wide representation of a given atom and more code will need to change, but there will also be no runtime-wide atoms state that only a runtime-wide GC can collect.  It's probably best to use 'runtime' for now and 'zone' as a followup to help with the removal of full GCs.

B. A wrinkle in the above is that after the OS thread in the chrome process adds a new atom, it is immediately vulnerable to being collected, since no client has had a chance to mark it yet.  Depending on the heuristics used for removing table entries, it might be good to avoid collecting a new entry until the client that requested it has done (7) once.

C. If a content process is exploited, it can DoS the chrome process with new entries, prevent entries from being collected by always marking them, or deadlock the chrome process by messing with the shared lock.  This seems OK though.  I'm assuming that there is no way for an exploited content process to gain write access to read-only shared memory it shares with the chrome process; otherwise the content process could scribble on the atoms table / atoms and crash the chrome process.
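For completeness, here is one way the multiple-readers-single-writer lock in (5) could look: a single atomic word placed in the shared memory block.  This is only a sketch under the assumptions in (5), i.e. writes are rare, short, and happen only on the chrome OS thread; note that the unbounded reader spin is exactly the surface (C) worries about, so a real version would want bounded spinning or some fallback.

#include <atomic>
#include <cstdint>

// Lives inside the shared memory block.  state == -1 means a writer
// holds the lock; state >= 0 counts active readers.  Assumes
// std::atomic<int32_t> is lock-free and usable across processes.
struct SharedRWLock {
  std::atomic<int32_t> state{0};

  void readLock() {
    for (;;) {
      int32_t s = state.load(std::memory_order_relaxed);
      if (s >= 0 && state.compare_exchange_weak(
                        s, s + 1, std::memory_order_acquire)) {
        return;
      }
      // Busy-wait: acceptable only because writes are kept short.
    }
  }
  void readUnlock() { state.fetch_sub(1, std::memory_order_release); }

  // Writer side: only the chrome process's OS thread takes this.
  void writeLock() {
    int32_t expected = 0;
    while (!state.compare_exchange_weak(expected, -1,
                                        std::memory_order_acquire)) {
      expected = 0;  // reset after a failed exchange
    }
  }
  void writeUnlock() { state.store(0, std::memory_order_release); }
};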
Whiteboard: [MemShrink]
These numbers sound great.
Blocks: e10s-multi
Whiteboard: [MemShrink] → [MemShrink:P1]
Brian, any word on this? It sounds like a pretty big win for e10s-multi.
Flags: needinfo?(bhackett1024)
Whiteboard: [MemShrink:P1] → [MemShrink:P1][e10s-multi]
Blocks: 987955
Whiteboard: [MemShrink:P1][e10s-multi] → [MemShrink:P1][e10s-multi:?]
At the workweek we decided to instead focus on improving our handling of threads in the JS engine (bug 1323066) to support Quantum DOM's work on improving responsiveness when there are few content processes.  Doing this would still be good but I don't have an ETA.
Flags: needinfo?(bhackett1024)
Whiteboard: [MemShrink:P1][e10s-multi:?] → [MemShrink:P1][e10s-multi:+]
Too late for Firefox 52, mass-wontfix.
Priority: -- → P3
Whiteboard: [MemShrink:P1][e10s-multi:+] → [MemShrink:P2][e10s-multi:+]
Severity: normal → S3