Closed Bug 1447867 Opened 6 years ago Closed 6 years ago

Use shm_open for IPC shared memory on POSIX platforms

Categories

(Core :: IPC, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox61 --- fixed
firefox63 --- fixed

People

(Reporter: jld, Assigned: jld)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

On almost all Unix-like systems, we can use shm_open and shm_unlink in base::SharedMemory instead of opening and unlinking temporary files.  This will allow us to delete (almost) all 1,200 lines of file_util from ipc/chromium.

On Linux with glibc, shm_open/shm_unlink is implemented by opening and unlinking temporary files in /dev/shm (i.e., we're duplicating code that the OS already has).  On other OSes it may use other, possibly more efficient, implementations; for example, this is the case on FreeBSD.

Android, however, does not support it.  But it has ashmem, which does what we want, and we already have code for it in the ipc/glue SharedMemoryBasic layer.  That could be merged into base::SharedMemory, which SharedMemoryBasic_android partially duplicates as it stands.
Blocks: 1439057
Priority: P2 → P1
Comment on attachment 8968759 [details]
Bug 1447867 - Replace base::SharedMemory POSIX backend with shm_open and ashmem.

https://reviewboard.mozilla.org/r/237454/#review243362

r=me

::: ipc/chromium/src/base/shared_memory_posix.cc:117
(Diff revision 1)
> +      if (shm_unlink(name.c_str()) != 0) {
> +        DLOG(FATAL) << "failed to unlink shm: " << strerror(errno);

Just to be clear here, if `shm_unlink` fails, we will treat the overall operation as success, and return the `shm_open`'d thing?  Because the fd will still be `>= 0`.

I guess it's not worth trying again with a different name, because `shm_unlink` will probably fail on that name, too?  But maybe we should `close` and return failure?  Except that the named shared memory segment is still there?  Not sure what the right way forward is.
Attachment #8968759 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj] from comment #2)
> Comment on attachment 8968759 [details]
> > +      if (shm_unlink(name.c_str()) != 0) {
> > +        DLOG(FATAL) << "failed to unlink shm: " << strerror(errno);
> 
> Just to be clear here, if `shm_unlink` fails, we will treat the overall
> operation as success, and return the `shm_open`'d thing?  Because the fd
> will still be `>= 0`.
> 
> I guess it's not worth trying again with a different name, because
> `shm_unlink` will probably fail on that name, too?  But maybe we should
> `close` and return failure?  Except that the named shared memory segment is
> still there?  Not sure what the right way forward is.

Yes, there really isn't a “good” way to deal with this.  But on second thought, the close-and-fail option seems less bad: if we're going to leak a segment, probably better to leak a zero-length segment than a possibly large one; if it somehow *was* unlinked it's an unnecessary failure, but that doesn't seem likely.  As for retrying with a different name, I suppose that might work if the cause is transient, but it might also make us leak files until /dev/shm runs out of inodes (or equivalent).
Pushed by jedavis@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1a024d5ca77b
Replace base::SharedMemory POSIX backend with shm_open and ashmem. r=froydnj
https://hg.mozilla.org/mozilla-central/rev/1a024d5ca77b
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Backed out: https://hg.mozilla.org/mozilla-central/rev/7793bcae69f6
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
See Also: → 1465667
It might be worth trying to re-land this now that bug 1475612 is fixed. Pretty good odds it's the same issue.
(In reply to Kris Maglione [:kmag] from comment #8)
> It might be worth trying to re-land this now that bug 1475612 is fixed.
> Pretty good odds it's the same issue.

I'm not so sure.  This patch doesn't change anything the child does; it just changes how the fd is created in the parent, and a few things about the fd that normally wouldn't matter (close-on-exec set, append not set, offset at 0 instead of end).  Also, there doesn't seem to be a dup() for the changed prefs like there is with the shared prefs, so nothing else should think it ever had that fd.  And the bug 1455828 crashes were from reading incorrect data (and not NUL bytes either), not EBADF.

I'm going to reland it anyway (after bug 1243108, so we don't start crashing again from spurious unlink failures), because I suspect the cause was bug 1456911.

My other guess was that something was somehow writing to the wrong fd in the child, and this patch changing the append flag and file offset made it overwrite the pref blob instead of writing harmlessly after the end, but I don't have an explanation for why something would just write to an fd that hadn't been available at any point since the process was started.
(In reply to Jed Davis [:jld] (⏰UTC-6) from comment #9)
> (In reply to Kris Maglione [:kmag] from comment #8)
> > It might be worth trying to re-land this now that bug 1475612 is fixed.
> > Pretty good odds it's the same issue.
> 
> I'm not so sure.  This patch doesn't change anything the child does; it just
> changes how the fd is created in the parent, and a few things about the fd
> that normally wouldn't matter (close-on-exec set, append not set, offset at
> 0 instead of end).  Also, there doesn't seem to be a dup() for the changed
> prefs like there is with the shared prefs, so nothing else should think it
> ever had that fd.  And the bug 1455828 crashes were from reading incorrect
> data (and not NUL bytes either), not EBADF.

There doesn't need to be a dup(), just an fd that can be opened and then replaced with an fd from another open operation.

Some of the crashes from bug 1475612 were EBADF failures, but a lot were actually in the content process, when we wound up trying to decode data that we expected to be a shared string map, but wound up being some arbitrary file instead.

All that's necessary for the double close to start causing that issue is a slight change in timing. Most of the crashes in the child process actually only started showing up after an unrelated patch changed some timings enough for the fd to be closed at just the right time for that to happen.
Pushed by jedavis@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/83bab8cf29bf
Replace base::SharedMemory POSIX backend with shm_open and ashmem. r=froydnj
https://hg.mozilla.org/mozilla-central/rev/83bab8cf29bf
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: