Closed Bug 1726480 Opened 5 months ago Closed 3 months ago

Support synchronous reading at an offset in IOUtils, workers only

Categories

(Toolkit :: OS.File, enhancement)

enhancement

Tracking

()

RESOLVED FIXED
96 Branch
Tracking Status
firefox96 --- fixed

People

(Reporter: mstange, Assigned: mstange)

References

Details

Attachments

(1 file)

When capturing profiles for local Firefox builds, we use symbolication-worker.js to dump symbols from the build.
This worker currently uses OS.File for file access.

It needs to read parts of a file into an existing Uint8Array, synchronously.

This synchronous requirement cannot be changed; the library we use for symbolication only supports synchronous reads. This code is not running on the main thread, so there are no performance concerns.
File opening can be asynchronous, but after the asynchronous "setup" is done, all reads need to happen synchronously.

Furthermore, it really needs to do partial reads. We cannot just read the entire file upfront and then synchronously copy parts of it out, because one of the files we're dealing with here is libxul, which is frequently around 2GB big (depending on the platform).

The current OS.File solution is rather inefficient: https://share.firefox.dev/2W93lnB

  • Starting the OS.File worker is slow because of the JS module imports, which ping-pong to the main thread. Switching to IOUtils would solve this.
  • For every read, OS.File creates a new buffer. This causes two problems:
    1. It requires a copy from that buffer into the existing Uint8Array (here: wasm memory).
    2. It creates garbage which then needs to be collected.
  • It uses ctypes. The profile above spends 8.7% of the readBytesInto call in WorkerPrivate::SetGCTimerMode which gets called when entering and exiting ctypes calls.

An efficient solution would have the following properties:

  • All reads should use the same file handle, so that the file doesn't need to be opened for every read.
  • There should be minimal copies.
  • The API should support reading into an existing destination buffer.

It would be nice to add the necessary APIs to IOUtils so that the OS.File dependency can be dropped from symbolication.

I did a brief experiment with mmap and it was very effective: I added a method IOUtils::ReadMmap which mapped the file into memory (read-only), and then wrapped a Uint8Array around the mmapped memory. Then symbolication-worker.js could copy pieces of that Uint8Array into wasm memory. This did the minimal number of copies, and it only read the parts of file for which the array elements were accessed, simply through the magic of mmap.

However, this solution had three drawbacks:

  • Closing the file was tied to GC timing - the file was closed when the Uint8Array was GC'ed.
  • Having JavaScript array access cause synchronous IO is a bit weird.
  • Writing to an array element caused an instant crash! The pages were mapped as read-only, so attempting to write to them is a segfault.

I'd like to propose the following API:

[Exposed=Worker]
partial namespace IOUtils {
  Promise<SyncReadFile> openFileForSyncReading(DOMString path);
};

[ChromeOnly, Exposed=Worker]
interface SyncReadFile {
  // The file size, in bytes.
  readonly attribute unsigned long size;

  // Synchronously read `dest.length` bytes at offset `offset` into `dest`.
  // Throws if the file has been closed already or if the read would be out-of-bounds.
  [Throws]
  void readBytesInto(Uint8Array dest, unsigned long offset);

  // Close the file. Subsequent calls to readBytesInto will throw.
  void close();
};

(Edit: in the attached patch, I made openFileForSyncReading synchronous, rather than returning a promise. And I changed "unsigned long" to "long long" because nsFileStream uses int64_t for the file size.)

How do people feel about the proposed API?

Flags: needinfo?(emalysz)
Flags: needinfo?(gijskruitbosch+bugs)

I'm going to let Emma chime in here. :-)

The only thing I did notice from a very brief look, shouldn't the IOUtils partial interface thingummy also be ChromeOnly?

Flags: needinfo?(gijskruitbosch+bugs)

(In reply to :Gijs (he/him) from comment #4)

The only thing I did notice from a very brief look, shouldn't the IOUtils partial interface thingummy also be ChromeOnly?

I'm not sure - I was modelling this after other Chrome-only partial interface declarations which also don't have ChromeOnly, such as this one for ChromeUtils and this one for IOUtils.

Seems reasonable to me, and nice to see it was effective.

I CC'ed Barret who was responsible for a lot of the IOUtils implementation

Flags: needinfo?(emalysz)
Blocks: 1728602

This exposes synchronous file reading to workers. It's intended to be used
by profiler symbolication.
The API only supports reading into an existing Uint8Array. This avoids
creating garbage, and minimizes copies.
It also keeps the file open so that it doesn't need to be reopened for
each read.

The implementation uses nsFileStream.

I first tried an implementation which used mmap + memcpy rather than read,
but it didn't work for files larger than 2GiB due to limitations in NSPR's
mmap support, and the profiler needs to read >2GiB files.
Specifically, the profiler sometimes needs to read
/System/Library/dyld/dyld_shared_cache_arm64e , which is 2.2GiB big on
macOS 11.5.

Assignee: nobody → mstange.moz
Status: NEW → ASSIGNED

Emma, let me know if you'd like to review the patch, too. I requested review from kmag because of the new WebIDL interface. I cargo-culted the nsWrapperCache and cycle collection stuff from GamepadHapticActuator.

Pushed by mstange@themasta.com:
https://hg.mozilla.org/integration/autoland/rev/2778256ad2bb
Add IOUtils::OpenFileForSyncReading. r=dom-worker-reviewers,asuth
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → 96 Branch
Blocks: 1741492
You need to log in before you can comment on or make changes to this bug.