Closed
Bug 1009799
Opened 10 years ago
Closed 4 years ago
[OS.File] Optimize writeAtomic for very large strings
Categories
(Toolkit Graveyard :: OS.File, defect)
Toolkit Graveyard
OS.File
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: Yoric, Unassigned, Mentored)
References
Details
(Keywords: perf, Whiteboard: [Async][lang=js][good next bug])
Attachments
(1 file, 5 obsolete files)
7.11 KB,
patch
|
Yoric
:
feedback+
|
Details | Diff | Splinter Review |
Library OS.File offers a function writeAtomic() that can be used to write strings or binary arrays off the main thread. Writing binary arrays is very fast, but writing very large strings can take a few milliseconds. The objective of this bug is to make `writeAtomic()` faster with large strings. For this purpose, we need to adapt OS.File.writeAtomic, as defined in http://dxr.mozilla.org/mozilla-central/search?tree=mozilla-central&q=osfile_async_front.jsm&redirect=true, to 1. determine if the argument passed is a long string; 2. if so, using Task.jsm and TextEncoder, walk through the string by chunks, encoding chunks of the string to small buffers; 3. post the list of chunks to the worker. Similarly, we need to adapt the worker side, as defined in http://dxr.mozilla.org/mozilla-central/search?tree=mozilla-central&q=osfile_async_worker.js&redirect=true, to 1. reassemble the list of chunks into a single Uint8Array before writing.
Assignee | ||
Updated•10 years ago
|
Mentor: dteller
Whiteboard: [Async][mentor=Yoric][lang=js][good next bug] → [Async][lang=js][good next bug]
Comment 1•10 years ago
|
||
Hi, I would like to work on this bug if no-one else is working on it.
Reporter | ||
Updated•10 years ago
|
Flags: needinfo?(dteller)
Reporter | ||
Comment 2•10 years ago
|
||
Welcome :) Have you already downloaded and built Firefox?
Flags: needinfo?(dteller)
Comment 3•10 years ago
|
||
Yeah David, I have downloaded and built Firefox. However this is going to be my first bug. Can you provide me some more insight?
Reporter | ||
Comment 4•10 years ago
|
||
Since you mention :ashutoshd in your profile, I assume that you are already present on IRC, which is a good thing :) If you have any question, my nickname over IRC is Yoric. Now, the first step for you is to look at the source code of function `OS.File.writeAtomic`. It is in file toolkit/components/osfile/modules/osfile_async_worker.jsm. In comment 0, I describe the list of changes that need to be made to this function. Do you already see how to make step 1.?
Flags: needinfo?(ashutoshdhundhara)
Comment 5•10 years ago
|
||
Attachment #8483729 -
Flags: feedback?(dteller)
Flags: needinfo?(ashutoshdhundhara)
Reporter | ||
Comment 6•10 years ago
|
||
Comment on attachment 8483729 [details] [diff] [review] Patch Review of attachment 8483729 [details] [diff] [review]: ----------------------------------------------------------------- Good start. Could you apply the modifications below, rebuild Firefox (`./mach built toolkit/components/osfile` should be sufficient) and see whether the tests still pass (`./mach xpcshell-test toolkit/components/osfile/test/xpcshell`)? ::: toolkit/components/osfile/modules/osfile_async_front.jsm @@ +30,5 @@ > let SharedAll = {}; > Cu.import("resource://gre/modules/osfile/osfile_shared_allthreads.jsm", SharedAll); > Cu.import("resource://gre/modules/XPCOMUtils.jsm", this); > Cu.import("resource://gre/modules/Timer.jsm", this); > +Cu.import("resource://gre/modules/commonjs/toolkit/loader.js"); What's that doing here? @@ +152,5 @@ > > +/** > + * Splits a long string into smaller chunks. > + * > + * This function returns small chunks of a large string encoded in Nit: "This generator yields small chunks..." (it's not a function but a generator, denoted by `function*` and it doesn't call `return` but `yield`) @@ +153,5 @@ > +/** > + * Splits a long string into smaller chunks. > + * > + * This function returns small chunks of a large string encoded in > + * Uint8Array format. The size of each chunk is less than or equal to Actually, it doesn't encode anything. You encode in the caller. @@ +165,5 @@ > +function* chunksOf(string, chunkSize) { > + for (let index = 0; index < string.length; index += chunkSize) { > + yield string.substring(index, index + chunkSize); > + } > + yield false; What's that last line for? @@ +1034,5 @@ > > /** > * Gets the number of bytes available on disk to the current user. > * > + * @param {string} Platform-specific path to a directory on the disk to Nit: Please don't fix whitespace in an unrelated part of the code, this confuses the history for people who later try and understand the code. @@ +1224,5 @@ > // - we take care of any |byteOffset|. > let refObj = {}; > TelemetryStopwatch.start("OSFILE_WRITEATOMIC_JANK_MS", refObj); > + // If `buffer` string is quite large, we need to split it into smaller chunks. > + if (typeof buffer == "string" && buffer.length > 1024) { Let's replace 1024 by a named constant. @@ +1230,5 @@ > + let chunkify = chunksOf(buffer, CHUNK_SIZE); > + let encoder = new TextEncoder(); > + while (oneChunk = chunkify.next()) { > + chunks.push(encoder.encode(oneChunk)); > + } This would give the right result, but one of our objectives here is to take advantage of Task.jsm to ensure that execution takes breaks during the encoding. So let's rewrite this as follows (we'll deal with TelemetryStopWatch later): Task.spawn(function* () { let bufferMsg; if (typeof buffer == "string" && buffer.length > LARGE_BUFFER) { // Bla let chunks = []; for (let oneChunk of chunksOf(buffer, CHUNK_SIZE)) { chunks.push(encoder.encode(oneChunk)); yield Promise.resolve(); // Defer the rest of the computation to the next tick, to avoid jank } bufferMsg = {chunks: chunks}; } else { // Bla bufferMsg = Type.void_t.in_ptr.toMsg(buffer); } return Scheduler.post(...) });
Attachment #8483729 -
Flags: feedback?(dteller) → feedback+
Comment 7•10 years ago
|
||
(In reply to David Rajchenbach-Teller [:Yoric] (use "needinfo") from comment #6) > Comment on attachment 8483729 [details] [diff] [review] > Patch > > Review of attachment 8483729 [details] [diff] [review]: > ----------------------------------------------------------------- > > Good start. > Could you apply the modifications below, rebuild Firefox (`./mach built > toolkit/components/osfile` should be sufficient) and see whether the tests > still pass (`./mach xpcshell-test toolkit/components/osfile/test/xpcshell`)? Thanks David! And sure, I will re-run the tests after building. > > ::: toolkit/components/osfile/modules/osfile_async_front.jsm > @@ +30,5 @@ > > let SharedAll = {}; > > Cu.import("resource://gre/modules/osfile/osfile_shared_allthreads.jsm", SharedAll); > > Cu.import("resource://gre/modules/XPCOMUtils.jsm", this); > > Cu.import("resource://gre/modules/Timer.jsm", this); > > +Cu.import("resource://gre/modules/commonjs/toolkit/loader.js"); > > What's that doing here? Oops, I was testing something and forgot to remove it. This would be fixed in the next patch. > > @@ +152,5 @@ > > > > +/** > > + * Splits a long string into smaller chunks. > > + * > > + * This function returns small chunks of a large string encoded in > > Nit: "This generator yields small chunks..." > (it's not a function but a generator, denoted by `function*` and it doesn't > call `return` but `yield`) I will rephrase it. > > @@ +153,5 @@ > > +/** > > + * Splits a long string into smaller chunks. > > + * > > + * This function returns small chunks of a large string encoded in > > + * Uint8Array format. The size of each chunk is less than or equal to > > Actually, it doesn't encode anything. You encode in the caller. This one too. > > @@ +165,5 @@ > > +function* chunksOf(string, chunkSize) { > > + for (let index = 0; index < string.length; index += chunkSize) { > > + yield string.substring(index, index + chunkSize); > > + } > > + yield false; > > What's that last line for? Isn't it like that we need to return false as an indicator that splitting is over or generator automatically takes care of this? > > @@ +1034,5 @@ > > > > /** > > * Gets the number of bytes available on disk to the current user. > > * > > + * @param {string} Platform-specific path to a directory on the disk to > > Nit: Please don't fix whitespace in an unrelated part of the code, this > confuses the history for people who later try and understand the code. Actually I have configured my editor to remove trailing spaces on every save. I will take care of this in future. > > @@ +1224,5 @@ > > // - we take care of any |byteOffset|. > > let refObj = {}; > > TelemetryStopwatch.start("OSFILE_WRITEATOMIC_JANK_MS", refObj); > > + // If `buffer` string is quite large, we need to split it into smaller chunks. > > + if (typeof buffer == "string" && buffer.length > 1024) { > > Let's replace 1024 by a named constant. Ok. > > @@ +1230,5 @@ > > + let chunkify = chunksOf(buffer, CHUNK_SIZE); > > + let encoder = new TextEncoder(); > > + while (oneChunk = chunkify.next()) { > > + chunks.push(encoder.encode(oneChunk)); > > + } > > This would give the right result, but one of our objectives here is to take > advantage of Task.jsm to ensure that execution takes breaks during the > encoding. > > So let's rewrite this as follows (we'll deal with TelemetryStopWatch later): > > Task.spawn(function* () { > let bufferMsg; > if (typeof buffer == "string" && buffer.length > LARGE_BUFFER) { > // Bla > let chunks = []; > for (let oneChunk of chunksOf(buffer, CHUNK_SIZE)) { > chunks.push(encoder.encode(oneChunk)); > yield Promise.resolve(); // Defer the rest of the computation to the > next tick, to avoid jank > } > bufferMsg = {chunks: chunks}; > } else { > // Bla > bufferMsg = Type.void_t.in_ptr.toMsg(buffer); > } > return Scheduler.post(...) > }); I will re-write it :)
Comment 8•10 years ago
|
||
David, I have a situation here. If I run tests without `TelemetryStopwatch`, one of the test (test_writeAtomic) from `toolkit/components/osfile/test/xpcshell/test_telemetry.js` fails; everything goes fine with `TelemetryStopwatch`.
Attachment #8484610 -
Flags: feedback?(dteller)
Reporter | ||
Comment 9•10 years ago
|
||
Comment on attachment 8484610 [details] [diff] [review] Improved patch as per comment 6. Review of attachment 8484610 [details] [diff] [review]: ----------------------------------------------------------------- Sorry for the delay, I had a few busy days. Good progress. Now, one of the things this demonstrates is that our tests do not trigger chunkification yet, so we'll need to add a test that does that. Let's put that new test in test_read_write.js. Something along the lines of `add_test_pair(function* test_write_large() { // 1. Generate a string much larger than LARGE_BUFFER for (let encoding of [undefined, "utf8", "utf16"]) { // 2. Use OS.File.writeAtomic(path, largeString, {encoding: encoding}) to write it to some file // 3. Use OS.File.read(path, {encoding: ...}) to read it back // 4. Check that both strings are identical } });` PS: We'll handle Telemetry once the rest works. ::: toolkit/components/osfile/modules/osfile_async_front.jsm @@ +27,5 @@ > +// Defines the minimum length of a string required for splitting into > +// smaller chunks. > +const LARGE_BUFFER = 1024; > +// Defines the size of each chunk while splitting a large string. > +const CHUNK_SIZE = 1024; Note: At some point, we will probably decide to move these constants somewhere else in the source code. We'll decide once the rest is finalized. @@ +161,5 @@ > + * > + * @param {string} string String to split > + * @param {number} chunkSize Size of each chunk size > + * > + * @return {*} Small chunk of given size Actually, this would be `@return {string} ...`. @@ +167,5 @@ > +function* chunksOf(string, chunkSize) { > + for (let index = 0; index < string.length; index += chunkSize) { > + yield string.substring(index, index + chunkSize); > + } > +} Same here, this is a useful utility, so it will probably move somewhere else, eventually. @@ +1230,5 @@ > + if (typeof buffer == "string" && buffer.length > LARGE_BUFFER) { > + let chunks = []; > + let encoder = new TextEncoder(); > + for (let oneChunk of chunksOf(buffer, CHUNK_SIZE)) { > + chunks.push(encoder.encode(oneChunk)); I realize that we forget to take care of the encoding here. Once you have prepared your test, this will cause issues. Let's deal with this once you have the test that will let you monitor your progress. @@ +1240,5 @@ > + } > + }); > + return Scheduler.post("writeAtomic", > + [Type.path.toMsg(path), bufferMsg, options], > + [options, bufferMsg]); Unfortunately, that's not going to work. Whenever you call `yield Promise.resolve()`, you instruct the environment to continue executing the contents of your `Task.spawn(...)` after a few milliseconds. However, this does not prevent the `return Scheduler.post(...)` from being executed immediately. So, you will need to make the following changes: 1/ Put the `return Scheduler.post(...)` inside the call to `Task.spawn`, to ensure that its execution is also delayed until we have finished computing the contents of the message. 2/ `return Task.spawn(...)`, so that the `return Scheduler.post(...)` is propagated. ::: toolkit/components/osfile/modules/osfile_shared_allthreads.jsm @@ +453,5 @@ > if (msg == null) { > return null; > } > + if ("chunks" in msg) { > + return msg.chunks.join(""); Wait, that's wrong. `msg.chunks` is an array of `Uint8Array`. If you try to `join` them, this will return a string such as "[object Uint8Array],[object Uint8Array]". You need to 1/ create a new Uint8Array `result` whose size is the sum of sizes of all the arrays in `msg.chunks`; 2/ loop through all the arrays in `msg.chunks` and call `result.set(chunk, offset)` – see https://developer.mozilla.org/en-US/docs/Web/API/Uint8Array#set%28%29 for the definition of the relevant method.
Attachment #8484610 -
Flags: feedback?(dteller) → feedback+
Reporter | ||
Updated•10 years ago
|
Attachment #8483729 -
Attachment is obsolete: true
Comment 10•10 years ago
|
||
Attachment #8484610 -
Attachment is obsolete: true
Attachment #8487424 -
Flags: feedback?(dteller)
Reporter | ||
Comment 11•10 years ago
|
||
Comment on attachment 8487424 [details] [diff] [review] Improved patch as per Comment 9. Review of attachment 8487424 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/osfile/modules/osfile_async_front.jsm @@ +27,5 @@ > +// Defines the minimum length of a string required for splitting into > +// smaller chunks. > +const LARGE_BUFFER = 1024; > +// Defines the size of each chunk while splitting a large string. > +const CHUNK_SIZE = 1024; Before we proceed, let's make these constants public. This will make it easier for us to benchmark and try and find the best combination. The best place, I believe, is to move their definition to osfile_shared_allthreads.jsm, in an object `Config.writeAtomic` (add a field `writeAtomic` to this object `Config`, it already exists). You will be able to access them in this file as `SharedAll.Config.writeAtomic`. @@ +167,5 @@ > +function* chunksOf(string, chunkSize) { > + for (let index = 0; index < string.length; index += chunkSize) { > + yield string.substring(index, index + chunkSize); > + } > +} This also looks quite generic. Let's also put it in osfile_shared_allthreads.jsm. Don't forget to export it, as follows exports.chunksOf = chunksOf and to add it to `EXPORTED_SYMBOLS`. @@ +1221,5 @@ > }; > // Note: Type.void_t.out_ptr.toMsg ensures that > // - the buffer is effectively shared (not neutered) between both > // threads; > // - we take care of any |byteOffset|. Let's move the comment block to the line before the call to Type.void_t.in_ptr.toMsg @@ +1228,5 @@ > + let bufferMsg; > + // If string is large, split it in to smaller chunks. > + if (typeof buffer == "string" && buffer.length > LARGE_BUFFER) { > + let chunks = []; > + let encoder = new TextEncoder("utf-8"); Users can specify an encoding via a field `options.encoding`, so we should use let encoder = new TextEncoder(options.encoding || "utf-8") @@ +1231,5 @@ > + let chunks = []; > + let encoder = new TextEncoder("utf-8"); > + for (let oneChunk of chunksOf(buffer, CHUNK_SIZE)) { > + chunks.push(encoder.encode(oneChunk)); > + yield Promise.resolve(); Could you add a comment along the lines of "defer execution to the next tick"? ::: toolkit/components/osfile/modules/osfile_shared_allthreads.jsm @@ +458,5 @@ > + // Find the length of the final array. > + let resultLength = 0; > + for (let index = 0; index < msg.chunks.length; index++) { > + resultLength += msg.chunks[index].length; > + } Nit: the following is slightly nicer and less error=prone for (let chunk of msg.chunks) { ... } @@ +464,5 @@ > + let result = new Uint8Array(resultLength); > + for (let index = 0; index < msg.chunks.length; index++) { > + result.set(msg.chunks[index], offset); > + offset += msg.chunks[index].length; > + } Here, too. ::: toolkit/components/osfile/tests/xpcshell/test_read_write.js @@ +23,5 @@ > OS.File.writeAtomic(SHARED_PATH, string2); > let string3 = yield OS.File.read(SHARED_PATH, { encoding: "utf-8" }); > do_check_eq(string3, string2); > }); > In the test, you will need to call let SharedAll = {}; Cu.import("resource://gre/modules/osfile/osfile_shared_allthreads.jsm", SharedAll); to import SharedAll.Config.writeAtomic. @@ +25,5 @@ > do_check_eq(string3, string2); > }); > > +add_test_pair(function* test_write_large() { > + let string1 = "Initial state " + Math.random(); Could you add some special characters, to be sure that this works for non-ascii strings? e.g. éßµ•¥✓ℵ®ʤπ
Attachment #8487424 -
Flags: feedback?(dteller) → feedback+
Comment 12•10 years ago
|
||
Hi David, I tried fixing the test for `undefined` encoding but its still not working. Please have a look.
Attachment #8487424 -
Attachment is obsolete: true
Attachment #8489145 -
Flags: feedback?(dteller)
Reporter | ||
Comment 13•10 years ago
|
||
Comment on attachment 8489145 [details] [diff] [review] Making generic objects public. Review of attachment 8489145 [details] [diff] [review]: ----------------------------------------------------------------- Ah, right, your change to osfile_native.jsm indicates that we should handle `undefined` encoding a bit differently. Side-note: Could you add a version number when you upload a patch? This simplifies checking the list of changes between versions. ::: toolkit/components/osfile/modules/osfile_native.jsm @@ +42,5 @@ > // Sanity check on types of options > if ("encoding" in options && typeof options.encoding != "string") { > + if (options.encoding != undefined) { > + return Promise.reject(new TypeError("Invalid type for option encoding")); > + } Let's cancel this change. ::: toolkit/components/osfile/modules/osfile_shared_allthreads.jsm @@ +71,5 @@ > + writeAtomic: { > + /** > + * Defines the minimum length of a string required for splitting into > + * smaller chunks. > + * Nit: Please remove the whitespace on this line. @@ +77,5 @@ > + */ > + LARGE_BUFFER: 1024, > + > + /** > + * Defines the size of each chunk while splitting a large string. Could you mention that it's a size in characters (by opposition to bytes). @@ +78,5 @@ > + LARGE_BUFFER: 1024, > + > + /** > + * Defines the size of each chunk while splitting a large string. > + * Nit: Please remove the whitespace on this line. ::: toolkit/components/osfile/modules/osfile_shared_front.jsm @@ +334,5 @@ > } > if ("encoding" in options && typeof options.encoding != "string") { > + if (options.encoding != undefined) { > + throw new TypeError("Invalid type for option encoding"); > + } Let's cancel this change. ::: toolkit/components/osfile/tests/xpcshell/test_read_write.js @@ +26,5 @@ > do_check_eq(string3, string2); > }); > > +add_test_pair(function* test_write_large() { > + let string1 = "ABCDEFGHéßµ•¥✓ℵ®ʤπ
Attachment #8489145 -
Flags: feedback?(dteller) → feedback+
Comment 14•10 years ago
|
||
Attachment #8489145 -
Attachment is obsolete: true
Attachment #8491653 -
Flags: feedback?(dteller)
Reporter | ||
Comment 15•10 years ago
|
||
Comment on attachment 8491653 [details] [diff] [review] Optimize writeAtomic for very large strings, v4 Review of attachment 8491653 [details] [diff] [review]: ----------------------------------------------------------------- Ok, this looks good. Let's move on to the next step, i.e. getting rid of the race condition. You'll need to pull the latest mozilla-central, to get the fix for bug 1066619. This will let us simplify the code a little and, normally, get rid of the race. This fix makes it possible to pass a Promise as an argument to Scheduler.post (here, for `bufferMsg`). So: - move the call to `Scheduler.post` out of the `Task.spawn`; - in the case in which we don't need to split the string, just do `bufferMsg = Type.void_t.in_ptr.toMsg(buffer)`; - in the case in which we need to split the string, `bufferMsg = Task.spawn(function*() { ... return {chunks: chunks}; });` – this will return the message as Promise, which will be resolved by `Scheduler.post`. Once this works, we will move to: - actually transferring the data instead of copying it (which requires bug 1069577); - getting Telemetry back in place. ::: toolkit/components/osfile/tests/xpcshell/test_read_write.js @@ +31,5 @@ > + // Generate a string much greater than LARGE_BUFFER. > + while (string1.length < 4*1024) { > + string1 += string1 + string1; > + } > + for (let encoding of [/*undefined, */"utf-8", "utf-16"]) { Does this work if you uncomment `undefined`?
Attachment #8491653 -
Flags: feedback?(dteller) → feedback+
Reporter | ||
Comment 16•10 years ago
|
||
Did you get an opportunity to proceed?
Assignee: nobody → ashutoshdhundhara
Flags: needinfo?(ashutoshdhundhara)
Comment 17•10 years ago
|
||
Hi David, Sorry for the delay. Last week I had my exams. I have acted upon your comments. See if everything is fine or not. I have some doubts which I think can be best resolved on IRC. Also, the tests still fail for `undefined` encoding. When I looked into the test logs, I saw that in some tests the object containing the `chunks` is getting compared against the input string.
Attachment #8491653 -
Attachment is obsolete: true
Attachment #8496907 -
Flags: feedback?(dteller)
Flags: needinfo?(ashutoshdhundhara) → needinfo?(dteller)
Reporter | ||
Comment 18•10 years ago
|
||
Comment on attachment 8496907 [details] [diff] [review] Optimize writeAtomic for very large strings, v5 Review of attachment 8496907 [details] [diff] [review]: ----------------------------------------------------------------- No problem, exams are a good reason :) As for the encoding, yes, that's clearly my fault. `OS.File.read` returns a Uint8Array if we don't specify an encoding. That's a very rare usecase, but we had to keep it for compatibility reasons. I will let you adapt the call to OS.File.read so that it uses "utf-8" encoding for reading when `encoding` is `undefined`. After this, let's proceed with Telemetry. ::: toolkit/components/osfile/modules/osfile_shared_allthreads.jsm @@ +49,5 @@ > "isTypedArray", > "defineLazyGetter", > "offsetBy", > + "OS", // Warning: this exported symbol will disappear > + "chunksOf" Nit: Can you put "chunksOf" above "OS"?
Attachment #8496907 -
Flags: feedback?(dteller) → feedback+
Reporter | ||
Updated•10 years ago
|
Flags: needinfo?(dteller)
Reporter | ||
Comment 19•10 years ago
|
||
Ashutoshd, I haven't heard from you in some time. How are things progressing?
Flags: needinfo?(ashutoshdhundhara)
Reporter | ||
Comment 20•10 years ago
|
||
Unfortunately, it seems that ashutoshd is not working on this anymore :/ De-assigning bug.
Assignee: ashutoshdhundhara → nobody
Reporter | ||
Updated•8 years ago
|
Flags: needinfo?(ashutoshdhundhara)
Updated•8 years ago
|
Assignee: nobody → abhishekp.bugzilla
Updated•8 years ago
|
Status: NEW → ASSIGNED
Comment 21•8 years ago
|
||
Un-assigning myself. I am exploring contributing to Firefox for Android. In the meanwhile, anyone else interested can take up this bug.
Status: ASSIGNED → NEW
Updated•8 years ago
|
Assignee: abhishekp.bugzilla → nobody
Reporter | ||
Comment 22•4 years ago
|
||
OS.File is on its way out, closing bug.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX
Updated•11 months ago
|
Product: Toolkit → Toolkit Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•