Closed Bug 1287246 Opened 3 years ago Closed 3 years ago

Add yielding support to StreamingLexer

Categories

(Core :: ImageLib, defect)


Tracking


RESOLVED FIXED
mozilla50
Tracking Status
firefox50 --- fixed

People

(Reporter: seth, Assigned: seth)

References

Details

Attachments

(5 files, 7 obsolete files)

31.55 KB, patch
Details | Diff | Splinter Review
5.77 KB, patch
Details | Diff | Splinter Review
24.93 KB, patch
Details | Diff | Splinter Review
4.99 KB, patch
Details | Diff | Splinter Review
23.22 KB, patch
Details | Diff | Splinter Review
Alright, this is what all of the recent refactorings have been working towards. In this bug we'll add support for yielding to StreamingLexer. This will in turn allow us to decode animated images one frame at a time, which will enable a drastic reduction in memory usage for animated images.
The first step is to provide a way to communicate to Decoder code that a yield
has happened. We already have a LexerResult type; this patch just propagates it
out far enough that Decoder::Decode() can get ahold of it.

There can be multiple reasons to yield that are semantically different. For that
reason, this patch also adds a Yield enumeration, to which we can add reasons
for yielding as needed. In this patch, the only value is Yield::NEED_MORE_DATA,
which takes the place of a |Nothing()| value in the old |Maybe<TerminalState>|
API.
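To make the API shape concrete, here's a rough sketch of a LexerResult that carries either a TerminalState or a Yield reason. This is not the actual Mozilla code; the TerminalState values and the IsTerminal()/IsYield() accessors are assumptions made for illustration.

```cpp
#include <cassert>

// Sketch only: real definitions live in image/StreamingLexer.h and differ.
enum class TerminalState { SUCCESS, FAILURE };

enum class Yield
{
  NEED_MORE_DATA  // The lexer cannot continue without more data.
};

// A lex step now either reaches a terminal state or yields for a reason,
// replacing the old |Maybe<TerminalState>| where Nothing() meant
// "need more data".
class LexerResult
{
public:
  explicit LexerResult(TerminalState aState)
    : mIsTerminal(true), mTerminal(aState) { }
  explicit LexerResult(Yield aYield)
    : mIsTerminal(false), mYield(aYield) { }

  bool IsTerminal() const { return mIsTerminal; }
  bool IsYield() const { return !mIsTerminal; }

private:
  bool mIsTerminal;
  union {
    TerminalState mTerminal;
    Yield mYield;
  };
};
```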
Attachment #8771632 - Flags: review?(n.nethercote)
Attached patch (Part 1b) - Update tests. (obsolete) — Splinter Review
The tests need to be updated to reflect the change in part 1a. This is all
pretty mechanical.
Attachment #8771633 - Flags: review?(n.nethercote)
This is a pretty small change: we're going to add another piece of state for
unbuffered reads in part 3, so it'll be handy to keep them together and manage
them atomically. This patch adds an UnbufferedState struct that lets us do just
that.
Attachment #8771635 - Flags: review?(n.nethercote)
Here's the meat of the bug. I wish I could've broken this out into smaller
pieces, but it's not easy.

You can read the newly added documentation to understand how the yielding API is
meant to work, and I've tried to add comments wherever anything in the code
might be unclear, so hopefully the patch itself will serve as a pretty good
explanation of what's going on here. The structure of the code has changed a
little in order to make things as easy to understand as possible.

It's discussed in the patch, but I wanted to discuss here as well the reason
that after yielding, we return the same data. (Or the same data minus whatever
has already been consumed, in the unbuffered case.) I want to make it easy for
someone writing a decoder to support the pattern "get some data, yield to allow
the caller to do something with the data or maybe even terminate decoding, and
then come back and keep processing the data". As an example, imagine if we
implement metadata decodes with yielding, so that decoders don't need to know if
they're performing a metadata decode or not: we can read the header, yield, and
then continue processing the data in the header (for example by allocating
textures) if the caller decides to continue processing. If we don't have a way
to get back the same data that the state we yielded from had, doing this will be
all kinds of tricky and complicated in many cases. I want to minimize the
statefulness of the decoders, and this design reduces the amount of state the
decoders need to keep.

This is even more crucial in the unbuffered case; yielding would be useless if
we couldn't return the same data. Consider the PNG and JPEG decoders, which are
currently implemented with one giant unbuffered read. If we want to allow
yielding from these decoders, we have to be able to say "I consumed X bytes, so
drop those X bytes, but give me back the remaining bytes that I haven't yet
consumed". We *do*, in fact, want to start yielding from those decoders, and
that's why we need the ability to get the same data back for unbuffered reads.

Hopefully this explains why yielding works this way, because I know it might not
be obvious at first glance. Of course, it's probably clear that if you *don't*
need the data from the state you yielded from, you can trivially just jump to
another state, so the convenience level remains high even in that case. Easy peasy.
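As a toy illustration of the redelivery contract described above: after a state yields, the state it yielded to receives the same data again, no matter what the caller feeds in next. This is a simplified model, not the real StreamingLexer; ToyLexer, Feed, and Action are invented names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// A lexer state either continues to the next state with fresh data,
// yields (pausing; on resume the *next* state sees the same bytes
// again), or finishes.
enum class Action { CONTINUE, YIELD, DONE };

struct ToyLexer
{
  using StateFn = std::function<Action(const std::vector<int>&)>;

  std::vector<StateFn> mStates;
  size_t mCurrentState = 0;
  bool mRedeliver = false;      // set when the last step yielded
  std::vector<int> mLastChunk;  // data to redeliver after a yield

  // Feed one chunk; returns true when the state machine finishes.
  bool Feed(const std::vector<int>& aChunk)
  {
    // After a yield, the resumed state gets the previous chunk again.
    const std::vector<int>& data = mRedeliver ? mLastChunk : aChunk;
    mRedeliver = false;
    Action action = mStates[mCurrentState](data);
    switch (action) {
      case Action::YIELD:
        mLastChunk = data;
        mRedeliver = true;
        [[fallthrough]];  // a yield also advances to the next state
      case Action::CONTINUE:
        ++mCurrentState;
        return false;
      case Action::DONE:
        return true;
    }
    return true;
  }
};
```

The point of the model: the second chunk the caller feeds is ignored, because the post-yield state is owed the data the yielding state saw.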
Attachment #8771637 - Flags: review?(n.nethercote)
Alright, if we want to trust this yielding stuff works as intended, we need to
be sure that yielding is giving us the same data that the state we yielded from
got. That's a lot easier to believe if the data isn't a repeated [1, 2, 3]
pattern, so this is a quick patch to make every byte in the test array in
TestStreamingLexer unique.
Attachment #8771638 - Flags: review?(n.nethercote)
Here are some new tests. The tests are mostly for the new yielding code,
although I also added a test or two related to the edge case of what state you
start StreamingLexer in. (I'm sure you noticed from the code in part 3 that I
gave this some thought.)
Attachment #8771639 - Flags: review?(n.nethercote)
Assignee: nobody → seth
Fixed a warning in the test code.
Attachment #8771806 - Flags: review?(n.nethercote)
Attachment #8771639 - Attachment is obsolete: true
Attachment #8771639 - Flags: review?(n.nethercote)
Comment on attachment 8771632 [details] [diff] [review]
(Part 1a) - Expose LexerResult from the StreamingLexer API and add an explicit Yield type.

Review of attachment 8771632 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/Decoder.cpp
@@ +121,5 @@
>      PROFILER_LABEL("ImageDecoder", "Decode", js::ProfileEntry::Category::GRAPHICS);
>      AutoRecordDecoderTelemetry telemetry(this);
>  
> +    return DoDecode(*mIterator, aOnResume);
> +  }();

Is using the lambda necessary here? Seems like an overly-clever way of doing things. Please remove it if possible.

::: image/StreamingLexer.h
@@ +299,5 @@
>            // the network yet. We don't have to do anything special; the
>            // SourceBufferIterator will ensure that |aOnResume| gets called when
>            // more data is available.
> +          result = Some(LexerResult(Yield::NEED_MORE_DATA));
> +          break;

With this change we'll go back around the loop; before we didn't. Is that intended?
Attachment #8771632 - Flags: review?(n.nethercote) → review+
Attachment #8771633 - Flags: review?(n.nethercote) → review+
Comment on attachment 8771635 [details] [diff] [review]
(Part 2) - Store the state for unbuffered reads in a separate structure in StreamingLexer.

Review of attachment 8771635 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/StreamingLexer.h
@@ +361,4 @@
>        MOZ_ASSERT(mTransition.UnbufferedState() ==
>                     unbufferedTransition.NextState());
> +      mUnbufferedState->mBytesRemaining -=
> +        std::min(mUnbufferedState->mBytesRemaining, aIterator.Length());

The introduction of the min() call here changes the functionality slightly. I think you're just being more cautious, to make sure mBytesRemaining never goes negative. Is that right? A comment about this might be useful, possibly with a reference to the new assertion at the top of the function.
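For reference, the clamped-subtraction idiom being discussed looks like this. SubtractClamped is an invented name for illustration; the patch inlines the expression.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// |mBytesRemaining| is unsigned, so a plain subtraction would wrap
// around on underflow. Subtracting at most the current value clamps the
// counter at zero instead.
size_t SubtractClamped(size_t aRemaining, size_t aConsumed)
{
  return aRemaining - std::min(aRemaining, aConsumed);
}
```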
Attachment #8771635 - Flags: review?(n.nethercote) → review+
Comment on attachment 8771637 [details] [diff] [review]
(Part 3) - Add support for yielding to StreamingLexer.

Review of attachment 8771637 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/StreamingLexer.h
@@ +49,5 @@
>  /// Possible yield reasons for the lexer.
>  enum class Yield
>  {
> +  NEED_MORE_DATA,   // The lexer cannot continue without more data.
> +  OUTPUT_AVAILABLE  // There is output available for the caller to consume.

|Yield| and |ControlFlowStrategy| are isomorphic. Will this always be the case? Is having them as separate types important?

@@ +214,5 @@
> +  /**
> +   * Continue receiving unbuffered data. @aUnbufferedState should be the same
> +   * state as the @aUnbufferedState specified in the preceding call to
> +   * ToUnbuffered(). @aSize indicates the amount of data that has already been
> +   * consumed; the next state will same data that the current state delivered,

Garbled sentence: "the next state will same data".

@@ +357,5 @@
> +      // semantically (since yield states are supposed to deliver the same data
> +      // as previous states, and there's no previous state here), but more
> +      // importantly, it's necessary to advance a SourceBufferIterator at least
> +      // once before you can read from it, and adding the necessary checks to
> +      // avoid that issue to Lex() has the potential of masking real bugs. So

Moving the "to Lex()" after "checks" will make this sentence easier to parse, IMO.

@@ +383,5 @@
> +    // If the lexer requested a yield last time, we deliver the same data again
> +    // before we read anything else from |aIterator|. Note that although to the
> +    // callers of Lex(), Yield::NEED_MORE_DATA is just another type of yield,
> +    // internally they're different in that we don't redeliver the same data in
> +    // the Yield::NEED_MORE_DATA case, and |mYieldingToState| to state is not

"to state" doesn't make sense here. Omit?

@@ +385,5 @@
> +    // callers of Lex(), Yield::NEED_MORE_DATA is just another type of yield,
> +    // internally they're different in that we don't redeliver the same data in
> +    // the Yield::NEED_MORE_DATA case, and |mYieldingToState| to state is not
> +    // set. This means that for Yield::NEED_MORE_DATA, we just directly to the
> +    // loop below.

Garbled sentence: "we just directly to the".

@@ +488,5 @@
> +    // chunk. Make the necessary adjustments. (Note that the std::min call is
> +    // just belt-and-suspenders to keep this code memory safe even if there's
> +    // a bug somewhere.)
> +    const size_t toSkip =
> +      std::min(mUnbufferedState->mBytesConsumedInCurrentChunk, aIterator.Length());

Adding this kind of belt-and-suspenders comment would be a nice way to address my std::min comment from the previous patch's review.

@@ +524,5 @@
> +
> +    MOZ_ASSERT(mTransition.UnbufferedState() ==
> +                 unbufferedTransition.NextState());
> +
> +    // Perform bookkeeping.

Fun fact: "bookkeeping" is a word with three double letters in a row.

@@ +532,5 @@
> +    }
> +
> +    MOZ_ASSERT(unbufferedTransition.Size() == 0);
> +    mUnbufferedState->mBytesRemaining -=
> +      std::min(mUnbufferedState->mBytesRemaining, aLength);

Another brief comment about the std::min call?
Attachment #8771637 - Flags: review?(n.nethercote) → review+
Attachment #8771638 - Flags: review?(n.nethercote) → review+
Comment on attachment 8771806 [details] [diff] [review]
(Part 5) - Add GTests for StreamingLexer's support for yielding.

Review of attachment 8771806 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/test/gtest/TestStreamingLexer.cpp
@@ +403,5 @@
> +    mSourceBuffer->Append(mData + i, 1);
> +    LexerResult result = mLexer.Lex(mIterator, mCountResumes, DoLexWithYield);
> +
> +    switch (i) {
> +      case 2:

I don't understand why 2 gets special treatment here.

@@ +681,5 @@
> +
> +  // We expect 2 resumes because TestState::ONE consumes 3 bytes and then
> +  // yields. When the lexer resumes at TestState::TWO, which receives the same 3
> +  // bytes, TerminateSuccess() gets called immediately.  That's three bytes
> +  // total,  which are delivered one at a time, requiring 2 resumes.

Nit: extra space after the comma.
Attachment #8771806 - Flags: review?(n.nethercote) → review+
BTW: thank you for all the lovely comments in these patches.
Thanks for the quick review! Some comments before I upload the updated patches:

(In reply to Nicholas Nethercote [:njn] from comment #10)
> Is using the lambda necessary here? Seems like an overly-clever way of doing
> things. Please remove it if possible.

I was wondering if I was going to get away with that. =) It's not necessary; the code is just a little uglier without it due to the necessity of initializing LexerResult with an arbitrary value. That can also be avoided by breaking that code out into a separate method, of course, so I may just do that.

> 
> ::: image/StreamingLexer.h
> @@ +299,5 @@
> >            // the network yet. We don't have to do anything special; the
> >            // SourceBufferIterator will ensure that |aOnResume| gets called when
> >            // more data is available.
> > +          result = Some(LexerResult(Yield::NEED_MORE_DATA));
> > +          break;
> 
> With this change we'll go back around the loop; before we didn't. Is that
> intended?

We'll drop out of the loop at the bottom, since |result| is a |Some()| value. I figured we may as well handle all of the cases the same way now that LexerResult can actually express the "need more data" concept.
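The control flow being described (every case just fills in |result|, and the loop exits at the bottom once it holds a value) can be sketched like this, using std::optional in place of Mozilla's Maybe. RunUntilResult and Outcome are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

enum class Outcome { NEED_MORE_DATA, DONE };

// Walk a sequence of "is data available?" flags; stop at the first gap
// or at the end. No case breaks out directly: each one assigns |result|
// and lets the loop condition at the bottom drop us out.
Outcome RunUntilResult(const std::vector<bool>& aHaveData)
{
  std::optional<Outcome> result;
  size_t i = 0;
  do {
    if (i >= aHaveData.size() || !aHaveData[i]) {
      // No more data available yet: record it; the loop exits below.
      result = Outcome::NEED_MORE_DATA;
    } else if (i + 1 == aHaveData.size()) {
      result = Outcome::DONE;
    }
    ++i;
  } while (!result);
  return *result;
}
```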

(In reply to Nicholas Nethercote [:njn] from comment #11)
> The introduction of the min() call here changes the functionality slightly.
> I think you're just being more cautious, to make sure mBytesRemaining never
> goes negative. Is that right? A comment about this might be useful, possibly
> with a reference to the new assertion at the top of the function.

Yep, that's right. I'll discuss this more in a followup comment, but I ended up pulling this code out into a separate method.

(In reply to Nicholas Nethercote [:njn] from comment #12)
> |Yield| and |ControlFlowStrategy| are isomorphic. Will this always be the
> case? Is having them as separate types important?

I don't expect them to always remain isomorphic, no. The idea is that we can expand this enum to provide more reasons for yielding, which will make calling code clearer and should make it easier to add assertions. In particular, Yield::METADATA_AVAILABLE seems like a natural extension. I haven't spent any time thinking this through in detail, but I think we can probably make decoders not even know whether they exist for metadata decoding or a normal decode; they can just yield, and continue their work if the caller wants to let them.
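To illustrate, here's a hypothetical caller-side dispatch if the enum grew the suggested value. Everything beyond Yield::NEED_MORE_DATA here is speculative: METADATA_AVAILABLE, CallerAction, and HandleYield are not part of the patch.

```cpp
#include <cassert>

enum class Yield { NEED_MORE_DATA, METADATA_AVAILABLE };

enum class CallerAction { WAIT_FOR_DATA, STOP_AFTER_METADATA, RESUME };

CallerAction HandleYield(Yield aReason, bool aMetadataDecodeOnly)
{
  switch (aReason) {
    case Yield::NEED_MORE_DATA:
      return CallerAction::WAIT_FOR_DATA;
    case Yield::METADATA_AVAILABLE:
      // The decoder doesn't know which kind of decode this is; the
      // caller decides whether to stop at the metadata or let decoding
      // continue.
      return aMetadataDecodeOnly ? CallerAction::STOP_AFTER_METADATA
                                 : CallerAction::RESUME;
  }
  return CallerAction::RESUME;  // unreachable; silences warnings
}
```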

(In reply to Nicholas Nethercote [:njn] from comment #14)
> BTW: thank you for all the lovely comments in these patches.

Glad they helped!
(In reply to Nicholas Nethercote [:njn] from comment #13)
> Comment on attachment 8771806 [details] [diff] [review]
> (Part 5) - Add GTests for StreamingLexer's support for yielding.
> 
> Review of attachment 8771806 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: image/test/gtest/TestStreamingLexer.cpp
> @@ +403,5 @@
> > +    mSourceBuffer->Append(mData + i, 1);
> > +    LexerResult result = mLexer.Lex(mIterator, mCountResumes, DoLexWithYield);
> > +
> > +    switch (i) {
> > +      case 2:
> 
> I don't understand why 2 gets special treatment here.

That's the place where DoLexWithYield() actually yields, so we check for it here.
Here's an updated version of part 1. The review comments are addressed, and parts
1a and 1b are folded together. I decided to just initialize the LexerResult
value in Decoder::Decode() with TerminalState::FAILURE, which eliminates the
need for the lambda. For some reason that option seemed grosser to me Friday
night than it does now. =)
Attachment #8771632 - Attachment is obsolete: true
Attachment #8771633 - Attachment is obsolete: true
Attachment #8771635 - Attachment is obsolete: true
This updated version of part 3 addresses the review comments. There's one other
change, too. I noticed a bug when writing the tests for my next (and final,
don't worry =) addition to StreamingLexer. It was possible to trip up the code
as follows:

1. Start an unbuffered read. (Say it's for 3 bytes.)

2. Yield after only partially consuming the current chunk of the read. (Say the
   chunk holds all 3 bytes, but you only consumed 2.)

3. When you return to the unbuffered state after yielding, don't yield again, so
   that the current chunk of the read gets finished. (In this case, you'd
   consume the remaining byte, completing the unbuffered read.)

If you did this with the previous version of the code, the bookkeeping (now I
can't unsee the double letters!) would be wrong: we'd end up subtracting *1* from
|mBytesRemaining|, not 3, so we'd wrongly think we still needed to read 2 more
bytes. The issue is that we subtracted off the amount of data we consumed on the
final read, which, because we yielded earlier, was not necessarily the entire
chunk. Effectively we ended up forgetting about the data that we consumed when
we yielded.

This bug got introduced because of a confusion in the code between the length of
the data that we're passing to the lexer function, and the length of the current
chunk. The fix *could've* been a one-liner - I just needed to change the value
that gets subtracted from |mBytesRemaining|. However, when I was fixing this
issue, I noticed that we should've been updating |mBytesRemaining| in the early
return in UnbufferedReadAfterYield() as well. I decided instead to pull the
update out into its own little method, FinishCurrentChunkOfUnbufferedRead(),
which has the simultaneous advantages of clarifying what's happening by its
name, giving me more space to write comments (including clarifying what the
|std::min| calls are for), and letting me explicitly name the parameter
|aChunkLength| for further self-documentation.

So the fix could've been a one-liner (well, two one-liners) but I made it more
like a 15-liner to try to make the code more clear so similar mistakes don't
happen again.
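The bookkeeping fix can be modeled in miniature. This mirrors the description above but is a sketch, not the actual patch; the struct layout is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

struct UnbufferedState
{
  size_t mBytesRemaining = 0;
  size_t mBytesConsumedInCurrentChunk = 0;
};

// Subtract the *whole chunk* from the running total, not just whatever
// the final (post-yield) lexer call consumed; data consumed before a
// yield is part of the same chunk. The std::min call is
// belt-and-suspenders against underflow of the unsigned counter.
void FinishCurrentChunkOfUnbufferedRead(UnbufferedState& aState,
                                        size_t aChunkLength)
{
  aState.mBytesRemaining -=
    std::min(aState.mBytesRemaining, aChunkLength);
  aState.mBytesConsumedInCurrentChunk = 0;
}
```

With the pre-fix logic (subtracting only the final call's consumption), the reproduction above would leave 2 bytes "remaining" after a completed 3-byte read.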
Attachment #8771637 - Attachment is obsolete: true
Attachment #8771638 - Attachment is obsolete: true
I updated this part by adding a few new tests inspired by the issue I discussed
above. There are a couple of paranoia tests but the important new one is
ChunkPerStateWithUnbufferedYield(), which is a combination I hadn't covered
before, and which would've exposed the issue immediately. The step-by-step
instructions for reproducing it in part 3 are taken directly from this test.
Attachment #8771806 - Attachment is obsolete: true
Blocks: 1287367
Pushed by mfowler@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3ce690bdd3a8
(Part 1) - Expose LexerResult from the StreamingLexer API and add an explicit Yield type. r=njn
https://hg.mozilla.org/integration/mozilla-inbound/rev/d858390c1386
(Part 2) - Store the state for unbuffered reads in a separate structure in StreamingLexer. r=njn
https://hg.mozilla.org/integration/mozilla-inbound/rev/1888681263a2
(Part 3) - Add support for yielding to StreamingLexer. r=njn
https://hg.mozilla.org/integration/mozilla-inbound/rev/faf0c4549a12
(Part 4) - Make the data checking in TestStreamingLexer more discerning. r=njn
https://hg.mozilla.org/integration/mozilla-inbound/rev/c8ba6a6eec22
(Part 5) - Add GTests for StreamingLexer's support for yielding. r=njn
Blocks: 1287691