Closed Bug 1454460 Opened Last year Closed Last year

Simplify BOM handling in StreamLoader

Categories

(Core :: CSS Parsing and Computation, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox61 --- fixed

People

(Reporter: bholley, Assigned: bholley)

References

Details

Attachments

(2 files)

MozReview-Commit-ID: 5zrKyadBppO
Attachment #8968328 - Flags: review?(bzbarsky)
This is the rebased patch from the other bug.

MozReview-Commit-ID: HCop9h2abZU
Attachment #8968345 - Flags: review?(bzbarsky)
Blocks: 1346988
No longer blocks: 1454030
Comment on attachment 8968328 [details] [diff] [review]
Buffer the potentially-BOM-related bytes separately and handle them on the fly. v1

>+   * For uninteresting reasons, the caller may have the data split into two

Why not just have CSS use the "streaming mode" SRICheckDataVerifier instead?  That's designed to handle multiple chunks, etc...

>+++ b/layout/style/SheetLoadData.h
>+                                   const nsACString& aPrefix,
>+                                   const nsACString& aRest,

This needs at least documentation about how aPrefix differs from aRest.

>+    if (encoding == UTF_8_ENCODING && Encoding::UTF8ValidUpTo(bytes) == bytes.Length()) {

This is losing an optimization the old code used to have: when we had a UTF8 encoding (at least via the BOM) but not all the data was valid, it would pass in the valid length to DecodeWithoutBOMHandling.  It seems like it should be possible to preserve that optimization.  Maybe something like:

  size_t validated = 0;
  if (encoding == UTF_8_ENCODING) {
    validated = Encoding::UTF8ValidUpTo(bytes);
  }

  if (validated == bytes.Length()) {
    // Either this is UTF-8 and all valid, or it's not UTF-8 but is an
    // empty string.  This assumes that an empty string in any encoding
    // decodes to empty string, which seems like a plausible assumption.
    utf8String.Assign(bytes);
  } else {
    rv = encoding->DecodeWithoutBOMHandling(bytes, utf8String, validated);
  }

or something.

>+++ b/layout/style/StreamLoader.h
>   nsCString mBytes;

Please add a comment about how this never contains the BOM when there is a BOM involved, just the actual text bytes.

>+  nsAutoCStringN<3> mFirstThreeBytes;

I'm torn between that and mBOM or something.  By the time we get to the end it has the latter meaning.  During load it can be the first 3, or 2, or 1 or 0 bytes...

Either way, we should document what's stored in this string at what stage of the load.

r=me
Attachment #8968328 - Flags: review?(bzbarsky) → review+
Comment on attachment 8968345 [details] [diff] [review]
Part 2 - Pass a bonafide nsACString to Servo. v2

>+  if (aUTF8.IsEmpty()) {

Worth testing && !aUTF16.IsEmpty()?  Or should we just not worry too much about the empty-data case and the extra "copy" we end up doing then?

r=me
Attachment #8968345 - Flags: review?(bzbarsky) → review+
Pushed by bholley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c92f6b8e882b
Buffer the potentially-BOM-related bytes separately and handle them on the fly. r=bz
https://hg.mozilla.org/integration/autoland/rev/985679bb1928
Pass a bonafide nsACString to Servo. r=bz
https://hg.mozilla.org/integration/autoland/rev/a368c8d6676b
Hoist SRI helper into Loader.cpp. r=bz
(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #6)
> Comment on attachment 8968328 [details] [diff] [review]
> Buffer the potentially-BOM-related bytes separately and handle them on the
> fly. v1
> 
> >+   * For uninteresting reasons, the caller may have the data split into two
> 
> Why not just have CSS use the "streaming mode" SRICheckDataVerifier instead?
> That's designed to handle multiple chunks, etc...

Per discussion on IRC, I'll add a third patch to move the helper to Loader.cpp.

> 
> >+++ b/layout/style/SheetLoadData.h
> >+                                   const nsACString& aPrefix,
> >+                                   const nsACString& aRest,
> 
> This needs at least documentation about how aPrefix differs from aRest.

Done. I also renamed to aBytes1 and aBytes2, which I've decided is clearer.

> 
> >+    if (encoding == UTF_8_ENCODING && Encoding::UTF8ValidUpTo(bytes) == bytes.Length()) {
> 
> This is losing an optimization the old code used to have: when we had a UTF8
> encoding (at least via the BOM) but not all the data was valid, it would
> pass in the valid length to DecodeWithoutBOMHandling.  It seems like it
> should be possible to preserve that optimization.  Maybe something like:

ok.

> 
> >+++ b/layout/style/StreamLoader.h
> >   nsCString mBytes;
> 
> Please add a comment about how this never contains the BOM when there is a
> BOM involved, just the actual text bytes.
> 
> >+  nsAutoCStringN<3> mFirstThreeBytes;
> 
> I'm torn between that and mBOM or something.  By the time we get to the end
> it has the latter meaning.  During load it can be the first 3, or 2, or 1 or
> 0 bytes...

I'll make it mBOMBytes.

> 
> Either way, we should document what's stored in this string at what stage of
> the load.

Done.

(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #7)
> Comment on attachment 8968345 [details] [diff] [review]
> Part 2 - Pass a bonafide nsACString to Servo. v2
> 
> >+  if (aUTF8.IsEmpty()) {
> 
> Worth testing && !aUTF16.IsEmpty()?  Or should we just not worry too much
> about the empty-data case and the extra "copy" we end up doing then?

I'll just switch it to !aUTF16.IsEmpty() and MOZ_ASSERT(aUTF16.IsEmpty() || aUTF8.IsEmpty());
You need to log in before you can comment on or make changes to this bug.