Last Comment Bug 541496 - Always provide strings rather than streams to the CSS parser
: Always provide strings rather than streams to the CSS parser
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: CSS Parsing and Computation (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla6
Assigned To: Zack Weinberg (:zwol)
:
Mentors:
Depends on: 682706 523496 670553
Blocks: 543151
  Show dependency treegraph
 
Reported: 2010-01-22 13:55 PST by Zack Weinberg (:zwol)
Modified: 2011-08-30 10:51 PDT (History)
11 users (show)
bzbarsky: in‑testsuite-
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
proposed patch (72.91 KB, patch)
2010-01-22 16:15 PST, Zack Weinberg (:zwol)
no flags Details | Diff | Splinter Review
patch v1a (rediff, delete a now-unused macro) (74.48 KB, patch)
2010-01-25 18:04 PST, Zack Weinberg (:zwol)
no flags Details | Diff | Splinter Review
Tp4 stylesheet size distribution (61.17 KB, image/png)
2010-01-26 10:27 PST, Zack Weinberg (:zwol)
no flags Details
v2: avoid a copy for inline style (88.49 KB, patch)
2010-01-26 13:25 PST, Zack Weinberg (:zwol)
no flags Details | Diff | Splinter Review
v2a: avoid a copy for inline style (89.16 KB, patch)
2010-01-26 13:31 PST, Zack Weinberg (:zwol)
bzbarsky: review-
Details | Diff | Splinter Review
v3 part 1: assume infallible malloc (48.22 KB, patch)
2010-03-04 15:06 PST, Zack Weinberg (:zwol)
no flags Details | Diff | Splinter Review
v3 part 2: don't use the low-level error to signal namespace lookup failure (11.21 KB, patch)
2010-03-04 15:08 PST, Zack Weinberg (:zwol)
no flags Details | Diff | Splinter Review
v3 part 3: don't use streams (47.58 KB, patch)
2010-03-04 15:11 PST, Zack Weinberg (:zwol)
no flags Details | Diff | Splinter Review
better handling of namespace lookup failure [checkin: comment 33] (8.81 KB, patch)
2010-03-05 10:49 PST, Zack Weinberg (:zwol)
zackw: review+
Details | Diff | Splinter Review
part 0: kill nsIUnicharStreamListener [checkin: comment 58] (20.77 KB, patch)
2011-05-04 15:19 PDT, Zack Weinberg (:zwol)
hsivonen: review+
Details | Diff | Splinter Review
part 1 (INCOMPLETE): revamp nsIUnicharStreamLoader (19.22 KB, patch)
2011-05-04 15:23 PDT, Zack Weinberg (:zwol)
no flags Details | Diff | Splinter Review
part 0a: whitespace cleanup on style/Loader.cpp [checkin: comment 58] (31.19 KB, patch)
2011-05-05 11:53 PDT, Zack Weinberg (:zwol)
dbaron: review+
Details | Diff | Splinter Review
part 1: revamp nsIUnicharStreamLoader and change css::Loader to use it (53.71 KB, patch)
2011-05-05 11:57 PDT, Zack Weinberg (:zwol)
no flags Details | Diff | Splinter Review
part 1 v2: revamp nsIUnicharStreamLoader and change css::Loader to use it (55.90 KB, patch)
2011-05-05 20:38 PDT, Zack Weinberg (:zwol)
no flags Details | Diff | Splinter Review
v4 part 1 (6.78 KB, patch)
2011-05-07 15:06 PDT, Zack Weinberg (:zwol)
bzbarsky: review+
Details | Diff | Splinter Review
v4 part 2 (8.36 KB, patch)
2011-05-07 15:08 PDT, Zack Weinberg (:zwol)
bzbarsky: review+
Details | Diff | Splinter Review
v4 part 3 (9.48 KB, patch)
2011-05-07 15:10 PDT, Zack Weinberg (:zwol)
bzbarsky: review+
Details | Diff | Splinter Review
v4 part 4 (25.89 KB, patch)
2011-05-07 15:12 PDT, Zack Weinberg (:zwol)
bzbarsky: review+
Details | Diff | Splinter Review
v4 part 5 (9.06 KB, patch)
2011-05-07 15:14 PDT, Zack Weinberg (:zwol)
bzbarsky: review-
Details | Diff | Splinter Review
v4 part 6 (13.88 KB, patch)
2011-05-07 15:17 PDT, Zack Weinberg (:zwol)
bzbarsky: review+
Details | Diff | Splinter Review
v5 part 1 (11.13 KB, patch)
2011-05-16 16:15 PDT, Zack Weinberg (:zwol)
bzbarsky: review+
Details | Diff | Splinter Review
v5 part 2 (8.38 KB, patch)
2011-05-16 16:16 PDT, Zack Weinberg (:zwol)
zackw: review+
Details | Diff | Splinter Review
v5 part 3 (10.14 KB, patch)
2011-05-16 16:18 PDT, Zack Weinberg (:zwol)
bzbarsky: review+
Details | Diff | Splinter Review
v5 part 4 (26.08 KB, patch)
2011-05-16 16:21 PDT, Zack Weinberg (:zwol)
bzbarsky: review+
Details | Diff | Splinter Review
v5 part 5 (8.30 KB, patch)
2011-05-16 16:28 PDT, Zack Weinberg (:zwol)
bzbarsky: review+
Details | Diff | Splinter Review
v5 part 6 (9.10 KB, patch)
2011-05-16 16:32 PDT, Zack Weinberg (:zwol)
zackw: review+
Details | Diff | Splinter Review

Description Zack Weinberg (:zwol) 2010-01-22 13:55:34 PST
The CSS parser currently can process either a string (nsAString, specifically) or a stream (nsIUnicharInputStream).  Almost all entry points pass a string.  It would simplify the parser slightly, and the scanner considerably, if we always passed a string; this would also facilitate lazy generation of CSS error messages.

Note that the CSSLoader already reads the entire stylesheet from the network before invoking the parser, so there is no loss of parallelism.
Comment 1 Zack Weinberg (:zwol) 2010-01-22 16:15:42 PST
Created attachment 423093 [details] [diff] [review]
proposed patch
Comment 2 Boris Zbarsky [:bz] 2010-01-22 16:58:22 PST
If we do this, we can also remove the pushback buffer, right?  I assume that will happen in followups?

I assume the main drawback here is having to create that contiguous buffer for the string; something we avoided doing before?  Have we tried gathering data on how big that buffer tends to be in practice?
Comment 3 Zack Weinberg (:zwol) 2010-01-22 17:22:27 PST
(In reply to comment #2)
> If we do this, we can also remove the pushback buffer, right?  I assume that
> will happen in followups?

I hadn't thought of that.  It would be nice, but the main read buffer is treated as constant, and I don't know if we always push back exactly the characters we just read.  I can probably ensure that the scanner does, but the parser uses Pushback() too... Definitely for a followup.

> I assume the main drawback here is having to create that contiguous buffer for
> the string; something we avoided doing before?

You know, I'm not sure we *were* avoiding that before, given that we always wait till OnStreamComplete to start parsing - isn't there a copy in the necko cache at least?

> Have we tried gathering data on how big that buffer tends to be in practice?

That would be an interesting question, yeah.  Is that a query you can run on the Google database, cevans?  (Size distribution of CSS files.)
Comment 4 Boris Zbarsky [:bz] 2010-01-22 18:04:15 PST
> I don't know if we always push back exactly the characters we just read

Hmm.  In theory we might not, and more importantly what we read is not what's in the buffer (e.g. a lone '\r' in the buffer).  So yeah, something to look into later.

> You know, I'm not sure we *were* avoiding that before, given that we always
> wait till OnStreamComplete to start parsing - isn't there a copy in the necko
> cache at least?

There isn't.  nsUnicharStreamLoader dumps the incoming data into effectively a list of 4096 byte buffers (an nsIPipe, to be exact).  Then the converter input stream converts data into its internal buffer as it goes; every time we ask it to fill it just fills that internal buffer.
Comment 5 Zack Weinberg (:zwol) 2010-01-25 18:04:52 PST
Created attachment 423462 [details] [diff] [review]
patch v1a (rediff, delete a now-unused macro)

CSS_BUFFER_SIZE was used only by code deleted by this patch, so it should go too.
Comment 6 Zack Weinberg (:zwol) 2010-01-25 18:07:11 PST
(In reply to comment #4)
> nsUnicharStreamLoader dumps the incoming data into effectively a
> list of 4096 byte buffers (an nsIPipe, to be exact).  Then the converter input
> stream converts data into its internal buffer as it goes; every time we ask it
> to fill it just fills that internal buffer.

Okay, so we had the whole thing in memory but not as one contiguous block and not pre-converted, I get it now.

My *real* goal in wanting the entire sheet in one big buffer is to enable lazy calculation of line numbers for error messages; so I hope this turns out to be an acceptable tradeoff.
Comment 7 Boris Zbarsky [:bz] 2010-01-25 19:37:14 PST
I'd still like to see some numbers for the buffer sizes involved, if just from our own Tp run (on try).
Comment 8 Zack Weinberg (:zwol) 2010-01-25 20:00:08 PST
I have just pushed my queue to try with a patch on top to report buffer sizes, and will analyze the results tomorrow morning.  Here are the sorted results for browser startup + Gmail + LiveJournal.

CSS::ParseSheet: 78 bytes
CSS::ParseSheet: 102 bytes
CSS::ParseSheet: 184 bytes
CSS::ParseSheet: 258 bytes
CSS::ParseSheet: 318 bytes
CSS::ParseSheet: 346 bytes
CSS::ParseSheet: 490 bytes
CSS::ParseSheet: 1226 bytes
CSS::ParseSheet: 1336 bytes
CSS::ParseSheet: 1340 bytes
CSS::ParseSheet: 1522 bytes
CSS::ParseSheet: 1924 bytes
CSS::ParseSheet: 2212 bytes
CSS::ParseSheet: 2552 bytes
CSS::ParseSheet: 2962 bytes
CSS::ParseSheet: 3624 bytes
CSS::ParseSheet: 4636 bytes
CSS::ParseSheet: 4908 bytes
CSS::ParseSheet: 5004 bytes
CSS::ParseSheet: 5086 bytes
CSS::ParseSheet: 5258 bytes
CSS::ParseSheet: 5350 bytes
CSS::ParseSheet: 5706 bytes
CSS::ParseSheet: 5770 bytes
CSS::ParseSheet: 6120 bytes
CSS::ParseSheet: 7982 bytes
CSS::ParseSheet: 8010 bytes
CSS::ParseSheet: 9006 bytes
CSS::ParseSheet: 10216 bytes
CSS::ParseSheet: 10368 bytes
CSS::ParseSheet: 10572 bytes
CSS::ParseSheet: 10654 bytes
CSS::ParseSheet: 10836 bytes
CSS::ParseSheet: 12096 bytes
CSS::ParseSheet: 12148 bytes
CSS::ParseSheet: 12884 bytes
CSS::ParseSheet: 14988 bytes
CSS::ParseSheet: 16516 bytes
CSS::ParseSheet: 21254 bytes
CSS::ParseSheet: 22712 bytes
CSS::ParseSheet: 24270 bytes
CSS::ParseSheet: 30686 bytes
CSS::ParseSheet: 32082 bytes
CSS::ParseSheet: 46172 bytes
CSS::ParseSheet: 50536 bytes
CSS::ParseSheet: 69298 bytes
CSS::ParseSheet: 106626 bytes
CSS::ParseSheet: 106700 bytes
CSS::ParseSheet: 250828 bytes
CSS::ParseSheet: 250828 bytes
Comment 9 Zack Weinberg (:zwol) 2010-01-26 08:33:08 PST
It looks like the Opera folks did just the analysis we want here: http://dev.opera.com/articles/view/mama-css-quantities-and-sizes/ Their presentation isn't very readable, though - I'll try to make some graphs.
Comment 10 Zack Weinberg (:zwol) 2010-01-26 09:22:17 PST
I'm not having any luck getting meaningful graphs out of the MAMA summary tables -- they've been too heavily binned.  I did find this graph from another, smaller analysis:

http://triin.net/archive/kool/webstat/figure-20.png
(from http://triin.net/2006/06/12/CSS )

which is consistent with MAMA's assertion that the average <link>ed stylesheet is about 8000 bytes long and the average inline (<style> or aggregate style="") stylesheet is about 1000 bytes long.  More importantly, both data sets agree that the size falls off very fast above the average, although there are some rare huge outliers of all three types (machine-generated HTML with over a megabyte of inline <style> content, for instance)
Comment 11 Zack Weinberg (:zwol) 2010-01-26 10:27:19 PST
Created attachment 423561 [details]
Tp4 stylesheet size distribution

Here's a size distribution profile (in bytes of UTF-16) for Tp4's style sheets.  For some reason the PNG came out ridiculously huge, you will want to view it at 50% or so.

It's a little hard to tell because I couldn't persuade my plotting program to give me sensible X axis labels, but the majority of sheets are quite small.  Here's quantiles:

  25%   50%   75%   90% 
  720  3414 14336 43850 

There are, however, outliers to 120,000 bytes.
Comment 12 Boris Zbarsky [:bz] 2010-01-26 10:32:48 PST
> machine-generated HTML with over a megabyte of inline <style> content, for
> instance

Wait.  For that case we actually have an nsAString that we wrap in a stream.  See http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsStyleLinkElement.cpp#296

How about we just change LoadInlineStyle to pass an nsAString all the way through instead of messing with streams; your code looks like it'll just make another copy of the string instead of using it directly.

I bet some of those big sheets you saw before are also inline ones (certainly the gmail one is).
Comment 13 Zack Weinberg (:zwol) 2010-01-26 13:25:33 PST
Created attachment 423612 [details] [diff] [review]
v2: avoid a copy for inline style

(In reply to comment #12)
> 
> How about we just change LoadInlineStyle to pass an nsAString all the way
> through instead of messing with streams; your code looks like it'll just make
> another copy of the string instead of using it directly.
> 
> I bet some of those big sheets you saw before are also inline ones (certainly
> the gmail one is).

That turns out to be quite easy, although I have to bump the nsICSSLoader IID.  We are still doing one copy in nsContentUtils::GetNodeTextContent, I think, but I don't think it's practical to avoid that one even when the style element has just one text node child (which should be the norm) because of the way nsTextFragment works.  Revised patch attached.  There is some more piggyback nsSubstring->nsAString conversion and whitespace cleanup.
Comment 14 Zack Weinberg (:zwol) 2010-01-26 13:31:02 PST
Created attachment 423614 [details] [diff] [review]
v2a: avoid a copy for inline style

Of course I remember to prune the header file list in nsStyleLinkingElement.cpp only after posting the patch.
Comment 15 Boris Zbarsky [:bz] 2010-02-24 10:32:56 PST
Comment on attachment 423614 [details] [diff] [review]
v2a: avoid a copy for inline style

>diff --git a/layout/style/nsCSSLoader.cpp b/layout/style/nsCSSLoader.cpp
>   SheetLoadData(CSSLoaderImpl* aLoader,
>-		const nsSubstring& aTitle,
>+		const nsAString& aTitle,

It looks like there are tabs around here for some reason (both before and after
your change).  Might be worth it to clean those out (as a followup).

>   NS_IMETHOD LoadInlineStyle(nsIContent* aElement,
>-			     nsIUnicharInputStream* aStream,
>+           const nsAString& aBuffer,

This, on the otherhand, is ending up mis-indented as a result of going from tabs to spaces.

>+++ b/layout/style/nsCSSParser.cpp
> CSSParserImpl::ParseColorString(const nsAString& aBuffer,

The behavior of this function is possibly changing here (e.g. it's changing to return NS_ERROR_DOM_SYNTAX_ERR if !NS_ColorNameToRGB() instead of returning NS_OK and not setting *aColor).  Then again, can that call return false here?

I'm not sure what all the mLowLevelError changes have to do with this bug (at first glance they're unrelated).  And they look wrong because parsers get recycled.  And I don't see us resetting mLowLevelError to a success code anywhere; nsCSSSCanner::Init used to do it for the scanner.

>+++ b/xpcom/io/nsStreamUtils.cpp
>+NS_ConsumeStream(nsIUnicharInputStream *stream, PRUint32 maxCount,
>+    PRUint32 chunk = 4096;
>+
>+    while (maxCount) {
>+	if (chunk < PR_UINT32_MAX/2)
>+	    chunk *= 2;

This doesn't look correct to me.

In particular, Read() makes no guarantees about actually reading as many bytes as you ask for.  So say maxCount is 64000 and |stream| hands out data in 4KB chunks (which is exactly what will happen with the converter stream, I think).  Then we'll double |chunk| 16 times while reading the stream.  Since we start at 2^12 bytes for |chunk|, it will end up with string lengths of 2^28 bytes, or 250MB or so, to read that 64KB stream.

What really needs to be doubled here, presumably, is the string capacity when we need to resize, right?  But string capacity growth already uses a doubling algorithm; see nsTSubstring_CharT::MutatePrep.  So all we really need to do is start the string off with a capacity of 4096 and then make sure to SetLength() before our reads; the string code will handle the doubling as needed.

I can see doubling chunk sizes until we get to the point that |n| ends up less than |toread| if desired, but I suspect 4KB chunks are as big as we'll typically be able to get our of Read.

>+extern NS_COM nsresult
>+NS_ConsumeStream(nsIUnicharInputStream *aSource, PRUint32 aMaxCount,
>+		 nsAString &aBuffer);

Some tabs snuck in here.
Comment 16 Zack Weinberg (:zwol) 2010-02-24 11:06:15 PST
(In reply to comment #15)
> It looks like there are tabs around here for some reason (both before and
> after your change).  Might be worth it to clean those out (as a followup).

Noted.

> This, on the otherhand, is ending up mis-indented as a result of going from
> tabs to spaces.

Ditto.

> >+++ b/layout/style/nsCSSParser.cpp
> > CSSParserImpl::ParseColorString(const nsAString& aBuffer,
> 
> The behavior of this function is possibly changing here (e.g. it's changing to
> return NS_ERROR_DOM_SYNTAX_ERR if !NS_ColorNameToRGB() instead of returning
> NS_OK and not setting *aColor).  Then again, can that call return false here?

I'll look into it.

> I'm not sure what all the mLowLevelError changes have to do with this bug (at
> first glance they're unrelated).  And they look wrong because parsers get
> recycled.  And I don't see us resetting mLowLevelError to a success code
> anywhere; nsCSSSCanner::Init used to do it for the scanner.

The scanner only ever set the low-level error when a stream read failed, so after this change, it is more properly a parser flag.  And in the original
patch sequencing, parsers didn't get recycled anymore at this point.  This is
what I get for shuffling the queue around so heavily.

In the *current* revision (only on my laptop at this point) this patch is the last of a set of three that, collectively, eliminate the low-level error flag altogether.

> >+++ b/xpcom/io/nsStreamUtils.cpp
> >+NS_ConsumeStream(nsIUnicharInputStream *stream, PRUint32 maxCount,
> >+    PRUint32 chunk = 4096;
> >+
> >+    while (maxCount) {
> >+	if (chunk < PR_UINT32_MAX/2)
> >+	    chunk *= 2;
> 
> This doesn't look correct to me.
> 
> In particular, Read() makes no guarantees about actually reading as many bytes
> as you ask for.  So say maxCount is 64000 and |stream| hands out data in 4KB
> chunks (which is exactly what will happen with the converter stream, I think). 
> Then we'll double |chunk| 16 times while reading the stream.  Since we start at
> 2^12 bytes for |chunk|, it will end up with string lengths of 2^28 bytes, or
> 250MB or so, to read that 64KB stream.

I forget who it was, but someone else already tripped over this problem.  My current code looks like this:

+NS_ConsumeStream(nsIUnicharInputStream *stream, PRUint32 maxCount,
+                nsAString &result)
+{
+    nsresult rv = NS_OK;
+    result.Truncate();
+
+    // nsIUnicharInputStream does not provide Available(), so allocate
+    // exponentially larger chunks, starting with 8192 bytes.  Note
+    // that some implementations will only produce 4096 bytes per call
+    // to Read().
+    PRUint32 chunk = 4096;
+
+    while (maxCount) {
+       if (chunk < PR_UINT32_MAX/2)
+           chunk *= 2;
+
+        // resize result buffer
+        PRUint32 toread = PR_MIN(chunk, maxCount);
+        PRUint32 length = result.Length();
+        result.SetLength(length + toread);
+        if (result.Length() != (length + toread))
+            return NS_ERROR_OUT_OF_MEMORY;
+
+       // repeatedly read until we run out of room
+       while (length < result.Length()) {
+           PRUnichar *buf = result.BeginWriting() + length;
+           PRUint32 n;
+
+           rv = stream->Read(buf, result.Length() - length, &n);
+           if (NS_FAILED(rv) || n == 0)
+               return rv;
+
+           length += n;
+           maxCount -= n;
+       }
+    }
+
+    return rv;
+}

Maybe I can get rid of the inner loop, based on what you say here...

> What really needs to be doubled here, presumably, is the string capacity when
> we need to resize, right?  But string capacity growth already uses a doubling
> algorithm; see nsTSubstring_CharT::MutatePrep.  So all we really need to do is
> start the string off with a capacity of 4096 and then make sure to SetLength()
> before our reads; the string code will handle the doubling as needed.

... the idea was to make sure we were reading as much as we possibly could at each go, but we could just as well take advantage of knowing that nsConverterInputStream works in 4096-byte blocks, and that would be simpler code-wise.  What do you think?

zw
Comment 17 Boris Zbarsky [:bz] 2010-02-24 11:15:48 PST
I think we should assume 4096-byte blocks for now, and as a followup make the unichar input stream hand out a length estimate (which is likely to be bigger than the actual final length, but is the best it can do, in general).
Comment 18 Zack Weinberg (:zwol) 2010-03-04 15:06:39 PST
Created attachment 430431 [details] [diff] [review]
v3 part 1: assume infallible malloc

I've revised the "no streams" patch with two new lead-in patches.  This is the first.  Now that infallible operator new seems to have stuck (finally!) we can throw out all of the out-of-memory checks in the CSS parser.  I was considering sending this in ahead of that just because the out-of-memory checks were incomplete and not handled well by higher levels of the parser.
Comment 19 Zack Weinberg (:zwol) 2010-03-04 15:08:19 PST
Created attachment 430432 [details] [diff] [review]
v3 part 2: don't use the low-level error to signal namespace lookup failure

GetNamespaceIDForPrefix was abusing the "low-level error" flag to communicate a sticky error indicator to ParseSelectorString.  This patch makes it use its own bit for that instead.
Comment 20 Zack Weinberg (:zwol) 2010-03-04 15:11:56 PST
Created attachment 430435 [details] [diff] [review]
v3 part 3: don't use streams

And here's the piece that corresponds to the original patch.  I fixed NS_ConsumeStream as discussed above, I fixed the indentation (I think) and I threw out the logic changes in ParseColorString for now.

The point of adding part 1 and 2 ahead of this change is simply that this patch can now eliminate the low-level error flag altogether.  Many of the parser entry points are now infallible (callers mostly ignore the potential error return already) but I'm leaving the external API as is for now.
Comment 21 Boris Zbarsky [:bz] 2010-03-04 21:07:36 PST
Comment on attachment 430431 [details] [diff] [review]
v3 part 1: assume infallible malloc

>+++ b/layout/style/nsCSSParser.cpp
>+    NS_NewCSSStyleRule(&rule, nsnull, declaration);
>     *aResult = rule;

Why do you need |rule|?  Why not just NS_NewCSSStyleRule directly into aResult?

>@@ -1717,21 +1708,17 @@ CSSParserImpl::GatherMedia(nsMediaList* 
>-      nsresult rv = aMedia->AppendQuery(query);

AppendQuery doesn't allocate via operator new, so is still fallible.

>@@ -1749,21 +1736,16 @@ CSSParserImpl::ParseMediaQueryExpression
>   nsMediaExpression *expr = aQuery->NewExpression();

NewExpression is still fallible.

>@@ -1822,21 +1804,16 @@ CSSParserImpl::ParseMediaQueryExpression
>         nsRefPtr<nsCSSValue::Array> a = nsCSSValue::Array::Create(2);

I'm hoping explicit calls to |::operator new| (which is what this is) are infallible?  Please check with cjones.

>@@ -1955,20 +1925,17 @@ CSSParserImpl::ParseGroupRule(nsICSSGrou
>-  if (!PushGroup(aRule)) {

This happens to be ok, since it uses nsCOMArray, unlike the earlier callsites that use nsTArray...  I really dislike that fragility.  I would prefer to assume that array grow operations are fallible until all our allocation is infallible.  That's assuming that array grow operations really become infallible.

>@@ -2272,20 +2221,16 @@ CSSParserImpl::SkipUntilStack(nsAutoTArr
>-      // Just handle out-of-memory by parsing incorrectly.  It's
>-      // highly unlikely we're dealing with a legitimate style sheet
>-      // anyway.

aStack.AppendElement is still fallible.

>@@ -3077,20 +3005,16 @@ CSSParserImpl::ParsePseudoSelector(PRInt
>   nsCOMPtr<nsIAtom> pseudo = do_GetAtom(buffer);

I'm not sure I'm happy depending on the exact allocator atom guts use...

>@@ -4881,20 +4784,16 @@ CSSParserImpl::ParseImageRect(nsCSSValue
>   nsCSSValueGradientStop* stop = aGradient->mStops.AppendElement();
>-  if (!stop) {

AppendElement is fallible.

>@@ -7575,21 +7404,18 @@ CSSParserImpl::ParseFunctionInternals(co
>-    if (!aOutput.AppendElement(newValue)) {

That's still fallible.

r- for the array issues above.
Comment 22 Boris Zbarsky [:bz] 2010-03-04 21:14:37 PST
Comment on attachment 430432 [details] [diff] [review]
v3 part 2: don't use the low-level error to signal namespace lookup failure

>+++ b/layout/style/nsCSSParser.cpp
CSSParserImpl::ParseSelectorString(const
>+  if (prefixErr)
>+    return NS_ERROR_DOM_NAMESPACE_ERR;
>+  else

No need for else after return.
Comment 23 Boris Zbarsky [:bz] 2010-03-04 21:15:03 PST
Comment on attachment 430431 [details] [diff] [review]
v3 part 1: assume infallible malloc

And fr this part 1 patch, it might be good if dbaron OKed it.
Comment 24 Boris Zbarsky [:bz] 2010-03-04 21:16:30 PST
Is it possible to do an interdiff from the last thing I reviewed to part 3?
Comment 25 Zack Weinberg (:zwol) 2010-03-05 08:31:21 PST
Comment on attachment 430431 [details] [diff] [review]
v3 part 1: assume infallible malloc

I'm gonna push back really **** the r-.  I claim that there is no point in doing this patch unless we can remove *all* of the null-checks after memory allocation (since the ultimate point here is to remove mLowLevelError and change most of the parser entry points to return void).  I also claim that the existing null-checks do more harm than good, because they conflate a sticky error state (out of memory) with a recoverable error state (syntax error) -- higher layers only see the PR_FALSE return from some parser function, assume a syntax error, and try to recover from that, when we should be bailing out completely.  But to do that reliably we'd have to check the low-level error in every parse-failure path, which is a silly thing to implement when we're trying to get rid of oom checks anyway.
Comment 26 Boris Zbarsky [:bz] 2010-03-05 08:41:22 PST
I understand why we want the patch.  I just think that currently its behavior will be that on OOM we try to scribble somewhere near address 0 (at least; I haven't carefully audited the codepaths that use nsTArray appends here) instead of just dying in a guaranteed safe way.  If we could guarantee that there are no null returns from those nsTArrays, then I would be fine with the change.  If we decide that such attempts to scribble near 0 (and crash as needed) are ok, then I might be fine with the change too.

Chris, what's the long-term plan for nsTArray?  Does it plan to become infallible?  Will it offer fallible and infallible grow methods?  Something else?
Comment 27 Zack Weinberg (:zwol) 2010-03-05 09:13:51 PST
IIRC (from the frame poisoning stuff), all current OSes start up with a fairly large region of address space reserved near 0 (at least a megabyte), but will allow the application to trim that down to one page (4096 bytes) with directed-address allocations.  I didn't test whether that area would get used for undirected allocations.

It seems to me that most uses of nsTArray want infallible-resize semantics.
Comment 28 Boris Zbarsky [:bz] 2010-03-05 09:17:37 PST
This is reserved in the "can't read" sense, not just "can't write", yes?
Comment 29 Zack Weinberg (:zwol) 2010-03-05 09:31:23 PST
(In reply to comment #28)
> This is reserved in the "can't read" sense, not just "can't write", yes?

Yup.

(In reply to comment #24)
> Is it possible to do an interdiff from the last thing I reviewed to part 3?

I'm not sure I understand you here.  You want to see part 3 relative to part 2 without part 1?  That would take some fixups to compile...
Comment 30 Boris Zbarsky [:bz] 2010-03-05 09:37:23 PST
> Yup.

Hmm.  In that case I'm probably ok with assuming nsTArray infallible here as long as we don't write to (or read from) memory pointed to by pointer members of the things we'd be allocating that way.  Or as long as we can guarantee that nsTArray infallible will land before this ever ships (including in an alpha).

> You want to see part 3 relative to part 2 without part 1?

No, I want to see how part 3 differs from "v2a", and in particular which parts I need to focus on reviewing there and which I have already reviewed...
Comment 31 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2010-03-05 09:47:25 PST
But if an array grow fails in the current code, and you assume it succeeds, will you end up reading/writing past the end of the array?
Comment 32 Zack Weinberg (:zwol) 2010-03-05 10:24:08 PST
(In reply to comment #30)
> > You want to see part 3 relative to part 2 without part 1?
> 
> No, I want to see how part 3 differs from "v2a", and in particular which parts
> I need to focus on reviewing there and which I have already reviewed...

interdiff v2a part3 comes out completely wrong - it shows a bunch of things that did *not* change and misses several things that did.  I don't know why.

But maybe we should just table part 3 until we decide what to do about part 1, because if I need to back off on part 1, part 3 has to change considerably.

(In reply to comment #31)
> But if an array grow fails in the current code, and you assume it succeeds,
> will you end up reading/writing past the end of the array?

It looks like we will, since nsTArray doesn't invalidate the array if an EnsureCapacity() call fails.  Bother.
Comment 33 Zack Weinberg (:zwol) 2010-03-05 10:49:26 PST
Created attachment 430667 [details] [diff] [review]
better handling of namespace lookup failure [checkin: comment 33]

I pushed "v3 part 2" by itself, with the requested change from comment 22, and taking out the nsCSSScanner pieces (which would have broken compilation in the absence of part 1).

http://hg.mozilla.org/mozilla-central/rev/5feacacc0d09
Comment 34 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2010-03-05 11:00:31 PST
(In reply to comment #26)
> Chris, what's the long-term plan for nsTArray?  Does it plan to become
> infallible?  Will it offer fallible and infallible grow methods?  Something
> else?

Good question!  I don't believe that's been discussed yet.  The STL-ification project wants mozilla::Vector with an infallible STL-like API; maybe a solution is to fork nsTArray into infallible mozilla::Vector and fallible mozilla::Fallible(?)Vector.
Comment 35 Zack Weinberg (:zwol) 2010-03-12 12:11:07 PST
Comment on attachment 430435 [details] [diff] [review]
v3 part 3: don't use streams

clearing review requests on patches I know will have to be revised before landing.
Comment 36 Zack Weinberg (:zwol) 2011-05-04 15:19:27 PDT
Created attachment 530171 [details] [diff] [review]
part 0: kill nsIUnicharStreamListener [checkin: comment 58]

Have finally come back around to this bug, & while mucking with ns(I)UnicharStreamLoader, I noticed that nsIUnicharStream*Listener* is only used to define a set of random callbacks from the old HTML parser - not the new one - and there is no user of these callbacks anywhere in the tree.

So I kill it.
Comment 37 Zack Weinberg (:zwol) 2011-05-04 15:23:20 PDT
Created attachment 530173 [details] [diff] [review]
part 1 (INCOMPLETE): revamp nsIUnicharStreamLoader

This is as far as I've gotten on the actual project here.  I rework nsIUnicharStreamLoader so that it produces the entire contents of the decoded stream as a nsString, rather than soaking up the data in a nsPipe which is then fed to an nsConverterInputStream.

Obviously this breaks the build, since it doesn't update css::Loader (the sole user of nsIUnicharStreamLoader, except for a test case).  However, before I do any more work I'd like to know if this is barking up the right tree and whether I have understood how to use nsIUnicodeDecoder correctly - the conversion loop (in WriteSegmentFun) was mostly copied from nsConverterInputStream and I'm not sure I got it right.
Comment 38 Boris Zbarsky [:bz] 2011-05-04 18:09:45 PDT
In general this seems like the right approach, and what we discussed.  Was there a specific concern about a particular tree up which you feel you're barking?  ;)

I haven't reviewed the details of the OnStartRequest/OnStopRequest/OnDataAvailable changes but two comments:

1)  You don't have to guesstimate a 2x expansion (esp. because in the common
    cases it's 1x).  Use GetMaxLength() on the decoder instead.
2)  You should probably check that SetCapacity succeeded.

Other than those, this looks about right at first glance.
Comment 39 Boris Zbarsky [:bz] 2011-05-04 18:10:39 PDT
Henri, do you have time to look at the unicode conversion code here?  You've dealt with unicode decoders most recently, I think.
Comment 40 Zack Weinberg (:zwol) 2011-05-04 19:16:36 PDT
(In reply to comment #38)
> In general this seems like the right approach, and what we discussed.  Was
> there a specific concern about a particular tree up which you feel you're
> barking?  ;)

I don't fully understand OnDataAvailable and ReadSegments; it's possible that I did something wrong there, as well as in the use of the UnicodeDecoder.  Especially, I'm not sure the mixture of Read and ReadSegments in OnDataAvailable is safe, but there could be other problems.
Comment 41 Boris Zbarsky [:bz] 2011-05-04 19:24:49 PDT
Ah.  Mixing Read and ReadSegments should be ok.
Comment 42 Boris Zbarsky [:bz] 2011-05-04 19:25:18 PDT
And in particular, nothing jumped out at me as obviously wrong.  I didn't check all the arithmetic details, of course....
Comment 43 Zack Weinberg (:zwol) 2011-05-04 19:31:58 PDT
(In reply to comment #38)

> 1)  You don't have to guesstimate a 2x expansion (esp. because in the common
>     cases it's 1x).  Use GetMaxLength() on the decoder instead.
> 2)  You should probably check that SetCapacity succeeded.

The common case (for the sole user, CSS) is ASCII becoming UTF-16, which is a 2x expansion.  But I will make these changes.
Comment 44 Boris Zbarsky [:bz] 2011-05-04 19:37:29 PDT
It's a 2x expansion in number of bytes.  But the srcLen is measured in bytes while the dstLen is measured in PRUnichars.  And in the common case there are just as many PRUnichars as bytes (or fewer, for UTF-8).
Comment 45 Zack Weinberg (:zwol) 2011-05-04 19:47:55 PDT
Of course.  How silly of me.
Comment 46 Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01) 2011-05-04 23:25:38 PDT
Comment on attachment 530171 [details] [diff] [review]
part 0: kill nsIUnicharStreamListener [checkin: comment 58]

Let's get rid this it seems that Google Desktop is now abandoned anyway, so this API isn't supported in the new parser anyway.
Comment 47 Henri Sivonen (:hsivonen) (Not doing reviews or reading bugmail until 2016-08-01) 2011-05-05 01:56:01 PDT
(In reply to comment #39)
> Henri, do you have time to look at the unicode conversion code here?  You've
> dealt with unicode decoders most recently, I think.

I tried to look at the code carefully, and it seems to me it ends up doing the same thing as the code I assume to be correct:
http://mxr-test.konigsberg.mozilla.org/mozilla-central/source/parser/html/nsHtml5StreamParser.cpp#577

Aside: I think some day, we should change the decoder interface to suck less by having an interface more like http://download.oracle.com/javase/1.4.2/docs/api/java/nio/charset/CharsetDecoder.html
Comment 48 Zack Weinberg (:zwol) 2011-05-05 11:53:03 PDT
Created attachment 530385 [details] [diff] [review]
part 0a: whitespace cleanup on style/Loader.cpp [checkin: comment 58]

layout/style/Loader.cpp has a *lot* of trailing whitespace in it.  As the next patch makes extensive changes to this file, I thought it might be a good time to get rid of all of that, but obviously this needs to be done separately from changes that need more of a review. :)
Comment 49 Zack Weinberg (:zwol) 2011-05-05 11:57:38 PDT
Created attachment 530388 [details] [diff] [review]
part 1: revamp nsIUnicharStreamLoader and change css::Loader to use it

This is the meat of the change.  I am now fairly confident that the changes in netwerk/ are correct, but it probably deserves a thorough going-over anyway.  The changes to css::Loader are *not* correct in at least one regard: the revised handling of synchronous sheet loads triggers a bunch of assertions...

###!!! ASSERTION: Should have an inner whose principal has not yet been set: '!mInner->mPrincipalSet', file /home/zack/src/mozilla/S-mc/layout/style/nsCSSStyleSheet.cpp, line 1136
###!!! ASSERTION: Bad loading table: 'mLoadingDatas.Get(&key, &loadingData) && loadingData == aLoadData', file /home/zack/src/mozilla/S-mc/layout/style/Loader.cpp, line 1632
###!!! ASSERTION: This is unsafe! Fix the caller!: 'Error', file /home/zack/src/mozilla/S-mc/content/events/src/nsEventDispatcher.cpp, line 534

(I'm not sure whether the last one of those is actually this change's fault.)  I'm posting this for review anyway because I've not been able to figure out how to do the synchronous load correctly.

One more patch to follow, but it'll be posted to bug 543151.
Comment 50 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2011-05-05 12:00:05 PDT
Comment on attachment 530388 [details] [diff] [review]
part 1: revamp nsIUnicharStreamLoader and change css::Loader to use it

bz is a better reviewer for the loader than I am
Comment 51 Boris Zbarsky [:bz] 2011-05-05 12:03:37 PDT
One issue I just realiazed for the unichar stream loader.  It needs to make the callback in all cases; the new code seems to not do that.

We should probably use a void string to represent the cases we used a null stream for before.
Comment 52 Boris Zbarsky [:bz] 2011-05-05 12:09:42 PDT
Er, yes.  And the synchronous stuff looks broken.  Spinning the event loop there is bad; that's why you get the asserts.  You should probably just NS_OpenURI as before, take the resulting nsIInputStream, read it into an nsCAutoString (possibly preallocated to the right size) using something like WriteSegmentToCString (copy that from nsBinaryStream.cpp) and then NS_ConvertUTFtoUTF16 to get an nsAString.
Comment 53 Boris Zbarsky [:bz] 2011-05-05 12:12:51 PDT
Though that will behave differently from what we have now when the input is not actually valid UTF-8.  Maybe you want to just factor out the streamloader code that converts a incoming segments to UTF16 and use it both places?
Comment 54 Zack Weinberg (:zwol) 2011-05-05 12:17:08 PDT
(In reply to comment #52)
>  Spinning the event loop there is bad; that's why you get the asserts.

Doesn't a synchronous NS_OpenURI spin the event loop?  Not in exactly the same place within the function, but still.

It looks to me like the asserts are all about not setting up the principal quite right.

(In reply to comment #51)
> One issue I just realiazed for the unichar stream loader.  It needs to make the
> callback in all cases; the new code seems to not do that.

I thought I set it up so it would (with an empty string, if no data was received).
Comment 55 Boris Zbarsky [:bz] 2011-05-05 12:28:59 PDT
> Doesn't a synchronous NS_OpenURI spin the event loop? 

It depends on the channel implementation.  If the channel has an underlying stream it can just hand out (say file://, data:, chrome://, etc, etc), then no.

> It looks to me like the asserts are all about not setting up the principal
> quite right.

The asserts happen because while spinning the event loop for a sync load of URI X you reenter this code with a request for another sync load of URI X and end up getting the same stylesheet object and setting a principal on it a second time. But the key problem is the spinning the event loop and reentering bit.

> I thought I set it up so it would (with an empty string, if no data was
> received).

In nsUnicharStreamLoader::OnStopRequest if DetermineCharset returns error you never call OnStreamComplete.  You also don't null out various things and so forth.....
Comment 56 Zack Weinberg (:zwol) 2011-05-05 20:38:42 PDT
Created attachment 530525 [details] [diff] [review]
part 1 v2: revamp nsIUnicharStreamLoader and change css::Loader to use it

I've got it to the point where it doesn't throw any assertions while loading the UA sheets, but instead it crashes hard inside nsFrame::BoxReflow after spitting out some other assertions:

WARNING: Subdocument container has no content: file /home/zack/src/mozilla/S-mc/layout/base/nsDocumentViewer.cpp, line 2380
###!!! ASSERTION: Must be a box frame!: '!mScrollCornerBox || mScrollCornerBox->IsBoxFrame()', file /home/zack/src/mozilla/S-mc/layout/generic/nsGfxScrollFrame.cpp, line 3328
###!!! ASSERTION: A box layout method was called but InitBoxMetrics was never called: 'metrics', file /home/zack/src/mozilla/S-mc/layout/generic/nsFrame.cpp, line 7199

Program received signal SIGSEGV, Segmentation fault.
nsFrame::BoxReflow (this=0x7fffe1c174b8, aState=..., aPresContext=
    0x7fffe339e800, aDesiredSize=..., aRenderingContext=0x7fffe3170af0, aX=0, 
    aY=6000, aWidth=0, aHeight=0, aMoveFrame=<value optimized out>)
(gdb) bt
#0  nsFrame::BoxReflow (this=0x7fffe1c174b8, aState=..., 
    aPresContext=0x7fffe339e800, aDesiredSize=..., 
    aRenderingContext=0x7fffe3170af0, aX=0, aY=6000, aWidth=0, aHeight=0, 
    aMoveFrame=<value optimized out>)
    at /home/zack/src/mozilla/S-mc/layout/generic/nsFrame.cpp:7108
#1  0x00007ffff52cd727 in nsFrame::DoLayout (this=0x7fffe1c174b8, aState=...)
    at /home/zack/src/mozilla/S-mc/layout/generic/nsFrame.cpp:6899
#2  0x00007ffff54b6898 in nsIFrame::Layout (this=0x7fffe1c174b8, aState=...)
    at /home/zack/src/mozilla/S-mc/layout/xul/base/src/nsBox.cpp:559
#3  0x00007ffff54bc64a in nsBoxFrame::LayoutChildAt (aState=..., 
    aBox=0x7fffe1c174b8, aRect=...)
    at /home/zack/src/mozilla/S-mc/layout/xul/base/src/nsBoxFrame.cpp:2033
#4  0x00007ffff52e7f7e in LayoutAndInvalidate (aState=..., 
    aBox=0x7fffe1c174b8, aRect=..., aScrollbarIsBeingHidden=0)
    at /home/zack/src/mozilla/S-mc/layout/generic/nsGfxScrollFrame.cpp:3268
#5  0x00007ffff52ec32b in nsGfxScrollFrameInner::LayoutScrollbars (
    this=0x7fffe1c171e0, aState=..., aContentArea=..., aOldScrollArea=...)
    at /home/zack/src/mozilla/S-mc/layout/generic/nsGfxScrollFrame.cpp:3352
#6  0x00007ffff52f03b5 in nsHTMLScrollFrame::Reflow (this=0x7fffe1c17158, 
    aPresContext=<value optimized out>, aDesiredSize=..., aReflowState=..., 
    aStatus=@0x7fffffffc840)
    at /home/zack/src/mozilla/S-mc/layout/generic/nsGfxScrollFrame.cpp:916
#7  0x00007ffff52b9046 in nsContainerFrame::ReflowChild (
    this=<value optimized out>, aKidFrame=0x7fffe1c17158, 
    aPresContext=0x7fffe339e800, aDesiredSize=..., aReflowState=..., aX=0, 
    aY=0, aFlags=0, aStatus=@0x7fffffffc840, aTracker=0x0)
    at /home/zack/src/mozilla/S-mc/layout/generic/nsContainerFrame.cpp:959
#8  0x00007ffff5375d6f in ViewportFrame::Reflow (this=0x7fffe331bbc8, 
    aPresContext=0x7fffe339e800, aDesiredSize=..., aReflowState=..., 
    aStatus=@0x7fffffffc840)
    at /home/zack/src/mozilla/S-mc/layout/generic/nsViewportFrame.cpp:293
#9  0x00007ffff523e981 in PresShell::DoReflow (this=0x7fffe3391c00, 
    target=0x7fffe331bbc8, aInterruptible=0)
    at /home/zack/src/mozilla/S-mc/layout/base/nsPresShell.cpp:7735

this has me pretty well stumped, although I assume it's still an issue with the UA sheet not getting set up correctly.  (note: this is without the additional patch in bug 543151.)
Comment 57 Boris Zbarsky [:bz] 2011-05-05 21:04:21 PDT
xul.css styles scrollcorners as -moz-box.  So at a guess, that either didn't load or didn't fully load.

Were there any CSS parse errors before the assertion point?  Can you step through what happens when xul.css is parsed?

The way the attached patch calls the streamloader is not quite right, but it only matters in error cases, so I doubt it would cause the issue above...

For what it's worth, it may be worthwhile to break the patch up into pieces to test them independently (e.g. push to try after every patch in the sequence).  Specifically:

1)  Make the unichar stream loader return a string but then have the CSSLoader
    just wrap a StringUnicharInputStream around it and pass that to the CSS
    parser.
2)  Change the CSS parser API to add an overload for parsing a full sheet from
    a string.  Convert the non-sync case in CSSLoader to use the new API.
3)  Convert inline style parsing to use the new API.
4)  Convert the paranoid sink to the new API.
5)  Convert sync sheet loads to use the unichar stream loader or something.
6)  Remove the old parsing API.

This would incidentally be easier to review too...

Oh, and could we please put the Accept header change in a separate bug?
Comment 58 Zack Weinberg (:zwol) 2011-05-06 08:49:37 PDT
I checked in patches 0 and 0a:
http://hg.mozilla.org/mozilla-central/rev/b0e5ffb6bfc7
http://hg.mozilla.org/mozilla-central/rev/b754c3a38759
Comment 59 Zack Weinberg (:zwol) 2011-05-07 10:19:29 PDT
Update: having broken up the patch a bit, I have everything except sync sheet loads providing strings to the parser, but I also have a weird assertion failure in the leaktests:

localhost.localdomain - - [06/May/2011 12:11:45] "GET /screen.css HTTP/1.1" 200 -
###!!! ASSERTION: unexpected progress values: 'progress <= progressMax', file ../../../../netwerk/protocol/http/nsHttpChannel.cpp, line 4124

It appears that nsHttpChannel's idea of how much of 'screen.css' has been processed (sp. mLogicalOffset) has gotten out of sync with the input stream pump's idea of same.  This happens in the patch that changes nsIUnicharStreamLoader, and the affected loads are all CSS, so I deduce that the revised stream loader is handling OnDataAvailable incorrectly, but I don't see how.

Will post updated patch set to this point shortly.
Comment 60 Boris Zbarsky [:bz] 2011-05-07 10:48:25 PDT
Oh, I'd missed that, sorry.  The onDataAvailable contract is that that the callee will read all the data from aInputStream.  The last attached patch on this bug will only read at most 4096 bytes, so if more than that is available things won't work right.

That might be the sync sheet issue too, since I bet this ended up reading only the first 4KB of all the chrome stylesheets.

Instead of passing 4096 to ReadSegments there you should pass aCount minus however much you've already read.
Comment 61 Zack Weinberg (:zwol) 2011-05-07 11:08:16 PDT
(In reply to comment #60)
> 
> Instead of passing 4096 to ReadSegments there you should pass aCount minus
> however much you've already read.

Oh, is *that* what that argument means?  It sounded like it was a limit on the size of each individual *chunk* passed to the callback function.
Comment 62 Zack Weinberg (:zwol) 2011-05-07 11:20:44 PDT
(In reply to comment #60)
> Oh, I'd missed that, sorry.  The onDataAvailable contract is that that the
> callee will read all the data from aInputStream.

Is that true even if onDataAvailable returns a failure code?
Comment 63 Boris Zbarsky [:bz] 2011-05-07 13:15:06 PDT
> Oh, is *that* what that argument means?

Yeah, it's the total amount to read.  The amount passed to the callback function on each call to the callback is up to the stream.

> Is that true even if onDataAvailable returns a failure code?

No.  Only if you return success.  Should have been clearer about that.
Comment 64 Zack Weinberg (:zwol) 2011-05-07 14:12:21 PDT
Thanks for clarifying.  I think I've got that part working now, but I ran into a more ... elemental ... problem with using the UnicharStreamLoader for sync loads too: nsFileInputStream doesn't implement ReadSegments.

I am presently testing a patch that implements nsFileInputStream::ReadSegments, but MXR tells me that there are rather a lot of input stream classes that don't implement ReadSegments.  I don't really know which of them we could wind up using - either for a sync load or an async one - but I'm wondering if I shouldn't deal with this in nsUnicharStreamLoader, and if so, what the tidiest way to do it would be.
Comment 65 Zack Weinberg (:zwol) 2011-05-07 15:06:57 PDT
Created attachment 530876 [details] [diff] [review]
v4 part 1

Here we go with the broken-up patch series.  This works well enough to pass reftests on my local computer; try is running the full testsuite; there are still some places where I'm not sure the code is right, which I will call out below.

I did the breakup in a different order than the one bz suggested in comment 57 so that I did not have to introduce any new uses of streams.  This first patch just adds a new nsCSSParser entry point that parses an entire sheet, but takes a string rather than a stream.
Comment 66 Zack Weinberg (:zwol) 2011-05-07 15:08:47 PDT
Created attachment 530877 [details] [diff] [review]
v4 part 2

This piece changes nsHTMLParanoidFragmentSink to use the new API and also does some en passant #include-pruning.  I'm not sure who the appropriate reviewer is - dbaron, please feel free to bounce it to someone else.
Comment 67 Zack Weinberg (:zwol) 2011-05-07 15:10:11 PDT
Created attachment 530878 [details] [diff] [review]
v4 part 3

And this part uses the new parser entry point for "normal" use of inline style.  Again, please feel free to redirect review to someone more appropriate.

I have high confidence in the patches up to this point.
Comment 68 Zack Weinberg (:zwol) 2011-05-07 15:12:16 PDT
Created attachment 530879 [details] [diff] [review]
v4 part 4

Rewrite ns(I)UnicharStreamLoader to produce a string rather than a stream, and use it for async (non-chrome, that is) style loads.

Potentially a problem here is what I mentioned in comment 64: the revised streamloader assumes that ReadSegments works, but there are a lot of streams that don't implement it.  I don't know what can reasonably be expected of the type of stream we get on the async path.
Comment 69 Zack Weinberg (:zwol) 2011-05-07 15:14:52 PDT
Created attachment 530880 [details] [diff] [review]
v4 part 5

This piece sends sync stylesheet loads through mostly the same code path as async ones.

Needing careful checking in this piece are the new NS_FeedStreamListener helper function, and the implementation of nsFileInputStream::ReadSegments that I added.  Also, is it safe to poke the content charset on a channel, as done here?
Comment 70 Zack Weinberg (:zwol) 2011-05-07 15:17:25 PDT
Created attachment 530881 [details] [diff] [review]
v4 part 6

This final piece removes no-longer-used stream-taking methods from css::Loader and nsCSSParser.  nsCSSScanner can now be cleaned up as well - that will be done in bug 543151.

I am not going to be around much this coming week (travel) and may have very little time for Mozilla hacking after that until mid-August (summer job).
Comment 71 Boris Zbarsky [:bz] 2011-05-07 16:10:24 PDT
> nsFileInputStream doesn't implement ReadSegments.

Indeed.  On purpose.  We could deal with it by wrapping in buffered streams as needed (by first testing whether the stream we're given is buffered).  This would just happen on the sync load path in the CSSLoader.

> I don't know what can reasonably be expected of the type of stream we get on
> the async path.

They will implement readSegments, as long as the caller is not breaking the API contract.  nsIStreamListener.idl documents this.

I'll take a careful look through the patches on Monday.  Given the above discussion, I doubt there's much that will need changing, so if you don't have time to address the review comments, if any, just let me know and I'll deal with it.  Thanks for working on this!
Comment 72 David Baron :dbaron: ⌚️UTC+2 (mostly busy through August 4; review requests must explain patch) 2011-05-10 12:32:46 PDT
It might make more sense if bzbarsky reviews the whole patch series here...
Comment 73 Zack Weinberg (:zwol) 2011-05-10 12:34:45 PDT
No objection from me.
Comment 74 Boris Zbarsky [:bz] 2011-05-10 12:38:54 PDT
Yeah, that's certainly my plan.
Comment 75 Boris Zbarsky [:bz] 2011-05-13 13:18:11 PDT
Comment on attachment 530876 [details] [diff] [review]
v4 part 1

Instead of adding and then later removing a second copy of all the guts of ParseSheet, how about we do:

  template<typename T>
  ParseSheet(T aInput, ....);

for the existing CSSParserImpl code, then have the exact body we have right now and have nsCSSParer::Parse (the old signature) call ParseSheet.  Then in the last patch you can remove the templating and just leave the |const nsAString&| version.

r=me with that.
Comment 76 Boris Zbarsky [:bz] 2011-05-13 13:19:40 PDT
Comment on attachment 530877 [details] [diff] [review]
v4 part 2

r=me
Comment 77 Boris Zbarsky [:bz] 2011-05-13 13:45:56 PDT
Comment on attachment 530878 [details] [diff] [review]
v4 part 3

Again, I'd prefer a templated version of ParseSheet to a copy-and-remove...

r=me with that.
Comment 78 Boris Zbarsky [:bz] 2011-05-13 14:26:35 PDT
Comment on attachment 530879 [details] [diff] [review]
v4 part 4

>+++ b/netwerk/base/src/nsUnicharStreamLoader.cpp
>+nsUnicharStreamLoader::Init(nsIUnicharStreamLoaderObserver *aObserver)
>+  mDecoder = nsnull;
>+  mContext = nsnull;
>+  mChannel = nsnull;

Those should all be null already.  No need to null them again.

>+  mBuffer.Truncate();
>+  mRawData.Truncate();

Also no need for that.

>+  mRawData.SetCapacity(512);
>+
>   return NS_OK;

Here, on the other hand, you should check whether the capacity is actually 512.  If it isn't, return NS_ERROR_OUT_OF_MEMORY so this object won't be used.

>+nsUnicharStreamLoader::OnStopRequest(nsIRequest *aRequest,
>+  if (NS_FAILED(rv)) {
>+    // Call the observer but pass it no data.
>+    // XXX Should we pass rv instead of aStatus?
>+    nsAutoString dummy;
>+    mObserver->OnStreamComplete(this, mContext, aStatus, dummy);

Nix dummy and just pass EmptyString() as the last argument.

And yes, I'd say pass rv here.  And document this behavior in the idl.  That would make this case consistent with the case when we end up with an error from DetermineCharset in OnDataAvailable.

Also, maybe the code should look more like this:

  nsresult rv = NS_OK;
  if (mRawData.Length() > 0 && NS_SUCCEEDED(aStatus)) {
    NS_ABORT_IF_FALSE(mBuffer.Length() == 0,
                      "should not have both decoded and raw data");
    rv = DetermineCharset();
  }

and then your code as now, so that if aStatus is a failure we don't try to determine the charset for no reason.

>+nsUnicharStreamLoader::DetermineCharset()
>+  NS_ABORT_IF_FALSE(IsASCII(mCharset), "non-ASCII character set name");

Sadly, you can't do that.  mCharset can come from web or css content; so it could be all sorts of broken....  Just take that line out.

>+++ b/netwerk/base/src/nsUnicharStreamLoader.h
>+  nsCString                                mRawData;
>+  nsString                                 mBuffer;

Document what those members are for, please.

r=me with the nits
Comment 79 Boris Zbarsky [:bz] 2011-05-13 17:55:49 PDT
Comment on attachment 530880 [details] [diff] [review]
v4 part 5

Poking the content charset is fine.

I'd really rather not implement nsFileInputStream::ReadSegments here.  I also think that we shouldn't add NS_FeedStreamListener (which doesn't actually handle streams that don't have all the data available up front, by the way).  Since this code is all in content-ish land anyway, nsSyncLoadService::PushSyncStreamToListener will do what we want here and is already used and tested.  Bonus points for fixing that to correctly handle errors from OnStartRequest and OnDataAvailable, though!
Comment 80 Boris Zbarsky [:bz] 2011-05-13 17:56:32 PDT
Comment on attachment 530881 [details] [diff] [review]
v4 part 6

r=me, though I expect this will be smaller with the templating approach.
Comment 81 Zack Weinberg (:zwol) 2011-05-13 19:57:51 PDT
(In reply to comment #75)
>   template<typename T>
>   ParseSheet(T aInput, ....);

I thought of that but was worried about weird template problems.  I'll try it.

> >+nsUnicharStreamLoader::Init(nsIUnicharStreamLoaderObserver *aObserver)
> >+  mDecoder = nsnull;
> >+  mContext = nsnull;
> >+  mChannel = nsnull;
> 
> Those should all be null already.  No need to null them again.

The old Init method may have been doing that too, I don't remember.  Will change.

> >+  mRawData.SetCapacity(512);
> >+
> >   return NS_OK;
> 
> Here, on the other hand, you should check whether the capacity is actually
> 512.  If it isn't, return NS_ERROR_OUT_OF_MEMORY so this object won't be
> used.

I thought string resizing was infallible these days?  But ok.

> Nix dummy and just pass EmptyString() as the last argument.

I learned something today!

> And yes, I'd say pass rv here.  And document this behavior in the idl.  That
> would make this case consistent with the case when we end up with an error
> from DetermineCharset in OnDataAvailable.

Ok.

...
> and then your code as now, so that if aStatus is a failure we don't try to
> determine the charset for no reason.

Ok.

> >+nsUnicharStreamLoader::DetermineCharset()
> >+  NS_ABORT_IF_FALSE(IsASCII(mCharset), "non-ASCII character set name");
> 
> Sadly, you can't do that.  mCharset can come from web or css content; so it
> could be all sorts of broken....  Just take that line out.

That was in the old version (as an ASSERTION rather than an ABORT_IF_FALSE); I saw it as enforcing a requirement for the OnDetermineCharset hook to sanitize its output.  I don't mind removing it tho.

> >+++ b/netwerk/base/src/nsUnicharStreamLoader.h
> >+  nsCString                                mRawData;
> >+  nsString                                 mBuffer;
> 
> Document what those members are for, please.

Ok.

> I'd really rather not implement nsFileInputStream::ReadSegments here.  I
> also think that we shouldn't add NS_FeedStreamListener (which doesn't
> actually handle streams that don't have all the data available up front, by
> the way).  Since this code is all in content-ish land anyway,
> nsSyncLoadService::PushSyncStreamToListener will do what we want here and is
> already used and tested.  Bonus points for fixing that to correctly handle
> errors from OnStartRequest and OnDataAvailable, though!

More handy things I didn't know about!  Ok, provided it turns out that I can include the necessary header from layout/style/.

I will try to produce revisions by Sunday.
Comment 82 Boris Zbarsky [:bz] 2011-05-13 20:21:20 PDT
> I thought string resizing was infallible these days?

Not yet (and possibly not ever).

> That was in the old version (as an ASSERTION rather than an ABORT_IF_FALSE)

I was young and foolish.  ;)

> Ok, provided it turns out that I can include the necessary header from
> layout/style/.

You definitely can.  Worst case, add an entry to LOCAL_INCLUDEs in layout/style/Makefile.in.
Comment 83 Zack Weinberg (:zwol) 2011-05-16 08:03:45 PDT
I've made all the requested revisions and pushed the result to try.  I have to run now - will upload revised patches later.
Comment 84 Zack Weinberg (:zwol) 2011-05-16 16:15:03 PDT
Created attachment 532784 [details] [diff] [review]
v5 part 1

The templatized ParseSheet() worked except for a C++ wart.  You can't have a template function whose callers pass the argument-with-template-type as a pointer in some cases and a reference in others; if you try, by using bare 'T' as the argument type, C++ will try to pass T by *value* when it should've passed by reference.  So I had to change the nsIUnicharInputStream overload to take a reference too.  This is a little weird, but it's temporary, so I think we can live with it.  I also renamed Parse() to ParseSheet(), even though that overload goes away in part 6, because otherwise templatizing Loader::ParseSheet in part 3 wouldn't work.
Comment 85 Zack Weinberg (:zwol) 2011-05-16 16:16:06 PDT
Created attachment 532786 [details] [diff] [review]
v5 part 2

No substantive changes in this piece, carrying r=bz forward.
Comment 86 Zack Weinberg (:zwol) 2011-05-16 16:18:22 PDT
Created attachment 532787 [details] [diff] [review]
v5 part 3

Only substantive change is switching to templatized Loader::ParseSheet.
Comment 87 Zack Weinberg (:zwol) 2011-05-16 16:21:20 PDT
Created attachment 532791 [details] [diff] [review]
v5 part 4

Changes as requested in comment 78.
Comment 88 Zack Weinberg (:zwol) 2011-05-16 16:28:51 PDT
Created attachment 532794 [details] [diff] [review]
v5 part 5

Uses nsSyncLoadService::PushSyncStreamToListener as suggested in comment 79.  I think I've "fixed [PushSyncStreamToListener] to correctly handle errors from OnStartRequest and OnDataAvailable", too, but I don't know how one would test that.
Comment 89 Zack Weinberg (:zwol) 2011-05-16 16:32:30 PDT
Created attachment 532795 [details] [diff] [review]
v5 part 6

No substantive changes, carrying r=bz forward (this is indeed a lot smaller, as predicted).

bz: I'd appreciate it if you could take this from here.  I'm probably not going to have any hacking time this week (and if I do, I want to spend it on bug 543151) and I definitely won't have time to watch the tree.
Comment 90 Boris Zbarsky [:bz] 2011-05-18 14:12:52 PDT
Comment on attachment 532784 [details] [diff] [review]
v5 part 1

r=me
Comment 91 Boris Zbarsky [:bz] 2011-05-18 14:15:56 PDT
Comment on attachment 532787 [details] [diff] [review]
v5 part 3

r=me
Comment 92 Boris Zbarsky [:bz] 2011-05-18 14:20:55 PDT
Comment on attachment 532791 [details] [diff] [review]
v5 part 4

r=me
Comment 93 Boris Zbarsky [:bz] 2011-05-18 14:22:48 PDT
Comment on attachment 532794 [details] [diff] [review]
v5 part 5

r=me.  Thanks for fixing the sync load service code!
Comment 95 Zack Weinberg (:zwol) 2011-05-18 20:58:19 PDT
(In reply to comment #94)
>
> Zack, are you planning to work on bug 543151, or do you want someone to
> steal it?

There's a relatively up-to-date patch in there which might even be correct as is; I was going to try to find time tomorrow or Friday to verify that and maybe also break it up for ease of review, but if you'd like to take it over, please be my guest.  dbaron hasn't commented on it at all :/
Comment 96 Boris Zbarsky [:bz] 2011-05-18 21:01:50 PDT
dbaron's on vacation and probably overloaded on reviews.  Please toss that patch in my queue too (not instead of), and we'll see whether I can get to him before he does?
Comment 97 Zack Weinberg (:zwol) 2011-05-18 21:05:25 PDT
Done.
Comment 99 :Paolo Amadini 2011-08-17 07:37:30 PDT
This bug changed the nsIUnicharStreamLoader interface, but the change does not
appear in the "Firefox 6 for developers" page nor the use of the interface is
detected in the add-ons compatibility checker. The effect is that I received a
severe bug report for an add-on I'm maintaining, the day after the release of
Firefox 6; probably, users testing on Aurora and Beta were not enough to detect
the issue earlier, even if the add-on has more than 50K users. Fortunately I was
available to address the issue promptly.

What is the process to track these kind of changes? Is it up to the patch author
to update the documentation? I'm asking firstly because I made an interface
change in another bug and don't want it to be lost, and secondly to understand
if, from now on, I should plan my vacations according to the release calendar :-)
Comment 100 Zack Weinberg (:zwol) 2011-08-17 08:18:39 PDT
My apologies; we thought the CSS loader was the only user of nsIUnicharStreamLoader and didn't check add-ons.  I am not sure what the procedure is for getting interface changes put into the release notes and the compatibility checker; suggest you ask on dev-platform.
Comment 101 :Paolo Amadini 2011-08-30 10:51:23 PDT
(In reply to Zack Weinberg (:zwol) from comment #100)
> My apologies; we thought the CSS loader was the only user of
> nsIUnicharStreamLoader and didn't check add-ons.  I am not sure what the
> procedure is for getting interface changes put into the release notes and
> the compatibility checker; suggest you ask on dev-platform.

Thanks for the pointer, we discussed this on dev-platform and we now have a
procedure :-) It basically consists in adding a new "addon-compat" keyword.
Also, it's welcome to CC Jorge of the Add-ons Team on the bug as well.

Note You need to log in before you can comment on or make changes to this bug.