Closed Bug 482921 Opened 15 years ago Closed 13 years ago

[HTML5] Re-implement view source syntax highlighting using the HTML5 parser

Categories

(Core :: DOM: HTML Parser, defect, P3)

Other Branch
defect

Tracking

()

RESOLVED FIXED
mozilla11

People

(Reporter: hsivonen, Assigned: hsivonen)

References

(Depends on 2 open bugs, Blocks 9 open bugs, Regressed 1 open bug)

Details

(Keywords: dev-doc-complete)

Attachments

(19 files, 22 obsolete files)

196.56 KB, patch
sicking
: review+
Details | Diff | Splinter Review
4.36 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
3.38 KB, patch
Gavin
: review+
Details | Diff | Splinter Review
3.67 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
7.03 KB, patch
smaug
: review+
Details | Diff | Splinter Review
882 bytes, patch
ehsan.akhgari
: review+
Details | Diff | Splinter Review
182.39 KB, patch
smaug
: review+
Details | Diff | Splinter Review
78.55 KB, patch
smaug
: review+
Details | Diff | Splinter Review
192.57 KB, patch
smaug
: review+
Details | Diff | Splinter Review
27.27 KB, patch
smaug
: review+
Details | Diff | Splinter Review
4.52 KB, patch
smaug
: review+
Details | Diff | Splinter Review
15.11 KB, patch
smaug
: review-
Details | Diff | Splinter Review
6.18 KB, patch
smaug
: review+
Details | Diff | Splinter Review
18.80 KB, patch
smaug
: review+
Details | Diff | Splinter Review
215.97 KB, patch
smaug
: review+
Details | Diff | Splinter Review
9.23 KB, text/plain
Details
62.42 KB, patch
smaug
: review+
Details | Diff | Splinter Review
464.51 KB, patch
Details | Diff | Splinter Review
143.05 KB, image/png
Details
Make the Java to C++ translator generate a second copy of the tokenizer that reports all state transitions to a syntax highlighter that feeds into view source.
This patch prepares the Java foundation for View Source. The C++ effects are uninteresting in this patch, but this stuff still causes merge conflicts if I don't get it out of my queue.
Attachment #480908 - Flags: review?(jonas)
Comment on attachment 480908 [details] [diff] [review]
Part 0: Make preparations for enabling transition reporting in the HTML5 tokenizer

rs=me

Though do we really want to switch to using the HTML5 parser for view-source for FF4? Seems like a big change in code that is notoriously hard to review and it's getting to be pretty late in the game.
Attachment #480908 - Flags: review?(jonas) → review+
Comment on attachment 480908 [details] [diff] [review]
Part 0: Make preparations for enabling transition reporting in the HTML5 tokenizer

(In reply to comment #3)
> rs=me

Thanks.

> Though do we really want to switch to using the HTML5 parser for view-source
> for FF4? Seems like a big change in code that is notoriously hard to review and
> it's getting to be pretty late in the game.

I've been assuming that View Source has missed the Firefox 4 boat already and it'll go in the next release after 4. I'd still like to put this simple preparatory patch into Firefox 4 to make things easier as far as merge conflicts go. Hence, requesting approval on the patch.

(I wanted to do the preparatory work now in order to be nice to Oracle so that they don't need to fork the parser for their use case that needs to track the tokenizer transitions.)
Attachment #480908 - Flags: approval2.0?
Attachment #480908 - Flags: approval2.0? → approval2.0+
Comment on attachment 480908 [details] [diff] [review]
Part 0: Make preparations for enabling transition reporting in the HTML5 tokenizer

Fix is landed, not sure how it can depend on another bug that isn't closed, but approval is no longer needed (showing up in FF4 queries).
Attachment #480908 - Flags: approval2.0+ → approval2.0-
sicking, I'm going to fix this bug in several parts. This is only the first part. I'm requesting review on this part in case you prefer to avoid a one-time code dump for review, but review after all parts are done WFM.

Known missing things in this patch:
 * XML syntax highlighting not supported yet (uses the old parser).
 * Error highlighting is not supported yet (not even for cases supported by the old parser).
 * There's an off-by-one error at the end of each named character reference highlight (related to bug 535530).
 * There are no fragment identifiers for line numbers.
 * The pre doesn't get split for faster bidi processing.
 * The toggling syntax highlighting on and off from the menu in the View Source window doesn't work.
 * The check mark on the Wrap Long Lines menu item gets out of sync with the wrapping. (Might be a pre-existing chrome code bug.)
Attachment #518041 - Flags: review?(jonas)
This patch adds highlighting for tokenizer-level errors. Messages explaining the errors are added too. Currently, the View Source windows doesn't support title attribute tooltips, but loading a view-source: URL into a normal browsing context shows the messages on hover.

This part also makes the load of the view-source: document end properly. Logically it would belong in Part 1, but I'm sure there will be more mistakes like that fixed in later parts, because moving them into earlier parts isn't particularly useful.
Attachment #518698 - Flags: review?(jonas)
Whiteboard: not-ready
Part 4 doesn't cover encoding sniffing for XML. Also, there's no entry point for XML view source in part 4. That's coming in part 5.
Attachment #524190 - Flags: review?(jonas)
Forgot printfs in the earlier patch
Attachment #524373 - Flags: review?(jonas)
Attachment #524170 - Attachment is obsolete: true
Attachment #524170 - Flags: review?(jonas)
Moving the last line of FlushChars() inside the |if| block to make results correct.
Attachment #518041 - Attachment is obsolete: true
Attachment #524381 - Flags: review?(jonas)
Attachment #518041 - Flags: review?(jonas)
Part 6 is intentionally optimized for simplicity and readability rather than performance, so the parser always buffers up to 1024 bytes in the XML View Source case before looking for the XML declaration if the is no HTTP-level encoding and no BOM.
Attachment #524615 - Flags: review?(jonas)
Making sure variables get initialized before use.
Attachment #524381 - Attachment is obsolete: true
Attachment #524616 - Flags: review?(jonas)
Attachment #524381 - Flags: review?(jonas)
Refreshing for patch conflicts.
Attachment #524373 - Attachment is obsolete: true
Attachment #524617 - Flags: review?(jonas)
Attachment #524373 - Flags: review?(jonas)
Refreshing for patch conflicts.
Attachment #524190 - Attachment is obsolete: true
Attachment #524618 - Flags: review?(jonas)
Attachment #524190 - Flags: review?(jonas)
Add view sourceness checks to a couple tokenizer errors that can fire on EOF.
Attachment #524617 - Attachment is obsolete: true
Attachment #525064 - Flags: review?(jonas)
Attachment #524617 - Flags: review?(jonas)
Who would be the right person to review part 9? Do I need to request ui-review, too, for this?
Comment on attachment 525663 [details] [diff] [review]
Part 8: Start a new <pre> from time to time

r=me
Attachment #525663 - Flags: review?(bzbarsky) → review+
sicking, FWIW, I think I'm done with the code patches here (except for addressing review comments, of course). I'll still have to figure out how to write reftests for this.

Bug 535530 intentionally left as a separate bug.
Comment on attachment 525699 [details] [diff] [review]
Part 9: Enable tooltips in the View Source window to expose error text when hovering on red markup

Including browser.js for that one fucntion is really overkill, and you can't depend on the browser/ file in toolkit/. I think ideally we'd move this function to a BrowserUtils.jsm under toolkit (there's been talk of adding something like this, to be shared by Firefox/Fennec/SeaMonkey, for a while). If you don't want to take that on for now, perhaps you could just copy the function (or a simplified version, if possible).
Attachment #525699 - Flags: review?(gavin.sharp) → review-
Blocks: 246620
The patch from bug 479959 is needed on top of these patches in order to avoid regressing view-source: URLs that point to plain text, JS or CSS files.
How about this? This duplicates the tooltip preparation method from browser.js (with the fix for bug 358452 applied).
Attachment #525699 - Attachment is obsolete: true
Attachment #531049 - Flags: review?(gavin.sharp)
I forgot to actually support parametrized message formatting earlier. Also, this patch separates multiple messages with a line break rather than a space.
Attachment #531059 - Flags: review?(jonas)
(In reply to comment #26)
> How about this? This duplicates the tooltip preparation method from
> browser.js (with the fix for bug 358452 applied).

This is probably fine as an interim measure if you're looking to get this landed soon, but I filed bug 655747 on support for a cleaner solution that removes the need to copy the code.
(In reply to comment #28)
> (In reply to comment #26)
> > How about this? This duplicates the tooltip preparation method from
> > browser.js (with the fix for bug 358452 applied).
> 
> This is probably fine as an interim measure if you're looking to get this
> landed soon, but I filed bug 655747 on support for a cleaner solution that
> removes the need to copy the code.

Yeah, I'd like to land this as soon as possible (subject to other reviews, of course).
Comment on attachment 531049 [details] [diff] [review]
Part 9: Enable tooltips in the View Source window to expose error text when hovering on red markup, v2

Does this patch need ui-review when it adds a tiny new UI feature? (Previously, error were highlighted in red. With this patch, they are highlighted in red as before *and* they show an error message in the tooltip on hover.)

Should I add something to make the errors focusable or something to make it possible to get at the tooltips without a pointing device?
Attachment #531049 - Flags: ui-review?(limi)
Attachment #531049 - Flags: feedback?(bolterbugz)
(In reply to comment #30)
> Comment on attachment 531049 [details] [diff] [review] [review]
> Part 9: Enable tooltips in the View Source window to expose error text when
> hovering on red markup, v2
> 
> Does this patch need ui-review when it adds a tiny new UI feature?

I think you were right to ping a11y here. I don't know if you need to ping Ux or not -- probably wouldn't hurt.

> (Previously, error were highlighted in red. With this patch, they are
> highlighted in red as before *and* they show an error message in the tooltip
> on hover.)

Nice.

> Should I add something to make the errors focusable or something to make it
> possible to get at the tooltips without a pointing device?

In short, yes please. Can you make the tooltip not appear immediately but instead make sure focus has lingered long enough? Doing it here or as a follow up is fine. (see bug 97223)
Comment on attachment 524616 [details] [diff] [review]
Part 1: Implement the bulk of HTML syntax highlighting using the new parser, v3

Review of attachment 524616 [details] [diff] [review]:
-----------------------------------------------------------------

::: parser/html/nsHtml5Highlighter.cpp
@@ +99,5 @@
> +  NS_ASSERTION(treeOp, "Tree op allocation failed.");
> +  treeOp->Init(nsGkAtoms::html, EmptyString(), EmptyString());
> +
> +  nsHtml5TreeOperation* treeOp2 = mOpQueue.AppendElement();
> +  NS_ASSERTION(treeOp2, "Tree op allocation failed.");

There's really no need to keep asserting this.

::: parser/html/nsHtml5Tokenizer.cpp
@@ +3275,5 @@
>    return pos;
>  }
>  
> +PRInt32 
> +nsHtml5Tokenizer::stateLoopReportTransitions(PRInt32 state, PRUnichar c, PRInt32 pos, PRUnichar* buf, PRBool reconsume, PRInt32 returnState, PRInt32 endPos)

Can you describe the purpose of this function. It's completely undocumented.

::: parser/html/nsHtml5TreeOpExecutor.cpp
@@ +852,5 @@
> +{
> +  if (!mViewSourceBaseURI) {
> +    nsCAutoString spec;
> +    mDocument->GetOriginalURI()->GetSpec(spec);
> +    if (StringBeginsWith(spec, NS_LITERAL_CSTRING("view-source:"))) {

Use SchemeIs instead.

@@ +856,5 @@
> +    if (StringBeginsWith(spec, NS_LITERAL_CSTRING("view-source:"))) {
> +      spec.Assign(Substring(spec, NS_LITERAL_CSTRING("view-source:").Length()));
> +    }
> +    const nsCString& charset = mDocument->GetDocumentCharacterSet();
> +    NS_NewURI(getter_AddRefs(mViewSourceBaseURI), spec, charset.get(), nsnull);

Isn't there some way to get the inner URI out of a nested URI?
(In reply to comment #32)
> Comment on attachment 524616 [details] [diff] [review] [review]
> Part 1: Implement the bulk of HTML syntax highlighting using the new parser,
> v3
> 
> Review of attachment 524616 [details] [diff] [review] [review]:
> -----------------------------------------------------------------
> 
> ::: parser/html/nsHtml5Highlighter.cpp
> @@ +99,5 @@
> > +  NS_ASSERTION(treeOp, "Tree op allocation failed.");
> > +  treeOp->Init(nsGkAtoms::html, EmptyString(), EmptyString());
> > +
> > +  nsHtml5TreeOperation* treeOp2 = mOpQueue.AppendElement();
> > +  NS_ASSERTION(treeOp2, "Tree op allocation failed.");
> 
> There's really no need to keep asserting this.

Is it OK if I address review comments by adding new patches on top of the ones attached to the bug? This would avoid having to recreate the later parts multiple times.

> ::: parser/html/nsHtml5Tokenizer.cpp
> @@ +3275,5 @@
> >    return pos;
> >  }
> >  
> > +PRInt32 
> > +nsHtml5Tokenizer::stateLoopReportTransitions(PRInt32 state, PRUnichar c, PRInt32 pos, PRUnichar* buf, PRBool reconsume, PRInt32 returnState, PRInt32 endPos)
> 
> Can you describe the purpose of this function. It's completely undocumented.

This in an autogenerated version of stateLoop() that is instrumented for reporting tokenizer transitions for View Source highlighting. stateLoopReportTransitions() is run in the View Source case. stateLoop() is run in the normal content loading case, so the View Source instrumentation doesn't run instructions on the normal content loading path.

> ::: parser/html/nsHtml5TreeOpExecutor.cpp
> @@ +852,5 @@
> > +{
> > +  if (!mViewSourceBaseURI) {
> > +    nsCAutoString spec;
> > +    mDocument->GetOriginalURI()->GetSpec(spec);
> > +    if (StringBeginsWith(spec, NS_LITERAL_CSTRING("view-source:"))) {
> 
> Use SchemeIs instead.

OK.

> @@ +856,5 @@
> > +    if (StringBeginsWith(spec, NS_LITERAL_CSTRING("view-source:"))) {
> > +      spec.Assign(Substring(spec, NS_LITERAL_CSTRING("view-source:").Length()));
> > +    }
> > +    const nsCString& charset = mDocument->GetDocumentCharacterSet();
> > +    NS_NewURI(getter_AddRefs(mViewSourceBaseURI), spec, charset.get(), nsnull);
> 
> Isn't there some way to get the inner URI out of a nested URI?

None that I'm aware of.
(In reply to comment #33)
> > Isn't there some way to get the inner URI out of a nested URI?
> 
> None that I'm aware of.

view-source URIs are nsINestedURIs, so you can use the accessor methods there, or NS_GetInnermostURI.
Address review comments and fix bitrot.
Attachment #524616 - Attachment is obsolete: true
Attachment #535305 - Flags: review?(jonas)
Attachment #524616 - Flags: review?(jonas)
Ugh, it really sucks to duplicate that amount of code. I would really like to see this done using templates instead. You can make the tokenizer class take a template argument and keep a member of that type. It can then call into the member each time it needs to instrument.

The normal parser can then instantiate the tokenizer using a template argument class which is all inlined no-oped functions which will make it compile away as to not slow down normal parsing.

The view-source tokenizer can use a template argument class which stores the data that you need just like the generated code.
(In reply to comment #36)
> Ugh, it really sucks to duplicate that amount of code. I would really like
> to see this done using templates instead.

Templates generate separate copies of machine code for instantiations with different template parameters. This code is generated by the Java to C++ translator. There's only one copy of the method in the preferred form for making modifications--i.e. Tokenizer.java. Either way, there ends up being two versions of the function in machine code. (With templates, to avoid duplicating *other* functions as a side effect, the kind of trickery that you mention would be necessary.) I think making the Java to C++ translator hide this by making it generate a template that generates equivalent code inside the C++ compiler would be entirely useless.

More generally, I feel pretty strongly that as long as we have the Java to C++ translator around (and it seems it can stay around just fine), it's not a good use of effort to try to make its output look more human-written.
So long term we really do want to move off of the generator. So we might as well start now as it'll make it easier on us in the future. The generated code is ok for now, but the bulk of us are C++ developers, and so we want the source code to be C++ and not java as to make it easier to make changes.

Couldn't you make the generator generate the templated code? Both for the main tokenizer and the view-source one?

I checked this with jst too and he was fine with this plan.
(In reply to comment #38)
> So long term we really do want to move off of the generator.

I think that's not a good idea, but I'll take that discussion off this bug.
Comment on attachment 531049 [details] [diff] [review]
Part 9: Enable tooltips in the View Source window to expose error text when hovering on red markup, v2

If we're going to go with the code-copying approach for expediency, I think you should copy the minimum needed to get the error tooltips to function.
Attachment #531049 - Flags: review?(gavin.sharp) → review-
This version of the patch adds an early return to nsHtml5TreeBuilder::characters when viewing XML source to avoid characters causing a change to the tree builder stack in that case.
Attachment #524618 - Attachment is obsolete: true
Attachment #537549 - Flags: review?(jonas)
Attachment #524618 - Flags: review?(jonas)
Comment on attachment 537555 [details] [diff] [review]
Part 9: Enable tooltips in the View Source window to expose error text when hovering on red markup, v3

Make the comment a FIXME with a bug # that points to a followup filed to share the code properly - CC me on it and I'll work on a patch.
Attachment #537555 - Flags: review?(gavin.sharp) → review+
Ah, looks like there's an existing bug on file that we can re-use - bug 480356.
sicking, I think the code is now ready for review. I need to tweak the reftests a bit still and will attach tests later.
Attachment #559516 - Flags: review?(jonas)
When fixing bug 675499, I removed code too eagerly. This patch puts back enough code to make the other patches attached to this bug work.
Attachment #559519 - Flags: review?(bzbarsky)
Comment on attachment 559519 [details] [diff] [review]
Part 0.5: Put back some code removed in bug 675499

r=me
Attachment #559519 - Flags: review?(bzbarsky) → review+
Attachment #562037 - Flags: review?(ehsan) → review+
Clearing the confusing whiteboard marking that is an artifact of Firefox 4 triage.
Whiteboard: not-ready
Attachment #518698 - Flags: review?(jonas) → review?(Olli.Pettay)
Attachment #524192 - Flags: review?(jonas) → review?(Olli.Pettay)
Attachment #518698 - Attachment is obsolete: true
Attachment #567076 - Flags: review?(Olli.Pettay)
Attachment #518698 - Flags: review?(Olli.Pettay)
Attachment #524615 - Attachment is obsolete: true
Attachment #567081 - Flags: review?(Olli.Pettay)
Attachment #524615 - Flags: review?(jonas)
Attachment #524192 - Attachment is obsolete: true
Attachment #524192 - Flags: review?(Olli.Pettay)
Attachment #525356 - Attachment is obsolete: true
Attachment #567083 - Flags: review?(Olli.Pettay)
Attachment #525356 - Flags: review?(jonas)
Attachment #531059 - Attachment is obsolete: true
Attachment #567084 - Flags: review?(Olli.Pettay)
Attachment #531059 - Flags: review?(jonas)
Attachment #559516 - Attachment is obsolete: true
Attachment #567086 - Flags: review?(Olli.Pettay)
Attachment #559516 - Flags: review?(jonas)
Attachment #562027 - Flags: review?(jonas) → review?(Olli.Pettay)
Comment on attachment 567081 [details] [diff] [review]
Part 6: Support internal encoding declarations in XML


> nsHtml5StreamParser::FinalizeSniffing(const PRUint8* aFromSegment, // can be null
>                                       PRUint32 aCount,
>                                       PRUint32* aWriteCount,
>                                       PRUint32 aCountToSniffingLimit)
> {

....
>+    if (mSniffingBuffer) {
>+      status = XML_Parse(ud.mExpat,
>+                         reinterpret_cast<const char*>(mSniffingBuffer.get()),
>+                         mSniffingLength,
>+                         PR_FALSE);
>+    }
>+    if (status == XML_STATUS_OK &&
>+        mCharsetSource < kCharsetFromMetaTag &&
>+        aFromSegment) {
>+      status = XML_Parse(ud.mExpat,
>+                         reinterpret_cast<const char*>(aFromSegment),
>+                         aCountToSniffingLimit,
aCountToSniffingLimit or aCount?


>+                         PR_FALSE);
>+    }

This could use some comments. Something which explains why first mSniffingBuffer is handled, and if that
didn't work aFromSegment.
Attachment #567081 - Flags: review?(Olli.Pettay) → review-
> loadAsHtml5 handling is a bit ugly. It is set to false in many different place.
> Any chance to make that all more readable.

It's indeed ugly.

It will be easier to deuglify it once support for the html5.parser.enable pref is removed and once support for bringing back old View Source using a #define is removed. I believe the Rapid Release rules require the #define to stick around for a while.

Are you OK with the deuglification happening later when the old code paths are removed from the codebase?

> +                    // XXX reorder point
> I wonder what this means

It means that control never falls through between two switch cases across that comment, so sequences of cases separated by that comment could be reordered relative to other such sequences of cases.

> +// The old code had a limit of 16 tokens. 1300 is a number picked my measuring
> +// the size of 16 tokens on cnn.com.
> +#define NS_HTML5_HIGHLIGHTER_PRE_BREAK_THRESHOLD 1300
> So, since at least some bidi perf problems have been fixed, make sure
> this is still needed. It would be great to get rid of this kind of limitation.
> (But I do think there are still open issues related to huge text node)

The follow-up bug is 695640.

> +nsHtml5Highlighter::Start()
> +{
> +  // Doctype
> +  mOpQueue.AppendElement()->Init(nsGkAtoms::html, EmptyString(), EmptyString());
> +
> +  mOpQueue.AppendElement()->Init(STANDARDS_MODE);
> +
> +  nsIContent** root = CreateElement(nsHtml5Atoms::html, nsnull);
> +  mOpQueue.AppendElement()->Init(eTreeOpAppendToDocument, root);
> +  mStack.AppendElement(root);
> +
> +  Push(nsGkAtoms::head, nsnull);
> I really wish there was a stack class for Push/Pop when they happen
> inside a method.

There are only three cases where that would be applicable. Also, I think having to add braces to make } implicitly pop would be much less obvious than the current code.

> Also, could there be some debug code for Pop() so ensure that Push and Pop
> are always used correctly. Currently Push/Pop feels quite error prone to me.

That would complicate the code and wouldn't buy much safety. The stack nodes currently don't know what element they represent. To even get the level of checking where push and pop are checked to try to push and pop the same element name, the stack would have to hold classes that hold element names solely for debugging purposes. Furthermore, even if such names were added, there are enough spans around that checking that a span was pushed and a span was popped doesn't indicate much about code correctness.

OTOH, the stack does check for underflows. If there really was an unmatched Pop(), the stack would underflow pretty quickly and the underflow assert would fire.

> +  nsHtml5HtmlAttributes* linkAttrs = new nsHtml5HtmlAttributes(0);
> +  nsString* rel = new nsString(NS_LITERAL_STRING("stylesheet"));
> +  linkAttrs->addAttribute(nsHtml5AttributeName::ATTR_REL, rel);
> I guess this is just the way string handling works in html5 parser atm.
> I hope there are bugs filed to fix it to be less-malloc happy, and
> even more importantly, less error prone by not using nsString*

This way of dealing with strings was OKed when the HTML5 parser was being developed. Bug 497818 is on file, though.

> +nsHtml5Highlighter::MaybeLinkifyAttributeValue(nsHtml5AttributeName* aName,
> +                                               nsString* aValue)
> +{
> +  if (!(nsHtml5AttributeName::ATTR_HREF == aName
> +      || nsHtml5AttributeName::ATTR_SRC == aName
> +      || nsHtml5AttributeName::ATTR_ACTION == aName
> +      || nsHtml5AttributeName::ATTR_CITE == aName
> +      || nsHtml5AttributeName::ATTR_BACKGROUND == aName
> +      || nsHtml5AttributeName::ATTR_LONGDESC == aName
> +      || nsHtml5AttributeName::ATTR_XLINK_HREF == aName
> +      || nsHtml5AttributeName::ATTR_DEFINITIONURL == aName)) {
> 
> || should be at the end of a line.

OK.

> +nsIContent**
> +nsHtml5Highlighter::AllocateContentHandle()
> +{
> +  if (mHandlesUsed == NS_HTML5_HIGHLIGHTER_HANDLE_ARRAY_LENGTH) {
> +    mOldHandles.AppendElement(mHandles.forget());
> +    mHandles = new nsIContent*[NS_HTML5_HIGHLIGHTER_HANDLE_ARRAY_LENGTH];
> This looks strange. Is there a reason why you couldn't use nsTArray?

The plain arrays allocated here will have pointers from elsewhere pointing to them, it's of utmost importance that the arrays don't move in memory. nsTArray does not guarantee the validity of pointers to its backing array over time, because nsTArray may allocate a new backing array and copy the old backing array contents into it. nsTArray would be a potentially harmful abstraction here.

Also, the handle backing memory is always allocated in chunks that are NS_HTML5_HIGHLIGHTER_HANDLE_ARRAY_LENGTH in size, so there's no need for an nsTArray here.

> +void
> +nsHtml5Highlighter::AppendCharacters(const PRUnichar* aBuffer,
> +                                     PRInt32 aStart,
> +                                     PRInt32 aLength)
> +{
> +  NS_PRECONDITION(aBuffer, "Null buffer");
> +
> +  PRUnichar* bufferCopy = new PRUnichar[aLength];
> +  memcpy(bufferCopy, aBuffer + aStart, aLength * sizeof(PRUnichar));
> +
> +  mOpQueue.AppendElement()->Init(eTreeOpAppendText,
> +                                 bufferCopy,
> +                                 aLength,
> +                                 CurrentNode());
> Again, string handling is rather error prone. I'd wish that all would
> happen using nsString.
> But, if HTML5 parser relies on such string data handling, better to
> fix it in a followup bug.

I think there shouldn't be a follow-up bug to change anything here.

The arguments to this method index into a buffer that participates in the tokenizer internals. Changing the tokenizer internals to use nsStrings instead of nsHtml5UTF16Buffers would do more damage to maintainability of the Gecko-independent tokenizer core than the use of nsStrings would be worth. (Also worth noting that the tokenizer internals get tested as Java and when testing as Java, the JVM checks for array boundaries, so there's a very good reason to believe that the hot code in the tokenizer that indexes into arrays is bound-wise correct.)

The tree op doesn't carry an nsString, because having nsString members on tree ops would bloat the tree ops unnecessarily if not using unions and using unions with typed that have destructors would be just asking for trouble.

In any case, changing any of this would be way out of scope for implementing View Source.

> +    /**
> +     * Starts a span with no class.
> +     */
> +    void StartSpan();
> +
> +    /**
> +     * Starts a <span> and sets the class attribute on it.
> +     *
> +     * @param aClass the class to set (MUST be a static string that does not
> +     *        need to be released!)
> +     */
> +    void StartSpan(const PRUnichar* aClass);
> +
> +    /**
> +     * End the current <span> or <a>.
> +     */
> +    void EndInline();
> EndSpanOrA() perhaps?

OK.

> +    /**
> +     * Finishes a source tag being highlighted by closing the open <span> and
> +     * <a> elements.
> +     */
> +    void FinishTag();
> Please document what "End*" means vs. Finish*

OK.

> +    /**
> +     * The element stack.
> +     */
> +    nsTArray<nsIContent**> mStack;
> There should be some comment why ** and why not just *.
> I need to still understand why it is ok to have nsIContent**. It _feels_
> very error prone.

Efficient multi-threaded shared memory code is error prone. :-(

Using nsIContent** instead of nsIContent* is not specific to View Source. It's how the HTML parser deals with DOM nodes in a way that works off the main thread. Non-main-thread code can't refcount or otherwise touch nsIContent objects in any way. Yet, the off-the-main-thread code needs to have a way to hold onto a particular node and repeatedly operate on the same node.

The way this works is that the off-the-main-thread code has an nsIContent** for each DOM node and a given nsIContent** is only ever actually deferenced into an actual nsIContent* on the main thread. When the off-the-main-thread code requests a new node, it gets an nsIContent** immediately and a tree op is enqueued for later allocating an actual nsIContent object and writing a pointer to it into the memory location pointed to by the nsIContent**.

Since tree ops are in a queue, the node creating tree op will always run before tree ops that try to further operate on the node that the nsIContent** is a handle to.

(On-the-main-thread parts of the parser use the same mechanism in order to avoid having to have duplicate code paths for on-the-main-thread and off-the-main-thread tree builder instances.)

I'll add documentation for this.

> +    static const PRUnichar kISO88591[] =
> +        { 'I', 'S', 'O', '-', '8', '8', '5', '9', '-', '1', '\0' };
> 
> Would NS_NAMED_LITERAL_STRING work here?

I suppose it would work, but wouldn't it be less efficient?

-    PRUint32 countToSniffingLimit = NS_HTML5_STREAM_PARSER_SNIFFING_BUFFER_SIZE - mSniffingLength;
-    nsHtml5ByteReadable readable(aFromSegment, aFromSegment + countToSniffingLimit);
+    PRUint32 countToSniffingLimit =
+        NS_HTML5_STREAM_PARSER_SNIFFING_BUFFER_SIZE - mSniffingLength;
+    if (mMode == NORMAL || mMode == VIEW_SOURCE_HTML) {
+      nsHtml5ByteReadable readable(aFromSegment, aFromSegment
+          + countToSniffingLimit);
'+' should be in the previous line

OK.

> +  // not the last buffer
> +  if (mMode == NORMAL || mMode == VIEW_SOURCE_HTML) {
> Perhaps you could store somewhere that mode is normal or html-viewsource

I suppose I can do that, but does that kind of optimization make the code better? That is, wouldn't it make the code less obvious to read?

> +  if (mMode == VIEW_SOURCE_HTML || mMode == VIEW_SOURCE_XML) {
> +    mTokenizer->FlushViewSource();
> +  }
> Would it be useful to have some member variable to indicate if
> mMode is either VIEW_SOURCE_HTML or VIEW_SOURCE_XML...
> or maybe not.

I think it more obvious what's happening when it's written out explicitly.

> +template<class P>
> It is not clear to me why 'P'.

P stands for policy. It's a single letter, because the template parameter is typically a single capital letter.

> Could you use some
> name which actually describes the type somehow

Would "Policy" be better than "P".

> Or at least describe here P somehow.

That would need special-casing in the generator, but it's doable.

> (Or is this all perhaps generated code - didn't verify that)

This is generated, yes.

> +struct nsHtml5SilentPolicy

> +struct nsHtml5ViewSourcePolicy

> Could you add some comments here.

OK.

> Why the naming "Policy". Could *TransitionHandler or *TransitionBroker work?

Policy is appropriate terminology here. See: http://en.wikipedia.org/wiki/Policy-based_design

> +              if (!!NS_UNLIKELY(mViewSource) && !isCurrent(nsHtml5Atoms::table)) {
> Why !!
> Same thing also elsewhere

This is generated code and the generator doesn't try to be too fancy about the context of a subexpression. However, it does try to avoid |foo != nsnull| or |foo != 0| comparisons in C++. In the general case, the right way to translate such expressions into PRBool-era Mozilla C++ was to use |!!foo| instead of just |foo| in order to cover cases where the value of the expression gets assigned to a PRBool variable.

Let's deal with this as a follow-up, since it's a broader code generation issue. I filed bug 696735.

> +nsIURI*
> +nsHtml5TreeOpExecutor::GetViewSourceBaseURI()
> +{
> +  if (!mViewSourceBaseURI) {
> +    nsCOMPtr<nsIURI> orig = mDocument->GetOriginalURI();
> +    bool isViewSource;
> = false;
> Or check that SchemeIs succeeds.

OK.

Thank you.
Attached patch Part 14: Address review comments (obsolete) — Splinter Review
(In reply to Olli Pettay [:smaug] from comment #63)
> Comment on attachment 567081 [details] [diff] [review] [diff] [details] [review]
> Part 6: Support internal encoding declarations in XML
> 
> 
> > nsHtml5StreamParser::FinalizeSniffing(const PRUint8* aFromSegment, // can be null
> >                                       PRUint32 aCount,
> >                                       PRUint32* aWriteCount,
> >                                       PRUint32 aCountToSniffingLimit)
> > {
> 
> ....
> >+    if (mSniffingBuffer) {
> >+      status = XML_Parse(ud.mExpat,
> >+                         reinterpret_cast<const char*>(mSniffingBuffer.get()),
> >+                         mSniffingLength,
> >+                         PR_FALSE);
> >+    }
> >+    if (status == XML_STATUS_OK &&
> >+        mCharsetSource < kCharsetFromMetaTag &&
> >+        aFromSegment) {
> >+      status = XML_Parse(ud.mExpat,
> >+                         reinterpret_cast<const char*>(aFromSegment),
> >+                         aCountToSniffingLimit,
> aCountToSniffingLimit or aCount?

aCountToSniffingLimit. If aCount was passed, the total number of bytes sniffed would depend on the network buffer boundaries.

> >+                         PR_FALSE);
> >+    }
> 
> This could use some comments. Something which explains why first
> mSniffingBuffer is handled, and if that
> didn't work aFromSegment.

Added a comment and addressed the review comments I said "OK" to in my previous comment in the attachment (part 14).

I hope it's OK to address the review comments in a patch that applies on top of the previous parts instead of having to tweak the earlier parts and deal with merge conflicts.
Attachment #569038 - Flags: review?(Olli.Pettay)
Comment on attachment 569038 [details] [diff] [review]
Part 14: Address review comments


>     /**
>-     * Finishes a source tag being highlighted by closing the open <span> and
>-     * <a> elements.
>+     * Finishes highlighting an input by closing the open <span> and
>+     * <a> elements in the highlighter output.
>      */
>-    void FinishTag();
>+    void FinishHighlightingInputTag();

This looks a bit strange. There isn't <input> anywhere. I think the original method name is better,
especially if you still clarify how Finish* is different from *End.
The description could somehow mention also that the method calls StartCharacters().
Attachment #569038 - Flags: review?(Olli.Pettay) → review+
Comment on attachment 567083 [details] [diff] [review]
Part 7: Add an id for each line, rebased


>+    inline void InitAddLineNumberId(nsIContent** aNode,
>+                                    const PRInt32 aLineNumber) {
Nit, looks like the file uses non-gecko coding style. { should be in the next line.
But better to be consistent with the rest of the file.
Attachment #567083 - Flags: review?(Olli.Pettay) → review+
Attachment #567084 - Flags: review?(Olli.Pettay) → review+
Comment on attachment 567079 [details] [diff] [review]
Part 5: Add entry point for XML view source, make it easy to disable this new view source implementation using a #define, rebased

># HG changeset patch
># Parent be84560121a19c300dcd47cc85f50de98676d519
> nsHTMLDocument::StartDocumentLoad(const char* aCommand,
>                                   nsIChannel* aChannel,
>                                   nsILoadGroup* aLoadGroup,
>                                   nsISupports* aContainer,
>                                   nsIStreamListener **aDocListener,
>                                   bool aReset,
>                                   nsIContentSink* aSink)
> {
>-  bool loadAsHtml5 = nsHtml5Module::sEnabled;
>+  bool viewSource = aCommand && !nsCRT::strcmp(aCommand, "view-source") &&
>+    NS_USE_NEW_VIEW_SOURCE;
>+  bool loadAsHtml5 = nsHtml5Module::sEnabled || viewSource;
>+
>   if (aSink) {
>     loadAsHtml5 = PR_FALSE;
>   }
> 
>   nsCAutoString contentType;
>   aChannel->GetContentType(contentType);
> 
>-  if (contentType.Equals("application/xhtml+xml") &&
>-      (!aCommand || nsCRT::strcmp(aCommand, "view-source") != 0)) {
>+  if (contentType.Equals("application/xhtml+xml") && !viewSource) {
>     // We're parsing XHTML as XML, remember that.
> 
>     mIsRegularHTML = PR_FALSE;
>     mCompatMode = eCompatibility_FullStandards;
>     loadAsHtml5 = PR_FALSE;
>   }
> #ifdef DEBUG
>   else {
>     NS_ASSERTION(mIsRegularHTML,
>                  "Hey, someone forgot to reset mIsRegularHTML!!!");
>   }
> #endif
>   
>-  if (loadAsHtml5 && 
>-      !(contentType.EqualsLiteral("text/html") && 
>-        aCommand && 
>-        (!nsCRT::strcmp(aCommand, "view") ||
>-         !nsCRT::strcmp(aCommand, "view-source")))) {
>+  if (loadAsHtml5 && !viewSource && !(contentType.EqualsLiteral("text/html") &&
>+      aCommand && !nsCRT::strcmp(aCommand, "view"))) {
>     loadAsHtml5 = PR_FALSE;
>   }

As I mentioned, perhas loadAsHtml5 could be simpler, but not a big deal.
Attachment #567079 - Flags: review?(Olli.Pettay) → review+
Comment on attachment 562027 [details] [diff] [review]
Part 12: Reftests

rs
Attachment #562027 - Flags: review?(Olli.Pettay) → review+
Comment on attachment 567078 [details] [diff] [review]
Part 4: Add XML highlighting support to parser core, v2, rebased

> nsHtml5Tokenizer::emitCurrentTagToken(bool selfClosing, PRInt32 pos)
> {
>   cstart = pos + 1;
>   maybeErrSlashInEndTag(selfClosing);
>   stateSave = NS_HTML5TOKENIZER_DATA;
>   nsHtml5HtmlAttributes* attrs = (!attributes ? nsHtml5HtmlAttributes::EMPTY_ATTRIBUTES : attributes);
>   if (endTag) {
Is endTag defined in the class? Seems like so. I really wish the code was using some coding style which
makes it clear what kind of variable is being used: parameter, static variable, local variable.
Please file a followup to fix the coding style in .java files.
Attachment #567078 - Flags: review?(Olli.Pettay) → review+
(In reply to Olli Pettay [:smaug] from comment #70)
> Please file a followup to fix the coding style in .java files.

The .java files comply with the Java coding conventions. It would be inappropriate to use Gecko C++ code style in the .java files.

When I was writing the translator, I was specifically instructed not to waste time on making the translator munge the names into more Gecko-like forms in the C++ output.
Comment on attachment 567086 [details] [diff] [review]
Part 11: Deduplicate the tokenizer loop in .cpp, rebased

As discussed on IRC, s/P::viewingSource/P::reportErrors/

this is pretty much all mechanical changes.
And note to myself, the policy naming comes from http://en.wikipedia.org/wiki/Policy-based_design
Attachment #567086 - Flags: review?(Olli.Pettay) → review+
Comment on attachment 567076 [details] [diff] [review]
Part 2: Highlight tokenizer-level errors, rebased


>+nsHtml5Tokenizer::errNcrControlChar(PRUnichar ch)
>+{
>+  if (mViewSource) {
>+    mViewSource->AddErrorToCurrentNode("errNcrControlChar");
>+  }
>+  return ch;
>+}
>+
>+void
>+nsHtml5Tokenizer::errGarbageAfterLtSlash()
>+{
>+  mViewSource->AddErrorToCurrentNode("errGarbageAfterLtSlash");
>+}
You're inconsistently adding null checks for mViewSource.
Please null check always. Caller shouldn't need to guess if there is
a null check or not.

>+    case eTreeOpAddError: {
>+      nsIContent* node = *(mOne.node);
>+      char* msgId = mTwo.charPtr;
>+      nsAutoString klass;
>+      node->GetAttr(kNameSpaceID_None, nsGkAtoms::_class, klass);
>+      if (!klass.IsEmpty()) {
>+        klass.Append(NS_LITERAL_STRING(" error"));
I think it would be nice to have all the different class name defined somewhere higher up,
or perhaps even in some header file using #define.
Would that be possible? But not a big deal.
Attachment #567076 - Flags: review?(Olli.Pettay) → review+
Comment on attachment 567074 [details] [diff] [review]
Part 1: Implement the bulk of HTML syntax highlighting using the new parser, v4, rebased

>+     *
>+     * @param aCommand the parser command (Yeah, this is bad API design. Let's
>+     * make this better when retiring nsIParser)
>      */
>-    virtual void MarkAsNotScriptCreated();
>+    virtual void MarkAsNotScriptCreated(const char* aCommand);
File a followup


>+nsHtml5Tokenizer::stateLoopReportTransitions(PRInt32 state, PRUnichar c, PRInt32 pos, PRUnichar* buf, bool reconsume, PRInt32 returnState, PRInt32 endPos)
note to myself, this is removed in some other patch. Luckily, since most of this patch is about that method.


>+nsIURI*
>+nsHtml5TreeOpExecutor::GetViewSourceBaseURI()
>+{
>+  if (!mViewSourceBaseURI) {
>+    nsCOMPtr<nsIURI> orig = mDocument->GetOriginalURI();
>+    bool isViewSource;
>+    orig->SchemeIs("view-source", &isViewSource);
>+    if (isViewSource) {
>+      nsCOMPtr<nsINestedURI> nested = do_QueryInterface(orig);
>+      NS_ASSERTION(nested, "URI with scheme view-source didn't QI to nested!");
>+      nested->GetInnerURI(getter_AddRefs(mViewSourceBaseURI));
>+    } else {
>+      mViewSourceBaseURI = orig;
>+    }
>+  }
>+  return mViewSourceBaseURI;
>+}
In which case view source uri scheme is not
view-source? Could you add some comment here.



>+      // Reuse the fix for bug 467852
>+      // URLs that execute script (e.g. "javascript:" URLs) should just be
>+      // ignored.  There's nothing reasonable we can do with them, and allowing
>+      // them to execute in the context of the view-source window presents a
>+      // security risk.  Just return the empty string in this case.
>+      bool openingExecutesScript = false;
>+      rv = NS_URIChainHasFlags(uri,
>+                               nsIProtocolHandler::URI_OPENING_EXECUTES_SCRIPT,
>+                               &openingExecutesScript);
>+      NS_ENSURE_SUCCESS(rv, NS_OK);
>+      if (openingExecutesScript) {
>+        return NS_OK;
>+      }
Maybe
if (NS_FAILED(rv) || openingExecutesScript) {
  return NS_
}
or
NS_ENSURE_TRUE(NS_SUCCEEDED(rv) && !openingExecutesScript, NS_OK)


looks good to me. Such a huge patch for fortunately not much to comment.
Attachment #567074 - Flags: review?(Olli.Pettay) → review+
Comment on attachment 567077 [details] [diff] [review]
Part 3: Highlight tree builder-level errors, with more null checks, rebased


>                                         case HTML:
>+                                            // XXX figure out a way to report this in the Gecko View Source case
>                                             err("Non-space characters found without seeing a doctype first. Expected \u201C<!DOCTYPE html>\u201D.");
File a followup please


>-                            err("Start tag seen without seeing a doctype first. Expected \u201C<!DOCTYPE html>\u201D.");
>+                            // ]NOCPP]
>+                            errStartTagWithoutDoctype();
>+                            // [NOCPP[

Why NOCPP?
Actually, could you perhaps add a comment to each NOCPP case you're adding.



>+++ b/parser/html/nsHtml5AtomList.h
I hope there is a bug filed to merge this with nsGkAtoms


>+            if ((returnState & NS_HTML5TOKENIZER_DATA_AND_RCDATA_MASK)) {
> 
>+            } else {
>+
>+            }
Er, what? 


Again, huge patch, but in this case most of it is actually context.


Could you please create a new patch which contains all the changes, and review comments - something which applies to trunk.
And upload that to tryserver so that people (well, at least I) could test it a bit.

I could then re-review that patch. (hopefully a relatively fast review.)

I still wonder about testing...
Attachment #567077 - Flags: review?(Olli.Pettay) → review+
(In reply to Henri Sivonen (:hsivonen) from comment #71)
> When I was writing the translator, I was specifically instructed not to
> waste time on making the translator munge the names into more Gecko-like
> forms in the C++ output.

Had that person seen what the generated code would look like? Did that person know that translation would not be integrated in the normal build process?

I think we have a lot more context to judge what the right thing to do is now, than we did back when a lot of these discussions were had.

Please remember that there will come a day when you will no longer be working on the parser and it will be handed to someone else. And that person will want to make big sweeping changes to the parser. Be that to fix bugs, add new features, improve performance or due to making gecko-wide changes (like the recent PRBool->bool migration).

At that point this person will have to decide to either
1. Learn the translator toolchain, learn the translator code, learn the java code and make the changes to the javacode and/or translator.
or
2. Learn the generated C++ code and make the changes to the generated C++ code.

There will be little to no incentive incentive to keep the javacode in par with the C++ code beyond the minimum needed to generate the gecko code.

I strongly suspect that this person will choose to work on the C++ code directly.

The harder it is for this person to work with the C++ code directly, the more likely it is that this person will introduce bugs which will permanent in HTML parsing algorithms over time. After all, bugs like that is what got us to the current far from optimal HTML parsing rules.

I understand the desire to work on the javacode and not worry about the C++ code. But please do keep in mind what the long term effects this will have.
(In reply to Olli Pettay [:smaug] from comment #73)
> You're inconsistently adding null checks for mViewSource.
> Please null check always. Caller shouldn't need to guess if there is
> a null check or not.

I carefully avoided adding null checks to those methods that are only ever called from the View Source version of the tokenizer loop.

Adding null checks to the methods I didn't add them to would be a waste of CPU cycles and code size.

Do you want me to add null checks anyway just in case?

(In reply to Jonas Sicking (:sicking) from comment #76)
> (In reply to Henri Sivonen (:hsivonen) from comment #71)
> > When I was writing the translator, I was specifically instructed not to
> > waste time on making the translator munge the names into more Gecko-like
> > forms in the C++ output.
> 
> Had that person seen what the generated code would look like?

IIRC, you (and jst) hadn't seen the generated code at that point, but it was clear that not munging the names would mean that
 1) Method names would not be capitalized.
 2) Arguments wouldn't be of form aFoo.
 3) Fields wouldn't be of form mFoo.

I'm not sure if anyone thought of sFoo statics back then.

> Did that
> person know that translation would not be integrated in the normal build
> process?

I don't recall.
(In reply to Henri Sivonen (:hsivonen) from comment #77)
> Do you want me to add null checks anyway just in case?
Yes, for consistency. Non of that code is performance-vice hot.
Assignee: hsivonen → nobody
(In reply to Olli Pettay [:smaug] from comment #66)
> This looks a bit strange. There isn't <input> anywhere. I think the original
> method name is better,
> especially if you still clarify how Finish* is different from *End.
> The description could somehow mention also that the method calls
> StartCharacters().

Changed again.

(In reply to Olli Pettay [:smaug] from comment #72)
> As discussed on IRC, s/P::viewingSource/P::reportErrors/

Done.

(In reply to Olli Pettay [:smaug] from comment #73)
> >+    case eTreeOpAddError: {
> >+      nsIContent* node = *(mOne.node);
> >+      char* msgId = mTwo.charPtr;
> >+      nsAutoString klass;
> >+      node->GetAttr(kNameSpaceID_None, nsGkAtoms::_class, klass);
> >+      if (!klass.IsEmpty()) {
> >+        klass.Append(NS_LITERAL_STRING(" error"));
> I think it would be nice to have all the different class name defined
> somewhere higher up,
> or perhaps even in some header file using #define.
> Would that be possible? But not a big deal.

Added comments pointing to the CSS file.

(In reply to Olli Pettay [:smaug] from comment #74)
> Comment on attachment 567074 [details] [diff] [review] [diff] [details] [review]
> Part 1: Implement the bulk of HTML syntax highlighting using the new parser,
> v4, rebased
> 
> >+     *
> >+     * @param aCommand the parser command (Yeah, this is bad API design. Let's
> >+     * make this better when retiring nsIParser)
> >      */
> >-    virtual void MarkAsNotScriptCreated();
> >+    virtual void MarkAsNotScriptCreated(const char* aCommand);
> File a followup

Filed bug 698442.

> >+nsIURI*
> >+nsHtml5TreeOpExecutor::GetViewSourceBaseURI()
> >+{
> >+  if (!mViewSourceBaseURI) {
> >+    nsCOMPtr<nsIURI> orig = mDocument->GetOriginalURI();
> >+    bool isViewSource;
> >+    orig->SchemeIs("view-source", &isViewSource);
> >+    if (isViewSource) {
> >+      nsCOMPtr<nsINestedURI> nested = do_QueryInterface(orig);
> >+      NS_ASSERTION(nested, "URI with scheme view-source didn't QI to nested!");
> >+      nested->GetInnerURI(getter_AddRefs(mViewSourceBaseURI));
> >+    } else {
> >+      mViewSourceBaseURI = orig;
> >+    }
> >+  }
> >+  return mViewSourceBaseURI;
> >+}
> In which case view source uri scheme is not
> view-source? Could you add some comment here.

Added a comment. I really can't recall if I did that just to fail gracefully or if I actually saw a case where the scheme isn't view-source.

> >+      // Reuse the fix for bug 467852
> >+      // URLs that execute script (e.g. "javascript:" URLs) should just be
> >+      // ignored.  There's nothing reasonable we can do with them, and allowing
> >+      // them to execute in the context of the view-source window presents a
> >+      // security risk.  Just return the empty string in this case.
> >+      bool openingExecutesScript = false;
> >+      rv = NS_URIChainHasFlags(uri,
> >+                               nsIProtocolHandler::URI_OPENING_EXECUTES_SCRIPT,
> >+                               &openingExecutesScript);
> >+      NS_ENSURE_SUCCESS(rv, NS_OK);
> >+      if (openingExecutesScript) {
> >+        return NS_OK;
> >+      }
> Maybe
> if (NS_FAILED(rv) || openingExecutesScript) {
>   return NS_
> }

Done.

(In reply to Olli Pettay [:smaug] from comment #75)
> Comment on attachment 567077 [details] [diff] [review] [diff] [details] [review]
> Part 3: Highlight tree builder-level errors, with more null checks, rebased
> 
> 
> >                                         case HTML:
> >+                                            // XXX figure out a way to report this in the Gecko View Source case
> >                                             err("Non-space characters found without seeing a doctype first. Expected \u201C<!DOCTYPE html>\u201D.");
> File a followup please

Bug 698445.

> >-                            err("Start tag seen without seeing a doctype first. Expected \u201C<!DOCTYPE html>\u201D.");
> >+                            // ]NOCPP]
> >+                            errStartTagWithoutDoctype();
> >+                            // [NOCPP[
> 
> Why NOCPP?
> Actually, could you perhaps add a comment to each NOCPP case you're adding.

Well, in this case, I broke out the error reporting from a pre-existing NOCPP block, so I don't know what comment I should add.

Basically, there's no support ATM for tree builder errors that would require attaching the error to a the text content of an element or to the EOF. In the EOF case, there's simply no corresponding text to highlight. I actually can't remember what exactly my rationale was for not supporting attaching tree builder errors to text nodes. There must have been a good reason, but it's been many months since I wrote this code.

Added general comments to this effect instead of commenting on each NOCPP block.

Filed bug 698448 and bug 698450.

> >+++ b/parser/html/nsHtml5AtomList.h
> I hope there is a bug filed to merge this with nsGkAtoms

Already filed as bug 539435.

> >+            if ((returnState & NS_HTML5TOKENIZER_DATA_AND_RCDATA_MASK)) {
> > 
> >+            } else {
> >+
> >+            }
> Er, what? 

Fixed in a later patch in the series of parts.

> Could you please create a new patch which contains all the changes, and
> review comments - something which applies to trunk.
> And upload that to tryserver so that people (well, at least I) could test it
> a bit.

Attaching a patch that addresses all your review comments so far. I'll attach another all-in-one patch shortly.

(In reply to Olli Pettay [:smaug] from comment #78)
> (In reply to Henri Sivonen (:hsivonen) from comment #77)
> > Do you want me to add null checks anyway just in case?
> Yes, for consistency. Non of that code is performance-vice hot.

Added as NS_LIKELY so that I can tell apart the null-checks that aren't really needed without reauditing the code in case we don't want the checks after all at some future date.
Assignee: nobody → hsivonen
Attachment #569038 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #570730 - Flags: review?(Olli.Pettay)
Would it be possible to improve xhtml handling.
about:home is "valid" XHTML, but view-source does give some errors.

Though, I guess inline doctype+entities is not that common.

Anyway, that is for a possible followup, since the current view-source isn't really better in that
case.
Comment on attachment 570730 [details] [diff] [review]
Part 14: Address review comments, v2

>+++ b/parser/html/nsHtml5TreeOpExecutor.cpp
>@@ -900,16 +900,18 @@ nsHtml5TreeOpExecutor::GetViewSourceBase
>     nsCOMPtr<nsIURI> orig = mDocument->GetOriginalURI();
>     bool isViewSource;
>     orig->SchemeIs("view-source", &isViewSource);
>     if (isViewSource) {
>       nsCOMPtr<nsINestedURI> nested = do_QueryInterface(orig);
>       NS_ASSERTION(nested, "URI with scheme view-source didn't QI to nested!");
>       nested->GetInnerURI(getter_AddRefs(mViewSourceBaseURI));
>     } else {
>+      // Fail gracefully if the base URL isn't a view-source: URL.
>+      // Not sure if this can ever happen.
>       mViewSourceBaseURI = orig;

File a followup to sort this out.
Attachment #570730 - Flags: review?(Olli.Pettay) → review+
Thank you.

Part 6 is marked as r-. Does part 14 sufficiently address the lack of comments in part 6? That is, can I land these patches now?
(In reply to Olli Pettay [:smaug] from comment #83)
> File a followup to sort this out.

Filed bug 698691.
(In reply to Olli Pettay [:smaug] from comment #82)
> Would it be possible to improve xhtml handling.
> about:home is "valid" XHTML, but view-source does give some errors.
> 
> Though, I guess inline doctype+entities is not that common.

The cost of supporting correct highlighting for internal subsets is radically disproportionate to the utility (i.e. how often people view source on Web pages that are XML, have internal subsets and would benefit from correct highlighting). It's a bit embarrassing that this shows on about:home, but about:home isn't a Web page and is really weird compared to typical Web pages.

It might be easier to improve the highlighting of entity references in XML, but you can't have custom entity references that work on the Web without an internal subset anyway.

If, in the future, we replace expat with an XML5 parser derived from the HTML5 tokenizer we use for HTML, we'll get these features "for free".

> Anyway, that is for a possible followup, since the current view-source isn't
> really better in that case.

Right. Filed bug 698693 and bug 698694.
Blocks: 698693
If we can't turn off error highlighting when doing view-source on non-HTML contents, shouldn't we turn off all highlighting. It seems very bad if we indicate errors where there are none.
I.e. shouldn't we turn off highlighting for resources which we wouldn't pass through the HTML parser if we weren't view-sourcing it? Of course we should keep the highlighting for HTML content.
(In reply to Jonas Sicking (:sicking) from comment #87)
> If we can't turn off error highlighting when doing view-source on non-HTML
> contents, shouldn't we turn off all highlighting. It seems very bad if we
> indicate errors where there are none.

Doing some error highlighting including some incorrect error highlighting is for parity with old View Source. It would be possible to turn off error highlights in XML View Source while keeping the rest of syntax highlighting for XML. (Internal subsets would still be mishighlighted but not highlighted as errors.)

But, frankly, if someone is using internal subsets and custom entities *on the Web*, they deserve some red highlights. :-) about:home is a very special case, because it's loaded in Firefox often (and is there as a View Source bait) but doesn't come from a Web server.

(In reply to Jonas Sicking (:sicking) from comment #88)
> I.e. shouldn't we turn off highlighting for resources which we wouldn't pass
> through the HTML parser if we weren't view-sourcing it? Of course we should
> keep the highlighting for HTML content.

The highlighting is used only for HTML and XML resources. Other resources are treated as plain text. Just like before.

If the new implementation didn't try best-effort highlighting of XML, it would fail feature parity with the old implementation which also used the HTML tokenizer to highlight XML and, therefore, failed with internal subsets.

Throwing out the highlighting for XML because it can't do internal subsets would be a pretty serious baby&bathwater case. Especially when the old implementation couldn't highlight internal subsets, either, for the exact same reason.
I'm not sure what you mean by being on par with the old highlighter. The old highliter didn't highlight errors, did it?
(In reply to Jonas Sicking (:sicking) from comment #90)
> I'm not sure what you mean by being on par with the old highlighter. The old
> highliter didn't highlight errors, did it?

It did highlight some errors (though nowhere near as many as this new impl).

Navigate about:home without these patches and View Source. You'll see some red there in the doctype.
Jonas, I suggest I land these patches first and when you've experimented with the new implementation first-hand, we can deal with problems in follow-up bugs. 

Possible solutions include: not doing anything, disabling specific tokenizer error highlights in the XML case, disabling all error highlights in the XML case or disabling syntax highlighting for about:, chrome: and res: URLs. But let's not try to decide on what constitutes a problem and what the right solution is before you've experimented with a build and let's not keep these patches bitrotting until things are perfect.
(In reply to Henri Sivonen (:hsivonen) from comment #84)
> Thank you.
> 
> Part 6 is marked as r-. Does part 14 sufficiently address the lack of
> comments in part 6? That is, can I land these patches now?

You can land the patches.

And btw, I like the improved error reporting in view source :)
Could the owner of this bug take a look at dev-l10n? It would be much appreciated (and I think the discussion would be too long for a closed bug).
https://lists.mozilla.org/listinfo/dev-l10n
The highlightings for &lt;, &quot;, etc. are wrong in 20111102 build. I guess it is due to this bugfix.
In addition, the title bar does not specify which page the source comes from.
(In reply to Fanolian from comment #97)
> The highlightings for &lt;, &quot;, etc. are wrong in 20111102 build. I
> guess it is due to this bugfix.

Bug 535530.

> In addition, the title bar does not specify which page the source comes from.

Bug 699356.
Depends on: 699356
Depends on: 699365
Depends on: 700034
Depends on: 700042
Depends on: 700260
Depends on: 700315
This should probably mentioned in dev docs. Feel free to copy from http://hsivonen.iki.fi/view-source/ to MDN.

Note that at this point, it's still not clear if this feature makes it to Firefox 10 or whether it needs to be turned off on Aurora for the Firefox 10 release train.
Keywords: dev-doc-needed
Depends on: 703965
Blocks: 703685
Depends on: 704667
Documented:

https://developer.mozilla.org/en/View_source

And mentioned on Firefox 10 for developers.
See bug 710175: "Revert to old View Source back end for Firefox 10"
Documentation needs an update, I think?
(In reply to j.j. from comment #101)
> See bug 710175: "Revert to old View Source back end for Firefox 10"
> Documentation needs an update, I think?

MDN edited accordingly.
Blocks: 713810
Depends on: 731234
Depends on: 742414
Depends on: 743208
Editing the target milestone, because this was bug 710175 made the code landed here unused on the Firefox 10 branch and the target milestone field here seems to be causing confusion.
Target Milestone: mozilla10 → mozilla11
Depends on: 808401
Depends on: 978465
Regressions: 1663794
You need to log in before you can comment on or make changes to this bug.