Last Comment Bug 535530 - View Source highlighting of entities off by one; When a semicolon has been tokenized in a character reference, should return to the return state eagerly
: View Source highlighting of entities off by one; When a semicolon has been to...
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: HTML: Parser (show other bugs)
: Trunk
: All All
: P3 normal with 1 vote (vote)
: mozilla11
Assigned To: Henri Sivonen (:hsivonen)
:
: Andrew Overholt [:overholt]
Mentors:
: 701775 705733 (view as bug list)
Depends on: 704058
Blocks: 482921
  Show dependency treegraph
 
Reported: 2009-12-17 02:27 PST by Henri Sivonen (:hsivonen)
Modified: 2011-12-14 01:01 PST (History)
11 users (show)
hsivonen: in‑testsuite+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
wontfix


Attachments
Transition eagerly when a semicolon matches (61.17 KB, patch)
2011-11-23 06:43 PST, Henri Sivonen (:hsivonen)
bugs: review+
christian: approval‑mozilla‑aurora-
Details | Diff | Splinter Review

Description Henri Sivonen (:hsivonen) 2009-12-17 02:27:56 PST
Spinoff from bug 535499:

If document.write ends with the semicolon of a named character reference, the tokenizer state is the data state when returning to the parser thread. (Instead, the return state is the data state and the next input character will cause the return to the data state.) This causes a useless speculation failure.

Should move to the return state eagerly to put the tokenizer into the data state in time for the state transfer over the thread boundary.
Comment 1 :aceman 2011-11-05 13:33:40 PDT
Could you please update the summary and also add comment that this also causes one more character after semicolor highlighted as part the preceding HTML entity reference. You linked to this bug in http://hsivonen.iki.fi/view-source/ . But people not reading that article will file bugs against it and will never find this bug.
Comment 2 Bill Gianopoulos [:WG9s] 2011-11-09 05:57:20 PST
(In reply to :aceman from comment #1)
> Could you please update the summary and also add comment that this also
> causes one more character after semicolor highlighted as part the preceding
> HTML entity reference. You linked to this bug in
> http://hsivonen.iki.fi/view-source/ . But people not reading that article
> will file bugs against it and will never find this bug.

Well if that IS in fact the same bug as this, then the importance is completely wrong.  Fixing incorrect highlighting is NOT an enhancement.
Comment 3 Henri Sivonen (:hsivonen) 2011-11-09 06:05:05 PST
(In reply to Bill Gianopoulos from comment #2)
> Well if that IS in fact the same bug as this, then the importance is
> completely wrong.  Fixing incorrect highlighting is NOT an enhancement.

Good point.
Comment 4 Bill Gianopoulos [:WG9s] 2011-11-09 06:12:59 PST
It occurs to me that this might be confusing for those not familiar with US-English marketing-speak.

"Enhancement" in software bug reporting context means "Request for a new feature".  Therefore, it is not appropriate for fixing an existing feature.
Comment 5 Marat Tanalin | tanalin.com 2011-11-11 10:51:37 PST
Probably related: bug 701775. Not a dupe since current bug (535530) was reported far before HTML5 parser has been first used for View Source.
Comment 6 Henri Sivonen (:hsivonen) 2011-11-11 15:16:07 PST
*** Bug 701775 has been marked as a duplicate of this bug. ***
Comment 7 Henri Sivonen (:hsivonen) 2011-11-23 06:43:34 PST
Created attachment 576479 [details] [diff] [review]
Transition eagerly when a semicolon matches
Comment 8 Henri Sivonen (:hsivonen) 2011-11-23 11:51:57 PST
Comment on attachment 576479 [details] [diff] [review]
Transition eagerly when a semicolon matches

This patch first fixes a major brainfart that turns out not to have much impact for Gecko: it moves the setting of the reconsume variable to immediately before the calls to transition() so that transition() sees the right value for reconsume.

Then it fixes the actual problem by breaking the named character reference matching loop early when the semicolon is the last character that matches. Normally, the loop is broken by the first character that doesn't match, because HTML requires longest match. However, since the semicolon never appears in the middle of a named character name, there can't be a longer match after a semicolon has been seen.

When the loop is broken early, there's no need to reconsume the semicolon and this needs to be taken into account when setting the start of the next text run when applicable.
Comment 9 Alice0775 White 2011-11-28 23:34:09 PST
*** Bug 705733 has been marked as a duplicate of this bug. ***
Comment 10 Henri Sivonen (:hsivonen) 2011-11-30 05:13:30 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/eb1badf94821
Comment 12 Henri Sivonen (:hsivonen) 2011-11-30 06:55:42 PST
(In reply to Ms2ger from comment #11)
> Looks like you're leaking on OSX Crashtests:
> 
> https://tbpl.mozilla.org/php/getParsedLog.php?id=7657775&tree=Mozilla-Inbound
> https://tbpl.mozilla.org/php/getParsedLog.php?id=7658113&tree=Mozilla-Inbound

Must be bug 699365. That bug added a crashtest. *This* bug neither adds a crashtest nor changes anything that affects refcounting.
Comment 13 Marco Bonardo [::mak] 2011-12-01 04:01:05 PST
https://hg.mozilla.org/mozilla-central/rev/eb1badf94821
Comment 14 Henri Sivonen (:hsivonen) 2011-12-13 05:03:48 PST
Comment on attachment 576479 [details] [diff] [review]
Transition eagerly when a semicolon matches

(This comment has been posted on all the bugs mentioned in this comment, except bug 710142, so that the release drivers see it regardless of the order in which they process approval requests.)

The new View Source implementation landed before Firefox 10 moved to Aurora. Afterwards, a bunch of regressions were identified. Many of the regression fixes didn't land before Firefox 10 moved to Aurora but they have now been fixed on trunk except for bug 710142.

To avoid shipping with regressions, we either need to land all the regression fixes on Aurora for Firefox 10 (followed by a fix for bug 710142 in Beta if it doesn't make it before Dec 20th) or switch back to the old View Source implementation on Aurora. The new View Source implementation provides much better diagnostics for Web developers than the old View Source implementation.

So I'd like to ask the release drivers to either approve bug 535530, bug 699356, bug 699365, bug 700034, bug 700361, bug 703965, bug 704667 and bug 705473 plus bug 695640, which is a non-regression tweak, or to approve bug 710175 for reverting to the old View Source implementation for Firefox 10.
Comment 15 christian 2011-12-13 14:38:32 PST
Comment on attachment 576479 [details] [diff] [review]
Transition eagerly when a semicolon matches

[triage comment]
We decided to back out the new view source parser (bug 710175) rather than take this fixup for Firefox 10.

Denying for Aurora.
Comment 16 Henri Sivonen (:hsivonen) 2011-12-14 01:01:34 PST
Firefox 10 no longer affected for View Source due to bug 710175 landing. WONTFIX for Firefox 10 as far as speculative parsing goes.

Note You need to log in before you can comment on or make changes to this bug.