Closed Bug 664104 Opened 13 years ago Closed 10 years ago

Implement spec changes to ruby parsing

Categories

(Core :: DOM: HTML Parser, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla33
Tracking Status
firefox33 - ---

People

(Reporter: hsivonen, Assigned: yuki.sekiguchi)

References

Details

(Keywords: dev-doc-needed)

Attachments

(9 files, 4 obsolete files)

69.11 KB, patch
Details | Diff | Splinter Review
15.73 KB, patch
Details | Diff | Splinter Review
7.30 KB, patch
wchen
: review+
Details | Diff | Splinter Review
2.94 KB, patch
wchen
: review+
Details | Diff | Splinter Review
6.95 KB, patch
wchen
: review+
Details | Diff | Splinter Review
1.62 KB, patch
wchen
: review+
Details | Diff | Splinter Review
14.86 KB, patch
wchen
: review+
Details | Diff | Splinter Review
69.64 KB, patch
wchen
: review+
Details | Diff | Splinter Review
7.27 KB, patch
Details | Diff | Splinter Review
The parsing spec got changed in response to http://www.w3.org/Bugs/Public/show_bug.cgi?id=12935 and will probably change a bit more. Implement the changes.
Attached patch Java WIP (obsolete) — Splinter Review
Attached patch C++ WIP (obsolete) — Splinter Review
Attached patch C++ WIP, v2Splinter Review
Attachment #539181 - Attachment is obsolete: true
Attached patch Java WIP, v2Splinter Review
The new patches implement the proposal from www.w3.org/Bugs/Public/show_bug.cgi?id=13113 (Test cases not included before seeing how exactly the spec settles.)
Attachment #539180 - Attachment is obsolete: true
(In reply to Henri Sivonen (:hsivonen) from comment #4) > The new patches implement the proposal from > www.w3.org/Bugs/Public/show_bug.cgi?id=13113 > > (Test cases not included before seeing how exactly the spec settles.) The new ruby markup model is now part of the W3C HTML specification: http://lists.w3.org/Archives/Public/public-html/2013Dec/0010.html
I don't see https://github.com/w3c/html/commit/e2ddb663fd04803d2be7f16026e2117ced167c01 changing the parsing. And even if it did, I think we should not implement W3C-only parsing changes. The parsing algorithm is too important to play along with WHATWG and W3C diverging on parsing.
It does change the parsing, see https://github.com/w3c/html/commit/e2ddb663fd04803d2be7f16026e2117ced167c01#diff-36cd38f49b9afa08222c0dc9ebfe35ebL98664 and below. A perhaps more readable view of the (small) changes is in: https://github.com/html5lib/html5lib-python/pull/126/files I wouldn't call this a "W3C-only parsing change", it certainly isn't what it intends to be. The only reason I moved ahead with this is because it had implementer support. (It also matches what every single user I have spoken to wants, and reflects existing content I could find.)
(In reply to Robin Berjon from comment #7) > The only reason I moved ahead with this is because it had > implementer support. From which implementors?
I was pinged by a volunteer developer willing to help this bug. Henri, or anyone else watching this bug, any concerns to make this work progress? Here's the current status updates in case you missed any: * We've got good coverage of test cases submitted here[1] and here[2]. * Though these cases are not merged to html5lib-tests yet, Robin in HTML WG said he can merge them once Mozilla is happy to this change and the developer is assigned. * WebKit has landed this change 2 months ago[3]. * I talked with Jet in the last TPAC, and then 2 months ago with fantasai. He looked fine for this change. * I asked dbaron to pay attention to this bug a month ago at CSS WG F2F since a developer might be on board in June. Yuki, the developer, is currently learning your WIP patch, and is working on to integrate test cases. Hopefully he'll speak up in this bug in a week or so, but if you have any concerns on this work, could you please let us know? I'd be more than happy to discuss further. [1] https://github.com/html5lib/html5lib-tests/pull/27/files [2] https://github.com/darobin/html5lib-tests/pull/1/files [3] http://trac.webkit.org/changeset/167437
(In reply to Koji Ishii from comment #9) > I was pinged by a volunteer developer willing to help this bug. Henri, or > anyone else watching this bug, any concerns to make this work progress? My main concern is that I think it's very unhealthy for the WHATWG and HTML WG parsing algorithms to diverge. It also worries me that the WebKit change was reviewed by someone who I don't recognize as having been active in parsing algorithm spec matters. I'd like there to be a plan for making sure that the WHATWG and HTML WG parsing algorithms and the implementations converge. Is there such a plan? Note that I'm not saying that Hixie always right or anything like that, but if he hasn't changed his mind yet, I'd like to know what the plan for getting the WHATWG spec changed is if there is a seriously good reason to believe that doing the HTML WG version is the Right Thing in this case. What's the reason to believe that the HTML WG has made the right call here?
Assignee: hsivonen → nobody
Status: ASSIGNED → NEW
Good point, I agree with you, I do not want to make a diverge either. Here's what I think. Several folks in i18n world have been working on investigating use cases and which markup supports which use cases. This work resulted in a wiki page[1] and then a W3C Note[2]. The previous investigations and discussions missed two important ruby stakeholders; publishers and educations. I've reached to publishers a few years ago during my work on EPUB3, and then to people in educational market only one year or less ago. Educational materials in Japan use ruby heavily, and it became pretty obvious that the previous spec was not good enough for them. With such investigation, additional information, and the wiki/note, I think we can respond to any objections except one point that implementers are not interested in the presented use cases. IIUC, that was one of the biggest points for a long time. So the plan is that, once it's proved that at least two implementers are good with the change, I'd like to work with W3C (Robin and Richard specifically) to re-start the conversation between W3C and WHATWG on this matter, and resolve the diverge. I won't leave this issue just by landing on Gecko, rather, I'm willing to follow up until WHATWG and all other implementations match. Does this plan sound good to you? I preferred working on code first because the time Yuki can work on the patch is limited that we may loose him if we'd wait for the conversation. That said, if we start the conversation first, I cannot commit implementations, but I know WHATWG requires commitments of implementations to reach consensus, so it's a bit chicken-and-egg problem for me. Working on code first then to discuss will help me to resolve this chicken-and-egg problem. The reviewer of the WebKit change is the engineering manager of Safari and WebKit. In the last TPAC discussion, Ryosuke Niwa was invited from WebKit team and he was good with the change. I've been chatting with him on and off since then on this topic. If I understand his words correctly, his team is all on board with adding rb back. Our plan was Ryosuke to find appropriate reviewer once the patch was submitted, but Darin responded quicker while Ryosuke was off to Asia. I hope this clarifies your concern at least a bit. [1] http://www.w3.org/International/wiki/Rb [2] http://www.w3.org/TR/ruby-use-cases/
(In reply to Koji Ishii from comment #11) > Good point, I agree with you, I do not want to make a diverge either. Here's > what I think. > > Several folks in i18n world have been working on investigating use cases and > which markup supports which use cases. This work resulted in a wiki page[1] > and then a W3C Note[2]. The previous investigations and discussions missed > two important ruby stakeholders; publishers and educations. I've reached to > publishers a few years ago during my work on EPUB3, and then to people in > educational market only one year or less ago. Educational materials in Japan > use ruby heavily, and it became pretty obvious that the previous spec was > not good enough for them. > > With such investigation, additional information, and the wiki/note, I think > we can respond to any objections except one point that implementers are not > interested in the presented use cases. IIUC, that was one of the biggest > points for a long time. > > So the plan is that, once it's proved that at least two implementers are > good with the change, I'd like to work with W3C (Robin and Richard > specifically) to re-start the conversation between W3C and WHATWG on this > matter, and resolve the diverge. I won't leave this issue just by landing on > Gecko, rather, I'm willing to follow up until WHATWG and all other > implementations match. Does this plan sound good to you? Sounds good. > I preferred working on code first because the time Yuki can work on the > patch is limited that we may loose him if we'd wait for the conversation. OK. Let's not block him. > The reviewer of the WebKit change is the engineering manager of Safari and > WebKit. In the last TPAC discussion, Ryosuke Niwa was invited from WebKit > team and he was good with the change. I've been chatting with him on and off > since then on this topic. If I understand his words correctly, his team is > all on board with adding rb back. Our plan was Ryosuke to find appropriate > reviewer once the patch was submitted, but Darin responded quicker while > Ryosuke was off to Asia. I hope this clarifies your concern at least a bit. This addresses my WebKit review concern. Thanks.
Thank you for clarifying the situation. I'll upload patches soon.
Attached patch Part1: Parse new ruby model. (obsolete) — Splinter Review
Implemented new ruby model in the following HTML5 parser algorithm to htmlparser. http://www.w3.org/html/wg/drafts/html/CR/syntax.html#parsing-main-inbody > A start tag whose tag name is one of: "rb", "rp", "rtc" > If the stack of open elements has a ruby element in scope, then generate implied end tags. If the current node is not then a ruby element, this is a parse error. > > Insert an HTML element for the token. > A start tag whose tag name is "rt" > If the stack of open elements has a ruby element in scope, then generate implied end tags, except for rtc elements. If the current node is not then a ruby element or an rtc element, this is a parse error. > > Insert an HTML element for the token.
Attachment #8441172 - Flags: review?(hsivonen)
Translated code of Part1.
Attachment #8441173 - Flags: review?(hsivonen)
Current Mozilla considered ruby related elements as HTMLUnknownElement, but they should be HTMLElement. http://www.w3.org/html/wg/drafts/html/CR/text-level-semantics.html#the-ruby-element > 4.5.21 The ruby element > Uses HTMLElement. http://www.w3.org/html/wg/drafts/html/CR/text-level-semantics.html#the-rb-element > 4.5.22 The rb element > Uses HTMLElement. http://www.w3.org/html/wg/drafts/html/CR/text-level-semantics.html#the-rt-element > 4.5.23 The rt element > Uses HTMLElement. http://www.w3.org/html/wg/drafts/html/CR/text-level-semantics.html#the-rtc-element > 4.5.24 The rtc element > Uses HTMLElement. http://www.w3.org/html/wg/drafts/html/CR/text-level-semantics.html#the-rp-element > 4.5.25 The rp element > Uses HTMLElement.
Attachment #8441179 - Flags: review?(hsivonen)
Part3 patch will fix expected failure about ruby editing.
Attachment #8441180 - Flags: review?(hsivonen)
Since I don't know the rule of updating parser/htmlparser/tests/mochitest/html5lib_tree_construction/, this patch may violate the rule. My main concern is that the original pull request is not merged to master.
Attachment #8441182 - Flags: review?(hsivonen)
Same as part5, my main concern is that the original pull request is not merged to master.
Attachment #8441183 - Flags: review?(hsivonen)
Hi Henri, Could you review this patch? I used "suggested reviewers" feature in bugzilla, and it suggests you. I confused that the suggestion was different from module owners. Please feel free to tell me if I send to wrong reviewer.
(In reply to Yuki Sekiguchi from comment #19) > Created attachment 8441183 [details] [diff] [review] > Part6: Update html5lib-tests to > https://github.com/html5lib/html5lib-tests/pull/27. > > Same as part5, my main concern is that the original pull request is not > merged to master. It's sort of a chicken-and-egg problem here. That pull request won't be merged unless there's implementer interest. But it hasn't met with technical objections beyond a divergence discussion very similar to the one here, with the same conclusion (clearly, it's a concern we all have and wish to address).
Comment on attachment 8441172 [details] [diff] [review] Part1: Parse new ruby model. Unfortunately, I won't have the time to review this properly before I head out of office and will be back the week after next. wchen, can you take a look?
Attachment #8441172 - Flags: review?(hsivonen) → review?(wchen)
Attachment #8441173 - Flags: review?(hsivonen) → review?(wchen)
Attachment #8441179 - Flags: review?(hsivonen) → review?(wchen)
Attachment #8441180 - Flags: review?(hsivonen) → review?(wchen)
Attachment #8441182 - Flags: review?(hsivonen) → review?(wchen)
Attachment #8441183 - Flags: review?(hsivonen) → review?(wchen)
Comment on attachment 8441172 [details] [diff] [review] Part1: Parse new ruby model. Review of attachment 8441172 [details] [diff] [review]: ----------------------------------------------------------------- You also need to handle rt, rb, rtc and rp in the 'An end tag whose tag name is "html"' case in the "in body" insertion mode. ::: src/nu/validator/htmlparser/impl/TreeBuilder.java @@ +2433,5 @@ > + generateImpliedEndTagsExceptFor("rtc"); > + } > + if (eltPos != currentPtr) { > + int rtcPos = findLastInScope("rtc"); > + if (rtcPos != currentPtr) { !isCurrent("rtc")
Attachment #8441172 - Flags: review?(wchen) → review-
Comment on attachment 8441180 [details] [diff] [review] Part4: Remove expected failure about ruby editing. yay
Attachment #8441180 - Flags: review?(wchen) → review+
Attachment #8441182 - Flags: review?(wchen) → review+
Attachment #8441183 - Flags: review?(wchen) → review+
Comment on attachment 8441173 [details] [diff] [review] Part2: Translate new ruby model parser. Needs new translation after addressing review comments.
Attachment #8441173 - Flags: review?(wchen)
Comment on attachment 8441179 [details] [diff] [review] Part3: Make ruby related element HTMLElement. Review of attachment 8441179 [details] [diff] [review]: ----------------------------------------------------------------- ::: parser/htmlparser/public/nsHTMLTagList.h @@ +132,5 @@ > +HTML_HTMLELEMENT_TAG(rb) > +HTML_HTMLELEMENT_TAG(rp) > +HTML_HTMLELEMENT_TAG(rt) > +HTML_HTMLELEMENT_TAG(rtc) > +HTML_HTMLELEMENT_TAG(ruby) Let's move all these entries below HTML_TAG(q, Shared) to keep the order consistent.
Attachment #8441179 - Flags: review?(wchen) → review+
Attachment #8441172 - Attachment is obsolete: true
Attachment #8443354 - Flags: review?(wchen)
Attachment #8443354 - Attachment description: Part1-1: Support html end tag. Use isCurrent(). → Part1-2: Support html end tag. Use isCurrent().
Attachment #8441173 - Attachment is obsolete: true
Attachment #8443355 - Flags: review?(wchen)
Hi William, Thank you for reviewing. I fixed your comments. If all patches are r+, Could you commit these patches?
Attachment #8443355 - Flags: review?(wchen) → review+
Attachment #8443354 - Flags: review?(wchen) → review+
Thank you all!
I think this is a technically terrible decision, as discussed in the master bug, bug 33339. I'm sorry I wasn't aware that you were actually going ahead with this so that I could point this out, I thought my comments in bug 33339 were sufficient to point out why this was a mistake.
Where is the intent to implement for this change? We should not land things without intent to implement. (Also, it seems if anything layout has to improve first here.)
I was under the impression that the only issue here was parser algorithm divergence (comment 21) and that we were OK with the new changes as long as we follow up to solve the divergence (comment 12). The patches in this bug are relatively small, isolated and located in code that doesn't change very often. If we reach the consensus that we don't want the additional elements, it should be easy to revert the changes. (In reply to Anne (:annevk) from comment #36) > Where is the intent to implement for this change? I didn't find one, and I wasn't aware that it was necessary here. It looked like implementation started years ago on this bug. Also, we have someone currently looking at the layout side of ruby.
The parser divergence is just a symptom of the overall ruby design. See discussion in bug 33339 (it has pointers to spec bugs where this is discussed further). The tl;dr is that as far as I can tell, the additional complexity of the proposed changes is not warranted by the relatively minor gain. There's lots of features we haven't added to HTML because the cost:benefit ratio doesn't justify it; IMHO this particular feature is just getting less push-back from the average Jane because it's not an area that most English-speaking people know or care about, so they ignore it.
Hixie, thank you for your comment, but I don't think this is the right place to discuss how much a ruby feature is important. I'm in sync with Richard and Robin to find the right way to restart the conversation, and it's my goal to resolve the temporary divergence. So please stay tuned, and I look forward to talking to you in the right place.
Please backout and send an intent to implement. Landing new features without one is not acceptable.
Flags: needinfo?(yuki.sekiguchi)
(In reply to :Ms2ger from comment #40) > Please backout and send an intent to implement. Landing new features without > one is not acceptable. I don't think this requires backing out, but an intent to implement should be sent.
Thank you David, we'll be working on it.
(In reply to :Ms2ger from comment #40) > Please backout and send an intent to implement. Landing new features without > one is not acceptable. According to David's comment, I won't request backing out. Koji sent an intent to implement: https://groups.google.com/forum/#!topic/mozilla.dev.platform/VqSAwVuskm8
Flags: needinfo?(yuki.sekiguchi)
Depends on: 1042885
So where has the conversation been reopened? There was no response to my comments on dev.platform. Why are we going down this path?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Making the bug status not reflect the actual state of the tree doesn't help anything.
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
We need to check what if any developer documentation changes need to be made here.
Keywords: dev-doc-needed
Depends on: 1135306
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: