Implement spec changes to ruby parsing

RESOLVED FIXED in mozilla33

Status

()

defect
RESOLVED FIXED
8 years ago
4 years ago

People

(Reporter: hsivonen, Assigned: yuki.sekiguchi)

Tracking

({dev-doc-needed})

Trunk
mozilla33
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox33-)

Details

Attachments

(9 attachments, 4 obsolete attachments)

69.11 KB, patch
Details | Diff | Splinter Review
15.73 KB, patch
Details | Diff | Splinter Review
7.30 KB, patch
wchen
: review+
Details | Diff | Splinter Review
2.94 KB, patch
wchen
: review+
Details | Diff | Splinter Review
6.95 KB, patch
wchen
: review+
Details | Diff | Splinter Review
1.62 KB, patch
wchen
: review+
Details | Diff | Splinter Review
14.86 KB, patch
wchen
: review+
Details | Diff | Splinter Review
69.64 KB, patch
wchen
: review+
Details | Diff | Splinter Review
7.27 KB, patch
Details | Diff | Splinter Review
Reporter

Description

8 years ago
The parsing spec got changed in response to http://www.w3.org/Bugs/Public/show_bug.cgi?id=12935 and will probably change a bit more. Implement the changes.
Reporter

Comment 1

8 years ago
Posted patch Java WIP (obsolete) — Splinter Review
Reporter

Comment 2

8 years ago
Posted patch C++ WIP (obsolete) — Splinter Review
Reporter

Comment 3

8 years ago
Posted patch C++ WIP, v2Splinter Review
Attachment #539181 - Attachment is obsolete: true
Reporter

Comment 4

8 years ago
Posted patch Java WIP, v2Splinter Review
The new patches implement the proposal from
www.w3.org/Bugs/Public/show_bug.cgi?id=13113

(Test cases not included before seeing how exactly the spec settles.)
Attachment #539180 - Attachment is obsolete: true
(In reply to Henri Sivonen (:hsivonen) from comment #4)
> The new patches implement the proposal from
> www.w3.org/Bugs/Public/show_bug.cgi?id=13113
> 
> (Test cases not included before seeing how exactly the spec settles.)

The new ruby markup model is now part of the W3C HTML specification:
http://lists.w3.org/Archives/Public/public-html/2013Dec/0010.html
Reporter

Comment 6

6 years ago
I don't see https://github.com/w3c/html/commit/e2ddb663fd04803d2be7f16026e2117ced167c01 changing the parsing. And even if it did, I think we should not implement W3C-only parsing changes. The parsing algorithm is too important to play along with WHATWG and W3C diverging on parsing.

Comment 7

5 years ago
It does change the parsing, see https://github.com/w3c/html/commit/e2ddb663fd04803d2be7f16026e2117ced167c01#diff-36cd38f49b9afa08222c0dc9ebfe35ebL98664 and below. A perhaps more readable view of the (small) changes is in: https://github.com/html5lib/html5lib-python/pull/126/files

I wouldn't call this a "W3C-only parsing change", it certainly isn't what it intends to be. The only reason I moved ahead with this is because it had implementer support. (It also matches what every single user I have spoken to wants, and reflects existing content I could find.)
Reporter

Comment 8

5 years ago
(In reply to Robin Berjon from comment #7)
> The only reason I moved ahead with this is because it had
> implementer support. 

From which implementors?

Comment 9

5 years ago
I was pinged by a volunteer developer willing to help this bug. Henri, or anyone else watching this bug, any concerns to make this work progress?

Here's the current status updates in case you missed any:
* We've got good coverage of test cases submitted here[1] and here[2].
* Though these cases are not merged to html5lib-tests yet, Robin in HTML WG said he can merge them once Mozilla is happy to this change and the developer is assigned.
* WebKit has landed this change 2 months ago[3].
* I talked with Jet in the last TPAC, and then 2 months ago with fantasai. He looked fine for this change.
* I asked dbaron to pay attention to this bug a month ago at CSS WG F2F since a developer might be on board in June.

Yuki, the developer, is currently learning your WIP patch, and is working on to integrate test cases. Hopefully he'll speak up in this bug in a week or so, but if you have any concerns on this work, could you please let us know? I'd be more than happy to discuss further.

[1] https://github.com/html5lib/html5lib-tests/pull/27/files
[2] https://github.com/darobin/html5lib-tests/pull/1/files
[3] http://trac.webkit.org/changeset/167437
Reporter

Comment 10

5 years ago
(In reply to Koji Ishii from comment #9)
> I was pinged by a volunteer developer willing to help this bug. Henri, or
> anyone else watching this bug, any concerns to make this work progress?

My main concern is that I think it's very unhealthy for the WHATWG and HTML WG parsing algorithms to diverge. It also worries me that the WebKit change was reviewed by someone who I don't recognize as having been active in parsing algorithm spec matters.

I'd like there to be a plan for making sure that the WHATWG and HTML WG parsing algorithms and the implementations converge.

Is there such a plan?

Note that I'm not saying that Hixie always right or anything like that, but if he hasn't changed his mind yet, I'd like to know what the plan for getting the WHATWG spec changed is if there is a seriously good reason to believe that doing the HTML WG version is the Right Thing in this case. What's the reason to believe that the HTML WG has made the right call here?
Reporter

Updated

5 years ago
Assignee: hsivonen → nobody
Reporter

Updated

5 years ago
Status: ASSIGNED → NEW

Comment 11

5 years ago
Good point, I agree with you, I do not want to make a diverge either. Here's what I think.

Several folks in i18n world have been working on investigating use cases and which markup supports which use cases. This work resulted in a wiki page[1] and then a W3C Note[2]. The previous investigations and discussions missed two important ruby stakeholders; publishers and educations. I've reached to publishers a few years ago during my work on EPUB3, and then to people in educational market only one year or less ago. Educational materials in Japan use ruby heavily, and it became pretty obvious that the previous spec was not good enough for them. 

With such investigation, additional information, and the wiki/note, I think we can respond to any objections except one point that implementers are not interested in the presented use cases. IIUC, that was one of the biggest points for a long time.

So the plan is that, once it's proved that at least two implementers are good with the change, I'd like to work with W3C (Robin and Richard specifically) to re-start the conversation between W3C and WHATWG on this matter, and resolve the diverge. I won't leave this issue just by landing on Gecko, rather, I'm willing to follow up until WHATWG and all other implementations match. Does this plan sound good to you?

I preferred working on code first because the time Yuki can work on the patch is limited that we may loose him if we'd wait for the conversation. That said, if we start the conversation first, I cannot commit implementations, but I know WHATWG requires commitments of implementations to reach consensus, so it's a bit chicken-and-egg problem for me. Working on code first then to discuss will help me to resolve this chicken-and-egg problem.

The reviewer of the WebKit change is the engineering manager of Safari and WebKit. In the last TPAC discussion, Ryosuke Niwa was invited from WebKit team and he was good with the change. I've been chatting with him on and off since then on this topic. If I understand his words correctly, his team is all on board with adding rb back. Our plan was Ryosuke to find appropriate reviewer once the patch was submitted, but Darin responded quicker while Ryosuke was off to Asia. I hope this clarifies your concern at least a bit.

[1] http://www.w3.org/International/wiki/Rb
[2] http://www.w3.org/TR/ruby-use-cases/
Reporter

Comment 12

5 years ago
(In reply to Koji Ishii from comment #11)
> Good point, I agree with you, I do not want to make a diverge either. Here's
> what I think.
> 
> Several folks in i18n world have been working on investigating use cases and
> which markup supports which use cases. This work resulted in a wiki page[1]
> and then a W3C Note[2]. The previous investigations and discussions missed
> two important ruby stakeholders; publishers and educations. I've reached to
> publishers a few years ago during my work on EPUB3, and then to people in
> educational market only one year or less ago. Educational materials in Japan
> use ruby heavily, and it became pretty obvious that the previous spec was
> not good enough for them. 
> 
> With such investigation, additional information, and the wiki/note, I think
> we can respond to any objections except one point that implementers are not
> interested in the presented use cases. IIUC, that was one of the biggest
> points for a long time.
> 
> So the plan is that, once it's proved that at least two implementers are
> good with the change, I'd like to work with W3C (Robin and Richard
> specifically) to re-start the conversation between W3C and WHATWG on this
> matter, and resolve the diverge. I won't leave this issue just by landing on
> Gecko, rather, I'm willing to follow up until WHATWG and all other
> implementations match. Does this plan sound good to you?

Sounds good.

> I preferred working on code first because the time Yuki can work on the
> patch is limited that we may loose him if we'd wait for the conversation.

OK. Let's not block him.

> The reviewer of the WebKit change is the engineering manager of Safari and
> WebKit. In the last TPAC discussion, Ryosuke Niwa was invited from WebKit
> team and he was good with the change. I've been chatting with him on and off
> since then on this topic. If I understand his words correctly, his team is
> all on board with adding rb back. Our plan was Ryosuke to find appropriate
> reviewer once the patch was submitted, but Darin responded quicker while
> Ryosuke was off to Asia. I hope this clarifies your concern at least a bit.

This addresses my WebKit review concern. Thanks.
Assignee

Comment 13

5 years ago
Thank you for clarifying the situation.
I'll upload patches soon.
Assignee

Comment 14

5 years ago
Posted patch Part1: Parse new ruby model. (obsolete) — Splinter Review
Implemented new ruby model in the following HTML5 parser algorithm to htmlparser. 
http://www.w3.org/html/wg/drafts/html/CR/syntax.html#parsing-main-inbody
> A start tag whose tag name is one of: "rb", "rp", "rtc"
> If the stack of open elements has a ruby element in scope, then generate implied end tags. If the current node is not then a ruby element, this is a parse error.
> 
> Insert an HTML element for the token.

> A start tag whose tag name is "rt"
> If the stack of open elements has a ruby element in scope, then generate implied end tags, except for rtc elements. If the current node is not then a ruby element or an rtc element, this is a parse error.
> 
> Insert an HTML element for the token.
Attachment #8441172 - Flags: review?(hsivonen)
Assignee

Comment 15

5 years ago
Translated code of Part1.
Attachment #8441173 - Flags: review?(hsivonen)
Assignee

Comment 16

5 years ago
Current Mozilla considered ruby related elements as HTMLUnknownElement, but they should be HTMLElement.

http://www.w3.org/html/wg/drafts/html/CR/text-level-semantics.html#the-ruby-element
> 4.5.21 The ruby element
> Uses HTMLElement.

http://www.w3.org/html/wg/drafts/html/CR/text-level-semantics.html#the-rb-element
> 4.5.22 The rb element
> Uses HTMLElement.

http://www.w3.org/html/wg/drafts/html/CR/text-level-semantics.html#the-rt-element
> 4.5.23 The rt element
> Uses HTMLElement.

http://www.w3.org/html/wg/drafts/html/CR/text-level-semantics.html#the-rtc-element
> 4.5.24 The rtc element
> Uses HTMLElement.

http://www.w3.org/html/wg/drafts/html/CR/text-level-semantics.html#the-rp-element
> 4.5.25 The rp element
> Uses HTMLElement.
Attachment #8441179 - Flags: review?(hsivonen)
Assignee

Comment 17

5 years ago
Part3 patch will fix expected failure about ruby editing.
Attachment #8441180 - Flags: review?(hsivonen)
Assignee

Comment 18

5 years ago
Since I don't know the rule of updating parser/htmlparser/tests/mochitest/html5lib_tree_construction/, this patch may violate the rule.
My main concern is that the original pull request is not merged to master.
Attachment #8441182 - Flags: review?(hsivonen)
Assignee

Comment 19

5 years ago
Same as part5, my main concern is that the original pull request is not merged to master.
Attachment #8441183 - Flags: review?(hsivonen)
Assignee

Comment 20

5 years ago
Hi Henri,

Could you review this patch?
I used "suggested reviewers" feature in bugzilla, and it suggests you. I confused that the suggestion was different from module owners. Please feel free to tell me if I send to wrong reviewer.

Comment 21

5 years ago
(In reply to Yuki Sekiguchi from comment #19)
> Created attachment 8441183 [details] [diff] [review]
> Part6: Update html5lib-tests to
> https://github.com/html5lib/html5lib-tests/pull/27.
> 
> Same as part5, my main concern is that the original pull request is not
> merged to master.

It's sort of a chicken-and-egg problem here. That pull request won't be merged unless there's implementer interest. But it hasn't met with technical objections beyond a divergence discussion very similar to the one here, with the same conclusion (clearly, it's a concern we all have and wish to address).
Reporter

Comment 22

5 years ago
Comment on attachment 8441172 [details] [diff] [review]
Part1: Parse new ruby model.

Unfortunately, I won't have the time to review this properly before I head out of office and will be back the week after next.

wchen, can you take a look?
Attachment #8441172 - Flags: review?(hsivonen) → review?(wchen)
Reporter

Updated

5 years ago
Attachment #8441173 - Flags: review?(hsivonen) → review?(wchen)
Reporter

Updated

5 years ago
Attachment #8441179 - Flags: review?(hsivonen) → review?(wchen)
Reporter

Updated

5 years ago
Attachment #8441180 - Flags: review?(hsivonen) → review?(wchen)
Reporter

Updated

5 years ago
Attachment #8441182 - Flags: review?(hsivonen) → review?(wchen)
Reporter

Updated

5 years ago
Attachment #8441183 - Flags: review?(hsivonen) → review?(wchen)
Comment on attachment 8441172 [details] [diff] [review]
Part1: Parse new ruby model.

Review of attachment 8441172 [details] [diff] [review]:
-----------------------------------------------------------------

You also need to handle rt, rb, rtc and rp in the 'An end tag whose tag name is "html"' case in the "in body" insertion mode.

::: src/nu/validator/htmlparser/impl/TreeBuilder.java
@@ +2433,5 @@
> +                                    generateImpliedEndTagsExceptFor("rtc");
> +                                }
> +                                if (eltPos != currentPtr) {
> +                                    int rtcPos = findLastInScope("rtc");
> +                                    if (rtcPos != currentPtr) {

!isCurrent("rtc")
Attachment #8441172 - Flags: review?(wchen) → review-
Comment on attachment 8441180 [details] [diff] [review]
Part4: Remove expected failure about ruby editing.

yay
Attachment #8441180 - Flags: review?(wchen) → review+
Attachment #8441182 - Flags: review?(wchen) → review+
Attachment #8441183 - Flags: review?(wchen) → review+
Comment on attachment 8441173 [details] [diff] [review]
Part2: Translate new ruby model parser.

Needs new translation after addressing review comments.
Attachment #8441173 - Flags: review?(wchen)
Comment on attachment 8441179 [details] [diff] [review]
Part3: Make ruby related element HTMLElement.

Review of attachment 8441179 [details] [diff] [review]:
-----------------------------------------------------------------

::: parser/htmlparser/public/nsHTMLTagList.h
@@ +132,5 @@
> +HTML_HTMLELEMENT_TAG(rb)
> +HTML_HTMLELEMENT_TAG(rp)
> +HTML_HTMLELEMENT_TAG(rt)
> +HTML_HTMLELEMENT_TAG(rtc)
> +HTML_HTMLELEMENT_TAG(ruby)

Let's move all these entries below HTML_TAG(q, Shared) to keep the order consistent.
Attachment #8441179 - Flags: review?(wchen) → review+
Assignee

Comment 27

5 years ago
Attachment #8441172 - Attachment is obsolete: true
Attachment #8443354 - Flags: review?(wchen)
Assignee

Updated

5 years ago
Attachment #8443354 - Attachment description: Part1-1: Support html end tag. Use isCurrent(). → Part1-2: Support html end tag. Use isCurrent().
Assignee

Comment 28

5 years ago
Attachment #8441173 - Attachment is obsolete: true
Attachment #8443355 - Flags: review?(wchen)
Assignee

Comment 30

5 years ago
Hi William,

Thank you for reviewing.

I fixed your comments. If all patches are r+, Could you commit these patches?
Attachment #8443355 - Flags: review?(wchen) → review+
Attachment #8443354 - Flags: review?(wchen) → review+
Assignee

Comment 34

5 years ago
Thank you all!
I think this is a technically terrible decision, as discussed in the master bug, bug 33339. I'm sorry I wasn't aware that you were actually going ahead with this so that I could point this out, I thought my comments in bug 33339 were sufficient to point out why this was a mistake.

Comment 36

5 years ago
Where is the intent to implement for this change? We should not land things without intent to implement.

(Also, it seems if anything layout has to improve first here.)
I was under the impression that the only issue here was parser algorithm divergence (comment 21) and that we were OK with the new changes as long as we follow up to solve the divergence (comment 12).

The patches in this bug are relatively small, isolated and located in code that doesn't change very often. If we reach the consensus that we don't want the additional elements, it should be easy to revert the changes.

(In reply to Anne (:annevk) from comment #36)
> Where is the intent to implement for this change?
I didn't find one, and I wasn't aware that it was necessary here. It looked like implementation started years ago on this bug. Also, we have someone currently looking at the layout side of ruby.
The parser divergence is just a symptom of the overall ruby design. See discussion in bug 33339 (it has pointers to spec bugs where this is discussed further). The tl;dr is that as far as I can tell, the additional complexity of the proposed changes is not warranted by the relatively minor gain. There's lots of features we haven't added to HTML because the cost:benefit ratio doesn't justify it; IMHO this particular feature is just getting less push-back from the average Jane because it's not an area that most English-speaking people know or care about, so they ignore it.

Comment 39

5 years ago
Hixie, thank you for your comment, but I don't think this is the right place to discuss how much a ruby feature is important. I'm in sync with Richard and Robin to find the right way to restart the conversation, and it's my goal to resolve the temporary divergence. So please stay tuned, and I look forward to talking to you in the right place.
Please backout and send an intent to implement. Landing new features without one is not acceptable.
Flags: needinfo?(yuki.sekiguchi)
(In reply to :Ms2ger from comment #40)
> Please backout and send an intent to implement. Landing new features without
> one is not acceptable.

I don't think this requires backing out, but an intent to implement should be sent.

Comment 42

5 years ago
Thank you David, we'll be working on it.
Assignee

Comment 43

5 years ago
(In reply to :Ms2ger from comment #40)
> Please backout and send an intent to implement. Landing new features without
> one is not acceptable.

According to David's comment, I won't request backing out.
Koji sent an intent to implement:
https://groups.google.com/forum/#!topic/mozilla.dev.platform/VqSAwVuskm8
Flags: needinfo?(yuki.sekiguchi)
So where has the conversation been reopened? There was no response to my comments on dev.platform.

Why are we going down this path?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Making the bug status not reflect the actual state of the tree doesn't help anything.
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago4 years ago
Resolution: --- → FIXED
We need to check what if any developer documentation changes need to be made here.
Keywords: dev-doc-needed
Depends on: 1135306
You need to log in before you can comment on or make changes to this bug.