1.94 KB, text/html
8.71 KB, text/html
5.73 KB, text/html
885.08 KB, patch
Robert Sayre: review+
|Details | Diff | Splinter Review|
No idea on a target for this, but we have many failing tests from the html5lib suite.
mrbkap had looked into this last summer. /be
(In reply to comment #1) > > mrbkap had looked into this last summer. I also pinged him on bug 328930, but after I signed off irc, I realized I wasn't clear if he wanted to do that himself, or just wanted it done. This will make a fine tracking bug if working on the parser becomes a priority.
I certainly want to work on this. The problem is that I simply don't have time to do so between school and my other bugs (the new residual style stuff ended up as P3 for Firefox 3).
Mozilla Corporation has contracted me to translate the Validator.nu HTML parser into Gecko-compatible C++.
Assigning to me and adjusting summary to plan.
Instead of making this a tracking bug, bugs pertaining to the HTML5 parser repo have "[HTML5] " at the start of their summary.
Henri, if you don't mind, I'm going to make HTML parser bugs that will be resolved (either as FIXED or WONTFIX) as dependencies of this one so they don't get lost.
(In reply to comment #7) > Henri, if you don't mind, I'm going to make HTML parser bugs that will be > resolved (either as FIXED or WONTFIX) as dependencies of this one so they don't > get lost. OK. Makes sense.
Created attachment 389048 [details] [diff] [review] mochitest test cases for html5lib tokenizer tests These are 2,722 html5 tokenizer tests from the html5lib suite, converted to mochitest using the parser harness from bug 366936. There are 18 failures, attached in separate result file. Please let me know if you'd like these submitted as bugs and/or added to the known failures list. I've included most of the tests from the tokenizer suite, but omitted the following: - xmlViolation.test - contentModelFlags.test - tests involving invalid unicode chars - tests involving multi-byte unicode chars The latter two classes were excluded because they cause exceptions in the parser sub-harness, in the call to btoa(). I've made a few changes to the parser sub-harness to enable it to work with these files, and have verified that the changes don't interfere with the original treebuilder tests.
(In reply to comment #9) > Created an attachment (id=389048) [details] > mochitest test cases for html5lib tokenizer tests > > These are 2,722 html5 tokenizer tests from the html5lib suite, converted to > mochitest using the parser harness from bug 366936. Great! Thank you! > There are 18 failures, > attached in separate result file. Please let me know if you'd like these > submitted as bugs and/or added to the known failures list. The tests that fail and whose description mentions [R]CDATA (or just CDATA) haven't been converted to tree builder tests correctly. The original tokenizer tests have additional JSON items for initializing the tokenizer to the RCDATA or the CDATA state and for setting up the element name that the tokenizer should look for to end the [R]CDATA run. To convert these tests (from escapeFlag.test, it seems) to tree builder tests correctly, you need to add an <xmp> start tag. This tests the CDATA case. To test the RCDATA case, you need to replace "xmp" with "textarea" everywhere in the tests. I don't see why "U+0080 in lookahead region" failed. Odd. The other failures are genuine bugs that definitely need addressing. I'm particularly surprised at the failures related to CR and and the failures related to bogus entities.
Created attachment 390501 [details] [diff] [review] mochitest test cases for html5lib tokenizer tests (v2) This is a much better implementation of the html5lib tokenizer tests. It correctly handles unicode characters and other escaped characters. I omitted only tests for 0xd800 and 0xdfff, as the server-side js that produces the test markup contains an algorithm for generating utf-8 that doesn't correctly handle these invalid characters. I've made some modifications to a few of the tests (e.g., the escapeFlag and contentModelFlags tests) to account for the fact that the tokenizer output is run through the tree builder, as Henri notes above. There are 3,375 tests in total with only 4 failures; see next attachment.
Created attachment 390503 [details] tokenizer test results for tests in comment 12 correct attachment
Created attachment 391186 [details] [diff] [review] mochitest test cases for html5lib parser tests (v3) I've added to this patch a refresh of the tree builder tests which already existed in the repo. I've refreshed the existing tests from html5lib, and added new tests which weren't being run. I've also updated the parser sub-harness to better handle MathML and SVG elements, and to know to skip #document-fragment tests. There doesn't seem to be any clean way of running fragment tests without direct access to the parser; if anyone can think of a way, let me know and I'll add them. Test results attached separately.
Created attachment 391187 [details] tree builder test results for tests in comment 15 Attached are the tree builder test results obtained from running the updated tree builder tests against the HTML5 parser. The results can be summarized as follows: - only 1 known failure still fails (an <isindex> case) - 7 tests fail from regressions.txt, but they all appear to be correct according to the HTML5 parsing algorithm. Henri, can you review these? If they are correct I will update them in regressions.txt - 1 failure involving <keygen>; should this element appear in the DOM? Maybe not, see bug 101019. - 8 failures in MathML and SVG tests involving xlink:href and xml:lang attributes. Henri, can you review and let me know if the tests need updating or if a bug should be filed?
Created attachment 391458 [details] [diff] [review] mochitest test cases for html5lib parser tests (v4) Here's another update of the html5lib tests. This update allows Java->C++ tests to be run; test_html5_tree_construction_js_compare.html runs all of the tree construction tests by comparing their output in Gecko vs the JS validator from Henri. The js_compare mode ignores comments and doctypes, since the JS version of the parser does not emit those. Results of this are attached separately; there are currently 13 failures. Most of the failures are due to the fact that the JS parser doesn't seem to be able to handle xml:lang or xlink:href correctly, but the first two failures are potentially interesting.
Comment on attachment 391458 [details] [diff] [review] mochitest test cases for html5lib parser tests (v4) I feel I'm too inexperienced with Mochitest coding patterns to review this properly. sayrer wrote the code originally, so I think he'd be a better reviewer than me. However, here are some observations: * The dumpTree() method looks a bit odd. Why the global state variable? Is the purpose to defer the action somehow? * Is reorderToMatchExpected() for the old parser only? I'd expect it to be unnecessary with the new parser. * In principtle, (walker.currentNode.namespaceURI.toLowerCase().indexOf("math") != -1) should be (walker.currentNode.namespaceURI == "http://www.w3.org/1998/Math/MathML") and likewise (walker.currentNode.namespaceURI == "http://www.w3.org/2000/svg") for SVG. Probably doesn't matter in practice now, but if code happened to somehow break in a way that makes it matter, it would be good to have the right namespace tests in the harness. * html5lib_license.txt should probably be updated to contain a fresh copy of the upstream copyright year and contributor list.
Comment on attachment 391458 [details] [diff] [review] mochitest test cases for html5lib parser tests (v4) we shouldn't have this stuff conditioned on a bunch of global variables. please refactor to avoid them. looks good otherwise. iirc, reorderToMatchExpected was required because the original test suite from html5lib depended on python's dictionary ordering details.
Created attachment 393791 [details] [diff] [review] mochitest test cases for html5lib parser tests (v5) Completely refactored code in parser*.js to avoid global variables. Also, I verified that reorderToMatchExpected() has no effect on current tests so I removed it.
How should we handle the current test failures? Shall I flag them as 'todo' so that they won't show as failures on tinderbox, and check them in, so that the tests can start running and catch regressions? Or would you prefer to leave them out of the repo for now?
I'd prefer checking in with todo flags.
I've updated the tests to mark current failures as 'todo' and pushed as http://hg.mozilla.org/mozilla-central/rev/fc7d931fd75b.
(In reply to comment #24) > I've updated the tests to mark current failures as 'todo' and pushed as > http://hg.mozilla.org/mozilla-central/rev/fc7d931fd75b. Backed out because of persistent leaks. E.g. http://tinderbox.mozilla.org/showlog.cgi?log=Firefox-Unittest/1250318073.1250321057.30450.gz
Is there a bug on file about turning this on by default? I can't find one and just wanted to see what the plan was for turning this on by default. A few users (including myself) agree that we all haven't seen any major issues with the HTML5 parser is a long time and perhaps this could be enabled by default to get wider testing.
(In reply to comment #26) > Is there a bug on file about turning this on by default? None other than this one, though one could argue that this bug as currently worded is about more than flipping the pref by default. > I can't find one and > just wanted to see what the plan was for turning this on by default. I have a shared bugzilla query named "HTML5 Parsing Blocker". I'm working on the assumption that the length of that list needs to go to zero before the pref can be flipped by default. In addition to items on that list, there needs to be a new "slow tp" test suite that tests tp4 over a simulated slow network. I don't know how that item is progressing. Then the parser core needs to pass code review (blocking on me to write more documentation). > A few > users (including myself) agree that we all haven't seen any major issues with > the HTML5 parser is a long time and perhaps this could be enabled by default to > get wider testing. I think the HTML5 parser on the trunk is in good enough shape to be turned on by trunk testers. The problem is that it isn't in good enough shape for Mochitest.
(In reply to comment #27) > (In reply to comment #26) > > Is there a bug on file about turning this on by default? > > None other than this one, though one could argue that this bug as currently > worded is about more than flipping the pref by default. > > I think the HTML5 parser on the trunk is in good enough shape to be turned on > by trunk testers. The problem is that it isn't in good enough shape for > Mochitest. So, I sonder if what we need here is for a bug to be opened on flipping the preference with all the mochitest issues being in bugs that block it. It seems without that, there is no real direction to developers on what needs ot be done to get this enabled by default.
Enabled by default now: http://hg.mozilla.org/mozilla-central/rev/129e19d979f0 I'm leaving this open as a tracking bug.
Had to revert this yesterday: http://hg.mozilla.org/mozilla-central/rev/ccb50d524490
Enabled by default again: http://hg.mozilla.org/mozilla-central/rev/358113b3642e