Closed Bug 1132677 Opened 10 years ago Closed 10 years ago

[Compat Data][Importer] Parsing of inconsistent specification table cells (e.g. ES1/ES3)

Categories

(developer.mozilla.org Graveyard :: General, defect)

All
Other
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: fs, Assigned: jwhitlock)

Details

(Whiteboard: [specification][type:bug])

What did you do?
================
See https://browsercompat.herokuapp.com/importer/1029

What happened?
==============
It seems that currently only {{SpecName}} and {{Spec2}} are successfully parsed and imported. However, the JavaScript docs (and likely other pages) are a bit inconsistent at the moment. Historical specs like ES1 and ES3 appear like this:

<td>ECMAScript 1st Edition.</td>
<td>Standard</td>
<td>Comment</td>

or

<td>ECMAScript 3rd Edition.</td>
<td>Standard</td>
<td>Comment</td>

What should have happened?
==========================
The parser should parse these as ES1 and ES3.
https://github.com/jwhitlock/web-platform-compat/blob/1079920_import_process/mdn/scrape.py#L73

Is there anything else we should know?
======================================
This will fix hundreds of errors that say:
"Section <h2>Specifications</h2> was not parsed, because rule "kuma" failed to match."
(I expect more of these after bug 1132658)

It is sad that it is inconsistent right now. I would be really happy to have this as a win afterwards: once we have successfully parsed this, we can display it in a consistent way on MDN using the new compat data. :-)
Blocks: 1132269
Here are the API data fields for a Specification:
http://web-platform-compat.readthedocs.org/en/latest/draft/resources.html#specifications

I'm guessing something like:

slug: ES1
mdn_key: blank
name: {"en": "ECMAScript 1st Edition"}
uri: {"en": "http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf"}
maturity: "Standard" (https://browsercompat.herokuapp.com/browse/maturities/7)

For other Specifications and Maturities, the SpecName and Spec2 source is parsed to get names, translations, URIs, and maturities. Even if you don't convert all the pages, it would be helpful to get the desired data into SpecName and Spec2, to minimize the work to import into the API.

If you can get ECMA into SpecName and Spec2, then I can adjust the scraper to do something similar to matching browser names. I'm guessing 2 - 4 hours of effort.
I've opened bug 1168102 for updating SpecName and Spec2. I don't think this is a blocking issue. The parser will treat "ECMAScript 1st Edition" as "{{SpecName('ES1')}}", call it out as an issue, and continue parsing.
I've updated the parser. The code is in PR #32 [1], and it may take a few weeks for that to be accepted and close the ticket.

The parse of Web/JavaScript/Reference/Operators/instanceof [2] now succeeds with two warnings for the missing SpecName and Spec2 templates. Plain-text specification names other than "ECMAScript 1st Edition." or "ECMAScript 3rd Edition." will display as an error, but parsing will continue.

[1] https://github.com/mozilla/web-platform-compat/pull/32
[2] https://browsercompat.herokuapp.com/importer/1029
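As a rough illustration of the behavior described in the comment above, here is a minimal sketch of how a specification name cell might be handled. It is not the code from PR #32: the function name, the mapping table, the regular expression, and the issue slugs ("specname_converted", "specname_not_kumascript") are all hypothetical, chosen only to show the warn-and-continue idea.

```python
import re

# Hypothetical mapping from plain-text spec names to SpecName keys.
BY_NAME_SPECS = {
    "ECMAScript 1st Edition": "ES1",
    "ECMAScript 3rd Edition": "ES3",
}

# Matches the first argument of {{SpecName('key', ...)}} (assumed syntax).
SPECNAME_RE = re.compile(r"""\{\{\s*SpecName\(\s*['"]([^'"]+)['"]""")


def parse_spec_name_cell(text):
    """Return (spec_key, issues) for the first cell of a specification row.

    Issues are (severity, slug, context) tuples; the slugs are illustrative.
    """
    issues = []
    cleaned = text.strip()
    match = SPECNAME_RE.search(cleaned)
    if match:
        # Normal case: the cell uses the SpecName KumaScript macro.
        return match.group(1), issues

    # Plain text: drop the trailing period and look the name up.
    name = cleaned.rstrip(".").strip()
    key = BY_NAME_SPECS.get(name)
    if key:
        # Known historical spec: convert it, but flag it as a warning.
        issues.append(("warning", "specname_converted", cleaned))
    else:
        # Unknown text: record an error, but do not abort the parse.
        issues.append(("error", "specname_not_kumascript", cleaned))
    return key, issues
```

In this sketch an unknown name yields a key of None plus an error-level issue, so the caller can keep scraping the rest of the page instead of failing the whole Specifications section.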
Assignee: nobody → jwhitlock
Status: NEW → ASSIGNED
Commits pushed to master at https://github.com/mozilla/web-platform-compat

https://github.com/mozilla/web-platform-compat/commit/4af641960bbcb8a3cb31e283dcb51514d2f3ea57
bug 1132677 - Handle by-name ES1/ES3 specs

When 'ECMAScript 1st/3rd Edition.' appears as the Specification name, convert to ES1/ES3. Other text is an issue, not a parse error.

https://github.com/mozilla/web-platform-compat/commit/dcde80b5cae4ccc60dae33d9efd007e0321f5993
bug 1132677 - Remove issue from specrow tests

Add sample specifications so that specrow tests aren't complicated with 'unknown_spec' issues.

https://github.com/mozilla/web-platform-compat/commit/c4ee25cd99fdd58617a69ad42197c4eafd6b25a4
bug 1132677 - Narrow context for SpecName issues

Instead of the context being the whole first <td> element, just highlight the errored KumaScript.

https://github.com/mozilla/web-platform-compat/commit/ef6443ff53aff0b352f63d661d357e122f62b648
fix bug 1132677 - Handle text in spec status cell

Text (instead of {{Spec2(key)}}) in a specification status will result in an warning-level issue, instead of halting scraping. The status text is not parsed, but instead the SpecName key is used.

https://github.com/mozilla/web-platform-compat/commit/d3fce8051fed602b2eb9eecfafab5d6aa590fc5f
Merge pull request #32 from jwhitlock/1132677_es1_es3

Fix bug 1132677 - Parse inconsistent specification table cells
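The status-cell commit can be pictured with a small sketch like the one below. It is an assumption-laden illustration, not the committed code: the function name, the regular expression, and the "spec2_converted" issue slug are hypothetical. It only shows the described fallback, where plain text in the status cell produces a warning and the SpecName key is used in place of a Spec2 key.

```python
import re

# Matches {{Spec2('key')}} and captures the key (assumed syntax).
SPEC2_RE = re.compile(r"""\{\{\s*Spec2\(\s*['"]([^'"]+)['"]\s*\)\s*\}\}""")


def parse_spec_status_cell(text, specname_key):
    """Return (spec2_key, issues) for the status cell of a specification row.

    specname_key is the key already found in the row's name cell.
    """
    issues = []
    cleaned = text.strip()
    match = SPEC2_RE.search(cleaned)
    if match:
        # Normal case: the cell uses the Spec2 KumaScript macro.
        return match.group(1), issues

    # Plain text such as "Standard": do not try to parse it. Record a
    # warning-level issue and fall back to the SpecName key.
    issues.append(("warning", "spec2_converted", cleaned))
    return specname_key, issues
```

For example, a row whose cells are "ECMAScript 1st Edition." and "Standard" would end up with the key ES1 for both cells and one warning for the status cell, rather than stopping the scrape.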
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Product: developer.mozilla.org → developer.mozilla.org Graveyard