Closed
Bug 1132677
Opened 10 years ago
Closed 10 years ago
[Compat Data][Importer] Parsing of inconsistent specification table cells (e.g. ES1/ES3)
Categories
(developer.mozilla.org Graveyard :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: fs, Assigned: jwhitlock)
References
Details
(Whiteboard: [specification][type:bug])
What did you do?
================
See https://browsercompat.herokuapp.com/importer/1029
What happened?
==============
It seems that currently only {{SpecName}} and {{Spec2}} are parsed and imported successfully.
However, the JavaScript docs (and likely other pages) are a bit inconsistent at the moment.
Historical specs like ES1 and ES3 appear like this:
<td>ECMAScript 1st Edition.</td>
<td>Standard</td>
<td>Comment</td>
or
<td>ECMAScript 3rd Edition.</td>
<td>Standard</td>
<td>Comment</td>
What should have happened?
==========================
The parser should parse these as ES1 and ES3.
https://github.com/jwhitlock/web-platform-compat/blob/1079920_import_process/mdn/scrape.py#L73
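A minimal sketch of that mapping follows. It assumes a hypothetical `spec_key_from_text` helper and a lookup table that covers only the two editions named in this bug. It is not the code in scrape.py linked above. Whitespace is collapsed, a trailing period is dropped, and the comparison is case-insensitive, so the cell text from the examples maps to its key.

```python
import re
from typing import Optional

# Hypothetical lookup table: by-name spec text -> SpecName key.
# Only the two historical editions named in this bug are covered.
BY_NAME_SPECS = {
    "ecmascript 1st edition": "ES1",
    "ecmascript 3rd edition": "ES3",
}


def spec_key_from_text(cell_text: str) -> Optional[str]:
    """Return the spec key for a plain-text specification cell, or None.

    Collapses runs of whitespace, drops a trailing period, and compares
    case-insensitively, so "ECMAScript 1st Edition." maps to "ES1".
    """
    normalized = re.sub(r"\s+", " ", cell_text).strip().rstrip(".").strip()
    return BY_NAME_SPECS.get(normalized.lower())
```

Unknown text returns `None`, which leaves the caller to decide whether to treat it as an issue or a hard failure.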
Is there anything else we should know?
======================================
This will fix hundreds of errors that say:
"Section <h2>Specifications</h2> was not parsed, because rule "kuma" failed to match." (I expect more of these after bug 1132658)
It is sad that it is inconsistent right now, but I'm really happy to have this as a win afterwards: once we've parsed it successfully, we can display it consistently on MDN using the new compat data. :-)
Assignee
Comment 1•10 years ago
Here are the API data fields for a Specification:
http://web-platform-compat.readthedocs.org/en/latest/draft/resources.html#specifications
I'm guessing something like:
slug: ES1
mdn_key: blank
name: {"en": "ECMAScript 1st Edition"}
uri: {"en": "http://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf"}
maturity: "Standard" (https://browsercompat.herokuapp.com/browse/maturities/7)
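To make the proposed values concrete, here is a hypothetical local model of those fields as a Python dataclass. The class name, the `maturity_id` field, and `to_dict` are assumptions for illustration only. The real API's representation, for example how the link to the Maturity resource is encoded, may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Specification:
    """Hypothetical local model of the Specification fields listed above."""
    slug: str
    mdn_key: str = ""  # blank for by-name specs such as ES1
    name: Dict[str, str] = field(default_factory=dict)  # language code -> name
    uri: Dict[str, str] = field(default_factory=dict)   # language code -> URI
    maturity_id: Optional[int] = None  # ID of the linked Maturity resource

    def to_dict(self) -> dict:
        # Copy the translation dicts so callers can't mutate this object.
        return {
            "slug": self.slug,
            "mdn_key": self.mdn_key,
            "name": dict(self.name),
            "uri": dict(self.uri),
            "maturity": self.maturity_id,
        }


es1 = Specification(
    slug="ES1",
    name={"en": "ECMAScript 1st Edition"},
    uri={"en": "http://www.ecma-international.org/publications/files/"
               "ECMA-ST-ARCH/ECMA-262,%201st%20edition,%20June%201997.pdf"},
    maturity_id=7,  # "Standard", per the maturities/7 link above
)
```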
For other Specifications and Maturities, the source of the SpecName and Spec2 templates is parsed to get names, translations, URIs, and maturities. Even if you don't convert all the pages, it would help to get the desired data into SpecName and Spec2, to minimize the work of importing into the API.
If you can get the ECMAScript specs into SpecName and Spec2, then I can adjust the scraper to match them by name, similar to how it matches browser names. I'm guessing 2-4 hours of effort.
Assignee
Comment 2•10 years ago
I've opened bug 1168102 for updating SpecName and Spec2. I don't think this is a blocking issue. The parser will treat "ECMAScript 1st Edition" as "{{SpecName('ES1')}}", call it out as an issue, and continue parsing.
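The "call it out as an issue, and continue parsing" behavior can be sketched as a loop that records issues instead of raising. The function name, the issue dict shape, and the `specname_by_name` slug are hypothetical. `unknown_spec` is borrowed from a commit message below, and its exact meaning in the real code is assumed. Cells that already use {{SpecName}} are assumed to be handled elsewhere.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical by-name table, limited to the editions named in this bug.
BY_NAME = {"ECMAScript 1st Edition": "ES1", "ECMAScript 3rd Edition": "ES3"}

Issue = Dict[str, str]


def parse_spec_name_cells(cells: List[str]) -> Tuple[List[Optional[str]], List[Issue]]:
    """Map plain-text specification cells to spec keys without raising.

    A known by-name spec is accepted with a warning (it should become
    {{SpecName(...)}}); any other text is an error and yields None.
    Either way the loop continues, so the rest of the table still parses.
    """
    keys: List[Optional[str]] = []
    issues: List[Issue] = []
    for cell in cells:
        text = cell.strip().rstrip(".")
        key = BY_NAME.get(text)
        if key is not None:
            issues.append({"level": "warning", "slug": "specname_by_name",
                           "context": cell.strip(), "key": key})
        else:
            issues.append({"level": "error", "slug": "unknown_spec",
                           "context": cell.strip()})
        keys.append(key)
    return keys, issues
```

Keeping the issues separate from the keys lets the importer show every problem on a page in one pass rather than stopping at the first one.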
Assignee
Comment 3•10 years ago
I've updated the parser. The code is in PR #32 [1], and it may take a few weeks for it to be accepted and for this ticket to be closed.
The parse of Web/JavaScript/Reference/Operators/instanceof [2] now succeeds with two warnings for the missing SpecName and Spec2 templates. Text other than "ECMAScript 1st Edition." or "ECMAScript 3rd Edition." will be reported as an error, but parsing will continue.
[1] https://github.com/mozilla/web-platform-compat/pull/32
[2] https://browsercompat.herokuapp.com/importer/1029
Assignee
Updated•10 years ago
Assignee: nobody → jwhitlock
Status: NEW → ASSIGNED
Comment 4•10 years ago
Commits pushed to master at https://github.com/mozilla/web-platform-compat
https://github.com/mozilla/web-platform-compat/commit/4af641960bbcb8a3cb31e283dcb51514d2f3ea57
bug 1132677 - Handle by-name ES1/ES3 specs
When 'ECMAScript 1st/3rd Edition.' appears as the Specification name,
convert to ES1/ES3. Other text is an issue, not a parse error.
https://github.com/mozilla/web-platform-compat/commit/dcde80b5cae4ccc60dae33d9efd007e0321f5993
bug 1132677 - Remove issue from specrow tests
Add sample specifications so that specrow tests aren't complicated with
'unknown_spec' issues.
https://github.com/mozilla/web-platform-compat/commit/c4ee25cd99fdd58617a69ad42197c4eafd6b25a4
bug 1132677 - Narrow context for SpecName issues
Instead of the context being the whole first <td> element, just
highlight the errored KumaScript.
https://github.com/mozilla/web-platform-compat/commit/ef6443ff53aff0b352f63d661d357e122f62b648
fix bug 1132677 - Handle text in spec status cell
Text (instead of {{Spec2(key)}}) in a specification status will result
in a warning-level issue, instead of halting scraping. The status text
is not parsed, but instead the SpecName key is used.
https://github.com/mozilla/web-platform-compat/commit/d3fce8051fed602b2eb9eecfafab5d6aa590fc5f
Merge pull request #32 from jwhitlock/1132677_es1_es3
Fix bug 1132677 - Parse inconsistent specification table cells
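Two behaviors from these commit messages can be sketched as follows: narrowing an issue's context to just the KumaScript call, and falling back to the SpecName key when the status cell holds plain text. Everything here is hypothetical. That includes the function names, the `spec2_by_text` slug, and the use of regexes, which may not match how the real scraper parses pages.

```python
import re
from typing import Dict, List, Tuple

Issue = Dict[str, str]

# Any {{...}} KumaScript call, matched non-greedily.
KUMASCRIPT_RE = re.compile(r"\{\{.*?\}\}", re.DOTALL)
# {{Spec2('key')}} with either quote style; group 1 is the spec key.
SPEC2_RE = re.compile(r"\{\{\s*Spec2\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def kumascript_spans(cell_html: str, cell_start: int) -> List[Tuple[int, int]]:
    """Page-level (start, end) offsets of each {{...}} call inside a cell.

    cell_start is where cell_html begins in the page, so an issue can
    highlight just the macro instead of the whole <td> element.
    """
    return [(cell_start + m.start(), cell_start + m.end())
            for m in KUMASCRIPT_RE.finditer(cell_html)]


def status_key(status_cell: str, specname_key: str, issues: List[Issue]) -> str:
    """Key used to look up the maturity for one specification row.

    {{Spec2('ES5.1')}} yields 'ES5.1'. Plain text such as 'Standard' is not
    parsed: a warning is recorded and the SpecName key is used instead,
    so scraping continues.
    """
    match = SPEC2_RE.search(status_cell)
    if match:
        return match.group(1)
    issues.append({"level": "warning", "slug": "spec2_by_text",
                   "context": status_cell.strip()})
    return specname_key
```

For a row like `<td>ECMAScript 1st Edition.</td><td>Standard</td>`, `status_key("Standard", "ES1", issues)` returns "ES1" and records a single warning.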
Updated•10 years ago
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•5 years ago
Product: developer.mozilla.org → developer.mozilla.org Graveyard