[Compat Data] Improve MDN importer, Round 2

RESOLVED FIXED

Status

enhancement
5 years ago
4 years ago

People

(Reporter: jwhitlock, Assigned: jwhitlock)

Details

(Whiteboard: [specification][type:feature])

What problems would this solve?
===============================
Importing data from MDN into the API is an iterative process:

1) A parser extracts data from an MDN page and reports any detected data issues,
2) A human fixes data issues and badly scraped data by changing the MDN page, rerunning the parser as needed,
3) The page is imported into the API,
4) A human adds additional information to the API,
5) The MDN page's tables are replaced with versions generated from the API,
6) Further data additions are made in the API rather than on the MDN page.

One part of making this process effective is improving the parser.
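Steps 1 to 3 above can be sketched roughly as follows. This is only an illustration, not the importer's actual code: `ParseResult`, `parse_page`, `ready_to_import`, and the tuple-shaped rows are all hypothetical names and shapes, and the only "issue" detected here is an unknown version.

```python
from dataclasses import dataclass, field


@dataclass
class ParseResult:
    """Hypothetical container for what one parser run produces."""
    features: list = field(default_factory=list)
    issues: list = field(default_factory=list)


def parse_page(rows):
    """Step 1: extract data and report any detected data issues.

    `rows` stands in for scraped table rows: (feature, browser, version).
    """
    result = ParseResult()
    for feature, browser, version in rows:
        if not version or version == "?":
            # Assumed rule: an unknown version is something a human must fix.
            result.issues.append(f"{feature}/{browser}: unknown version")
        else:
            result.features.append((feature, browser, version))
    return result


def ready_to_import(result):
    """Step 3 gate: import only once step 2 has cleared every issue."""
    return not result.issues


# Step 2 is the human loop: fix the page, rerun the parser, check again.
first = parse_page([("audio", "Firefox", "3.5"), ("audio", "Opera", "?")])
fixed = parse_page([("audio", "Firefox", "3.5"), ("audio", "Opera", "10.5")])
```

The better the parser is at step 1, the fewer times a human has to go around the step 2 loop before a page passes the gate.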

Who would use this?
===================
MDN staff and volunteers who are converting MDN pages to use API-backed compatibility data.

What would users see?
=====================
Importer issues would be limited to data quality issues and issues that are best fixed manually.

What would users do? What would happen as a result?
===================================================
MDN staff and volunteers will quickly convert the 1000+ pages with compatibility data to use the API in Q2 2015.

Is there anything else we should know?
======================================
This is a tracking bug for desired MDN importer improvements. The goal is not a perfect importer: the importer is temporary code and will be discarded after the import task is done. Instead, the goal is for MDN staff to determine which importer improvements are worth doing and which issues are better handled by a human.

Propose improvements as bugs blocking this bug, and use +1 or CC to signal votes for an improvement. The top-voted improvements will be estimated and bundled up as Q2 2015 deliverables as budget allows.
Blocks: 996570
Severity: normal → enhancement
Blocks: 1132781
No longer blocks: 1132781
Depends on: 1132781
Regarding the bugs filed by :fscholz above, we may need to talk with the writing team about dedicating some time to testing and improving this. :groovecoder, can you reach out to them?
Flags: needinfo?(lcrouch)
After a quick chat with Ali, she told me that Jeremie is still acting as a stakeholder, representing us as the initial customers for this project. This should be enough; we will go through him.
Depends on: 1134373
Depends on: 1134426
Depends on: 1134450
Depends on: 1134474
Depends on: 1134586
Depends on: 1134587
Depends on: 1134624
Depends on: 1135000
Depends on: 1135060
Depends on: 1138455
Depends on: 1138458
Depends on: 1139433
Depends on: 1139619
Depends on: 1140009
Clearing my needinfo. The writing team is sending feedback through Jeremie, and Trevor Hobson is even sending pull requests to the code itself. [1]

[1] https://github.com/jwhitlock/web-platform-compat/pulls
Flags: needinfo?(lcrouch)
Commits pushed to master at https://github.com/mozilla/web-platform-compat

https://github.com/mozilla/web-platform-compat/commit/e8cd01947348efdc0c0f212ba79db7d1ca7f8226
bug 1132269 - Refactor attribute handling

Capture attribute details at leaf node, and consume attributes higher in
the parse tree, where validation can be customized.  Add more tests for
the Specification <h2>, for code coverage.
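The capture-then-consume split described in this commit message might look something like the sketch below. It is not the code from the commit: `capture_attributes`, `consume_attributes`, the regular expression, and the issue wording are hypothetical, chosen only to show a leaf step that records everything and a parent step that decides what is valid.

```python
import re

# Assumed attribute syntax: name="value" pairs, double-quoted only.
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')


def capture_attributes(raw):
    """Leaf step: record every attribute found, without judging any of them."""
    return dict(ATTR_RE.findall(raw))


def consume_attributes(attrs, expected, context):
    """Parent step: keep the expected attributes, report the rest as issues."""
    kept = {k: v for k, v in attrs.items() if k in expected}
    issues = [
        f"unexpected attribute {k!r} in <{context}>"
        for k in sorted(attrs)
        if k not in expected
    ]
    return kept, issues


attrs = capture_attributes('id="Specifications" style="color: red"')
# The same captured attributes can be validated differently per parent.
h2_kept, h2_issues = consume_attributes(attrs, {"id", "name"}, "h2")
td_kept, td_issues = consume_attributes(attrs, {"id", "style"}, "td")
```

Because the leaf never rejects anything, each parent element can apply its own rules to the same captured data, which is where the customized validation comes in.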

https://github.com/mozilla/web-platform-compat/commit/36ee8a99f11b6f0ac658b8f2360c272c7bafd4db
Merge pull request #23 from jwhitlock/1132269_more_importer

bug 1132269 - Various importer fixes
Depends on: 1153260
Depends on: 1154349
Depends on: 1164311
Depends on: 1170196
Depends on: 1170199
Depends on: 1170206
Depends on: 1174808
Depends on: 1175177
No longer depends on: 1170709
No longer depends on: 1175177
No longer depends on: 1174808
No longer depends on: 1180573
No longer depends on: 1183593
Assignee: nobody → jwhitlock
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Commit pushed to master at https://github.com/mozilla/web-platform-compat

https://github.com/mozilla/web-platform-compat/commit/3e91e3671a6622943059c6560b09eaf0f4cde90e
bug 1132269 - Handle canonical feature names

In the sample JS display and the browse app, handle features with
canonical names, which are encoded as strings rather than objects.
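As a rough illustration of the string-versus-object distinction (the commit itself touches the JS display and the browse app, not this code), a consumer could branch on the type of the name. `display_name` is a hypothetical helper, and the assumption that non-canonical names are objects keyed by language code is mine, not something stated in this bug.

```python
def display_name(name, lang="en"):
    """Return (text, is_canonical) for a feature name.

    Canonical names arrive as plain strings; other names are assumed
    here to be objects (dicts) keyed by language code.
    """
    if isinstance(name, str):
        return name, True
    # Fall back to English, then to any available translation.
    text = name.get(lang) or name.get("en") or next(iter(name.values()), "")
    return text, False


canonical = display_name("display")
translated = display_name({"en": "Basic support", "fr": "Support de base"}, "fr")
```

Returning the canonical flag alongside the text lets the caller style canonical names differently, for example by rendering them as code.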
All round 2 PRs are merged.
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED