Closed
Bug 1328839
Opened 7 years ago
Closed 7 years ago
[Research] How to reliably load JSON sources from MDN GitHub repositories into KumaScript?
Categories
(developer.mozilla.org Graveyard :: KumaScript, enhancement)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: fs, Unassigned)
References
Details
(Keywords: in-triage, Whiteboard: [specification][type:feature])
What problem would this feature solve?
======================================
Currently, data about Open Web technologies lives in KumaScript macros, because MDN pages consume this data to generate documentation, navigation, and more. Ideally, Open Web data lives separately from KumaScript and the wiki pages, because it is useful for consumers other than MDN. This separation was started by putting data into JSON files in the repositories mdn/data and mdn/browser-compat-data. To use this data in MDN, KumaScript still needs fast and reliable access to the JSON files. Live-loading from GitHub is not a good approach. Alternatives have to be researched.

Who has this problem?
=====================
Core contributors to MDN

How do you know that the users identified above have this problem?
==================================================================
Our Site Reliability Engineer noticed that the current way of loading from the GitHub repository is not a good approach.
https://github.com/mozilla/kumascript/pull/78

From 2012: https://www.quora.com/GitHub/What-is-the-recommended-way-to-use-raw-github-com
"You can use it as a way to link to (or download) the raw contents of a file. Git is not designed as an efficient file serving system, so we rate limit it pretty heavily to protect against sites hotlinking (or doing anything that generates a lot of traffic)."

How are the users identified above solving this problem now?
============================================================
a) Live load from raw.github (don't do this anymore!)
b) Use a KumaScript macro (ejs format) that duplicates the JSON from the external repo. (Works for a handful of JSON files, not for loads of them [BC JSONs].)

Do you have any suggestions for solving the problem? Please explain in detail.
==============================================================================
Josh says:
"Once MDN (including kumascript) is being continuously deployed to Kubernetes later this year we'll have much better options for automatically updating data from external repos and APIs on the server side, either as part of an automated deployment, for example as part of the kumascript Docker image build phase, or as a sidecar container like https://github.com/kubernetes/git-sync, ideally with some sort of schema validation."

I have no expertise with any of the things Josh mentioned. My first thought was to somehow integrate mdn/data and mdn/browser-compat-data as submodules into kumascript. But submodules make people cry, so I am eager to hear better ideas.

Is there anything else we should know?
======================================
I filed this as a research/preparation bug, so I would consider it fixed once there is agreement on an actual solution (the summary question is answered) and one or more implementation bugs are filed.

I think this is critical for MDN to proceed with offering structured data to external collaborators. I will put this on my wish list.
Comment 1•7 years ago
Currently the data is only requested from GitHub when a logged-in user reloads a page via Ctrl+F5, which purges the cache. This doesn't happen that often, so the issue currently shouldn't be that big. Also, the quoted comment is already five years old, so this info may have changed in the meantime.

Nonetheless, a strategy for caching that data on the MDN side should be established. One solution that comes to my mind is to use the cacheFn() function[1] within the KumaScript macros to cache the data once it's fetched from GitHub.

Alternatively, the data could be fetched from GitHub on deployment and made accessible somewhere on MDN. The disadvantage is that a contributor has to wait for the next release to see the changes live.

Sebastian

[1] https://developer.mozilla.org/en-US/docs/MDN/Contribute/Tools/KumaScript#Built-in_methods
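To illustrate the caching idea in this comment, here is a minimal sketch of a time-limited cache wrapped around a fetch. It is not KumaScript's cacheFn(); the names `createTtlCache`, `cached`, `ttlSeconds`, and `compute` are hypothetical, and the fetch is assumed to be synchronous to keep the example short.

```javascript
// Hypothetical sketch of a TTL cache; NOT the real KumaScript cacheFn().
// `now` is injectable so that expiry can be exercised without waiting.
function createTtlCache(now = Date.now) {
  const entries = new Map();

  // Return the cached value for `key` if it is still fresh,
  // otherwise call `compute()` and remember its result for `ttlSeconds`.
  return function cached(key, ttlSeconds, compute) {
    const hit = entries.get(key);
    if (hit && hit.expires > now()) {
      return hit.value;
    }
    const value = compute();
    entries.set(key, { value, expires: now() + ttlSeconds * 1000 });
    return value;
  };
}

// Usage: the expensive call runs once per hour at most.
const cached = createTtlCache();
const groups = cached("mdn-data:api/groups", 3600, () => {
  // Stand-in for a fetch from GitHub (assumption: returns parsed JSON).
  return { example: true };
});
```

With a cache like this, GitHub is only contacted when an entry has expired, which bounds the request rate regardless of how often pages are re-rendered.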
Comment 2•7 years ago
I contacted GitHub support, and here's their answer:

----
You shouldn't really be using that raw endpoint for programmatic access. For programmatic access -- the recommended approach is to use the API: http://developer.github.com/v3/

The raw endpoint isn't documented, as you noticed, which means it doesn't have defined caching or rate limiting behavior. In other words, you might get limited at any time and without any warning. The API has documented rate limits which you can rely on.
----

My ideal solution would be that the data is present on each KumaScript host, like the macros (as of December 2016), and loading the data is a local file read. This ensures the data is fast, consistent, and takes the network out of the loop.

However, I understand the desire to update this data more frequently than once or twice a week. I think rapid deployment (5-30 min after merge to mdn/data's master) could be possible after the AWS migration, so we're talking Q3 2017 before that is feasible.

Some other options:
* Publish as GitHub pages and load from there
* Add as a git submodule to kumascript, deploy from there
* Deploy a [kinto](https://github.com/Kinto/kinto) server, load the JSON data on merge to master with a web hook or similar

I think a good first step would be to create an internal API through KumaScript. For example, mdn.LoadJsonData('api/groups') could just call mdn.fetchJSONResource('https://raw.githubusercontent.com/mdn/data/master/api/groups.json'), but later could load from disk, from a document storage API, or other method.

And then I'd support quietly using raw.githubusercontent.com under current usage patterns until we're in AWS. We really don't have the dev resources to implement a better solution in SCL3.
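The internal API suggested in this comment could look roughly like the sketch below: macros ask for a logical name, and the loader decides where the JSON comes from. Everything except the raw URL pattern is an assumption made for illustration: `createJsonLoader`, `rawUrlFor`, `localDir`, and `fetchRemote` are hypothetical names, and the remote fetch is treated as synchronous to keep the example small.

```javascript
const fs = require("fs");
const path = require("path");

// Base of the raw URLs quoted in the comment above.
const RAW_BASE = "https://raw.githubusercontent.com/mdn/data/master/";

// Map a logical name such as "api/groups" to its raw GitHub URL.
function rawUrlFor(name) {
  return RAW_BASE + name + ".json";
}

// Build a loader function (hypothetical API).
//   options.localDir    - optional directory holding a copy of the data
//   options.fetchRemote - function(url) returning parsed JSON (assumed sync)
function createJsonLoader(options) {
  const { localDir, fetchRemote } = options;

  return function loadJsonData(name) {
    if (localDir) {
      const file = path.join(localDir, name + ".json");
      if (fs.existsSync(file)) {
        // Local file read: no network involved.
        return JSON.parse(fs.readFileSync(file, "utf8"));
      }
    }
    // Fall back to the remote source for names not available locally.
    return fetchRemote(rawUrlFor(name));
  };
}
```

Because macros would only see `loadJsonData(name)`, the backend could move from raw.githubusercontent.com to a local checkout or another store without touching the macros.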
Reporter
Comment 3•7 years ago
Another option:
* use npm packages (mdn-data & mdn-compat-data) which we plan to publish soon (seems like https://www.npmjs.com/package/caniuse-db releases daily or so)
Comment 4•7 years ago
Commits pushed to master at https://github.com/mozilla/kumascript

https://github.com/mozilla/kumascript/commit/db40f8fa45bb1cac95ba50337049631b0c6eef84
bug 1328839: enable "require" of npm packages
* add "mdn-browser-compat-data" to package.json and update "npm-shrinkwrap.json"
* change name of "require" method on "APIContext" class to "require_macro", and update all macros and tests to reflect the change
* add "require" method on "APIContext" class that maps to the nodejs "require" and add test

https://github.com/mozilla/kumascript/commit/f2383aa2618e6a9819fee0e5b1976dfd63a20c9f
Merge pull request #183 from escattone/use-npm-bcd-1328839

bug 1328839: enable "require" of npm packages
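The split described in the commit message above can be pictured with the following sketch. It is an illustration only, not the code from the commit: the class name `APIContextSketch` and the in-memory `macros` map are hypothetical, and only the method names `require_macro` and `require` come from the commit message.

```javascript
// Hypothetical stand-in for the real APIContext class.
class APIContextSketch {
  // `macros` is an assumed in-memory map of macro name -> macro exports.
  constructor(macros) {
    this.macros = macros;
  }

  // Formerly "require": look up another macro by name.
  require_macro(name) {
    if (!Object.prototype.hasOwnProperty.call(this.macros, name)) {
      throw new Error("Unknown macro: " + name);
    }
    return this.macros[name];
  }

  // New "require": delegate to the Node.js module loader, so that
  // installed npm packages become available to macros.
  require(name) {
    return require(name);
  }
}

const ctx = new APIContextSketch({ Hello: { greet: () => "hello" } });
const greeting = ctx.require_macro("Hello").greet();
const nodePath = ctx.require("path"); // a built-in module, used here for demonstration
// With the package installed, a macro could similarly call:
//   ctx.require("mdn-browser-compat-data")
```

Keeping the two lookups under separate names avoids ambiguity between a macro and an npm package that happen to share a name.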
Comment 5•7 years ago
Commits pushed to master at https://github.com/mozilla/kuma

https://github.com/mozilla/kuma/commit/439728c54e8723be1c55792bcc3c7bfe8ccdccea
bug 1328839: improve kumascript npm package updates

https://github.com/mozilla/kuma/commit/a9f27ae22035e7fb26aa50cb56a8c3817743fc07
Merge pull request #4255 from escattone/npm-update-1328839

bug 1328839: improve kumascript npm package updates
Comment 6•7 years ago
Commits pushed to master at https://github.com/mozilla/kumascript

https://github.com/mozilla/kumascript/commit/60c43b04d42af2df2583c1cdcff096f804c0b417
bug 1328839: improve npm package updates

https://github.com/mozilla/kumascript/commit/304787699a64eb5aee6d70a6a26f6dc3a6495999
Merge pull request #194 from escattone/npm-update-1328839

bug 1328839: improve npm package updates
Comment 7•7 years ago
I think we have proven that:

1. npm is a good way to package versioned JSON data, and to share it with external projects
2. GitHub raw API is a decent way to access quickly changing JSON data, and will fail for a few hours every other month.
3. Other options may be available after the AWS transition (bug 1110799, jgmize's comments)

Since this bug is about investigating the issue, I think we can resolve it as fixed.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Updated•4 years ago
Product: developer.mozilla.org → developer.mozilla.org Graveyard