612721 - 'cfx docs' should export HTML, and html fragment

Reporter

Description

15 years ago

wbamberg and I have been talking about how to make it easier for flightdeck to get access to the documentation from various versions of the SDK.

For reference, the current (0.9/0.10) docs behavior is:

* the markdown source is stored in e.g. packages/addon-kit/docs/tabs.md
* the 'cfx docs' command starts a local webserver, which serves "/packages/addon-kit/docs/tabs.md" literally, and serves "/packages/addon-kit/docs/tabs.md.json" by parsing the markdown with apiparser.py into chunks of type "markdown" or "api-json"
** the 'cfx docs' frontend javascript fetches tabs.md.json and then uses a bunch of jQuery functions to construct DOM nodes that render the API docs in some useful way. The 'showdown' JS library is used to convert markdown into DOM nodes.
* the 'cfx sdocs' command creates a static tarball that includes tabs.md and tabs.md.json
* flightdeck scans/imports the sdocs tarball, including the tabs.md.json file. When users look at the docs page, it uses Django templates to transform the JSON into HTML (on the server). The server provides HTML, and the client never sees the markdown or the JSON.
* the SDK-to-Flightdeck "interface" is the syntax of the JSON

The two biggest problems with using the JSON as an interface are:

* we have two separate renderers (one in the 'cfx docs' frontend JS, the other in the flightdeck backend python), which must both be updated if we want to change the JSON, and they live in separate repositories, so version skew is a problem
* other potential consumers of jetpack docs are responsible for rendering HTML docs themselves (including the API sections), which is a drag

So the proposal is to have the SDK emit HTML in addition to the current raw Markdown and parsed JSON API data. The HTML it emits should be usable by frontends that want to incorporate the module docs into an existing page. So:

* The 'cfx sdocs' tarball will contain, in addition to the current "tabs.md" and "tabs.md.json" files, two new files: "tabs.html" and "tabs_module.html".
** tabs_module.html will start with '<div id="tabs_module_api_docs" class="module_api_docs">' (or similar). This is designed to be interpolated into an existing page.
** tabs.html will just be an <html> template/wrapper around the contents of tabs_module.html, so it can be viewed directly, through a static server.

The details of the CSS names still need to be worked out. The idea is to make it easy to view an unpacked sdocs tarball, or for something like flightdeck to serve pre-existing HTML files, or to incorporate them into other pages, without requiring a complex renderer that knows about the current JSON API syntax.

One other detail: the API JSON is currently a list of two-tuples, each of which starts with a type identifier (either "markdown" or "api-json"). We plan to add a third type identifier named "version", which will only appear as the first element of the list, and which will have an integer as a value. Renderers should ignore this. We'll use this to capture the syntax of the JSON contained in the rest of the list, so that external renderers will be able to know how to interpret it.
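
For illustration only, here is a minimal sketch of how an external consumer might read the proposed list of two-tuples. The function name `split_version`, the sample data, and the choice to report `None` for files that carry no "version" entry are all assumptions made for this example; the thread does not define them.

```python
import json


def split_version(chunks):
    """Split a parsed *.md.json list into (version, remaining_chunks).

    Assumption: if present, the version marker is the first entry and
    looks like ["version", <int>]. Older files without it yield None.
    """
    if chunks and chunks[0][0] == "version":
        return chunks[0][1], chunks[1:]
    return None, chunks


# Hypothetical sample in the shape described above.
sample = json.loads(
    '[["version", 1],'
    ' ["markdown", "Intro text"],'
    ' ["api-json", {"name": "open"}]]'
)

version, body = split_version(sample)
for kind, value in body:
    # A real renderer would dispatch on `kind` here.
    print(kind, value)
```

Keeping the version out of the chunk loop means a renderer that only understands "markdown" and "api-json" never has to special-case the new entry.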

Will Bamberg [:wbamberg]

Assignee

Updated

15 years ago

Assignee: nobody → wbamberg

Status: NEW → ASSIGNED

Will Bamberg [:wbamberg]

Assignee

Comment 1

15 years ago

Attached patch: Patch for bug612721

Here's a first look at these changes. It's not a pull request because I don't seriously expect you to pull it, yet.

Summary of what's done:

1) new Python module called (probably confusingly) renderapi.py, that takes either a Markdown file or the output of apiparser.py, and emits the DIV containing the rendered docs, or a standalone HTML page containing the DIV embedded in <body /> and with a header. The standalone HTML assumes it's living in the directory structure created by cfx sdocs and uses this to find stylesheets and images. It uses the same stylesheets as the SDK docs, and has some dodgy JavaScript to rewrite <img> links to find images.

2) changes in server.py to use this module in `cfx sdocs` to emit DIV and HTMLs for any Markdown files found under /packages.

3) changes in main.js, apidocs.css, and server.py to integrate this into `cfx docs` as well, thus making renderapi.js redundant. apidocs.css is now based on the new class and id attributes, and one nice side effect of this is that the css is very much simpler. Changes in presentation are fairly subtle, but improvements, I think.

4) change in apiparser.py to include a version and to fix a couple of annoyances in the current JSON: That some API elements like parameters don't have a "type" key, and some API elements define "type" differently to others. So a top-level property has a "type" key whose value is "property", and also a "property-type" whose value is the underlying datatype, i.e. "string". But a property which is a member of another object has no "property-type" key, but instead redefines "type" to be the datatype. Ugh. So now, all objects have a "type" which is "function", "method", "property" and so on, and objects which have an underlying datatype also have a "datatype" key to capture that.

5) A readme explaining the structure of the DIV (the interface of this tool) which I've put under cuddlefish/docs. Getting this right is kind of key.

6) Added Python's markdown under python-lib.
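
The JSON cleanup in item 4 can be pictured with a small sketch. This is not the code from apiparser.py; `normalize_element` and the sample elements ("title", "url", "options") are hypothetical, and the sketch assumes the caller knows what kind of element it is handling from where the element sits in the tree.

```python
def normalize_element(element, kind):
    """Return a copy of an API element using unified "type"/"datatype" keys.

    `kind` is what the element is ("property", "parameter", ...). It is
    passed in because the old nested form does not record it anywhere.
    """
    result = dict(element)
    if "property-type" in result:
        # Old top-level property: "type" already holds the kind.
        result["datatype"] = result.pop("property-type")
    elif "type" in result and result["type"] != kind:
        # Old nested member: "type" was overloaded to hold the datatype.
        result["datatype"] = result["type"]
    result["type"] = kind
    return result


# Hypothetical elements in the two old shapes described in item 4.
old_top_level = {"name": "title", "type": "property", "property-type": "string"}
old_nested = {"name": "url", "type": "string"}
old_untyped = {"name": "options"}

print(normalize_element(old_top_level, "property"))
print(normalize_element(old_nested, "property"))
print(normalize_element(old_untyped, "parameter"))
```

After normalization both property shapes carry the same two keys, so a renderer can branch on "type" alone and look up "datatype" only when it exists.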
Problems and unresolved things:

1) cfx docs is now *really slow*. I've tried to understand why without success. It's not because renderapi.py is very much slower: timing it in isolation it's not very fast, but fast enough (though I'd gladly accept performance improvement suggestions). There seems to be a looong delay between the server finishing generating the page, and the JQuery AJAX function returning to main.js. The length of the delay seems to be a function of the *size* of the HTTP response (and not its complexity: for example, a large Markdown file that contains no <api /> stuff is also slow). So the slowdown compared to the old code is probably because an HTML DIV is much bigger than the corresponding JSON. This slowdown does not happen in FF3.6, but only with the 4.0 beta 7 (I didn't try any other beta versions). If you have any insight into this or possible directions to explore I'd really like to hear them...

2) Internal links: having thought about this a bit I think the right thing to do is to make internal links be real HTML links, not Markdown links, and add a class attribute to them, then have some JS in the page rewrite them. I haven't done this here because more of the work is in the MD files, and it can be done independently of all this. There is chat in the land of Markdown about adding class attributes, but I think diverging from the standard would be a bad idea.

3) I generally had a hard time getting the markdown converter to behave, in particular I seem to have to UTF-8 encode the output, which seems a bit weird. Dumb question, but what's a nice way in Python to dump the binary content of a string, so I can figure out the encoding?

Attachment #494201 - Flags: review?(warner-bugzilla)

Attachment #494201 - Flags: feedback?

Will Bamberg [:wbamberg]

Assignee

Updated

15 years ago

Attachment #494201 - Flags: feedback? → feedback?(dbuchner)

Will Bamberg [:wbamberg]

Assignee

Updated

15 years ago

Blocks: 610030

Will Bamberg [:wbamberg]

Assignee

Updated

15 years ago

Blocks: 617029

Brian Warner [:warner :bwarner]

Reporter

Comment 2

15 years ago

sounds like a great approach. A few notes about the problems:

1) as Will and I discussed last week, I'm wondering if the slowdown is from FF4's XHR code trying to parse the HTML that it gets back. We talked about experimenting with the Content-Type in the response (make it text/plain to keep XHR from thinking it can be parsed), and running some tests to chart response size against round-trip time (in a new small test program, to rule out jetpack altogether) and see if there's a big correlation. Maybe there's a FF4 bug.

3) yeah, python2 is not very consistent about bytes versus unicode strings: it will quietly convert between the two when it seems to be necessary, which works fine until you venture beyond ASCII (python3 fixes this). The key is to always know what type a given variable is supposed to be, use isinstance(my_bytes, str) or isinstance(my_unicode_string, unicode) to check, and do explicit conversions:

  my_unicode_string = u"blah"
  my_bytes = my_unicode_string.encode("utf-8")
  back_to_unicode = my_bytes.decode("utf-8")

python2 will, regrettably, allow my_unicode_string.decode(encoding), which probably acts like my_unicode_string.encode("ascii").decode(encoding), except that instead of "ascii" it may use your "default encoding". Likewise, it regrettably allows my_bytes.encode(encoding), which probably behaves like my_bytes.decode("ascii").encode(encoding). Both fail mysteriously when the things can't be expressed as ASCII. python3 rejects these calls.

Anyways, to answer your question. If you have a bytestring, then you can get a basic hexdump with binascii.hexlify:

  from binascii import hexlify
  print hexlify("\x12\x34") # 1234

If you have a unicode string, you can either turn it into UTF-8 and then dump the bytes (assuming your brain can read utf-8, which mine can't):

  hexlify(u.encode("utf-8"))

or turn it into unicode-string-literal form, like what you could paste back into python:

  u.encode("unicode_escape")

(although note that unicode_escape tends to produce e.g. \xe9 instead of \u00e9, which may be a bit confusing)
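
For readers on Python 3, the same checks and dumps might look roughly like the sketch below. The sample string is an assumption chosen for the example; in Python 3 the two types are `str` and `bytes`, and the implicit cross-conversions described above no longer exist.

```python
from binascii import hexlify

text = "caf\u00e9"             # a str (unicode text) with one non-ASCII char
data = text.encode("utf-8")    # explicit str -> bytes conversion

# Always know which type you hold, and convert explicitly.
assert isinstance(text, str) and isinstance(data, bytes)
assert data.decode("utf-8") == text

# Python 3 removed the confusing methods: str has no .decode,
# and bytes has no .encode.
assert not hasattr(text, "decode")
assert not hasattr(data, "encode")

utf8_dump = hexlify(data)                  # hex dump of the UTF-8 bytes
escaped = text.encode("unicode_escape")    # literal-style escape form

print(utf8_dump)   # b'636166c3a9'
print(escaped)     # b'caf\\xe9'
```

The `c3a9` pair at the end of the hex dump is the two-byte UTF-8 encoding of the accented character, which is often enough to tell UTF-8 apart from a single-byte encoding such as Latin-1.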

Brian Warner [:warner :bwarner]

Reporter

Comment 3

15 years ago

Oh, also hexlify(u.encode("utf-32-be")) might be handy, especially if you then split it up on 8-character boundaries:

  s = hexlify(u.encode("utf-32-be"))
  print " ".join([s[i:i+8] for i in range(0, len(s), 8)])

Also, http://docs.python.org/library/codecs.html#standard-encodings has more details on those encodings.
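
A Python 3 rendering of the same trick, wrapped in a helper, could look like this. The name `dump_codepoints` and the sample string are made up for the example. Because UTF-32 uses exactly four bytes per code point, each 8-hex-digit group is one character.

```python
from binascii import hexlify


def dump_codepoints(text):
    """Return the code points of `text` as space-separated 8-digit hex groups."""
    # hexlify returns bytes in Python 3, so decode it to get a str to slice.
    s = hexlify(text.encode("utf-32-be")).decode("ascii")
    return " ".join(s[i:i + 8] for i in range(0, len(s), 8))


print(dump_codepoints("caf\u00e9"))  # 00000063 00000061 00000066 000000e9
```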

Brian Warner [:warner :bwarner]

Reporter

Comment 4

15 years ago

Comment on attachment 494201: Patch for bug612721

(still trying to figure out a good way to leave comments in the right place.. here or in github)

this branch looks pretty good! The two suggestions I'd make (apart from the specific comments on renderapi.py) would be to add a couple of smoke tests that cover renderapi.py (pass in a small MD file, get back the HTML, do a few 'self.assert_(blah in output)' lines), and to consider renaming renderapi.py to api_renderer.py.

In renderapi.py, the only thing I'd change is to use a triple-quoted string for the header/footer, instead of \n\ on each line.

Also, I'd recommend rebasing this against current master before landing. The history is pretty tangled right now, and that might make it harder to read it later. I see good arguments against rebasing too, though.

Oh, and I found the source of the XHR slowdown: in server.py, you should return a list of strings instead of a single string: "return [parsed]" instead of "return parsed".

With those changes, I think this can land.
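
One plausible reading of the "return [parsed]" fix, assuming server.py is a WSGI application (the thread does not say so explicitly): a WSGI server iterates over whatever the application returns and writes each item as a separate chunk. A bare Python 2 string iterates one character at a time, so a large response turns into a very large number of tiny writes, which would match a delay that grows with response size. The sketch below only simulates that loop; `count_chunks` and the sample body are hypothetical.

```python
def count_chunks(body):
    """Count the pieces a WSGI-style server loop would write for `body`."""
    return sum(1 for _ in body)


# Hypothetical rendered fragment standing in for the real output.
parsed = "<div class='module_api_docs'>" + "x" * 1000 + "</div>"

bare = count_chunks(parsed)       # one piece per character
wrapped = count_chunks([parsed])  # a single piece

print(bare, wrapped)
```

Wrapping the string in a list keeps the response as one item, so the server can send it in a single write.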

Attachment #494201 - Flags: review?(warner-bugzilla) → review+

Will Bamberg [:wbamberg]

Assignee

Comment 5

15 years ago

(In reply to comment #4)
> Oh, and I found the source of the XHR slowdown: in server.py, you should return
> a list of strings instead of a single string: "return [parsed]" instead of
> "return parsed".

Fantastic. Thanks for finding that Brian, I'd got as far as building a minimal test program, which failed to reproduce the problem, and was wondering what to try next.

Do you think it's worth using heading tags, rather than <div> tags, for things like 'api_header' and

Will Bamberg [:wbamberg]

Assignee

Comment 6

15 years ago

(In reply to comment #5)
> (In reply to comment #4)
> > Oh, and I found the source of the XHR slowdown: in server.py, you should return
> > a list of strings instead of a single string: "return [parsed]" instead of
> > "return parsed".
>
> Fantastic. Thanks for finding that Brian, I'd got as far as building a minimal
> test program, which failed to reproduce the problem, and was wondering what to
> try next.
>
> Do you think it's worth using heading tags, rather than <div> tags, for things
> like 'api_header' and

...'api_name'? i.e. <h2 class='api-header'>Constructors</h2> ? Then it would be easy for the page to use the same style as for other headings if it chose, or to customize it if it chose that. Is there any reason this is a bad idea?

Brian Warner [:warner :bwarner]

Reporter

Comment 7

15 years ago

Heading tags seem like a good idea to me. The more handles we can give to CSS stylists, the better!

Gervase Markham [:gerv]

Comment 8

15 years ago

License review: http://www.freewisdom.org/projects/python-markdown/License is fine. This code is not compiled, so all distribution is source distribution, so the notification requirements of the license are fulfilled just by giving the code out in the SDK, so no further action is required. Gerv

Will Bamberg [:wbamberg]

Assignee

Comment 9

15 years ago

Thanks Gerv. Fixed by: https://github.com/mozilla/addon-sdk/commit/81aa39656fb1b1eda117f1501f968da077b50493

Status: ASSIGNED → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Piotr Zalewa [:zalun]

Comment 10

15 years ago

Is this still the right bug for the cfx sdocs output? 1.0b2 was generating *.md.div.html; now it seems to generate *.md.div

Will Bamberg [:wbamberg]

Assignee

Comment 11

15 years ago

(In reply to comment #10)
> Is this still the right bug for the cfx sdocs output?
>
> The 1.0b2 was generating *.md.div.html now it seems to generate *.md.div

Sorry Piotr: yes, the fix for Bug 629922 renamed those files from *.div.html -> *.div.

Piotr Zalewa [:zalun]

Comment 12

14 years ago

It seems there is no documentation for the package itself (I see only README.md, but no html or anything). It would be good to have it in beta3.

Will Bamberg [:wbamberg]

Assignee

Comment 13

14 years ago

Yes, this patch only addresses the API reference docs: that is, those module docs which include that proprietary <api></api> syntax. And the purpose of the patch, among other things, is to hide that syntax from consumers like FlightDeck, so (1) the SDK can extend or even replace it without breaking FlightDeck and (2) FlightDeck doesn't have to use cfx tools at runtime.

The package doc really is just the README: there isn't any <api></api>-type syntax for package docs, so a straightforward Markdown->HTML conversion is all that's needed.

Piotr Zalewa [:zalun]

Comment 14

14 years ago

currently it's like here: http://flightdeck.zalewa.info/api/ it may wait for SDK beta 3 - it would be good to have it consistent ... and not use cuddlefish in FlightDeck

Myk Melez [:myk] [@mykmelez]

Comment 15

14 years ago

(In reply to comment #14)
> currently it's like here: http://flightdeck.zalewa.info/api/
> it may wait for SDK beta 3 - it would be good to have it consistent ... and not
> use cuddlefish in FlightDeck

Piotr: it's not clear what you're asking for here and whether there is a change that you think needs to make it into the SDK 1.0b3 release. Can you provide more details?

Also, it would be best to file a new bug on the issue, since this already-resolved bug is not the best place to discuss it.

Piotr Zalewa [:zalun]

Comment 16

14 years ago

NVM - I made it work in both b2 and b3 mode