Closed Bug 612721 Opened 14 years ago Closed 14 years ago

'cfx docs' should export HTML, and html fragment

Categories

(Add-on SDK Graveyard :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: warner, Assigned: wbamberg)

Attachments

(1 file)

wbamberg and I have been talking about how to make it easier for flightdeck
to get access to the documentation from various versions of the SDK. For
reference, the current (0.9/0.10) docs behavior is:

* the markdown source is stored in e.g. packages/addon-kit/docs/tabs.md
* the 'cfx docs' command starts a local webserver, which serves
  "/packages/addon-kit/docs/tabs.md" literally, and serves
  "/packages/addon-kit/docs/tabs.md.json" by parsing the markdown with
  apiparser.py into chunks of type "markdown" or "api-json"
** the 'cfx docs' frontend javascript fetches tabs.md.json and then uses a
   bunch of jQuery functions to construct DOM nodes that render the API docs
   in some useful way. The 'showdown' JS library is used to convert markdown
   into DOM nodes.
* the 'cfx sdocs' command creates a static tarball that includes tabs.md and
  tabs.md.json
* flightdeck scans/imports the sdocs tarball, including the tabs.md.json
  file. When users look at the docs page, it uses Django templates to
  transform the JSON into HTML (on the server). The server provides HTML, and
  the client never sees the markdown or the JSON.
* the SDK-to-Flightdeck "interface" is the syntax of the JSON

The two biggest problems with using the JSON as an interface are:

* we have two separate renderers (one in the 'cfx docs' frontend JS, the
  other in the flightdeck backend python), which must both be updated if we
  want to change the JSON, and they live in separate repositories, so version
  skew is a problem
* other potential consumers of jetpack docs are responsible for rendering
  HTML docs themselves (including the API sections), which is a drag

So the proposal is to have the SDK emit HTML in addition to the current raw
Markdown and parsed JSON API data. The HTML it emits should be usable by
frontends that want to incorporate the module docs into an existing page. So:

* The 'cfx sdocs' tarball will contain, in addition to the current "tabs.md"
  and "tabs.md.json" files, two new files: "tabs.html" and "tabs_module.html".
** tabs_module.html will start with '<div id="tabs_module_api_docs"
   class="module_api_docs">' (or similar). This is designed to be
   interpolated into an existing page.
** tabs.html will just be an <html> template/wrapper around the contents of
   tabs_module.html, so it can be viewed directly, through a static server.
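A minimal sketch of the two outputs, assuming the fragment is a single DIV whose id is derived from the module name as in the example above. The helper names `module_fragment` and `standalone_page` and the bare-bones page layout are hypothetical, not the SDK's code:

```python
from html import escape

def module_fragment(module_name, body_html):
    """Return the embeddable DIV (the tabs_module.html role)."""
    # id/class naming is an assumption taken from the proposal above.
    return (
        '<div id="%s_module_api_docs" class="module_api_docs">\n'
        '%s\n'
        '</div>\n' % (escape(module_name, quote=True), body_html)
    )

def standalone_page(module_name, fragment_html):
    """Wrap a fragment in a minimal <html> page (the tabs.html role)."""
    return (
        '<!DOCTYPE html>\n'
        '<html>\n'
        '<head><meta charset="utf-8"><title>%s</title></head>\n'
        '<body>\n%s</body>\n'
        '</html>\n' % (escape(module_name), fragment_html)
    )

fragment = module_fragment("tabs", "<p>The tabs module ...</p>")
page = standalone_page("tabs", fragment)
print(page)
```

Because the page embeds the fragment verbatim, a consumer can take either file, and only one renderer needs to know the JSON syntax.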

The details of the CSS names still need to be worked out. The idea is to make
it easy to view an unpacked sdocs tarball, or for something like flightdeck
to serve pre-existing HTML files, or to incorporate them into other pages,
without requiring a complex renderer that knows about the current JSON API
syntax.

One other detail: the API JSON is currently a list of two-tuples, each of
which starts with a type identifier (either "markdown" or "api-json"). We
plan to add a third type identifier named "version", whose value will be an
integer; it will only ever appear as the first element of the list.
Renderers should ignore this. We'll use this to capture the syntax of the
JSON contained in the rest of the list, so that external renderers will be
able to know how to interpret it.
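A sketch of a renderer that follows this rule, assuming each list element decodes to a two-element list such as `["markdown", "..."]`; the version value (1), the sample API entry, and the names `split_version` and `render_chunks` are illustrative only:

```python
import json

def split_version(chunks):
    """Return (version, other_chunks); version is None for unversioned lists."""
    if chunks and chunks[0][0] == "version":
        return chunks[0][1], chunks[1:]
    return None, chunks

def render_chunks(chunks, render_markdown, render_api):
    """Render a parsed .md.json list, ignoring any leading version element."""
    _, body = split_version(chunks)
    out = []
    for kind, value in body:
        if kind == "markdown":
            out.append(render_markdown(value))
        elif kind == "api-json":
            out.append(render_api(value))
        else:
            # Fail loudly on chunk types this renderer doesn't know about.
            raise ValueError("unknown chunk type: %r" % (kind,))
    return "".join(out)

doc = json.loads('[["version", 1], ["markdown", "Intro"],'
                 ' ["api-json", {"name": "open", "type": "function"}]]')
version, _ = split_version(doc)
html = render_chunks(doc,
                     render_markdown=lambda md: "<p>%s</p>" % md,
                     render_api=lambda api: "<h3>%s</h3>" % api["name"])
print(version, html)  # 1 <p>Intro</p><h3>open</h3>
```

An external renderer that does care about the syntax can branch on the returned version instead of discarding it.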
Assignee: nobody → wbamberg
Status: NEW → ASSIGNED
Here's a first look at these changes. It's not a pull request because I don't seriously expect you to pull it, yet.

Summary of what's done:

1) new Python module called (probably confusingly) renderapi.py, that takes
either a Markdown file or the output of apiparser.py, and emits the DIV
containing the rendered docs, or a standalone HTML page containing the DIV
embedded in <body /> and with a header. The standalone HTML assumes it's
living in the directory structure created by cfx sdocs and uses this to find
stylesheets and images. It uses the same stylesheets as the SDK docs, and has
some dodgy JavaScript to rewrite <img> links to find images.

2) changes in server.py to use this module in `cfx sdocs` to emit DIV and HTML
files for any Markdown files found under /packages.

3) changes in main.js, apidocs.css, and server.py to integrate this into `cfx
docs` as well, thus making renderapi.js redundant. apidocs.css is now based on
the new class and id attributes, and one nice side effect of this is that the
CSS is very much simpler. Changes in presentation are fairly subtle, but
improvements, I think.

4) change in apiparser.py to include a version and to fix a couple of
annoyances in the current JSON:

That some API elements like parameters don't have a "type" key, and some API
elements define "type" differently to others. So a top-level property has a
"type" key whose value is "property", and also a "property-type" whose value is
the underlying datatype, e.g. "string". But a property which is a member of
another object has no "property-type" key, but instead redefines "type" to be
the datatype. Ugh. So now, all objects have a "type" which is "function",
"method", "property" and so on, and objects which have an underlying datatype
also have a "datatype" key to capture that. 

5) A readme explaining the structure of the DIV (the interface of this tool)
which I've put under cuddlefish/docs. Getting this right is kind of key.

6) Added Python's markdown under python-lib.
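The type/datatype cleanup in item 4 can be sketched as a conversion from the two old property shapes to the new one. This is an illustration under assumptions (made-up element names, properties only), not the code in apiparser.py:

```python
def normalize_property(elem):
    """Convert an old-style property element to the new type/datatype shape.

    The caller must already know elem describes a property: a member's old
    "type" may itself be a datatype such as "function", so it can't be guessed.
    """
    new = dict(elem)  # don't mutate the caller's element
    if "property-type" in new:
        # Old top-level style: kind in "type", datatype in "property-type".
        new["datatype"] = new.pop("property-type")
    elif "type" in new and new["type"] != "property":
        # Old member style: "type" was overloaded to hold the datatype.
        new["datatype"] = new["type"]
    new["type"] = "property"
    return new

top_level = {"name": "example", "type": "property", "property-type": "string"}
member = {"name": "label", "type": "string"}
print(normalize_property(top_level))
# {'name': 'example', 'type': 'property', 'datatype': 'string'}
print(normalize_property(member))
# {'name': 'label', 'type': 'property', 'datatype': 'string'}
```

Both old shapes end up identical, which is the point of the change: a renderer only ever has to look at "type" for the kind and "datatype" for the value type.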

Problems and unresolved things:

1) cfx docs is now *really slow*. I've tried to understand why without success.
It's not because renderapi.py is very much slower: timing it in isolation it's
not very fast, but fast enough (though I'd gladly accept performance
improvement suggestions). There seems to be a looong delay between the server
finishing generating the page, and the jQuery AJAX function returning to
main.js. The length of the delay seems to be a function of the *size* of the
HTTP response (and not its complexity: for example, a large Markdown file
that contains no <api /> stuff is also slow). So the slowdown compared to the
old code is probably because an HTML DIV is much bigger than the corresponding
JSON. This slowdown does not happen in FF3.6, but only with the 4.0 beta 7 (I
didn't try any other beta versions).

If you have any insight into this or possible directions to explore I'd really
like to hear them...

2) Internal links: having thought about this a bit I think the right thing to
do is to make internal links be real HTML links, not Markdown links, and add a
class attribute to them, then have some JS in the page rewrite them. I haven't
done this here because more of the work is in the MD files, and it can be done
independently of all this. There is chat in the land of Markdown about adding
class attributes, but I think diverging from the standard would be a bad idea.

3) I generally had a hard time getting the markdown converter to behave; in
particular, I seem to have to UTF-8 encode the output, which seems a bit weird.
Dumb question, but what's a nice way in Python to dump the binary content of a
string, so I can figure out the encoding?
Attachment #494201 - Flags: review?(warner-bugzilla)
Attachment #494201 - Flags: feedback?
Attachment #494201 - Flags: feedback? → feedback?(dbuchner)
Blocks: 610030
Blocks: 617029
Sounds like a great approach. A few notes about the problems:

1) as Will and I discussed last week, I'm wondering if the slowdown is from
   FF4's XHR code trying to parse the HTML that it gets back. We talked about
   experimenting with the Content-Type in the response (make it text/plain to
   keep XHR from thinking it can be parsed), and running some tests to chart
   response size against round-trip time (in a new small test program, to
   rule out jetpack altogether) and see if there's a big correlation. Maybe
   there's a FF4 bug.

3) yeah, python2 is not very consistent about bytes versus unicode strings:
   it will quietly convert between the two when it seems to be necessary,
   which works fine until you venture beyond ASCII (python3 fixes this). The
   key is to always know what type a given variable is supposed to be, use
   isinstance(my_bytes, str) or isinstance(my_unicode_string, unicode) to
   check, and do explicit conversions:

     my_unicode_string = u"blah"
     my_bytes = my_unicode_string.encode("utf-8")
     back_to_unicode = my_bytes.decode("utf-8")

   python2 will, regrettably, allow my_unicode_string.decode(encoding), which
   probably acts like my_unicode_string.encode("ascii").decode(encoding),
   except that instead of "ascii" it may use your "default encoding".
   Likewise, it regrettably allows my_bytes.encode(encoding), which probably
   behaves like my_bytes.decode("ascii").encode(encoding). Both fail
   mysteriously when the things can't be expressed as ASCII. python3 rejects
   these calls.

   Anyways, to answer your question. If you have a bytestring, then you can
   get a basic hexdump with binascii.hexlify:

    from binascii import hexlify
    print hexlify("\x12\x34")   # 1234

   If you have a unicode string, you can either turn it into UTF-8 and then
   dump the bytes (assuming your brain can read utf-8, which mine can't):

    hexlify(u.encode("utf-8"))

   or turn it into unicode-string-literal form, like what you could paste
   back into python:

    u.encode("unicode_escape")

   (although note that unicode_escape tends to produce e.g. \xe9 instead of
   \u00e9, which may be a bit confusing)
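For contrast, a short Python 3 sketch (not SDK code; the SDK targets python2 here) showing that the explicit conversions still work while the implicit ones described above are gone:

```python
text = "caf\u00e9"             # str is the unicode type in Python 3
data = text.encode("utf-8")    # explicit unicode -> bytes
assert data == b"caf\xc3\xa9"
assert data.decode("utf-8") == text   # explicit bytes -> unicode

# The implicit round-trips python2 allowed are simply absent:
assert not hasattr(text, "decode")
assert not hasattr(data, "encode")

# Mixing the two types raises instead of silently converting via ASCII.
try:
    text + data
except TypeError as exc:
    print("TypeError:", exc)
```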
Oh, also hexlify(u.encode("utf-32-be")) might be handy, especially if you then split it up on 8-character boundaries:

 s = hexlify(u.encode("utf-32-be"))
 print " ".join([s[i:i+8] for i in range(0, len(s), 8)])

Also, http://docs.python.org/library/codecs.html#standard-encodings has more details on those encodings.
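The same tricks wrapped into two small helpers, written for Python 3, where `hexlify` returns bytes that need decoding before they can be joined; the helper names are hypothetical:

```python
from binascii import hexlify

def dump_utf8(u):
    """Hex of the UTF-8 bytes: two hex digits per byte."""
    return hexlify(u.encode("utf-8")).decode("ascii")

def dump_codepoints(u):
    """One 8-digit hex group per code point, via UTF-32-BE."""
    s = hexlify(u.encode("utf-32-be")).decode("ascii")
    return " ".join(s[i:i + 8] for i in range(0, len(s), 8))

word = "caf\u00e9"
print(dump_utf8(word))               # 636166c3a9
print(dump_codepoints(word))         # 00000063 00000061 00000066 000000e9
print(word.encode("unicode_escape")) # b'caf\\xe9'
```

The code-point view makes the non-ASCII character stand out immediately (000000e9), whereas in the UTF-8 dump it is split across two bytes (c3a9).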
Comment on attachment 494201
Patch for bug612721

(still trying to figure out a good way to leave comments in the right place: here or in GitHub)

This branch looks pretty good! The two suggestions I'd make (apart from the specific comments on renderapi.py) would be to add a couple of smoke tests that cover renderapi.py (pass in a small MD file, get back the HTML, do a few 'self.assert_(blah in output)' lines), and to consider renaming renderapi.py to api_renderer.py.

In renderapi.py, the only thing I'd change is to use a triple-quoted string for the header/footer, instead of \n\ on each line.
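What the triple-quote suggestion looks like, with a made-up header (the real header in renderapi.py isn't shown in this bug). Both spellings produce the same string; the `\` right after the opening quotes keeps a leading newline out of the triple-quoted version:

```python
# Made-up header template; %(title)s is filled in from a mapping.
HEADER_CONTINUED = "<html>\n\
<head>\n\
  <title>%(title)s</title>\n\
</head>\n\
<body>\n"

HEADER_TRIPLE = """\
<html>
<head>
  <title>%(title)s</title>
</head>
<body>
"""

assert HEADER_CONTINUED == HEADER_TRIPLE
print(HEADER_TRIPLE % {"title": "tabs"})
```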

Also, I'd recommend rebasing this against current master before landing. The history is pretty tangled right now, and that might make it harder to read it later. I see good arguments against rebasing too, though.

Oh, and I found the source of the XHR slowdown: in server.py, you should return a list of strings instead of a single string: "return [parsed]" instead of "return parsed".
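A likely explanation, assuming server.py serves the docs through a WSGI-style handler: the server iterates over whatever the handler returns and writes each item separately, and a python2 `str` iterates one character at a time, so a large HTML DIV turned into thousands of tiny writes. A hypothetical Python 3 handler showing the one-element-list form, driven directly without a real server:

```python
from wsgiref.util import setup_testing_defaults

def docs_app(environ, start_response):
    # Hypothetical stand-in for the docs handler in server.py.
    parsed = "<div class='module_api_docs'>...</div>\n" * 1000  # large payload
    body = parsed.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # The server writes each item of the returned iterable separately, so a
    # one-element list means one write. Returning a python2 str directly would
    # have been iterated (and written) one character at a time.
    return [body]

environ = {}
setup_testing_defaults(environ)
captured = {}

def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers

chunks = list(docs_app(environ, start_response))
print(len(chunks))  # 1
```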

With those changes, I think this can land.
Attachment #494201 - Flags: review?(warner-bugzilla) → review+
(In reply to comment #4)
> 
> Oh, and I found the source of the XHR slowdown: in server.py, you should return
> a list of strings instead of a single string: "return [parsed]" instead of
> "return parsed".

Fantastic. Thanks for finding that Brian, I'd got as far as building a minimal test program, which failed to reproduce the problem, and was wondering what to try next.

Do you think it's worth using heading tags, rather than <div> tags, for things like 'api_header' and
(In reply to comment #5)
> Do you think it's worth using heading tags, rather than <div> tags, for things
> like 'api_header' and

...'api_name'? 

i.e. <h2 class='api-header'>Constructors</h2> ?

Then it would be easy for the page to use the same style as for other headings if it chose, or to customize it if it chose that. Is there any reason this is a bad idea?
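A sketch of the heading version, assuming the renderer builds markup as strings; `api_heading` is a hypothetical helper and the class name is taken from the example above:

```python
from html import escape

def api_heading(text, css_class, level=2):
    """Return an <hN class="..."> heading with escaped text."""
    if not 1 <= level <= 6:
        raise ValueError("HTML only has h1 to h6")
    return '<h%d class="%s">%s</h%d>' % (
        level, escape(css_class, quote=True), escape(text), level)

print(api_heading("Constructors", "api-header"))
# <h2 class="api-header">Constructors</h2>
```

A consuming page can then style `h2.api-header` specifically, or let it inherit its ordinary `h2` rules.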
Heading tags seem like a good idea to me. The more handles we can give to CSS
stylists, the better!
License review: http://www.freewisdom.org/projects/python-markdown/License is fine. This code is not compiled, so all distribution is source distribution, so the notification requirements of the license are fulfilled just by giving the code out in the SDK, so no further action is required.

Gerv
Thanks Gerv. 
Fixed by: https://github.com/mozilla/addon-sdk/commit/81aa39656fb1b1eda117f1501f968da077b50493
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Is this still the right bug for the cfx sdocs output?

1.0b2 was generating *.md.div.html; now it seems to generate *.md.div.
(In reply to comment #10)
> Is this still the right bug for the cfx sdocs output?
> 
> The 1.0b2 was generating *.md.div.html now it seems to generate *.md.div

Sorry Piotr: yes, the fix for Bug 629922 renamed those files from *.div.html -> *.div.
It seems there is no documentation for the package itself (I see only README.md, but no HTML or anything).
It would be good to have it in beta3.
Yes, this patch only addresses the API reference docs: that is, those module docs which include that proprietary <api></api> syntax. And the purpose of the patch, among other things, is to hide that syntax from consumers like FlightDeck, so (1) the SDK can extend or even replace it without breaking FlightDeck and (2) FlightDeck doesn't have to use cfx tools at runtime.

The package doc really is just the README: there isn't any <api></api>-type syntax for package docs, so a straightforward Markdown->HTML conversion is all that's needed.
Currently it's like this: http://flightdeck.zalewa.info/api/
It may wait for SDK beta 3; it would be good to have it consistent ... and not use cuddlefish in FlightDeck.
(In reply to comment #14)
> currently it's like here: http://flightdeck.zalewa.info/api/
> it may wait for SDK beta 3 - it would be good to have it consistent ... and not
> use cuddlefish in FlightDeck

Piotr: it's not clear what you're asking for here and whether there is a change that you think needs to make it into the SDK 1.0b3 release.  Can you provide more details?  Also, it would be best to file a new bug on the issue, since this already-resolved bug is not the best place to discuss it.
NVM: I made it work in both b2 and b3 mode.