Closed Bug 665750 Opened 13 years ago Closed 4 years ago

Export a subset of pages for offline reading

(Graveyard :: Wiki pages, enhancement, P3)
(Not tracked)
(Reporter: openjck, Unassigned)
(Whiteboard: [user-interview][th][triaged][type:feature])

Export a subset of pages (user-selected or structure-based) for offline reading (e.g., PDF or zipped HTML). Might be able to use code from Objavi2 (
Version: unspecified → Kuma
Assignee: lcrouch → nobody
Target Milestone: 1.0 alpha → ---
Priority: -- → P2
This is a pretty popular request on UserVoice. Only two individual requests were made, but these requests have some of the highest vote totals out of everything submitted.

Please see UserVoice for specifics on this request:
Summary: Kuma: Export a subset of pages for offline reading → Export a subset of pages for offline reading
Whiteboard: [u: user] [c: wiki] → u=user c=wiki p=
Closed: 13 years ago
Resolution: --- → DUPLICATE
reopening.  this bug is actually quite different than the other one.

in this bug, we want to make it possible for a user to select a document or a set of documents (perhaps all the docs that show up in their search results or filtered view) and export them to .pdf or .html for viewing offline.

the other bug was more about getting a raw dump of docs data so someone could run a mirror of the wiki pages somewhere else (online).
Resolution: DUPLICATE → ---
Good catch. There's some overlap: 

* "I often want to work offline (notebook, in a cafe)"
* "I'd like a dump of MDC (of certain sections or all) to store on my computer, so that I have the documentation locally"
* etc.

If nothing else, it might be helpful to keep this person's thoughts in mind.
Sheppy talked about this in his user interview.

Sheppy explained that users should be able to take an entire subsection of the site and dump it out as a PDF. He pointed to "The JavaScript Reference" as being one example of a subsection.

He explained that users would be interested in this because they could have the content locally (for example on their laptops and iPads), which might be useful if they are flying.

Sheppy: Two additional questions on this.

1. Can you provide a few more examples of subsections?
2. Can you provide a few more use cases for this? Other than flying, when might MDN readers want this feature?
Whiteboard: u=user c=wiki p= → [user-interview] u=user c=wiki p=
1) when they need it all the time and it's too slow to load it from the server every time. A page load of 1 s adds up if you do that a lot.
2) when the server is down
3) when their internet connection doesn't work
4) when they are on the beach or sitting in the park
   (and don't have 3G flatrate)
and approximately 245 other situations.
Additional comment on UserVoice:

Claus Reinke

How about adding HTML offline cache manifests to the top-level pages (Javascript, DOM, HTML, ..)?

on Oct 10, 2011
Cache manifests are a good way to do it, and it looks like MindTouch has a service for it:
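For illustration, here is a minimal Python sketch of what the cache-manifest idea would involve. It emits the text of an HTML application cache manifest (a file that starts with `CACHE MANIFEST`, followed by `CACHE:` and `NETWORK:` sections) for a list of page URLs. The function name, the version comment, and any example paths are hypothetical. Nothing here is provided by MindTouch or Kuma. Note that Application Cache has since been deprecated and removed from Chrome and Firefox in favor of Service Workers, so this shows the idea rather than a recommended approach today.

```python
from typing import Iterable


def build_cache_manifest(page_urls: Iterable[str], version: str) -> str:
    """Build the text of an HTML application cache manifest (hypothetical helper)."""
    # The first line must be exactly "CACHE MANIFEST"; changing the version
    # comment changes the file, which makes browsers re-download the entries.
    lines = ["CACHE MANIFEST", f"# version {version}", "", "CACHE:"]
    seen = set()
    for url in page_urls:
        # List each URL once, preserving the caller's order.
        if url not in seen:
            seen.add(url)
            lines.append(url)
    # Anything not listed above may still be fetched from the network when online.
    lines += ["", "NETWORK:", "*"]
    return "\n".join(lines) + "\n"
```

A top-level page would reference the result via `<html manifest="...">`, and the server would send it with the `text/cache-manifest` MIME type.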
Priority: P2 → P3
Whoops, Janet's URL requires login.

This one lets you read the article without logging in:

(Mind that "admin" in the URL! ;))
Blocks: 756266
Version: Kuma → unspecified
Component: Website → Landing pages
This feature is still some way down the priority list, but I wanted to capture this idea for if/when we do take it on.

In bug 809514, Sheppy mentioned that it would be nice to generate an offline collection of all pages with a certain tag.
No longer blocks: 756266
Ideally we would offer a few ways to assemble this content:

(1) "This page and all of its subpages"
(2) "All pages that match this set of tags"

And ideally we would offer a few kinds of output:

(1) HTML for reading in-browser
(2) PDF
(3) ePub

I've listed those in priority order, I believe. I personally would love ePub since that'd work best in my e-reader of choice, but I know that it's the least broadly usable. If we can find a system that supports exporting to a variety of formats, that'd be fantastic.
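The two ways of assembling content listed above can be sketched in a few lines of Python. This minimal sketch assumes a hypothetical in-memory `Page` record holding a slug (e.g. `Web/JavaScript/Reference`) and a set of tags. It is not Kuma's actual data model.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Page:
    """Hypothetical page record: a wiki slug plus its tags."""
    slug: str
    tags: FrozenSet[str] = frozenset()


def subtree(pages: Iterable[Page], root_slug: str) -> List[Page]:
    """Option (1): this page and all of its subpages."""
    root = root_slug.rstrip("/")
    prefix = root + "/"
    # Match on a "/" boundary so "Web/JavaScriptX" is not treated as under "Web/JavaScript".
    return [p for p in pages if p.slug == root or p.slug.startswith(prefix)]


def tagged(pages: Iterable[Page], required_tags: Iterable[str]) -> List[Page]:
    """Option (2): all pages that carry every tag in required_tags."""
    required = frozenset(required_tags)
    return [p for p in pages if required <= p.tags]
```

Option (2) is read here as "carries every tag in the set". Reading it as "carries any of the tags" would use `required & p.tags` instead. Either selection could then be handed to whichever output format (HTML, PDF, ePub) is chosen.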
The SUMO team has an intern working on this for the Firefox support docs. We should see if we can leverage their work.
Whiteboard: [user-interview] u=user c=wiki p= → [user-interview]
Depends on: 883967
How long has the little, hard-to-see link about reading docs offline been there on MDN? I just noticed it for the first time. :)
Was just pushed today -- it isn't meant to get overwhelming amounts of attention, but it is in a prominent position.
After looking through the data (see 1), we have decided not to build this feature into MDN. Other options (like DocHub and Dash) exist for those still interested in this -- please contact me if you need any assistance setting those up.

Closed: 13 years ago → 11 years ago
Resolution: --- → WONTFIX
I looked at DocHub and Dash, and both seem to provide only certain subsets of the APIs; is that correct? Furthermore, I need to install local software, correct? Does Dash even exist for Linux? It looks Mac-only. DocHub seems to provide only the CSS and HTML APIs, not any of the rest of MDC. I'm not interested in any third-party services.

Sorry, but my needs are not met. I need a full copy of (one language version of) the site, with the same information and no parts cut out, in a form I can use offline without installing extra software, using only the browser. Essentially, I need a tarball of HTML files that are browsable and self-contained.
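The request boils down to a tarball of browsable, self-contained HTML files. Below is a minimal Python sketch of just the packaging step, using the standard library's `tarfile` module. It assumes the pages have already been fetched and their links rewritten to relative paths, so they work when opened straight from disk. That rewriting step and the `pages` mapping are assumptions that this sketch does not cover, and `build_offline_tarball` is a hypothetical name.

```python
import io
import tarfile
import time
from typing import Dict


def build_offline_tarball(pages: Dict[str, str]) -> bytes:
    """Pack {relative_path: html} into a gzip-compressed tarball, returned as bytes.

    Assumes each page's links already point to relative paths inside the archive.
    """
    buf = io.BytesIO()
    now = int(time.time())
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for rel_path, html in sorted(pages.items()):
            data = html.encode("utf-8")
            info = tarfile.TarInfo(name=rel_path)
            info.size = len(data)  # tarfile needs the size before reading the data
            info.mtime = now
            tar.addfile(info, io.BytesIO(data))
    # Closing the TarFile flushes the gzip trailer into buf without closing buf.
    return buf.getvalue()
```

Writing the result to disk (for example with `open("mdn-en-US.tar.gz", "wb")`, a hypothetical filename) gives an archive that extracts into plain files readable with nothing but a browser.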

From bug 883967 comment 10:
> * 0 new comments on bug 665750
> this feature is not in high demand. Safe to ignore bug 665750

I find that quite offensive. We are not supposed to make noise on Bugzilla, just wait. I've been patiently *waiting* for years. And now that's counted against us.

I've put quite some work into this in bug 561470 comment 31 (which is essentially this bug) and manually set up a mirror at , which took a lot of time to set up but gets more and more outdated due to the lack of a feed from you.

I need MDN for my work. I want response times in the millisecond range, but I get 2-3 s per page load. Sometimes it's down. Sometimes I don't have Internet access. MDC is crucial.

Both this bug here and bug 561470 are important for classical open source values: You cannot be an open source project, ask the community to contribute to the documentation, and then keep all these docs for yourself, usable only on your website. The information must be free and copyable for everybody, both in source code form (bug 561470) and in resulting form (this bug). This is highly important for very basic open source reasons and preservation of information.
Resolution: WONTFIX → ---
(In reply to Ben Bucksch (:BenB) from comment #19)

I couldn't agree more with the previous comment. I would just add that *there are* places on the planet with very bad or expensive internet connections; they're usually both bad and expensive, btw.
I voted for this bug when I was working at a research station in Antarctica. I set up local mirrors for all the documentation and software we were using there, all but one! PHP, Python, Django, etc.: no problem whatsoever. MDN, despite being hugely valuable, was a nightmare to mirror, and I gave up.
And think about all the developing countries! And people travelling, etc.

Ignoring this bug is really a disgrace.
We know and appreciate how much work it takes to export MDN content. As mentioned in, Mozilla WebOps provides a weekly snapshot of MDN at via the mechanism implemented in bug 757461. Thank you for helping us with it!

WONTFIX'ing this bug is inappropriate, but the reality is that 99.9% of MDN users aren't interested enough in the feature to click a link, much less to participate in defining, discussing, or prioritizing the feature. Without that collaboration, we can't devote company resources to it yet.
> I've put quite some work into this bug 561470 comment 31 (which is
> essentially this bug), manually set up a mirror at ,
> which cost a lot of time to set up, but gets more and more outdated due to
> lack of a feed from you.

We appreciate the work you are doing with, and this decision is in no way meant to show anything else. We have the Kuma API for third-party projects that want to programmatically re-use our documentation. We can make improvements to the API if warranted.

> Both this bug here and bug 561470 are important for classical open source
> values: You cannot be an open source project, ask the community to
> contribute to the documentation, and then keep all these docs for yourself,
> usable only on your website. The information must be free and copyable for
> everybody, both in source code form (bug 561470) and in resulting form (this
> bug). This is highly important for very basic open source reasons and
> preservation of information.

I understand your interest in offline copies of our content, but this is not a valid criticism. Our content is licensed CC-BY-SA and we have the Kuma API as a convenience for those interested in reusing it. I understand that the API may not be ideal for your needs, but I do not agree that this puts us at odds with free software values.

We all have the same goal in mind: offer a wonderful, helpful reference for web developers. Like any team, we have limited resources and need to carefully choose efforts that maximize this goal. While I understand that you would find this feature to be valuable, our research reveals that relatively few users feel the same way, and that our effort may be better spent elsewhere.

I will leave this open for now, but will ask Ali (our product manager) to make the final call.
> Mozilla WebOps provides a weekly snapshot of MDN at

I'm linking to this at , which is where the license link in the footer points.
Good idea. I've re-enabled the offline content notice, but it sounds like Dash and DocHub may not serve everyone's needs. We'll leave this bug open for when we have time and/or collaboration to create a better alternative.
Dash (maybe) covers the needs for OS X users. For Windows and Linux users, I think Zeal ( is a suitable alternative. Zeal and Dash use the same doc format and Zeal users have access to all of the docs available in Dash, which includes MDN. Unfortunately, Zeal does not yet have a download manager, so users have to manually download the docs they want from

The DocHub project is a bit abandoned as far as I can tell, or at least the scrapers they use to fetch the docs from MDN don't work anymore (as per
That's interesting. We should add Zeal to our offline content notice.
In other news, I've received 76 unique visitors to and 102 pageviews in less than 24 hours, so I think it's safe to assume that the tracking results at are wrong.
Bogdan, thanks for this nice list on Kapeli!
phew... I just noticed: Within a good year, I got 2GB of access/error.log on
Seems there's demand for mirrors.
Yo, can someone post up an offline version as pdf
(In reply to vlenceleuth from comment #30)
> Yo, can someone post up an offline version as pdf

Quoting Luke from comment 21.

> Mozilla WebOps provides a weekly snapshot of MDN at
> via the
> mechanism implemented in bug 757461. Thank you for helping us with it!
MDN Web technology references like:

are NOT listed in the Dash and Zeal apps.
We need those too; in fact, a docset with all of MDN (reference and developer documentation) would be nice.
DevDocs has been open sourced and allows for offline reading of most of MDN's content:
thibaut: very nice. Does devdocs scrape WebGL and/or WebAPI docs as Daniel suggests? Can it?
Luke: No. At the moment DevDocs only scrapes API/reference pages in /Web/HTML, /Web/CSS, /Web/JavaScript, /Web/Reference/Events, and /Web/API.

It can certainly scrape more stuff but for now I've decided to skip the guides (difficult to index) and Firefox OS pages (still experimental and I'd rather separate them into their own doc set).

I'm going to write documentation in the coming days on how to extend/contribute to DevDocs.
MDN is definitely becoming "overstretched" in scope, especially now that Firefox OS is gaining more attention, as are its doc pages. Speaking of which, I think the next step should DEFINITELY be an app with some documentation on the Marketplace, even if it doesn't reference MDN synchronously at all times; references could be pulled during off-peak hours, once a working, viable method presents itself.
There are too many APIs; I don't know when I'll finish reading them all.
I need to talk to someone directly on the phone. Our numbers are: (909) 629-2820, (909) 343-9591, or (909) 343-9434. Our computer systems and cellphones were first attacked in early August 2013, and it has been a constant battle until a week ago. When they first hit, I battled them hard, and it cost me two desktop systems and three laptops. I learned that I could not beat them head on; no one can beat an entity that has the latest cutting-edge equipment, software, and teams of people to work at you 24 hours a day. So our home became a place of extreme duress, for we were under constant monitoring and surveillance in our home. We were tracked via our phones whenever we went out. They toyed with us online and manipulated our computers to the point that we were unable to complete a single letter or document. Up until about 5 days ago, our computers, phones, and any devices that can connect in any possible way to the internet had been under the control of a major player in the software industry. I know that sounds crazy, but we have absolute proof, and that was what finally made them leave.

When they were still in control, I began working on the Aurora Beta Browser in a desperate attempt to create a safe haven with enough tools to fight the changes that were made to all our communication devices. (Said "entities" seem to have great interest in this browser, and have been trying to take it apart on a daily basis.) As crazy as it sounds, we have proof of all that has transpired. We have held our ground; however, they have left our systems a mess, and left behind "something" that is still functioning. I was fired from my job a month ago when they followed me to work and trashed my boss's computer systems. I have applied for work, only to find our emails misdirected and phones that don't ring when called; we have been threatened and intimidated for leaving the apartment without our phones. Our HP WiFi printer was turned into a communication hub. My wife and I have had enough! So if anyone can help, we need it.
Flags: needinfo?(nobody)
Whiteboard: [user-interview] → [user-interview][th]

This is not the right place to discuss your problem. Please share your feedback below instead. If you continue to update this page your account will be banned.
Flags: needinfo?(nobody)
Whiteboard: [user-interview][th] → [user-interview][th][triaged][type:feature]
Severity: normal → enhancement
(In reply to Luke Crouch [:groovecoder] from bug 883967 comment #10)
> * 0 new comments on bug 665750

this is really bad reasoning. it encourages making "me too" comments in bugzilla, which i'm pretty sure we don't want.

anyway, we've had this issue with the Addon SDK documentation ever since we moved it from our GitHub repo to MDN (see bug 1002307).

the SDK docs are pretty self-contained, and don't depend too much on other parts of the MDN -- mainstream usage of our high-level modules doesn't require any mozilla-specific knowledge like XUL or XPConnect.

so asking addon devs to download a whole 3 GB mirror of the MDN site doesn't seem reasonable to me.
The reasoning is that we specifically asked visitors to comment on the bug, and no-one did.

But, the experiment was done before Addon SDK docs were moved to MDN, so we could get a different result - i.e., more interest in the feature - now.

Still, this is a large feature request. And there are some alternatives:

It may be easier to ask these developers to add the Addon SDK zone to those tools.
We do still want this capability -- the ability to have MDN content for reading as an e-book, for example, has been a goal for years.
Frankly, comment 38 reads like classic spam posts do.
...Back on topic: Eric, it seems that the e-book option is one of the crucial goals that keep bug 665750 from being a duplicate of bug 757461.

To be honest, I was tempted to call this bug a duplicate too, since what we did in 757461 *was* effectively exporting a "subset of pages" (i.e., as a tarball).
However, keeping the two bugs separate suggests that their actual goals do in fact differ.
Andreas, bug 757461 wasn't a subset, it was the whole thing. A "subset" is a subset that the user defines, not one for all.
Ahh I see! Well, Ben, I think that's a matter of definition. :)

To me it's already a "subset" once you subtract all the bulky video junk and maybe some PHP-generated pages.

But now I see what you mean: 665750 may be aiming for single chapters of the docs, e.g., a standalone XUL set, or a standalone set about XPath (for the XUL pro who wants to learn about new ways of addressing/traversing DOM node trees), etc.
any news about this?
Nothing new. It's a massive project and we haven't prioritized it.
Excuse me if I'm a little off topic, but I just found this bug and it reminded me of a project I quickly hacked together some months ago. CORS is available on MDN, so I use a client-side spider to browse the website, with some filtering on URLs, and save its content locally in local storage. It's not perfect, but I use it daily and it fits my needs.
Here's the code for this project:
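The commenter's tool runs in the browser and stores pages in local storage. To illustrate the same crawl-and-filter idea, here is a minimal Python sketch: a breadth-first spider that follows `<a href>` links and keeps only URLs under an allowed prefix, similar in spirit to DevDocs limiting itself to paths such as `/Web/HTML` and `/Web/CSS`. It stores HTML in a dict that stands in for local storage. The `fetch` callable is a hypothetical injection point (it could wrap `urllib.request.urlopen`). `crawl` and `max_pages` are illustrative names and are not taken from the commenter's code.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin


class _LinkCollector(HTMLParser):
    """Collects the href values of <a> tags in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch: Callable[[str], str],
          allowed_prefix: str, max_pages: int = 500) -> Dict[str, str]:
    """Breadth-first crawl that only follows URLs starting with allowed_prefix."""
    store: Dict[str, str] = {}  # stands in for the browser's local storage
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(store) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        store[url] = html
        parser = _LinkCollector()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts before filtering.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(allowed_prefix) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return store
```

A crawler pointed at a live site should also throttle its requests and respect robots.txt. Both are left out here to keep the sketch short.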
There is ongoing work stemming from this weekend's HackOnMDN to bring offline caching functionality to MDN via Service Workers. More on this can be found at - all thoughts and ideas welcome, please chime in!
Thanks for the update and the work, :flaki. If you like you can file more bugs under this one for the on-going work.
MDN Web Docs' bug reporting has now moved to GitHub. From now on, please file content bugs at and platform bugs at
Closed: 11 years ago → 4 years ago
Resolution: --- → WONTFIX
Product: → Graveyard