Bug 665750 - Export a subset of pages for offline reading
Product: Mozilla Developer Network
Classification: Other
Component: Wiki pages
Version: unspecified
Hardware: All All
Importance: P3 enhancement with 20 votes
Target Milestone: ---
Assigned To: Nobody; OK to take it and work on it
Duplicates: 459383 643866 1002307 1071013 1184881
Depends on: 883967
Reported: 2011-06-20 14:58 PDT by John Karahalis [:openjck]
Modified: 2015-07-17 08:43 PDT
CC List: 39 users
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Description John Karahalis [:openjck] 2011-06-20 14:58:52 PDT
Export a subset of pages (user-selected or structure-based) for offline reading (e.g., PDF or zipped HTML). Might be able to use code from Objavi2.
Comment 1 John Karahalis [:openjck] 2011-07-20 12:06:17 PDT
This is a pretty popular request on UserVoice. Only two individual requests were made, but these requests have some of the highest vote totals out of everything submitted.

Please see UserVoice for specifics on this request:
Comment 2 John Karahalis [:openjck] 2011-07-26 11:59:55 PDT

*** This bug has been marked as a duplicate of bug 561470 ***
Comment 3 Jay Patel [:jay] 2011-07-26 12:03:19 PDT
Reopening. This bug is actually quite different from the other one.

In this bug, we want to make it possible for a user to select a document or a set of documents (perhaps all the docs that show up in their search results or filtered view) and export them to PDF or HTML for offline viewing.

The other bug was more about getting a raw dump of the docs data so that someone could run a mirror of the wiki pages elsewhere (online).
Comment 4 John Karahalis [:openjck] 2011-07-26 12:12:32 PDT
Good catch. There's some overlap: 

* "I often want to work offline (notebook, in a cafe)"
* "I'd like a dump of MDC (of certain sections or all) to store on my computer, so that I have the documentation locally"
* etc.

If nothing else, it might be helpful to keep this person's thoughts in mind.
Comment 5 John Karahalis [:openjck] 2011-08-08 18:05:21 PDT
Sheppy talked about this in his user interview.

Sheppy explained that users should be able to take an entire subsection of the site and dump it out as a PDF. He pointed to "The JavaScript Reference" as being one example of a subsection.

He explained that users would be interested in this because they could have the content locally (for example on their laptops and iPads), which might be useful if they are flying.

Sheppy: Two additional questions on this.

1. Can you provide a few more examples of subsections?
2. Can you provide a few more use cases for this? Other than flying, when might MDN readers want this feature?
Comment 6 Ben Bucksch (:BenB) 2011-08-09 00:53:51 PDT
1) when they need it all the time and it's too slow to load it from the server every time. A 1 s page load adds up if you do that a lot.
2) when the server is down
3) when their internet connection doesn't work
4) when they are on the beach or sitting in the park
   (and don't have a flat-rate 3G plan)
and approximately 245 other situations.
Comment 7 Janet Swisher 2011-10-11 12:23:02 PDT
Additional comment on UserVoice:

Claus Reinke

How about adding HTML offline cache manifests to the top-level pages (Javascript, DOM, HTML, ..)?

on Oct 10, 2011
Comment 8 Luke Crouch [:groovecoder] 2011-10-11 12:33:33 PDT
Cache manifests are a good way to do it, and it looks like MindTouch has a service for it:
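Claus's suggestion and Luke's reply concern HTML offline cache manifests (the Application Cache). As a rough sketch, assuming the pages of one top-level section are listed explicitly, such a manifest could be generated as below. The helper name `build_cache_manifest` and the example paths are hypothetical, not anything MDN or MindTouch shipped. Note that the Application Cache has since been deprecated and removed from major browsers in favor of Service Workers.

```python
def build_cache_manifest(paths, version):
    """Return the text of an HTML5 Application Cache manifest (hypothetical helper).

    paths   -- URLs of the pages/assets to store for offline reading
    version -- written as a comment; editing it changes the file's bytes,
               which is what prompts browsers to refresh the cache
    """
    # The first line of a cache manifest must be exactly "CACHE MANIFEST".
    lines = ["CACHE MANIFEST", f"# version {version}", "", "CACHE:"]
    seen = set()
    for path in paths:
        if path not in seen:  # list each URL once, keeping the caller's order
            seen.add(path)
            lines.append(path)
    # Anything not listed above may still be fetched from the network.
    lines += ["", "NETWORK:", "*"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_cache_manifest(
        ["/en-US/docs/Web/JavaScript", "/en-US/docs/Web/JavaScript/Reference"],
        "2011-10-11",
    ))
```

A page opted in with `<html manifest="...">`, and the server had to send the manifest as `text/cache-manifest`. Any byte change to the file, such as bumping the version comment, made the browser re-download the listed resources.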
Comment 9 Janet Swisher 2012-03-09 07:59:31 PST
*** Bug 643866 has been marked as a duplicate of this bug. ***
Comment 10 Janet Swisher 2012-03-09 07:59:58 PST
*** Bug 459383 has been marked as a duplicate of this bug. ***
Comment 11 Andreas Eibach 2012-06-09 09:32:59 PDT
Whoops, Janet's URL requires login.

This one lets you read the article without logging in:

(Mind that "admin" in the URL! ;))
Comment 12 John Karahalis [:openjck] 2012-11-12 13:54:09 PST
This feature is still a little ways down from top priority, but I wanted to capture this idea for if/when we do take it on.

In bug 809514, Sheppy mentioned that it would be nice to generate an offline collection of all pages with a certain tag.
Comment 13 Eric Shepherd [:sheppy] 2013-06-11 07:58:07 PDT
Ideally we would offer a few ways to assemble this content:

(1) "This page and all of its subpages"
(2) "All pages that match this set of tags"

And ideally we would offer a few kinds of output:

(1) HTML for reading in-browser
(2) PDF
(3) ePub

I've listed those in priority order, I believe. I personally would love ePub since that'd work best in my e-reader of choice, but I know that it's the least broadly usable. If we can find a system that supports exporting to a variety of formats, that'd be fantastic.
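The two ways of assembling content listed above could be expressed as simple filters over page records. This is a minimal sketch assuming each page is known by a slash-separated slug and a set of tags; the `Page` type and both function names are hypothetical, and whether a tag query should require all of the tags or any of them is an open choice, exposed here as `match_all`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Page:
    """Hypothetical page record: a slash-separated slug plus its tags."""
    slug: str  # e.g. "Web/JavaScript/Reference"
    tags: frozenset = field(default_factory=frozenset)


def page_and_subpages(pages, root_slug):
    """Mode 1: the page at root_slug plus every page nested beneath it."""
    root = root_slug.rstrip("/")
    prefix = root + "/"  # the slash keeps "Web/JavaScripting" out of "Web/JavaScript"
    return [p for p in pages if p.slug == root or p.slug.startswith(prefix)]


def pages_with_tags(pages, tags, match_all=True):
    """Mode 2: pages carrying all of the given tags (or any of them if match_all is False)."""
    wanted = set(tags)
    if match_all:
        return [p for p in pages if wanted <= p.tags]
    return [p for p in pages if wanted & p.tags]
```

Either list could then be handed to whichever exporter (HTML, PDF, or ePub) is chosen; keeping the selection step separate from the output format means each new format only needs a renderer.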
Comment 14 Luke Crouch [:groovecoder] 2013-06-11 08:02:26 PDT
The SUMO team has an intern working on this for the Firefox support docs. We should see if we can leverage their work.
Comment 15 Eric Shepherd [:sheppy] 2013-07-10 12:50:52 PDT
How long has the little, hard-to-see link about reading docs offline been on MDN? I just noticed it for the first time. :)
Comment 16 David Walsh :davidwalsh 2013-07-10 12:51:36 PDT
Was just pushed today -- it isn't meant to get overwhelming amounts of attention, but it is in a prominent position.
Comment 17 Luke Crouch [:groovecoder] 2013-07-26 16:28:01 PDT
Per this is a low-priority bug.
Comment 18 John Karahalis [:openjck] 2013-08-09 17:43:00 PDT
After looking through the data (see 1), we have decided not to build this feature into MDN. Other options (like DocHub and Dash) exist for those still interested in this; please contact me if you need any assistance setting them up.

Comment 19 Ben Bucksch (:BenB) 2013-08-09 22:12:57 PDT
I looked at DocHub and Dash, and both seem to provide only certain API subsets; is that correct? Furthermore, I need to install local software, correct? Does Dash even exist for Linux? It looks Mac-only. DocHub seems to provide only the CSS and HTML APIs, but none of the rest of MDC. I'm not interested in any third-party services.

Sorry, but my needs are not met. I need a full copy of (one language version of) the site, with the same information and no parts cut out, in a form that I can use offline without installing extra software, using only the browser. Essentially, I need a tarball of HTML files that are browsable and self-contained.

From bug 883967 comment 10:
> * 0 new comments on bug 665750
> this feature is not in high demand. Safe to ignore bug 665750

I find that quite offensive. We are not supposed to make noise on Bugzilla, just to wait. I've been patiently *waiting* for years. And now that's counted against us.

I've put quite some work into this in bug 561470 comment 31 (which is essentially this bug) and manually set up a mirror at , which took a lot of time to set up but gets more and more outdated due to the lack of a feed from you.

I need MDN for my work. I want response times in the millisecond range, but I get 2-3 s per page load. Sometimes it's down. Sometimes I don't have Internet. MDC is crucial.

Both this bug here and bug 561470 are important for classical open source values: You cannot be an open source project, ask the community to contribute to the documentation, and then keep all these docs for yourself, usable only on your website. The information must be free and copyable for everybody, both in source code form (bug 561470) and in resulting form (this bug). This is highly important for very basic open source reasons and preservation of information.
Comment 20 Francois Guerraz 2013-08-11 10:14:37 PDT
(In reply to Ben Bucksch (:BenB) from comment #19)

I couldn't agree more with the previous comment. I would just add that *there are* places on the planet with very bad or expensive internet connections; they're usually both bad and expensive, btw.
I voted for this bug when I was working in a research station in Antarctica. I set up local mirrors for all the documentation and software we were using there; all but one! PHP, python, django, etc. no problem whatsoever. MDN, despite being hugely valuable, was a nightmare to mirror and I gave up.
And think about all the developing countries! And people travelling, etc.

Ignoring this bug is really a disgrace.
Comment 21 Luke Crouch [:groovecoder] 2013-08-13 14:30:22 PDT
We know and appreciate how much work it takes to export MDN content. As mentioned in, Mozilla WebOps provides a weekly snapshot of MDN at via the mechanism implemented in bug 757461. Thank you for helping us with it!

WONTFIX'ing this bug is inappropriate, but the reality is that 99.9% of MDN users aren't interested enough in the feature to click a link, much less to participate in defining, discussing, or prioritizing the feature. Without that collaboration, we can't devote company resources to it yet.
Comment 22 John Karahalis [:openjck] 2013-08-13 14:59:12 PDT
> I've put quite some work into this bug 561470 comment 31 (which is
> essentially this bug), manually set up a mirror at ,
> which cost a lot of time to set up, but gets more and more outdated due to
> lack of a feed from you.

We appreciate the work you are doing with, and this decision is in no way meant to show anything else. We have the Kuma API for third-party projects that want to programmatically re-use our documentation. We can make improvements to the API if warranted.

> Both this bug here and bug 561470 are important for classical open source
> values: You cannot be an open source project, ask the community to
> contribute to the documentation, and then keep all these docs for yourself,
> usable only on your website. The information must be free and copyable for
> everybody, both in source code form (bug 561470) and in resulting form (this
> bug). This is highly important for very basic open source reasons and
> preservation of information.

I understand your interest in offline copies of our content, but this is not a valid criticism. Our content is licensed CC-BY-SA and we have the Kuma API as a convenience for those interested in reusing it. I understand that the API may not be ideal for your needs, but I do not agree that this puts us at odds with free software values.

We all have the same goal in mind: offer a wonderful, helpful reference for web developers. Like any team, we have limited resources and need to carefully choose efforts that maximize this goal. While I understand that you would find this feature to be valuable, our research reveals that relatively few users feel the same way, and that our effort may be better spent elsewhere.

I will leave this open for now, but will ask Ali (our product manager) to make the final call.
Comment 23 Ben Bucksch (:BenB) 2013-08-13 18:23:52 PDT
> Mozilla WebOps provides a weekly snapshot of MDN at

I'm linking to this at , which is the page the license link in the footer points to.
Comment 24 Luke Crouch [:groovecoder] 2013-08-14 05:15:57 PDT
Good idea. I've re-enabled the offline content notice, but it sounds like Dash and DocHub may not serve everyone's needs. We'll leave this bug open for when we have the time and/or collaboration to create a better alternative.
Comment 25 Bogdan Popescu (Kapeli) 2013-08-14 13:12:08 PDT
Dash (maybe) covers the needs of OS X users. For Windows and Linux users, I think Zeal is a suitable alternative. Zeal and Dash use the same doc format, and Zeal users have access to all of the docs available in Dash, which include MDN. Unfortunately, Zeal does not yet have a download manager, so users have to manually download the docs they want from

The DocHub project is a bit abandoned as far as I can tell, or at least the scrapers they use to fetch the docs from MDN don't work anymore (as per
Comment 26 Luke Crouch [:groovecoder] 2013-08-15 07:36:31 PDT
That's interesting. We should add Zeal to our offline content notice.
Comment 27 Bogdan Popescu (Kapeli) 2013-08-15 07:49:01 PDT
In other news, I've received 76 unique visitors to and 102 pageviews in less than 24 hours, so I think it's safe to assume that the tracking results at are wrong.
Comment 28 Ben Bucksch (:BenB) 2013-08-16 05:48:03 PDT
Bogdan, thanks for this nice list on Kapeli!
Comment 29 Ben Bucksch (:BenB) 2013-09-09 19:41:27 PDT
Phew... I just noticed: within a good year, I've accumulated 2 GB of access/error logs on
It seems there's demand for mirrors.
Comment 30 vlenceleuth 2013-09-19 23:41:54 PDT
Yo, can someone post up an offline version as pdf
Comment 31 John Karahalis [:openjck] 2013-09-21 09:56:10 PDT
(In reply to vlenceleuth from comment #30)
> Yo, can someone post up an offline version as pdf

Quoting Luke from comment 21.

> Mozilla WebOps provides a weekly snapshot of MDN at
> via the
> mechanism implemented in bug 757461. Thank you for helping us with it!
Comment 32 Daniel Zorro 2013-10-19 13:53:56 PDT
MDN Web technology references like:

are NOT listed in the Dash and Zeal apps.
We need those too; in fact, a docset with all of MDN (reference and developer documentation) would be nice.
Comment 33 Thibaut Courouble 2013-10-24 13:08:37 PDT
DevDocs has been open sourced and allows for offline reading of most of MDN's content:
Comment 34 Luke Crouch [:groovecoder] 2013-10-25 06:29:05 PDT
Thibaut: very nice. Does DevDocs scrape the WebGL and/or WebAPI docs, as Daniel suggests? Can it?
Comment 35 Thibaut Courouble 2013-10-25 09:43:25 PDT
Luke: No. At the moment DevDocs only scrapes API/reference pages in /Web/HTML, /Web/CSS, /Web/JavaScript, /Web/Reference/Events, and /Web/API.

It can certainly scrape more stuff but for now I've decided to skip the guides (difficult to index) and Firefox OS pages (still experimental and I'd rather separate them into their own doc set).

I'm going to write documentation in the coming days on how to extend/contribute to DevDocs.
Comment 36 ilmostro7 2013-10-28 23:59:53 PDT
MDN is definitely becoming "overstretched" in scope, especially now that Firefox OS is gaining more attention and more doc pages. Speaking of which, I think the next step should DEFINITELY be an app with some documentation on the Marketplace, even if it doesn't reference MDN synchronously at all times; references could be pulled during off-peak hours, once a working, viable method presents itself.
Comment 37 Susan Hu 2013-11-05 13:21:34 PST
There are too many APIs; I don't know when I can finish reading them all.
Comment 38 merlin510dm 2013-12-06 23:07:19 PST
   I need to talk to someone directly on the phone. Our numbers are: (909) 629-2820, (909) 343-9591, or (909) 343-(9434. Our computer systems and cellphones were first attacked in early August 2013, and it has been a constant battle until a week ago. When they first hit, I battled them hard, and it cost me two desktop systems and 3 laptops. I learned that I could not beat them head on; no one can beat an entity that has the latest cutting-edge equipment, software, and teams of people to work at you 24 hours a day. So our home became a place of extreme duress, for we were under constant monitoring and surveillance in our home. We were tracked via our phones whenever we went out. They toyed with us online and manipulated our computers to the point that we were unable to complete a single letter or document. Up until about 5 days ago, our computers, phones, and any devices that can connect in any possible way to the internet had been under the control of a major player in the software industry. I know that sounds crazy, but we have absolute proof, and that was what finally made them leave.

  When they were still in control, I began working on the Aurora Beta Browser in a desperate attempt to create a safe haven with enough tools to fight the changes that were made to all our communication devices. (Said "entities" seem to have great interest in this browser, and have been trying to take it apart on a daily basis.) As crazy as it sounds, we have proof of all that has transpired. We have held our ground; however, they have left our systems a mess and left behind "something" that is still functioning. I was fired from my job a month ago when they followed me to work and trashed my boss's computer systems. I have applied for work, only to find our emails misdirected and phones that don't ring when called; we have been threatened and intimidated for leaving the apartment without our phones. Our HP WiFi printer was turned into a communication hub. My wife and I have had enough! So if anyone can help, we need it.
Comment 39 John Karahalis [:openjck] 2014-03-04 09:04:07 PST

This is not the right place to discuss your problem. Please share your feedback below instead. If you continue to update this page your account will be banned.
Comment 40 Will Bamberg [:wbamberg] 2014-04-28 09:18:48 PDT
*** Bug 1002307 has been marked as a duplicate of this bug. ***
Comment 41 Tomislav Jovanovic :zombie 2014-04-28 15:55:55 PDT
(In reply to Luke Crouch [:groovecoder] from bug 883967 comment #10)
> * 0 new comments on bug 665750

This is really bad reasoning. It encourages making "me too" comments in Bugzilla, which I'm pretty sure we don't want.

Anyway, we have had this issue with the Addon SDK documentation ever since we moved it from our GitHub repo to MDN (see bug 1002307).

The SDK docs are pretty self-contained and don't depend much on other parts of MDN; mainstream usage of our high-level modules doesn't require any Mozilla-specific knowledge like XUL or XPConnect.

So asking add-on devs to download a whole 3 GB mirror of the MDN site doesn't seem reasonable to me.
Comment 42 Luke Crouch [:groovecoder] 2014-04-29 06:19:42 PDT
The reasoning is that we specifically asked visitors to comment on the bug, and no-one did.

But, the experiment was done before Addon SDK docs were moved to MDN, so we could get a different result - i.e., more interest in the feature - now.

Still, this is a large feature request. And there are some alternatives:

It may be easier to ask these developers to add the Addon SDK zone to those tools.
Comment 43 Dave Townsend [:mossop] 2014-04-29 08:27:10 PDT
*** Bug 1002307 has been marked as a duplicate of this bug. ***
Comment 44 Eric Shepherd [:sheppy] 2014-04-29 08:38:57 PDT
We do still want this capability -- the ability to have MDN content for reading as an e-book, for example, has been a goal for years.
Comment 45 Andreas Eibach 2014-05-15 11:11:20 PDT
Frankly, comment 38 reads like classic spam posts do.
Comment 46 Andreas Eibach 2014-05-15 11:21:46 PDT
...Back on topic: Eric, it seems that the e-book option is one of the crucial goals that keep bug 665750 from being a duplicate of bug 757461.

To be honest, I was tempted to call this bug a duplicate too, since what we did in bug 757461 *was* effectively exporting a "subset of pages" (i.e., in a tarball).
However, keeping the two bugs separate suggests that their actual goals do differ.
Comment 47 Ben Bucksch (:BenB) 2014-05-15 11:47:44 PDT
Andreas, bug 757461 wasn't a subset; it was the whole thing. A "subset" is one that the user defines, not a single one for everybody.
Comment 48 Andreas Eibach 2014-05-27 01:12:32 PDT
Ahh, I see! Well, Ben, I think that's a matter of definition. :)

To me it's already a "subset" once you subtract all the bulky video junk and maybe some PHP-generated pages.

But now I see what you mean: bug 665750 may be aiming for single chapters of the docs, e.g., a standalone XUL set, or a standalone set about XPath (for the XUL pro who wants to learn about new ways of addressing/traversing DOM node trees), etc.
Comment 49 Daniele "Mte90" Scasciafratte 2014-08-25 13:02:03 PDT
Any news about this?
Comment 50 Luke Crouch [:groovecoder] 2014-08-26 15:51:57 PDT
Nothing new. It's a massive project and we haven't prioritized it.
Comment 51 Clochix 2014-09-24 04:38:09 PDT
Excuse me if I'm a little off topic, but I just found this bug and it reminded me of a project I quickly hacked together some months ago. CORS is available on MDN, so I use a client-side spider to browse the website, with some filtering on URLs, and save its content locally in local storage. It's not perfect, but I use it daily and it fits my needs.
Here's the code for this project:
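The project's code is not reproduced here, but the approach described (a spider that follows links, keeps only URLs passing a filter, and stores each page locally) can be sketched in Python. Assumptions: the page fetcher is passed in as a function (in the browser this would be a cross-origin request, which is why MDN's CORS support matters there), a plain dict stands in for local storage, and the `example.org` URLs in the usage are placeholders. The filter is a simple string-prefix check, so the same mechanism could also scope a crawl to sections like the ones DevDocs scrapes, such as /Web/HTML and /Web/CSS.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class _LinkCollector(HTMLParser):
    """Collects the href values of <a> tags in one HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch, allowed_prefixes, max_pages=500):
    """Breadth-first crawl from start_url, following only URLs under allowed_prefixes.

    fetch(url) returns the page HTML as a string, or None if it is unavailable.
    Returns {url: html}, standing in for the browser's local storage.
    """
    prefixes = tuple(allowed_prefixes)
    store = {}
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(store) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # skip pages that could not be fetched
        store[url] = html
        parser = _LinkCollector()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments so each page is fetched once.
            link, _ = urldefrag(urljoin(url, href))
            if link not in seen and link.startswith(prefixes):
                seen.add(link)
                queue.append(link)
    return store
```

Because the check is a plain `startswith`, a prefix such as `.../Web/CSS` also admits sibling paths like `.../Web/CSSOM`; ending the prefix with `/` avoids that. `max_pages` is only a safety cap against runaway crawls.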
Comment 52 Florian Scholz [:fscholz] (MDN) 2014-10-10 06:53:28 PDT
*** Bug 1071013 has been marked as a duplicate of this bug. ***
Comment 53 Szmozsánszky István [:flaki] 2015-03-31 03:18:45 PDT
There is ongoing work stemming from this weekend's HackOnMDN to bring offline caching functionality to MDN via Service Workers. More on this can be found at - all thoughts and ideas are welcome, please chime in!
Comment 54 Luke Crouch [:groovecoder] 2015-04-02 07:11:41 PDT
Thanks for the update and the work, :flaki. If you like you can file more bugs under this one for the on-going work.
Comment 55 Luke Crouch [:groovecoder] 2015-07-17 08:43:07 PDT
*** Bug 1184881 has been marked as a duplicate of this bug. ***
