Closed Bug 756547 Opened 12 years ago Closed 8 years ago

Investigate a git-based backend for the wiki

Categories

(developer.mozilla.org Graveyard :: Editing, defect, P2)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lorchard, Unassigned)

Details

The Kuma wiki is currently backed by MySQL tables, and implements a primitive revision control system with a data model of Documents and Revisions.

But, the more advanced we make that data model, the closer it gets to a true revision control system like git. So, why not jump ahead and think about replacing the MySQL database entirely with a git repository?
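A minimal sketch of why the mapping feels natural, assuming nothing about Kuma's real models: each revision's HTML is stored under the same SHA-1 blob id git itself would compute, and each revision points at its parent the way a commit does. `DocumentStore` and `Revision` are hypothetical names, not Kuma classes.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


def git_blob_id(content: bytes) -> str:
    """Compute the object id git assigns to a blob: SHA-1 of 'blob <size>\\0' + content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


@dataclass(frozen=True)
class Revision:
    # Hypothetical stand-in for a wiki Revision: content is referenced by its
    # blob id, and history is a chain of parent links, like git commits.
    slug: str
    blob_id: str
    parent: Optional["Revision"] = None


class DocumentStore:
    """Hypothetical in-memory store mimicking git's content-addressed objects."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}      # blob id -> raw content
        self.heads: Dict[str, Revision] = {}   # document slug -> latest revision

    def save(self, slug: str, html: str) -> Revision:
        data = html.encode("utf-8")
        blob_id = git_blob_id(data)
        self.blobs[blob_id] = data  # identical content is stored only once
        rev = Revision(slug, blob_id, self.heads.get(slug))
        self.heads[slug] = rev
        return rev

    def history(self, slug: str) -> Iterator[Revision]:
        """Yield revisions newest-first by following parent links."""
        rev = self.heads.get(slug)
        while rev is not None:
            yield rev
            rev = rev.parent
```

Because ids depend only on content, saving identical HTML under two slugs stores a single blob, which is part of what makes git-style storage cheap to replicate.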

Some benefits, in particular to a distributed VCS:

* We could share and replicate the docs and doc activity onto a service like github

* We could accept documentation edits from outside the kuma wiki, from tools invented by the community

* Others could pull from our repository to build their own MDN-derived documentation tools (eg. dochub.io, but without the scraping)

Some challenges:

* I'm not sure how to scale a git repo. Maybe we could beg the github guys for some advice?

* The raw HTML source of Kuma wiki docs includes Kumascript template macros, so the source isn't fully rendered on its own. However, it might be interesting to write an "offline builder" for the doc source. It could walk through all the source docs and use an instance of the kumascript service on someone's local machine to produce a pile of fully-rendered HTML files.
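A rough sketch of such an offline builder, under two assumptions: macro calls appear in the source as `{{ Name(args) }}`, and the kumascript service is replaced by a hypothetical `stub_expand` function. A real builder would send each macro (or each page) to a locally running kumascript instance instead.

```python
import re
from pathlib import Path
from typing import Callable

# Assumption: macro calls look like {{ Name }} or {{ Name(args) }} in the source.
MACRO_RE = re.compile(r"\{\{\s*(\w+)(?:\((.*?)\))?\s*\}\}")

Expander = Callable[[str, str], str]


def render_source(html: str, expand: Expander) -> str:
    """Replace each macro call with whatever the expander returns for it."""
    return MACRO_RE.sub(lambda m: expand(m.group(1), m.group(2) or ""), html)


def build_offline(src_dir: Path, out_dir: Path, expand: Expander) -> int:
    """Render every .html file under src_dir into out_dir, keeping relative paths."""
    count = 0
    for src in sorted(src_dir.rglob("*.html")):
        dest = out_dir / src.relative_to(src_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        rendered = render_source(src.read_text(encoding="utf-8"), expand)
        dest.write_text(rendered, encoding="utf-8")
        count += 1
    return count


def stub_expand(name: str, args: str) -> str:
    # Hypothetical stand-in for a local kumascript instance.
    return f"<span data-macro='{name}'>{args}</span>"
```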
FWIW, we could consider Mercurial as well. It's written in Python and might be interesting. However, the git ecosystem seems much stronger, IMO.

In any case, this could bring a lot of generativity to the MDN doc base.
No longer blocks: 756266
Blocks: 756266
This would be so cool! It would over-fulfill bug 561470.
This would have strong implications outside MDN, with projects like SUMO and possibly even mozilla.org becoming involved.

In terms of scalability, building a core service with git-repo-per-wiki might be the best way to go, but then we still need apps to figure out a lot that Git has no concept of:

* relationships between documents ("X is a translation of Y", "X is a hierarchical parent of Y", "X and Y are related by tags", etc)
* permissions to act on documents ("Alice can edit documents with property x but Bob can only translate documents with property x", etc)
* how to render documents (probably the easiest bit, since it's all text, but things like templates complicate) and cache the results.
* how to link to other documents. Even GitHub's git-based wikis seem to completely punt on this feature (or at least I've never seen markup to do it), yet for many wikis a "what links here" tree is really important.
* which revision to display (approval processes need something Git doesn't really do).
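The "what links here" tree is one piece an app would have to maintain itself on top of git. A minimal sketch, assuming internal links use a hypothetical `/docs/<slug>` form and that each document's HTML is already loaded into a dict, just inverts the link graph:

```python
from collections import defaultdict
from html.parser import HTMLParser
from typing import Dict, Set

DOC_PREFIX = "/docs/"  # hypothetical URL scheme for internal wiki links


class LinkCollector(HTMLParser):
    """Collect the slugs of internal wiki pages linked from one document."""

    def __init__(self) -> None:
        super().__init__()
        self.targets: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.startswith(DOC_PREFIX):
            # Drop any #fragment so links to a section count as links to the page.
            self.targets.add(href[len(DOC_PREFIX):].split("#")[0])


def what_links_here(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Invert the link graph: map each slug to the set of slugs linking to it."""
    backlinks: Dict[str, Set[str]] = defaultdict(set)
    for slug, html in docs.items():
        collector = LinkCollector()
        collector.feed(html)
        collector.close()
        for target in collector.targets:
            if target != slug:  # ignore self-links
                backlinks[target].add(slug)
    return dict(backlinks)
```

In practice this index would be updated incrementally on each commit rather than rebuilt from every document.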

And that's just what I can think of. We'd probably have to/want to build a "localize this stuff" app that worked across wikis/repos, and kept track of what was a translation of what. Downstream apps like SUMO and MDN would almost certainly need to make workflow concessions.

Right now, those are the things I can think of that are different across the "similar" apps.
> strong implications outside MDN

I can't see a problem for other sites when devmo changes from its own MySQL DB to git.
I can see how trying this would help other sites that have a similar problem.

> * relationships between documents ("X is a translation of Y",
> "X is a hierarchical parent of Y", "X and Y are related by tags", etc)

git can say "A is based on B", which is true for translations.
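To make "A is based on B" explicit for a particular document, one option is a git trailer (a `Key: value` line in the last paragraph of the commit message). The `Translation-Of` key and the `slug@revision` value format below are hypothetical conventions, sketched only to show the idea:

```python
import re
from typing import Dict

TRAILER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s*(.+)$")


def make_translation_message(summary: str, source_slug: str, source_rev: str) -> str:
    """Build a commit message recording which source revision was translated."""
    return f"{summary}\n\nTranslation-Of: {source_slug}@{source_rev}\n"


def parse_trailers(message: str) -> Dict[str, str]:
    """Return key/value pairs from the final paragraph of a commit message."""
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return {}  # a summary line alone has no trailer block
    trailers: Dict[str, str] = {}
    for line in paragraphs[-1].splitlines():
        m = TRAILER_RE.match(line.strip())
        if not m:
            return {}  # last paragraph is ordinary text, not trailers
        trailers[m.group(1)] = m.group(2)
    return trailers
```

Git can also parse trailers itself with `git interpret-trailers --parse`, which would be preferable to hand-rolled parsing outside of a sketch.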

> * how to render documents (probably the easiest bit, since it's all text,
> but things like templates complicate) and cache the results.

Irrelevant. MySQL can't render documents either. It's just storage.

> * how to link to other documents

ditto

> * which revision to display (approval processes need something Git doesn't really do).

git has tags, which are usually used for the approved version.
You can also store somewhere else which revision of which doc is approved.
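Both options can be sketched briefly. Assuming revisions are identified by commit ids and listed newest-first, the display rule picks the newest approved one; `approval_tag_command` builds (but does not run) a `git tag -f` invocation for the tag-based variant. The `approved/<slug>` tag naming is a hypothetical convention, and real slugs might need sanitizing to be valid ref names.

```python
from typing import Iterable, List, Optional, Set


def revision_to_display(history: Iterable[str], approved: Set[str]) -> Optional[str]:
    """Return the newest approved revision id; history is ordered newest-first."""
    for rev_id in history:
        if rev_id in approved:
            return rev_id
    return None  # nothing approved yet; the app decides what to show instead


def approval_tag_command(slug: str, rev_id: str) -> List[str]:
    """Build (but do not run) a command that points an approval tag at rev_id."""
    # "-f" lets the tag move forward when a newer revision is approved.
    return ["git", "tag", "-f", f"approved/{slug}", rev_id]
```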

It's true that git works project-wide, not document-centric.
Note to self (and others): In case it's not obvious, one big benefit of working on this would be to naturally separate the private data from the public content. 

That is, everything in git would be public-by-default, and we could keep all the private personal administrative details hidden on the editing servers.
(In reply to Les Orchard [:lorchard] from comment #5)
> That is, everything in git would be public-by-default, and we could keep all
> the private personal administrative details hidden on the editing servers.

Doesn't git want committers identified by email address and optionally name? (And isn't that the best way to track contributions?) Or would we make publicly disclosing an email address a requirement to contribute to any git-backed wiki?
(In reply to James Socol [:jsocol, :james] from comment #6)
> (In reply to Les Orchard [:lorchard] from comment #5)
> > That is, everything in git would be public-by-default, and we could keep all
> > the private personal administrative details hidden on the editing servers.
> 
> Doesn't git want committers identified by email address and optionally name?
> (And isn't that the best way to track contributions?) Or would we make
> publicly disclosing an email address a requirement to contribute to any
> git-backed wiki?

Yeah, exactly that. Username is already disclosed, email could be (or maybe some identifier resembling an email address). I don't think git commits *require* a valid, reachable email address and our system could fudge that. Could even have an option to commit changes on behalf of anonymous users with a separate, shared anonymous identity.
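A minimal sketch of that fudging, assuming a hypothetical placeholder domain (`.example` is a reserved TLD, so the addresses are identifier-shaped but never deliverable): each editor's username becomes an email-like identifier, anonymous edits share one identity, and the wiki server is recorded as committer. Git reads these identities from the `GIT_AUTHOR_NAME`, `GIT_AUTHOR_EMAIL`, `GIT_COMMITTER_NAME`, and `GIT_COMMITTER_EMAIL` environment variables.

```python
from typing import Dict, Optional, Tuple

# Hypothetical placeholder domain for identifier-shaped addresses.
NOREPLY_DOMAIN = "users.noreply.example"
ANONYMOUS: Tuple[str, str] = ("Anonymous", f"anonymous@{NOREPLY_DOMAIN}")
WIKI_COMMITTER: Tuple[str, str] = ("MDN Wiki", f"wiki@{NOREPLY_DOMAIN}")


def author_identity(username: Optional[str]) -> Tuple[str, str]:
    """Map a wiki username to a (name, email-like id); no username means anonymous."""
    if not username:
        return ANONYMOUS
    return (username, f"{username}@{NOREPLY_DOMAIN}")


def commit_env(username: Optional[str],
               committer: Tuple[str, str] = WIKI_COMMITTER) -> Dict[str, str]:
    """Environment variables git reads for author and committer identity."""
    name, email = author_identity(username)
    return {
        "GIT_AUTHOR_NAME": name,
        "GIT_AUTHOR_EMAIL": email,
        "GIT_COMMITTER_NAME": committer[0],
        "GIT_COMMITTER_EMAIL": committer[1],
    }
```

These would be merged into `os.environ` and passed to something like `subprocess.run(["git", "commit", "-m", message], env=env)`. Keeping author and committer separate preserves who wrote a change while making clear which system recorded it.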

I guess I'm implying that there are policy decisions that would be forced by the technical decision, but the nature of the system should make them clearer.
Version: Kuma → unspecified
Component: Docs Platform → Editing
No longer blocks: 756266
Priority: -- → P2
Note mostly to self - I recently bookmarked a bunch of interesting resources for using git as a data model:

    https://pinboard.in/u:deusx/t:git
There is other wiki software that does that:
<http://www.wikimatrix.org/wiki/feature:RCS>
"Wikis which can use RCS as a datastore include Hatta, PHPWiki, MidgardWiki, SubWiki, FosWiki, TWiki, Gollum and Ikiwiki."
FWIW, I wrote up a quick project proposal page for this bug over here:

https://wiki.mozilla.org/MDN/Development/GitBackend
I know that there are discussions about the future architecture of MDN, but I'm not sure this specific bug is helpful (and new specific bugs will be created once the future architecture of MDN is defined).

Jannis, Lonnen, shouldn't we WONTFIX this?
Flags: needinfo?(jezdez)
Flags: needinfo?(chris.lonnen)
Les's work here is relevant though I think this bug is satisfied by the investigation results already in the comments.
Flags: needinfo?(chris.lonnen)
:teoli What Lonnen said, we'll be able to revisit this when we need to.
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(jezdez)
Resolution: --- → FIXED
Product: developer.mozilla.org → developer.mozilla.org Graveyard