Closed Bug 1280997 Opened 8 years ago Closed 4 years ago

Translated revisions have incorrect based_on revision

Categories

(developer.mozilla.org Graveyard :: Localization, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jwhitlock, Unassigned)

References

Details

(Keywords: in-triage)

While investigating bug 1280957, I found a translated revision whose based_on revision was incorrect: it referred to a revision of a different document. This will make the translation interface even worse (it would compare revisions from different documents) and may block page moves. A quick survey of the code shows this was a problem in the past, and that validation code exists to prevent it.

Investigation is needed to determine:
- How widespread the problem is in current documents
- If the problem can be associated with a recent change
- How to prevent it from happening in the future
- If needed, how to fix current revisions with this issue

Developer brain dump follows:

The document 79501 is for /zh-TW/docs/Web/Guide/Performance/Using_web_workers
The document  1849 is for /en-US/docs/Web/API/Web_Workers_API/Using_web_workers, the original document

The revision 1072612 is the current revision for document 79501 (zh-TW doc)
The revision 1048638 is the current revision for document 1849 (en-US doc)

Revision 1072612 claims to be based on revision 730801. This is a revision of document 131237, /en-US/docs/Web/Reference/API_clone.  This change was made a few days previously, on June 18th.

This was fixed by manually changing the based_on revision to 911951, an old version of document 1849.  A .update() on a query set was used, because code in the model save() method prevents setting this to anything but the most recent revision of the English document.

based_on is a hidden form field in the translation interface, so it could be set or cleared by the client. Form and model validation did not find this error.
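
The kind of check that would catch this can be sketched in plain Python. This is only an illustration, not the actual form or model validation: Doc, Rev, and check_based_on are hypothetical names, and treating a based_on on a parentless document as an error is an assumption. The IDs are the ones from this bug.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rev:
    """Hypothetical stand-in for a wiki revision."""
    id: int
    document_id: int  # the document this revision belongs to


@dataclass(frozen=True)
class Doc:
    """Hypothetical stand-in for a wiki document."""
    id: int
    parent_id: Optional[int]  # None when the document is not a translation


def check_based_on(doc: Doc, based_on: Optional[Rev]) -> Optional[str]:
    """Return an error message if based_on is inconsistent, else None."""
    if based_on is None:
        return None  # nothing to validate
    if doc.parent_id is None:
        # Assumption: only translations should carry a based_on revision.
        return "based_on is set, but document %d has no parent" % doc.id
    if based_on.document_id != doc.parent_id:
        return "revision %d belongs to document %d, not to parent %d" % (
            based_on.id, based_on.document_id, doc.parent_id)
    return None


zh_tw = Doc(id=79501, parent_id=1849)
bad = Rev(id=730801, document_id=131237)   # revision of API_clone
good = Rev(id=911951, document_id=1849)    # old revision of the en-US doc

print(check_based_on(zh_tw, bad))
print(check_based_on(zh_tw, good))
```

Because the field comes from the client, a check like this would have to run on the server for every submitted value.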

I wrote some code to diagnose the issue:

# Run in a Django shell for the MDN (kuma) project; the import path is assumed.
from kuma.wiki.models import Document

# based_on should only be set on translations; count the outliers.
en_docs = Document.objects.filter(locale='en-US').exclude(current_revision__based_on__isnull=True).count()
redirect_docs = Document.objects.filter(is_redirect=True).exclude(current_revision__based_on__isnull=True).count()

same_doc = 0  # based on a revision of the document itself
diff_doc = 0  # based on a revision of some other, non-parent document

docs = Document.objects.exclude(locale='en-US').exclude(is_redirect=True).exclude(current_revision__based_on__isnull=True)
for doc_id in docs.values_list('id', flat=True):
    doc = Document.objects.get(id=doc_id)
    based_on_doc = doc.current_revision.based_on.document
    if based_on_doc != doc.parent:
        if based_on_doc == doc:
            same_doc += 1
        else:
            print("%d: %s based on %s" % (doc.id, doc.get_full_url(), based_on_doc.get_full_url()))
            diff_doc += 1

This code could be converted to forms that make sense on bug 1311142.

There are 20236 English documents with based_on set, and 2861 redirects with based_on set. based_on only makes sense in the context of translated documents. This suggests that source docs, translated docs, and redirects should be different models.

5 non-English documents were based on a different document than the parent document, and it appears the latest was edited in 2013, so there doesn't appear to be an ongoing bug. Fixing these could be a human-level task:

1316 non-English documents were based on themselves, which seems weird at first. One is https://developer.mozilla.org/fr/docs/User:SphinxKnight/Test, which was not translated from an English document. This means that a proposed "source models" doc may still need a locale column...

If docs without a parent are analyzed, the list gets shorter. These are the 4 non-English documents that are based on themselves:

For this last document, the creator was mdnwebdocs-bot. It is possible that the clean_current_revision command set the based_on to self.

Based on this analysis:

  1. There's something weird in English docs with based_on set. It could use investigation, but probably won't hurt anything.
  2. There's a human-scale amount of work to fix 9 docs with inconsistent data.
  3. It would be useful for data consistency to split up the data into source docs, translated docs, and redirects, instead of having a generic Document model.
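
A rough sketch of what the split in item 3 might look like, using plain dataclasses rather than Django models. All class and field names here are hypothetical. The point is only that based_on lives on the translated type alone, while the source type keeps a locale because some originals are not English.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SourceDoc:
    """Hypothetical model for an original (untranslated) document."""
    id: int
    locale: str  # kept because some originals are not written in English
    slug: str


@dataclass
class TranslatedDoc:
    """Hypothetical model for a translation of a SourceDoc."""
    id: int
    locale: str
    slug: str
    source: SourceDoc
    # ID of the source revision this translation was made from
    based_on_revision_id: Optional[int] = None


@dataclass
class Redirect:
    """Hypothetical model for a redirect; it has no based_on at all."""
    id: int
    locale: str
    slug: str
    target_url: str


en = SourceDoc(1849, "en-US", "Web/API/Web_Workers_API/Using_web_workers")
zh = TranslatedDoc(79501, "zh-TW", "Web/Guide/Performance/Using_web_workers",
                   source=en, based_on_revision_id=911951)

# Only the translated type has a based_on field.
print([f.name for f in fields(SourceDoc)])
print([f.name for f in fields(TranslatedDoc)])
```

With this shape, an English document or a redirect with based_on set cannot be represented, so the 20236 and 2861 cases above could not occur.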
See Also: → 1311142, 1540102
MDN Web Docs' bug reporting has now moved to GitHub. From now on, please file content bugs at https://github.com/mdn/sprints/issues/ and platform bugs at https://github.com/mdn/kuma/issues/.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX
Product: developer.mozilla.org → developer.mozilla.org Graveyard