Translated revisions have incorrect based_on revision
Categories
(developer.mozilla.org Graveyard :: Localization, defect)
Tracking
(Not tracked)
People
(Reporter: jwhitlock, Unassigned)
References
Details
(Keywords: in-triage)
While investigating bug 1280957, I found that the based_on revision of a translated document was incorrect: it referred to a revision of a different document. This will make the translation interface even worse (comparing revisions from different documents) and may block page moves. A quick survey of the code shows this was a problem in the past, and that validation code is meant to prevent the issue.
Investigation is needed to determine:
- How widespread the problem is in current documents
- If the problem can be associated with a recent change
- How to prevent it from happening in the future
- If needed, how to fix current revisions with this issue
Developer brain dump follows:
The document 79501 is for /zh-TW/docs/Web/Guide/Performance/Using_web_workers
The document 1849 is for /en-US/docs/Web/API/Web_Workers_API/Using_web_workers, the original document
The revision 1072612 is the current revision for document 79501 (zh-TW doc)
The revision 1048638 is the current revision for document 1849 (en-US doc)
Revision 1072612 claims to be based on revision 730801. This is a revision of document 131237, /en-US/docs/Web/Reference/API_clone. This change was made a few days previously, on June 18th.
This was fixed by manually changing the based_on revision to 911951, an old version of document 1849. A .update() on a query set was used, because code in the model save() method prevents setting this to anything but the most recent revision of the English document.
based_on is a hidden form field in the translation interface, so it could be set or cleared by the client. Form and model validation did not find this error.
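One way to picture the missing check is the sketch below. It is not kuma code: the Revision dataclass and the validate_based_on helper are hypothetical stand-ins for the real models, and only the document and revision IDs come from this report. It assumes a translation's based_on must point at a revision of the document it was translated from.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Revision:
    """Hypothetical, simplified stand-in for the real Revision model."""
    id: int
    document_id: int                      # document this revision belongs to
    based_on: Optional["Revision"] = None


def validate_based_on(revision, parent_document_id):
    """Raise ValueError if based_on points at a revision of another document."""
    source = revision.based_on
    if source is None:
        return  # nothing to check
    if source.document_id != parent_document_id:
        raise ValueError(
            "revision %d is based on revision %d of document %d, expected document %d"
            % (revision.id, source.id, source.document_id, parent_document_id)
        )


# IDs from the report: zh-TW document 79501 translates en-US document 1849.
wrong = Revision(id=730801, document_id=131237)  # a revision of API_clone
right = Revision(id=911951, document_id=1849)    # an old revision of the en-US doc
current = Revision(id=1072612, document_id=79501, based_on=wrong)

try:
    validate_based_on(current, parent_document_id=1849)
    rejected = False
except ValueError:
    rejected = True  # the bad based_on is caught

current.based_on = right
validate_based_on(current, parent_document_id=1849)  # passes after the manual fix
```

A check like this would have to run on the server side, since the hidden form field can be changed by the client.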
Comment 1•5 years ago (Reporter)
I wrote some code to diagnose the issue:
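# Run in a Django shell for the kuma project; the import path is assumed.
from kuma.wiki.models import Document
# English documents and redirects whose current revision has based_on set
en_docs = Document.objects.filter(locale='en-US').exclude(current_revision__based_on__isnull=True).count()
redirect_docs = Document.objects.filter(is_redirect=True).exclude(current_revision__based_on__isnull=True).count()
same_doc = 0  # documents based on one of their own revisions
diff_doc = 0  # documents based on some other, non-parent document
# Non-English, non-redirect documents whose current revision has based_on set
docs = Document.objects.exclude(locale='en-US').exclude(is_redirect=True).exclude(current_revision__based_on__isnull=True)
for doc_id in docs.values_list('id', flat=True):
    doc = Document.objects.get(id=doc_id)
    if doc.current_revision.based_on.document != doc.parent:
        if doc.current_revision.based_on.document == doc:
            same_doc += 1
        else:
            print("%d: %s based on %s" % (doc.id, doc.get_full_url(), doc.current_revision.based_on.document.get_full_url()))
            diff_doc += 1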
This code could be converted to forms that make sense on bug 1311142.
There are 20236 English documents with based_on set, and 2861 redirects with based_on set. based_on only makes sense in the context of translated documents. This suggests that source docs, translated docs, and redirects should be different models.
5 non-English documents were based on a different document than the parent document, and it appears the latest was edited in 2013, so there doesn't appear to be an ongoing bug. Fixing these could be a human-level task:
- 2497: https://developer.mozilla.org/fr/docs/Web/API/API_HTML_Drag_and_Drop based on https://developer.mozilla.org/en-US/docs/DragDrop
- 13229: https://developer.mozilla.org/zh-CN/docs/Dynamically_modifying_XUL-based_user_interface based on https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XUL/Dynamically_modifying_XUL-based_user_interface
- 39022: https://developer.mozilla.org/zh-CN/docs/Mozilla/Tech/XUL/Tutorial/More_Wizards based on https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XUL/Tutorial/Tree_Box_Objects
- 39787: https://developer.mozilla.org/pl/docs/Web/CSS/outline-style based on https://developer.mozilla.org/en-US/docs/Web/CSS/-moz-outline-style
- 81761: https://developer.mozilla.org/bn-BD/docs/Project:MDN/%E0%A6%85%E0%A6%AC%E0%A6%A6%E0%A6%BE%E0%A6%A8/%E0%A6%B8%E0%A6%AE%E0%A7%8D%E0%A6%AA%E0%A6%BE%E0%A6%A6%E0%A6%95_%E0%A6%B8%E0%A6%B9%E0%A6%BE%E0%A6%AF%E0%A6%BC%E0%A6%BF%E0%A6%95%E0%A6%BE based on https://developer.mozilla.org/en-US/docs/MDN/Contribute/Editor
1316 non-English documents were based on themselves, which seems weird at first. One is https://developer.mozilla.org/fr/docs/User:SphinxKnight/Test, which was not translated from an English document. This means that a proposed "source models" doc may still need a locale column...
If docs without a parent are analyzed, the list gets shorter. These are the 4 non-English documents that are based on themselves:
- 7024: https://developer.mozilla.org/pt-PT/docs/Web/SVG/Tutorial
- 13143: https://developer.mozilla.org/de/docs/Lokalisierung_von_Erweiterungsbeschreibungen
- 16897: https://developer.mozilla.org/ja/docs/Web/API/Element/hasAttributes
- 132293: https://developer.mozilla.org/es/docs/Web/HTML/Consejos_para_la_creaci%C3%B3n_de_p%C3%A1ginas_HTML_de_carga_r%C3%A1pida
For this last document, the creator was mdnwebdocs-bot. It is possible that the clean_current_revision command set the based_on to self.
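The three outcomes counted in this comment (based on the parent, based on itself, based on some other document) can be sketched without a database. The Row tuple and classify function are hypothetical names for illustration. The first two rows reuse IDs from the description, and the third is made up.

```python
from collections import Counter, namedtuple

# Hypothetical flattened record: one row per document with based_on set.
Row = namedtuple("Row", "doc_id parent_id based_on_doc_id")


def classify(row):
    """Mirror the three outcomes the diagnostic script distinguishes."""
    if row.based_on_doc_id == row.parent_id:
        return "parent"  # consistent: based on a revision of the parent document
    if row.based_on_doc_id == row.doc_id:
        return "self"    # based on one of its own revisions
    return "other"       # based on an unrelated document


rows = [
    Row(doc_id=79501, parent_id=1849, based_on_doc_id=131237),  # reported case, before the fix
    Row(doc_id=79501, parent_id=1849, based_on_doc_id=1849),    # same document after the fix
    Row(doc_id=10, parent_id=None, based_on_doc_id=10),         # made-up document with no parent
]
tally = Counter(classify(r) for r in rows)
```

Only the "other" bucket is printed by the script above; the "self" bucket is just counted.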
Based on this analysis:
- There's something weird in English docs with based_on set. It could use investigation but probably won't hurt anything.
- There's a human-scale amount of work to fix 9 docs with inconsistent data
- It would be useful for data consistency to split up the data into source docs, translated docs, and redirects, instead of having a generic Document model.
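As a rough illustration of the proposed split, the sketch below uses plain dataclasses rather than Django models. All class and field names are hypothetical. The point is only that based_on would exist on translations alone, while source documents keep a locale because some originals are not English.

```python
from dataclasses import dataclass, fields


@dataclass
class SourceDocument:
    id: int
    locale: str  # kept: some source documents are not en-US
    slug: str


@dataclass
class TranslatedDocument:
    id: int
    locale: str
    slug: str
    source: SourceDocument
    based_on_revision_id: int  # only translations carry this field


@dataclass
class Redirect:
    id: int
    locale: str
    slug: str
    target_slug: str


def has_based_on(model):
    """True if the given dataclass declares a based_on_revision_id field."""
    return any(f.name == "based_on_revision_id" for f in fields(model))


# Example using the documents from the description.
original = SourceDocument(1849, "en-US", "Web/API/Web_Workers_API/Using_web_workers")
translation = TranslatedDocument(
    79501, "zh-TW", "Web/Guide/Performance/Using_web_workers", original, 911951
)
```

With this shape, an English document or a redirect with based_on set could not be represented at all.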
Comment 2•4 years ago
MDN Web Docs' bug reporting has now moved to GitHub. From now on, please file content bugs at https://github.com/mdn/sprints/issues/ and platform bugs at https://github.com/mdn/kuma/issues/.
Updated•4 years ago