Closed
Bug 1311142
Opened 9 years ago
Closed 5 years ago
Periodically detect and report on data consistency issues
Categories
(developer.mozilla.org Graveyard :: General, enhancement, P2)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: jwhitlock, Unassigned)
References
Details
(Keywords: in-triage, Whiteboard: [specification][type:feature])
What problem would this feature solve?
======================================
Data in MDN can become inconsistent for many reasons, often unknown, and cleanup usually requires direct manipulation or a data migration. Once the cleanup is done, it would be good to detect whether the inconsistent data returns, both to keep the data clean and to help determine the root causes.
Who has this problem?
=====================
Staff contributors to MDN
How do you know that the users identified above have this problem?
==================================================================
Data cleanup is a periodic concern of staff members. Some issues can be handled quickly (in less than an hour). Others (like bug 1311063) take more effort and have to be integrated into other document update plans.
How are the users identified above solving this problem now?
============================================================
For the most important content issues, dashboards are used to monitor progress:
https://developer.mozilla.org/en-US/docs/MDN/Doc_status/Overview
Other issues have task-specific dashboards, such as French pages that are not based on English documents:
https://developer.mozilla.org/fr/docs/without-parent
Monitoring these requires staff members to periodically visit the pages, or copy them to tracking spreadsheets.
Do you have any suggestions for solving the problem? Please explain in detail.
==============================================================================
A backend task could scan for "solved" issues and 1) alert staff if one reappears, 2) report the scope of the issue, 3) list some sample items, and 4) automatically fix the issue if possible.
For example, some pages are redirects to other pages. The content of the page is a redirect to the new page, and an is_redirect flag should be set. A periodic task could analyze pages, and report inconsistencies:
Error: Page content is a redirect, but is_redirect=False.
Pages: 12
3 oldest:
* /link/to/first
* /link/to/second
* /link/to/third
The results could be sent in a weekly email to admins.
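A minimal sketch of how such a report could be assembled, using plain Python objects rather than the real Document model. The `Page` class, the `content_is_redirect` marker test, and the `report` helper are all hypothetical names introduced for illustration; a real task would query the database instead.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable


@dataclass
class Page:
    # Hypothetical stand-in for a wiki document record.
    path: str
    html: str
    is_redirect: bool
    modified: datetime


def content_is_redirect(page: Page) -> bool:
    # Assumption: redirect pages carry a marker string in their content.
    return "REDIRECT" in page.html


def report(title: str, pages: Iterable[Page],
           is_broken: Callable[[Page], bool], samples: int = 3) -> str:
    """Return a plain-text report, or an empty string if nothing is wrong."""
    broken = sorted((p for p in pages if is_broken(p)),
                    key=lambda p: p.modified)
    if not broken:
        return ""
    shown = broken[:samples]
    lines = [f"Error: {title}",
             f"Pages: {len(broken)}",
             f"{len(shown)} oldest:"]
    lines += [f"* {p.path}" for p in shown]
    return "\n".join(lines)
```

Returning an empty string for a clean check lets the weekly email skip issues that have not reappeared.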
Is there anything else we should know?
======================================
Some ideas for this task:
* Redirect inconsistencies
* Shorten redirect chains
* Find users with a password
* Report on deleted files
* Purge deleted files older than X days
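As an illustration of the "shorten redirect chains" idea, here is a rough sketch that works on a plain mapping of source path to target path. The mapping and the function names are assumptions made for the example, not existing Kuma code.

```python
from typing import Dict, List


def redirect_chain(start: str, redirects: Dict[str, str]) -> List[str]:
    """Follow redirects from start; stop at a non-redirect or a loop."""
    chain = [start]
    seen = {start}
    while chain[-1] in redirects:
        target = redirects[chain[-1]]
        if target in seen:
            break  # Loop detected; leave it for a human to resolve.
        chain.append(target)
        seen.add(target)
    return chain


def shortened(redirects: Dict[str, str]) -> Dict[str, str]:
    """Point every redirect directly at the end of its chain."""
    return {src: redirect_chain(src, redirects)[-1] for src in redirects}
```

Chains longer than two entries are the ones worth reporting or rewriting.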
Comment 1•9 years ago (Reporter)
Another thing to check: child page's slug does not extend parent topic's slug in the expected way.
For example, we'd expect "en-US/docs/Reference/Foo" to have a parent topic of "en-US/docs/Reference". It would be a problem if the parent topic was instead "en-US/docs/ImportantPages", since slug manipulation is currently the only way to walk "up" a content tree using the API. When this occurs, it is usually a symptom of a partially executed page move or direct database manipulation.
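The slug rule can be sketched as a small pure function. This assumes slugs are written as in the example above, with "/" separating levels; the function names are made up for illustration.

```python
from typing import Optional


def expected_parent_slug(slug: str) -> Optional[str]:
    """Return the slug one level up, or None for a top-level slug."""
    head, sep, _tail = slug.rpartition("/")
    return head if sep else None


def parent_matches(child_slug: str, parent_slug: Optional[str]) -> bool:
    """True when the recorded parent topic is the one implied by the slug."""
    return expected_parent_slug(child_slug) == parent_slug
```

A periodic task could run `parent_matches` over each document and its recorded parent topic, and report the pairs where it returns False.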
Comment 2•8 years ago (Reporter)
Issue: English page has a translation "parent"
Detection: Document.objects.filter(locale='en-US').exclude(parent__isnull=True).count() > 0
What breaks: Users are unable to translate the page. An English page is shown in the list of translated pages.
See: bug 1331319, bug 918909
Comment 3•8 years ago (Reporter)
Issue: Document slug contains URL-escape sequences
Detection: Document.objects.filter(slug__contains='%25').count() > 0
What breaks: KumaScript is unable to fetch page source for rendering. Apache may be unable to display the document. Redirects (zones, other) may not work.
See: bug 1343020
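The query above is a substring match. If the intent is to catch any percent-escape in a slug, a broader check could look like the sketch below. That reading of "URL-escape sequences" is an assumption, and `has_url_escape` is a hypothetical helper.

```python
import re

# A percent sign followed by two hex digits, e.g. %3A or %25.
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def has_url_escape(slug: str) -> bool:
    """True when the slug contains at least one percent-escape sequence."""
    return PERCENT_ESCAPE.search(slug) is not None
```

A bare percent sign that is not followed by two hex digits is not flagged by this check.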
Comment 4•8 years ago (Reporter)
Issue: Document has no current_revision
Detection: Document.objects.filter(current_revision__isnull=True).count() > 0
What breaks: $json API for original and translations
See: https://sentry.prod.mozaws.net/operations/mdn-prod/issues/385062
Comment 5•8 years ago (Reporter)
Issue: $json data has a locale that does not match the document locale
Detection (per locale):
total = 0
for locale in settings.MDN_LANGUAGES:
    total += Document.objects.filter(locale=locale).exclude(json__contains='"locale": "%s"' % locale).count()
total > 0
What breaks: Sample DB scraper, maybe other tools
May be correlated with the first translation of a document.
Comment 6•8 years ago (Reporter)
Update: previous code would also "detect" Documents without a populated json field. Fixed detection code:
total = 0
for locale in settings.MDN_LANGUAGES:
    locale_issues = (Document.objects.filter(locale=locale)
                     .exclude(json="")
                     .exclude(json__isnull=True)
                     .exclude(json__contains='"locale": "%s"' % locale))
    total += locale_issues.count()
problem = total > 0
There were about 800 pages with this issue.
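The substring match depends on the exact spacing of the stored JSON. A per-document check that parses the JSON instead might look like this sketch. It assumes the locale is a top-level "locale" key, and `json_locale_mismatch` is a hypothetical helper, not existing code.

```python
import json
from typing import Optional


def json_locale_mismatch(doc_locale: str, raw_json: Optional[str]) -> bool:
    """True when stored JSON names a different locale than the document."""
    if not raw_json:
        return False  # Unpopulated json field: not this issue.
    try:
        data = json.loads(raw_json)
    except ValueError:
        return True  # Unparseable JSON is reported too (a design choice).
    if not isinstance(data, dict):
        return True  # Unexpected shape is reported too.
    return data.get("locale") != doc_locale
```

This would be slower than the database query, since every document's JSON has to be loaded and parsed, so it may be better suited to a spot check of the pages the query finds.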
Comment 7•8 years ago (Reporter)
Issue: Translated document has no current_revision
Detection: Document.objects.filter(current_revision__isnull=True).exclude(parent__isnull=True).count() > 0
What breaks: The "unlocalized" URL will redirect to the broken page and raise an ISE
See: https://sentry.prod.mozaws.net/operations/mdn-prod/issues/387499/
Comment 8•8 years ago (Reporter)
Issue: Document has no revisions
Detection: Document.objects.annotate(num_revisions=Count('revisions')).filter(num_revisions=0).count() > 0
What breaks:
- If the document is a translation, it is not possible to edit the translation.
- If it is a deleted document (Document.deleted_objects.filter...), then recovery may not be possible
- Will also have no current revision, and has the same issues
See: https://sentry.prod.mozaws.net/operations/mdn-prod/issues/387499/
Fix for one known cause tracked in bug 1366038.
Comment 9•8 years ago (Reporter)
Issue: Profile has multiple primary emails
Detection: EmailAddress.objects.filter(primary=True).values('user').annotate(primary_count=Count('id')).filter(primary_count__gt=1).count() > 0
What breaks:
- A user is unable to confirm a new email address (such as bug 1392082)
- It can be ambiguous which email is the primary for contacting the user. Most code uses the email on the User record, however.
See: https://sentry.prod.mozaws.net/operations/mdn-prod/issues/635312/
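The check amounts to grouping primary addresses by user and keeping the users that have more than one. A plain-Python sketch of that grouping follows. The row layout of (user id, email, primary flag) is an assumption made for the example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def users_with_multiple_primaries(
        rows: Iterable[Tuple[int, str, bool]]) -> List[int]:
    """Return sorted ids of users that have more than one primary email."""
    counts = Counter(user_id for user_id, _email, primary in rows if primary)
    return sorted(user_id for user_id, n in counts.items() if n > 1)
```

The returned ids could serve as the sample items in a report.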
Comment 10•8 years ago (Reporter)
Issue: Translation is a redirect
Detection: Document.objects.filter(is_redirect=True).exclude(parent__isnull=True).count() > 0
What breaks:
- A user starting on the English page can get to the translation via the redirect, but they can't get back to the English page via the translation menu
- The edit menu is for a stand-alone document, not for a translated document. This makes it impossible to keep the translation up to date
- It is unclear if a user is able to fix the parent relation using the editing interface
This is the root cause for bug 1410245, covering one instance. There appear to be 468 documents in this state, so there is probably an underlying bug in page moves.
Comment 11•8 years ago (Reporter)
The bug for #c10 is bug 1412045, not the one reported.
Comment 12•8 years ago (Reporter)
Issue: Translation is associated with a redirect
Detection: Document.objects.exclude(is_redirect=True).filter(parent__is_redirect=True).count() > 0
What breaks:
- The translation becomes out of sync with the parent document.
- It becomes possible to create a fresh translation of the page. The translation that is left behind then needs to be deleted, losing those contributions.
There are currently 1382 of these documents. Bug 1416376 is an example.
Comment 13•8 years ago (Reporter)
Issue: Slug ends in a slash
Detection: Document.objects.filter(slug__endswith='/').count() > 0
What breaks:
- The resolver removes the ending slash and redirects to the URL without, which may not exist.
I'm unable to create these documents using the new page interface, but there may be a bug in the page move code or other interfaces. There are currently 8 of these documents.
Comment 14•8 years ago (Reporter)
Bug 1418880 tracks the root cause of "Translation is associated with a redirect" in comment 12.
Updated•6 years ago
Priority: -- → P2
Comment 15•5 years ago
MDN Web Docs' bug reporting has now moved to GitHub. From now on, please file content bugs at https://github.com/mdn/sprints/issues/ and platform bugs at https://github.com/mdn/kuma/issues/.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX
Updated•5 years ago
Product: developer.mozilla.org → developer.mozilla.org Graveyard