Clean up localization repositories: obsolete files, comments-only files

RESOLVED FIXED

Type: defect
Reporter: flod
Assigned: flod
Firefox Tracking Flags: not tracked
Attachments: 1

I'd like to run a script to clean up years of accumulated mess in the existing localization repositories.

Reason:
* Pootle, unlike Pontoon, used to commit files with only comments to the repository.
* Some locales enabled projects like Thunderbird or SeaMonkey years ago, and files have been lingering in the repository for years.

Strategy:
* Only look at localizable files: .dtd, .properties, .ini, .inc, .ftl
* Remove obsolete files, i.e. localizable files that are not available in gecko-strings
* Parse all localizable files, and remove files that don't include any parsable string

The reason to not look at all files is that we have legitimate files around (e.g. dictionaries).
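
A minimal sketch of the strategy above, for illustration only. This is not the attached clean_hg_repository.py: the function names are hypothetical, and the regex-based check is a crude stand-in for a real parser of these formats. It assumes a local clone of the locale repository and a local clone of gecko-strings with matching relative paths, and it only reports candidates without deleting anything.

```python
import os
import re

# Extensions listed in the strategy; everything else (e.g. dictionaries) is ignored.
LOCALIZABLE_EXTENSIONS = (".dtd", ".properties", ".ini", ".inc", ".ftl")

# Rough approximations of "this file defines at least one string".
# Assumption: good enough for a sketch, not a replacement for a real parser.
DTD_COMMENT = re.compile(r"<!--.*?-->", re.S)
STRING_PATTERNS = {
    ".dtd": re.compile(r"<!ENTITY\s+[\w.-]+\s+(\"[^\"]*\"|'[^']*')\s*>"),
    ".properties": re.compile(r"^[ \t]*[^#!\s=:][^=:\n]*[=:]", re.M),
    ".ini": re.compile(r"^[ \t]*[^;#\[\s=][^=\n]*=", re.M),
    ".inc": re.compile(r"^[ \t]*#define[ \t]+\S+", re.M),
    ".ftl": re.compile(r"^-?[a-zA-Z][\w-]*[ \t]*=", re.M),
}


def has_strings(path):
    """Return True if the file seems to contain at least one string."""
    ext = os.path.splitext(path)[1]
    pattern = STRING_PATTERNS.get(ext)
    if pattern is None:
        # Unknown format: err on the side of keeping the file.
        return True
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    if ext == ".dtd":
        # Entities inside comments must not count as strings.
        text = DTD_COMMENT.sub("", text)
    return bool(pattern.search(text))


def find_removable(locale_root, reference_root):
    """Return (obsolete, comments_only) lists of relative paths."""
    obsolete, comments_only = [], []
    for dirpath, dirnames, filenames in os.walk(locale_root):
        # Skip Mercurial metadata.
        dirnames[:] = [d for d in dirnames if d != ".hg"]
        for name in filenames:
            if not name.endswith(LOCALIZABLE_EXTENSIONS):
                continue
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, locale_root)
            if not os.path.isfile(os.path.join(reference_root, rel_path)):
                # Not available in the reference repository: obsolete.
                obsolete.append(rel_path.replace(os.sep, "/"))
            elif not has_strings(full_path):
                # Available in the reference, but nothing to parse.
                comments_only.append(rel_path.replace(os.sep, "/"))
    return sorted(obsolete), sorted(comments_only)
```

Keeping the two lists separate makes it easier to review each kind of removal before committing anything.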

I've been running this script on a few locales without noticing any particular issues, but I'd like a sanity check before running it on all repositories.

Example: https://hg.mozilla.org/l10n-central/af/rev/c29ec316124a
Attachment #8956111 - Attachment mime type: text/x-python-script → text/plain
Comment on attachment 8956111 [details]
clean_hg_repository.py

lgtm.
Attachment #8956111 - Flags: feedback?(l10n) → feedback+
Thanks, I've run the script on all repositories, spot-checked some of them, and couldn't find anything wrong.

I will likely run this script from time to time.
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Flod, I just stumbled over this when checking the suite l10n de directory. Could you please back out all changes made to the l10n suite directories? We still need the removed localizations in suite for SeaMonkey 2.57. We are late with this one and need to take the central ones.
Flags: needinfo?(francesco.lodolo)
(In reply to Frank-Rainer Grahl (:frg) from comment #4)
> Flod just stumbled over then when checking the suite l10n de directory.
> Could you please back out all changes made to the l10n suite directories. We
> still need the removed localizations in suite for SeaMonkey 2.57. We are
> late with this one and need to take the central ones.

I can't back out changes only for one folder, and this has been done for almost 9 months by now (4 times).

SeaMonkey is localized using the cross-channel repository as a base, and that covers central, beta, and release. If these files have been removed, they are also not available for localization in tools like Pontoon. The same is true for shared files, which I assume SeaMonkey needs, so I'm not exactly sure how that would work.

I'm sorry, but I don't see how I can help. If really needed, I can exclude /suite files from future cleanups.
Flags: needinfo?(francesco.lodolo)
We had a solution for the shared files until now. Please exclude suite from further cleanups.
Flags: needinfo?(francesco.lodolo)
OK, already updated the script to exclude /suite files.
Flags: needinfo?(francesco.lodolo)
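For illustration, an exclusion like this can be a check on the first path component before a file is considered for removal. This is a hypothetical sketch, not the actual change made to the script; the helper name and the EXCLUDED_TOP_LEVEL set are assumptions.

```python
from pathlib import PurePosixPath

# Assumption: exclusions are expressed as top-level folders of the l10n repository.
EXCLUDED_TOP_LEVEL = {"suite"}


def is_excluded(rel_path, excluded=EXCLUDED_TOP_LEVEL):
    """Return True if a repository-relative path sits under an excluded folder."""
    parts = PurePosixPath(rel_path).parts
    return bool(parts) and parts[0] in excluded
```

Matching on the first component only means a file such as mail/suite.properties would still be checked.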
(In reply to Francesco Lodolo [:flod] from comment #7)
> OK, already updated the script to exclude /suite files.

Side note: this won't prevent localizers from removing them on their own, especially if they work on hg directly, and see these files reported as obsolete.
> and see these files reported as obsolete.

Yes, but at least for de, files still in use in the current tree were removed, e.g. editBookmarkOverlay.dtd. The script is clearly broken.
How does the script detect if files are in use? If we take:
https://hg.mozilla.org/l10n-central/en-GB/rev/d1009e2538b4
This removes both brand.properties and brand.dtd which both seem to be very much in use:
https://dxr.mozilla.org/comm-central/search?q=brand.properties+path%3Asuite&redirect=false
https://dxr.mozilla.org/comm-central/search?q=brand.dtd+path%3Asuite&redirect=false
or am I missing something?
It uses the cross-channel repository as a reference. If a file is missing there, it's considered obsolete and removed.
https://hg.mozilla.org/l10n/gecko-strings/

Please look at the paths before assuming that the script is broken.

(In reply to Frank-Rainer Grahl (:frg) from comment #9)
> > and see these files reported as obsolete.
> 
> Yes but at least for de files still in use in the current tree were removed.
> eg. editBookmarkOverlay.dtd. The script is clearly broken.

comm-central has comm/suite/locales/en-US/chrome/common/places/editBookmarkOverlay.dtd which maps to suite/chrome/common/places/editBookmarkOverlay.dtd in l10n repositories.
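
The mapping described above can be sketched as follows. This generalizes from the single example in this comment: it assumes the leading comm/ prefix and the locales/en-US segment are simply dropped, and the function name is hypothetical.

```python
def source_to_l10n_path(source_path):
    """Map an en-US source path to its expected path in an l10n repository."""
    parts = source_path.split("/")
    # Assumption: paths coming from comm-central carry a leading "comm" folder.
    if parts and parts[0] == "comm":
        parts = parts[1:]
    # Drop the consecutive "locales/en-US" segment.
    for i in range(len(parts) - 1):
        if parts[i] == "locales" and parts[i + 1] == "en-US":
            return "/".join(parts[:i] + parts[i + 2:])
    raise ValueError(f"No locales/en-US segment in {source_path!r}")


print(source_to_l10n_path(
    "comm/suite/locales/en-US/chrome/common/places/editBookmarkOverlay.dtd"
))
```

A file that moves between source folders (here from /bookmarks to /places) therefore shows up at a different path in the l10n repository, and the copy at the old path no longer has a counterpart in the reference.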

For French
https://hg.mozilla.org/l10n-central/fr/file/tip/suite/chrome/common/places/editBookmarkOverlay.dtd

That's a brand new file in a different path. Even if the content of the file was the same, nobody bothered to go through the l10n repos and copy the file over to the new path, to avoid localizers seeing it as a new file (and translating it from scratch).

For German, this file was removed: suite/chrome/common/bookmarks/editBookmarkOverlay.dtd (note the /bookmarks subfolder, not /places)
in this changeset: https://hg.mozilla.org/l10n-central/de/rev/440235f6e077

The file in the new path is not available in German because nobody is working on that localization, as far as I can tell from the history of the repository.

The same goes for branding
https://hg.mozilla.org/comm-central/rev/0d275d681a5d
(In reply to Francesco Lodolo [:flod] from comment #11)
> It uses the cross-channel repository as a reference. If the file is missing
> there, it's obsolete and removed
> https://hg.mozilla.org/l10n/gecko-strings/

I presume that repo was populated from somewhere?

When an en-US localisable file is worked on, does that automagically make changes to the gecko-strings repo, or does something extra have to be done?
That repo includes all strings that exist in mozilla/comm central/beta/release. For more details see the old announcement post
https://groups.google.com/d/msg/mozilla.dev.l10n/_K2j7Sg0Orw/5A4GzHAKAAAJ

Generation is done via scripts manually (for now), typically once or twice a week. You can get an idea from the pushlog.
I see that for TB it has picked up mail/branding/thunderbird/locales/en-US
but not the SM equivalent, suite/branding/seamonkey/locales/en-US.
Is that coming from the l10n.ini file in suite/locales?
(In reply to Frank-Rainer Grahl (:frg) from comment #4)
> Flod just stumbled over then when checking the suite l10n de directory.
> Could you please back out all changes made to the l10n suite directories. We
> still need the removed localizations in suite for SeaMonkey 2.57. We are
> late with this one and need to take the central ones.

2.57 is from https://hg.mozilla.org/releases/comm-esr60/file/tip/suite/config/version.txt ?

ESR isn't included in cross-channel so far. We intentionally did not do that for 60 because of the Fluent migration work.

I suggest using l10n-central from the Firefox 60 revisions, which should also include the SeaMonkey work at the time. https://product-details.mozilla.org/1.0/l10n/Firefox-60.0-build2.json has a list that should be good to use, pick your subset from that?
Yes, Thunderbird has its branding directories included in l10n.toml, while SeaMonkey doesn't. So the SeaMonkey branding is not exposed to localization or the l10n build system.
(In reply to Francesco Lodolo [:flod] from comment #7)
> OK, already updated the script to exclude /suite files.

Just to make sure we're all on the same page: I won't be removing those files, but they will still be invisible to localizers via Pontoon, because en-US doesn't have them. They won't be able to see missing strings, fix errors, and so on.

That's hardly a scenario that makes sense, given that most locales don't have access to Mercurial these days.
Flod, sorry, I screwed up. The script was correct. I forgot about two changes done to comm-central. But please leave suite out of the cleanup.