Closed Bug 1842780 Opened 2 years ago Closed 1 year ago

Use OpusCleaner for sentence cleaning

Categories

(Firefox :: Translations, enhancement)

Tracking

()

RESOLVED MOVED

People

(Reporter: marco, Unassigned)

Details

We should switch from our own cleaning code to OpusCleaner for quality reasons. Different datasets have different cleaning needs: some need detokenizing, for example, while others need no cleaning at all.
In addition, bicleaner removed short sentences, so our models perform badly on short input.
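
To illustrate what per-dataset cleaning could look like, here is a minimal sketch in plain Python. It does not use OpusCleaner's actual API or configuration format; the dataset names, the filter functions, and the `PIPELINES` mapping are all hypothetical. The point is only that each dataset gets its own list of cleaning steps and that no step drops pairs for being short.

```python
import re
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]                      # (source sentence, target sentence)
Filter = Callable[[List[Pair]], List[Pair]]


def drop_empty(pairs: List[Pair]) -> List[Pair]:
    """Drop pairs where either side is empty or whitespace only."""
    return [(s, t) for s, t in pairs if s.strip() and t.strip()]


def detokenize(pairs: List[Pair]) -> List[Pair]:
    """Crude detokenizer: remove the space before common punctuation."""
    def fix(text: str) -> str:
        return re.sub(r"\s+([.,!?;:])", r"\1", text)
    return [(fix(s), fix(t)) for s, t in pairs]


# Hypothetical dataset names; each dataset lists only the steps it needs.
PIPELINES: Dict[str, List[Filter]] = {
    "tokenized_corpus": [drop_empty, detokenize],
    "clean_corpus": [],                     # already clean: no steps at all
}


def clean(dataset: str, pairs: List[Pair]) -> List[Pair]:
    """Apply the dataset's own pipeline; unknown datasets only drop empties."""
    for step in PIPELINES.get(dataset, [drop_empty]):
        pairs = step(pairs)
    return pairs
```

Because nothing here filters on length, a short pair such as ("Hi", "Hallo") passes through unchanged, which is the behaviour we want to keep for short-sentence quality.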

Blocks: 1843184
Blocks: 1843185
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → MOVED
No longer blocks: 1842762, 1843184, 1843185