Closed Bug 1825849 Opened 1 year ago Closed 9 months ago

Investigate performance, memory usage and translation quality without using a lexical shortlist

Categories

(Firefox :: Translations, enhancement)

enhancement

Tracking

()

RESOLVED MOVED

People

(Reporter: marco, Unassigned)

Details

Attachments

(1 file)

A lexical shortlist should improve performance but decrease translation quality. We should check what happens if we drop it.
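For context, here is a minimal sketch of what a lexical shortlist does. It assumes the common Marian-style design: each source token maps to a short list of likely target-token ids (typically derived from word alignments), and the union of those lists plus a set of always-allowed frequent tokens forms the candidate set for the sentence. The decoder then scores only that set instead of the full vocabulary. The table, token ids, logits and function names below are hypothetical toy values. The example also shows how quality can suffer: if the token the full model prefers is missing from the candidate set, the decoder can never produce it.

```python
import math
from typing import Dict, Iterable, List, Sequence


def build_shortlist(source_tokens: Iterable[str],
                    lexical_table: Dict[str, List[int]],
                    frequent_ids: Iterable[int]) -> List[int]:
    """Candidate target ids: frequent ids plus the candidates of every source token."""
    allowed = set(frequent_ids)
    for token in source_tokens:
        # Source tokens missing from the (hypothetical) table add no extra candidates.
        allowed.update(lexical_table.get(token, ()))
    return sorted(allowed)


def log_softmax_over(logits: Sequence[float], ids: Iterable[int]) -> Dict[int, float]:
    """Log-probabilities normalized over the given ids only (log-sum-exp for stability)."""
    ids = list(ids)
    peak = max(logits[i] for i in ids)
    log_z = peak + math.log(sum(math.exp(logits[i] - peak) for i in ids))
    return {i: logits[i] - log_z for i in ids}


def best_token(logits: Sequence[float], ids: Iterable[int]) -> int:
    """Greedy choice restricted to `ids`."""
    scores = log_softmax_over(logits, ids)
    return max(scores, key=scores.get)


# Hypothetical toy setup: a 6-token target vocabulary.
lexical_table = {"gato": [3], "preto": [4]}
frequent_ids = [0, 1]
logits = [0.1, 0.2, 0.0, 1.5, 1.0, 2.0]

shortlist = build_shortlist(["gato", "preto"], lexical_table, frequent_ids)
print(shortlist)                               # [0, 1, 3, 4]
print(best_token(logits, range(len(logits))))  # 5: preferred by the full softmax
print(best_token(logits, shortlist))           # 3: id 5 is not in the shortlist
```

In a real decoder the saving comes from computing only the shortlisted rows of the output projection. This toy receives all logits up front and only restricts normalization and selection, so it illustrates the effect on the output, not the speedup.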

Summary: Investigate performance and memory usage without using a lexical shortlist → Investigate performance, memory usage and translation quality without using a lexical shortlist

Using the native engine, removing the lexical shortlist reduces memory usage by 4 MB when translating a simple sentence. Translating the WMT13 test set takes essentially the same time with or without the shortlist. Translation quality is marginally better without it (some odd morphology disappears, and some overly literal translations become slightly less literal).

The severity field is not set for this bug.
:anatal, could you have a look please?

For more information, please see the auto_nag documentation.

Flags: needinfo?(anatal)
Flags: needinfo?(anatal)
Severity: -- → S3

I'm currently running quality evaluation tests.

For what purpose? I don't see a reason for that.

Flags: needinfo?(anatal)

I was thinking of a couple of things:

  1. We recently introduced COMET and need to understand it better, and this quick analysis would help (e.g. we can see which alternative translations of the same sentence yield similar scores and which yield very different ones, and we can make sure there is no improvement or regression that COMET scores miss);
  2. To double-check the results and make sure there really is no difference in translation quality, so that we can drop the shortlist and reduce memory usage. In theory there should be a difference, so a result that says otherwise deserves a second look.
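One way to do the per-sentence comparison from point 1 is sketched below. It assumes that segment-level COMET scores for both outputs have already been computed, with one float per segment in test-set order. The sketch pairs the scores, computes the per-segment difference, and lists the segments whose score moves the most. No COMET API is called here. The function names and the numbers in the example are hypothetical illustrations, not results from this bug.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class SegmentDelta:
    index: int                # 0-based segment position in the test set
    with_shortlist: float     # score of the translation produced with the shortlist
    without_shortlist: float  # score of the translation produced without it

    @property
    def delta(self) -> float:
        """Positive when the translation without the shortlist scored higher."""
        return self.without_shortlist - self.with_shortlist


def largest_differences(scores_with: Sequence[float],
                        scores_without: Sequence[float],
                        threshold: float) -> List[SegmentDelta]:
    """Segments whose score changes by at least `threshold`, biggest change first."""
    if len(scores_with) != len(scores_without):
        raise ValueError("score lists must be aligned segment by segment")
    deltas = [SegmentDelta(i, a, b)
              for i, (a, b) in enumerate(zip(scores_with, scores_without))]
    flagged = [d for d in deltas if abs(d.delta) >= threshold]
    return sorted(flagged, key=lambda d: abs(d.delta), reverse=True)


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


# Made-up scores for four segments (not real COMET output).
with_sl = [0.80, 0.75, 0.90, 0.60]
without_sl = [0.81, 0.65, 0.90, 0.72]

flagged = largest_differences(with_sl, without_sl, threshold=0.05)
print([(d.index, round(d.delta, 2)) for d in flagged])  # [(3, 0.12), (1, -0.1)]
print(round(mean(with_sl), 4), round(mean(without_sl), 4))  # 0.7625 0.77
```

The two views answer different questions. Corpus means can stay close even when individual segments move noticeably. The flagged segments are the ones worth reading side by side, as was done with the FLORES diff in this bug.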

We did this together yesterday; could you attach the diff?

Next, we should figure out if what we did (simply removing the shortlist file) is enough, or if some changes during training are actually necessary.

The left file contains the translations produced with the lexical shortlist; the right file contains the ones produced without it.

diff -U 0 flores-dev.bergamot.en  prod/pt-en/flores-dev.bergamot.en
--- flores-dev.bergamot.en	2023-04-26 09:52:52.032673323 -0700
+++ prod/pt-en/flores-dev.bergamot.en	2023-04-14 00:11:14.000000000 -0700
@@ -1 +1 @@
-On Monday, scientists at Stanford University School of Medicine announced the invention of a new diagnostic tool that can classify cells by type: a tiny printable chip that can be manufactured using standard inkjet printers for possibly about a penny each.
+On Monday, scientists at Stanford University School of Medicine announced the invention of a new diagnostic tool that can classify cells by type: a tiny printable chip that can be manufactured using standard inkjet printers for possibly about a penny of dollars each.
@@ -12 +12 @@
-The record for direct clashes of Nadal against the Canadian athlete is 7–2.
+The record of direct clashes of Nadal against the Canadian athlete is 7–2.
@@ -23 +23 @@
-Despite these accusations, Ma won with off guard through a speech advocating closer ties with mainland China.
+Despite these accusations, Ma won with off hand through a speech advocating closer ties with mainland China.
@@ -31 +31 @@
-The prison became notorious after the discovery of assault on prisoners after U.S. forces took over.
+The prison became notorious after the discovery of a prisoner abuse after U.S. forces took over.
@@ -45 +45 @@
-Fred is the strongest tropical cyclone on record in the southern and eastern end of the Atlantic since the creation of satellite imagery and is only the third significant hurricane recorded east of the 35th ?medium.
+Fred is the strongest tropical cyclone on record in the southern and eastern end of the Atlantic since the creation of satellite imagery and is only the third significant hurricane recorded east of the 35th ?medician.
@@ -50 +50 @@
-The New Zealand police had trouble using their speed radar weapons to see how fast Mr. Reid was going because of how low Black Beauty is, and the only time the police managed to score Mr. Reid was when he decreased to 160km/h.
+The New Zealand police had trouble using their radar speed guns to see just how fast Mr. Reid was going because of how low Black Beauty is, and the only time the police managed to score Mr. Reid was when he decreased to 160km/h.
@@ -59 +59 @@
-Police said Lo Piccolo had the advantage because it had been Provenzano's right-hand man in Palermo and his larger experience earned him the respect of the older generation of bosses, who followed Provenzano's policy of maintaining maximum discretion and, at the same time, strengthening his network of power.
+Police said Lo Piccolo had the advantage because it had been Provenzano's right-hand man in Palermo and his larger experiment earned him the respect of the older generation of bosses, who followed Provenzano's policy of maintaining maximum discretion and, at the same time, strengthening his power network.
@@ -69 +69 @@
-Tenants in Lockwood Gardens believe there must be 40 other families or more facing evictions, since they learned that OHA police are also investigating other public housing properties in Oakland that may have been caught in a housing scheme.
+Tenants in Lockwood Gardens believe there must be 40 other families or more facing evictions since they learned that OHA police are also investigating other public housing properties in Oakland that may have been caught on a housing scheme.
@@ -74 +74 @@
-A few weeks ago, after the information released by journalist Makis Triantafylopoulos on his popular TV program "Zougla", on the Alpha TV station, Petros Mantouvalos, a parliamentarian and lawyer, abdicated as members of his cabinet had been involved with illegal bribery and corruption.
+A few weeks ago, after the information released by journalist Makis Triantafylopoulos on his popular TV show "Zougla", on the Alpha TV station, Petros Mantouvalos, a parliamentarian and lawyer, abdicated as members of his cabinet had been involved in illegal bribery and corruption.
@@ -83 +83 @@
-It was believed that this bird of hot blooded, completely feathered, walked straight on two legs with claws like a Velociraptor.
+It was believed that this bird of hot blooded, completely feathered, walked straight on top of two legs with claws like a Velociraptor.
@@ -126 +126 @@
-Since its inception, The Onion has become a true empire of parodies of news, with a print edition, a website that attracted 5,000,000 unique visitors in the month of October, personal ads, a 24-hour news network, podcasts and a recently launched worldwide atlas, called Our Stupid World.
+Since its inception, The Onion has become a true empire of parody of news, with a print edition, a website that attracted 5,000,000 unique visitors in the month of October, personal ads, a 24-hour news network, podcasts and a recently launched worldwide atlas, called Our Stupid World.
@@ -242 +242 @@
-The Chandrayaan-1 unmanned lunar orbiter ejected its Moon Impact Probe (MIP), which crossed the Moon's surface at 1.5 km per second (3,000 miles per hour), and successfully landed near the lunar south pole.
+The Chandrayaan-1 unmanned lunar orbiter ejected its Moon Impact Probe (MIP), which crossed the Moon's surface at 1.5 km per second (3,000 miles per hour), and successfully landed near the lunar south pole.
@@ -340 +340 @@
-He was immediately treated by the on-call medical staff and transported to a local hospital where he later died.
+He was immediately served by medical staff on duty and transported to a local hospital, where he later died.
@@ -355 +355 @@
-The blade of ice skates has two edges with a cavity in the middle. They allow better adoring to the ice, even when tilted.
+The blade of ice skates has two edges with a cavity in the middle. They allow better arturing to ice, even when tilted.
@@ -432 +432 @@
-Turkey is surrounded by seas on three sides: the Aegean Sea to the west, the Black Sea to the north, and the Mediterranean Sea to the south.
+Turkey is surrounded by seas on three sides: the Aegean Sea to the west, the Black Sea to the north and the Mediterranean Sea to the south.
@@ -456 +456 @@
-This is a minor problem as lens manufacturers achieve better standards in production.
+This is if a minor problem, as lens manufacturers achieve better standards in production.
@@ -466 +466 @@
-Even surrounded by ruins thousands of years old, it is easy to remember the noises and aromas of long-lost battles, almost hear the noise of hooves on the paving stones and even smelling the dread succumbing from the dungeons.
+Even surrounded by ruins thousands of years old, it is easy to remember the noises and aromas of long-lost battles, almost hear the noise of hooves on the pavement stones and even smell the dread succumbing from the dungeons.
@@ -491 +491 @@
-Although in the end, Krushchev sent tanks to restore order, he ceded some economic demands and agreed to appoint the ninth-right Wladyslaw Gomulka as the new prime minister.
+Although in the end, Krushchev sent tanks to restore order, he ceded some economic demands and agreed to appoint the ninth-grade Wladyslaw Gomulka as the new prime minister.
@@ -519 +519 @@
-The human hand is shorter than the foot, with more straight phalanges.
+The human hand is shorter than foot, with more straight phalanges.
@@ -623 +623 @@
-Thus, recalling previous cases of entrepreneurial behavior and resulting successes has encouraged people to allow themselves new changes and new directions to the local church.
+Thus, remembering previous cases of entrepreneurial behavior and resulting successes has encouraged people to allow themselves new changes and new directions for the local church.
@@ -652,2 +652,2 @@
-As mentioned earlier, although the term "Eskimo" continues to be accepted in the U.S., it is considered pejorative by many Arctic peoples outside the U.S., especially in Canada.
-Although you can hear the word used by Greenland natives, its use should be avoided by foreigners.
+As mentioned earlier, although the term "Esquimo" continues to be accepted in the US, it is considered pejorative by many Arctic peoples outside the U.S., especially in Canada.
+Although you may hear the word used by Greenland natives, its use should be avoided by foreigners.
@@ -688 +688 @@
-The fabulous riches of the tomb are no longer in it, but have been removed to the Egyptian Museum in Cairo.
+The fabulous riches of the tomb are no longer found in it, but have been removed to the Egyptian Museum in Cairo.
@@ -703 +703 @@
-Your passport must be valid for at least 6 months of your travel dates. A one-way or round trip is required to prove your length of stay.
+Your passport must be valid for at least 6 months of your travel dates. A one-way or round trip passage is required to prove your length of stay.
@@ -722 +722 @@
-The island was first inhabited by the Taíno and the Caribbean. The Caribbean were a speaking people of the Arawak language that arrived there around 10,000 BC.
+The island was first inhabited by the Taíno and the Caribbean. The Caribbean were a speaking people of the Azare language that arrived there around 10,000 BC.
@@ -800 +800 @@
-Moose are not inherently aggressive, but will defend themselves if they perceive a threat.
+The elk are not inherently aggressive, but will defend themselves if they perceive a threat.
@@ -821 +821 @@
-This is something you always need to remember to avoid disappointment, or perhaps even a heartbreak with the local ways of doing things.
+This is something you always need to remember to avoid disappointments, or perhaps even a heartbreak with local ways of doing things.
@@ -832 +832 @@
-On night trains, passports can be collected by the driver so that you do not have their sleep interrupted.
+On night trains, passports may be collected by the driver so that you do not have their sleep interrupted.
@@ -893 +893 @@
-It is snow compacted with cracks filled and marked by flags. It can only be traveled by specialized tractors, transporting sleds with fuel and supplies.
+It is snow compacted with cracks filled and marked by flags. It can only be traversed by specialized tractors, carrying sleds with fuel and supplies.
@@ -969 +969 @@
-Through the centuries, in the wrinkled and clegyy landscape, terraces have been built, carefully, by the people, going to the cliffs overlooking the ocean.
+Through the centuries, in the wrinkled and clear landscape, terraces have been built, carefully, by people, going to the cliffs overlooking the ocean.
@@ -980 +980 @@
-This can be dangerous if the traveler chases the mirage, wasting precious energy and the remaining water.
+This can be dangerous if the traveler pursues the mirage, wasting precious energy and the remaining water.
@@ -995 +995 @@
-Only a few airlines still offer grieving fur, which give a small discount on the cost of the last minute trip for a funeral.
+Only a few airlines still offer grieving fares, which give a small discount on the cost of the last minute trip for a funeral.
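A zero-context diff like the one above can also be produced and summarized in Python with the standard-library difflib module. This assumes both translation files hold one segment per line in the same order. The function names and file names are hypothetical. difflib's hunk grouping can occasionally differ from GNU diff's, so the output is similar to `diff -U 0` but not guaranteed to be identical.

```python
import difflib
from pathlib import Path
from typing import List


def read_segments(path: str) -> List[str]:
    """One translated segment per line, line endings stripped."""
    return Path(path).read_text(encoding="utf-8").splitlines()


def zero_context_diff(with_lines: List[str], without_lines: List[str],
                      from_name: str, to_name: str) -> List[str]:
    """Unified diff with no context lines (n=0), similar to `diff -U 0`."""
    return list(difflib.unified_diff(with_lines, without_lines,
                                     fromfile=from_name, tofile=to_name,
                                     n=0, lineterm=""))


def count_changed(with_lines: List[str], without_lines: List[str]) -> int:
    """Number of aligned segments whose translation differs."""
    if len(with_lines) != len(without_lines):
        raise ValueError("files must contain the same number of segments")
    return sum(1 for a, b in zip(with_lines, without_lines) if a != b)


# Toy input standing in for the two files; real use would call, e.g.,
# read_segments("with-shortlist.en") and read_segments("without-shortlist.en").
with_sl = ["The sea is calm.", "Moose are not aggressive.", "Thank you."]
without_sl = ["The sea is calm.", "The elk are not aggressive.", "Thank you."]

for line in zero_context_diff(with_sl, without_sl, "with.en", "without.en"):
    print(line)
print(count_changed(with_sl, without_sl), "of", len(with_sl), "segments changed")
```

count_changed counts segments rather than hunks. Adjacent changed segments, such as the pair at 652-653 above, share a single hunk in the diff but are counted separately here.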
No longer blocks: fx-translation
Blocks: 1842762
Severity: S3 → --
Type: defect → enhancement
No longer blocks: 1842762
Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → MOVED