analyze human vs. machine translation quality

Status: RESOLVED FIXED
Product: Input
Component: General
Priority: P2
Severity: normal
Opened: 4 years ago
Closed: 4 years ago

People: (Reporter: willkg, Assigned: willkg)

Whiteboard: u=analyzer c=translations p=2 s=input.2014q3

Attachments: 2

We're going to switch Firefox OS from automated machine translation to automated human translation. Once we do that, we should wait a week, grab all of those responses, run them through machine translation, and build a report that compares the two.

This will let us figure out whether automated human translation is worth it and/or whether we want to use it only in specific scenarios, like certain languages.
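
For concreteness, the comparison step could look roughly like this. This is a minimal sketch: get_nonenus_responses and machine_translate are hypothetical helpers standing in for the real Input/Gengo plumbing, which may look different.

import csv
import datetime

# Hypothetical helpers standing in for the real Input/Gengo plumbing.
from translations import get_nonenus_responses, machine_translate

def build_comparison_report(path, days=7):
    """Write a CSV comparing human and machine translations side by side."""
    since = datetime.datetime.now() - datetime.timedelta(days=days)
    with open(path, 'w') as fp:
        writer = csv.writer(fp)
        writer.writerow(['id', 'locale', 'original description',
                         'human translation', 'machine translation'])
        for resp in get_nonenus_responses(product='Firefox OS', since=since):
            writer.writerow([resp.id, resp.locale, resp.description,
                             resp.translated_description,
                             machine_translate(resp.description)])
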
This is a P2. I should do this at the end of this week, so I'm grabbing it.
Assignee: nobody → willkg
Priority: -- → P2
I generated a CSV of locales, guessed languages, descriptions, human translations, and machine translations for all the non-en-US Firefox OS responses from the last 7 days. It now occurs to me that I'm a monoglot and I haven't got a clue what I can do with this.

Any ideas on what we can do with this CSV to get some kind of quality assessment?
Flags: needinfo?(mgrimes)
I say we ask some of the experts. Adding Ralph and Hermina. I propose we have them look at the machine vs. human translations for the languages they speak and tell us whether the human translation is considerably better.
Flags: needinfo?(rdaub)
Flags: needinfo?(mgrimes)
Flags: needinfo?(hcondei)

Comment 4

4 years ago
Sure! Send them to me (and Ralph) and I'll gladly have a look. I don't see any attachment here. Just one note: I'll be out of the office next week, so please expect a slight delay in getting back to you.
Flags: needinfo?(hcondei)
Created attachment 8462588 [details]
translation_analysis.csv
Hermina, Ralph: The file is attached.

Column explanations:

1. id: This is the id for the response in the db.

2. locale: This is the locale the user was using when filling out the form.

3. guessed language: We run the original description through Gengo's language guesser and use that as the source language. This column holds the language the guesser guessed.

4. original description: This is what the user sent to us.

5. human translation: This is the translation we got back from a human translator at Gengo.

6. machine translation: This is the translation we got back from the Gengo machine translator.

Let me know if this format is helpful. If it's not, I can probably throw a web app around this to make it easier to work through.
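
If anyone wants to slice the file programmatically, the stdlib csv module handles it fine. A quick sketch, assuming the first row is a header using the column names listed above:

import csv

# Print each response's human and machine translations side by side.
with open('translation_analysis.csv') as fp:
    for row in csv.DictReader(fp):
        print('%s (%s)' % (row['id'], row['guessed language']))
        print('  human:   %s' % row['human translation'])
        print('  machine: %s' % row['machine translation'])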


My first glance suggests the machine translations are pretty mediocre, to the point where they're probably not wildly useful except for term-frequency analysis, and even then it's meh. Having said that, I'm a monoglot, so I have no idea.


The outcome of this analysis should be threefold:

1. We should decide whether we want to continue doing human translation with Gengo for now. We probably want to specify a time in the future when we re-evaluate.

2. We should mull over whether we should try another human translation system and/or spend the time now to build the infrastructure for contributor-based translations.

3. If we're seeing issues with the human translations that would go away by adjusting the instructions for translation, we should figure those out and then update the comment. (We send a "comment" to the translator with each job that gives them some context and some instruction on what to do in edge cases like obvious spam and nonsense posts.)

Comment 7

4 years ago
Hi Will,

I reviewed some feedback along with human and machine translations. 

You are correct, the machine translation is not ready for prime time yet. As a very rough estimate, about 20% of the machine translations I saw did not make any sense. This is mostly due to misspellings or messy grammar in the original feedback.

The human translations I saw were of fair/good quality, and the translators seemed to put in a good effort to make sense of the feedback.

Example -
* Input: Actualizaciones urgente se quedo atras este equipo tantas aplicaciones nuevas que no fumcionan aqui.
* Human translation: Urgent updates this equipment is outdated so many new applications that don't work here. 	
* Machine translation: Updates urgent stayed behind this team so many new applications that do not fumcionan here.

I also noted that in some cases, the language detected was incorrect (e.g. ID=4530567 was marked as Italian, but it's Spanish).

I hope this helps. Please let me know if you have any other questions or if you'd like more feedback!

- Ralph
Flags: needinfo?(rdaub)

Comment 8

4 years ago
Created attachment 8467410 [details]
translation_analysis_(user_friendly_format).ods

Translation analysis file, turned into a spreadsheet for easier reading. =)
Given Ralph's response, I contend the outcome of this bug, per the items in comment #6, is as follows:

Item 1: We should stick with human translations for now.

Item 2: I have no idea. Seems like this human translation system is good enough?

Item 3: There are no additional instructions we can provide the Gengo translators that would yield better translations.


Regarding .csv vs. .ods: .csv is a spreadsheet format, and it's more ubiquitous and useful than .ods. That's why I attached a .csv originally.

Comment 10

4 years ago
Regarding .csv - 

Ah, I didn't think about opening .csv directly with a spreadsheet software. Makes sense, thanks for the heads up! =)
Matt: Want to chime in regarding comment #6, comment #7, and comment #9?
Flags: needinfo?(mgrimes)
Hey guys. Sorry for the radio silence. I agree with Will's assessment. Just a couple of notes:

1. We should stick with human translation for now. Woo!

2. I do think this service seems to be good enough. However, I do think allowing contributors to fix translations could be a great option. I don't think that's a priority right now, though, since we don't have a huge contributor base at this point. Can we keep that on the back burner, or do we need to do it now or never?

3. The only thought I had was perhaps we could ask the translators to call out when language detection fails? I don't know enough about the mechanics of the system to know if that is doable or even helpful.
Flags: needinfo?(mgrimes)
We don't have the infrastructure for contributor-performed tasks yet. We'd need to build that before we can do any contributor-based stuff. That's going to be a really important part of Input, but I think the earliest I can get to that is 2014q4.

After we get the infrastructure done, we have a few translation-related things we could use help with:

1. translating feedback

2. fixing existing translations

3. identifying incorrect language detection

We could add some code to handle item 3 by re-translating those things.
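
Roughly something like this; a pure sketch, since none of these helpers or fields exist in Input yet:

# Sketch for item 3: re-translate a response whose source language was
# misdetected. queue_translation_job and the response fields here are
# hypothetical; this is the infrastructure we'd have to build.
from translations.tasks import queue_translation_job  # hypothetical

def retranslate_misdetected(response, correct_language):
    """A contributor flagged a bad language guess, so redo the translation."""
    response.guessed_language = correct_language
    response.translated_description = ''  # throw out the bad translation
    response.save()
    # Re-queue the job with the corrected source language.
    queue_translation_job(response, src_lang=correct_language)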

Anyhow, that all needs infrastructure.

For now, I'm going to call this bug done and add a stub to the wiki about doing Contributor Infrastructure in 2014q4.

Thank you everyone!
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED