Closed Bug 1190356 Opened 5 years ago Closed 5 years ago

gengo changed machine translation system (was [traceback] KeyError: 'jobs' in gengo machine_translate code)

Categories

(Input Graveyard :: Submission, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

Details

(Whiteboard: p=3 s=input.2015q3)

At 4:46am this morning, we started getting a ton of errors from celery like this:


Task fjord.translations.tasks.translate_task with id 62844a0b-c432-4281-bd14-410e97d5de10 raised exception:
"KeyError('jobs',)"


Task was called with args: ('fjord.feedback.models:Response:5529714', u'gengo_machine', u'fr-FR', 'description', u'en', 'translated_description') kwargs: {}.

The contents of the full traceback was:

Traceback (most recent call last):
  File "/data/www/input.mozilla.org/input/vendor/src/celery/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/data/www/input.mozilla.org/input/vendor/src/celery/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/data/www/input.mozilla.org/input/fjord/translations/tasks.py", line 35, in translate_task
    translate(instance, system, src_lang, src_field, dst_lang, dst_field)
  File "/data/www/input.mozilla.org/input/fjord/translations/utils.py", line 32, in translate
    trans_system.translate(instance, src_lang, src_field, dst_lang, dst_field)
  File "/data/www/input.mozilla.org/input/fjord/translations/models.py", line 255, in translate
    instance.id, lc_src, dst_lang, text)
  File "/data/www/input.mozilla.org/input/fjord/translations/gengo_utils.py", line 83, in _requires_keys
    return fun(self, *args, **kwargs)
  File "/data/www/input.mozilla.org/input/fjord/translations/gengo_utils.py", line 276, in machine_translate
    job = resp['response']['jobs']['job_1']
KeyError: 'jobs'


Need to look into this pronto.
Grabbing it now. I have no clue how big it is, so I'm making it 2 points for now, figuring it'll take me a day to work out what the issue is and how to fix it.
Assignee: nobody → willkg
Status: NEW → ASSIGNED
Whiteboard: p=2 s=input.2015q3
Gengo is sending back this response:

{u'opstat': u'ok', u'response': {u'order_id': 1689961, u'job_count':
1, u'credits_used': u'0.00', u'currency': u'USD'}}

That looks like the order is being handled asynchronously now for machine translation requests. I can't really tell.
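If the missing 'jobs' key really does mean the order was queued rather than completed, the two response shapes could be told apart along these lines. This is only a sketch: extract_machine_job is a hypothetical helper, not the code in gengo_utils.py, and the contents of the inline job dict are placeholders.

```python
def extract_machine_job(resp):
    """Return the inline job dict, or None if the response carries no jobs.

    Hypothetical helper for illustration; not the actual fjord code.
    """
    body = resp.get('response', {})
    jobs = body.get('jobs')
    if jobs is None:
        # New-style response: only order metadata (order_id, job_count, ...)
        # came back, so there is nothing to read a translation from yet.
        return None
    return jobs.get('job_1')


# Shape taken from the response quoted above.
queued = {'opstat': 'ok',
          'response': {'order_id': 1689961, 'job_count': 1,
                       'credits_used': '0.00', 'currency': 'USD'}}

# Old-style shape implied by the traceback; the job contents are placeholders.
inline = {'opstat': 'ok',
          'response': {'jobs': {'job_1': {'placeholder': 'job data'}}}}

queued_job = extract_machine_job(queued)
inline_job = extract_machine_job(inline)
```

With a check like this the caller gets None for the new-style response instead of raising KeyError.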

Anyhow, Matt gave me contact information for Spencer, our contact at Gengo. I emailed Spencer the details of the issue and he forwarded them to Adam, so I should find out what's going on soon.

In the meantime, I shut off Gengo translations, since it's possible the requests aren't really failing and are having side effects on our account.
Got an email back from Adam: they changed machine translation so it's no longer synchronous.

We have two options:

1. implement a callback system where Gengo's system calls a URL in Input to tell Input that the translation is done

2. rework our machine translation system so that it does what our human translation system does

Of the two, the second is way easier. Further, it probably lets us ditch some code, so that's nice.
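A rough sketch of what the second option amounts to: place the order, remember which record it belongs to, and pick up finished translations in a later pass. Everything here is an assumption made for illustration. FakeGengoClient and its submit/finish/fetch methods are stand-ins, not the Gengo API, and request_translation and collect_translations are hypothetical names, not the functions in fjord.

```python
class FakeGengoClient:
    """In-memory stand-in for an API client; all method names are hypothetical."""

    def __init__(self):
        self._orders = {}
        self._next_id = 1

    def submit(self, text, src_lang, dst_lang):
        # Queue the order and return its id without translating anything.
        order_id = self._next_id
        self._next_id += 1
        self._orders[order_id] = {'done': False, 'translated': None}
        return order_id

    def finish(self, order_id, translated):
        # Test helper simulating the service completing the order later.
        self._orders[order_id].update(done=True, translated=translated)

    def fetch(self, order_id):
        # Return the translated text, or None while the order is pending.
        order = self._orders[order_id]
        return order['translated'] if order['done'] else None


def request_translation(client, pending, record_id, text, src_lang, dst_lang):
    """Phase 1: place the order and remember which record it belongs to."""
    pending[client.submit(text, src_lang, dst_lang)] = record_id


def collect_translations(client, pending, results):
    """Phase 2, run periodically: store finished translations, keep the rest."""
    for order_id in list(pending):
        translated = client.fetch(order_id)
        if translated is not None:
            results[pending.pop(order_id)] = translated


client = FakeGengoClient()
pending, results = {}, {}
request_translation(client, pending, 5529714, 'bonjour', 'fr-FR', 'en')
collect_translations(client, pending, results)   # order not finished yet
still_pending = dict(pending)
client.finish(1, 'hello')
collect_translations(client, pending, results)   # now it gets picked up
```

The point of the split is that nothing in the first phase expects a translation to come back in the response, which is exactly the assumption that broke.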

I'm going to keep this a 2 point bug and work on it today.
Summary: [traceback] KeyError: 'jobs' in gengo machine_translate code → gengo changed machine translation system (was [traceback] KeyError: 'jobs' in gengo machine_translate code)
Landed in https://github.com/mozilla/fjord/commit/b3c4684564bd71eb3f19a9abcd67e84cfe6c818b

I'm going to push it to stage and then prod and test it live. It's really hard to test otherwise.
I pushed it to stage and prod on Wednesday and then spent a couple of days fixing minor issues. Amongst other things, the Gengo API tells you the supported languages for human translation, but not machine translation. So every time it fails in machine translation, I have to add another exception in the code.
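The per-language exceptions could be kept as a hand-maintained set that is checked before a machine translation request goes out. This is a sketch under assumptions: the names are hypothetical, and the language codes are placeholders, not the languages that actually failed.

```python
# Placeholder codes only; not Gengo's actual unsupported languages.
MACHINE_UNSUPPORTED = {'xx', 'yy'}


def can_machine_translate(src_lang, human_supported):
    """True if the API lists the language and it has not failed for machine
    translation before. Hypothetical helper for illustration."""
    return src_lang in human_supported and src_lang not in MACHINE_UNSUPPORTED


# Stand-in for the supported-language list the API reports for human translation.
human_supported = {'fr', 'de', 'xx'}

ok_fr = can_machine_translate('fr', human_supported)
ok_xx = can_machine_translate('xx', human_supported)   # listed, but known to fail
ok_zz = can_machine_translate('zz', human_supported)   # not listed at all
```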

Pretty sure we're mostly fine now, though.

Marking this FIXED.
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
This ended up being a 3 point bug. Spent a bunch of time on analysis and follow-up in addition to the time rewriting the code and tests.
Whiteboard: p=2 s=input.2015q3 → p=3 s=input.2015q3
Product: Input → Input Graveyard