Downloading TMX through wget/curl interrupts mid-way
Categories: Webtools Graveyard :: Pontoon, defect, P2
Tracking: Not tracked
People: Reporter: Pike, Assigned: jotes
Attachments: 1 file
Downloading TMX files from Pontoon with wget (and possibly curl) gets interrupted mid-transfer.
Downloading the same files in the browser works.
Let's find out why.
Comment 1•5 years ago
When the file is downloaded from the command line, we seem to be hitting the Heroku router's H18 error:
Mar 23 19:44:22 mozilla-pontoon heroku/router sock=backend at=error code=H18 desc="Server Request Interrupted"
method=GET path="/de/all-projects/de.all-projects.tmx" host=pontoon.mozilla.org
request_id=9a3949f1-13d3-48dc-89a8-17542c64b94d fwd="109.182.195.45" dyno=web.1 connect=1ms service=52243ms
status=503 bytes= protocol=https
This is what the curl output looks like:
Leopold:Downloads mathjazz$ curl -o de.all-projects.tmx https://pontoon.mozilla.org/de/all-projects/de.all-projects.tmx
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 12.5M 0 12.5M 0 0 243k 0 --:--:-- 0:00:52 --:--:-- 205k
curl: (18) transfer closed with outstanding read data remaining
Downloading via the browser is much faster. I wonder if we're hitting a request timeout when using the command line.
Relevant code:
https://github.com/mozilla/pontoon/blob/master/pontoon/base/views.py#L683
Comment 2•5 years ago
Right, the files are malformed: they are neither valid .tmx nor valid XML.
tail -n 3 de.all-projects.tmx*
==> de.all-projects.tmx <==
<tu tuid="firefox-os-20:apps/sms/sms.properties:thread-header-textmany" srclang="en-US">
<tuv xml:lang="en-US">
<seg>{{name}} (+{{n}})</seg>
==> de.all-projects.tmx.1 <==
<tu tuid="focus-for-android:app.po:preference_privacy_stealth_summaryhide-webpages-when-switching-apps-and-block-taking-screenshots" srclang="en-US">
<tuv xml:lang="en-US">
<seg>Hide webpages when
==> de.all-projects.tmx.2 <==
<tuv xml:lang="en-US">
<seg>spans {{0}} columns</seg>
</tu
The motivation for using wget or curl is to download the .tmx for all locales at once.
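A quick way to spot truncated downloads like the ones above is to check whether each file parses as XML from start to end. The following is a minimal sketch, not part of Pontoon; the helper name is_well_formed is hypothetical, and it only tests XML well-formedness, not conformance to the TMX format.

```python
import xml.etree.ElementTree as ET


def is_well_formed(path):
    """Return True if the file at `path` parses as XML from start to end.

    Hypothetical helper: it checks well-formedness only, not TMX validity.
    """
    try:
        # iterparse streams the file, so a multi-megabyte TMX is not
        # loaded into memory at once; we only care whether parsing succeeds.
        for _event, element in ET.iterparse(path):
            element.clear()
    except ET.ParseError:
        # Raised, for example, when the document ends with unclosed elements.
        return False
    return True
```

A file cut off in the middle of a <seg> element, as in the tail output above, would make this return False.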
Comment 3•5 years ago
:mathjazz, can I take this bug?
Comment 4•5 years ago
I'm trying to reproduce this problem locally, and my version of curl (7.65.3) downloads uncorrupted .tmx files. Maybe the recent update to Django 2 helped somehow?
The transfer speed is still slow in my case (Chrome is much faster).
Comment 5•5 years ago
Assigned.
I hit the same error:
Leopold:pontoon mathjazz$ curl -o de.all-projects.tmx https://pontoon.mozilla.org/de/all-projects/de.all-projects.tmx
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 12.3M 0 12.3M 0 0 241k 0 --:--:-- 0:00:52 --:--:-- 218k
curl: (18) transfer closed with outstanding read data remaining
Comment 6•5 years ago
Comment 7•5 years ago
Hey,
I wrote a small Gist with my thoughts about this issue: https://gist.github.com/jotes/3bf97a2542153b2ad0dc24f1bffa6f59
The problem is caused by the slow transfer speed between the client and the server. When the transfer of a TMX file takes more than 50 seconds, Gunicorn terminates the worker that is streaming the file to the client.
In comparison, when a user runs curl --compressed (which uses gzip), everything works fine and the data is transferred in a few seconds.
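For anyone scripting the download rather than calling curl, the same idea can be sketched in Python. This is an illustration under assumptions, not code from Pontoon: the function names are hypothetical, and the request asks for gzip only, whereas curl --compressed may advertise more encodings depending on how it was built. Python's urllib does not decompress responses on its own, so the sketch does it explicitly and raises an error if the gzip stream ends early.

```python
import urllib.request
import zlib


def gunzip_chunks(chunks):
    """Yield decompressed bytes from an iterable of gzip-compressed chunks."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decompressor.decompress(chunk)
        if data:
            yield data
    tail = decompressor.flush()
    if tail:
        yield tail
    if not decompressor.eof:
        # The gzip trailer never arrived, so the transfer was cut short.
        raise EOFError("gzip stream ended before its trailer")


def download_tmx(url, destination, chunk_size=64 * 1024):
    """Request `url` with gzip encoding and write the decoded body to `destination`."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request) as response, open(destination, "wb") as out:
        chunks = iter(lambda: response.read(chunk_size), b"")
        # Only decompress when the server actually answered with gzip.
        if response.headers.get("Content-Encoding") == "gzip":
            chunks = gunzip_chunks(chunks)
        for chunk in chunks:
            out.write(chunk)
```

Calling download_tmx once per locale, with the URL pattern from the curl commands above and a locale list of your own, would cover the "all locales at once" use case.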
I tried a couple of things to fix this issue (e.g. using an asynchronous Gunicorn worker), but most of them didn't bring any visible improvement.
I think there are two solutions that are worth considering for now:
- Low-hanging fruit: increase Gunicorn's worker timeout. Unfortunately, I don't know how big a change that is for Mozilla's Pontoon instance.
- A harder one: introduce a CDN (AWS CloudFront/S3?) and periodically upload the TMX files there.
I've created a small PR that can help with assessing the first solution.
Comment 8•5 years ago
We've increased the timeout to 120 seconds:
https://github.com/mozilla/pontoon/pull/1643
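For reference, one way to express a 120-second worker timeout is in a Gunicorn configuration file, which is plain Python. This is only a sketch: the file name and location are assumptions, and the PR above may set the value differently (for example with the --timeout command-line option).

```python
# gunicorn.conf.py (assumed file name and location; the PR above may
# configure the timeout differently, for example on the command line)

# Seconds a worker may stay silent before the Gunicorn master kills
# and restarts it.
timeout = 120
```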
We've updated the docs to include a note about curl --compressed:
https://github.com/mozilla-l10n/localizer-documentation/pull/180