Closed Bug 1624408 Opened 6 months ago Closed 3 months ago

Downloading TMX through wget/curl interrupts mid-way


(Webtools :: Pontoon, defect, P2)

(Not tracked)

(Reporter: Pike, Assigned: jotes)

(1 file)

When downloading TMX files from Pontoon through wget (and possibly curl), the download is interrupted partway through.

Downloading the same file in a browser works.

Let's find out why.

When the file is downloaded from the command line, we seem to be hitting Heroku error H18:

Mar 23 19:44:22 mozilla-pontoon heroku/router sock=backend at=error code=H18 desc="Server Request Interrupted" method=GET path="/de/all-projects/de.all-projects.tmx" request_id=9a3949f1-13d3-48dc-89a8-17542c64b94d fwd="" dyno=web.1 connect=1ms service=52243ms status=503 bytes= protocol=https
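To match router errors against slow downloads, the key=value fields in a line like the one above can be pulled out programmatically. A minimal Python sketch, assuming the space-separated key=value layout shown here; parse_router_line and service_seconds are hypothetical helpers, not part of Pontoon or any Heroku tooling:

```python
import shlex


def parse_router_line(line: str) -> dict:
    """Collect key=value fields from a Heroku router log line.

    shlex.split keeps quoted values such as desc="Server Request Interrupted"
    together as one token and strips the quotes; tokens without '=' (the
    timestamp, app name, and source) are ignored.
    """
    fields = {}
    for token in shlex.split(line):
        if "=" in token:
            key, _, value = token.partition("=")
            fields[key] = value
    return fields


def service_seconds(fields: dict) -> float:
    """Convert the router's service time (e.g. '52243ms') to seconds."""
    value = fields["service"]
    if not value.endswith("ms"):
        raise ValueError(f"unexpected service value: {value!r}")
    return int(value[:-2]) / 1000


line = (
    'Mar 23 19:44:22 mozilla-pontoon heroku/router sock=backend at=error code=H18 '
    'desc="Server Request Interrupted" method=GET '
    'path="/de/all-projects/de.all-projects.tmx" '
    'request_id=9a3949f1-13d3-48dc-89a8-17542c64b94d fwd="" dyno=web.1 '
    'connect=1ms service=52243ms status=503 bytes= protocol=https'
)
fields = parse_router_line(line)
print(fields["code"], fields["status"], service_seconds(fields))
```

Filtering many such lines on code == "H18" and looking at service_seconds would show whether the interrupted requests cluster around the same duration.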

This is what the curl output looks like:

Leopold:Downloads mathjazz$ curl -o de.all-projects.tmx
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12.5M    0 12.5M    0     0   243k      0 --:--:--  0:00:52 --:--:--  205k
curl: (18) transfer closed with outstanding read data remaining

Downloading via the browser is much faster. I wonder if we're hitting a request timeout when using the command line.
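The curl numbers are consistent with the router log: dividing the total size by the average speed gives roughly the 52 seconds curl reports as spent, and the service=52243ms the router logged. A quick check, assuming curl's progress meter uses 1024-based units ("M" as MiB, "k" as KiB); transfer_seconds is a hypothetical helper:

```python
def transfer_seconds(size_mib: float, speed_kib_per_s: float) -> float:
    """Estimate transfer time from curl's progress-meter figures.

    Assumes 1024-based units, so 1 MiB = 1024 KiB.
    """
    return size_mib * 1024 / speed_kib_per_s


# (Total size in MiB, average download speed in KiB/s) from the two curl runs
runs = [(12.5, 243), (12.3, 241)]
for size, speed in runs:
    print(f"{size} MiB at {speed} KiB/s -> {transfer_seconds(size, speed):.1f} s")
```

Both runs come out at roughly 52 to 53 seconds, just past the 50-second cutoff identified later in this bug.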

Relevant code:

Right, the files are malformed: they're neither valid TMX nor valid XML.

tail -n 3 de.all-projects.tmx*

==> de.all-projects.tmx <==
		<tu tuid="firefox-os-20:apps/sms/" srclang="en-US">
			<tuv xml:lang="en-US">
				<seg>{{name}} (+{{n}})</seg>

==> de.all-projects.tmx.1 <==
		<tu tuid="focus-for-android:app.po:preference_privacy_stealth_summaryhide-webpages-when-switching-apps-and-block-taking-screenshots" srclang="en-US">
			<tuv xml:lang="en-US">
				<seg>Hide webpages when 

==> de.all-projects.tmx.2 <==
			<tuv xml:lang="en-US">
				<seg>spans {{0}} columns</seg>
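Since a cut-off download leaves elements unclosed, one way to tell a complete file from a truncated one is to try parsing it. A minimal sketch using Python's standard xml.etree.ElementTree; is_complete_tmx is a hypothetical helper, and the check only confirms well-formed XML with a tmx root element, not full TMX validity:

```python
import sys
import xml.etree.ElementTree as ET


def is_complete_tmx(data: bytes) -> bool:
    """Return True if data parses as XML with a <tmx> root element.

    A download cut off mid-stream fails to parse because its open elements
    are never closed. The namespace prefix, if any, is ignored.
    """
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return False
    return root.tag.rsplit("}", 1)[-1] == "tmx"


if __name__ == "__main__":
    # Usage: python check_tmx.py de.all-projects.tmx de.all-projects.tmx.1 ...
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            status = "ok" if is_complete_tmx(f.read()) else "truncated or invalid"
        print(f"{path}: {status}")
```

Run against the three files above, each would be reported as truncated or invalid, since each one ends inside an open seg element.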

The motivation for using wget or curl is to download the .tmx files for all locales at once.

Priority: -- → P2

Can I take this bug?

I'm trying to reproduce this problem locally, and my version of curl (7.65.3) downloads uncorrupted .tmx files. Maybe the recent update to Django 2 helped somehow?
The transfer speed is still slow in my case (Chrome is much faster).


I hit the same error:

Leopold:pontoon mathjazz$ curl -o de.all-projects.tmx
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12.3M    0 12.3M    0     0   241k      0 --:--:--  0:00:52 --:--:--  218k
curl: (18) transfer closed with outstanding read data remaining

Assignee: nobody → poke


I wrote a small Gist with my thoughts about this issue:

The problem is caused by the slow transfer speed between the client and the server. When a TMX transfer takes more than 50 seconds, Gunicorn terminates the worker that is streaming the file to the client.
In comparison, when a user runs curl --compressed (which requests a compressed response, e.g. gzip), everything works fine and the data is transferred in a few seconds.
I tried a couple of things to fix this issue (e.g. using an asynchronous Gunicorn worker), but most of them didn't bring any visible improvement.
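The effect of curl --compressed comes down to bytes on the wire. The sketch below uses a synthetic, deliberately repetitive TMX-like payload (not real Pontoon data) to compare raw and gzip sizes, and the transfer times they imply at the 243 KiB/s seen in the curl output, assuming the link speed stays the same. Real exports will compress by a different factor:

```python
import gzip

# Synthetic TMX-like translation unit; real Pontoon exports will differ.
unit = (
    '\t\t<tu tuid="project:file.po:key{n}" srclang="en-US">\n'
    '\t\t\t<tuv xml:lang="en-US">\n\t\t\t\t<seg>Source string {n}</seg>\n\t\t\t</tuv>\n'
    '\t\t\t<tuv xml:lang="de">\n\t\t\t\t<seg>Zielzeichenkette {n}</seg>\n\t\t\t</tuv>\n'
    '\t\t</tu>\n'
)
raw = "".join(unit.format(n=i) for i in range(10_000)).encode("utf-8")
compressed = gzip.compress(raw)

ratio = len(raw) / len(compressed)
print(f"raw: {len(raw):,} bytes, gzip: {len(compressed):,} bytes, ratio: {ratio:.1f}x")

# At a fixed link speed, transfer time scales with the bytes actually sent.
speed = 243 * 1024  # bytes/s, the average from the curl output in this bug
print(f"uncompressed: {len(raw) / speed:.1f} s, compressed: {len(compressed) / speed:.1f} s")
```

Markup-heavy XML like TMX repeats the same tags and attributes for every unit, which is why gzip tends to shrink it substantially.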

I think there are two solutions that are worth considering for now:

  • A low-hanging fruit: increase Gunicorn's worker timeout. Unfortunately, I don't know how big a change that is for Mozilla's Pontoon instance.
  • A harder one: introduce a CDN (AWS CloudFront/S3?) and periodically upload the TMX files there.

I've created a small PR that can help assess the first solution.

We've increased the timeout to 120 seconds:
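The PR itself is not reproduced here, and how Pontoon's deployment actually passes the setting is not shown in this bug. As a sketch only, assuming the settings live in a Gunicorn config file (the file name gunicorn.conf.py is an assumption), the change is a single setting; Gunicorn also accepts it as --timeout 120 on the command line:

```python
# gunicorn.conf.py (hypothetical file name; load it with: gunicorn -c gunicorn.conf.py ...)
# Gunicorn config files are plain Python modules: each setting is a
# module-level variable.

# Workers silent for more than this many seconds are killed and restarted.
# As described in this bug, a worker streaming a large TMX to a slow client
# can exceed a shorter limit before the response finishes.
timeout = 120
```

With 120 seconds, a 12.5 MiB export needs an average of at least about 107 KiB/s (12.5 × 1024 / 120 ≈ 106.7) to finish in time, so very slow connections could still hit the limit.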

We've updated the docs to include a note about curl --compressed:
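Since the original motivation was to fetch every locale's TMX at once, a script-based equivalent of looping over curl --compressed might look like the sketch below. It uses only Python's standard library; BASE_URL is a hypothetical placeholder (the host in the original curl commands is not preserved here), the path pattern is taken from the router log's path field, and tmx_url, decode_body, download_tmx, and download_all are hypothetical helpers:

```python
import gzip
import urllib.request
from typing import List, Optional

# Hypothetical placeholder; point this at the real Pontoon instance.
BASE_URL = "https://pontoon.example"


def tmx_url(base_url: str, locale: str) -> str:
    """Build the all-projects TMX URL for one locale."""
    return f"{base_url.rstrip('/')}/{locale}/all-projects/{locale}.all-projects.tmx"


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo gzip content encoding; urllib does not decompress automatically."""
    if (content_encoding or "").strip().lower() == "gzip":
        return gzip.decompress(body)
    return body


def download_tmx(base_url: str, locale: str) -> bytes:
    """Fetch one locale's TMX, asking the server for a gzip-compressed response."""
    request = urllib.request.Request(
        tmx_url(base_url, locale),
        headers={"Accept-Encoding": "gzip"},
    )
    # timeout applies to each blocking socket operation, not the whole download.
    with urllib.request.urlopen(request, timeout=60) as response:
        return decode_body(response.read(), response.headers.get("Content-Encoding"))


def download_all(base_url: str, locales: List[str]) -> None:
    """Save <locale>.all-projects.tmx for every locale in the list."""
    for locale in locales:
        with open(f"{locale}.all-projects.tmx", "wb") as f:
            f.write(download_tmx(base_url, locale))
```

After setting BASE_URL to the real host, calling download_all(BASE_URL, ["de", "fr"]) would write de.all-projects.tmx and fr.all-projects.tmx to the current directory.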

Closed: 3 months ago
Resolution: --- → FIXED