make mozilla-download and mozilla-get-url timeout in case the FTP server doesn't answer

RESOLVED FIXED

Status

RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: julienw, Assigned: julienw)

Tracking

({intermittent-failure})

unspecified
x86_64
Linux
intermittent-failure

Firefox Tracking Flags

(b2g-v2.1 fixed)

Details

Attachments

(5 attachments)

We should have a verbose mode for mozilla-download and enable it in the Makefile.
Flags: needinfo?(yurenju.mozilla)
(Assignee)

Comment 4

4 years ago
It happens a lot these days.

I'm not sure the issue is the lack of feedback... Travis was downloading really fast last time I checked. Maybe the server has an issue?

Maybe we should just timeout and retry.
See Also: → bug 979435
(Assignee)

Comment 5

4 years ago
And actually, it looks like the issue is not the download but the retrieval of the correct URL, because we don't have the URL printed...
(Assignee)

Comment 6

4 years ago
Created attachment 8450198 [details] [review]
debug github PR

Hey Yuren, can you merge this if you're fine with it?

I hope this will give us more data points...
Attachment #8450198 - Flags: review?(yurenju.mozilla)
(Slightly related, I've filed bug 1025942 to work around this issue that I've never really dug into)
(Assignee)

Comment 8

4 years ago
(In reply to Anthony Ricaud (:rik) from comment #7)
> (Slightly related, I've filed bug 1025942 to workaround this issue that I've
> never really dug into)

It's not the same issue; in your case, the program finishes. In our case, it doesn't.
Comment on attachment 8450198 [details] [review]
debug github PR

merged.

https://github.com/mozilla-b2g/gaia/commit/38d193693e84164300fa83881228cf98ff947db6
Attachment #8450198 - Flags: review?(yurenju.mozilla) → review+
Flags: needinfo?(yurenju.mozilla)
(Assignee)

Updated

4 years ago
Summary: mozilla-download gives no feedback while downloading, and makes travis fail → make mozilla-download and mozilla-get-url timeout in case the FTP server doesn't answer
(Assignee)

Comment 10

4 years ago
I've changed the title of this bug.

In case it times out, the caller can decide to restart (in the tests, we can retry up to 3 or 5 times, like TBPL does btw).
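
A minimal sketch of that timeout-and-retry idea, assuming a promise-based caller. `withTimeout` and `retryWithTimeout` are hypothetical names, not functions from mozilla-download or mozilla-get-url, and the attempt count and delay are placeholders:

```javascript
// Hypothetical helper: rejects if `promise` does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((resolve, reject) => {
    timer = setTimeout(() => reject(new Error('timed out after ' + ms + ' ms')), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical helper: calls `task` up to `attempts` times, each attempt bounded by `ms`.
// `task` receives the attempt number (starting at 1) and must return a promise.
async function retryWithTimeout(task, attempts, ms) {
  let lastError;
  for (let i = 1; i <= attempts; i++) {
    try {
      return await withTimeout(task(i), ms);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}
```

Note that the timeout here only stops waiting; it does not cancel the underlying request, so a real version would also need to close the connection before retrying.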
Shyam: Do you know who could help us troubleshoot this FTP issue?
Flags: needinfo?(shyam)
Jake: Looking for a contact here to troubleshoot, poking at different IT people I know. You can ping Julien or me on #gaia (or needinfos).
And of course, I forgot to needinfo Jake in comment 12
Flags: needinfo?(nmaul)
(Assignee)

Comment 14

4 years ago
Created attachment 8453148 [details] [review]
mozilla-get-url PR

I don't exactly know if this will make things better, but at least this still works.
Assignee: nobody → felash
Attachment #8453148 - Flags: review?(yurenju.mozilla)

Comment 15

4 years ago
Need context here... no idea what this is or what it's trying to do.


(In reply to Julien Wajsberg [:julienw] from comment #4)
> Maybe we should just timeout and retry.

IMO this should be standard practice for anything working over the Internet. Any network really, but especially the Internet. One should always assume the network is broken, or will break at any time. Even if there's an underlying problem that can be solved this time, this should still be implemented because there will inevitably be underlying problems that we cannot solve.

I don't have context as to what this is attempting to do (from the output in the first 3 comments, looks like an Hg clone/pull, but then people are mentioning the FTP cluster) so maybe retrying is not easy somehow?

Hopefully this will happen again with debug output so we can tell better what's up. :)


(In reply to Julien Wajsberg [:julienw] from comment #5)
> And actually, it looks like it's not the issue is not the download but the
> retrieval of the correct URL. Because we don't have the URL printed...

Where is the URL retrieved from?
Flags: needinfo?(nmaul)
(Assignee)

Comment 16

4 years ago
Created attachment 8453165 [details] [review]
mozilla-download github PR

I tested using "npm link".

At least it still works as before, except we now have dots while downloading. I don't exactly know if the timeout changes do anything; I don't really know how to test them...
Attachment #8453165 - Flags: review?(yurenju.mozilla)
(Assignee)

Comment 17

4 years ago
Hey Jake,

here is an example of something not working: [1]

You can see it working in [2] (you need to unfold line 498). Basically, we navigate the FTP server so that we can find the correct build to retrieve according to some requirements.

In the debug output, we can only see that "ls" never returns. I really think the connection setup itself is timing out, but I don't have much more information.

Locally (either in the Mozilla office or at home) I never had any issue, but on Travis it has been happening a lot for some weeks.

I added some PRs here to add timeouts so that we can do the retry dance.

[1] https://travis-ci.org/mozilla-b2g/gaia/jobs/29506420
[2] https://travis-ci.org/mozilla-b2g/gaia/jobs/29506417


Hope this helps, thank you for your help!
Flags: needinfo?(nmaul)
Comment on attachment 8453148 [details] [review]
mozilla-get-url PR

Looks good, but I'm not a peer/owner for this module, so I'm redirecting the review to James Lal.

As for the timeout, I think 1 minute is a little bit short if the timeout is for downloading, but if it's a connection timeout then 1 minute is okay.
Attachment #8453148 - Flags: review?(yurenju.mozilla)
Attachment #8453148 - Flags: review?(jlal)
Attachment #8453148 - Flags: feedback+
Attachment #8453165 - Flags: review?(yurenju.mozilla) → review+
Attachment #8453165 - Flags: review?(jlal)
Attachment #8453165 - Flags: review+
Attachment #8453165 - Flags: feedback+
(Assignee)

Comment 19

4 years ago
(In reply to Yuren [:yurenju] from comment #18)
> Comment on attachment 8453148 [details] [review]
> mozilla-get-url PR
> 
> looks good but I'm not peer/owner for this module so redirect review to
> James Lal.
> 
> and for timeout, I thinkg 1 minute is a little bit short if the timeout is
> for downloading, but if it's for connecting timeout and that 1 minute is
> okay.

I think it's 1 minute while we receive nothing, not 1 minute for the whole download.
Julien is right; from the node docs:

>Sets the socket to timeout after timeout milliseconds of inactivity on the socket. By default net.Socket do not have a timeout.
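
To illustrate the behaviour quoted from the docs, here is a small sketch using only the `net` module. `readWithIdleTimeout` is a made-up helper, not the code from the PRs. The point is that the timer is reset by socket activity, and that the 'timeout' event alone does not close the connection:

```javascript
const net = require('net');

// Hypothetical helper: reads everything the server sends, and rejects if the
// socket sees no activity for `idleMs` milliseconds.
function readWithIdleTimeout(port, host, idleMs) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    const socket = net.connect(port, host);
    // Inactivity timeout: the countdown restarts whenever data flows.
    socket.setTimeout(idleMs);
    // 'timeout' is only a notification; we have to destroy the socket ourselves.
    socket.on('timeout', () => {
      socket.destroy(new Error('idle for ' + idleMs + ' ms'));
    });
    socket.on('data', (chunk) => chunks.push(chunk));
    socket.on('end', () => resolve(Buffer.concat(chunks).toString()));
    socket.on('error', reject);
  });
}
```

So with a 1 minute value, a slow but steady download keeps going, while a download that stalls for a full minute is aborted.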
(Assignee)

Comment 23

4 years ago
Thanks, now I need to do the node_modules dance. ;)
(Assignee)

Comment 25

4 years ago
Created attachment 8454378 [details] [review]
gaia-node-modules PR

Will land this without review but I'll wait for a Travis+Gaia Try before landing gaia.
(Assignee)

Comment 27

4 years ago
Created attachment 8454379 [details] [review]
gaia PR

waiting for travis and gaia-try
(Assignee)

Comment 28

4 years ago
master: da08a225b02aac0abfa0c8028e7885efc700bf40

will see how this works before uplifting to other branches.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
(Assignee)

Updated

4 years ago
status-b2g-v2.1: --- → fixed
(Assignee)

Comment 29

4 years ago
For Jake: we still have the issue with the FTP server though.

If the timeout works well, I'll file a separate bug to retry the download if this fails.
(Assignee)

Comment 30

4 years ago
Looks like it was not enough; we still have the issue.

I've seen the issue when downloading xulrunner with wget too. Reading wget's manual, I see there are 3 different timeouts: for DNS, connect and read. I think the timeouts I added in mozilla-get-url and mozilla-download are only "read" type timeouts, so maybe the issue is with DNS or the initial connection?
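
For reference, wget exposes these as `--dns-timeout`, `--connect-timeout` and `--read-timeout`. A rough sketch of how the setup phase could be bounded separately from the read phase in node follows. `connectWithTimeouts` is a hypothetical helper, not what the PRs do, and it is untested against the real FTP server; it lumps DNS lookup and TCP connect under a single deadline:

```javascript
const net = require('net');

// Hypothetical helper: bounds the connection setup (DNS lookup + TCP connect)
// with `connectMs`, then switches to an inactivity ("read") timeout of `readMs`.
// Resolves with the connected socket; the caller should attach its own
// 'error' listener to observe a later read timeout.
function connectWithTimeouts(options, connectMs, readMs) {
  return new Promise((resolve, reject) => {
    const socket = net.connect(options);
    // Plain timer covering everything that happens before 'connect'.
    const connectTimer = setTimeout(() => {
      socket.destroy(new Error('connect timed out after ' + connectMs + ' ms'));
    }, connectMs);
    socket.once('connect', () => {
      clearTimeout(connectTimer);
      // From here on, only inactivity on the socket counts.
      socket.setTimeout(readMs, () => {
        socket.destroy(new Error('read timed out after ' + readMs + ' ms'));
      });
      resolve(socket);
    });
    socket.once('error', (err) => {
      clearTimeout(connectTimer);
      reject(err);
    });
  });
}
```

If the hang really is in DNS or the initial connection, a deadline like the first one would at least turn it into an error the caller can retry on.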
(Assignee)

Comment 31

4 years ago
Mmm, in https://travis-ci.org/mozilla-b2g/gaia/jobs/29812217 we clearly see the timeout doesn't work: we received some data first (there is a dot) but it still timed out...
(In reply to Anthony Ricaud (:rik) from comment #11)
> Shyam: Do you know who could help us troubleshoot this FTP issue?

Sorry, I didn't reply sooner. CC'ing cturra and srich for info.
Flags: needinfo?(shyam)
(Assignee)

Comment 33

4 years ago
I contacted the Travis team to ask whether they can troubleshoot on their side, but no answer from them yet.

If you have commands to try there to help debug this, I'd be happy to run them in a pull request.

Comment 34

4 years ago
We've been doing some work to try to improve the FTP cluster... how has this been over the last few days/weeks?
Flags: needinfo?(nmaul)
Julien: See comment 34.
Flags: needinfo?(felash)
(Assignee)

Comment 36

4 years ago
Jake, looks like it's better now!
Flags: needinfo?(felash)