Closed Bug 1033221 Opened 7 years ago Closed 7 years ago

make mozilla-download and mozilla-get-url timeout in case the FTP server doesn't answer

Categories

(Firefox OS Graveyard :: Gaia::Build, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(b2g-v2.1 fixed)

RESOLVED FIXED
Tracking Status
b2g-v2.1 --- fixed

People

(Reporter: julienw, Assigned: julienw)

References

Details

(Keywords: intermittent-failure)

Attachments

(5 files)

46 bytes, text/x-github-pull-request
yurenju
: review+
Details | Review
54 bytes, text/x-github-pull-request
jlal
: review+
yurenju
: feedback+
Details | Review
54 bytes, text/x-github-pull-request
jlal
: review+
yurenju
: feedback+
Details | Review
56 bytes, text/x-github-pull-request
Details | Review
46 bytes, text/x-github-pull-request
Details | Review
We should have a verbose mode for mozilla-download and enable it in the Makefile.
Flags: needinfo?(yurenju.mozilla)
It happens a lot these days.

I'm not sure the issue is the lack of feedback... Travis is downloading really fast last time I checked. Maybe the server has an issue?

Maybe we should just timeout and retry.
See Also: → 979435
And actually, it looks like the issue is not the download but the retrieval of the correct URL, because we don't have the URL printed...
Attached file debug github PR
Hey Yuren, can you merge this if you're fine with it?

I hope this will give us more data points...
Attachment #8450198 - Flags: review?(yurenju.mozilla)
(Slightly related, I've filed bug 1025942 to work around this issue that I've never really dug into)
(In reply to Anthony Ricaud (:rik) from comment #7)
> (Slightly related, I've filed bug 1025942 to workaround this issue that I've
> never really dug into)

It's not the same issue; in your case, the program finishes. In our case, it doesn't.
Comment on attachment 8450198 [details] [review]
debug github PR

merged.

https://github.com/mozilla-b2g/gaia/commit/38d193693e84164300fa83881228cf98ff947db6
Attachment #8450198 - Flags: review?(yurenju.mozilla) → review+
Flags: needinfo?(yurenju.mozilla)
Summary: mozilla-download gives no feedback while downloading, and makes travis fail → make mozilla-download and mozilla-get-url timeout in case the FTP server doesn't answer
I've changed the title of this bug.

In case it times out, the caller can decide to restart (in the tests, we can retry up to 3 or 5 times, like TBPL does btw).
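
A minimal sketch of what the caller-side retry could look like. This is not code from mozilla-download or mozilla-get-url: `retry` is a hypothetical helper, and `operation` stands for whatever promise-returning call can time out.

```javascript
// Hypothetical helper, not part of mozilla-download or mozilla-get-url.
// Calls `operation` until it succeeds or `maxAttempts` attempts have failed.
async function retry(operation, maxAttempts) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // The attempt number is passed along so the caller can log it.
      return await operation(attempt);
    } catch (err) {
      // Remember the failure and try again (e.g. after a timeout).
      lastError = err;
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

In the tests, `maxAttempts` would be something like 3 or 5. The sketch retries on any error; a real caller might want to retry only on timeouts.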
Shyam: Do you know who could help us troubleshoot this FTP issue?
Flags: needinfo?(shyam)
Jake: Looking for a contact here to troubleshoot, poking at different IT people I know. You can ping Julien or me on #gaia (or needinfo us).
And of course, I forgot to needinfo Jake in comment 12
Flags: needinfo?(nmaul)
Attached file mozilla-get-url PR
I don't exactly know if this will make things better, but at least this still works.
Assignee: nobody → felash
Attachment #8453148 - Flags: review?(yurenju.mozilla)
Need context here... no idea what this is or what it's trying to do.


(In reply to Julien Wajsberg [:julienw] from comment #4)
> Maybe we should just timeout and retry.

IMO This should be standard practice for anything working over the Internet. Any network really, but especially the Internet. One should always assume the network is broken, or will break at any time. Even if there's an underlying problem that can be solved this time, this should still be implemented because there will inevitably be underlying problems that we cannot solve.

I don't have context as to what this is attempting to do (from the output in the first 3 comments, looks like an Hg clone/pull, but then people are mentioning the FTP cluster) so maybe retrying is not easy somehow?

Hopefully this will happen again with debug output so we can tell better what's up. :)


(In reply to Julien Wajsberg [:julienw] from comment #5)
> And actually, it looks like the issue is not the download but the
> retrieval of the correct URL, because we don't have the URL printed...

Where is the URL retrieved from?
Flags: needinfo?(nmaul)
I tested using "npm link".

At least it still works as before, except we now have dots while downloading. I don't exactly know if the timeout changes do anything; I don't really know how to test them...
Attachment #8453165 - Flags: review?(yurenju.mozilla)
Hey Jake,

here is an example of something not working: [1]

You can see it working in [2] (you need to unfold line 498). Basically, we navigate on the FTP server so that we can find the correct build to retrieve according to some requirements.

In the debug output, we can only see that "ls" never returns. I really think the connection setup itself is timing out, but I don't have much more information.

Locally (either in the Mozilla office or at home) I have never had any issue, but on Travis it has been happening a lot for some weeks.

I added some PRs here to add timeouts so that we can do the retry dance.

[1] https://travis-ci.org/mozilla-b2g/gaia/jobs/29506420
[2] https://travis-ci.org/mozilla-b2g/gaia/jobs/29506417


Hope this helps, thank you for your help !
Flags: needinfo?(nmaul)
Comment on attachment 8453148 [details] [review]
mozilla-get-url PR

Looks good, but I'm not a peer/owner for this module, so I'm redirecting the review to James Lal.

As for the timeout, I think 1 minute is a little bit short if the timeout is for downloading, but if it's a connection timeout then 1 minute is okay.
Attachment #8453148 - Flags: review?(yurenju.mozilla)
Attachment #8453148 - Flags: review?(jlal)
Attachment #8453148 - Flags: feedback+
Attachment #8453165 - Flags: review?(yurenju.mozilla) → review+
Attachment #8453165 - Flags: review?(jlal)
Attachment #8453165 - Flags: review+
Attachment #8453165 - Flags: feedback+
(In reply to Yuren [:yurenju] from comment #18)
> Comment on attachment 8453148 [details] [review]
> mozilla-get-url PR
> 
> Looks good, but I'm not a peer/owner for this module, so I'm redirecting the
> review to James Lal.
> 
> As for the timeout, I think 1 minute is a little bit short if the timeout is
> for downloading, but if it's a connection timeout then 1 minute is okay.

I think it's 1 minute while we receive nothing, not 1 minute for the whole download.
Julien is right; from the Node docs:

>Sets the socket to timeout after timeout milliseconds of inactivity on the socket. By default net.Socket do not have a timeout.
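
To illustrate the inactivity semantics, here is a rough sketch using Node's `net` module directly. It is not the code from the PRs: `readWithIdleTimeout` is a made-up helper, and the host and port are whatever the caller passes in.

```javascript
const net = require('net');

// Hypothetical helper, not part of mozilla-download or mozilla-get-url:
// read everything a server sends, but give up if the socket stays silent
// for `idleMs` milliseconds.
function readWithIdleTimeout(port, host, idleMs) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    const socket = net.connect({ port, host });

    // The timer restarts on socket activity, so this bounds the silence
    // between chunks, not the total transfer time.
    socket.setTimeout(idleMs);

    socket.on('timeout', () => {
      // 'timeout' is only a notification; the socket must be closed by hand.
      socket.destroy();
      reject(new Error('idle timeout after ' + idleMs + ' ms'));
    });
    socket.on('data', (chunk) => chunks.push(chunk));
    socket.on('end', () => resolve(Buffer.concat(chunks)));
    socket.on('error', reject);
  });
}
```

With this shape, a slow but steady download never trips the timer, while a server that accepts the connection and then says nothing is dropped after `idleMs`.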
Thanks, now I need to do the node_modules dance. ;)
Attached file gaia-node-modules PR
Will land this without review but I'll wait for a Travis+Gaia Try before landing gaia.
Attached file gaia PR
waiting for travis and gaia-try
master: da08a225b02aac0abfa0c8028e7885efc700bf40

will see how this works before uplifting to other branches.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
For Jake: we still have the issue with the FTP server though.

If the timeout works well, I'll file a separate bug to retry the download if this fails.
Looks like it was not enough; we still have the issue.

I've seen the issue when downloading xulrunner with wget too. Reading wget's manual, I see there are 3 different timeouts: for DNS, connect and read. I think the timeouts I added in mozilla-get-url and mozilla-download are only "read" type timeouts, so maybe the issue is with DNS or the initial connection?
Mmm, in https://travis-ci.org/mozilla-b2g/gaia/jobs/29812217 we clearly see the timeout doesn't work: we received some first data (there is a dot) but it still timed out...
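
One way to cover every phase at once (DNS, connect and read) would be an overall wall-clock deadline around the whole operation. A minimal sketch, assuming the operation is exposed as a promise; `withDeadline` is a hypothetical name and this is not what the PRs above do:

```javascript
// Hypothetical helper, not part of mozilla-download or mozilla-get-url.
// Rejects if `promise` has not settled within `ms` milliseconds, whichever
// phase (DNS lookup, connection, read) it happens to be stuck in.
function withDeadline(promise, ms, label) {
  let timer;
  const deadline = new Promise((resolve, reject) => {
    timer = setTimeout(() => {
      reject(new Error(label + ' exceeded ' + ms + ' ms'));
    }, ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards
  // so it does not keep the process alive.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}
```

Note that this only stops waiting: the underlying socket is not closed, so the caller would still have to abort the real operation when the deadline fires.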
(In reply to Anthony Ricaud (:rik) from comment #11)
> Shyam: Do you know who could help us troubleshoot this FTP issue?

Sorry, I didn't reply sooner. CC'ing cturra and srich for info.
Flags: needinfo?(shyam)
I contacted the Travis team to ask whether they can troubleshoot on their side, but no answer from them yet.

If you have commands to try there to help debug this, I'd be happy to run them in a pull request.
We've been doing some work to try to improve the FTP cluster... how has this been over the last few days/weeks?
Flags: needinfo?(nmaul)
Julien: See comment 34.
Flags: needinfo?(felash)
Jake, looks like it's better now !
Flags: needinfo?(felash)