Closed Bug 1088020 Opened 10 years ago Closed 9 years ago

mozregression network operation hangs in getting mozilla-inbound builds.

Categories

(Testing :: mozregression, defect)

x86_64
Windows 8.1
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jujjyl, Unassigned)

References

Details

Attachments

(2 files)

On a Windows mozilla-build shell, type mozregression --good=2014-09-22 --bad=2014-09-23 --bits=32 This prints: $ mozregression --good=2014-09-22 --bad=2014-09-23 --bits=32 Got as far as we can go bisecting nightlies... Ensuring we have enough metadata to get a pushlog... Getting http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/09/2014-09-22-03-02-04-mozilla-central/firefox-35.0a1.en-US.win32.txt Getting http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/09/2014-09-23-16-59-02-mozilla-central/firefox-35.0a1.en-US.win32.txt Last good revision: 5bd6e09f074e (2014-09-22) First bad revision: afc933adf723 (2014-09-23) Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=5bd6e09f074e&tochange=afc933adf723 ... attempting to bisect inbound builds (starting from previous week, to make sure no inbound revision is missed) Getting http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/09/2014-09-15-03-02-04-mozilla-central/firefox-35.0a1.en-US.win32.txt Getting inbound builds between bf5fcc0c4b27 and afc933adf723 Retrieving valid builds from u'http://inbound-archive.pub.build.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1411540853/' generated an exception: 504 Server Error: Gateway Timeout Is the server http://inbound-archive.pub.build.mozilla.org/ experiencing network trouble? I see if I point my browser to that page, that it loads after a while, but with mozregression it never finishes. Occasionally I get this error instead: ... attempting to bisect inbound builds (starting from previous week, to make sure no inbound revision is missed) Getting http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/09/2014-09-15-03-02-04-mozilla-central/firefox-35.0a1.en-US.win32.txt Getting inbound builds between bf5fcc0c4b27 and afc933adf723 Interrupted. Retrieving valid builds from u'http://inbound-archive.pub.build.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1410784509/' generated an exception: 503 Server Error: Service Unavailable: Back-end server is at capacity
Trying to bisect another regression, I'm getting $ mozregression --good=2014-10-10 --bad=2014-10-11 --bits=32 Got as far as we can go bisecting nightlies... Ensuring we have enough metadata to get a pushlog... Getting http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/10/2014-10-10-03-02-01-mozilla-central/firefox-35.0a1.en-US.win32.txt Getting http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/10/2014-10-11-03-02-03-mozilla-central/firefox-35.0a1.en-US.win32.txt Last good revision: 50b689feab5f (2014-10-10) First bad revision: f74ad36bb97b (2014-10-11) Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=50b689feab5f&tochange=f74ad36bb97b ... attempting to bisect inbound builds (starting from previous week, to make sure no inbound revision is missed) Getting http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/10/2014-10-03-03-02-04-mozilla-central/firefox-35.0a1.en-US.win32.txt Getting inbound builds between b85c260821ab and f74ad36bb97b Retrieving valid builds from u'http://inbound-archive.pub.build.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1412456373/' generated an exception: [Errno 8] _ssl.c:504: EOF occurred in violation of protocol
Yes, we should probably retry in this case. Or perhaps fix the problem properly, by creating some kind of web-service with this information pre-processed. See also bug 997388, which is essentially the same problem from a different angle (crawling so many directories in inbound-archive is *slow*).
See Also: → 1087986
I'm hoping that implementing the solution described in bug 1095756 will fix this.
I think that the patch introduced in bug 1096254 will fix this.
See Also: → 1122409
Tested this today in an attempt to bisect issue #1128163, but this still does not work. I am no longer getting failures due to slow operation, but on three consecutive attempts, a different error was generated each time. See the attached log.
Attached file bisect_inbound_log.txt
Hm, I would say that the 1 and 3 errors are due to bad network (maybe the 2 also, see below). What hopefully will fix this is Bug 1095756. In the mean time, you could try to adjust the http timeout option to get rid of error 1: mozregression -b 2014-11-29 -g 2014-11-28 --http-timeout 30 The second error you get, "requests.exceptions.HTTPError: 400 Client Error: no such method: pushes" do bother me. But I am unable to reproduce it, the same command you did works fine for me (with a higher http timeout value as here at work the proxy really slow down everything): mozregression -b 2014-11-29 -g 2014-11-28 --http-timeout 30 0:01.76 LOG: MainThread Bisector INFO Got as far as we can go bisecting nightlies... 0:01.76 LOG: MainThread Bisector INFO Last good revision: 17de0f463944 0:01.76 LOG: MainThread Bisector INFO First bad revision: e90536aa55dd 0:01.76 LOG: MainThread Bisector INFO Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=17de0f463944&tochange=e90536aa55dd 0:01.76 LOG: MainThread Bisector INFO ... attempting to bisect inbound builds (starting from previous week, to make sure no inbound revision is missed) 0:02.45 LOG: MainThread Bisector INFO Getting inbound builds between 5ba06e4f49e8 and e90536aa55dd 0:39.05 LOG: MainThread Test Runner INFO Testing inbound build with timestamp 1416958654, revision 3c0bba40 0:39.05 LOG: MainThread Test Runner INFO Downloading build from: http://inbound-archive.pub.build.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux64/1416958654/firefox-36.0a1.en-US.linux-x86_64.tar.bz2 As that works for me, I suspect that every of these errors are network related. Not sure what to do here - I suspect the best option is to wait for bug 1095756.
I do not believe that I have a bad network, and the timeout occurred in well less than five seconds from issuing the command. However I'll retry the steps again at a later time.
Ok - well 'bad network' is probably a wrong term I used - it could be that one contacted server is slow to answer (https://hg.mozilla.org or http://inbound-archive.pub.build.mozilla.org in this case). I did a few more tests and I am able to reproduce the 'no such method: pushes' error sometimes. In fact, I can reproduce it without any use of mozregression - look at the attached script. It appears that sometimes the server just answer 400 error for apparently no reason. I am also pretty sure that the "Bad status line" appears randomly too. Well, as said before, this will be corrected with bug Bug 1095756 (well - users won't see this happening but we will have to handle it on the server side :)) I am thinking that we could "fix" these issues by retrying (say 3 times max) on these errors: - requests.exceptions.HTTPError - requests.exceptions.ConnectionError at different places in the code (when downloading push data and when looking into build folders as a start). Maybe raise the default timeout to 20 instead of 10 may also help (or 30...) to get less network errors. Also, I think we can fix all this without waiting for bug 1095756 - it will get the fixes later anyway. @Will William, what do you think ? I can mentor a good next bug on this if you're OK and that you agree on what I said. I think I would prefer to open a new bug for that.
Flags: needinfo?(wlachance)
Yes, I think this makes sense. Maybe we can try using Mozilla releng's redo module for handling some of the retry stuff, IIRC it was written exactly for situations like this: https://pypi.python.org/pypi/redo
Flags: needinfo?(wlachance)
See Also: → 1128601
See Also: → 1169992
Hi Jukka, I have released mozregression 0.37, and it changes the way inbound bisection is done (should be faster, and no more ftp crawling!), so I think this bug is resolved for you. Could you try it please ? Thanks!
Flags: needinfo?(jujjyl)
Thanks for the progress here! I tried the newest as of today, 0.40, and tried to bisect bug 1150552. Bisecting days gave some noisy exceptions spam in the console, however the bisection process itself seemed to work ok: Was this nightly build good, bad, or broken? (type 'good', 'bad', 'skip', 'retry ', 'back' or 'exit' and press Enter): move() failed for "(u'c:\\users\\clb\\appd ata\\local\\temp\\tmpielb8d\\2015-05-23--mozilla-central--firefox-41.0a1.en-US.w in64.zip.tmp', u'c:\\users\\clb\\appdata\\local\\temp\\tmpielb8d\\2015-05-23--mo zilla-central--firefox-41.0a1.en-US.win64.zip')". Reason: The process cannot acc ess the file because it is being used by another process (13). Retrying... move() failed for "(u'c:\\users\\clb\\appdata\\local\\temp\\tmpielb8d\\2015-05-2 3--mozilla-central--firefox-41.0a1.en-US.win64.zip.tmp', u'c:\\users\\clb\\appda ta\\local\\temp\\tmpielb8d\\2015-05-23--mozilla-central--firefox-41.0a1.en-US.wi n64.zip')". Reason: The process cannot access the file because it is being used by another process (13). Retrying... move() failed for "(u'c:\\users\\clb\\appdata\\local\\temp\\tmpielb8d\\2015-05-2 3--mozilla-central--firefox-41.0a1.en-US.win64.zip.tmp', u'c:\\users\\clb\\appda ta\\local\\temp\\tmpielb8d\\2015-05-23--mozilla-central--firefox-41.0a1.en-US.wi n64.zip')". Reason: The process cannot access the file because it is being used by another process (13). Retrying... move() failed for "(u'c:\\users\\clb\\appdata\\local\\temp\\tmpielb8d\\2015-05-2 3--mozilla-central--firefox-41.0a1.en-US.win64.zip.tmp', u'c:\\users\\clb\\appda ta\\local\\temp\\tmpielb8d\\2015-05-23--mozilla-central--firefox-41.0a1.en-US.wi n64.zip')". Reason: The process cannot access the file because it is being used by another process (13). Retrying... move() failed for "(u'c:\\users\\clb\\appdata\\local\\temp\\tmpielb8d\\2015-05-2 3--mozilla-central--firefox-41.0a1.en-US.win64.zip.tmp', u'c:\\users\\clb\\appda ta\\local\\temp\\tmpielb8d\\2015-05-23--mozilla-central--firefox-41.0a1.en-US.wi n64.zip')". Reason: The process cannot access the file because it is being used by another process (13). Retrying... move() failed for "(u'c:\\users\\clb\\appdata\\local\\temp\\tmpielb8d\\2014-12-3 0--mozilla-central--firefox-37.0a1.en-US.win64-x86_64.zip.tmp', u'c:\\users\\clb \\appdata\\local\\temp\\tmpielb8d\\2014-12-30--mozilla-central--firefox-37.0a1.e n-US.win64-x86_64.zip')". Reason: The process cannot access the file because it is being used by another process (13). Retrying... Exception in thread Thread-20: Traceback (most recent call last): File "C:\Python27\lib\threading.py", line 810, in __bootstrap_inner self.run() File "C:\Python27\lib\threading.py", line 763, in run self.__target(*self.__args, **self.__kwargs) File "C:\Python27\lib\site-packages\mozregression\download_manager.py", line 1 67, in _download mozfile.move(temp_dest, dest) File "C:\Python27\lib\site-packages\mozfile\mozfile.py", line 242, in move _call_windows_retry(shutil.move, (src, dst)) File "C:\Python27\lib\site-packages\mozfile\mozfile.py", line 147, in _call_wi ndows_retry func(*args) File "C:\Python27\lib\shutil.py", line 303, in move os.unlink(src) WindowsError: [Error 32] The process cannot access the file because it is being used by another process: u'c:\\users\\clb\\appdata\\local\\temp\\tmpielb8d\\2015 -05-23--mozilla-central--firefox-41.0a1.en-US.win64.zip.tmp' move() failed for "(u'c:\\users\\clb\\appdata\\local\\temp\\tmpielb8d\\2014-12-3 0--mozilla-central--firefox-37.0a1.en-US.win64-x86_64.zip.tmp', u'c:\\users\\clb \\appdata\\local\\temp\\tmpielb8d\\2014-12-30--mozilla-central--firefox-37.0a1.e n-US.win64-x86_64.zip')". Reason: The process cannot access the file because it is being used by another process (13). Retrying... move() failed for "(u'c:\\users\\clb\\appdata\\local\\temp\\tmpielb8d\\2014-12-3 0--mozilla-central--firefox-37.0a1.en-US.win64-x86_64.zip.tmp', u'c:\\users\\clb \\appdata\\local\\temp\\tmpielb8d\\2014-12-30--mozilla-central--firefox-37.0a1.e n-US.win64-x86_64.zip')". Reason: The process cannot access the file because it is being used by another process (13). Retrying... move() failed for "(u'c:\\users\\clb\\appdata\\local\\temp\\tmpielb8d\\2014-12-3 0--mozilla-central--firefox-37.0a1.en-US.win64-x86_64.zip.tmp', u'c:\\users\\clb \\appdata\\local\\temp\\tmpielb8d\\2014-12-30--mozilla-central--firefox-37.0a1.e n-US.win64-x86_64.zip')". Reason: The process cannot access the file because it is being used by another process (13). Retrying... move() failed for "(u'c:\\users\\clb\\appdata\\local\\temp\\tmpielb8d\\2014-12-3 0--mozilla-central--firefox-37.0a1.en-US.win64-x86_64.zip.tmp', u'c:\\users\\clb \\appdata\\local\\temp\\tmpielb8d\\2014-12-30--mozilla-central--firefox-37.0a1.e n-US.win64-x86_64.zip')". Reason: The process cannot access the file because it is being used by another process (13). Retrying... Exception in thread Thread-23: Traceback (most recent call last): File "C:\Python27\lib\threading.py", line 810, in __bootstrap_inner self.run() File "C:\Python27\lib\threading.py", line 763, in run self.__target(*self.__args, **self.__kwargs) File "C:\Python27\lib\site-packages\mozregression\download_manager.py", line 1 67, in _download mozfile.move(temp_dest, dest) File "C:\Python27\lib\site-packages\mozfile\mozfile.py", line 242, in move _call_windows_retry(shutil.move, (src, dst)) File "C:\Python27\lib\site-packages\mozfile\mozfile.py", line 147, in _call_wi ndows_retry func(*args) File "C:\Python27\lib\shutil.py", line 303, in move os.unlink(src) WindowsError: [Error 32] The process cannot access the file because it is being used by another process: u'c:\\users\\clb\\appdata\\local\\temp\\tmpielb8d\\2014 -12-30--mozilla-central--firefox-37.0a1.en-US.win64-x86_64.zip.tmp' After bisecting through the dates, it printed out the following: 8:32.14 LOG: MainThread Bisector INFO Narrowed nightly regression window from [ 2014-12-08, 2014-12-10] (2 days) to [2014-12-08, 2014-12-09] (1 days) (~0 steps left) 8:32.14 LOG: MainThread Bisector INFO Got as far as we can go bisecting nightli es... 8:32.14 LOG: MainThread Bisector INFO Last good revision: 035a951fc24a 8:32.14 LOG: MainThread Bisector INFO First bad revision: f1f48ccb2d4e 8:32.14 LOG: MainThread Bisector INFO Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=035a951fc24a&tocha nge=f1f48ccb2d4e 8:32.14 LOG: MainThread Bisector INFO ... attempting to bisect inbound builds ( starting from 4 days prior, to make sure no inbound revision is missed) 8:34.31 LOG: MainThread Bisector INFO Getting inbound builds between a9fc463556 61 and f1f48ccb2d4e 8:58.96 LOG: MainThread Bisector INFO No inbound data found. 8:58.96 LOG: MainThread Bisector INFO There are no build artifacts on inbound f or these changesets (they are probably too old). I am not sure how to interpret this. Is 2014-12 too old for build artifacts and the error message is to be expected?
Flags: needinfo?(jujjyl)
Hi Jukka, Thanks for the feedback! mozregression will try to bisect inbound builds once there are no more nightly builds. But inbound builds are removed after something like six month currently, so the last error message you got is normal. Could you try with newer builds, to see if inbound bisection works better for you now ? Hum, I am a bit worried about these WindowsError: [Error 32] The process cannot access the file But it is for another bug - I'll give it a try later on a windows machine.
Oh, I reopened bug 1185756 for the windows error.
Thanks, bisecting inbounds from May 2015 worked ok!
Cool, thanks! Let's close this bug then. :)
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: