Closed Bug 1050769 Opened 10 years ago Closed 10 years ago

upload a new mobile_tp4.zip pageset to the 3 headed remote talos server (round 2)

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

ARM
Android
task
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: jmaher, Assigned: sbruno)

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #1030166 +++
^ that bug would have relevant information for updating this.
Thanks to the work of :edmorley in bug 1050161, we have another round of cleaned up network access, here is the updated mobile_tp4.zip:
http://people.mozilla.org/~jmaher/taloszips/zips/mobile_tp4.zip
shasum mobile_tp4.zip
7373b491baf27dda89f47365142ba0d2ff6df1c6 mobile_tp4.zip
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Blocks: 1051993
It looks like this didn't work - in bug 1051993 I've pushed to try again, and I'm getting the same external connections as before. I've just re-downloaded the zip in comment 0 here and it does include my changes, so I think perhaps the wrong zip was uploaded to relengwebadm.private.scl3
Please could we try the upload again? :-)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Ed Morley [:edmorley] from comment #2)
> It looks like this didn't work - in bug 1051993 I've pushed to try again,
> and I'm getting the same external connections as before. I've just
> re-downloaded the zip in comment 0 here and it does include my changes, so I
> think perhaps the wrong zip was uploaded to relengwebadm.private.scl3
>
> Please could we try the upload again? :-)
Flags: needinfo?(bugspam.Callek)
Simone, can you take this on "today"? If not, n-i me back and I'll do it during my day.
Flags: needinfo?(bugspam.Callek) → needinfo?(sbruno)
Hey Ed,
Can you provide a single "changed file[name]" and a related shasum of said file, so we can verify against the server/extracted fileset that your change is in place as well?
Flags: needinfo?(emorley)
(In reply to Justin Wood (:Callek) from comment #5)
> Hey Ed,
>
> Can you provide a single "changed file[name]" and a related shasum of said
> file, so we can verify against the server/extracted fileset that your
> change is in place as well?
One of the changed files:
[/c/tpn]$ shasum amazon.com/www.amazon.com/index.html
9b132c56328821a1ab7d1c5d48d769328061f66a amazon.com/www.amazon.com/index.html
Flags: needinfo?(emorley)
I uploaded and extracted the zip file. The change seems to be in place:
# sha1sum amazon.com/www.amazon.com/index.html
9b132c56328821a1ab7d1c5d48d769328061f66a amazon.com/www.amazon.com/index.html
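Note: the comparison done in the last two comments (hash one changed file on the server, compare it with the hash taken from the known-good zip) can be sketched in Python as below. This is a minimal sketch, not what was actually run on the server; the helper names sha1_of_file and file_matches are made up for illustration.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA-1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_matches(path, expected_sha1):
    """True if the file's SHA-1 equals the expected hex digest (case-insensitive)."""
    return sha1_of_file(path) == expected_sha1.strip().lower()
```

Run from the extracted pageset directory, file_matches("amazon.com/www.amazon.com/index.html", "9b132c56328821a1ab7d1c5d48d769328061f66a") would correspond to the check above. Note that it only proves the file on that one machine is current, not that it is the copy being served.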
Flags: needinfo?(sbruno)
Thank you :-)
Assignee: nobody → sbruno
Status: REOPENED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
I'm still seeing the same failures in bug 1051993's new try run. I've gone over the relevant files with a fine-tooth comb and am pretty sure I've not missed any external connections - I just think the pandas are not running the same pageset as the one that was updated in comment 7.
I think we need to check:
1) That the zip isn't extracting to the wrong directory structure (ie one level too deep or something like that - leaving us with a double pageset, one inside the other).
2) That the pandas are in fact using this server/pageset for the remote-tp4m_nochrome run.
3) That there's no caching going on.
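Note on 1): one way to spot the "one level too deep" case before extracting is to look at the top-level entries of the zip. A minimal sketch using Python's standard zipfile module; the function names are hypothetical, and the real layout of mobile_tp4.zip is not stated in this bug.

```python
import zipfile


def top_level_entries(zip_source):
    """Return the sorted unique first path components of all members in the zip."""
    with zipfile.ZipFile(zip_source) as archive:
        return sorted({name.split("/", 1)[0] for name in archive.namelist() if name})


def would_nest(zip_source, target_dir_name):
    """True if the zip wraps everything in one folder named `target_dir_name`.

    Extracting such a zip *inside* a directory of the same name would give
    target_dir_name/target_dir_name, i.e. a pageset nested inside a pageset.
    """
    return top_level_entries(zip_source) == [target_dir_name]
```

If would_nest(path_to_zip, "mobile_tp4") is true, the zip should be extracted in the parent of mobile_tp4 rather than inside it.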
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
For #1, an |ls -al| from the top level of the talos repo would be good, with the output added here as a private file?
:edmorley: After extracting, the mobile_tp4 folder is located here: /data/releng/src/talos-remote/www/talos-repo/talos/page_load_test/mobile_tp4 (as per instructions provided in https://bugzilla.mozilla.org/show_bug.cgi?id=1030166#c0). The old version of that folder was also located there, so basically I just replaced it with the new version.
Yeah, I know it should have worked, and that files were overwritten, but that could just mean the previous uploads went to the wrong place too. An |ls -al| would at least help with #1.
so the top level folder shows this:
./talos/page_load_test:
total 64
drwxr-xr-x 13 root root 4096 Aug 19 01:59 .
drwxr-xr-x 14 root root 4096 Jun 25 13:44 ..
drwxr-xr-x 2 root root 4096 Jun 20 2013 a11y
drwxr-xr-x 4 root root 4096 May 2 09:57 canvasmark
drwxr-xr-x 3 root root 4096 Dec 1 2011 dhtml
drwxr-xr-x 5 root root 4096 May 2 09:57 dromaeo
drwxr-xr-x 2 root root 4096 Oct 4 2012 kraken
lrwxrwxrwx 1 root root 19 Apr 24 12:04 mobile_tp4 -> ../../../mobile_tp4
-rw-r--r-- 1 root root 4281 Jun 29 2012 quit.js
drwxr-xr-x 2 root root 4096 Jun 25 13:44 scroll
drwxr-xr-x 3 root root 4096 Dec 1 2011 svg
drwxr-xr-x 2 root root 4096 Dec 1 2011 svg_opacity
drwxr-xr-x 3 root root 4096 Jul 19 2013 svgx
drwxr-xr-x 4 root root 4096 Jun 25 13:44 tart
lrwxrwxrwx 1 root root 12 Apr 24 12:04 tp4 -> ../../../tp4
-rw-r--r-- 1 root root 1781 Dec 1 2011 tp4m.manifest
drwxr-xr-x 2 root root 4096 Oct 4 2012 v8_7
but we want to see the contents of the mobile_tp4 directory.
Might unzip not be handling the symlinks correctly? If I've understood the paths in previous comments correctly, the actual pageset is at:
/data/releng/src/talos-remote/www/talos-repo/talos/page_load_test/../../../mobile_tp4
ie:
/data/releng/src/talos-remote/www/mobile_tp4
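Note: the path arithmetic above can be double-checked mechanically. A minimal sketch; resolve_relative_link is a hypothetical helper that only normalises the path as text and does not touch the filesystem (on the server itself, os.path.realpath would follow the real links, including any intermediate ones).

```python
import posixpath


def resolve_relative_link(link_path, link_target):
    """Lexically resolve a symlink target relative to the directory holding the link."""
    if posixpath.isabs(link_target):
        return posixpath.normpath(link_target)
    # A relative target is interpreted from the directory that contains the link.
    return posixpath.normpath(
        posixpath.join(posixpath.dirname(link_path), link_target)
    )


link = "/data/releng/src/talos-remote/www/talos-repo/talos/page_load_test/mobile_tp4"
print(resolve_relative_link(link, "../../../mobile_tp4"))
```

With the link and target taken from the |ls -al| output in this bug, this prints /data/releng/src/talos-remote/www/mobile_tp4, which agrees with the path worked out by hand.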
Oh! I just extracted the contents of the zip file into /data/releng/src/talos-remote/www/mobile_tp4
By "I just extracted" do you mean in previous comments you had extracted the zip there, or you've just re-extracted now to this location? :-)
we also want to make sure that the unzipping puts the data in mobile_tp4, not mobile_tp4/mobile_tp4.
"just extracted" means I have re-extracted it now to this location (sorry for the ambiguity). Data is now under /data/releng/src/talos-remote/www/mobile_tp4 (not mobile_tp4/mobile_tp4).
I don't think this re-extraction will solve the problem, though, since the check in comment 7 was performed in /data/releng/src/talos-remote/www/talos-repo/talos/page_load_test/mobile_tp4.
Thank you - I've retriggered the jobs again, but as you say, if the shasum worked fine then that was unlikely to be the issue. Failing that I guess this leaves:
(In reply to Ed Morley [:edmorley] from comment #9)
> 2) That the pandas are in fact using this server/pageset for the
> remote-tp4m_nochrome run
> 3) That there's no caching going on
https://tbpl.mozilla.org/php/getParsedLog.php?id=46259039&full=1&branch=try
06:52:07 INFO - 08-19 06:50:57.085 I/Gecko ( 2199): FATAL ERR_R: Non-local network connections are disabled and a connection attempt to g-ecx.images-amazon.com (54.239.132.83) was made.
I guess we could do something radical, like rename the pageset directory for a few mins and see if we get any failures? That would rule out us modifying the wrong location?
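Note: the offending hosts can be pulled out of a saved log rather than read off by eye. A minimal sketch, assuming the error lines have exactly the wording shown in the log line above; the regex and the function name external_hosts are illustrative only.

```python
import re

# Matches the fatal error wording quoted in this bug and captures host and IP.
NON_LOCAL_RE = re.compile(
    r"Non-local network connections are disabled and a connection attempt to "
    r"(?P<host>\S+) \((?P<ip>[^)]+)\) was made"
)


def external_hosts(log_text):
    """Return the sorted unique hostnames named in non-local-connection errors."""
    return sorted({match.group("host") for match in NON_LOCAL_RE.finditer(log_text)})
```

Grouping by hostname rather than IP is deliberate: the same host shows up with different addresses in different runs in this bug.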
Justin, any ideas? :-)
Flags: needinfo?(bugspam.Callek)
15:17 <simone|buildduty> edmorley|sheriffduty: I am ready to try the "rename the pageset directory for a few minutes" idea, if you want
15:21 <edmorley|sheriffduty> simone|buildduty: sure let's give it a go :-)
15:28 <simone|buildduty> edmorley|sheriffduty: page_load_test/mobile_tp4 renamed to m_tp4_
These jobs completed successfully after that:
https://tbpl.mozilla.org/php/getParsedLog.php?id=46262984&tree=Try
https://tbpl.mozilla.org/php/getParsedLog.php?id=46264271&tree=Try
https://tbpl.mozilla.org/php/getParsedLog.php?id=46264306&tree=Try
So the panda talos jobs aren't using the symlink at /data/releng/src/talos-remote/www/talos-repo/talos/page_load_test/mobile_tp4, when they refer to URLs such as:
http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/amazon.com/www.amazon.com/index.html
(In reply to Ed Morley [:edmorley] from comment #23)
> 15:17 <simone|buildduty> edmorley|sheriffduty: I am ready to try the "rename
> the pageset directory for a few minutes" idea, if you want
> 15:21 <edmorley|sheriffduty> simone|buildduty: sure let's give it a go :-)
> 15:28 <simone|buildduty> edmorley|sheriffduty: page_load_test/mobile_tp4
> renamed to m_tp4_
>
> These jobs completed successfully after that:
> https://tbpl.mozilla.org/php/getParsedLog.php?id=46262984&tree=Try
This one at least seems to have failed...
07:44:37 INFO - INFO : RSS: Main: 149086208
07:44:37 INFO - Cycle 1(1): loaded http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/news.google.com/news.google.com/index.html (next: http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/m.news.google.com/news.google.com/index.html)
07:44:37 INFO - RSS: Main: 167141376
07:44:37 INFO - Cycle 1(1): loaded http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/m.news.google.com/news.google.com/index.html (next: http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/amazon.com/www.amazon.com/index.html)
07:44:37 INFO - RSS: Main: 165490688
07:44:37 INFO - __startBeforeLaunchTimestamp1408459391330__endBeforeLaunchTimestamp
07:44:37 INFO - __startAfterTerminationTimestamp1408459417181__endAfterTerminationTimestamp
07:44:37 INFO - Failed tp4m:
07:44:37 INFO - Stopped Tue, 19 Aug 2014 07:44:37
07:44:37 ERROR - Traceback (most recent call last):
> So the panda talos jobs aren't using the symlink at
> /data/releng/src/talos-remote/www/talos-repo/talos/page_load_test/mobile_tp4,
> when they refer to URLs such as:
> http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/amazon.com/www.amazon.com/index.html
Which makes me think this is false
(In reply to Justin Wood (:Callek) from comment #24)
> (In reply to Ed Morley [:edmorley] from comment #23)
> > 15:17 <simone|buildduty> edmorley|sheriffduty: I am ready to try the "rename
> > the pageset directory for a few minutes" idea, if you want
> > 15:21 <edmorley|sheriffduty> simone|buildduty: sure let's give it a go :-)
> > 15:28 <simone|buildduty> edmorley|sheriffduty: page_load_test/mobile_tp4
> > renamed to m_tp4_
> >
> > These jobs completed successfully after that:
> > https://tbpl.mozilla.org/php/getParsedLog.php?id=46262984&tree=Try
>
> This one at least seems to have failed...
For a non-local connection (this is a try run of the bug 1051993 patch), see the full log:
07:44:37 INFO - 08-19 14:43:29.697 I/GeckoDump( 2192): Cycle 1(1): loaded http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/m.news.google.com/news.google.com/index.html (next: http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/amazon.com/www.amazon.com/index.html)
07:44:37 INFO - 08-19 14:43:29.697 I/GeckoDump( 2192):
07:44:37 INFO - 08-19 14:43:30.869 I/GeckoDump( 2192): RSS: Main: 165490688
07:44:37 INFO - 08-19 14:43:30.869 I/GeckoDump( 2192):
07:44:37 INFO - 08-19 14:43:30.986 D/GeckoSuggestedSites( 2192): Number of suggested sites: 4
07:44:37 INFO - 08-19 14:43:31.267 D/GeckoSuggestedSites( 2192): Number of suggested sites: 4
07:44:37 INFO - 08-19 14:43:31.283 D/GeckoSuggestedSites( 2192): Number of suggested sites: 4
07:44:37 INFO - 08-19 14:43:31.439 V/GeckoFavicons( 2192): Cancelling favicon load 36.
07:44:37 INFO - 08-19 14:43:31.712 I/SUTAgentAndroid( 1889): 10.26.128.20 : activity
07:44:37 INFO - 08-19 14:43:32.048 I/Gecko ( 2192): FATAL ERR_R: Non-local network connections are disabled and a connection attempt to g-ecx.images-amazon.com (54.230.119.189) was made.
So this is still true:
> > So the panda talos jobs aren't using the symlink at
> > /data/releng/src/talos-remote/www/talos-repo/talos/page_load_test/mobile_tp4,
> > when they refer to URLs such as:
> > http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/amazon.com/www.amazon.com/index.html
If we were using this pageset, renaming it should result in 404s, not successfully loading the page & hitting the external network.
BAH!!!!
So, it looks like we effectively have broken docs/made a mistake. While we did update the location and the zip file, we *did not* deploy it to the webheads, only to the admin node. (I had to run ./update at /data/releng/src/talos-remote)
I'm going to get a task on myself to update our docs *today*.
That said, this bug should now be fixed, see before/after:
[jwood@foopy72.p5.releng.scl3.mozilla.com ~]$ curl http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/amazon.com/www.amazon.com/index.html 2>/dev/null | shasum
95e7711e074e22c4df1bae8fd0fd23fdb6a2f3b3 -
[jwood@foopy72.p5.releng.scl3.mozilla.com ~]$ curl http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/amazon.com/www.amazon.com/index.html 2>/dev/null | shasum
9b132c56328821a1ab7d1c5d48d769328061f66a -
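Note: the before/after curl | shasum check hashes what is actually served, which is what caught the problem here. It can be repeated a few times so that a stale webhead is more likely to be noticed. A minimal sketch; that repeated requests may land on different webheads is an assumption about the load balancing, and fetch_bytes, served_digests and deployment_consistent are hypothetical names.

```python
import hashlib
import urllib.request


def fetch_bytes(url, timeout=30):
    """Download the body of `url` (the bytes curl would print to stdout)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def served_digests(url, attempts=6, fetch=fetch_bytes):
    """Fetch `url` several times and return the set of SHA-1 hex digests seen."""
    return {hashlib.sha1(fetch(url)).hexdigest() for _ in range(attempts)}


def deployment_consistent(url, expected_sha1, attempts=6, fetch=fetch_bytes):
    """True only if every attempt returned content with the expected SHA-1."""
    return served_digests(url, attempts, fetch) == {expected_sha1.strip().lower()}
```

The fetch parameter exists only so the logic can be exercised without network access. A mixed result (more than one digest) would suggest that some webheads are still serving the old pageset.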
Status: REOPENED → RESOLVED
Closed: 10 years ago
Flags: needinfo?(bugspam.Callek)
Resolution: --- → FIXED
Thank you - that has worked :-) (green runs on https://tbpl.mozilla.org/?tree=Try&rev=863127ae067d)
Status: RESOLVED → VERIFIED
(In reply to Justin Wood (:Callek) from comment #27)
> BAH!!!!
>
> I'm going to get a task on myself to update our docs *today*.
Untested doc: https://wiki.mozilla.org/index.php?title=ReleaseEngineering%2FBuildduty%2FOther_Duties&action=historysubmit&diff=1007067&oldid=1000544
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard