Bug 1050769 - upload a new mobile_tp4.zip pageset to the 3 headed remote talos server (round 2)
Status: VERIFIED FIXED
Product: Release Engineering
Classification: Other
Component: Buildduty
: other
: ARM Android
-- normal (vote)
Assigned To: Simone Bruno [:simone]
: Justin Wood (:Callek)
: Chris AtLee [:catlee]
Depends on: 1050161
Blocks: 1026970 1051993
Reported: 2014-08-08 06:45 PDT by Joel Maher ( :jmaher) (UTC-9) (PTO: back August 2nd)
Modified: 2014-08-19 09:33 PDT

Attachments
Content of top level folder (230.88 KB, text/plain)
2014-08-19 05:45 PDT, Simone Bruno [:simone]

Description Joel Maher ( :jmaher) (UTC-9) (PTO: back August 2nd) 2014-08-08 06:45:09 PDT
+++ This bug was initially created as a clone of Bug #1030166 +++

^ that bug would have relevant information for updating this.

Thanks to the work of :edmorley in bug 1050161, we have another round of cleaned-up network access. Here is the updated mobile_tp4.zip:
http://people.mozilla.org/~jmaher/taloszips/zips/mobile_tp4.zip

shasum mobile_tp4.zip 
7373b491baf27dda89f47365142ba0d2ff6df1c6  mobile_tp4.zip
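
A minimal sketch of checking a downloaded copy against the checksum above before uploading it. The helper name verify_sha1 is hypothetical, and it assumes sha1sum from GNU coreutils is available (shasum, as used above, prints the same SHA-1 digest by default).

```shell
# Hypothetical helper: succeed only if FILE's SHA-1 digest equals EXPECTED.
verify_sha1() {
    file=$1
    expected=$2
    # sha1sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha1sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Usage (not run here):
#   verify_sha1 mobile_tp4.zip 7373b491baf27dda89f47365142ba0d2ff6df1c6 && echo "checksum OK"
```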
Comment 1 Justin Wood (:Callek) 2014-08-08 13:19:13 PDT
Done per https://bugzilla.mozilla.org/show_bug.cgi?id=1030166#c1
Comment 2 Ed Morley (Away 27th-2nd) [:emorley] 2014-08-12 03:39:42 PDT
It looks like this didn't work - in bug 1051993 I've pushed to try again, and I'm getting the same external connections as before. I've just re-downloaded the zip in comment 0 here and it does include my changes, so I think perhaps the wrong zip was uploaded to relengwebadm.private.scl3

Please could we try the upload again? :-)
Comment 3 Ed Morley (Away 27th-2nd) [:emorley] 2014-08-18 04:33:59 PDT
(In reply to Ed Morley [:edmorley] from comment #2)
> It looks like this didn't work - in bug 1051993 I've pushed to try again,
> and I'm getting the same external connections as before. I've just
> re-downloaded the zip in comment 0 here and it does include my changes, so I
> think perhaps the wrong zip was uploaded to relengwebadm.private.scl3
> 
> Please could we try the upload again? :-)
Comment 4 Justin Wood (:Callek) 2014-08-18 21:38:09 PDT
Simone, can you take this on "today"? If not, n-i me back and I'll do it during my day.
Comment 5 Justin Wood (:Callek) 2014-08-18 21:39:27 PDT
Hey Ed,

Can you provide a single "changed file[name]" and a related shasum of said file, so we can verify against the server/extracted fileset that your change is in place as well?
Comment 6 Ed Morley (Away 27th-2nd) [:emorley] 2014-08-19 01:07:39 PDT
(In reply to Justin Wood (:Callek) from comment #5)
> Hey Ed,
> 
> Can you provide a single "changed file[name]" and a related shasum of said
> file, so we can verify against the server/extracted fileset that your
> change is in place as well?

One of the changed files:

[/c/tpn]$ shasum amazon.com/www.amazon.com/index.html
9b132c56328821a1ab7d1c5d48d769328061f66a  amazon.com/www.amazon.com/index.html
Comment 7 Simone Bruno [:simone] 2014-08-19 01:59:02 PDT
I uploaded and extracted the zip file. The change seems to be in place:

# sha1sum amazon.com/www.amazon.com/index.html
9b132c56328821a1ab7d1c5d48d769328061f66a  amazon.com/www.amazon.com/index.html
Comment 8 Ed Morley (Away 27th-2nd) [:emorley] 2014-08-19 03:05:18 PDT
Thank you :-)
Comment 9 Ed Morley (Away 27th-2nd) [:emorley] 2014-08-19 03:52:58 PDT
I'm still seeing the same failures in bug 1051993's new try run.
I've gone over the relevant files with a fine-tooth comb and am pretty sure I've not missed any external connections - I just think the pandas are not running the same pageset as the one that was updated in comment 7.

I think we need to check:
1) That the zip isn't extracting to the wrong directory structure (ie one too deep or something like that - and we then have a double pageset, one inside the other).
2) That the pandas are in fact using this server/pageset for the remote-tp4m_nochrome run
3) That there's no caching going on
Comment 10 Ed Morley (Away 27th-2nd) [:emorley] 2014-08-19 03:54:49 PDT
For #1, could you run |ls -al| from the top level of the talos repo and add the output as a private attachment here?
Comment 11 Simone Bruno [:simone] 2014-08-19 04:16:54 PDT
:edmorley:

After extracting, the mobile_tp4 folder is located here: /data/releng/src/talos-remote/www/talos-repo/talos/page_load_test/mobile_tp4 (as per instructions provided in https://bugzilla.mozilla.org/show_bug.cgi?id=1030166#c0). The old version of that folder was also located there, so basically I just replaced it with the new version.
Comment 12 Ed Morley (Away 27th-2nd) [:emorley] 2014-08-19 05:04:55 PDT
Yeah, I know it should have worked, and that files were overwritten, but that could just mean the previous uploads went to the wrong place too. An |ls -al| would at least help with #1 from comment 9.
Comment 13 Simone Bruno [:simone] 2014-08-19 05:45:00 PDT
Created attachment 8475118 [details]
Content of top level folder
Comment 14 Joel Maher ( :jmaher) (UTC-9) (PTO: back August 2nd) 2014-08-19 05:51:19 PDT
so the top level folder shows this:
./talos/page_load_test:
total 64
drwxr-xr-x 13 root root 4096 Aug 19 01:59 .
drwxr-xr-x 14 root root 4096 Jun 25 13:44 ..
drwxr-xr-x  2 root root 4096 Jun 20  2013 a11y
drwxr-xr-x  4 root root 4096 May  2 09:57 canvasmark
drwxr-xr-x  3 root root 4096 Dec  1  2011 dhtml
drwxr-xr-x  5 root root 4096 May  2 09:57 dromaeo
drwxr-xr-x  2 root root 4096 Oct  4  2012 kraken
lrwxrwxrwx  1 root root   19 Apr 24 12:04 mobile_tp4 -> ../../../mobile_tp4
-rw-r--r--  1 root root 4281 Jun 29  2012 quit.js
drwxr-xr-x  2 root root 4096 Jun 25 13:44 scroll
drwxr-xr-x  3 root root 4096 Dec  1  2011 svg
drwxr-xr-x  2 root root 4096 Dec  1  2011 svg_opacity
drwxr-xr-x  3 root root 4096 Jul 19  2013 svgx
drwxr-xr-x  4 root root 4096 Jun 25 13:44 tart
lrwxrwxrwx  1 root root   12 Apr 24 12:04 tp4 -> ../../../tp4
-rw-r--r--  1 root root 1781 Dec  1  2011 tp4m.manifest
drwxr-xr-x  2 root root 4096 Oct  4  2012 v8_7

but we want to see the contents of the mobile_tp4 directory.
Comment 15 Ed Morley (Away 27th-2nd) [:emorley] 2014-08-19 06:03:11 PDT
Might unzip not be handling the symlinks correctly?

If I've understood the paths in previous comments correctly, the actual pageset is at:

/data/releng/src/talos-remote/www/talos-repo/talos/page_load_test/../../../mobile_tp4

ie:
/data/releng/src/talos-remote/www/mobile_tp4
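
A minimal sketch of confirming that path arithmetic on the server rather than by eye. The helper name resolve_pageset is hypothetical, and readlink -f assumes GNU coreutils. If the symlink listed in comment 14 is intact, running it on the page_load_test path should print the shorter path above.

```shell
# Hypothetical helper: print the real location behind a pageset path,
# following any symlinks along the way (readlink -f is from GNU coreutils).
resolve_pageset() {
    readlink -f "$1"
}

# Usage (not run here):
#   resolve_pageset /data/releng/src/talos-remote/www/talos-repo/talos/page_load_test/mobile_tp4
```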
Comment 16 Simone Bruno [:simone] 2014-08-19 06:14:00 PDT
Oh! I just extracted the content of the zip file in /data/releng/src/talos-remote/www/mobile_tp4
Comment 17 Ed Morley (Away 27th-2nd) [:emorley] 2014-08-19 06:16:26 PDT
By "I just extracted" do you mean in previous comments you had extracted the zip there, or you've just re-extracted now to this location? :-)
Comment 18 Joel Maher ( :jmaher) (UTC-9) (PTO: back August 2nd) 2014-08-19 06:17:50 PDT
we also want to make sure that the unzipping puts the data in mobile_tp4, not mobile_tp4/mobile_tp4.
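
A minimal sketch of that check, assuming shell access to the directory where the zip was extracted. The helper name has_nested_copy is hypothetical.

```shell
# Hypothetical helper: succeed if DIR contains a subdirectory with its own
# name (e.g. mobile_tp4/mobile_tp4), which suggests the zip was unpacked
# one level too deep.
has_nested_copy() {
    dir=${1%/}                        # drop a trailing slash, if any
    [ -d "$dir/$(basename "$dir")" ]
}

# Usage (not run here):
#   if has_nested_copy /data/releng/src/talos-remote/www/mobile_tp4; then
#       echo "double pageset: one copy inside the other"
#   fi
```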
Comment 19 Simone Bruno [:simone] 2014-08-19 06:22:56 PDT
"just extracted" means I have re-extracted it now to this location (sorry for the ambiguity).

Data is now under /data/releng/src/talos-remote/www/mobile_tp4 (not mobile_tp4/mobile_tp4)

I don't think this re-extraction will solve the problem, though, since the check in comment 7 was performed in /data/releng/src/talos-remote/www/talos-repo/talos/page_load_test/mobile_tp4.
Comment 20 Ed Morley (Away 27th-2nd) [:emorley] 2014-08-19 06:59:43 PDT
Thank you - I've retriggered the jobs again, but as you say, if the shasum worked fine then that was unlikely to be the issue.

Failing that I guess this leaves:

(In reply to Ed Morley [:edmorley] from comment #9)
> 2) That the pandas are in fact using this server/pageset for the
> remote-tp4m_nochrome run
> 3) That there's no caching going on
Comment 21 Ed Morley (Away 27th-2nd) [:emorley] 2014-08-19 07:03:23 PDT
https://tbpl.mozilla.org/php/getParsedLog.php?id=46259039&full=1&branch=try
06:52:07     INFO -  08-19 06:50:57.085 I/Gecko   ( 2199): FATAL ERR_R: Non-local network connections are disabled and a connection attempt to g-ecx.images-amazon.com (54.239.132.83) was made.

I guess we could do something radical, like rename the pageset directory for a few mins and see if we get any failures? That would rule out us modifying the wrong location?
Comment 22 Ed Morley (Away 27th-2nd) [:emorley] 2014-08-19 07:04:19 PDT
Justin, any ideas? :-)
Comment 23 Ed Morley (Away 27th-2nd) [:emorley] 2014-08-19 08:14:17 PDT
15:17 <simone|buildduty> edmorley|sheriffduty: I am ready to try the "rename the pageset directory for a few minutes" idea, if you want
15:21 <edmorley|sheriffduty> simone|buildduty: sure let's give it a go :-)
15:28 <simone|buildduty> edmorley|sheriffduty: page_load_test/mobile_tp4 renamed to m_tp4_

These jobs completed successfully after that:
https://tbpl.mozilla.org/php/getParsedLog.php?id=46262984&tree=Try
https://tbpl.mozilla.org/php/getParsedLog.php?id=46264271&tree=Try
https://tbpl.mozilla.org/php/getParsedLog.php?id=46264306&tree=Try

So the panda talos jobs aren't using the symlink at /data/releng/src/talos-remote/www/talos-repo/talos/page_load_test/mobile_tp4, when they refer to URLs such as:
http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/amazon.com/www.amazon.com/index.html
Comment 24 Justin Wood (:Callek) 2014-08-19 08:28:57 PDT
(In reply to Ed Morley [:edmorley] from comment #23)
> 15:17 <simone|buildduty> edmorley|sheriffduty: I am ready to try the "rename
> the pageset directory for a few minutes" idea, if you want
> 15:21 <edmorley|sheriffduty> simone|buildduty: sure let's give it a go :-)
> 15:28 <simone|buildduty> edmorley|sheriffduty: page_load_test/mobile_tp4
> renamed to m_tp4_
> 
> These jobs completed successfully after that:
> https://tbpl.mozilla.org/php/getParsedLog.php?id=46262984&tree=Try

This one at least seems to have failed...

07:44:37     INFO -  INFO : RSS: Main: 149086208
07:44:37     INFO -  Cycle 1(1): loaded http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/news.google.com/news.google.com/index.html (next: http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/m.news.google.com/news.google.com/index.html)
07:44:37     INFO -  RSS: Main: 167141376
07:44:37     INFO -  Cycle 1(1): loaded http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/m.news.google.com/news.google.com/index.html (next: http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/amazon.com/www.amazon.com/index.html)
07:44:37     INFO -  RSS: Main: 165490688
07:44:37     INFO -  __startBeforeLaunchTimestamp1408459391330__endBeforeLaunchTimestamp
07:44:37     INFO -  __startAfterTerminationTimestamp1408459417181__endAfterTerminationTimestamp
07:44:37     INFO -  Failed tp4m:
07:44:37     INFO -  		Stopped Tue, 19 Aug 2014 07:44:37
07:44:37    ERROR -  Traceback (most recent call last):


> So the panda talos jobs aren't using the symlink at
> /data/releng/src/talos-remote/www/talos-repo/talos/page_load_test/mobile_tp4,
> when they refer to URLs such as:
> http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/amazon.com/www.amazon.com/index.html

Which makes me think this is false
Comment 25 Ed Morley (Away 27th-2nd) [:emorley] 2014-08-19 08:30:36 PDT
(In reply to Justin Wood (:Callek) from comment #24)
> (In reply to Ed Morley [:edmorley] from comment #23)
> > 15:17 <simone|buildduty> edmorley|sheriffduty: I am ready to try the "rename
> > the pageset directory for a few minutes" idea, if you want
> > 15:21 <edmorley|sheriffduty> simone|buildduty: sure let's give it a go :-)
> > 15:28 <simone|buildduty> edmorley|sheriffduty: page_load_test/mobile_tp4
> > renamed to m_tp4_
> > 
> > These jobs completed successfully after that:
> > https://tbpl.mozilla.org/php/getParsedLog.php?id=46262984&tree=Try
> 
> This one at least seems to have failed...

It failed because of a non-local connection (this is a try run of the bug 1051993 patch); see the full log:

07:44:37     INFO -  08-19 14:43:29.697 I/GeckoDump( 2192): Cycle 1(1): loaded http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/m.news.google.com/news.google.com/index.html (next: http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/amazon.com/www.amazon.com/index.html)
07:44:37     INFO -  08-19 14:43:29.697 I/GeckoDump( 2192):
07:44:37     INFO -  08-19 14:43:30.869 I/GeckoDump( 2192): RSS: Main: 165490688
07:44:37     INFO -  08-19 14:43:30.869 I/GeckoDump( 2192):
07:44:37     INFO -  08-19 14:43:30.986 D/GeckoSuggestedSites( 2192): Number of suggested sites: 4
07:44:37     INFO -  08-19 14:43:31.267 D/GeckoSuggestedSites( 2192): Number of suggested sites: 4
07:44:37     INFO -  08-19 14:43:31.283 D/GeckoSuggestedSites( 2192): Number of suggested sites: 4
07:44:37     INFO -  08-19 14:43:31.439 V/GeckoFavicons( 2192): Cancelling favicon load 36.
07:44:37     INFO -  08-19 14:43:31.712 I/SUTAgentAndroid( 1889): 10.26.128.20 : activity
07:44:37     INFO -  08-19 14:43:32.048 I/Gecko   ( 2192): FATAL ERR_R: Non-local network connections are disabled and a connection attempt to g-ecx.images-amazon.com (54.230.119.189) was made.

So this is still true:
> > So the panda talos jobs aren't using the symlink at
> > /data/releng/src/talos-remote/www/talos-repo/talos/page_load_test/mobile_tp4,
> > when they refer to URLs such as:
> > http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/amazon.com/www.amazon.com/index.html
Comment 26 Ed Morley (Away 27th-2nd) [:emorley] 2014-08-19 08:31:57 PDT
If we were using this pageset, renaming it should result in 404s, not successfully loading the page & hitting the external network.
Comment 27 Justin Wood (:Callek) 2014-08-19 08:38:57 PDT
BAH!!!!

So, it looks like we effectively have broken docs/made a mistake. While we did update the location and the zip file, we *did not* deploy it to the webheads, only to the admin node.

(I had to run ./update at /data/releng/src/talos-remote)

I'm going to get a task on myself to update our docs *today*.

That said, this bug should now be fixed, see before/after:

[jwood@foopy72.p5.releng.scl3.mozilla.com ~]$ curl http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/amazon.com/www.amazon.com/index.html 2>/dev/null | shasum
95e7711e074e22c4df1bae8fd0fd23fdb6a2f3b3  -
[jwood@foopy72.p5.releng.scl3.mozilla.com ~]$ curl http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/amazon.com/www.amazon.com/index.html 2>/dev/null | shasum
9b132c56328821a1ab7d1c5d48d769328061f66a  -
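
The before/after comparison above can be turned into a pass/fail check on what the webheads actually serve. This is a minimal sketch: the helper names sha1_of_stdin and served_matches are hypothetical, and it assumes sha1sum from GNU coreutils plus a host that can reach talos-remote.

```shell
# Hypothetical helper: read bytes on stdin and print only their SHA-1 digest.
sha1_of_stdin() {
    sha1sum | awk '{print $1}'
}

# Hypothetical helper: succeed if the digest of stdin equals EXPECTED.
served_matches() {
    [ "$(sha1_of_stdin)" = "$1" ]
}

# Usage (not run here; needs access to the private talos-remote host):
#   curl -s http://talos-remote.pvt.build.mozilla.org/page_load_test/mobile_tp4/amazon.com/www.amazon.com/index.html \
#       | served_matches 9b132c56328821a1ab7d1c5d48d769328061f66a \
#       && echo "webheads serve the new pageset"
```

Hashing the HTTP response rather than the file on the admin node tests the path the pandas use, which is the step the check in comment 7 did not cover.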
Comment 28 Ed Morley (Away 27th-2nd) [:emorley] 2014-08-19 09:15:33 PDT
Thank you - that has worked :-)

(green runs on https://tbpl.mozilla.org/?tree=Try&rev=863127ae067d)
Comment 29 Justin Wood (:Callek) 2014-08-19 09:33:29 PDT
(In reply to Justin Wood (:Callek) from comment #27)
> BAH!!!!
> 
> I'm going to get a task on myself to update our docs *today*.

Untested doc:
https://wiki.mozilla.org/index.php?title=ReleaseEngineering%2FBuildduty%2FOther_Duties&action=historysubmit&diff=1007067&oldid=1000544
