Closed Bug 1232080 Opened 9 years ago Closed 8 years ago

Add logging to allthethings.json generation (was Teach tryextender that 10.10.2/"10.10" jobs don't run on trunk trees, including try)

Categories

(Testing :: General, defect)

defect
Not set
major

Tracking

(firefox45 affected)

RESOLVED FIXED

People

(Reporter: philor, Assigned: armenzg)

Attachments

(1 file)

Whether you are using the treeherder "Add new Jobs" UI, or the tryextender heroku app, or apparently the treeherder "Trigger Missing jobs"/"Trigger All Talos jobs" or whatever it is that the talos regression hunters use to retrigger six runs of every talos job, you get the "10.10" tests which are actually the 10.10.2 tests which do not ever actually run on trunk trees anymore, so they just sit perpetually pending until I notice on slave_health that there are a bunch of pending jobs for that pool, and go cancel them all (with or without having first pointlessly rebooted all the idle 10.10.2 slaves).
This is very weird.

@adusca, do you have time this week to look into this?

armenzg@armenzg-thinkpad:~/repos/mozilla-central$ grep "10.10" all_builders.txt | grep "mozilla-central talos dromae"
 u'Rev5 MacOSX Yosemite 10.10 mozilla-central talos dromaeojs',
armenzg@armenzg-thinkpad:~/repos/mozilla-central$ grep "10.10" all_builders.txt | grep "try talos dromae"
 u'Rev5 MacOSX Yosemite 10.10 try talos dromaeojs',
 u'Rev5 MacOSX Yosemite 10.10.5 try talos dromaeojs',
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=talos%2010.10&group_state=expanded

I believe you meant .5 instead of .2
Summary: Teach tryextender that 10.10.2/"10.10" jobs don't run on trunk trees, including try → Teach tryextender that 10.10.5/"10.10" jobs don't run on trunk trees, including try
No, I did not. I got the list of affected trees wrong, not the slaves.

Because treeherder actively hides buildernames, the only way you can see them from that treeherder link is to use it to open the buildapi link for a push, but that will tell you that m-c is running 10.10.5 jobs. Since mozilla-aurora is also already running 10.10.5, I'd expect that after today's mergeday reconfig, mozilla-beta will be too, and only mozilla-release and esr38 will be running 10.10.
Summary: Teach tryextender that 10.10.5/"10.10" jobs don't run on trunk trees, including try → Teach tryextender that 10.10.2/"10.10" jobs don't run on trunk trees, including try
allthethings.json does indeed say that "10.10" is the only mozilla-central dromaeojs, but then, wouldn't be the first time allthethings.json lied, or didn't regenerate for far too long.
And also "talos remote-trobocheck2" is gone, kaput, extinct, but still being triggered all over the place, making me think that whether it's called allthethings.json or all_builders.txt, the datasource is outdated, or generation of it is broken.
And now backfill on mozilla-inbound thinks it should trigger the "10.10" jobs which do not exist there.
Severity: normal → blocker
I will be looking into this.

Are the trees closed because of this? (asking since the "blocker" flag)

@adusca, let me know if you can think of anything.
Assignee: nobody → armenzg
Nope, not closed, I raised the severity while I thought that buildapi didn't offer any way to fill in SETA-skipped jobs, then forgot it was changed when I submitted after realizing we weren't actually stuck with bustage and absolutely no way to trigger anything in the range of skipped jobs.
Severity: blocker → major
I see 14 Yosemite jobs on TH:
http://people.mozilla.org/~armenzg/sattap/2d88bf00.png
however, there are only 12 listed from allthethings.json [1]

It seems that from e10s we're missing dromaeojs-e10s and g1-e10s


[1]
 u'Rev5 MacOSX Yosemite 10.10 mozilla-central talos chromez',
 u'Rev5 MacOSX Yosemite 10.10 mozilla-central talos dromaeojs',
 u'Rev5 MacOSX Yosemite 10.10 mozilla-central talos g1',
 u'Rev5 MacOSX Yosemite 10.10 mozilla-central talos g2',
 u'Rev5 MacOSX Yosemite 10.10 mozilla-central talos other',
 u'Rev5 MacOSX Yosemite 10.10 mozilla-central talos svgr',
 u'Rev5 MacOSX Yosemite 10.10 mozilla-central talos tp5o',

 u'Rev5 MacOSX Yosemite 10.10 mozilla-central talos chromez-e10s',
 u'Rev5 MacOSX Yosemite 10.10 mozilla-central talos g2-e10s',
 u'Rev5 MacOSX Yosemite 10.10 mozilla-central talos other-e10s',
 u'Rev5 MacOSX Yosemite 10.10 mozilla-central talos svgr-e10s',
 u'Rev5 MacOSX Yosemite 10.10 mozilla-central talos tp5o-e10s',
I've added myself to the crontab.

Looking at the date of the rsync is meaningless (Dec. 18th) [1].
Looking in cruncher gives us the real date (Oct. 28th) [2].

I will look into the current issue and report back.

I think the current situation is unsustainable.
I will be looking at producing our own.

[1]
https://secure.pub.build.mozilla.org/builddata/reports/
allthethings.json	18-Dec-2015 12:45 	12M	

[2]
[buildduty@cruncher.srv.releng.scl3 braindump]$ ls -l /var/www/html/builds/allthethings.json*
-rw-r--r-- 1 buildduty html 12252562 Oct 28 12:04 /var/www/html/builds/allthethings.json
-rw-r--r-- 1 buildduty html   392248 Oct 28 12:04 /var/www/html/builds/allthethings.json.gz
I have to figure out a way to determine why it was not working.
The only thing I did was to force an update.
This is fallout from bug 1219390. I know I worked to make sure it worked. I wonder if I changed something at the end.
I will check again next week to see if a new one is generated.
We probably should check the file's age on cruncher to make sure we don't get this badly out of date (or generate it ourselves).
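
A minimal sketch of the age check suggested above. The helper name `is_fresh` and the two-day threshold are assumptions made for illustration; neither comes from the braindump repository. It relies only on `find -mtime`:

```shell
# Hypothetical helper (not part of braindump): succeeds if FILE exists and
# was modified less than MAX_DAYS days ago, fails otherwise.
is_fresh() {
    file="$1"
    max_days="$2"
    # find prints the path only when the mtime test passes; a missing file
    # produces no output (the error message is discarded).
    [ -n "$(find "$file" -mtime "-$max_days" 2>/dev/null)" ]
}

# Possible use from a cron job; the threshold of 2 days is an assumption.
# if ! is_fresh /var/www/html/builds/allthethings.json 2; then
#     echo "allthethings.json is missing or at least 2 days old" >&2
# fi
```

Printing only on failure fits the cron setup described in this bug, where any output ends up in an email.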

All of the missing builders are reported in here:
http://cruncher.build.mozilla.org/builds/allthethings._f097c63f247d_d4a8958176af_cdc5503e75c8.txt

It is now working again:
[buildduty@cruncher.srv.releng.scl3 community]$ ls -l /var/www/html/builds/allthethings*
-rw-r--r-- 1 buildduty html 12252562 Oct 28 12:04 /var/www/html/builds/allthethings._db1761df121c_ce1eca5d12c6_19a9daef4340.json
-rw-r--r-- 1 buildduty html        0 Oct 28 12:04 /var/www/html/builds/allthethings._db1761df121c_ce1eca5d12c6_19a9daef4340.txt
-rw-r--r-- 1 buildduty html 12421114 Dec 18 13:30 /var/www/html/builds/allthethings._f097c63f247d_d4a8958176af_cdc5503e75c8.json
-rw-r--r-- 1 buildduty html   176693 Dec 18 13:30 /var/www/html/builds/allthethings._f097c63f247d_d4a8958176af_cdc5503e75c8.txt
-rw-r--r-- 1 buildduty html 12421114 Dec 18 13:30 /var/www/html/builds/allthethings.json
-rw-r--r-- 1 buildduty html   395936 Dec 18 13:30 /var/www/html/builds/allthethings.json.gz
Flags: needinfo?(armenzg)
This is fixed now. This builder has been added:
* Rev5 MacOSX Yosemite 10.10.5 mozilla-central talos g1-e10s
https://secure.pub.build.mozilla.org/builddata/reports/allthethings._f097c63f247d_d4a8958176af_cdc5503e75c8.txt

I will follow up on this on Monday.
It seems that calling dump_allthethings.sh can fail without causing any output (and perhaps indefinitely).
This will only email me for now.


diff --git a/buildbot-related/dump_allthethings.sh b/buildbot-related/dump_allthethings.sh
--- a/buildbot-related/dump_allthethings.sh
+++ b/buildbot-related/dump_allthethings.sh
@@ -73,19 +73,19 @@ wait
 
 # Now combine them
 MASTER_DIRS=()
 for MASTER in ${MASTERS[*]}; do
     if [[ $MASTER =~ universal ]]; then
         continue
     fi
     MASTER_DIRS+=("$WORK/$MASTER/master.cfg")
 done
-python $(dirname $0)/dump_master_json.py -o $OUTPUT ${MASTER_DIRS[*]}
+python $(dirname $0)/dump_master_json.py -o $OUTPUT ${MASTER_DIRS[*]} || echo "Failed dump_master_json.py"
 
 if [ -s $FAILFILE ]; then
     echo "*** $(wc -l < $FAILFILE) master tests failed ***" >&2
     echo "Failed masters:" >&2
     sed -e 's/^/  /' "$FAILFILE" >&2
     exit 1
 fi
 
 exit $exit
diff --git a/community/generate_allthethings_json.sh b/community/generate_allthethings_json.sh
--- a/community/generate_allthethings_json.sh
+++ b/community/generate_allthethings_json.sh
@@ -20,47 +20,58 @@ workdir="$HOME/.mozilla/releng"
 # If we're executing this in cruncher we don't need to call
 # setup_buildbot_environment.sh (since we've already done so)
 # The main difference here is the "updated" or not logic
 if [ -d /var/www/html/builds/ ]; then
     cd $repos_dir
     # Logic borrowed from catlee
     updated=0
     rev_signature=''
     for d in buildbot-configs buildbotcustom tools; do
-        t=$(mktemp)
         hg -R $d pull -q
         prev_rev=$(hg -R $d id)
         hg -R $d update -q
         cur_rev=$(hg -R $d id)
         if [ "$prev_rev" != "$cur_rev" ]; then
             updated=1
         fi
         only_hash=`echo $cur_rev | awk -F " " '{print $1}'`
         rev_signature="${rev_signature}_${only_hash}"
     done
 
     if [ "$updated" = "1" ]; then
-        make_allthethings
+        set -ex
+        source $workdir/venv/bin/activate
+        echo "making all the things!"
+        (
+        cd $repos_dir/buildbot-configs
+        log="$publishing_path/allthethings.`date +%Y%m%d%H%M%S`.log"
+        # Generate allthethings.json
+        bash $dump_script $allthethings 2>&1 > $log || \
+            echo "Failed to generate allthething.json. Check $log"
+        )
         previous_file="$publishing_path/allthethings.json"
         new_file="$publishing_path/allthethings.${rev_signature}.json"
         # Publish new file
         cp $allthethings $new_file
         # Generate differences with the previous allthethings.json
         $repos_dir/braindump/buildbot-related/diff_allthethings.py \
            $previous_file $new_file > \
            $publishing_path/allthethings.${rev_signature}.txt
         # Overwrite the previous allthethings.json
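
The loop in the hunk above is where file names such as `allthethings._f097c63f247d_d4a8958176af_cdc5503e75c8.json` come from. A minimal sketch, with the `hg -R $d id` calls replaced by hard-coded strings (an assumption: which hash belongs to which repository, and the trailing ` tip` tag, are illustrative only):

```shell
rev_signature=''
# Stand-ins for the output of `hg -R $d id` in each of the three repositories.
for cur_rev in "f097c63f247d tip" "d4a8958176af tip" "cdc5503e75c8 tip"; do
    # Keep only the first whitespace-separated field (the changeset hash).
    only_hash=$(echo "$cur_rev" | awk '{print $1}')
    rev_signature="${rev_signature}_${only_hash}"
done
new_file="allthethings.${rev_signature}.json"
echo "$new_file"
```

Because every hash is prefixed with `_`, the signature itself starts with an underscore, which is why the published names contain `allthethings._`.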
Flags: needinfo?(armenzg)
I think calling $dump_script without prepending "bash" could have been the issue as this has been working.
I've made some more code changes to make the code a bit better.
I will let this be tested for a bit longer before I commit the code and change back to emailing releng.

Logs are also being produced:
https://secure.pub.build.mozilla.org/builddata/reports/allthethings.20151230223007.log
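
On the guess above that calling `$dump_script` without `bash` was the culprit: the thread does not confirm the cause. One way a direct invocation can fail while `bash script` succeeds is a missing execute bit. A small sketch using a throwaway script (the file and its contents are hypothetical):

```shell
script=$(mktemp)
printf 'echo "ran"\n' > "$script"
chmod 644 "$script"          # readable, but not executable

# Direct invocation needs the execute bit, so this fails.
direct_status=0
"$script" > /dev/null 2>&1 || direct_status=$?

# Passing the file to bash only requires read permission.
via_bash=$(bash "$script")

rm -f "$script"
echo "direct exit status: $direct_status; via bash: $via_bash"
```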


diff --git a/community/generate_allthethings_json.sh b/community/generate_allthethings_json.sh
--- a/community/generate_allthethings_json.sh
+++ b/community/generate_allthethings_json.sh
@@ -25,12 +25,15 @@ repos_dir="$workdir/repos"
 script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
 cd $script_dir
 
 function make_allthethings() {
+    source $workdir/venv/bin/activate
+    echo "making all the things!"
+    (
     cd $repos_dir/buildbot-configs
-    source $workdir/venv/bin/activate
     # Generate allthethings.json
-    $dump_script 2>&1 | grep -v "[loading|skipping]"
+    bash $dump_script $allthethings
+    )
 }
 
 # If we're executing this in cruncher we don't need to call
 # setup_buildbot_environment.sh (since we've already done so)
@@ -40,9 +43,8 @@ if [ -d /var/www/html/builds/ ]; then
     # Logic borrowed from catlee
     updated=0
     rev_signature=''
     for d in buildbot-configs buildbotcustom tools; do
-        t=$(mktemp)
         hg -R $d pull -q
         prev_rev=$(hg -R $d id)
         hg -R $d update -q
         cur_rev=$(hg -R $d id)
@@ -53,9 +55,12 @@ if [ -d /var/www/html/builds/ ]; then
         rev_signature="${rev_signature}_${only_hash}"
     done
 
     if [ "$updated" = "1" ]; then
-        make_allthethings
+        log="$publishing_path/allthethings.`date +%Y%m%d%H%M%S`.log"
+        # Generate allthethings.json
+        make_allthethings 2>&1 > $log || echo "Failed to generate allthething.json. Check $log"
+
         previous_file="$publishing_path/allthethings.json"
         new_file="$publishing_path/allthethings.${rev_signature}.json"
         # Publish new file
         cp $allthethings $new_file
Blocks: 1237711
https://hg.mozilla.org/build/braindump/rev/47ecad7037b0d9be6f3de2cb4c7339400b84dce2
Bug 1232080 - Log allthething.json creation issues + call script with bash
I landed the fix.
The logs are being generated.
I've added releng's email back again.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
This was backed out in bug 1238800.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: Teach tryextender that 10.10.2/"10.10" jobs don't run on trunk trees, including try → Add logging to allthethings.json generation (was Teach tryextender that 10.10.2/"10.10" jobs don't run on trunk trees, including try)
This has no rush to be reviewed.

The only fix is to add || exit=1

Everything else makes sure that issues in generating allthethings.json now notify releng and create logs.

Without the change, the exit code is 0. [1]
After the change, the exit code is 1. [2]
Even the exit code of generate_allthethings_json.sh is 1 [3]

[1]
[buildduty@cruncher.srv.releng.scl3 ~]$ cd ~/.mozilla/releng/repos/buildbot-configs/
[buildduty@cruncher.srv.releng.scl3 buildbot-configs]$ source ../../venv/bin/activate
(venv)[buildduty@cruncher.srv.releng.scl3 buildbot-configs]$ ~/.mozilla/releng/repos/braindump/buildbot-related/dump_allthethings.sh 2>&1 > /dev/null; echo $?
2016-01-28 12:53:47,122 - loading test-output/bm01-tests1-linux32/master.cfg
2016-01-28 12:53:49,296 - Couldn't load test-output/bm01-tests1-linux32/master.cfg
Traceback (most recent call last):
  File "/home/buildduty/.mozilla/releng/repos/braindump/buildbot-related/dump_master_json.py", line 111, in dump_master
    c = loadMaster(path)
  File "/home/buildduty/.mozilla/releng/repos/braindump/buildbot-related/dump_master_json.py", line 26, in loadMaster
    execfile(path, g, g)
  File "/home/buildduty/.mozilla/releng/repos/buildbot-configs/test-output/bm01-tests1-linux32/master.cfg", line 13, in <module>
    import mobile_config
  File "/home/buildduty/.mozilla/releng/repos/buildbot-configs/test-output/bm01-tests1-linux32/mobile_config.py", line 3461, in <module>
    loadSkipConfig(BRANCHES, "mobile")
  File "/home/buildduty/.mozilla/releng/repos/buildbot-configs/test-output/bm01-tests1-linux32/config_seta.py", line 141, in loadSkipConfig
    define_configs(b, platforms, BRANCHES)
  File "/home/buildduty/.mozilla/releng/repos/buildbot-configs/test-output/bm01-tests1-linux32/config_seta.py", line 121, in define_configs
    platform = seta_platforms[p][0]
KeyError: 'android-4-3-armv7-api11'
Traceback (most recent call last):
  File "/home/buildduty/.mozilla/releng/repos/braindump/buildbot-related/dump_master_json.py", line 165, in <module>
    main()
  File "/home/buildduty/.mozilla/releng/repos/braindump/buildbot-related/dump_master_json.py", line 146, in main
    dump = dump_master(args.masters[0])
  File "/home/buildduty/.mozilla/releng/repos/braindump/buildbot-related/dump_master_json.py", line 111, in dump_master
    c = loadMaster(path)
  File "/home/buildduty/.mozilla/releng/repos/braindump/buildbot-related/dump_master_json.py", line 26, in loadMaster
    execfile(path, g, g)
  File "/home/buildduty/.mozilla/releng/repos/buildbot-configs/test-output/bm01-tests1-linux32/master.cfg", line 13, in <module>
    import mobile_config
  File "/home/buildduty/.mozilla/releng/repos/buildbot-configs/test-output/bm01-tests1-linux32/mobile_config.py", line 3461, in <module>
    loadSkipConfig(BRANCHES, "mobile")
  File "/home/buildduty/.mozilla/releng/repos/buildbot-configs/test-output/bm01-tests1-linux32/config_seta.py", line 141, in loadSkipConfig
    define_configs(b, platforms, BRANCHES)
  File "/home/buildduty/.mozilla/releng/repos/buildbot-configs/test-output/bm01-tests1-linux32/config_seta.py", line 121, in define_configs
    platform = seta_platforms[p][0]
KeyError: 'android-4-3-armv7-api11'
Traceback (most recent call last):
  File "/home/buildduty/.mozilla/releng/repos/braindump/buildbot-related/dump_master_json.py", line 165, in <module>
    main()
  File "/home/buildduty/.mozilla/releng/repos/braindump/buildbot-related/dump_master_json.py", line 150, in main
    dumps = pool.map(worker, args.masters)
  File "/tools/python27/lib/python2.7/multiprocessing/pool.py", line 227, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/tools/python27/lib/python2.7/multiprocessing/pool.py", line 528, in get
    raise self._value
ValueError: No JSON object could be decoded
2016-01-28 12:53:49,440 - loading test-output/bm70-build1/master.cfg
0

[2] 
(venv)[buildduty@cruncher.srv.releng.scl3 buildbot-configs]$ ~/.mozilla/releng/repos/braindump/buildbot-related/dump_allthethings.sh 2>&1 > /dev/null; echo $?
2016-01-28 12:52:54,509 - loading test-output/bm01-tests1-linux32/master.cfg
2016-01-28 12:52:58,173 - Couldn't load test-output/bm01-tests1-linux32/master.cfg
Traceback (most recent call last):
  File "/home/buildduty/.mozilla/releng/repos/braindump/buildbot-related/dump_master_json.py", line 111, in dump_master
    c = loadMaster(path)
  File "/home/buildduty/.mozilla/releng/repos/braindump/buildbot-related/dump_master_json.py", line 26, in loadMaster
    execfile(path, g, g)
  File "/home/buildduty/.mozilla/releng/repos/buildbot-configs/test-output/bm01-tests1-linux32/master.cfg", line 13, in <module>
    import mobile_config
  File "/home/buildduty/.mozilla/releng/repos/buildbot-configs/test-output/bm01-tests1-linux32/mobile_config.py", line 3461, in <module>
    loadSkipConfig(BRANCHES, "mobile")
  File "/home/buildduty/.mozilla/releng/repos/buildbot-configs/test-output/bm01-tests1-linux32/config_seta.py", line 141, in loadSkipConfig
    define_configs(b, platforms, BRANCHES)
  File "/home/buildduty/.mozilla/releng/repos/buildbot-configs/test-output/bm01-tests1-linux32/config_seta.py", line 121, in define_configs
    platform = seta_platforms[p][0]
KeyError: 'android-4-3-armv7-api11'
Traceback (most recent call last):
  File "/home/buildduty/.mozilla/releng/repos/braindump/buildbot-related/dump_master_json.py", line 165, in <module>
    main()
  File "/home/buildduty/.mozilla/releng/repos/braindump/buildbot-related/dump_master_json.py", line 146, in main
    dump = dump_master(args.masters[0])
  File "/home/buildduty/.mozilla/releng/repos/braindump/buildbot-related/dump_master_json.py", line 111, in dump_master
    c = loadMaster(path)
  File "/home/buildduty/.mozilla/releng/repos/braindump/buildbot-related/dump_master_json.py", line 26, in loadMaster
    execfile(path, g, g)
  File "/home/buildduty/.mozilla/releng/repos/buildbot-configs/test-output/bm01-tests1-linux32/master.cfg", line 13, in <module>
    import mobile_config
  File "/home/buildduty/.mozilla/releng/repos/buildbot-configs/test-output/bm01-tests1-linux32/mobile_config.py", line 3461, in <module>
    loadSkipConfig(BRANCHES, "mobile")
  File "/home/buildduty/.mozilla/releng/repos/buildbot-configs/test-output/bm01-tests1-linux32/config_seta.py", line 141, in loadSkipConfig
    define_configs(b, platforms, BRANCHES)
  File "/home/buildduty/.mozilla/releng/repos/buildbot-configs/test-output/bm01-tests1-linux32/config_seta.py", line 121, in define_configs
    platform = seta_platforms[p][0]
KeyError: 'android-4-3-armv7-api11'
Traceback (most recent call last):
  File "/home/buildduty/.mozilla/releng/repos/braindump/buildbot-related/dump_master_json.py", line 165, in <module>
    main()
  File "/home/buildduty/.mozilla/releng/repos/braindump/buildbot-related/dump_master_json.py", line 150, in main
    dumps = pool.map(worker, args.masters)
  File "/tools/python27/lib/python2.7/multiprocessing/pool.py", line 227, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/tools/python27/lib/python2.7/multiprocessing/pool.py", line 528, in get
    raise self._value
ValueError: No JSON object could be decoded
2016-01-28 12:52:58,333 - loading test-output/bm70-build1/master.cfg
1
[3]
[buildduty@cruncher.srv.releng.scl3 braindump]$ ./community/generate_allthethings_json.sh ; echo $?
Failed to generate allthething.json. Check /var/www/html/builds/allthethings/allthethings.20160128125921.log
1
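
The difference between [1] and [2] can be reproduced in miniature. In this sketch `failing_step` is a hypothetical stand-in for the `dump_master_json.py` call, and both functions mimic a script that ends with `exit $exit`; initialising `exit=0` is part of the sketch, not taken from dump_allthethings.sh:

```shell
# Hypothetical stand-in for a step that fails partway through the script.
failing_step() { return 3; }

# Before the fix: the failure is never recorded, so the final
# "exit $exit" still reports success.
run_without_tracking() (
    exit=0
    failing_step
    exit $exit
)

# After the fix: "|| exit=1" records the failure for the final exit.
run_with_tracking() (
    exit=0
    failing_step || exit=1
    exit $exit
)

if run_without_tracking; then echo "without tracking: success reported"; fi
if ! run_with_tracking; then echo "with tracking: failure reported"; fi
```

The bodies use `( ... )` so that `exit` only leaves the subshell, which lets both variants run side by side in one shell session.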
Attachment #8713328 - Flags: review?(bugspam.Callek)
Comment on attachment 8713328 [details] [diff] [review]
fix dump_master_json.py call not setting the exit code + allthethings.json generation improvements

Review of attachment 8713328 [details] [diff] [review]:
-----------------------------------------------------------------

I'm anticipating breakage, but that's also because I've seen this break in weird ways.

::: community/generate_allthethings_json.sh
@@ +28,5 @@
>  
>  function make_allthethings() {
> +    source $workdir/venv/bin/activate
> +    echo "making all the things!"
> +    (

why this change (sourcing a venv and then throwing us into a subshell?)

@@ +34,2 @@
>      # Generate allthethings.json
> +    $dump_script $allthethings 2>&1 || exit 1

does the `| grep -v` omission matter here? is that output no longer needing to be stripped?
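
For context on the `grep -v` question: the pattern in the removed line, `"[loading|skipping]"`, is a bracket expression, so it matches any single one of those characters rather than either word. A short sketch with made-up log lines (the sample text is hypothetical):

```shell
# Hypothetical sample of the kind of lines the dump script prints.
sample() {
    printf '%s\n' \
        "loading master.cfg" \
        "skipping universal master" \
        "KeyError: android" \
        "12345"
}

# Bracket expression: drops every line containing any one of the characters
# l, o, a, d, i, n, g, |, s, k, p. The KeyError line is dropped as well.
bracket_result=$(sample | grep -v "[loading|skipping]")

# Alternation (extended regex): drops only lines containing either word.
alternation_result=$(sample | grep -Ev "loading|skipping")

echo "bracket:     $bracket_result"
echo "alternation: $alternation_result"
```

Separately, without `pipefail` a pipeline's exit status is that of its last command, so `$dump_script 2>&1 | grep -v ...` reports grep's status rather than the dump script's.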
Attachment #8713328 - Flags: review?(bugspam.Callek) → review+
(In reply to Justin Wood (:Callek) from comment #20)
> Comment on attachment 8713328 [details] [diff] [review]
> fix dump_master_json.py call not setting the exit code + allthethings.json
> generation improvements
> 
> Review of attachment 8713328 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I'm anticipating breakage, but that's also because I've seen this break in
> weird ways.
> 
> ::: community/generate_allthethings_json.sh
> @@ +28,5 @@
> >  
> >  function make_allthethings() {
> > +    source $workdir/venv/bin/activate
> > +    echo "making all the things!"
> > +    (
> 
> why this change (sourcing a venv and then throwing us into a subshell?)
> 
TBH I carried that code over from catlee's original script because, since my change on Nov. 28th, allthethings.json had not been generated for a while.
I decided to bring my code as close as possible to his without questioning it.
It's very hard to determine why things break when generating this file.

> @@ +34,2 @@
> >      # Generate allthethings.json
> > +    $dump_script $allthethings 2>&1 || exit 1
> 
> does the `| grep -v` omission matter here? is that output no longer needing
> to be stripped?

When we call this function, we already put the output in a log.
It is *only* when the call fails that we output something.
> +        make_allthethings 2>&1 > $log || echo "Failed to generate allthething.json. Check $log"
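
A note on the order of redirections in the line quoted above: the shell processes them left to right, so `2>&1 > $log` sends stdout to the log while stderr stays on the original stdout. That matches the transcripts in [1] and [2], where the tracebacks still appear despite `> /dev/null`. Whether this split was intended is not stated in the thread. A sketch with a hypothetical `noisy` command:

```shell
# Hypothetical command that writes one line to stdout and one to stderr.
noisy() {
    echo "to-stdout"
    echo "to-stderr" >&2
}

log=$(mktemp)

# Order used in the patch: stderr is duplicated onto the current stdout
# first, and only then is stdout pointed at the log. stderr is not logged.
captured_a=$(noisy 2>&1 > "$log")
logged_a=$(cat "$log")

# Reversed order: stdout goes to the log first, then stderr follows it there.
captured_b=$(noisy > "$log" 2>&1)
logged_b=$(cat "$log")

rm -f "$log"
echo "patch order: terminal=[$captured_a] log=[$logged_a]"
echo "reversed:    terminal=[$captured_b] log=[$logged_b]"
```

With the patch's order, error text still reaches whoever reads the cron output, but the log named in the "Check $log" message holds only stdout.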
Callek, I won't land this until tomorrow.
At that point, would you mind helping me with retriggering the latest travis run for buildbot-configs?

To make sure we don't hit bug 1238800.

I will run it locally after landing before asking you.
https://hg.mozilla.org/build/braindump/rev/b976c403fedd1fc88f532186fd90b6bb46099409
Bug 1232080 - dump_master_json.py should set exit to failure when failing + generate_allthethings_json.sh improvements. r=Callek
Travis for buildbot-configs seems happy:
https://travis-ci.org/armenzg/build-buildbot-configs/builds/105681891
Status: REOPENED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED