Closed Bug 907246 Opened 11 years ago Closed 11 years ago

unable to parse log_url for l10n report

Categories

(Release Engineering :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Unassigned)

Details

12:08 Callek: .... is there any issues with the age of job results on tbpl right now
12:08 edmorley: Callek: nagios alerts for 4 hour builds perchance?
12:09 edmorley: looks up to date to me, bad guess
12:09 philor: they seemed sort of laggy yesterday, suddenly getting tons of results, but not enough to pin down
12:09 Callek: edmorley: no, but I'm seeing cron spam for a related service that is identical to cron spam that caused a 4-hour issue to show its head to me last time I saw it
12:09 Callek: Cron <buildapi@buildapi01> /home/buildapi/bin/report-today.sh
12:09 edmorley: ah
12:10 philor: yeah, they're lagging
12:11 catlee: ah, some busted property again?
12:11 Callek: catlee: almost certainly
12:12 Callek is staying hands off other than reporting the issue, unless asked to step in and investigate
12:14 armenzg_buildduty: Callek: I don't know what is going on
12:14 Callek: armenzg_buildduty: step 1 is looking at buildapi logs
12:14 Callek: :-)
12:14 armenzg_buildduty: no clue where those are
12:19 Callek: armenzg_buildduty: https://wiki.mozilla.org/ReleaseEngineering/BuildAPI#Maintenance
12:21 armenzg_buildduty: thanks Callek
Assignee: armenzg → bugspam.Callek
Cron has been coming every 15 mins since 6am PT:

Date: Tue, 20 Aug 2013 06:00:55 -0700
Message-Id: <201308201300.r7KD0teT011643@buildapi01.build.scl1.mozilla.com>
From: root@buildapi01.build.scl1.mozilla.com (Cron Daemon)
To: release@mozilla.com
Subject: Cron <buildapi@buildapi01> /home/buildapi/bin/report-today.sh
Content-Type: text/plain; charset=UTF-8
Auto-Submitted: auto-generated
X-Cron-Env: <MAILTO=release@mozilla.com>
X-Cron-Env: <SHELL=/bin/sh>
X-Cron-Env: <HOME=/home/buildapi>
X-Cron-Env: <PATH=/usr/bin:/bin>
X-Cron-Env: <LOGNAME=buildapi>
X-Cron-Env: <USER=buildapi>

'log_url' {'slave_id': 1331, 'master_id': 99, 'builder_id': 145000, 'request_ids': [28233507], 'id': 26765853, 'buildnumber': 2249, 'reason': 'This build was triggered by the successful completion of the en-US nightly.', 'requesttime': 1376998362, 'result': 4, 'starttime': 1377001747, 'endtime': 1377002973, 'properties': {'completeMarFilename': 'firefox-25.0a2.fa.win32.complete.mar', 'appName': 'Firefox', 'locale': 'fa', 'forced_clobber': False, 'l10n_revision': 'c1b4d82f45f9', 'slavebuilddir': 'm-aurora-w32-l10n-ntly-0000000', 'previous_buildid': '20130819004005', 'previousMarFilename': 'firefox-25.0a2.fa.win32.complete.mar', 'slavename': 'w64-ix-slave06', 'project': '', 'make.py[0]:': 'Leaving', 'builddir': 'm-aurora-w32-l10n-ntly-0000000', 'filepath': None, 'en_revision': 'default', 'basedir': 'e:/builds/moz2_slave/m-aurora-w32-l10n-ntly-0000000', 'completesnippetFilename': 'build/mozilla-aurora/obj-firefox/dist/update/complete.update.snippet', 'appVersion': '25.0a2', 'pur
 ge_actual': '79.45GB', 'purge_target': '3GB', 'platform': 'win32', 'master': 'http://buildbot-master62.srv.releng.use1.mozilla.com:8001/', 'branch': 'mozilla-aurora', 'fx_revision': '4167b7fca818', 'revision': None, 'product': 'firefox', 'repository': '', 'buildername': 'Firefox mozilla-aurora win32 l10n nightly', 'buildid': '20130820004003', 'hashType': 'sha512', 'previous_inipath': 'previous/application.ini', 'scheduler': 'Firefox mozilla-aurora win32 l10n nightly', 'inipath': 'dist/l10n-stage/firefox/application.ini', 'purged_clobber': False, 'buildnumber': 2249, 'periodic_clobber': False, 'builduid': 'cb0245a9bc7e46eab7a655c38ce437b3'}}

---
and
---

Date: Tue, 20 Aug 2013 09:15:58 -0700
Message-Id: <201308201615.r7KGFwFK018350@buildapi01.build.scl1.mozilla.com>
From: root@buildapi01.build.scl1.mozilla.com (Cron Daemon)
To: release@mozilla.com
Subject: Cron <buildapi@buildapi01> /home/buildapi/bin/report-today.sh
Content-Type: text/plain; charset=UTF-8
Auto-Submitted: auto-generated
X-Cron-Env: <MAILTO=release@mozilla.com>
X-Cron-Env: <SHELL=/bin/sh>
X-Cron-Env: <HOME=/home/buildapi>
X-Cron-Env: <PATH=/usr/bin:/bin>
X-Cron-Env: <LOGNAME=buildapi>
X-Cron-Env: <USER=buildapi>

'log_url' {'slave_id': 1331, 'master_id': 99, 'builder_id': 145000, 'request_ids': [28233507], 'id': 26765853, 'buildnumber': 2249, 'reason': 'This build was triggered by the successful completion of the en-US nightly.', 'requesttime': 1376998362, 'result': 4, 'starttime': 1377001747, 'endtime': 1377002973, 'properties': {'completeMarFilename': 'firefox-25.0a2.fa.win32.complete.mar', 'appName': 'Firefox', 'locale': 'fa', 'forced_clobber': False, 'l10n_revision': 'c1b4d82f45f9', 'slavebuilddir': 'm-aurora-w32-l10n-ntly-0000000', 'previous_buildid': '20130819004005', 'previousMarFilename': 'firefox-25.0a2.fa.win32.complete.mar', 'slavename': 'w64-ix-slave06', 'project': '', 'make.py[0]:': 'Leaving', 'builddir': 'm-aurora-w32-l10n-ntly-0000000', 'filepath': None, 'en_revision': 'default', 'basedir': 'e:/builds/moz2_slave/m-aurora-w32-l10n-ntly-0000000', 'completesnippetFilename': 'build/mozilla-aurora/obj-firefox/dist/update/complete.update.snippet', 'appVersion': '25.0a2', 'pur
 ge_actual': '79.45GB', 'purge_target': '3GB', 'platform': 'win32', 'master': 'http://buildbot-master62.srv.releng.use1.mozilla.com:8001/', 'branch': 'mozilla-aurora', 'fx_revision': '4167b7fca818', 'revision': None, 'product': 'firefox', 'repository': '', 'buildername': 'Firefox mozilla-aurora win32 l10n nightly', 'buildid': '20130820004003', 'hashType': 'sha512', 'previous_inipath': 'previous/application.ini', 'scheduler': 'Firefox mozilla-aurora win32 l10n nightly', 'inipath': 'dist/l10n-stage/firefox/application.ini', 'purged_clobber': False, 'buildnumber': 2249, 'periodic_clobber': False, 'builduid': 'cb0245a9bc7e46eab7a655c38ce437b3'}}
'log_url' {'slave_id': 1322, 'master_id': 99, 'builder_id': 145708, 'request_ids': [28233648], 'id': 26766628, 'buildnumber': 521, 'reason': 'This build was triggered by the successful completion of the en-US nightly.', 'requesttime': 1376998422, 'result': 4, 'starttime': 1377002846, 'endtime': 1377004336, 'properties': {'locale': 'et', 'forced_clobber': False, 'l10n_revision': '2e1246c9b33b', 'buildnumber': 521, 'make.py[0]:': 'Leaving', 'builddir': 'tb-c-cen-w32-l10n-ntly-0000000', 'en_revision': 'default', 'basedir': 'e:/builds/moz2_slave/tb-c-cen-w32-l10n-ntly-0000000', 'purge_actual': '92.65GB', 'purge_target': '3GB', 'platform': 'win32', 'master': 'http://buildbot-master62.srv.releng.use1.mozilla.com:8001/', 'branch': 'comm-central', 'revision': None, 'product': 'thunderbird', 'repository': '', 'buildername': 'Thunderbird comm-central win32 l10n nightly', 'buildid': '20130820030202', 'hashType': 'sha512', 'scheduler': 'Thunderbird comm-central win32 l10n nightly', 'moz_
 revision': 'd8b7195b63c9', 'purged_clobber': False, 'project': '', 'periodic_clobber': False, 'comm_revision': 'e6f324a5b474', 'builduid': '1086fa1f355b422a9043544b176a5a38'}}
'log_url' {'slave_id': 1323, 'master_id': 86, 'builder_id': 145715, 'request_ids': [28233671], 'id': 26766752, 'buildnumber': 390, 'reason': 'This build was triggered by the successful completion of the en-US nightly.', 'requesttime': 1376998434, 'result': 4, 'starttime': 1377003569, 'endtime': 1377004516, 'properties': {'hashType': 'sha512', 'builddir': 'tb-c-cen-w32-l10n-ntly-0000000', 'buildername': 'Thunderbird comm-central win32 l10n nightly', 'buildid': '20130820030202', 'basedir': 'e:/builds/moz2_slave/tb-c-cen-w32-l10n-ntly-0000000', 'forced_clobber': True, 'buildnumber': 390, 'en_revision': 'default', 'l10n_revision': 'df217fde5bda', 'platform': 'win32', 'branch': 'comm-central', 'periodic_clobber': False, 'master': 'http://buildbot-master63.srv.releng.use1.mozilla.com:8001/', 'locale': 'sv-SE', 'builduid': '1086fa1f355b422a9043544b176a5a38'}}
'log_url' {'slave_id': 1312, 'master_id': 99, 'builder_id': 145376, 'request_ids': [28239813], 'id': 26771180, 'buildnumber': 895, 'reason': 'This build was triggered by the successful completion of the en-US nightly.', 'requesttime': 1377007055, 'result': 4, 'starttime': 1377008792, 'endtime': 1377011318, 'properties': {'completeMarFilename': 'firefox-26.0a1.ja.win32.complete.mar', 'appName': 'Firefox', 'locale': 'ja', 'forced_clobber': True, 'l10n_revision': '1d668e57d1d0', 'previous_buildid': '20130818030219', 'previousMarFilename': 'firefox-26.0a1.ja.win32.complete.mar', 'project': '', 'make.py[0]:': 'Leaving', 'builddir': 'm-cen-w32-l10n-ntly-0000000000', 'filepath': None, 'en_revision': 'default', 'basedir': 'e:/builds/moz2_slave/m-cen-w32-l10n-ntly-0000000000', 'appVersion': '26.0a1', 'purge_actual': '102.19GB', 'purge_target': '3GB', 'platform': 'win32', 'master': 'http://buildbot-master62.srv.releng.use1.mozilla.com:8001/', 'branch': 'mozilla-central', 'fx_revision':
  'bb025b6949e8', 'revision': None, 'product': 'firefox', 'repository': '', 'buildername': 'Firefox mozilla-central win32 l10n nightly', 'buildid': '20130820030206', 'hashType': 'sha512', 'previous_inipath': 'previous/application.ini', 'scheduler': 'Firefox mozilla-central win32 l10n nightly', 'inipath': 'dist/l10n-stage/firefox/application.ini', 'purged_clobber': False, 'buildnumber': 895, 'periodic_clobber': False, 'builduid': 'f8dc29e95ec6469ea0ccea4aaddd472c'}}
which is from the keyError in http://hg.mozilla.org/build/buildapi/annotate/61965c9008a4/buildapi/scripts/l10n_report.py#l55

So, this is unable to find log_url

It doesn't actually exist in the build properties (but no idea why). It is meant to be added in at http://mxr.mozilla.org/build/source/buildbotcustom/bin/postrun.py#299

The buildbot job for this is http://buildbot-master62.srv.releng.use1.mozilla.com:8001/builders/Firefox%20mozilla-aurora%20win32%20l10n%20nightly/builds/2249

With an actually uploaded log at http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2013/08/2013-08-20-00-40-03-mozilla-aurora-l10n/mozilla-aurora-win32-l10n-nightly-fa-bm62-build1-build2249.txt.gz


Maybe related is the pymake-breakage with the property setting at http://buildbot-master62.srv.releng.use1.mozilla.com:8001/builders/Firefox%20mozilla-aurora%20win32%20l10n%20nightly/builds/2249/steps/mock-setproperty/logs/stdio which is setting a property named 'make[0]:' to the value 'Leaving'

Since this is now at a point where its clear its not an ongoing issue (despite cron spam) and I have no idea where to go next, I'm dropping it from my radar, and moving to tools
Assignee: bugspam.Callek → nobody
Component: Buildduty → Tools
QA Contact: armenzg → hwine
Summary: buildapi is lagging → unable to parse log_url for l10n report
Axel, thought you would want to be aware of this bug...
So, nthomas responded to the cron e-mail with a solution [1]. And I then applied it to the remaining items (since he mentally spaced that there was more than one that was broken).

My output is here:

[root@buildapi01 buildapi]# telnet redis01.build.mozilla.org 6379
Trying 10.12.48.24...
Connected to redis01.build.mozilla.org (10.12.48.24).
Escape character is '^]'.
DEL builds:26766628
:1
DEL builds:26766752
:1
DEL builds:26771180
:1
^]
telnet> quit


[1] Nicks E-mail:

Redis fail again, the property does exist but redis doesn't know about
it. The magic number to identify this build is the id property 26765853.

mysql> select p.* from properties as p join build_properties as bp on
    -> bp.property_id = p.id where bp.build_id=26765853 and
p.name='log_url';
+----------+---------+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id       | name    | source     | value

                                                               |
+----------+---------+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 69683666 | log_url | postrun.py |
"http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2013/08/2013-08-20-00-40-03-mozilla-aurora-l10n/mozilla-aurora-win32-l10n-nightly-fa-bm62-build1-build2249.txt.gz"
|
+----------+---------+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

FIX:
$ telnet redis01.build.mozilla.org 6379
Trying 10.12.48.24...
Connected to redis01.build.scl1.mozilla.com.
Escape character is '^]'.
DEL builds:26765853
:1

-Nick
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
There might be a race condition at the root of this. The master is uploading the log to ftp, and adding the build to the buildbot statusdb with the extra log_url property. Meanwhile buildapi01 is creating 
 https://secure.pub.build.mozilla.org/builddata/buildjson/builds-4hr.js.gz
every 60 seconds, populating redis as a cache as it goes. catlee would know if that's plausible.
BTW, report_l10n.py bustage doesn't affect the generation of any js.gz files, and anyway is on a different cron job from builds-4hr.js.gz. To show any connection between them we'd have to see excessive load causing lag, or something.
We hit another one of these today, a single instance, so I looked a little more.

After a build finishes we end up in
  http://hg.mozilla.org/build/buildbotcustom/file/default/bin/postrun.py#l286
The log gets uploaded in the first branch of the if, then the build is added to the status db in the elif. The db insert looks like a transaction so it really ought to be all or nothing, so this looks less like a race.

And yet, if you look at a weeks worth of the dated json files in https://secure.pub.build.mozilla.org/builddata/buildjson/, you can extract this:
   Date              # of jobs   # without log_url property
   2013-08-18           12992          7
   2013-08-19           34295          7
   2013-08-20           53662        119
   2013-08-21           50284          5
   2013-08-22           54371          2
   2013-08-23           48271          1
   2013-08-24           27748          4
It's lots of different types of builds, not just l10n, but we report the l10n case. And also, all the failures occur when the master is in use1.mozilla.com. So perhaps related to increased latency back to SCL3.
That's plausible. I believe the properties are added to the DB one by one, and so it would be possible for the report to see partial build information.
Component: Tools → General
You need to log in before you can comment on or make changes to this bug.