Closed Bug 1252062 Opened 4 years ago Closed 4 years ago

Investigate BBB issues

Categories

(Release Engineering :: Release Automation: Other, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rail, Assigned: jlund)

References

Details

https://tools.taskcluster.net/task-graph-inspector/#RzZ5FBeISiCaLboJcS1SBg/ shows a lot of pending BBB jobs, but I believe they have been finished, but have never reported the result back.
Callek, if you have time can you peek at this?

We have a lot of scheduled l10n jobs. I believe they are all done (selfserve and s3 tell me this), but the graph still thinks they are pending...
Flags: needinfo?(bugspam.Callek)
(Live notes:)

Looking at:

https://tools.taskcluster.net/task-graph-inspector/#1-qNGWPKTAOv7vx4wmGymg/dORzv5-WQ4SGeXOXBDBYUg/

To start...

Which specifies its artifact task of https://tools.taskcluster.net/task-graph-inspector/#1-qNGWPKTAOv7vx4wmGymg/vX12LuJOSju30SbuuSgfkw/

The artifact task has a rerun listed, and a failed first run [why?]

The actual bbb task has only run0 listed and is not resolved. This is http://buildbot-master72.build.mozilla.org:8001/builders/release-date_firefox_linux64_l10n_repack/builds/38 [found with self serve help: https://secure.pub.build.mozilla.org/buildapi/self-serve/date/build/100917102

Which failed for:

20:30:32     INFO - #####
20:30:32     INFO - ##### DesktopSingleLocale summary:
20:30:32     INFO - #####
20:30:32    ERROR - failure: repack_locale(lij)


Where above in the log is:

20:28:28     INFO -        rmdir "defaults/profile/chrome/"
20:28:28     INFO -      rmrfdir "defaults/profile/US/"
20:28:28     INFO -        rmdir "defaults/profile/"
20:28:28     INFO -        rmdir "defaults/autoconfig/"
20:28:28     INFO -        rmdir "components/"
20:28:28     INFO -        rmdir "chrome/overlayinfo/"
20:28:28     INFO -        rmdir "chrome/icons/default/"
20:28:28     INFO -        rmdir "chrome/icons/"
20:28:28     INFO -        rmdir "chrome/"
20:28:28     INFO -  mv: cannot stat `/builds/slave/rel-date_fx_l64_l10n_rpk-00000/build/date/obj-l10n/dist/l10n-stage/firefox.work/output.mar': No such file or directory
20:28:28     INFO -  Finished
20:28:28     INFO -  python /builds/slave/rel-date_fx_l64_l10n_rpk-00000/build/tools/release/signing/signtool.py #.....XXXCallek Omitted rest of command for brevity
20:28:28    ERROR -  Traceback (most recent call last):
20:28:28     INFO -    File "/builds/slave/rel-date_fx_l64_l10n_rpk-00000/build/tools/release/signing/signtool.py", line 228, in <module>
20:28:28     INFO -      main()
20:28:28     INFO -    File "/builds/slave/rel-date_fx_l64_l10n_rpk-00000/build/tools/release/signing/signtool.py", line 216, in main
20:28:28     INFO -      if not remote_signfile(options, urls, f, fmt, token, dest):
20:28:28     INFO -    File "/builds/slave/rel-date_fx_l64_l10n_rpk-00000/build/tools/lib/python/signing/client.py", line 44, in remote_signfile
20:28:28     INFO -      filehash = sha1sum(filename)
20:28:28     INFO -    File "/builds/slave/rel-date_fx_l64_l10n_rpk-00000/build/tools/lib/python/util/file.py", line 44, in sha1sum
20:28:28     INFO -      fp = open(f, 'rb')
20:28:28     INFO -  IOError: [Errno 2] No such file or directory: '../../dist/update/firefox-47.0.lij.linux-x86_64.complete.mar'
20:28:28    ERROR -  make[2]: *** [complete-patch] Error 1
20:28:28     INFO -  make[2]: Leaving directory `/builds/slave/rel-date_fx_l64_l10n_rpk-00000/build/date/obj-l10n/tools/update-packaging'
20:28:28    ERROR -  make[1]: *** [repackage-zip] Error 2
20:28:28     INFO -  make[1]: Leaving directory `/builds/slave/rel-date_fx_l64_l10n_rpk-00000/build/date/obj-l10n/browser/locales'
20:28:28     INFO -  make: *** [repackage-zip-lij] Error 2
20:28:28    ERROR - Return code: 2
20:28:28    ERROR - make installers-lij failed
20:28:28     INFO - Setting buildbot property lij_failure to failure: repack_locale(lij)
20:28:28     INFO - mkdir: /builds/slave/rel-date_fx_l64_l10n_rpk-00000/properties
20:28:28     INFO - Writing buildbot properties [u'lij_failure'] to /builds/slave/rel-date_fx_l64_l10n_rpk-00000/properties/lij_failure
20:28:28     INFO - Writing to file /builds/slave/rel-date_fx_l64_l10n_rpk-00000/properties/lij_failure
20:28:28     INFO - Contents:
20:28:28     INFO -  lij_failure:failure: repack_locale(lij)
20:28:28    ERROR - failure: repack_locale(lij)
20:28:28  WARNING - setting return code to 1
20:28:28     INFO - #####
20:28:28     INFO - ##### Running taskcluster-upload step.
20:28:28     INFO - #####


I'll look into the reason why bbb didn't mark either task as failed in a moment...
To start out I grepped through all log files, not just tclistener.log

Only mention of the artifact taskId is:

tclistener.log:2016-02-29 19:58:14,220 - bbb.servicebase - Created buildset_property artifactsTaskId=["vX12LuJOSju30SbuuSgfkw", "bbb"]

Which i think is expected....

for the other:
In bbb logs:

tclistener.log:2016-02-29 19:58:14,185 - bbb.servicebase - Fetching task dORzv5-WQ4SGeXOXBDBYUg
tclistener.log-2016-02-29 19:58:14,196 - bbb.servicebase - Created sourcestamp 13517218
tclistener.log-2016-02-29 19:58:14,202 - bbb.servicebase - Created buildset 16293352
tclistener.log-2016-02-29 19:58:14,207 - bbb.servicebase - Created buildset_property product=["firefox", "bbb"]
tclistener.log-2016-02-29 19:58:14,211 - bbb.servicebase - Created buildset_property build_number=[1, "bbb"]
tclistener.log-2016-02-29 19:58:14,215 - bbb.servicebase - Created buildset_property en_us_binary_url=["https://queue.taskcluster.net/v1/task/BtQhtEHtSJCbDEVP91Dmpw/artifacts/public/build", "bbb"]
tclistener.log-2016-02-29 19:58:14,220 - bbb.servicebase - Created buildset_property artifactsTaskId=["vX12LuJOSju30SbuuSgfkw", "bbb"]
tclistener.log-2016-02-29 19:58:14,224 - bbb.servicebase - Created buildset_property version=["47.0b12", "bbb"]
tclistener.log-2016-02-29 19:58:14,228 - bbb.servicebase - Created buildset_property release_promotion=[true, "bbb"]
tclistener.log:2016-02-29 19:58:14,232 - bbb.servicebase - Created buildset_property taskId=["dORzv5-WQ4SGeXOXBDBYUg", "bbb"]
tclistener.log-2016-02-29 19:58:14,235 - bbb.servicebase - Created buildset_property script_repo_revision=["987fdde4def3c14ddefc1c531bf42825615f8b2b", "bbb"]
tclistener.log-2016-02-29 19:58:14,239 - bbb.servicebase - Created buildset_property repo_path=["projects/date", "bbb"]
tclistener.log-2016-02-29 19:58:14,244 - bbb.servicebase - Created buildset_property locales=["id:498881006842 is:fa0da095f651 it:6644e67fa742 ja:f70b8d555701 kk:a69f24844a8d km:339790840ac1 kn:5ab48195d828 ko:413e47793ca6 lij:e9113fac3a93", "bbb"]
tclistener.log-2016-02-29 19:58:14,248 - bbb.servicebase - Created buildset_property revision=["987fdde4def3c14ddefc1c531bf42825615f8b2b", "bbb"]
tclistener.log-2016-02-29 19:58:14,252 - bbb.servicebase - Created buildrequest release-date_firefox_linux64_l10n_repack: 99763849
tclistener.log:2016-02-29 19:58:14,253 - bbb.servicebase - Creating task dORzv5-WQ4SGeXOXBDBYUg
tclistener.log-2016-02-29 19:58:14,264 - requests.packages.urllib3.connectionpool - Starting new HTTPS connection (1): queue.taskcluster.net




And looks like the other logs stalled:

==> bblistener.log <==
2016-02-28 04:11:19,019 - bbb.services - WEIRD: Task not found for brid 99556626, nothing to do.
2016-02-28 04:11:19,221 - bbb.services - WEIRD: Task not found for brid 99555843, nothing to do.
2016-02-28 04:11:26,187 - bbb.services - WEIRD: Task not found for brid 99553827, nothing to do.
2016-02-28 04:11:26,191 - bbb.services - WEIRD: Task not found for brid 99553897, nothing to do.
2016-02-28 04:11:27,565 - bbb.services - WEIRD: Task not found for brid 99554096, nothing to do.
2016-02-28 04:11:27,570 - bbb.services - WEIRD: Task not found for brid 99554174, nothing to do.
2016-02-28 04:11:28,645 - bbb.services - WEIRD: Task not found for brid 99554289, nothing to do.
2016-02-28 04:11:28,649 - bbb.services - WEIRD: Task not found for brid 99554356, nothing to do.
2016-02-28 04:11:29,315 - bbb.services - WEIRD: Task not found for brid 99552915, nothing to do.
2016-02-28 04:11:29,362 - bbb.services - WEIRD: Task not found for brid 99555940, nothing to do.

==> reflector.log <==
2016-02-28 05:42:16,850 - bbb.services - Processing task: jN02ZoyhT36iOCWGcV4DbA
2016-02-28 05:42:16,858 - sqlalchemy.pool.QueuePool - Exception during reset or similar
Traceback (most recent call last):
  File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line 567, in _finalize_fairy
    fairy._reset(pool)
  File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line 701, in _reset
    pool._dialect.do_rollback(self)
  File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 2519, in do_rollback
    dbapi_connection.rollback()
ProgrammingError: (2014, "Commands out of sync; you can't run this command now")
(In reply to Justin Wood (:Callek) from comment #4)

> ==> reflector.log <==
> 2016-02-28 05:42:16,850 - bbb.services - Processing task:
> jN02ZoyhT36iOCWGcV4DbA
> 2016-02-28 05:42:16,858 - sqlalchemy.pool.QueuePool - Exception during reset
> or similar
> Traceback (most recent call last):
>   File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line
> 567, in _finalize_fairy
>     fairy._reset(pool)
>   File "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/pool.py", line
> 701, in _reset
>     pool._dialect.do_rollback(self)
>   File
> "/builds/bbb/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py",
> line 2519, in do_rollback
>     dbapi_connection.rollback()
> ProgrammingError: (2014, "Commands out of sync; you can't run this command
> now")

For this last part, c.f. Bug 1250184 c#1
* deleted task with id: jN02ZoyhT36iOCWGcV4DbA mentioned in reflector log (comment 1) from the staging bbb db
* restarted staging bbb services with supervisor.

things look to be good again. will file now some follow up bugs to track something more substantial than this bandaid
Depends on: 1250184
Assignee: nobody → jlund
Flags: needinfo?(bugspam.Callek)
resolving for now and will leave the 1250184 for tracking the long term fix
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.