Try pushes not showing up, hg.mozilla.org/try is broken - CLOSED TREES
Categories
(Developer Services :: Mercurial: hg.mozilla.org, defect)
Tracking
(Not tracked)
People
(Reporter: nataliaCs, Assigned: sheehan)
Cron task failure e-mails have been received:
- task example: https://firefox-ci-tc.services.mozilla.com/tasks/BW2NTBfCQ2WUJ4cOzFr6Ew
- log: https://firefox-ci-tc.services.mozilla.com/tasks/BW2NTBfCQ2WUJ4cOzFr6Ew/runs/0/logs/https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FBW2NTBfCQ2WUJ4cOzFr6Ew%2Fruns%2F0%2Fartifacts%2Fpublic%2Flogs%2Flive.log
2020-10-30 09:47:08,351 - INFO - retry: calling get_push_info with args: (Repository(repo_url='https://hg.mozilla.org/integration/autoland', repository_type='hg', project='autoland', level='3', trust_domain='gecko'),), kwargs: {'branch': 'default'}, attempt #5
2020-10-30 09:47:08,734 - INFO - retry: Giving up on get_push_info
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/build_decision/util/cli.py", line 62, in main
    args.command(vars(args))
  File "/usr/local/lib/python3.8/site-packages/build_decision/cli.py", line 34, in wrapper
    func(args)
  File "/usr/local/lib/python3.8/site-packages/build_decision/cli.py", line 69, in cron
    run(
  File "/usr/local/lib/python3.8/site-packages/build_decision/cron/__init__.py", line 74, in run
    push_info = repository.get_push_info(branch=branch)
  File "/usr/local/lib/python3.8/site-packages/redo/__init__.py", line 230, in _retriable_wrapper
    return retry(func, args=args, kwargs=kwargs, *retry_args, **retry_kwargs)
  File "/usr/local/lib/python3.8/site-packages/redo/__init__.py", line 185, in retry
    return action(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/build_decision/repository.py", line 92, in get_push_info
    raise RetryableError(
build_decision.repository.RetryableError: Changeset default has no associated pushes. Maybe the push log has not been updated?
[taskcluster 2020-10-30 09:47:09.016Z] === Task Finished ===
[taskcluster 2020-10-30 09:47:09.016Z] Unsuccessful task run with exit code: 1 completed in 90.977 seconds
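For readers unfamiliar with the failure mode in the log above, here is a minimal sketch of the retry-then-give-up pattern it shows. This is not the build_decision or redo code: `retry`, `make_fake_get_push_info`, the local `RetryableError` class and the returned `{"pushid": 1}` value are hypothetical stand-ins, and the limit of 5 attempts is only inferred from "attempt #5" being followed by "Giving up".

```python
import time


class RetryableError(Exception):
    """Hypothetical stand-in for build_decision.repository.RetryableError."""


def retry(action, attempts=5, sleeptime=0.0, retry_exceptions=(RetryableError,)):
    """Call action() up to `attempts` times; re-raise the last error on give-up."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_exceptions:
            if attempt == attempts:
                # Corresponds to "retry: Giving up on get_push_info" in the log.
                raise
            time.sleep(sleeptime)


def make_fake_get_push_info(fail_times):
    """Build a fake get_push_info that fails `fail_times` times, then succeeds."""
    calls = {"count": 0}

    def get_push_info():
        calls["count"] += 1
        if calls["count"] <= fail_times:
            # Same message as in the traceback: the pushlog has no entry yet.
            raise RetryableError("Changeset default has no associated pushes.")
        return {"pushid": 1}  # hypothetical payload, for illustration only

    return get_push_info, calls
```

If the pushlog catches up within the retry window, the call succeeds; if it never does, as in this incident, the last `RetryableError` propagates and the cron task exits with code 1.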
Try doesn't seem to show any pushes.
Mentioned on #sheriffs:
there are 9 changesets (3 pushes?) which landed on autoland according to hg pull && hg update && hg id but are not shown
Issue was raised on #vcs
Updated•5 years ago
copied from #vcs chat:
05:02
dhouse: @Aryx the pushdataaggregator-pending.service also shows a change in the logs after/around 08:22. This one may show what the problem is, and it may be a repeat of the problem from bug 1673214: instead of the number of unacked messages changing, it repeatedly says there is 1 unacked message
05:03
Oct 30 08:37:08 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator 1 unacked messages in 1 partition: [6]
Oct 30 08:37:08 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator copying heartbeat-1 from partition 6
Oct 30 08:37:08 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: kafka.producer producer.stop() called, but producer is not async
Oct 30 08:37:09 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator 5 unacked messages in 1 partition: [6]
Oct 30 08:37:09 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator copying heartbeat-1 from partition 6
Oct 30 08:37:09 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator copying heartbeat-1 from partition 6
Oct 30 08:37:09 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator copying heartbeat-1 from partition 6
Oct 30 08:37:09 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator copying heartbeat-1 from partition 6
Oct 30 08:37:09 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator copying heartbeat-1 from partition 6
Oct 30 08:37:09 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: kafka.producer producer.stop() called, but producer is not async
Oct 30 08:37:09 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator 1 unacked messages in 1 partition: [6]
Oct 30 08:37:09 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator copying heartbeat-1 from partition 6
Oct 30 08:37:09 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: kafka.producer producer.stop() called, but producer is not async
Oct 30 08:37:10 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator 1 unacked messages in 1 partition: [6]
Oct 30 08:37:10 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator copying heartbeat-1 from partition 6
Oct 30 08:37:10 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: kafka.producer producer.stop() called, but producer is not async
Oct 30 08:37:10 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator 1 unacked messages in 1 partition: [6]
Oct 30 08:37:10 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator copying heartbeat-1 from partition 6
Oct 30 08:37:10 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: kafka.producer producer.stop() called, but producer is not async
Oct 30 08:37:11 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator 8 unacked messages in 1 partition: [6]
Oct 30 08:37:11 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator copying heartbeat-1 from partition 6
Oct 30 08:37:11 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator copying heartbeat-1 from partition 6
Oct 30 08:37:11 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator copying heartbeat-1 from partition 6
Oct 30 08:37:11 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator copying heartbeat-1 from partition 6
Oct 30 08:37:11 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator copying heartbeat-1 from partition 6
Oct 30 08:37:11 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator copying heartbeat-1 from partition 6
Oct 30 08:37:11 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator copying heartbeat-1 from partition 6
Oct 30 08:37:11 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator copying heartbeat-1 from partition 6
Oct 30 08:37:11 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: kafka.producer producer.stop() called, but producer is not async
Oct 30 08:37:29 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator 1 unacked messages in 1 partition: [6]
Oct 30 08:37:29 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator copying heartbeat-1 from partition 6
Oct 30 08:37:29 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: kafka.producer producer.stop() called, but producer is not async
Oct 30 08:37:36 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator 1 unacked messages in 1 partition: [6]
Oct 30 08:37:36 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator copying heartbeat-1 from partition 6
Oct 30 08:37:36 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: kafka.producer producer.stop() called, but producer is not async
Oct 30 08:38:04 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator 1 unacked messages in 1 partition: [6]
Oct 30 08:38:04 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: vcsreplicator.aggregator copying heartbeat-1 from partition 6
Oct 30 08:38:04 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[24544]: kafka.producer producer.stop() called, but producer is not async
(these last 3 lines then keep repeating, even now after I restarted that service/task)
05:06
it is repeatedly for this "partition 6", whereas earlier messages were for various partitions 0-7, like:
Oct 30 08:25:17 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[10843]: vcsreplicator.aggregator copying heartbeat-1 from partition 6
Oct 30 08:25:17 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[10843]: vcsreplicator.aggregator copying heartbeat-1 from partition 7
Oct 30 08:25:17 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[10843]: kafka.producer producer.stop() called, but producer is not async
Oct 30 08:25:18 hgssh1.dmz.mdc1.mozilla.com pushdataaggregator-pending[10843]: vcsreplicator.aggregator 2 unacked messages in 2 partition: [0, 5]
05:08
I'll look to see if I can find a mapping for the "partition 6" to a server (possibly an aws instance like the failure last weekend)
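The pattern described above (the same single partition reported over and over, instead of a changing set) can be checked mechanically. The following is a rough sketch that assumes the log line format shown in the excerpts; `partition_reports`, `stuck_partition` and the `min_repeats` threshold are hypothetical names and values, not part of vcsreplicator.

```python
import re

# Matches e.g. "vcsreplicator.aggregator 2 unacked messages in 2 partition: [0, 5]"
UNACKED_RE = re.compile(
    r"vcsreplicator\.aggregator (\d+) unacked messages in (\d+) partition: \[([\d, ]*)\]"
)


def partition_reports(lines):
    """Return (unacked_count, partitions) for every 'unacked messages' line."""
    reports = []
    for line in lines:
        match = UNACKED_RE.search(line)
        if match:
            partitions = tuple(
                int(p) for p in match.group(3).split(",") if p.strip()
            )
            reports.append((int(match.group(1)), partitions))
    return reports


def stuck_partition(lines, min_repeats=5):
    """Return the partition number if the last `min_repeats` reports all name
    the same single partition, otherwise None."""
    tail = partition_reports(lines)[-min_repeats:]
    if len(tail) < min_repeats:
        return None
    partition_sets = {partitions for _, partitions in tail}
    if len(partition_sets) == 1:
        (partitions,) = partition_sets
        if len(partitions) == 1:
            return partitions[0]
    return None
```

Fed the 08:37 excerpt, this would flag partition 6; fed the 08:25 excerpt, where reports span several partitions, it would return None. A repeated single partition is only a hint of a stall, so the threshold should be tuned against normal traffic.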
(In reply to Dave House [:dhouse] from comment #1)
> I'll look to see if I can find a mapping for the "partition 6" to a server (possibly an aws instance like the failure last weekend)
The partitions don't appear to map to a host; rather, it seems that this partition of the data is not synced across all hosts. There is a host list on the ssh machines (noted in the docs: https://mozilla-version-control-tools.readthedocs.io/en/latest/hgmo/ops.html#mirrors-in-pushdataaggregator-groups-file), at:
[dhouse@hgssh1.dmz.mdc1 ~]$ cat /repo/hg/pushdataaggregator_groups
hgweb1
hgweb2
hgweb3
hgweb4
hgweb-priv-uw2-a-1
hgweb-priv-uw2-b-1
hgweb-priv-ue1-a-1
hgweb-priv-ue1-b-1
Comment 3•5 years ago
dhouse investigated, sheehan rebooted the offending processes.