So this could be anything from a cloud instance with bad I/O or a bad network connection, to the hg client or the hg server falling back to some slower option.
If it's the former, I'm not sure there's much to do but notice and terminate the instance. I don't think we see enough of these to be able to characterize a "sick" instance accurately and quickly. And a check that takes, say, ten minutes at worker startup to decide whether the instance is sick would end up being phenomenally expensive and add a great deal of end-to-end time.
If it's the latter, then one thing you could do is wrap the hg command to time out after, say, ten minutes, and retry a few times within the task before failing.
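To make that concrete, here is a rough sketch of such a wrapper in Python. It is only an illustration, not how the task currently invokes hg: the function name `run_with_retries`, its parameters, and the choice to also retry on a non-zero exit (not just on a timeout) are all my assumptions.

```python
import subprocess
import time


def run_with_retries(cmd, timeout_seconds=600, attempts=3, pause_seconds=0):
    """Run cmd, killing it after timeout_seconds, and retry on failure.

    Hypothetical helper. Returns the 1-based number of the attempt that
    succeeded; raises RuntimeError once every attempt has failed.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # subprocess.run kills the child process and raises
            # TimeoutExpired if it runs longer than timeout_seconds.
            subprocess.run(cmd, check=True, timeout=timeout_seconds)
            return attempt
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError) as exc:
            # Assumption: a non-zero exit is treated like a timeout and retried.
            last_error = exc
            print(f"attempt {attempt}/{attempts} failed: {exc}")
            if attempt < attempts:
                time.sleep(pause_seconds)
    raise RuntimeError(f"{cmd[0]} failed after {attempts} attempts") from last_error


# Example call (the actual hg arguments depend on the task):
# run_with_retries(["hg", "pull"], timeout_seconds=600, attempts=3)
```

Note that the timeout only kills the process that was started directly, and an interrupted hg operation may leave a partial checkout behind, so a real version would probably need some cleanup between attempts.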