Closed Bug 822853 Opened 12 years ago Closed 12 years ago

b2g_mozilla-inbound_panda_gaia_central_dep builds failing with Internal Server Errors

Categories

Component: Developer Services :: General
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: RyanVM, Assigned: fox2mike)

https://tbpl.mozilla.org/php/getParsedLog.php?id=18068848&tree=Mozilla-Inbound

12:46:08    ERROR -  abort: HTTP Error 500: Internal Server Error
12:46:08    ERROR -  Automation Error: hg not responding

etc etc etc
http://hg.mozilla.org/integration/gaia-central is also not rendering, implying server-side issues
Severity: normal → major
This appears to have started at approx 0900 PT today. It looks as if some hook changes were made (these repos shouldn't have this hook running):

2012-12-18T17:12:47+0000: starting: hg --cwd /opt/vcs2vcs/repos/integration-gaia-central push --force hg.m.o
pushing to ssh://hg.m.o/integration/gaia-central
searching for changes
updating bookmark master
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 2 changesets with 7 changes to 7 files
remote: Error accessing https://treestatus.mozilla.org/gaia-central?format=json: HTTP Error 404: NOT FOUND
remote: Unable to check if the tree is open - treating as if CLOSED.
remote: To push despite treestatus being unavailable, include "CLOSED TREE" in your push comment
remote: transaction abort!
remote: rollback completed
remote: abort: pretxnchangegroup.a_treeclosure hook failed
2012-12-18T17:12:48+0000: starting: git --git-dir /opt/vcs2vcs/repos/integration-gaia/.git fetch --force
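The abort above is a fail-closed check: the hook asks treestatus for the tree's state, and when that lookup fails (here a 404, because gaia-central had no treestatus entry) it treats the tree as CLOSED unless the push comment contains "CLOSED TREE". A minimal Python sketch of that decision follows. It is not the actual mozhghooks code: the function names are hypothetical, the JSON `status` field is an assumption, and the "push comment" is modelled as the list of pushed commit messages.

```python
import json
import urllib.error
import urllib.request

# Hypothetical sketch of a fail-closed tree-closure check; not the actual
# mozhghooks implementation. The URL pattern is taken from the log above;
# the "status" field name is an assumption.
TREESTATUS_URL = "https://treestatus.mozilla.org/{tree}?format=json"


def fetch_tree_status(tree, opener=urllib.request.urlopen):
    """Return the tree's status string, or None if it cannot be determined."""
    try:
        with opener(TREESTATUS_URL.format(tree=tree)) as resp:
            return json.load(resp).get("status")
    except (urllib.error.URLError, ValueError):
        # HTTPError (e.g. 404 NOT FOUND) subclasses URLError; bad JSON raises
        # ValueError. Either way the status is unknown.
        return None


def push_allowed(status, push_messages):
    """Allow the push only if the tree is open or the push says CLOSED TREE."""
    if any("CLOSED TREE" in msg for msg in push_messages):
        return True
    # Unknown status (None) is treated as CLOSED: fail closed.
    return status is not None and status.lower() == "open"
```

With no treestatus entry, `fetch_tree_status` returns None, so `push_allowed` rejects every automated vcs-sync push whose messages lack "CLOSED TREE". That is why either creating the treestatus object or removing the hook unblocks the mirror.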
Can those changes be identified, then rolled back?

If not, a temporary workaround would be to ask a releng member to create that treestatus object and leave it open.
Assignee: server-ops → server-ops-devservices
Component: Server Operations → Server Operations: Developer Services
That seems to have done the trick, but I don't understand how a push hook can cause this. We were pushing 140 MB/s out of the hg webheads by 1540, when normally it's between 10 and 20 MB/s. Does the vcs sync generate a flood of inbound traffic?
Assignee: server-ops-devservices → server-ops
Component: Server Operations: Developer Services → Server Operations
To be more specific: what about the hook/hwine's pushes caused
  http://hg.mozilla.org/integration/gaia-central
to return 200s without showing the usual page, and caused build slaves trying to update/clone to see this:

13:25:19     INFO -  pulling from http://hg-internal.dmz.scl3.mozilla.com/integration/gaia-central
13:25:19     INFO -  searching for changes
13:25:19     INFO -  adding changesets
13:25:19     INFO -  adding manifests
13:25:19     INFO -  adding file changes
13:25:19     INFO -  added 17 changesets with 32 changes to 28 files
13:25:19    ERROR -  abort: HTTP Error 500: Internal Server Error
13:25:19    ERROR -  Automation Error: hg not responding

The load can be explained by those slaves then pulling from http://hg.mozilla.org/integration/gaia-central (same failure), then clobbering, failing to find a bundle, and recloning the whole repo:
13:25:30     INFO -  command: hg clone -U http://hg-internal.dmz.scl3.mozilla.com/integration/gaia-central /builds/hg-shared/integration/gaia-central
13:25:30     INFO -  command: cwd: /builds/slave/b2g-m-in-panda-gaia-cen-dep
13:25:30     INFO -  command: output:
13:27:10     INFO -  requesting all changes
13:27:10     INFO -  adding changesets
13:27:10     INFO -  adding manifests
13:27:10     INFO -  adding file changes
13:27:10     INFO -  added 14035 changesets with 31249 changes to 6653 files
13:27:10    ERROR -  abort: HTTP Error 500: Internal Server Error
13:27:10    ERROR -  Automation Error: hg not responding

i.e. something happens right at the end of the pull/clone transactions that aborts them, leading to more and more load.
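The load spiral described above can be modelled as an escalating fallback chain in which every step that reaches the server downloads data and then hits the same end-of-transaction 500. A hypothetical Python sketch follows; the function and step names are illustrative, not the real build-slave automation, and it assumes the slave tries each mirror in turn, then clobbers, then looks for a bundle, then reclones.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical model of the slave-side fallback; not the real automation code.
Step = Tuple[str, Optional[str]]


def update_repo(
    sources: Sequence[str],
    pull: Callable[[str], bool],
    clobber: Callable[[], None],
    fetch_bundle: Callable[[], bool],
    clone: Callable[[str], bool],
) -> List[Step]:
    """Escalate pull -> clobber -> bundle -> full clone; return the steps tried."""
    steps: List[Step] = []
    # 1. Incremental pull from each mirror in turn.
    for src in sources:
        steps.append(("pull", src))
        if pull(src):
            return steps
    # 2. Give up on the local copy.
    steps.append(("clobber", None))
    clobber()
    # 3. Try to seed from a pre-built bundle.
    steps.append(("bundle", None))
    if fetch_bundle():
        return steps
    # 4. Fall back to a full clone, the most expensive step for the server.
    for src in sources:
        steps.append(("clone", src))
        if clone(src):
            return steps
    return steps
```

When every pull and clone aborts after the data has been sent, each job ends on the longest path, so a routine incremental pull (17 changesets in the first log) turns into a full clone (14035 changesets in the second) that is repeated by every job that retries.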
Assignee: server-ops → server-ops-devservices
Component: Server Operations → Server Operations: Developer Services
bkero: are we seeing some bad interaction between hgweb & hg mirror updates and hg server rollbacks?

And, yes, I know bug 781012 is a large part of the solution.
Blocks: 822901
Created https://treestatus.mozilla.org/gaia-nightly to avoid potential b2g blocking problems on nightly.

Opened bug 822901 to revert this change, and the one from comment #4
Maybe https://bugzilla.mozilla.org/show_bug.cgi?id=822648 is the cause. Dropping sev since this seems to be okay for now.
Severity: major → normal
Assignee: server-ops-devservices → bkero
bug 822648 does look like the culprit, and is the right thing for all the normal repos.

The integration/gaia* repos are "special" in that they are (at the moment) the only mirrors of git data -- i.e. they are not committed to by people, but by the vcs-sync processes.

So, it seems the appropriate fix is to disable the treeclosure.py hook on the hg.m.o/integration/gaia* repos:
 gaia
 gaia-central
 gaia-nightly
 gaia-shira

Please do so.
(In reply to Hal Wine [:hwine] from comment #10)
> bug 822648 does look like the culprit, and is the right thing for all the
> normal repos.
> 
> The integration/gaia* repos are "special" in that they are (at the moment)
> the only mirrors of git data -- i.e. they are not committed to by people,
> but by the vcs-sync processes.
> 
> So, it seems the appropriate fix is to disable the treeclosure.py hook on
> the hg.m.o/integration/gaia* repos:
>  gaia
>  gaia-central
>  gaia-nightly
>  gaia-shira
> 
> Please do so.

Done. 

The only active hook for all of them now is:

changegroup.push_printurls = python:mozhghooks.push_printurls.hook
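For reference, Mercurial's config help notes that a site-wide hook can be overridden by changing its value or setting it to an empty string. The Python sketch below approximates that behaviour with `configparser`, which handles this simple hgrc subset but not Mercurial-specific syntax such as `%include`. The file contents, including the treeclosure hook's module path and the split between a shared and a per-repo hgrc, are hypothetical; they only illustrate how disabling `pretxnchangegroup.a_treeclosure` leaves `changegroup.push_printurls` as the single active hook.

```python
import configparser

# Hypothetical shared hook definitions (the treeclosure module path is an
# assumption made for illustration).
SITE_HGRC = """
[hooks]
pretxnchangegroup.a_treeclosure = python:mozhghooks.treeclosure.hook
changegroup.push_printurls = python:mozhghooks.push_printurls.hook
"""

# Hypothetical per-repo override for an integration/gaia* repo: an empty
# value disables the hook of the same name.
GAIA_REPO_HGRC = """
[hooks]
pretxnchangegroup.a_treeclosure =
"""


def active_hooks(*hgrc_texts):
    """Merge hgrc snippets in order (later values win); return enabled hooks."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep hook names exactly as written
    for text in hgrc_texts:
        parser.read_string(text)
    if not parser.has_section("hooks"):
        return {}
    # Empty values mean "disabled", mirroring Mercurial's override rule.
    return {name: cmd for name, cmd in parser.items("hooks") if cmd.strip()}
```

Under these assumptions, `active_hooks(SITE_HGRC, GAIA_REPO_HGRC)` yields only the `changegroup.push_printurls` entry, matching the state described above.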

I'm going to close out this bug, as this will fix the issue for good. We can back out the changes made to treeclosure (to add these repos) later; they shouldn't affect operations on these repos anymore.
Assignee: bkero → shyam
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Shyam - thanks much.

Note that I do not think the treeclosure hook should be modified. The number of "special" repos will always be small (and may become 0 within a few months).

So I think we're good, period (unless you want some extra work ;).

We (releng) will do cleanup based on bug 822901.
Thank you for removing the tree-closure hooks on those :-)

Bit of context:
When we switched to treestatus, I used the list of repos that had the tree-closure hook enabled as the basis for which trees to add to treestatus.mozilla.org. The gaia repos have been created since then, and the hook was added to them without a corresponding entry being added to treestatus.m.o.

However, given that those trees are only for syncing with the git repos, I agree the hook isn't needed there.
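The mismatch described above (repos that run the tree-closure hook but have no treestatus tree) is easy to audit mechanically. This is a hypothetical Python sketch: the function name and inputs are illustrative, and it assumes a tree is named after the last component of the repo path, as `integration/gaia-central` maps to `gaia-central` in the hook error earlier in this bug. Any repo it flags needs either a treestatus entry or, as with these git mirrors, removal of the hook.

```python
def repos_missing_treestatus(hooked_repos, treestatus_trees):
    """Return hooked repos that have no matching treestatus tree, sorted.

    Assumes (hypothetically) that a repo path such as
    "integration/gaia-central" maps to the tree name "gaia-central".
    """
    return sorted(
        repo for repo in hooked_repos
        if repo.rsplit("/", 1)[-1] not in treestatus_trees
    )
```

For example, with hooked repos `integration/mozilla-inbound` and `integration/gaia-central` and only `mozilla-inbound` registered in treestatus, it flags `integration/gaia-central`.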

Other than bug 822901 (and perhaps further investigation into ways in which we can make the git->hg sync not DoS hg.m.o), we should be all done here.
Component: Server Operations: Developer Services → General
Product: mozilla.org → Developer Services