Bug 1150776 (Closed): Opened 9 years ago, Closed 9 years ago

b2g hazard builds intermittently "failed to run config.sh" when they try to clone from it.mozilla.org

Categories

Product/Component: Release Engineering :: General
Type: defect
Hardware: ARM
OS: Gonk (Firefox OS)
Priority: Not set
Severity: normal
Tracking

Status: RESOLVED FIXED
b2g-v2.2: fixed
b2g-master: fixed

People

(Reporter: philor, Assigned: sfink)

Attachments

(2 files)

15:57	RyanVM|sheriffduty	catlee: seeing a lot of B2G Hazard failures like this that look infra-y - https://treeherder.mozilla.org/log...?job_id=2532458&repo=fx-team
15:57	RyanVM|sheriffduty	CC sfink ^
15:57	RyanVM|sheriffduty	fatal: 'it.mozilla.org/b2g/platform_system_libfdio'
15:57	catlee	yeah, I noticed that before...
15:57	catlee	something is stripping 'g'
15:58	sfink	whoa
16:00	sfink	is it stripping g, or http://g, or https://g, or git://g, or what? I'm not sure where that url is coming from.
16:00	catlee	I think it's coming from http://hg.mozilla.org/integration.../config/emulator-jb/sources.xml
16:01	catlee	06:15:50 INFO - Running command: ['./config.sh', '-q', 'emulator-jb', '/builds/slave/b2g_fx-team_l64-b2g-haz_dep-00/build/tmp_manifest/emulator-jb.xml'] in /builds/slave/b2g_fx-team_l64-b2g-haz_dep-00/build
16:04	catlee	it does some processing higher up to write out that tmp manifest...
16:06	sfink	ok, so that would mean it's stripping off https://g
16:10	catlee	which is just bizarre
16:12	catlee	sfink: tmp_manifest looks ok here....
16:13	catlee	maybe it's config.sh
16:15	catlee	https://pastebin.mozilla.org/8828030
16:17	catlee	or repo

And once you mention repo, all discussion stops.
The only place I could find code that munges those urls is in repo. For example, _resolveFetchUrl in manifest_xml.py does some rewriting. I couldn't see how it would lead to the current symptoms, though. It would be nice to get logging in there. Anyone know how to get it to use a modified repo? It has one in-tree that pulls down the latest or something, so I'm not sure where to change stuff.
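For reference, a quick illustrative check (this is not the repo tool's code, just a sanity-check script) that dumps each <remote fetch="..."> from the generated manifest and shows what a plain urljoin against the manifest repo URL would produce; the manifest URL below is an assumption, adjust as needed:

    # Illustrative debugging sketch only, not repo's _resolveFetchUrl. It prints
    # each remote's fetch attribute and a naive urljoin of it against the manifest
    # repo URL, to spot any prefix stripping (e.g. "https://g" disappearing from
    # "https://git.mozilla.org/...").
    import sys
    import xml.etree.ElementTree as ET
    try:
        from urllib.parse import urljoin   # Python 3
    except ImportError:
        from urlparse import urljoin       # Python 2

    MANIFEST_URL = 'https://git.mozilla.org/b2g/b2g-manifest'  # assumption

    tree = ET.parse(sys.argv[1])  # e.g. tmp_manifest/emulator-jb.xml
    for remote in tree.getroot().iter('remote'):
        fetch = remote.get('fetch', '')
        print('%-12s fetch=%-35s -> %s'
              % (remote.get('name'), fetch, urljoin(MANIFEST_URL, fetch)))

If the values already look wrong at this stage, the manifest rewriting is the culprit; if they look fine, repo's own URL resolution becomes the prime suspect.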
kind of stumped here...let's add some debugging info to see what manifest we've actually ended up with.
Attachment #8590426 - Flags: review?(jlund)
Comment on attachment 8590426 [details] [diff] [review]
manifest-dump.diff

Review of attachment 8590426 [details] [diff] [review]:
-----------------------------------------------------------------

unless I'm mistaken, I don't think this will find the smoking gun.

I poked some recent jobs:
  - bad build: http://ftp.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/mozilla-inbound-linux64-b2g-haz/1428646596/b2g_mozilla-inbound_linux64-b2g-haz_dep-bm94-build1-build486.txt.gz
  - good build: http://ftp.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/mozilla-inbound-linux64-b2g-haz/1428645816/b2g_mozilla-inbound_linux64-b2g-haz_dep-bm77-build1-build484.txt.gz

and their slaves still had sources.xml and emulator-jb.xml in their builddir:
http://people.mozilla.org/~jlund/bad_emulator-jb.xml
http://people.mozilla.org/~jlund/good_emulator-jb.xml

they look identical.

going back through m-i, the earliest I see this happening is March 31st, around 1600 PT

This change landed earlier than that window, but I thought it worth mentioning since it was only on March 13th: https://bugzil.la/1143013
Attachment #8590426 - Flags: review?(jlund) → review+
Yes, I fear you're right. Thanks for grabbing those manifests. So this points to problems in config.sh or repo itself perhaps.

I suspect this commit:
https://gerrit.googlesource.com/git-repo/+/cb07ba7e3d466a98d0af0771b4f3e21116d69898%5E!

So...why is this happening intermittently?
Hm, perhaps some previous job on the machine updates the repo checkout to master, whereas other jobs are using stable?
hazard_build.py already calls b2gbase's checkout_sources, so just moving the configuration and the checkout_repotool action there should be sufficient to get the right version. checkout_repotool really should have been there from the start, since it is already called from buildb2gbase.
Attachment #8591068 - Flags: review?(catlee)
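A rough sketch of the shape of the fix (this is not the actual patch in attachment 8591068; the class name, import path, and action names are assumptions for illustration): let the hazard script reuse buildb2gbase's actions so that checkout-repotool, which pins the repo tool to the expected version, always runs before checkout-sources.

    # Rough sketch only, not the real patch. Import path and action names are
    # assumptions; the point is that 'checkout-repotool' is part of the hazard
    # build's action list rather than being left to whatever a previous job on
    # the slave happened to leave in .repo/repo.
    from mozharness.mozilla.building.buildb2gbase import B2GBuildBaseScript

    class B2GHazardBuild(B2GBuildBaseScript):
        def __init__(self):
            B2GBuildBaseScript.__init__(
                self,
                all_actions=[
                    'clobber',
                    'checkout-repotool',   # run before any sources are fetched
                    'checkout-sources',
                    'build',
                ],
                default_actions=[
                    'checkout-repotool',
                    'checkout-sources',
                    'build',
                ],
            )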
Assignee: nobody → sphink
Status: NEW → ASSIGNED
Attachment #8591068 - Flags: review?(catlee) → review+
I think we just need to uplift the mozharness patch to b2g37
Is this bug also the cause of the issue below, or should a new bug be opened? KWierso reported this in #releng earlier.

https://treeherder.mozilla.org/logviewer.html#?job_id=9385145&repo=mozilla-inbound


16:03:21     INFO -  fatal: 'it.mozilla.org/b2g/gonk-misc' does not appear to be a git repository
16:03:21     INFO -  fatal: The remote end hung up unexpectedly
16:03:32     INFO -  fatal: 'it.mozilla.org/external/apitrace/apitrace' does not appear to be a git repository
16:03:32     INFO -  fatal: The remote end hung up unexpectedly
16:03:36     INFO -  fatal: 'it.mozilla.org/b2g/platform_bootable_recovery' does not appear to be a git repository
16:03:36     INFO -  fatal: The remote end hung up unexpectedly
16:03:36     INFO -  fatal: 'it.mozilla.org/b2g/platform_system_nfcd' does not appear to be a git repository
16:03:36     INFO -  fatal: The remote end hung up unexpectedly
16:03:53     INFO -  fatal: 'it.mozilla.org/b2g/gonk-misc' does not appear to be a git repository
16:03:53     INFO -  fatal: The remote end hung up unexpectedly
16:04:05     INFO -  fatal: 'it.mozilla.org/external/apitrace/apitrace' does not appear to be a git repository
16:04:05     INFO -  fatal: The remote end hung up unexpectedly
16:04:10     INFO -  fatal: 'it.mozilla.org/b2g/platform_system_nfcd' does not appear to be a git repository
16:04:10     INFO -  fatal: The remote end hung up unexpectedly
16:04:19     INFO -  fatal: 'it.mozilla.org/b2g/platform_bootable_recovery' does not appear to be a git repository
16:04:19     INFO -  fatal: The remote end hung up unexpectedly
And retriggering a previously green build is failing, too.
Inbound trees are currently closed because of the b2g build failures.
Flags: needinfo?(catlee)
Filed bug 1159548 to deal with today's issues.
Flags: needinfo?(sphink)
Flags: needinfo?(catlee)
Not sure if it's the same issue I'm seeing on try, but the log definitely has this:
03:33:04    FATAL - failed to run config.sh
03:33:04    FATAL - Running post_fatal callback...
03:33:04    FATAL - Exiting -1
Component: General Automation → General