android-l10n hook breaks for very large PRs
Categories: Release Engineering :: Release Automation, defect, P3
Tracking: Not tracked
People: Reporter: mtabara, Unassigned
Attachments: 1 file
Manual forces of https://firefox-ci-tc.services.mozilla.com/tasks/CwrPMGfEQg60YxjqZf7UjA are broken. The payload is too large.
Comment 1 (Reporter) • 5 years ago
So the hook failed three times:
- https://firefox-ci-tc.services.mozilla.com/tasks/Pu1WcRqMSmWD56fsjvH7JA
- https://firefox-ci-tc.services.mozilla.com/tasks/EGFcLU_kQF6sGq1EEG2EUQ
- https://firefox-ci-tc.services.mozilla.com/tasks/CwrPMGfEQg60YxjqZf7UjA
Digging a bit, it seems the hook doesn't have a cron entry in the repo; instead it is triggered manually or via pulse. I checked ci-config, and it turns out that pulse messages from several mobile projects (those that use android-l10n) can trigger the hook via the github-push event.
Looking at the broken ones, they all seem to have been caused by a PR that was merged in the AC repo, specifically this. The tasks fail because TC returns standard_init_linux.go:190: exec user process caused "argument list too long" (full log here).
Looking at the tasks themselves, the payload is indeed huge. Carefully comparing the HOOK_PAYLOAD, it turns out that it contains metadata about the PR that was merged in the mobile repo using android-l10n. I compared the broken ones to one of the tasks that worked, for example this.
The broken PR touches ~700 files whereas the green PR touches two files. Because the touched files are enumerated explicitly in the payload as a touched-files list, the former exceeds the string length limit.
The solution is to move this input from command-line-encoded JSON to an input artifact, a file, or similar, so that we're no longer bound by that limit.
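A minimal sketch of that idea, assuming the limit being hit is the Linux per-string limit on argv/env entries (32 pages, i.e. 128 KiB with 4 KiB pages); the thread does not confirm which limit applies. The function names and the HOOK_PAYLOAD_FILE variable are hypothetical, not part of ci-admin or Taskcluster.

```python
import json
import os
import tempfile

# Assumption: a single argv/env string on Linux may not exceed 32 pages.
# With 4 KiB pages that is 128 KiB. Which limit the hook really hit is unconfirmed.
MAX_ARG_STRLEN = 32 * 4096


def fits_in_env(name: str, payload: dict, limit: int = MAX_ARG_STRLEN) -> bool:
    """Return True if 'NAME=<json>' fits in one environment string."""
    entry = f"{name}={json.dumps(payload)}".encode("utf-8")
    # Leave room for the terminating NUL byte.
    return len(entry) + 1 <= limit


def payload_env(payload: dict, directory: str) -> dict:
    """Pass the payload inline when it is small, otherwise spill it to a file."""
    if fits_in_env("HOOK_PAYLOAD", payload):
        return {"HOOK_PAYLOAD": json.dumps(payload)}
    fd, path = tempfile.mkstemp(suffix=".json", dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
    # HOOK_PAYLOAD_FILE is a hypothetical variable name used for illustration.
    return {"HOOK_PAYLOAD_FILE": path}
```

With this shape, a two-file PR would still travel inline, while a PR with hundreds of long paths would be written to disk and only its path handed to the task.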
Comment 2 (Reporter) • 5 years ago
Hm, I’m sort of stuck. I don’t think the hook is broken per se, since it has had several green jobs in the past 24h (based on the pulse messages from master pushes). The hook failure emails we received were caused by this PR, https://github.com/mozilla-mobile/android-components/pull/8256/files, which with ~700 files exceeded the limit in TC. I don’t see an immediate way to fix that. The hook payload comes from https://hg.mozilla.org/ci/ci-configuration/file/tip/cron-task-template.yml#l44 but I don’t understand what actually generates that payload (I presume it’s somewhere under ciadmin generate, which talks to the GitHub API and bakes the result into that payload). I don’t see an easy way out, other than forking https://hg.mozilla.org/ci/ci-configuration/file/tip/cron-task-template.yml into a custom cron template that reads from input artifacts rather than from the hook payload.
Comment 3•5 years ago
Hm. That may be an option. We have control over the cron tasks via https://hg.mozilla.org/ci/ci-admin/file/default/build-decision/src/build_decision/cron/__init__.py, I think. If the task that triggers the cron decision task creates an artifact for it, then the cron decision task could point to that artifact.
This is a pretty big change, though, and the person who knew most about this stuff was Tom.
A possible workaround may be for someone to split the broken PR into smaller chunks (70 PRs of 10 files each?), but that seems suboptimal. I'm not sure what else we can do. We may want to figure out what the consequences are if we don't parse that PR for l10n strings.
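The artifact idea above could look roughly like this on the consuming side. This is only a sketch under assumptions: HOOK_PAYLOAD_ARTIFACT_URL is a hypothetical variable, the artifact is assumed to be publicly readable JSON, and none of this reflects how build_decision currently loads its input.

```python
import json
import urllib.request
from typing import Callable, Mapping


def fetch_url(url: str) -> bytes:
    """Download an artifact body with a plain, unauthenticated HTTP GET."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def load_hook_payload(
    env: Mapping[str, str],
    fetch: Callable[[str], bytes] = fetch_url,
) -> dict:
    """Prefer the inline payload; fall back to an artifact reference."""
    inline = env.get("HOOK_PAYLOAD")
    if inline:
        return json.loads(inline)
    # Hypothetical variable: set by the triggering task when the payload is large.
    url = env.get("HOOK_PAYLOAD_ARTIFACT_URL")
    if url:
        return json.loads(fetch(url).decode("utf-8"))
    raise KeyError("neither HOOK_PAYLOAD nor HOOK_PAYLOAD_ARTIFACT_URL is set")
```

Keeping the inline path as the default would leave small pushes untouched; only the oversized case would need the extra artifact.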
Comment 5•5 years ago
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/QCmJJ5FNRCOWQqPy98q9gw died with standard_init_linux.go:190: exec user process caused "argument list too long"
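For anyone wanting to see the same class of failure locally, the sketch below passes a single oversized argument to a child process. It assumes a Linux host, where the kernel rejects the exec with E2BIG ("Argument list too long"); other platforms have different limits and may behave differently. The helper name is made up for illustration.

```python
import errno
import subprocess
import sys


def exec_fails_with_e2big(arg_size: int) -> bool:
    """Try to start a child with one argument of arg_size bytes.

    Returns True only if the exec itself is rejected with E2BIG.
    """
    try:
        subprocess.run([sys.executable, "-c", "pass", "x" * arg_size], check=True)
    except OSError as error:
        return error.errno == errno.E2BIG
    return False


if __name__ == "__main__":
    # 4 MiB in one argument is well past the assumed Linux limits.
    print(exec_fails_with_e2big(4 * 1024 * 1024))
```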