Closed Bug 1375621 Opened 4 years ago Closed 4 years ago
Autophone svg jobs frequently fail and give no failure summary
This has been happening for a while. I frequently see svg jobs failing and giving a log like https://autophone.s3.amazonaws.com/v1/task/SljdofMQR82AENrgo3aELA/runs/0/artifacts/public/build/5c7a83bb-a8f1-44f6-9657-26711e1450b7-autophone.log but Treeherder fails to parse that to give it a nice, starrable failure summary line.
This failure to emit failure messages in production has been a long-standing problem. It works for me locally on my Fedora workstation, but I cannot seem to find why it fails on Ubuntu. I am going to be upgrading the Autophone servers during mozsfo to use Fedora and this failure to emit messages will at least begin to go away.
Depends on: 1304063
The upgrade to Fedora did not fix this issue. I'll look into it soonest.
Assignee: nobody → bob
:bc - Do you know what's happening here? Are all/most of these crashes? I opened a few examples at random and found minidump crash reports in the logs, but I was surprised that there was no "PROCESS-CRASH", as emitted by mozcrash: https://dxr.mozilla.org/mozilla-central/rev/b07db5d650b7056c78ba0dbc409d060ec4e922cd/testing/mozbase/mozcrash/mozcrash/mozcrash.py#108 Could it be that simple?
No, I don't think so. I don't use mozcrash but use autophonecrash instead. Take https://treeherder.allizom.org/#/jobs?repo=mozilla-inbound&revision=843a2c9538f9fc44dad973f51f5e31ce72a05776&filter-searchStr=autophone&exclusion_profile=false for example, which shows a PROCESS-CRASH for an s1s2 job. Jobs in production are missing the PROCESS-CRASH even though they are running the same systems as I do. I have autophone-4 configured to test why the shutdown intent caused crashes, and I plan to use it to diagnose this problem once and for all.
Note bug 1380134 comment 4, which links to the results where tsvg and t failures do not have failure lines or failure summaries. But in bug 1380134 comment 5, autophone-4 reporting to treeherder.allizom.org and autophone-dev.s3.amazonaws.com does show the failure lines and failure summaries. I did two runs on autophone-4 on autoland reporting to staging. The first one at Tue Jul 11, 22:13:34 was with log level DEBUG and the other was at Tue Jul 11, 22:09:35, and both showed failure lines. The code running on autophone-4 is the same as was running prior to the backout in bug 1371291. There must be some configuration difference which is responsible. I'll look again later since I'm on PTO today.
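For anyone comparing a production log against a staging log by hand, a small script can list the lines that look like failure lines in each. This is only a rough sketch: the marker patterns (PROCESS-CRASH and TEST-UNEXPECTED-*) are an assumption about what counts as a failure line, not Treeherder's actual parser rules, and find_failure_lines is a hypothetical helper name.

```python
import re

# Assumed markers only; the real Treeherder log parser may match
# different or additional patterns.
FAILURE_MARKERS = re.compile(r"PROCESS-CRASH|TEST-UNEXPECTED-\w+")


def find_failure_lines(log_text):
    """Return (line_number, line) pairs for lines containing an assumed marker."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if FAILURE_MARKERS.search(line)
    ]
```

Running it over the text of a production log and a staging log for the same test should show whether the production log is missing the marker lines entirely or whether they are present but not being picked up.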
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1381511
Whiteboard: [stockwell needswork] → [stockwell fixed:other]