Closed Bug 1472019 Opened 6 years ago Closed 2 years ago

periodic file update task frequently timing out

Categories

(Release Engineering :: Release Automation: Other, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: aryx, Assigned: sfraser)

Details

More than half of the periodic file update runs time out, e.g. https://treeherder.mozilla.org/logviewer.html#?job_id=185331074&repo=mozilla-central

    Adding forced hosts
    Examining 53330 hosts.
    Waiting for 53330 responses.
    HSTS Probe received 53330 statuses.
    INFO: Writing output to nsSTSPreloadList.inc
    finished writing output file
    HSTS probing all done
    JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 150: TypeError: Cr is undefined
    JavaScript error: /home/worker/scripts/getHSTSPreloadList.js, line 150: TypeError: Cr is undefined
    [taskcluster:error] Task timeout after 3600 seconds. Force killing container.
    [taskcluster 2018-06-28 11:02:05.615Z] === Task Finished ===
Flags: needinfo?(sfraser)
Assignee: nobody → sfraser
Flags: needinfo?(sfraser)
I've been looking at this since it was ported to Taskcluster. After porting it to Python, I saw similar symptoms but got useful error messages: we're hitting DNS rate limits somewhere. What seems to happen is that the DNS lookups fail, but the event loop never handles the failures properly. It behaves as if all the application code has finished while some coroutines are still waiting for a response. In the JavaScript version this means that things like 'Cr' fall out of scope and become undefined.

I've been experimenting with different ways of fixing this, from adding a DNS resolver inside the container to internal rate limiting. The resolver doesn't appear to solve much, as it still relies on the container's original resolvers, and even adding extra ones still causes issues. Using Python's semaphore to limit concurrency does seem to help, but JavaScript doesn't have an equivalent without an npm package, and I don't currently know whether those would work with xpcshell. I'll be trying out that last option soon.
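
For illustration, the semaphore approach described above might look roughly like the following. This is a minimal sketch, not the code used in the task: `probe_host`, `probe_all`, `fake_lookup`, and the `max_concurrent` default are hypothetical names and values, and `fake_lookup` stands in for the real DNS/HSTS probe so the example runs without network access.

```python
import asyncio


async def probe_host(host, lookup, limiter):
    """Run one lookup while holding a slot in the shared semaphore."""
    async with limiter:
        try:
            return host, await lookup(host)
        except OSError as exc:
            # Record the failure as a result so it is not lost in the event loop.
            return host, exc


async def probe_all(hosts, lookup, max_concurrent=50):
    """Probe every host with at most max_concurrent lookups in flight."""
    limiter = asyncio.Semaphore(max_concurrent)
    tasks = [probe_host(host, lookup, limiter) for host in hosts]
    # gather() returns only after every coroutine has completed,
    # so nothing is left pending when the caller moves on.
    return dict(await asyncio.gather(*tasks))


async def fake_lookup(host):
    """Hypothetical stand-in for a real probe; fails for '.invalid' hosts."""
    await asyncio.sleep(0)
    if host.endswith(".invalid"):
        raise OSError("lookup failed for " + host)
    return "ok"


results = asyncio.run(
    probe_all(["example.com", "example.invalid"], fake_lookup, max_concurrent=1)
)
```

Failed lookups come back as exception objects in `results` rather than being dropped, which makes it easier to see how many hosts hit the rate limit. A suitable `max_concurrent` would have to be found by experiment against the resolvers actually in use.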

This task still takes a long time to run, but it doesn't time out these days.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME