Closed
Bug 1072291
Opened 10 years ago
Closed 10 years ago
Make pushlog ingestion more robust - round 2
Categories
(Tree Management :: Treeherder, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: mdoglio)
Attachments
(1 file)
12:51 <zac> mdoglio|lunch, treeherder is not picking up new builds on b2g-i
13:50 <zac> https://treeherder.allizom.org/ui/#/jobs?repo=b2g-inbound
13:50 <zac> it's stuck on the 4:32am commit
14:01 <•mdoglio> zac: I'll have a look
14:14 <•mdoglio> zac: it's working now, I don't know yet what happened. I just restarted the ingestion service and everything seems to be working now
14:15 <zac> thanks mdoglio yeah I can see all the results now

We need to figure out the root cause of this, in case it can occur on prod too (and in case it's due to the changes in bug 1071577).
Assignee
Updated•10 years ago
Summary: Investigate the cause of ingestion failing on treeherder-dev → Investigate the cause of ingestion failing on treeherder stage
Reporter
Updated•10 years ago
Summary: Investigate the cause of ingestion failing on treeherder stage → Investigate the cause of pushlog ingestion failing on treeherder stage
Assignee
Comment 1•10 years ago
zac reported the same issue today
Assignee
Updated•10 years ago
Assignee: nobody → mdoglio
Assignee
Comment 2•10 years ago
We recently made some changes that have affected the network consumption. As a result, some tasks for data ingestion could take much more time than before and potentially be discarded because they exceed the maximum execution time currently set. Increasing that setting should solve this issue.
Reporter
Comment 3•10 years ago
(In reply to Mauro Doglio [:mdoglio] from comment #2)
> We recently made some changes that have affected the network consumption. As
> a result, some tasks for data ingestion could take much more time than
> before and potentially being discarded because they exceed the maximum
> execution time currently set. Increasing that setting should solve this
> issue.

I really really think we should fix bug 1072422; this would solve the increased transfer time as well as the potential data loss by missing anything above the 10th push.
Reporter
Comment 4•10 years ago
For the PR just opened, it doesn't add back the cache reset functionality, which means we'll have to manually reset in production. As an alternative to that or re-adding the django admin reset, we could just handle the 404 response from json-pushes (since we correctly get one from "fromchange", unlike "startid"), and in that case fall back to no fromchange param, like we do when the cache is empty. Sound good? :-)
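The fallback proposed above could look roughly like the following. This is a minimal sketch, not the code from the PR: `fetch_pushes`, `PushlogNotFound` and the injected `fetch` callable are hypothetical names made up for illustration, and only the `fromchange` parameter comes from the discussion.

```python
from urllib.parse import urlencode


class PushlogNotFound(Exception):
    """Hypothetical error raised by `fetch` when json-pushes answers 404."""


def fetch_pushes(base_url, fetch, cached_revision=None):
    """Fetch pushes, retrying without `fromchange` if the cached revision is unknown.

    `fetch` is any callable that takes a URL and returns the parsed response,
    raising PushlogNotFound on a 404 (an assumption made for this sketch).
    """
    if not cached_revision:
        # Empty cache: request the pushlog without a fromchange param.
        return fetch(base_url)

    url = base_url + "?" + urlencode({"fromchange": cached_revision})
    try:
        return fetch(url)
    except PushlogNotFound:
        # The cached revision is not known to the repo (for example a stale
        # cache after a deploy), so behave as if the cache were empty.
        return fetch(base_url)
```

With this shape a stale cache entry costs one extra request instead of requiring a manual cache reset.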
Assignee
Comment 5•10 years ago
yeah, I'll add handling for the 404 from json-pushes
Reporter
Updated•10 years ago
Status: NEW → ASSIGNED
Reporter
Updated•10 years ago
Summary: Investigate the cause of pushlog ingestion failing on treeherder stage → Make pushlog ingestion more robust - round 2
Assignee
Comment 8•10 years ago
Attachment #8496043 -
Flags: review?(jeads)
Updated•10 years ago
Attachment #8496043 -
Flags: review?(jeads) → review+
Comment 9•10 years ago
Commits pushed to master at https://github.com/mozilla/treeherder-service

https://github.com/mozilla/treeherder-service/commit/8f9a686fde80dbd8a11d60ae4a502a6a20cfdc99
(bug 1072291) revert pushlog caching strategy
The pushlog cache now uses the top revision of the last push. Also, increase the time limit to fetch the pushlog to 3 minutes

https://github.com/mozilla/treeherder-service/commit/cb3d46df361ce7d03acdbde7d0d8da81aab711e9
Bug 1072291 - handle 404 responses from json-pushes

https://github.com/mozilla/treeherder-service/commit/31bfc111f6760a65bd85567bfa606a6e9d9190f6
Merge pull request #229 from mozilla/bug-1072291-fix-pushlog-ingestion
(bug 1072291) revert pushlog caching strategy
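To illustrate the caching strategy described in the first commit, here is a rough sketch of remembering the top revision of the last push. It is not taken from the commits: the function names and the plain-dict cache are hypothetical, and the response shape (push IDs as string keys, each with a `changesets` list ordered with the tip last) is an assumption made for this example.

```python
def top_revision_of_last_push(pushes):
    """Return the tip changeset of the push with the highest push ID, or None."""
    if not pushes:
        return None
    # Push IDs are assumed to be numeric strings, so compare them as integers
    # ("10" must sort after "9").
    last_push_id = max(pushes, key=int)
    changesets = pushes[last_push_id]["changesets"]
    # Assumption: changesets are listed oldest first, so the tip is the last one.
    return changesets[-1] if changesets else None


def update_pushlog_cache(cache, repo, pushes):
    """Store the top revision of the last push for `repo`; keep the old value if none."""
    revision = top_revision_of_last_push(pushes)
    if revision is not None:
        cache[repo] = revision
    return cache.get(repo)
```

The cached value is what a later request would pass as `fromchange`, so an empty response leaves the previous entry in place.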
Assignee
Comment 10•10 years ago
We should increase the timeout for the pushlog retrieval and add a clear_cache command to the update.py script for deployment. That would prevent bugs like bug 1074077 from happening again.
Assignee
Comment 11•10 years ago
Increasing the timeout is not necessary; I will just update the deploy script to clear the cache as part of every deployment.
Assignee
Updated•10 years ago
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED