Closed Bug 1072291 Opened 11 years ago Closed 11 years ago

Make pushlog ingestion more robust - round 2

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: emorley, Assigned: mdoglio)

References

Details

Attachments

(1 file)

GitHub PR #229 on treeherder-service (11 years ago, Mauro Doglio [:mdoglio]; 54 bytes, text/x-github-pull-request; jeads: review+)

Ed Morley [:emorley]

Reporter

Description

•

11 years ago

12:51 <zac> mdoglio|lunch, treeherder is not picking up new builds on b2g-i
13:50 <zac> https://treeherder.allizom.org/ui/#/jobs?repo=b2g-inbound
13:50 <zac> it's stuck on the 4:32am commit
14:01 <•mdoglio> zac: I'll have a look
14:14 <•mdoglio> zac: it's working now, I don't know yet what happened. I just restarted the ingestion service and everything seems to be working now
14:15 <zac> thanks mdoglio yeah I can see all the results now

We need to figure out the root cause of this, in case it can occur on prod too (and in case it's due to the changes in bug 1071577).

Mauro Doglio [:mdoglio]

Assignee

Updated

•

11 years ago

Summary: Investigate the cause of ingestion failing on treeherder-dev → Investigate the cause of ingestion failing on treeherder stage

Ed Morley [:emorley]

Reporter

Updated

•

11 years ago

Blocks: 1072379

Ed Morley [:emorley]

Reporter

Updated

•

11 years ago

Summary: Investigate the cause of ingestion failing on treeherder stage → Investigate the cause of pushlog ingestion failing on treeherder stage

Mauro Doglio [:mdoglio]

Assignee

Comment 1

•

11 years ago

zac reported the same issue today

Mauro Doglio [:mdoglio]

Assignee

Updated

•

11 years ago

Assignee: nobody → mdoglio

Mauro Doglio [:mdoglio]

Assignee

Comment 2

•

11 years ago

We recently made some changes that have affected the network consumption. As a result, some data ingestion tasks could take much longer than before and potentially be discarded because they exceed the maximum execution time currently set. Increasing that setting should solve this issue.
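To illustrate the failure mode described here, the following is a minimal sketch of a task whose result is discarded once it exceeds a time limit. It uses only the Python standard library and is not the task queue Treeherder actually runs; `run_with_time_limit` and the sample task names are hypothetical, introduced only for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Any, Callable, Tuple


def run_with_time_limit(task: Callable[[], Any], limit_seconds: float) -> Tuple[bool, Any]:
    """Run task in a worker thread and give up on it after limit_seconds.

    Returns (True, result) if the task finished in time, otherwise
    (False, None): the caller never sees the late result, which is the
    "discarded" case from the comment above.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(task)
    try:
        return True, future.result(timeout=limit_seconds)
    except FutureTimeout:
        return False, None
    finally:
        # Do not block on the worker; a timed-out task is simply abandoned.
        executor.shutdown(wait=False)


def slow_fetch() -> str:
    """Hypothetical stand-in for a pushlog fetch slowed down by the network."""
    time.sleep(0.5)
    return "pushes"
```

With a limit shorter than the fetch, `run_with_time_limit(slow_fetch, 0.05)` reports the task as discarded; with a more generous limit the same task completes, which is the effect that raising the setting is expected to have.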

Ed Morley [:emorley]

Reporter

Comment 3

•

11 years ago

(In reply to Mauro Doglio [:mdoglio] from comment #2)
> We recently made some changes that have affected the network consumption. As
> a result, some data ingestion tasks could take much longer than before and
> potentially be discarded because they exceed the maximum execution time
> currently set. Increasing that setting should solve this issue.

I really really think we should fix bug 1072422; this would solve the increased transfer time as well as the potential data loss from missing anything beyond the 10th push.

Ed Morley [:emorley]

Reporter

Comment 4

•

11 years ago

The PR just opened doesn't add back the cache reset functionality, which means we'll have to reset the cache manually in production. As an alternative to that or to re-adding the Django admin reset, we could just handle the 404 response from json-pushes (since we correctly get one from "fromchange", unlike "startid"), and in that case fall back to no fromchange param, like we do when the cache is empty. Sound good? :-)
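A minimal sketch of the proposed fallback, under stated assumptions: `fetch` is a hypothetical callable that returns the parsed json-pushes response for a URL and raises `urllib.error.HTTPError` on an HTTP error, and `build_pushlog_url` and `fetch_pushes` are illustrative names rather than functions from treeherder-service.

```python
from typing import Callable, Optional
from urllib.error import HTTPError
from urllib.parse import urlencode


def build_pushlog_url(base_url: str, fromchange: Optional[str]) -> str:
    """Build a json-pushes URL, adding fromchange only when one is known."""
    if fromchange is None:
        return base_url
    return base_url + "?" + urlencode({"fromchange": fromchange})


def fetch_pushes(
    base_url: str,
    cached_revision: Optional[str],
    fetch: Callable[[str], dict],
) -> dict:
    """Fetch pushes since cached_revision, tolerating a stale cache entry.

    If json-pushes answers 404 for the cached revision, retry without
    fromchange, exactly as when the cache is empty. Any other HTTP error
    is re-raised unchanged.
    """
    try:
        return fetch(build_pushlog_url(base_url, cached_revision))
    except HTTPError as err:
        if err.code != 404 or cached_revision is None:
            raise
        # The cached revision is unknown to the server: fall back.
        return fetch(build_pushlog_url(base_url, None))
```

Injecting `fetch` keeps the fallback logic testable without network access; a real caller would pass a function that performs the HTTP request and decodes the JSON body.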

Mauro Doglio [:mdoglio]

Assignee

Comment 5

•

11 years ago

yeah, I'll add handling for the 404 from json-pushes

Ed Morley [:emorley]

Reporter

Updated

•

11 years ago

Status: NEW → ASSIGNED

Ed Morley [:emorley]

Reporter

Updated

•

11 years ago

Summary: Investigate the cause of pushlog ingestion failing on treeherder stage → Make pushlog ingestion more robust - round 2

Ed Morley [:emorley]

Reporter

Updated

•

11 years ago

No longer blocks: 1072379

Mauro Doglio [:mdoglio]

Assignee

Comment 8

•

11 years ago

Attached file: GitHub PR #229 on treeherder-service

Attachment #8496043 - Flags: review?(jeads)

Jonathan Eads ( :jeads )

Updated

•

11 years ago

Attachment #8496043 - Flags: review?(jeads) → review+

Treeherder GitHub Bugbot

Comment 9

•

11 years ago

Commits pushed to master at https://github.com/mozilla/treeherder-service

https://github.com/mozilla/treeherder-service/commit/8f9a686fde80dbd8a11d60ae4a502a6a20cfdc99
(bug 1072291) revert pushlog caching strategy
The pushlog cache now uses the top revision of the last push. Also, increase the time limit to fetch the pushlog to 3 minutes

https://github.com/mozilla/treeherder-service/commit/cb3d46df361ce7d03acdbde7d0d8da81aab711e9
Bug 1072291 - handle 404 responses from json-pushes

https://github.com/mozilla/treeherder-service/commit/31bfc111f6760a65bd85567bfa606a6e9d9190f6
Merge pull request #229 from mozilla/bug-1072291-fix-pushlog-ingestion
(bug 1072291) revert pushlog caching strategy
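The caching strategy named in the first commit (store the top revision of the last push) might look roughly like the sketch below. It is not the code from the PR. It assumes a json-pushes style response that maps push IDs (as strings) to objects whose "changesets" list holds plain revision hashes with the tip last, and it uses a plain dict in place of the real cache; the function names and the cache key format are hypothetical.

```python
from typing import Dict, Optional


def top_revision_of_last_push(pushes: Dict[str, dict]) -> Optional[str]:
    """Return the tip changeset of the newest push, or None if there are none.

    Assumption: push IDs are numeric strings, and each push lists its
    changesets with the tip last.
    """
    if not pushes:
        return None
    # Compare IDs numerically; as strings, "9" would sort after "10".
    last_push_id = max(pushes, key=int)
    return pushes[last_push_id]["changesets"][-1]


def remember_last_push(cache: dict, repository: str, pushes: Dict[str, dict]) -> None:
    """Record the top revision so the next fetch can pass it as fromchange."""
    top_revision = top_revision_of_last_push(pushes)
    if top_revision is not None:
        # An empty response leaves the previously cached revision untouched.
        cache[f"{repository}:last_push"] = top_revision
```

The cached value is what a later request would send as the fromchange parameter, which is why a revision the server no longer recognises needs the 404 handling added in the second commit.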

Ed Morley [:emorley]

Reporter

Updated

•

11 years ago

Depends on: 1074077

Mauro Doglio [:mdoglio]

Assignee

Comment 10

•

11 years ago

We should increase the timeout for the pushlog retrieval and add a clear_cache command to the update.py script for deployment. That would prevent bugs like bug 1074077 from happening again.

Mauro Doglio [:mdoglio]

Assignee

Comment 11

•

11 years ago

Increasing the timeout is not necessary; I will just update the deploy script to clear the cache as part of every deployment.

Mauro Doglio [:mdoglio]

Assignee

Updated

•

11 years ago

Depends on: 1074199

Mauro Doglio [:mdoglio]

Assignee

Updated

•

11 years ago

Status: ASSIGNED → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Ed Morley [:emorley]

Reporter

Updated

•

11 years ago

Blocks: 1076750
