Bug 422581 (Closed): Opened 16 years ago, Closed 16 years ago

Instantly process a queued report if requested

(Socorro :: General, task, P1)
(Not tracked)
(Reporter: vlad, Assigned: morgamic)
(1 file)

Reports seem to be taking a really long time to process (at least 3 hours, if not more) -- e.g. ffffce33-f099-11dc-bb23-001a4bd46e84, submitted 6:08 PM, is still not processed as of 9:38 PM.  We really need to scale up whatever needs scaling so that processing delays are down to minutes under the beta load, because the load at go-live will be significantly greater.  This is currently a problem for development: if a crash happens, it's too easy to forget to ever look back at it 6+ hours later to see what actually happened.

Can we just run parallel instances of the component that parses the report and generates symbolic backtraces and stuff?

I'd even consider this blocking 1.9, because we're going to need this data to get an idea of what needs to be looked into for dot releases for 1.9.
Flags: blocking1.9?
The processor can be run multiple times in parallel. AFAIK IT is working on bringing up more hardware capacity, and morgamic has Lars working on improving processor throughput.
Assignee: nobody → morgamic
morgamic and I came up with a simple idea that would solve at least the developer problem: a simple page, probably behind LDAP, where someone can go and stick in a crash UUID to have that crash processed instantly in a separate queue.  (Writing this here so we don't forget about it)
That's a pretty good idea. If you're on Windows, you might also consider using the symbol server to get stack traces from nightlies on your machine.
Yeah, we'll need to fix this for release.
Flags: blocking1.9? → blocking1.9+
Priority: -- → P2
Update on this: lars checked in a new monitor and processor this afternoon that is the fix for the faulty mutual exclusion method and also uses a proper queue for pending reports that can be used to flag for priority.  We're staging this atm and should be able to push this early next week.

The UUID flagging mentioned in comment #2 would use the queue table to push things to the front for immediate processing.

Also of note, the previous monitor randomly selected new items for processing, which caused erratic behavior as well (new reports instantly go through, old ones still waiting).  This should also be fixed with the new patch.
Been looking at Lars' queue and I think we should make the current (crappy) 404 page scenario do this:
* flag that UUID as having priority in the queue (if it exists)
* display a js refresh on the page, maybe with a monkey/shovel saying "I'm working"
* reload the page after 10 sec, which should be about how long a flagged report takes to come through
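The "I'm working" page above is just a self-reloading placeholder; a minimal sketch in Python (the function name, URL path, and 10-second interval are illustrative, not taken from the actual Socorro code):

```python
def render_pending(uuid, refresh_secs=10):
    """Build a placeholder "I'm working" page that reloads itself.

    The meta-refresh makes the browser re-request the report URL after
    refresh_secs, by which time a priority-flagged report should be done.
    """
    return (
        "<html><head>"
        f'<meta http-equiv="refresh" content="{refresh_secs};url=/report/index/{uuid}">'
        "</head><body>"
        f"<p>I'm working... this page will reload in {refresh_secs} seconds.</p>"
        "</body></html>"
    )
```

If the report still isn't processed when the browser comes back, the same page can simply be served again, which is the loop the later patch implements.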

This would eliminate the need to:
* log in
* enter any uuids

In the case where the UUID doesn't exist anywhere (queue or db), I think we should make the 404 friendlier as noted in bug 414258, but I'm not sure what text would make the "we have no idea what that UUID is" case more manageable.

Also, changing title to be more appropriate.  And adding refactor as a dependency (bug 420809) since deploying that adds our queue.
Depends on: 420809
Priority: P2 → P1
Summary: Socorro takes too long to process reports → Instantly process a queued report if requested
Target Milestone: --- → 0.6
Could probably just dupe bug 411347 over here as well if we're going to bump priority on queued reports when you hit the URL.
Lars' queue patch was pushed; bug 426940 remains to eliminate the lag between the point when the collector receives a file and when it's queued.  Worst-case wait would be ~30 seconds.

Once it's in the db, it's a really simple patch to:
* query jobs.uuid
* if exists, set priority=1
** load templates/working.html that has a meta-refresh to reports/index/[uuid] in 15 sec
** add sexy apng (
* else do 404 or 410
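The flag-or-404 branch above is a single UPDATE plus a rowcount check. A minimal sketch using an in-memory sqlite3 stand-in for the jobs table (the real Socorro schema and column names may differ):

```python
import sqlite3

# Hypothetical stand-in for the Socorro jobs table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (uuid TEXT PRIMARY KEY, priority INTEGER DEFAULT 0)")
db.execute("INSERT INTO jobs (uuid) VALUES ('ffffce33-f099-11dc-bb23-001a4bd46e84')")

def flag_priority(conn, uuid):
    """If the uuid is queued, bump it to priority processing and return
    'pending' (caller renders working.html with its meta-refresh);
    otherwise return 'not_found' (caller answers 404 or 410)."""
    cur = conn.execute("UPDATE jobs SET priority = 1 WHERE uuid = ?", (uuid,))
    return "pending" if cur.rowcount else "not_found"
```

Doing it as one UPDATE avoids a separate existence lookup: the rowcount tells you whether the job was there at all.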

I think this is better than the login/enter UUIDs scenario...
Depends on: 426940
Target Milestone: 0.6 → 0.5
(In reply to comment #9)
> I think this is better than the login/enter UUIDs scenario...

That is a great idea.
No longer depends on: 426940
Attached patch: v1, first run
Ted - this is what I have so far.  It needs review, and if you want to hack on it today, go for it.

It does some simple/cool stuff:
* flags jobs entry for priority processing
* shows queue status on pending page
* redirects to report page after 10s
* redirects back to pending page for another 10s if report isn't ready yet

There's a small delay before an active Thread picks up the queued report -- the delta between the priority update and pickup is probably ~5-8 seconds, and process time is ~5-10 seconds.  So we might want to bump the refresh time to 15 seconds, but 10s doesn't hurt that much.

I'm a little concerned about our ability to send no-cache headers.  None of this should be cached, so one of the things we need to do is force no-cache headers so the pending page doesn't get stuck in the proxy cache.  I fear pylons sucks horribly at this, but that's unconfirmed.
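If Pylons turns out to be awkward about this, the no-cache headers can be forced below the framework with plain WSGI middleware. A framework-agnostic sketch (the class name is made up; Pylons can presumably also set these per-controller):

```python
class NoCacheMiddleware:
    """Wrap a WSGI app and force no-cache headers on every response so
    intermediate proxies never serve a stale pending/report page."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        def no_cache_start_response(status, headers, exc_info=None):
            # Drop any caching headers the app set, then force no-cache.
            headers = [(k, v) for k, v in headers
                       if k.lower() not in ("cache-control", "pragma", "expires")]
            headers += [
                ("Cache-Control", "no-cache, no-store, must-revalidate"),
                ("Pragma", "no-cache"),
                ("Expires", "0"),
            ]
            return start_response(status, headers, exc_info)
        return self.app(environ, no_cache_start_response)
```

This sidesteps the framework entirely, so it works regardless of what Pylons does with response headers.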
Attachment #313566 - Flags: review?(ted.mielczarek)
Comment on attachment 313566 [details] [diff] [review]
v1, first run

This looks good. I would prefer if we could just 404 on the report/index page instead of redirecting to report/pending and then 404ing, but that'd mean an extra db lookup for each queued job, I guess, which is silly. (Unless you have a way to pass this data along with the redirect, but we'd have to stick it in a cookie or something, wouldn't we?)

I would bump the timeout to 15 or even 20 seconds if you expect processing to take more than 10 seconds.  There's no point in redirecting them to the same page twice if we can just make the refresh a bit slower.
Attachment #313566 - Flags: review?(ted.mielczarek) → review+
Alright, checked in on trunk with bumped refresh time (rev 351).  Working with Lars and Aravind to stage this so we can get this sucker pushed.
Closed: 16 years ago
Keywords: push-needed
Resolution: --- → FIXED
Keywords: push-needed
Component: Socorro → General
Product: Webtools → Socorro