Closed Bug 417118 Opened 17 years ago Closed 17 years ago

Handle infinite loops and/or insanely large dumps gracefully

Categories

(Socorro :: General, task, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: morgamic, Assigned: lars)

The goal here is to slim down report viewing in general, aimed mostly at damage control for dumps of excessive size (usually in the presence of an EXCEPTION_STACK_OVERFLOW). When raw dumps exceed ~8M, requests start piling up and the reporter has diminishing returns at both the app layer (memory bottleneck) and the db layer (same). Querying these large reports should be reserved for when people need them, so we need to do a few things:
* add size column to breakpad.dumps
* query dumps.size before dumps.data, only pulling data when dumps.size < config.max_dump_size
* avoid parsing non-crashing frames for the frames tab when dump size gets over config.max_dump_size
* offer raw dump only as file download instead of in textarea for copy/paste
Some questions as far as report-view usage for people who analyze reports:
* do we need to view _all_ frames or are the top 10 adequate?
* would it be alright to initially load top 10 then switch to advanced view for other frames?
* in the case of a 20M raw dump, is downloading it for manual inspection acceptable, or should we pursue a better web UI for this?
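The "query dumps.size before dumps.data" step could look roughly like the sketch below. It is only an illustration: it uses Python's built-in sqlite3 module so it runs on its own, the `report_id` key column and the `fetch_dump` helper are hypothetical, and `MAX_DUMP_SIZE` merely stands in for config.max_dump_size.

```python
import sqlite3

# Hypothetical stand-in for config.max_dump_size (the ~8M figure from the report).
MAX_DUMP_SIZE = 8 * 1024 * 1024


def fetch_dump(conn, report_id, max_dump_size=MAX_DUMP_SIZE):
    """Return (size, data) for a dump, or None if there is no such row.

    data is None when the dump is at or over the limit, so the large
    column is never read for oversized dumps.
    """
    # Cheap query first: only the size column is read.
    row = conn.execute(
        "SELECT size FROM dumps WHERE report_id = ?", (report_id,)
    ).fetchone()
    if row is None:
        return None
    size = row[0]
    if size >= max_dump_size:
        return size, None
    # Expensive query only for dumps under the limit.
    data = conn.execute(
        "SELECT data FROM dumps WHERE report_id = ?", (report_id,)
    ).fetchone()[0]
    return size, data
```

A caller that gets `data` back as `None` would then offer the raw dump as a file download rather than rendering it in the page.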
Already had bug 411401 on file for parts of this, but yeah, recursion crashes suck. They suck to process and they suck to display.
(In reply to comment #0)
> Some questions as far as report-view usage for people who analyze reports:
> * do we need to view _all_ frames or are the top 10 adequate?
I prefer "all", but see what I mean by that below.
> * would it be alright to initially load top 10 then switch to advanced view for
> other frames?
This would be fine.
> * in the case of a 20M raw dump, is downloading it for manual inspection
> acceptable, or should we pursue a better web UI for this?
Downloading it would be fine. I'd even serve it up as a type of document that Firefox defaults to downloading instead of .txt, which would be displayed in-browser.
When I say "all" above... Ideally, since we know there's recursion, we'd simply display the repeated frames a couple of times, then "..." and the end of the stack. If that's too hard to set up, simply displaying the repeated frames two or three times with a "..." at the end and a link to the full log would be more than acceptable. Often the first 10 frames don't show the recursion, so I'd want to see more than that. If doing all of the last paragraph is hard (or not really worth it time-wise), let's just show the first 50 frames instead.
Offhand I was going to say 20 from the top and 20 from the bottom would be appreciated. But yeah, 50 would be better :). content-disposition:attachment ?
Priority: -- → P1
Lars, this is a decent challenge for the processor. We can't stop minidump stackwalk from traversing stack overflows (and we shouldn't because the end of the report is valuable) but when the processor parses the output would it be possible to recognize loops where N > 20 and skip them for anything > 20 until the end of the trace is reached?
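For illustration, recognizing loops and skipping the extra repetitions might be sketched as below. This is an assumption-laden example, not the processor's code: `collapse_repeats`, `max_period`, and `keep` are hypothetical names, frames are reduced to plain signature strings, and only cycles up to `max_period` frames long are detected.

```python
def collapse_repeats(signatures, max_period=4, keep=2):
    """Collapse consecutive repeats of a short block of frame signatures.

    When a block of up to max_period frames repeats more than `keep`
    times in a row, keep the first `keep` copies and replace the rest
    with a single "..." marker.
    """
    signatures = list(signatures)
    out = []
    i = 0
    n = len(signatures)
    while i < n:
        collapsed = False
        for period in range(1, max_period + 1):
            block = signatures[i:i + period]
            if len(block) < period:
                break  # not enough frames left for a block this long
            # Count how many times the block repeats back to back.
            repeats = 1
            while signatures[i + repeats * period:i + (repeats + 1) * period] == block:
                repeats += 1
            if repeats > keep:
                out.extend(block * keep)
                out.append("...")
                i += repeats * period
                collapsed = True
                break
        if not collapsed:
            out.append(signatures[i])
            i += 1
    return out
```

Trying every period at every position is the cost of not knowing the cycle length in advance; it is quadratic in the worst case for long stacks.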
Assignee: nobody → lars
There have been several informal discussions about how to resolve this. All the discussions have involved omitting lines from the "raw dump" in the database. There has been no agreement, however, on how to indicate within the "now-cooked" dump that an omission has been made. Is it acceptable to indicate within the file that an omission exists with a line consisting of just an ellipsis (...)? Is there some other indication that could be used?
At one point, morgamic and I discussed trying to detect the repeating pattern, deleting all but N repetitions and placing one instance of the repetition into a new column of the rawdumps table. To me, this seems like a lot of computation for little benefit.
Another suggestion on IRC was to just have a threshold. If the crashing thread has more than T entries, omit entries after T - with the exception of never omitting the last X entries for the crashing thread. This idea is computationally much simpler than the previous suggestion's search for patterns of unknown length.
I need some direction as to which path I should follow...
I vote for the threshold approach. As long as we get enough frames on both ends we should be good. I think the ellipsis should be plenty obvious (maybe say how many frames have been omitted as well?).
Processor now edits some of the middle out of certain insanely large raw dumps before committing them to the database. The edits are controlled by two new command line options: crashingThreadFrameThreshold (default 100) and crashingThreadTailFrameThreshold (default 10).
When reading the raw dump output from breakpad_stackwalk, processor counts the frames that it sees of the crashing thread. If that count exceeds the value of "crashingThreadFrameThreshold" then it stops passing those frames on. When the processor gets to the end of the crashing thread's frames, it goes back in the frame list by “crashingThreadTailFrameThreshold” frames and passes those last frames on to the database. The effect is to remove the middle, leaving the two ends intact. Since the frames are sequentially numbered, the gap in the sequence serves as the indicator as to where the edited section was removed.
A new column has been added to the reports table: “truncated”. This boolean just indicates if the aforementioned editing has occurred.
Example:
crashingThreadFrameThreshold = 7
crashingThreadTailFrameThreshold = 5
original crashing thread frames: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
truncated crashing thread frames: 0, 1, 2, 3, 4, 5, 12, 13, 14, 15, 16
Of course, there was a flaw in manually typing my example. Substitute this line:
truncated crashing thread frames: 0, 1, 2, 3, 4, 5, 6, 12, 13, 14, 15, 16
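A minimal sketch of the head-and-tail truncation described in the thread, reproducing the corrected example, might look like this. It is not the committed processor code: `truncate_crashing_thread` and its parameter names are hypothetical, and treating a thread with no frames left over between head and tail as not truncated is an assumption.

```python
from collections import deque


def truncate_crashing_thread(frames, frame_threshold=100, tail_threshold=10):
    """Keep the first frame_threshold frames and the last tail_threshold frames.

    Frames are consumed one at a time, as when reading stackwalk output.
    Returns (kept_frames, truncated), where truncated is True only if at
    least one frame from the middle was dropped.
    """
    head = []
    tail = deque(maxlen=tail_threshold)  # rolling window of the newest frames
    dropped = 0
    for frame in frames:
        if len(head) < frame_threshold:
            head.append(frame)
        else:
            if len(tail) == tail.maxlen:
                dropped += 1  # the oldest frame in the window is about to fall out
            tail.append(frame)
    return head + list(tail), dropped > 0


kept, truncated = truncate_crashing_thread(range(17), frame_threshold=7, tail_threshold=5)
print(kept)       # [0, 1, 2, 3, 4, 5, 6, 12, 13, 14, 15, 16]
print(truncated)  # True
```

The bounded deque means only the head plus a fixed-size tail window is held in memory, however deep the recursion goes.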
It would be nice to stick in a synthetic frame that said something like "Some frames omitted", but that could be a follow-up, this should be much better than what we have!
(In reply to comment #8)
> crashingThreadTailFrameThreshold (default 10)
Would anyone object to having 20 be the default here?
This is done, pushed on Wednesday.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Component: Socorro → General
Product: Webtools → Socorro