Closed Bug 680013 Opened 14 years ago Closed 12 years ago

404 from about:crashes

Categories

(Socorro :: General, task)

x86
Windows Vista
task
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bugzilla, Unassigned)

Details

(Keywords: regression)

Attachments

(2 files)

Clicking on a crash in about:crashes gets me a 404: "The requested page could not be found." http://crash-stats.mozilla.com/report/index/bp-eeb6de1d-ceac-42c6-ae89-7b0122110818
I tested this, but I did not find a 404 when I clicked on the link.
I've seen this behavior several times in about:crashes in the last 36 hours. Clicking on a current (same-day) crash OOID returns a "not found" page. Since I was on a friend's computer, I do not have the OOID to quote here. I would suggest grepping the middleware logs for the OOID in comment #0.
Just guessing here, but I have a feeling this may be related to the work we did to replace bad urls that caused 500s with 404s: I would guess priority jobs might be accidentally being redirected to 404s.
Monitor shows it assigned for processing at 01:20:32,88
Middleware:
Aug 18 01:20:23 Socorro Web Services (pid 6728): 2011-08-18 01:20:23,643 DEBUG - Dummy-1 - GetCrash get {'datatype': u'processed', 'uuid': 'eeb6de1d-ceac-42c6-ae89-7b0122110818'}
Aug 18 01:20:23 Socorro Web Services (pid 6728): 2011-08-18 01:20:23,661 DEBUG - Dummy-1 - Dummy-1 - retry_wrapper: unhandled exception, OOID not found: eeb6de1d-ceac-42c6-ae89-7b0122110818
Aug 18 01:20:23 Socorro Web Services (pid 6728): 2011-08-18 01:20:23,664 DEBUG - Dummy-1 - Dummy-1 - retry_wrapper: unhandled exception, OOID not found: eeb6de1d-ceac-42c6-ae89-7b0122110818
Aug 18 03:37:20 Socorro Web Services (pid 6728): 2011-08-18 03:37:20,767 DEBUG - Dummy-1 - GetCrash get {'datatype': u'processed', 'uuid': 'eeb6de1d-ceac-42c6-ae89-7b0122110818'}
Aug 18 07:24:49 Socorro Web Services (pid 6728): 2011-08-18 07:24:49,547 DEBUG - Dummy-4 - GetCrash get {'datatype': u'processed', 'uuid': 'eeb6de1d-ceac-42c6-ae89-7b0122110818'}
Webapp:
Wants sudo to view the syslog, which I don't have
Found it in Kohana:
2011-08-18 01:20:23 -07:00 --- error: [404 Page Not Found] File: system/core/Kohana.php; Line: 816; Message: The page you requested, report/index/bp-eeb6de1d-ceac-42c6-ae89-7b0122110818, could not be found.
Disregard my former comment about weblogs. It's just web05 has no logs after 6/24, so I was assuming kohana was logging to syslog. The other webheads all have logs where I expected. (Need to follow up on web05 though)
Assignee: nobody → bsavage
Keywords: regression
And on the collector:
Aug 18 01:20:21 Socorro Collector (pid 23323): 2011-08-18 01:20:21,707 INFO - MainThread - eeb6de1d-ceac-42c6-ae89-7b0122110818 received
Aug 18 01:20:21 Socorro Collector (pid 23323): 2011-08-18 01:20:21,708 INFO - MainThread - saved - eeb6de1d-ceac-42c6-ae89-7b0122110818
Aug 18 01:20:25 Socorro Storage Mover (pid 25042): 2011-08-18 01:20:25,606 DEBUG - submissionMillQueuingThread - queuing standard job eeb6de1d-ceac-42c6-ae89-7b0122110818
Aug 18 01:20:25 Socorro Storage Mover (pid 25042): 2011-08-18 01:20:25,734 DEBUG - Thread-5 - received: ('eeb6de1d-ceac-42c6-ae89-7b0122110818',)
Aug 18 01:20:25 Socorro Storage Mover (pid 25042): 2011-08-18 01:20:25,736 DEBUG - Thread-5 - Thread-5 - getJson eeb6de1d-ceac-42c6-ae89-7b0122110818
Aug 18 01:20:25 Socorro Storage Mover (pid 25042): 2011-08-18 01:20:25,739 DEBUG - Thread-5 - pushing eeb6de1d-ceac-42c6-ae89-7b0122110818 to dest
Aug 18 01:20:26 Socorro Storage Mover (pid 25042): 2011-08-18 01:20:26,023 INFO - Thread-5 - saved - eeb6de1d-ceac-42c6-ae89-7b0122110818
This seems to occur when the user is fast enough to load the link before the crash has been put in the queue. This should be a rare occurrence, and we could solve it with a better error page specifically for OOID not found.
I have seen this before too and had a discussion with rhelmer about it on IRC.
Since this appears to be a race condition caused by about:crashes being able to submit its own crash, the solution here is to further educate the user so that they know their crash is likely still in processing. This patch adds a new page to that effect, one that is neither a 404 nor a 500 error.
Attachment #554213 - Flags: review?(chris.lonnen)
Attachment #554213 - Flags: feedback?(laura)
Comment on attachment 554213 [details] [diff] [review] Improving the error display and user information Per Laura, the final language will read "If you recently submitted this crash..." instead of "If you recently crashed..."
Comment on attachment 554213 [details] [diff] [review] Improving the error display and user information Can you fix the spacing irregularities in the new else branch?
Comment on attachment 554213 [details] [diff] [review] Improving the error display and user information You can tidy up the whitespace before check in if you'd like.
Attachment #554213 - Flags: review?(chris.lonnen) → review+
Fixed in 3467 for branch, 3648 and 3649 for trunk.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
I think this is an improvement, but we still get regular complaints from people clicking about:crashes on an unsubmitted crash to get to Socorro and getting this 404 (even though it says "We couldn't find the OOID you're after. If you recently submitted this crash, it may still be in the queue.") The problem is that if the crash hasn't been submitted, about:crashes has a click handler which submits the job, and as soon as the collectors return an OOID it follows the link, so it's pretty much guaranteed to not be ready in time. I think this is a use case we should support. I suggest either/or:
a) file a dependent bug to have about:crashes append an HTTP param when it submits, so we can show a "processing, please wait" and (30sec?) spinner
b) always show an initial "processing" spinner if the incoming OOID looks valid
(b) is like what we used to do and moved away from, (a) seems more elegant (but of course we need to wait for client changes). I think this is fine though since current state will be status quo and it'll improve as people upgrade.
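Option (b) hinges on deciding whether an incoming OOID "looks valid". A minimal Python sketch of such a check, assuming the OOID is UUID-shaped with an optional "bp-" prefix, as in the URL from comment #0. The function name and the exact pattern are illustrative assumptions, not Socorro's actual code:

```python
import re

# Assumed shape: optional "bp-" prefix plus a lowercase hex UUID (8-4-4-4-12).
# This pattern is inferred from the one OOID quoted in this bug, nothing more.
OOID_PATTERN = re.compile(
    r"(?:bp-)?[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def looks_like_ooid(candidate):
    """Return True if the whole string has the general shape of a crash OOID."""
    return OOID_PATTERN.fullmatch(candidate) is not None
```

A check like this only filters out obviously malformed URLs; it says nothing about whether the crash exists.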
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I am seeing bug reporters who are confused by this *all the time* now. They click on a report in about:crashes (sometimes one that was just submitted, sometimes to submit one that hadn't been submitted), hit this 404 page, and assume that their crash report didn't work somehow. We need to get something better here, even if it's just a smarter version of the old "wait and refresh" page.
Severity: normal → major
Specifically:
(In reply to Chris Lonnen :lonnen from comment #6)
> This seems to occur when the user is fast enough to load the link before the
> crash has been put in the queue. This should be a rare occurrence, and we
> could solve it with a better error page specifically for OOID not found.
I have ample evidence to suggest that this is not true in practice.
(In reply to Ted Mielczarek [:ted, :luser] from comment #15)
> I have ample evidence to suggest that this is not true in practice.
The cause or the frequency of occurrence?
The frequency of occurrence. I've seen quite a few bug reporters hit this 404 page and assume that it means their crash report isn't available.
The solution here is to determine the date of the submission, and if it is today's date, display the waiting page; if it is not, we display the 404 error if we can't find it. Also, we will update the error message to be more explicit.
We just discussed this on IRC. To be more specific, we should look at the date encoded in the last six digits of the OOID. If we can't find the report, but that's today's date, we should wait for the report to show up in the system. I believe that would fix 99% of the issues I've seen, which are of the form "I just submitted a crash, clicked the link from about:crashes, and crash-stats tells me it can't find it".
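As a rough illustration of the check described above, here is a Python sketch that reads the last six digits of the OOID as YYMMDD and compares the result with today's date. The function names are hypothetical, the century is assumed to be 20YY, and which timezone defines "today" is left to the caller:

```python
import datetime
import re


def ooid_submission_date(ooid):
    """Return the date encoded in the last six digits (YYMMDD), or None."""
    match = re.search(r"(\d{2})(\d{2})(\d{2})$", ooid)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        # Assumption: two-digit years mean 20YY.
        return datetime.date(2000 + year, month, day)
    except ValueError:
        # The trailing digits do not form a real calendar date.
        return None


def should_wait_for_report(ooid, today):
    """True when a missing report carries today's date and may still be processing."""
    return ooid_submission_date(ooid) == today
```

For the OOID in comment #0, the trailing digits 110818 decode to 2011-08-18, which matches the timestamps in the logs quoted earlier.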
Target Milestone: --- → 2.3.3
Commit pushed to https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/d4c08110ab66d6bd8547866c809834e63fa69118 Merge pull request #140 from brandonsavage/bug680013 Bug 680013 - Users received a 404 error when clicking on a crash report t
r+, see github for additional comments
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
Attached image crash_not_found
QA verified on stage. When a crash is not found the user receives the updated message
Status: RESOLVED → VERIFIED
Depends on: 706058
This is reverted in 2.3.3.1
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Target Milestone: 2.3.3 → 2.4
Component: Socorro → General
Product: Webtools → Socorro
Target Milestone: 2.4 → 2.4.4
There are a number of things that need to be improved upon before this issue is completely resolved, regarding how we handle crashes that are not available.
1) If a user requests a crash that is available in a processed state in both HBase and Postgres, they are shown the data automatically.
2) If a user requests a crash that is available in a processed state in Postgres, but in an unprocessed state in HBase, they are asked to wait while the HBase crash is priority processed.
3) If a user requests a crash that is unprocessed in HBase, and does not exist in Postgres, the user is asked to wait while the crash is priority processed.
4) If the user requests a crash that was submitted today, and is not in Postgres or HBase, the user is asked to wait while the crash has time to run through the system.
5) If the user requests a crash that was NOT submitted today, and does not exist in HBase or Postgres, the user is given a special 404 page that describes why the crash may no longer be available.
Rob, how does this system of steps work? I realize that #2 is extraordinarily unlikely to occur but I imagine a situation where it MIGHT happen and I'd like to nail this completely in the rewrite.
The upshot of this is that the middleware will be responsible for sending back a JSON response composed of two parts: the first part will be the status code for whichever kind of action the user should take. The second part will be used for data, if data is available (or empty if data is unavailable).
(In reply to Brandon Savage [:brandon] from comment #25)
>
> Rob, how does this system of steps work? I realize that #2 is
> extraordinarily unlikely to occur but I imagine a situation where it MIGHT
> happen and I'd like to nail this completely in the rewrite.
Looks good to me, I agree that #2 "should not" happen. I don't think there's any harm in attempting to fix it, but we should make sure to write a log message so we can look into how it could have happened.
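The five cases from comment #25, together with the log message suggested for case 2, could be sketched in Python roughly as follows. This is a hedged illustration only: the status names, function names, and JSON keys are hypothetical, since the thread does not define them, and combinations the five cases do not cover raise an error rather than guess:

```python
import json
import logging

# Hypothetical status codes; the thread does not specify the real values.
SHOW_REPORT = "show_report"
WAIT_PRIORITY = "wait_priority_processing"
WAIT_INCOMING = "wait_incoming"
NOT_FOUND = "not_found"


def classify_crash(in_postgres, hbase_state, submitted_today):
    """Pick the action for the user.

    in_postgres: True if a processed copy exists in Postgres.
    hbase_state: "processed", "unprocessed", or None if absent from HBase.
    submitted_today: True if the OOID carries today's date.
    """
    if in_postgres and hbase_state == "processed":
        return SHOW_REPORT  # case 1
    if hbase_state == "unprocessed":
        if in_postgres:
            # Case 2 "should not" happen; log it so it can be investigated.
            logging.warning("processed in Postgres but unprocessed in HBase")
        return WAIT_PRIORITY  # cases 2 and 3
    if not in_postgres and hbase_state is None:
        return WAIT_INCOMING if submitted_today else NOT_FOUND  # cases 4 and 5
    # The five cases do not cover this combination.
    raise ValueError("storage state not covered by the listed cases")


def build_response(status, data=None):
    """Two-part JSON response: a status code plus data (empty if unavailable)."""
    return json.dumps({"status": status, "data": data or {}})
```

Keeping the classification separate from the response builder means the web app only has to switch on the status field to choose between the report, a waiting page, and the special 404 page.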
Target Milestone: 2.4.4 → ---
Assignee: bsavage → nobody
Attachment #554213 - Flags: feedback?(laura)
Apparently I re-filed this and rhelmer fixed it in bug 891470.
Status: REOPENED → RESOLVED
Closed: 14 years ago → 12 years ago
Resolution: --- → FIXED