680013 - 404 from about:crashes

Reporter

Description

•

14 years ago

Clicking on a crash in about:crashes gives me a 404: "The requested page could not be found." http://crash-stats.mozilla.com/report/index/bp-eeb6de1d-ceac-42c6-ae89-7b0122110818

Brandon Savage [:brandon]

Comment 1

•

14 years ago

I tested this, but I did not find a 404 when I clicked on the link.

K Lars Lohn [:lars] [:klohn]

Comment 2

•

14 years ago

I've seen this behavior several times in about:crashes in the last 36 hours. Clicking on a current (same-day) crash OOID returns a "not found" page. Since I was on a friend's computer, I do not have the OOID to quote here. I would suggest grepping the middleware logs for the OOID in comment #0.

Laura Thomson :laura

Comment 3

•

14 years ago

Just guessing here, but I have a feeling this may be related to the work we did to replace bad urls that caused 500s with 404s: I would guess priority jobs might be accidentally being redirected to 404s.

Monitor shows it assigned for processing at 01:20:32,88

Middleware:
Aug 18 01:20:23 Socorro Web Services (pid 6728): 2011-08-18 01:20:23,643 DEBUG - Dummy-1 - GetCrash get {'datatype': u'processed', 'uuid': 'eeb6de1d-ceac-42c6-ae89-7b0122110818'}
Aug 18 01:20:23 Socorro Web Services (pid 6728): 2011-08-18 01:20:23,661 DEBUG - Dummy-1 - Dummy-1 - retry_wrapper: unhandled exception, OOID not found: eeb6de1d-ceac-42c6-ae89-7b0122110818
Aug 18 01:20:23 Socorro Web Services (pid 6728): 2011-08-18 01:20:23,664 DEBUG - Dummy-1 - Dummy-1 - retry_wrapper: unhandled exception, OOID not found: eeb6de1d-ceac-42c6-ae89-7b0122110818
Aug 18 03:37:20 Socorro Web Services (pid 6728): 2011-08-18 03:37:20,767 DEBUG - Dummy-1 - GetCrash get {'datatype': u'processed', 'uuid': 'eeb6de1d-ceac-42c6-ae89-7b0122110818'}
Aug 18 07:24:49 Socorro Web Services (pid 6728): 2011-08-18 07:24:49,547 DEBUG - Dummy-4 - GetCrash get {'datatype': u'processed', 'uuid': 'eeb6de1d-ceac-42c6-ae89-7b0122110818'}

Webapp:
Wants sudo to view the syslog, which I don't have

Laura Thomson :laura

Comment 4

•

14 years ago

Found it in Kohana:
2011-08-18 01:20:23 -07:00 --- error: [404 Page Not Found] File: system/core/Kohana.php; Line: 816; Message: The page you requested, report/index/bp-eeb6de1d-ceac-42c6-ae89-7b0122110818, could not be found.

Disregard my former comment about weblogs. It's just that web05 has no logs after 6/24, so I was assuming Kohana was logging to syslog. The other webheads all have logs where I expected. (Need to follow up on web05, though.)

Laura Thomson :laura

Updated

•

14 years ago

Assignee: nobody → bsavage

Keywords: regression

Laura Thomson :laura

Comment 5

•

14 years ago

And on the collector:
Aug 18 01:20:21 Socorro Collector (pid 23323): 2011-08-18 01:20:21,707 INFO - MainThread - eeb6de1d-ceac-42c6-ae89-7b0122110818 received
Aug 18 01:20:21 Socorro Collector (pid 23323): 2011-08-18 01:20:21,708 INFO - MainThread - saved - eeb6de1d-ceac-42c6-ae89-7b0122110818
Aug 18 01:20:25 Socorro Storage Mover (pid 25042): 2011-08-18 01:20:25,606 DEBUG - submissionMillQueuingThread - queuing standard job eeb6de1d-ceac-42c6-ae89-7b0122110818
Aug 18 01:20:25 Socorro Storage Mover (pid 25042): 2011-08-18 01:20:25,734 DEBUG - Thread-5 - received: ('eeb6de1d-ceac-42c6-ae89-7b0122110818',)
Aug 18 01:20:25 Socorro Storage Mover (pid 25042): 2011-08-18 01:20:25,736 DEBUG - Thread-5 - Thread-5 - getJson eeb6de1d-ceac-42c6-ae89-7b0122110818
Aug 18 01:20:25 Socorro Storage Mover (pid 25042): 2011-08-18 01:20:25,739 DEBUG - Thread-5 - pushing eeb6de1d-ceac-42c6-ae89-7b0122110818 to dest
Aug 18 01:20:26 Socorro Storage Mover (pid 25042): 2011-08-18 01:20:26,023 INFO - Thread-5 - saved - eeb6de1d-ceac-42c6-ae89-7b0122110818

Lonnen :lonnen

Comment 6

•

14 years ago

This seems to occur when the user is fast enough to load the link before the crash has been put in the queue. This should be a rare occurrence, and we could solve it with a better error page specifically for OOID not found.

Marcia Knous [:marcia]

Comment 7

•

14 years ago

I have seen this before too and had a discussion with rhelmer about it on IRC.

Brandon Savage [:brandon]

Comment 8

•

14 years ago

Attached patch: Improving the error display and user information

Since this appears to be a race condition caused by about:crashes being able to submit its own crash, the solution here is to further educate the user so that they know their crash is likely still in processing. This patch adds a new page to that effect, one that is neither a 404 nor a 500 error.

Attachment #554213 - Flags: review?(chris.lonnen)

Attachment #554213 - Flags: feedback?(laura)

Brandon Savage [:brandon]

Comment 9

•

14 years ago

Comment on attachment 554213: Improving the error display and user information

Per Laura, the final language will read "If you recently submitted this crash..." instead of "If you recently crashed..."

Lonnen :lonnen

Comment 10

•

14 years ago

Comment on attachment 554213: Improving the error display and user information

Can you fix the spacing irregularities in the new else branch?

Lonnen :lonnen

Comment 11

•

14 years ago

Comment on attachment 554213: Improving the error display and user information

You can tidy up the whitespace before check in if you'd like.

Attachment #554213 - Flags: review?(chris.lonnen) → review+

Brandon Savage [:brandon]

Comment 12

•

14 years ago

Fixed in 3467 for branch, 3648 and 3649 for trunk.

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Robert Helmer [:rhelmer]

Comment 13

•

14 years ago

I think this is an improvement, but we still get regular complaints from people who click an unsubmitted crash in about:crashes to get to Socorro and hit this 404 (even though it says "We couldn't find the OOID you're after. If you recently submitted this crash, it may still be in the queue.").

The problem is that if the crash hasn't been submitted, about:crashes has a click handler which submits the job, and as soon as the collectors return an OOID it follows the link, so it's pretty much guaranteed to not be ready in time. I think this is a use case we should support.

I suggest either/or:
a) file a dependent bug to have about:crashes append an HTTP param when it submits, so we can show a "processing, please wait" message and (30sec?) spinner
b) always show an initial "processing" spinner if the incoming OOID looks valid

(b) is like what we used to do and moved away from; (a) seems more elegant (but of course we need to wait for client changes). I think this is fine though, since the current state will be the status quo and it'll improve as people upgrade.
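The "processing, please wait" idea in both options amounts to polling for the report until it shows up or a deadline passes. A minimal sketch of that loop follows, assuming a hypothetical `fetch(ooid)` callable that returns the report or `None`; the function name, the 2-second interval, and the injectable `clock` and `sleep` parameters are illustrative and are not part of Socorro. Only the 30-second figure comes from the comment above.

```python
import time
from typing import Callable, Optional


def wait_for_report(
    fetch: Callable[[str], Optional[dict]],  # hypothetical lookup, not a Socorro API
    ooid: str,
    timeout_s: float = 30.0,   # the "(30sec?)" suggested above
    interval_s: float = 2.0,   # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll fetch(ooid) until it returns a report or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        report = fetch(ooid)
        if report is not None:
            return report
        # Give up if another full interval would overshoot the deadline.
        if clock() + interval_s > deadline:
            return None
        sleep(interval_s)
```

A caller would show the spinner while this runs and fall back to the "not found" page when it returns `None`.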

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

(not currently active) Ted Mielczarek

Comment 14

•

14 years ago

I am seeing bug reporters who are confused by this *all the time* now. They click on a report in about:crashes (sometimes one that was just submitted, sometimes to submit one that hadn't been submitted), hit this 404 page, and assume that their crash report didn't work somehow. We need to get something better here, even if it's just a smarter version of the old "wait and refresh" page.

Severity: normal → major

(not currently active) Ted Mielczarek

Comment 15

•

14 years ago

Specifically:

(In reply to Chris Lonnen :lonnen from comment #6)
> This seems to occur when the user is fast enough to load the link before the
> crash has been put in the queue. This should be a rare occurrence, and we
> could solve it with a better error page specifically for OOID not found.

I have ample evidence to suggest that this is not true in practice.

Lonnen :lonnen

Comment 16

•

14 years ago

(In reply to Ted Mielczarek [:ted, :luser] from comment #15)
> I have ample evidence to suggest that this is not true in practice.

The cause or the frequency of occurrence?

(not currently active) Ted Mielczarek

Comment 17

•

14 years ago

The frequency of occurrence. I've seen quite a few bug reporters hit this 404 page and assume that it means their crash report isn't available.

Brandon Savage [:brandon]

Comment 19

•

14 years ago

The solution here is to determine the date of the submission, and if it is today's date, display the waiting page; if it is not, we display the 404 error if we can't find it. Also, we will update the error message to be more explicit.

(not currently active) Ted Mielczarek

Comment 20

•

14 years ago

We just discussed this on IRC. To be more specific, we should look at the date encoded in the last six digits of the OOID. If we can't find the report, but that's today's date, we should wait for the report to show up in the system. I believe that would fix 99% of the issues I've seen, which are of the form "I just submitted a crash, clicked the link from about:crashes, and crash-stats tells me it can't find it".
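A minimal sketch of the date check described above. It assumes the last six digits are laid out as YYMMDD, which is an inference from the OOID in comment #0 (ending in 110818) lining up with the 2011-08-18 log entries. The function names are hypothetical and not taken from the Socorro code base.

```python
from datetime import date, datetime
from typing import Optional


def ooid_date(ooid: str) -> Optional[date]:
    """Return the date encoded in the last six digits of an OOID, or None.

    Assumes a YYMMDD layout (inferred from the example in this bug).
    """
    if ooid.startswith("bp-"):
        ooid = ooid[3:]  # drop the URL prefix used by crash-stats links
    suffix = ooid[-6:]
    if len(suffix) != 6 or not (suffix.isascii() and suffix.isdigit()):
        return None
    try:
        return datetime.strptime(suffix, "%y%m%d").date()
    except ValueError:
        return None  # six digits that do not form a real date


def should_wait(ooid: str, today: date) -> bool:
    """True when a missing report carries today's date, so it may still arrive."""
    return ooid_date(ooid) == today
```

For example, `ooid_date("bp-eeb6de1d-ceac-42c6-ae89-7b0122110818")` gives `date(2011, 8, 18)`. Which time zone defines "today" is left to the caller here, since submissions close to midnight could otherwise land on the wrong side of the check.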

Brandon Savage [:brandon]

Updated

•

14 years ago

Target Milestone: --- → 2.3.3

[github robot]

Comment 21

•

14 years ago

Commit pushed to https://github.com/mozilla/socorro

https://github.com/mozilla/socorro/commit/d4c08110ab66d6bd8547866c809834e63fa69118
Merge pull request #140 from brandonsavage/bug680013
Bug 680013 - Users received a 404 error when clicking on a crash report t

Lonnen :lonnen

Comment 22

•

14 years ago

r+, see github for additional comments

Status: REOPENED → RESOLVED

Closed: 14 years ago → 14 years ago

Resolution: --- → FIXED

Matt Brandt [:mbrandt]

Comment 23

•

14 years ago

Attached image: crash_not_found

QA verified on stage. When a crash is not found, the user receives the updated message.

Matt Brandt [:mbrandt]

Updated

•

14 years ago

Status: RESOLVED → VERIFIED

Scoobidiver (away)

Updated

•

14 years ago

Depends on: 706058

Laura Thomson :laura

Comment 24

•

14 years ago

This is reverted in 2.3.3.1

Status: VERIFIED → REOPENED

Resolution: FIXED → ---

Target Milestone: 2.3.3 → 2.4

Nobody; OK to take it and work on it

Assignee

Updated

•

14 years ago

Component: Socorro → General

Product: Webtools → Socorro

Laura Thomson :laura

Updated

•

14 years ago

Target Milestone: 2.4 → 2.4.4

Brandon Savage [:brandon]

Comment 25

•

14 years ago

There are a number of things that need to be improved upon before this issue is completely resolved, regarding how we handle crashes that are not available.

1) If a user requests a crash that is available in a processed state in both HBase and Postgres, they are automatically shown the data.
2) If a user requests a crash that is available in a processed state in Postgres, but in an unprocessed state in HBase, they are asked to wait while the HBase crash is priority processed.
3) If a user requests a crash that is unprocessed in HBase and does not exist in Postgres, the user is asked to wait while the crash is priority processed.
4) If the user requests a crash that was submitted today and is not in Postgres or HBase, the user is asked to wait while the crash has time to run through the system.
5) If the user requests a crash that was NOT submitted today and does not exist in HBase or Postgres, the user is given a special 404 page that describes why the crash may no longer be available.

Rob, how does this system of steps work? I realize that #2 is extraordinarily unlikely to occur, but I imagine a situation where it MIGHT happen and I'd like to nail this completely in the rewrite.

The upshot of this is that the middleware will be responsible for sending back a JSON response composed of two parts: the first part will be the status code for whichever kind of action the user should take. The second part will be used for data, if data is available (or empty if data is unavailable).
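The five cases and the two-part JSON response can be sketched roughly as below. This is only an illustration of the proposal: the `Action` names, the `"status"` and `"data"` keys, and the function signatures are assumptions, and any storage combination the list does not mention is reported as `UNSPECIFIED` instead of being guessed at.

```python
import json
from enum import Enum
from typing import Optional


class Action(str, Enum):
    SHOW = "show"                    # case 1
    WAIT_PRIORITY = "wait_priority"  # cases 2 and 3
    WAIT = "wait"                    # case 4
    NOT_FOUND = "not_found"          # case 5
    UNSPECIFIED = "unspecified"      # combination not covered by the list


def decide(hbase: Optional[str], in_postgres: bool, submitted_today: bool) -> Action:
    """Map storage state to a user-facing action.

    hbase is "processed", "unprocessed", or None when the crash is absent.
    in_postgres is True when a processed copy exists in Postgres.
    """
    if hbase == "processed" and in_postgres:
        return Action.SHOW
    if hbase == "unprocessed":
        # Case 2 (also in Postgres) "should not" happen and is worth logging.
        return Action.WAIT_PRIORITY
    if hbase is None and not in_postgres:
        return Action.WAIT if submitted_today else Action.NOT_FOUND
    return Action.UNSPECIFIED


def build_response(action: Action, data: Optional[dict] = None) -> str:
    """Two-part JSON body: a status for the action, then data (empty if none)."""
    return json.dumps({"status": action.value, "data": data or {}})
```

With this shape the web app only needs to branch on the status value, for instance `build_response(decide(None, False, True))` yields a body whose status is `"wait"` and whose data is empty.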

Robert Helmer [:rhelmer]

Comment 26

•

14 years ago

(In reply to Brandon Savage [:brandon] from comment #25)
> Rob, how does this system of steps work? I realize that #2 is
> extraordinarily unlikely to occur but I imagine a situation where it MIGHT
> happen and I'd like to nail this completely in the rewrite.

Looks good to me, I agree that #2 "should not" happen. I don't think there's any harm in attempting to fix it, but we should make sure to write a log message so we can look into how it could have happened.

Brandon Savage [:brandon]

Updated

•

14 years ago

Target Milestone: 2.4.4 → ---

Brandon Savage [:brandon]

Updated

•

13 years ago

Assignee: bsavage → nobody

Brandon Savage [:brandon]

Updated

•

12 years ago

Attachment #554213 - Flags: feedback?(laura)

(not currently active) Ted Mielczarek

Comment 27

•

12 years ago

Apparently I re-filed this and rhelmer fixed it in bug 891470.

Status: REOPENED → RESOLVED

Closed: 14 years ago → 12 years ago

Resolution: --- → FIXED

Improving the error display and user information (patch, 1.93 KB), 14 years ago, Brandon Savage [:brandon], lonnen: review+
crash_not_found (image/png, 78.96 KB), 14 years ago, Matt Brandt [:mbrandt]