Closed Bug 611018 Opened 11 years ago Closed 11 years ago

Topcrashers calls failing when missing trending data

Categories

(Socorro :: General, task)

task
Not set
major

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: ryansnyder, Assigned: rhelmer)

References

Details

Attachments

(2 files, 4 obsolete files)

I've noticed that top crasher api calls tend to fail when they are newish versions.  When top crasher api calls take place, 2 queries are called - the first grabs the top crasher data for the current time period (today through 14 days ago) and the second call grabs the top crasher data for the previous time period for trending data ( today minus 15 days ago through 28 days ago).  It seems that if the second call does not have any data, the api call fails to return any data and throws a 500 error.
To be more specific...

On newish versions 3.6.12 and 4.0b7, the api call returns 500 errors for 7 day and 14 day durations, while other, older versions return data successfully.  If you make a call with a 3 day duration, top crasher calls for versions 3.6.12 and 4.0b7 return data successfully.
I can reproduce this on prod and staging. Going to dig into this a bit more in dev.
Assignee: nobody → robert
Status: NEW → ASSIGNED
The problem seems to be that totalNumberOfCrashesForPeriod() returns None rather than 0 if there are no crashes for the specified period:

http://code.google.com/p/socorro/source/browse/tags/releases/1.7.5_r2680_20101103/socorro/services/topCrashBySignatureTrends.py#34

This is called from getListOfTopCrashesBySignature() to calculate the percentage of crashes displayed versus the total for the period:

http://code.google.com/p/socorro/source/browse/tags/releases/1.7.5_r2680_20101103/socorro/services/topCrashBySignatureTrends.py#62

I will attach a patch for this momentarily.
Also added asserts to make sure the types are what we expect, would have made this easier to diagnose from traceback.

Also pushed to github, if you'd like to review there:
https://github.com/rhelmer/socorro/commit/de024e87f84b512d08ca654e38c80982c40ae87f
Attachment #489575 - Flags: review?(lars)
Attachment #489575 - Flags: review?(lars) → review+
Sending        socorro/services/topCrashBySignatureTrends.py
Transmitting file data .
Committed revision 2695.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Laura mentioned in today's call that the fix might not yet be on staging; indeed: 

13:52:17.717: Network: GET http://crash-stats.stage.mozilla.com/topcrasher/byversion/Fennec/2.0a1 [HTTP/1.1 500 Internal Server Error 147ms]

Is that a separate bug, or part of this?  I can reproduce on http://crash-stats.stage.mozilla.com/topcrasher/byversion/Firefox/3.6.12 and http://crash-stats.stage.mozilla.com/topcrasher/byversion/Firefox/4.07b, as mentioned in comment 1.  Thanks!
Caught this on staging; I should have tested that this worked in the failure case *and* the normal case ;) Sorry for the noise.
Attachment #489667 - Flags: review?(lars)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 489667 [details] [diff] [review]
assert proper type for databaseParameters["numberOfCrashes"]

the long vs int thing surprises me.

Actually a message in the asserts would probably be helpful.  On that one, if the assert fails with None, the error will just say "AssertionError, None" which isn't very helpful.  Granted you'll get the line number and once you look, you'll understand what's happening, but a more useful error message would be reassuring.

do what you think is best.
Attachment #489667 - Flags: review?(lars) → review+
(In reply to comment #8)
> Comment on attachment 489667 [details] [diff] [review]
> assert proper type for databaseParameters["numberOfCrashes"]
> 
> the long vs int thing surprises me.
> 
> Actually a message in the asserts would probably be helpful.  On that one, if
> the assert fails with None, the error will just say "AssertionError, None"
> which isn't very helpful.  Granted you'll get the line number and once you
> look, you'll understand what's happening, but a more useful error message would
> be reassuring.
> 
> do what you think is best.

OK I agree, I will attach a patch as-landed.
Attached patch fix as landed (obsolete) — Splinter Review
Sending        socorro/services/topCrashBySignatureTrends.py
Transmitting file data .    
Committed revision 2700.
(In reply to comment #6)
> Laura mentioned in today's call that the fix might not yet be on staging;
> indeed: 
> 
> 13:52:17.717: Network: GET
> http://crash-stats.stage.mozilla.com/topcrasher/byversion/Fennec/2.0a1
> [HTTP/1.1 500 Internal Server Error 147ms]
> 
> Is that a separate bug, or part of this?  I can reproduce on
> http://crash-stats.stage.mozilla.com/topcrasher/byversion/Firefox/3.6.12 and

The two above look like they work on stage now.

> http://crash-stats.stage.mozilla.com/topcrasher/byversion/Firefox/4.07b, as
> mentioned in comment 1.  Thanks!

This one does not work, because it is missing from the branch data screen. We should fix this too (really just returning a more informative error message), but this should be a different bug I think.

Stephen would you mind filing one, if you agree? Try adding that branch to test.
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Sorry, Rob; that rogue URL was just a straight-up typo, and should be http://crash-stats.stage.mozilla.com/topcrasher/byversion/Firefox/4.0b7, rather than http://crash-stats.stage.mozilla.com/topcrasher/byversion/Firefox/4.07b, so no new bug needed.

Verified FIXED; thanks!
Status: RESOLVED → VERIFIED
(In reply to comment #8)
> Comment on attachment 489667 [details] [diff] [review]
> assert proper type for databaseParameters["numberOfCrashes"]
> 
> the long vs int thing surprises me.

Me too, I looked into it a bit. http://www.postgresql.org/docs/8.2/static/functions-aggregate.html says sum() may return:

"bigint for smallint or int arguments, numeric for bigint arguments, double precision for floating-point arguments, otherwise the same as the argument data type "

In this case the arguments are int, so I think psycopg2 is mapping bigint -> long (this is harder for me to find a definitive link for, but it's the behavior I've observed).

Generally I don't really like doing asserts for checking types (why aren't we just using a statically typed language then? :)), but since this is going into a formatted string where the types are going to be checked I think it makes sense; might be nice to be able to pass e.g. a dict of expected types to socorro.database.database if we end up doing this elsewhere.
Attached patch fix hardcoded long (obsolete) — Splinter Review
Typo in the error message, r2701.
see bug 611263, seems not fixed right now
(In reply to comment #16)
> see bug 611263, seems not fixed right now

Right, it's not fixed on prod -- we still have to push this out.
rhelmer, r2700 + r2702 are the sum of the changes for this bug?
(In reply to comment #18)
> rhelmer, r2700 + r2702 are the sum of the changes for this bug?

r2695 is the basic fix, r2697-r2701 is getting the asserts right, removing redundancy in the checks, and adding more informative error messages.

I'll attach a unified patch.
Attachment #489575 - Attachment is obsolete: true
Attachment #489667 - Attachment is obsolete: true
Attachment #489710 - Attachment is obsolete: true
Attachment #489723 - Attachment is obsolete: true
What's the milestone for this?  1.7.5.2?
Yeah.  I put it into 1.7.5.
Target Milestone: --- → 1.7.5
Component: Socorro → General
Product: Webtools → Socorro
Flags: in-testsuite?
You need to log in before you can comment on or make changes to this bug.