Closed
Bug 608309
Opened 14 years ago
Closed 13 years ago
middleware should gracefully handle missing data
Categories
(Socorro :: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
2.0
People
(Reporter: stephend, Assigned: brandon)
Details
Attachments
(1 file)
477 bytes, patch | lars: review+
http://crash-stats.stage.mozilla.com/topcrasher/byversion/Thunderbird/3.1b1pre is currently giving me: "Unable to load data System error, please retry in a few minutes". Ditto for http://crash-stats.stage.mozilla.com/topcrasher/byversion/Fennec/2.0.2, though that might actually be a code issue.
Reporter | ||
Comment 1•14 years ago
And http://crash-stats.stage.mozilla.com/topcrasher/byversion/Fennec/2.0a1, which I don't think is a code issue, since that version is available from the Versions pulldown.
Reporter | ||
Comment 2•14 years ago
http://crash-stats.stage.mozilla.com/topcrasher/byversion/Fennec/4.0b1 too. (Let me know, Ryan, if I should spin bugs off; thanks!)
Updated•14 years ago
Assignee: server-ops → aravind
Comment 3•14 years ago
I think this might be an application/code problem. Here is the error I see in the middleware logs.

Oct 29 14:02:12 dhcp-10-2-11-235 Socorro Web Services (pid 22548): 2010-10-29 14:02:12,556 DEBUG - MainThread - TopCrashBySignatureTrends get {'crashType': 'browser', 'product': 'Fennec', 'endDate': datetime.datetime(2010, 10, 29, 14, 0), 'listSize': 300, 'productdims_id': 120, 'version': '4.0b1', 'duration': datetime.timedelta(14)}
Oct 29 14:02:12 dhcp-10-2-11-235 Socorro Web Services (pid 22548): 2010-10-29 14:02:12,564 DEBUG - MainThread - entered twoPeriodTopCrasherComparison
Oct 29 14:02:12 dhcp-10-2-11-235 Socorro Web Services (pid 22548): 2010-10-29 14:02:12,569 DEBUG - MainThread - endDate 2010-10-29 14:00:00
Oct 29 14:02:12 dhcp-10-2-11-235 Socorro Web Services (pid 22548): 2010-10-29 14:02:12,569 DEBUG - MainThread - rangeOfQueriesGenerator for 2010-10-01 14:00:00 to 2010-10-15 14:00:00
Oct 29 14:02:12 dhcp-10-2-11-235 Socorro Web Services (pid 22548): 2010-10-29 14:02:12,572 ERROR - MainThread - MainThread Caught Error: exceptions.TypeError
Oct 29 14:02:12 dhcp-10-2-11-235 Socorro Web Services (pid 22548): 2010-10-29 14:02:12,572 ERROR - MainThread - int argument required

If I am looking at the wrong messages here, please toss it back to server ops and we will debug more.
Assignee: aravind → nobody
Component: Server Operations → Socorro
Product: mozilla.org → Webtools
QA Contact: mrz → socorro
Comment 4•14 years ago
And here is the message corresponding to the TB requests.

Oct 29 13:41:34 dhcp-10-2-11-235 Socorro Web Services (pid 9572): 2010-10-29 13:41:34,031 DEBUG - MainThread - TopCrashBySignatureTrends get {'crashType': 'browser', 'product': 'Thunderbird', 'endDate': datetime.datetime(2010, 10, 29, 14, 0), 'listSize': 300, 'productdims_id': 69, 'version': '3.1b1pre', 'duration': datetime.timedelta(14)}
Oct 29 13:41:34 dhcp-10-2-11-235 Socorro Web Services (pid 9572): 2010-10-29 13:41:34,037 DEBUG - MainThread - entered twoPeriodTopCrasherComparison
Oct 29 13:41:34 dhcp-10-2-11-235 Socorro Web Services (pid 9572): 2010-10-29 13:41:34,039 DEBUG - MainThread - endDate 2010-02-23 04:00:00
Oct 29 13:41:34 dhcp-10-2-11-235 Socorro Web Services (pid 9572): 2010-10-29 13:41:34,039 DEBUG - MainThread - rangeOfQueriesGenerator for 2010-01-26 04:00:00 to 2010-02-09 04:00:00
Oct 29 13:41:34 dhcp-10-2-11-235 Socorro Web Services (pid 9572): 2010-10-29 13:41:34,041 ERROR - MainThread - MainThread Caught Error: exceptions.TypeError
Oct 29 13:41:34 dhcp-10-2-11-235 Socorro Web Services (pid 9572): 2010-10-29 13:41:34,041 ERROR - MainThread - int argument required
Oct 29 13:41:34 dhcp-10-2-11-235 Socorro Web Services (pid 9572): 2010-10-29 13:41:34,041 ERROR - MainThread - trace back follows:
  File "/data/breakpad/processor/socorro/webapi/webapiService.py", line 33, in GET
    result = self.get(*args)
  File "/data/breakpad/processor/socorro/services/topCrashBySignatureTrends.py", line 228, in get
    return twoPeriodTopCrasherComparison(cursor, parameters)
  File "/data/breakpad/processor/socorro/services/topCrashBySignatureTrends.py", line 193, in twoPeriodTopCrasherComparison
    listOfTopCrashers = listOfListsWithChangeInRank(rangeOfQueriesGenerator(databaseCursor, context, listOfTopCrashersFunction))[0]
  File "/data/breakpad/processor/socorro/services/topCrashBySignatureTrends.py", line 135, in listOfListsWithChangeInRank
    for i, aListOfTopCrashers in enumerate(listOfQueryResultsIterable):
  File "/data/breakpad/processor/socorro/services/topCrashBySignatureTrends.py", line 109, in rangeOfQueriesGenerator
    yield queryExecutionFunction(aCursor, parameters)
  File "/data/breakpad/pro
Oct 29 13:52:00 dhcp-10-2-11-235 Socorro Web Services (pid 9571): 2010-10-29 13:52:00,409 DEBUG - MainThread - MainThread - creating crashStorePool
Reporter | ||
Comment 5•14 years ago
http://crash-stats.stage.mozilla.com/topcrasher/byversion/Camino/2.0.2, also? Have we started putting checks on the cron jobs, to ensure they're returning the right data?
Target Milestone: --- → 1.7.5
Updated•14 years ago
Target Milestone: 1.7.5 → 1.7.6
Comment 6•14 years ago
Changing the summary, since (per IRC discussion) we think this is a case where middleware is not error checking appropriately. In the face of missing data, middleware should:
1) log appropriately
2) return something the UI can interpret as "no data" rather than "error"
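The two requirements above could be sketched roughly as follows. This is only an illustration, not the Socorro code: `top_crashers_or_empty`, `fetch_rows`, the logger name, and the payload keys (`crashes`, `total`, `noData`) are all hypothetical names invented for this sketch.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("middleware_sketch")


def top_crashers_or_empty(fetch_rows, params):
    """Return a "no data" payload instead of failing when rows are missing.

    fetch_rows is a hypothetical callable that takes the request
    parameters and returns a list of row dicts, or None / [] when the
    database has nothing for that product and version.
    """
    rows = fetch_rows(params)
    if not rows:
        # 1) log appropriately: record which request had no data
        logger.warning("no top-crasher data for %r", params)
        # 2) return something the UI can interpret as "no data"
        return {"crashes": [], "total": 0, "noData": True}
    return {
        "crashes": list(rows),
        "total": sum(row["count"] for row in rows),
        "noData": False,
    }
```

With a payload like this, the UI can tell an empty result from a failure and show a "no data" message rather than "System error".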
Summary: Missing data for Thunderbird 3.1b1pre on staging → middleware should gracefully handle missing data
Updated•13 years ago
Target Milestone: 1.7.6 → 1.7.7
Comment 7•13 years ago
(In reply to comment #5)
> http://crash-stats.stage.mozilla.com/topcrasher/byversion/Camino/2.0.2, also?
> Have we started putting checks on the cron jobs, to ensure they're returning
> the right data?

This is covered by bug 616480, and I think jabba has actually taken care of most of it.
Updated•13 years ago
Assignee: nobody → laura
(In reply to comment #0)
> "Unable to load data
>
> System error, please retry in a few minutes"

FWIW, I've seen this message several times this week in production (for https://crash-stats.mozilla.com/topcrasher/byversion/Camino/2.1a1, which then has data a bit later on a reload). Dunno if this is a generic error message or related to this bug?
Comment 9•13 years ago
I'm trying to repro this on devdb and not having much luck. It's getting new data shortly so I'll try again after that.
Comment 10•13 years ago
Stephen: got any other test cases? None of these reproduce the specific error for me, they all just 404. What's the desired behavior?
Reporter | ||
Comment 11•13 years ago
(In reply to comment #10)
> Stephen: got any other test cases? None of these reproduce the specific error
> for me, they all just 404.
>
> What's the desired behavior?

This was reproducible before by using NetSparker Community Edition and/or Acunetix, two free scanning/fuzzing tools (Windows-only, I'm afraid, though any good fuzzer/crawler should trigger this).

[1] http://www.mavitunasecurity.com/communityedition/
[2] http://www.acunetix.com/cross-site-scripting/scanner.htm

If you're up for it, I can do a trial run right now, and see if it's still reproducible. When it was, though, we flooded the error logs, and iirc, took down the staging server (or at least made it unavailable for a long while).
Comment 12•13 years ago
(In reply to comment #11)
>
> This was reproducible before by using NetSparker Community Edition and/or
> Acunetix, two free scanning/fuzzing tools (Windows-only, I'm afraid, though any
> good fuzzer/crawler should trigger this).
>
> [1] http://www.mavitunasecurity.com/communityedition/
> [2] http://www.acunetix.com/cross-site-scripting/scanner.htm
>
> If you're up for it, I can do a trial run right now, and see if it's still
> reproducible. When it was, though, we flooded the error logs, and iirc, took
> down the staging server (or at least made it unavailable for a long while).

Can you just run it for a short time period? That'd be dandy.
Reporter | ||
Comment 13•13 years ago
(In reply to comment #12)
> Can you just run it for a short time period? That'd be dandy.

Did just that, tonight, with jabba and rhelmer around; at 4pm, I fired off Netsparker (it begins in "crawl" mode), and around 4:21pm, in "attack" mode, jabba saw around 2.4GB of core dumps within a few minutes, at which point I stopped.
Comment 14•13 years ago
I'm going to bump this until we have a reproducible test case again.
Target Milestone: 1.7.7 → ---
Reporter | ||
Comment 15•13 years ago
Talked in person w/Laura, and we decided this can wait for new staging, where hopefully the "attack" mode won't be as much of a deal, and where we could also fix logging problems as we see them coming in.
Reporter | ||
Comment 16•13 years ago
Can this be considered for 1.7.8, or at least 1.7.9 (or whichever milestone is next)? We've uncovered some pretty good bugs via scanners/fuzzers, and it sounds like improving logging in the middleware is a win overall, for obvious reasons. If we run this on https://crash-stats.allizom.org, which should be a mirror of production, both hardware and config/code-wise, then the concerns about DoS'ing it should be ameliorated. Lars, iirc, Brandon had questions about the approach to addressing this problem.
Reporter | ||
Comment 17•13 years ago
Any chance we can 2.0 this? It's pretty important that WebQA be able to negative-test the app.
Updated•13 years ago
Target Milestone: --- → 2.0
Updated•13 years ago
Assignee: laura → bsavage
Assignee | ||
Comment 18•13 years ago
This patch raises a BadRequest exception for the user in the event that they give us data that results in a TypeError. This patch is short and sweet.
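The attached patch itself is not reproduced in this thread, so the following is only a rough sketch of the approach it describes. It assumes the conversion happens around the `self.get(*args)` call that appears in the traceback above. The `BadRequest` class here is a hypothetical stand-in for whatever HTTP 400 exception the web framework provides, and `handle_get` is an invented helper name.

```python
class BadRequest(Exception):
    """Hypothetical stand-in for the framework's HTTP 400 exception."""

    status = "400 Bad Request"


def handle_get(get, *args):
    """Call the service's get() and map a TypeError to a client error.

    A TypeError raised while handling the request is treated as the
    result of bad input, so the caller sees a 400-style BadRequest
    instead of a generic system error.
    """
    try:
        return get(*args)
    except TypeError as exc:
        # Keep the original error chained for the middleware logs.
        raise BadRequest("bad request parameters: %s" % exc) from exc
```

One trade-off of this design is that any TypeError inside the service, including one caused by a genuine code bug, is reported to the client as a bad request.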
Attachment #538235 - Flags: review?(lars)
Updated•13 years ago
Attachment #538235 - Flags: review?(lars) → review+
Assignee | ||
Comment 19•13 years ago
Fixed in revision 3210.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•12 years ago
Component: Socorro → General
Product: Webtools → Socorro