Closed
Bug 560628
Opened 14 years ago
Closed 14 years ago
ADU daily UI changes for hang detection
Categories
(Socorro :: General, task, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
1.7
People
(Reporter: ozten, Assigned: ozten)
References
Details
Attachments
(1 file, 1 obsolete file)
29.59 KB,
patch
|
laura
:
review+
lars
:
review+
|
Details | Diff | Splinter Review |
+++ This bug was initially created as a clone of Bug #560627 +++ Per Bug#559360 Comment#16 We need the ADU daily report to support hang detection. This depends on an update to http://code.google.com/p/socorro/wiki/AduAPI
Assignee | ||
Updated•14 years ago
|
Assignee | ||
Updated•14 years ago
|
Summary: ADU daily UI changes for hang detection → ADU daily service call changes for hang detection
Target Milestone: --- → 1.7
Assignee | ||
Updated•14 years ago
|
Summary: ADU daily service call changes for hang detection → ADU daily UI changes for hang detection
Version: 1.8 → 1.7
Updated•14 years ago
|
Assignee: nobody → chowse
Updated•14 years ago
|
Priority: -- → P1
Comment 1•14 years ago
|
||
See attachment #439544 [details] for mock-ups. The All/Crashes/Hangs options above the chart can be excluded.
Comment 2•14 years ago
|
||
looks good.
Updated•14 years ago
|
Assignee: chowse → ozten.bugs
Assignee | ||
Comment 3•14 years ago
|
||
We need to add report_type to the top_crashes_by_signature table. report type can be any, crash, or hang. There are a couple of ways to go here. 1) Add report_type column and calculate twice as many rows. The crashesByADU API would filter based on this column. 2) Create a table that maps prod/version/signatures to report_type. This can be updated via top crashers or another cron. The crashesByADU API would filter results by joining against this new table. 2.1) A signature would be a 'hang' if x% were for crashes with a hangid. 3) ??? Will noodle on this some more.
Comment 4•14 years ago
|
||
here is how I'm tracking the data until socorro gets updated https://wiki.mozilla.org/CrashKill/Crashr#3.6.4
Comment 5•14 years ago
|
||
we could actually create one more calculated number column. that would be to divide the number of "hang reports" by 2 to have a "number of hang incidences." the number of "hang incidences" plus the number of crashes equals our new over all stability number.
Assignee | ||
Comment 6•14 years ago
|
||
(In reply to comment #4) (In reply to comment #5) Created Bug#565482
Assignee | ||
Comment 7•14 years ago
|
||
So I've been working on #2 from Bug#560628 Comment#3. I've hit a wall in getting my new queries to perform well do to the magnitude of data in the tcbs table. I've create Bug#565777 and re-opened Bug#545000.
Assignee | ||
Updated•14 years ago
|
Target Milestone: 1.7 → 1.8
Assignee | ||
Comment 8•14 years ago
|
||
This needs more testing, but I think it's ready for code review. I tried really hard to work with existing tables, but eventually adding a new table that is at the same level of granularity as raw_adu and the daily.php report turned out to be a much easier path. See comments in schema.py for db column documentation.
Attachment #447117 -
Flags: review?(ryan)
Attachment #447117 -
Flags: review?(laura)
Attachment #447117 -
Flags: review?(lars)
Comment 9•14 years ago
|
||
Comment on attachment 447117 [details] [diff] [review] Adds new daily_crashes db table as well as new startDailyCrashes.py cron In models/daily.php::get(), could we add the new parameter at the end for backwards API compat? I know it's not *that* relevant in this instance. I also have a couple, possibly dumb, questions about the python parts of this patch. I'll ping you in IRC.
Comment 10•14 years ago
|
||
Comment on attachment 447117 [details] [diff] [review] Adds new daily_crashes db table as well as new startDailyCrashes.py cron These issues are all relatively minor. If the code works, and you don't want to change it, I'll switch to +. == TestSchema.py == ok == daily_crash.py == several issues: 1 - the socorro.lib.psycopghelper module is deprecated. You can use the socorro.database.database module that has nearly an identical interface. 2 - function "most_recent_day" - This function has some problems handling exceptions. Both exception handlers catch all exceptions including keyboardInterupt and SigKill. If some thing goes wrong and the process hangs and must be killed, these exception handlers will necessitate "kill -9". It is better to use this idiom: try: ... except Exception: ... The python exception hierarchy places signals and the keyboard interrupt exceptions at a lower level than the class Exception. This idiom will not catch those and allow the natural process killing mechanisms to work. 3 - function "most_recent_day" - the outer exception hander does not need to encompass the inner exception handler. That inner one will catch everything. The outer one could be shrunk to cover only lines 18-20. The result is slightly different behavior that may be a correction to an unintended code path. In the existing code, if the singleValueSql fails with some exception, then the outer handler falls back to a returning the value of "most_recent_day_from_product_visibility" without checking it for None like the other code path does when it calls that function. This seems cleaner by using only 2 exception handlers instead of 3 and has no nesting: try: day = psy.singleValueSql(databaseCursor, "SELECT MAX(adu_day) FROM daily_crashes") if day: return day logger.info("daily_crashes was empty, using product_visibility to determine start date") except Exception: util.reportExceptionAndContinue(logger) logger.warning("Ouch, db error accessing daily_crashes, using product_visibility to determine start date") try: day = most_recent_day_from_product_visibility(databaseCursor, logger) if day: return day else: return fail_most_recent_day(logger) except Exception: util.reportExceptionAndContinue(logger) return fail_most_recent_day(logger) 4 - function "most_recent_day_from_product_visibility" - this function has the side effect of rolling back the connection. That rollback should, perhaps, be placed in the exception handler itself to indicate the exception state. 5 - function "record_crash_stats" - 1st logging has a spelling error 6 - function "record_crash_stats" - the socorro.database.database module gives you a slightly simpler interface to the database. Since this is not a multithreaded program, there is no need to instantiate a whole database connection pool. A single connection will do: databaseConnectionPool = psy.DatabaseConnectionPool(config.databaseHost, config.databaseName, config.databaseUserName, config.databasePassword, logger) becomes databaseConnection = db.Database(config) and _, databaseCursor = databaseConnectionPool.connectionCursorPair() becomes databaseCursor = db.connection().cursor() == aduByDay.py == I like the way that in "fetchCrashHistory" you've replaced a horrendous date equation with nothing special. It looks like the UTC to PST/PDT conversion has been done in the insert into the daily_crashes table. == schema.py == ok == startDailyCrash.py == ok == dailycrashconfig.py.dist == rather than importing everything from 'commonconfig.py', most of the 1.7 scripts/configs have been converted to cherry picking values. This makes the --help command line option list out only what is really needed to run the script rather than everything. For an example, see the new processorconfig.py.dist. Also, that idiom of exception handle was masking errors. We've settled on dropping it and requiring that configuration be in ./config rather than having a fallback location (that was always wrong).
Attachment #447117 -
Flags: review?(lars) → review-
Comment 11•14 years ago
|
||
sorry, I missed one. We should at least stub out a .../socorro/unittests/cron/test_daily_crash.py. Actually writing the damn things takes a couple times longer that writing the code itself. Ideally we want unit tests for every function, but considering our time constraints, we may have to (rightly or wrongly) consider that to be a luxury...
Assignee | ||
Comment 12•14 years ago
|
||
Assignee | ||
Updated•14 years ago
|
Attachment #447117 -
Attachment is obsolete: true
Attachment #447117 -
Flags: review?(ryan)
Attachment #447117 -
Flags: review?(laura)
Assignee | ||
Updated•14 years ago
|
Attachment #447176 -
Flags: review?(laura)
Attachment #447176 -
Flags: review?(lars)
Assignee | ||
Comment 13•14 years ago
|
||
Second patch has changes from Comment #9 and Comment #10. Thanks guys for the feedback. (In reply to comment #11) I agree, but these should be integration tests though as none of the logic in daily_crash is unit testable.
Comment 14•14 years ago
|
||
(In reply to comment #13) The logic is unit testable using mock objects to replace and simulate the database. It is extremely tedious and it is brittle. I really hate writing that kind of test. However, I cannot deny that on several occasions, the act of writing the tests revealed some subtle logic flaws. It is also about the only way to test exception handling.
Assignee | ||
Comment 15•14 years ago
|
||
In addition to patch2 one slight tweak per lars IRC feedback: database = db.Database(config) databaseConnection = database.connection() try: ... finally: databaseConnection.close()
Updated•14 years ago
|
Attachment #447176 -
Flags: review?(lars) → review+
Assignee | ||
Comment 16•14 years ago
|
||
Checking in *backend portion first* socorro scripts Adding scripts/config/dailycrashconfig.py.dist Adding scripts/startDailyCrash.py Adding socorro/cron/daily_crash.py Sending socorro/database/schema.py Sending socorro/lib/psycopghelper.py Sending socorro/services/aduByDay.py Sending socorro/unittest/database/testSchema.py Transmitting file data ....... Committed revision 2105. I'll commit frontend portion once Bug#567923 is done.
Target Milestone: 1.8 → 1.7
Assignee | ||
Comment 17•14 years ago
|
||
Committing frontend changes. Sending webapp-php/application/controllers/daily.php Sending webapp-php/application/controllers/products.php Sending webapp-php/application/models/daily.php Sending webapp-php/application/views/daily/daily_search.php Sending webapp-php/application/views/daily/index.php Transmitting file data ..... Committed revision 2110.
Updated•14 years ago
|
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 18•14 years ago
|
||
Comment on attachment 447176 [details] [diff] [review] second stab I missed this...but it's good.
Attachment #447176 -
Flags: review?(laura) → review+
Updated•13 years ago
|
Component: Socorro → General
Product: Webtools → Socorro
You need to log in
before you can comment on or make changes to this bug.
Description
•