Closed
Bug 465632
Opened 16 years ago
Closed 16 years ago
Prepare for partitioning
Categories
(Socorro :: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: lars, Assigned: ozten)
References
Details
Attachments
(2 files, 1 obsolete file)
1.47 KB,
patch
|
morgamic
:
review+
|
Details | Diff | Splinter Review |
834 bytes,
patch
|
Details | Diff | Splinter Review |
All SQL queries to the reports, frames, extensions, modules or dumps tables must include an absolute date range in order to optimize the use of the database partitioning. "Absolute" means not using the function "now()". Dates must be expressed as a literal of the type timestamp with time zone. Bound variables are also disallowed for dates. ------- select * from reports where reports.date BETWEEN now() - CAST('3 months' AS INTERVAL) AND now() becomes: select * from reports where reports.date BETWEEN timestamp with time zone '2008-11-18' - CAST('3 months' AS INTERVAL) AND timestamp with time zone '2008-11-18' ------- select * from reports where uuid = 'c94f7b8c-8969-4746-ba21-3c9cb2081117'; becomes: select * from reports where uuid = 'c94f7b8c-8969-4746-ba21-3c9cb2081117' and date = timestamp with time zone '2008-11-17'; ------- In the case where you need to lookup a specific uuid and have no idea of the associated date, you can cheat and extract the date from the uuid itself: the last 6 digits represent YYMMDD.
Assignee | ||
Comment 1•16 years ago
|
||
I don't see a date column in modules table. schema.py doesn't mode the modules table either. Is that a typo? The reports table has 'date' is 'timestamp with time zone' other tables have 'date' columns are 'timestamp' which I think defaults them to without timezone. Does this matter?
Reporter | ||
Comment 2•16 years ago
|
||
hmm, of course. The frames, extensions, modules and dumps will have date columns when the migration is complete. They do not have a date column now. The modules table was deprecated, but may be brought back to life in the future. Interesting about the column types. Just before the WebDev OnSite, I unified all those types to match the report 'date' column, yet, it obviously didn't make it into svn nor can I find it in any of my local copies. Grrr. It is, perhaps, time to start doing some branching in svn rather than just juggling these different versions by the seat of my pants.
Comment 3•16 years ago
|
||
(In reply to comment #2) > The modules table was deprecated, but may be brought back to life in the > future. I don't recall seeing discussion about this. Did I miss something? Are we going to just display the modules list via the raw dump when displaying a report? Also, getting rid of the modules table will make analysis like those proposed in bug 464775, bug 423968, and bug 439679 impossible.
Reporter | ||
Comment 4•16 years ago
|
||
During the height of our database trouble keeping up with demand late last Spring, we stopped populating the 'modules' table. It's ratio of 100:1 in module rows to report rows was a killer. As far as we could see, the modules table was not used by anything. The WebApp gets module data from the raw dump table. So while it has not been in use, I've secretly kept it alive in stasis, suspecting that we would eventually realize that it wasn't as useless as we thought. However, before it can be brought back, its original incarnation's lack of normalization must be addressed. It was populated with thousands and thousands of essentially duplicate rows.
Assignee | ||
Comment 5•16 years ago
|
||
(In reply to comment #3) > (In reply to comment #2) > > The modules table was deprecated, but may be brought back to life in the > > future. > > I don't recall seeing discussion about this. Did I miss something? Are we going > to just display the modules list via the raw dump when displaying a report? > Also, getting rid of the modules table will make analysis like those proposed > in bug 464775, bug 423968, and bug 439679 impossible. Just to confirm... The webapp currently is driven by the dump. It breaks it apart and populates the frames and modules data in the page.
Comment 6•16 years ago
|
||
Ok, seems reasonable. Just noting that people would like to get more advanced analysis later, and that data is useful. (Perhaps in a modified form.)
Assignee | ||
Comment 7•16 years ago
|
||
Attachment #350216 -
Flags: review?(morgamic)
Assignee | ||
Comment 8•16 years ago
|
||
I reviewed topcrashers.py and serverstatus.py - No queries need to be changed. webapp-php use of NOW() was limited to one file, one line. Queries like /* soc.web common.queryTopSig. */ SELECT reports.signature, count(reports.id), count(CASE WHEN (reports.os_name = 'Windows NT') THEN 1 END) AS is_windows, count(CASE WHEN (reports.os_name = 'Mac OS X') THEN 1 END) AS is_mac, count(CASE WHEN (reports.os_name = 'Linux') THEN 1 END) AS is_linux, count(CASE WHEN (reports.os_name = 'Solaris') THEN 1 END) AS is_solaris FROM reports WHERE reports.signature IS NOT NULL AND (reports.product = 'Firefox') AND reports.date BETWEEN now() - CAST('1 weeks' AS INTERVAL) AND now() GROUP BY reports.signature ORDER BY count(reports.id) DESC LIMIT 100 have been rewritten to: /* soc.web common.queryTopSig. */ SELECT reports.signature, count(reports.id), count(CASE WHEN (reports.os_name = 'Windows NT') THEN 1 END) AS is_windows, count(CASE WHEN (reports.os_name = 'Mac OS X') THEN 1 END) AS is_mac, count(CASE WHEN (reports.os_name = 'Linux') THEN 1 END) AS is_linux, count(CASE WHEN (reports.os_name = 'Solaris') THEN 1 END) AS is_solaris FROM reports WHERE reports.signature IS NOT NULL AND (reports.product = 'Firefox') AND reports.date BETWEEN TIMESTAMP '2008-11-26 13:36:53' - CAST('1 weeks' AS INTERVAL) AND TIMESTAMP '2008-11-26 13:36:53' GROUP BY reports.signature ORDER BY count(reports.id) DESC LIMIT 100 Other examples of updated queries: /* soc.web common.queryFreq */ SELECT date_trunc('day', reports.build_date) AS build_date, count(CASE WHEN (reports.signature = 'JS_XDRNewMem') THEN 1 END) AS count, CAST(count(CASE WHEN (reports.signature = 'JS_XDRNewMem') THEN 1 END) AS FLOAT(10)) / count(reports.id) AS frequency, count(reports.id) AS total, count(CASE WHEN (reports.signature = 'JS_XDRNewMem' AND reports.os_name = 'Windows NT') THEN 1 END) AS count_windows, CASE WHEN (count(CASE WHEN (reports.os_name = 'Windows NT') THEN 1 END) > 0) THEN (CAST(count(CASE WHEN (reports.signature = 'JS_XDRNewMem' AND reports.os_name = 'Windows NT') THEN 1 END) AS FLOAT(10)) / count(CASE WHEN (reports.os_name = 'Windows NT') THEN 1 END)) ELSE 0.0 END AS frequency_windows, count(CASE WHEN (reports.signature = 'JS_XDRNewMem' AND reports.os_name = 'Mac OS X') THEN 1 END) AS count_mac, CASE WHEN (count(CASE WHEN (reports.os_name = 'Mac OS X') THEN 1 END) > 0) THEN (CAST(count(CASE WHEN (reports.signature = 'JS_XDRNewMem' AND reports.os_name = 'Mac OS X') THEN 1 END) AS FLOAT(10)) / count(CASE WHEN (reports.os_name = 'Mac OS X') THEN 1 END)) ELSE 0.0 END AS frequency_mac, count(CASE WHEN (reports.signature = 'JS_XDRNewMem' AND reports.os_name = 'Linux') THEN 1 END) AS count_linux, CASE WHEN (count(CASE WHEN (reports.os_name = 'Linux') THEN 1 END) > 0) THEN (CAST(count(CASE WHEN (reports.signature = 'JS_XDRNewMem' AND reports.os_name = 'Linux') THEN 1 END) AS FLOAT(10)) / count(CASE WHEN (reports.os_name = 'Linux') THEN 1 END)) ELSE 0.0 END AS frequency_linux, count(CASE WHEN (reports.signature = 'JS_XDRNewMem' AND reports.os_name = 'Solaris') THEN 1 END) AS count_solaris, CASE WHEN (count(CASE WHEN (reports.os_name = 'Solaris') THEN 1 END) > 0) THEN (CAST(count(CASE WHEN (reports.signature = 'JS_XDRNewMem' AND reports.os_name = 'Solaris') THEN 1 END) AS FLOAT(10)) / count(CASE WHEN (reports.os_name = 'Solaris') THEN 1 END)) ELSE 0.0 END AS frequency_solaris FROM reports WHERE reports.signature IS NOT NULL AND reports.signature = 'JS_XDRNewMem' AND ((reports.product = 'Firefox' AND reports.version = '3.1a2pre')) AND reports.date BETWEEN TIMESTAMP '2008-11-26 13:39:27' - CAST('2 weeks' AS INTERVAL) AND TIMESTAMP '2008-11-26 13:39:27' GROUP BY date_trunc('day', reports.build_date) ORDER BY date_trunc('day', reports.build_date) DESC /* soc.web common.queryReports */ SELECT reports.date, reports.date_processed, reports.uptime, reports.comments, reports.uuid, reports.product, reports.version, reports.build, reports.signature, reports.url, reports.os_name, reports.os_version, reports.cpu_name, reports.cpu_info, reports.address, reports.reason, reports.last_crash, reports.install_age FROM reports WHERE reports.signature IS NOT NULL AND reports.signature = 'JS_XDRNewMem' AND ((reports.product = 'Firefox' AND reports.version = '3.1a2pre')) AND reports.date BETWEEN TIMESTAMP '2008-11-26 13:39:27' - CAST('2 weeks' AS INTERVAL) AND TIMESTAMP '2008-11-26 13:39:27' ORDER BY reports.date DESC LIMIT 500 /* soc.web topcrash.topcrasherss */ SELECT topcrashers.signature, topcrashers.version, topcrashers.product, SUM(topcrashers.total) AS total, SUM(topcrashers.win) AS win, SUM(topcrashers.mac) AS mac, SUM(topcrashers.linux) AS linux FROM topcrashers WHERE product='Firefox' AND version='3.1a2pre' AND last_updated > '2008-10-07 12:27:37' GROUP BY topcrashers.signature, topcrashers.version, topcrashers.product HAVING SUM(topcrashers.total) > 0 ORDER BY SUM(topcrashers.total) DESC LIMIT 100 /* soc.web topcrash.lastupdate */ SELECT topcrashers.last_updated AS last_updated FROM topcrashers WHERE product='Firefox' AND version='3.1a2pre' ORDER BY last_updated DESC LIMIT 1
Reporter | ||
Comment 9•16 years ago
|
||
I regret to say that in comments #0 and #2, I've lead you astray - I've had you work with the wrong column. As Ted pointed out, the 'date_processed' column is the one we should be working with, not the 'date' field. Sorry about that. The following other columns have had names changed for consistency: reports.comments -> reports.user_comments reports.date -> reports.client_crash_date reports.message -> reports.processor_notes reports.starteddatetime -> reports.started_datetime reports.completeddatetime -> reports.completed_datetime frames.date -> frames.date_processed modules.date -> modules.date_processed dumps.date -> dumps.date_processed
Assignee | ||
Comment 10•16 years ago
|
||
Where does Ted point out that we should work with date_processed? Updating two references to report.date and https://bugzilla.mozilla.org/show_bug.cgi?id=454620
Attachment #350216 -
Attachment is obsolete: true
Attachment #352347 -
Flags: review?(morgamic)
Attachment #352347 -
Flags: review?(lars)
Attachment #350216 -
Flags: review?(morgamic)
Assignee | ||
Comment 11•16 years ago
|
||
So the report crons should also use date_processed. This will affect mtbf and top crashers by url. Attaching just a patch for topcrashers.
Attachment #352355 -
Flags: review?(lars)
Reporter | ||
Comment 12•16 years ago
|
||
Ted points out the date to date_processed mix up in the Bug 466242 "solve the database date confusion"
Assignee | ||
Comment 13•16 years ago
|
||
Cool, thanks. So the only time we should use date_crashed is when viewing a specific crash report. Otherwise queries should use date_processed as the date criteria. Usually this is a trivial difference, except for the case where crash_date and date_processed have a large interval between them, say a differed job processed later.
Assignee | ||
Comment 14•16 years ago
|
||
Checked in on trunk r 776 for webapp-php and socorro/cron.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Updated•15 years ago
|
Attachment #352347 -
Flags: review?(morgamic)
Attachment #352347 -
Flags: review?(lars)
Attachment #352347 -
Flags: review+
Assignee | ||
Updated•14 years ago
|
Attachment #352355 -
Flags: review?(lars)
Updated•13 years ago
|
Component: Socorro → General
Product: Webtools → Socorro
You need to log in
before you can comment on or make changes to this bug.
Description
•