Open Bug 380878 (chartsys_rewrite) Opened 18 years ago Updated 11 years ago

New charting system

Categories

(Bugzilla :: Reporting/Charting, enhancement)

enhancement
Not set
normal

Tracking

()

People

(Reporter: LpSolit, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: helpwanted)

I had a private discussion with wurblzap first, then per email with gerv and justdave and finally yesterday at our monthly Bugzilla meeting on IRC: The idea is to redo the charting system from scratch using a different way to store bug changes. Till now, bug changes are stored in the bugs_activity table and new charts require collectstats.pl to run on a daily (or at least regular) basis to populate the series* tables. What we suggest instead is to store a copy of edited bugs in a new history_bugs table allowing us to drop the bugs_activity and all series* tables. This would allow us to easily get a snapshot of the DB at a given date (à la wiki) and generating charts should be relatively easy. Some developers objected because of possible space consumption and perf issues. This should be carefully investigated! Some developers also asked about how to store multi-value fields, such as the keyword and CC lists. Maybe we will need separate history_* tables, such as history_cc, history_keywords and history_flags; we don't know yet. About generating the history_bugs table, we could use the bugs_activity table before killing it. One problem is that values edited from admin pages are not reflected in the bugs_activity table. This may produce some fun results.
For the record, here are my current concerns and points: - The new-new system should absolutely be required to support charting of arbitrary searches, as the new one does today; we should not be deciding what our customers might want to chart. Anything else is a functional regression. - The change tables will occupy a non-negligible amount of database space, which has to be RAIDed, backed-up, read into memory etc. - The change tables will duplicate a *lot* of data; this lack of normalisation is supposed to be a bad thing in database design. - I have deep suspicions about the performance of trying to extract data for certain types of search (for example, those involving JOINs) out of the type of table currently proposed. Gerv
I think the new system could be a good idea. Here are my concerns, though: * The "changed" Boolean Charts still have to work. (changed from, changed to, etc.) * We have to be able to generate the View Bugs Activity page quickly. It's used more than charts are. Also, if we move Activity onto the main page, as some people want, then it *definitely* has to be fast or it's going to significantly slow down show_bug.cgi.
I think the ideal way to make this meet the requirement that ALL the searches that work now also work if we can define a select for each table that represents that tables state at a point in the past and use that in a subselect as a replacement for each table. So, if we have something that knows how to SELECT <something> from bugs LEFT JOIN longdescs ON condition INNER JOIN products ON condition WHERE very_elaborate_condition it should (I hope) be possible to write it as... SELECT <something> from (SELECT <the_last_state_of_each_bug_prior_to_the_specified_date>) as bugs LEFT JOIN (SELECT <the_last_state_of_longdescs_prior_to_the_specified_date) as longdescs ON condition INNER JOIN (SELECT <the_last_state_of_products_prior_to_the_specified_date) as products ON condition WHERE very_elaborate_condition I am not sure if the aliasing trick is legal, but we could get around that. More to the point, I have no idea is the databses can optimize beasts like this or if they would wind up creating monster temporary tables. On thing that would be important to make this work... If a bug is created with its summary "foo", then it is changed to "bar", the history file we use for this should have an entry changing its value to "foo" and then, later, to "bar". This helps, relative to our existing activity mechanism, by making it unnecessary to check first for changes, then for the original value. Note also that the current state of a bug is just the latest change of each field... perhaps the regular table like "bugs" should be subselects. If this approach were used even for keys to other tables... (i.e. Bug 2345 was in product 2 on a date in the past when product 2 was "FireBird"), then this whole scheme could work.
We could possibly even come up with some very clever way of writing a view. If we required MySQL 5.0, we could use Views in the DB. I don't think we should eliminate the regular tables. We'd have to have a MAX(date) on each query, and that would mean a mandatory GROUP BY, and that would mean that every query would be a subselect, and that doesn't sound good for performance to me (making every query against the bugs table into actually two queries).
I like the idea of moving to a new charting system, however, I am concerned as one who has a lot of code based on the current bugs_activity table. I would love to see something that is much easier to produce dashboard-like data easily. If the desire is to eliminate bugs_activity completely, then I think we ought to make sure that in the new methods, that all data changes are logged without exception. When we have full support for transactions coded in post_bug.cgi and process_bug.cgi (as well as other updates), it'll be much easier to log updates with changes because any part of the update that fails can be rolled back, even the log update. By the way, I happen to be migrating a bug tracking system (Oracle-based) into Bugzilla that stores entire snapshots of bugs as a way to determine bug history. While it's cool for developers, it clearly takes a significant performance hit when looking at bug history as the database grows. Since the historic data and current data are in the same table, there is a *slight* performance advantage, but the slight gain is offset at search/query time. As I see it, the main advantage to this method of storing historic data is a reduced need to integrate changes to the underlying schema into the history collection table(s). I would also say that it can be its downfall. As others have mentioned, single table histories can make for very slow databases.
(In reply to comment #4) > We could possibly even come up with some very clever way of writing a view. If > we required MySQL 5.0, we could use Views in the DB. For those that are not aware, using views may (emphasize may) increase performance by taking advantage of the query cache. Max, I am certain you knew that. :-)
I won't have time to play with it before we freeze for 3.2, and there is still too much discussion going on about how to implement it optimally to do anything now anyway. Feel free to take it!
Assignee: LpSolit → gerv
Target Milestone: Bugzilla 3.2 → ---
Assignee: gerv → charting
You need to log in before you can comment on or make changes to this bug.