Closed Bug 535829 Opened 16 years ago Closed 16 years ago

Collect/report additional metadata for out-of-process plugins

Categories

(Socorro :: General, task)

x86
Linux
task
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: benjamin, Assigned: ozten)

References

Details

Attachments

(5 files, 2 obsolete files)

For out-of-process plugins, we are going to be sending additional metadata with crash reports and we'd really like to be able to collect/query on it: * ProcessType=default|plugin (in the future we will also add "content") * PluginFilename=libflashplayer.so * PluginName * PluginVersion All of these will potentially be optional. For crash reports without a ProcessType, please assume "default". In querying, we'd like to be able to filter to see only crashes that occur in the "default" or "plugin" process types. We'd also like to search on substrings of the PluginName, so for example there will be plugin names such as "Shockwave Flash 10.0 r42" and we'd like to be able to get topcrash lists of "all plugin processes with 'Flash' in the plugin name". Ideally this will be ready by the time we do OOPP betas, which is scheduled for the second week of January. Please let me know if that isn't feasible.
So this data is coming in via the HTTP post or via the dump? Are these associated with the extension information that we're already getting? Or is this essentially a new stream of data? Regardless of the source, in order to make the data participate in queries, it will have to go into the database. If it supplemental data to the extension data that we already get, then we can likely extend that table. Worst case scenario is having to add a new partitioned table for this data. Getting the data into the database is only the first step. We need to get a UI designed, integrated, and implemented. So with all that, I'm afraid we are not agile enough to get this out by the second week of January. Since it has not been on our radar at all, we'll need to weigh relative priorities in our existing schedule and make allowances.
It's coming in via the HTTP post. It's not really related to extension information, it's just one ProcessType/PluginFilename/PluginName/PluginVersion per crash report (I think it would belong in the main crash table, then, but I'm not up to speed on the current design). Collecting the data and displaying it in the "Details" section of individual crash report pages and in the JSON version would be very helpful by itself: querying ability is definitely a secondary priority, because as long as the data is available we can use JSON to do external correlation/processing. In the future, what's the best way of getting requests on your radar? I'm guessing that filing the bug isn't sufficient, if you didn't see it ;-)
(In reply to comment #2) > it's just one ProcessType/PluginFilename/PluginName/PluginVersion > per crash report This probably won't be true forever; I think we'll want to report all plugins currently loaded, as we do for extensions.
That's a different issue entirely. This metadata is about the single plugin loaded into the plugin process, not the list of plugins loaded into the parent process.
As there is just one of each per crash, then the data will have to go into the reports table. The schema will have to be modified. Since that table contains legacy partitions from the older system, the change isn't particularly straight forward. Again, there will have to be changes to the processor. Ozten will have to comment on the changes required for the UI. We triage our bug list on an irregular schedule. We should probably make that a regular thing to catch these critical requests. The best way to capture my attention, however, is to add me to the CC list.
I'd say this is 5 days of UI work. 2.5 days to alter the search form and 2.5 days to add the new "top crashers by plugin" trend report. So if this was the #1 priority, the UI could be done. Of course it depends on the rest of the system.
Target Milestone: --- → 1.4
(In reply to comment #5) Instead of adding this to the reports table, I think we should put this into a process_details partitioned table. The foreign key would be uuid.
This could be implemented as a radio checkbox defaulting to All types of processes. Selecting Plugin would enable the "Filename" area which allows for exact, starts with, or contains. Content won't appear until a later release when content is rendered in it's own child process, but it is in the mockup to show this future extension.
Assignee: nobody → ozten.bugs
Severity: normal → critical
Depends on: 539261
Attachment #422579 - Flags: review?(morgamic)
Attachment #422579 - Flags: review?(griswolf)
(In reply to comment #9) The attached schema is designed to 1) Support this bug (with the three tables in Attachment#422579 [details]) 2) Work with default, content, Jetpack, and other types of processes crashes once they are developed (with the three tables in Attachment#422580 [details]) 3) Not alter the reports table which is already quite wide 4) Work well with processor inserting data as well as php search pulling data out Notes: plugin_report_20090119 would inherit from plugin_report (not shown) crash_processes_20090119 would inherit from crash_processes (not shown)
> Notes: > plugin_report_20090119 would inherit from plugin_report (not shown) > crash_processes_20090119 would inherit from crash_processes (not shown) As these tables use inheritance and the naming scheme implies parallelism with the 'frames' and 'extensions' tables, I'm assuming that it is for partitioning. In order for foreign key constraints to work in terms of the parent tables, the child tables for all tables involved in a join must share the same constraint columns. EXPLAIN ANALYZE SELECT * FROM reports_20090119 JOIN crash_processes_20090119 AS process on process.report_id = reports_20090119.id JOIN plugins_reports_20090119 AS pr on pr.report_id = reports_20090119.id JOIN plugins ON plugins.id = pr.plugin_id WHERE process.type = 'P' AND plugins.filename LIKE 'lib%'; This query works fine because it speaks only in literal terms of the partitions. Rewrite the query in terms of the parent tables and things get murky. The server cannot know which partitions of the 'plugins_reports' and 'crash_processes' tables to choose. The executions the query will degrade into searching _all_ the partitions of the aforementioned tables. The means that the 'plugins_reports' and 'crash_processes' tables must have the 'date_processed' column in the same manner that the 'frames' and 'extensions' tables do. That query (admittedly awkwardly) must look something like this: EXPLAIN ANALYZE SELECT * FROM reports JOIN crash_processes AS process on process.report_id = reports_20090119.id and 'min date literal' < reports.date_processed and reports.date_processed < 'max date literal' and 'min date literal' < process.date_processed and process.date_processed < 'max date literal' JOIN plugins_reports AS pr on pr.report_id = reports.id and 'min date literal' < pr.date_processed and pr.date_processed < 'max date literal' JOIN plugins ON plugins.id = pr.plugin_id WHERE process.type = 'P' AND plugins.filename LIKE 'lib%'; BTW, these new tables will have to codified within the .../socorro/database/schema.py file with the table dependencies. The weekly partitioning script will also have to prebuild the partitions in the same manner as the 'frames' and 'extensions' partitions are created.
(In reply to comment #13) I forgot a column for exclusion constraints. Thanks! I will use date_processed on both plugin_report and crash_processes tables. I didn't want to do all the partitioning work w/o getting a design review first, since it's mostly book keeping. Yep, I've got schema.py on my todo list. Was going to 1) db tables 2) scripts/createPartitions.py and schema.py 3) processor work 4) Finish php work (just altering the queries is left)
In #2, scripts.createPartitions.py won't actually need any changes. It uses the function 'createPartitions' function in 'schema.py'. You'll need to add your new tables to the list 'databaseObjectClassListForWeeklyPartitions'. However, there are some changes to .../socorro/unittests/database/testSchema.py that will have to happen. griswolf will be helpful there.
Attachment #422579 - Flags: review?(morgamic) → review+
Comment on attachment 422579 [details] A design for supporting out of process plugins Aside from probably having an unnecessary serial in plugins_reports, this looks sane to me.
Creates entries into plugins and plugins_reports tables, adds the following properties to the processed dump (JSONZ) processType, pluginFilename, pluginName, and pluginVersion.
Attachment #422889 - Flags: review?(morgamic)
Attachment #422889 - Flags: review?(griswolf)
Comment on attachment 422889 [details] [diff] [review] First stab at adding OOPP to the processor Parial commentary: change 'CHARACTER VARYING' type to 'text' for new tables, creationSql should include all the necessary columns, so there should be no need for alterColumnDefinitions etc. nor for the new table to be listed in the databaseObjectClassListForUpdate I'm going to need to spend more time looking at the processor.py code
Attachment #422889 - Flags: review?(griswolf) → review-
Comment on attachment 422889 [details] [diff] [review] First stab at adding OOPP to the processor schema.py: per Josh B: varchar is slower than text, less flexible. See also comment #18 PluginsTable: varchar(255) should be type text PluginsReportsTable: varchar(30) should be type text remove method alterColumnDefinitions(...) method remove method updateDefinition(...) processor.py: def insertCrashProcess why is processType limited to 10 char? Also concerned that version field is only 16 char. This can be unlimited per the change to type text in plugins_reports, and should absolutely be no less than 30. Both appear to be potential future problem. Suggest either unlimited, or at least 50 each. logger(... No Big Deal...) lose the commentary. Just "Plugin exists for pluginFilename". Log message of type INFO == 'No big deal' processErrorMessages.append... nit: 'unable to record' seems misleading. What is 'record'? How about: "Failed inserting out of process plugin crash info: file: %s, name: %s, version: %s"
Updated var chars to TEXT, sorry I forgot to do that from earlier feedback. Removed alterColumnDefinitions and updateDefinition. There are many of these in schema.py with commented out bodies, so I was trying to follow the current style. With TEXT change, processType is no longer limited to 10 characters. Previous reasoning... Valid values (present and possible future) default, content, plugin, jetpack With TEXT change version is no longer limited to 16 characters. Previous reasoning... This is the current definition of reports.version Updated logging copy. Thanks for your review.
Attachment #422889 - Attachment is obsolete: true
Attachment #423582 - Flags: review?(morgamic)
Attachment #423582 - Flags: review?(griswolf)
Attachment #422889 - Flags: review?(morgamic)
Comment on attachment 423582 [details] [diff] [review] Second stab at adding OOPP to the processor old line 1311+, new line 1376: the PluginsTable doesn't need updating
Attachment #423582 - Flags: review?(griswolf) → review-
Removed PluginsTable from update list.
Attachment #423582 - Attachment is obsolete: true
Attachment #423741 - Flags: review?(morgamic)
Attachment #423741 - Flags: review?(griswolf)
Attachment #423582 - Flags: review?(morgamic)
Comment on attachment 423741 [details] [diff] [review] Third stab at adding OOPP to the processor OK, as far as I can tell
Attachment #423741 - Flags: review?(griswolf) → review+
Code is staged.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
http://crash-stats.stage.mozilla.com is ready for testing. 1) search for Process Type - plugin 2) search for Filename exactly libflashplayer.so 3) search for Name starts with Adobe
Attachment #423741 - Flags: review?(morgamic)
Attachment #422579 - Flags: review?(griswolf)
Attachment #423741 - Flags: review+
Component: Socorro → General
Product: Webtools → Socorro
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: