Closed
Bug 948644
Opened 11 years ago
Closed 10 years ago
record missing symbols
Categories
(Socorro :: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
Target Milestone: 112
People
(Reporter: benjamin, Assigned: rhelmer)
References
Details
(Whiteboard: [db migration])
I think we should monitor when we're missing symbols more carefully, and notify if we're missing symbols that we don't expect to be missing.
I recently ran reports of missing symbols for a single day (8-Dec).
https://crash-analysis.mozilla.com/bsmedberg/missing-symbols.20131208.byname.csv
https://crash-analysis.mozilla.com/bsmedberg/missing-symbols.20131208.csv
This is just a report on missing symbols that affect signature generation, not all missing symbols. It uses the JSON MDSW data from bug 431514. The script to generate this is at https://github.com/mozilla/jydoop/blob/master/scripts/missing-symbols.py
What I think I'd like to do is spend some time manually noting, for each DLL name, the following information:
* who made it
* whether we expect to have symbols for it
So for example:
"user32.dll" -> ("Microsoft", true)
"xul.dll" -> ("Mozilla", true)
"igd10iumd32.dll" -> ("Intel", false)
"nvwgf2um.dll" -> ("Nvidia", false)
Then, either continuously in the processors or at the end of each day, create a report for missing symbols. This report should also trigger alerts to stability@mozilla.org:
* if we are missing symbols that we expect to have
* the DLL version and debug ID that we're missing
Currently my report includes DLLs even if we don't know the Debug ID. This is interesting information in a general sense, but it's often the case that OOM causes this, and we can't look up symbols without a debug ID anyway, so we should exclude DLLs with no debug ID from the alerts.
I'd really like the alerts to happen very quickly, more frequently than once a day: e.g. on Patch Tuesday we often have a lag of a day or more before we've fetched symbols, and it would be nice to catch that within a few hours. But as a first step, a daily batch would get us most of the way there and is probably very easy, either using the hadoop/jydoop script I already wrote, or building this into the processors and having them populate a postgres table of missing symbols.
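Roughly, the classification and alert filtering could look like the sketch below (the module entries are just the examples above; the report input format and alert hookup are illustrative, nothing like this exists yet):
--
# Sketch only: classify modules and decide which missing-symbol reports
# should trigger an alert. Nothing here is existing Socorro code.
KNOWN_MODULES = {
    # DLL name -> (vendor, do we expect to have symbols?)
    "user32.dll": ("Microsoft", True),
    "xul.dll": ("Mozilla", True),
    "igd10iumd32.dll": ("Intel", False),
    "nvwgf2um.dll": ("Nvidia", False),
}

def should_alert(filename, debug_id):
    # Skip modules with no debug ID (often OOM artifacts, and we couldn't
    # fetch symbols for them anyway), then alert only if we expected symbols.
    if not debug_id:
        return False
    vendor, expected = KNOWN_MODULES.get(filename.lower(), ("unknown", False))
    return expected

def build_alerts(missing):
    # missing: iterable of (filename, version, debug_id) rows from the report.
    return [row for row in missing if should_alert(row[0], row[2])]
--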
Laura, this is a January/February ask from me if possible.
Reporter
Comment 1•11 years ago
Oh, I primarily care about Windows right now. It might be interesting to run this for B2G as well, but that's a lower priority and might have slightly different requirements. I care only a little about Mac, and not at all about Linux because we're unlikely to have symbols for any useful OS or 3rd-party software on Linux.
OS: Linux → Windows 7
Hardware: x86_64 → All
Updated•11 years ago
Assignee: nobody → dmaher
Comment 2•11 years ago
There is a query in bug 976034 that produces the data necessary to satisfy this request; however, as per conversations with Ted on IRC, it is quite heavy (on the order of hours to complete). Clearly this isn't as simple as "write a Nagios check", since running that query every 5 minutes would cause havoc, heh. Running the query once per day and then creating some sort of report would seem to be the only way to use it sanely.
:bsmedberg has a jydoop job that, while effective, is "pretty heavyweight in general", and therefore falls into the same category as the SQL query above.
Alternatively, Ted suggests another potential approach: the processors are already generating this data (as part of the processed JSON), and it may be possible to extract and report on this data more efficiently than from the SQL database. Note the "filename" and "missing_symbols" elements of this example: https://gist.github.com/luser/9510714 . It may be possible to use this data somehow.
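For reference, one module entry in that processed JSON looks roughly like the following (field names per the gist; the exact shape and the values here are illustrative):
--
# Approximate shape of one entry in the processed crash's modules list;
# values are placeholders, not real crash data.
module = {
    "filename": "example.dll",
    "version": "1.2.3.4",
    "debug_file": "example.pdb",
    "debug_id": "<33-char debug identifier>",
    "missing_symbols": True,
}
--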
Laura notes that the above-noted JSON exists in HBase, but also in Postgres and Elasticsearch, the latter two being potentially very interesting (read: efficient) for report generation.
Investigation on-going.
Assignee
Comment 3•11 years ago
(In reply to Daniel Maher [:phrawzty] from comment #2)
> There is a query in bug 976034 that produces the data necessary to satisfy
> this request; however, as per conversations with Ted on IRC, it is quite
> heavy (on the order of hours to complete). Clearly this isn't as simple as
> "write a Nagios check", since running that query every 5 minutes would cause
> havoc, heh. Running query once per day and then creating some sort of
> report would seem to be the only way to use that query sanely.
>
> :bsmedberg has a jydoop job that, while effective, is "pretty heavyweight in
> general", and therefore falls into the same category as the SQL query above.
>
> Alternatively, Ted suggests another potential approach: the processors are
> already generating this data (as part of the processed JSON), and it may be
> possible to extract and report on this data more efficiently than from the
> SQL database. Note the "filename" and "missing_symbols" elements of this
> example: https://gist.github.com/luser/9510714 . It may be possible to use
> this data somehow.
We're looking at how to make this query faster; the data is totally unstructured right now, so there should be some easy wins. Worst case, we can run it once per day and save the output in a table which you could use.
> Laura notes that the above-noted JSON exists in HBase, but also in Postgres
> and Elasticsearch, the latter two being potentially very interesting (read:
> efficient) for report generation.
>
> Investigation on-going.
Reporter
Comment 4•11 years ago
FWIW, I think you'd have better luck by having the processors write a simple log of missing symbols and aggregating those logs. After-the-fact database querying seems like overkill.
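Rough idea, with the log location and format made up purely for illustration:
--
import csv
from collections import Counter

# Processor side: append one line per module that was missing symbols.
def log_missing(logfile, module):
    with open(logfile, "a", newline="") as f:
        csv.writer(f).writerow(
            [module["filename"], module["version"], module["debug_id"]]
        )

# Aggregation side: tally how often each (file, version, debug id) shows up
# across the collected processor logs.
def aggregate(logfiles):
    counts = Counter()
    for path in logfiles:
        with open(path, newline="") as f:
            for row in csv.reader(f):
                counts[tuple(row)] += 1
    return counts.most_common()
--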
Assignee
Comment 5•11 years ago
(In reply to Benjamin Smedberg [:bsmedberg] from comment #4)
> FWIW, I think you'd have better luck by having the processors write a simple
> log of missing symbols and aggregating those logs. After-the-fact database
> querying seems like overkill.
Good point! We could have the processors insert info about missing symbols into a table in postgres (or wherever else we want, too).
Assignee
Comment 6•11 years ago
(In reply to Daniel Maher [:phrawzty] from comment #2)
> There is a query in bug 976034 that produces the data necessary to satisfy
> this request; however, as per conversations with Ted on IRC, it is quite
> heavy (on the order of hours to complete). Clearly this isn't as simple as
> "write a Nagios check", since running that query every 5 minutes would cause
> havoc, heh. Running query once per day and then creating some sort of
> report would seem to be the only way to use that query sanely.
>
> :bsmedberg has a jydoop job that, while effective, is "pretty heavyweight in
> general", and therefore falls into the same category as the SQL query above.
>
> Alternatively, Ted suggests another potential approach: the processors are
> already generating this data (as part of the processed JSON), and it may be
> possible to extract and report on this data more efficiently than from the
> SQL database. Note the "filename" and "missing_symbols" elements of this
> example: https://gist.github.com/luser/9510714 . It may be possible to use
> this data somehow.
>
> Laura notes that the above-noted JSON exists in HBase, but also in Postgres
> and Elasticsearch, the latter two being potentially very interesting (read:
> efficient) for report generation.
>
> Investigation on-going.
I agree with bsmedberg in comment 4 - we're looking for needles in the haystack here, so doing a huge query to pull them out doesn't make sense. Processors are in a good position to see that 'missing_symbols' is 'true' in the JSON, and record just the info about those modules somewhere (a Postgres table would be quite easy to provide).
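For the table, something as simple as this sketch would probably do (column names are a guess at this point, not a final schema):
--
import psycopg2

# Minimal table for per-crash missing-symbol records; illustrative only,
# not the schema that actually lands.
DDL = """
CREATE TABLE IF NOT EXISTS missing_symbols (
    date_processed TIMESTAMPTZ NOT NULL DEFAULT now(),
    debug_file TEXT,
    debug_id TEXT
);
"""

def create_table(dsn):
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(DDL)
--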
Ted, could we WONTFIX bug 976034 in favor of this bug? Sorry I didn't notice before that it is set blocking this one currently.
Flags: needinfo?(ted)
Flags: needinfo?(dmaher)
Comment 7•11 years ago
If you can get this output out in some format I'm happy to make the missing symbols job use it. I don't care how you want to work the Bugzilla mechanics.
My only worry with producing this data on-demand is that it's going to wind up being a lot of data. We should do some exploratory queries here to figure out how much data we're talking about.
Something like `select count(*) from modules where modules.missing_symbols = true` (although clearly that's not proper SQL) would give us an estimate of how many entries we'd have for a set of reports.
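As a starting point, something along these lines against a missing_symbols table like the one sketched above (names are illustrative) would tell us the raw row count versus the number of distinct modules per day:
--
# Volume estimate: total missing-symbol rows per day, and how many distinct
# (debug_file, debug_id) pairs they collapse to. Names are illustrative.
ESTIMATE_SQL = """
SELECT date_processed::date AS day,
       COUNT(*) AS total_rows,
       COUNT(DISTINCT (debug_file, debug_id)) AS distinct_modules
FROM missing_symbols
GROUP BY date_processed::date
ORDER BY day;
"""
--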
Flags: needinfo?(ted)
Updated•11 years ago
Flags: needinfo?(dmaher)
Assignee
Comment 8•10 years ago
I believe that we can do this with a processor rule. Something along these lines:
MissingSymbols
--
for module in processed_crash['modules']:
    if module['missing_symbols']:
        cursor.execute('''
            INSERT INTO missing_symbols
            (debug_file, debug_id)
            VALUES (%s, %s)
        ''', ...)
Assignee: dmaher → rhelmer
Flags: needinfo?(sdeckelmann)
Flags: needinfo?(lars)
Assignee
Comment 9•10 years ago
What I get for inserting code into bugzilla :P
MissingSymbols
--
for module in processed_crash['modules']:
    if module['missing_symbols']:
        # record each module that was missing symbols during processing
        debug_file = module['debug_file']
        debug_id = module['debug_id']
        cursor.execute('''
            INSERT INTO missing_symbols
            (debug_file, debug_id)
            VALUES (%s, %s)
        ''', (debug_file, debug_id))
--
Open questions for you, Lars and Selena:
* is it OK to insert into Postgres in the _action() method of a processor rule?
* we expect a lot of duplicates - would it be better to upsert here (see the sketch below), or to insert dupes and group them when doing daily reports?
See comment 4 for context on this.
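On the second question, if we did want to dedupe at insert time, one option is a guarded insert along these lines (sketch only; assumes one row per (debug_file, debug_id) per day is enough and that we don't rely on a native upsert):
--
# Only insert a (debug_file, debug_id) pair if it hasn't been recorded today.
# Table layout and the per-day granularity are assumptions for illustration.
DEDUP_INSERT = """
INSERT INTO missing_symbols (debug_file, debug_id)
SELECT %(debug_file)s, %(debug_id)s
WHERE NOT EXISTS (
    SELECT 1 FROM missing_symbols
    WHERE date_processed::date = current_date
      AND debug_file = %(debug_file)s
      AND debug_id = %(debug_id)s
);
"""
--
Without a unique constraint that's still racy across concurrent processors, so inserting dupes and grouping them at report time may well be simpler.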
Flags: needinfo?(sdeckelmann)
Flags: needinfo?(lars)
Assignee
Comment 10•10 years ago
More concrete proposal w/ simple test in https://gist.github.com/rhelmer/5113a29812708450c08e
Assignee
Comment 11•10 years ago
Note that I have co-opted this bug somewhat because we need a replacement for the modulelist hadoop job (which dumps all modules, but really we only need missing symbols) - but I believe we can help satisfy comment 0 by providing a table that is updated by the processors in real time.
Comment 12•10 years ago
I think the bits from comment 0 would be better off as a standalone service that drank from the firehose of the data feed you'll be producing here.
Assignee
Comment 13•10 years ago
Comment 14•10 years ago
Commits pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/f7841cbcf90c35f0849a67de54ec6cc4cf04af01
bug 948644 - add rule to track missing symbols in postgres table, with
help from lars on unittests
https://github.com/mozilla/socorro/commit/b3d76ac25e12bdd0501450b6b60a90dc9ce64a11
Merge pull request #2487 from rhelmer/bug948644-store-missing-symbols
bug 948644 - add rule to track missing symbols in postgres table, with
Updated•10 years ago
Target Milestone: --- → 111
Comment 15•10 years ago
Commits pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/bcca1a7cd549d7267d6d3417c61dd32d179de3c4
followup to bug 948644 - provide placeholder for all expected values
https://github.com/mozilla/socorro/commit/fbc9176346ce00a54ac43dda41748d1e6b47ce07
Merge pull request #2496 from rhelmer/bug948644-store-missing-symbols
followup to bug 948644 - provide placeholder for all expected values
Comment 16•10 years ago
I hadn't looked at your commits till just now, but I note that this will produce missing symbols for crashes on all platforms, whereas before we were limiting to just Windows. I don't think it's that big of a deal, since it's not that hard to filter by filename and the majority of our crashes are on Windows anyway.
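e.g. something as simple as this (the extension list is just a guess at what counts as a Windows module):
--
WINDOWS_EXTENSIONS = (".dll", ".exe", ".pdb")

def is_windows_module(filename):
    # Crude filename-based heuristic for "is this a Windows module?"
    return filename.lower().endswith(WINDOWS_EXTENSIONS)
--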
Comment 17•10 years ago
So, since you morphed bsmedberg's bug to not exactly do what he wanted in comment 0 (but lay the groundwork for it), we should file a followup to actually build what he wants. As per comment 12, I think what would be useful would be to feed this data into a queue (I would suggest RabbitMQ but I hear we want to do away with that) and write a simple consumer that can handle the incoming data stream and figure out what to do with it.
Assignee
Comment 18•10 years ago
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #16)
> I hadn't looked at your commits till just now, but I note that this will
> produce missing symbols for crashes on all platforms, whereas before we were
> limiting to just Windows. I don't think it's that big of a deal, since it's
> not that hard to filter by filename and the majority of our crashes are on
> Windows anyway.
That is what I was thinking. I still need to generate daily reports based on this data we're collecting to replace modulelist.txt; going to file a separate bug for that.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #17)
> So, since you morphed bsmedberg's bug to not exactly do what he wanted in
> comment 0 (but lay the groundwork for it), we should file a followup to
> actually build what he wants. As per comment 12, I think what would be
> useful would be to feed this data into a queue (I would suggest RabbitMQ but
> I hear we want to do away with that) and write a simple consumer that can
> handle the incoming data stream and figure out what to do with it.
I was leaving this bug open, but can file a separate bug if it makes things clearer/more actionable.
Would the (potentially large) number of duplicates be a problem if we were inserting this straight from the processor into a queue? If some delay is tolerable (hourly, for instance), then we could filter out at least some of these by doing a periodic SELECT on the missing_symbols table with a GROUP BY to remove duplicates within that hour bucket.
In comment 0 bsmedberg mentions starting with a daily batch for simplicity, with "a few hours" as ideal. I think it would be just as simple to start with hourly reports, if it's just a query against the missing_symbols table.
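The hourly report could be as simple as the query below (assuming the table carries a timestamp column such as date_processed; names are illustrative):
--
# Hourly report: collapse duplicate missing-symbol rows from the last hour
# into one row per module. Column names are assumptions for illustration.
HOURLY_REPORT_SQL = """
SELECT debug_file, debug_id, COUNT(*) AS crash_count
FROM missing_symbols
WHERE date_processed >= now() - interval '1 hour'
GROUP BY debug_file, debug_id
ORDER BY crash_count DESC;
"""
--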
Status: NEW → ASSIGNED
Comment 19•10 years ago
(In reply to Robert Helmer [:rhelmer] from comment #18)
> I was leaving this bug open, but can file a separate bug if it makes things
> clearer/more actionable.
I think so, conflating separate but related issues in a single bug gets messy.
> In comment 0 bsmedberg mentions starting with a daily batch for simplicity,
> with "a few hours" as ideal. I think it would be just as simple to start
> with hourly reports, if it's just a query against the missing_symbols table.
Yeah, that might be better, especially as a first implementation. Whatever is reading this data is still going to have to deal with duplicates at some point (the delay between when the report is generated and when it produces the missing symbols will ensure that), but we can fine-tune this a bit.
Comment 20•10 years ago
Commits pushed to master at https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/5457ca91e49db20f5d48f2fec9faefcc951df2f6
bug 948644 - use crash_id instead of undefined 'key'
https://github.com/mozilla/socorro/commit/72d82681616fdf23e247fcdfa85c3cb24e5560ff
bug 948644 - correct column name
https://github.com/mozilla/socorro/commit/a90ff9eed6e952d1fe248f575746b83f0fe1903e
Merge pull request #2502 from rhelmer/bug948644-store-missing-symbols
bug 948644 - use crash_id instead of undefined 'key'
Updated•10 years ago
Target Milestone: 111 → 112
Assignee
Updated•10 years ago
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Summary: Automatic monitoring and classification of missing symbols → record missing symbols
Assignee
Comment 21•10 years ago
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #19)
> (In reply to Robert Helmer [:rhelmer] from comment #18)
> > I was leaving this bug open, but can file a separate bug if it makes things
> > clearer/more actionable.
>
> I think so, conflating separate but related issues in a single bug gets
> messy.
OK, filed bug 1106313 to track automatic monitoring/classification of missing symbols, and morphed this one to be about recording the missing symbols that the processor sees (the actual work done in this bug).
Assignee
Updated•10 years ago
Whiteboard: [db migration]
Assignee
Comment 22•10 years ago
Migration run on prod.