remove missing symbols bookkeeping [1/2023]
Categories: Tecken :: General, enhancement, P2
Tracking: Not tracked
People: Reporter: willkg, Assigned: willkg
References: Blocks 1 open bug
Attachments: 1 file
If the download API code can't find a symbol file in the configured sources, it adds a record to the MissingSymbol table.
.ttf files are fonts that show up in symbol lists. They get requested regularly and often show up at the top of the missing symbols list. For example, here are the top five missing symbols in the last email:
xul.dll 101.0.1.8194 000000000000000000000000000000000 2448
Ubuntu-R.ttf None 000000000000000000000000000000000 1691
DejaVuSans.ttf None 000000000000000000000000000000000 1690
nvidiactl None 000000000000000000000000000000000 1550
libdispatch.dylib 0.501.40.12 C749985761A53D7DA5EA65DCC8C3DF920 1246
This bug covers fixing the download code so it doesn't keep track of missing .ttf files.
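A minimal sketch of that change, assuming a hypothetical helper `should_record_missing()` that the download code would call before writing a MissingSymbol record. The helper name and where it is called are assumptions; the actual Tecken code path may look different.

```python
# Hypothetical set of file suffixes that never have symbol files.
# The bug only names .ttf; anything else added here would be an assumption.
IGNORED_SUFFIXES = (".ttf",)


def should_record_missing(debug_filename: str) -> bool:
    """Return True if a failed lookup for this file should be recorded.

    Hypothetical helper: font files such as Ubuntu-R.ttf appear in symbol
    lists but never have symbols, so recording them only adds noise.
    """
    # Compare case-insensitively so DejaVuSans.TTF is skipped too.
    return not debug_filename.lower().endswith(IGNORED_SUFFIXES)
```

For example, `should_record_missing("Ubuntu-R.ttf")` is `False`, while `should_record_missing("xul.dll")` is `True`.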
Comment 1 • 2 years ago (Assignee)
The "missing symbols" email is generated by this Jupyter notebook:
https://github.com/marco-c/missing_symbols/blob/master/modules-with-missing-symbols.ipynb
That notebook looks at the Socorro crash report data in BigQuery. It doesn't use the missing symbols API on the Symbols server, so it never sees the data in the download_missingsymbol table.
Given that the missing symbols emails don't use the missing symbols db table, can we remove all the missing symbols bookkeeping in the Symbols server? This would simplify a bunch of things in Socorro and Tecken.
Gabriele, Marco: Do either of you use the Downloads Missing page (https://symbols.mozilla.org/downloads/missing/) or the downloads missing API (https://symbols.mozilla.org/api/downloads/missing/) on symbols.mozilla.org? Do you know anything that does?
Comment 2 • 2 years ago (Assignee)
Mmm... On second thought, I see it used here:
https://searchfox.org/mozilla-central/source/tools/crashreporter/system-symbols/win/symsrv-fetch.py
However, that's the only place I see it used across all repositories on GitHub. Can we change the symsrv-fetch.py script to get the data from the same place modules-with-missing-symbols gets it?
Comment 3 • 2 years ago
I can't think of any other place where it is being used.
If this API simply builds the list of missing symbols by listing all modules from crash reports, then we can replace it with a query like the one in modules-with-missing-symbols.ipynb (that's basically what modules-with-missing-symbols.ipynb does).
We could make modules-with-missing-symbols.ipynb upload an artifact somewhere and symsrv-fetch.py could grab it.
Comment 4 • 2 years ago
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #2)
> Mmm... On second thought, I see it used here:
> https://searchfox.org/mozilla-central/source/tools/crashreporter/system-symbols/win/symsrv-fetch.py
> However, that's the only place I see it used across all repositories on github.
As far as I know that's the only user. I have a script I used locally from time to time, but aside from that I'm not aware of any other public users of that API.
> Can we change the symsrv-fetch.py script to get the data from the same place modules-with-missing-symbols gets it?
I suppose we could. It's a different system, and I don't think we have SQL support on TaskCluster, but I could figure something out.
Comment 5 • 2 years ago
Gabriele, we could modify my script to also generate a file with the list and upload it on S3. This way you don't have to set up a new cron but can piggyback on the already existing one.
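A rough sketch of the "generate a file with the list" step, using only the standard library `csv` module. The column names below are hypothetical; the format symsrv-fetch.py actually expects would need to be checked against the script. The S3 upload itself is left out.

```python
import csv
import io
from typing import Iterable, Mapping

# Hypothetical column layout for the uploaded list; not taken from
# symsrv-fetch.py or the notebook.
FIELDS = ["debug_file", "debug_id", "code_file", "code_id"]


def rows_to_csv(rows: Iterable[Mapping[str, object]]) -> str:
    """Serialize missing-symbol rows (e.g. query results) to CSV text."""
    buf = io.StringIO()
    # extrasaction="ignore" drops extra keys (such as crash counts);
    # missing keys are written as empty strings.
    writer = csv.DictWriter(
        buf, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The returned text could then be written to a file and uploaded by the existing cron job.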
Comment 6 • 2 years ago
(In reply to Marco Castelluccio [:marco] from comment #5)
> Gabriele, we could modify my script to also generate a file with the list and upload it on S3. This way you don't have to set up a new cron but can piggyback on the already existing one.
That would be nice!
Comment 7 • 2 years ago (Assignee)
Removing the whole missing-symbols bookkeeping simplifies work I need to do to move Socorro and Tecken to GCP. It also reduces the database work that Tecken does in the download API, which is heavily used. That's important because the addition of inline function data has really strained Tecken, so reducing what it does (especially database work) helps.
Can we work on this at the beginning of 2023?
Comment 8 • 2 years ago
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #7)
> Can we work on this at the beginning of 2023?
Sure!
Comment 9 • 1 year ago (Assignee)
Marco, Gabriele: It looks like symsrv-fetch.py is still using https://symbols.mozilla.org/missingsymbols.csv. Can someone find the time to update the script to source the missing symbols from somewhere else by the end of August? I need to move forward with the Tecken GCP migration, and this is blocking it.
Comment 10 • 1 year ago
We could create a query on Redash that generates the list, and then on the Taskcluster side use the Redash API to retrieve the results. It's pretty simple, just need to GET something like "https://sql.telemetry.mozilla.org/api/queries/QUERY_NUMBER/results.json?api_key=API_KEY".
I won't be able to help unfortunately, I'll be out until the All Hands.
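A minimal sketch of the Taskcluster side of that idea, using only the Python standard library. The query number and API key are placeholders, and the JSON shape (`query_result` → `data` → `rows`) is the one Redash usually returns from this endpoint; it should be checked against the real query before relying on it.

```python
import json
import urllib.parse
import urllib.request

REDASH_BASE = "https://sql.telemetry.mozilla.org"


def results_url(query_id: int, api_key: str, base: str = REDASH_BASE) -> str:
    """Build the results URL for a saved Redash query."""
    qs = urllib.parse.urlencode({"api_key": api_key})
    return f"{base}/api/queries/{query_id}/results.json?{qs}"


def extract_rows(payload: dict) -> list:
    """Pull the result rows out of a results.json payload.

    Assumes the usual Redash shape: {"query_result": {"data": {"rows": [...]}}}.
    """
    return payload["query_result"]["data"]["rows"]


def fetch_rows(query_id: int, api_key: str) -> list:
    """GET the query results and return the rows (needs network and a real key)."""
    with urllib.request.urlopen(results_url(query_id, api_key), timeout=60) as resp:
        return extract_rows(json.load(resp))
```

Keeping URL building and payload parsing separate from the network call makes the parsing easy to test without hitting Redash.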
Comment 11 • 1 year ago
I'm not familiar with Redash, but I can temporarily switch symsrv-fetch.py to use crash-stats directly by sampling a subset of recent crashes. I already have a version of the script that does this, which I used on my machine in the past, so I can use that. This will remove the dependency on missingsymbols.csv, and we can switch to Redash after the All Hands.
Comment 12 • 1 year ago (Assignee)
I really appreciate that. Thank you!
Comment 13 • 1 year ago (Assignee)
Comment 14 • 1 year ago (Assignee)
Comment 15 • 1 year ago (Assignee)
Something is using /api/downloads/missing/. I looked at Searchfox and GitHub search and didn't see any scripts. Then I checked the logs and found it's a Chrome user agent hitting the Downloads Missing page. The querystring looks like fuzzing. Then I noticed that you don't need to be logged in to see the Downloads Missing tab; it shows up for everyone. I think this is fuzzing traffic.
As a side note, getting rid of this page and the fuzzing traffic will be good because it's a slow page.
I verified on stage that the site status page in the admin and the site navigation both work.
Comment 16 • 1 year ago (Assignee)
This was deployed just now in bug #1848645. Marking as FIXED.