Closed Bug 1774004 Opened 2 years ago Closed 1 year ago

remove missing symbols bookkeeping [1/2023]

Categories

(Tecken :: General, enhancement, P2)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

References

(Blocks 1 open bug)


Attachments

(1 file)

If the download API code can't find a symbol file in the configured sources, it adds a record to the MissingSymbol table.

.ttf files are fonts that show up in symbols lists. They get requested regularly and often show up at the top of the missing symbols list. For example, here are the top five missing symbols in the last email:

xul.dll	101.0.1.8194	000000000000000000000000000000000	2448
Ubuntu-R.ttf	None	000000000000000000000000000000000	1691
DejaVuSans.ttf	None	000000000000000000000000000000000	1690
nvidiactl	None	000000000000000000000000000000000	1550
libdispatch.dylib	0.501.40.12	C749985761A53D7DA5EA65DCC8C3DF920	1246

This bug covers fixing the download code to not keep track of missing .ttf files.
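A minimal sketch of the filtering originally requested here. It assumes the download code has a single place where it decides whether to record a miss. `should_record_missing` and `IGNORED_EXTENSIONS` are hypothetical names, not Tecken's actual code:

```python
from pathlib import PurePosixPath

# Hypothetical: extensions that get requested as "symbols" but never have
# symbol files (e.g. fonts), so recording them as missing is just noise.
IGNORED_EXTENSIONS = frozenset({".ttf"})


def should_record_missing(symbol_filename: str) -> bool:
    """Return False for files that should not be tracked as missing symbols."""
    # Compare case-insensitively so "FONT.TTF" is skipped as well.
    suffix = PurePosixPath(symbol_filename).suffix.lower()
    return suffix not in IGNORED_EXTENSIONS
```

With this check, `xul.dll` and `nvidiactl` would still be recorded, while `Ubuntu-R.ttf` and `DejaVuSans.ttf` would be skipped.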

Assignee: nobody → willkg
Status: NEW → ASSIGNED

The "missing symbols" email is generated by this jupyter notebook:

https://github.com/marco-c/missing_symbols/blob/master/modules-with-missing-symbols.ipynb

That notebook looks at the Socorro crash report data in BigQuery--it doesn't use the missing symbols API on the Symbols server and thus never sees the data in the download_missingsymbol table.

Given that the missing symbols emails don't use the missing symbols db table, can we remove all the missing symbols bookkeeping in the Symbols server? This would simplify a bunch of things in Socorro and Tecken.

Gabriele, Marco: Do either of you use the Downloads Missing page (https://symbols.mozilla.org/downloads/missing/) or the downloads missing API (https://symbols.mozilla.org/api/downloads/missing/) on symbols.mozilla.org? Do you know anything that does?

Flags: needinfo?(mcastelluccio)
Flags: needinfo?(gsvelto)
Summary: don't mark .ttf files as missing symbols → remove missing symbols bookkeeping

Mmm... On second thought, I see it used here:

https://searchfox.org/mozilla-central/source/tools/crashreporter/system-symbols/win/symsrv-fetch.py

However, that's the only place I see it used across all repositories on github. Can we change the symsrv-fetch.py script to get the data from the same place modules-with-missing-symbols gets it?

I can't think of any other place where it is being used.

If this API simply builds the list of missing symbols by listing all modules from crash reports, then we can replace it with a query like the one in modules-with-missing-symbols.ipynb (that's basically what the notebook does).
We could make modules-with-missing-symbols.ipynb upload an artifact somewhere and symsrv-fetch.py could grab it.
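Building the list "by listing all modules from crash reports" can be sketched as a simple aggregation. This assumes each crash is a dict with a `modules` list whose entries carry `filename`, `version`, `debug_id`, and a boolean `missing_symbols` flag. These field names are loosely modeled on minidump-stackwalk's JSON output and should be treated as assumptions. `top_missing_modules` is a hypothetical helper:

```python
from collections import Counter
from typing import Iterable


def top_missing_modules(crashes: Iterable[dict], limit: int = 5) -> list:
    """Return the `limit` most common (filename, version, debug_id, count) tuples.

    Each crash is assumed to be a dict with a "modules" list; a module counts
    as missing when its "missing_symbols" flag is truthy.
    """
    counts = Counter()
    for crash in crashes:
        # Use a set so a module is counted at most once per crash report.
        missing_in_crash = set()
        for module in crash.get("modules", []):
            if not module.get("missing_symbols"):
                continue
            missing_in_crash.add((
                module.get("filename", ""),
                # Render an absent version as "None", like the email does.
                str(module.get("version")),
                module.get("debug_id", ""),
            ))
        counts.update(missing_in_crash)
    return [(*key, n) for key, n in counts.most_common(limit)]
```

Counting each module once per crash makes the number "crash reports affected" rather than "module occurrences". That is a guess at what the email's last column means, not something the notebook is confirmed to do.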

Flags: needinfo?(mcastelluccio)

(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #2)

Mmm... On second thought, I see it used here:

https://searchfox.org/mozilla-central/source/tools/crashreporter/system-symbols/win/symsrv-fetch.py

However, that's the only place I see it used across all repositories on github.

As far as I know that's the only user. I have a script I used locally from time to time but aside from that I'm not aware of any other public users of that API.

Can we change the symsrv-fetch.py script to get the data from the same place modules-with-missing-symbols gets it?

I suppose we could. It's a different system, since I don't think we have SQL support on Taskcluster, but I suppose I could figure something out.

Flags: needinfo?(gsvelto)

Gabriele, we could modify my script to also generate a file with the list and upload it to S3. This way you don't have to set up a new cron but can piggyback on the already existing one.
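Both ends of such a file handoff can be sketched with the standard library. This assumes a plain CSV whose columns mirror the email (module, version, debug ID, count). The column names and both helpers are hypothetical, and the actual S3 upload and download are left out:

```python
import csv
import io
from typing import Iterable

# Hypothetical column layout, mirroring the columns in the missing symbols email.
FIELDS = ["module", "version", "debug_id", "count"]


def write_missing_symbols_csv(rows: Iterable[tuple]) -> str:
    """Producer side (the notebook): serialize rows to CSV text for upload."""
    buf = io.StringIO()
    # lineterminator="\n" keeps the output identical across platforms.
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(rows)
    return buf.getvalue()


def read_missing_symbols_csv(text: str) -> list:
    """Consumer side (e.g. symsrv-fetch.py): parse the downloaded CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    # CSV values are strings; convert the count back to an int.
    return [{**row, "count": int(row["count"])} for row in reader]
```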

(In reply to Marco Castelluccio [:marco] from comment #5)

Gabriele, we could modify my script to also generate a file with the list and upload it to S3. This way you don't have to set up a new cron but can piggyback on the already existing one.

That would be nice!

Removing the whole missing-symbols bookkeeping simplifies work I need to do to move Socorro and Tecken to GCP. It also reduces the database work that Tecken does in the download API which is heavily used. That's important because the addition of inline function data has really done a number on Tecken, so reducing the things it's doing--especially database things--helps.

Can we work on this at the beginning of 2023?

(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #7)

Can we work on this at the beginning of 2023?

Sure!

Assignee: willkg → nobody
Status: ASSIGNED → NEW
Summary: remove missing symbols bookkeeping → remove missing symbols bookkeeping [1/2023]

Marco, Gabriele: Looks like symsrv-fetch.py is still using https://symbols.mozilla.org/missingsymbols.csv . Can someone find the time to update the script to source the missing symbols from somewhere else by the end of August? I need to move forward with the Tecken GCP migration and this is blocking that.

Flags: needinfo?(mcastelluccio)
Flags: needinfo?(gsvelto)
Assignee: nobody → willkg
Status: NEW → ASSIGNED

We could create a query on Redash that generates the list, and then on the Taskcluster side use the Redash API to retrieve the results. It's pretty simple; we just need to GET something like "https://sql.telemetry.mozilla.org/api/queries/QUERY_NUMBER/results.json?api_key=API_KEY".
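A rough sketch of the Taskcluster-side fetch, using only the standard library. It assumes the response body has the usual Redash shape `{"query_result": {"data": {"rows": [...]}}}`. The helper names are hypothetical, and the query number and API key are placeholders, as in the URL above:

```python
import json
import urllib.parse
import urllib.request


def redash_results_url(base: str, query_id: int, api_key: str) -> str:
    """Build the results.json URL for a saved Redash query."""
    qs = urllib.parse.urlencode({"api_key": api_key})
    return f"{base.rstrip('/')}/api/queries/{query_id}/results.json?{qs}"


def extract_rows(payload: dict) -> list:
    """Pull the result rows out of a Redash results.json response body."""
    # Assumed response shape: {"query_result": {"data": {"rows": [...]}}}
    return payload["query_result"]["data"]["rows"]


def fetch_rows(base: str, query_id: int, api_key: str, timeout: float = 60.0) -> list:
    """GET the query results and return the rows as a list of dicts."""
    url = redash_results_url(base, query_id, api_key)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return extract_rows(json.load(resp))
```

Keeping URL building and row extraction separate from the network call makes those parts testable offline. In a real task, the API key would presumably come from a Taskcluster secret rather than being hard-coded.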

I won't be able to help unfortunately, I'll be out until the All Hands.

Flags: needinfo?(mcastelluccio)

I'm not familiar with Redash, but I can temporarily switch symsrv-fetch.py to use crash-stats directly by sampling a subset of recent crashes. I already have a version of the script that does this, which I used on my machine in the past, so I can reuse it. This will remove the dependency on missingsymbols.csv, and we can switch to Redash after the All Hands.

Flags: needinfo?(gsvelto)
Depends on: 1847520

I really appreciate that. Thank you!

Something is using /api/downloads/missing/. I looked at searchfox and github search and didn't see any scripts. Then I checked the logs and it's a Chrome user agent hitting the Downloads Missing page. The querystring looks like fuzzing. Then I noticed that you don't need to be logged in to see the Downloads Missing tab--it shows up for everyone. I think this is fuzzing traffic.

As a side note, getting rid of this page will also get rid of that fuzzing traffic, which is good because it's a slow page.

I verified the site status in the admin works and the navigation works on stage.

This was deployed just now in bug #1848645. Marking as FIXED.

Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED