Bug 1697671 Comment 0 Edit History


Archiving an enhancement proposal by :nika in #searchfox on Matrix today ([first message link](https://matrix.to/#/!vDKxYNxlsZYvjSWGBh:mozilla.org/$a1JqA0LPVlzpT8tlBcE0QcjLrOpM7c7CXLlG6AMAGI0?via=mozilla.org&via=matrix.org&via=t2bot.io)): we should "make the resource:// and chrome:// URLs in our js/c++ code link to the relevant source file in searchfox".

The domain synopsis is:
- The information is already made available via the chrome-map.json artifact [built by code coverage](https://searchfox.org/mozilla-central/rev/2b99ea2e97eef00a8a1c7e24e5fe51ab5304bc42/python/mozbuild/mozbuild/codecoverage/chrome_map.py#145-154).
  - The relevant routes are:
    - gecko.v2.mozilla-central.latest.firefox.linux64-ccov-opt
    - gecko.v2.mozilla-central.revision.REVISION.firefox.linux64-ccov-opt
  - You can find the jobs via "-ccov" filtering [like so for linux64](https://treeherder.mozilla.org/jobs?repo=mozilla-central&searchStr=linux64-ccov)
  - Nika's analysis of the data is: "It looks like [2] has a series of paths which were copied into dist, and everything under dist/bin/ is in GreD, and everything under dist/bin/browser/ is under XCurProcD (IIRC)"
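
The mapping described above can be sketched as a longest-prefix lookup. The prefix entries below are purely illustrative stand-ins for what the chrome-map.json artifact actually encodes:

```python
# Hypothetical sketch of resolving chrome:// / resource:// URLs to
# in-tree source paths via a prefix table. The entries here are
# illustrative only; the real data comes from the chrome-map.json
# artifact produced by the linux64-ccov builds.
PREFIX_MAP = {
    "resource://gre/modules/": "toolkit/modules/",
    "chrome://browser/content/": "browser/base/content/",
}

def resolve_url(url):
    """Longest-prefix match so that more specific prefixes win."""
    best = None
    for prefix, src_dir in PREFIX_MAP.items():
        if url.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, src_dir)
    if best is None:
        return None
    prefix, src_dir = best
    return src_dir + url[len(prefix):]
```

A real implementation would of course load the prefix table from the downloaded artifact rather than hard-coding it.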

In terms of implementing this in searchfox, the current situation is that all of our linkification logic happens very late in the pipeline, at [the output formatting stage for text/comment/string tokens](https://github.com/mozsearch/mozsearch/blob/fac46525413cc9c6158f400800363293245b8bea/tools/src/format.rs#L291), via [link logic in links.rs](https://github.com/mozsearch/mozsearch/blob/master/tools/src/links.rs) that handles general link detection via [the linkify crate](https://crates.io/crates/linkify) plus hard-coded Bugzilla/servo/WPT PR linkification magic.
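
As a rough illustration of the kind of detection involved, here is a simplified, regex-based stand-in for the scheme-matching the linkify crate performs (the real crate is considerably more careful about delimiters, trailing punctuation, and scheme validation):

```python
import re

# Simplified stand-in for scheme-based URL detection; illustrative only.
# Matches a handful of schemes and runs until whitespace or a likely
# delimiter character.
URL_RE = re.compile(r"\b(?:https?|chrome|resource)://[^\s\"'<>)]+")

def find_links(text):
    """Return all URL-looking substrings in the given text."""
    return [m.group(0) for m in URL_RE.finditer(text)]
```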

That's actually quite an easy place to put the logic; the problem is that it completely bypasses the whole analysis infrastructure that feeds into cross-referencing and such.  What we really want is something consistent with :kats' enhancements in bug 1418001, producing analysis records like `{"loc":"00007:9-26","target":1,"kind":"use","pretty":"dom/workers/WorkerPrivate.h","sym":"FILE_dom/workers/WorkerPrivate@2Eh"}` that explicitly match up with the string token and carry an explicit symbol identifier that gets created and cross-referenced.
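
The `sym` mangling in that record appears to escape `.` as `@2E` (its code point in hex). A minimal sketch, assuming the same escaping extends to other non-path characters (only `.` is confirmed by the example record):

```python
# Sketch of the FILE_ symbol mangling visible in the example record:
# '.' becomes '@2E'. Treating every character outside [alnum / _ -] the
# same way is an assumption here, not confirmed behavior.
def file_symbol(path):
    escaped = "".join(
        ch if ch.isalnum() or ch in "/_-" else "@%02X" % ord(ch)
        for ch in path
    )
    return "FILE_" + escaped
```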

In a fully normalized pipeline, the way that might work is:
- We teach [fetch-tc-artifacts](https://github.com/mozsearch/mozsearch-mozilla/blob/master/shared/fetch-tc-artifacts.sh) to download `chrome-map.json`.
- We teach [derive-per-file-info.rs](https://github.com/mozsearch/mozsearch/blob/master/tools/src/bin/derive-per-file-info.rs) to ingest `chrome-map.json` and populate `concise-per-file-info.json` with the info necessary to map the URLs to the source files.
- Indexers like the C++ indexer (written in C++) and the [JS indexer/analyzer itself written in JS](https://github.com/mozsearch/mozsearch/blob/master/scripts/js-analyze.js) (and only the JS indexer to start) emit analysis records for strings.
  - This might also include comments, or tokens in comments that match some interestingness heuristic, like being enclosed in backticks, being camel-cased, or having "::" inside them, etc.  In the C++ case, clang actually fully understands doxygen syntax, so there's a ton of low-hanging fruit if one were so interested.
- A meta-indexer written in rust (so we can reuse the linkify logic and generally only write the logic once) processes these strings and converts them into richer symbol references, re-writing the analysis files in the process.
  - For an initial implementation, this could be run against analysis files as we read them as part of the cross-referencing process (but re-writing them if any links are encountered).  The restriction would be that the only data that could be looked up would be from `concise-per-file-info.json`, which would work for this scenario.  It would not work for things like `ClassName` which would depend on the normal cross-referencing process having already been completed.
    - One could obviously do something involving futures/to-do lists, but that still doesn't seem like something to do in the MVP.
  - This would involve extracting some of the concise-file-info support logic in [output-file.rs](https://github.com/mozsearch/mozsearch/blob/master/tools/src/bin/output-file.rs) out so it can be reused.
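
Putting the steps above together, the proposed meta-indexer pass might look roughly like this. The record shapes and the lookup are simplified assumptions, with `KNOWN_FILES` standing in for a `concise-per-file-info.json` query:

```python
# Sketch of the proposed meta-indexer pass: walk string-token analysis
# records, and when a string resolves to a known source file (the only
# lookup available from concise-per-file-info.json at this stage),
# rewrite the record into a FILE_ use. All field names here mirror the
# example record from the proposal; the input "text" field is assumed.
KNOWN_FILES = {"toolkit/modules/AppConstants.jsm"}

def rewrite_record(record):
    text = record.get("text", "")
    if text in KNOWN_FILES:
        return {
            "loc": record["loc"],
            "target": 1,
            "kind": "use",
            "pretty": text,
            "sym": "FILE_" + text.replace(".", "@2E"),
        }
    # Not a known file: leave the record untouched.
    return record
```

Records that do not resolve pass through unchanged, so the rewrite is safe to run over whole analysis files.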

An evolutionary step towards this could be to do just the first two steps, getting the info into the concise-file-info, and propagate that to the formatting output in the existing late-linkifying logic.