Closed Bug 1562879 Opened 6 years ago Closed 5 years ago

Rust indexing produces malformed crossrefs file, breaking many "go to definition" links

Categories

(Webtools :: Searchfox, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aosmond, Assigned: asuth)

Details

If I go to:

https://searchfox.org/mozilla-central/rev/8a990595ce6d5ed07ace2d4d5d86cc69aec90bde/gfx/wr/webrender_api/src/display_item.rs#100

And click on BorderDisplayItem and then Go to definition of ..., all I get is a blank page.

Similarly, if I go to:

https://searchfox.org/mozilla-central/rev/8a990595ce6d5ed07ace2d4d5d86cc69aec90bde/gfx/wr/webrender_api/src/display_item.rs#123

And click on SetFilterData and then Search for tuple ..., I get the search page, but nothing appears. If I change the search term to just be SetFilterData, it finds uses of it.

2019-07-02 14:54:54.505767/pid=1261 - request(handled by 1262) /mozilla-central/define?q=webrender_api%3A%3Adisplay_item%3A%3ABorderDisplayItem&redirect=false
2019-07-02 14:54:54.508221/pid=1262 - exception
Traceback (most recent call last):
  File "/home/ubuntu/mozsearch/router/router.py", line 396, in do_GET
    self.process_request()
  File "/home/ubuntu/mozsearch/router/router.py", line 450, in process_request
    definition = results['Definitions'][0]
KeyError: 'Definitions'

2019-07-02 14:54:54.519774/pid=1261 - error pid 1262 - 0.011604

Looks like there's no definitions in the search results (for the BorderDisplayItem)

In ~/index/mozilla-central/analysis/gfx/wr/webrender_api/src:

{"loc":"00450:11-28","target":1,"kind":"def","pretty":"webrender_api::display_item::BorderDisplayItem","sym":"webrender_api::display_item::BorderDisplayItem"}

Seems to have a definition for the symbol here. Not sure why it's not getting found.

The crossref file also has what appears like a valid entry:

webrender_api::display_item::BorderDisplayItem
{"Definitions":[{"lines":[{"bounds":[11,28],"context":"","contextsym":"","line":"pub struct BorderDisplayItem {","lno":450}],"path":"gfx/wr/webrender_api/src/display_item.rs"}],"Uses":[{"lines":[{"bounds":[36,53],"context":"","contextsym":"","line":"use api::{AlphaType, BorderDetails, BorderDisplayItem, BuiltDisplayListIter};","lno":5},{"bounds":[14,31],"context":"","contextsym":"","line":"border_item: &BorderDisplayItem,","lno":2632}],"path":"gfx/wr/webrender/src/display_list_flattener.rs"},{"lines":[{"bounds":[7,24],"context":"","contextsym":"","line":"Border(BorderDisplayItem),","lno":100},{"bounds":[7,24],"context":"","contextsym":"","line":"Border(BorderDisplayItem),","lno":142},{"bounds":[11,28],"context":"","contextsym":"","line":"pub struct BorderDisplayItem {","lno":450}],"path":"gfx/wr/webrender_api/src/display_item.rs"},{"lines":[{"bounds":[39,56],"context":"","contextsym":"","line":"let item = di::DisplayItem::Border(di::BorderDisplayItem {","lno":1299}],"path":"gfx/wr/webrender_api/src/display_list.rs"}]}

So maybe the crossref file didn't get loaded properly?

The crossrefs file is in a line format, with one line being the "key" and the next line being the "value", alternating all the way down. It seems like at some point there's some weird junk and so keys start getting interpreted as values and vice-versa:

{"Definitions":[{"lines":[{"bounds":[0,15],"context":"audioipc_server::ServerWrapper","contextsym":"audioipc_server::ServerWrapper","line":"callback_thread: core::CoreThread,","lno":54}],"path":"media/audioipc/server/src/lib.rs"}],"Uses":[{"lines":[{"bounds":[0,15],"context":"","contextsym":"","line":"callback_thread: callback_thread,","lno":91},{"bounds":[24,39],"context":"","contextsym":"","line":"let cb_remote = wrapper.callback_thread.remote();","lno":108}],"path":"media/audioipc/server/src/lib.rs"}]}
audioipc_server::ServerWrapper::core_thread
{"Definitions":[{"lines":[{"bounds":[0,11],"context":"audioipc_server::ServerWrapper","contextsym":"audioipc_server::ServerWrapper","line":"core_thread: core::CoreThread,","lno":53}],"path":"media/audioipc/server/src/lib.rs"}],"Uses":[{"lines":[{"bounds":[0,11],"context":"","contextsym":"","line":"core_thread: core_thread,","lno":90},{"bounds":[8,19],"context":"","contextsym":"","line":"wrapper.core_thread.remote().spawn(|handle| {","lno":117}],"path":"media/audioipc/server/src/lib.rs"}]}
audioipc_server::]
_________
{"Definitions":[{"lines":[{"bounds":[21,32],"context":"","contextsym":"","line":"impl rpc::Server for CubebServer {","lno":146}],"path":"media/audioipc/server/src/server.rs"}]}

Above snippet shows where it happens. There's a audioipc_server::] key followed by a line with just underscores, which gets interpreted as the value. And then the Definitions on the next line get picked up as the next key.

Theory at the moment is that we got a corrupted analysis file (possibly from bug 1539606 or something similar) which was just well-formed enough to get interpreted as JSON with stray newlines, and resulted in this. If so this will probably fix itself on the next indexing run. I'm going to wait and see what happens.

Doesn't seem to have fixed itself.

In ~/index/mozilla-central/analysis/media/audioipc/server/src/server.rs we have:

{"loc":"00146:21-32","source":1,"syntax":"","pretty":"impl for audioipc_server::]\n         ","sym":"audioipc_server::]\n_________","nestingRange":"146:21-152:14"}
{"loc":"00146:21-32","source":1,"syntax":"","pretty":"struct audioipc_server::server::CubebServer","sym":"audioipc_server::server::CubebServer"}
{"loc":"00146:21-32","target":1,"kind":"def","pretty":"audioipc_server::]\n_________","sym":"audioipc_server::]\n_________"}

So that's where the stray newlines are coming from in the symbol names.

An alternate fix that I hacked in to yesterday's web-server and connected to dev.searchfox.org to verify is to modify crossref.py so that if it encounters a "value" line that doesn't start with { it just drops and continues the loop to the next line.

diff --git a/router/crossrefs.py b/router/crossrefs.py
index a49823c..07d9371 100644
--- a/router/crossrefs.py
+++ b/router/crossrefs.py
@@ -30,6 +30,11 @@ def load(config):
                 pos += len(line)
                 key = line.strip()
             else:
+                if not line.startswith('{'):
+                    log('Found bogus line [%s] at pos [%d]', line, pos)
+                    pos += len(line)
+                    continue
+
                 value = line.strip()
                 s = "{},{}".format(pos, pos + len(value))
                 crossrefs[key] = s

It seems very likely that the underlying problem is in my rust-indexer.rs hack that tried to extract the impl names.

At https://github.com/mozsearch/mozsearch/blob/315d4dcc0cd78d14b4f790eb637d7c5177ed4eed/tools/src/bin/rust-indexer.rs#L599 we try and read the contents of the underlying span using direct byte offsets and lengths. Then we use construct_qualname to prefix whatever we extract with the scope and "::".

So we have:

  • scope: "audioipc_server"
  • glue: "::"
  • fseek/fread gone wrong: "]\n "

We should potentially comment out/remove the whole block that I added at https://github.com/mozsearch/mozsearch/blob/315d4dcc0cd78d14b4f790eb637d7c5177ed4eed/tools/src/bin/rust-indexer.rs#L595-L621 in favor of :kats' proposed solution of more directly integrating with rustc and its AST.

Oh, huh, or I can just fix it so that we properly check that span.file matches the source file's buffer we loaded.

The fix is deployed, but I'm reassigning this to :asuth for any followup work (maybe adding a test or writing a better fix). If there's no suitable followup work feel free to close this.

Assignee: kats → bugmail
Summary: Rust indexing broken → Rust indexing produces malformed crossrefs file, breaking many "go to definition" links

I attached to today's indexer and this is what the save-analysis data shows for the corrupted symbol:

        {
            "span": {
                "byte_start": 8663,
                "column_end": 33,
                "column_start": 22,
                "byte_end": 8674,
                "line_end": 146,
                "file_name": "media/audioipc/server/src/server.rs",
                "line_start": 146
            },
            "parent": null,
            "kind": "Direct",
            "children": [
                {
                    "index": 120,
                    "krate": 0
                },
                {
                    "index": 121,
                    "krate": 0
                },
                {
                    "index": 122,
                    "krate": 0
                },
                {
                    "index": 123,
                    "krate": 0
                },
                {
                    "index": 124,
                    "krate": 0
                }
            ],
            "value": "",
            "attributes": [],
            "sig": null,
            "id": 3,
            "docs": ""
        },

This notably is the correct file, so it's not a file mismatch. Which makes sense because the whole point of flat_map_per_file! in the analyze_crate method is to bin things by the file they're in. So a file mismatch isn't possible

But if we break out dd to check the offsets, we find that the offsets have betrayed us somehow:

ubuntu@ip-172-31-21-160:~/index/mozilla-central/gecko-dev$ dd if=media/audioipc/server/src/server.rs bs=1 skip=8663 count=11 | hexdump -Cc
11+0 records in
11+0 records out
11 bytes copied, 3.5098e-05 s, 313 kB/s
00000000  5d 0a 20 20 20 20 20 20  20 20 20                 |].         |
0000000   ]  \n                                                        
000000b

Confusingly, this doesn't seem to be an encoding issue leading to inconsistent interpretations of offsets, as the file in question has only a single multi-byte UTF-8 encoding, and that's the 2-byte copywrite symbol on the first line of the file. Unless there's some kind of weird newline normalization issue going on, I have to admit it's not obvious what this could be other than a rustc bug. (The file has standard *nix newlines.)

I'm going to mark this fixed because we haven't seen any further regressions and I think we want to abandon using the save-analysis data given:

  • Prior internal searchfox team consensus that we'd rather be emitting analysis data from rustc
  • https://github.com/rust-lang/rfcs/pull/2912 intending to move to rust-analyzer for the LSP which means the save-analysis mechanism will be mooted.
  • As part of that move the "library-ification" effort should help make it easier for searchfox to emit the data directly from rustc or a rustc-alike.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.