Searching for MOZ_BUILD_WEBRENDER finds MOZ_GECKO_PROFILER instead

RESOLVED FIXED

Status

enhancement
RESOLVED FIXED
10 months ago
9 months ago

People

(Reporter: kats, Assigned: kats)

Tracking

Trunk
Points:
---

Firefox Tracking Flags

(firefox64 fixed)

Details

Attachments

(1 attachment)

https://searchfox.org/mozilla-central/search?q=MOZ_BUILD_WEBRENDER

The first set of results are all for MOZ_GECKO_PROFILER but they should be for MOZ_BUILD_WEBRENDER.
Related: go to a use of MOZ_GECKO_PROFILER and do the "go to definition" thing. It takes you to the definition of MOZ_BUILD_WEBRENDER. So maybe the two are ending up with the same symbol id?
MOZ_BUNDLED_FONTS, which is defined on the next line after the webrender one, is broken in the same way:
https://searchfox.org/mozilla-central/search?q=MOZ_BUNDLED_FONTS&path=

Maybe there's something weird with how the results from the different versions of the generated file mozilla-config.h are getting merged?
The symbol name comes from https://github.com/mozsearch/mozsearch/blob/master/clang-plugin/MozsearchIndexer.cpp#L333 which is:
  return hash(Filename + std::string("@") + locationToString(Loc));
where locationToString https://github.com/mozsearch/mozsearch/blob/master/clang-plugin/MozsearchIndexer.cpp#L228 is:
  return stringFormat("%05d:%d", Line, Column - 1);
with the outermost caller I think being https://github.com/mozsearch/mozsearch/blob/master/clang-plugin/MozsearchIndexer.cpp#L1438 which is:
  std::string Mangled = std::string("M_") + mangleLocation(Loc, Ident->getName());

So, yeah, seems like maybe the generated magic symbol should maybe include the macro name rather than just being a (file, line, column) tuple?
I think the underlying conceptual problem is that locations in generated files need to include the OS the file was generated for, unless we know the different versions of the file are textually identical. We could include a hash of the file itself, maybe.
I guess there's 2 problems:
1) Symbol collisions.
2) Symbol stability.

The issue on this bug is symbol collisions in a single indexing pass due to multiple platforms being merged and auto-generated files having different contents.  Your proposal fixes that but makes identifier stability worse.  This is almost entirely mitigated by people rarely providing links to others that include "symbol:XXX" searches and the fact that such links don't look remotely stable because those are clearly hashes.

The issue with symbol stability is also more complex because the C++ behavior is largely what we want, as the symbol is tied to the specific definition point and all the places that "define" reaches.  So when nsRequestObserverProxy.cpp does a "#define LOG" its log is symbol M_8e426a3cf0c261fa and when nsStandardURL.cpp does "#define LOG" its log is symbol M_103c6dedac255319.

Arguably this is what everyone wants most of the time.  However it does fall down when the symbol is not defined.  When the macro does not have a location, we *do* fall back to using the macro name in mangleLocation(location, backup).  So if you do a search for "symbol:M_DEBUG" you'll get all the places `#ifdef DEBUG` was found in the codebase but DEBUG was not actually defined.  Multiple-definitions also potentially pose a problem; for example in cairo-xlib-surface.c DEBUG gets defined and had a symbol instead like "M_103c6dedac255319" in a build I did.

If we improve this, I expect it to come via way of improving searchfox's performance on HTML/JS tests where we could ideally have some rough concept of namespaces/partitioning.  Tests frequently form their own little islands of test logic via "head-*.js" files for Gecko-specific-tests or "../resource/" files for Web Platform Tests.  Searchfox could potentially handle this better rather than the single flattened JS namespace that _all_ the JS code in the tree is unified into.
(In reply to Andrew McCreight [:mccr8] from comment #4)
> I think the underlying conceptual problem is that locations in generated
> files need to include the OS the file was generated for, unless we know the
> different versions of the file are textually identical. We could include a
> hash of the file itself, maybe.

This sounds like a reasonable fix to me. As a shortcut to hashing the entire file, we can just detect if the file is in the __GENERATED__ folder or not, and assume that all generated files are different (by hashing in a platform-specific value).

(In reply to Andrew Sutherland [:asuth] from comment #5)
> If we improve this, I expect it to come via way of improving searchfox's
> performance on HTML/JS tests where we could ideally have some rough concept
> of namespaces/partitioning.  Tests frequently form their own little islands
> of test logic via "head-*.js" files for Gecko-specific-tests or
> "../resource/" files for Web Platform Tests.  Searchfox could potentially
> handle this better rather than the single flattened JS namespace that _all_
> the JS code in the tree is unified into.

I don't follow this, is this specific to symbol stability?
I wrote a patch to fix this, seems to work. It's deployed to dev currently.
Assignee: nobody → kats
Attachment #9016242 - Flags: review?(continuation) → review+
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/76efab1fc917
Add a platform-specific piece into the symbol hash from generated files. r=mccr8
> I don't follow this, is this specific to symbol stability?

Related.  What I mean is that C pre-processor #include's create a flat soupy namespace in the same way the traditional HTML/JS <script>/importScripts()/etc. evaluation models do.  The effective namespace of a C++ file depends on what it transitively #included and internally #defined.  The effect namespace of an HTML/JS file depends on what it imports/evaluates via various means and what it defines, but it's messier because an HTML file including a script tag is very much like #including a .cpp file into another .cpp file.  In particular, this happens frequently in tests, so the JS files are effectively parameterized by their use.  ES Modules provide a way out of this, but it will be quite a while before gecko's tree is free of hacky weird JS test logic.

So given that the C pre-processor situation is already pretty great because we get the analysis we want for free, I expect the only place someone might enhance our handling is for the HTML/JS case.  And once we have that, maybe we can fix the #define issue by building on whatever we do there.

A potential strawman for that would be similar to what we could do to address bug 1466692.  Namely, further partition the identifier results on a per-"identifiers" row basis so that the search UI can help facet/partition the results.  Right now we over-merge and have no way to undo the merge step.  For the JS cases we could tuple the symbols with the rough sub-tree[1] they're in and allow the UI to facet/partition on that, but by default everything would still appear all smooshed together thanks to "identifiers" unifying all the tupled symbols.  The context menu could offer both an option for the most immediate symbol (ex: "M_hash", "nsDisplayOutline::CreateWebRenderCommands") as well as the unifying/superclass id (ex: "M_DEBUG", "nsDisplayItem::CreateWebRenderCommands").

1: By rough sub-tree I mean that each subdir of "dom/" and "testing/web-platform/tests/" each are effectively their own islands when it comes to JS logic.  Fancier analysis could understand the various include graphs that happen and do cluster labeling, but just the directories would get us 80% of the way there.
https://hg.mozilla.org/mozilla-central/rev/76efab1fc917
Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Ah, I see what you mean. That sounds like an interesting change, but we should spin that off into another bug.

I synced the change that landed here into the mozsearch repo as well: https://github.com/mozsearch/mozsearch/pull/158
You need to log in before you can comment on or make changes to this bug.