Closed Bug 1820241 Opened 2 years ago Closed 2 years ago

FOG Artifact Build Support appears to not be working

Categories

(Toolkit :: Telemetry, defect, P1)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox113 --- fixed

People

(Reporter: chutten, Assigned: chutten)

References

Details

Attachments

(2 files)

Simple things like adding a new event and then trying to use that new event in a module and in a test aren't working.

Investigate how far this goes (is it just xpcshell-test? Is it everything?) and get it fixed.

Happens on ./mach run too, so it's not test-only.

Refresher:

  • The system responsible for this is called JOG
  • It runs on the main thread on the first access via JS to any named property under the Glean or GleanPings global
  • It will only perform work if
    • the build was built with MOZILLA_OFFICIAL not set
    • the build IsDevelopmentBuild
    • the file of metrics definitions for artifact build consumption is at GreD/jogfile.json
      • and it is validly formatted
      • and it contains the metric or ping you expect

I have checked:

  • The build isn't official (AppConstants.MOZILLA_OFFICIAL is false)
  • The build should be IsDevelopmentBuild (There's no omni.ja that I can find)
  • The jogfile is in objdir/dist/bin (GreD (the appropriate position))
    • It appears to be validly formatted (We have error-level logging for failure to parse the jogfile, and none of them log)
    • It indeed contains the metric I expect.

The next thing I did was fake an artifact build by copying jogfile.json to objdir/dist/bin/. Alas, that actually works just fine (anything I change in jogfile.json appears just fine when I check about:glean).

Now to try out debug artifact builds so I can step through and see what's what.

Despite having debug artifacts and symbols (thank you docs), without file and line, I can't step. I set a breakpoint on JOG::EnsureRuntimeMetricsRegistered and jog_load_jogfile and confirm that the former is definitely getting hit frequently (as expected, since it's on the NamedGetter for both the Glean and GleanPings globals).

Alas, jog_load_jogfile is never hit.

With the debug artifact build I can at least confirm none of the NS_WARN_IF cases are being hit inside EnsureRuntimeMetricsRegistered. This either means

  • despite AppConstants.MOZILLA_OFFICIAL being false when I ./mach run and check the runtime value, MOZILLA_OFFICIAL must've been defined during build meaning this ifdef gets hit and we early return.
  • despite there being no omni.ja to be found in the objdir, IsDevelopmentBuild is returning false.

I'm not sure how to narrow it down further, but it's the end of the week so I guess that's a question for Monday.

FOG's various modules (including jog) haven't taken full advantage of MOZ_LOG.
Time to use a weird "artifact builds aren't working in FOG" problem as an
excuse to get it done.

Also, while I'm here, was getting weird debug-only crashes because of emplace
on a None. Not at all the behaviour I expected from mozilla::Maybe, but the
fixed syntax is even more expressive, so I'm all for it.

All my efforts to debug and step through artifact builds have failed. My Linux machine has been having problems with finding sources and symbols for some time, but even following the Debugging on Windows docs gets me either a windbg or a Visual Studio who claims that firefox is running without a xul.dll, meaning that there's absolutely no hope that it'll be able to find my code to break on it.

So I'm taking a different tack. This patch adds logging that should, when enabled on an artifact build, tell me exactly where things are going awry. And that'll be the next step on the path to fixing this.

Keywords: leave-open
Pushed by chutten@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ed476d540170 Add proper mozlogging to JOG module r=TravisLong

2023-03-08 16:56:54.538264 UTC - [Parent 522785: Main Thread]: V/jog MOZILLA_OFFICIAL build. No JOG for you.

From this line, this means these artifacts were build with MOZILLA_OFFICIAL.

ni?nalexander - I thought we could use MOZILLA_OFFICIAL as a compile-time check to ensure our (expensive main thread I/O using) artifact build support only ran definitely on artifact builds. Is it no longer the case that these artifacts are unofficially-built? (was it always the case and I somehow failed to notice?)

Flags: needinfo?(nalexander)

(In reply to Chris H-C :chutten from comment #8)

2023-03-08 16:56:54.538264 UTC - [Parent 522785: Main Thread]: V/jog MOZILLA_OFFICIAL build. No JOG for you.

From this line, this means these artifacts were build with MOZILLA_OFFICIAL.

ni?nalexander - I thought we could use MOZILLA_OFFICIAL as a compile-time check to ensure our (expensive main thread I/O using) artifact build support only ran definitely on artifact builds. Is it no longer the case that these artifacts are unofficially-built? (was it always the case and I somehow failed to notice?)

Oh dear: I think this has always been broken. This specific conditional is in a compiled code artifact, i.e., in libxul.so. Artifact builds fetch compiled code artifacts from upstream official builds. So this conditional has always been in place; we just missed it the first time around.

I'm not aware of an existing way to "reach up" from the compiled code and ask the intepreted code if it's an artifact build, i.e., to query AppConstants.sys.mjs, which will have the correct value. Any of the mechanisms to do so (JS XPCOM components and services come to mind) would suffice. It would also work to have some early JS expose an environment variable or set a native-code flag with this value.

:glandium: am I missing some existing mechanism or pattern?

Flags: needinfo?(nalexander) → needinfo?(mh+mozilla)

If we're moving from compile-time-knowable to only run-time-knowable, I'm already asking IsDevelopmentBuild() (which checks for an omnijar). That looks cheap enough we could move it up to MOZILLA_OFFICIAL's place, though honestly the Maybe<bool> sFoundAndLoadedJogfile; is probably pretty quick, too.

So what I'm really interested in is if there's existing compile-time-knowable mechanisms or patterns.

...and if IsDevelopmentBuild() is fit for purpose.

...and if this might cause any problems with artifact builds on try.

(In reply to Chris H-C :chutten from comment #10)

If we're moving from compile-time-knowable to only run-time-knowable, I'm already asking IsDevelopmentBuild() (which checks for an omnijar). That looks cheap enough we could move it up to MOZILLA_OFFICIAL's place, though honestly the Maybe<bool> sFoundAndLoadedJogfile; is probably pretty quick, too.

It's possible to have packaged artifact builds, so this isn't quite what you want. More below.

So what I'm really interested in is if there's existing compile-time-knowable mechanisms or patterns.

No, by definition there is not: the same compiled code artifacts are used in two different contexts -- a full build and an artifact build. Same libxul.so under the hood.

...and if IsDevelopmentBuild() is fit for purpose.

I think not.

...and if this might cause any problems with artifact builds on try.

Yes, it will: Ba artifact builds in automation are packaged like any other builds.

Packaged (== has an omnijar) is not the indicator you are looking for. We've resisted this type of indicator at various times because it's tricky to get right and other proxy indicators have sufficed. Let's wait for :glandium to opine.

Oh boy, IsDevelopmentBuild is such a misleading name...
Can someone give more context as to what the goal is here, because comment 2 doesn't help me much, unfortunately.

Flags: needinfo?(mh+mozilla)

Certainly. FOG and Glean, like Firefox Desktop Telemetry before them, depend on codegen to work: we use it to generate APIs and structures and constants and stuff to support instrumentation.

However, not all our developers compile as part of their desired workflow, preferring artifact builds. So FOG, wanting to still support these developers' instrumentation needs like Firefox Desktop Telemetry before it, needs to do some (expensive) work to support new or changed JS instrumentation that is registered not at compile-time, but at run-time. The subsystem that does this work for FOG is called Runtime Metric Definition Subsystem JOG.

We want to only do this expensive work in cases where it's necessary. So what we're in the market for is some means to know, preferably at compile time and in C++, that a given build is or isn't A Developer's Build in Artifact Mode (or CI's artifact builds, which have similar needs). We never want to run the expensive work on a released build.

Is that sufficient context?

Flags: needinfo?(mh+mozilla)

I'm afraid there is no solution in sight for this. The artifacts that are downloaded by developer builds in artifact mode are shippable builds. Whatever you'd add on the C++ side to do what you want would be shipped as well.

Flags: needinfo?(mh+mozilla)

If you expose something like MOZ_ARTIFACT_BUILDS to js, then the js side could trigger whatever it needs if the necessary (opt-in) entry point is made available in all builds.

Alas, there is no JS side to FOG (at least not yet). Due to (IIRC) needing to have a wildcard named getter the Glean and GleanPings globals are implemented in C++. I'm not sure if pulling JS-known information like AppConstants can be done and is advisable to do from C++?

We do have some XPCOM in nsIFOG... I could try putting the MOZ_ARTIFACT_BUILDS knowledge in a JS-implemented method on the FOG service?

Speaking of MOZ_ARTIFACT_BUILDS, I couldn't find docs on it but I gather it's the build-time define we should be using instead of COMPILE_ENVIRONMENT to identify we're in the "build" step of an artifact build? If so, I think I should file a follow-up to move away from COMPILE_ENVIRONMENT.

Flags: needinfo?(mh+mozilla)

MOZ_ARTIFACT_BUILDS is a superset of COMPILE_ENVIRONMENT, and they are not equivalent. But for identifying artifact builds, yes, you should use the former rather than the latter.

Flags: needinfo?(mh+mozilla)
See Also: → 1823696

FOG needs to trigger JOG (artifact build support for runtime-defined metrics.
Allows changes and additions to metrics.yamls to be usable from JS without a
compile step) when running an Artifact Build.

So we take MOZ_ARTIFACT_BUILDS and pop it into a pref for FOG's use in C++.

Pushed by chutten@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/871e18b79bc1 Use MOZ_ARTIFACT_BUILDS to determine if FOG needs JOG r=glandium,janerik
Regressions: 1825141

(Whoops, sorry about leaving this left open. In penance I just downloaded the latest tree and confirmed it's all working.)

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Keywords: leave-open
Resolution: --- → FIXED
See Also: → 1833104
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: