Closed Bug 1761268 Opened 9 months ago Closed 9 months ago

Perma toolkit/components/glean/tests/xpcshell/test_GIFFTIPC.js | xpcshell return code: 0 when Gecko 99 merges to release on 2022-03-28

Categories

(Toolkit :: Telemetry, defect, P1)

defect

Tracking

()

RESOLVED FIXED
100 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox98 --- unaffected
firefox99 + fixed
firefox100 --- fixed

People

(Reporter: noriszfay, Assigned: chutten)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 file)

Beta-as-release simulation
How to run these simulations

Failure log:

[task 2022-03-24T13:29:28.316Z] 13:29:28     INFO -  TEST-START | toolkit/components/glean/tests/xpcshell/test_GIFFTIPC.js
[task 2022-03-24T13:29:28.395Z] 13:29:28     INFO -  adb launch_application: am startservice -W -n 'org.mozilla.geckoview.test_runner/org.mozilla.geckoview.test_runner.XpcshellTestRunnerService$i0' -a android.intent.action.MAIN --es env0 XPCOM_DEBUG_BREAK=stack-and-abort --es env1 MOZ_CRASHREPORTER=1 --es env2 MOZ_CRASHREPORTER_NO_REPORT=1 --es env3 MOZ_DISABLE_NONLOCAL_CONNECTIONS=1 --es env4 MOZ_DEVELOPER_REPO_DIR=/builds/worker/checkouts/gecko --es env5 MOZ_DISABLE_CONTENT_SANDBOX=1 --es env6 MOZ_FETCHES_DIR=/builds/worker/fetches --es env7 MOZ_DISABLE_SOCKET_PROCESS=1 --es env8 LD_LIBRARY_PATH=/data/local/tmp/test_root/xpcb --es env9 MOZ_LINKER_CACHE=/data/local/tmp/test_root/xpcb --es env10 GRE_HOME=/data/local/tmp/test_root/xpcb --es env11 XPCSHELL_TEST_PROFILE_DIR=/data/local/tmp/test_root/xpc/p/5aeccd2f-467a-436e-8586-b1ceac933ccf --es env12 HOME=/data/local/tmp/test_root/xpc/p --es env13 XPCSHELL_TEST_TEMP_DIR=/data/local/tmp/test_root/xpc/tmp/0b1e08b3-3234-4891-92be-a90d2a1e7e84 --es env14 MOZ_ANDROID_DATA_DIR=/data/local/tmp/test_root/xpcb --es env15 MOZ_IN_AUTOMATION=1 --es env16 MOZ_ANDROID_CPU_ABI=x86_64 --es env17 MOZHTTP2_PORT=33841 --es env18 MOZNODE_EXEC_PORT=42703 --es env19 TMPDIR=/data/local/tmp/test_root/xpc/p/5aeccd2f-467a-436e-8586-b1ceac933ccf --es env20 XPCSHELL_MINIDUMP_DIR=/data/local/tmp/test_root/xpc/minidumps/5aeccd2f-467a-436e-8586-b1ceac933ccf --es arg0 -g --es arg1 /data/local/tmp/test_root/xpcb --es arg2 --greomni --es arg3 /data/local/tmp/test_root/xpcb/geckoview-test_runner.apk --es arg4 -m --es arg5 -e --es arg6 'const _HEAD_JS_PATH = "/data/local/tmp/test_root/xpc/head.js";' --es arg7 -e --es arg8 'const _MOZINFO_JS_PATH = "/data/local/tmp/test_root/xpc/p/5aeccd2f-467a-436e-8586-b1ceac933ccf/mozinfo.json";' --es arg9 -e --es arg10 'const _PREFS_FILE = "/data/local/tmp/test_root/xpc/user.js";' --es arg11 -e --es arg12 'const _TESTING_MODULES_DIR = "/data/local/tmp/test_root/xpc/m";' --es arg13 -f --es arg14 /data/local/tmp/test_root/xpc/head.js --es arg15 -e --es arg16 'const _HEAD_FILES = ["/data/local/tmp/test_root/xpc/toolkit/components/glean/tests/xpcshell/head.js"];' --es arg17 -e --es arg18 'const _JSDEBUGGER_PORT = 0;' --es arg19 -e --es arg20 'const _TEST_CWD = "/data/local/tmp/test_root/xpc/toolkit/components/glean/tests/xpcshell";' --es arg21 -e --es arg22 'const _TEST_FILE = ["test_GIFFTIPC.js"];' --es arg23 -e --es arg24 'const _TEST_NAME = "toolkit/components/glean/tests/xpcshell/test_GIFFTIPC.js";' --es arg25 -e --es arg26 '_execute_test(); quit(0);' --ez use_multiprocess True --es out_file /data/local/tmp/test_root/xpc/logs/xpcshell-b8d875c7-3f00-41e8-a73d-2ed4952877c4.log
[task 2022-03-24T13:29:28.659Z] 13:29:28     INFO -  remotexpcshelltests.py | toolkit/components/glean/tests/xpcshell/test_GIFFTIPC.js | 27044 | Launched Test App
[task 2022-03-24T13:29:36.615Z] 13:29:36     INFO -  remotexpcshelltests.py | toolkit/components/glean/tests/xpcshell/test_GIFFTIPC.js | 27044 | Application ran for: 0:00:08.298078
[task 2022-03-24T13:29:36.705Z] 13:29:36  WARNING -  TEST-UNEXPECTED-FAIL | toolkit/components/glean/tests/xpcshell/test_GIFFTIPC.js | xpcshell return code: 0
[task 2022-03-24T13:29:36.705Z] 13:29:36     INFO -  TEST-INFO took 8388ms
[task 2022-03-24T13:29:36.705Z] 13:29:36     INFO -  >>>>>>>
[task 2022-03-24T13:29:36.706Z] 13:29:36     INFO -  (xpcshell/head.js) | test MAIN run_test pending (1)
[task 2022-03-24T13:29:36.706Z] 13:29:36     INFO -  (xpcshell/head.js) | test run_next_test 0 pending (2)
[task 2022-03-24T13:29:36.706Z] 13:29:36     INFO -  (xpcshell/head.js) | test MAIN run_test finished (2)
[task 2022-03-24T13:29:36.706Z] 13:29:36     INFO -  running event loop
[task 2022-03-24T13:29:36.706Z] 13:29:36     INFO -  toolkit/components/glean/tests/xpcshell/test_GIFFTIPC.js | Starting test_setup
[task 2022-03-24T13:29:36.706Z] 13:29:36     INFO -  (xpcshell/head.js) | test test_setup pending (2)
[task 2022-03-24T13:29:36.706Z] 13:29:36     INFO -  (xpcshell/head.js) | test run_next_test 0 finished (2)
[task 2022-03-24T13:29:36.707Z] 13:29:36     INFO -  (xpcshell/head.js) | test run_next_test 1 pending (2)
[task 2022-03-24T13:29:36.707Z] 13:29:36     INFO -  (xpcshell/head.js) | test test_setup finished (2)
[task 2022-03-24T13:29:36.707Z] 13:29:36     INFO -  toolkit/components/glean/tests/xpcshell/test_GIFFTIPC.js | Starting run_child_stuff
[task 2022-03-24T13:29:36.707Z] 13:29:36     INFO -  (xpcshell/head.js) | test run_child_stuff pending (2)
[task 2022-03-24T13:29:36.707Z] 13:29:36     INFO -  TEST-SKIP | toolkit/components/glean/tests/xpcshell/test_GIFFTIPC.js | run_child_stuff - run_child_stuff skipped because the following conditions were met: (runningInParent)
[task 2022-03-24T13:29:36.707Z] 13:29:36     INFO -  (xpcshell/head.js) | test run_next_test pending (3)
[task 2022-03-24T13:29:36.708Z] 13:29:36     INFO -  (xpcshell/head.js) | test run_next_test 1 finished (3)
[task 2022-03-24T13:29:36.708Z] 13:29:36     INFO -  (xpcshell/head.js) | test run_next_test 2 pending (3)
[task 2022-03-24T13:29:36.708Z] 13:29:36     INFO -  (xpcshell/head.js) | test run_child_stuff finished (3)
[task 2022-03-24T13:29:36.708Z] 13:29:36     INFO -  (xpcshell/head.js) | test run_next_test finished (2)
[task 2022-03-24T13:29:36.708Z] 13:29:36     INFO -  toolkit/components/glean/tests/xpcshell/test_GIFFTIPC.js | Starting test_child_metrics
[task 2022-03-24T13:29:36.708Z] 13:29:36     INFO -  (xpcshell/head.js) | test test_child_metrics pending (2)
[task 2022-03-24T13:29:36.709Z] 13:29:36     INFO -  (xpcshell/head.js) | test run in child pending (3)
[task 2022-03-24T13:29:36.709Z] 13:29:36     INFO -  (xpcshell/head.js) | test run_next_test 2 finished (3)
[task 2022-03-24T13:29:36.709Z] 13:29:36     INFO -  "CONSOLE_MESSAGE: (warn) [JavaScript Warning: "Use of nsIFile in content process is deprecated." {file: "/data/local/tmp/test_root/xpc/head.js" line: 372}]"
[task 2022-03-24T13:29:36.709Z] 13:29:36     INFO -  (xpcshell/head.js) | test finished (2)
[task 2022-03-24T13:29:36.709Z] 13:29:36     INFO -  Unexpected exception undefined - timed out after 50 tries.
[task 2022-03-24T13:29:36.709Z] 13:29:36     INFO -  undefined
[task 2022-03-24T13:29:36.709Z] 13:29:36     INFO -  exiting test
[task 2022-03-24T13:29:36.709Z] 13:29:36     INFO -  <<<<<<<
[task 2022-03-24T13:29:36.819Z] 13:29:36     INFO -  Cleaning up profile for /builds/worker/workspace/build/tests/xpcshell/tests/toolkit/components/glean/tests/xpcshell/test_GIFFTIPC.js folder: /data/local/tmp/test_root/xpc/p/5aeccd2f-467a-436e-8586-b1ceac933ccf
[task 2022-03-24T13:29:37.129Z] 13:29:37     INFO -  INFO | Result summary:
[task 2022-03-24T13:29:37.130Z] 13:29:37     INFO -  INFO | Passed: 833
[task 2022-03-24T13:29:37.130Z] 13:29:37  WARNING -  INFO | Failed: 1
[task 2022-03-24T13:29:37.130Z] 13:29:37  WARNING -  One or more unittests failed.
[task 2022-03-24T13:29:37.130Z] 13:29:37     INFO -  INFO | Todo: 6
[task 2022-03-24T13:29:37.130Z] 13:29:37     INFO -  INFO | Retried: 1
[task 2022-03-24T13:29:37.130Z] 13:29:37     INFO -  SUITE-END | took 588s
[task 2022-03-24T13:29:37.131Z] 13:29:37     INFO -  Node moz-http2 server shutting down ...
[task 2022-03-24T13:29:37.180Z] 13:29:37    ERROR - Return code: 1
Flags: needinfo?(bas)

Unexpected exception undefined - timed out after 50 tries. Ah, here we go. That'll be from this line. I'll be sure to add a message to that line.

This means we're timing out waiting for the data recorded by run_child_stuff to appear on the parent process.

Which is weird. We don't appear to be running run_child_stuff at all. We should be running it because of await run_test_in_child("test_GIFFTIPC.js");, but the log says test run in child pending (3).. there's no CHILD-TEST-STARTED.

Maybe there's a problem with dumping child-process logs? CHILD-TEST-STARTED must be logged before the await should return. Okay, let's assume the log has nothing from the child process in it. I think test finished (2) may correspond to when the child test completes?

Ugh, and the timestamps are useless, too, as they're all from the time the log was dumped, not the time the line was logged, so I can't confirm that the waitForCondition is waiting a sensible length of time.


Or, in an organized way, this log appears to

  1. Have less-than-helpful timestamps (they correspond to when the log was dumped, not when the line was logged)
  2. Have no content-process log lines
  3. Show that the test is running correctly, but that the ContentTaskUtils.waitForCondition times out waiting for Firefox Telemetry data to make it across IPC.

The default waitForCondition time is 5s (50 iterations of 100ms). The Firefox Telemetry IPC timeout is 2s. There should be no reason for this to be timing out due to the timing. But then there should also not be a difference between release and beta (all the Firefox Telemetry probes are release_channel_collection: 'opt-out').

If an error happens within run_child_stuff it would fail to log here and it would fail to run the instrumentation to the end which would cause us to be waiting for something that wouldn't show up. That could be a possible way this would fail, though run_child_stuff has nothing that should be problematic specifically in release.

What to do, what to do... First things first I'm going to add some logging and run these simulations myself. See if I can reproduce it on try and get more information out of these runs. I'd love if someone could step in and show me a way to turn these timestamps useful and/or also include content-process test logging.

Assignee: nobody → chutten
Status: NEW → ASSIGNED
Priority: -- → P1

Oh good, I can reproduce this locally atop the Beta tree. (weird that it didn't bounce when landing, then)

Okay, that's weird. Events are getting across IPC, but Scalars aren't. And there are some content-process scalars that are getting across, so it doesn't seem like an IPC thing. It doesn't seem to be the thing that filters "telemetry.test" metrics (that's turned off). It doesn't appear to be a prerelease/release thing (that was fixed before the tests ran on Nightly on try). It doesn't appear to be the products check, as that's overridden in test_setup. They're not expired. They're not being forbidden from being recorded in the process (record_in_processes: ["all"])...

...I'm at a bit of a loss. Does anyone on this bug happen to know how to connect gdb to an android emulator running mach test?

nalexander: You seem like someone who might know about how to debug android?

Flags: needinfo?(nalexander)
Has Regression Range: --- → yes

I found some docs I hope are current enough to get me going. Building debug now, gonna start Android Studio after the symbols are safely in place in my objdir.

Looks as though the docs have a bad .lldbconfig value, though. The config guess for android's not <PATH>/objdir-android-opt but is more like <PATH>/obj-android-x86_64-unknown-linux-android. We'll see how it goes.

(In reply to James Graham [:jgraham] from comment #3)

nalexander: You seem like someone who might know about how to debug android?

In a past life, yes. And sometimes I moonlight! Best to ask in #geckoview these days.

(In reply to Chris H-C :chutten from comment #4)

I found some docs I hope are current enough to get me going. Building debug now, gonna start Android Studio after the symbols are safely in place in my objdir.

I (think I) wrote these docs moons ago, but they still look essentially correct. However, I think that post Bug 1581971 you can do ./mach run --debug on Android and connect lldb to GeckoViewExample out-of-the-box. I cannot vouch for this myself, though, since a) I've not used it myself and b) it's not tested in CI. I do not know if the mechanism is wired up for ./mach test --debug ....

Looks as though the docs have a bad .lldbconfig value, though. The config guess for android's not <PATH>/objdir-android-opt but is more like <PATH>/obj-android-x86_64-unknown-linux-android. We'll see how it goes.

Mutatis mutandis those docs look reasonable to me. You probably want to chat in #geckoview with :agi (and perhaps me if he's not available). Good luck!

Flags: needinfo?(nalexander)

On Android we don't run the JSMs that turn canRecordBase on in content
processes in xpcshell, so we have to turn that on ourselves.

Alrighty, think I've got it. Child processes in xpcshell don't set canRecordBase to true (which it more-or-less always should be) which means that non-Event data collections shortcircuit and never actually get stored. The patch contains a minimal fix that should work just peachily (try run is in progress

What's I'd like to know is how this didn't get picked up in earlier try runs on prerelease builds. These tests should (and locally do (Travis checked)) fail on tip of tree. (So this bugfix is needed across the trains.). The mechanism by which this is set to true is not especially straightforward, but so long as someone created a Telemetry instance on the process (which we did because it's a Service (also, we called it in the test)) it should be init'd to true by this code right here. Maybe there's some weird interaction with TelemetryControllerBase and its use of the toolkit.telemetry.enabled pref.

At any rate, this patch fixes it locally and is building away on try. With luck it'll be green. What do I need to do from there? (Beyond getting code review)?

Flags: needinfo?(nfay)

Comment on attachment 9269473 [details]
Bug 1761268 - Ensure we can record data in tests in child processes on Android r?TravisLong!

Beta/Release Uplift Approval Request

  • User impact if declined: test-only failure, no user impact
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Not risky 'cause it's a test-only patch. Worst it can do is make the test fail... harder?
  • String changes made/needed:
Attachment #9269473 - Flags: approval-mozilla-release?
Attachment #9269473 - Flags: approval-mozilla-beta?
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f2c5c8ceb77a
Ensure we can record data in tests in child processes on Android r=TravisLong
Status: ASSIGNED → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → 100 Branch

Comment on attachment 9269473 [details]
Bug 1761268 - Ensure we can record data in tests in child processes on Android r?TravisLong!

Approved for beta uplift.
The test fix will make it to Release on merge day coming on Monday, March 28.

Attachment #9269473 - Flags: approval-mozilla-release?
Attachment #9269473 - Flags: approval-mozilla-release-
Attachment #9269473 - Flags: approval-mozilla-beta?
Attachment #9269473 - Flags: approval-mozilla-beta+
Flags: needinfo?(bas)
You need to log in before you can comment on or make changes to this bug.