Closed Bug 1646402 Opened 5 years ago Closed 5 years ago

mozregression isn't uploading inside of mach or console

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: mdroettboom, Assigned: mdroettboom)

Details

(Whiteboard: [telemetry:glean-rs:m?])

Attachments

(1 file)

Link to GitHub pull-request: https://github.com/mozilla/glean/pull/986 (5 years ago, GitHub Bugzilla PR Linker, 41 bytes, text/x-github-pull-request)

Michael Droettboom [:mdroettboom]

Assignee

Description

5 years ago

:wlach pointed out in bug 1646173:

"""
Not sure if this is related, but there has been no reported mozregression usage on mach (which I gather would be using the latest Glean SDK, unlike the GUI, which uses a hardcoded version, currently 30.1.0) since May 27:

https://sql.telemetry.mozilla.org/queries/70610#177730
"""

I'm not sure this is related to bug 1646173, so breaking it out into its own bug.

William Lachance (:wlach)

Updated

5 years ago

Summary: mozregression isn't uploading inside of mach → mozregression isn't uploading inside of mach or console

William Lachance (:wlach)

Comment 1

5 years ago

This looks to be happening with both console and mach, actually. There have been almost no pings from mozregression versions later than 4.05 (the release that came out around the time of the Glean SDK):

https://sql.telemetry.mozilla.org/queries/72098/source

When running from the command line, I'm not seeing any Glean messages emitted on the mozregression console. There's a bit of indirection that forks off a separate process, but I still see nothing even with that code commented out:

https://github.com/mozilla/mozregression/blob/dc2a6498c33d57cca5940dc6d5a35ccf75929ae4/mozregression/telemetry.py#L43

Michael Droettboom [:mdroettboom]

Assignee

Comment 2

5 years ago

I'm having trouble reproducing this. Here's what I did:

1. Made a fresh venv.
2. Installed mozregression's dependencies and mozregression into it.
3. Upgraded glean_sdk to make sure I was getting the broken 31.1.1 version.
4. Modified mozregression to send a tagged ping by adding ping_tag="foo" to the glean.config.Configuration constructor in mozregression's telemetry.py.
5. Ran the GUI from the command line using python gui/build.py run.
6. Did a single run from the GUI.
7. The pings showed up in the debug pings viewer.

So something is different between my environment and yours, but what?

Flags: needinfo?(wlachance)

Michael Droettboom [:mdroettboom]

Assignee

Comment 3

5 years ago

I should add -- this is on Linux at the command line. Maybe this is platform-specific?

Michael Droettboom [:mdroettboom]

Assignee

Comment 4

5 years ago

Ah, I see -- If I do

mozregression -b 2020-06-16 -g 2020-06-17

pings don't seem to be sent. Now that I have something to reproduce, looking further...

William Lachance (:wlach)

Comment 5

5 years ago

(In reply to Michael Droettboom [:mdroettboom] from comment #4)

Ah, I see -- If I do
mozregression -b 2020-06-16 -g 2020-06-17
pings don't seem to be sent. Now that I have something to reproduce, looking further...

Yes! The GUI seems to be working fine. Sorry, I forgot to point that out (I guess I figured it was implied...).

Flags: needinfo?(wlachance)

Michael Droettboom [:mdroettboom]

Assignee

Comment 6

5 years ago

It's looking like this is the issue:

https://stackoverflow.com/questions/34506638/how-to-register-atexit-function-in-pythons-multiprocessing-subprocess

Glean uses an atexit handler to make sure that all of its threaded work completes before the process shuts down. It turns out that these atexit handlers are not called when using multiprocessing to spawn a separate process.
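A minimal sketch of that behavior, assuming the POSIX `fork` start method (requested explicitly below): a `fork`-started multiprocessing child finishes by calling `os._exit()`, which skips atexit handlers. The helper names `_register_marker_handler` and `atexit_ran_in_child` are hypothetical, and the marker file only stands in for Glean's shutdown work.

```python
import atexit
import multiprocessing
import os
import tempfile


def _register_marker_handler(marker_path):
    # Stand-in for library initialization: register cleanup work that
    # should run when the process shuts down.
    def _on_exit():
        with open(marker_path, "w") as f:
            f.write("atexit handler ran")

    atexit.register(_on_exit)
    # Returning ends the target; with "fork", multiprocessing then calls
    # os._exit(), which skips atexit handlers.


def atexit_ran_in_child():
    """Return True if an atexit handler registered in a forked child ran."""
    ctx = multiprocessing.get_context("fork")  # POSIX-only start method
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "marker.txt")
        proc = ctx.Process(target=_register_marker_handler, args=(marker,))
        proc.start()
        proc.join()
        return os.path.exists(marker)
```

Here `atexit_ran_in_child()` returns `False`: the handler was registered in the child but never ran.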

Replacing _send_telemetry_ping_oop with the following resolves the issue:

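import atexit

def _send_telemetry_ping_oop(variant, appname, upload_enabled):
    # Runs in a multiprocessing child, which exits without calling the
    # atexit handlers that Glean relies on.
    try:
        initialize_telemetry(upload_enabled, allow_multiprocessing=True)
        if upload_enabled:
            _send_telemetry_ping(variant, appname)
    finally:
        # Run the atexit handlers (including Glean's) by hand. This is a private
        # CPython helper, and it runs every registered handler, not only Glean's.
        atexit._run_exitfuncs()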

I'm not sure this is the best solution however. As the SO post points out -- other things could be using atexit that might interfere. Glean should probably grow a public API to call for this case. It's not great -- it requires documenting "if you're using multiprocessing, also make sure you do this other thing to shut things down cleanly".
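One possible shape for such an API, sketched with a hypothetical `BackgroundDispatcher` class (not Glean's actual API): the library exposes an explicit, idempotent `shutdown()` that drains its worker thread, still registers it with `atexit` for the normal exit path, and a multiprocessing child calls it in a `finally` block instead of `atexit._run_exitfuncs()`. The `fork` start method is assumed, as in a Linux command-line run; `_child_with_dispatcher` and `work_completed_in_child` are also hypothetical.

```python
import atexit
import multiprocessing
import os
import queue
import tempfile
import threading


class BackgroundDispatcher:
    """Hypothetical worker-thread wrapper; not Glean's actual API."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._lock = threading.Lock()
        self._stopped = False
        # Daemon thread: it will not keep the process alive on its own.
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        # Keep covering the normal interpreter-exit path.
        atexit.register(self.shutdown)

    def _run(self):
        while True:
            task = self._tasks.get()
            if task is None:  # sentinel queued by shutdown()
                return
            task()

    def submit(self, task):
        self._tasks.put(task)

    def shutdown(self):
        """Run all queued tasks, then stop the worker. Safe to call twice."""
        with self._lock:
            if self._stopped:
                return
            self._stopped = True
        self._tasks.put(None)
        self._thread.join()


def _child_with_dispatcher(marker_path):
    dispatcher = BackgroundDispatcher()
    try:
        dispatcher.submit(lambda: open(marker_path, "w").close())
    finally:
        # Explicit, targeted cleanup instead of atexit._run_exitfuncs(),
        # which would also run unrelated handlers.
        dispatcher.shutdown()


def work_completed_in_child():
    """Return True if the task queued in a forked child actually ran."""
    ctx = multiprocessing.get_context("fork")
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "marker.txt")
        proc = ctx.Process(target=_child_with_dispatcher, args=(marker,))
        proc.start()
        proc.join()
        return os.path.exists(marker)
```

The trade-off matches the concern above: users of multiprocessing still have to remember the extra `shutdown()` call, but it only touches the library's own state.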

Another possibility is to turn off multithreading when inside of a multiprocessing process (assuming that's detectable). That wouldn't put this burden on our users.
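Detection is possible on Python 3.8+ through `multiprocessing.parent_process()`, which returns `None` in the main process. A sketch, assuming a hypothetical policy function `choose_dispatch_mode()` that falls back to synchronous work inside a child; `in_multiprocessing_child`, `_report_mode`, and `mode_in_child` are also hypothetical, and the `fork` start method is assumed.

```python
import multiprocessing


def in_multiprocessing_child():
    """True inside a process started by multiprocessing (Python 3.8+)."""
    return multiprocessing.parent_process() is not None


def choose_dispatch_mode():
    # Hypothetical policy: in a child, do the work synchronously so nothing
    # is left on a background thread waiting for an atexit handler.
    return "synchronous" if in_multiprocessing_child() else "threaded"


def _report_mode(conn):
    conn.send(choose_dispatch_mode())
    conn.close()


def mode_in_child():
    """Ask a forked child which dispatch mode it would pick."""
    ctx = multiprocessing.get_context("fork")
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=_report_mode, args=(child_conn,))
    proc.start()
    mode = parent_conn.recv()
    proc.join()
    return mode
```

In an ordinary top-level script, `choose_dispatch_mode()` returns `"threaded"`, while `mode_in_child()` returns `"synchronous"`.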

Another thing to attack would be the reason mozregression is using multiprocessing in the first place -- because mach may need its own instance of Glean and we don't want the data intermingled / going to different endpoints in each context etc. An original design assumption was that components always send their data "as if" coming from the app, but that's not the use case we want for mozregression.

I'm not sure why it broke with this particular revision -- but given that it's kind of race-conditiony, I think something with the timing / mutexes changed enough to hit this. It could also explain (possibly) the lower mach numbers if it were flaky all along.

GitHub Bugzilla PR Linker

Comment 7

5 years ago

Attached file Link to GitHub pull-request: https://github.com/mozilla/glean/pull/986

Michael Droettboom [:mdroettboom]

Assignee

Updated

5 years ago

Assignee: nobody → mdroettboom

Michael Droettboom [:mdroettboom]

Assignee

Updated

5 years ago

Status: NEW → RESOLVED

Closed: 5 years ago

Resolution: --- → FIXED

Categories

(Data Platform and Tools :: Glean: SDK, defect, P3)

Security

(public)
