Closed Bug 1607586 Opened 4 years ago Closed 4 years ago

Send all at-shutdown pings in a batch

Categories

(Toolkit :: Telemetry, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla74
Tracking Status
firefox74 --- fixed

People

(Reporter: chutten, Assigned: chutten)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

bug 1315306 seems to be caused at least in part by spawning a bunch of pingsenders in a row to handle the multiple pings we send at shutdown.

Let's gather all of these pings together in a bunch and hand them to TelemetrySend at once, instead of calling into TelemetrySend for each one independently. This will allow us to, later, when pingsender is augmented with the ability to send multiple pings at once, send them with a single pingsender invocation.

There has evolved a convention in Telemetry-land where pings are often sent
with a schedule that includes "and at shutdown, using pingsender". It turns out
that spawning a pingsender process for each of these pings can be a problem in
certain instances, so let's batch them all together so we can send them in a
single go.

The batching procedure starts at profile-change-net-teardown (after which gecko
networking doesn't work anymore) and batches all pings that wish to be sent
anyway, using pingsender. (Pings wishing to be sent not with pingsender
continue to have the behaviour of getting saved to disk and looked at in the
next Firefox session).

Then, when TelemetryController is more-or-less done shutting down (during
profile-before-change-telemetry as part of AsyncShutdown.sendTelemetry), it
flushes the batch.

What about pings wanting to be sent after profile-before-change-telemetry?
As before, any (non-internal) use of Telemetry APIs after profile-before-change
is forbidden.

At present the batch is fed one-by-one to pingsender at flush.

Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f94dd143caa2
Batch pingsender-sent pings sent at shutdown r=gsvelto,janerik
Backout by malexandru@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/28107c0396ad
Backed out changeset f94dd143caa2 on suspicion of causing leaks. a=backout,Aryx

Backed out changeset f94dd143caa2 (Bug 1607586) on suspicion of causing leaks. a=backout,Aryx

Backout link: https://hg.mozilla.org/integration/autoland/rev/28107c0396ad4f11c06fb83a66dffecc21673d9c

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&tochange=3e3a24d5378f3dbeca62124c8491549251e50498&fromchange=5ea36d82a283c479423faaabf0c7f2eb622167ed&selectedJob=285815587

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=285815653&repo=autoland&lineNumber=1744

[task 2020-01-21T18:50:15.665Z] 18:50:15 ERROR - GECKO(1129) | ==1129==ERROR: LeakSanitizer: detected memory leaks
[task 2020-01-21T18:50:15.667Z] 18:50:15 INFO - GECKO(1129) | Indirect leak of 1024 byte(s) in 1 object(s) allocated from:
[task 2020-01-21T18:50:15.668Z] 18:50:15 INFO - GECKO(1129) | #0 0x55d4fadc7d12 in calloc /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:154:3
[task 2020-01-21T18:50:15.669Z] 18:50:15 INFO - GECKO(1129) | #1 0x7f6e1dcce27d in PR_SetThreadPrivate /builds/worker/workspace/build/src/nsprpub/pr/src/threads/prtpd.c:162:27
[task 2020-01-21T18:50:15.669Z] 18:50:15 INFO - GECKO(1129) | #2 0x7f6e010b7970 in nsThread::InitCurrentThread() /builds/worker/workspace/build/src/xpcom/threads/nsThread.cpp:696:26
[task 2020-01-21T18:50:15.670Z] 18:50:15 INFO - GECKO(1129) | #3 0x7f6e010b8ce3 in nsThreadManager::GetCurrentThread() /builds/worker/workspace/build/src/xpcom/threads/nsThreadManager.cpp:583:18
[task 2020-01-21T18:50:15.671Z] 18:50:15 INFO - GECKO(1129) | #4 0x7f6e010c626b in NS_GetCurrentThread() /builds/worker/workspace/build/src/xpcom/threads/nsThreadUtils.cpp:502:33
[task 2020-01-21T18:50:15.671Z] 18:50:15 INFO - GECKO(1129) | #5 0x7f6e0afcaef7 in profiler_register_thread(char const*, void*) /builds/worker/workspace/build/src/tools/profiler/core/platform.cpp:4515:9
[task 2020-01-21T18:50:15.676Z] 18:50:15 INFO - GECKO(1129) | #6 0x7f6e010b0c9f in AutoProfilerRegisterThread /builds/worker/workspace/build/src/obj-firefox/dist/include/GeckoProfiler.h:1022:5
[task 2020-01-21T18:50:15.677Z] 18:50:15 INFO - GECKO(1129) | #7 0x7f6e010b0c9f in void mozilla::Maybe<mozilla::AutoProfilerRegisterThread>::emplace<char const (&) [11]>(char const (&) [11]) /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/Maybe.h:526:39
[task 2020-01-21T18:50:15.678Z] 18:50:15 INFO - GECKO(1129) | #8 0x7f6e010b0766 in nsProcess::Monitor(void*) /builds/worker/workspace/build/src/xpcom/threads/nsProcessCommon.cpp:118:20
[task 2020-01-21T18:50:15.679Z] 18:50:15 INFO - GECKO(1129) | #9 0x7f6e1dcd8d0e in _pt_root /builds/worker/workspace/build/src/nsprpub/pr/src/pthreads/ptthread.c:201:5
[task 2020-01-21T18:50:15.679Z] 18:50:15 INFO - GECKO(1129) | #10 0x7f6e21a0c6b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
[task 2020-01-21T18:50:15.680Z] 18:50:15 INFO - GECKO(1129) | Indirect leak of 11 byte(s) in 1 object(s) allocated from:
[task 2020-01-21T18:50:15.680Z] 18:50:15 INFO - GECKO(1129) | #0 0x55d4fadc7b9d in malloc /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:145:3
[task 2020-01-21T18:50:15.685Z] 18:50:15 INFO - GECKO(1129) | #1 0x7f6e1dccc211 in PR_SetCurrentThreadName /builds/worker/workspace/build/src/nsprpub/pr/src/pthreads/ptthread.c:1668:28
[task 2020-01-21T18:50:15.686Z] 18:50:15 INFO - GECKO(1129) | #2 0x7f6e010b07be in NS_SetCurrentThreadName /builds/worker/workspace/build/src/xpcom/threads/nsThreadUtils.cpp:495:3
[task 2020-01-21T18:50:15.687Z] 18:50:15 INFO - GECKO(1129) | #3 0x7f6e010b07be in nsProcess::Monitor(void*) /builds/worker/workspace/build/src/xpcom/threads/nsProcessCommon.cpp:122:5
[task 2020-01-21T18:50:15.688Z] 18:50:15 INFO - GECKO(1129) | #4 0x7f6e1dcd8d0e in _pt_root /builds/worker/workspace/build/src/nsprpub/pr/src/pthreads/ptthread.c:201:5
[task 2020-01-21T18:50:15.688Z] 18:50:15 INFO - GECKO(1129) | #5 0x7f6e21a0c6b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
[task 2020-01-21T18:50:15.689Z] 18:50:15 INFO - GECKO(1129) | -----------------------------------------------------
[task 2020-01-21T18:50:15.690Z] 18:50:15 INFO - GECKO(1129) | Suppressions used:
[task 2020-01-21T18:50:15.691Z] 18:50:15 INFO - GECKO(1129) | count bytes template
[task 2020-01-21T18:50:15.691Z] 18:50:15 INFO - GECKO(1129) | 27 816 nsComponentManagerImpl
[task 2020-01-21T18:50:15.691Z] 18:50:15 INFO - GECKO(1129) | 4 832 mozJSComponentLoader::LoadModule
[task 2020-01-21T18:50:15.691Z] 18:50:15 INFO - GECKO(1129) | 612 17514 libfontconfig.so
[task 2020-01-21T18:50:15.691Z] 18:50:15 INFO - GECKO(1129) | 1 29 libglib-2.0.so
[task 2020-01-21T18:50:15.691Z] 18:50:15 INFO - GECKO(1129) | 2 360 XPCWrappedNativeJSOps
[task 2020-01-21T18:50:15.691Z] 18:50:15 INFO - GECKO(1129) | -----------------------------------------------------
[task 2020-01-21T18:50:15.691Z] 18:50:15 INFO - GECKO(1129) | SUMMARY: AddressSanitizer: 1035 byte(s) leaked in 2 allocation(s).
[task 2020-01-21T18:50:15.767Z] 18:50:15 INFO - TEST-INFO | Main app process: exit 0
[task 2020-01-21T18:50:15.768Z] 18:50:15 INFO - TEST-INFO | LeakSanitizer | To show the addresses of leaked objects add report_objects=1 to LSAN_OPTIONS
[task 2020-01-21T18:50:15.769Z] 18:50:15 INFO - TEST-INFO | LeakSanitizer | This can be done in testing/mozbase/mozrunner/mozrunner/utils.py
[task 2020-01-21T18:50:15.770Z] 18:50:15 ERROR - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at PR_SetThreadPrivate, nsThread::InitCurrentThread, nsThreadManager::GetCurrentThread, NS_GetCurrentThread
[task 2020-01-21T18:50:15.770Z] 18:50:15 ERROR - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at PR_SetCurrentThreadName, NS_SetCurrentThreadName, nsProcess::Monitor, _pt_root

Flags: needinfo?(chutten)

Gabriele, I think this means we're now running pingsender too late for it to remember to tidy up after itself (ie, after "xpcom-shutdown" maybe?) so it's leaking these strings.

What do you think the solution should be? I've been using nsIProcess blindly, so I'm not sure of the tradeoffs of, say, telling nsIProcess to listen to a later shutdown phase.

Flags: needinfo?(chutten) → needinfo?(gsvelto)

Yes, "xpcom-shutdown" is very late and nsIProcess uses XPCOM facilities (such as the observer services). It would be better to shut down before, maybe at "application-quit".

Flags: needinfo?(gsvelto)

Taking a closer look, I don't think we're starting the Process that late after all.

Combining XPCOM Shutdown docs, AsyncShutdown docs, and when profile-before-change-telemetry is used it seems as though the trigger for these pings is happening just before xpcom-shutdown.... so we should be safe?

I mean, we're running pingsender inside _cleanupOnShutdown which happens synchronously within AsyncShutdown.sendTelemetry (shutdown is a blocker)... so we should still be running it before xpcom shutdown.

So I'm not sure why we're leaking. Maybe I need to see if I can reproduce this with LSan locally and debug the order of operations. This promises to take quite some time.

Ah-ha! We are running the process early enough, but the problem isn't when we're running PingSender, it is that we're running PingSender at all.

Previously pingsender wouldn't run if there were no pings to send. Turns out that the patches as landed wouldn't prevent us from invoking pingsender with no args. These tests were never expecting to have to properly leakcheck around that, so my guess is that there's a configuration snafu.

Lemme push a patch where we don't run pingsender when we don't need to: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9255e8f7c0b09166dc60b0d67d1b1674b398f24a

Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/93b81b2495cd
Batch pingsender-sent pings sent at shutdown r=gsvelto,janerik
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla74
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: