Marionette unit test "test_profile_management.py | test_replace_with_external_profile" doesn't set required preferences
Categories
(Remote Protocol :: Marionette, defect, P3)
Tracking
(firefox98 fixed)
Tracking | Status | |
---|---|---|
firefox98 | --- | fixed |
People
(Reporter: aryx, Assigned: whimboo)
References
Details
Attachments
(1 file)
Marionette tests failed today with the same crash signature which affected users in bug 1749910.
Does it submit data to telemetry (and should it not run this)? Or does a different connection explain the failure?
Assignee | ||
Comment 1•3 years ago
|
||
The assertion right before the crash includes the following in the stack:
https://treeherder.mozilla.org/logviewer?job_id=363967009&repo=autoland&lineNumber=64107-64141
[task 2022-01-13T08:11:05.139Z] 08:11:05 INFO - Assertion failure: false (We are done sending this request!), at /builds/worker/checkouts/gecko/netwerk/protocol/http/Http3Stream.cpp:273
[..]
[task 2022-01-13T08:11:05.337Z] 08:11:05 INFO - #23: BaseThreadInitThunk[C:\Windows\System32\KERNEL32.DLL +0x1fa29]
[task 2022-01-13T08:11:05.337Z] 08:11:05 INFO - #24: DllBlocklist_Initialize[Z:\task_164205822091498\build\application\firefox\mozglue.dll +0x7b8f8]
[task 2022-01-13T08:11:05.338Z] 08:11:05 INFO - #25: RtlGetAppContainerNamedObjectPath[C:\Windows\SYSTEM32\ntdll.dll +0x67a9e]
[task 2022-01-13T08:11:05.338Z] 08:11:05 INFO - #26: RtlGetAppContainerNamedObjectPath[C:\Windows\SYSTEM32\ntdll.dll +0x67a6e]
So I assume this is related to the DLL BlockList feature?
I still don't know what users actually hit in bug 1749910 and why it's fixed now.
Reporter | ||
Comment 2•3 years ago
|
||
Sounds like fetching the DLL blocklist can also hit the HTTP/3 bug.
Assignee | ||
Comment 3•3 years ago
|
||
Given that we disable Telemetry for Marionette (Remote Protocol) I'm going to remove it from the summary and make it an intermittent bug.
Not sure if we should / can stop using the DLL blocklist for Marionette tests.
Assignee | ||
Comment 4•3 years ago
|
||
I assume that we can mark it as duplicate of bug 1749957? Or is any follow-up necessary?
Reporter | ||
Comment 5•3 years ago
|
||
This bug is about the test environment to a server with got Firefox into the bad state else we shouldn't see the crashes. These kind of connections should be prevented (other test suites didn't fail - there must be a difference).
Comment 6•3 years ago
|
||
(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #2)
Sounds like fetching the DLL blocklist can also hit the HTTP/3 bug.
Thanks Sebastian, I'll make sure people are aware.
Comment 7•3 years ago
|
||
ni? toshi in case there are any changes we should make to the DLL blocklist download code.
Updated•3 years ago
|
Assignee | ||
Comment 8•3 years ago
|
||
Gian-Carlo, can we please get a new bug filed for whatever is need in core to not cause such a crash? I would take this bug for the Marionette changes to actually disable the blocklist for testing similar to:
Comment 9•3 years ago
|
||
Cloned into bug 1750173.
Updated•3 years ago
|
Comment 10•3 years ago
•
|
||
I closed that bug again: Toshihito confirmed the DLL blocklist is entirely static (it needs to work before we load any Windows networking DLLs!), and doesn't update over the net.
Looking at the crash log, the stack pasted is corrupt and not the actual one the warning came from:
[task 2022-01-13T08:13:06.839Z] 08:13:06 INFO - Thread 7 Socket Thread (crashed)
[task 2022-01-13T08:13:06.839Z] 08:13:06 INFO - 0 xul.dll!mozilla::net::Http3Stream::OnReadSegment(char const*, unsigned int, unsigned int*) [Http3Stream.cpp:08d254139525017ceb5f98be59d3a5864adbf331 : 273 + 0x0]
[task 2022-01-13T08:13:06.840Z] 08:13:06 INFO - 1 xul.dll!static mozilla::net::nsHttpTransaction::ReadRequestSegment(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*) [nsHttpTransaction.cpp:08d254139525017ceb5f98be59d3a5864adbf331 : 735 + 0x20]
[task 2022-01-13T08:13:06.841Z] 08:13:06 INFO - 2 xul.dll!nsBufferedInputStream::ReadSegments(nsresult (*)(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*), void*, unsigned int, unsigned int*) [nsBufferedStreams.cpp:08d254139525017ceb5f98be59d3a5864adbf331 : 433 + 0x21]
(there's more, but nothing revealing the problem)
(and here we see some DLL Blocklist stack simply because that patched the threadstart, unrelated to the problem)
But it's not obvious who was doing what that caused the action on the Socket Thread, as it's just processing a message from its event loop. While this happened the DLL Blocklist and third party stuff was indeed running, but that seems to be because they were still busy hashing the DLLs injected into Firefox, and they're not necessarily the origin of the network query.
Assignee | ||
Comment 11•3 years ago
|
||
Interesting. Is there any way that I could use to reproduce this failure? Otherwise it will be impossible to actually get this investigated and fixed.
I just checked our settings and we indeed do now allow the blocklist to access the network:
https://searchfox.org/mozilla-central/rev/3de56eb5f266f523340e739ae1b53258e0a95dfe/testing/marionette/client/marionette_driver/geckoinstance.py#114-115
Assignee | ||
Updated•3 years ago
|
Comment 12•3 years ago
|
||
Okay, maybe it's not a coincidence:
task 2022-01-13T08:13:08.020Z] 08:13:08 INFO - Thread 91 glean.upload
...
[task 2022-01-13T08:13:08.025Z] 08:13:08 INFO - 6 xul.dll!static mozilla::Viaduct::ViaductCallback(mozilla::`anonymous namespace'::ViaductByteBuffer) [Viaduct.cpp:08d254139525017ceb5f98be59d3a5864adbf331 : 46 + 0x167]
...
So we hit the known issue with Viaduct again. But maybe the fact that we're collecting DLL hashes at exactly this time isn't a coincidence, as we have third-party module telemetry (bug 1686229).
Assignee | ||
Comment 13•3 years ago
|
||
So if it' indeed Telemetry which exact preferences would have to be set to make glean to not upload anything? Maybe we indeed miss to set a preference.
Assignee | ||
Comment 14•3 years ago
|
||
I had a look at the testing profiles under /testing/profiles
for telemetry
occurrences and these I found:
https://searchfox.org/mozilla-central/search?q=telemetry&path=testing%2Fprofiles
Note that over on bug 1662544 we seem to have only disabled various telemetry features. So is it enough to set user_pref("toolkit.telemetry.server", "https://%(server)s/telemetry-dummy");
here? In our configs we don't set any pref that's including telemetry
in its name. Shall we reset the prefs that disable certain features?
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 15•3 years ago
|
||
Chris, maybe you can also have a look at the patches on bug 1662544? What did we wrong in disabling Telemetry?
Comment 16•3 years ago
|
||
The only thing you ever need to do in order to disable data reporting in Firefox is to uncheck the box in about:preferences#privacy
that says "Allow Firefox to send technical and interaction data to Mozilla". This corresponds to the pref datareporting.healthreport.uploadEnabled
which, if false
will tell Firefox Telemetry not to upload anything, and will forbid the Glean SDK from even collecting anything.
...except. The act of flipping that pref from true
to false
shows that the user has removed consent for us to collect data about them. We use this as a signal to trigger self-serve data deletion which means we have to send a "deletion-request" ping containing the anonymous identifiers we need to determine which data to delete on our systems.
If the profile that Marionette was using had the value set to true
(its default), and only later (after the first time FOG inits, to be precise) was it set to false
, then the Glean SDK would craft a "deletion-request" ping and try to upload it. This includes if the value were changed while Firefox wasn't running: if (on a given profile) FOG ever initialized the Glean SDK with data upload enabled, if it inits later with upload disabled it would trigger such a ping. And if the testing prefs didn't say otherwise, the Glean SDK will try to send it to our production servers, which were running HTTP/3 for those two hours last week.
If this is actually what's happening, then before we set datareporting.healthreport.uploadEnabled
to false
we should tell Glean not to use the network by setting telemetry.fog.test.localhost_port
to -1
. (( The reason this wasn't part of bug 1662544 is because this capability was only added later by bug 1737157. And the reason I didn't change the driver prefs in bug 1737157 is because it appeared as though those prefs were being set before FOG was being initialized. (which still may be the case, but if so I'm at a loss for what it is that Glean thinks it should be uploading) ))
(( The third-party modules telemetry is a red herring as that's Legacy Firefox Telemetry (which uses ServiceRequest
), not Glean (which uses viaduct
). ))
Assignee | ||
Comment 17•3 years ago
|
||
Thanks Chris! Here some clarification:
With both Marionette and geckodriver we we prepare the profile before launching Firefox. Hereby we also create a user.js
file which contains all the required preferences that require a restart or as above have to be set before the first start. Hereby we actually set datareporting.healthreport.uploadEnabled
to false
. And as I read it means that Glean will not collect any data, and Telemetry to not upload anything. But there was still a HTTP3 connection and this probably cannot be the deletion-request
ping then. But what is it?
Here the stack from the glean.upload
thread as it can be seen in the crash stack:
https://treeherder.mozilla.org/logviewer?job_id=363967009&repo=autoland&lineNumber=71042
[task 2022-01-13T08:13:08.020Z] 08:13:08 INFO - Thread 91 glean.upload
[task 2022-01-13T08:13:08.021Z] 08:13:08 INFO - 0 ntdll.dll!NtWaitForAlertByThreadId + 0xc
[task 2022-01-13T08:13:08.021Z] 08:13:08 INFO - eip = 0x774646dc esp = 0x2d6ff564 ebp = 0x2d6ff5bc ebx = 0x2d6ff638
[task 2022-01-13T08:13:08.021Z] 08:13:08 INFO - esi = 0x00000000 edi = 0x2bdfe090 eax = 0x00000000 ecx = 0x00000000
[task 2022-01-13T08:13:08.021Z] 08:13:08 INFO - edx = 0x00000000 efl = 0x00000202
[task 2022-01-13T08:13:08.021Z] 08:13:08 INFO - Found by: given as instruction pointer in context
[task 2022-01-13T08:13:08.021Z] 08:13:08 INFO - 1 ntdll.dll!RtlSleepConditionVariableSRW + 0x11c
[task 2022-01-13T08:13:08.022Z] 08:13:08 INFO - eip = 0x774322fd esp = 0x2d6ff568 ebp = 0x2d6ff5bc ebx = 0x2d6ff638
[task 2022-01-13T08:13:08.022Z] 08:13:08 INFO - esi = 0x00000000 edi = 0x2bdfe090
[task 2022-01-13T08:13:08.022Z] 08:13:08 INFO - Found by: call frame info
[task 2022-01-13T08:13:08.022Z] 08:13:08 INFO - 2 KERNELBASE.dll!SleepConditionVariableSRW + 0x22
[task 2022-01-13T08:13:08.022Z] 08:13:08 INFO - eip = 0x76fe48e3 esp = 0x2d6ff5c4 ebp = 0x2d6ff5e0 ebx = 0x2d6ff638
[task 2022-01-13T08:13:08.022Z] 08:13:08 INFO - esi = 0x00000000 edi = 0x2bdfe090
[task 2022-01-13T08:13:08.022Z] 08:13:08 INFO - Found by: call frame info
[task 2022-01-13T08:13:08.023Z] 08:13:08 INFO - 3 mozglue.dll!mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&) [ConditionVariable_windows.cpp:08d254139525017ceb5f98be59d3a5864adbf331 : 58 + 0xe]
[task 2022-01-13T08:13:08.023Z] 08:13:08 INFO - eip = 0x71f13a49 esp = 0x2d6ff5e8 ebp = 0x2d6ff620 ebx = 0x2d6ff638
[task 2022-01-13T08:13:08.023Z] 08:13:08 INFO - esi = 0x00000000 edi = 0x2bdfe090
[task 2022-01-13T08:13:08.023Z] 08:13:08 INFO - Found by: call frame info
[task 2022-01-13T08:13:08.024Z] 08:13:08 INFO - 4 xul.dll!mozilla::OffTheBooksCondVar::Wait(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>) [BlockingResourceBase.cpp:08d254139525017ceb5f98be59d3a5864adbf331 : 534 + 0x10]
[task 2022-01-13T08:13:08.024Z] 08:13:08 INFO - eip = 0x6705aec4 esp = 0x2d6ff628 ebp = 0x2d6ff658 ebx = 0x00000000
[task 2022-01-13T08:13:08.024Z] 08:13:08 INFO - esi = 0x2bdfe0bc edi = 0x247792e0
[task 2022-01-13T08:13:08.024Z] 08:13:08 INFO - Found by: call frame info
[task 2022-01-13T08:13:08.024Z] 08:13:08 INFO - 5 xul.dll!mozilla::OffTheBooksCondVar::Wait() [BlockingResourceBase.cpp:08d254139525017ceb5f98be59d3a5864adbf331 : 514 + 0xb]
[task 2022-01-13T08:13:08.024Z] 08:13:08 INFO - eip = 0x6705adff esp = 0x2d6ff660 ebp = 0x2d6ff668 ebx = 0x2bdfe0bc
[task 2022-01-13T08:13:08.025Z] 08:13:08 INFO - esi = 0x2bdfe030 edi = 0x2cf0b400
[task 2022-01-13T08:13:08.025Z] 08:13:08 INFO - Found by: call frame info
[task 2022-01-13T08:13:08.025Z] 08:13:08 INFO - 6 xul.dll!static mozilla::Viaduct::ViaductCallback(mozilla::`anonymous namespace'::ViaductByteBuffer) [Viaduct.cpp:08d254139525017ceb5f98be59d3a5864adbf331 : 46 + 0x167]
[task 2022-01-13T08:13:08.025Z] 08:13:08 INFO - eip = 0x6c804cd7 esp = 0x2d6ff670 ebp = 0x2d6ff704 ebx = 0x2bdfe0bc
[task 2022-01-13T08:13:08.025Z] 08:13:08 INFO - esi = 0x2bdfe030 edi = 0x2cf0b400
[task 2022-01-13T08:13:08.026Z] 08:13:08 INFO - Found by: call frame info
[task 2022-01-13T08:13:08.026Z] 08:13:08 INFO - 7 xul.dll!viaduct::backend::ffi::impl$1::send(viaduct::backend::ffi::FfiBackend*, viaduct::Request) [ffi.rs:08d254139525017ceb5f98be59d3a5864adbf331 : 51 + 0x56]
[task 2022-01-13T08:13:08.026Z] 08:13:08 INFO - eip = 0x6d336825 esp = 0x2d6ff70c ebp = 0x2d6ff8e4 ebx = 0x00000290
[task 2022-01-13T08:13:08.026Z] 08:13:08 INFO - esi = 0x000001a0 edi = 0x2d6ff740
[task 2022-01-13T08:13:08.026Z] 08:13:08 INFO - Found by: call frame info
[task 2022-01-13T08:13:08.026Z] 08:13:08 INFO - 8 xul.dll!static viaduct::Request::send(viaduct::Request) [lib.rs:08d254139525017ceb5f98be59d3a5864adbf331 : 84 + 0x1ac]
[task 2022-01-13T08:13:08.027Z] 08:13:08 INFO - eip = 0x6d33878f esp = 0x2d6ff8ec ebp = 0x2d6ff9cc ebx = 0x00000001
[task 2022-01-13T08:13:08.027Z] 08:13:08 INFO - esi = 0x2d6ff9c0 edi = 0x2d6ff95c
[task 2022-01-13T08:13:08.027Z] 08:13:08 INFO - Found by: call frame info
[task 2022-01-13T08:13:08.027Z] 08:13:08 INFO - 9 xul.dll!fog_control::init::viaduct_uploader::impl$0::upload(fog_control::init::viaduct_uploader::ViaductUploader*, alloc::string::String, alloc::vec::Vec<u8,alloc::alloc::Global>, alloc::vec::Vec<tuple$<alloc::string::String,alloc::string::String>,alloc::alloc::Global>) [viaduct_uploader.rs:08d254139525017ceb5f98be59d3a5864adbf331 : 25 + 0x729]
[task 2022-01-13T08:13:08.027Z] 08:13:08 INFO - eip = 0x6d32fa11 esp = 0x2d6ff9d4 ebp = 0x2d6ffc18 ebx = 0x2a3997c0
[task 2022-01-13T08:13:08.028Z] 08:13:08 INFO - esi = 0x2d6ffa2c edi = 0x2d6ffb30
[task 2022-01-13T08:13:08.028Z] 08:13:08 INFO - Found by: call frame info
[task 2022-01-13T08:13:08.028Z] 08:13:08 INFO - 10 xul.dll!std::sys_common::backtrace::__rust_begin_short_backtrace<glean::net::impl$0::trigger_upload::closure$0,tuple$<> >(glean::net::impl$0::trigger_upload::closure$0) [backtrace.rs:f1edd0429582dd29cccacaf50fd134b05593bd9c : 123 + 0x5ad]
[task 2022-01-13T08:13:08.028Z] 08:13:08 INFO - eip = 0x6e879455 esp = 0x2d6ffc20 ebp = 0x2d6ffd78 ebx = 0x2d6ffcd8
[task 2022-01-13T08:13:08.028Z] 08:13:08 INFO - esi = 0x2d6ffc70 edi = 0x00000104
[task 2022-01-13T08:13:08.029Z] 08:13:08 INFO - Found by: call frame info
[task 2022-01-13T08:13:08.029Z] 08:13:08 INFO - 11 xul.dll!core::ops::function::FnOnce::call_once<std::thread::impl$0::spawn_unchecked::closure$1,tuple$<> >(std::thread::impl$0::spawn_unchecked::closure$1*) [function.rs:f1edd0429582dd29cccacaf50fd134b05593bd9c : 227 + 0x56]
[task 2022-01-13T08:13:08.029Z] 08:13:08 INFO - eip = 0x6e87cc47 esp = 0x2d6ffd80 ebp = 0x2d6ffd90 ebx = 0x2e4f2688
[task 2022-01-13T08:13:08.029Z] 08:13:08 INFO - esi = 0x2f683460 edi = 0x6e07d6e0
[task 2022-01-13T08:13:08.029Z] 08:13:08 INFO - Found by: call frame info
[task 2022-01-13T08:13:08.030Z] 08:13:08 INFO - 12 xul.dll!std::sys::windows::thread::impl$0::new::thread_start() [thread.rs:f1edd0429582dd29cccacaf50fd134b05593bd9c : 58 + 0x18]
[task 2022-01-13T08:13:08.030Z] 08:13:08 INFO - eip = 0x6e07d746 esp = 0x2d6ffd98 ebp = 0x2d6ffdc4 ebx = 0x2e4f2688
[task 2022-01-13T08:13:08.030Z] 08:13:08 INFO - esi = 0x6e07d6e0 edi = 0x6e07d6e0
[task 2022-01-13T08:13:08.030Z] 08:13:08 INFO - Found by: call frame info
[task 2022-01-13T08:13:08.030Z] 08:13:08 INFO - 13 kernel32.dll!BaseThreadInitThunk + 0x18
[task 2022-01-13T08:13:08.031Z] 08:13:08 INFO - eip = 0x76d8fa29 esp = 0x2d6ffdcc ebp = 0x2d6ffdd4 ebx = 0x2e4f2688
[task 2022-01-13T08:13:08.031Z] 08:13:08 INFO - esi = 0x6e07d6e0 edi = 0x6e07d6e0
[task 2022-01-13T08:13:08.031Z] 08:13:08 INFO - Found by: call frame info
[task 2022-01-13T08:13:08.031Z] 08:13:08 INFO - 14 mozglue.dll!patched_BaseThreadInitThunk(int, void*, void*) [WindowsDllBlocklist.cpp:08d254139525017ceb5f98be59d3a5864adbf331 : 574 + 0xa]
[task 2022-01-13T08:13:08.031Z] 08:13:08 INFO - eip = 0x71f2b8f8 esp = 0x2d6ffddc ebp = 0x2d6ffe18 ebx = 0x2e4f2688
[task 2022-01-13T08:13:08.031Z] 08:13:08 INFO - esi = 0x6e07d6e0 edi = 0x6e07d6e0
[task 2022-01-13T08:13:08.032Z] 08:13:08 INFO - Found by: call frame info
[task 2022-01-13T08:13:08.032Z] 08:13:08 INFO - 15 ntdll.dll!_RtlUserThreadStart + 0x2e
[task 2022-01-13T08:13:08.032Z] 08:13:08 INFO - eip = 0x77457a9e esp = 0x2d6ffe20 ebp = 0x2d6ffe74 ebx = 0x2e4f2688
[task 2022-01-13T08:13:08.032Z] 08:13:08 INFO - esi = 0x76d8fa10 edi = 0x6e07d6e0
[task 2022-01-13T08:13:08.032Z] 08:13:08 INFO - Found by: call frame info
[task 2022-01-13T08:13:08.032Z] 08:13:08 INFO - 16 ntdll.dll!_RtlUserThreadStart + 0x1a
[task 2022-01-13T08:13:08.033Z] 08:13:08 INFO - eip = 0x77457a6e esp = 0x2d6ffe7c ebp = 0x2d6ffe84 ebx = 0x2e4f2688
[task 2022-01-13T08:13:08.033Z] 08:13:08 INFO - esi = 0x76d8fa10 edi = 0x6e07d6e0
[task 2022-01-13T08:13:08.033Z] 08:13:08 INFO - Found by: call frame info
Comment 18•3 years ago
|
||
The stack shows that Glean's performing a ping upload. If datareporting.healthreport.uploadEnabled
is false
, then the only ping the Glean SDK could be uploading is a "deletion-request" ping.
The profile directory is used for storage. Is there any possibility of the profile directory being reused between runs?
(( The solution remains the same: set telemetry.fog.test.localhost_port
to -1
and FOG's viaduct_uploader will early return on frame 9... but if not-"deletion-request" pings are being crafted and uploaded in defiance of the data upload pref, that's not a good thing. ))
Assignee | ||
Comment 19•3 years ago
|
||
Thanks Chris!
So I had another look at the logs and actually checked which tests are causing this failure. Interestingly it's always the ame one: test_replace_with_external_profile
.
In this test we make use of a profile as generated by mozrunner. And for this newly generated profile we DON'T set the automation preferences. As such Telemetry will be enabled by default.
The reason why it only crashes for this test is that all the other tests actually clone the profile and reusing the current preferences, which include the required automation preferences.
To get it fixed we will have to pass all current preferences to mozrunner when creating this new profile.
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 20•3 years ago
|
||
I'll have a patch soon but note that Marionette still seems to cause network requests to external sites. At least that is what I can see when generating a gecko profile: https://share.firefox.dev/3Aeoo7U. I'll take care of this over on bug 1371576.
Assignee | ||
Comment 21•3 years ago
|
||
Comment 22•3 years ago
|
||
Comment 23•3 years ago
|
||
bugherder |
Updated•2 years ago
|
Description
•