Closed Bug 1749981 Opened 3 years ago Closed 3 years ago

Marionette unit test "test_profile_management.py | test_replace_with_external_profile" doesn't set required preferences

Categories

(Remote Protocol :: Marionette, defect, P3)

Default
defect

Tracking

(firefox98 fixed)

RESOLVED FIXED
98 Branch
Tracking Status
firefox98 --- fixed

People

(Reporter: aryx, Assigned: whimboo)

References

Details

Attachments

(1 file)

Marionette tests failed today with the same crash signature which affected users in bug 1749910.

Does it submit data to telemetry (and should it not run this)? Or does a different connection explain the failure?

The assertion right before the crash includes the following in the stack:

https://treeherder.mozilla.org/logviewer?job_id=363967009&repo=autoland&lineNumber=64107-64141

[task 2022-01-13T08:11:05.139Z] 08:11:05     INFO -  Assertion failure: false (We are done sending this request!), at /builds/worker/checkouts/gecko/netwerk/protocol/http/Http3Stream.cpp:273
[..]
[task 2022-01-13T08:11:05.337Z] 08:11:05     INFO -  #23: BaseThreadInitThunk[C:\Windows\System32\KERNEL32.DLL +0x1fa29]
[task 2022-01-13T08:11:05.337Z] 08:11:05     INFO -  #24: DllBlocklist_Initialize[Z:\task_164205822091498\build\application\firefox\mozglue.dll +0x7b8f8]
[task 2022-01-13T08:11:05.338Z] 08:11:05     INFO -  #25: RtlGetAppContainerNamedObjectPath[C:\Windows\SYSTEM32\ntdll.dll +0x67a9e]
[task 2022-01-13T08:11:05.338Z] 08:11:05     INFO -  #26: RtlGetAppContainerNamedObjectPath[C:\Windows\SYSTEM32\ntdll.dll +0x67a6e]

So I assume this is related to the DLL BlockList feature?

I still don't know what users actually hit in bug 1749910 and why it's fixed now.

Flags: needinfo?(aryx.bugmail)

Sounds like fetching the DLL blocklist can also hit the HTTP/3 bug.

Flags: needinfo?(aryx.bugmail) → needinfo?(jwalker)

Given that we disable Telemetry for Marionette (Remote Protocol) I'm going to remove it from the summary and make it an intermittent bug.

Not sure if we should / can stop using the DLL blocklist for Marionette tests.

Summary: marionette tests should not connect to telemetry → Firefox crash with Marionette when updating DLL blocklist? [@ mozilla::net::Http3Stream::OnReadSegment(char const*, unsigned int, unsigned int*)]

I assume that we can mark it as duplicate of bug 1749957? Or is any follow-up necessary?

Flags: needinfo?(aryx.bugmail)

This bug is about the test environment to a server with got Firefox into the bad state else we shouldn't see the crashes. These kind of connections should be prevented (other test suites didn't fail - there must be a difference).

Flags: needinfo?(aryx.bugmail)

(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #2)

Sounds like fetching the DLL blocklist can also hit the HTTP/3 bug.

Thanks Sebastian, I'll make sure people are aware.

Flags: needinfo?(jwalker)

ni? toshi in case there are any changes we should make to the DLL blocklist download code.

Flags: needinfo?(tkikuchi)
Severity: -- → S1
Priority: -- → P1

Gian-Carlo, can we please get a new bug filed for whatever is need in core to not cause such a crash? I would take this bug for the Marionette changes to actually disable the blocklist for testing similar to:

https://searchfox.org/mozilla-central/rev/3de56eb5f266f523340e739ae1b53258e0a95dfe/testing/profiles/unittest-required/user.js#202-203

Flags: needinfo?(gpascutto)

Cloned into bug 1750173.

No longer blocks: 1750173
Severity: S1 → --
Flags: needinfo?(gpascutto)
Priority: P1 → --
Flags: needinfo?(tkikuchi)

I closed that bug again: Toshihito confirmed the DLL blocklist is entirely static (it needs to work before we load any Windows networking DLLs!), and doesn't update over the net.

Looking at the crash log, the stack pasted is corrupt and not the actual one the warning came from:

[task 2022-01-13T08:13:06.839Z] 08:13:06     INFO -  Thread 7 Socket Thread (crashed)
[task 2022-01-13T08:13:06.839Z] 08:13:06     INFO -   0  xul.dll!mozilla::net::Http3Stream::OnReadSegment(char const*, unsigned int, unsigned int*) [Http3Stream.cpp:08d254139525017ceb5f98be59d3a5864adbf331 : 273 + 0x0]
[task 2022-01-13T08:13:06.840Z] 08:13:06     INFO -   1  xul.dll!static mozilla::net::nsHttpTransaction::ReadRequestSegment(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*) [nsHttpTransaction.cpp:08d254139525017ceb5f98be59d3a5864adbf331 : 735 + 0x20]
[task 2022-01-13T08:13:06.841Z] 08:13:06     INFO -   2  xul.dll!nsBufferedInputStream::ReadSegments(nsresult (*)(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*), void*, unsigned int, unsigned int*) [nsBufferedStreams.cpp:08d254139525017ceb5f98be59d3a5864adbf331 : 433 + 0x21]
(there's more, but nothing revealing the problem)
(and here we see some DLL Blocklist stack simply because that patched the threadstart, unrelated to the problem)

But it's not obvious who was doing what that caused the action on the Socket Thread, as it's just processing a message from its event loop. While this happened the DLL Blocklist and third party stuff was indeed running, but that seems to be because they were still busy hashing the DLLs injected into Firefox, and they're not necessarily the origin of the network query.

Interesting. Is there any way that I could use to reproduce this failure? Otherwise it will be impossible to actually get this investigated and fixed.

I just checked our settings and we indeed do now allow the blocklist to access the network:
https://searchfox.org/mozilla-central/rev/3de56eb5f266f523340e739ae1b53258e0a95dfe/testing/marionette/client/marionette_driver/geckoinstance.py#114-115

Severity: -- → S3
Priority: -- → P3

Okay, maybe it's not a coincidence:

task 2022-01-13T08:13:08.020Z] 08:13:08     INFO -  Thread 91 glean.upload
...
[task 2022-01-13T08:13:08.025Z] 08:13:08     INFO -   6  xul.dll!static mozilla::Viaduct::ViaductCallback(mozilla::`anonymous namespace'::ViaductByteBuffer) [Viaduct.cpp:08d254139525017ceb5f98be59d3a5864adbf331 : 46 + 0x167]
...

So we hit the known issue with Viaduct again. But maybe the fact that we're collecting DLL hashes at exactly this time isn't a coincidence, as we have third-party module telemetry (bug 1686229).

So if it' indeed Telemetry which exact preferences would have to be set to make glean to not upload anything? Maybe we indeed miss to set a preference.

I had a look at the testing profiles under /testing/profiles for telemetry occurrences and these I found:

https://searchfox.org/mozilla-central/search?q=telemetry&path=testing%2Fprofiles

Note that over on bug 1662544 we seem to have only disabled various telemetry features. So is it enough to set user_pref("toolkit.telemetry.server", "https://%(server)s/telemetry-dummy"); here? In our configs we don't set any pref that's including telemetry in its name. Shall we reset the prefs that disable certain features?

Flags: needinfo?(chutten)
See Also: → 1662544
Summary: Firefox crash with Marionette when updating DLL blocklist? [@ mozilla::net::Http3Stream::OnReadSegment(char const*, unsigned int, unsigned int*)] → Firefox crash with Marionette enabled [@ mozilla::net::Http3Stream::OnReadSegment(char const*, unsigned int, unsigned int*)]

Chris, maybe you can also have a look at the patches on bug 1662544? What did we wrong in disabling Telemetry?

The only thing you ever need to do in order to disable data reporting in Firefox is to uncheck the box in about:preferences#privacy that says "Allow Firefox to send technical and interaction data to Mozilla". This corresponds to the pref datareporting.healthreport.uploadEnabled which, if false will tell Firefox Telemetry not to upload anything, and will forbid the Glean SDK from even collecting anything.

...except. The act of flipping that pref from true to false shows that the user has removed consent for us to collect data about them. We use this as a signal to trigger self-serve data deletion which means we have to send a "deletion-request" ping containing the anonymous identifiers we need to determine which data to delete on our systems.

If the profile that Marionette was using had the value set to true (its default), and only later (after the first time FOG inits, to be precise) was it set to false, then the Glean SDK would craft a "deletion-request" ping and try to upload it. This includes if the value were changed while Firefox wasn't running: if (on a given profile) FOG ever initialized the Glean SDK with data upload enabled, if it inits later with upload disabled it would trigger such a ping. And if the testing prefs didn't say otherwise, the Glean SDK will try to send it to our production servers, which were running HTTP/3 for those two hours last week.

If this is actually what's happening, then before we set datareporting.healthreport.uploadEnabled to false we should tell Glean not to use the network by setting telemetry.fog.test.localhost_port to -1. (( The reason this wasn't part of bug 1662544 is because this capability was only added later by bug 1737157. And the reason I didn't change the driver prefs in bug 1737157 is because it appeared as though those prefs were being set before FOG was being initialized. (which still may be the case, but if so I'm at a loss for what it is that Glean thinks it should be uploading) ))

(( The third-party modules telemetry is a red herring as that's Legacy Firefox Telemetry (which uses ServiceRequest), not Glean (which uses viaduct). ))

Flags: needinfo?(chutten)

Thanks Chris! Here some clarification:

With both Marionette and geckodriver we we prepare the profile before launching Firefox. Hereby we also create a user.js file which contains all the required preferences that require a restart or as above have to be set before the first start. Hereby we actually set datareporting.healthreport.uploadEnabled to false. And as I read it means that Glean will not collect any data, and Telemetry to not upload anything. But there was still a HTTP3 connection and this probably cannot be the deletion-request ping then. But what is it?

Here the stack from the glean.upload thread as it can be seen in the crash stack:
https://treeherder.mozilla.org/logviewer?job_id=363967009&repo=autoland&lineNumber=71042

[task 2022-01-13T08:13:08.020Z] 08:13:08     INFO -  Thread 91 glean.upload
[task 2022-01-13T08:13:08.021Z] 08:13:08     INFO -   0  ntdll.dll!NtWaitForAlertByThreadId + 0xc
[task 2022-01-13T08:13:08.021Z] 08:13:08     INFO -      eip = 0x774646dc   esp = 0x2d6ff564   ebp = 0x2d6ff5bc   ebx = 0x2d6ff638
[task 2022-01-13T08:13:08.021Z] 08:13:08     INFO -      esi = 0x00000000   edi = 0x2bdfe090   eax = 0x00000000   ecx = 0x00000000
[task 2022-01-13T08:13:08.021Z] 08:13:08     INFO -      edx = 0x00000000   efl = 0x00000202
[task 2022-01-13T08:13:08.021Z] 08:13:08     INFO -      Found by: given as instruction pointer in context
[task 2022-01-13T08:13:08.021Z] 08:13:08     INFO -   1  ntdll.dll!RtlSleepConditionVariableSRW + 0x11c
[task 2022-01-13T08:13:08.022Z] 08:13:08     INFO -      eip = 0x774322fd   esp = 0x2d6ff568   ebp = 0x2d6ff5bc   ebx = 0x2d6ff638
[task 2022-01-13T08:13:08.022Z] 08:13:08     INFO -      esi = 0x00000000   edi = 0x2bdfe090
[task 2022-01-13T08:13:08.022Z] 08:13:08     INFO -      Found by: call frame info
[task 2022-01-13T08:13:08.022Z] 08:13:08     INFO -   2  KERNELBASE.dll!SleepConditionVariableSRW + 0x22
[task 2022-01-13T08:13:08.022Z] 08:13:08     INFO -      eip = 0x76fe48e3   esp = 0x2d6ff5c4   ebp = 0x2d6ff5e0   ebx = 0x2d6ff638
[task 2022-01-13T08:13:08.022Z] 08:13:08     INFO -      esi = 0x00000000   edi = 0x2bdfe090
[task 2022-01-13T08:13:08.022Z] 08:13:08     INFO -      Found by: call frame info
[task 2022-01-13T08:13:08.023Z] 08:13:08     INFO -   3  mozglue.dll!mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&) [ConditionVariable_windows.cpp:08d254139525017ceb5f98be59d3a5864adbf331 : 58 + 0xe]
[task 2022-01-13T08:13:08.023Z] 08:13:08     INFO -      eip = 0x71f13a49   esp = 0x2d6ff5e8   ebp = 0x2d6ff620   ebx = 0x2d6ff638
[task 2022-01-13T08:13:08.023Z] 08:13:08     INFO -      esi = 0x00000000   edi = 0x2bdfe090
[task 2022-01-13T08:13:08.023Z] 08:13:08     INFO -      Found by: call frame info
[task 2022-01-13T08:13:08.024Z] 08:13:08     INFO -   4  xul.dll!mozilla::OffTheBooksCondVar::Wait(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>) [BlockingResourceBase.cpp:08d254139525017ceb5f98be59d3a5864adbf331 : 534 + 0x10]
[task 2022-01-13T08:13:08.024Z] 08:13:08     INFO -      eip = 0x6705aec4   esp = 0x2d6ff628   ebp = 0x2d6ff658   ebx = 0x00000000
[task 2022-01-13T08:13:08.024Z] 08:13:08     INFO -      esi = 0x2bdfe0bc   edi = 0x247792e0
[task 2022-01-13T08:13:08.024Z] 08:13:08     INFO -      Found by: call frame info
[task 2022-01-13T08:13:08.024Z] 08:13:08     INFO -   5  xul.dll!mozilla::OffTheBooksCondVar::Wait() [BlockingResourceBase.cpp:08d254139525017ceb5f98be59d3a5864adbf331 : 514 + 0xb]
[task 2022-01-13T08:13:08.024Z] 08:13:08     INFO -      eip = 0x6705adff   esp = 0x2d6ff660   ebp = 0x2d6ff668   ebx = 0x2bdfe0bc
[task 2022-01-13T08:13:08.025Z] 08:13:08     INFO -      esi = 0x2bdfe030   edi = 0x2cf0b400
[task 2022-01-13T08:13:08.025Z] 08:13:08     INFO -      Found by: call frame info
[task 2022-01-13T08:13:08.025Z] 08:13:08     INFO -   6  xul.dll!static mozilla::Viaduct::ViaductCallback(mozilla::`anonymous namespace'::ViaductByteBuffer) [Viaduct.cpp:08d254139525017ceb5f98be59d3a5864adbf331 : 46 + 0x167]
[task 2022-01-13T08:13:08.025Z] 08:13:08     INFO -      eip = 0x6c804cd7   esp = 0x2d6ff670   ebp = 0x2d6ff704   ebx = 0x2bdfe0bc
[task 2022-01-13T08:13:08.025Z] 08:13:08     INFO -      esi = 0x2bdfe030   edi = 0x2cf0b400
[task 2022-01-13T08:13:08.026Z] 08:13:08     INFO -      Found by: call frame info
[task 2022-01-13T08:13:08.026Z] 08:13:08     INFO -   7  xul.dll!viaduct::backend::ffi::impl$1::send(viaduct::backend::ffi::FfiBackend*, viaduct::Request) [ffi.rs:08d254139525017ceb5f98be59d3a5864adbf331 : 51 + 0x56]
[task 2022-01-13T08:13:08.026Z] 08:13:08     INFO -      eip = 0x6d336825   esp = 0x2d6ff70c   ebp = 0x2d6ff8e4   ebx = 0x00000290
[task 2022-01-13T08:13:08.026Z] 08:13:08     INFO -      esi = 0x000001a0   edi = 0x2d6ff740
[task 2022-01-13T08:13:08.026Z] 08:13:08     INFO -      Found by: call frame info
[task 2022-01-13T08:13:08.026Z] 08:13:08     INFO -   8  xul.dll!static viaduct::Request::send(viaduct::Request) [lib.rs:08d254139525017ceb5f98be59d3a5864adbf331 : 84 + 0x1ac]
[task 2022-01-13T08:13:08.027Z] 08:13:08     INFO -      eip = 0x6d33878f   esp = 0x2d6ff8ec   ebp = 0x2d6ff9cc   ebx = 0x00000001
[task 2022-01-13T08:13:08.027Z] 08:13:08     INFO -      esi = 0x2d6ff9c0   edi = 0x2d6ff95c
[task 2022-01-13T08:13:08.027Z] 08:13:08     INFO -      Found by: call frame info
[task 2022-01-13T08:13:08.027Z] 08:13:08     INFO -   9  xul.dll!fog_control::init::viaduct_uploader::impl$0::upload(fog_control::init::viaduct_uploader::ViaductUploader*, alloc::string::String, alloc::vec::Vec<u8,alloc::alloc::Global>, alloc::vec::Vec<tuple$<alloc::string::String,alloc::string::String>,alloc::alloc::Global>) [viaduct_uploader.rs:08d254139525017ceb5f98be59d3a5864adbf331 : 25 + 0x729]
[task 2022-01-13T08:13:08.027Z] 08:13:08     INFO -      eip = 0x6d32fa11   esp = 0x2d6ff9d4   ebp = 0x2d6ffc18   ebx = 0x2a3997c0
[task 2022-01-13T08:13:08.028Z] 08:13:08     INFO -      esi = 0x2d6ffa2c   edi = 0x2d6ffb30
[task 2022-01-13T08:13:08.028Z] 08:13:08     INFO -      Found by: call frame info
[task 2022-01-13T08:13:08.028Z] 08:13:08     INFO -  10  xul.dll!std::sys_common::backtrace::__rust_begin_short_backtrace<glean::net::impl$0::trigger_upload::closure$0,tuple$<> >(glean::net::impl$0::trigger_upload::closure$0) [backtrace.rs:f1edd0429582dd29cccacaf50fd134b05593bd9c : 123 + 0x5ad]
[task 2022-01-13T08:13:08.028Z] 08:13:08     INFO -      eip = 0x6e879455   esp = 0x2d6ffc20   ebp = 0x2d6ffd78   ebx = 0x2d6ffcd8
[task 2022-01-13T08:13:08.028Z] 08:13:08     INFO -      esi = 0x2d6ffc70   edi = 0x00000104
[task 2022-01-13T08:13:08.029Z] 08:13:08     INFO -      Found by: call frame info
[task 2022-01-13T08:13:08.029Z] 08:13:08     INFO -  11  xul.dll!core::ops::function::FnOnce::call_once<std::thread::impl$0::spawn_unchecked::closure$1,tuple$<> >(std::thread::impl$0::spawn_unchecked::closure$1*) [function.rs:f1edd0429582dd29cccacaf50fd134b05593bd9c : 227 + 0x56]
[task 2022-01-13T08:13:08.029Z] 08:13:08     INFO -      eip = 0x6e87cc47   esp = 0x2d6ffd80   ebp = 0x2d6ffd90   ebx = 0x2e4f2688
[task 2022-01-13T08:13:08.029Z] 08:13:08     INFO -      esi = 0x2f683460   edi = 0x6e07d6e0
[task 2022-01-13T08:13:08.029Z] 08:13:08     INFO -      Found by: call frame info
[task 2022-01-13T08:13:08.030Z] 08:13:08     INFO -  12  xul.dll!std::sys::windows::thread::impl$0::new::thread_start() [thread.rs:f1edd0429582dd29cccacaf50fd134b05593bd9c : 58 + 0x18]
[task 2022-01-13T08:13:08.030Z] 08:13:08     INFO -      eip = 0x6e07d746   esp = 0x2d6ffd98   ebp = 0x2d6ffdc4   ebx = 0x2e4f2688
[task 2022-01-13T08:13:08.030Z] 08:13:08     INFO -      esi = 0x6e07d6e0   edi = 0x6e07d6e0
[task 2022-01-13T08:13:08.030Z] 08:13:08     INFO -      Found by: call frame info
[task 2022-01-13T08:13:08.030Z] 08:13:08     INFO -  13  kernel32.dll!BaseThreadInitThunk + 0x18
[task 2022-01-13T08:13:08.031Z] 08:13:08     INFO -      eip = 0x76d8fa29   esp = 0x2d6ffdcc   ebp = 0x2d6ffdd4   ebx = 0x2e4f2688
[task 2022-01-13T08:13:08.031Z] 08:13:08     INFO -      esi = 0x6e07d6e0   edi = 0x6e07d6e0
[task 2022-01-13T08:13:08.031Z] 08:13:08     INFO -      Found by: call frame info
[task 2022-01-13T08:13:08.031Z] 08:13:08     INFO -  14  mozglue.dll!patched_BaseThreadInitThunk(int, void*, void*) [WindowsDllBlocklist.cpp:08d254139525017ceb5f98be59d3a5864adbf331 : 574 + 0xa]
[task 2022-01-13T08:13:08.031Z] 08:13:08     INFO -      eip = 0x71f2b8f8   esp = 0x2d6ffddc   ebp = 0x2d6ffe18   ebx = 0x2e4f2688
[task 2022-01-13T08:13:08.031Z] 08:13:08     INFO -      esi = 0x6e07d6e0   edi = 0x6e07d6e0
[task 2022-01-13T08:13:08.032Z] 08:13:08     INFO -      Found by: call frame info
[task 2022-01-13T08:13:08.032Z] 08:13:08     INFO -  15  ntdll.dll!_RtlUserThreadStart + 0x2e
[task 2022-01-13T08:13:08.032Z] 08:13:08     INFO -      eip = 0x77457a9e   esp = 0x2d6ffe20   ebp = 0x2d6ffe74   ebx = 0x2e4f2688
[task 2022-01-13T08:13:08.032Z] 08:13:08     INFO -      esi = 0x76d8fa10   edi = 0x6e07d6e0
[task 2022-01-13T08:13:08.032Z] 08:13:08     INFO -      Found by: call frame info
[task 2022-01-13T08:13:08.032Z] 08:13:08     INFO -  16  ntdll.dll!_RtlUserThreadStart + 0x1a
[task 2022-01-13T08:13:08.033Z] 08:13:08     INFO -      eip = 0x77457a6e   esp = 0x2d6ffe7c   ebp = 0x2d6ffe84   ebx = 0x2e4f2688
[task 2022-01-13T08:13:08.033Z] 08:13:08     INFO -      esi = 0x76d8fa10   edi = 0x6e07d6e0
[task 2022-01-13T08:13:08.033Z] 08:13:08     INFO -      Found by: call frame info
Flags: needinfo?(chutten)

The stack shows that Glean's performing a ping upload. If datareporting.healthreport.uploadEnabled is false, then the only ping the Glean SDK could be uploading is a "deletion-request" ping.

The profile directory is used for storage. Is there any possibility of the profile directory being reused between runs?

(( The solution remains the same: set telemetry.fog.test.localhost_port to -1 and FOG's viaduct_uploader will early return on frame 9... but if not-"deletion-request" pings are being crafted and uploaded in defiance of the data upload pref, that's not a good thing. ))

Flags: needinfo?(chutten)

Thanks Chris!

So I had another look at the logs and actually checked which tests are causing this failure. Interestingly it's always the ame one: test_replace_with_external_profile.

In this test we make use of a profile as generated by mozrunner. And for this newly generated profile we DON'T set the automation preferences. As such Telemetry will be enabled by default.

The reason why it only crashes for this test is that all the other tests actually clone the profile and reusing the current preferences, which include the required automation preferences.

To get it fixed we will have to pass all current preferences to mozrunner when creating this new profile.

Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Summary: Firefox crash with Marionette enabled [@ mozilla::net::Http3Stream::OnReadSegment(char const*, unsigned int, unsigned int*)] → Marionette unit test "test_profile_management.py | test_replace_with_external_profile" doesn't set required preferences

I'll have a patch soon but note that Marionette still seems to cause network requests to external sites. At least that is what I can see when generating a gecko profile: https://share.firefox.dev/3Aeoo7U. I'll take care of this over on bug 1371576.

See Also: → 1371576
Pushed by hskupin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/1168f9e487af [marionette] Fix test_profile_management.py to use required preferences for external profile. r=webdriver-reviewers,jdescottes
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 98 Branch
Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: