Remove the main/public-suffix-list collection
Categories
(Firefox :: Remote Settings Client, task)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox136 | --- | fixed |
People
(Reporter: jewilde, Assigned: leplatrem, Mentored)
References
Details
(Keywords: good-first-bug)
Attachments
(1 file)
The public suffix list doesn't seem to be updated via Remote Settings any longer. As the PSL is a sensitive area can the collection itself be deleted and the corresponding code for updating via Remote Settings be removed from tree?
Comment 1•3 years ago
|
||
In particular, it seems like we tried updating it via Remote Settings and found that it would break things and that we probably wouldn't try to fix it. Even if we leave most of the code in the browser, it would be great to disconnect it in a surgical way so that should the bucket ever be updated it won't cause any effect.
Comment 2•1 year ago
|
||
So should this code and the remote collection be removed? It looks like we're still running it on every startup (though on idle)...
Comment 3•1 year ago
|
||
IMO, yes, but someone from networking should confirm.
Yes, this isn't being used so I think it can be removed along with PublicSuffixList.sys.mjs and code that uses it. Also the two tests
Comment 5•1 year ago
|
||
comment 4 is a pretty good starting point for doing this work.
| Assignee | ||
Comment 6•1 year ago
|
||
Updated•1 year ago
|
Comment 9•1 year ago
•
|
||
Backed out for causing failures at nsEffectiveTLDService.cpp.
Backout link: https://hg.mozilla.org/integration/autoland/rev/460cd2e6a8e27cec9001282ce1899bed2d487858
Failure log:
https://treeherder.mozilla.org/logviewer?job_id=490595113&repo=autoland&lineNumber=9226
https://treeherder.mozilla.org/logviewer?job_id=490591144&repo=autoland&lineNumber=4832
| Assignee | ||
Comment 10•1 year ago
|
||
This is frustrating.
I ran the tests with ./mach try auto and the tests passed:
https://treeherder.mozilla.org/push-health/push?repo=try&revision=21caa964d12fec4b1376222b8856a6bd625a5364&tab=tests&testGroup=pr
I obviously don't reproduce this failure locally:
$ ./mach test netwerk/test/browser/browser_bug1535877.js
...
Ran 2 checks (1 subtests, 1 tests)
Expected results: 2
Unexpected results: 0
OK
The stacktrace is relatively clear. nsEffectiveTLDService::GetInstance() was called, and nsEffectiveTLDService::Init() wasn't.
[task 2025-01-17T12:59:05.357Z] 12:59:05 INFO - Initializing stack-fixing for the first stack frame, this may take a while...
[task 2025-01-17T12:59:18.277Z] 12:59:18 INFO - GECKO(5906) | #01: nsEffectiveTLDService::GetInstance() [netwerk/dns/nsEffectiveTLDService.cpp:76]
[task 2025-01-17T12:59:18.278Z] 12:59:18 INFO - GECKO(5906) | #02: mozilla::nsRFPService::GetOverriddenFingerprintingSettingsForURI(nsIURI*, nsIURI*) [toolkit/components/resistfingerprinting/nsRFPService.cpp:2284]
[task 2025-01-17T12:59:18.278Z] 12:59:18 INFO - GECKO(5906) | #03: mozilla::nsRFPService::GetOverriddenFingerprintingSettingsForChannel(nsIChannel*) [toolkit/components/resistfingerprinting/nsRFPService.cpp:2156]
[task 2025-01-17T12:59:18.278Z] 12:59:18 INFO - GECKO(5906) | #04: mozilla::AntiTrackingUtils::UpdateAntiTrackingInfoForChannel(nsIChannel*) [toolkit/components/antitracking/AntiTrackingUtils.cpp:1117]
[task 2025-01-17T12:59:18.279Z] 12:59:18 INFO - GECKO(5906) | #05: nsBaseChannel::AsyncOpen(nsIStreamListener*) [netwerk/base/nsBaseChannel.cpp:684]
...
- the removed code in
PublicSuffixList.sys.mjsdid not send the"public-suffix-list-updated"signal from its init (from browserglue) - the removed observer code in
nsEffectiveTLDServicedid not callnsEffectiveTLDService::Init()either
Could someone with more experience explain me how my patch could have an influence on the order of calls between nsEffectiveTLDService::Init() and nsEffectiveTLDService::GetInstance(), or how it could have introduced a race-condition please?
Gijs, I'm sorry to bother you with this, but I may have to call your super powers to the rescue 😬
Thank you 🙏
Comment 11•1 year ago
|
||
If you replace this code with
nsCOMPtr<nsIEffectiveTLDService> tldService =
do_GetService(NS_EFFECTIVETLDSERVICE_CONTRACTID);
Does that fix things? I'm guessing here, but my theory is that nsEffectiveTLDService::Init() was getting called as a result of a do_GetService(NS_EFFECTIVETLDSERVICE_CONTRACTID); call somewhere as a result of the code removing, and that took care of Init which seems to only be called when you use do_GetService. But I'm only making educated guesses about all of that.
Comment 13•1 year ago
•
|
||
I think Tom is probably right in terms of that working... or at least I hope he is.
I'm actually a bit confused about how this is all supposed to work. Ehsan created this helper in https://phabricator.services.mozilla.com/D20239 and friends, and Nika reviewed. But at the time, the internals of GetInstance called the do_getService helper, which if the service doesn't exist will call CreateInstance which in turn will call Init() (see generated code). Then later changes made the internals of GetInstance use tldService = mozilla::components::EffectiveTLD::Service() instead. I would expect that to be doing the same as the do_getService call but apparently it's not? Basically I don't understand. Nika reviewed Ehsan's patch and presumably understands all the goop and where the goop is currently not working...
Comment 14•1 year ago
•
|
||
Actually... a bunch of the crashes are from this really strange test: https://searchfox.org/mozilla-central/source/netwerk/test/browser/browser_bug1535877.js which tries to manually create the service. In a browser test. I would not be surprised if right now, the fact that the PSL sys.mjs module creates the effective tld service makes this work, and afterwards, nothing else bothers creating the service, and so that createInstance call is the first, or something?
Anyway, that test is weird, and probably not helping us.
However, there are also other crashes, e.g. https://treeherder.mozilla.org/logviewer?job_id=490592156&repo=autoland&lineNumber=19553 . And I don't know what is going on there - it suggests the DAFSA is broken. Though, this is the test after the dodgy test above. Maaaaybe the createInstance thing from the test created the service, and then we GC'd it because nothing hangs on to the result of createInstance, and then because gService is a raw pointer that just points to garbage and we crash trying to use it?
I wonder if the whole thing would go away if you add a line to the dodgy test that simply does let foo = Services.eTLD; before the weirdo createInstance call (to ensure the service has been created "properly" before we try to do weird things like manually calling createInstance). It certainly seems worth a shot.
Leaving needinfo for Nika who can hopefully see a way through all the wild theories here. :-)
Comment 15•1 year ago
|
||
(In reply to :Gijs (he/him) from comment #14)
Actually... a bunch of the crashes are from this really strange test: https://searchfox.org/mozilla-central/source/netwerk/test/browser/browser_bug1535877.js which tries to manually create the service. In a
browsertest. I would not be surprised if right now, the fact that the PSLsys.mjsmodule creates the effective tld service makes this work, and afterwards, nothing else bothers creating the service, and so thatcreateInstancecall is the first, or something?
That createInstance call does look quite suspicious. As mentioned in the original bug 1535877, this is a service and should not be createInstance-ed. In the patch in that bug some code to return a runtime error was added, though it appears that both ehsan and I missed that the call would lead to this kind of breakage if the createInstance call is not the first call to create the eTLD service.
My guess is that the temporary object is being created, and quickly destroyed due to the error. This hits this line in the destructor which is causing the static to be cleared.
This then becomes a problem if some code tries to call nsEffectiveTLDService::GetInstance again (specifically that method) after the broken createInstance call. I'm not entirely certain why that wouldn't have been happening before these changes though, as if my reading of the bug are correct, the service would need to nowadays be accessed both before & after that test (with after being with GetInstance), and previously only accessed either before or after, which seems unlikely.
The smallest "fix" here would probably be to change this line to instead read:
if (gService == this) {
gService = nullptr;
}
Not the most elegant solution, but might be worth trying out that change to see if it fixes the issue, which would suggest that I'm correct about what's going wrong.
However, there are also other crashes, e.g. https://treeherder.mozilla.org/logviewer?job_id=490592156&repo=autoland&lineNumber=19553 . And I don't know what is going on there - it suggests the DAFSA is broken. Though, this is the test after the dodgy test above. Maaaaybe the
createInstancething from the test created the service, and then we GC'd it because nothing hangs on to the result ofcreateInstance, and then becausegServiceis a raw pointer that just points to garbage and we crash trying to use it?
A use of the eTLD service after the broken createInstance call definitely seems like it would quite possibly fail.
According to this:
https://searchfox.org/mozilla-central/rev/9a66d18cb35595c89f499a1011c9dd7e573fce77/build/docs/defining-xpcom-components.rst#146-149
``singleton`` (optional, default=``False``)
If true, this component's constructor is expected to return the same
singleton for every call, and no ``mozilla::components::<name>::Create()``
method will be generated for it.
But the component definition just lists the constructor - which definitely won't return the same object for every call.
'singleton': True,
'type': 'nsEffectiveTLDService',
'headers': ['/netwerk/dns/nsEffectiveTLDService.h'],
'init_method': 'Init',
Comment 17•1 year ago
|
||
Comment 18•1 year ago
|
||
| bugherder | ||
| Assignee | ||
Comment 19•1 year ago
|
||
Thank you everyone for your inputs, that helped a lot!
The fix that Nika suggested worked as expected.
But based on what you shared here, it seems relevant to refactor the instantiation code around this service, so I just opened https://bugzilla.mozilla.org/show_bug.cgi?id=1943262
Description
•