Copy Clean Link: Support wildcard subdomain
Categories
(Core :: Privacy: Anti-Tracking, enhancement, P3)
Tracking
()
People
(Reporter: manuel, Unassigned)
References
(Blocks 2 open bugs)
Details
For bugs like Bug 1921130, we need to apply the rule for all subdomains. Those subdomains can't be enumerated.
Reporter | ||
Updated•5 months ago
|
Comment 1•5 months ago
|
||
Hi I would like to work on this bug. Could I get some pointers towards what I need to do, and what file(s) to look at? Thanks!
Reporter | ||
Comment 2•5 months ago
|
||
Hi, sorry for not getting back to you earlier. The reason for the delayed response, is that there is no easy answer. The answer depends on how much effort you want to spend. I'd like to switch over to doing "Copy Clean Link" in Rust. And in the end game use the uBlock origin syntax for URL cleaning (maybe using adblock-crate). This would also support wildcard subdomains and much more.
(1) If you'd be up for the challenge, you could pick up my rewrite in Rust in Bug 1877418 and complete it. Then replace the underlying engine with the filterlist syntax for a much more standardized and flexible approach.
(2) However, you could also implement the matching in the current C++ implementation. Add an entry for "*.hostname.tdl" and look it up in URLQueryStringStripper::ShouldStripParam function.
Doing (2) is probably a better idea. But if you want to go the route for (1) I'd be happy too :)
Comment 3•4 months ago
|
||
All good! I’d definitely be interested in taking up the challenge — it sounds like a great opportunity to learn a lot even if I eventually need to pivot back to (2).
Just to confirm my understanding: to complete (1), URLQueryStringStripper.cpp would need to be completely translated into Rust by continuing the work started in lib.rs from Bug 1877418? Then to complete this bug specifically, I'd adapt the Rust equivalent method for URLQueryStringStripper::ShouldStripParam from using hardcoded values to a more flexible filterlist method, possibly through adblock-crate methods or a similar filterlist-based approach?
Would you be able to confirm if that scope sounds about right? Thanks!
Reporter | ||
Comment 4•4 months ago
•
|
||
URLQueryStringStripper.cpp would need to be completely translated into Rust by continuing the work started in lib.rs
I think only one function within that file is to be translated to Rust. All other functions are used by some other query stripping service 😅 (See Bug 1706602 and Bug 1773983, we don't want to touch them in this bug). And I'd be fine to land the patch that has the query stripping in Rust without additional functionality first (to reduce the scope for the first patch).
I think one roadblocker I had with the current patch, was that I couldn't differentiate between whether query parameter would have a =
sign or not when iterating through all query parameters. Therefore, I couldn't pass all tests to not modify unrelated parameters included in the patch. I was gonna open a bug in the URL crate for that, but didn't get to the point yet.
Then to complete this bug specifically, I'd adapt the Rust equivalent method for URLQueryStringStripper::ShouldStripParam from using hardcoded values to a more flexible filterlist method
Yes, and this could be done in a separate bug and patch. When your patch for the Rust rewrite starts shaping up, you can open a bug for the rust rewrite to get that landed first.
possibly through adblock-crate methods or a similar filterlist-based approach?
Yeah, I think that would be the way to go. Personally the adblock-crate looks like the crate providing the functionality we need, but I have just read the README and nothing more, so I can't tell for sure and if you find something else you'd prefer, we can also take a look at that.
Would you be able to confirm if that scope sounds about right?
I think you roughly got the scope. I think it is a totally doable project. I'd suggest you take on the current patch from Bug 1877418 and continue with that, because it already includes the step of introducing a new Rust library within gecko.
Comment 5•4 months ago
|
||
Much apologies for the slow response, final exams rapidly picked up.
I think only one function within that file is to be translated to Rust.
Right, that does make more sense! I definitely would like to give it a shot then.
I think one roadblocker I had with the current patch, was that I couldn't differentiate between whether query parameter would have a = sign or not when iterating through all query parameters.
Just to clarify this roadblock, when using the URL crate to parse a url, if the link contains a parameter "x=," it becomes flattened to "x" (or maybe vice versa)? Also before a working solution is accepted, would the roadblock be required to be handled? By that, is/would handling it as a manual edge case be permissible (if it's a reasonable thing to do at all that is) or would it require waiting for an update to URL crate.
Before diving in, I'm still a bit hazy on the specifics. So to be clear with you on the plan of action:
-
Pull your patch implementation from Bug1877418 as a base, translate ShouldStripParam/the relevant function from C++ to Rust.
-
Inform you when I'm ready to push a patch. After you land the patch with the Rust query stripping, create a new bug request and push my patch.
-
If 2) is accepted, come back to this bug, where the details of how the filterlist should be handled can be determined (deeper dive into adblock-crate or alternatives). Meaning for now it's fine to leave looking into adblock-crate for later?
Thanks!
Reporter | ||
Comment 6•4 months ago
|
||
Much apologies for the slow response
No worries. There is no time pressure, take your time.
Just to clarify this roadblock, when using the URL crate to parse a url, if the link contains a parameter "x=," it becomes flattened to "x" (or maybe vice versa)?
Excaclty, iirc the vice versa case. "x" becomes "x=".
Also before a working solution is accepted, would the roadblock be required to be handled?
No, not a roadblocker. I just tested and it is also a current behavior. I've filed Bug 1960853 to document the behavior. Doesn't need to be addressed at all from you. Addressing it in the URL crate would be much welcome, but can be a different project.
Before diving in, I'm still a bit hazy on the specifics. So to be clear with you on the plan of action:
- Pull your patch implementation from Bug1877418 as a base, translate ShouldStripParam/the relevant function from C++ to Rust.
Yes, basically comandeer my revision and complete it. I think it already translated the full function to Rust. You'd need to make sure to pass existing tests and also mark the new tests in the patch testing for behavior in Bug 1960853 as todo. Now that I know that Bug 1877418 is already the current behavior, I could also complete it if you prefer that?
- Inform you when I'm ready to push a patch. After you land the patch with the Rust query stripping, create a new bug request and push my patch.
Sounds good. I think you can modify the patch attached in Bug 1877418 within that bug. Feel free to reach out on matrix #anti-tracking:mozilla.org
or here in the bug any time.
If 2) is accepted, come back to this bug, where the details of how the filterlist should be handled can be determined (deeper dive into adblock-crate or alternatives). Meaning for now it's fine to leave looking into adblock-crate for later?
Sounds great!
Comment 7•4 months ago
|
||
Now that I know that Bug 1877418 is already the current behavior, I could also complete it if you prefer that?
Yeah, actually I'd be much obliged! If I'm understanding correctly — if you complete the patch, that would contain the scope on my end to purely replacing the core query stripping logic inside strip_inner — presumably by swapping out the “drop query parameters” section for a filterlist-based method?
If that's the case, I'd be perfectly happy to shift my focus to researching adblock-crate and other potential filterlist implementations. Let me know if that sounds good!
Description
•