1330454 - Implement baseDomain extraction in Fennec

Reporter

Description

•

9 years ago

We currently don't have a full-featured domain extraction implementation in Fennec. We want to be able to get the actual site domain, i.e. things like: mozilla.org -> mozilla.org www.mozilla.org -> mozilla.org www.mobile.mozilla.org -> mozilla.org etc. We have a basic implementation that strips www/mobile, but that's incomplete, and ideally we want to always get the base domain: https://dxr.mozilla.org/mozilla-central/source/mobile/android/geckoview/src/main/java/org/mozilla/gecko/util/StringUtils.java#129 Gecko does have an implementation, but that's not too useful for UI and DB code: https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Interface/nsIEffectiveTLDService#getBaseDomain() It seems simplest to port the iOS implementation of host parsing: https://github.com/mozilla-mobile/firefox-ios/blob/f65be4372e20d284aa1fc340159597db97194f8f/Utils/Extensions/NSURLExtensions.swift#L297 We already ship the required TLD lists, which are parsed in PublicSuffixPatterns.

Andrzej Hunt :ahunt

Reporter

Comment 1

•

9 years ago

I initially want this for the new domains table, see e.g.: https://bugzilla.mozilla.org/show_bug.cgi?id=1319485#c17 It would be good to replace existing uses of StringUtils.stripCommonSubdomains() with this new implementation, after some careful testing, but I'll leave that for another bug. I wonder if this would need overrides though, e.g. users might expect mail.google.com and google.com to be distinct.

Andrzej Hunt :ahunt

Reporter

Comment 2

•

9 years ago

On second thought, we might not need this: https://bugzilla.mozilla.org/show_bug.cgi?id=1319485#c22 I'll reopen this bug if needed, but its superfluous for now.

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → WONTFIX

Sebastian Kaspari (:sebastian; :pocmo)

Comment 3

•

9 years ago

You might be looking for this: https://dxr.mozilla.org/mozilla-central/source/mobile/android/geckoview/src/main/java/org/mozilla/gecko/util/publicsuffix/PublicSuffix.java#18

Comment hidden (mozreview-request)

Andrzej Hunt :ahunt

Reporter

Comment 6

•

9 years ago

I've pushed my WIP patches just in case we need them, we might need this code in Bug 1319485, depending on how we want the domains table to look.

Andrzej Hunt :ahunt

Reporter

Updated

•

9 years ago

Assignee: nobody → ahunt

Status: RESOLVED → REOPENED

Iteration: --- → 1.14

Priority: -- → P1

Resolution: WONTFIX → ---

Whiteboard: [MobileAS]

Comment hidden (mozreview-request)

Andrzej Hunt :ahunt

Reporter

Updated

•

9 years ago

Attachment #8826697 - Flags: review?(s.kaspari)

Attachment #8826699 - Flags: review?(s.kaspari)

Attachment #8832246 - Flags: review?(s.kaspari)

Sebastian Kaspari (:sebastian; :pocmo)

Comment 13

•

9 years ago

mozreview-review

Comment on attachment 8826697 [details] Bug 1330454 - Introduce DomainProcessor to split domains https://reviewboard.mozilla.org/r/104596/#review110232 ::: mobile/android/geckoview/src/main/java/org/mozilla/gecko/util/publicsuffix/DomainProcessor.java:125 (Diff revision 3) > + public static DomainProcessor fromURL(final Context context, final String url) { > + String host; > + try { > + host = new URL(url).getHost(); > + } catch (MalformedURLException e) { > + host = url; > + } > + > + return new DomainProcessor(context, host); > + } > + > + public static DomainProcessor fromHost(final Context context, final String host) { > + return new DomainProcessor(context, host); > + } Using those methods will later access PublicSuffixPatterns.EXACT and we will read a file from disk (once). This can take some time and potentially block the UI thread. So either transform this into callback based methods or make sure to add annotations and an assetion using ThreadUtils.

Attachment #8826697 - Flags: review?(s.kaspari) → review+

Sebastian Kaspari (:sebastian; :pocmo)

Comment 14

•

9 years ago

mozreview-review

Comment on attachment 8826699 [details] Bug 1330454 - Migrate domain prefix/suffix stripping to use DomainProcessor https://reviewboard.mozilla.org/r/104598/#review110234

Attachment #8826699 - Flags: review?(s.kaspari) → review+

Sebastian Kaspari (:sebastian; :pocmo)

Comment 15

•

9 years ago

mozreview-review

Comment on attachment 8832246 [details] Bug 1330454 - Remove deprecated subdomain/suffix stripping code https://reviewboard.mozilla.org/r/108582/#review110236

Attachment #8832246 - Flags: review?(s.kaspari) → review+

Sebastian Kaspari (:sebastian; :pocmo)

Comment 16

•

9 years ago

mozreview-review

Comment on attachment 8826697 [details] Bug 1330454 - Introduce DomainProcessor to split domains https://reviewboard.mozilla.org/r/104596/#review110238 ::: mobile/android/geckoview/src/main/java/org/mozilla/gecko/util/publicsuffix/DomainProcessor.java:55 (Diff revision 3) > + while (i < prefixes.length) { > + final String prefix = prefixes[i]; > + > + if (host.regionMatches(true, offset, prefix, 0, prefix.length())) { > + offset += prefix.length(); > + // reset the iterator, as described above > + i = 0; > + } else { > + i++; > + } > + } I guess whenever we end up with a public suffix + one segment then we should stop too: m.www.mobile.com -> stop at mobile.com However this would make this a bit more expensive and shouldn't be a synchronous call anymore.

Sebastian Kaspari (:sebastian; :pocmo)

Comment 17

•

9 years ago

mozreview-review

Comment on attachment 8832247 [details] Bug 1330454 - Pre: move stripCommonSubdomains into (package-private) DomainProcessorUtils https://reviewboard.mozilla.org/r/108584/#review110240

Attachment #8832247 - Flags: review+

Comment hidden (mozreview-request)

Andrzej Hunt :ahunt

Reporter

Comment 24

•

8 years ago

I decided to make DomainProcessor calls background-thread only. Some of the clients already use a background thread, and callbacks don't really work well for some of the other locations (e.g. it would be hard to rewrite BrowserSearch to use callbacks, but pushing all of the BrowserSearch code into a background thread does work well). That required some rework of the homepanel context menu - but overall that seems to result in a slightly better UI, so this is probably useful overall. (Additional patches incoming via reviewboard - I'll try to squash them down after review.)

Andrzej Hunt :ahunt

Reporter

Updated

•

8 years ago

Attachment #8834147 - Flags: review?(s.kaspari)

Jenn Chaulk (:jchaulk)

Updated

•

8 years ago

Iteration: 1.14 → 1.15

Sebastian Kaspari (:sebastian; :pocmo)

Comment 25

•

8 years ago

mozreview-review

Comment on attachment 8834147 [details] Bug 1330454 - Allow immediate PublicSuffix access https://reviewboard.mozilla.org/r/110188/#review112024 Mh. This adds some AsyncTasks and runs code that we normally wouldn't need here. I wonder if we should step back and re-think whether the DomainProcessor is the right approach. Maybe we want to be more specific and not split a domain in all its parts all the time?

Andrzej Hunt :ahunt

Reporter

Comment 26

•

8 years ago

Yeah, I've been wondering that too. I had to add async tasks to: - HomePanels (i.e. TwoLinePageRow): We only use the stripped domain as a fallback if an item doesn't have a title. This should very rarely be the case. Moreover, TwoLinePageRow didn't do any stripping - stripping was only ever performed in the context menu. We could easily completely remove any domain stripping here, without significant impact. - Browser Toolbar: I think it's worth doing this here, since it's extremely user visible, so having the best possible String to show seems worth it. Doing this doesn't make things significantly more complicated. (But we could revert back to the simpler stripping too.) - BrowserSearch autocomplete: this is more questionable, but I definitely want to have the DomainProcessor in place for our DB usage, since that is essentially set in stone once the migration is run (and I'd prefer it to be completely correct). I think I still prefer having a single authoritative implementation to minimise confusion. One option I considered was having DomainProcessor.processImmediate() method that can be run on the UI thread: if data isn't loaded yet, we just return null (and the caller can just use the raw URL) - if data is loaded we don't need the background thread anymore. This would probably be suitable for all the above cases (and we just need to make sure we try to load the suffix list *eventually* - we could even just post a background runnable from ProcessImmediate if data isn't loaded yet, so we'd hopefully have it ready for the next call). This makes DomainProcessor more complicated, but it might be worth it for simpler UI code?

Andrzej Hunt :ahunt

Reporter

Comment 27

•

8 years ago

(In reply to Andrzej Hunt :ahunt from comment #26) > - BrowserSearch autocomplete: this is more questionable, but Continuing where I forgot to finish my sentence: BrowserSearch autocomplete is less immediately visible, and also has less user impact (since most sites don't need the better domain stripping), so is also a candidate for reverting to the simpler stripCommonSubdomains().

Comment hidden (mozreview-request)

Andrzej Hunt :ahunt

Reporter

Updated

•

8 years ago

Attachment #8834148 - Attachment is obsolete: true

Attachment #8834148 - Flags: review?(s.kaspari)

Andrzej Hunt :ahunt

Reporter

Comment 33

•

8 years ago

I've tried a slightly different approach to the AsyncTask issue - I'm not convinced it's ideal either (and it makes DomainProcessor a bit more complicated), but at least the frontend code stays simple. (I still need to double-check my patches, consider them draft quality.)

Sebastian Kaspari (:sebastian; :pocmo)

Comment 34

•

8 years ago

(In reply to Andrzej Hunt :ahunt from comment #26) > I definitely want to have the DomainProcessor in place for our DB usage, > since that is essentially set in stone once the migration is run (and I'd > prefer it to be completely correct). I think I still prefer having a single > authoritative implementation to minimise confusion. Yeah, the DomainProcessor itself isn't a problem. I was more concerned about every action needing to go through it - even if they only want to strip some things from the URL and not split it into all components. And then adding the async step in ToolbarDisplayLayout is a bit scary: Can this introduce race conditions? Every time we change the behavior here there's a chance someone opens a new spoofing security bug. :) (This reminds me of bug 1286178). (In reply to Andrzej Hunt :ahunt from comment #33) > I've tried a slightly different approach to the AsyncTask issue - I'm not > convinced it's ideal either (and it makes DomainProcessor a bit more > complicated), but at least the frontend code stays simple. (I still need to > double-check my patches, consider them draft quality.) Nice. I'll clear the review flag and then just reflag me as soon as you are happy. Btw. I'm just looking at ToolbarDisplayLayout and as we have the splitting behind a pref that is disabled by default anyways - maybe we can remove that if this makes the code here easier.

Sebastian Kaspari (:sebastian; :pocmo)

Updated

•

8 years ago

Attachment #8834147 - Flags: review?(s.kaspari)

Comment hidden (mozreview-request)

Andrzej Hunt :ahunt

Reporter

Comment 40

•

8 years ago

Comment on attachment 8826699 [details] Bug 1330454 - Migrate domain prefix/suffix stripping to use DomainProcessor I've tried a new approach which uses the prefix lists *if* loaded, and otherwise falls back to the old string stripping - thoughts? I'm not sure if this is still far too complicated, we could just use the old method anywhere that uses the UI thread. The new implementation for this case is in: "String stripCommonSubdomainsFromURL(" in DomainProcessor.java

Attachment #8826699 - Flags: feedback?(s.kaspari)

Andrzej Hunt :ahunt

Reporter

Updated

•

8 years ago

Depends on: 1340662

Sebastian Kaspari (:sebastian; :pocmo)

Comment 41

•

8 years ago

mozreview-review

Comment on attachment 8832247 [details] Bug 1330454 - Pre: move stripCommonSubdomains into (package-private) DomainProcessorUtils https://reviewboard.mozilla.org/r/108584/#review111560

Attachment #8832247 - Flags: review+

Sebastian Kaspari (:sebastian; :pocmo)

Comment 42

•

8 years ago

mozreview-review

Comment on attachment 8834147 [details] Bug 1330454 - Allow immediate PublicSuffix access https://reviewboard.mozilla.org/r/110188/#review116076

Attachment #8834147 - Flags: review?(s.kaspari) → review+

Sebastian Kaspari (:sebastian; :pocmo)

Comment 43

•

8 years ago

(In reply to Andrzej Hunt :ahunt from comment #40) > I've tried a new approach which uses the prefix lists *if* loaded, and > otherwise falls back to the old string stripping - thoughts? I'm not sure if > this is still far too complicated, we could just use the old method anywhere > that uses the UI thread. This looks good, I think! :)

Sebastian Kaspari (:sebastian; :pocmo)

Updated

•

8 years ago

Attachment #8826699 - Flags: feedback?(s.kaspari) → feedback+

Jenn Chaulk (:jchaulk)

Updated

•

8 years ago

Iteration: 1.15 → 1.16

Andrzej Hunt :ahunt

Reporter

Updated

•

8 years ago

Iteration: 1.16 → ---

Priority: P1 → P2

Sebastian Kaspari (:sebastian; :pocmo)

Updated

•

8 years ago

Blocks: as-android-notyet

Jenn Chaulk (:jchaulk)

Updated

•

8 years ago

Assignee: andrzej → nobody

Jenn Chaulk (:jchaulk)

Updated

•

8 years ago

Priority: P2 → P3

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Comment 44

•

8 years ago

I did this at some point: URIUtils.getFormattedDomain(context, uri, /* include public suffix */ true, /* subdomainCount */ 0); http://searchfox.org/mozilla-central/rev/f6dc0e40b51a37c34e1683865395e72e7fca592c/mobile/android/base/java/org/mozilla/gecko/util/URIUtils.java#88

Status: REOPENED → RESOLVED

Closed: 9 years ago → 8 years ago

Resolution: --- → WORKSFORME

Nobody; OK to take it and work on it

Assignee

Updated

•

5 years ago

Product: Firefox for Android → Firefox for Android Graveyard

Bug 1330454 - Introduce DomainProcessor to split domains 9 years ago Andrzej Hunt :ahunt 59 bytes, text/x-review-board-request	sebastian : review+	Details
Bug 1330454 - Migrate domain prefix/suffix stripping to use DomainProcessor 9 years ago Andrzej Hunt :ahunt 59 bytes, text/x-review-board-request	sebastian : review+ sebastian : feedback+	Details
Bug 1330454 - Remove deprecated subdomain/suffix stripping code 9 years ago Andrzej Hunt :ahunt 59 bytes, text/x-review-board-request	sebastian : review+	Details
Bug 1330454 - Pre: move stripCommonSubdomains into (package-private) DomainProcessorUtils 9 years ago Andrzej Hunt :ahunt 59 bytes, text/x-review-board-request	ahunt : review? ahunt sebastian : review+	Details
Bug 1330454 - Allow immediate PublicSuffix access 8 years ago Andrzej Hunt :ahunt 59 bytes, text/x-review-board-request	sebastian : review+	Details
Bug 1330454 - Make domain-stripping in homepanels thread-safe 8 years ago Andrzej Hunt :ahunt 59 bytes, text/x-review-board-request		Details