Last Comment Bug 782453 - Add site-specific User Agent infrastructure and use it to fix AOL Mail
: Add site-specific User Agent infrastructure and use it to fix AOL Mail
Status: RESOLVED FIXED
: dev-doc-needed
Product: Core
Classification: Components
Component: Networking: HTTP (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla18
Assigned To: Dão Gottwald [:dao]
:
Mentors:
Depends on:
Blocks: 778408 786613 787054 788422 797363 901085
  Show dependency treegraph
 
Reported: 2012-08-13 16:11 PDT by Dão Gottwald [:dao]
Modified: 2015-04-02 07:17 PDT (History)
11 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
fixed


Attachments
patch (3.36 KB, patch)
2012-08-13 16:11 PDT, Dão Gottwald [:dao]
gerv: superreview-
Details | Diff | Splinter Review
patch v2 (6.24 KB, patch)
2012-08-13 22:28 PDT, Dão Gottwald [:dao]
no flags Details | Diff | Splinter Review
patch v2 (6.10 KB, patch)
2012-08-13 22:40 PDT, Dão Gottwald [:dao]
gerv: superreview+
Details | Diff | Splinter Review
patch v3 (6.16 KB, patch)
2012-08-14 10:07 PDT, Dão Gottwald [:dao]
no flags Details | Diff | Splinter Review
patch v4 (7.09 KB, patch)
2012-08-15 10:13 PDT, Dão Gottwald [:dao]
bzbarsky: review+
ehsan: review+
Details | Diff | Splinter Review
patch v4 with some comments added (7.53 KB, patch)
2012-08-28 23:01 PDT, Dão Gottwald [:dao]
no flags Details | Diff | Splinter Review
patch v5 (7.69 KB, patch)
2012-08-30 04:42 PDT, Dão Gottwald [:dao]
lukasblakk+bugs: approval‑mozilla‑aurora+
Details | Diff | Splinter Review

Description Dão Gottwald [:dao] 2012-08-13 16:11:49 PDT
Created attachment 651562 [details] [diff] [review]
patch

For maximum flexibility, I think we should do this on a per application basis. Mobile already did something similar for youtube (removed in bug 778561, since youtube has been fixed).

I tried to contact AOL, pointing to bug 778408, but got no response so far. When they react, we can remove the workaround (while keeping the infrastructure for other sites with problems). I don't think we gain much by waiting till the last minute before applying such workarounds.
Comment 1 Gervase Markham [:gerv] 2012-08-13 17:38:43 PDT
Comment on attachment 651562 [details] [diff] [review]
patch

Firstly, I do think there's value in fixing stuff late on, as it allows us to put more pressure on the site if the webmasters can see it doesn't work. But I also agree that every system should have at least one test case, and a site which has been really non-responsive is a good choice.

Secondly, I think we should instead use a pref-based system, e.g.:

general.useragent.override.com.aol = "<insert UA here>"

My Gecko-fu is weak and old, but we used to have a PrefBranch system which would do this sort of thing. The idea would be that you split the UA up into parts, and then if the first part matches a branch, look for first+second, and so on. If you hit a leaf, use that string as the UA. This allows us to put in place and remove overrides for arbitrary DNS scopes without code changes (so we could do it e.g. in an update to the quick-patching addon). 

For bonus points, allow the value to be a s/// regexp which you run on the current UA; so you could either do:

general.useragent.override.com.aol = "Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/17.0 Firefox/17.0"

or

general.useragent.override.com.aol = "s!Gecko/[\d\.]+!Gecko/20100101!"

Gerv
Comment 2 Dão Gottwald [:dao] 2012-08-13 22:28:52 PDT
Created attachment 651635 [details] [diff] [review]
patch v2
Comment 3 Dão Gottwald [:dao] 2012-08-13 22:34:40 PDT
(In reply to Gervase Markham [:gerv] from comment #1)
> Firstly, I do think there's value in fixing stuff late on, as it allows us
> to put more pressure on the site if the webmasters can see it doesn't work.

There's little value because applying the override doesn't harm the web at large. Doing it late is risky as it reduces manual testing time; a site may have remaining issues that become visible after applying the override.
Comment 4 Dão Gottwald [:dao] 2012-08-13 22:40:13 PDT
Created attachment 651637 [details] [diff] [review]
patch v2

I removed the constraint that a domain must contain a dot.
Comment 5 Gervase Markham [:gerv] 2012-08-14 05:43:32 PDT
Comment on attachment 651637 [details] [diff] [review]
patch v2

This seems much more like it - thank you :-)

I'm not the right guy to be giving this code proper code review, of course.

I picked "!" as the separator without too much thought. Ideally, we'd use "/", but it appears far too often in UA strings. We need something which doesn't, given that we are just using "split". "!" is a common alternative choice. Does anyone have any comments on my choice?

Nit: I'd expect there to be a final "!", to match normal regexp definition syntax, at least in Perl. Then:
  let [search, replace] = val.substring(1).split("!", 2);

Does channel.URI.host return the hostname in Punycode form, for IDNs? I think we need to use a call which does.

gData needs to have a more descriptive name.

How about some tests? :-)

Another thought: we'll need some lightweight policy about when we deploy this mechanism, and for what sites. I don't want the list to grow without bound.

Gerv
Comment 6 Dão Gottwald [:dao] 2012-08-14 06:07:03 PDT
(In reply to Gervase Markham [:gerv] from comment #5)
> I picked "!" as the separator without too much thought. Ideally, we'd use
> "/", but it appears far too often in UA strings. We need something which
> doesn't, given that we are just using "split". "!" is a common alternative
> choice. Does anyone have any comments on my choice?

http://www.useragentstring.com/pages/All/ lists "!" a couple of times but "#" only once. "§" would be another option. We could also use multiple characters.

> Nit: I'd expect there to be a final "!", to match normal regexp definition
> syntax, at least in Perl. Then:
>   let [search, replace] = val.substring(1).split("!", 2);

It's not normal regexp definition syntax, e.g. you can't escape "!". I basically just need a separator. It might even make sense to even drop the leading delimiter.

> Does channel.URI.host return the hostname in Punycode form, for IDNs? I
> think we need to use a call which does.

URI.asciiHost should do the trick.
Comment 7 Gervase Markham [:gerv] 2012-08-14 06:14:07 PDT
§ seems a bit esoteric. I'm happy to stick with !, given that the UAs which use it are utterly irrelevant, but would also accept #.

The question about which delimiters to have is one of those "uncanny valley" questions - if it's not precisely regexp syntax, do you make it as close as possible so people will understand it, or do you make it different to reduce the risk of people hitting the gaps in your coverage? Given the lack of likelihood of people wanting to use or escape !, I'd go for the former, and use !from!to!. 

Gerv
Comment 8 Dão Gottwald [:dao] 2012-08-14 06:24:05 PDT
Any delimiter but / is probably too exotic for it to help most people recognize the first part as a regular expression...
Comment 9 Dão Gottwald [:dao] 2012-08-14 06:36:28 PDT
(In reply to Dão Gottwald [:dao] from comment #6)
> http://www.useragentstring.com/pages/All/ lists "!" a couple of times but
> "#" only once. "§" would be another option. We could also use multiple
> characters.

"|" is also unused.
Comment 10 Gervase Markham [:gerv] 2012-08-14 06:43:31 PDT
But | is a character commonly used in regexps for alternation. If we use it as the separator, it can't be used for that.

Gerv
Comment 11 Dão Gottwald [:dao] 2012-08-14 07:02:50 PDT
True. Another thing that come to my mind with regards to leading and trailing delimiters is that trailing flags are also not supported.
Comment 12 Gervase Markham [:gerv] 2012-08-14 08:22:12 PDT
Yep, good point. OK, let's go for a single separator, of "!".

Gerv
Comment 13 Gervase Markham [:gerv] 2012-08-14 08:26:08 PDT
Hang on, that would make "presence of a ! anywhere" as the regexp marker, differentiating it from a straight UA. Which is a bit obscure. Aargh.

This is serious bikeshedding :-)

Let's stick with what we have now! Ideally, it would detect a stray trailing ! and ignore it, but I think that'll happen anyway. Or you could still switch to:
  let [search, replace] = val.substring(1).split("!", 2);

Gerv
Comment 14 Dão Gottwald [:dao] 2012-08-14 09:22:16 PDT
(In reply to Gervase Markham [:gerv] from comment #13)
> Hang on, that would make "presence of a ! anywhere" as the regexp marker,
> differentiating it from a straight UA. Which is a bit obscure. Aargh.

Think of it as a separator between two pref value parts rather than an explicit regexp marker. (! is a rather unusual separator, though, so I'm leaning towards #.)
Comment 15 Dão Gottwald [:dao] 2012-08-14 10:07:19 PDT
Created attachment 651807 [details] [diff] [review]
patch v3

Not sure how to approach tests here...
Comment 16 Henri Sivonen (:hsivonen) (Not reading bugmail or doing reviews until 2016-09-26) 2012-08-14 23:33:42 PDT
I was thinking of a simple modification to the C++ code that builds the UA string and a global pref for turning off all site-specific hacks (the Safari way). That sort of thing should be enough for desktop (considering that it is for Chrome and Safari), but *maybe* we need a more generic system for mobile if the problem is wider on mobile. Does this patch work for B2G?

(In reply to Gervase Markham [:gerv] from comment #1)
> (so we could do it e.g. in an update to the quick-patching addon). 

How do hotfix add-ons interact with prefs? Do they write prefs or add more fallback values? What ensures that the pref gets cleaned up when no longer needed?
Comment 17 Dão Gottwald [:dao] 2012-08-15 01:31:40 PDT
(In reply to Henri Sivonen (:hsivonen) from comment #16)
> I was thinking of a simple modification to the C++ code that builds the UA
> string

That's not necessarily simple. nsHttpHandler.cpp would need to know about the host...

> That sort of thing should be enough for desktop (considering that it
> is for Chrome and Safari), but *maybe* we need a more generic system for
> mobile if the problem is wider on mobile. Does this patch work for B2G?

b2g/chrome/content/shell.js should be able to invoke the module just like nsBrowserGlue.js is doing in my patch. Prefs would be set in b2g/app/b2g.js.

> (In reply to Gervase Markham [:gerv] from comment #1)
> > (so we could do it e.g. in an update to the quick-patching addon). 
> 
> How do hotfix add-ons interact with prefs? Do they write prefs or add more
> fallback values? What ensures that the pref gets cleaned up when no longer
> needed?

I believe hotfix add-ons could write user values and manually clear them when being uninstalled or add default prefs that would automatically disappear with the add-on. Anyway, I'm pretty sure there are ways to handle this, as modifying prefs was a prime use case for hotfix add-ons; this question is neither specific to this bug nor new.
Comment 18 Gervase Markham [:gerv] 2012-08-15 02:13:45 PDT
Henri is right that we need a global disable switch for these changes. QA people looking to see if sites are fixed will certainly want to set this. And we may actually want to set it permanently in Nightly and Aurora (but that's a decision for another time).

general.useragent.overrides_enabled = true (default)

Gerv
Comment 19 Patrick McManus [:mcmanus] 2012-08-15 05:48:54 PDT
Comment on attachment 651807 [details] [diff] [review]
patch v3

can you make a silly offer of proof that shows this is trivially cheap for hosts that are not aol.com?
Comment 20 Dão Gottwald [:dao] 2012-08-15 06:50:08 PDT
The overhead will depend on the number of overrides due to the need to match the hosts.
Comment 21 Gervase Markham [:gerv] 2012-08-15 06:51:28 PDT
Right. If we start needing a large number of these rules (and I sincerely hope we will not) then we should switch to a hash table rather than string matching. But this is the simple solution if the number of rules is small.

Gerv
Comment 22 Patrick McManus [:mcmanus] 2012-08-15 07:16:05 PDT
what's the overhead of the patch as proposed (with just the aol entry) for something that doesn't match?
Comment 23 Gervase Markham [:gerv] 2012-08-15 07:28:47 PDT
These three function calls:

+  let channel = aSubject.QueryInterface(Ci.nsIHttpChannel);
+  let host = channel.URI.asciiHost;
+  let hostLength = host.length;

plus a string equality check and, in most cases, a lastIndexOf per entry.

Actually, as of Firefox 17, we could use string.endsWith() (bug 772733): 

+    if (host.endsWith(domain) &&
+        (hostLength == domainLength ||
+         host[hostLength - domainLength - 1] == ".")) {
+      channel.setRequestHeader("User-Agent", gOverrides[domain], false);
+      break;
+    }

That should be even faster. Micro-optimization :-)

Gerv
Comment 24 Dão Gottwald [:dao] 2012-08-15 10:13:43 PDT
Created attachment 652133 [details] [diff] [review]
patch v4

- using String.endsWith now
- global pref added
Comment 25 Dão Gottwald [:dao] 2012-08-15 10:48:47 PDT
Comment on attachment 652133 [details] [diff] [review]
patch v4

>+function HTTP_on_modify_request(aSubject, aTopic, aData) {
>+  let channel = aSubject.QueryInterface(Ci.nsIHttpChannel);
>+  let host = channel.URI.asciiHost;
>+
>+  for (let domain in gOverrides) {
>+    if (host == domain ||
>+        host.endsWith("." + domain)) {

I could move "." + domain outside of the loop to reduce the overhead further when there's larger number of overrides...
Comment 26 Patrick McManus [:mcmanus] 2012-08-15 11:02:48 PDT
I was actually more interested in the impact of adding uses of modify-request where that list is often empty. (?) The micro-optimization of string matching isn't really interesting to me.
Comment 27 Dão Gottwald [:dao] 2012-08-15 11:09:44 PDT
I don't see why the mere existence of the observer would be particularly interesting here. There's the usual XPCOM overhead, as in many other places where we run code.

Note that Fennec has been observing http-on-modify-request for some time now.
Comment 28 Gervase Markham [:gerv] 2012-08-27 04:03:10 PDT
bz: polite ping on this review? This mechanism is somewhat in demand...

Gerv
Comment 29 Boris Zbarsky [:bz] (TPAC) 2012-08-27 14:25:03 PDT
Comment on attachment 652133 [details] [diff] [review]
patch v4

r=me on the neckoish bits; I'm not sure I'm competent to review the sBrowserGlue.js stuff, though it looks fine at first glance.
Comment 30 :Ehsan Akhgari 2012-08-27 15:29:15 PDT
Comment on attachment 652133 [details] [diff] [review]
patch v4

Review of attachment 652133 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the below fixed.  But we also need to test this.  One simple testing strategy is to set the site_specific_overrides pref to true in a test, add a pref for an override on example.com, and then load a page from that domain to make sure that the UA has indeed changed.  The test should test both the regex and non-regex case.

::: browser/app/profile/firefox.js
@@ +221,5 @@
>  #else
>  pref("general.autoScroll", true);
>  #endif
>  
> +pref("general.useragent.override.aol.com", "Gecko/[^ ]*#Gecko/20100101");

Please add a comment saying what this does and why it's needed.

::: modules/libpref/src/init/all.js
@@ +22,5 @@
>  pref("keyword.URL", "https://www.google.com/search?ie=UTF-8&oe=utf-8&q=");
>  pref("keyword.enabled", false);
>  pref("general.useragent.locale", "chrome://global/locale/intl.properties");
>  pref("general.useragent.compatMode.firefox", false);
> +pref("general.useragent.site_specific_overrides", true);

I really think that this magic is worth a bit of documentation.  I can't think of a better place for that documentation to live than here.  :/

I'm fine if you prefer the documentation to live on MDN...

::: netwerk/protocol/http/UserAgentOverrides.jsm
@@ +22,5 @@
> +    gPrefBranch.addObserver("", buildOverrides, false);
> +
> +    Services.prefs.addObserver(PREF_OVERRIDES_ENABLED, buildOverrides, false);
> +
> +    Services.obs.addObserver(HTTP_on_modify_request, "http-on-modify-request", false);

Hmm, I'm assuming that hopefully the site_specific_overrides pref would be false most of the time!  :-)  Assuming that, I think we need to restructure this code a bit to avoid adding an http-on-modify-request observer unless that pref is set to true.
Comment 31 Dão Gottwald [:dao] 2012-08-27 15:40:10 PDT
> Hmm, I'm assuming that hopefully the site_specific_overrides pref would be
> false most of the time!  :-)  Assuming that, I think we need to restructure
> this code a bit to avoid adding an http-on-modify-request observer unless
> that pref is set to true.

The pref only exists for testing purposes. If we want no overrides at all, we can just avoid loading the module.
Comment 32 :Ehsan Akhgari 2012-08-27 15:45:02 PDT
(In reply to comment #31)
> > Hmm, I'm assuming that hopefully the site_specific_overrides pref would be
> > false most of the time!  :-)  Assuming that, I think we need to restructure
> > this code a bit to avoid adding an http-on-modify-request observer unless
> > that pref is set to true.
> 
> The pref only exists for testing purposes. If we want no overrides at all, we
> can just avoid loading the module.

That sounds fine to me.
Comment 33 Dão Gottwald [:dao] 2012-08-28 22:58:51 PDT
> ::: modules/libpref/src/init/all.js
> @@ +22,5 @@
> >  pref("keyword.URL", "https://www.google.com/search?ie=UTF-8&oe=utf-8&q=");
> >  pref("keyword.enabled", false);
> >  pref("general.useragent.locale", "chrome://global/locale/intl.properties");
> >  pref("general.useragent.compatMode.firefox", false);
> > +pref("general.useragent.site_specific_overrides", true);
> 
> I really think that this magic is worth a bit of documentation.  I can't
> think of a better place for that documentation to live than here.  :/
> 
> I'm fine if you prefer the documentation to live on MDN...

I don't think all.js is the right place for that documentation, as that pref exists for testing only and the actual overrides would be set elsewhere, e.g. firefox.js.
Comment 34 Dão Gottwald [:dao] 2012-08-28 23:01:37 PDT
Created attachment 656347 [details] [diff] [review]
patch v4 with some comments added
Comment 35 Dão Gottwald [:dao] 2012-08-28 23:52:02 PDT
https://hg.mozilla.org/mozilla-central/rev/418955e7c3a9
Comment 36 Dão Gottwald [:dao] 2012-08-29 02:19:16 PDT
Comment on attachment 656347 [details] [diff] [review]
patch v4 with some comments added

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 588909
User impact if declined: bug 778408
Testing completed (on m-c, etc.): manual on m-c
Risk to taking this patch (and alternatives if risky): low, no fallout expected
String or UUID changes made by this patch: none
Comment 37 Ed Morley [:emorley] 2012-08-29 06:10:29 PDT
Backed out for (weird) Windows xpcshell failures:
https://hg.mozilla.org/mozilla-central/rev/c72b90b1c8d2
Comment 38 Dão Gottwald [:dao] 2012-08-30 04:42:40 PDT
Created attachment 656817 [details] [diff] [review]
patch v5

(In reply to Ed Morley [:edmorley] from comment #37)
> Backed out for (weird) Windows xpcshell failures:

I still don't know what's wrong with those Places xpcshell tests, nor do I understand why they fail on 32-bit Win7 debug only, but protecting UserAgentOverrides against init being called twice or uninit being called before init seems to avoid the orange: https://tbpl.mozilla.org/?tree=Try&rev=3337b15fc775
Comment 40 Ryan VanderMeulen [:RyanVM] 2012-08-30 19:04:22 PDT
https://hg.mozilla.org/mozilla-central/rev/19a91d0fd50b
Comment 41 Dão Gottwald [:dao] 2012-08-31 07:51:41 PDT
Comment on attachment 656817 [details] [diff] [review]
patch v5

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 588909
User impact if declined: bug 778408
Testing completed (on m-c, etc.): manual on m-c
Risk to taking this patch (and alternatives if risky): low, no fallout expected
String or UUID changes made by this patch: none

Note You need to log in before you can comment on or make changes to this bug.