Closed Bug 1127418 Opened 9 years ago Closed 9 years ago

Telemetry for overlapping HTTP and HTTPS logins

Categories

(Toolkit :: Password Manager, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: Paolo, Assigned: Paolo)

References

Details

User Story

*** HIGH LEVEL QUESTIONS ANSWERED ***

- How many sites are using HTTPS login forms, and in which mode exactly?
- Does our password manager do a good job in moving the needle for HTTPS adoption?
- Are we storing duplicate entries for HTTP and HTTPS? Is it accidental or intentional?
- Do users have multiple usernames for a site, or just multiple logins with the same username?
- Do we do a good job in avoiding duplicate entries where usernames differ unintentionally?
- How many entries do we have to deal with in a de-duplication scenario?

*** DATABASE STRUCTURE AS OF JANUARY 2015 ***

The logins database stores form-based logins using a key based on:
 - The origin of the page where the login form is located.
 - The origin of the URL to which the data is submitted (form target).
   This can be an empty string for some very old logins.
 - The username, where an empty string is a valid and distinct username.

The origins are in URL format ("scheme:host:port"), with port optional.

The database key is thus referred to as "S:H:P s:h:p U" below.

This means, for example, that the database can store two different entries where the origins and usernames are equal, except that there is a different scheme in the page origin. If the three are equal, there can be only one entry.
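
As a minimal sketch of the uniqueness rule above (not the actual storage code), the key can be modeled as a three-part tuple used as a dictionary key. The names `LoginKey` and `save_login` are hypothetical and exist only for this illustration.

```python
from typing import Dict, NamedTuple


class LoginKey(NamedTuple):
    """Hypothetical model of the "S:H:P s:h:p U" key."""
    page_origin: str    # "S:H:P", origin of the page holding the form
    submit_origin: str  # "s:h:p", may be "" for some very old logins
    username: str       # "" is a valid and distinct username


def save_login(store: Dict[LoginKey, str], key: LoginKey, password: str) -> None:
    # Saving under an existing key overwrites: one entry per key.
    store[key] = password


store: Dict[LoginKey, str] = {}
# Same host and username, different page/submit scheme: two entries.
save_login(store, LoginKey("http:example.com", "http:example.com", "alice"), "pw-old")
save_login(store, LoginKey("https:example.com", "https:example.com", "alice"), "pw-new")
# All three key parts equal to an existing entry: still two entries.
save_login(store, LoginKey("https:example.com", "https:example.com", "alice"), "pw-newer")
print(len(store))  # prints 2
```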

This does not include HTTP authentication logins, which are stored with a different system of keys.

*** GENERAL GOALS ***

We'd like to investigate a model where we can be more flexible and store fewer entries for similar origins where appropriate, for example to help websites migrate from HTTP to HTTPS. To do this in the best way, we can use telemetry to see how common the various combinations are.

In order to exclude non-default ports when equating HTTP and HTTPS origins, for an origin "S:H:P" we can compute a new key "S':H:P", where "S'" is set to HTTPS only if the original scheme was HTTP on the default port. HTTP on a non-default port is still kept separately as HTTP.
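
The key computation can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: the `Origin` type and `normalize_origin` function are hypothetical names, and treating an explicit default port the same as an omitted port is an assumption of this sketch.

```python
from typing import NamedTuple, Optional

# Default ports for the two schemes this sketch cares about.
DEFAULT_PORTS = {"http": 80, "https": 443}


class Origin(NamedTuple):
    scheme: str
    host: str
    port: Optional[int] = None  # None means the port was omitted


def normalize_origin(origin: Origin) -> Origin:
    """Compute the "S':H:P" key for an "S:H:P" origin."""
    # Assumption: an explicit default port is equivalent to an omitted one.
    on_default_port = origin.port in (None, DEFAULT_PORTS.get(origin.scheme))
    if not on_default_port:
        # Non-default ports are never equated across schemes.
        return origin
    if origin.scheme == "http":
        # HTTP on the default port is equated to HTTPS on the default port.
        return Origin("https", origin.host, None)
    return Origin(origin.scheme, origin.host, None)
```

With this sketch, `Origin("http", "example.com")` and `Origin("https", "example.com", 443)` map to the same key, while `Origin("http", "example.com", 8080)` is returned unchanged.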

*** TELEMETRY ***

=== PWMGR_LOGIN_SCHEME ===

A general histogram to understand the distribution of the schemes of page and submit origins, and how many legacy logins are present in the wild.

This is equivalent to grouping by "S s", for form logins only, ignoring hosts and ports.

      0: HTTP to HTTP
      1: HTTP to HTTPS
      2: HTTP to any (empty submit origin)
      3: HTTPS to HTTP
      4: HTTPS to HTTPS
      5: HTTPS to any (empty submit origin)
      6: Other schemes involved
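
The bucket assignment above could be sketched like this. The function name `scheme_bucket` is hypothetical, and representing an empty submit origin as either `""` or `None` is an assumption of this example.

```python
from typing import Optional


def scheme_bucket(page_scheme: str, submit_scheme: Optional[str]) -> int:
    """Map page and submit schemes to a PWMGR_LOGIN_SCHEME bucket (0 to 6)."""
    # Base index contributed by the page scheme.
    page_base = {"http": 0, "https": 3}.get(page_scheme)
    # Offset contributed by the submit scheme; empty means "any".
    submit_offset = {"http": 0, "https": 1, "": 2, None: 2}.get(submit_scheme, -1)
    if page_base is None or submit_offset < 0:
        return 6  # some other scheme is involved on either side
    return page_base + submit_offset
```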

=== PWMGR_LOGIN_OVERLAP_LOOSE ===

Statistics for "loose" grouping, only by origins except scheme. Note how we don't use the term "realm" since it may be defined differently.

The grouping key is "S':H:P s':h:p".

      0: Total "loose" groups including those with only one login
      1: Number of groups with more than one login
      2: Number of groups with three or more logins
      3: All passwords equal within group (at least two required)
      4: All usernames equal within group (at least two required)
      5: All usernames and passwords equal (at least two required)

Bucket 0 gives the reference for the proportion of multi-login groups.

Buckets 2, 3, 4, and 5 are always subsets of bucket 1.

If all the usernames are equal within the group, this will also be counted as one "strict" group. If not all the usernames are equal, this is definitely a loose group that would not otherwise be detected as a single strict group.
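
A rough sketch of how the six "loose" buckets could be counted, assuming the "S':H:P" and "s':h:p" keys have already been computed for each login. The `Login` type and `loose_overlap_buckets` function are hypothetical names used only for this illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Login(NamedTuple):
    page_key: str    # "S':H:P", assumed to be precomputed
    submit_key: str  # "s':h:p", assumed to be precomputed, may be ""
    username: str
    password: str


def loose_overlap_buckets(logins: Iterable[Login]) -> List[int]:
    """Return the counts for buckets 0 to 5 of the loose grouping."""
    groups: Dict[Tuple[str, str], List[Login]] = defaultdict(list)
    for login in logins:
        groups[(login.page_key, login.submit_key)].append(login)

    buckets = [0] * 6
    for members in groups.values():
        buckets[0] += 1                      # every group, even single-login
        if len(members) < 2:
            continue
        buckets[1] += 1                      # more than one login
        if len(members) >= 3:
            buckets[2] += 1                  # three or more logins
        same_passwords = len({m.password for m in members}) == 1
        same_usernames = len({m.username for m in members}) == 1
        if same_passwords:
            buckets[3] += 1
        if same_usernames:
            buckets[4] += 1
        if same_passwords and same_usernames:
            buckets[5] += 1
    return buckets
```

Because buckets 2 to 5 are only incremented after the two-login check, they are subsets of bucket 1 by construction.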

=== PWMGR_LOGIN_OVERLAP_STRICT ===

Statistics for "strict" grouping by username and origins except scheme. Note how we don't use the term "realm" since it may be defined differently.

The grouping key is "S':H:P s':h:p U".

      0: Total "strict" groups including those with only one login
      1: Number of groups with more than one login
      2: Number of groups with three or more logins
      3: All passwords equal within group (at least two required)
      4: The most secure login (first in order) is also the most recent

Bucket 0 gives the reference for the proportion of multi-login groups.

Buckets 2, 3, and 4 are always subsets of bucket 1.

For a description of bucket 4, see below.

=== PWMGR_LOGIN_OVERLAP_STRICT_1ST_LAST_USED_DAYS ===
=== PWMGR_LOGIN_OVERLAP_STRICT_2ND_LAST_USED_DAYS ===

Each group of logins can be sorted by "preference" order, where the order is determined by the security of the page and submit origins ("S s").

Most secure:  HTTPS to HTTPS
Less secure:  Other to HTTPS
Less secure:  HTTPS to other (possibly a rare case)
Least secure: Other to other

For each "strict" group of two or more logins, the two histograms record the number of days since the first and second logins of the group, taken in security order, were last used.

This determines whether the less secure logins are quickly abandoned.
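
The ordering and the two "last used" samples could be sketched as below, which also yields the condition for bucket 4 of PWMGR_LOGIN_OVERLAP_STRICT. The `Login` type and both function names are hypothetical. Two details are assumptions of this sketch: logins with the same security level keep their input order, and a tie on the last-used time counts as "most recent".

```python
from datetime import datetime
from typing import List, NamedTuple, Optional, Tuple


class Login(NamedTuple):
    page_scheme: str
    submit_scheme: str
    last_used: datetime


def security_rank(login: Login) -> int:
    """Lower is more secure, following the four levels listed above."""
    page_secure = login.page_scheme == "https"
    submit_secure = login.submit_scheme == "https"
    if page_secure and submit_secure:
        return 0  # HTTPS to HTTPS
    if submit_secure:
        return 1  # Other to HTTPS
    if page_secure:
        return 2  # HTTPS to other
    return 3      # Other to other


def strict_group_samples(
    group: List[Login], now: datetime
) -> Optional[Tuple[int, int, bool]]:
    """Return (1st days, 2nd days, bucket-4 condition) for one strict group."""
    if len(group) < 2:
        return None  # the histograms only cover groups of two or more
    # sorted() is stable, so equal ranks keep their input order (assumption).
    ordered = sorted(group, key=security_rank)
    first, second = ordered[0], ordered[1]
    newest = max(login.last_used for login in group)
    return (
        (now - first.last_used).days,
        (now - second.last_used).days,
        first.last_used == newest,
    )
```

For example, a group holding an HTTPS login used one day ago and an HTTP login used 30 days ago gives `(1, 30, True)`, which would suggest the less secure login has been abandoned.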

Attachments

(1 file)

For logins with both HTTP and HTTPS for the same site, we can capture the aggregated data about:
- Total count of logins with those characteristics compared to total logins
- How many of those have the same HTTP and HTTPS username and password
- The time last used of the HTTP and HTTPS version of the login, separately
Assignee: nobody → paolo.mozmail
Status: NEW → ASSIGNED
User Story: (updated)
Attached patch: The patch
See the "user story" of this bug for details.
Attachment #8559292 - Flags: review?(MattN+bmo)
User Story: (updated)
Comment on attachment 8559292
The patch

Review of attachment 8559292:
-----------------------------------------------------------------

It's not clear to me that the benefit of this patch outweighs the added complexity but I'll defer to dolske as module owner.
Attachment #8559292 - Flags: review?(MattN+bmo) → review?(dolske)
I've just finished discussing some of the UI designs for multiple credentials (currently at <https://www.lucidchart.com/documents/view/87ab1cc8-e708-49d3-8b91-6e2e6da346fb/5>). I believe this telemetry would give us an idea of how common the various cases we'd like to support could be.

If we're really concerned about code complexity, would it make sense to gather this data for a limited time, like land the patch in a few days and back it out from Nightly next cycle in 15 days, basically keeping the telemetry on Developer Edition and Beta for one cycle each? I'm pretty curious whether the data we can gather here matches our expectations or we find facts that may surprise us.
User Story: (updated)
Would also love to know occurrences of entries for both www.domain.com and domain.com.
I'm going to concur with Matt in comment 2. The complexity of the analysis and code is way more detailed than anything we need. The original intent for adding some initial telemetry probes was that it was to be a cheap and simple way to help get a broader understanding of password manager usage. And for this particular feature, I think everyone felt it had pretty obvious user benefit, so the data isn't a prerequisite to deciding if we want to do it.

I think it would be better to implement the feature, and then consider opportunities to add simple telemetry to understand the usage. That might be as trivial as just comparing how many logins we got from a strict match vs the end result (bug 1124888 did something like this). That way we keep any required complexity in the feature itself, and avoid having telemetry that's measuring something different than what we're actually doing.

(In reply to :Paolo Amadini from comment #3)

> If we're really concerned about code complexity, would it make sense to
> gather this data for a limited time, like land the patch in a few days and
> back it out from Nightly next cycle in 15 days, basically keeping the
> telemetry on Developer Edition and Beta for one cycle each?

I'm still dubious of the value. It has obvious value from our experience, and a negative result from telemetry could just be interpreted as being a result of a bad password manager that people avoid using.

(In reply to Ryan Feeley from comment #4)
> Would also love to know occurences of entries for both www.domain.com and
> domain.com.

That would be something for a different (new) bug.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Attachment #8559292 - Flags: review?(dolske)
See Also: → 1150605
See Also: → 1134941