1323851 - data analysis for mitm prevalence telemetry experiment

Dana Keeler (she/her) (use needinfo) [:keeler] (on leave)

Reporter

Description

•

8 years ago

(I have no idea if this is the right component or not - it seemed close, at least.)

In bug 1311479 we developed and deployed a telemetry experiment to measure the prevalence of mitm boxes (and in particular whether any used publicly-trusted roots to mint their certificates). Now that it's over, we'd like to analyze the data.

Data analysis plan:

The results received from the experiment should be of the form:

{ "errorCode": <some number, most likely 0>,
  "error": <a short error description, most likely the empty string "">,
  "chain": [
    <a list of:
      {  "sha256Fingerprint": <hash of a certificate as a hex string>,
         "isBuiltInRoot": <true if this is a publicly-trusted root certificate>,
         "signatureAlgorithm": <the algorithm used to sign the certificate>
      }
    >
  ]
}
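
A minimal sketch of a shape check for one result, assuming each result has already been decoded from JSON into a Python dict. The function name `is_well_formed` is hypothetical and not part of the experiment, and the 64-hex-character rule for fingerprints is an assumption based on SHA-256 output length.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def is_well_formed(result):
    """Return True if `result` matches the shape described above."""
    if not isinstance(result, dict):
        return False
    code = result.get("errorCode")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(code, bool) or not isinstance(code, int):
        return False
    if not isinstance(result.get("error"), str):
        return False
    chain = result.get("chain")
    if not isinstance(chain, list):
        return False
    for cert in chain:
        if not isinstance(cert, dict):
            return False
        fingerprint = cert.get("sha256Fingerprint")
        # Assumption: a SHA-256 fingerprint is exactly 64 hex characters.
        if not isinstance(fingerprint, str) or len(fingerprint) != 64:
            return False
        if not all(ch in HEX_DIGITS for ch in fingerprint):
            return False
        if not isinstance(cert.get("isBuiltInRoot"), bool):
            return False
        if not isinstance(cert.get("signatureAlgorithm"), str):
            return False
    return True
```

Results that fail this check could be counted separately rather than dropped, so malformed pings still show up in the general statistics.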

We're expecting nearly every result to either look exactly like this:

{ "errorCode": 0,
  "error": "",
  "chain": [
    { "sha256Fingerprint": "197feaf3faa0f0ad637a89c97cb91336bfc114b6b3018203cbd9c3d10c7fa86c",
      "isBuiltInRoot": false,
      "signatureAlgorithm": "sha256WithRSAEncryption"
    },
    { "sha256Fingerprint": "154c433c491929c5ef686e838e323664a00e6a0d822ccc958fb4dab03e49a08f",
      "isBuiltInRoot": false,
      "signatureAlgorithm": "sha256WithRSAEncryption"
    },
    { "sha256Fingerprint": "4348a0e9444c78cb265e058d5e8944b4d84f9662bd26db257f8934a443c70161",
      "isBuiltInRoot": true,
      "signatureAlgorithm": "sha1WithRSAEncryption"
    }
  ]
}

or to have different fingerprints, with the final certificate in the chain having "isBuiltInRoot" set to false.

The #1 question we want to ask is whether there are chains with different fingerprints where the last certificate in the chain has "isBuiltInRoot" as true (and if so, what are the hashes, and what signature algorithms were used in the chain?).

Other than that, we probably just want general statistics on how often we got a non-zero error code or error string, or how often the chain fingerprints differed from the expected values.
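
The questions above could be sketched as a per-result classifier. This is only an illustration under assumptions: `classify_result` and the category labels are hypothetical names, and the expected fingerprints are copied from the example chain earlier in this description.

```python
# Fingerprints of the expected chain, copied from the example above.
EXPECTED_FINGERPRINTS = (
    "197feaf3faa0f0ad637a89c97cb91336bfc114b6b3018203cbd9c3d10c7fa86c",
    "154c433c491929c5ef686e838e323664a00e6a0d822ccc958fb4dab03e49a08f",
    "4348a0e9444c78cb265e058d5e8944b4d84f9662bd26db257f8934a443c70161",
)


def classify_result(result):
    """Return (has_error, chain_kind) for one decoded result dict.

    chain_kind is one of (hypothetical labels):
      "expected"                 fingerprints match the expected chain
      "no_chain"                 the chain is missing or empty
      "unexpected_builtin_root"  different chain ending in a built-in root
      "unexpected_other_root"    different chain ending in a non-built-in cert
    """
    has_error = result.get("errorCode", 0) != 0 or result.get("error", "") != ""
    chain = result.get("chain") or []
    fingerprints = tuple(cert.get("sha256Fingerprint") for cert in chain)
    if fingerprints == EXPECTED_FINGERPRINTS:
        kind = "expected"
    elif not chain:
        kind = "no_chain"
    elif chain[-1].get("isBuiltInRoot") is True:
        kind = "unexpected_builtin_root"
    else:
        kind = "unexpected_other_root"
    return has_error, kind
```

The "unexpected_builtin_root" bucket corresponds to the #1 question; for results in that bucket, the fingerprints and the "signatureAlgorithm" values of the chain are what we would want to report.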

(JC - anything I've forgotten?)

I don't know if the data analysis tools support this sort of thing, but my plan was basically to make a big hash set and put each result in the set (and count duplicates). From there, it would be easy to make passes over the set and pull out the data we were interested in.
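
As a rough sketch of the hash-set idea, assuming the results are available as decoded Python dicts: each result is serialized to a canonical JSON string so that identical results collapse to one key, and a `Counter` tracks the duplicates. The names `canonical_key` and `count_results` are hypothetical.

```python
import json
from collections import Counter


def canonical_key(result):
    """Serialize a result so that equal results always produce the same string."""
    return json.dumps(result, sort_keys=True, separators=(",", ":"))


def count_results(results):
    """Count how many times each distinct result appears."""
    counts = Counter()
    for result in results:
        counts[canonical_key(result)] += 1
    return counts


def iter_distinct(counts):
    """Yield (result_dict, count) pairs, most common first."""
    for key, count in counts.most_common():
        yield json.loads(key), count
```

Later passes can then loop over `iter_distinct(counts)` and pull out whatever subset is of interest. In a Spark notebook the same idea might map onto something like `rdd.map(canonical_key).countByValue()`, though that depends on how the pings are loaded.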

Thanks!

Ryan VanderMeulen [:RyanVM]

Comment 1

•

8 years ago

Ryan Harter is going to assist with this.

Blocks: 1311479

Group: metrics-private

Component: Other → Metrics: Pipeline

Priority: -- → P1

Product: Data & BI Services Team → Cloud Services

QA Contact: mpressman

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

Assignee: nobody → rharter

Ryan Harter [:harter]

Assignee

Comment 2

•

7 years ago

Hey All, I'm taking a look at this now.

Ryan Harter [:harter]

Assignee

Comment 3

•

7 years ago

Hey All. I put together this Jupyter notebook [0], which gathers all of the pings associated with your experiment.

Here's some documentation [1] on how to use our Jupyter notebook / Spark interface for analysis. Can you take a first pass at analyzing your data? I'm happy to answer questions as you go. 


[0] https://gist.github.com/harterrt/1b018a887c89d08cb4be0bd8e9953cda
[1] https://wiki.mozilla.org/Telemetry/Custom_analysis_with_spark

Flags: needinfo?(dkeeler)

Ryan Harter [:harter]

Assignee

Comment 4

•

7 years ago

I just changed the link to the example Jupyter notebook to the following:
https://gist.github.com/harterrt/600ee23a7279ceed8efe999a78868c9d

Dana Keeler (she/her) (use needinfo) [:keeler] (on leave)

Reporter

Comment 5

•

7 years ago

Thanks! We're exploring the tools a bit now.

Flags: needinfo?(dkeeler)

J.C. Jones [:jcj] (he/him)

Comment 6

•

7 years ago

On this dataset, we've confirmed that we can disable SHA-1 for non-built-in roots without breaking updates. One analysis dataset is here: https://gist.github.com/jcjones/098c9ee81213e6816cf372194f45e918

The final dataset was:

{'isAccum': True,
 'rooterrors': Counter({'0f993c8aef97baaf5687140ed59ad1821bb4afacf0aa9a58b5d57a338a3afbcb -12276': 1,
          '4348a0e9444c78cb265e058d5e8944b4d84f9662bd26db257f8934a443c70161 0': 2822178,
          '687fa451382278fff0c8b11f8d43d576671c6eb2bceab413fb83d965d06d2ff2 -12276': 22,
          '73c176434f1bc6d5adf45b0e76e727287c8de57616c1e6e6141a2b2cbc7d8e4c -12276': 1,
          'c3846bf24b9e93ca64274c0ec67c1ecc5e024ffcacd2d74019350e81fe546ae4 -12276': 1,
          'ff856a2d251dcd88d36656f450126798cfabaade40799c722de4d2b5db36a73a -12276': 1}),
 'total_errors': Counter({-16381: 1,
          -16379: 11,
          -16378: 1,
          -12276: 53,
          -12173: 1,
          -8191: 1,
          -8179: 299,
          -8162: 14,
          -8061: 351,
          -8016: 28}),
 'total_roots': Counter({u'0f993c8aef97baaf5687140ed59ad1821bb4afacf0aa9a58b5d57a338a3afbcb': 1,
          u'4348a0e9444c78cb265e058d5e8944b4d84f9662bd26db257f8934a443c70161': 2822178,
          u'687fa451382278fff0c8b11f8d43d576671c6eb2bceab413fb83d965d06d2ff2': 22,
          u'73c176434f1bc6d5adf45b0e76e727287c8de57616c1e6e6141a2b2cbc7d8e4c': 1,
          u'c3846bf24b9e93ca64274c0ec67c1ecc5e024ffcacd2d74019350e81fe546ae4': 1,
          u'ff856a2d251dcd88d36656f450126798cfabaade40799c722de4d2b5db36a73a': 1})}

This means the only results with errorCode=0 (a successful connection) are for the legitimate root. The MITM-esque cases all coincide with SSL_ERROR_BAD_CERT_DOMAIN (code -12276). The other errors are few and normal-seeming, too:

-16381: (1) MOZILLA_PKIX_ERROR_V1_CERT_USED_AS_CA
-16379: (11) MOZILLA_PKIX_ERROR_NOT_YET_VALID_CERTIFICATE
-16378: (1) MOZILLA_PKIX_ERROR_NOT_YET_VALID_ISSUER_CERTIFICATE
-12276: (53) SSL_ERROR_BAD_CERT_DOMAIN
-12173: (1) SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY
-8191: (1) SEC_ERROR_LIBRARY_FAILURE
-8179: (299) SEC_ERROR_UNKNOWN_ISSUER
-8162: (14) SEC_ERROR_EXPIRED_ISSUER_CERTIFICATE
-8061: (351) SEC_ERROR_OCSP_FUTURE_RESPONSE
-8016: (28) SEC_ERROR_CERT_SIGNATURE_ALGORITHM_DISABLED
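
The reading of the counters above can be double-checked with a short sketch. The data is copied from this comment; the helper name `codes_by_root` is hypothetical, and the sketch assumes each 'rooterrors' key is "<root fingerprint> <error code>" separated by a single space.

```python
from collections import Counter

LEGIT_ROOT = "4348a0e9444c78cb265e058d5e8944b4d84f9662bd26db257f8934a443c70161"

# Copied from 'rooterrors' above.
ROOT_ERRORS = Counter({
    "0f993c8aef97baaf5687140ed59ad1821bb4afacf0aa9a58b5d57a338a3afbcb -12276": 1,
    "4348a0e9444c78cb265e058d5e8944b4d84f9662bd26db257f8934a443c70161 0": 2822178,
    "687fa451382278fff0c8b11f8d43d576671c6eb2bceab413fb83d965d06d2ff2 -12276": 22,
    "73c176434f1bc6d5adf45b0e76e727287c8de57616c1e6e6141a2b2cbc7d8e4c -12276": 1,
    "c3846bf24b9e93ca64274c0ec67c1ecc5e024ffcacd2d74019350e81fe546ae4 -12276": 1,
    "ff856a2d251dcd88d36656f450126798cfabaade40799c722de4d2b5db36a73a -12276": 1,
})

# Copied from 'total_errors' above.
TOTAL_ERRORS = Counter({
    -16381: 1, -16379: 11, -16378: 1, -12276: 53, -12173: 1,
    -8191: 1, -8179: 299, -8162: 14, -8061: 351, -8016: 28,
})

# Names as listed in this comment.
ERROR_NAMES = {
    -16381: "MOZILLA_PKIX_ERROR_V1_CERT_USED_AS_CA",
    -16379: "MOZILLA_PKIX_ERROR_NOT_YET_VALID_CERTIFICATE",
    -16378: "MOZILLA_PKIX_ERROR_NOT_YET_VALID_ISSUER_CERTIFICATE",
    -12276: "SSL_ERROR_BAD_CERT_DOMAIN",
    -12173: "SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY",
    -8191: "SEC_ERROR_LIBRARY_FAILURE",
    -8179: "SEC_ERROR_UNKNOWN_ISSUER",
    -8162: "SEC_ERROR_EXPIRED_ISSUER_CERTIFICATE",
    -8061: "SEC_ERROR_OCSP_FUTURE_RESPONSE",
    -8016: "SEC_ERROR_CERT_SIGNATURE_ALGORITHM_DISABLED",
}


def codes_by_root(root_errors):
    """Map each root fingerprint to the set of error codes seen with it."""
    by_root = {}
    for key in root_errors:
        fingerprint, code = key.split(" ")
        by_root.setdefault(fingerprint, set()).add(int(code))
    return by_root


by_root = codes_by_root(ROOT_ERRORS)
# Roots other than the legitimate one, with the error codes they appeared with.
other_roots = {fp: codes for fp, codes in by_root.items() if fp != LEGIT_ROOT}
total_error_results = sum(TOTAL_ERRORS.values())
```

With the numbers above, every entry in `other_roots` carries only -12276, the legitimate root carries only 0, and `total_error_results` comes to 760.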

Dana Keeler (she/her) (use needinfo) [:keeler] (on leave)

Reporter

Comment 7

•

7 years ago

Looks like we can close this now.

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → FIXED

BMO Automation

Updated

•

6 years ago

Product: Cloud Services → Cloud Services Graveyard

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)

Tracking

(Not tracked)

People

(Reporter: keeler, Assigned: harter)

Security

(public)
