Closed Bug 938117 Opened 11 years ago Closed 11 years ago

investigate increase in certificate attribute check failures since bug 803531

Categories

(Release Engineering :: General, defect)

Platform: x86_64 Linux
Type: defect
Priority: Not set
Severity: critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: bhearsum)

From that bug:
(In reply to Robert Strong [:rstrong] (do not email) from comment #38)
> (In reply to Robert Strong [:rstrong] (do not email) from comment #29)
> > If we'd like to lessen the problems with people using an old stub the
> > bouncer url could be changed though I don't think (fingers crossed) that
> > will be too bad.
> After this landed the number of certificate attribute check failures went
> from an average of 5 per day across all channels with an average of 566,275
> attempted installations per day to an average of 492 per day with an average
> of 700,679 attempted installations per day across all channels for the
> period from 9/27 through 10/28 though the certificate change only affected
> the number of certificate attribute failures on nightly, aurora, and beta.
> 
> On 10/29 the number of certificate attribute check failures increased again.
> For the period from 10/29 through 11/10 the number of certificate attribute
> check failures averaged 43,000 per day out of an average of 726,455
> attempted installations per day and the certificate change affected the
> number of certificate attribute failures across all channels.
> 
> Date   # of failures
> 10/29	    63567
> 10/30	    71345
> 10/31	    57686
> 11/01	    51540
> 11/02	    42275
> 11/03	    39963
> 11/04	    40797
> 11/05	    37893
> 11/06	    37334
> 11/07	    34135
> 11/08	    31376
> 11/09	    27167
> 11/10	    23927
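The averages quoted above can be re-derived from the daily table. A minimal Python sketch, assuming only the figures given in the comment (the variable names are illustrative, not from the bug):

```python
# Daily certificate attribute check failures, 10/29 through 11/10,
# copied from the table in the quoted comment.
failures = [63567, 71345, 57686, 51540, 42275, 39963, 40797,
            37893, 37334, 34135, 31376, 27167, 23927]

# Average attempted installations per day for the same period,
# as stated in the quoted comment.
avg_attempts_per_day = 726_455

avg_failures = sum(failures) / len(failures)
failure_rate = avg_failures / avg_attempts_per_day

print(f"average failures per day: {avg_failures:.0f}")
print(f"share of install attempts: {failure_rate:.1%}")
```

This works out to about 43,000 failures per day, or roughly 5.9% of attempted installations for the period.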
(In reply to Ben Hearsum [:bhearsum] from comment #0)
> From that bug:
> (In reply to Robert Strong [:rstrong] (do not email) from comment #38)
> > (In reply to Robert Strong [:rstrong] (do not email) from comment #29)
> > > If we'd like to lessen the problems with people using an old stub the
> > > bouncer url could be changed though I don't think (fingers crossed) that
> > > will be too bad.
> > After this landed the number of certificate attribute check failures went
> > from an average of 5 per day across all channels with an average of 566,275
> > attempted installations per day to an average of 492 per day with an average
> > of 700,679 attempted installations per day across all channels for the
> > period from 9/27 through 10/28 though the certificate change only affected
> > the number of certificate attribute failures on nightly, aurora, and beta.
> > 
> > On 10/29 the number of certificate attribute check failures increased again.

This is interesting, because 10/29 was the day we shipped Firefox 25.0. Could it be that we have bad fingerprints/issuer riding the trains?
And to be clear, we're talking about certificate attribute check failures during stub installations only, right? (As opposed to during maintenance service or other updates.)
Flags: needinfo?(robert.bugzilla)
(In reply to Ben Hearsum [:bhearsum] from comment #1)
> (In reply to Ben Hearsum [:bhearsum] from comment #0)
> > From that bug:
> > (In reply to Robert Strong [:rstrong] (do not email) from comment #38)
> > > (In reply to Robert Strong [:rstrong] (do not email) from comment #29)
> > > > If we'd like to lessen the problems with people using an old stub the
> > > > bouncer url could be changed though I don't think (fingers crossed) that
> > > > will be too bad.
> > > After this landed the number of certificate attribute check failures went
> > > from an average of 5 per day across all channels with an average of 566,275
> > > attempted installations per day to an average of 492 per day with an average
> > > of 700,679 attempted installations per day across all channels for the
> > > period from 9/27 through 10/28 though the certificate change only affected
> > > the number of certificate attribute failures on nightly, aurora, and beta.
> > > 
> > > On 10/29 the number of certificate attribute check failures increased again.
> 
> This is interesting, because 10/29 was the day we shipped Firefox 25.0.
> Could it be that we have bad fingerprints/issuer riding the trains?

Or perhaps we do just have tons and tons of people with an old stub installer. If we really do think that, I'm not sure there's anything we can do at this point. Comment #0 suggested that we could change the bouncer URL, but I don't think that's possible now that we've shipped stub installers with the old fingerprints and stub installers with the new fingerprints that all point at the same bouncer URL.
(In reply to Ben Hearsum [:bhearsum] from comment #2)
> And to be clear, we're talking about certificate attribute check failures
> during stub installations only, right? (As opposed to during maintenance
> service or other updates.)
Yes, though that is the only data I have access to atm.

I suspect it is old installers, especially since the failure rate has decreased quickly:
10/29	63567
10/30	71345
10/31	57686
11/01	51540
11/02	42275
11/03	39963
11/04	40797
11/05	37893
11/06	37334
11/07	34135
11/08	31376
11/09	27167
11/10	23927
Flags: needinfo?(robert.bugzilla)
(In reply to Robert Strong [:rstrong] (do not email) from comment #4)
> (In reply to Ben Hearsum [:bhearsum] from comment #2)
> > And to be clear, we're talking about certificate attribute check failures
> > during stub installations only, right? (As opposed to during maintenance
> > service or other updates.)
> Yes though that is the only data I have access to atm.
> 
> I suspect it is old installers especially since the failure rate has
> decreased quickly
> 10/29	63567
> 10/30	71345
> 10/31	57686
> 11/01	51540
> 11/02	42275
> 11/03	39963
> 11/04	40797
> 11/05	37893
> 11/06	37334
> 11/07	34135
> 11/08	31376
> 11/09	27167
> 11/10	23927

Whew, that's a relief. Do you think there's anything we can do about this, other than being sure to change the bouncer product the next time we change fingerprints?
For the current issue not much can be done, though it might be possible to update the stub installers to use a new bouncer link and set the links for the old stub installers to point to the old installers.

I'm considering additional solutions for the future and will file a bug after I've had time to consider it.
(In reply to Robert Strong [:rstrong] (do not email) from comment #6)
> For the current issue not much can be done though it might be possible to
> update the stub installers to use a new bouncer link and set the links for
> the old stub installers to point to the old installers.

I was thinking about that too, but then we'll have 3 different stubs:
1) old fingerprints + old bouncer link
2) new fingerprints + old bouncer link
3) new fingerprints + new bouncer link

...and if we flip the old bouncer link back to an installer with the old fingerprints we'll fix users with #1 but break users with #2 - if I'm understanding correctly.
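The trade-off between the three stub groups can be sketched as a small truth table. This is a simplified, hypothetical model in Python: it assumes a stub's check passes only when the installer it downloads is signed with the certificate generation whose fingerprints the stub carries, and the link names (`old-link`, `new-link`) are made up for illustration.

```python
# Hypothetical model of the three stub groups listed above.
# "fingerprints" is the certificate generation the stub accepts;
# "link" is the bouncer link it downloads through (illustrative names).
stubs = {
    "1": {"fingerprints": "old", "link": "old-link"},
    "2": {"fingerprints": "new", "link": "old-link"},
    "3": {"fingerprints": "new", "link": "new-link"},
}


def check_results(link_targets):
    """Return {stub group: True/False} for whether its check would pass.

    link_targets maps each bouncer link to the certificate generation
    ("old" or "new") of the installer that link serves.
    """
    return {
        name: stub["fingerprints"] == link_targets[stub["link"]]
        for name, stub in stubs.items()
    }


# Both links serve an installer signed with the new certificate.
current = check_results({"old-link": "new", "new-link": "new"})

# The old link is flipped back to an installer signed with the old certificate.
flipped = check_results({"old-link": "old", "new-link": "new"})

print("current:", current)
print("flipped:", flipped)
```

Under these assumptions, flipping the old link fixes group #1 and breaks group #2, while group #3 is unaffected either way.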
That is correct, but group #2 may be smaller than group #1, since group #2 has only had a few weeks to exist. It could also be mitigated by having a few days of overlap where all of the groups point to the same download. I was just putting this out as a possible mitigation, though I don't have any way to know whether it would be better than leaving things as they are.
Latest stats including the sum of all certificate errors:
Date	Errors
9/25	923
9/26	873
9/27	1394
9/28	1964
9/29	1664
9/30	1844
10/1	2082
10/2	1850
10/3	2072
10/4	1888
10/5	1744
10/6	1626
10/7	1777
10/8	1719
10/9	1858
10/10	1874
10/11	1690
10/12	1559
10/13	1530
10/14	1674
10/15	1571
10/16	1616
10/17	1826
10/18	1552
10/19	1501
10/20	1393
10/21	1635
10/22	1604
10/23	1649
10/24	1514
10/25	1502
10/26	1494
10/27	1438
10/28	1470
10/29	72347
10/30	98002
10/31	82954
11/1	75472
11/2	64810
11/3	63533
11/4	64747
11/5	62476
11/6	61496
11/7	57107
11/8	53351
11/9	47253
11/10	42640
11/11	47142
11/12	46107
11/13	46474
11/14	45067
11/15	43483
11/16	37491
Thanks for that! It's a relief to see that we plateaued already...
I'm going to check how many of those are actually new users (e.g. not installing on top of an existing install and no pre-existing profile). I suspect these are people that download the stub and periodically run it.
It is around 60% pave-over installs. Perhaps the stale stubs can be accounted for by file sharing sites and organizations. It is hard to determine. :(
I don't think there's anything actionable here at this point...
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Latest stats including all certificate errors. The percentage is the number of certificate errors divided by the total install attempts. Slowly but surely it is self-correcting.
Date   Errors  % of install attempts
9/25     923    0%
9/26     873    0%
9/27    1394    0%
9/28    1964    0%
9/29    1664    0%
9/30    1844    0%
10/1    2082    0%
10/2    1850    0%
10/3    2072    0%
10/4    1888    0%
10/5    1744    0%
10/6    1626    0%
10/7    1777    0%
10/8    1719    0%
10/9    1858    0%
10/10   1874    0%
10/11   1690    0%
10/12   1559    0%
10/13   1530    0%
10/14   1674    0%
10/15   1571    0%
10/16   1616    0%
10/17   1826    0%
10/18   1552    0%
10/19   1501    0%
10/20   1393    0%
10/21   1635    0%
10/22   1604    0%
10/23   1649    0%
10/24   1514    0%
10/25   1502    0%
10/26   1494    0%
10/27   1438    0%
10/28   1470    0%
10/29  72347    9%
10/30  98002   13%
10/31  82954   11%
11/1   75472   11%
11/2   64810   10%
11/3   63533   10%
11/4   64747    9%
11/5   62476    8%
11/6   61496    7%
11/7   57107    7%
11/8   53351    7%
11/9   47253    7%
11/10  42640    7%
11/11  47142    7%
11/12  46107    6%
11/13  46474    6%
11/14  45067    6%
11/15  43483    6%
11/16  37491    5%
11/17  32062    5%
11/18  36248    5%
11/19  35605    5%
11/20  33797    4%
11/21  32845    5%
11/22  32558    5%
11/23  29370    5%
11/24  26981    5%
11/25  30548    4%
11/26  30405    4%
11/27  30178    4%
11/28  28811    4%
11/29  27916    4%
11/30  25960    4%
12/1   24497    4%
12/2   27725    4%
12/3   28007    4%
Latest weekly stats including all certificate errors.
Week      Errors      %
10/06/13   12103  0.24%
10/13/13   11270  0.24%
10/20/13   10791  0.22%
10/27/13  396493  7.97%
11/03/13  409963  7.87%
11/10/13  308404  6.23%
11/17/13  232485  4.68%
11/24/13  200799  4.34%
12/01/13  185236  3.96%
12/08/13  175205  3.78%
12/15/13  166325  3.64%
12/22/13  152883  3.44%
12/29/13  143729  3.39%
01/05/14  158200  3.29%
01/12/14  151997  3.05%
01/19/14  152937  3.00%
01/26/14  144356  2.93%
02/02/14  143969  2.82%
02/09/14  138259  2.76%
02/16/14  137615  2.68%
02/23/14  128806  2.56%
03/02/14  119737  2.48%
03/09/14  113964  2.43%
03/16/14  116595  2.31%
03/23/14  112801  2.20%
03/30/14  107620  2.24%
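As a rough cross-check, the weekly percentages can be inverted to estimate total install attempts. This Python sketch assumes the weekly percentage uses the same definition as the daily stats (certificate errors divided by total install attempts); the function name is illustrative, and the results are approximate because the published percentages are rounded.

```python
def implied_attempts(errors, percent):
    """Estimate total install attempts from an error count and its percentage."""
    return errors / (percent / 100)


# A few rows copied from the weekly table above: (week, errors, percent).
rows = [
    ("10/27/13", 396493, 7.97),
    ("11/03/13", 409963, 7.87),
    ("03/30/14", 107620, 2.24),
]

for week, errors, percent in rows:
    weekly = implied_attempts(errors, percent)
    print(f"{week}: ~{weekly:,.0f} attempts/week, ~{weekly / 7:,.0f} per day")
```

The implied totals come out at roughly five million attempts per week, which is in line with the daily averages of around 700,000 attempted installations quoted at the start of this bug.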
Ben, I analyzed the data further and believe I found an issue with some systems and the new certificate. The error we expected is the certificate attribute check failure, and that has been steadily decreasing over time. I also see a rise in certificate untrusted errors, and those have not been decreasing at anywhere near the same rate.

            Untrusted       Attributes
10/06/13    8041  0.16%     3349  0.07%
10/13/13    7726  0.16%     2900  0.06%
10/20/13    7354  0.15%     2785  0.06%
10/27/13   98644  1.98%   287103  5.77%
11/03/13  150167  2.88%   248665  4.78%
11/10/13  127242  2.57%   173149  3.50%
11/17/13   81912  1.65%   142747  2.88%
11/24/13   76825  1.66%   119272  2.58%
12/01/13   75282  1.61%   105923  2.26%
12/08/13   74587  1.61%    96758  2.09%
12/15/13   73123  1.60%    89724  1.96%
12/22/13   67802  1.53%    82087  1.85%
12/29/13   66586  1.57%    74171  1.75%
01/05/14   78741  1.64%    76485  1.59%
01/12/14   78394  1.57%    70767  1.42%
01/19/14   82109  1.61%    68042  1.34%
01/26/14   77900  1.58%    63939  1.30%
02/02/14   77956  1.52%    63606  1.24%
02/09/14   78110  1.56%    57852  1.16%
02/16/14   78229  1.53%    57226  1.12%
02/23/14   75131  1.49%    51637  1.02%
03/02/14   71938  1.49%    45987  0.95%
03/09/14   68395  1.46%    43660  0.93%
03/16/14   69783  1.38%    45046  0.89%
03/23/14   69947  1.37%    41112  0.80%
03/30/14   66814  1.39%    39144  0.82%
04/06/14   66013  1.38%    36993  0.77%

I'll investigate more in the coming weeks.
Note: the 4/6 data is bogus... I have some placeholder data for that week since today's data is not yet available.
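The difference in how quickly the two error types are falling can be quantified from the table above. A minimal Python sketch: the choice of 12/01/13 as the starting week (after the initial spike) is an assumption made for illustration, and the 04/06/14 row is left out because it is placeholder data.

```python
def relative_decline(start, end):
    """Fractional drop from start to end, e.g. 0.25 means a 25% decline."""
    return (start - end) / start


# Percent of install attempts, copied from the table above.
untrusted = {"12/01/13": 1.61, "03/30/14": 1.39}
attributes = {"12/01/13": 2.26, "03/30/14": 0.82}

untrusted_drop = relative_decline(untrusted["12/01/13"], untrusted["03/30/14"])
attributes_drop = relative_decline(attributes["12/01/13"], attributes["03/30/14"])

print(f"untrusted errors:  {untrusted_drop:.0%} decline")
print(f"attribute errors:  {attributes_drop:.0%} decline")
```

Over that window the attribute check failure rate fell by well over half, while the untrusted error rate fell by less than a fifth.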
Component: General Automation → General