Make improvements to new-password heuristics for accuracy and performance
Categories
(Toolkit :: Password Manager, enhancement, P1)
Tracking
Release | Tracking | Status |
---|---|---|
firefox-esr68 | --- | unaffected |
firefox74 | --- | unaffected |
firefox75 | --- | unaffected |
firefox76 | --- | fixed |
People
(Reporter: bdanforth, Assigned: bdanforth)
Attachments
(1 file)
After Bug 1595244 landed in Nightly, initial verification and telemetry data (courtesy of Bug 1548878, Bug 1616356 and Bug 1619498) identified some opportunities to improve the new-password heuristics, particularly performance and accuracy in identifying new-password fields on password change forms.
These improvements from the upstream GitHub repo should be incorporated into NewPasswordModel.jsm.
Comment 1•5 years ago
See https://bugzilla.mozilla.org/show_bug.cgi?id=1595244#c19 for the scoop on the latest model.
Comment 2•5 years ago
- Update NewPasswordModel.jsm to use an improved model at 4a2349963d from the upstream GitHub repo (see this comment).
- Add a check in the test_isProbablyANewPasswordField.js unit test to ensure the method returns early (i.e. returns false) when the feature is disabled by the signon.generation.confidenceThreshold pref.
- Fix a couple of existing plain mochitests that were failing their checkAutoCompleteResults assertions due to the updated model triggering the password generation result in the autocomplete popup unexpectedly in the tests' forms. There is now enough signal in those forms for _isProbablyANewPasswordField to return false.
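The early return mentioned in the second item can be sketched as below. This is a minimal illustration in Python, not the code from the patch: the function name, the `DISABLED` sentinel value, and the `>=` comparison are all assumptions made for the example, and `score_field` is a hypothetical stand-in for running the model on a field.

```python
from typing import Callable

# Hypothetical sentinel meaning "feature disabled". The actual pref value
# that disables the feature is not stated in this bug.
DISABLED = -1.0


def is_probably_new_password_field(
    score_field: Callable[[], float], threshold: float
) -> bool:
    """Return True when the model's confidence meets the threshold.

    score_field stands in for running the model on a field; it is only
    called when the feature is enabled.
    """
    if threshold == DISABLED:
        # Early return: skip the model entirely when the feature is off.
        return False
    # Assumption: a confidence equal to the threshold counts as a match.
    return score_field() >= threshold
```

A unit test along the lines described above would then assert both that the result is false and that the model was never invoked when the threshold is set to the disabled value.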
Comment 3•5 years ago
(In reply to Erik Rose [:erik][:erikrose] from bug 1595244 comment #19)
Precision is up 4% and recall 10%:
Testing accuracy per tag: 0.98137 95% CI: (0.96048, 1.00000)
FPR: 0.06818 95% CI: (0.00000, 0.14266)
FNR: 0.00000 95% CI: (0.00000, 0.00000)
Precision: 0.97500 Recall: 1.00000
F1 Score: 0.98734
Hi Erik, thanks for the updated model! We were aiming for a 2-3% FPR to ship (vs. the 6.8% above at a 0.5 threshold), so could you provide a confidence threshold that hits that target at the expense of a higher FNR? (FWIW, we did manual testing on Chrome and found a 1% FPR for the page set we used, which mostly included popular login and registration forms.)
Thanks
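For illustration, one way to pick a threshold that meets an FPR target is to sweep the candidate thresholds over a labelled set of scores and take the lowest one that qualifies. The sketch below is not how the upstream model was tuned; `pick_threshold`, the `>=` comparison, and the toy `SCORED` data are all made up for the example.

```python
from typing import List, Optional, Tuple

# Toy data (hypothetical): (model confidence, is it really a new-password field?)
SCORED: List[Tuple[float, bool]] = [
    (0.02, False), (0.05, False), (0.10, False), (0.10, False), (0.15, False),
    (0.20, False), (0.30, False), (0.55, False), (0.70, False), (0.90, False),
    (0.60, True), (0.80, True), (0.95, True), (0.97, True), (0.99, True),
]


def pick_threshold(
    scored: List[Tuple[float, bool]], target_fpr: float
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, fpr, fnr) for the lowest threshold whose FPR is
    at or below target_fpr, or None if no observed score qualifies."""
    negatives = [c for c, is_pos in scored if not is_pos]
    positives = [c for c, is_pos in scored if is_pos]
    # Only observed scores can change the confusion counts, so try those.
    for threshold in sorted({c for c, _ in scored}):
        false_pos = sum(c >= threshold for c in negatives)
        fpr = false_pos / len(negatives)
        if fpr <= target_fpr:
            false_neg = sum(c < threshold for c in positives)
            return threshold, fpr, false_neg / len(positives)
    return None


RESULT = pick_threshold(SCORED, target_fpr=0.10)
```

Taking the lowest qualifying threshold keeps the FNR as small as the target allows, which is the trade-off being asked about here.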
Comment 4•5 years ago
Sure, a threshold of .75 will do what you want:
Testing accuracy per tag: 0.93168 95% CI: (0.89270, 0.97065)
FPR: 0.02273 95% CI: (0.00000, 0.06676)
FNR: 0.08547 95% CI: (0.03481, 0.13613)
Precision: 0.99074 Recall: 0.91453
F1 Score: 0.95111
Given that the middle of the confidence histogram is sparse, this shouldn't change your real-world performance much and almost certainly not for the worse†. Just beware that we're now fitting to the testing set, so we have no blind evaluation of how we're doing. Perhaps telemetry can fill part of that hole.
†There are only 3 FPs in the testing set, so we're effectively hand-picking a confidence threshold that's high enough to exclude 2 of them. As you might imagine, there's a lot of chance involved in what confidences are needed to succeed on that tiny set of 3.
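As a worked check of the figures in this thread, the reported rates are consistent with a testing set of 161 tags (117 positives, 44 negatives). Those counts are not stated anywhere above. They are inferred from the rates and from the "3 FPs" remark, so treat them as an assumption. The confidence intervals also appear to match a normal-approximation (Wald) interval clipped to [0, 1], which is likewise an inference. A minimal Python sketch:

```python
import math
from typing import Dict, Tuple


def rates(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification rates from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


def wald_ci(p: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation 95% interval for a proportion, clipped to [0, 1]."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Inferred counts (assumption): 117 positives and 44 negatives in the testing set.
AT_050 = rates(tp=117, fp=3, tn=41, fn=0)    # threshold 0.5: all 3 FPs remain
AT_075 = rates(tp=107, fp=1, tn=43, fn=10)   # threshold 0.75: 2 of 3 FPs excluded

FPR_CI_075 = wald_ci(AT_075["fpr"], n=44)    # FPR is measured over the negatives
FNR_CI_075 = wald_ci(AT_075["fnr"], n=117)   # FNR is measured over the positives
```

With only 44 negatives, each false positive moves the FPR by about 2.3 percentage points, which is why the footnote warns that the choice of threshold depends heavily on chance.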
Comment 6•5 years ago
bugherder |