osclientcerts: slow devices can make firefox startup slow
Categories
(Core :: Security: PSM, defect, P2)
Tracking
()
People
(Reporter: bugs, Unassigned)
References
(Blocks 1 open bug)
Details
(Whiteboard: [psm-backlog])
All HTTP sites work fine. All HTTPS ones time out.
Problem first encountered after updating to nightly mid-last week, and since it was a busy week, just switched to stable. Tried a clean profile today, was still failing. Even after manually enabling enterprise roots. Tried downloading clean nightly, still failed.
Tried a few sites with browser console open - nothing useful in there apart from all the telemetry etc https timing out.
Comment 2•4 years ago
|
||
Not sure exactly how to diagnose such issue. JC, do you know of some NSS/PSM changes that happened in the past weeks? What kind of logs would we need to figure out what's wrong?
Comment 3•4 years ago
|
||
Thanks for the needinfo Valentin -
nemo, just to confirm, this only happens with Nightly 82 and 83? Does it happen with beta (or release) (and could you let me know what version(s) you tested)?
We might need a wireshark capture and key (from SSLKEYLOGFILE
) - is that something you'd be able to do?
Release is fine. I haven't tested beta, but it is (probably) fine given how recent the error is. Request for Wireshark is not new (corporate mitm was broken in firefox back in january too) I'll try to get on that when I get a few free minutes.
Hm... and a key? I guess it's probably fine... I need to look up what we did last time, if I mailed the mozilla engineer privately. Even if the contents are probably not sensitive I'd rather not attach to the bug if possible...
Comment 6•4 years ago
|
||
Thanks, nemo, I remember now. You can email me the key (jc@mozilla.com), or mark the key as private
here in the bug and then only core-security-release
team can get to it, which is a small number of trusted Mozilla folk. :)
Hey. Haven't had time to do a full bisect or wireshark, but I needed to recover this profile, and since -offline no longer functions with firefox, and the broken HTTPS causes firefox to hang to the point of unresponsiveness on startup, I grabbed a few nightlies.
2020-09-04, 2020-08-23, 2020-08-22, 2020-08-21 (I was a bit optimistic about that week ☺ ) all hang.
2020-08-11 works.
When I have some time I'll download a few more to bisect down that 2020-08-11 - 2020-08-21 range (and do the wireshark)
Aw crud... sorry if I just spawned a wild goose chase. 2020-08-11 is hanging too!
I'd swear it worked. Briefly.
Ok. Will try some really old nightlies. Maybe it's something they changed network config-wise.
Ugh. Guess I'd better try beta too.
Comment 9•4 years ago
|
||
Hi nemo, you can use mozregression do to the bisection for you.
Just point it to your profile, and say good/bad for each version and it'll track down the issue to the commit level. It's very convenient.
A few more questions: Do you use a proxy or VPN?
Could you also check in about:config
if you have any doh-rollout
prefs and also attach the output of about:support
? Thanks!
Reporter | ||
Comment 10•4 years ago
|
||
Normally I would use mozregression, but unfortunately this is on a Windows 10 VMware View I VPN into to get work done.
WRT DOH, it is definitely not DOH related since it happens even on internal dev machines on the .local domain when SSL is enabled.
But in any case it happened even with DOH disabled in the clean profile I created, and when accessing sites by IP.
The other weird thing about this is I thought they did not MitM internal dev machines...
BTW, the removal of support for Offline mode... really really annoying.
In the brand new profile that is failing, I didn't see anything in particular in about:support that might interest you. AV is all Windows standard... the NS security stuff looks totally normal - NSPR 4.29beta, NSS 3.57 beta
Since thankfully about:config isn't being blocked yet on desktop (my poor stable android is now on fdroid), I force-enabled enterprise roots (I'm guessing the hanging was preventing it from being set) and restarted. It made no difference whatsoever.
Reporter | ||
Comment 11•4 years ago
|
||
Oh... and just to be clear, on the Windows 10 VM I'm VPN'd into, connecting OUT from firefox proceeds normally apart from the corporate MitM that sometimes (but not always it seems?) somerwhere in the routing that inserts itself into connections to remote sites. There is no other VPNing or proxying, and Firefox Stable works fine so long as enterprise certs are enabled.
Reporter | ||
Comment 12•4 years ago
|
||
Ohhhh. I think I know what happened with the 2020-08-11 that worked, then failed. I neglected to disable autoupdate. Gotta make sure I do that before loading the profile ☺ ☺
I'm going to restore 2020-08-11, disable autoupdate in testing profiles, and verify, but I'm going to stick with the breakage being somewhere in that range. I'd just verified 2020-08-01 was working and was like.... waiiiiitasec, that green ↑ in the corner... I bet that's why 2020-08-11 failed the 2nd time I started it 😃
We still need either a regression range or a packet trace to diagnose this further.
Reporter | ||
Comment 14•4 years ago
|
||
Well, you have a 10 day range at least between 2020-08-11 (works) and 2020-08-21 (fails) - I'll try to set aside some time to download and test few more builds tomorrow to get that down to 1 day and maybe a wireshark. It's been crazy busy lately both at work and with online school.
Reporter | ||
Comment 15•4 years ago
|
||
Ok! Spent a boring half hour at end of work session today downloading builds... Nightly from 2020-08-17 with Build ID 20200817214602 WORKS Nightly from 2020-08-18 with Build ID ?????????????? (locks up, can't get to about:support) FAILS. Next time I have some free time, wireshark... Oh, and since a couple of times I neglected to disable autoupdate while setting up profiles, I can confirm that latest nightly still fails.
Reporter | ||
Comment 16•4 years ago
|
||
Heh. This morning when I connected to the work Windows 10 VM it was finally semi-responsive, so I typed in about:support - which promptly hung again (does everything require an HTTPs connection now?). After checking back in half an hour it had unlocked so I was finally able to get a build ID.
Nightly from 2020-08-17 with Build ID 20200817214602 WORKS
Nightly from 2020-08-18 with Build ID 20200818092452 FAILS
Comment 17•4 years ago
|
||
Thank you for chasing this down!
In one of the affected Nightlies, does the behavior change if you go to about:config
, change security.osclientcerts.autoload
to false, and restart the browser?
Reporter | ||
Comment 19•4 years ago
|
||
So... sounds like you have a theory and fact that everything is working now confirms it. Is this flag new? Does this still mean you need wireshark from me? Or... I guess... possibly more detail on the OS certs? Or are we good.
Comment 20•4 years ago
|
||
The only NSS/PSM change that landed on the 17th is https://hg.mozilla.org/mozilla-central/rev/7e5f806e44a8d35de67bf4ea3e9060836bcf440d. osclientcerts is only enabled by default on Nightly, which also aligns with your report.
I don't think we need a PCAP any more. Dana probably has a better idea of what, if any, further information would be useful from your environment.
Logs from osclientcerts would help. Can you follow the instructions in bug 1629002 comment 16 and attach the output here? Thanks!
Reporter | ||
Comment 22•4 years ago
|
||
Just FYI, I turned over a log to J.C. on Tuesday. I hope I did it correctly.
Comment 23•4 years ago
|
||
You did, sorry, I don't think Dana had a chance to look them over yet. I definitely haven't :(
Reporter | ||
Comment 24•4 years ago
|
||
No worries, just wanted to be sure you got it. And also to clear out the needinfo nag message 😝
Reporter | ||
Comment 25•4 years ago
|
||
Oh, and just so I understand what we have to look forward to... so if I read https://wiki.mozilla.org/Release_Management/Calendar correctly, on 2020-10-20 all the people on release firefox over here will have completely non-functional, frozen browsers?
Unfortunately the log doesn't indicate what may be wrong. You say Firefox hangs - can you use Process Hacker [0] to see what Firefox is doing when it hangs?
When you run Process Hacker, you should see Firefox in the list of running programs. If you double-click on the Firefox process that is hanging, you should see a window with a "Threads" tab. On that tab, if you select the thread that's hanging and double-click it, another window should open up with a stack trace.
Also, this feature is only enabled in Nightly, so this won't be released as-is.
Reporter | ||
Comment 27•4 years ago
|
||
The symptoms (to me) suggest it's just blocking waiting on the remote HTTPS request until it times out. Loading firefox, loading about:support all take a very long time to finally become functional. I assume it's all the background HTTPS requests firefox does when phoning home for statistics, checking for updates, addons to inactivate, whatever. Could be wrong though!
The main UI seems to be behaving identically to what happens if I load an HTTPS site. The tab just sits there for a long long time and then finally fails to connect.
I suspect if I could figure out how to get offline mode to work these days there'd be no more UI hangs.
I will try to see if there's any indication of what Firefox is doing with the tool described. I bet if I breakpoint it'll be in something due to establishing an SSL connection..
Fortunately the workaround described above perfectly fixes the bug. Still worried about others once this makes it to release though. I rather doubt my VM is that different from all the others around here.
Reporter | ||
Comment 28•4 years ago
|
||
Oh, my apologies. That last sentence was pretty important. Thanks, I was a bit worried.
Ok. Will try process hacker tomorrow when I sign back onto the work network.
Reporter | ||
Comment 29•4 years ago
|
||
Hm... looks like they won't let me install process hacker. Gets yanked immediately by, I guess, the antivirus, and don't seem to be allowed to run it either. Getting permissions for this sort of thing often fails and takes forever anyway. I kinda feel like you just want be to do the equivalent of gdb attach to process, and backtrace at attach point. Is there some other way to do that in windows?
Reporter | ||
Comment 30•4 years ago
|
||
DDGing around, I see there is WinDbg - gonna try installing that.
Reporter | ||
Comment 31•4 years ago
|
||
Found a promising thread, some sort of hang in
KERNELBASE!WaitForMultipleObjects
WinSCard!_SendSCardIOCTL
WinSCard!_SendSCardConnect
Soooo, smartcard related. Odd since smartcard seems to work fine in other apps (Outlook, smartcard management).
But I will send all the threads over to you guys.
(In reply to nemo from comment #31)
Found a promising thread, some sort of hang in
KERNELBASE!WaitForMultipleObjects
WinSCard!_SendSCardIOCTL
WinSCard!_SendSCardConnect
Thanks! Can you include the full stack?
Reporter | ||
Comment 33•4 years ago
|
||
I emailed it to JC, same as the other debug info you'd requested.
Reporter | ||
Comment 34•4 years ago
|
||
Oh, BTW, I can also email text from the smartcard diagnostic tool if that helps at all.
I also reverified that my cert is accessible in the standard UI but firefox still locks up.
That might be helpful. You can also email me directly instead of going through J.C.
Thanks for the logs. We're determining how to address this.
Reporter | ||
Comment 37•4 years ago
|
||
Hm. I'm not 100% certain the new title is correct. Once firefox stopped hanging on startup, I was able to trigger a hang by accessing about:support or any https site (http sites were fine).
So it's clearly not just startup. And... I'm not sure it is just a slow device since I'd literally wait for half an hour before getting a usable URL bar. The smartcard is slow, it is not that slow.
But maybe there are several related problems with osclientcerts or recent ssl changes. In any case, turning it off fixed all my issues.
Reporter | ||
Comment 38•4 years ago
|
||
Ooh... Or maybe every single SSL operation was rescanning the card instead of caching?
That sure would explain it 😃 So it's not just one slow-ish card action but like.. hundreds of them. Maybe even stacking.
Bug 1669414 should have improved things somewhat. How does the latest Nightly work for you?
Reporter | ||
Comment 40•3 years ago
|
||
Nightly firefox in a clean profile no longer hangs. security.osclientcerts.autoload set to true.
Don't see any blue flickering lights on smart card to indicate read/access.
So seems to be fixed!
Is there anything else I should check? I admit I didn't really read the other bug too carefully.
Great - thanks! If normal use of the browser works for you, I'm satisfied this bug is fixed. If you do see any issues related to client certificates, feel free to file a new bug in this component.
Description
•