Closed
Bug 510627
Opened 15 years ago
Closed 15 years ago
Windows CE hanging on some SSL sites
Categories
(Core :: Networking, defect)
Tracking
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
status1.9.2 | --- | beta1-fixed |
People
(Reporter: Dolske, Assigned: Dolske)
References
Details
(Keywords: verified1.9.2, Whiteboard: [nv])
Attachments
(2 files)
4.91 KB, text/plain
787 bytes, patch (vlad: review+, Biesinger: superreview+)
We've found a number of bugs where the Tegra device locks up hard (i.e., the mouse pointer freezes and the kernel debugger can't connect to it) after visiting an SSL site. Not every SSL site does this, however.
Assignee
Updated•15 years ago
Whiteboard: [nv]
Assignee
Comment 5•15 years ago
I previously sent some info about this out via email; appending it here for posterity.
I took a spin through the existing hang bugs and verified that the workaround mentioned at the end of those emails seems to make all the sites work fine. So it's most likely a common root cause, and I've duped them all to this bug.
Assignee
Updated•15 years ago
Severity: normal → critical
Priority: P1 → --
Assignee
Comment 7•15 years ago
Kalle noticed that the kernel's KITL thread has a priority level of 131, while our thread was setting a priority level of 116 (0 is the highest priority), so whenever our thread was runnable, KITL could not run. This turned out to be the reason the kernel debugger wasn't working...
If I change the CeSetThreadPriority() call to use 132 instead of 116 (so it's 1 priority level lower than the KITL thread), I can still reproduce the hang, but now I can use the kernel debugger to poke at the device. Yay, progress!
[I'm not having any luck getting symbols or breakpoints working, though. This is partly because I almost always break into some point in the kernel (for which I don't have symbols), but symbols for my Mozilla build also seem to be missing. The debugger seems to be very finicky, so I'll try again with a full debug build; maybe that will help.]
Looking at the system process/thread info is revealing, though. The high-priority (132) thread normally has a very low CPU user-time total (i.e., it doesn't actually do much), while the main Firefox thread (where we actually do almost everything) accumulates CPU user time as it's used. But in the hang conditions, the high-priority thread accumulates CPU time rapidly while all the other threads' counters stay frozen.
So this very much looks like one thread spinning on the CPU and starving everything else. There seems to be a potential loop between nsSSLIOLayerPoll() and nsSSLThread::requestPoll(): the last line of requestPoll() actually calls nsSSLIOLayerPoll() again, so it's possible the code relies on something else being scheduled in between.
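As an illustration only: the symptoms above (KITL unreachable at priority 116, the debugger usable but the hang persisting at 132, one thread's CPU counter climbing while the others freeze) match what a preemptive strict-priority scheduler does when its most urgent runnable thread never blocks. The toy simulation below is a sketch under stated assumptions. One tick is one scheduling quantum, the busy-polling thread is always runnable, KITL gets one tick of work every 10 ticks, and the main thread sits at 251 (assumed here to be the normal Windows CE priority). SimThread, simulate and scenario are hypothetical names, and this does not model Windows CE's actual scheduler internals.

```python
from dataclasses import dataclass


@dataclass
class SimThread:
    """Toy thread for a strict-priority scheduler (CE-style numbering: lower = more urgent)."""
    name: str
    priority: int        # 0 is the highest priority, 255 the lowest
    period: int = 0      # 0: always runnable (never blocks); N: one tick of work arrives every N ticks
    pending: int = 0     # queued ticks of work for periodic threads
    cpu_ticks: int = 0   # accumulated CPU time, like the per-thread counters in the thread view

    def runnable(self) -> bool:
        return self.period == 0 or self.pending > 0


def simulate(threads: list[SimThread], ticks: int) -> dict[str, int]:
    """Each tick, the most urgent runnable priority level wins; equal levels take turns."""
    turn: dict[int, int] = {}  # round-robin position per priority level
    for t in range(ticks):
        for th in threads:
            if th.period and t % th.period == 0:
                th.pending += 1  # new work arrives (e.g. a debugger request)
        ready = [th for th in threads if th.runnable()]
        if not ready:
            continue  # idle tick
        best = min(th.priority for th in ready)
        level = [th for th in ready if th.priority == best]
        k = turn.get(best, 0)
        chosen = level[k % len(level)]
        turn[best] = k + 1
        chosen.cpu_ticks += 1
        if chosen.period:
            chosen.pending -= 1
    return {th.name: th.cpu_ticks for th in threads}


def scenario(ssl_priority: int) -> dict[str, int]:
    return simulate(
        [
            SimThread("ssl", ssl_priority),     # busy-polling, never blocks
            SimThread("kitl", 131, period=10),  # kernel debugger transport
            SimThread("main", 251),             # assumed normal priority, always has work
        ],
        ticks=100,
    )


print(scenario(116))  # {'ssl': 100, 'kitl': 0, 'main': 0}
print(scenario(132))  # {'ssl': 90, 'kitl': 10, 'main': 0}
print(scenario(251))  # {'ssl': 45, 'kitl': 10, 'main': 45}
```

The third call is a what-if: with the polling thread at the same level as the main thread, round-robin lets both make progress even though the poll loop still burns CPU.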
Assignee
Comment 8•15 years ago
Kaie, are you familiar with this code? Lots of history in this bug, but the last paragraph of comment 7 seems to be the key. It certainly looks suspicious, but perhaps I'm missing something.
Assignee
Comment 9•15 years ago
Assignee: nobody → dolske
Attachment #394972 - Flags: superreview?(cbiesinger)
Attachment #394972 - Flags: review?(vladimir)
Assignee
Updated•15 years ago
Component: General → Networking
QA Contact: general → networking
Comment on attachment 394972
Patch v.1 (remove CeSetThreadPriority)
Get rid of the #include <windows.h> at the top of the file as well; it got added for this, iirc.
Attachment #394972 - Flags: review?(vladimir) → review+
Updated•15 years ago
Attachment #394972 - Flags: superreview?(cbiesinger) → superreview+
Comment 11•15 years ago
The SSL thread is used to decouple Mozilla's single-threaded network engine from the SSL read/write calls (which may sometimes block on a UI prompt).
The "request..." functions are called by the main/network thread to decide which I/O requests are OK to send to the decoupled SSL thread.
A pollable event is used by the SSL thread to wake up the network thread when some previously requested I/O is ready to be fetched.
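A minimal sketch of the pollable-event idea, in Python for brevity: the waiting (network) thread blocks in select() on a wakeup handle instead of spinning, and the worker (SSL) thread makes that handle readable once its result is ready. Everything here is hypothetical and only illustrates the pattern. The wakeup handle is assumed to be a connected socket pair; in Mozilla the real object is presumably an NSPR pollable event (PR_NewPollableEvent), whose internals this does not reproduce.

```python
import select
import socket
import threading
import time


class PollableEvent:
    """Hypothetical wakeup handle that select() can watch alongside sockets."""

    def __init__(self) -> None:
        # A connected socket pair: writing to one end makes the other end readable.
        self._r, self._w = socket.socketpair()
        self._r.setblocking(False)

    def fileno(self) -> int:
        return self._r.fileno()  # lets select() treat this object like a socket

    def set(self) -> None:
        self._w.send(b"x")  # wake whoever is blocked in select()

    def reset(self) -> None:
        try:
            while self._r.recv(64):  # drain pending wakeup bytes
                pass
        except BlockingIOError:
            pass  # nothing left to read

    def close(self) -> None:
        self._r.close()
        self._w.close()


def wait_for_ssl_result(work_seconds: float = 0.05) -> tuple[str, int]:
    """Network-thread side: sleep in select() until the SSL thread signals."""
    event = PollableEvent()
    result: list[str] = []

    def ssl_thread() -> None:
        time.sleep(work_seconds)         # stand-in for a blocking SSL read/write
        result.append("decrypted data")
        event.set()                      # the requested I/O is ready to be fetched

    worker = threading.Thread(target=ssl_thread)
    worker.start()
    wakeups = 0
    while not result:
        # Blocks without consuming CPU until the event fires (or 1 s passes).
        readable, _, _ = select.select([event], [], [], 1.0)
        if readable:
            wakeups += 1
            event.reset()
    worker.join()
    event.close()
    return result[0], wakeups
```

Calling `wait_for_ssl_result()` returns `("decrypted data", n)`, where `n` is the number of wakeups observed (normally 1).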
Important question:
On your platform, did the SSL thread succeed in creating a "pollable event"?
In other words, is nsSSLIOLayerHelpers::mSharedPollableEvent null or non-null?
On platforms where we fail to get a pollable event, we need to live with a busy loop while waiting for I/O to complete, but we use sleep calls to reduce that effect. See want_sleep_and_wakeup_on_any_socket_activity.
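When no pollable event is available, the fallback described above amounts to checking for completion, sleeping briefly, and checking again. A minimal sketch, assuming a hypothetical is_ready() callback and a fixed 10 ms nap; it does not reproduce Mozilla's actual logic around want_sleep_and_wakeup_on_any_socket_activity. The sleep is what matters: it blocks the waiting thread, so under a priority scheduler the threads that will eventually complete the I/O get CPU time, whereas a pure spin at elevated priority would starve them.

```python
import threading
import time
from typing import Callable


def wait_without_pollable_event(
    is_ready: Callable[[], bool], timeout: float, nap: float = 0.01
) -> tuple[bool, int]:
    """Poll `is_ready`, sleeping `nap` seconds between checks; return (ready, checks)."""
    deadline = time.monotonic() + timeout
    checks = 0
    while True:
        checks += 1
        if is_ready():
            return True, checks
        if time.monotonic() >= deadline:
            return False, checks
        time.sleep(nap)  # give up the CPU instead of spinning


# Usage: another thread completes the "I/O" after about 50 ms.
done = threading.Event()
threading.Timer(0.05, done.set).start()
ready, checks = wait_without_pollable_event(done.is_set, timeout=2.0)
print(ready, checks)  # ready should be True after only a few checks
```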
You said:
> the last line of requestPoll() is actually
> calling nsSSLIOLayerPoll() again
Yes, but only in some limited scenarios.
In most scenarios the poll call should not be reached; instead, requestPoll will "return early".
Question 2:
Can you test? In your busy loop, does requestPoll call poll (often)?
Assignee
Comment 12•15 years ago
Pushed Patch v.1: http://hg.mozilla.org/mozilla-central/rev/30a01dc450d7
(leaving open for the moment)
Flags: blocking1.9.2+
Assignee
Comment 13•15 years ago
Keywords: fixed1.9.2
Comment 14•15 years ago
Using Mozilla/5.0 (Windows; U; WindowsCE 6.0; en-US; rv:1.9.2a2pre) Gecko/20090827 Namoroka/3.6a2pre, as well as yesterday's build, I visited a variety of SSL sites and encountered no issues.
Assignee
Comment 15•15 years ago
I think we can just call this fixed. Comment 11 indicates that busy-polling can be normal, so given the high priority we were assigning to that thread, nothing else would be able to run (most likely including the network stack). Good enough for me.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Updated•15 years ago
status1.9.2: --- → beta1-fixed
Keywords: fixed1.9.2
Updated•15 years ago
Keywords: verified1.9.2