Closed Bug 705755 Opened 13 years ago Closed 7 years ago

Reintroduce handling of SSL short write after SSL thread removal

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla57

Tracking Flags:

Tracking

Status

firefox-esr52

---

wontfix

firefox55

---

wontfix

firefox56

---

wontfix

firefox57

---

fixed

People

(Reporter: briansmith, Assigned: mayhemer)

References

Details

(Keywords: regression, Whiteboard: [psm-assigned])

Attachments

(1 file, 3 obsolete files)

v1 13 years ago Honza Bambas (:mayhemer) 5.43 KB, patch	KaiE : review-	Details \| Diff \| Splinter Review
v1 (merged) 7 years ago Honza Bambas (:mayhemer) 5.51 KB, patch	keeler : review+	Details \| Diff \| Splinter Review
v1.1 7 years ago Honza Bambas (:mayhemer) 7.73 KB, patch	mayhemer : review+	Details \| Diff \| Splinter Review
v1.1 7 years ago Honza Bambas (:mayhemer) 7.81 KB, patch	mayhemer : review+	Details \| Diff \| Splinter Review

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Reporter

Description

•

13 years ago

+++ This bug was initially created as a clone of Bug #674147 +++ (In reply to Honza Bambas (:mayhemer) from Bug 674147 Comment 37) > You are regressing bug 378629: > https://bugzilla.mozilla.org/show_bug.cgi?id=356470#c54 > https://bugzilla.mozilla.org/show_bug.cgi?id=378629#c11 > > PSMSend must watch for libssl sending just request - 1 amount of data (= a > "short write"). In that case, the simplest solution is IMO that we have to > limit next data send to just 1 byte, simply explained in a pseudo code: > > PSMSend(buffer, count) > { > static flag = false; > if (flag) > count = 1; > flag = false; > > written = lower->send(buffer, count) > if (written == count - 1) > flag = true; > } > > We can do this in a separate bug, but ASAP. The final solution is up to > you, but I'd like to review and test it. This is marked blocking bug 674147 because that is how we show "this bug is caused by that bug" but I still intend to land the patch in bug 674147 first.

Honza Bambas (:mayhemer)

Assignee

Comment 1

•

13 years ago

I renamed the bug to be more descriptive and also added refs to other bugs that gets affected by bug 674147. I agree to land that bug first, this one is not a critical issue.

Blocks: 356470, 378629

Summary: remove the SSL thread (nsSSLThread) → Reintroduce handling of SSL short write after SSL thread removal

Honza Bambas (:mayhemer)

Assignee

Updated

•

13 years ago

No longer blocks: 674147

Depends on: 674147

Honza Bambas (:mayhemer)

Assignee

Comment 2

•

13 years ago

-> me Stealing this one.

Assignee: nobody → honzab.moz

Status: NEW → ASSIGNED

Honza Bambas (:mayhemer)

Assignee

Comment 3

•

13 years ago

Attached patch v1 (obsolete) — Details — Splinter Review

- implements what the SSL thread had been implementing before its removal - adds also check for 16383 bytes of written data ; this often happens even for larger buffers like 32768, w/o it we jitter again with 2 bytes being later written separately - the 16383 const is disputable Added Nelson for feedback to get more info on how all ssl lib indicates the short write. To have what we had before SSL thread removal we may just remove the 16383 const checking and we are done.

Attachment #582708 - Flags: review?(kaie)

Attachment #582708 - Flags: feedback?(nelson)

Ludovic Hirlimann [:Usul]

Comment 4

•

13 years ago

Aren't we short on time for reviews on these for ff11 ?

Kai Engert [:KaiE:]

Comment 5

•

13 years ago

I agree with the patch in general, just some small comments: +/* Secure write sometimes returns this amount of bytes written even for larger + sizes of the output buffer, consider this as indication of a short write */ +static const PRInt32 kShortWrite16k = 16383; (a) Please explain (in the comment) where this number comes from. I had to research myself. I believe this is based on libssl's constant #define MAX_FRAGMENT_LENGTH 16384 Maybe add this comment: // Value is based on libssl's maximum buffer size (in a private header): // MAX_FRAGMENT_LENGTH - 1 // TODO: Query this value once at runtime (changes in NSS required) (b) Nit, optional: Given you only use this constant once, why not move it closer to the place where you use it? Makes the explanation easier to find when reading the code. (c) + // Value of the last byte pending from the SSL short write that needs + // to be passed to subsequent calls to send to perform the flush. + char mShortWritePendingByte; Please change that to "unsigned char", because that's what you use everywhere else. (d) + if (socketInfo->IsShortWritePending()) { + // We got "SSL short write" last time, try to flush the pending byte. + buf = socketInfo->GetShortWritePendingByteRef(); + amount = 1; + ### + PR_LOG(gPIPNSSLog, PR_LOG_DEBUG, ("[%p] pushing 1 byte after SSL short write", + fd)); + } + If I understand correctly, if we reach position ###, the following must be true: - the function was called with: amount >= 1 - this expression is true: *(const unsigned char*) pending-ref == *buf-given-to-function[0] I propose to add an assertion, that way you don't need to change the "buf" pointer. if (socketInfo->IsShortWritePending()) { // We got "SSL short write" last time, try to flush the pending byte. NS_ASSERTION(amount >= 1 && buf[0] == *(const unsigned char*) socketInfo->GetShortWritePendingByteRef(), "unexpected buffer after short write"); amount = 1; PR_LOG(gPIPNSSLog, PR_LOG_DEBUG, ("[%p] pushing 1 byte after SSL short write", fd)); } (e) + // SSL Short Write handling. + // + // True when SSL layer has indicated an "SSL short write", i.e. need + // to call on send one or more times to push all pending data to write. + bool mIsShortWritePending; + // These are only valid iff mIsShortWritePending is true. + // + // Value of the last byte pending from the SSL short write that needs + // to be passed to subsequent calls to send to perform the flush. + char mShortWritePendingByte; + // Original amount of data the upper layer has requested to write to + // return after the successful flush. + PRInt32 mShortWriteOriginalAmount; In this block it's difficult to see which variables you refer to when saying "These are only valid iff mIsShortWritePending is true." I'd align the block clearer, maybe like this: // True when SSL layer has indicated an "SSL short write", i.e. need // to call on send one or more times to push all pending data to write. bool mIsShortWritePending; // Value of the last byte pending from the SSL short write that needs // to be passed to subsequent calls to send to perform the flush. // Only valid if mIsShortWritePending unsigned char mShortWritePendingByte; // Original amount of data the upper layer has requested to write to // return after the successful flush. // Only valid if mIsShortWritePending PRInt32 mShortWriteOriginalAmount;

Kai Engert [:KaiE:]

Comment 6

•

13 years ago

Please run some quick local testing with a debug build, after you do (d), and confirm you don't see the assertion. If you address my proposals, r=kaie No second round of review necessary, but please attach the new patch for reference.

Kai Engert [:KaiE:]

Comment 7

•

13 years ago

Comment on attachment 582708 [details] [diff] [review] v1 r- If you address my small comments, you may mark r=kaie on your new patch.

Attachment #582708 - Flags: review?(kaie) → review-

Kai Engert [:KaiE:]

Comment 8

•

13 years ago

> r- because of the char/unsigned char inconsistence.

Brian Smith (:briansmith, :bsmith, use NEEDINFO?)

Reporter

Updated

•

13 years ago

Blocks: 674147

No longer depends on: 674147

Honza Bambas (:mayhemer)

Assignee

Comment 9

•

11 years ago

We may want to check this is actually still needed.

Status: ASSIGNED → NEW

Honza Bambas (:mayhemer)

Assignee

Comment 10

•

9 years ago

David, can you or someone from your team please verify this is still an issue and should be fixed?

Assignee: honzab.moz → nobody

Flags: needinfo?(dkeeler)

Dana Keeler (she/her) (use needinfo) [:keeler]

Comment 11

•

9 years ago

Sure, I can investigate this.

Flags: needinfo?(dkeeler)

Dana Keeler (she/her) (use needinfo) [:keeler]

Updated

•

9 years ago

Whiteboard: [psm-backlog]

Dana Keeler (she/her) (use needinfo) [:keeler]

Updated

•

8 years ago

Attachment #582708 - Flags: feedback?(nelson)

Dana Keeler (she/her) (use needinfo) [:keeler]

Updated

•

8 years ago

Priority: -- → P3

Honza Bambas (:mayhemer)

Assignee

Updated

•

7 years ago

Comment 12

•

7 years ago

"perf" key word?

Honza Bambas (:mayhemer)

Assignee

Updated

•

7 years ago

Assignee: nobody → honzab.moz

Blocks: 1382886

Status: NEW → ASSIGNED

Honza Bambas (:mayhemer)

Assignee

Comment 13

•

7 years ago

Attached patch v1 (merged) (obsolete) — Details — Splinter Review

I addressed all the Kai's 5 years old comments where I believed it was necessary. I can't trigger the code path tho, so hard to say if this breaks something or actually fixes something.

Attachment #582708 - Attachment is obsolete: true

Attachment #8900871 - Flags: review?(dkeeler)

Honza Bambas (:mayhemer)

Assignee

Updated

•

7 years ago

Whiteboard: [psm-backlog] → [psm-active]

Honza Bambas (:mayhemer)

Assignee

Updated

•

7 years ago

Target Milestone: mozilla11 → ---

Dana Keeler (she/her) (use needinfo) [:keeler]

Comment 14

•

7 years ago

Comment on attachment 8900871 [details] [diff] [review] v1 (merged) Review of attachment 8900871 [details] [diff] [review]: ----------------------------------------------------------------- Cool. This looks good. r=me with comments addressed (particularly the assertion failure and the documentation). I managed to reproduce the short write situation by running a server locally and using tc to limit the bandwidth and set a delay to it. It looks like Firefox can get into an inefficient state where it will send a large packet followed by a small packet followed by a large packet, etc.. This patch addresses the issue and makes the packet size much more uniform (and large). Unfortunately, it seems this would be hard to write a test for. Maybe we could add this as a QA smoketest task? Or I think we have some talos tests that simulate network effects that should be able to get Firefox in this state... Anyway, I think we should elaborate on the documentation in this patch. Maybe something like this: NSS indicates that it can't write all requested data (due to network congestion, for example) by returning either one less than the amount of data requested or 16383, if the requested amount is greater than 16384. We refer to this as a "short write". If we simply returned the amount that NSS did write, the layer above us would then call PSMSend with a very small amount of data (often 1). This is inefficient and can lead to alternating between sending large packets and very small packets. To prevent this, we alert the layer calling us that the operation would block and that it should be retried later, with the same data. When it does, we tell NSS to write the remaining byte it didn't write in the previous call. We then return the total number of bytes written, which is the number that caused the short write plus the additional byte we just wrote out. ::: security/manager/ssl/nsNSSIOLayer.cpp @@ +1483,5 @@ > #endif > > + if (socketInfo->IsShortWritePending() && amount > 0) { > + // We got "SSL short write" last time, try to flush the pending byte. > + MOZ_DIAGNOSTIC_ASSERT(*static_cast<unsigned char const*>(buf) == *socketInfo->GetShortWritePendingByteRef(), This assertion fails for me. If I'm understanding correctly, the higher layer is retrying the original write, right? If so, buf wouldn't be the byte(s) that didn't get written - buf would be the original data, in which case buf[mShortWriteOriginalAmount - 1] should be mShortWritePendingByte, right? In fact, if we wanted to give ourselves more confidence that the right thing was happening, maybe we could save the original buf and compare it to the new buf (probably in debug-only builds, though). @@ +1497,4 @@ > int32_t bytesWritten = fd->lower->methods->send(fd->lower, buf, amount, > flags, timeout); > > + /* Secure write sometimes returns this amount of bytes written even for larger nit: //-style comments, please @@ +1501,5 @@ > + sizes of the output buffer, consider this as indication of a short write. > + > + Value is based on libssl's maximum buffer size (in a private header): > + MAX_FRAGMENT_LENGTH - 1 > + TODO: Query this value once at runtime (changes in NSS required) */ Let's file a bug in NSS :: Libraries to make this API available and reference it here. ::: security/manager/ssl/nsNSSIOLayer.h @@ +175,5 @@ > + // True when SSL layer has indicated an "SSL short write", i.e. need > + // to call on send one or more times to push all pending data to write. > + bool mIsShortWritePending; > + > + // These are only valid iff mIsShortWritePending is true. nit: "only valid if and only if" is a bit redundant :)

Attachment #8900871 - Flags: review?(dkeeler) → review+

Dana Keeler (she/her) (use needinfo) [:keeler]

Updated

•

7 years ago

Priority: P3 → P1

Whiteboard: [psm-active] → [psm-assigned]

Honza Bambas (:mayhemer)

Assignee

Comment 15

•

7 years ago

Attached patch v1.1 (obsolete) — Details — Splinter Review

https://treeherder.mozilla.org/#/jobs?repo=try&revision=165fe0f4d2460cbd8e6ab320b1f549b391e8d65d

Attachment #8900871 - Attachment is obsolete: true

Attachment #8902272 - Flags: review+

Honza Bambas (:mayhemer)

Assignee

Comment 16

•

7 years ago

Attached patch v1.1 — Details — Splinter Review

(updated ci message)

Attachment #8902272 - Attachment is obsolete: true

Attachment #8902311 - Flags: review+

Honza Bambas (:mayhemer)

Assignee

Updated

•

7 years ago

Keywords: checkin-needed

Pulsebot

Comment 17

•

7 years ago

Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/21350cc2ff6b Handle SSL short-write correctly to save CPU looping. r=keeler

Keywords: checkin-needed

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 18

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/21350cc2ff6b

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox57: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla57

Ryan VanderMeulen [:RyanVM]

Updated

•

7 years ago

status-firefox55: --- → wontfix

status-firefox56: --- → wontfix

status-firefox-esr52: --- → wontfix

You need to log in before you can comment on or make changes to this bug.