Last Comment Bug 378629 - SSL file uploads settle into oscillating pattern with very small packets
: SSL file uploads settle into oscillating pattern with very small packets
Status: RESOLVED FIXED
corruption
: dataloss, fixed1.8.1.5, regression
Product: Core
Classification: Components
Component: Networking (show other bugs)
: Trunk
: All All
: P2 normal (vote)
: mozilla1.9alpha8
Assigned To: Kai Engert (:kaie)
:
Mentors:
Depends on: 705755
Blocks: 137155 334142 337770
  Show dependency treegraph
 
Reported: 2007-04-24 11:02 PDT by Boris Zbarsky [:bz] (Out June 25-July 6)
Modified: 2011-11-28 10:13 PST (History)
14 users (show)
benjamin: blocking1.9+
dveditz: wanted1.8.1.x+
dveditz: blocking1.8.0.13-
dveditz: wanted1.8.0.x+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
xpcom+netwerk patch v1, trunk only (11.20 KB, patch)
2007-04-24 23:02 PDT, Boris Zbarsky [:bz] (Out June 25-July 6)
no flags Details | Diff | Review
experimental SSL thread patch (8.68 KB, patch)
2007-04-25 00:08 PDT, Kai Engert (:kaie)
no flags Details | Diff | Review
ssl-thread patch v3 (trunk and 1.8 branch) [THIS ONE checked in] (10.20 KB, patch)
2007-05-22 15:12 PDT, Kai Engert (:kaie)
no flags Details | Diff | Review
whitespace ignore version of ssl-thread patch v3 (trunk and 1.8 branch) (9.75 KB, patch)
2007-05-22 15:14 PDT, Kai Engert (:kaie)
rrelyea: review+
dveditz: approval1.8.1.5+
Details | Diff | Review

Description Boris Zbarsky [:bz] (Out June 25-July 6) 2007-04-24 11:02:49 PDT
See bug 356470.  This causes two problems:

1)  It's suboptimal from the point of view of performance, I would guess
2)  It tickles a bug in Apache servers.  While Apache has released a fix,
    there's no guarantee that everyone is upgrading.  We can push fixes a lot
    better than Apache can, given update.

Oh, and this is apparently a branch regression too.

Still waiting on feedback on whether the patch I posted in bug 356470 actually helps or whether we need changes to the secureWrite implementation too.
Comment 1 Nelson Bolyard (seldom reads bugmail) 2007-04-24 11:14:42 PDT
If an objective is to speed up uploads, then you should also increase the 
minimum write size to a much larger value for all writes except the last.  
I might suggest 64KB.  The reason is that writes of any amount less than 
the TCP transmit window size (a 16 bit value) are subject to Nagle delays.
Comment 2 ralp.no.h 2007-04-24 11:26:56 PDT
Most comments on bug 356470 are irrelevant to this bug until bug 356470 comment 44.  (Regression is narrowed down in bug 356470 comment 15 and bug 356470 comment 19.)

Also, I'm pretty sure the OS for this should be Windows only, not Linux.
Comment 3 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-04-24 11:37:27 PDT
Ah, so Nelson reminded me that the function we really care about is ssl_SecureSend, not nsSSLThread::requestWrite.  And indeed, ssl_SecureSend is getting the alternating sizes when it's called, even with my patch.

I traced through it, and nsSSLThread::requestWrite seems to also have a 4kb buffer that it's filling (but not really!), causing the problems.  We shouldn't really even need changes to the various streams, assuming this code is fixed, I think.

The pattern is the following:

1)  We call requestWrite with 4096 bytes.  It's in the ssl_writing_done state.
2)  si->mThreadData->mSSLResultRemainingBytes is some small number N (8 bytes,
    or 1 byte, or 163 bytes, whatever).
3)  It sets the state to ssl_idle and returns N
4)  We call requestWrite with 4096 - N bytes.  It's in the ssl_idle state.  It
    copies 4096 - N bytes into its buffer, and sets state to ssl_pending_write.
5)  It returns -1 and sets the error to WOULD_BLOCK.
6)  We call requestWrite with 4096 bytes.  It's in the ssl_writing_done state.
7)  si->mThreadData->mSSLResultRemainingBytes is 4096 - N
8)  It sets state to ssl_idle and returns 4096 - N.  I guess it copied that
    data in on the previous call, eh?
9)  We call requestWrite with N bytes.  It's in the ssl_idle state.  It copies
    N bytes into its buffer and sets state to ssl_pending_write.
10) It returns -1 and sets the error to WOULD_BLOCK

After that we go back to step 1.

What would it take for this code to just go ahead and fill the buffer before trying to put it on the wire?  Kai, any ideas?
Comment 4 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-04-24 11:38:42 PDT
As far as OS goes, we get the oscillating behavior on all platforms; it might be that our headers are slightly different on different platforms (e.g. different UA string) leading to different oscillation sizes).
Comment 5 Kai Engert (:kaie) 2007-04-24 20:33:49 PDT
Boris,

yes, nsSSLThread::requestWrite is using its own buffer to collect the pending data, so it can be used on the other thread.

But that buffer is not limited to 4K. Individually for any particular socket, the buffer will grow to the maximum value the caller attempted to write in a single call.

You are proposing the SSL thread could:
> go ahead and fill the buffer before
> trying to put it on the wire

But how would the implementation of requestWrite know, whether or not the caller will request to write more data? (The caller might as well be finished after this chunk and request a read next.)

Are you proposing to introduce a timer / a delay?
Should the SSL thread wait for a small amount of time, while the implementation of requestWrite could potentially add more data to the buffer?
Sounds tricky.

Alternatively, could this get optimized at the caller side?

The caller could know it has more than (4096-N) bytes queued, and therefore could always call requestWrite( min(4096, queuesize)) ?
Comment 6 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-04-24 21:14:19 PDT
> But how would the implementation of requestWrite know, whether or not the
> caller will request to write more data?

Because on the previous call the caller tried to write more and we didn't consume all of it?

For example, if the caller attempts to write an amount that _would_ fit in the buffer but doesn't because of whatever our leftover state is (as here), could we consume nothing on that call?  And then consume all the data the caller 

> Alternatively, could this get optimized at the caller side?

Not really; the caller has been asked to write 4096 bytes to the socket (by the HTTP code); it's just doing that.  It can write it a bit at a time (as here), but it's not allowed to write more than 4096 bytes by the stream APIs...
Comment 7 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-04-24 21:35:53 PDT
Maybe we _can_ work around this on the caller somehow... not sure...
Comment 8 Kai Engert (:kaie) 2007-04-24 21:45:40 PDT
> 1)  We call requestWrite with 4096 bytes.  It's in the ssl_writing_done state.
> 2)  si->mThreadData->mSSLResultRemainingBytes is some small number N (8 bytes,
>     or 1 byte, or 163 bytes, whatever).
> 3)  It sets the state to ssl_idle and returns N
> 4)  We call requestWrite with 4096 - N bytes.  It's in the ssl_idle state.  It
>     copies 4096 - N bytes into its buffer, and sets state to ssl_pending_write.
> 5)  It returns -1 and sets the error to WOULD_BLOCK.
> 6)  We call requestWrite with 4096 bytes.  


The previous call was (4096-N).

It got rejected with WOULDBLOCK.

But now the caller requests 4096?

Doesn't that mean the caller was able to fill up its buffer?

If the caller is able to fill up its buffer to 4096 on WOULDBLOCK, could it attempt to fill up its buffer on a short write, too?

That way it could do write requests close to 4096 most of the time?
Comment 9 Kai Engert (:kaie) 2007-04-24 21:50:12 PDT
(In reply to comment #6)
> > But how would the implementation of requestWrite know, whether or not the
> > caller will request to write more data?
> 
> Because on the previous call the caller tried to write more and we didn't
> consume all of it?

Yes, we could immediately take the remaining bytes.
This is the same amount of bytes the caller will request to write next.
But because we don't know whether more bytes will follow, we must send it out immediately.

Either we take the bytes immediately, or we wait for the next requestWrite that will occur with the same number of bytes - it will result in the same call to the underlying socket - because we don't know whether more bytes will follow.

(Unless we'd add a delay + timer)
Comment 10 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-04-24 22:38:32 PDT
> Doesn't that mean the caller was able to fill up its buffer?

Yes, but it doesn't know that...

> could it attempt to fill up its buffer on a short write, too?

Not sure.  Trying to do that, in fact.  Part of the thing to keep in mind is that there are at least 4 separate stream classes interacting here.  And the outermost stream is told (by the HTTP code) to send 4096 bytes and no more.  So the only way to "fill the buffer" would be to go out all the way to the outermost caller somehow.  I'm trying to create a patch that would do that, but we'll see.  The reason the WOULD_BLOCK let it fill up the buffer is that it got propagated all the way out...

> Yes, we could immediately take the remaining bytes.

I guess that would only help insofar as it would make the packet sequence N, 4096-N, 4096, N, ... (assuming you only grow your buffer to twice the caller's buffer size, of course).
Comment 11 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-04-24 23:02:37 PDT
Created attachment 262729 [details] [diff] [review]
xpcom+netwerk patch v1, trunk only

What this patch does is to make sure that we start each substream of a multiplex input stream on a 4096-byte boundary.  In effect, we align the start of the file we're uploading with the start of a 4096-byte block, and then read the file in those blocks.  We don't recover from oscillations if they appear (can't, in fact), but we eliminate the oscillations that were due to the file not lining up on a 4096-byte boundary.

This eliminates the oscillations I was seeing.  HOWEVER....

With this patch, I see us make a number of calls to nsSSLThread::requestWrite and ssl_SecureSend with 4096 bytes as the length.  nsSSLThread::requestWrite alternates between returning 4096 and -1, everything is peachy... until a point when nsSSLThread::requestWrite only consumes 4095 bytes.  Presumably this happens when ssl_SecureSend returns "one less than the amount passed in", as described in bug 356470 comment 54.

The problem, of course, is that after that point we start oscillating in the 4095/1 pattern originally observed in bug 356470.

Could we make the PSM code ASS-ume that if all but one of the bytes it asked to send were read that actually means all the data was read, propagate _that_ answer to its callers, then make sure to resend the one byte to make this all work out?  That probably means posting an event to send the remaining byte, and canceling the event if we get another write from the caller or something....  Otherwise we reintroduce bug 80092, for all intents and purposes.

Of course the same code could live in NSS, not PSM, with the same effect.  That's how I wish bug 80092 had been fixed to start with...
Comment 12 Kai Engert (:kaie) 2007-04-25 00:07:22 PDT
(In reply to comment #11)
> With this patch, I see us make a number of calls to nsSSLThread::requestWrite
> and ssl_SecureSend with 4096 bytes as the length.  nsSSLThread::requestWrite
> alternates between returning 4096 and -1, everything is peachy... until a point
> when nsSSLThread::requestWrite only consumes 4095 bytes.  Presumably this
> happens when ssl_SecureSend returns "one less than the amount passed in", as
> described in bug 356470 comment 54.
> 
> The problem, of course, is that after that point we start oscillating in the
> 4095/1 pattern originally observed in bug 356470.
> 
> Could we make the PSM code ASS-ume that if all but one of the bytes it asked to
> send were read that actually means all the data was read, propagate _that_
> answer to its callers, then make sure to resend the one byte to make this all
> work out?


Simpler idea:

We could declare the SSL thread shall try harder to flush the buffer it has already received.

When the SSL thread discovers that it could not write out all its data, it could immediately do an additional call with the remaining bytes.

In the special case where the SSL layer returned a short write, IIUC this additional call to write one byte will succeed with a return value of 1.

Would that work?

I hacked a patch, Boris do you want to try it?
Comment 13 Kai Engert (:kaie) 2007-04-25 00:08:01 PDT
Created attachment 262736 [details] [diff] [review]
experimental SSL thread patch
Comment 14 Kai Engert (:kaie) 2007-04-25 00:14:54 PDT
hmm, in that patch I might also handle the case where moreBytesWritten == 0.
Will look into that tomorrow.
Comment 15 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-04-25 00:18:43 PDT
> When the SSL thread discovers that it could not write out all its data, it
> could immediately do an additional call with the remaining bytes.

Ah, hmm.  Yeah, that would work.  In fact, you would only need to do that if all but 1 bytes were read, if I understand the NSS setup correctly.

> In the special case where the SSL layer returned a short write, IIUC this
> additional call to write one byte will succeed with a return value of 1.

I _think_ that's right.  Nelson, can you sanity-check this idea?

Kai, I can try tomorrow; let me know whether you want me to try the patch you attached or wait for the thing you're planning to look into, ok?
Comment 16 Nelson Bolyard (seldom reads bugmail) 2007-04-25 01:18:00 PDT
FYI, Here is some documentation on libSSL internals that I sent to Boris 
on 4/24.

When ssl_SecureSend is called to send an amount N <= 16K on a non-blocking
socket, I believe it will return only one of these values:
        N  all data was encrypted and successfully written to underlying
               TCP socket.
      N-1  all data was encrypted, but not all encrypted data was accepted
               by the underlying TCP socket, and some remains buffered. 
               This is a "short write".  Note that if N-1 == 0, then 
               ssl_SecureSend will return -1 with PR_WOULD_BLOCK_ERROR.
       -1  with error PR_WOULD_BLOCK_ERROR (EWOULDBLOCK), no data was sent.
       -1  with some other fatal error.  In this case, ssl_SecureSend must
               not be called again on this same socket.

In addition, when ssl_SecureSend is called to send amounts of more than 
16KB on a non-blocking socket, it may also return these "short write" 
values, which signify that some data was sent, but not all of it:
      m*16K     (a non-zero integer multiple of 16K, less than N)
     (m*16K)-1  

Note that if N == 4K, this function cannot return numbers such as 193 or 
(4K - 193).
Comment 17 Kai Engert (:kaie) 2007-04-25 17:42:30 PDT
Comment on attachment 262736 [details] [diff] [review]
experimental SSL thread patch

(In reply to comment #15)
> Kai, I can try tomorrow; let me know whether you want me to try the patch you
> attached or wait for the thing you're planning to look into, ok?


yes, please wait for the next patch
Comment 18 Nelson Bolyard (seldom reads bugmail) 2007-04-28 14:57:34 PDT
So, the challenge/question is how nsSSLThread::requestWrite knows that the 
caller has no more data to write, right?  

A timeout has been suggested, but may be complicated.  Here's another idea.

The caller uses a zero-length write to tell nsSSLThread::requestWrite that 
it has no more data to write immediately, and that nsSSLThread::requestWrite 
should flush its buffer out to SSL.  A zero length write need not necessarily
imply EOF, and it wouldn't hurt (other than being inefficient) to do it more
often than necessary.  
Comment 19 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-04-29 12:52:35 PDT
> The caller uses a zero-length write

Can't do that.  The caller that "knows" that there is nothing more to write is not calling nsSSLThread::requestWrite directly, and the APIs it's going through to do it are frozen and can't be used to do zero-length writes.

Redesigning all of Necko is a lot more complicated than anything nsSSLThread would have to do to deal with this situation, trust me.
Comment 20 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-05-21 12:35:12 PDT
Kai, note that we probably want this on branches too, since the NSS changes that cause problems here landed on the branch (and since we had to back out another fix we wanted on branch due to what I suspect is basically this problem)...  It might be good to try to coordinate the scheduling with the branch release schedule.
Comment 21 Kai Engert (:kaie) 2007-05-22 06:59:05 PDT
My assumption was, this is not critical, because we still behave correctly.

But I wonder if this could also be the cause for bug 377481, bug 368611, bug 380744, we should find out. I want to work on this next.
Comment 22 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-05-22 07:18:07 PDT
We may behave correctly, but we trigger a bug in a widely deployed web server...  The result are unacceptable regressions from a branch update, from a user's point of view.
Comment 23 Kai Engert (:kaie) 2007-05-22 09:47:18 PDT
(In reply to comment #15)
> > When the SSL thread discovers that it could not write out all its data, it
> > could immediately do an additional call with the remaining bytes.
> 
> Ah, hmm.  Yeah, that would work.  In fact, you would only need to do that if
> all but 1 bytes were read, if I understand the NSS setup correctly.

Actually, I fear it's not that simple.

libSSL returns N-1 if it could not flush out all data.
If SSLThread immediately calls write again on the nonblocking socket, it won't help, right? Chances are, the buffers are still full.

So SSLThread would have to wait for a bit.
But waiting causes us to block on all SSL sockets, so we shouldn' do that.

Boris, your other idea (SSL thread could report to caller that N+1 was written) seems difficult. Assuming the caller is done with all its data, who will trigger SSLThread to flush the remaining byte?

I believe we require another call to write, in order to give libSSL a chance to flush out the remaining bytes.

Ideally, I'd prefer to avoid reporting numbers based on assumptions.

So, time for a new idea.

If the SSLThread detects a short write (exactly one byte remaining),
it could:
- make a note of this state internally
- wake up the caller, but report back WOULDBLOCK
- assume the caller will try again (sooner or later the underlying socket will become writeable again and the caller will wake up?)
- we arrive at SSLThread::requestWrite and detect there is only a single byte left to be sent
- try to the single byte. If for some reason we still get wouldblock, report that back (again)
- finally writing the single byte succeeds, and we report back the full amount

I'll try to hack a patch now.
Comment 24 Kai Engert (:kaie) 2007-05-22 15:12:43 PDT
Created attachment 265722 [details] [diff] [review]
ssl-thread patch v3  (trunk and 1.8 branch) [THIS ONE checked in]

This patch implements the idea explained in the previous comment.
It applies to both 1.8 branch and trunk.

I tested this with plain-smtp-over-ssl.
Without the patch, I get oscillation.
With patch I see a stable transfer of 4096 byte blocks.

(I also tested sending the same message with an S/Mime signature, this gave me osciallation between 42 and 4054 bytes, but I believe this is the caller's fault.)

Boris, would you be be able to try this patch?
(Sorry for the delay)
Comment 25 Kai Engert (:kaie) 2007-05-22 15:14:12 PDT
Created attachment 265723 [details] [diff] [review]
whitespace ignore version of ssl-thread patch v3 (trunk and 1.8 branch)

I think this patch is a bit easier to review, because I changed some indendation, which is not visible in this patch.
Comment 26 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-05-22 17:56:47 PDT
I'm not going to be able to test this until June 2 at best...  I'm out of town until then.

Do you still see the S/Mime oscillation if you apply my first patch in this bug?

As for review... am I really the right person?  Are there no PSM peers?  :(  I'm not really going to be able to review until June 2 at best either; mid-July is more likely.
Comment 27 Kai Engert (:kaie) 2007-05-23 04:14:27 PDT
(In reply to comment #26)
> Do you still see the S/Mime oscillation if you apply my first patch in this
> bug?

On trunk, I do not get oscillation with S/Mime SMTP.
Both with and without your patch.

On 1.8 branch, I get oscillation with S/Mime SMTP.
But your patch does not apply to 1.8 branch.

Do you propose to backport your patch to 1.8 branch?


> As for review... am I really the right person?  Are there no PSM peers?  :( 

There are peers, sure.
I was interested in your opinion because you had worked on this already :-)
Comment 28 Kai Engert (:kaie) 2007-05-23 04:19:43 PDT
Boris, you nominated this bug for the 1.8.0 branch.
However, the SSL thread is not contained on that branch, it was new in 1.8.1, Firefox 2.0.

I think we can remove the nomination for 1.8.0 branch.

Looking at the original problem in bug 356470, we were told, FF 1.5.x does not have the bug.
Comment 29 Kai Engert (:kaie) 2007-05-23 04:43:09 PDT
(In reply to comment #27)
> But your patch does not apply to 1.8 branch.
> Do you propose to backport your patch to 1.8 branch?

Looks like 1.8 branch does not contain nsThreadUtils.h / nsRunnable, so probably your patch depends on new thread manager changes?
Comment 30 Wan-Teh Chang 2007-05-23 06:56:36 PDT
Kai, the latest FF 1.5.0.x, which also uses NSS 3.11.x, should have this bug.
The original problem in bug 356470 was reported while FF 1.5.0.x was still
using NSS 3.10.x (FF 1.5.0.6 and 1.5.0.7 were mentioned in the bug), and
bug 356470 comment 61 seems to indicate that FF 1.5.0.10 has the HTTPS
file upload bug.
Comment 31 Kai Engert (:kaie) 2007-05-23 07:19:54 PDT
(In reply to comment #30)
> Kai, the latest FF 1.5.0.x, which also uses NSS 3.11.x, should have this bug.
> The original problem in bug 356470 was reported while FF 1.5.0.x was still
> using NSS 3.10.x (FF 1.5.0.6 and 1.5.0.7 were mentioned in the bug), and
> bug 356470 comment 61 seems to indicate that FF 1.5.0.10 has the HTTPS
> file upload bug.


Ok, thanks for the clarification.

This means we'd need a different fix for 356470 on the 1.8.0 branch, because there is no SSL thread on that branch.
Comment 32 Kai Engert (:kaie) 2007-05-23 07:20:39 PDT
re-adding the blocking 1.8.0 branch ? flag
Comment 33 Kai Engert (:kaie) 2007-05-23 07:22:51 PDT
Because my earlier results were not clear, I decided to run a larger series of tests. I tried many combinations of trunk-or-branch, with-or-without-patches, and different-tests.

Here are my results:


                1.8 branch.  1.8 branch       1.8 branch
                no patches   bz+kaie patches  kaie patch

signed mail     
smtp sarttls    stable       stable           stable
imap-s sent     jumping      stable           stable

clear mail
smtp starttls   stable       stable           stable
imap-s sent     jumping      42+4054          stable

upload          jumping      165+3931         164+3932

problems?                    yes, xul.mfasl
                             startup crash


                trunk       trunk           trunk      trunk
                no patches  bz+kaie patches kaie patch bz patch

signed mail     
smtp sarttls    jumping     stable          stable     jumping
imap-s sent     jumping     42+4054         42+4054    jumping

clear mail
smtp starttls   jumping     stable          stable     jumping
imap-s sent     jumping     stable          42+4054    jumping

upload          jumping     stable          165+3931   jumping

problems?       all trunk builds fail on first attempt to copy
                mail to sent folder over imap-s


stable means:
  the initial exchange might have small packages
  but the bulk transfer until the end uses 4096 packets

jumping means:
  the bulk transfer uses alternating small and large packets,
    where the size constantly changes

n+n e.g. 42+4054 means:
  the initial exchange might have small packages
  but the bulk transfer uses a stable alternation between the
    two given packate sizes, 42, 4054, 42, 4054, 42, ...
Comment 34 Kai Engert (:kaie) 2007-05-23 07:33:15 PDT
Here is my summary for the 1.8 branch based on my test results:

The patch proposed by bzbarsky for the 1.8 branch (bug 356470 attachment 262592 [details] [diff] [review]), which I used in my tests on the 1.8 branch, seems to be problematic.
With that patch applied, we seem to get corrupt .mfasl files in the profile directory.
After each run the following attempt to start the application crashed.
I had to delete all *.mfasl files in my profile, then starting it up worked again.
When comparing the columns
  1.8 branch        1.8 branch 
  bz+kaie patches   kaie patch
there was no (real) difference in behaviour.

I said no "real" difference, because we have one 42+4054 vs. stable difference.
But this seems to be really random behaviour, caused by the caller.
Sometimes I get stable, sometimes I get 42+4054.

I conclude (bug 356470 attachment 262592 [details] [diff] [review]) should not be used on 1.8 branch.

While my patch v3 works fine to mail+ssl on the 1.8 branch, it still does not fix the file upload osciallation.

I do not know whether the 165+3931 oscillation is "good enough". (I could not reproduce any failures with the testcase from bug 356470, I suspect the server got fixed).


TODO:
If we require full stability on 1.8 branch for file uploads, I think someone would have to come up with a better networking / xpcom io patch for 1.8 branch.
Comment 35 Kai Engert (:kaie) 2007-05-23 07:39:58 PDT
Here is my summary for the trunk based on my test results:


Using only my patch v3, we see an improved situation (identical to the 1.8 branch situation):
- we get stable behaviour for mail, most of the time
- we sometimes get a little oscillation for mail
- we get improved behaviour for file upload (but still a little bit oscillating)

Using both Boris' patch from this bug (attachment 262729 [details] [diff] [review]) and my patch v3,
we see an additional improvement:
- file upload is now always stable
- (but we still get a little bit of oscillation with mail)


I propose we take both patches for trunk.
Comment 36 Kai Engert (:kaie) 2007-05-23 07:42:53 PDT
Comment on attachment 262729 [details] [diff] [review]
xpcom+netwerk patch v1, trunk only

Renaming Boris attachment from original: "_still_ doesn't work... but now the problem is definitely in PSM/NSS land"

to "xpcom+netwerk patch v1, trunk only"
Comment 37 Kai Engert (:kaie) 2007-05-23 07:44:15 PDT
Comment on attachment 265722 [details] [diff] [review]
ssl-thread patch v3  (trunk and 1.8 branch) [THIS ONE checked in]

renaming "Patch v3" to "ssl-thread patch v3  (trunk and 1.8 branch)"
Comment 38 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-05-23 08:44:44 PDT
> Do you propose to backport your patch to 1.8 branch?

Possibly.  Depends on what the HTTP oscillation looks like once the SSL changes happen.  And yeah, I'd need to look up what all those APIs look like there and whatnot.  :(

> I was interested in your opinion because you had worked on this already :-)

If it fixes the "clamp oscillations to 1/4095" issue, I'm pretty happy, I think.... ;)  I could sr, I guess, but I'd want someone who knows this code to r=.

> The patch proposed by bzbarsky for the 1.8 branch (bug 356470 attachment
> 262592 [details]), which I used in my tests on the 1.8 branch, seems to be
> problematic.

Oh, that patch is just wrong.  I figured that out while working on the patch in this bug, but just forgot to mark it obsolete.  Sorry about that.  We'll just need to port my patch from this bug to branch, if possible and wanted.
Comment 39 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-05-25 07:49:58 PDT
So my point is that someone familiar with this code should review this before I do, esp since I'm not sure when I'll be able to do it.  :(  It might well not be until mid-July at this point; my time for working on Mozilla things basically ran out about a week ago.
Comment 40 Kai Engert (:kaie) 2007-06-05 17:50:24 PDT
Good news.

I might have said earlier that I'm unable to reproduce.
Yes, I'm still unable to reproduce on Linux.

But I am finally able to reproduce the original bug on Windows, using the testcase from bug 356470 at https://staging.sr.admission.net/joetemp/file_upload.html

My first test was on the 1.8 branch, Firefox 2, Windows.
It seems my Patch v3 is sufficient and makes the upload succeed.

I'm currently building Firefox trunk on Windows and will test that, too.
Comment 41 Kai Engert (:kaie) 2007-06-05 19:20:05 PDT
Windows trunk: upload fails.

Windows trunk with Patch v3: upload works
Comment 42 Robert Relyea 2007-06-06 18:47:56 PDT
Comment on attachment 265723 [details] [diff] [review]
whitespace ignore version of ssl-thread patch v3 (trunk and 1.8 branch)

please check in the white space version, however.
Comment 43 Kai Engert (:kaie) 2007-06-10 16:47:22 PDT
I checked in patch v3.

Please go ahead and test tomorrow's nightly builds against any broken servers.

We might also check this patch into the 1.8 branch for FF/TB 2.0.x after some more baking.

(Unfortunately, this patch is not appropriate for the 1.8.0 branch (FF/TB 1.5.x) because it does not have a SSL thread)

The oscillation is not completely gone. But I believe that any further optimization needs to happen at the levels above PSM (necko, mailnews, etc.). (and any optimization attempts on the 1.8.0 branch for FF/TB 1.5.x, too)

Comment 44 Kai Engert (:kaie) 2007-06-10 16:54:03 PDT
Comment on attachment 265723 [details] [diff] [review]
whitespace ignore version of ssl-thread patch v3 (trunk and 1.8 branch)

Requesting approval for 1.8.1.5 for FF/TB 2.0.0.x

But proposing a couple of days additional trunk baking.
Comment 45 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-06-10 16:58:59 PDT
I filed bug 383976 on my xpcom/netwerk patch.
Comment 46 Daniel Veditz [:dveditz] 2007-06-19 11:04:30 PDT
Not a blocker, but a fix would be nice so we'll look at the approval request.

Given the checkin is this now "FIXED" on trunk? Feedback from that is one thing we'll be looking for before approving the trunk patch.
Comment 47 Kai Engert (:kaie) 2007-06-20 06:00:52 PDT
This is now "MUCH IMPROVED" on trunk.

It is "FIXED for me" when I tested the testcase from bug 356470.

I wouldn't call it completely fixed, as someone might want to improve Necko to completely eliminate all kinds of oscillation.
Comment 48 Benjamin Smedberg [:bsmedberg] 2007-06-20 06:05:36 PDT
Let's mark this one FIXED and file another bug if there are additional imporovements needed.
Comment 49 Boris Zbarsky [:bz] (Out June 25-July 6) 2007-06-21 12:23:27 PDT
It's already filed: see comment 45.
Comment 50 Daniel Veditz [:dveditz] 2007-06-28 11:42:44 PDT
Comment on attachment 265723 [details] [diff] [review]
whitespace ignore version of ssl-thread patch v3 (trunk and 1.8 branch)

approved for 1.8.1.5, a=dveditz for release-drivers
Comment 51 Kai Engert (:kaie) 2007-07-06 18:40:28 PDT
Checked in for 1.8.1.5

Note You need to log in before you can comment on or make changes to this bug.