Closed Bug 54081 Opened 24 years ago Closed 24 years ago

Data corruption uploading binary data (such as GIF's)

Categories

(Core :: Networking, defect, P2)

x86
Linux
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: jesup, Assigned: darin.moz)

References

(Blocks 1 open bug)

Details

(Keywords: dataloss, Whiteboard: [rtm++][dogfood+][nsbeta3-])

Attachments

(18 files)

39.97 KB, image/gif
Details
39.97 KB, image/gif
Details
39.97 KB, image/gif
Details
39.97 KB, image/gif
Details
39.97 KB, image/gif
Details
39.97 KB, image/gif
Details
3.48 KB, text/plain
Details
3.34 KB, text/plain
Details
252 bytes, text/html
Details
255 bytes, text/html
Details
38.97 KB, image/png
Details
2.10 KB, text/plain
Details
74.76 KB, application/octet-stream
Details
74.76 KB, application/octet-stream
Details
561 bytes, patch
Details | Diff | Splinter Review
779 bytes, patch
Details | Diff | Splinter Review
275.18 KB, image/gif
Details
101.71 KB, image/gif
Details
FreeBSD 4.1 20000925xx

When uploading a 40K GIF to bugzilla, I found when viewing it the image stopped
after showing part of it.  Closer inspection shows that the data uploaded
differs at regular intervals from the original.

My file differed at ~0x489f to ~0x5e7a, ~0x708b to ~0x7e7a, and ~0x8d0f to
~0x9e7a.  The end of each section of corruption is very interesting (0x...e7a).

Probably closely related to bug 47936, which is nsbeta3+ dogfood+ and supposedly
fixed.  I wonder if this is a regression due to that fix.
Dataloss
Nominating for nsbeta3, dogfood
qawanted for other platforms (I can't run win95 dailys for the last week).
upload with Win32 2000092708 was ok.  Will retest with fresh pull/build inder
FreeBSD today
This may be a dup of bug 53341. See additional comments from me there
No joy.  The errors are still there, but the exact position doesn't seem
identical from attempt to attempt.  I'm going to re-check with win32. 53341 is
marked Linux, so maybe this is a general Unix (but not win32) problem?  Sorry,
my Win32 box has become unstable and can't even download Mozilla - I'll try
again later.  ANy other Win32 testers would be welcomed.

OS->Linux since the problem is seen there, and there's no catch-all "Unix" entry
(and Linux reports get more attention).
OS: FreeBSD → Linux
Uploading images is a very common use of a browser.  Even if we stay with the PG
rated crowd, folks still push images of their family members etc.  The
work-around is to use another browser... which makes this dogfood-plus.
I think it is now too late to make the beta3 train, so I'm with sadness marking
this beta3-, but putting in an rtm nomination, and bumping to P2 priority (this
is MAJOR functionality!!)
Keywords: rtm
Priority: P3 → P2
Whiteboard: [dogfood+][nsbet3-]
This does seem to be a dupe of Bug 52030, but this bug has gotten more and
better attention.  Not marking dupe.
Spam: fixing broken whiteboard
Whiteboard: [dogfood+][nsbet3-] → [dogfood+][nsbeta3-]
The retest with win32 worked again, so I'm assuming it's a Linux/FreeBSD/Unix bug.
Adding eric pollman since he's assigned 53341 which is a dup of this (or vice
versa).  52030 is also a dup.  I'm going to close both of those out as dups and
let this one be the flag-carrier (since it's now dogfood+).

Eric or Gagan - who should have this?  Is it networking or forms?
*** Bug 53341 has been marked as a duplicate of this bug. ***
*** Bug 52030 has been marked as a duplicate of this bug. ***
Randell: thanks for these test cases and yes its going to be one of us (pollmann
for forms and me for networking)
Eric: You want to take the first crack at this?
Assignee: gagan → pollmann
Marking rtm+.
Whiteboard: [dogfood+][nsbeta3-] → [rtm+][dogfood+][nsbeta3-]
Marking [rtm need info] Waiting for patch to be attached, review + super-review.
Whiteboard: [rtm+][dogfood+][nsbeta3-] → [rtm need info][dogfood+][nsbeta3-]
*** Bug 54463 has been marked as a duplicate of this bug. ***
When posting multipart form data a temporary file is being written and then that 
file is read by netlib and sent.

I wrote a small C application that reads in the formpost multipart file and 
writes out the data portion. This way I can view the actual gifs that were 
written into the formpost file. I have uploaded several files one that was 
almost 3 megs and the data in the formpost looked fine.

I can only conclude at this point that it is a netlib issue. I will attach my 
little helper app.
reassigning
Assignee: pollmann → gagan
I did all my initial testing on WinNT and just noticed the bug is for Linux. So
I just tested it on Linux with the little helper app and I was able to see the
gif that was inside the formpost data file just fine.
Please feel free to test using the above testcase - it will upload a file to a
web server I have set up to save the result to a file.  The file can then be
displayed and/or deleted by clicking on a link in the resulting page.  (Thanks
to Rod for the idea!)  There are about 4 MBytes of space free on the server.
Ack, ignore the above test case - it doesn't work (Looking into why, probably
not related to this bug).

I will attach a second testcase momentarily that does an equivalent thing.  I
was able to attach a large .gif image with no corruption using the new testcase
on my FreeBSD build. (Pulled yesterday from the tip)  Can anyone confirm that
this corruption bug exists in recent build?
This bug still exists in Linux on build 2000100809. 

I attempted to upload a screenshot for a bug of mine and it was corrupted. The
URL is  http://bugzilla.mozilla.org/showattachment.cgi?attach_id=16477 

Also adding self to cc list

David
Okay, as you can see in the last post, small files are okay ( < about 60K ), but
after that they become corrupted.  This is Linux Build 2000101206.
We have no patch for this rather serious bug.  Is anyone on the scent of this
defect?  If so, please let me know and I will start trying to track it down.  I
don't think I'll get very far starting from scratch.
I presume this is a regression could we try and find out when it first 
happened?

Marking relnote: ~Binary images seem to be corrupted, please use ns6.01.~
Keywords: relnote, relnoteRTM
I must not understand this right. In the release note of Netscape 6, we're going
to tell them to use 6.01 (which doesn't exist yet) to solve a *known* data loss
problem?

Am I really reading this right?
Confirming on Linux build 2000101509.  The formpost-XX temporary file also
appears to to have been generated correctly, and when I upload it, I find
the file similarly corrupted.

Investigating Necko...
Well, since this one is fixed in 6.01, does anyone have any idea about the 
patch?

Has anyone traced the dataflow to find where the corruption begins?

This is one of only two "corruption" bugs with votes at the moment.
I am working on tracing the dataflow.  So far, I found that it is
not a problem with nsHTTPEncodeStream, b/c I can send data using
application/x-www-form-urlencoded (which bypasses nsHTTPEncodeStream)
that also gets corrupted.  That means the problem must lie in the
actual data transfer, and that's where I'm currently investigating.
Ok, this looks like a transport problem.  The corruption occurs after the first
80470007 [NS_BASE_STREAM_WOULD_BLOCK] error.  The bytes later written (when the
socket becomes writable) are corrupted.

Take a look at the following attachments...
1) the nsSocketTransport log file.
2) the formpost (as generated by nsFormFrame.cpp) stripped of its header.
3) the content section of the HTTP transaction (as received by the server).
What I should also mention is that if you count the number of bytes transmitted
(by looking at attachment 17565 [details]) before the first 80470007 error, you'll get
a total of 55024 bytes.  If you "diff" the formpost before (attachment 17566 [details])
and after (attachment 17567 [details]) transmission, you'll find a discrepancy at byte
54897.  What's going on?  Well, the transmission also includes a header of 127
bytes, and 54897 + 127 equals 55024.  So, it seems like whenever we can't write
all the data without blocking, we're going to get into trouble.
Could the buffer be being deallocated after WOULD_BLOCK, or the buffer pointer
not bumped, or bumped incorrectly?
Great investigative work darin!  Since this problem dwells in nsSocketTransport,
do you believe that it also affects FTP upload and SMTP?
It's hard for me to say what is actually the culprit here.  It'd be nice to have
someone, who knows nsSocketTransport, to take a look at this problem.  From
just looking at the log file and the source code, it's not obvious (at least
not to me) where the problem is occuring.

This looks similar to bug 47936... can someone familar with that bug take a
look.  This is probably a regression of that bug.

I would assume this bug would cause problems with any "large" socket writes,
including FTP upload and SMTP, so this is definitely an important bug to fix!
So, it sounds like this is a bug with the AsyncWrite case of the 
nsSocketTransport...

Is this bug really Linux only?  Or do all platforms fail if a WOULD_BLOCK occurs 
during the write?

-- rick
Since PR_Write == PR_Send and can return less # of bytes then requested, but not 
WOULD_BLOCK - we may want to check if the code behaves correctly in such case
It looks like we are properly handling the case that ruslan mentions.
From nsSocketTransport.cpp(1083):

    len = PR_Write(fd, fromRawSegment, count);
    if (len > 0) {
      *writeCount = (PRUint32)len;
    }

So, irrespective of the error condition, we always remember how many bytes
were written.
This fix turns out to be a one-liner!!  We were ignoring the toOffset
parameter passed to nsWriteToSocket.  This parameter is used when
nsWriteToSocket is called from doWriteFromStream (which occurs when
we do a form POST and probably in many other cases as well).

I'm attaching a patch to nsSocketTransport.cpp which fixes this.  r=?
I seem to remember another time when we proposed this change, and warren thought 
that it was wrong...

I don't remember why -it seems like the right thing to do...

warren, what do you think?
-- rick
If you look at the way nsPipe2.cpp would call nsWriteToSocket (in
ReadSegments), for example, you would see that using the toOffset
parameter in this way makes sense.  The thing that bothers me, though,
is why on earth is it called toOffset if it is to be used in this
manner?  It seems like it should be called fromOffset or something
like that... but, maybe I'm missing something?!?
I don't think that's right. toOffset is the offset in the destination, not the 
source. The source buffer offset is adjusted for you by the pipe code's 
implementation of ReadSegments. 

Or is this being called from some other implementation of ReadSegments? Ever 
since we combined nsIBufferInputStream with nsIInputStream, there's more of 
these around the codebase. Maybe the problem is in the caller.
You're totally right, and that makes more sense.  Then the correct thing
to do in the case of nsSocketTransport is to similarly pre-offset the
read buffer before calling nsWriteToSocket.  In doWriteFromStream this
effects the code as such:

  nsWriteToSocket(nsnull, mSocketFD, mWriteBuffer, mWriteBufferIndex, ...

should become:

  nsWriteToSocket(nsnull, mSocketFD, mWriteBuffer + mWriteBufferIndex, 0, ...

and then nsWriteToSocket can ignore the toOffset parameter as before. What
do you think? 
Right, that explains the problem. Have you verified that the call to 
nsWriteToSocket at line 1371 is the only offender?

Also, it's not that nsWriteToSocket ignores the toOffset, but that toOffset 
really doesn't make any sense for sockets. They're not things you can offset 
into.
Yes.. the only other place it gets called is through nsIPipe::ReadSegments.
I'm attaching a new patch.
Attached patch Revised patchSplinter Review
r=warren
mscott: can you s/r this? thx.
Assignee: gagan → darin
sr=mscott

just to be safe, Darin, can you try sending yourself a message via SMTP just to
make sure that writing data to the socket in the smtp case isn't adversley
effected by this change? 
SMTP checks out ok (~300k attachment w/o corruption)
rtm++
Whiteboard: [rtm need info][dogfood+][nsbeta3-] → [rtm++][dogfood+][nsbeta3-]
cc: esther - we should try this out once fixed for mail attachments.
Landed patch on the trunk and patch per PDT approval.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Sorry, I meant... the patch is on the trunk and the BRANCH...
Attached image test upload
verified on branch:
WinNT 1024
Linux rh6 1024
Mac os9 1024

used test case provided and attachment for uploading, also checked SMTP

needs checked on trunk
adding vtrunk keyword
Keywords: vtrunk
verified on trunk
WinNT 2000013120
Linux rh6 2000013106
Mac os8.6 2001010209
Status: RESOLVED → VERIFIED
Keywords: vtrunk
*** Bug 42047 has been marked as a duplicate of this bug. ***
Keywords: qawanted, rtmnsrtm
-renoteRTM
Keywords: relnoteRTM
-relnote
Keywords: relnote
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: