Closed
Bug 203379
Opened 22 years ago
Closed 22 years ago
eliminate some memcpy calls from MD5
Categories
(NSS :: Libraries, enhancement, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
3.9
People
(Reporter: bugz, Assigned: bugz)
Details
(Whiteboard: 3.3.5)
Attachments
(2 files, 5 obsolete files)
10.78 KB,
patch
|
Details | Diff | Splinter Review | |
2.53 KB,
text/html
|
Details |
when handling "large" input to MD5_Update, we fill the md5 state with input
bytes, do a compression, then proceed in chunks of 64 bytes until the remainder
is less then 64, then copy the remainder into the state.
During the loop over 64-byte chunks, we memcpy into the md5 state buffer and
compress the state. However, the code does not modify the state buffer, only
the 4 state variables, independent of the buffer. Hence there is no need for
the memcpy, the compression function could act on the input buffer directly.
![]() |
Assignee | |
Comment 1•22 years ago
|
||
This eliminates about 100 calls to memcpy during the handling of an SSL request
(based on using selfserv/strsclnt).
![]() |
Assignee | |
Updated•22 years ago
|
Attachment #121710 -
Flags: review?(nelsonb)
![]() |
||
Comment 2•22 years ago
|
||
Comment on attachment 121710 [details] [diff] [review]
suggested patch to remove memcpy during loop
r=nelsonb. Good Job!
Attachment #121710 -
Flags: review?(nelsonb) → review+
![]() |
||
Comment 3•22 years ago
|
||
I yielded a Bus Error stressing selfserv with this patch on a Solaris
box (dsame2). I reproduced the failure with a debug build, and
gathered this trace:
Running: selfserv -t 100 -d
/tmp/workarea-nss/mozilla/tests_results/security/dsame2.1/server -n dsam
e2.red.iplanet.com -p 12344 -w nss -L 30
(process id 14355)
Reading libfreebl_hybrid_3.so
Warning: freeing memory block 58b28 from ordinary malloc
t@104 (l@9) signal BUS (invalid address alignment) in md5_compress at line 331
in file "md5.c"
331 wbuf[0] = lendian(wb[0]);
(/usr/dist/pkgs/forte_dev,v6.2/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) where
current thread: t@104
=>[1] md5_compress(cx = 0x183ba0, buf = 0x1911d2 "\\x"), line 331 in "md5.c"
[2] MD5_Update(cx = 0x183ba0, input = 0x1911d2 "\\x", inputLen = 108U), line
441 in "md5.c"
[3] MD5_Update(cx = 0x183ba0, input = 0x1911bc "", inputLen = 130U), line 677
in "loader.c"
[4] NSC_DigestUpdate(hSession = 11U, pPart = 0x1911bc "", ulPartLen = 130U),
line 1173 in "pkcs11c.c"
[5] PK11_DigestOp(context = 0x181110, in = 0x1911bc "", inLen = 130U), line
4157 in "pk11skey.c"
[6] ssl3_UpdateHandshakeHashes(ss = 0x15f2e0, b = 0x1911bc "", l = 130U), line
2506 in "ssl3con.c"
[7] ssl3_HandleHandshakeMessage(ss = 0x15f2e0, b = 0x1911bc "", length =
130U), line 7957 in "ssl3con.c"
[8] ssl3_HandleHandshake(ss = 0x15f2e0, origBuf = 0x15f488), line 8109 in
"ssl3con.c"
[9] ssl3_HandleRecord(ss = 0x15f2e0, cText = 0xfd7cec48, databuf = 0x15f488),
line 8378 in "ssl3con.c"
[10] ssl3_GatherCompleteHandshake(ss = 0x15f2e0, flags = 0), line 203 in
"ssl3gthr.c"
[11] ssl_GatherRecord1stHandshake(ss = 0x15f2e0), line 1260 in "sslcon.c"
[12] ssl_Do1stHandshake(ss = 0x15f2e0), line 149 in "sslsecur.c"
[13] ssl_SecureRecv(ss = 0x15f2e0, buf = 0xfd7cf178 "", len = 10240, flags =
0), line 970 in "sslsecur.c"
[14] ssl_SecureRead(ss = 0x15f2e0, buf = 0xfd7cf178 "", len = 10240), line 989
in "sslsecur.c"
[15] ssl_Read(fd = 0x1749c0, buf = 0xfd7cf178, len = 10240), line 1262 in
"sslsock.c"
[16] PR_Read(fd = 0x1749c0, buf = 0xfd7cf178, amount = 10240), line 138 in
"priometh.c"
[17] handle_connection(tcp_sock = 0x1749c0, model_sock = 0x5d628, requestCert
= 0), line 861 in "selfserv.c"
[18] jobLoop(a = (nil), b = (nil), c = 0), line 488 in "selfserv.c"
[19] thread_wrapper(arg = 0x16021c), line 456 in "selfserv.c"
[20] _pt_root(arg = 0x16e650), line 214 in "ptthread.c"
(/usr/dist/pkgs/forte_dev,v6.2/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx)
Can't recall seeing the warning:
Warning: freeing memory block 58b28 from ordinary malloc
I did have the zone allocator engaged.
Its about the 23rd time we hit the deadly line:
t@104 (l@4) stopped in md5_compress at line 331 in file "md5.c"
331 wbuf[0] = lendian(wb[0]);
dbx> p cx
cx = 0x2eb3b8
dbx> p *cx
*cx = {
lsbInput = 1516U
msbInput = 0
cv = (1131376117U, 339469488U, 434704201U, 1222861192U)
u = {
b = "\0177\001\030\f\001\020JZÙïvY\võåØ9/±ÏS\024ެ1*JöI퀵 \016"
w = (2130806296U, 201396298U, 1524232054U, 1493956069U, 3627626417U,
3478328500U, 2888903242U, 4132040100U, 3038776832U, 4096U, 8519808U,
1407565377U, 3979923787U, 1113336828U, 4007556908U, 1833807144U)
}
}
dbx> p wb
wb = 0x31ac42
dbx> p *wb
*wb = 170430278U
dbx>
Comment 4•22 years ago
|
||
Apparently one of the purposes of the memcpy is to
make sure the input data is in a buffer that is
32-bit aligned. This patch now simply casts a
pointer to the original input buffer to a (PRUint32 *)
pointer. This typecast is unsafe.
Comment 5•22 years ago
|
||
I suggest that we implement a two-part solution.
1. In md5_compress, we do the memcpy only if the
original input buffer is not 32-bit aligned. If
the original input buffer is 32-bit aligned, we
operate on it directly.
2. Modify the callers of md5_compress to allocate
input buffers on 32-bit boundaries.
![]() |
Assignee | |
Comment 6•22 years ago
|
||
I realized that problem some time after I filed the bug... However, I still
think it's possible to eliminate the memcpy calls (all of them, not just the
~25% that happen to be word-aligned) on big-endian machines, because of the
assignments we do at the beginning of md5_compress. The trick would be to do
the assignments using 4-bytes instead of ints.
![]() |
Assignee | |
Comment 7•22 years ago
|
||
This patch removes the same memcpy call as the last patch, in most cases, but
in a safer way.
1. On big-endian architectures, we have to pre-process the input buffer,
treating each set of 4 bytes as a word, then converting the word to
little-endian format. Previously, we memcpy'ed into the full buffer, then
operated on each word. This patch creates each word from a set of 4 bytes,
thus filling the input buffer with assignment operations instead of memcpy (we
were already doing assignments anyway).
2. I tested the previous patch on x86, which is why I didn't see a crash. We
can safely cast a set of uint8s to a set of uint32s on this platform. This
second patch preserves the same behavior as the previous patch on x86.
I wrote some code that takes a 1024-byte buffer and loops over it calling
MD5_Update, first doing all 1024 bytes, then the last 1023 bytes, then the last
1022 bytes, and so on. This should test all possible alignments of a 32-bit
word equally. I then did this loop 1000 times. Timing measurements below are
in milliseconds.
On Linux x86 dual-400MHz Pentium,
before patch: 14854, 14840, 14844
after patch: 13595, 13614, 13655
On Solaris 5.8 Sparc5,
before patch: 25539, 25552, 25546
after patch: 23230, 23260, 23222
Attachment #121710 -
Attachment is obsolete: true
![]() |
||
Comment 8•22 years ago
|
||
Comment on attachment 122041 [details] [diff] [review]
revised patch, handle misaligned buffers
looks right. Sorry I missed the alignment implications of the previous patch.
:(
Attachment #122041 -
Flags: review+
![]() |
||
Comment 9•22 years ago
|
||
Comment on attachment 122041 [details] [diff] [review]
revised patch, handle misaligned buffers
I saw selfserv stress generate the following messages with this patch:
strsclnt: PR_Write returned error -12272:
SSL peer reports incorrect Message Authentication Cod
selfserv: HDX PR_Read returned error -12273:
SSL received a record with an incorrect Message Authentication Code.
I cleaned and did restarts again, and yielded these:
SSL peer cannot verify your certificate.
selfserv: HDX PR_Read returned error -12271:
Peer's certificate has an invalid signature.
strsclnt: -- SSL: Server Certificate Invalid, err -81
82.
SunOS dsame2 5.8 Generic_108528-19 sun4u sparc SUNW,Ultra-4
Comment 10•22 years ago
|
||
Comment on attachment 122041 [details] [diff] [review]
revised patch, handle misaligned buffers
Is _X86_ defined for Linux/x86? You should add a comment
explaining why the memcpy call is not necessary for x86.
Comment 11•22 years ago
|
||
Comment on attachment 122041 [details] [diff] [review]
revised patch, handle misaligned buffers
>+#ifdef IS_LITTLE_ENDIAN
>+#ifdef _X86_
>+ wBuf = (PRUint32 *)input;
>+#else
> memcpy(cx->inBuf, input, MD5_BUFFER_SIZE);
>- md5_compress(cx);
>+ wBuf = cx->u.w;
>+#endif
In the non-x86 case, if "input" is 32-bit aligned, we
don't need the memcpy call and can set wBuf to "input".
![]() |
Assignee | |
Comment 12•22 years ago
|
||
I'm not sure what's going on here. I was able to reproduce the problems Kirk
saw on Solaris, though not by using blapitest (that is, the MD5 self-test would
pass, but SSL wouldn't work).
I couldn't see anything wrong with the patch, so I made incremental changes
until I saw the error. The update-compress code looks like this:
compress() {
prepare_state_as_little_endian(cx);
/* md5 compression code */
}
update() {
compress(cx);
}
The last patch changed it to this:
compress() {
/* md5 compression code */
}
update() {
prepare_state_as_little_endian(cx);
compress();
}
That simple change, independent of any others, causes Solaris to stop working.
The new patch restores the old order, but does it while removing the memcpy in
most cases. I'm not sure what to make of it, but it works. Perhaps a compiler
bug? This was happening on both debug and optimized builds, though.
This patch also incorporates Wan-Teh's suggestions. I did some testing to see
if the new if-branch causes any performance loss, doesn't seem to:
Solaris
before patch: 25547 25566 25619
after patch: 23225 23218 23231
Linux (different machine, huey):
before patch: 6003 6003 6003
after patch: 5537 5537 5537
Attachment #122041 -
Attachment is obsolete: true
![]() |
Assignee | |
Updated•22 years ago
|
Whiteboard: 3.3.5
![]() |
||
Comment 13•22 years ago
|
||
What I want to know is this: Is the problem with patch 2 above experienced on
any platform other than Solaris? Or is it only on Solaris?
Has it been tested on other big-endian platforms?
![]() |
Assignee | |
Comment 14•22 years ago
|
||
After posting that comment, I built a tree with patch #2 on my powerbook, which
is big-endian. I saw the same failure. I tried patch #3. Kudos to the first
person who can explain what is going on to me ;)
![]() |
Assignee | |
Comment 15•22 years ago
|
||
Sorry, that last comment should say that I then tried patch #3 and it worked.
![]() |
Assignee | |
Comment 16•22 years ago
|
||
I don't think the problem is in the MD5 code. I played around some more, and
the difference described in comment #12 is all the difference. That is, calling
the function to modify the state buffer before md5_compress never works, calling
it inside of md5_compress always works. Nothing happens in the interim, it's:
md5_compress() {
a = cx->cv[0];
...
}
md5_update() {
md5_prep_state_le(cx);
md5_compress(cx);
}
vs.
md5_compress() {
md5_prep_state_le(cx);
a = cx->cv[0];
...
}
md5_update() {
md5_compress(cx);
}
I believe some other part of the code is trampling on memory here, possibly the
function call stack. I would try this under purify, but I only have it on
wintel :( I guess I could try forcing it down the big-endian path.
I tried doing an MD5 hash of a large file, it works with both patch #2 and patch
#3. So far, the failure has only been seen with SSL. It will happen for a
single thread and single connection. That's all I know for now.
![]() |
||
Comment 17•22 years ago
|
||
Selfserv stress appears to be a wash.
![]() |
Assignee | |
Comment 18•22 years ago
|
||
sigh... Time for Ian to enter the programmer penalty box...
The problem was in the patch, and the result of bad programming practice by the
original author of the MD5 code (me) who didn't use curly braces around an if
statement. I added a new line of code after the if, and under the right
conditions (provoked by SSL), the new line would only be executed when the
expression was true. But the new line (the call to md5_prep_state_le()) was
meant to happen in ALL cases.
This didn't happen in my test using a large file because blapitest only called
MD5_Update once.
This new patch is the same as patch #2, already reviewed by Nelson, but with
the appropriate curly braces.
Attachment #122235 -
Attachment is obsolete: true
![]() |
Assignee | |
Comment 19•22 years ago
|
||
Forgot to include Wan-Teh's suggestions (comment #10 and comment #11). Must be
Friday.
Attachment #122309 -
Attachment is obsolete: true
![]() |
||
Comment 20•22 years ago
|
||
I registered an improvement in selfserv performance with
the latest patch (md5_fixed3.patch). Restarts showed an
small gain with less CPU usage. I did two runs, with the
patch engaged, and saw idle a percentage point up in both.
Attachment #122283 -
Attachment is obsolete: true
![]() |
Assignee | |
Comment 21•22 years ago
|
||
checked in to tip:
Checking in md5.c;
/cvsroot/mozilla/security/nss/lib/freebl/md5.c,v <-- md5.c
new revision: 1.8; previous revision: 1.7
done
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
![]() |
Assignee | |
Updated•22 years ago
|
Priority: -- → P2
Target Milestone: --- → 3.9
![]() |
Assignee | |
Comment 22•22 years ago
|
||
Checked into 3.3 branch as md5.c rev 1.5.48.1.
You need to log in
before you can comment on or make changes to this bug.
Description
•