Closed Bug 203379 Opened 22 years ago Closed 22 years ago

eliminate some memcpy calls from MD5

Categories

(NSS :: Libraries, enhancement, P2)

enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bugz, Assigned: bugz)

Details

(Whiteboard: 3.3.5)

Attachments

(2 files, 5 obsolete files)

when handling "large" input to MD5_Update, we fill the md5 state with input bytes, do a compression, then proceed in chunks of 64 bytes until the remainder is less then 64, then copy the remainder into the state. During the loop over 64-byte chunks, we memcpy into the md5 state buffer and compress the state. However, the code does not modify the state buffer, only the 4 state variables, independent of the buffer. Hence there is no need for the memcpy, the compression function could act on the input buffer directly.
This eliminates about 100 calls to memcpy during the handling of an SSL request (based on using selfserv/strsclnt).
Attachment #121710 - Flags: review?(nelsonb)
Comment on attachment 121710 [details] [diff] [review] suggested patch to remove memcpy during loop r=nelsonb. Good Job!
Attachment #121710 - Flags: review?(nelsonb) → review+
I yielded a Bus Error stressing selfserv with this patch on a Solaris box (dsame2). I reproduced the failure with a debug build, and gathered this trace: Running: selfserv -t 100 -d /tmp/workarea-nss/mozilla/tests_results/security/dsame2.1/server -n dsam e2.red.iplanet.com -p 12344 -w nss -L 30 (process id 14355) Reading libfreebl_hybrid_3.so Warning: freeing memory block 58b28 from ordinary malloc t@104 (l@9) signal BUS (invalid address alignment) in md5_compress at line 331 in file "md5.c" 331 wbuf[0] = lendian(wb[0]); (/usr/dist/pkgs/forte_dev,v6.2/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) where current thread: t@104 =>[1] md5_compress(cx = 0x183ba0, buf = 0x1911d2 "\\x"), line 331 in "md5.c" [2] MD5_Update(cx = 0x183ba0, input = 0x1911d2 "\\x", inputLen = 108U), line 441 in "md5.c" [3] MD5_Update(cx = 0x183ba0, input = 0x1911bc "", inputLen = 130U), line 677 in "loader.c" [4] NSC_DigestUpdate(hSession = 11U, pPart = 0x1911bc "", ulPartLen = 130U), line 1173 in "pkcs11c.c" [5] PK11_DigestOp(context = 0x181110, in = 0x1911bc "", inLen = 130U), line 4157 in "pk11skey.c" [6] ssl3_UpdateHandshakeHashes(ss = 0x15f2e0, b = 0x1911bc "", l = 130U), line 2506 in "ssl3con.c" [7] ssl3_HandleHandshakeMessage(ss = 0x15f2e0, b = 0x1911bc "", length = 130U), line 7957 in "ssl3con.c" [8] ssl3_HandleHandshake(ss = 0x15f2e0, origBuf = 0x15f488), line 8109 in "ssl3con.c" [9] ssl3_HandleRecord(ss = 0x15f2e0, cText = 0xfd7cec48, databuf = 0x15f488), line 8378 in "ssl3con.c" [10] ssl3_GatherCompleteHandshake(ss = 0x15f2e0, flags = 0), line 203 in "ssl3gthr.c" [11] ssl_GatherRecord1stHandshake(ss = 0x15f2e0), line 1260 in "sslcon.c" [12] ssl_Do1stHandshake(ss = 0x15f2e0), line 149 in "sslsecur.c" [13] ssl_SecureRecv(ss = 0x15f2e0, buf = 0xfd7cf178 "", len = 10240, flags = 0), line 970 in "sslsecur.c" [14] ssl_SecureRead(ss = 0x15f2e0, buf = 0xfd7cf178 "", len = 10240), line 989 in "sslsecur.c" [15] ssl_Read(fd = 0x1749c0, buf = 0xfd7cf178, len = 10240), line 1262 in "sslsock.c" [16] PR_Read(fd = 0x1749c0, buf = 0xfd7cf178, amount = 10240), line 138 in "priometh.c" [17] handle_connection(tcp_sock = 0x1749c0, model_sock = 0x5d628, requestCert = 0), line 861 in "selfserv.c" [18] jobLoop(a = (nil), b = (nil), c = 0), line 488 in "selfserv.c" [19] thread_wrapper(arg = 0x16021c), line 456 in "selfserv.c" [20] _pt_root(arg = 0x16e650), line 214 in "ptthread.c" (/usr/dist/pkgs/forte_dev,v6.2/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) Can't recall seeing the warning: Warning: freeing memory block 58b28 from ordinary malloc I did have the zone allocator engaged. Its about the 23rd time we hit the deadly line: t@104 (l@4) stopped in md5_compress at line 331 in file "md5.c" 331 wbuf[0] = lendian(wb[0]); dbx> p cx cx = 0x2eb3b8 dbx> p *cx *cx = { lsbInput = 1516U msbInput = 0 cv = (1131376117U, 339469488U, 434704201U, 1222861192U) u = { b = "\0177\001\030\f\001\020JZÙïvY\võåØ9/±ÏS\024ެ1*JöI퀵 \016" w = (2130806296U, 201396298U, 1524232054U, 1493956069U, 3627626417U, 3478328500U, 2888903242U, 4132040100U, 3038776832U, 4096U, 8519808U, 1407565377U, 3979923787U, 1113336828U, 4007556908U, 1833807144U) } } dbx> p wb wb = 0x31ac42 dbx> p *wb *wb = 170430278U dbx>
Apparently one of the purposes of the memcpy is to make sure the input data is in a buffer that is 32-bit aligned. This patch now simply casts a pointer to the original input buffer to a (PRUint32 *) pointer. This typecast is unsafe.
I suggest that we implement a two-part solution. 1. In md5_compress, we do the memcpy only if the original input buffer is not 32-bit aligned. If the original input buffer is 32-bit aligned, we operate on it directly. 2. Modify the callers of md5_compress to allocate input buffers on 32-bit boundaries.
I realized that problem some time after I filed the bug... However, I still think it's possible to eliminate the memcpy calls (all of them, not just the ~25% that happen to be word-aligned) on big-endian machines, because of the assignments we do at the beginning of md5_compress. The trick would be to do the assignments using 4-bytes instead of ints.
This patch removes the same memcpy call as the last patch, in most cases, but in a safer way. 1. On big-endian architectures, we have to pre-process the input buffer, treating each set of 4 bytes as a word, then converting the word to little-endian format. Previously, we memcpy'ed into the full buffer, then operated on each word. This patch creates each word from a set of 4 bytes, thus filling the input buffer with assignment operations instead of memcpy (we were already doing assignments anyway). 2. I tested the previous patch on x86, which is why I didn't see a crash. We can safely cast a set of uint8s to a set of uint32s on this platform. This second patch preserves the same behavior as the previous patch on x86. I wrote some code that takes a 1024-byte buffer and loops over it calling MD5_Update, first doing all 1024 bytes, then the last 1023 bytes, then the last 1022 bytes, and so on. This should test all possible alignments of a 32-bit word equally. I then did this loop 1000 times. Timing measurements below are in milliseconds. On Linux x86 dual-400MHz Pentium, before patch: 14854, 14840, 14844 after patch: 13595, 13614, 13655 On Solaris 5.8 Sparc5, before patch: 25539, 25552, 25546 after patch: 23230, 23260, 23222
Attachment #121710 - Attachment is obsolete: true
Comment on attachment 122041 [details] [diff] [review] revised patch, handle misaligned buffers looks right. Sorry I missed the alignment implications of the previous patch. :(
Attachment #122041 - Flags: review+
Comment on attachment 122041 [details] [diff] [review] revised patch, handle misaligned buffers I saw selfserv stress generate the following messages with this patch: strsclnt: PR_Write returned error -12272: SSL peer reports incorrect Message Authentication Cod selfserv: HDX PR_Read returned error -12273: SSL received a record with an incorrect Message Authentication Code. I cleaned and did restarts again, and yielded these: SSL peer cannot verify your certificate. selfserv: HDX PR_Read returned error -12271: Peer's certificate has an invalid signature. strsclnt: -- SSL: Server Certificate Invalid, err -81 82. SunOS dsame2 5.8 Generic_108528-19 sun4u sparc SUNW,Ultra-4
Comment on attachment 122041 [details] [diff] [review] revised patch, handle misaligned buffers Is _X86_ defined for Linux/x86? You should add a comment explaining why the memcpy call is not necessary for x86.
Comment on attachment 122041 [details] [diff] [review] revised patch, handle misaligned buffers >+#ifdef IS_LITTLE_ENDIAN >+#ifdef _X86_ >+ wBuf = (PRUint32 *)input; >+#else > memcpy(cx->inBuf, input, MD5_BUFFER_SIZE); >- md5_compress(cx); >+ wBuf = cx->u.w; >+#endif In the non-x86 case, if "input" is 32-bit aligned, we don't need the memcpy call and can set wBuf to "input".
Attached patch handle solaris problems (obsolete) — Splinter Review
I'm not sure what's going on here. I was able to reproduce the problems Kirk saw on Solaris, though not by using blapitest (that is, the MD5 self-test would pass, but SSL wouldn't work). I couldn't see anything wrong with the patch, so I made incremental changes until I saw the error. The update-compress code looks like this: compress() { prepare_state_as_little_endian(cx); /* md5 compression code */ } update() { compress(cx); } The last patch changed it to this: compress() { /* md5 compression code */ } update() { prepare_state_as_little_endian(cx); compress(); } That simple change, independent of any others, causes Solaris to stop working. The new patch restores the old order, but does it while removing the memcpy in most cases. I'm not sure what to make of it, but it works. Perhaps a compiler bug? This was happening on both debug and optimized builds, though. This patch also incorporates Wan-Teh's suggestions. I did some testing to see if the new if-branch causes any performance loss, doesn't seem to: Solaris before patch: 25547 25566 25619 after patch: 23225 23218 23231 Linux (different machine, huey): before patch: 6003 6003 6003 after patch: 5537 5537 5537
Attachment #122041 - Attachment is obsolete: true
Whiteboard: 3.3.5
What I want to know is this: Is the problem with patch 2 above experienced on any platform other than Solaris? Or is it only on Solaris? Has it been tested on other big-endian platforms?
After posting that comment, I built a tree with patch #2 on my powerbook, which is big-endian. I saw the same failure. I tried patch #3. Kudos to the first person who can explain what is going on to me ;)
Sorry, that last comment should say that I then tried patch #3 and it worked.
I don't think the problem is in the MD5 code. I played around some more, and the difference described in comment #12 is all the difference. That is, calling the function to modify the state buffer before md5_compress never works, calling it inside of md5_compress always works. Nothing happens in the interim, it's: md5_compress() { a = cx->cv[0]; ... } md5_update() { md5_prep_state_le(cx); md5_compress(cx); } vs. md5_compress() { md5_prep_state_le(cx); a = cx->cv[0]; ... } md5_update() { md5_compress(cx); } I believe some other part of the code is trampling on memory here, possibly the function call stack. I would try this under purify, but I only have it on wintel :( I guess I could try forcing it down the big-endian path. I tried doing an MD5 hash of a large file, it works with both patch #2 and patch #3. So far, the failure has only been seen with SSL. It will happen for a single thread and single connection. That's all I know for now.
Attached file Selfserv Stress Results (obsolete) —
Selfserv stress appears to be a wash.
Attached patch fixed patch #2 (obsolete) — Splinter Review
sigh... Time for Ian to enter the programmer penalty box... The problem was in the patch, and the result of bad programming practice by the original author of the MD5 code (me) who didn't use curly braces around an if statement. I added a new line of code after the if, and under the right conditions (provoked by SSL), the new line would only be executed when the expression was true. But the new line (the call to md5_prep_state_le()) was meant to happen in ALL cases. This didn't happen in my test using a large file because blapitest only called MD5_Update once. This new patch is the same as patch #2, already reviewed by Nelson, but with the appropriate curly braces.
Attachment #122235 - Attachment is obsolete: true
Forgot to include Wan-Teh's suggestions (comment #10 and comment #11). Must be Friday.
Attachment #122309 - Attachment is obsolete: true
I registered an improvement in selfserv performance with the latest patch (md5_fixed3.patch). Restarts showed an small gain with less CPU usage. I did two runs, with the patch engaged, and saw idle a percentage point up in both.
Attachment #122283 - Attachment is obsolete: true
checked in to tip: Checking in md5.c; /cvsroot/mozilla/security/nss/lib/freebl/md5.c,v <-- md5.c new revision: 1.8; previous revision: 1.7 done
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Priority: -- → P2
Target Milestone: --- → 3.9
Checked into 3.3 branch as md5.c rev 1.5.48.1.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: