Closed Bug 203379 Opened 22 years ago Closed 22 years ago

eliminate some memcpy calls from MD5

Tracking

(Not tracked)

Status:

RESOLVED FIXED

Milestone:

3.9

People

(Reporter: bugz, Assigned: bugz)

Details

(Whiteboard: 3.3.5)

Attachments

(2 files, 5 obsolete files)

suggested patch to remove memcpy during loop 22 years ago Ian McGreer 8.06 KB, patch	nelson : review+	Details \| Diff \| Splinter Review
revised patch, handle misaligned buffers 22 years ago Ian McGreer 10.33 KB, patch	nelson : review+	Details \| Diff \| Splinter Review
handle solaris problems 22 years ago Ian McGreer 9.91 KB, patch		Details \| Diff \| Splinter Review
Selfserv Stress Results 22 years ago Kirk Erickson 2.13 KB, text/html		Details
fixed patch #2 22 years ago Ian McGreer 10.52 KB, patch		Details \| Diff \| Splinter Review
fixed patch #2, rev 2 22 years ago Ian McGreer 10.78 KB, patch		Details \| Diff \| Splinter Review
Selfserv Stress Results 22 years ago Kirk Erickson 2.53 KB, text/html		Details

Ian McGreer

Assignee

Description

•

22 years ago

when handling "large" input to MD5_Update, we fill the md5 state with input bytes, do a compression, then proceed in chunks of 64 bytes until the remainder is less then 64, then copy the remainder into the state. During the loop over 64-byte chunks, we memcpy into the md5 state buffer and compress the state. However, the code does not modify the state buffer, only the 4 state variables, independent of the buffer. Hence there is no need for the memcpy, the compression function could act on the input buffer directly.

Ian McGreer

Assignee

Comment 1

•

22 years ago

Attached patch suggested patch to remove memcpy during loop (obsolete) — Details — Splinter Review

This eliminates about 100 calls to memcpy during the handling of an SSL request (based on using selfserv/strsclnt).

Ian McGreer

Assignee

Updated

•

22 years ago

Attachment #121710 - Flags: review?(nelsonb)

Nelson Bolyard (seldom reads bugmail)

Comment 2

•

22 years ago

Comment on attachment 121710 [details] [diff] [review] suggested patch to remove memcpy during loop r=nelsonb. Good Job!

Attachment #121710 - Flags: review?(nelsonb) → review+

Kirk Erickson

Comment 3

•

22 years ago

I yielded a Bus Error stressing selfserv with this patch on a Solaris box (dsame2). I reproduced the failure with a debug build, and gathered this trace: Running: selfserv -t 100 -d /tmp/workarea-nss/mozilla/tests_results/security/dsame2.1/server -n dsam e2.red.iplanet.com -p 12344 -w nss -L 30 (process id 14355) Reading libfreebl_hybrid_3.so Warning: freeing memory block 58b28 from ordinary malloc t@104 (l@9) signal BUS (invalid address alignment) in md5_compress at line 331 in file "md5.c" 331 wbuf[0] = lendian(wb[0]); (/usr/dist/pkgs/forte_dev,v6.2/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) where current thread: t@104 =>[1] md5_compress(cx = 0x183ba0, buf = 0x1911d2 "\\x"), line 331 in "md5.c" [2] MD5_Update(cx = 0x183ba0, input = 0x1911d2 "\\x", inputLen = 108U), line 441 in "md5.c" [3] MD5_Update(cx = 0x183ba0, input = 0x1911bc "", inputLen = 130U), line 677 in "loader.c" [4] NSC_DigestUpdate(hSession = 11U, pPart = 0x1911bc "", ulPartLen = 130U), line 1173 in "pkcs11c.c" [5] PK11_DigestOp(context = 0x181110, in = 0x1911bc "", inLen = 130U), line 4157 in "pk11skey.c" [6] ssl3_UpdateHandshakeHashes(ss = 0x15f2e0, b = 0x1911bc "", l = 130U), line 2506 in "ssl3con.c" [7] ssl3_HandleHandshakeMessage(ss = 0x15f2e0, b = 0x1911bc "", length = 130U), line 7957 in "ssl3con.c" [8] ssl3_HandleHandshake(ss = 0x15f2e0, origBuf = 0x15f488), line 8109 in "ssl3con.c" [9] ssl3_HandleRecord(ss = 0x15f2e0, cText = 0xfd7cec48, databuf = 0x15f488), line 8378 in "ssl3con.c" [10] ssl3_GatherCompleteHandshake(ss = 0x15f2e0, flags = 0), line 203 in "ssl3gthr.c" [11] ssl_GatherRecord1stHandshake(ss = 0x15f2e0), line 1260 in "sslcon.c" [12] ssl_Do1stHandshake(ss = 0x15f2e0), line 149 in "sslsecur.c" [13] ssl_SecureRecv(ss = 0x15f2e0, buf = 0xfd7cf178 "", len = 10240, flags = 0), line 970 in "sslsecur.c" [14] ssl_SecureRead(ss = 0x15f2e0, buf = 0xfd7cf178 "", len = 10240), line 989 in "sslsecur.c" [15] ssl_Read(fd = 0x1749c0, buf = 0xfd7cf178, len = 10240), line 1262 in "sslsock.c" [16] PR_Read(fd = 0x1749c0, buf = 0xfd7cf178, amount = 10240), line 138 in "priometh.c" [17] handle_connection(tcp_sock = 0x1749c0, model_sock = 0x5d628, requestCert = 0), line 861 in "selfserv.c" [18] jobLoop(a = (nil), b = (nil), c = 0), line 488 in "selfserv.c" [19] thread_wrapper(arg = 0x16021c), line 456 in "selfserv.c" [20] _pt_root(arg = 0x16e650), line 214 in "ptthread.c" (/usr/dist/pkgs/forte_dev,v6.2/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) Can't recall seeing the warning: Warning: freeing memory block 58b28 from ordinary malloc I did have the zone allocator engaged. Its about the 23rd time we hit the deadly line: t@104 (l@4) stopped in md5_compress at line 331 in file "md5.c" 331 wbuf[0] = lendian(wb[0]); dbx> p cx cx = 0x2eb3b8 dbx> p *cx *cx = { lsbInput = 1516U msbInput = 0 cv = (1131376117U, 339469488U, 434704201U, 1222861192U) u = { b = "\0177\001\030\f\001\020JZÙïvY\võåØ9/±ÏS\024Ž¬1*JöIí€µ \016" w = (2130806296U, 201396298U, 1524232054U, 1493956069U, 3627626417U, 3478328500U, 2888903242U, 4132040100U, 3038776832U, 4096U, 8519808U, 1407565377U, 3979923787U, 1113336828U, 4007556908U, 1833807144U) } } dbx> p wb wb = 0x31ac42 dbx> p *wb *wb = 170430278U dbx>

Wan-Teh Chang

Comment 4

•

22 years ago

Apparently one of the purposes of the memcpy is to make sure the input data is in a buffer that is 32-bit aligned. This patch now simply casts a pointer to the original input buffer to a (PRUint32 *) pointer. This typecast is unsafe.

Wan-Teh Chang

Comment 5

•

22 years ago

I suggest that we implement a two-part solution. 1. In md5_compress, we do the memcpy only if the original input buffer is not 32-bit aligned. If the original input buffer is 32-bit aligned, we operate on it directly. 2. Modify the callers of md5_compress to allocate input buffers on 32-bit boundaries.

Ian McGreer

Assignee

Comment 6

•

22 years ago

I realized that problem some time after I filed the bug... However, I still think it's possible to eliminate the memcpy calls (all of them, not just the ~25% that happen to be word-aligned) on big-endian machines, because of the assignments we do at the beginning of md5_compress. The trick would be to do the assignments using 4-bytes instead of ints.

Ian McGreer

Assignee

Comment 7

•

22 years ago

Attached patch revised patch, handle misaligned buffers (obsolete) — Details — Splinter Review

This patch removes the same memcpy call as the last patch, in most cases, but in a safer way. 1. On big-endian architectures, we have to pre-process the input buffer, treating each set of 4 bytes as a word, then converting the word to little-endian format. Previously, we memcpy'ed into the full buffer, then operated on each word. This patch creates each word from a set of 4 bytes, thus filling the input buffer with assignment operations instead of memcpy (we were already doing assignments anyway). 2. I tested the previous patch on x86, which is why I didn't see a crash. We can safely cast a set of uint8s to a set of uint32s on this platform. This second patch preserves the same behavior as the previous patch on x86. I wrote some code that takes a 1024-byte buffer and loops over it calling MD5_Update, first doing all 1024 bytes, then the last 1023 bytes, then the last 1022 bytes, and so on. This should test all possible alignments of a 32-bit word equally. I then did this loop 1000 times. Timing measurements below are in milliseconds. On Linux x86 dual-400MHz Pentium, before patch: 14854, 14840, 14844 after patch: 13595, 13614, 13655 On Solaris 5.8 Sparc5, before patch: 25539, 25552, 25546 after patch: 23230, 23260, 23222

Attachment #121710 - Attachment is obsolete: true

Nelson Bolyard (seldom reads bugmail)

Comment 8

•

22 years ago

Comment on attachment 122041 [details] [diff] [review] revised patch, handle misaligned buffers looks right. Sorry I missed the alignment implications of the previous patch. :(

Attachment #122041 - Flags: review+

Kirk Erickson

Comment 9

•

22 years ago

Comment on attachment 122041 [details] [diff] [review] revised patch, handle misaligned buffers I saw selfserv stress generate the following messages with this patch: strsclnt: PR_Write returned error -12272: SSL peer reports incorrect Message Authentication Cod selfserv: HDX PR_Read returned error -12273: SSL received a record with an incorrect Message Authentication Code. I cleaned and did restarts again, and yielded these: SSL peer cannot verify your certificate. selfserv: HDX PR_Read returned error -12271: Peer's certificate has an invalid signature. strsclnt: -- SSL: Server Certificate Invalid, err -81 82. SunOS dsame2 5.8 Generic_108528-19 sun4u sparc SUNW,Ultra-4

Wan-Teh Chang

Comment 10

•

22 years ago

Comment on attachment 122041 [details] [diff] [review] revised patch, handle misaligned buffers Is _X86_ defined for Linux/x86? You should add a comment explaining why the memcpy call is not necessary for x86.

Wan-Teh Chang

Comment 11

•

22 years ago

Comment on attachment 122041 [details] [diff] [review] revised patch, handle misaligned buffers >+#ifdef IS_LITTLE_ENDIAN >+#ifdef _X86_ >+ wBuf = (PRUint32 *)input; >+#else > memcpy(cx->inBuf, input, MD5_BUFFER_SIZE); >- md5_compress(cx); >+ wBuf = cx->u.w; >+#endif In the non-x86 case, if "input" is 32-bit aligned, we don't need the memcpy call and can set wBuf to "input".

Ian McGreer

Assignee

Comment 12

•

22 years ago

Attached patch handle solaris problems (obsolete) — Details — Splinter Review

I'm not sure what's going on here. I was able to reproduce the problems Kirk saw on Solaris, though not by using blapitest (that is, the MD5 self-test would pass, but SSL wouldn't work). I couldn't see anything wrong with the patch, so I made incremental changes until I saw the error. The update-compress code looks like this: compress() { prepare_state_as_little_endian(cx); /* md5 compression code */ } update() { compress(cx); } The last patch changed it to this: compress() { /* md5 compression code */ } update() { prepare_state_as_little_endian(cx); compress(); } That simple change, independent of any others, causes Solaris to stop working. The new patch restores the old order, but does it while removing the memcpy in most cases. I'm not sure what to make of it, but it works. Perhaps a compiler bug? This was happening on both debug and optimized builds, though. This patch also incorporates Wan-Teh's suggestions. I did some testing to see if the new if-branch causes any performance loss, doesn't seem to: Solaris before patch: 25547 25566 25619 after patch: 23225 23218 23231 Linux (different machine, huey): before patch: 6003 6003 6003 after patch: 5537 5537 5537

Attachment #122041 - Attachment is obsolete: true

Ian McGreer

Assignee

Updated

•

22 years ago

Whiteboard: 3.3.5

Nelson Bolyard (seldom reads bugmail)

Comment 13

•

22 years ago

What I want to know is this: Is the problem with patch 2 above experienced on any platform other than Solaris? Or is it only on Solaris? Has it been tested on other big-endian platforms?

Ian McGreer

Assignee

Comment 14

•

22 years ago

After posting that comment, I built a tree with patch #2 on my powerbook, which is big-endian. I saw the same failure. I tried patch #3. Kudos to the first person who can explain what is going on to me ;)

Ian McGreer

Assignee

Comment 15

•

22 years ago

Sorry, that last comment should say that I then tried patch #3 and it worked.

Ian McGreer

Assignee

Comment 16

•

22 years ago

I don't think the problem is in the MD5 code. I played around some more, and the difference described in comment #12 is all the difference. That is, calling the function to modify the state buffer before md5_compress never works, calling it inside of md5_compress always works. Nothing happens in the interim, it's: md5_compress() { a = cx->cv[0]; ... } md5_update() { md5_prep_state_le(cx); md5_compress(cx); } vs. md5_compress() { md5_prep_state_le(cx); a = cx->cv[0]; ... } md5_update() { md5_compress(cx); } I believe some other part of the code is trampling on memory here, possibly the function call stack. I would try this under purify, but I only have it on wintel :( I guess I could try forcing it down the big-endian path. I tried doing an MD5 hash of a large file, it works with both patch #2 and patch #3. So far, the failure has only been seen with SSL. It will happen for a single thread and single connection. That's all I know for now.

Kirk Erickson

Comment 17

•

22 years ago

Attached file Selfserv Stress Results (obsolete) — Details

Selfserv stress appears to be a wash.

Ian McGreer

Assignee

Comment 18

•

22 years ago

Attached patch fixed patch #2 (obsolete) — Details — Splinter Review

sigh... Time for Ian to enter the programmer penalty box... The problem was in the patch, and the result of bad programming practice by the original author of the MD5 code (me) who didn't use curly braces around an if statement. I added a new line of code after the if, and under the right conditions (provoked by SSL), the new line would only be executed when the expression was true. But the new line (the call to md5_prep_state_le()) was meant to happen in ALL cases. This didn't happen in my test using a large file because blapitest only called MD5_Update once. This new patch is the same as patch #2, already reviewed by Nelson, but with the appropriate curly braces.

Attachment #122235 - Attachment is obsolete: true

Ian McGreer

Assignee

Comment 19

•

22 years ago

Attached patch fixed patch #2, rev 2 — Details — Splinter Review

Forgot to include Wan-Teh's suggestions (comment #10 and comment #11). Must be Friday.

Attachment #122309 - Attachment is obsolete: true

Kirk Erickson

Comment 20

•

22 years ago

Attached file Selfserv Stress Results — Details

I registered an improvement in selfserv performance with the latest patch (md5_fixed3.patch). Restarts showed an small gain with less CPU usage. I did two runs, with the patch engaged, and saw idle a percentage point up in both.

Attachment #122283 - Attachment is obsolete: true

Ian McGreer

Assignee

Comment 21

•

22 years ago

checked in to tip: Checking in md5.c; /cvsroot/mozilla/security/nss/lib/freebl/md5.c,v <-- md5.c new revision: 1.8; previous revision: 1.7 done

Status: NEW → RESOLVED

Closed: 22 years ago

Resolution: --- → FIXED

Ian McGreer

Assignee

Updated

•

22 years ago

Priority: -- → P2

Target Milestone: --- → 3.9

Ian McGreer

Assignee

Comment 22

•

22 years ago

Checked into 3.3 branch as md5.c rev 1.5.48.1.

You need to log in before you can comment on or make changes to this bug.