Closed Bug 466272 Opened 16 years ago Closed 16 years ago

Modify MathUtils for better integration into new String class

Tracking

(Not tracked)

Status:

VERIFIED FIXED

People

(Reporter: daumling, Assigned: daumling)

References

Details

Attachments

(1 file, 4 obsolete files)

Changes to MathUtils as discussed 16 years ago Michael Daumling 39.79 KB, application/octet-stream		Details
Now it is marked as patch. 16 years ago Michael Daumling 39.79 KB, patch		Details \| Diff \| Splinter Review
Updated patch following comments. 16 years ago Michael Daumling 61.13 KB, patch	stejohns : review+ lhansen : review-	Details \| Diff \| Splinter Review
Cosmetic fixes, and use of VMPI_alloca 16 years ago Michael Daumling 64.01 KB, patch	lhansen : review+	Details \| Diff \| Splinter Review
Security fix: increase double-to-string buffer size to 380 16 years ago Michael Daumling 1.23 KB, patch	stejohns : review+	Details \| Diff \| Splinter Review

Michael Daumling

Assignee

Description

•

16 years ago

At least, Tamarin-Redux' MathUtils should be modified to work with char* instead of wchar*, since Strings now can be 8 bits wide. The parsing methods should work with String instances, because string widhts can vary. Will be tackled once the new Strings have landed.

Michael Daumling

Assignee

Comment 1

•

16 years ago

Attached file Changes to MathUtils as discussed (obsolete) — Details

Changes: - convertIntegerToString() returns a char* pointing to the start of the string inside the buffer. This eliminates one memcpy(). The buffer is no longer NUL-terminated. - a new convertIntegerToString() returns a String instance. - convertDoubleToString() and convertDoubleToStringRadix() return a String instance. - convertDoubleToString() has been optimized where the sentinel is being tested/used. No more memcpy() necessary, as the minus sign is applied after the number has been generated. - convertStringToNumber()/converStringToDouble() now accept a Stringp and use StringIndexer. The sunspider string tests show up to 25% less memory usage, with some slight performance degradation in some tests, probably to the change to StringIndexer, where each access to a character requires a function call. On Windows, the executable size has shrunk by 39K, which seems a bit too big to me, but maybe some C runtime calls have been optimized away.

Attachment #352918 - Flags: review?(stejohns)

Steven Johnson

Comment 2

•

16 years ago

Mike, the diff is apparently binary, can you re-upload in text form?

Brent Baker

Comment 3

•

16 years ago

diff is text form, if you download it is just ascii, must have not marked it as a patch during upload.

Michael Daumling

Assignee

Comment 4

•

16 years ago

Attached patch Now it is marked as patch. (obsolete) — Details — Splinter Review

Sorry - apparently forgot to check the patch box.

Attachment #352918 - Attachment is obsolete: true

Attachment #353075 - Flags: review?(stejohns)

Attachment #352918 - Flags: review?(stejohns)

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 5

•

16 years ago

FWIW, you can edit patchness/MIME type from the "Edit Attachment" page, no need to repost.

Steven Johnson

Comment 6

•

16 years ago

Looks good, a few questions before review+ ... -- the comment for convertIntegerToString() says "The buffer is not NUL-terminated" but several callers rely on the return pointer being null-terminated. -- callers all declare fixed-sized buffers to hold the result; this is probably safe if the routine is checking length properly, but are we certain that the fixed-size is large enough in all cases? Perhaps declaring constants for the maximum expected sizes for (eg) uint32 would be useful. (I recall bugs in the past where unusual cases were larger than expected...)

Lars T Hansen

Comment 7

•

16 years ago

(In reply to comment #6) > Looks good, a few questions before review+ ... > > -- callers all declare fixed-sized buffers to hold the result; this is probably > safe if the routine is checking length properly, but are we certain that the > fixed-size is large enough in all cases? Perhaps declaring constants for the > maximum expected sizes for (eg) uint32 would be useful. (I recall bugs in the > past where unusual cases were larger than expected...) I assume the buffer sizes are the maximums given the floating point representation and the algorithm, so that should be OK. For ints it can vary depending on platform (63 or 32); stack space is sometimes at a premium so allocating a 130 byte buffer for the worst case (base 2 conversion on 64-bit platform) when 22 would do (base 10 conversion on 32-bit platform) is a little generous, but probably OK. But the code would probably *read* better if there were predefined constants, MAX_DIGITS_FOR_DOUBLE, MAX_DIGITS_FOR_UINTPTR, etc.

Michael Daumling

Assignee

Comment 8

•

16 years ago

I took the maximums I found in old code. See the diffs. The buffers should be bug enough. It may also be a good idea to add a special conversion routine for numbers that are 10-based, because the maximum for an integer would be 12 bytes (even on a 64 bit system, integers are still 32 bits). Before, the buffers were just allocated Big Enough (in wchars), and the code did not care to check the buffer size. Now, if the supplied buffer is too small, the conversion aborts. This is a much safer approach. Old code should *not* use character buffers, but switch to the APIs using and returning Strings. I only left the single routine that uses a buffer in for the Date object to avoid having to create a String instance as intermediate representation. Old code needs to be adapted anyway. But if adding a NUL at the end is important, I will be happy to re-add that character.

Steven Johnson

Comment 9

•

16 years ago

Checking the buffer size is always a good idea :-) I do like Lars idea to add predefined constants to improve readability. What about the null-termination issue? Am I mistaken?

Michael Daumling

Assignee

Comment 10

•

16 years ago

Attached patch Updated patch following comments. (obsolete) — Details — Splinter Review

This patch addresses comments from Lars and Steven: 1) Added three very wordy enum constants to MathUtils.h for the minimum buffer sizes required 2) Removed all temporary buffers from other code (except inside Date.cpp, where it makes sense). 3) convertIntegerToString() now NUL-terminates the buffer again (actually, it always did - the comment was just wrong) 4) Added an overload for convertIntegerToString() for 32-integers based 10. This overload uses a buffer of just 12 bytes, thereby relieving the CPU stack somewhat.

Attachment #353075 - Attachment is obsolete: true

Attachment #353416 - Flags: review?(stejohns)

Attachment #353075 - Flags: review?(stejohns)

Michael Daumling

Assignee

Comment 11

•

16 years ago

Comment on attachment 353416 [details] [diff] [review] Updated patch following comments. Lars, maybe you want to review this bug?

Attachment #353416 - Flags: review?(lhansen)

Steven Johnson

Comment 12

•

16 years ago

Comment on attachment 353416 [details] [diff] [review] Updated patch following comments. Let me know if you need assistance pushing it out to redux.

Attachment #353416 - Flags: review?(stejohns) → review+

Lars T Hansen

Comment 13

•

16 years ago

Comment on attachment 353416 [details] [diff] [review] Updated patch following comments. Generally the patch would be more manageable if submitted as two patches: one for changes to type names, the other for changes to logic. As it is, each obscures the other. Must fix before integration: Method names should never start with "_", as names starting with "_" are reserved for the implementation to use in the predefined libraries. This applies to e.g. _isHexNumber, _parseIntDigit (many others). Type names: * The use of (int) in Stringp AvmCore::formatAtomPtr is dodgy, we're trying not to use this type. * Types like "uint32", "sintptr" should not be used; we should use C99 std types a la "uint32_t", "intptr_t". I bring this up only because you've corrected many other uses of 'int'. Don't know what the cost is of correcting the rest, but would encourage looking into it. These would probably be good to fix if the complexity costs are controllable: MathUtils::convertDoubleToStringRadix allocates a 2KB buffer on the stack. It would be better if this buffer were allocated with VMPI_alloca because VMPI_alloca may shunt the allocation off to the heap if it deems it to be too large or if the platform has a short stack. (That probably can't be done if MathUtils are available directly from client code and converDoubleToStringRadix can throw an exception, though, given how VMPI_alloca performs cleanup on exceptions.) Though see below. Incidentals I discovered, probably not your fault but due to existing APIs: It's bad style that isInfinite does not return bool (given the name). There are two versions of isInfinite, isNaN, and isNegZero, one for Unix and one for others. I have no idea why that makes sense. It's likely clearer to write "(bool)t" or "t != 0" than "!!t". Other comments: MathUtils::convertDoubleToString now seems strictly less general than before since it relies on an integration with the Strings, therefore on the VM having been started. Ditto for some of the others. I'm not sure whether this is a problem or not. (Moving to VMPI_alloca as suggested above makes this integration even tighter.)

Attachment #353416 - Flags: review?(lhansen) → review-

Steven Johnson

Comment 14

•

16 years ago

(In reply to comment #13) > Method names should never start with "_", as names starting with "_" are > reserved for the implementation to use in the predefined libraries. This > applies to e.g. _isHexNumber, _parseIntDigit (many others). This is flagrantly violated in many many places already... granted these names are technically reserved, but I can't think of any nontrivial C/C++ project I've ever worked on that hasn't violated this.

Lars T Hansen

Comment 15

•

16 years ago

(In reply to comment #14) > (In reply to comment #13) > > Method names should never start with "_", as names starting with "_" are > > reserved for the implementation to use in the predefined libraries. This > > applies to e.g. _isHexNumber, _parseIntDigit (many others). > > This is flagrantly violated in many many places already... granted these names > are technically reserved, but I can't think of any nontrivial C/C++ project > I've ever worked on that hasn't violated this. I've had the same experience, but it's never too late to start mending your ways. Remember there are "C++" compilers out there that make many strange assumptions, and there's no sense in creating vulnerabilities if we don't have to. The "no useful C++ compiler would outlaw this" mantra has a tendency to be used to justify all sorts of things, so the more rarely we use it the better. (Besides which, I hate how underscore-prefixed names look, but in this case it's easier to argue with chapter & verse than with esthetic judgements...)

Michael Daumling

Assignee

Comment 16

•

16 years ago

Attached patch Cosmetic fixes, and use of VMPI_alloca (obsolete) — Details — Splinter Review

- Fixed all cosmetic nits. - Added comment why isInfinite() does not return a bool. - Replaced all heap allocations > 12 bytes with calls to VMPI_alloca. - Left bit53 = !!bit53 in because the variable is an int32_t, not a bool. I hope that downstream usage of MathUtils does not need character buffers; after all, the AVM should not be a repository for general-purpose functionality, but hand-tailored for performance and security. But this is just my opinion. BTW - we should start a bug (or a Wiki page) that collects coding style guidelines. It would save a lot of time.

Attachment #353416 - Attachment is obsolete: true

Attachment #353630 - Flags: review?(lhansen)

Lars T Hansen

Comment 17

•

16 years ago

(In reply to comment #16) > - Left bit53 = !!bit53 in because the variable is an int32_t, not a bool. Since !!bit53 computes a bool value that is then converted to int32_t I don't see why that's relevant, but it's alright with me. > I hope that downstream usage of MathUtils does not need character buffers; > after all, the AVM should not be a repository for general-purpose > functionality, but hand-tailored for performance and security. But this is just > my opinion. Since Tamarin/AVM is open-source and its primary client is not, it's probably inevitable for these functions to become tightly intertwined with Tamarin. But the fact remains that high quality, portable routines like these will be used by client code, and we should not resist that too much; instead we might want to partition the code into a layer that has minimal Tamarin bindings and which can be used from the outside, and a shim that makes use from within Tamarin convenient. > BTW - we should start a bug (or a Wiki page) that collects coding style > guidelines. It would save a lot of time. We have an internal page at Adobe, I'll contact you independently about that. IMO that should migrate into a file in the doc/ directory of Tamarin, but it's not been on the critical path.

Lars T Hansen

Updated

•

16 years ago

Attachment #353630 - Flags: review?(lhansen) → review+

Steven Johnson

Comment 18

•

16 years ago

GCC complains about convertIntegerToString() being ambiguous (between the radix-supplied and radix-10 versions)... I'm going to make the treatAsUnsigned argument non-optional to resolve this.

Steven Johnson

Comment 19

•

16 years ago

(In reply to comment #18) > GCC complains about convertIntegerToString() being ambiguous (between the > radix-supplied and radix-10 versions)... I'm going to make the treatAsUnsigned > argument non-optional to resolve this. Actually, there are still some possible ambiguities -- I'm going to rename the 3 overloads to convertIntegerToStringBuffer, convertIntegerToStringRadix, and convertIntegerToStringBase10 for this push.

Steven Johnson

Comment 20

•

16 years ago

pushed to redux (with minor revisions) as changeset: 1241:9f9a9a0287a6

Brent Baker

Updated

•

16 years ago

Blocks: 402864

Michael Daumling

Assignee

Comment 21

•

16 years ago

Attached patch Security fix: increase double-to-string buffer size to 380 — Details — Splinter Review

Per email thread from Steven and Lars, the buffer for double-to-string conversions should be at least 380 characters. Attachment 353630 [details] [diff] has been marked obsolete because it has been pushed to Tamarin-Redux.

Attachment #353630 - Attachment is obsolete: true

Attachment #353835 - Flags: review?(stejohns)

Steven Johnson

Updated

•

16 years ago

Attachment #353835 - Flags: review?(stejohns) → review+

Steven Johnson

Comment 22

•

16 years ago

pushed to redux as changeset: 1246:fdce7566c1d4

Michael Daumling

Assignee

Comment 23

•

16 years ago

I guess that this bug is resolved.

Status: ASSIGNED → RESOLVED

Closed: 16 years ago

Resolution: --- → FIXED

Chris Peyer

Comment 24

•

15 years ago

Resolved fixed engineering / work item that has been pushed. Setting status to verified.

Status: RESOLVED → VERIFIED

You need to log in before you can comment on or make changes to this bug.