Closed
Bug 466272
Opened 16 years ago
Closed 16 years ago
Modify MathUtils for better integration into new String class
Categories
(Tamarin Graveyard :: Virtual Machine, defect)
Tamarin Graveyard
Virtual Machine
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: daumling, Assigned: daumling)
References
Details
Attachments
(1 file, 4 obsolete files)
1.23 KB,
patch
|
stejohns
:
review+
|
Details | Diff | Splinter Review |
At least, Tamarin-Redux' MathUtils should be modified to work with char* instead of wchar*, since Strings now can be 8 bits wide. The parsing methods should work with String instances, because string widhts can vary.
Will be tackled once the new Strings have landed.
Assignee | ||
Comment 1•16 years ago
|
||
Changes:
- convertIntegerToString() returns a char* pointing to the start of the string inside the buffer. This eliminates one memcpy(). The buffer is no longer NUL-terminated.
- a new convertIntegerToString() returns a String instance.
- convertDoubleToString() and convertDoubleToStringRadix() return a String instance.
- convertDoubleToString() has been optimized where the sentinel is being tested/used. No more memcpy() necessary, as the minus sign is applied after the number has been generated.
- convertStringToNumber()/converStringToDouble() now accept a Stringp and use StringIndexer.
The sunspider string tests show up to 25% less memory usage, with some slight performance degradation in some tests, probably to the change to StringIndexer, where each access to a character requires a function call. On Windows, the executable size has shrunk by 39K, which seems a bit too big to me, but maybe some C runtime calls have been optimized away.
Attachment #352918 -
Flags: review?(stejohns)
Comment 2•16 years ago
|
||
Mike, the diff is apparently binary, can you re-upload in text form?
Comment 3•16 years ago
|
||
diff is text form, if you download it is just ascii, must have not marked it as a patch during upload.
Assignee | ||
Comment 4•16 years ago
|
||
Sorry - apparently forgot to check the patch box.
Attachment #352918 -
Attachment is obsolete: true
Attachment #353075 -
Flags: review?(stejohns)
Attachment #352918 -
Flags: review?(stejohns)
Comment 5•16 years ago
|
||
FWIW, you can edit patchness/MIME type from the "Edit Attachment" page, no need to repost.
Comment 6•16 years ago
|
||
Looks good, a few questions before review+ ...
-- the comment for convertIntegerToString() says "The buffer is not NUL-terminated" but several callers rely on the return pointer being null-terminated.
-- callers all declare fixed-sized buffers to hold the result; this is probably safe if the routine is checking length properly, but are we certain that the fixed-size is large enough in all cases? Perhaps declaring constants for the maximum expected sizes for (eg) uint32 would be useful. (I recall bugs in the past where unusual cases were larger than expected...)
Comment 7•16 years ago
|
||
(In reply to comment #6)
> Looks good, a few questions before review+ ...
>
> -- callers all declare fixed-sized buffers to hold the result; this is probably
> safe if the routine is checking length properly, but are we certain that the
> fixed-size is large enough in all cases? Perhaps declaring constants for the
> maximum expected sizes for (eg) uint32 would be useful. (I recall bugs in the
> past where unusual cases were larger than expected...)
I assume the buffer sizes are the maximums given the floating point representation and the algorithm, so that should be OK. For ints it can vary depending on platform (63 or 32); stack space is sometimes at a premium so allocating a 130 byte buffer for the worst case (base 2 conversion on 64-bit platform) when 22 would do (base 10 conversion on 32-bit platform) is a little generous, but probably OK. But the code would probably *read* better if there were predefined constants, MAX_DIGITS_FOR_DOUBLE, MAX_DIGITS_FOR_UINTPTR, etc.
Assignee | ||
Comment 8•16 years ago
|
||
I took the maximums I found in old code. See the diffs. The buffers should be bug enough. It may also be a good idea to add a special conversion routine for numbers that are 10-based, because the maximum for an integer would be 12 bytes (even on a 64 bit system, integers are still 32 bits).
Before, the buffers were just allocated Big Enough (in wchars), and the code did not care to check the buffer size. Now, if the supplied buffer is too small, the conversion aborts. This is a much safer approach.
Old code should *not* use character buffers, but switch to the APIs using and returning Strings. I only left the single routine that uses a buffer in for the Date object to avoid having to create a String instance as intermediate representation. Old code needs to be adapted anyway. But if adding a NUL at the end is important, I will be happy to re-add that character.
Comment 9•16 years ago
|
||
Checking the buffer size is always a good idea :-) I do like Lars idea to add predefined constants to improve readability.
What about the null-termination issue? Am I mistaken?
Assignee | ||
Comment 10•16 years ago
|
||
This patch addresses comments from Lars and Steven:
1) Added three very wordy enum constants to MathUtils.h for the minimum buffer sizes required
2) Removed all temporary buffers from other code (except inside Date.cpp, where it makes sense).
3) convertIntegerToString() now NUL-terminates the buffer again (actually, it always did - the comment was just wrong)
4) Added an overload for convertIntegerToString() for 32-integers based 10. This overload uses a buffer of just 12 bytes, thereby relieving the CPU stack somewhat.
Attachment #353075 -
Attachment is obsolete: true
Attachment #353416 -
Flags: review?(stejohns)
Attachment #353075 -
Flags: review?(stejohns)
Assignee | ||
Comment 11•16 years ago
|
||
Comment on attachment 353416 [details] [diff] [review]
Updated patch following comments.
Lars, maybe you want to review this bug?
Attachment #353416 -
Flags: review?(lhansen)
Comment 12•16 years ago
|
||
Comment on attachment 353416 [details] [diff] [review]
Updated patch following comments.
Let me know if you need assistance pushing it out to redux.
Attachment #353416 -
Flags: review?(stejohns) → review+
Comment 13•16 years ago
|
||
Comment on attachment 353416 [details] [diff] [review]
Updated patch following comments.
Generally the patch would be more manageable if submitted as two patches: one for changes to type names, the other for changes to logic. As it is, each obscures the other.
Must fix before integration:
Method names should never start with "_", as names starting with "_" are reserved for the implementation to use in the predefined libraries. This applies to e.g. _isHexNumber, _parseIntDigit (many others).
Type names:
* The use of (int) in Stringp AvmCore::formatAtomPtr is dodgy, we're trying not to use this type.
* Types like "uint32", "sintptr" should not be used; we should use C99 std types a la "uint32_t", "intptr_t".
I bring this up only because you've corrected many other uses of 'int'. Don't know what the cost is of correcting the rest, but would encourage looking into it.
These would probably be good to fix if the complexity costs are controllable:
MathUtils::convertDoubleToStringRadix allocates a 2KB buffer on the stack. It would be better if this buffer were allocated with VMPI_alloca because VMPI_alloca may shunt the allocation off to the heap if it deems it to be too large or if the platform has a short stack. (That probably can't be done if MathUtils are available directly from client code and converDoubleToStringRadix can throw an exception, though, given how VMPI_alloca performs cleanup on exceptions.) Though see below.
Incidentals I discovered, probably not your fault but due to existing APIs:
It's bad style that isInfinite does not return bool (given the name).
There are two versions of isInfinite, isNaN, and isNegZero, one for Unix and one for others. I have no idea why that makes sense.
It's likely clearer to write "(bool)t" or "t != 0" than "!!t".
Other comments:
MathUtils::convertDoubleToString now seems strictly less general than before since it relies on an integration with the Strings, therefore on the VM having been started. Ditto for some of the others. I'm not sure whether this is a problem or not. (Moving to VMPI_alloca as suggested above makes this integration even tighter.)
Attachment #353416 -
Flags: review?(lhansen) → review-
Comment 14•16 years ago
|
||
(In reply to comment #13)
> Method names should never start with "_", as names starting with "_" are
> reserved for the implementation to use in the predefined libraries. This
> applies to e.g. _isHexNumber, _parseIntDigit (many others).
This is flagrantly violated in many many places already... granted these names are technically reserved, but I can't think of any nontrivial C/C++ project I've ever worked on that hasn't violated this.
Comment 15•16 years ago
|
||
(In reply to comment #14)
> (In reply to comment #13)
> > Method names should never start with "_", as names starting with "_" are
> > reserved for the implementation to use in the predefined libraries. This
> > applies to e.g. _isHexNumber, _parseIntDigit (many others).
>
> This is flagrantly violated in many many places already... granted these names
> are technically reserved, but I can't think of any nontrivial C/C++ project
> I've ever worked on that hasn't violated this.
I've had the same experience, but it's never too late to start mending your ways. Remember there are "C++" compilers out there that make many strange assumptions, and there's no sense in creating vulnerabilities if we don't have to. The "no useful C++ compiler would outlaw this" mantra has a tendency to be used to justify all sorts of things, so the more rarely we use it the better.
(Besides which, I hate how underscore-prefixed names look, but in this case it's easier to argue with chapter & verse than with esthetic judgements...)
Assignee | ||
Comment 16•16 years ago
|
||
- Fixed all cosmetic nits.
- Added comment why isInfinite() does not return a bool.
- Replaced all heap allocations > 12 bytes with calls to VMPI_alloca.
- Left bit53 = !!bit53 in because the variable is an int32_t, not a bool.
I hope that downstream usage of MathUtils does not need character buffers; after all, the AVM should not be a repository for general-purpose functionality, but hand-tailored for performance and security. But this is just my opinion.
BTW - we should start a bug (or a Wiki page) that collects coding style guidelines. It would save a lot of time.
Attachment #353416 -
Attachment is obsolete: true
Attachment #353630 -
Flags: review?(lhansen)
Comment 17•16 years ago
|
||
(In reply to comment #16)
> - Left bit53 = !!bit53 in because the variable is an int32_t, not a bool.
Since !!bit53 computes a bool value that is then converted to int32_t I don't see why that's relevant, but it's alright with me.
> I hope that downstream usage of MathUtils does not need character buffers;
> after all, the AVM should not be a repository for general-purpose
> functionality, but hand-tailored for performance and security. But this is just
> my opinion.
Since Tamarin/AVM is open-source and its primary client is not, it's probably inevitable for these functions to become tightly intertwined with Tamarin. But the fact remains that high quality, portable routines like these will be used by client code, and we should not resist that too much; instead we might want to partition the code into a layer that has minimal Tamarin bindings and which can be used from the outside, and a shim that makes use from within Tamarin convenient.
> BTW - we should start a bug (or a Wiki page) that collects coding style
> guidelines. It would save a lot of time.
We have an internal page at Adobe, I'll contact you independently about that. IMO that should migrate into a file in the doc/ directory of Tamarin, but it's not been on the critical path.
Updated•16 years ago
|
Attachment #353630 -
Flags: review?(lhansen) → review+
Comment 18•16 years ago
|
||
GCC complains about convertIntegerToString() being ambiguous (between the radix-supplied and radix-10 versions)... I'm going to make the treatAsUnsigned argument non-optional to resolve this.
Comment 19•16 years ago
|
||
(In reply to comment #18)
> GCC complains about convertIntegerToString() being ambiguous (between the
> radix-supplied and radix-10 versions)... I'm going to make the treatAsUnsigned
> argument non-optional to resolve this.
Actually, there are still some possible ambiguities -- I'm going to rename the 3 overloads to convertIntegerToStringBuffer, convertIntegerToStringRadix, and convertIntegerToStringBase10 for this push.
Comment 20•16 years ago
|
||
pushed to redux (with minor revisions) as changeset: 1241:9f9a9a0287a6
Assignee | ||
Comment 21•16 years ago
|
||
Per email thread from Steven and Lars, the buffer for double-to-string conversions should be at least 380 characters. Attachment 353630 [details] [diff] has been marked obsolete because it has been pushed to Tamarin-Redux.
Attachment #353630 -
Attachment is obsolete: true
Attachment #353835 -
Flags: review?(stejohns)
Updated•16 years ago
|
Attachment #353835 -
Flags: review?(stejohns) → review+
Comment 22•16 years ago
|
||
pushed to redux as changeset: 1246:fdce7566c1d4
Assignee | ||
Comment 23•16 years ago
|
||
I guess that this bug is resolved.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Comment 24•15 years ago
|
||
Resolved fixed engineering / work item that has been pushed. Setting status to verified.
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•