Closed Bug 562480 Opened 15 years ago Closed 15 years ago

Verbose mode perturbs the system by producing excessive String garbage

Categories

(Tamarin Graveyard :: Virtual Machine, defect, P4)

defect

Tracking

(Not tracked)

RESOLVED FIXED
Q3 11 - Serrano

People

(Reporter: rreitmai, Assigned: rreitmai)

References

Details

Attachments

(2 files, 6 obsolete files)

Verbose mode (i.e. -Dverbose= ) generates lots of temporary String garbage during its operation. One can speculate that this results in not only large memory churn but also sub-optimal performance. While this is normally not an issue for tamarin command-line usage with short run abcs, on large, longer running programs the overhead is most noticeable.
Attached patch ver 1 - as a study (obsolete) — Splinter Review
Assignee: nobody → rreitmai
Flags: flashplayer-qrb+
Priority: -- → P4
Target Milestone: --- → flash10.2
I'm seeing 2x performance increase with full verbose output on. Will double check a few areas and then push this out for formal review.
Attachment #442566 - Attachment is obsolete: true
Attachment #443528 - Flags: review?(edwsmith)
Attachment #443528 - Flags: feedback?(mmoreart)
Attachment #443528 - Flags: feedback?(kpalacz)
The patch looks large since it hits many spots in the code, but really only PrintWriter and derivatives have been modified extensively. Notice also that AvmCore loses its format() methods, which have been folded into the former.
Comment on attachment 443528 [details] [diff] [review] v3 - cleaner version with bug fixes and passes regressions Two minor issues and one larger one. The minor issues: - Misaligned whitespace in PrintWriter.cpp, on the last line (the "}" line of the definitions of the PRINT_STAR_OPERATOR_SUPPORT and PRINT_AMP_OPERATOR_SUPPORT macros. The "}" should be indented four more spaces. - In Namespace.h, the declarations of print() and printNonPublicWithName() are inside this: //#ifdef AVMPLUS_VERBOSE //#endif ... but in Namespace.cpp, print() is not inside an ifdef, but printNonPublicWithName() is inside this: #ifdef AVMPLUS_VERBOSE #endif In general, it is not clear to me if you have properly ifdef's all your new print() functions. The major issue is, for the sake of convenience and legibility, you should really be leveraging C++'s operator overloading of the << operator in the same way the C++ standard library does. For example, with the way you wrote the code, this old code: core->console << " " << traitNames[kind] << " name=" << Multiname::format(core, ns, name) << " slot_id=" << slot_id << " type=" << ctraits << "\n"; got converted to this new code: core->console << " " << traitNames[kind] << " name="; ns->printNonPublicWithName(core->console, name); core->console << " slot_id=" << slot_id << " type=" << ctraits << "\n"; The old code is definitely easier to read and easier to write, in my opinion. And it is possible to make the new code similar (or even identical) to the old code while still avoiding the construction of temporary strings, by doing a few C++ tricks. Let's take the above case as an example. To improve it, add the following inside class Multiname (that is, this is a public inner class): // Use 'Format' like this: myPrintWriter << Format(ns, name) class Format { public: Format(Namespacep ns, const Stringp name) : ns(ns), name(name) {} Namespacep ns; Stringp name; }; And add this in Multiname.h, *outside* of class Multiname (just after it): PrintWriter& operator<<(PrintWriter& prw, const Multiname::Format& mnf); Add this to Multiname.cpp: PrintWriter& operator<<(PrintWriter& prw, const Multiname::Format& mnf) { if (mnf.ns->isPublic() || ( // backwards compatibility mnf.ns->getType() != Namespace::NS_Public)) { prw << mnf.name; } else { prw << mnf.ns->getURI() << "::" << mnf.name; } return prw; } Now, in AbcParser.cpp near line 541, you can now call this in a manner almost identical to the old code sample above: core->console << " " << traitNames[kind] << " name=" << Multiname::Format(ns, name) << " slot_id=" << slot_id << " type=" << ctraits << "\n"; What we've done here is, we created an inner class Multiname::Format; when we write Multiname::Format(ns, name), that looks a little like a call to a static function, but it's not -- we're constructing an instance of the Format class on the stack, then passing that to our new operator<<() function, and then destroying it. The reason for all this is that by declaring a new class (Multiname::Format), we can declare a new overload of operator<<() that knows how to print that class. No temporary strings got created. It's true that a temporary Multiname::Format got created, but that is very cheap -- it's a super-quick stack allocation.
Attachment #443528 - Flags: feedback?(mmoreart) → feedback-
Attachment #443528 - Flags: review?(edwsmith)
dropped the R? until feedback is addressed
Comment on attachment 443528 [details] [diff] [review] v3 - cleaner version with bug fixes and passes regressions BTW, PrintWriter.h uses the idiom described by Mike for hexAddr, tabstop and percent (but looks like you're eliminating the last one). Also, player code might be affected by the change.
Attachment #443528 - Attachment is obsolete: true
Attachment #445208 - Flags: review?(mmoreart)
Attachment #443528 - Flags: feedback?(kpalacz)
Attachment #445208 - Flags: review?(kpalacz)
(In reply to comment #6) > (From update of attachment 443528 [details] [diff] [review]) > Two minor issues and one larger one. The minor issues: > > - Misaligned whitespace should be fixed. > In general, it is not clear to me if you have properly ifdef's all your new > print() functions. > I tried to open up more of the PrintWriter and related code to non-verbose builds, but doing so only where we aren't adding significantly to file size. Its my hope that we'd be able to call these methods from release. > The major issue is, for the sake of convenience and legibility, you should > really be leveraging C++'s operator overloading of the << operator Good idea; latest patch incorporates these changes.
(In reply to comment #8) > (From update of attachment 443528 [details] [diff] [review]) > BTW, PrintWriter.h uses the idiom described by Mike for hexAddr, tabstop and > percent (but looks like you're eliminating the last one). > Also, player code might be affected by the change. True, i've gotten rid of any dead code (tabstop) and I checked the player a while back against these changes, but good point I will do so again.
Comment on attachment 445208 [details] [diff] [review] introduces asXXX() wrappers hexAddr -> asHexAddr for consistency?
Attachment #445208 - Flags: review?(kpalacz) → review+
Comment on attachment 445208 [details] [diff] [review] introduces asXXX() wrappers For the most part, this is great, I love it. A few minor things (I am okay with checking in without fixing these if you need to): * In AvmCore::toErrorString(const Traits* t), you changed this: s = concatStrings(s, concatStrings(toErrorString(ns), newConstantStringLatin1("."))); to this: sb << ns << "."; Shouldn't that be: sb << toErrorString(ns) << "."; ? It turns out those will end up outputting the same thing, but still. * In PrintWriter::writeAtom(), which is a modified version of the old AvmCore::format(), you have a couple of hard-coded strings: "undefined", "true", "false". E.g. you have: *this << "undefined"; why not do this instead? *this << m_core->kundefined; It's really not that big a deal, but the latter is a teeny tiny bit more efficient -- it means you don't have another copy of the string, and it means operator<<() already knows the length of the string, and doesn't need to calculate it. Same for ktrue, kfalse, and knull. This is quite trivial, though. * I'm not too keen on the buffer-copying that you have moved from shell/ConsoleOutputStream::write(buffer, count) to PrintWriter::writeN(utf8, count) -- especially since this will be used every time someone does "core->console << myString" where myString is of type Stringp. Wouldn't it be better to avoid buffer-copying when you can? For example, with the new code, if someone does this: myStringBuffer << myString; then a pointless game of ping-pong ensues, with unnecessary buffer-copying and string-length-calculation: 1. String::print() calls PrintWriter::writeN(utf8, length) 2. PrintWriter::writeN(utf8, length) copies the string into a temporary buffer, null-terminates it, and calls PrintWriter::write(utf8). 3. PrintWriter::write(utf8) calls StringOutputStream::write(utf8) 4. StringOutputStream::write(utf8) does a VMPI_strlen() to get the length of the string, and then calls StringOutputStream::writeN(utf8, length) 5. Finally, StringOutputStream::writeN(utf8, length) appends the characters to its own buffer, and we're done. So in step 2, we had an unnecessary buffer-copy; and in step 4, we had an unnecessary VMPI_strlen(), because we had forgotten the string length even though earlier steps knew the string length. I think a better approach would be: - If anyone calls write(utf8), that function immediately calculates the string length via VMPI_strlen(), and then calls writeN(utf8, length) - All implementations of writeN(utf8, length) do their work without unnecessary buffer-copies. Here is how I would do that: - Add new pure-virtual writeN() to OutputStream - Change OutputStream::write() so that it is no longer pure virtual, and it is implemented like this: void write(const char* utf8) { writeN(utf8, VMPI_strlen(utf8)); } - Change the various implementations of OutputStream so that they no longer implement write(), and instead they implement writeN(). For example, instead of ConsoleOutputStream::write(const char* utf8) { Platform::GetInstance()->logMessage(utf8); } we would have ConsoleOutputStream::writeN(const char* utf8, size_t count) { Platform::GetInstance()->logMessageN(utf8, count); } This would require a new Platform::logMessageN(). On Mac, in addition to this: int PosixPartialPlatform::logMessage(const char* message) { return fprintf(stdout, "%s", message); } We would also have this: int PosixPartialPlatform::logMessageN(const char* message, size_t length) { return fprintf(stdout, "%.*s", length, message); }
Attachment #445208 - Flags: review?(mmoreart) → review+
Attachment #445208 - Flags: feedback?(edwsmith)
(In reply to comment #13) > (From update of attachment 445208 [details] [diff] [review]) > > sb << ns << "."; > > Shouldn't that be: > > sb << toErrorString(ns) << "."; > > ? It turns out those will end up outputting the same thing, but still. > Yes the output should be the same, but the latter creates an intermediate String. Much of the motivation for this patch was to remove this garbage. But good observation, I'll carefully double-check the toErrorString()'s before checking in. > * In PrintWriter::writeAtom(), which is a modified version of the old > AvmCore::format(), you have a couple of hard-coded strings: "undefined", > "true", "false". E.g. you have: > > *this << "undefined"; > > why not do this instead? > > *this << m_core->kundefined; > In an earlier version of the patch I completely removed m_core from the PrintWriter, thus necessitating the strings. I still have that as a goal but the last hold-out is MathUtils::convertToDoubleToString. Hopefully a follow-up patch will remove this final dependency at m_core can be removed. > * I'm not too keen on the buffer-copying that you have moved from > shell/ConsoleOutputStream::write(buffer, count) to PrintWriter::writeN(utf8, > count) The motivation for that work was to prevent extraneous strlen()'s being performed all over the place with the results being tossed out the window, since the ultimate consumer, VMPI_logMessage takes 1 parameters, a const char*). > > myStringBuffer << myString; > > then a pointless game of ping-pong ensues, with unnecessary buffer-copying and > string-length-calculation: > Excellent suggestion and I agree this is what the old code used to do and I think we can do better. To me the real motivator is having VMPI_logMessageN() and I didn't want to introduce that complexity into this initial clean-up patch. I now see at least 2 - 3 follow-up patches; (1) introduce logMessageN() and cleanup the buffer copies (2) remove MathUtils::convertDoubleToString() dependency on m_core
Attachment #445208 - Flags: feedback?(edwsmith) → feedback+
Comment on attachment 445208 [details] [diff] [review] introduces asXXX() wrappers I don't have anything further to add beyond Mike and Krzysztof's good comments.
Attached patch remove assert (obsolete) — Splinter Review
Remove an assert that really doesn't make sense, since StringOutputStream derives form a GCObject. The only part that is important is that the dtor() get called since the memory is managed outside of the gc. Outside of that, we could have some false positives via the 'buffer' ptr, but I doubt that this would affect performance significantly and certainly doesn't compromise integrity.
Attachment #449357 - Flags: review?(edwsmith)
Attachment #449357 - Flags: review?(edwsmith) → review-
Comment on attachment 449357 [details] [diff] [review] remove assert The assert protects a real hazard. writes to m_buffer in StringOutputStream::writeN() are not through a write barrier. Maybe the bug is that OutputStream is declared as a GCObject. since it appears to be a pure virtual interface, maybe it should not extend GCObject. If a subclass needs to be gc allocated it can extend GCObject instead? (although that's rumored to be not necessary).
Steven's patch that allows StringOutputStream to reside on stack/root or not.
Attachment #449357 - Attachment is obsolete: true
Attachment #449989 - Flags: review?(edwsmith)
Comment on attachment 449989 [details] [diff] [review] Allow StringOutputStream to live as a gc object PrintWriter.h: 77 void* container = m_gc->FindBeginningFast(this); // may be null if we are on stack/root The comment on FindBeginnFast reads: // Used on the critical path of the write barrier path. gcItem must point into // managed memory, and NULL is never returned. Don't use this for GCAssert and // similar purposes, but use it when speed is key. void *FindBeginningFast(const void *gcItem); Based on that, this code seems wrong.
Attachment #449989 - Flags: review?(edwsmith) → review-
(In reply to comment #19) > (From update of attachment 449989 [details] [diff] [review]) > // Used on the critical path of the write barrier path. gcItem must > point into > // managed memory, and NULL is never returned. Don't use this for > GCAssert and IIRC, List<> uses IsPointerToGCPage() to decide whether it can/should call FindBeginningFast.
Attached patch v2 - fixup (obsolete) — Splinter Review
Bracket FindBeginningFast() call with a IsPointerToGCPage() ala avmplusList
Attachment #449989 - Attachment is obsolete: true
Attachment #451144 - Flags: review?(edwsmith)
Bracket FindBeginningFast() call with a IsPointerToGCPage() ala avmplusList
Attachment #451144 - Attachment is obsolete: true
Attachment #451147 - Flags: review?(edwsmith)
Attachment #451144 - Flags: review?(edwsmith)
Comment on attachment 451147 [details] [diff] [review] v2 - fixup .. previous patch had extra junk this comment above the StringBuffer constructor needs a better explanation: // this is only for use by subclasses of AvmCore is that really the case and if so, why? R+ with that clarified.
Attachment #451147 - Flags: review?(edwsmith) → review+
The comment is no longer relevant and was removed. pushed http://hg.mozilla.org/tamarin-redux/rev/e346e71f0b0b
status?
Blocks: 587884
Blocks: 587886
closing as opened dep. bugs for remaining issues.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Flags: flashplayer-bug+
(In reply to Rick Reitmaier from comment #24) > The comment is no longer relevant and was removed. > > pushed http://hg.mozilla.org/tamarin-redux/rev/e346e71f0b0b Note that the above was a follow-up to the larger patch, which was pushed to TR in changeset 4712:a74f4089e339 http://hg.mozilla.org/tamarin-redux/rev/a74f4089e339
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: