Closed Bug 467591 Opened 16 years ago Closed 16 years ago

AVM shell should be more portable

Categories

(Tamarin Graveyard :: Virtual Machine, defect)

defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: lhansen, Assigned: rishah)

References

Details

Attachments

(1 file, 5 obsolete files)

The VM core is more portable than the shell, which depends on stdio and many other things; we should clean this up by factoring out platform specifics into separate files (probably) so that ports can more easily use the shell.
Blocks: 472712
OS: Mac OS X → All
Hardware: x86 → All
Assignee: nobody → rishah
Status: NEW → ASSIGNED
+1. In TT we wrote most of the shell code in AS3 to accomplish this. Might be worth porting over.
The AS3 version of shell in TT provides a nice abstraction over the native implementation of Shell class. - (1) But more work is required to create APIs for File read/write, program entry/exit points and such like. - (2) I feel it would be better to phase out the changes - create proper APIs for (2) - convert native shell to its AS3 Does that sound reasonable?
I agree, porting over the AS3 is worthwhile but not the first order of business.
agree, and as long as we can hold off building a lot of api wrappers for a while, i still hold a glimmer of hope that we can build a tool for this instead (swig?)
I don't know what swig would do for us here, the problem is that all the systems provide wildly incompatible APIs eg for something as simple as console output. (Anyway, I think the main issue at this point is just to make what we have portable, so I don't see a lot of APIs showing up in any case.)
Swig would just enable generating AS3 wrappers for C api's instead of us doing each one by hand. for a few, no big deal. if you look at the ways we use python api's that interface with posix (etc), it gets more daunting.
Attached patch VM shell portability changes (obsolete) — Splinter Review
API changes for logging functions will be uploaded as a separate patch to this bug.
Attachment #363032 - Flags: review?(lhansen)
Comment on attachment 363032 [details] [diff] [review] VM shell portability changes manifest.mk isn't right; on Windows-ARM avmshellWin.cpp is added to the command line twice. In SystemClass.h, the representation of user_argv must be documented. I thought we finally settled on UTF8. But SystemClass::getArgv treats user_argv as Latin-1. Representation of string arguments to File operations must be documented (whether UTF8 or Latin-1) int32_t file lengths and positions (limited to 2GB) are not adequate in this day and age. Since we've already determined that we require 64-bit integers I would prefer if you changed this. A comment in the avmshellMac.cpp file says "mac/unix/soloris" [sic] but it's really mac-only. Inconsistent ifdef-indentation throughout, pick one that works for you (I prefer left-aligned; edwin prefers indented. Sigh.) avmshell.h, line 97, forward decl of Platform should not be necessary here. avmshell.cpp 324, strncpy does not NUL-terminate the destination if the source fills or is longer than the destination. The old code is broken too but this is a good time to fix it. What happened to AVM_SHELL_PLATFORM_HOOKS? Was this not really supported before or did you remove the functionality? In my opinion the "w" and "wb" convention of stdio is superior to the flags you've chosen to use. In general you might do better to simply reuse the interface from stdio; it conserves developer brain print. In general the operations of Platform and File need to be documented properly. I see that you're careful about destroying a file object in the implementation code, but it's not clear what 'destroyFile' is supposed to do. (Could be mistaken as a 'remove the underlying file' operation.) I'm not wild about GenericFile and GenericPlatform being called that, since they are not actually valid implementations of either File or Platform (they remain abstract). Something like 'LibcPartialPlatform' is more like it. But it's not really a big deal. I think the overall design is fine and apart from the notes above the code looks clean. Please either argue or fix.
Attachment #363032 - Flags: review?(lhansen) → review-
(In reply to comment #8) > int32_t file lengths and positions (limited to 2GB) are not adequate in this > day and age. Since we've already determined that we require 64-bit integers I > would prefer if you changed this. size_t? > Inconsistent ifdef-indentation throughout, pick one that works for you (I > prefer left-aligned; edwin prefers indented. Sigh.) I don't actually, I will go with the flow, and even document a recommendation. > I'm not wild about GenericFile and GenericPlatform being called that, since > they are not actually valid implementations of either File or Platform (they > remain abstract). Something like 'LibcPartialPlatform' is more like it. But > it's not really a big deal. posix?
size_t is normally 32-bit on 32-bit systems. In ISO C fseek and ftell operate on long int but there's a separate fsetpos and fgetpos that take an opaque fpos_t. IMO we should just use int64_t. 'PosixPartialPlatform' might not be so bad. Beats 'libc'.
(In reply to comment #10) > size_t is normally 32-bit on 32-bit systems. In ISO C fseek and ftell operate > on long int but there's a separate fsetpos and fgetpos that take an opaque > fpos_t. IMO we should just use int64_t. better yet, uint64_t. unless you know of a filesystem that supports negative filesizes and offsets.
(In reply to comment #8) > (From update of attachment 363032 [details] [diff] [review]) > manifest.mk isn't right; on Windows-ARM avmshellWin.cpp is added to the command > line twice. > > In SystemClass.h, the representation of user_argv must be documented. > > I thought we finally settled on UTF8. But SystemClass::getArgv treats > user_argv as Latin-1. I will fix this. > Representation of string arguments to File operations must be documented > (whether UTF8 or Latin-1) Will do. > int32_t file lengths and positions (limited to 2GB) are not adequate in this > day and age. Since we've already determined that we require 64-bit integers I > would prefer if you changed this. int64_t sounds good. > A comment in the avmshellMac.cpp file says "mac/unix/soloris" [sic] but it's > really mac-only. Missed that during moving code around. > Inconsistent ifdef-indentation throughout, pick one that works for you (I > prefer left-aligned; edwin prefers indented. Sigh.) I really don't have a preference but I find indented ifdefs useful since they create code blocks that are easier to identify across feature defines. They make the code look a bit messy though but shouldn't there be fewer ifdefs anyways :-). > avmshell.h, line 97, forward decl of Platform should not be necessary here. Will remove it. > avmshell.cpp 324, strncpy does not NUL-terminate the destination if the source > fills or is longer than the destination. The old code is broken too but this > is a good time to fix it. Will fix it. > What happened to AVM_SHELL_PLATFORM_HOOKS? Was this not really supported > before or did you remove the functionality? AVM_SHELL_PLATFORM_HOOKS was added by Rory. With the newer bootstrap/teardown design for the platform I don't think that is needed. > In my opinion the "w" and "wb" convention of stdio is superior to the flags > you've chosen to use. In general you might do better to simply reuse the > interface from stdio; it conserves developer brain print. Not all platforms support the stdio file operations. My intent was to abstract the file attributes by means of an enumeration rather than platforms having to interpret the stdio-sytle attributes. > In general the operations of Platform and File need to be documented properly. > I see that you're careful about destroying a file object in the implementation > code, but it's not clear what 'destroyFile' is supposed to do. (Could be > mistaken as a 'remove the underlying file' operation.) I will surely add comments to all interface methods. However, isn't destroyFile go hand-in-hand with createFile. > I'm not wild about GenericFile and GenericPlatform being called that, since > they are not actually valid implementations of either File or Platform (they > remain abstract). Something like 'LibcPartialPlatform' is more like it. But > it's not really a big deal. PosixPartialPlatform sounds good. > I think the overall design is fine and apart from the notes above the code > looks clean. Please either argue or fix.
(In reply to comment #11) > (In reply to comment #10) > > size_t is normally 32-bit on 32-bit systems. In ISO C fseek and ftell operate > > on long int but there's a separate fsetpos and fgetpos that take an opaque > > fpos_t. IMO we should just use int64_t. > > better yet, uint64_t. unless you know of a filesystem that supports negative > filesizes and offsets. fseek and ftell operations can return -1 in case of an error. Having the return type to be uint64_t creates an wraparound issue in such cases. So, I feel int64_t is better and consistent with 32-bit APIs that have int as the data type.
Changing the type to int64_t is resulting in warnings for other functions in the VM core and shell. Most of them are expecting size_t. So, looks like size_t would be a better choice. Any objections?
(In reply to comment #14) > Changing the type to int64_t is resulting in warnings for other functions in > the VM core and shell. Most of them are expecting size_t. So, looks like > size_t would be a better choice. Any objections? I object. If int64_t is the right API then let's fix the consumers of these values, either temporarily by casting to size_t or permanently by making them all handle in64_t.
size_t instead of int64_t is closer to POSIX APIs and is much cleaner for portability in 32 vs 64-bit systems. Consider the modified APIs below - //similar to fread size_t read(void* buffer, size_t bytesToRead); //similar to fwrite size_t write(const void* buffer, size_t bytesToWrite); //based on ftell. However, ftell returns long and that makes it not suitable //for 64-bit systems. So I modified our API to have an out param and return //false in case of an error. bool getPosition(size_t* pos) const; //based on fseek. However, fseek's offset argument is of long type and //unsuitable for 64-bit. Additionally the offset could be -ve if SEEK_CUR is //specified which is not the case in our APIs that already consider offset/pos //from the start. So my modified version takes size_t which is a non-negative //number. void setPosition(size_t pos) = 0; //similar //no corresponding POSIX API and size can never be negative size_t size() const;
To elaborate on cleaner code for portability - size_t provides a standard abstraction in terms of data types depending on 33 or 64 bit systems. int64_t does not do that and in several cases would require explicit casting to int or size_t types on 32-bit machines and moreover, this casting might need to be done inside an additional ifdef to identify the system type.
(In reply to comment #17) > To elaborate on cleaner code for portability - > > size_t provides a standard abstraction in terms of data types depending on 33 > or 64 bit systems. int64_t does not do that and in several cases would > require explicit casting to int or size_t types on 32-bit machines and > moreover, this casting might need to be done inside an additional ifdef > to identify the system type. That's true, but I don't see how it is relevant. That file APIs on different systems take different size file offsets goes with the territory of this being a porting API. That means the portable code gets one good workable abstraction that the porting layer must do its best to implement. That does not mean > 2GB files need to be available everywhere, but when they are available they should be supported IMO. size_t is an unsigned type intended to represent the size of an object. Being unsigned it falls to your comment #13 above; being limited to representing the type in memory it cannot be relied on to accomodate the size of a file. In particular, it is 32 bits on 32-bit systems, and files on 32-bit systems now commonly exceed both 2GB and 4GB. ftell and fseek are terrible examples because they can't handle files larger than 2GB (on many systems, given that long is often 32 bits even on 64 bit systems), which is why we have fgetpos and fsetpos to begin with. The fact that casting may have to happen (or better, that code needs to be rewritten so as not to assume that files are relatively tiny) seems to be beside the point, because I though we were trying to construct a reasonably general and workable API, not just cast in stone what bugginess is already there. If the hardship of that rewrite is huge then that's a concern, but I assume that that is not likely to be true (and I await your proof to the contrary :-)
Blocks: 479431
I didn't see any discussion of OSDep but I think VMPI should subsume that as well.
I understand your point. Will make it int64_t and change the consumers.
(In reply to comment #19) > I didn't see any discussion of OSDep but I think VMPI should subsume that as > well. OSDep will go away as a part of creation of the Porting Layer which is a separate task. I will log a separate bug to track that.
Blocks: 478870
Blocks: 456054
Attached patch Shell portability changes [v2] (obsolete) — Splinter Review
Patch version 2 after making changes suggested by prior review.
Attachment #363032 - Attachment is obsolete: true
Attachment #363900 - Flags: review?(lhansen)
Note for Patch-V2. fgetpos and fsetpos don't work for our purposes since the fpos_t data type is opaque and can't be interpreted outside of these functions. I've used fseek and ftell for Posix platforms (Mac/Unix/Sun) because the POSIX standard requires that both these functions support long int data type for position. For Windows I have used _fseeki64 and _ftelli64 that support 64-bit positioning. For Windows Mobile, I couldn't find APIs to support access to larger files. My guess is that having files > 4 GB on WinMo devices is not going to be a reality in the near future.
not to argue, but just to play devil's advocate, i could imagine an encoded movie being >4GB, on a mobile device that had >8GB of media storage space. at least the devices can do it are easy to imagine, no idea about api's.
No code changes from v2.
Attachment #363900 - Attachment is obsolete: true
Attachment #363900 - Flags: review?(lhansen)
Attachment #363936 - Flags: review?(lhansen)
Was not aware that shell.xcodeproj have been deprecated and avmshell is the most current one.
Attachment #363936 - Attachment is obsolete: true
Attachment #363982 - Flags: review?(lhansen)
Attachment #363936 - Flags: review?(lhansen)
(In reply to comment #24) > not to argue, but just to play devil's advocate, i could imagine an encoded > movie being >4GB, on a mobile device that had >8GB of media storage space. at > least the devices can do it are easy to imagine, no idea about api's. Right. In that case, repeated fseek() will do the trick, portably (SEEK_CUR with positive offset to skip forward, SEEK_SET with 0 to go to the beginning). Don't know if we need to do that this week or if we can just add an assert that triggers if the offset exceeds the range of a long int.
Attachment #363982 - Flags: review?(lhansen) → review-
Comment on attachment 363982 [details] [diff] [review] Patch [v2.2] - Updated avmshell.xcodeproj Converging on something but too many outstanding issues still. They're all minor, I think we can nail this with one more iteration. (Notice I removed the old Xcode projects, so you can remove those changes from your patch.) General: Though I dislike the noisiness of doxygen style comments we do use them elsewhere, and it may be better to adopt them in the APIs. (Only a suggestion.) They tend to structure the description, it's harder to leave things off. Platform.h: "one-short" presumably better rendered as "one-shot" (in a comment) Is 1 second resolution really good enough for the timer? (Might be, just curious.) What is the encoding (latin1 or utf8) returned from readFromConsole? Needs documentation. Why is the data to be sent to the console void* plus length, not NUL-terminated string or at least char* plus length? Is there an encoding? The comment says "log message" which suggests text, not arbitrary binary data. initializeLogging is not obviously connected with anything else going on, so far as I can tell. The two other functions talk about a "console", that seems to be something else. (Looking at the code in the various implementations I understand what it does, but this should be made explicit.) File.h: There is no way for read or write to signal an error, nor anything indicating that errors will be signalled. With stdio the error signalling is performed by a combination of return value snooping and looking at errno, but we have no errno here. (fwrite says that an error occured if the items written was less than the items requested; that would work here too. fread requires errno peeking, so if we do want to handle errors we have to worry about whether to introduce errno - ick! - or find some other avenue.) Ditto setPosition(), really. Meaning of the return value of open() is not documented. WinFile.cpp: Stylistically, if you mean a "null pointer" it's better to use the value "NULL" than "0". PosixPartialPlatform.cpp: initializeLogging appends .log to the file name. Imo it has no business doing this; platform.h says the given filename is the filename that will be used. (Also the code that appends the . does not take into account path separators, and it needs to. You can't truncate a/b.c/d at the rightmost dot.) I realize the old code did append the .log and did have the same logic for doing so, but in the spirit of improving things I suggest we fix these things. Why does initializeLogging print the filename on stdout? For future reference, #ifdef NO_CONSOLE_FWRITE is bad style, this needs to be lifted to the feature level probably, not tucked away as an undocumented ifdef. (IMO, though, the posix code should not worry about this, it should let a broken platform create a working implementation and just concentrate on conforming to Posix.) avmshellWin.cpp: The convention of prefixing private names by "_" is a bad habit (shared by others in our group too :-), the prefix is reserved by the library. For a name like "_main" I'd be really concerned about that. avmshellUnix.cpp: Be sure to change the stack size back to 512KB if it was really you who changed it to 4MB (it could have been me). FileClass.cpp: FileClass::exists is the kind of thing that in my opinion should go into the porting layer, the implementation that's here is not very good, it may have side effects on some strange files, etc. If we don't fix it for this patch then we should log a bug to fix it. FileClass::read: if you mean 2GB, write 2GB. INT_MAX is not guaranteed to be a 32-bit value. Ditto for uses of UINT_MAX throughout the code, btw, for checking that a read request is within range. SIZE_T_MAX might be better; or just a constant.
Attachment #364420 - Flags: review?(lhansen)
Attachment #364420 - Flags: review?(stejohns)
Attachment #363982 - Attachment is obsolete: true
Comment on attachment 364420 [details] [diff] [review] patch [v3] - based on feedback from previous review Looks close enough that we can probably move forward and do further cleanup as we go... with a few nits: I'm a little puzzled over the lifespans of the File object; I presume createFile and destroyFile refer to the File object, not the underlying File if any... but is there any reason these objects can't be GC'ed rather than explicitly managed? (I see at least one missing destroy, ByteArrayGlue.cpp:739 can throw an error without destroying the File that's been created...) Are all the semantics of File well-defined (and enforced by core code)? e.g. what if you open without closing (or vice versa), or read without opening, etc... are we relying on the platform implementations to do something reasonable (and uniform) in this case? Why do we use size_t for read and write sizes, but int64_t for setPosition and size? It would be good to add a comment about what the "partial" is for PosixPartialPlatform (ie what is provided and what is left out). Win32 API calls need to have an explicit "A" or "W" suffix (eg "LoadLibraryA") so that we can compile regardless of whether UNICODE is defined or not. #ifdef NO_CONSOLE_FWRITE still seems to be hanging around; IMHO this shouldn't be in somethin claiming to be Posix.
Attachment #364420 - Flags: review?(stejohns) → review+
(In reply to comment #30) > (From update of attachment 364420 [details] [diff] [review]) > Looks close enough that we can probably move forward and do further cleanup as > we go... with a few nits: > > I'm a little puzzled over the lifespans of the File object; I presume > createFile and destroyFile refer to the File object, not the underlying File if > any... but is there any reason these objects can't be GC'ed rather than > explicitly managed? (I see at least one missing destroy, ByteArrayGlue.cpp:739 > can throw an error without destroying the File that's been created...) I'll look into making File collectible, will log a bug. Regarding ByteArrayGlue, good catch, i will fix it in a final patch. > Are all the semantics of File well-defined (and enforced by core code)? e.g. > what if you open without closing (or vice versa), or read without opening, > etc... are we relying on the platform implementations to do something > reasonable (and uniform) in this case? Yes, the platform implementation needs to take care of that. Performing error checks in the core/shell code adds a bit of overhead and clunkiness with no significant gains. > Why do we use size_t for read and write sizes, but int64_t for setPosition and > size? int64_t for set/getPosition() is added to support files larger than 4GB. Why int64_t vs uint64_t? The signed type is to return an error. Ditto for size(). For read/write though, the amount of data that can be read it limited by the amount of max memory that can be allocated, which is driven by size_t. Also, that makes the APIs closer to fread/fwrite which use size_t for the read/write count. > It would be good to add a comment about what the "partial" is for > PosixPartialPlatform (ie what is provided and what is left out). Will add that. > Win32 API calls need to have an explicit "A" or "W" suffix (eg "LoadLibraryA") > so that we can compile regardless of whether UNICODE is defined or not. Could you please specify which APIs? I don't recall adding any new Win32 APIs. > #ifdef NO_CONSOLE_FWRITE still seems to be hanging around; IMHO this shouldn't > be in somethin claiming to be Posix. NO_CONSOLE_FWRITE will go away as a part of my logging API changes.
File.h, etc. I have misgivings about isEOF being 'const', but OK. (Since 'mutable' is not portable then isEOF cannot cache its result, which is another consideration possibly.) Documentation for read should note that a return of 0 does not imply error. Typo (read): "Method to read a chunk of a data from the file" Platform.h: Comment on initializeLogging is meaningless, either specify semantics of opening a socket or remove the comment: "In cases where the messages need to be routed to a socket this method can also be used open a socket port" Comment on logMessage is trivially true and should be removed: "The implementation of this method could vary depending on the platform's preference and the message could be sent to a console, file or a socket" Comment on getUserData should say 'string' not 'blob' and more importantly should specify whether the data will be NUL terminated (and if so, if it will always be NUL terminated, like for fgets, which I think ought to be the case).
Attachment #364420 - Flags: review?(lhansen) → review+
(In reply to comment #32) > I have misgivings about isEOF being 'const', but OK. (Since 'mutable' is not > portable then isEOF cannot cache its result, which is another consideration > possibly.) mutable isn't portable? since when? (or "not portable" in the same sense that other standard C++ things like namespaces aren't supported in some defective compilers?)
Yeah, in the sense that some "C++" compilers don't support it. That said it can be faked by a level of indirection, so it may not be a big deal in practice.
Attachment #364420 - Attachment is obsolete: true
Attachment #365471 - Flags: review?(lhansen)
Attachment #365471 - Flags: review?(lhansen) → review+
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Fixed by changesets: 587c783cc772 and 63940ce5936a.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: