Closed Bug 248846 Opened 21 years ago Closed 13 years ago

MIME code should be sort-of-rewritten using real C++ objects

Categories

(MailNews Core :: MIME, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: eyalroz1, Unassigned)

References

()

Details

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a2) Gecko/20040626 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8a2) Gecko/20040626 Quoting mailnews/mime/src/mimei.h : "The definition of these classes is somewhat idiosyncratic, since I defined my own small object system, instead of giving the C++ virus another foothold." This is obviously the work of a 'Real Progammer' (see http://catb.org/~esr/jargon/html/R/Real-Programmer.html for a definition)... and it's legacy code, and it's pretty ugly, and it's C compiled as C++, and it appears its current state prevents fixing some, or many, or most, of the 543 MIME bugs. As a first step, it seems to me that it would be a good idea if the MIME 'objects' under mailnews/mime/src were to be rewritten as real C++ objects (with or without internal heirarchical changes; but probably with some name changes). Then, perhaps a change in the way of interfacing with the content handlers. Reproducible: Always Steps to Reproduce:
Confirming bug. More than a couple of people have expressed this desire.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Over the next several weeks I will attempt to devote some time to preliminary tinkering. What I consider as best is not to change any of the logic, and make the most minimal necessary structural changes, in a transition from the 'pseudo-class-system' and the C-style code to an object-based implementation. If we (I?) can get the code to the point where all of the code-structural oddities and idiosyncrasies are gone, then we can begin to consider also rearranging the code. But let me be more specific about this concept of two stages. Suppose now I have an .h and a .cpp file called myclass.h and myclass.cpp , and their contents is a C-style "struct myclass", and functions taking pointers to struct myclass, which are pseudo-methods. However it might also include a function like: int do_some_stuff(struct myotherclass* moc) Perhaps it is more reasonable to make this a member of class myotherclass - but I won't do that, since in the initial phase I want to change as little as is necessary. I'll have do_some_stuff be a static method of class myclass. In the second phase this may change, or perhaps we'll discover we don't need this function at all (yes, there is dangling unused code in libmime). Of course, I will not be able to present any working patches while I progress, since we are talking about a semi-rewrite, e.g. before you modify everything, the whole will not be compilable. If anyone is interested in following my progress, I could simply post new files which I create and/or patches of existing files. Finally, I make no guarantees w.r.t. time frames, and I may even be forced by the circumstances to drop this altoghether.
I'm willing to help, since I'm already "tinkering" with libmime in the course of bug 236954 (I'm hoping to attach a first patch there soon). I, too, think it'd be a good move to first create C++ classes along the current structure and procede then with cleaning up the mess. ;-) Rebuilding the entire libmime in one patch, though, might be quite arduous. But I think it should be possible to start with a series of patches, where each patch transfers one specific class and all of its usage. (These patches would, of course, need to be applied in sequence.)
This is (almost) impossible, because the object system in place is not a real object system; it's a construction using C semantics only. So you can't replace the root of the 'object' heirarchy, because than other 'objects' won't be able to inherit it, and you can't replace the leaf, because you have nothing to inherit from. Like I said, I think the best I can do is post non-patches. As for bug 236954, if you intend to make large-scale changes in mimedrft.cpp, please let me know.
It appears I had been overly optimistic; the reimplementation of the mime object heirarchy will require pervasive use of XPCOM, due to the use of string-based selection of a class for instantiation.
Instead of spamming bugzilla with endless new patches, I have let the bug URL point to a directory on my server with draft code: http://www.earendil.ath.cx/mime-rewrite/ I believe I may need quite a bit of hand-holding, being inexperienced as I am with components, factories, services and interfaces, so do not spare your comments. Initally, here is a suggested interface for a MIME service, which will be used to create nsIMimePart's (formerly MimeObject's). This will be done a-la- http://lxr.mozilla.org/mozilla/source/rdf/base/src/nsRDFService.cpp#1536
(In reply to comment #5) > It appears I had been overly optimistic; the reimplementation > of the mime object heirarchy will require pervasive use of XPCOM, > due to the use of string-based selection of a class for instantiation. IMHO, XPCOM is not needed here. libmime is used only internally, so why to take overhead? Well, things like MimeObject* MimeFactory::getMimeObject(const char *mimetype) { if (!strcmp(...)) return new MimeType1(); else if ... return new MimeType2(); else ... } may look ugly, but this is simple and do not require XPCOM. If it's too ugly, a some better scheme could be used, but still being non-XPCOM...
(In reply to comment #7) >IMHO, XPCOM is not needed here. libmime is used only internally Although for things like .mht it may help to move parts to necko...
Well, in dmose's (humble?) opinion XPCOMifying these is a good idea. Plus the big-switch means that you can't use plug-ins - each new content handler would have to add a line to the switch code of the abstract base class. And if one wanted to allow for plugins one would end up reimplementing a big chunk of XPCOM. Finally, why should one not be able to manipulate messages from Javascript?
While looking for something completely different ;-), I ran into bug 13678 ("Redesign of header handling"), which seems to have the same topic as this one. It does contain a design proposal, but I'm not quite sure if it is still helpful.
Blocks: 13678
I have now already several interfaces at the URL, and they even compile. Unfortunately, I have had to disable the scriptability of possibly the most script-coveted function of the whole libmime, due to the fact that nsMimeDisplayOptions is not very XPCOM friendly (has numerous mysterious callbacks which do not seem to be set anywhere). I may XPCOMify it later. Callbacks in general seem like quite an obstacle for me.
Unfortunately, the amount and the depth of the changes I find myself forced to make in this rewrite is ever increasing. As an example, I will be defining a 'cooked mime content type' type, which will be sort of a dynamic enum, since the different content handler XPCOM components (either 'basic' or 'plugin', there won't really be any difference) will add to its possible values on startup. Such components will register a contract ID which consists of a common prefix and a suffix specific to their handled content type; this suffix will also be added to a global array which matches 'cooked content type' values to strings; finally, an entry will also be added to an array which matches 'cooked content type' values to boolean functions. When someone calls nsMimeService::Create() with some set of headers, the cooked mime type will be determined by calling the functions in the array until one of them reports a match; then, the appropriate string from the afforementioned array will be appended to the common prefix, to complete the contract ID whose component will then be instantiated. Wow, I can't believe I just said that, it's so convoluted. Still, all of this will not trouble the outside world like what happens now with the libmime code. Everything will eventually be nice and usable C++ classes and XPCOM components. I hope.
Denis wrote: > IMHO, xpcom is not needed here. libmime is used only internally, so why to > take overhead? Well, see that's the thing. Right now there's an XPCOM wrapper around an internal C-based object system. Since many (all?) of the callers user XPCOM, this ends up being an extra convulated layer of code which makes debugging, interoperability, and scriptability (not to mention string sharing) more difficult. It will almost certainly be cleaner to use XPCOM as the object system throughout. There are costs to XPCOM, it's true, but my suspicion is that the relative costs here are going to be small compared to the relative benefits. However, I don't have a deep understanding of the existing MIME code, so I've added some folks to the CC who have interest in and/or expertise with the MIME code. In particular, Jean-Francois Ducarroz is the real expert on the existing code, if I recall correctly. So if any of the newly CCed folks would like to chime in, that'd be great...
OS: Windows XP → All
Hardware: PC → All
Updated the files at the URL. Of special interest is the instantiation mechanism: nsMimeService::Create(), nsMimeService::DeterminePartType(), nsMimeService::RegisterHandler() . I _urge_ you to comment here or by e-mail; I would probably not have gotten this far without the valuable input from Dan Mosedale, Axel Hech and Neil Rashbrook. Also, I have looked netwerk/mime/public over briefly, and I find that: 1. the 'MIME Service' defined there should be renamed to 'MIME extension service' or 'MIME extension mapping service' to avoid conflict with the MIME service in my rewrite; 2. since the mime part type detection code also seems to does type<->file-extension mapping, the netwerk mime code may be elligible for integration with the rewritten libmime code - but not before the rewrite is done.
> 'MIME extension service' It does a heck of a lot more than extensions... It handles general mapping of all sorts of stuff to "MIME info" objects that encapsulate everything Mozilla knows about a MIME type.
Ok, I stand correct; how about 'MIMEInfoService'?
Doesn't necko need the current mime info stuff outside the context of mailnews? If so, apps like FireFox would rather have the current relatively small mime code, instead of shipping the new mailnews mime dll. I think using XPCOM objects for the main Mime objects is reasonable, but that doesn't mean you have to go through CreateInstance to create objects - just having the objects inherit from ISupports and support interfaces so you can use comptrs and the like is plenty. You can still just use new to instantiate the objects since mime is a closed system.
(In reply to comment #17) The problem is that I want consumer code to be able to instantiate a MIME part (based on the headers and the display options at the moment, perhaps more opaquely evenutally) without having to check which type of a MIME part this is. To do so, the abstract base class needs to 'know' about its inheritors in order to choose which of them to instantiate. That's a big part of the reason I'm using XPCOM. As for necko, if and when the rewrite is complete, someone could determine what the overlap is, separate it from both necko and the general MIME code, and have both of them use a single component/dll/whatever for the common functionality.
Can I ask what your motivation is for re-writing libmime at this point in time? What is the problem you are trying to solve? What is the end user benefit from re-writing this component? We have a really long list of important big things to do, I'd consider this to be way at the bottom of our priority list if its even on the priority list at this point in time. If you are looking for a project to work on that is going to have some user benefit I'm sure we can help point you in the right direction.
(In reply to comment #19) > Can I ask what your motivation is for re-writing libmime at this point in time? 1. I believe a rewrite of the MIME code will facilitate a solution to bugs 108010, 3901, 11013, 204612, 11521, 149771, 182627, 162138, 184869, and perhaps a few others as well, but most importantly, bug 2920. 2. unless I am mistaken, there does not seem to be a 'proper' internal representation of a message in the mailnews code. The MIME rewrite doesn't solve this, but it seems like a step in that direction. 3. The MIME code begs it. 4. I initially thought this would be an easy conversion from non-C++ish code to C++ish code; I did not foresee the numerous architectural difficulties I am encountering. 5. I have been assured by the moral authorities of #developers@irc.mozilla.org that working only on the bugs you want does not make you a selfish person :-) Another update of the files. Upcoming tasks: the dreaded 'parsing' code of mimei and mimeobj ; interfaces for callback and/or parsing mechanism redesign; child handling in nsMimeContainer; tinkering with MimeDisplayOptions.
> You can still just use new to instantiate the objects since mime is a closed > system. However, it really shouldn't be a closed system. When I started writing spam-filtering code back in the day, I was amazed to find that there was no easy way to dissect a message into MIME parts from JS. Exposing these interfaces as regular XPCOM components allocated by the shared allocator will allow for significantly more powerful extensions to be easily built, I believe.
well, it's closed in this sense - only the mime component would actually create the objects that represent those parts. It's like the msg db code - lots of client code refers to nsIMsgDBHdrs, but only the db code ever creates a msg hdr object, and then hands off the interface pointer to the client code.
David, the point is to break this closure. Bug 2920 is just the simplest example: take a message, delete some nodes in the MIME part heirarchy, and replace them with an 'external body' leaf part. I'm sure people will think of many more uses.
(In reply to comment #16) > Ok, I stand correct; how about 'MIMEInfoService'? Fine by me. ccing darin too.
Are you planning on re-writing GTK+ using C++ too? (The libmime object system looks a lot like the one GTK+ uses.) ;-) All kidding aside, I'm not sure the choice of object systems is the real problem here. Why does it prevent bugs from being fixed? Why not make incremental fixes? I tend to agree with mscott... why should Mozilla risk a major rewrite of this component? Why not layer XPCOM on top of libmime where appropriate to expose functionality to JS? I know this requires careful study of libmime, but that's required anyways if you intend to rewrite it. (In reply to comment #9) > Plus the big-switch means that you can't use plug-ins - each new content > handler would have to add a line to the switch code of the abstract base > class. Why not add a switch that invokes a XPCOM interface to extend the existing object system? Why not do so in a way that leaves existing (tested and proven) code intact? Or, would this equate to too much bloat? I'm not adverse to fixing the bugs you mentioned of course... I just think that you should exhaust all other reasonable avenues before going down the path of rewriting a large body of legacy code. The more code you change, the more regressions you will introduce -- that's a simple fact of software engineering. That said, if you really want to change the object system, then you should IMO look for a way to carefully swap out the existing object system for another one without really disturbing the implementation. Then that might be easier to digest and should make people less nervous :-) [sorry for the drive-by inquisition!]
(In reply to comment #25) > All kidding aside, I'm not sure the choice of object systems is the real problem > here. Why does it prevent bugs from being fixed? Answer me this. I am writing an extension or a patch. I have a string which is a raw MIME message I have read from somewhere magical you don't know about. Now I want to manipulate this message as a tree of MIME parts - regardless of any consideration of the way it would have been displayed by the mailnews GUI. Can I do that? No. All I can do, basically, is format it for display. But surely a MIME library should be more than a formatting aide? > Why not make incremental fixes? The same you you couldn't, say, write a single widget in Gtk using C++ and use it with the rest (I don't know Gtk, that's just an analogy). You'd have to add instrumentation/bridging/whatever code, which you don't even really want. Or maybe think about incrementally fixing Win9x vs. using NT (not that NT is all that wonderful). > I tend to agree with mscott... why should Mozilla risk a major rewrite of this component? How is there a risk? It's not as though I'm deleting anybody's files... if the mozilla developer community and/or the module owner/s decide they don't want this rewrite, it will stop. If you're referring to the numerous bugs that are sure to surface, than your argument would probably hold against rewriting any part of Mozilla. Hmm... Perhaps you are suggesting that this change be made on a new branch from the trunk to be merged at a later time? I have no experience with large-scale code integration so I can't say whether that would be a good idea or not. Of course since I do not have CVS privileges I don't need to worry about that. > Why not layer XPCOM on top of libmime where appropriate to expose functionality to JS? This is what's happening now. But the problems of libmime makes that exposition rather difficult to work with, and unpopular among developers (at least as far as I can tell). > I know this requires careful study of libmime, but > that's required anyways if you intend to rewrite it. > > (In reply to comment #9) > > Plus the big-switch means that you can't use plug-ins - each new content > > handler would have to add a line to the switch code of the abstract base > > class. > > Why not add a switch that invokes a XPCOM interface to extend the existing > object system? Why not do so in a way that leaves existing (tested and proven) > code intact? Or, would this equate to too much bloat? > > I'm not adverse to fixing the bugs you mentioned of course... I just think that > you should exhaust all other reasonable avenues before going down the path of > rewriting a large body of legacy code. The more code you change, the more > regressions you will introduce -- that's a simple fact of software engineering. > > That said, if you really want to change the object system, then you should IMO > look for a way to carefully swap out the existing object system for another one > without really disturbing the implementation. Then that might be easier to > digest and should make people less nervous :-) Although I initially wanted to do just that, it seems that its peculiarities are at the core of the libmime functionality. I can't bring myself to write uglier code over ugly code. I realize it means breakage and regression, but enough mailnews magnates seem to think it's worth it, and so do I. For now I ask you to pretend I'm not writing anything and reserve your judgement until I have something complete and sort-of working to offer (which may not even happen at all). > [sorry for the drive-by inquisition!] Among Mozilla's chief weapons are fear, surprise, ruthless efficiency and near fanatical devotion to free software :-)
I'm just curious, who are the mailnews magnates encouraging you to undertake this re-write at this particular point in time? As one of the mailnews module owners, I'm doing my best to discourage you! :)
See comment #1. Specifically, Dan Mosedale has been encouraging me on, so to speak, and has told me David Bienvenu also thinks it's a good idea to better XPCOMify the MIME code.
I had seen hints of this earlier, but only now do I seriously suspect that in writing libmime there was absolutely no intention of creating representations of messages as heirarchies of MIME parts for use as such, but rather only in the context of a stream conversion (perhaps even only in the context of the use of nsIMimeStreamConverter). nsIMimeStreamConverter, an entity which I have not yet familiarized myself with (nor with concept of listeners and stream converters in general in mozilla), is apparently given callbacks for reading and for writing lines or characters or whatnot from and to apparent streams, and then proceeds with parsing a mime message from the source 'stream', building a part heirarchy merely en passant, then probably destroying it after completing the writes of the converted message. I also lack any notion of what exactly happens with attachments in this process. Now, if this continues to be the case, then XPCOMifying the MIME part classes is useless, since they do not 'work' except in the context of a stream conversion process. So I need to do is make these classes be able to represent a message without any read (or write) callbacks, purely internally. And as for the conversion process, I can think of two options. The first option is to act like a frontend/backed compiler - to split the conversion into two phases: phase 1 - use only the read callbacks to build an internal message representation of thw whole message - with all of it in memory ; this process is probably completely independent of what the consumer code has specified the converted message should look like, which part types are allowed and which aren't, etc (although the consumer code could of course specify "only build the heirarchy starting from some subpart of the original message", etc). phase 2 - according to the kind of conversion the consumer code has asked for, do a traversal of the heirarchy and use only write callbacks to generate the output. A second option is to somehow build a skeleton of the message internally without actually reading the contents, and then having the traversal do both reading and writing. This is somewhat more like what is being done today, and I'm not yet sure exactly how it works now or how it could work in the rewritten code.
>David Bienvenu also thinks it's a good idea to better >XPCOMify the MIME code This is true, but I definitely agree with Scott that we have a lot of more important things to do. If it makes it easier to implement features like remove attachment, that would make this a lot more important. In theory, I think we can implement "remove attachment" w/o this, however, by implementing a stream listener that strips out the attachments and rewrites the message. I also hear and understand that this is what interests you...it sounds like an increasingly difficult task, however.
I personally think that the easier solution would be to write a specific emitter which will generate a tree structure of the parts. The output could be C++/JS data or XML or what ever you wish.
Ok, now I'm getting the feeling that there isn't that much support for a rewrite after all. I'd like to know whether you're discouraging me because you feel the rewrite won't help or won't matter, or because you think there is an easier solution to 2920? Because if the latter is true, than forget I mentioned 2920. I'm from the school that says let's first write code that does nothing much, but does it really well, then use it to actually do something. If the former reason is true, tell me to stop and I'll stop. I just want to reiterate that I didn't choose to start doing this because it seems like a cardinal challenge for mailnews, but rather because nobody else seems to want to do it, and because I think it is useful, and because I thought it would be easy to separate the chaffe from the wheat (which it turns out it isn't). BTW, I have compiled for myself a lists of the dependents on the mime/public and mime/src code: http://www.earendil.ath.cx/mime-rewrite/external-dependents.html
Well, as you can see, I have failed to devote enough time and effort to this rewrite over the past month and a half, so I'm 'formally' suspending my work on it.
Product: MailNews → Core
See also bug 108010, comment 4 for a few ideas regarding what fixing this will enable.
sorry for the spam. making bugzilla reflect reality as I'm not working on these bugs. filter on FOOBARCHEESE to remove these in bulk.
Assignee: sspitzer → nobody
QA Contact: mime
Product: Core → MailNews Core
The plan now is to replace this with a mostly-pure JS implementation of MIME code (see bug 746052 for a start).
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.