Last Comment Bug 162765 - RFC2231 filename support for nsExternalAppHandler
: RFC2231 filename support for nsExternalAppHandler
Status: RESOLVED FIXED
: intl
Product: Core Graveyard
Classification: Graveyard
Component: File Handling (show other bugs)
: Trunk
: All All
: P3 normal (vote)
: mozilla1.5alpha
Assigned To: Jungshik Shin
: Chris Petersen
:
Mentors:
http://jshin.net/moztest/download.html
: 155949 205232 206788 (view as bug list)
Depends on: 191541
Blocks: 158006 177840
  Show dependency treegraph
 
Reported: 2002-08-14 17:15 PDT by nhottanscp
Modified: 2016-06-22 12:16 PDT (History)
13 users (show)
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
a patch (12.56 KB, patch)
2002-08-27 16:21 PDT, Jungshik Shin
no flags Details | Diff | Splinter Review
a path (with mailnews dependency removed) (53.55 KB, patch)
2002-11-20 17:43 PST, Jungshik Shin
no flags Details | Diff | Splinter Review
files in mimehdrpar module (a lot of small files) (65.58 KB, patch)
2002-11-20 17:57 PST, Jungshik Shin
no flags Details | Diff | Splinter Review
patch v3 (a bit more organized, new files all included) (125.67 KB, patch)
2002-11-22 22:02 PST, Jungshik Shin
no flags Details | Diff | Splinter Review
a new patch (86.44 KB, patch)
2003-03-13 07:00 PST, Jungshik Shin
no flags Details | Diff | Splinter Review
a new patch (88.45 KB, patch)
2003-03-19 23:22 PST, Jungshik Shin
no flags Details | Diff | Splinter Review
yet another patch (90.69 KB, patch)
2003-03-21 03:19 PST, Jungshik Shin
no flags Details | Diff | Splinter Review
same as above (with non-essential part removed to help reviewers) (57.02 KB, patch)
2003-03-25 09:00 PST, Jungshik Shin
no flags Details | Diff | Splinter Review
a cleaned-up patch (91.54 KB, patch)
2003-04-17 06:15 PDT, Jungshik Shin
no flags Details | Diff | Splinter Review
a new patch (89.96 KB, patch)
2003-04-25 04:39 PDT, Jungshik Shin
cbiesinger: review-
Details | Diff | Splinter Review
a new patch addressing Chrsitian's concerns (98.56 KB, patch)
2003-06-03 06:52 PDT, Jungshik Shin
alecf: superreview-
Details | Diff | Splinter Review
diff. bet. the second latest and the latest patch (46.06 KB, patch)
2003-06-03 07:01 PDT, Jungshik Shin
no flags Details | Diff | Splinter Review
a new patch per Alec's comment (99.45 KB, patch)
2003-06-05 05:13 PDT, Jungshik Shin
cbiesinger: review+
alecf: superreview+
Details | Diff | Splinter Review
uriloader : 4-line fix (2.16 KB, patch)
2003-06-13 08:44 PDT, Jungshik Shin
cbiesinger: review+
bzbarsky: superreview+
Details | Diff | Splinter Review
fix to make mailto urls work in mail again (617 bytes, patch)
2003-06-17 14:18 PDT, Scott MacGregor
mozilla: superreview+
Details | Diff | Splinter Review

Description nhottanscp 2002-08-14 17:15:14 PDT
http://www.ietf.org/rfc/rfc2231.txt

separated from bug 161174, a filename in HTTP header may be encoded in RFC2231
form, currently Mozilla do not support that for file download.
Comment 1 Jungshik Shin 2002-08-22 00:22:17 PDT
I  filed bug 155949 and added pretty extensive information
on the issue. IMHO, it's better to make this one as a duplicate
of bug 155949. The other way around is all right with me,
but in that case what I wrote there would have to be copied here,
which seems to me to be sort of waste of time and resource. 
Comment 2 Boris Zbarsky [:bz] (still a bit busy) 2002-08-26 15:04:07 PDT
*** Bug 155949 has been marked as a duplicate of this bug. ***
Comment 3 Jungshik Shin 2002-08-27 16:21:55 PDT
Created attachment 96925 [details] [diff] [review]
a patch

With this patch, the last 5 cases at http://jshin.net/moztest/download.html
work as expected/intended. If RFC 2231 compliant C-D http header
is present, charset is read off and the parameter value is
decoded accordingly. When raw 8bit characters are used
without charset info. in C-D header, UTF-8 is tried first and 
if that does not work, the document charset (originCharset) is 
used instead.

A couple of issues:

EnsureUTF8Filename() was copied almost verbatim (aside
from URL encoding/decoding) from
nhotta's EnsureUTF8Spec() in nsSmtpService.cpp.
It might be a good idea to have EnsureUTF8String()
somewhere in intl/uconv to avoid cut'n'paste of
the code. 

We may want to have 'IsUTF8()' that works simliarly
to 'IsASCII()'. (I saw in one of bugs related to this
that 'IsUTF8()' was briefly discussed, but forgot which)
To me converting to UCS2 and then back
to UTF-8 and comparing the result with the original
string does not seem to be a very  good idea.  
  
And, there may be other problems as well. It'd be
nice to hear feedback, positive or negative.
Comment 4 Jungshik Shin 2002-08-27 16:30:28 PDT
One more issue which I think has to be dealt with
separately from this one is what to do when
the local file system encoding cannot represent 
characters in a suggested filename. Users can for sure
override, but my test result of the page mentioned
previously can be confusing to users. 

After verifying the patch works fine under ko_KR.UTF-8
locale, I ran Mozilla under C locale(Linux) and obviously
Korean characters in the URL aforementioned
cannot be represented in C locale. Without
any other option, Mozilla assumes that
the local file system encoding is the encoding
of the locale. However, the filepicker dialog
shows a Korean filename as if Mozilla can
use it. When I press 'OK' button, 
Mozilla falls back to 'something' and
the download manager uses that fallback. 
A more gracious and user-friendly approach
is necessary here.
Comment 5 Boris Zbarsky [:bz] (still a bit busy) 2002-08-27 17:13:34 PDT
One thing that sorta leaps out at me there is that this makes uriloader depend
on a mailnews module... won't this break when we build with --disable-mailnews?
Comment 6 Jungshik Shin 2002-08-27 18:06:34 PDT
> this makes uriloader depend
> on a mailnews module... won't this break when we build with --disable-mailnews?

  I never thought about it. I guess that's  likely.(I wish we had 
a generic mime module not tied to mailnews.) Then, I looked
at Makefile.in in mailnews/mime/build and it does not depend on mailnews.
The dependency is the other way around (mailnews/base/build).
Now the question is what exactly --disable-mailnews disables.... 

If indeed that dependency
breaks a build with mailnews diabled, what would you suggest? Perhaps,
we just have to go back to nhotta's original code and build upon
it to support RFC 2231. Or just copy'n'past two functions in mimehdrs.cpp
into nsExternalHelperAppService.cpp although that may be pretty ugly(?).

 Before pondering further, let's get the definitive answer on the dependency
issue.   well, I can clobber and build
with --disable-mailnews. I'm gonna ask the owner of mime module..

  
Comment 7 Boris Zbarsky [:bz] (still a bit busy) 2002-08-27 18:16:47 PDT
The simplest solution would likely be to move this stuff into uconv or necko and
have both mailnews and uriloader is it from there....
Comment 8 Jungshik Shin 2002-08-27 23:30:17 PDT
A problem with moving this stuff out of mime module into uconv or necko
is that 'MimeHeaders_get_parameter()' is invoked 50+ times in mime module.
That, by itself, may not be a big problem (I believe none of them 
is used in a context critical to the performance) except that it's tedious
to change them all. Another function, mime_decode_filename() refered
to in GetMimeHeaderParameter() is invoked only 7times, but dividing
it into a part not depending on mime module and the other dependent on
it is a bit ugly although doable. If we wanna go down that path,
I guess we definitely need to talk to the owner of mime module,
ducarroz@netscape.com. I'm now adding ducarroz to cc.
Comment 9 Jungshik Shin 2002-11-14 00:14:52 PST
I took another look at this bug and at first it looked like
I could just move three functions out of mimehdrs.cpp 
and put them in uconv or necko. It turned out that
the extent to which things depend on each ohter
makes it hard to separate cleanly what I need from
the rest of mailnews/mime. 
 

Now I'm considering moving the whole mailnews/mime
out of mailnews and putting it under necko or somewhere else.
That may have been what Boris meant in comment #7.

Anyway, ducarroz and Boris, how does it sound? 
This is a big change (and the cvs admin. wouldn't
happy for havig to  do extensive operation to keep 'history' intact
after moving so many files....). 
Comment 10 Boris Zbarsky [:bz] (still a bit busy) 2002-11-14 00:20:14 PST
That's sort of what I meant, yes.  I'm not sure what the mailnews team's take on
that is.... ccing darin also, since he should certainly know about changes like
this.  :)

If we can't reuse the mailnews code, I would opt for a helper function somewhere
(sort of like the mimetype-parsing function we have) that would take a string
and parse out the disposition and filename.... that would allow us to save some
already-duplicated code in uriloader and exthandler.
Comment 11 Jungshik Shin 2002-11-18 15:45:15 PST
I tried compiling in mailnews/mime even when mailnews is disabled
(by tweaking a couple of Makefile.in and allmakefiles.sh. this way,
I thought we'd be able to avoid 'cvs surgery'), but
found that 'mime' has a dependency on the rest of mailnews
(there's only one dependency, but it's rather hard to remove it).
So, that doesn't work. 

Instead, I'm going for the second option while trying to resue mailnews/mime
code whenever possible by making a new directory next to mailnews/mime and
putting there what we need _outside_ mailnews.  What's in
there will be compiled regardless of whether mailnews is enabled or not.


 
Comment 12 Jungshik Shin 2002-11-20 17:43:48 PST
Created attachment 106968 [details] [diff] [review]
a path (with mailnews dependency removed)

I moved what's necessary for this out of mailnews/mime 
and put them in a new mailnews/mimehdrparm (name and 
location are tentative). Because some of them are utility functions
used by the rest of mailnews, I took some care to make them
available to mailnews 'directly' (not via xpcom). Perhaps, this
is not a good idea and I should expose them only thru xpcom.
I also have to make a couple of methods scriptable so that
Javascript can call them. I found that 'save target as'
takes a separate code path from 'Save' and is dealt with with JS.
Besides, it might be better to make 'service' do more work
than callers.  

I haven't yet implemented getting the MIME charset/encoding
corresp. to the current locale and use that as the last resort
when MIME charset info. is not otherwise available/inferred.
There's no XP-way available at the moment...

Anyway, please, take a look and make some comments. 
There are a lot of 'quirkiness' in a sense ...
Comment 13 Jungshik Shin 2002-11-20 17:57:43 PST
Created attachment 106972 [details] [diff] [review]
files in mimehdrpar module (a lot of small files)

A collection of new files in mimehdrparam module. 

I'm not sure of the best way to build a lib for this
module. Currently, both dll/shared lib as a component
and static lib. are built. When I only built
static lib., do_GetService() somehow failed
(perhaps I should have rebuilt from the top of the
tree in that case). On the other hand, I couldn't
resolve symbols for mailnews (when it's enabled)
when mimehdrparam was built as dll only. I found
some old news postings about this in xpcom newsgroup.

Any advice/comment on this and other aspects would be appreciated a lot.
Comment 14 Jungshik Shin 2002-11-20 18:05:03 PST
http://jshin.net/moztest/download.html (I don't have privilige to put
this in the URL field. Boris, can I take this bug?) 
has some testing cases (the last section). 
I'm going to revise the page  so that it may not work sometimes. 

BTW, i18n of 'save target as' is bug 158006. Once I make a couple
of methods exposed  to JS, it should be easy to fix it.
Comment 15 Boris Zbarsky [:bz] (still a bit busy) 2002-11-20 18:08:36 PST
oh.  I didn't realize you did not have the privs to take the bug.  Sorry about
that.  Reassigning.  Feel free to mark any of those as needing review from me
once you're somewhat happy with them.  ;)
Comment 16 Jungshik Shin 2002-11-21 00:20:05 PST
Thank you, Boris for assigning this to me. 
Anyway, before making some 'feature enhancements'
and structural changes I talked about, I need to ask a
few questions about the way things are laid out and
built. 

What my patch as of now does are :

  A.  make a new nsIMimeHdrParam service with a single method ( for now)
    to retrieve rfc2231/rfc2047-decoded values of mime header parameters
    (in my first patch, the method was added to nsIMimeConverter service,
     but it introduced mailnews dependency which needs to be avoided).
    This module may have to be expanded later if it turns out that
     Mozilla needs to generate (as opposed to parse) multipart/form-data
     (with RFC 2231 style C-D header) for form submission

  B. moves a dozen of functions out of libmime (comi18n.cpp and mimehdrs.cpp) 
    and put them into a new 'module' mimehdrparam. Tentatively, I put
    it under mailnews, but it can go anywhere in the source tree. 
    An upside of putting it there is that its 'relation' with mailnews
    is not entirely severed. On the other hand, it may be considered
     'hackish' to compile in 'mailnews/mimehdrparam' without turning
     on 'mailnews' when 'mailnews' is disabled. (mailnews/mimehdparam is
     added as a 'tire-9' module so that it's always built).

  C. most functions separated are not used in the rest of mailnews but
    there are three or four of them required by mailnews. To make them
    available to mailnews across 'module boundaries', I think there are
    two options. This is an area where I need your opinion as well as
    that of ducarroz and darin. 

    One is to add them as 'methods' to new nsIMimeHdrParam
    and the other is to just expose them as 'extern' and link 
    mailnews (when enabled) with mimehdrparam lib(static). I took the
    latter path. The latter, in turn, seems to be approachable 
    in two ways(or more ways ...). The first is to separate 
     a part that implements method(s) for nsIMimeHdrParam 
    from the part that need to be used by both MimeHdrParam
    and mailnews and build dll/so(component) for the former and 
    static lib. for the latter. The path my latest patch took
    is to leave them together, but to build both dll/so(component)
    and a static lib. for the whole thing. I set both
     IS_COMPONENT and BUILD_STATIC_LIBS in Makefile.in of
     src directory of mimehdrparam) and it works fine
     under Win2k(I'm gonna try it under Linux soon)
     Before that, I also
     tried FORCE_STATIC_LIBS without IS_COMPONENT but
     it didn't work. (compilation went thru, but
     |do_GetService| failed for nsIMimeHdrParam) I read quite a lot of news postings
     and comments in bug 46775 about 'static components
     for xpcom', but am not yet sure
     which is the right/optimal way which wouldn't break build on all
     platforms Mozilla is ported to. To make things more complicated,
     libmime is a part of metamodule 'mail'.
     
  As for item A, I guess I have a strong case.  Now for item B
  and C, I like to know what the module owner (ducarroz) as well as
  Boris and darin thinks because they're about taking away something from mailnews.
  I'm also taking a liberty to add waterson (hope he won't mind :-)) to 
  CC for 'xpcom build' issue.
   
  Thanks again for reading and help in advance..
Comment 17 Boris Zbarsky [:bz] (still a bit busy) 2002-11-21 11:44:58 PST
ccing alecf too; since we want to try to avoid creating a whole new module...
Not sure whether mailnews would be ok with moving this into necko or something
like that....
Comment 18 Jungshik Shin 2002-11-22 22:02:49 PST
Created attachment 107217 [details] [diff] [review]
patch v3 (a bit more organized, new files all included)

Like Boris wrote, I'd like to hear from mailnews/xpcom experts as to how to
divide
things and where to put them.  

In this patch, consumers of GetMimeHeaderParameter are relieved of
making sure the result is indeed in UTF-8. The chore is now taken care
by the method. I made it(actually there are two of them because
I'm not sure which string types are allowed in scriptable
methods. I found that nsAString works well instead of
wstring. Can I use nsACString as well in place of 'string') scriptable 
and it works with Javascript.  Javascript part fixes I18N issue
of 'save target as'(not completely but RFC 2231 compliant
cases are well taken care of) and C++ part fixes 'save'. Raw 8bit
chars are interpreted as UTF-8 at first and then as 'originCharset'
of the source URL if available. For textual types, I may also
try 'contentCharset' of the current channel(as opposed to
originCharset). . It appears easy
to do this in C++, but in JS, I need to dig up more.
However, checking contentCharset may not be such a good idea
because so many servers suck and don't emit the correct
MIME charset in HTTP header for textual types(text/*).	

Featurewise ('save target as' was filed as a separate bug. it's bug 158006),
it does what I want it to do. However, I'm not sure how to put things
together or take them apart (?).
Comment 19 Jean-Francois Ducarroz 2002-11-26 11:08:26 PST
Jungshik, I am busy right now but I'll take a close look soon.
Comment 20 Jean-Francois Ducarroz 2002-12-02 12:12:54 PST
I haven/t reviewed the whole patch (I am not the right person for things else
than mailnews by the way) but mimehdrparam should not be part of mailnews. we
don't want the browser to have a dependency on mailnews! The goal of extracting
this code from mime was to not have a dependency on mime (which is part of
mailnews). This code should live in netwerk: maybe in netwerk/mime or in
netwerk/mimehdrparam

ALso, looks like you have extracted too much stuff especially stuff used only
for emails.

cc'ing nhotta for his advice about international issues
Comment 21 Darin Fisher 2002-12-03 09:20:41 PST
use netwerk/mime/ instead of creating a new subdirectory under netwerk/.  thx!
Comment 22 Jungshik Shin 2002-12-12 19:55:35 PST
Ducarroz and Darin,
Thank you for your comments/answers and I'm sorry for the late response.

>looks like you have extracted too much stuff especially stuff used only for emails.

Actually, I tried to extract the only part of mailnews that I need. 
The only thing I think I extracted unnecessarily is the 
content of 'nsMailHeader.h' most of which are only
for emails as you wrote.  I'll put it back.
  
 A dozen or so functions I took away from mimehdrs.cpp and comi18n.cpp 
are not used by the rest of mailnews at least as of now.   There are, however, 
3 or 4 functions that are needed by both 'mimehdrparam'
and mailnews. They're the cause of  my headache because
I have to make them available across module boundaries
(see comment #16, item C). 
 

>use netwerk/mime/ instead of creating a new subdirectory under netwerk/.

  I'll try, but at the moment, it seems easier to make a new subdirectory under
netwerk if that's not the only way to take care of issues I wrote about
in comment #16 (item A and B were resolved by  your and Ducarroz's responses,
but to have a satisfactory resolution of C(in comment #16), I need some help from
xpcom expert.   Alecf, can you help? 




Comment 23 Alec Flett 2003-01-31 11:29:51 PST
sure. A few general comments:
- stuff should not only not depend on mailnews, but it shouldn't even be in the
mailnews/ subdirectory if you expect it to build when mailnews is disabled.
- who is the primary consumer of this interface, or what is the primary function
of this code that's being moved out? 
Is it network related? then it belongs in necko.dll or necko2.dll. 
Is it international/charset processing stuff? Then it belongs in i18n.dll. 
Is it for dealing with helper apps? then it belongs in the uriloader/ directory.

Some guidelines:
No matter where it goes, the only way to make methods available across modules
in xpcom is via an interface. If your utilities do not maintain state, or you
only need to maintain some singular state across the process, then consumers
will need to create it as a service. 

I know that creating an object just to make a stateless call seems like
overkill, but its the only way for us to do it.. and it is done across the rest
of the product in various places. Linking against a new DLL is really a last
resort, and trying to manually load a DLL, find a symbol, etc, has about as much
static footprint overhead as going through XPCOM and getting a service.

Furthermore, I would like to see as absolutely minimum code as possible moved
out of mailnews and made available to the general public! And if it isn't mail
related, it shouldn't even have mail in the names of the files, classes, or
contractIDs.

Let me know if I can be of more help - or catch me on IRC and we can discuss it.
Comment 24 Boris Zbarsky [:bz] (still a bit busy) 2003-01-31 12:09:51 PST
> Is it network related?

Yes.

> Is it international/charset processing stuff?

Yes.

The code in question is parsing code for general network headers that allows
non-ascii header values (handles parsing, conversion to Unicode, etc).  It'd be
fine in either of those modules....  

We could either make this a service (which would take the header CString every
time) or an object (takes string once, parses once, returns many results). 
Which one we choose should just depend on the usage patterns.
Comment 25 Jungshik Shin 2003-03-13 07:00:43 PST
Created attachment 117077 [details] [diff] [review]
a new patch

Finally, I managed to do what's long overdue. I'm
sorry for the delay and thank you all for your help.

I just want to put this somewhere safer than my disk :-).
This patch does everything I planned for this bug
(bug 158006 needs a small patch to JS code relying
on this patch) but a lot of clean-up and documentation 
have yet to be done. I'll do it soon(perhaps early next week).

In the meantime, here's what this patch does:

0. add a new interface nsIEnsureUTF8 to
   intl/uconv.	
   This is to avoid the code duplication. It's based
   on Naoki's implementation in mailnews/compose.
   At a couple of places, I need to use this 
   and this may be useful at other places as well. 
 
1. add a new interface nsIMIMEHeaderParam and
   its implementation to netwerk/mime.
   It offers 4 methods, but only one of them
   |getHeaderParam2| is for general consumption.
   Other three are implemented because they're
   not only needed internally but also used
   heavily in mailnews/mime. They're implemented
   by moving corresponding functions in
    mailnews/mime/src (see below) and applying
    a little cosmetic(??) touch and impedance
    matching.  

2. I redefined three functions in mailnews/mime/src
    (in comi18n.cpp and mimehdrs.cpp) as wrappers
    for three methods mentioned above. 
 
3.  Several static functions and arrays in comi18n.cpp
    and mimehdrs.cpp were moved to nsMIMEHeaerParam
    implementation. They're used by three methods
    above, but are NOT called by any other functions
    in mailnews so that it's safe to move them away
    from mailnews.

4. At a couple of places in uriloader, |GetHeaderParam2|
   is called to deal with RFC 2231 and RFC 2047 encoded
   C-D filename/name parameter. 

Please, take a look and give me some feedback.
There would be a lot of comments. I don't think
it's pretty as it is now. In many places,
I was caught in the middle between string/wstring
and ns*String especially because three methods
moved from mailnews/mime and other static functions
are almost entirely based on string and I didn't
want to modify them extensively (at least in the
first pass)
Comment 26 Boris Zbarsky [:bz] (still a bit busy) 2003-03-13 10:08:25 PST
From a totally cursory look:

1)  nsIEnsureUTF8.idl should not contain C++ goop.
2)  It should not have tabs.
3)  Why the [const] on the in params?  What does that do?
4)  What is the encoding of the return value of ensureUTF8Spec?
5)  Could the nsEnsureUTF8 code use that proposed function to check for
    UTF-8ness? Converting back and forth to UTF-16 seems silly....
6)  nsUConvModule.cpp seems to have some leakage from a different patch
7)  nsIMIMEHeaderParam.idl should have comments in JavaDoc format (starting with
    /**) and all methods should be commented, as well as all params.
    Indentation should be fixed.

I didn't read the code guts at all, pretty much; just the interfaces.
Comment 27 Christian :Biesinger (don't email me, ping me on IRC) 2003-03-13 11:38:45 PST
Comment on attachment 117077 [details] [diff] [review]
a new patch

NS_IENSUREUTF8_CONTRACTID
that is a bad name. the contract specifies an implementation, not an interface.

{249f52a3-2599-4b00-ba40-0481364831a2}
I don't think it's a good idea to give the interface the same id as the class.

+	AUTF8String convertToUTF8([const] in ACString aString, 

"in nsACString" already makes it const (C++ header has const nsACString&)

+static NS_DEFINE_CID(kIEnsureUTF8IID, NS_IENSUREUTF8_IID);
+static NS_DEFINE_CID(kCharsetConverterManagerCID,
NS_ICHARSETCONVERTERMANAGER_CID);

please don't use CIDs, but contract ids instead.

+NS_IMETHODIMP	nsEnsureUTF8::ConvertToUTF8
+(const nsACString &aString, const char *aCharset, PRBool aCheck,
+ nsACString &aUTF8String)

is there a special reason for putting the argument list onto a new line?

+  aUTF8String.Truncate(0);
remove the 0

+      aString.Equals(NS_ConvertUCS2toUTF8(NS_ConvertUTF8toUCS2(aString)))) {

do we _still_ not have a better way to do this?

+NS_IMETHODIMP	nsEnsureUTF8::EnsureUTF8Spec
+(const nsACString &aSpec, const char *aCharset, nsACString &aUTF8Spec)
+  aUTF8Spec.Truncate(0);
again, no 0, and same question about argument list

+  AString getHeaderParameter2([const] in AUTF8String aHeaderVal,

why the 2? why the const? 

+ /* 
+  * char * mime_decode_filename(char *name, const char *charset, 
+  *				 MimeDisplayOptions *opt) 

looks like you should remove it.
Comment 28 Alec Flett 2003-03-13 13:16:40 PST
Comment on attachment 117077 [details] [diff] [review]
a new patch

a few of my own comments:
- you need fresh license files from http://www.mozilla.org/MPL/boilerplate-1.1/
- don't use inout string - the memory management is really ugly. Instead use
inout AString - you'll find your code cleans up signifigantly. In fact, from
looking at the code, I can't tell if that really should be inout, or just out.
- when constructors and destructors have no body, you should inline them to let
the compiler optimize them out
- I'm really not a fan of all this code just copied and pasted in from mail..it
had previously been copied in from Netscape 4.x, and I'd really like to see it
streamlined and cleaned up.. there seems to be a lot of legacy stuff in there,
but I'm not sure.

Ugh, and all these "static char foo[256]" stuff - do we really need base64
decoding?

I guess before we just slap in all this code from mail, I want you (jshin) to
do a thorough code scrubbing to make sure you're only bringing in the
functionality you need, and that the code is factored properly (for instance,
do we already have a base64 decoder?)
Comment 29 Christian :Biesinger (don't email me, ping me on IRC) 2003-03-13 13:24:52 PST
>do we already have a base64 decoder

sure. NSPR has PL_Base64Decode.
http://lxr.mozilla.org/nspr/source/nsprpub/lib/libc/include/plbase64.h#68
Comment 30 Jungshik Shin 2003-03-13 20:47:59 PST
Thank you all for comments.

bz> 2)  It should not have tabs.  
bz> 7)  nsIMIMEHeaderParam.idl should have comments in JavaDoc format 

Yup. these are all parts of clean-up/documentation I wrote about.

bz > 3)  Why the [const] on the in params?  What does that do?
cb>   in nsACString" already makes it const
is there a special reason for putting the argument list onto a new line?

  Thanks. I must have done too many copy'n'paste without thinking much.
  
cb> is there a special reason for putting the argument list onto a new line?

I thought I was just following the 'local' convention in the directory, but
I may have been mistaken because a second look a moment ago didn't show me 
that pattern.

cb> +  AString getHeaderParameter2([const] in AUTF8String aHeaderVal,
cb> why the 2?

  Because there's another one with almost identical name that's only
used internally and by mailnews as a wrapper. Anyway, I'm not
so fond of various names used in my patch so that I'd listen
to you for better names. 

bz> 4)  What is the encoding of the return value of ensureUTF8Spec?

  It's UTF-8. I'll make it clear.

bz> 5)  Could the nsEnsureUTF8 code use that proposed function to check for
bz>     UTF-8ness? Converting back and forth to UTF-16 seems silly....

cb> do we _still_ not have a better way to do this?

  I fully agree. That's bug 191541 as you know :-) It's not there yet. 
I love to use that. The reason I didn't rush to add that is
I could find only a few plaes in the entire tree that checks
UTF-8ness. With that implemented, probably more people will
begin to use it.  

bz> 6)  nsUConvModule.cpp seems to have some leakage from a different patch

  I also noticed it and removed that in my tree.
 

 al>  do we really need base64 decoding?

   Yes, we do for RFC 2047 decoding. Pls, note that the code is shared
by mailnews and necko. Even necko alone needs it because quite  a lot of
broken http servers send out C-D filename param in RFC 2047 style
instead of RFC 2231 style. 

cb> sure. NSPR has PL_Base64Decode.

  Thanks.  With this information,
I can remove the static array you hate along with |intl_mime_decode_b| :-)

al> I'm really not a fan of all this code just copied and pasted in from mail..
al> it had previously been copied in from Netscape 4.x, and I'd really like 
al> to see it streamlined and cleaned up.. 

  I agree with you. I'm not a fan of stuff I copied
and pasted with a minimal change for impedance matching and 
filling up a couple of holes. I tried not to touch them
in this bug because I didn't want to introduce a regression
in mailnews. 

al> there seems to be a lot of legacy stuff in there,
al> but I'm not sure.

  Perhaps the coding style and things like that are from an
old C school. Feature-wise, I think most of them are still 
necessary. However, it could well be safe
to drop 'HZ' check in |intl_is_utf8| (better, we can just use
IsUTF8 I wrote about above).  

> I want you (jshin) to
> do a thorough code scrubbing to make sure you're only bringing in the
> functionality you need, and that the code is factored properly

  I'll. I believe I haven't brought in anything unnecessary, but I'll
take another look. As for factoring, I'll remove
an static array for Q decoding(256byte saving) in my local copy. This code
doesn't belong where speed is critical so that I can replace array look-up
with a couple of bound checks. 

Comment 31 Boris Zbarsky [:bz] (still a bit busy) 2003-03-13 21:01:36 PST
>  It's UTF-8. I'll make it clear.

Make it return a AUTF8String.  That should make it plenty clear... ;)
Comment 32 Jungshik Shin 2003-03-13 22:48:37 PST
>>  It's UTF-8. I'll make it clear.

>Make it return a AUTF8String.

  Ooops. It's not UTF-8 but it's US-ASCII. Because input urlSpec
is url-unescaped, converted to UTF-8 if necessary, and url-escaped
back before being returned. So I have to keep ACString. 

> nsIEnsureUTF8.idl should not contain C++ goop

  By this, I guess you meant removing CONTRACTID and
CID definitions for class implementing interface
and putting them in the header file
for a collection of CONTRACTIDs and CIDs for  the 
module the implementation belongs to (nsUConvModule in this case). 
Do you want me to go ahead and make this header file and
move all #defines's for CONTRACTIDs and CIDs  in intl/uconv/idl/*.idl
there? Hmm.. I'd rather not for this bug. Because then I have
to hunt for every single file refering  to nsIxxxxx corresponding to
intl/uconv/idl/*idl and get them to include nsUConvCID.h(yet to
be created) as well. 
 
In case of nsIMIMEHeaderParam.idl, I can just remove
C++ defines and put them in preexisting nsNetCID.h.

BTW, why do so many idl files have these defines? I'm aware
that no matter how many files may have this glitch, it doesn't
make the practice right, but am just curious. 

Or, did you mean a totally different thing?
 
  
exist  
Comment 33 Boris Zbarsky [:bz] (still a bit busy) 2003-03-13 23:04:18 PST
Yeah, I meant the CID/Contractid.  There's no particular need for a #define for
the contractid anyway; people should just use the string, imo.  But yeah, you
could create such a file for uconv; no need to do major cleanup, just start with
putting these in there.  ;)

As for why they're in IDL, someone who didn't have a clear understanding of the
difference between an interface and a contract decided it'd save time to do it
and the rest was history.  
Comment 34 Jungshik Shin 2003-03-13 23:37:15 PST
Thank you. I'll do it. BTW, I've just found what I'd been
looking for for a while (methods like ReplaceChar,
StripWhiteSpace..) that would simplify things a bit in my patch.
Why haven't they been  advertised? Is it because new implementations
for them haven't yet been delivered?

Another question.. Is the regular expression engine used by Javascript
interpreter available in a form or another to other parts of mozilla? 
Comment 35 Boris Zbarsky [:bz] (still a bit busy) 2003-03-13 23:44:11 PST
Those functions tend to have somewhat slow implementations, so they're
discouraged except when they're clearly the best option... But if you're in code
where perf is not critical, go for it.

As for regexps... bug 106590
Comment 36 Christian :Biesinger (don't email me, ping me on IRC) 2003-03-14 07:18:39 PST
>  Because there's another one with almost identical name that's only
>used internally and by mailnews as a wrapper.

To me, that sounds like a bad reason...
If it's only used internally, I don't think it should be advertised in an idl
file... and can mailnews directly use the other function?
Comment 37 Jungshik Shin 2003-03-14 14:53:26 PST
> I don't think it should be advertised in an idl
> file... and can mailnews directly use the other function?

  No. That's what I tried to do in my patch last year. Actually, I think
it's possible with some 'liking gymanstics', but Alec replied in comment #23
to my query about that possibility (comment #22 and comment #16 C)
that using an interface is the only (reasonable) way to make methods available
across modules although it sounds like an overkill.  

Anyway, I'll try to come up with better names for both. 
Comment 38 Jungshik Shin 2003-03-19 23:22:56 PST
Created attachment 117822 [details] [diff] [review]
a new patch 

Many of issues raised about the prev. patch were addressed. 
Among them are

  - interface documentation
  - interface definitions tightened up/cleaned up
  - got rid of b-decoder and a related static array
    (also gone are  a static array for q-decoder
     and other dispensable static functions using
     utility functions I found in string library)
  - C++ 'inflitration' in idl files
  - functions/methods were renamed to make them
    more in line with the local convention 
  - empty ctor/dtors inlined
  - now uses |IsUTF8| (patch for bug 193142 is required)
   .....

 I didn't touch most of C-style string manipulation in code
brought in from mailnews (they've been changed and a couple
of loose ends were tightened). Alec, do you want me to
recast them in iterator and things like that? It might be nice,
but do we need to? You know, RFC 2047/2231 handling is
kinda 'black magic' (well, to exaggerate a bit) and I'm
reluctant to touch them in _this_ bug.	 

Anyway, please take a look and hunt for glitches, holes and ... :-)
Comment 39 Jungshik Shin 2003-03-19 23:24:52 PST
Ooops. It's bug 191541 that deals with |IsUTF8|
Comment 40 Christian :Biesinger (don't email me, ping me on IRC) 2003-03-20 10:14:41 PST
Comment on attachment 117822 [details] [diff] [review]
a new patch 

+#if 0

please don't add #if 0 without a comment saying why the block is not just
removed...

+  // assume UTF-8 if the spec contains unescaped non ASCII
+  if (!IsASCII(aSpec)) {

should you use IsUTF8 here?

In +nsMIMEHeaderParamImpl::GetParameter(const nsACString& aHeaderVal, :
+    char *med;
...
+    rv = GetParameterInt(PromiseFlatCString(aHeaderVal).get(), aParamName, 
+			  &charset, aLang, &med); 
...
+    rv = DecodeParameter(nsCAutoString(med), charset, "",  PR_FALSE, str1);

Can you use nsXPIDLCString instead of char* for |med|? (also, even if not, you
should use nsDependentCString instead of nsCAutoString in the DecodeParameter
line)

this would probably also be a good idea for charset

+nsMIMEHeaderParamImpl::GetParameterInt(const char *aHeaderValue, 

is there a reason why you tend to declare variables some lines before you use
them?

+    NS_ASSERTION(!nsCRT::IsAsciiSpace(*str), "1.1 <rhp@netscape.com> 19 Mar
1999 12:00"); /* should be after whitespace already */

er, that assertion message looks very unhelpful. it seems the comment should be
the assertion message.
Comment 41 timeless 2003-03-20 12:11:31 PST
heh, i recognize that mailnews. the NS_ASSERTION ... rhp is because the code
used to PR_Assert and i got sick of my browser being unceremoneously killed by
silly asserts that i didn't understand. someone who understands the code is
welcome to replace the messages with a useful explanation, the messages are
there as they are because the change to NS_ASSERTION would make it relatively
hard for lazy people to find out who to blame and i didn't want people
complaining to me about them :).
Comment 42 Jungshik Shin 2003-03-21 03:18:01 PST
Thank you for reviewing.

+ // assume UTF-8 if the spec contains unescaped non ASCII
+  if (!IsASCII(aSpec)) {

> should you use IsUTF8 here?

No. The question to ask is whether or not to keep that if-block.

By removing that line, we can guard ourselves against mistakes of
calling EnsureUTF8Spec with unescaped (bare/raw) non-UTF-8 spec.
As it is now, if a bare/raw Non-Ascii spec is passed, it just returns 
the input without any change. I moved  that part of the code  from
what was originially written by Naoki. 
Naoki, what would you say? The more I think about it, the more I feel 
like either removing it (so that the input gets subjected to 
checking/ensuring by the rest of the code) or at least calling 
NS_EscapeURL so that it returns escaped URL. 


As for #if 0, it was just left there because I wasn't sure whether this
bug would go before/after bug 191541. Needless to say, I'll either
remove #if 0 block or activate it(and remove the |IsUTF8| depending on 
which patch gets committed first. Hopefully, bug 191541 goes before this one.

In a patch I'm going to upload in a moment,
I cleaned up the code a bit more : made NS_ASSERTION more informative, 
used XPIDLString(along with replacing PR_Malloc with nsMemory::Alloc/Clone
and removing PR_FREEIF a few places) , added more comments to RFC 2231 decoding
routine,
moved some variable definitions closer to where they're used, changed
some variable names to better reflect their usage, etc.

Comment 43 Jungshik Shin 2003-03-21 03:19:14 PST
Created attachment 117999 [details] [diff] [review]
yet another patch
Comment 44 Jungshik Shin 2003-03-21 21:30:38 PST
+#define REPLACEMENT_CHAR "\347\277\275" // UTF-8 encoding of U+FFFD

This is a mistaken I copied from mailnews without a thought. The first
byte should be '\357' (or \xEF).

Any thought, anyone? Fixing this would be a good news to non-English speakers.
 
Comment 45 Jungshik Shin 2003-03-25 09:00:39 PST
Created attachment 118424 [details] [diff] [review]
same as above (with non-essential part removed to help reviewers)

Same patch, but I just removed license blocks and other non-essential parts
hoping that that may help reviewers. Well, it's still very long and
this may be harder to review than the original.  Anyway, Alec, can you take a
look once more and tell me which part you want me to 'scrub' more?
Comment 46 Jungshik Shin 2003-04-17 06:15:39 PDT
Created attachment 120825 [details] [diff] [review]
a cleaned-up patch

I haven't made a transition to iterator, but everybody seems to agree that
we can deal with it in another bug after getting the required 
feature right here.

Below is what I sent to Alec (off-line) to help him better understand my patch.

I'm adding it here to help others as well. BTW, xpfe dependency is to 
fix bug 158006 (and perhaps one or two more in xpfe)

 The dependency relation is shown below. I may have missed a couple,
but should be roughly right. (at least, I am not making
up non-existent dependency :-))
									       
  
Mozilla 'modules':
uriload : getParameter
xpfe : getParameter
mailnews: getParameterInt, decodeRFC2047Header, decodeParameter
									       
  
exposed methods:
getParameter : getParameterInt, decodeParameter
getParameterInt:  [1]
decodeRFC2047Header: Is7bitCharset, DecodeRFC2047Str
decodeParameter : decodeRFC2047Header,
									       
  
static functions:
DecodeRFC2047Str: DecodeQ, CopyRawHeader, ConvertHTab
CopyRawHeader:
ConvertHTab:
Is7bitCharset:
DecodeQ:
Comment 47 Jungshik Shin 2003-04-25 04:39:44 PDT
Created attachment 121646 [details] [diff] [review]
a new patch

basically the same patch against a new tree. 
corrected my mistake in converting Unicode replacement char. in
UTF-8 to octal representation.

There are a few superreviewers here on CC so that I don't know whom to ask for
r/sr
Moreover, this patch is across modules.  alecf, bz, darin, please take this :-)

It's been sitting here long enough, hasn't it? 
nhotta, can you see if my change is all right with nsSMTPService?
 
ducarroz, I carefully preserved mailnews behavior and played with
Mozilla-mail with this patch to make sure that I'm not breaking anything
(spam mail in various encodings/languages with various degree of
standard-compliance helped me a lot with this testing!!). Could you
take a look as well?  

Thank you tons in advance.

P.S.  my patch for bug 150008(blocked by this bug) just got sr and I'm hoping
to land this together with that patch.
Comment 48 Jungshik Shin 2003-04-25 04:44:40 PDT
Sorry for spamming. The bug blocked by this one for which I have a patch with sr is 
bug 158006. 
Comment 49 Boris Zbarsky [:bz] (still a bit busy) 2003-04-25 18:01:05 PDT
jshin, I'm not going to be able to review this in the near future... (like in
the next week at least).
Comment 50 Jungshik Shin 2003-05-01 02:18:44 PDT
In comment #47, I explained how various methods/static functions are used in
necko/xpfe.
Here's the picture on the mailnews side (see  nsIMIMEHeaderParam.idl in my patch.).

getParameterInt() : does the job of |MimeHeaders_get_parameter| in
mailnews/mime/src/mimehdrs.cpp.
Now |MimeHeaders_get_parameter| is a wrapper for
nsMIMEHeaderParam::getParameterInt. Therefore,
mailnews code is not affected. 

decodeRFC2047Header() : |Mime_DecodeMimeHeader| in mailnews/mime/src/mimehdrs.cpp
                        is now a wrapper for this.
decodeParameter()     :  |mime_decode_filename| in mailnews/mime/comi18n.cpp is now
                         a wrapper for this.

As I wrote in my earlier comments, static functions moved from mailnews are NOT
called
by other parts of mailnews. 
Comment 51 Jungshik Shin 2003-05-11 20:47:40 PDT
*** Bug 205232 has been marked as a duplicate of this bug. ***
Comment 52 Jungshik Shin 2003-05-28 22:15:15 PDT
Comment on attachment 121646 [details] [diff] [review]
a new patch

Now that 1.4f branch was cut, let's try to make this bug not celebrate its
anniversary :-)
Comment 53 Christian :Biesinger (don't email me, ping me on IRC) 2003-05-30 07:18:38 PDT
Comment on attachment 121646 [details] [diff] [review]
a new patch

wow, this is a large patch... and I'm no module owner of any of this code...
but ok, I'll take a look at this patch.

In nsIEnsureUTF8.idl:
typo: "covnersion"

"|aSpec|(after" - missing space before the (

is there a reason this interface is not scriptable?

"The presence of non-ASCII characters is *blindly*
+   * regarded as an indication that your input spec is in unescaped UTF-8
"

don't we have an IsUTF8 funtion that you should use instead?

+   * @return empty if |aSpec| is in UTF-8.
+   *	      the URL-escaped converted string in ASCII if not.

not sure I like that. Why not always return a valid, UTF-8, escaped spec?

nsEnsureUTF8.cpp:
+  nsCAutoString inStr(aString);

um no. use this:
const nsAFlatCString& inStr = PromiseFlatCString(aString);

+  PRUnichar *ustr = (PRUnichar *) nsMemory::Alloc(dstLen * sizeof(PRUnichar));

might want to consider using an nsAutoPtr here

+NS_IMETHODIMP	nsEnsureUTF8::ConvertToUTF8
+(const nsACString &aString, const char *aCharset, PRBool aCheck,
+ nsACString &aUTF8String)

hm, earlier you used a different way (return type on own line, parameters on
same line as function name), please be consistent

+  if (!IsASCII(aSpec)) {
+    aUTF8Spec = aSpec;

why not use IsUTF8 here?

+  return rv;
this will always be NS_OK at this point, I would therefore directly write
return NS_OK


Is there a special reason for mixing string/ACString types in the interfaces?
Why not stick to the string classes?

more later... note to self: netwerk/mime/src/nsMIMEHeaderParamImpl.cpp
Comment 54 Christian :Biesinger (don't email me, ping me on IRC) 2003-05-30 12:33:37 PDT
another question - why not remove the #if 0 code in mailnews/mime/src/comi18n.cpp?
Comment 55 Jungshik Shin 2003-05-30 17:18:50 PDT
Thanks a lot  for taking a look at this big patch (well, actually, it's not so
big as it looks because a lot of codes are moved to necko out of mailnews)

> why not remove the #if 0 code in mailnews/mime/src/comi18n.cpp?

I guess that's just kept by mistake while removing all other cases like that.
I'll get rid of it.

> "The presence of non-ASCII characters is *blindly*
> +   * regarded as an indication that your input spec is in unescaped UTF-8"

> don't we have an IsUTF8 funtion that you should use instead?

See comment #42. I'm still waiting for Naoki's answer. Naoki, what would say to
my question in comment #42? 

>   * @return empty if |aSpec| is in UTF-8.
>   *	      the URL-escaped converted string in ASCII if not.

> not sure I like that. Why not always return a valid, UTF-8, escaped spec?

I considered that, but I just left the way its original (by Naoki) does because
I wanted to minimize the change in other parts. Thinking again about it, it
doesn't seem to be a very good justification. There might have been other
reasons. I'll get back to this later. 


> Is there a special reason for mixing string/ACString types in the interfaces?
> Why not stick to the string classes?

  I'd love to, but I was caught in the middle between my desire to be consistent
and my desire to simplify 'calling sequences' (and avoid unnecessary conversion
between string and string class).  I think some of them can be made more
consistent later when I do 'iterator-love', which I plan to in a separate bug.

As for other comments, I'll take care of them in next iteration once you're
through reviewing the current patch.
Comment 56 Christian :Biesinger (don't email me, ping me on IRC) 2003-05-31 10:32:35 PDT
Comment on attachment 121646 [details] [diff] [review]
a new patch

nsMIMEHeaderParamImpl.cpp:
+    // retrun charset.)
typo

+				     const PRBool aTryLocaleCharset, 

why const?

+   * @param aDefaultCharset  MIME charset to use in place of MIME charset
+   *			      specified in RFC 2047 style encoding
+   *			      when <code>aOverrideCharset</code> is set.
+   * @param aOverrideCharset When set, overrides MIME charset specified 
+   *			      in RFC 2047 style encoding with
<code>aDefaultCharset</code>

Couldn't you assume that the presence of aDefaultCharset implies
aOverrideCharset? Or did I misunderstand the meaning of aOverrideCharset?

+	 if (NS_SUCCEEDED(rv) && ensureUTF8 &&

no need to test for both here.

+  if (aEatContinuations) {
+    nsCAutoString temp(aResult);
+    temp.StripChars("\r\n");
+    aResult = temp;
+  }

this is not exactly the fastest way to do this...

|rv| seems to be unused in DecodeRFC2047Header.

+  nsCAutoString unQuoted(aParamValue);
+  unQuoted.ReplaceSubstring("\\\\", "\\");
+  unQuoted.StripChar('\\');
+  aResult = unQuoted;

also quite slow I guess... and, why bother replacing \\ with \ if you're
stripping all \ characters anyway?

+  if (NS_SUCCEEDED(rv) && !decoded.IsEmpty() && !decoded.Equals(aResult))
+    aResult = decoded;

I would think that comparing aResult and decoded is as costly as copying
decoded into aResult, so I wouldn't bother with the comparison.

+  out = dest = (char *)PR_Malloc(length + 1);
[...]
+  memset(out, 0, length + 1);

Use PR_Calloc:
http://www.mozilla.org/projects/nspr/reference/html/prmem2.html#21449

+PRBool Is7bitCharset(const char *input, PRUint32 len)

how does this compare to IsASCII? Also, what is HZ?

also, how about adding comments to the static functions about what they're
doing?

+  // Assume no more than 3X expansion due to UTF-8 conversion

is that a correct assumption?

uriloader/base/nsURILoader.cpp:
+	 rv = mimehdrpar->GetParameter(disposition, "",
+			NS_LITERAL_CSTRING(""), PR_FALSE, nsnull, dispToken);

don't use tabs

Also, can't you remove more of nsURILoader? line 303ff seem to do the same job
as nsIMIMEHeaderParam::GetParameter.

nsExternalHelperAppService.cpp:
+    aChannel->GetURI(getter_AddRefs(mSourceUrl));

uh. you are touching mSourceURL here, the previous code is not. Is that safe?

In comi18n.cpp / MIME_DecodeMimeHeader:
Why bother with returnVal instead of directly returning nsnull?

+  if (NS_FAILED(rv) || !mimehdrpar) 

no need to check both...

+  // I know this looks silly, but we have to do this to avoid 
+  // replacing |PR_FREEIF(s)| at over a dozen places [...]

Uh. If I were to decide, I would replace the PR_FREEIF. in other words: 
if you want to check this in with a review from me, replace the PR_FREEIF.

+    if (NS_FAILED(rv) || !mimehdrpar) 
again only check one... I won't mention this again, please fix all occurences.

+    rv = mimehdrpar->DecodeParameter(nsCAutoString(name), charset, 
+				     opt->default_charset, 
+				      opt->override_charset, result);

indentation looks wrong here. Also, please use nsDependentCString instead of
nsCAutoString. Finally, no need for returnVal in this method, as that is always
nsnull.


done with the patch. marking review- due to the above issues.
Comment 57 Jungshik Shin 2003-06-03 06:52:38 PDT
Created attachment 124819 [details] [diff] [review]
a new patch addressing Chrsitian's concerns

Thanks a lot for your thorough review. This is	a new patch addressing
your concerns. In what follows, I'm gonna reply to your comments only
if  further explanation is necessary.

Two  changes in nsIEnsureUTF8:	Now ConvertToUTF8 has 'aSkipCheck'
(instead of 'aCheck') indicating whether ASCIIness/UTF8ness check has to
be skipped. Along with this, non-ASCII 7bit charsets are better handled
now. EnsureUTF8Spec now returns AUTF8String (unescaped UTF8 spec.)

Other changes were made to address Christian's concerns. Still other
changes are noted below.

>   * @return empty if |aSpec| is in UTF-8.
>   *	      the URL-escaped converted string in ASCII if not.
> not sure I like that. Why not always return a valid, UTF-8, escaped spec?

 It turned out that I forgot to update the documentation after changing
 the behavior of the method. Actually, your comment made me look
 more closely at what it does and made a rather 'drastic' change in
 |EnsureUTF8Spec|. Now its return type is AUTF8String  and it always
 returns  _un_escaped UTF-8 spec. Checking out the way |spec| is used
 and defined in nsIURI, I convinced myself that there's no need to
 url-escape before returning.

> "The presence of non-ASCII characters is *blindly*
> +   * regarded as an indication that your input spec is in unescaped UTF-8"

> don't we have an IsUTF8 funtion that you should use instead?

  At least for now, I think I'd rather leave this alone. The assumption
  behind that is that in Mozilla |spec| (of nsIURI) is always either  in
  UTF-8 (unescaped) or in other charset (*url-escaped*).  By checking
  out ASCIIness at the beginning, we're saving	some cycles it'd take
  us to check UTF8ness and possibly redundant un-escaping/conversion.


+   * @param aDefaultCharset  MIME charset to use in place of MIME charset
+   *		      specified in RFC 2047 style encoding
+   *		      when <code>aOverrideCharset</code> is set.
+   * @param aOverrideCharset When set, overrides MIME charset specified 
+   *		      in RFC 2047 style encoding with aDefaultCharset

> Couldn't you assume that the presence of aDefaultCharset implies
> aOverrideCharset? Or did I misunderstand the meaning of aOverrideCharset?

  No, I can't. Sorry the documentation was not clear
  enough. aDefaultCharset is used in two cases: 1. charset is not
  available  because raw/bare string without charset specified per rfc
  2231/2047 is passed over. (our best guess is aDefaultCharset) 2. when
  aOverrideCharset is set even with  charset available. (charset is
  overriden by aDefaultCharset.)

+    nsCAutoString temp(aResult);
+    temp.StripChars("\r\n");
+    aResult = temp;

> this is not exactly the fastest way to do this...

I'm aware of that :-) (see comment #35)

+  nsCAutoString unQuoted(aParamValue);
+  unQuoted.ReplaceSubstring("\\\\", "\\");
+  unQuoted.StripChar('\\');
+  aResult = unQuoted;

> also quite slow I guess... and, why bother replacing \\ with \ if you're
> stripping all \ characters anyway?

  The logic was flawed and I replaced the above with iterator code.

+PRBool Is7bitCharset(const char *input, PRUint32 len)

> how does this compare to IsASCII? Also, what is HZ?

 There are 7bit charsets that are NOT US-ASCII. Examples include
ISO-2022-JP(-2), ISO-2022-CN, HZ-GB2312 and ISO-2022-KR. ISO-2022-JP
is still widely used for Japanese emails/news postings and HZ-GB* was
popular in early 1990's for Chinese emails and news postings but still
is used occasionally. We should be careful not to treat ISO-2022-JP*
and HZ* as US-ASCII when dealing with mail/news. Your question insipired
me to do some more clean-up, better documentation and some changes in
nsEnsureUTF8::ConvertToUTF8.


+  // Assume no more than 3X expansion due to UTF-8 conversion

> is that a correct assumption?

  Yes, 3x is sufficient.  A couple of weeks ago, I estimated this
mulitplication factor again for another program.

1.  no known ISO-8859-x/Windows-x/KOI8*/other single byte charsets can
encode non-BMP characters.  

2. all BMP characters can be expressed in three or fewer bytes in UTF-8 

3. ISO-2022-JP can encode half-width kana with a single byte (excluding
escape sequence) and a half-width kana	takes 3bytes in UTF-8.	The multiplier 

is at most three.

4. GB18030 and EUC-JP(JISX 0213) can encode non-BMP characters (4bytes
in UTF-8), but their representations in GB18030 and EUC-JP(JIS X 0213)
take at least two bytes (actually three in both cases) so that the
multiplication factor is certainly smaller than 3.

5. In addition,  we have extra room here because the input includes RFC 2047
overhead ('=?charset_name?......?=')

> Also, can't you remove more of nsURILoader? line 303ff seem to do the same
job
> as nsIMIMEHeaderParam::GetParameter.

   Thanks for the catch. I removed the redundant code.

nsExternalHelperAppService.cpp:
+    aChannel->GetURI(getter_AddRefs(mSourceUrl));

> uh. you are touching mSourceURL here, the previous code is not. Is that safe?


  I guess it's safe, but it's redudant. I changed it to use GetSource()
  method that retrieves mSourceUrl.

+  // I know this looks silly, but we have to do this to avoid 
+  // replacing |PR_FREEIF(s)| at over a dozen places [...]

> Uh. If I were to decide, I would replace the PR_FREEIF. in other words: 
> if you want to check this in with a review from me, replace the PR_FREEIF.

  I changed two out parameters (charset and language)
to be nsMemory::Free'd, which puts this function on par with other helper
functions that are wrappers over methods of nsIMIMEHeaderParam. Changing
return value as well is too risky because it's not always freed
immediately (sometimes it's hard to find where it's freed.).  I'd end
up modifying more than 60 lines in over a dozen additional files in
mailnews/mime/src.
Comment 58 Jungshik Shin 2003-06-03 07:01:27 PDT
Created attachment 124823 [details] [diff] [review]
diff. bet. the second latest and the latest patch

I don't know how much this diff. will help because it doesn't look like it's
easy to read..
Comment 59 Jungshik Shin 2003-06-03 09:33:58 PDT
Comment on attachment 124819 [details] [diff] [review]
a new patch addressing Chrsitian's concerns

Christian, can you take a look again? Hopefully, it won't take as long this
time.
Alec, will you take a look sometime later (perhaps after bug 206379 :-))?
Comment 60 Alec Flett 2003-06-04 15:58:28 PDT
Comment on attachment 124819 [details] [diff] [review]
a new patch addressing Chrsitian's concerns

now that EnsureUTF8Spec() is unconditionally outputting to the destination
string (Which I think is the right thing, btw) I think it needs a new name. How
about convertToUTF8WithCharset(...);?

as for getParameterInt() that sounds like its getting an integer version. Don't
shorten "Int" to "Internal" - just make it getParameterInternal()

But can't we just describe it better? What's the difference between the two?

Also nsIEnsureUTF8 is a bad name for a class - "ensure" is a verb, classes
should be nouns, like "nsIUTF8Converter" or "nsIUTF8ConverterService" or
something.

Also, we now have a CopyUCS2toUTF8() that you should use - it goes through the
old code which does what you've written, but at least this way we can correctly
implement CopyUCS2toUTF8 later and get a perf. improvement

+      nsCAutoString tempStr(valueStart, valueEnd - valueStart);
+      tempStr.StripChars("\r\n");
+      *aResult = (char*)nsMemory::Clone(tempStr.get(), tempStr.Length() + 1);

use ToNewCString() here.

+	   *aLang = (char *) nsMemory::Clone(sQuote1 + 1, sQuote2 - (sQuote1 +
1) + 1);

isn't there an nsCRT::strndup() or something? (if not, don't worry about it)


+	   else if (ns != *aResult)
+	     *aResult = ns;

don't bother checking, just assign.


+  // Assume no more than 3X expansion due to UTF-8 conversion
+  retbuff = (char *)PR_Malloc(3 * strlen(header) + 1);

it would be really nice to make this a nsCAutoString, or to just assign into
"aResult" - there is serious possibility here for buffer overflow, I think...
in fact if there is ANY way that the resulting string could be longer than 3x
the incoming string, we MUST switch to nsCAutoString.


the rest of the code looks ok, as best as I can tell. I'm going to sr- this for
now, but I think the cleanups should be pretty straight forward.
Comment 61 Jungshik Shin 2003-06-05 05:13:49 PDT
Created attachment 124986 [details] [diff] [review]
a new patch per Alec's comment

alecf, thank you for the review. This patch addresses all your concerns. 

The name changes:

EnsureUTF8 -> UTF8ConverterService [1]
EnsureUTF8Spec -> ConvertURISpecToUTF8
ConvertToUTF8 -> ConvertStringToUTF8

getParameterInt -> GetParameterInternal

+	   *aLang = (char *) nsMemory::Clone(sQuote1 + 1, sQuote2 -..
									       

> isn't there an nsCRT::strndup() or something? (if not, don't worry > about
it)

 There is, but I can't use it because nsCRT::strndup(char *) returns malloced
buffer (nsCRT::strndup(PRUnichar *) returns nsMemory::Alloc'd buffer) and I
need nsMemory::Alloc'd buffer.
 
BTW, GetParameterInternal does only RFC 2231 parsing while GetParameter does a
lot more(RFC 2047 fallback, fallback charset, raw 8bit string, etc). I'd not
have made the former a method if it had not been used extensively in mailnews.
I added a comment to this effect.  

+  // Assume no more than 3X expansion due to UTF-8 conversion
+  retbuff = (char *)PR_Malloc(3 * strlen(header) + 1);
									       

> it would be really nice to make this a nsCAutoString, or to just 
> assign into
> "aResult" - 

 Although there's no possibility of the buffer overflow, this is a nice thing
to do anyway, which helped me cut down the code size more.
  
> there is serious possibility here for buffer overflow, 
> I think...
> in fact if there is ANY way that the resulting string could be 
> longer than 3x
> the incoming string, we MUST switch to nsCAutoString.

  I can assure :-) you that (3 * inlen + 1) is the worst possible case.   Note
that input string cannot be expanded in an arbitrary manner but can only be
expanded under our control. The worst case is when somebody gives us a string
made soley of  octets illegal in both UTF-8 and defaultCharset. Another case is
when SCSU/BCSU is used and the sliding window of SCSU/BCSU is set into a
non-BMP block. However, currently Mozilla does not support SCSU/BCSU(see
intl/uconv. it may never.) so that even if Mozilla's given such a string, it
won't be interpreted that way (it'll be just treated as ASCII.) 


[1] As for nsIUTF8ConverterService, I'm not sure if that's the best choice. At
the moment, the name fits what it does. However, I found that there are a few
static functions through out the tree that may benefit from a potential
expansion of this interface. They're  ConvertToUnicode (in nsAppShell),
ConvertStringToUnicode (nsProfile) and ConvertStringToUTF8. The last one needs
only 'impedance' matching . If the former two's are also folded under this
interface, 'UTF8' in the name has to be changed. Unless it's an absolute evil
to lose the file history by renaming, I'd rather just go with
nsIUTF8ConverterService for this patch	and will change the file(interface)
name later if we do add ConvertToUnicode.
Comment 62 Jungshik Shin 2003-06-05 05:19:52 PDT
Comment on attachment 124986 [details] [diff] [review]
a new patch per Alec's comment

Asking r/sr.

Other changes: 

1. static ConvertHTab is now gone. It was used in two places. One of them was
replaced with ReplaceChar (it's slow, I know) and the other is now a simple
loop over a char. pointer.

2. Brand-new CopyUTF8toUTF16 is used in place of NS_ConvertUTF8toUCS2 in three
places. I also replaced NS_ConvertUCS2toUTF8 with CopyUTF16toUTF8 as suggested
by alecf.
Comment 63 Alec Flett 2003-06-05 09:34:07 PDT
That all sounds great. If we're talking about general conversions (like
replacing the ConvertToUnicode()) another option might be to fix the
nsICharsetConverterManager methods to use nsAString/nsACString - the service was
written way before these strings were available to us.. that is why their API is
so obtuse. 

I'll finish this review today...
Comment 64 Christian :Biesinger (don't email me, ping me on IRC) 2003-06-05 10:03:02 PDT
hm, if UTF8ConverterService is a bad name because you might add non-utf8 unicode
methods, why not name it UnicodeConverterService?

I'll also try to finish the review later today.
Comment 65 Alec Flett 2003-06-05 11:01:53 PDT
because at that point it sure sounds a lot like nsICharsetConverterService
Comment 66 Jungshik Shin 2003-06-05 11:29:46 PDT
I shouldn't have opened a 'can of worms' :-). For this bug, let's just go with
nsUTF8ConverterService because that name represents what it does now well
although we have to risk losing the file change history later when we decide to
expand this interface and change the file/interface name. See also bug 54857.
Something similar to Alec's alternative was briefly mentioned there. 

TIA for  reviews ^-^
Comment 67 Christian :Biesinger (don't email me, ping me on IRC) 2003-06-05 14:07:48 PDT
Comment on attachment 124986 [details] [diff] [review]
a new patch per Alec's comment

just noticed some nits in nsIUTF8ConverterService.idl:
+ * Portions created by the Initial Developer are Copyright (C) 2002
+ * the Initial Developer. All Rights Reserved.

we have 2003 now :)

+   */
+
+    AUTF8String convertStringToUTF8(in ACString aString, 

The AUTF8String line is a bit too far indented, should be two spaces less (also
the arguments for this function are indentet too much)
same for the other function on this interface


ok now to more serious stuff:
+  nsCOMPtr<nsIAtom> charsetAtom;
+  rv = ccm->GetCharsetAtom2(aCharset, getter_AddRefs(charsetAtom));

isn't alecf rewriting that stuff in another bug?
would it make sense to wait with landing this patch until that other patch is
ready, and use the new code?

+#define REPLACEMENT_CHAR "\357\277\275" // EF BF BD (UTF-8 encoding of U+FFFD)

would it be useful to mention what character U+FFFD is ("unknown character")?

mailnews/mime/src/mimehdrs.cpp
+  return NS_SUCCEEDED(rv) ? PL_strdup(result.get()) : nsnull; 

in comi18n.cpp, you used an explicit if instead of ?:... is there a special
reason for that? it would be good if you could be consistent, unless of course
you did that to be consistent with the current style of the file.

other than that, you can have r=biesi; assuming that my review is acceptable to
alecf, given that I'm no module owner.
Comment 68 Jungshik Shin 2003-06-05 14:35:55 PDT
Thanks a lot for the review.

> isn't alecf rewriting that stuff in another bug?
> would it make sense to wait with landing this patch until that other 
> patch is ready, and use the new code?

 Absolutely. That's my plan. I find it a bit hard to shuttle between two
versions(one with his patch and the other without) so that I just uploaded one
with the old interface. The effect of alecf's patch on this patch would be very
small (a few changes in Makefile.in, include files and just one change in
charset. cvt. manager invocation you spotted. 

I'll fix the indentation and "if vs '? :". As for U+FFFD, its name (Unicode
name) is 'REPLACEMENT CHARACTER' :-) See
http://www.unicode.org/charts/PDF/UFFF0.pdf  In Mozilla and elsewhere, U+FFFD is
used to mark unknown characters, invalid byte sequences, and so forth. 
Comment 69 Jungshik Shin 2003-06-07 17:22:19 PDT
While testing on Win2k, I found a problem I overlooked on Linux(I should have
paid attention to NS_ASSERT passing by in my terminal). 'NS_ASSERT' on Win32 is
so annoying that I can never miss it :-) After fixing this, it works as well on
Win2k as on Linux. 

In the following two places, I had to replace nsDependent(C)String with
Substring() as suggested in nsDependentString.h.

In nsUTF8ConverterService.cpp

> +  rv = unicodeDecoder->Convert(inStr.get(), &srcLen, ustr, &dstLen);
> +  if (NS_SUCCEEDED(rv))
> +    CopyUTF16toUTF8(nsDependentString(ustr, dstLen), aResult);
  +    CopyUTF16toUTF8(Substring(ustr, ustr + dstLen - 1), aResult);
  

In nsMIMEHeaderParamImpl.cpp

> +      cvtUTF8->ConvertStringToUTF8(nsDependentCString(aInput, aLen),
> +      aDefaultCharset, skipCheck, utf8Text))) {

+      cvtUTF8->ConvertStringToUTF8(Substring(aInput, aInput + aLen - 1),
+      aDefaultCharset, skipCheck, utf8Text))) {
Comment 70 Jungshik Shin 2003-06-07 22:49:22 PDT
> cvtUTF8->ConvertStringToUTF8(Substring(aInput, aInput + aLen - 1)

ooops. in both cases, '-1' should not be there. I didn't check out the def. of
Substring().

Another loose end found by running on Windows : In nsMIMEHeaderParamImpl.cpp : 
+      if (str == start)
+        return NS_ERROR_UNEXPECTED;
+      *aResult = (char *) nsMemory::Clone(start, (str - start) + 1);
+      NS_ENSURE_TRUE(*aResult, NS_ERROR_OUT_OF_MEMORY);

I forgot to null-terminate. 

      (*aResult)[str - start] = '\0';  // null-terminate

Certainly, it looks like another case for nsACString. I'll switch to it when
doing 'iterator-love'
Comment 71 Alec Flett 2003-06-10 13:47:38 PDT
Comment on attachment 124986 [details] [diff] [review]
a new patch per Alec's comment

+// XXX Do we have to check UTF-7? To be perfectly safe, we have to, but
+// who uses UTF-7 in the mail header? 

should probably check with bienvenu@netscape.com to see - IMAP uses UTF7 I
believe.

+  // B. title*=us-ascii'en-us'This%20is%20weired.

perhaps you meant "wierd"? :)

+// XXX : aTryLocaleCharset is not yet effective.

should you NS_ASSERTION() if someone tries to use it then?

other than that, this all looks good!

sr=alecf with the above changes/updates
Comment 72 Jungshik Shin 2003-06-11 02:54:22 PDT
Thank you for sr. 
I'll add NS_ASS..() and fix spelling errors. I'll land it with a necessary
change after you check in charsetAtom removal patch.

As for UTF-7, a modified form of UTF-7 is used for IMAP4rev? folder names, but
my code is a world apart from the code handling IMAP folder names (it's about
decoding the message headers : RFC 2047/2231). Maybe I'd better remove the
comment because 'perfect' there means not failing even once in 10^30(?) trials :-). 
Comment 73 Jungshik Shin 2003-06-11 04:10:03 PDT
alec>+// XXX Do we have to check UTF-7? To be perfectly safe, we have to, but
alec>+// who uses UTF-7 in the mail header? 

alec>should probably check with bienvenu@netscape.com to see

js> comment because 'perfect' there means not failing even once in 
js> 10^30(?) trials :-)

 Oops. Sorry I misread the 'context'. I thought the code in question is about
being generous in what you accept (that is, even though it's untagged
_violating_ the standard so that we have no oblication, we try our best to
figure out what's meant by a sender) but it turns out that the code is used when
we handle 'standard-compliant' cases (properly tagged). Still, the chance of
getting something like the following 

Subject:  =?UTF-7?Q?+rACsAQ?=    for &#44032;&#44033;   
Subject : =?UTF-7?Q?caf+AOk?=    for café  (set encoding to UTF-8) 

Or

Content-Disposition: attachment; filename*=utf-7''caf+AOk.txt

 is not much higher than 1 in 10^30 :-). However, if somebody points out that
it's a bug for Mozilla not to decode the above. I can't help admitting that she
is correct in principle although I'm tempted to counter that nobody in their
right mind would use it. How much are we willing to pay for this? Well, it's
another strcasecmp() and I'll add that to be correct for even very remotely
possible cases.

 bienvenu, nhotta and duccaroz, what would you say?
Comment 74 Boris Zbarsky [:bz] (still a bit busy) 2003-06-12 22:53:27 PDT
Hmm... So I just looked at the nsURILoader and nsExternalHelperAppService
changes (the only ones I care about, really) and it looks like both have the
same problem -- passing a length to EqualsIgnoreCase will return true if that
many chars match.... so "inlines" and "inline" are equal if you pass 6 as the
second arg to EqualsIgnoreCase, no?

One other question I have is why we pass the origin charset in when decoding the
filename in nsExternalHelperAppService but not when getting the disposition
type... is there a reason?

Other than that, the code refactoring looks awesome.
Comment 75 Jungshik Shin 2003-06-12 23:47:27 PDT
Fix was checked a while ago. 
 
Thanks for the catch.  
 
Instead of |dispToken.EqualsIgnoreCase("inline", 6)|, I should have used 
|dispToken.EqualsIgnoreCase("inline", dispToken.Length())|, shouldn't I? 
 
I'll make a quick patch. Somebody might say that Mozilla should be generous to 
accept 'inlines' as well, but I guess in this case, we'd rather play by the 
rules :-) 
 
As for not passing originCharset, we don't have to when fetching the 
disposition type which is always in US-ASCII (unless somebody makes an 
'over-i18n' proposal  and IETF accepts it ;-)). In case of filename parameter, 
there are a lot of broken CGI programs (perl, jsp, asp) in the wild that emit 
raw 8bit characters expecting clients to interpret them as encoded in 
originCharset. So, we have to 'bend' the rules and be a bit accomodating....    
Comment 76 Christian :Biesinger (don't email me, ping me on IRC) 2003-06-13 07:30:37 PDT
jshin: well if you're at that, why not just don't pass a length?
Comment 77 Jungshik Shin 2003-06-13 08:44:06 PDT
Created attachment 125584 [details] [diff] [review]
uriloader : 4-line fix

let me get quick r/sr
Comment 78 Boris Zbarsky [:bz] (still a bit busy) 2003-06-13 09:46:11 PDT
Comment on attachment 125584 [details] [diff] [review]
uriloader : 4-line fix

sr=me.	This is exactly what I wanted; passing the other length would be just
as bad when dispToken is too short...
Comment 79 Benjamin Smedberg [:bsmedberg] 2003-06-13 13:57:38 PDT
Did the checkin for this cause bug 209329 (mailto links don't work at all)?
Comment 80 Christian :Biesinger (don't email me, ping me on IRC) 2003-06-13 14:31:41 PDT
Comment on attachment 125584 [details] [diff] [review]
uriloader : 4-line fix

looks right
Comment 81 Jungshik Shin 2003-06-13 19:26:45 PDT
Fix checked in. Thank you all!  I'm done with this one. 
Comment 82 Scott MacGregor 2003-06-17 14:12:21 PDT
This fix broke mailto urls in mailnews. nsSmtpService.cpp ends up setting an
empty spec on new mailto urls. Did you test this before checking in your change
to the smtp service? =( 

=)

Comment 83 Scott MacGregor 2003-06-17 14:18:19 PDT
Created attachment 125849 [details] [diff] [review]
fix to make mailto urls work in mail again
Comment 84 David :Bienvenu 2003-06-17 14:33:34 PDT
Comment on attachment 125849 [details] [diff] [review]
fix to make mailto urls work in mail again

sr=bienvenu
Comment 85 Christian :Biesinger (don't email me, ping me on IRC) 2003-06-17 14:35:34 PDT
mscott: please see bug 209328 which covers mailto uris not working
Comment 86 Jungshik Shin 2003-06-17 18:11:41 PDT
This patch is not complete although it is simple and works on the surface. 
Please, see bug 209328.
Comment 87 Christian :Biesinger (don't email me, ping me on IRC) 2003-06-18 01:38:53 PDT
well, that patch was checked in anyway:
06/17/2003 22:34	scott%scott-macgregor.org 	mozilla/ mailnews/ compose/ src/
nsSmtpService.cpp 	1.125 	1/1  	fix the regression caused by Bug #162765 which
broke mailto urls.

without any review or super-review

Comment 88 Boris Zbarsky [:bz] (still a bit busy) 2003-06-18 02:01:13 PDT
It had sr, was written by the module peer, and fixed a bug that is still open 5
days after filing when it should have been a smoketest blocker, imo....
Comment 89 Jungshik Shin 2003-06-18 02:25:27 PDT
Sorry for the regession. BTW, please note that the patch was uploaded within 7
hrs of the report. Perhaps, I should have asked people on mailnews for r/sr. (my
patch uploaded to bug 209328 is  across modules unlike the one liner just
checked in). Anyway, I'll take care of the rest in bug 209328 as soon as  I get r. 
Comment 90 Scott MacGregor 2003-06-18 09:11:02 PDT
To answer Christian's comments, I checked this in right away because it fixed a
nasty regression in mail which I think should have been a blocker and caught
during the testing of the patch before it went in. 

Now that it's in, feel free to change how this was fixed via Bug #209328 as you
pointed out.
Comment 91 Jungshik Shin 2003-09-01 06:45:05 PDT
*** Bug 206788 has been marked as a duplicate of this bug. ***
Comment 92 Max Alekseyev 2006-09-19 16:35:26 PDT
It looks like I'm facing a similar bug to bug 206788 that is duplicate of this one.
Please take a look at bug 353113. Thanks.

Note You need to log in before you can comment on or make changes to this bug.