Last Comment Bug 384571 - RFC 2231 parameters not decoded when appearing in reversed order
: RFC 2231 parameters not decoded when appearing in reversed order
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Networking (show other bugs)
: Trunk
: All All
: -- normal with 1 vote (vote)
: mozilla14
Assigned To: Julian Reschke
:
: Patrick McManus [:mcmanus]
Mentors:
: 588414 (view as bug list)
Depends on:
Blocks: 609667 783502
  Show dependency treegraph
 
Reported: 2007-06-15 02:54 PDT by roger.merat
Modified: 2012-09-10 16:44 PDT (History)
11 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Proposed patch, incl test case (23.88 KB, patch)
2011-10-20 08:20 PDT, Julian Reschke
no flags Details | Diff | Splinter Review
Proposed patch, incl test case (23.63 KB, patch)
2011-11-19 06:08 PST, Julian Reschke
no flags Details | Diff | Splinter Review
Proposed patch, incl test case (23.63 KB, patch)
2011-11-22 13:37 PST, Julian Reschke
no flags Details | Diff | Splinter Review
Proposed patch, incl test cases (24.34 KB, patch)
2011-12-28 14:12 PST, Julian Reschke
jduell.mcbugs: review-
Details | Diff | Splinter Review
Proposed patch, incl test case (work-in-progress) (24.37 KB, patch)
2012-02-04 05:25 PST, Julian Reschke
no flags Details | Diff | Splinter Review
Proposed patch, incl test case (work-in-progress) (27.38 KB, patch)
2012-02-04 06:42 PST, Julian Reschke
no flags Details | Diff | Splinter Review
proposed patch, work-in-progress (28.78 KB, patch)
2012-02-13 07:53 PST, Julian Reschke
no flags Details | Diff | Splinter Review
Proposed patch, incl test case (work-in-progress) (28.73 KB, patch)
2012-02-26 08:05 PST, Julian Reschke
no flags Details | Diff | Splinter Review
Proposed patch, incl test case (28.08 KB, patch)
2012-02-26 11:55 PST, Julian Reschke
jduell.mcbugs: review+
Details | Diff | Splinter Review
Proposed patch, incl updated test case results, plus new tests (28.69 KB, text/plain)
2012-03-02 05:32 PST, Julian Reschke
jduell.mcbugs: review+
Details
Proposed patch, incl updated test cases (24.48 KB, patch)
2012-03-04 14:12 PST, Julian Reschke
jduell.mcbugs: review+
Details | Diff | Splinter Review
clarified test case for reading past string boundaries (1.98 KB, patch)
2012-03-19 21:48 PDT, Jason Duell [:jduell] (needinfo me)
julian.reschke: review+
Details | Diff | Splinter Review
fix string overrun issue (947 bytes, patch)
2012-03-19 22:02 PDT, Jason Duell [:jduell] (needinfo me)
julian.reschke: review+
Details | Diff | Splinter Review
original patches with Jason's enhancements (24.88 KB, patch)
2012-03-20 06:53 PDT, Julian Reschke
jduell.mcbugs: review+
Details | Diff | Splinter Review

Description roger.merat 2007-06-15 02:54:45 PDT
User-Agent:       Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8.1.4) Gecko/20070509 SeaMonkey/1.1.2
Build Identifier: Thunderbird 2.0.0.4 (20070604)

With RFC 2231 format, multiple parameters are used to encapsulate a single long parameter value. When these multiple parameters appear in ascending order, TB decode them correctly; but when they appear in reversed order (i.e. after been rewritten by a MTA), TB loose the parameter value.


For example, when TB receives

--Boundary_(ID_xK07D+TeHe1fzNTiytW9Hg)
Content-type: text/plain; CHARSET=iso-8859-1
Content-transfer-encoding: BASE64
Content-disposition: inline; filename*1=septttttt-huittttttt.txt;
 filename*0=unnnnnnnn-deuxxxxxx-troisssss-quatreeee-cinqqqqqq-sixxxxxxx-


the filename incorrectly appears as Part 1.2, while when TB receives

--Boundary_(ID_xK07D+TeHe1fzNTiytW9Hg)
Content-type: text/plain; CHARSET=iso-8859-1
Content-transfer-encoding: BASE64
Content-disposition: inline;
 filename*0=unnnnnnnn-deuxxxxxx-troisssss-quatreeee-cinqqqqqq-sixxxxxxx-;
 filename*1=septttttt-huittttttt.txt


the filename appears correctly as "unnnnnnnn-deuxxxxxx-troisssss-quatreeee-cinqqqqqq-sixxxxxxx-septttttt-huittttttt.txt


Reproducible: Always

Steps to Reproduce:
1. send a message containing a file with long filename using TB with mail.strictly_mime.parm_folding = 2
2. if the filename*n parameters are in reversed order in the Content-disposition: header line, the file name is lost 



Expected Results:  
TB should decode the filename*n parameter value rather than replacing it with Part 1.2

Sun JES MTA may rewrite multiple parameters in reversed order when the antivirus is configured using the conversion channel
Comment 1 WADA 2007-06-15 15:52:49 PDT
I think the header made by Sun JES MTA violates RFC2231, because RFC2231 is combination of "folder folding/unfolding" and "parameter splitting" instead of "multiple keyword parameters which has sequence number, placed any where".
Section name in RFC is "Parameter Value ***Continuations***".

>rfc 2231
> 3.  Parameter Value Continuations
>(snip)
> The count starts at 0 and increments by 1 for each subsequent section of the parameter value.
I think "each subsequent section" doesn't imply "each section can appear anywhere in a header", so I believe Sun JES MTA violates rule when header generation.

However, "in order" in RFC for recovery to original parameter is ambiguous.
>   The original parameter value is recovered by concatenating the
>   various sections of the parameter, in order.
So Sun JES MTA seems to consider it "concatenate splited parameter in order of decimal number, i.e. concatenate after sort by decimal number".

Changing logic for "recovery to original parameter" to SUN JES MTA style won't destroy correct RFC 2231 compliant header. So I'm not opposite to "enhancement for tolerance".
Comment 2 roger.merat 2007-06-18 00:52:36 PDT
I agree that RFC 2231 is ambiguous. It states that

- the mechanism MUST NOT depend on parameter ordering
- The solution is to use multiple parameters to contain a single parameter value
- The asterisk character ("*") followed by a decimal count is employed to indicate that multiple parameters are being used to encapsulate a single parameter value

As it does not say anything about the multiple parameters order, it is safer not to rely on ascending order. That is, for example, what the Horde IMP webmail User Agent does: it decodes correctly filename values coded in multiple parameters in reversed order as well as in ascending order.

BTW notice that N.Freed is co-author of RFC 2231 and developer of Sun JES MTA.
Comment 3 WADA 2007-06-18 01:26:04 PDT
(In reply to comment #2)
> BTW notice that N.Freed is co-author of RFC 2231 and developer of Sun JES MTA.
Can you invite him(or her) to this bug for "what is right way"? 

Comment 4 Chris Newman 2009-02-11 15:23:02 PST
I work with Ned and just asked him this question.  He says he thinks it's a bug in the spec that it doesn't express an ordering requirement when multiple parameters are used to encapsulate a single parameter value.  However, as the spec has been published for so long it's too late to fix this problem, so he has implemented it in his code so that order does not matter (and he found it a royal pain to do so).

In an upcoming release of Sun JES MTA (Ancho, not yet released), it will always emit 2231 multi-parameters in order, even if they were out-of-order when received.
Comment 5 Wayne Mery (:wsmwk, NI for questions) 2010-11-09 14:21:19 PST
(In reply to comment #4)
> In an upcoming release of Sun JES MTA (Ancho, not yet released), it will always
> emit 2231 multi-parameters in order, even if they were out-of-order when
> received.

so where does that leave this bug, and should it move to attachments or mime component?
Comment 6 Chris Newman 2010-11-09 22:10:11 PST
Since the spec isn't changing, a correct implementation of 2231 should re-order fragmented parameters if they are out of order before re-assembling them. Thunderbird's implementation is not correct. Since current releases of Oracle's MTA (former Sun) correctly reorder parameters so the client doesn't have to worry, that might reduce the priority of fixing this bug in Thunderbird.
Comment 7 Julian Reschke 2011-10-12 11:30:52 PDT
*** Bug 588414 has been marked as a duplicate of this bug. ***
Comment 8 Julian Reschke 2011-10-20 08:20:37 PDT
Created attachment 568398 [details] [diff] [review]
Proposed patch, incl test case

Functional changes:

- Continuations are collected in a linked list, to be processed at the end.

To get there, I made a few more changes:

- rather than short-circuiting, we always collect all information and only decide at the (only) exit point which param to use

- to do that, we collect param values for caseA, caseB and caseCorD separately (and for the two latter cases, their charset values)

- made the code that copies fragments into new strings more readable by calculating the length only once

- lots of other minor cleanups

I hope I haven't added a memory leak, and I also think I fixed one that would occur when both caseCorD and caseB are present (in this order), and both specify the charset.
Comment 9 Julian Reschke 2011-11-19 06:08:13 PST
Created attachment 575658 [details] [diff] [review]
Proposed patch, incl test case

Patch updated so that it applies on top of the change for bug 703015
Comment 10 Julian Reschke 2011-11-22 13:37:49 PST
Created attachment 576242 [details] [diff] [review]
Proposed patch, incl test case

Updated patch (fixes gcc complaining); try results: https://tbpl.mozilla.org/?tree=Try&rev=786562f9f2e8
Comment 11 Julian Reschke 2011-11-28 05:35:35 PST
Comment on attachment 576242 [details] [diff] [review]
Proposed patch, incl test case

(patch needs to be updated)
Comment 12 Julian Reschke 2011-12-28 14:12:12 PST
Created attachment 584641 [details] [diff] [review]
Proposed patch, incl test cases

Functional changes:

- Continuations are collected in a linked list, to be processed at the end.

To get there, I made a few more changes:

- rather than short-circuiting, we always collect all information and only decide at the (only) exit point which param to use

- to do that, we collect param values for caseA, caseB and caseCorD separately (and for the two latter cases, their charset values)

- made the code that copies fragments into new strings more readable by calculating the length only once

- lots of other minor cleanups

I hope I haven't added a memory leak, and I also think I fixed one that would occur when both caseCorD and caseB are present (in this order), and both specify the charset.

Try results: https://tbpl.mozilla.org/?tree=Try&rev=6c352fab1e30
Comment 13 Jason Duell [:jduell] (needinfo me) 2012-01-24 13:16:27 PST
Mark/David,

How important is the feature this bug would implement (ability to parse MIME header continuations if they're out of order, i.e. not 0, 1, 2, ...) to Thunderbird? (For HTTP we're heading towards using RFC 5987, which bans continuations.)  Abiding by a standard is good and well, but this patch will take some work and will involve some risk, and I'd like to get a sense of whether we'd rather try to say, change the spec so that it requires continuations to be in order (which seems to be a de-facto standard for the moment, given that we don't support otherwise and no one's been complaining AFAICT).
Comment 14 Jason Duell [:jduell] (needinfo me) 2012-01-24 13:25:12 PST
Comment on attachment 584641 [details] [diff] [review]
Proposed patch, incl test cases

Julian,

If we do take this there's a bunch of stuff I'd like to see changed.

1) A linked list isn't the best data structure for accumulating continuations.

For instance, when you accumulate results, you need to do N number of linear searches through the list, which is an n-squared algorithm.  The code will be both simpler and more efficient is you use an nsTArray<> instead: that way you can look up elements in constant (vs linear time) and accumulate them in order N time. Using nsTArray will also automatically get its destructor called (and those of the elements it contains) on any function exit, whereas now we've got to make sure we call combineContinuations if we've ever called addContinuation (I'm not sure we do: I haven't tried to comb through to check).

2) >  I hope I haven't added a memory leak.

I'm nervous at how many raw char *'s this code now has that have to be kept track of to avoid a memory leak.  I would be much happier if (in addition to the memory management added by a nsTArray destructor) we used something similar to nsAutoPtr to automatically delete these char*'s if they are !null (unless we wish to pass them back to the client, in which case we can call forget() on them to release them from nsAutoPtr deletion).   Alas, we can't use nsAutoPtr, because it called delete (we need to call free, since these are malloc'd).  We could probably take nsAutoFree (currently hidden in dom/base/nsJSEnvironment.cpp), add a Get() method to it, and use that.  Or something like that.   

A better alternative might be to change DoParameterInternal to actually take nsXPIDLCString's instead of char * (we're passing an nsXPIDLCString already), which would allow us to use proper string classes (and their memory mgmt) within DoParameterInternal w/o needing to convert them back to char * (and absorb yet another strdup hit) to write out the result.

And then there's the question of making sure none of this code breaks the net in any way, which I haven't looked at carefully enough yet.   First I want to find out if we really want this patch in the first place--I'm not sure the engineering/risk is worth the rather trivial benefit.

Minor:

So I see the existing code strips out '\r\n' (ie does line unwrapping), but only in case A.  Is there anything in the spec that prohibits line breaks in
continuations?

+typedef struct continuation : PRCListStr
+{
+  PRUint32 index;
+  char *value;
+} CONT;

Use "Cont" (or better yet, 'Continuation": we reserve all-caps for preprocessor macros.  You don't need the typedef in C++--just name the struct.

+    if (*str++ != '=') {
+      // don't accept parameters without "="
+      goto increment_str;

What did this code used to do when it allowed the logic to drop through?  I see
we still handle no equal sign when we're asking for an unnamed parameter, which
is good.  I can't think of what it means to parse a named parameter w/o an equal
sign to delineate the value, so *assume* this change doesnt break anything...

Watch out for adding extra whitespace--a lot of your blank lines have lots of spaces--they should be just \n\n

increment_str:
    while (nsCRT::IsAsciiSpace(*str)) ++str;
    if (*str == ';') ++str;
    while (nsCRT::IsAsciiSpace(*str)) ++str;

Any reason why this shouldn't be

    while (*str == ';' || nsCRT::IsAsciiSpace(*str)) 
      ++str;

?  Otherwise we won't skip extra ';' w/o param (or is that forbidden)?
Comment 15 David :Bienvenu 2012-01-24 13:37:17 PST
(In reply to Jason Duell (:jduell) from comment #13)
> Mark/David,
> 
> How important is the feature this bug would implement (ability to parse MIME
> header continuations if they're out of order, i.e. not 0, 1, 2, ...) to
> Thunderbird?
We haven't had any other real world complaints about TB's handling of this, which makes me think it's not that common in the real world. We aim to be liberal in what we accept and conservative in what we generate, so unless the spec can be fixed, my inclination is to support this correctly. But as you point out, there's some risk to the patch.
Comment 16 Julian Reschke 2012-01-25 13:27:33 PST
(In reply to Jason Duell (:jduell) from comment #14)
> 1) A linked list isn't the best data structure for accumulating
> continuations.

Thanks for the review. I did write point for point comments yesterday, but I can't see them here. Oh well; will redo most later.

> Minor:
> 
> So I see the existing code strips out '\r\n' (ie does line unwrapping), but
> only in case A.  Is there anything in the spec that prohibits line breaks in
> continuations?

It's like it was before; didn't want to make a change without a bug.


> +    if (*str++ != '=') {
> +      // don't accept parameters without "="
> +      goto increment_str;
> 
> What did this code used to do when it allowed the logic to drop through?  I
> see
> we still handle no equal sign when we're asking for an unnamed parameter,
> which
> is good.  I can't think of what it means to parse a named parameter w/o an
> equal
> sign to delineate the value, so *assume* this change doesnt break anything...

There was a bug in here before that was fixed by the code above (FF accepted "filename bar" as if it was "filename=bar").

> increment_str:
>     while (nsCRT::IsAsciiSpace(*str)) ++str;
>     if (*str == ';') ++str;
>     while (nsCRT::IsAsciiSpace(*str)) ++str;
> 
> Any reason why this shouldn't be
> 
>     while (*str == ';' || nsCRT::IsAsciiSpace(*str)) 
>       ++str;
> 
> ?  Otherwise we won't skip extra ';' w/o param (or is that forbidden)?

That already happens because of the main loop logic: see new test case at <http://greenbytes.de/tech/tc2231/#attemptyparam>
Comment 17 Julian Reschke 2012-02-04 05:25:05 PST
Created attachment 594433 [details] [diff] [review]
Proposed patch, incl test case (work-in-progress)
Comment 18 Julian Reschke 2012-02-04 06:42:09 PST
Created attachment 594440 [details] [diff] [review]
Proposed patch, incl test case (work-in-progress)
Comment 19 Julian Reschke 2012-02-13 07:53:09 PST
Created attachment 596687 [details] [diff] [review]
proposed patch, work-in-progress
Comment 20 Julian Reschke 2012-02-26 08:05:22 PST
Created attachment 600774 [details] [diff] [review]
Proposed patch, incl test case (work-in-progress)
Comment 21 Julian Reschke 2012-02-26 11:55:36 PST
Created attachment 600797 [details] [diff] [review]
Proposed patch, incl test case

general code cleanup, use nsTArray as suggested, change some of the return parameters to be C++ strings (will affect Thunderbird)
Comment 22 Julian Reschke 2012-02-26 13:15:12 PST
try results: https://tbpl.mozilla.org/?tree=Try&rev=faa18fd8206c
Comment 23 Jason Duell [:jduell] (needinfo me) 2012-03-01 22:26:04 PST
Comment on attachment 600797 [details] [diff] [review]
Proposed patch, incl test case

Review of attachment 600797 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!  +r once a few minor nits are fixed (feel free to carry over +r on new patch)

Thanks for all your detailed work on this--I'm particularly grateful we've got a good set of test cases for this.

::: netwerk/mime/nsMIMEHeaderParamImpl.cpp
@@ +213,5 @@
> +
> +  // Get an upper bound for the length
> +  PRUint32 length = 0;
> +  for (PRUint32 i = 0; i < aArray.Length(); i++) {
> +    length += aArray[i].length;    

this line has got some superflous spaces at the end of it.

@@ +234,5 @@
> +        nsUnescape(c);
> +      }
> +    }
> +
> +    // Can't be non-empty

how about // return null if empty value

@@ +254,5 @@
> +                     bool aNeedsPercentDecoding)
> +{
> +  if (aIndex < aArray.Length() && aArray[aIndex].value) {
> +    NS_WARNING("duplicate RC2231 continuation segment #\n");
> +    return false;

add test case that hits this?  Didn't recall seeing one.

@@ +262,5 @@
> +    NS_WARNING("RC2231 continuation segment # exceeds limit\n");
> +    return false;
> +  }
> +
> +  Continuation cont (aValue, aLength, aNeedsPercentDecoding);

nit: next (blank) line has space chars

@@ +281,5 @@
> +    return -1;
> +  }
> +
> +  if (aLen > 1 && aValue[0] == '0') {
> +    NS_WARNING("trailing '0' not allowed in segment number\n");

s/trailing/leading

@@ +339,5 @@
>    *aResult = nsnull;
>  
> +  aCharset.Assign("");
> +  aLang.Assign("");
> +  

spaces in blank line

@@ +400,5 @@
> +  char *caseBResult = nsnull;
> +  char *caseCDResult = nsnull;
> +
> +  // collect continuation segments
> +  nsTArray<Continuation> segments;

AFAICT nsTArrays that aren't used are quite cheap, so I think it's ok to have this here even if it's not used in the common case.

@@ +401,5 @@
> +  char *caseCDResult = nsnull;
> +
> +  // collect continuation segments
> +  nsTArray<Continuation> segments;
> +  

just a single blank line here.

space chars in some of the blank lines around here.

@@ +417,3 @@
>  
> +    const char *nameStart = str;
> +    const char *nameEnd = NULL;

So you mix up NULL and nsnull in the patch.  Style guide now says "In new code, use NULL for pointers. In old code, using nsnull or 0 for consistency is allowed."  I learn something new every day :)  I don't care which you use, might be nice to be consistent across the file, but not a big deal.

@@ +494,2 @@
>  
> +      // 1st char past '*'       

trailing spaces

@@ +495,5 @@
> +      // 1st char past '*'       
> +      const char *cp = nameStart + paramLen + 1; 
> +
> +      // if param name ends in "*" we need do to RFC5987 "ext-value" decoding
> +      bool needExtDecoding = *(nameEnd - 1) == '*';      

ditto

@@ +502,5 @@
> +      bool caseCStart = (*cp == '0') && needExtDecoding;
> +
> +      // parse the segment number
> +      PRInt32 segmentNumber = -1;
> +      if (! caseB) {

no space between ! and case

::: netwerk/test/unit/test_MIME_params.js
@@ +81,4 @@
>  
>    // RFC 2231 not clear on correct outcome: we prefer non-continued extended
>    // (invalid; error recovery)
> +  ["attachment; filename=basic; filename*=UTF-8'l1'extended; filename*0*=UTF-8'l2'foo; filename*1=bar",

the l1/l2 stuff is intentional?  Having two charsets isn't the issue here, it's having both continuation/extended params, right?

@@ +191,5 @@
> +
> +  // check ordering
> +  ["attachment; filename=basic; filename*0*=UTF-8''0\r\n"
> +    + " filename*1=1; filename*2=2;filename*3=3;filename*4=4;filename*5=5\r\n" 
> +    + " filename*6=6; filename*7=7;filename*8=8;filename*9=9;filename*10=a\r\n"

ok that you omitted ';' at end of filename5/10?

Do we still have test coverage for case where we omit semicolon?  Are we still supporting that?  Only when carriage return follows?
Comment 24 Julian Reschke 2012-03-02 04:05:57 PST
(In reply to Jason Duell (:jduell) from comment #23)
> Comment on attachment 600797 [details] [diff] [review]
> Proposed patch, incl test case
> 
> Review of attachment 600797 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good!  +r once a few minor nits are fixed (feel free to carry over +r
> on new patch)
> ...

Thanks, will post a new patch.

> Thanks for all your detailed work on this--I'm particularly grateful we've
> got a good set of test cases for this.
> 
> ::: netwerk/mime/nsMIMEHeaderParamImpl.cpp
> @@ +213,5 @@
> > +
> > +  // Get an upper bound for the length
> > +  PRUint32 length = 0;
> > +  for (PRUint32 i = 0; i < aArray.Length(); i++) {
> > +    length += aArray[i].length;    
> 
> this line has got some superflous spaces at the end of it.

Ack. As a matter of fact, that's true for many other parts of the file I did not touch :-)

> @@ +234,5 @@
> > +        nsUnescape(c);
> > +      }
> > +    }
> > +
> > +    // Can't be non-empty
> 
> how about // return null if empty value

+1
 
> @@ +254,5 @@
> > +                     bool aNeedsPercentDecoding)
> > +{
> > +  if (aIndex < aArray.Length() && aArray[aIndex].value) {
> > +    NS_WARNING("duplicate RC2231 continuation segment #\n");
> > +    return false;
> 
> add test case that hits this?  Didn't recall seeing one.

will add.

> @@ +262,5 @@
> > +    NS_WARNING("RC2231 continuation segment # exceeds limit\n");
> > +    return false;
> > +  }
> > +
> > +  Continuation cont (aValue, aLength, aNeedsPercentDecoding);
> 
> nit: next (blank) line has space chars
> 
> @@ +281,5 @@
> > +    return -1;
> > +  }
> > +
> > +  if (aLen > 1 && aValue[0] == '0') {
> > +    NS_WARNING("trailing '0' not allowed in segment number\n");
> 
> s/trailing/leading
> 
> @@ +339,5 @@
> >    *aResult = nsnull;
> >  
> > +  aCharset.Assign("");
> > +  aLang.Assign("");
> > +  
> 
> spaces in blank line
> 
> @@ +400,5 @@
> > +  char *caseBResult = nsnull;
> > +  char *caseCDResult = nsnull;
> > +
> > +  // collect continuation segments
> > +  nsTArray<Continuation> segments;
> 
> AFAICT nsTArrays that aren't used are quite cheap, so I think it's ok to
> have this here even if it's not used in the common case.
> 
> @@ +401,5 @@
> > +  char *caseCDResult = nsnull;
> > +
> > +  // collect continuation segments
> > +  nsTArray<Continuation> segments;
> > +  
> 
> just a single blank line here.
> 
> space chars in some of the blank lines around here.
> 
> @@ +417,3 @@
> >  
> > +    const char *nameStart = str;
> > +    const char *nameEnd = NULL;
> 
> So you mix up NULL and nsnull in the patch.  Style guide now says "In new
> code, use NULL for pointers. In old code, using nsnull or 0 for consistency
> is allowed."  I learn something new every day :)  I don't care which you
> use, might be nice to be consistent across the file, but not a big deal.

I've made all the lines I touched use "NULL".

> @@ +494,2 @@
> >  
> > +      // 1st char past '*'       
> 
> trailing spaces
> 
> @@ +495,5 @@
> > +      // 1st char past '*'       
> > +      const char *cp = nameStart + paramLen + 1; 
> > +
> > +      // if param name ends in "*" we need do to RFC5987 "ext-value" decoding
> > +      bool needExtDecoding = *(nameEnd - 1) == '*';      
> 
> ditto
> 
> @@ +502,5 @@
> > +      bool caseCStart = (*cp == '0') && needExtDecoding;
> > +
> > +      // parse the segment number
> > +      PRInt32 segmentNumber = -1;
> > +      if (! caseB) {
> 
> no space between ! and case
> 
> ::: netwerk/test/unit/test_MIME_params.js
> @@ +81,4 @@
> >  
> >    // RFC 2231 not clear on correct outcome: we prefer non-continued extended
> >    // (invalid; error recovery)
> > +  ["attachment; filename=basic; filename*=UTF-8'l1'extended; filename*0*=UTF-8'l2'foo; filename*1=bar",
> 
> the l1/l2 stuff is intentional?  Having two charsets isn't the issue here,
> it's having both continuation/extended params, right?

Leftover from a test to check whether the parser gets confused when both notations specify different languages. Undone. This should be handled as a separate problem (will need to investigate)

> @@ +191,5 @@
> > +
> > +  // check ordering
> > +  ["attachment; filename=basic; filename*0*=UTF-8''0\r\n"
> > +    + " filename*1=1; filename*2=2;filename*3=3;filename*4=4;filename*5=5\r\n" 
> > +    + " filename*6=6; filename*7=7;filename*8=8;filename*9=9;filename*10=a\r\n"
> 
> ok that you omitted ';' at end of filename5/10?

No. Copied from broken test case that I changed later on.

> Do we still have test coverage for case where we omit semicolon?  Are we
> still supporting that?  Only when carriage return follows?

That's a good question. I think the tests did that by accident. I opened bug 732369 so we can treat this as a separate issue.
Comment 25 Julian Reschke 2012-03-02 05:32:47 PST
Created attachment 602323 [details]
Proposed patch, incl updated test case results, plus new tests
Comment 26 Julian Reschke 2012-03-02 05:36:34 PST
(In reply to Jason Duell (:jduell) from comment #23)
> ...
> > +{
> > +  if (aIndex < aArray.Length() && aArray[aIndex].value) {
> > +    NS_WARNING("duplicate RC2231 continuation segment #\n");
> > +    return false;
> 
> add test case that hits this?  Didn't recall seeing one.

Done. While doing it, I noticed that there was a case where the return value wasn't checked, so continuation processing did not stop. I fixed that.

Clarifying: we stop continuation process when we see the first segment we reject. I'm not sure that's the best approach (vs ignore, or throwing away what we collected before), but I believe it's closest to what we did before. After all, it's an edge case so it probably does not matter.
Comment 27 Julian Reschke 2012-03-02 06:53:59 PST
Try results: https://tbpl.mozilla.org/?tree=Try&rev=7a7dccf5a9bf
Comment 28 Jason Duell [:jduell] (needinfo me) 2012-03-02 22:54:21 PST
Comment on attachment 602323 [details]
Proposed patch, incl updated test case results, plus new tests

It all looks good.

One final question, though:  AFAICT we don't really gain much by changing aCharset/aLang from char** to nsACString &.  (My original idea was to change aResult to an nsACString, but we don't seem to need to).  I suspect it will be less work to keep the params char ** and use nsMemory::Clone as in the old code.  That way we don't change the IDL, and don't wind up just converting lang back into a char ** in DoGetParameter.   And we won't need to change Thunderbird.  Sound ok?
Comment 29 Julian Reschke 2012-03-04 10:55:58 PST
(In reply to Jason Duell (:jduell) from comment #28)
> Comment on attachment 602323 [details]
> Proposed patch, incl updated test case results, plus new tests
> 
> It all looks good.

Thanks.

> One final question, though:  AFAICT we don't really gain much by changing
> aCharset/aLang from char** to nsACString &.  (My original idea was to change
> aResult to an nsACString, but we don't seem to need to).  I suspect it will
> be less work to keep the params char ** and use nsMemory::Clone as in the
> old code.  That way we don't change the IDL, and don't wind up just
> converting lang back into a char ** in DoGetParameter.   And we won't need
> to change Thunderbird.  Sound ok?

I think we should make this and more changes in the mid-term, but it probably would be good to do them all at once at a later point of time. So I'll change the method signature...
Comment 30 Julian Reschke 2012-03-04 14:12:00 PST
Created attachment 602765 [details] [diff] [review]
Proposed patch, incl updated test cases
Comment 31 Julian Reschke 2012-03-05 00:30:00 PST
Try results: https://tbpl.mozilla.org/?tree=Try&rev=4539598c84f8
Comment 32 Jason Duell [:jduell] (needinfo me) 2012-03-05 13:55:15 PST
Comment on attachment 602765 [details] [diff] [review]
Proposed patch, incl updated test cases

Great--I think this is ready now.   I'm going to hold off on landing it until our next fork on March 13, just to give it more time to bake. 

You must be feeling like more of a C/C++ programmer now, eh Julian?
Comment 33 Jason Duell [:jduell] (needinfo me) 2012-03-15 13:07:13 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/cd5ac132e763
Comment 34 Jason Duell [:jduell] (needinfo me) 2012-03-15 23:46:29 PDT
Backed out, alas.  And it's a truly mysterious bug, too :(

   https://hg.mozilla.org/integration/mozilla-inbound/rev/c549f6a97089

On windows PGO builds (i.e. built with program global optimization), and *only* on those builds, we see this mochitest fail:

  ERROR TEST-UNEXPECTED-FAIL | /tests/content/xbl/test/test_bug389322.xhtml | JS 1.7 should not work in versionless HTML script tags - got false, expected true 

The code in question is this:

   http://mxr.mozilla.org/mozilla-central/source/content/xbl/test/test_bug389322.xhtml#106

It parses

   <script type="text/javascript; version=1.7">

I looked into this a little, and indeed, it turns out nsScriptLoader::ProcessScriptElement does call the MIME parser to read in the javascript version.

The real mystery, of course, is why this fails only on windows PGO builds.   Sadly, tracking these down tends to really suck.  Here's some history from philor:

"only clue I have is that one PGO-only test failure I saw in the long ago was from uninitialized memory read, using a freed pointer, something like that, which wound up reading something different after PGO stirred the part of the pot it shouldn't have been looking at.   Particularly fun since it would fail, but after backing out and then relanding it with no changes other than moving the position of code blocks around it would pass"

Lovely.  If we're *really* lucky, we're actually setting off some PGO-related error in some other code.

We'll need to set PGO on when sending potential fixes to try:

   https://wiki.mozilla.org/ReleaseEngineering/TryChooser#What_if_I_want_PGO_for_my_build

Julian:  If you could go over the code and eyeball for any possible uninitialized memory reads, that'd be a start.
Comment 35 Julian Reschke 2012-03-15 23:57:50 PDT
It would be interesting to see whether the problem goes away if we temporarily switch off the continuation support completely...
Comment 36 Jason Duell [:jduell] (needinfo me) 2012-03-16 00:22:01 PDT
Apparently Valgrind may help here, but it's not clear to me that it works on win32--possibly with WINE:

  http://wiki.winehq.org/Wine_and_Valgrind
Comment 37 Julian Reschke 2012-03-17 14:17:07 PDT
For reference:

  https://tbpl.mozilla.org/?tree=Try&rev=0fbc9a41ae0e

I have looked at the code and have no clue yet what could be going on...
Comment 38 Jason Duell [:jduell] (needinfo me) 2012-03-19 21:48:02 PDT
Created attachment 607447 [details] [diff] [review]
clarified test case for reading past string boundaries

Ran Valgrind, and discovered one error at least.  AFAICT it was in the old code, too, but we didn't have a test that triggered it (Julian: you misplaced a slash and put down "\;r\n" in one of the tests, which is the test condition: I've made some clearer examples).

If a header string ends in a plain word (no '=') and is not followed by a semicolon, we wind up incrementing str to the terminating null byte, then when *str != '=' we merrily march past the null.   We seem to get lucky and hit another null somewhere before getting into serious trouble on most platforms, but perhaps not on windows PGO.
Comment 39 Jason Duell [:jduell] (needinfo me) 2012-03-19 22:02:15 PDT
Created attachment 607451 [details] [diff] [review]
fix string overrun issue

Here's the fix.  

Tryserver run for win32 PGO:

  https://tbpl.mozilla.org/?tree=Try&rev=70d86b5599fe
Comment 40 Julian Reschke 2012-03-20 02:36:22 PDT
New try: https://tbpl.mozilla.org/?tree=Try&rev=68bdf9da2823
Comment 41 Julian Reschke 2012-03-20 06:53:50 PDT
Created attachment 607530 [details] [diff] [review]
original patches with Jason's enhancements
Comment 42 Julian Reschke 2012-03-20 06:54:55 PDT
Comment on attachment 607530 [details] [diff] [review]
original patches with Jason's enhancements

Jason, this combines the three changes (and moves one of the tests to a different location in the test class); thanks for figuring out what was wrong!
Comment 43 Jason Duell [:jduell] (needinfo me) 2012-03-20 14:11:09 PDT
Pushed to inbound again

  https://hg.mozilla.org/integration/mozilla-inbound/rev/a2220d801e0e

since this try run looks green:

  https://tbpl.mozilla.org/?tree=Try&pusher=jduell@mozilla.com
Comment 44 Mounir Lamouri (:mounir) 2012-03-21 03:42:05 PDT
https://hg.mozilla.org/mozilla-central/rev/a2220d801e0e

Note You need to log in before you can comment on or make changes to this bug.