Closed Bug 1301989 Opened 8 years ago Closed 7 years ago

Failed to decode base64 string (at jsmime.js:70) - Make jsmime more tolerant to bad data and/or improve error handling.

Categories

(MailNews Core :: MIME, defect)

defect
Not set
normal

Tracking

(thunderbird_esr5255+ fixed, thunderbird55 fixed, thunderbird56 fixed)

RESOLVED FIXED
Thunderbird 56.0
Tracking Status
thunderbird_esr52 55+ fixed
thunderbird55 --- fixed
thunderbird56 --- fixed

People

(Reporter: joao.m.santos.silva, Assigned: jorgk-bmo)

References

Details

Attachments

(2 files, 3 obsolete files)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0
Build ID: 20160728203720

Steps to reproduce:

I used two add-ons, "Remove Duplicate Messages" and "Remove Duplicate Messages (Alternate)", to find mail duplicates. They worked on other folders, but fail in folders with certain e-mails.


Actual results:

Add-on "crashed" (did not complete).

Error console shows:

Timestamp: 09/11/2016 08:26:44 PM
Error: Error: Failed to decode base64 string!
Source File: resource:///modules/jsmime.jsm -> resource:///modules/jsmime/jsmime.js
Line: 70

Timestamp: 09/11/2016 08:26:44 PM
Error: NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS: [JavaScript Error: "Failed to decode base64 string!" {file: "resource:///modules/jsmime.jsm -> resource:///modules/jsmime/jsmime.js" line: 70}]'[JavaScript Error: "Failed to decode base64 string!" {file: "resource:///modules/jsmime.jsm -> resource:///modules/jsmime/jsmime.js" line: 70}]' when calling method: [nsIMimeConverter::decodeMimeHeader]
Source File: chrome://removeduplicates/content/showduplicatemessages.js
Line: 125


Expected results:

Duplicate mail result list.
I forgot to say that I've observed what seems to be this bug many years ago, while using the same add-ons. Now that I've used the add-ons again I saw the bug is still there.
Sorry, we don't deal with faulty add-ons here. Please contact the add-on author.
Status: UNCONFIRMED → RESOLVED
Closed: 8 years ago
Resolution: --- → INVALID
Whiteboard: [addon:Remove Duplicate Messages]
Are you sure this is a bug in the add-on (both add-ons)?

Because if I run a search by recipient in the same folder (no add-on) I get a similar error:

Timestamp: 09/12/2016 02:50:50 AM
Error: Error: Failed to decode base64 string!
Source File: resource:///modules/jsmime.jsm -> resource:///modules/jsmime/jsmime.js
Line: 70
Flags: needinfo?(jorgk)
Well, line 70 of jsmime.js has:
  return [atob(sanitize), buffer];
which is decoding base64.

Most likely you have a corrupt message in your folder. It may not be an add-on problem, but there's nothing we can do.

Can you attach the folder or find the message and attach it?

OK, I can reopen the bug but with the aim to make JSMime more tolerant to bad data.
Status: RESOLVED → REOPENED
Ever confirmed: true
Flags: needinfo?(jorgk)
Resolution: INVALID → ---
Summary: Failed to decode base64 string → Failed to decode base64 string (at jsmime.js:70) - Make jsmime more tolerant to bad data and/or improve error handling.
Whiteboard: [addon:Remove Duplicate Messages]
Status: REOPENED → NEW
The reporter send me an offending message. TB displays no recipient and the quick search by recipient fails with the mentioned error: Error: Failed to decode base64 string! jsmime.js:70:11

Now, the To: header of the message is:
To: =?UTF-8?B?Sm=2FDo28gU2lsdmE=?= <joao.m.santos.silva@gmail.com>
and that's clearly invalid as you can check here: http://www.jorgk.com/decode/.

So, as I said, we can make jsmime more tolerant.

In the meantime, I suggest you replace the To: headers in the mailbox with a text editor and you change your subscriptions details at the sender's website who obviously cannot deal with your name João M. S. Silva.
Don't worry, I stripped all the personal content ;-)
Thanks!

If you need help testing a fix I found the remaining offending e-mails and now have 45 of them in a separate folder.
Hi,

This bug is still here.
Clearly so, since there was no movement in this bug.
I also don't know what a solution would look like. The header is invalid, so what should TB display instead of displaying nothing?

Alfred, do you have an opinion here? Could you perhaps suggest a solution and a patch?
Flags: needinfo?(infofrommozilla)
Is there an easy fix so that the add-on does not crash (and provides duplicate results)?
Have you talked to the add-on author? He can wrap code that can fail into a try/catch block and deal with the failure.
(In reply to Jorg K (GMT+2) from comment #9)
> I also don't know what a solution would look like. The header is invalid, so

We should do something, because it breaks my Header Pane.

The error is caused by the '=' in the text.

According to the comment, decode_base64() should be a fail-safe function. But it stumbles over the '='.

> what should TB display instead of displaying nothing?

The Header is broken, so we should display it so.
It will be "Jm����M��ل".
Flags: needinfo?(infofrommozilla)
This removes any '=' from the mime word and corrects the padding.

The jsmime tests run without complaint.
Attachment #8883422 - Flags: review?(jorgk)
I have contacted the author of the add-on meanwhile.

If you'd like I can test the fix.
Comment on attachment 8883422 [details] [diff] [review]
make jsmime decode_base64() more fail-save (Ignore '=' in the code)

Review of attachment 8883422 [details] [diff] [review]:
-----------------------------------------------------------------

Can we add a simple example to mailnews/mime/jsmime/test/test_header.js in decodeRFC2047Words to simulate bad input and the respective bad output with replacement characters � (\ufffd).

::: mailnews/mime/jsmime/jsmime.js
@@ +66,5 @@
> +    if (more)
> +      buffer = sanitize.slice(-excess);
> +    else
> +      for (var i = 0; i < excess; i++)
> +        sanitize += '=';

Can you please explain this hunk a little an add a comment. I'm not a specialist on base64 but https://en.wikipedia.org/wiki/Base64#Output_padding suggests that there is padding with = or ==. If excess ==3 you'd add ===. Is this intended?
Just a thought: Perhaps it would be easier to remove = which are not at the end in one line:
let sanitize = buffer.replace(/[^A-Za-z0-9+\/=]/g,'');
sanitize = sanitize.replace(/=/([A-Za-z0-9+\/]/)/g, '$1');
Untested!!
Or rather:
sanitize = sanitize.replace(/[=]+/([A-Za-z0-9+\/]/)/g, '$1');
(In reply to Jorg K (GMT+2) from comment #15)

> ::: mailnews/mime/jsmime/jsmime.js
> @@ +66,5 @@
> > +    if (more)
> > +      buffer = sanitize.slice(-excess);
> > +    else
> > +      for (var i = 0; i < excess; i++)
> > +        sanitize += '=';

I only changed the structure to avoid a second "buffer = '';".
New is only the padding when 'excess' and '!more'

> Can you please explain this hunk a little an add a comment. I'm not a
> specialist on base64 but https://en.wikipedia.org/wiki/Base64#Output_padding
> suggests that there is padding with = or ==. If excess ==3 you'd add ===. Is
> this intended?

Somehow. The length of the Base64 string should always be mod(4)=0.
In a regular term, 3*'=' does not occur.

Base64('a')    => 'YQ=='
Base64('ab')   => 'YWI='
Base64('abc')  => 'YWJj'
Base64('abcd') => 'YWJjZA=='

If we want to work with irregular terms, three are also 3*'=' possible.

(In reply to Jorg K (GMT+2) from comment #16)
> Just a thought: Perhaps it would be easier to remove = which are not at the
> end in one line:

We would still have to correct the length. So no benefit. So that does not offer any advantage.
(In reply to Alfred Peters from comment #18)

> Base64('abcd') => 'YWJjZA=='
> 
> If we want to work with irregular terms, three are also 3*'=' possible.

Bad news: atob("YWJjZ"); throws an exception, no matter how many '=' I add.

Could we cut the last character when length is mod(4)=1?
Or add an 'A' (=0)?
(In reply to Alfred Peters from comment #19)

> Bad news: atob("YWJjZ"); throws an exception, no matter how many '=' I add.
> 
> Could we cut the last character when length is mod(4)=1?
> Or add an 'A' (=0)?

On a first thought, filling up with 0-bits would make sense. But a broken term is a broken term and we only would increase trash data.

So cutting of the last char would, in my opinion, make the most sense.
(In reply to Alfred Peters from comment #20)
> So cutting of the last char would, in my opinion, make the most sense.
Agreed. So the aim is to sanitise the base64 string so much that the atob() won't fail, right? We don't want a try/catch and simply return "bad input"?

OK, so let's do something that doesn't throw and returns something semi-useful, well, more useful than nothing since "Jm����M��ل" doesn't look very useful to me ;-)
(In reply to Alfred Peters from comment #20)
> So cutting of the last char would, in my opinion, make the most sense.

This is already be done in the original code:

| sanitize = sanitize.substring(0, sanitize.length - excess);

If !more this throws away all chars above mod(4)=0.

So:
| sanitize = sanitize.replace(/=+([A-Za-z0-9+\/])/g, '$1');

looks really good. :-)
(In reply to Jorg K (GMT+2) from comment #21)
> (In reply to Alfred Peters from comment #20)
> > So cutting of the last char would, in my opinion, make the most sense.
> Agreed. So the aim is to sanitise the base64 string so much that the atob()
> won't fail, right? We don't want a try/catch and simply return "bad input"?

That's a design decision. I do not insist on my solution.
What choice of words do you prefer? The text must be translatable, right?

> OK, so let's do something that doesn't throw and returns something
> semi-useful, well, more useful than nothing since "Jm����M��ل" doesn't look
> very useful to me ;-)

This has the advantage that at least the beginning is correct. OK, here it is only the 'J'
OK, how about a hybrid approach: We try to sanitise as much as we can and still add try/catch. Or is this unreasonable, that is, properly sanitised will always decode? Then let's just do that.

You're going to send a new patch, right? I'm a little confused by your comments:
> We would still have to correct the length. So no benefit. So that does not offer any advantage.
contradicting
> looks really good. :-) [referring to the wrong regexp which I corrected later].

So how about:
// Drop all non-base64 characters and also = which can be harmful in the middle.
let sanitize = buffer.replace(/[^A-Za-z0-9+\/]/g,'');
let excess = sanitize.length % 4;
if (excess != 0)
  if (more) {
    // Keep excess for next time.
    buffer = sanitize.slice(-excess);
    sanitize = sanitize.substring(0, sanitize.length - excess);
  } else {
    buffer = '';
    if (excess == 1) {
      // we can't pad with three =, so cut excess
      sanitize = sanitize.substring(0, sanitize.length - excess);
    } else if (excess == 2) {
      sanitize += "==";
    } else if (excess == 3) {
      sanitize += "=";
    }
  }

Actually, that's not right. Because you stripped "=", you create cases where you end up with 4*n-1 and when more is true, you'll return some excess where you shouldn't have. For AAA= you will even cause an endless loop, since you'll always return those three A's.

Am I right? Let's do it my way, patch coming ;-)
Attached patch 1301989-base64.patch (v1) (obsolete) — Splinter Review
How does that look?

I had to put the hunk in a different spot since my test case Y=WJjZA== didn't work. The code right after the hunk removed the last =, then I removed the = in the middle leaving YWJjZA=, so excess of 3 characters was removed and only YWJj was decoded.
Attachment #8884042 - Flags: review?(infofrommozilla)
Attachment #8883422 - Flags: review?(jorgk)
Attached patch 1301989-base64.patch (v1b) (obsolete) — Splinter Review
Removing trailing spaces.
Attachment #8884042 - Attachment is obsolete: true
Attachment #8884042 - Flags: review?(infofrommozilla)
Attachment #8884043 - Flags: review?(infofrommozilla)
Comment on attachment 8884043 [details] [diff] [review]
1301989-base64.patch (v1b)

Review of attachment 8884043 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/mime/jsmime/jsmime.js
@@ +635,5 @@
>        if (/[^A-Za-z0-9+\/=]/.exec(text))
>          return false;
>  
> +      // Remove harmful `=' chars in the middle.
> +      text = text.replace(/[=]+([A-Za-z0-9+\/])/g, '$1');

This should be placed inside decode_base64(), because there are more indirect calls to it.
Search for 'ContentDecoders' and 'this._convertData'.

Why '[=]+' and not just '=+'?

::: mailnews/mime/jsmime/test/test_header.js
@@ +641,5 @@
>          "Byte missing \ufffd"], // Replacement character for invalid input.
> +
> +      // Test for bug 1301989: Tolerate = in the middle of base64.
> +      ["=?us-ascii?B?YWJjZA==?=X=?us-ascii?B?Y=WJjZA==?=X=?us-ascii?B?Y=WJjZ==A==?=",
> +        "abcdXabcdXabcd"],

I have just done some tests myself. Maybe you want to take over some:
| // Tolerate invalid Base64 encoding - Bug 1301989
| //["=?UTF-8?B?YWJjZA==?=", "abcd"], // correct
| ["=?UTF-8?B?YWJjZA=?=", "abc"], // Length is not a multipel of 4
| ["=?UTF-8?B?YWJjZA?=", "abc"],
| ["=?UTF-8?B?YWJjZ?=", "abc"],
| ["=?UTF-8?B?Y=WJ=jZA==?=", "abcd"], // Additional '=' within the word
| ["=?UTF-8?B?Sm=2FDo28=?=", "Jm\ufffd\u000e\ufffd\ufffd"], // Q encoded B string

The test run was successful.
Attachment #8884043 - Flags: review?(infofrommozilla) → review-
Attachment #8883422 - Attachment is obsolete: true
(In reply to Alfred Peters from comment #27)
> This should be placed inside decode_base64(), because there are more
> indirect calls to it.
Agreed. But then testcase Y=WJjZA== doesn't work, have you tried that? See comment #25.
Do we want to add this code twice?

> Why '[=]+' and not just '=+'?
OK.
(In reply to Jorg K (GMT+2) from comment #28)
> (In reply to Alfred Peters from comment #27)
> > This should be placed inside decode_base64(), because there are more
> > indirect calls to it.
> Agreed. But then testcase Y=WJjZA== doesn't work, have you tried that? See
> comment #25.

|  function decode_base64(buffer, more) {
|    // Drop all non-base64 characters
|    let sanitize = buffer.replace(/[^A-Za-z0-9+\/=]/g,'');
| +  // Drop all '=' that are not padding
| +  sanitize = sanitize.replace(/=+([A-Za-z0-9+\/])/g, '$1');

So it should work.
Hmm, you're not reading my comment #25. Let's read it together.

Start with Y=WJjZA==

Then we have this code:
      // Base64 strings must be a length of multiple 4, but it seems that some
      // mailers accidentally insert one too many `=' chars. Gracefully handle
      // this case; see bug 227290 for more information.
      if (text.length % 4 == 1 && text.charAt(text.length - 1) == '=')
        text = text.slice(0, -1);

So I get Y=WJjZA=

Now we remove the invalid = and get YWJjZA=

Now we remove ZA= and finally decode YWJj.

Oops, not what we wanted.

That may be justifiable behaviour, but as a test Y=WJjZA== will decode to abc and not abcd.
(In reply to Jorg K (GMT+2) from comment #30)
> Hmm, you're not reading my comment #25. Let's read it together.
> 
> Start with Y=WJjZA==
> 
> Then we have this code:
>       // Base64 strings must be a length of multiple 4, but it seems that some
>       // mailers accidentally insert one too many `=' chars. Gracefully handle
>       // this case; see bug 227290 for more information.
>       if (text.length % 4 == 1 && text.charAt(text.length - 1) == '=')
>         text = text.slice(0, -1);

> So I get Y=WJjZA=

Indeed. Isn't this code path covered by the tests?

My test:
| ["=?UTF-8?B?Y=WJ=jZA==?=", "abcd"], // Additional '=' within the word
was successful.

As I wrote in Comment 22, decode_base64() already cuts off chars above mod(4)=0.

So the above code is obsolete, right? Then we could remove it.
["=?UTF-8?B?Y=WJ=jZA==?=", "abcd"] is successful since you added twp "=" and hence the code isn't triggered. But I agree, that extra processing can just go since we cut anyway. I'll update my patch.
OK, this should do it then.
Attachment #8884043 - Attachment is obsolete: true
Attachment #8884095 - Flags: review?(infofrommozilla)
BTW, mailnews/mime/test/unit/test_bug493544.js still passes which has a test for bug 227290.
Comment on attachment 8884095 [details] [diff] [review]
1301989-base64.patch (v2).

Review of attachment 8884095 [details] [diff] [review]:
-----------------------------------------------------------------

Great
Attachment #8884095 - Flags: review?(infofrommozilla) → review+
https://hg.mozilla.org/comm-central/rev/23fafbba334ac5e12dd48bca5a8b10680265f6f1
Assignee: nobody → jorgk
Status: NEW → RESOLVED
Closed: 8 years ago7 years ago
Component: Search → MIME
Product: Thunderbird → MailNews Core
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 56.0
Version: 45 Branch → 45
Comment on attachment 8884095 [details] [diff] [review]
1301989-base64.patch (v2).

Let's get this uplifted.
Attachment #8884095 - Flags: approval-comm-esr52?
Attachment #8884095 - Flags: approval-comm-beta+
Attachment #8884095 - Flags: approval-comm-esr52? → approval-comm-esr52+
Thanks for the fix.
See Also: → 697927
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: