Closed Bug 1623896 Opened 5 months ago Closed 5 months ago

Embedded NULL bytes in text/* throws off parsing

Categories

(MailNews Core :: MIME, defect)

defect
Not set
critical

Tracking

(thunderbird_esr68 fixed, thunderbird68+ fixed, thunderbird75+ fixed)

RESOLVED FIXED
Thunderbird 76.0
Tracking Status
thunderbird_esr68 --- fixed
thunderbird68 + fixed
thunderbird75 + fixed

People

(Reporter: alex, Assigned: alex)

Details

Attachments

(2 files)

Attached file Example message

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0

Steps to reproduce:

This is the cause of bug 1623057

Sometimes a base64 encoded text/calendar message part may decode to NULL bytes. Example (part):

QkVHSU46VkNBTEVOREFSDQpNRVRIT0Q6UFVCTElTSA0KUFJPRElEOk1pY3Jvc29mdCBFeGNoYW5n
ZSBTZXJ2ZXIgMjAxMA0KVkVSU0lPTjoyLjANCkJFR0lOOlZUSU1FWk9ORQ0KVFpJRDpXLiBFdXJv
cGUgU3RhbmRhcmQgVGltZQAAAAAAAAAAAA0KQkVHSU46U1RBTkRBUkQNCkRUU1RBUlQ6MTYwMTAx

Observe the run of "AAA..." which are NULL bytes.

This leads to corrupted message parts causing e.g. calendar invitations to not show up. But I expect this to affect more attachment types as the issue is pretty generic

Actual results:

From debug tracing this is what happens:

  • MimePartBufferRead is called with MimeLeaf_parse_buffer as the read_fn
  • This calls MimeLeaf_parse_buffer due to in-memory decoding
  • This calls MimeDecoderWrite which forwards to mime_decode_base64_buffer (nothing happend to the data so far)
  • Now the base64 encoded data is decoded into the buffer and MimeInlineText_parse_decoded_buffer is called with the buffer (containing embedded NULLs) and the length (real length including the NULLs)
  • This is forwarded to mime_LineBuffer

Now comes the important part:

  • The buffer is searched for newlines, NULLs are skipped
  • Every line found is passed to convert_and_send_buffer with a pointer and length
  • This is passed to MimeInlineText_convert_and_parse_line but as it is already valid UTF-8 nothing is actually changed
  • Now GatherLine innsSimpleMimeConverterStub.cpp is called, still with the pointer and length
  • This does a ssobj->buffer->Append(line) which expects a NULL-terminated string

But because line contains embedded NULLs the line is truncated! This then leads to adding the next line to this line (due to the missing \n) causing garbage to come out of the parser.

I tried to pass the length to Append but then libical errors out, again due to a conversion to NULL terminated string.

What is also surprising: ical.js can deal with embedded zeros just fine. Example (save to index.htm and place next to ical.js):

<html>
<body>
<script src="ical.js"></script>
<script>
var src="QkVHSU46VkNBTEVOREFSDQpNRVRIT0Q6UFVCTElTSA0KUFJPRElEOk1pY3Jvc29mdCBFeGNoYW5n
ZSBTZXJ2ZXIgMjAxMA0KVkVSU0lPTjoyLjANCkJFR0lOOlZUSU1FWk9ORQ0KVFpJRDpXLiBFdXJv
cGUgU3RhbmRhcmQgVGltZQAAAAAAAAAAAA0KQkVHSU46U1RBTkRBUkQNCkRUU1RBUlQ6MTYwMTAx
MDFUMDIwMDAwDQpUWk9GRlNFVEZST006KzAyMDANClRaT0ZGU0VUVE86KzAxMDANClJSVUxFOkZS
RVE9WUVBUkxZO0lOVEVSVkFMPTE7QllEQVk9LTFTVTtCWU1PTlRIPTEwDQpFTkQ6U1RBTkRBUkQN
CkJFR0lOOkRBWUxJR0hUDQpEVFNUQVJUOjE2MDEwMTAxVDAxMDAwMA0KVFpPRkZTRVRGUk9NOisw
MTAwDQpUWk9GRlNFVFRPOiswMjAwDQpSUlVMRTpGUkVRPVlFQVJMWTtJTlRFUlZBTD0xO0JZREFZ
PS0xU1U7QllNT05USD0zDQpFTkQ6REFZTElHSFQNCkVORDpWVElNRVpPTkUNCkJFR0lOOlZFVkVO
VA0KT1JHQU5JWkVSO0NOPSJHcnVuZCwgQWxleGFuZGVyIjpNQUlMVE86YWxleGFuZGVyLmdydW5k
QHR1LWRyZXNkZW4uZGUNClNVTU1BUlk7TEFOR1VBR0U9ZGUtREU6Rk9PDQpEVFNUQVJUO1RaSUQ9
Vy4gRXVyb3BlIFN0YW5kYXJkIFRpbWUAAAAAAAAAAAA6MjAyMDAzMjJUMTkwMDAwDQpEVEVORDtU
WklEPVcuIEV1cm9wZSBTdGFuZGFyZCBUaW1lAAAAAAAAAAAAOjIwMjAwMzIyVDIwMDAwMA0KVUlE
OjQ1ZDQzOGQ4LTM5NTItNDFiYy1hYzAzLTEzNjVhNDljM2YxNw0KQ0xBU1M6UEVSU09OQUwNClBS
SU9SSVRZOjUNCkRUU1RBTVA6MjAyMDAzMTdUMTcxMTU1Wg0KVFJBTlNQOk9QQVFVRQ0KU1RBVFVT
OkNPTkZJUk1FRA0KU0VRVUVOQ0U6MA0KTE9DQVRJT047TEFOR1VBR0U9ZGUtREU6DQpYLU1JQ1JP
U09GVC1DRE8tQVBQVC1TRVFVRU5DRTowDQpYLU1JQ1JPU09GVC1DRE8tT1dORVJBUFBUSUQ6MjEx
ODM1NjA5Mw0KWC1NSUNST1NPRlQtQ0RPLUJVU1lTVEFUVVM6QlVTWQ0KWC1NSUNST1NPRlQtQ0RP
LUlOVEVOREVEU1RBVFVTOkJVU1kNClgtTUlDUk9TT0ZULUNETy1BTExEQVlFVkVOVDpGQUxTRQ0K
WC1NSUNST1NPRlQtQ0RPLUlNUE9SVEFOQ0U6MQ0KWC1NSUNST1NPRlQtQ0RPLUlOU1RUWVBFOjAN
ClgtTUlDUk9TT0ZULURJU0FMTE9XLUNPVU5URVI6RkFMU0UNCkJFR0lOOlZBTEFSTQ0KREVTQ1JJ
UFRJT046UkVNSU5ERVINClRSSUdHRVI7UkVMQVRFRD1TVEFSVDotUFQxME0NCkFDVElPTjpESVNQ
TEFZDQpFTkQ6VkFMQVJNDQpFTkQ6VkVWRU5UDQpFTkQ6VkNBTEVOREFSDQo="
var serialized = atob(src)
var cal = ICAL.parse(serialized)
var comp = new ICAL.Component(cal)
alert(comp.toString())
</script>
</body>
</html>

Expected results:

I think the best solution would be to not use NULL-terminated strings but always pass the length and make the parsers aware of that.

The second best solution would be to remove the embedded zeros in GatherLine (or earlier)

I created a patch which fixes this for me. Basically: Append full lines to buffer and strip all NULLs before passing it to APIs expecting NULL termination.

Severity: normal → critical
OS: Unspecified → All
Hardware: Unspecified → All
Status: UNCONFIRMED → NEW
Ever confirmed: true
Attachment #9134678 - Flags: review?(benc)
Assignee: nobody → alex
Status: NEW → ASSIGNED
Comment on attachment 9134678 [details] [diff] [review]
Proposed patch (option 2)

The fix looks fine to me.

The actual calendar data looks like it's invalid, so the bigger question is: should really just be rejecting it outright?
And the answer of course is:
"No, don't be stupid. We have to cope with reality. If we stuck rigidly to the letter of every RFC out there we'd likely have to reject 90% of all data. In the real world lots of stuff is broken and we have to deal with it."

But I figured someone should at least ask the question :- )
Attachment #9134678 - Flags: review?(benc) → review+

And just for extra background, I decoded the offending entry, and replaced the '\0' bytes with '^', to get this:

BEGIN:VCALENDAR
METHOD:PUBLISH
PRODID:Microsoft Exchange Server 2010
VERSION:2.0
BEGIN:VTIMEZONE
TZID:W. Europe Standard Time^^^^^^^^^
BEGIN:STANDARD
DTSTART:16010101T020000
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
RRULE:FREQ=YEARLY;INTERVAL=1;BYDAY=-1SU;BYMONTH=10
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:16010101T010000
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
RRULE:FREQ=YEARLY;INTERVAL=1;BYDAY=-1SU;BYMONTH=3
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
ORGANIZER;CN="Grund, Alexander":MAILTO:alexander.grund@tu-dresden.de
SUMMARY;LANGUAGE=de-DE:FOO
DTSTART;TZID=W. Europe Standard Time^^^^^^^^^:20200322T190000
DTEND;TZID=W. Europe Standard Time^^^^^^^^^:20200322T200000
UID:45d438d8-3952-41bc-ac03-1365a49c3f17
CLASS:PERSONAL
PRIORITY:5
DTSTAMP:20200317T171155Z
TRANSP:OPAQUE
STATUS:CONFIRMED
SEQUENCE:0
LOCATION;LANGUAGE=de-DE:
X-MICROSOFT-CDO-APPT-SEQUENCE:0
X-MICROSOFT-CDO-OWNERAPPTID:2118356093
X-MICROSOFT-CDO-BUSYSTATUS:BUSY
X-MICROSOFT-CDO-INTENDEDSTATUS:BUSY
X-MICROSOFT-CDO-ALLDAYEVENT:FALSE
X-MICROSOFT-CDO-IMPORTANCE:1
X-MICROSOFT-CDO-INSTTYPE:0
X-MICROSOFT-DISALLOW-COUNTER:FALSE
BEGIN:VALARM
DESCRIPTION:REMINDER
TRIGGER;RELATED=START:-PT10M
ACTION:DISPLAY
END:VALARM
END:VEVENT
END:VCALENDAR

It's always the TZID value that has the nulls. I assume that value is just human-readable identifier which is dropped in verbatim from a timezone database, and has no machine-readable meaning.
Note that the extra nulls in this example pad out the TZID value to a nice round 32 bytes :-) I'm imagining the software which generated this had a nice little char buf [32]-copying bug somewhere...

Note that the extra nulls in this example pad out the TZID value to a nice round 32 bytes :-) I'm imagining the software which generated this had a nice little char buf [32]-copying bug somewhere...

You are right. I initially reported this to TBSync at https://github.com/jobisoft/EAS-4-TbSync/issues/110 where there is already an older Bug report for pretty much the same issue: https://github.com/jobisoft/TbSync/issues/187 but as everything looks fine on their side I suspect a MS Exchange bug.

The protocol indeed uses 32 chars for that field: https://docs.microsoft.com/en-us/openspecs/exchange_server_protocols/ms-asdtype/fe4760d2-3f8a-4453-908d-e256a6319956?redirectedfrom=MSDN although they use 64 bytes (wchar, 2 bytes/char)

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/25ec6ec11a93
Embedded NULL bytes in text/* throws off parsing. r=benc

Status: ASSIGNED → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 76.0
Attachment #9134678 - Flags: approval-comm-esr68?
Attachment #9134678 - Flags: approval-comm-beta?

Thanks! Would be great if this was backported to current/next releases especially 68.x as linux distros tend to stick there. The patch applies cleanly there.

Attachment #9134678 - Flags: approval-comm-beta? → approval-comm-beta+
Comment on attachment 9134678 [details] [diff] [review]
Proposed patch (option 2)

Approved for ESR
Attachment #9134678 - Flags: approval-comm-esr68? → approval-comm-esr68+
You need to log in before you can comment on or make changes to this bug.