When generating rfc2047 encoded headers, we're not calculating the size of the buffer we'll need correctly, which produces the predictable nasty issues.
Created attachment 247198 [details] [diff] [review] proposed fix This fixes the calculation to account for the multiple lines we'll generate. Given a utf8 decoded header, we'll generate something like this: Subject: Re: =?<charset>?Q?=910=865=910=862=A1=A4=910=8A2=810=868=810=8A5=810=890?= =?<charset>?Q?=61=E3=810=858=910=872=910=878=810=890=810=824=810=8B3=810=883?= =?<charset>?Q?=910=876=810=855=810=863=810=856=810=849=810=869=A1=C2=A1=C1?= =?<charset>?Q?=810=849=910=883=910=870=910=896=910=856=810=869=810=829=A1=E3?= =?<charset>?Q?=810=877=810=845=810=851=810=884=810=863=810=884=810=863=810=845?= =?<charset>?Q?=810=843?= So I basically count up the per-line overhead, and the max chars per line, and calculate how big our buffer has to be. I think this calculation is a lot more conservative than the old one, but I'll have to run with it a bit to make sure. The code downstream that crunches the passed in buffer also needs to be fixed.
Comment on attachment 247198 [details] [diff] [review] proposed fix 2 libmime bug patches in a week David, lucky you!
fixed on trunk and branch.
Comment on attachment 247198 [details] [diff] [review] proposed fix nominating for 18.104.22.168 - might be useful to get users who get a lot of rfc 2047 encoded headers (e.g., mozilla Japan) to try this out.
Masayuki-san and Kohei-san, I was wondering if you could test this fix on folders that contain a lot of rfc-2047 encoded headers. The particular test case I was fixing had to do with a message with a Re: subject which was rfc 2047 encoded - when parsing the headers for the db, we were crashing because the header was 6 lines long. I changed the logic for calculating the length, but I don't have a lot of test cases - I thought you might have more examples you could test this against, by deleting the .msf files for some folders and having them get regenerated. I've checked this fix into the trunk and 1.8.1 branch, so tomorrow's builds should have the change. We're thinking of taking this into 22.214.171.124, which comes out soon. thx for any help you can give! - David
Please file a bug to investigate/fix the downstream code you mention in comment 1
Bug 362542 filed to do follow-up investigation
CC'ing Georgi: he found the other recent MIME allocation error and this might give him more ideas of places to look -- happy hunting!
Comment on attachment 247198 [details] [diff] [review] proposed fix approved for 1.8.0 branch, a=dveditz for drivers
this code seems complex. PRInt32 perLineOverhead = strlen(charset) + 10; .... if (perLineOverhead > foldlen) ... return; ... PRInt32 charsPerLine = (foldlen - perLineOverhead) / 4;  RInt32 maxNumLines = (strlen(src) / charsPerLine) + 1; is |charsPerLine| guaranteed to be nonzero ( |foldlen == perLineOverhead|) causing division by 0 in  ?
I tried to make the code clearer by making the calculations explicit. Does that make it seem more complicated? Or is it the calculation itself that seems complicated? by default, foldlen 72, and the max charset string len is supposed to be 60, so in general, foldlen will be greater than the per-line overhead. But I don't know if there's any kind of enforcement of the max charset string (i.e., can a malicious message define an arbitrarily long charset for a header, and end up in this code?) I don't know enough about this code to know if that's possible.
David: Thank you for the CCing, and sorry for the delay. But I don't understand well. What should we test?
Masayuki-san, the main things to test are generation of rfc 2047 headers when the header itself is very long, e.g., a very long subject. It turns out we also run through this code when you read a message with a re: subject, and it's rfc 2047 encoded.