Calling SetForceCharacterSet() not using a Gecko-canonical name leads to crash (was: Thunderbird: Crash forwarding or sending a message, or editing draft ...)

RESOLVED FIXED in Firefox 51

Status

()

Core
Document Navigation
--
critical
RESOLVED FIXED
2 years ago
11 months ago

People

(Reporter: Paul Lemmon, Assigned: hsivonen)

Tracking

({crash, topcrash-thunderbird})

Trunk
mozilla51
x86
All
crash, topcrash-thunderbird
Points:
---

Firefox Tracking Flags

(thunderbird_esr38 wontfix, thunderbird_esr4551+ fixed, firefox51 fixed)

Details

(crash signature)

Attachments

(3 attachments, 2 obsolete attachments)

(Reporter)

Description

2 years ago
Created attachment 8644382 [details]
tbbug1.txt

User Agent: Mozilla/5.0 (Windows NT 5.1; rv:38.0) Gecko/20100101 Firefox/38.0
Build ID: 20150525141253

Steps to reproduce:

Intermittent crashes and only with you tube or other player clips in them-It happens when I open the message to read-I am Not using NT but XP


Actual results:

When I go to open a draft file that I have saved but not sent yet with an enclosed you tube video clip in it, some days and sometimes it crashes, other days and times it won't crash. If it crashes it will do it to all my other video messages as well, but if it doesn't then it won't for all others as well. All other messages are fine that don't have video clips in them.


Expected results:

The message should have opened without crashing as it does sometimes.
(Reporter)

Updated

2 years ago
OS: Unspecified → Windows XP
Hardware: Unspecified → x86
(Reporter)

Updated

2 years ago
Keywords: crash

Comment 1

2 years ago
 nsHtml5StreamParser::WriteStreamBytes(unsigned char const*, unsigned int, unsigned int*)
bp-8d50bf0c-7ae3-4508-b69c-af0672150806
 0 	xul.dll	nsHtml5StreamParser::WriteStreamBytes(unsigned char const*, unsigned int, unsigned int*)	parser/html/nsHtml5StreamParser.cpp
1 	xul.dll	nsHtml5StreamParser::SetupDecodingAndWriteSniffingBufferAndCurrentSegment(unsigned char const*, unsigned int, unsigned int*)	parser/html/nsHtml5StreamParser.cpp
2 	xul.dll	nsHtml5StreamParser::SniffStreamBytes(unsigned char const*, unsigned int, unsigned int*)	parser/html/nsHtml5StreamParser.cpp
3 	xul.dll	nsHtml5StreamParser::DoDataAvailable(unsigned char const*, unsigned int)	parser/html/nsHtml5StreamParser.cpp
4 	xul.dll	nsStringInputStream::ReadSegments(nsresult (*)(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*), void*, unsigned int, unsigned int*)	xpcom/io/nsStringStream.cpp
5 	xul.dll	nsHtml5StreamParser::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned __int64, unsigned int)	parser/html/nsHtml5StreamParser.cpp
Severity: normal → critical
Crash Signature: [@ nsHtml5StreamParser::WriteStreamBytes(unsigned char const*, unsigned int, unsigned int*)]
Component: Untriaged → Message Compose Window
Summary: bp-8d50bf0c-7ae3-4508-b69c-af0672150806 → Crash editting draft that contains video clip

Updated

2 years ago
Crash Signature: [@ nsHtml5StreamParser::WriteStreamBytes(unsigned char const*, unsigned int, unsigned int*)] → [@ nsHtml5StreamParser::WriteStreamBytes(unsigned char const*, unsigned int, unsigned int*)] [@ nsHtml5StreamParser::WriteStreamBytes]

Comment 2

2 years ago
This is a Thunderbird top crasher, so we should confirm.

The crash is pretty obvious, uninitialized mUnicodeConverter. In mailnews code we would routinely check such things, they tend to be less cautious in core mozilla code. We could add a null check to our branch and stop the crash if we wanted, or perhaps core would also accept such a check.

As to the underlying cause, without really understanding this code, the missing pointer is initialized in OnStartRequest. We've recently identified and fixed a missing OnStartRequest in compose that was related to the attachment hang, though that has not landed in esr45. Perhaps this crash will go away as well?
Status: UNCONFIRMED → NEW
Ever confirmed: true

Comment 3

2 years ago
Created attachment 8767938 [details]
bug 1191841 crash rate.png

(In reply to Kent James (:rkent) from comment #2)
> This is a Thunderbird top crasher, so we should confirm.

more specifically, #10 crash for 45.1.1.

This would seem to be a regression, because there is a strong uptick in 45.0 betas. (TB crashes exist prior to 45.0, but at a low rate.) Unable to determine when the uptick started because there are no crashes reported at any noticable rate in trunk or aurora. 

I'd say Firefox got a fix, or backed out a fix, in the January time frame that Thunderbird did not, based this graph  https://crash-stats.mozilla.com/signature/?date=%3E2016-01-01&signature=nsHtml5StreamParser%3A%3AWriteStreamBytes#graphs   Crash rate drops after FF44.0b4, but I was unable to find a bug report or checkin that seems likely
https://hg.mozilla.org/mozilla-central/log/tip/parser/html/nsHtml5StreamParser.cpp
https://hg.mozilla.org/releases/mozilla-beta/log/tip/parser/html/nsHtml5StreamParser.cpp

For completeness, may be worth mentioning we picked up bug 653342 in TB45.

Comment 4

2 years ago
(In reply to Kent James (:rkent) from comment #2)
> This is a Thunderbird top crasher, so we should confirm.
> 
> The crash is pretty obvious, uninitialized mUnicodeConverter. In mailnews
> code we would routinely check such things, they tend to be less cautious in
> core mozilla code. We could add a null check to our branch and stop the
> crash if we wanted, or perhaps core would also accept such a check.
> 
> As to the underlying cause, without really understanding this code, the
> missing pointer is initialized in OnStartRequest. We've recently identified
> and fixed a missing OnStartRequest in compose that was related to the
> attachment hang, though that has not landed in esr45. Perhaps this crash
> will go away as well?

Hopefully someone can take the forensics of comment 3 further
status-thunderbird_esr38: --- → wontfix
status-thunderbird_esr45: --- → affected
tracking-thunderbird_esr45: --- → ?
Keywords: regression, regressionwindow-wanted, topcrash-thunderbird
OS: Windows XP → All
Summary: Crash editting draft that contains video clip → Crash forwarding or sending a message, or editting draft that contains video clip
Whiteboard: [regression:TB45?]

Comment 5

2 years ago
So this bug is caused by attaching and receiving (and trying to view a smallish you tube clip)? Maybe I can try to test it in the next few weeks although downloading a largish youtube video clip does not sound nice. I wonder if someone can point to a small youtube video that I can download.
(Reporter)

Comment 6

2 years ago
It appears to be happening if it goes over too many e-mails in the draft folder. If I move that you tube e-mail into a different folder that doesn't have as many e-mails in it, I can then open it up without it crashing.

Comment 7

2 years ago
I noticed that you can download a youtube video clip in

mp4
webm
gpp

or some such encoding format.
(For example, https://www.youtube.com/watch?v=vDiETeBC9W8 )

Given that I see sHtml5StreamParser in comment 1 in the backtrace of the involved DLLs,
I suspect the problem happens with webm data (?).

Can someone confirm if the problem happened with .webm data format video clips only
or it happened with any of the youtube clips?

This problem seems so tricky to re-produce if it can be reproduced in C-C TB at all. (May have been fixed already in C-C. Oh well.)

TIA

Updated

a year ago
Summary: Crash forwarding or sending a message, or editting draft that contains video clip → Crash forwarding or sending a message, or editting draft that contains video clip attachment
(In reply to Paul Lemmon from comment #6)
> It appears to be happening if it goes over too many e-mails in the draft
> folder. If I move that you tube e-mail into a different folder that doesn't
> have as many e-mails in it, I can then open it up without it crashing.

ring any bells?

see also comment 2
Flags: needinfo?(acelists)
Paul, does it also crash for you if you use a nightly build https://archive.mozilla.org/pub/thunderbird/nightly/latest-comm-central/thunderbird-50.0a1.en-US.win32.installer.exe ?    (backup your THunderbird profile before testing this)
Flags: needinfo?(plemmon)

Comment 10

a year ago
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #8)
> ring any bells?

Yes, we should disable videos also in releases :)

Anyway, how many emails is "many"? And how big are those videos? Are they playing while viewing the messages?
Flags: needinfo?(acelists)

Comment 11

a year ago
(In reply to Kent James (:rkent) from comment #2)
> We could add a null check to our branch and stop the
> crash if we wanted, or perhaps core would also accept such a check.
Well, why don't we speak to "core". We're on good terms with Henri and Masatoshi.

Gentlemen, apparently we see a crash here:
https://hg.mozilla.org/releases/mozilla-esr38/annotate/b32d199c886c/parser/html/nsHtml5StreamParser.cpp#l818

Would you accept a null-check there like you have here a few lines above?
https://hg.mozilla.org/releases/mozilla-esr38/annotate/b32d199c886c/parser/html/nsHtml5StreamParser.cpp#l794
Flags: needinfo?(hsivonen)
Flags: needinfo?(VYV03354)
(Assignee)

Comment 12

a year ago
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #1)
>  nsHtml5StreamParser::WriteStreamBytes(unsigned char const*, unsigned int,
> unsigned int*)
> bp-8d50bf0c-7ae3-4508-b69c-af0672150806
>  0 	xul.dll	nsHtml5StreamParser::WriteStreamBytes(unsigned char const*,
> unsigned int, unsigned int*)	parser/html/nsHtml5StreamParser.cpp
> 1 	xul.dll
> nsHtml5StreamParser::
> SetupDecodingAndWriteSniffingBufferAndCurrentSegment(unsigned char const*,
> unsigned int, unsigned int*)	parser/html/nsHtml5StreamParser.cpp

SetupDecodingAndWriteSniffingBufferAndCurrentSegment() does this:
  NS_ASSERTION(IsParserThread(), "Wrong thread!");
  nsresult rv = NS_OK;
  mUnicodeDecoder = EncodingUtils::DecoderForEncoding(mCharset);
  if (mSniffingBuffer) {
    uint32_t writeCount;
    rv = WriteStreamBytes(mSniffingBuffer.get(), mSniffingLength, &writeCount);
    NS_ENSURE_SUCCESS(rv, rv);
    mSniffingBuffer = nullptr;
  }
  mMetaScanner = nullptr;
  if (aFromSegment) {
    rv = WriteStreamBytes(aFromSegment, aCount, aWriteCount);
  }
  return rv;

As you can see, there's an assignment to mUnicodeDecoder that always happens before calling WriteStreamBytes().

How might DecoderForEncoding() return null?

Its documentation says:
  /**
   * Instantiates a decoder for an encoding. The input must be a
   * Gecko-canonical encoding name
   * @param aEncoding a Gecko-canonical encoding name
   * @return a decoder
   */

Let's look at the implementation:
  nsAutoCString contractId(NS_UNICODEDECODER_CONTRACTID_BASE);
  contractId.Append(aEncoding);

  nsCOMPtr<nsIUnicodeDecoder> decoder = do_CreateInstance(contractId.get());
  MOZ_ASSERT(decoder, "Tried to create decoder for unknown encoding.");
  return decoder.forget();

So if used contrary to the documentation, the method fatally asserts. But the assertion is left out in release builds, so the method returns null if used contrary to documentation. (Or, alternatively, if used before uconv has been initialized or after it has been shut down. Bug 1261841 will remove that caveat by deCOMtaminating encoding converters.)

Since this crash isn't seen in Firefox, chances are that mailnews manages to make mCharset contain a string that's not a Gecko-canonical encoding name.

It looks like the documentation of SetDocumentCharset fails to say that it these days takes a Gecko-canonical encoding name instead of a label. Sorry. The only caller in m-c passes a Gecko-canonical name.

Instead of adding a null-check in the parser, I suggest you audit the code path that ends up calling SetDocumentCharset() on the HTML parser to see where a label as opposed to a Gecko-canonical name is passed and fix that.

Using the C++ type system to prevent this problem is bug 919924.
Flags: needinfo?(hsivonen)

Updated

a year ago
Flags: needinfo?(VYV03354)

Comment 13

a year ago
Henri, thanks for the analysis and please excuse the layman questions that follow.

(In reply to Henri Sivonen (:hsivonen) from comment #12)
> Since this crash isn't seen in Firefox, chances are that mailnews manages to
> make mCharset contain a string that's not a Gecko-canonical encoding name.
> 
> It looks like the documentation of SetDocumentCharset fails to say that it
> these days takes a Gecko-canonical encoding name instead of a label. Sorry.
I'm a little confused by SetDocumentCharset(). Looking at DXR, I can see many:
https://dxr.mozilla.org/comm-central/search?q=%3A%3ASetDocumentCharset&redirect=false
Are we talking about nsHtml5Parser::SetDocumentCharset() or another one?

> The only caller in m-c passes a Gecko-canonical name.
Where is that only caller? Without knowing which SetDocumentCharset() we're talking about I can't locate it.
Next questions: What are Gecko-canonical names? Is there a list? Can I test that I am using a valid parameter. What's the difference between a name and a label? To me, it's "UTF-8", "ISO-8859-1", "ISO-2022-JP", "GBK", etc.).

> Instead of adding a null-check in the parser, I suggest you audit the code
> path that ends up calling SetDocumentCharset() on the HTML parser to see
> where a label as opposed to a Gecko-canonical name is passed and fix that.
Sure, once we establish which function to look for or breakpoint on, see question above.
(Without looking at it, I assume when rendering an e-mail, we pass on any charset information contained in the message. Does FF validate the charset information it finds on a webpage?)

Now one thing I don't understand is this: We have a module that crashes at some point if at some earlier point some bad data was passed in. Sure, we can track down where the bad data is passed in, but shouldn't a crash be avoided under all circumstances?
Flags: needinfo?(hsivonen)

Comment 14

a year ago
OK, I did some digging myself.
Starting at mUnicodeDecoder = EncodingUtils::DecoderForEncoding(mCharset); here
https://dxr.mozilla.org/mozilla-central/rev/97a52326b06a07930216ebefa5af333271578904/parser/html/nsHtml5StreamParser.cpp#301

we're talking about mCharset from here:
https://dxr.mozilla.org/mozilla-central/rev/97a52326b06a07930216ebefa5af333271578904/parser/html/nsHtml5StreamParser.h#428

and SetDocumentCharset() (inline!) from here:
https://dxr.mozilla.org/mozilla-central/rev/97a52326b06a07930216ebefa5af333271578904/parser/html/nsHtml5StreamParser.h#159
So nsHtml5StreamParser::SetDocumentCharset().

Breaking on this, I get this stack:
nsHtml5StreamParser::SetDocumentCharset(const nsACString_internal & aCharset, int aSource) Line 160	C++
nsHtml5Parser::SetDocumentCharset(const nsACString_internal & aCharset, int aCharsetSource) Line 105	C++
nsHTMLDocument::StartDocumentLoad(const char * aCommand, nsIChannel * aChannel, nsILoadGroup * aLoadGroup, nsISupports * aContainer, nsIStreamListener * * aDocListener, bool aReset, nsIContentSink * aSink) Line 765	C++
nsContentDLF::CreateDocument(const char * aCommand, nsIChannel * aChannel, nsILoadGroup * aLoadGroup, nsIDocShell * aContainer, const nsID & aDocumentCID, nsIStreamListener * * aDocListener, nsIContentViewer * * aContentViewer) Line 375	C++
nsContentDLF::CreateInstance(const char * aCommand, nsIChannel * aChannel, nsILoadGroup * aLoadGroup, const nsACString_internal & aContentType, nsIDocShell * aContainer, nsISupports * aExtraInfo, nsIStreamListener * * aDocListener, nsIContentViewer * * aDocViewer) Line 183	C++
mozilla::mailnews::MailNewsDLF::CreateInstance(const char * aCommand, nsIChannel * aChannel, nsILoadGroup * aLoadGroup, const nsACString_internal & aContentType, nsIDocShell * aContainer, nsISupports * aExtraInfo, nsIStreamListener * * aDocListener, nsIContentViewer * * aDocViewer) Line 69	C++
nsDocShell::NewContentViewerObj(const nsACString_internal & aContentType, nsIRequest * aRequest, nsILoadGroup * aLoadGroup, nsIStreamListener * * aContentHandler, nsIContentViewer * * aViewer) Line 9225	C++
nsDocShell::CreateContentViewer(const nsACString_internal & aContentType, nsIRequest * aRequest, nsIStreamListener * * aContentHandler) Line 9026	C++
nsDSURIContentListener::DoContent(const nsACString_internal & aContentType, bool aIsContentPreferred, nsIRequest * aRequest, nsIStreamListener * * aContentHandler, bool * aAbortProcess) Line 129	C++
nsDocumentOpenInfo::TryContentListener(nsIURIContentListener * aListener, nsIChannel * aChannel) Line 720	C++
nsDocumentOpenInfo::DispatchContent(nsIRequest * request, nsISupports * aCtxt) Line 398	C++
nsDocumentOpenInfo::OnStartRequest(nsIRequest * request, nsISupports * aCtxt) Line 259	C++
nsMsgProtocol::OnStartRequest(nsIRequest * request, nsISupports * ctxt) Line 314	C++
nsMailboxProtocol::OnStartRequest(nsIRequest * request, nsISupports * ctxt) Line 204	C++

So for all I can see, no charset is passed into M-C from anywhere in Mailnews, the SetDocumentCharset() is called from within the HTML5 parser. The only call in the stack comes from here:
MailNewsDLF::CreateInstance():
https://dxr.mozilla.org/comm-central/rev/d1960937860dbe784e21b43ce136253fc70e77e2/mailnews/base/src/MailNewsDLF.cpp#69

I noted that nsHtml5StreamParser::SetDocumentCharset() doesn't do any checking on the value passed it, it blindly assigns it to mCharset.

So what's next?
(Assignee)

Comment 15

a year ago
(In reply to Jorg K (GMT+2, PTO during summer) from comment #13)
> Henri, thanks for the analysis and please excuse the layman questions that
> follow.
> 
> (In reply to Henri Sivonen (:hsivonen) from comment #12)
> > Since this crash isn't seen in Firefox, chances are that mailnews manages to
> > make mCharset contain a string that's not a Gecko-canonical encoding name.
> > 
> > It looks like the documentation of SetDocumentCharset fails to say that it
> > these days takes a Gecko-canonical encoding name instead of a label. Sorry.
> I'm a little confused by SetDocumentCharset(). Looking at DXR, I can see
> many:
> https://dxr.mozilla.org/comm-central/
> search?q=%3A%3ASetDocumentCharset&redirect=false
> Are we talking about nsHtml5Parser::SetDocumentCharset() or another one?

We are talking about nsHtml5Parser::SetDocumentCharset() which may appears as nsIParser::SetDocumentCharset() to the caller.

> > The only caller in m-c passes a Gecko-canonical name.
> Where is that only caller?

nsHTMLDocument::StartDocumentLoad(). Well, at least in the form it runs in m-c. There is a mailnews-only bit there, which is likely the problem here.

> What are Gecko-canonical names?

The kind of name that you can append to NS_UNICODEDECODER_CONTRACTID_BASE so that the result is a valid contract id. For encodings that are in the Encoding Standard, this is the same as the concept _name_ in the Encoding Standard, except for GBK, whose Encoding Standard name is GBK but whose Gecko-canonical name is, for the time being, gbk (bug 1227006).

> Is there a list?

https://dxr.mozilla.org/mozilla-central/source/intl/uconv/nsUConvModule.cpp#125

Plus the ones that c-c adds.

> Can I test
> that I am using a valid parameter.

For now:
  nsAutoCString contractId(NS_UNICODEDECODER_CONTRACTID_BASE);
  contractId.Append(aEncoding);
  nsCOMPtr<nsIUnicodeDecoder> decoder = do_CreateInstance(contractId.get());

If decoder is not null, aEncoding was a Gecko-canonical name.

Bug 1261841 will change this.

> What's the difference between a name and
> a label? To me, it's "UTF-8", "ISO-8859-1", "ISO-2022-JP", "GBK", etc.).

Each encoding has exactly one name which is case-sensitive and doesn't allow spaces. A label is an ASCII-case-insensitive potentially whitespace-padded string used in a protocol to designate an encoding.

In a better-designed code base, the places where we use names to designate encodings would use either enum values or memory addresses to singleton objects. However, labels have to be strings, because they are text extracted from protocol text.

"UTF-8" is a Gecko-canonical (and Encoding Standard) name. "utf-8" or "  UTF-8  " are not.

"UTF-8", "utf-8", "UtF-8", "     UTF-8    " and "unicode-1-1-utf-8" are all labels for UTF-8.

The name of an encoding is also one of its labels, except in the case of the replacement encoding whose name is not one of its labels.

> (Without looking at it, I assume when rendering an e-mail, we pass on any
> charset information contained in the message. Does FF validate the charset
> information it finds on a webpage?)

Yes. The thing that's in MIME or HTTP protocol text is a label. To go from a label to a name in the email context, use nsICharsetConverterManager::GetCharsetAlias(). (In the Web context, the method to use is mozilla::dom::EncodingUtils::FindEncodingForLabel().)

(Bug 1261841 will make Gecko no longer accept email-only names, though, so you'll need to make sure UTF-7 is handled by other means than by passing the name to m-c code. I'm pretty sure it already is, though.)

> Sure, we
> can track down where the bad data is passed in, but shouldn't a crash be
> avoided under all circumstances?

No, because the API where the bad data comes in is an internal API (but c-c calls internal APIs), but we could make the crash a release assertion to highlight that bad data has been passed in.
Flags: needinfo?(hsivonen)

Comment 16

a year ago
Henri, thank you for your continued support and patient explanations.

If I understand it correctly, your argument is that C-C calls nsHtml5Parser::SetDocumentCharset() incorrectly.

Well, there is no such call:
https://dxr.mozilla.org/comm-central/search?q=SetDocumentCharset&redirect=false

C-C has a function nsMsgCompose::SetDocumentCharset() which gets called within C-C, and nsMessenger::SetDocumentCharset() which is called from JS via messenger.setDocumentCharset(...);

I don't see any calls to nsHtml5Parser::SetDocumentCharset().

Looking at the stack trace in comment #14 we see:
nsHtml5StreamParser::SetDocumentCharset() Line 160 C++
nsHtml5Parser::SetDocumentCharset() Line 105 C++
nsHTMLDocument::StartDocumentLoad() Line 765 C++
nsContentDLF::CreateDocument() Line 375 C++
nsContentDLF::CreateInstance() Line 183 C++
mozilla::mailnews::MailNewsDLF::CreateInstance() Line 69 C++  <=== C-C calls M-C

As per your comment #15, nsHTMLDocument::StartDocumentLoad() calls nsHtml5Parser::SetDocumentCharset().

You said in comment #12:
===
I suggest you audit the code path that ends up calling SetDocumentCharset() on the HTML parser to see where a label as opposed to a Gecko-canonical name is passed and fix that.
===
I did and I conclude that C-C cannot be the culprit here. What am I missing?

In comment #15 you pointed to the bits labelled "mailnews" in M-C:
In nsHTMLDocument::StartDocumentLoad():
https://dxr.mozilla.org/mozilla-central/rev/f97a056ae6235de7855fd8aaa04fb1c8d183bd06/dom/html/nsHTMLDocument.cpp#708

In nsHTMLDocument::TryUserForcedCharset():
https://dxr.mozilla.org/mozilla-central/rev/f97a056ae6235de7855fd8aaa04fb1c8d183bd06/dom/html/nsHTMLDocument.cpp#323

Do you know what they are about? Looking at blame:
https://hg.mozilla.org/mozilla-central/annotate/f97a056ae6235de7855fd8aaa04fb1c8d183bd06/dom/html/nsHTMLDocument.cpp#l706
doesn't give me any clues.

How do we proceed?
Flags: needinfo?(hsivonen)
(Assignee)

Comment 17

a year ago
(In reply to Jorg K (GMT+2, PTO during summer) from comment #16)
> https://dxr.mozilla.org/mozilla-central/rev/
> f97a056ae6235de7855fd8aaa04fb1c8d183bd06/dom/html/nsHTMLDocument.cpp#708

That one looks OK. c-c setters all use "UTF-8":
https://dxr.mozilla.org/comm-central/search?q=%2Bcallers%3A%22nsMessenger%3A%3ASetDisplayCharset%28const+nsACString_internal+%26%29%22

> In nsHTMLDocument::TryUserForcedCharset():
> https://dxr.mozilla.org/mozilla-central/rev/
> f97a056ae6235de7855fd8aaa04fb1c8d183bd06/dom/html/nsHTMLDocument.cpp#323

The only caller in c-c passes "UTF-8":
https://dxr.mozilla.org/comm-central/source/mailnews/base/src/nsMsgPrintEngine.cpp#518

> How do we proceed?

I'd double-check that the above callers are the only c-c setters of "hint" or "force" charset and not just quirks of DXR hiding stuff.

I don't have a good suggestion for the step after that at this time. I will need to think about it.
Flags: needinfo?(hsivonen)

Comment 18

a year ago
I'm not sure why you were looking at SetDisplayCharset() now. It calls SetHintCharacterSet().
Also not sure why you were looking at SetForceCharacterSet() (in nsMsgPrintEngine.cpp).

For the latter please note that is it called here when the compose window starts, as the bug summary says (forwarding or editing draft).

https://dxr.mozilla.org/comm-central/rev/bde9f7838d8c5003a27b961104c32b00c9109da5/editor/ui/composer/content/editingOverlay.js#193

https://dxr.mozilla.org/comm-central/rev/f97a056ae6235de7855fd8aaa04fb1c8d183bd06/mozilla/docshell/base/nsIContentViewer.idl#213

They are poorly documented, should they receive canonical names or labels?

Could that be the cause?
Flags: needinfo?(hsivonen)

Comment 19

a year ago
If I see this correctly, C-C passes in any old rubbish and that gets picked up here without any checking:
https://dxr.mozilla.org/mozilla-central/rev/f97a056ae6235de7855fd8aaa04fb1c8d183bd06/layout/base/nsDocumentViewer.cpp#3169
and then used:
https://dxr.mozilla.org/mozilla-central/rev/f97a056ae6235de7855fd8aaa04fb1c8d183bd06/dom/html/nsHTMLDocument.cpp#324

Is that it? SetForceCharacterSet() expects a "good" canonical name but gets anything and doesn't check it?

I suggest we beef up SetForceCharacterSet() to get the canonical name from what is passed in, and if that fails, return an error or throw or make it known otherwise that something is wrong.

Comment 20

a year ago
Hmm, this
https://dxr.mozilla.org/comm-central/rev/bde9f7838d8c5003a27b961104c32b00c9109da5/editor/ui/composer/content/editingOverlay.js#193
doesn't get called in TB (SM only), but this one does:

https://dxr.mozilla.org/comm-central/rev/bde9f7838d8c5003a27b961104c32b00c9109da5/mailnews/compose/src/nsMsgCompose.cpp#1618

I'm not sure under what condition this is run.

Comment 21

a year ago
OK, I found it.

I artificially pass in "WiNdOwS-1252" here:
https://dxr.mozilla.org/comm-central/rev/bde9f7838d8c5003a27b961104c32b00c9109da5/mailnews/compose/src/nsMsgCompose.cpp#1618

Result in debug mode:
MOZ_ASSERT(decoder, "Tried to create decoder for unknown encoding.");

In release mode it would have crashed later.

So do I fix this caller to turn a label into a canonic name like you indicated using
nsICharsetConverterManager::GetCharsetAlias()
or will you fix the callee?

Comment 22

a year ago
Created attachment 8783337 [details] [diff] [review]
Propose fix (v1).

I'm not sure you're a Mailnews reviewer, but I guess you know more about this stuff then most of the Mailnews peers ;-)
Assignee: nobody → jorgk
Status: NEW → ASSIGNED
Flags: needinfo?(plemmon)
Flags: needinfo?(hsivonen)
Attachment #8783337 - Flags: review?(hsivonen)

Comment 23

a year ago
Created attachment 8783355 [details] [diff] [review]
Proposed fix (v1b).

Added comment and changed reviewer to Kent.
Attachment #8783337 - Attachment is obsolete: true
Attachment #8783337 - Flags: review?(hsivonen)
Attachment #8783355 - Flags: review?(rkent)
Attachment #8783355 - Flags: feedback?(hsivonen)

Comment 24

a year ago
Comment on attachment 8783355 [details] [diff] [review]
Proposed fix (v1b).

The m-c methods should really reject invalid charsets in the entrypoint functions (like SetForceCharacterSet). Or is it intentional that *Force* means let the caller set anything even if invalid with unexpected results?

Comment 25

a year ago
Well, looks like SetForceCharacterSet() only accepts Gecko-canonical names. This is undocumented and if you don't pass one of these, you get a crash later on. Not ideal IMHO, but we should really pass the right string, since for example "utf-8" or "Windows-1252" is simply not understood and won't work.

Comment 26

a year ago
Yes, we should of course fix our passed value.
But a proper interface should reject invalid values instead of crashing on them.

Comment 27

a year ago
BTW, no regression here. Maybe M-C behaviour changed at some point, but the C-C part was always wrong.
Keywords: regression, regressionwindow-wanted
We (m-c) can't prevent add-ons from passing an invalid value until we drop support for XUL add-ons.

We should either
1. add a parameter check to make sure non-canonical values are not sneaking in, or
2. add an assertion, make nsIContentViewer::forceCharset [noscript], and declare that this is an internal method and it is caller's responsibility to validate the parameter.

But this is out of scope of this bug.
(Assignee)

Comment 29

a year ago
Created attachment 8783581 [details] [diff] [review]
Validate the charset argument of scriptable methods of nsIContentViewer

(In reply to Jorg K (GMT+2, PTO during summer) from comment #20)
> Hmm, this
> https://dxr.mozilla.org/comm-central/rev/
> bde9f7838d8c5003a27b961104c32b00c9109da5/editor/ui/composer/content/
> editingOverlay.js#193
> doesn't get called in TB (SM only), but this one does:
> 
> https://dxr.mozilla.org/comm-central/rev/
> bde9f7838d8c5003a27b961104c32b00c9109da5/mailnews/compose/src/nsMsgCompose.
> cpp#1618
> 
> I'm not sure under what condition this is run.

My failure to locate these yesterday makes me distrustful of DXR search (or my ability to use DXR search). :-(

(In reply to Masatoshi Kimura [:emk] from comment #28)
> We (m-c) can't prevent add-ons from passing an invalid value until we drop
> support for XUL add-ons.

Good point. Indeed, since this stuff is scriptable and XPCOM add-ons aren't gone yet, we should validate the input in m-c.

The effect this has on c-c is:
 1) The methods can now return NS_ERROR_INVALID_ARG if the argument is none of a) the empty string, b) a label, or c) a Gecko-canonical name.
 2) Since the above is checked in a Web-centric way, UTF-7 is treated as NS_ERROR_INVALID_ARG.

(In reply to Jorg K (GMT+2, PTO during summer) from comment #21)
> So do I fix this caller to turn a label into a canonic name like you
> indicated using
> nsICharsetConverterManager::GetCharsetAlias()
> or will you fix the callee?

I'm fixing the callee. But note that now passing "UTF-7" no longer works. (Not sure if it really worked before.)
Attachment #8783581 - Flags: review?(VYV03354)
(Assignee)

Comment 30

a year ago
Comment on attachment 8783355 [details] [diff] [review]
Proposed fix (v1b).

I suggest working backwards to wherever msgCharSet is extracted from MIME and mapping whatever is extracted from MIME to a canonical name as early as possible.
Attachment #8783355 - Flags: feedback?(hsivonen) → feedback-

Comment 31

a year ago
(In reply to Henri Sivonen (:hsivonen) from comment #30)
> I suggest working backwards to wherever msgCharSet is extracted from MIME
> and mapping whatever is extracted from MIME to a canonical name as early as
> possible.
So that would be about here:
https://dxr.mozilla.org/comm-central/rev/02115be839cfcd0ba3b44f7fe480974496f0a86d/mailnews/compose/src/nsMsgCompose.cpp#1606
a few lines before.

There are a few more places that need inspection then:
https://dxr.mozilla.org/comm-central/search?q=m_compFields-%3EGetCharacterSet&redirect=false

Comment 32

a year ago
Comment on attachment 8783355 [details] [diff] [review]
Proposed fix (v1b).

I'll raise a new bug for the TB work. With a patch for M-C attached, I'll move this to Core::Layout.
Attachment #8783355 - Attachment is obsolete: true
Attachment #8783355 - Flags: review?(rkent)

Updated

a year ago
Component: Message Compose Window → Layout
Product: Thunderbird → Core
Version: 38 Branch → Trunk

Updated

a year ago
Assignee: jorgk → hsivonen

Updated

a year ago
Summary: Crash forwarding or sending a message, or editting draft that contains video clip attachment → Calling SetForceCharacterSet() not using a Geck-canonical name leads to crash (was: Thunderbird: Crash forwarding or sending a message, or editing draft ...)
Whiteboard: [regression:TB45?]

Comment 33

a year ago
Raised bug 1297118 for the TB work.
Attachment #8783581 - Flags: review?(VYV03354) → review+
(Assignee)

Updated

a year ago
Component: Layout → Document Navigation
Summary: Calling SetForceCharacterSet() not using a Geck-canonical name leads to crash (was: Thunderbird: Crash forwarding or sending a message, or editing draft ...) → Calling SetForceCharacterSet() not using a Gecko-canonical name leads to crash (was: Thunderbird: Crash forwarding or sending a message, or editing draft ...)
(Assignee)

Comment 34

a year ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/e07157e29deedd1b2bda60c62f307503c558844f
Bug 1191841 - Perform label resolution in scriptable charset setters of nsIContentViewer. r=emk.

Comment 35

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/e07157e29dee
Status: ASSIGNED → RESOLVED
Last Resolved: a year ago
status-firefox51: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla51

Updated

a year ago
Depends on: 1297761
Thunderbird 51 beta will be the first beta with this, so let's wait until TB 45.6.0 to consider uplifting this to our version branch.
(In reply to Kent James (:rkent) from comment #36)
> Thunderbird 51 beta will be the first beta with this, so let's wait until TB
> 45.6.0 to consider uplifting this to our version branch.

For the record, we didn't get TB 51 beta out yet so this didn't get into 45.6.0. But hopefully 45.7.0!

Comment 38

11 months ago
Pushed to THUNDERBIRD_45_VERBRANCH https://hg.mozilla.org/releases/mozilla-esr45/rev/83ee87b301bd
status-thunderbird_esr45: affected → fixed
tracking-thunderbird_esr45: ? → 51+
You need to log in before you can comment on or make changes to this bug.