579870 - crash [@ normalize_character ] looks like a startup crash

Reporter

Description

•

15 years ago

0 thunderbird-bin normalize_character mailnews/extensions/fts3/src/Normalize.c:1960 1 thunderbird-bin isDelim mailnews/extensions/fts3/src/fts3_porter.c:849 2 thunderbird-bin porterNext mailnews/extensions/fts3/src/fts3_porter.c:1021 3 libsqlite3.dylib fts3PendingTermsAdd db/sqlite3/src/sqlite3.c:104031 4 libsqlite3.dylib fts3UpdateMethod db/sqlite3/src/sqlite3.c:104109 5 libsqlite3.dylib sqlite3Step db/sqlite3/src/sqlite3.c:57829 6 libsqlite3.dylib sqlite3_step db/sqlite3/src/sqlite3.c:50662 7 thunderbird-bin mozilla::storage::AsyncExecuteStatements::executeStatement storage-backport/src/mozStorageAsyncStatementExecution.cpp:345 8 thunderbird-bin mozilla::storage::AsyncExecuteStatements::executeAndProcessStatement storage-backport/src/mozStorageAsyncStatementExecution.cpp:292 9 thunderbird-bin mozilla::storage::AsyncExecuteStatements::bindExecuteAndProcessStatement storage-backport/src/mozStorageAsyncStatementExecution.cpp:274 10 thunderbird-bin mozilla::storage::AsyncExecuteStatements::Run storage-backport/src/mozStorageAsyncStatementExecution.cpp:597 11 libxpcom_core.dylib nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:527 12 libxpcom_core.dylib NS_ProcessNextEvent_P nsThreadUtils.cpp:250 13 libxpcom_core.dylib nsThread::ThreadFunc xpcom/threads/nsThread.cpp:254 14 libnspr4.dylib _pt_root nsprpub/pr/src/pthreads/ptthread.c:228 15 libSystem.B.dylib _pthread_start 16 libSystem.B.dylib thread_start

timeless

Comment 1

•

15 years ago

Signature normalize_character UUID 102f276b-6067-48a2-b4bb-017c62100712 Time 2010-07-12 06:08:43.481904 Uptime 18 Last Crash 4355 seconds (1.2 hours) before submission Install Age 25 seconds since version was first installed. Product Thunderbird Version 3.1.1 Build ID 20100701181850 Branch 1.9.2 OS Mac OS X OS Version 10.5.8 9L31a CPU x86 CPU Info GenuineIntel family 6 model 23 stepping 10 Crash Reason EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE Crash Address 0x1 User Comments Processor Notes EMCheckCompatibility False

Blocks: 525537

Ludovic Hirlimann [:Usul]

Reporter

Comment 2

•

15 years ago

Makoto San can you look into it ?

Makoto Kato [:m_kato]

Assignee

Updated

•

15 years ago

Assignee: nobody → m_kato

Makoto Kato [:m_kato]

Assignee

Comment 3

•

15 years ago

Attached patch fix — Details — Splinter Review

Makoto Kato [:m_kato]

Assignee

Updated

•

15 years ago

Attachment #458606 - Flags: review?(bugmail)

Ludovic Hirlimann [:Usul]

Reporter

Comment 4

•

15 years ago

Requesting this for 3.1.2 ...

status-thunderbird3.1: --- → ?

Wayne Mery (:wsmwk)

Comment 5

•

15 years ago

regression per crash-stats. #2 Mac crash for 3.1 http://bit.ly/b8uBKX ... although it doesn't appear to be severe in comparison to #1 crash. but does it hit some intl users hard? is patch needed before upgrading people from 3.0

Keywords: regression

Andrew Sutherland [:asuth] (he/him)

Comment 6

•

15 years ago

Comment on attachment 458606 [details] [diff] [review] fix Thanks for the quick investigation and fix!

Attachment #458606 - Flags: review?(bugmail) → review+

Andrew Sutherland [:asuth] (he/him)

Comment 7

•

15 years ago

From a prioritization perspective, if we consider malicious attacks, this is a very very bad bug. It is not a security problem, as such, because it is not exploitable. It is almost certain to result in a crash because we read off the end of an array and then dereference what we find there. The array we are reading off the end of is not dynamically allocated, and whatever is stored in the next word will always be stored there. (Pretty easy to figure that out but it's not important as it appears to just end up being 0x1.) The danger character is: 0x10000 LINEAR B SYLLABLE B008 A You can see it in here: http://www.unicode.org/charts/PDF/U10000.pdf (We obviously do not want to paste it into this bug because then it will kill your thunderbird.) I doubt the character comes up much on its own intentionally, but would not be surprised for mis-encoded messages to result in its appearance. I would probably vote for making sure we spin a release with this fix before upgrading people from 2.0, as they are even more likely to have old messages with serious encoding prolems.

Group: mozilla-confidential

Status: NEW → ASSIGNED

Andrew Sutherland [:asuth] (he/him)

Updated

•

15 years ago

Attachment #458606 - Flags: approval-thunderbird3.1.2?

David Ascher (:davida)

Updated

•

15 years ago

Group: mozilla-confidential → core-security

David Ascher (:davida)

Comment 8

•

15 years ago

Note: this is a DoS bug.

David :Bienvenu

Comment 9

•

15 years ago

If this truly is frequently a startup or near startup crash, then we should fix it before upgrading users to 3.1.x or the users who see this will get stranded... Is this something that could be avoided by marking a message as potentially bad before trying to index it, so that if we crash, next time we startup, we don't try to index it immediately?

Andrew Sutherland [:asuth] (he/him)

Comment 10

•

15 years ago

(In reply to comment #9) > Is this something that could be avoided by marking a message as potentially bad > before trying to index it, so that if we crash, next time we startup, we don't > try to index it immediately? I had ruled out this particular option as too expensive in the steady state with the assumption we would quickly fix crashers. Specifically, we would need to close out a SQLite transaction or commit the mork database before we index every single message. Although we would not technically need to wait for an fsync to happen, I think in both cases the data sources would effectively trigger fsyncs, slowing gloda down by at least an order of magnitude. Obviously, that assumption sort of glosses over the whole "gloda likes to try and index things it has not indexed at startup". It definitely is not too expensive to have gloda, during its initial indexing sweep, put a file on disk that says "yo yo yo, I'm doing the initial indexing sweep". If we start up and that file is still there, we could defer performing the indexing sweep (or perhaps any event driven indexing) until we complete a "is there a new version of thunderbird available?" check and verify that it has run to completion. Which is, if there is an update, we wait for it to be installed. If there is not an update, then maybe we say, "what the hay" and go index some more. Or maybe we go into a paranoid mode of indexing where we do pay the fsync piper. (Or we pay it on a per-folder basis initially and then next time we just skip that folder or something.) The reality of the situation is that this is probably the last/only crasher bug that the tokenizer will have until it grows some more features. Which means this would really be protection against libmime badness, which may not be a bad thing to protect against. We are limited in what we can do without adding more strings in terms of explaining to the user why gloda has decided to stop indexing certain things.

Jesse Ruderman

Updated

•

15 years ago

Whiteboard: [sg:low (persistent DoS)]

David :Bienvenu

Comment 11

•

15 years ago

Yes, I was mainly thinking about the various libmime crashes we've encountered over the years. I'm sure there are lots of relatively cheap ways to figure out that we didn't shutdown successfully. I really like the idea of gloda going into paranoid mode of indexing when that is detected, since that gives us some breathing room until we can get an update out, and helps user retention.

Andrew Sutherland [:asuth] (he/him)

Updated

•

15 years ago

status-thunderbird3.0: --- → unaffected

Andrew Sutherland [:asuth] (he/him)

Comment 12

•

15 years ago

Okay, so after some investigation of the crash reporting mechanism, it appears like we have two good ways to figure out when things have gone wrong in general: a) We see crash reports in "Crash Reports/{pending,submitted}" and their time-stamp is within some time window. (A few hours?) b) We create a file when we start a gloda indexing sweep and delete it when we complete the indexing sweep. (We may also want to cover event-driven indexing...) This should detect both infinite loops as well as varieties of crashes. We also have a middling-cost mechanism for figuring out what specifically betrayed us: c) Open a file using an unbuffered writer, write the folder and possibly message key of the messages we are indexing as we do that. This should not trigger an fsync but ensure that the OS has the bytes should Thunderbird die. If Thunderbird manages to crash the OS, everyone is out of luck. Since this will result in actual I/O traffic, we do not want to be logging a ton of data. This will result in accumulated disk usage, but we can keep track of that and just truncate the file periodically when the limit is exceeded. Our available mitigation strategies are: 1) Defer gloda indexing for some time duration... 1a) of all messages (sweep and event-driven). 1b) sweep-style; so new (event-driven) messages still get indexed. 1c) of messages outside of the inbox. 1d) of messages in folders we think have betrayed us. 2) Mark messages as deadly. This requires that we were writing the message key (with folder) as we indexed messages. My candidate strategy is: - Always create a file that indicates we are performing an indexing sweep when we are performing an indexing sweep. Write the folder url's unbuffered as we do this. (We could write the id's to be more concise, but the url's are more useful for debugging.) - When we initialize gloda, do the crash check and the indexing sweep check. If either indicates something suspicious happened, crank up our paranoia file output so that every time we index a message (sweep or event-driven) we write the message key (and the folder is in there, but not redundantly). Also, set a preference that tells us to be paranoid and for how long. (Specifically, two pereferences; one that says to be paranoid and one that gives a timestamp for when we should stop being paranoid.) We will also check this and possibly clear the paranoia status. - If our indexing log contains message information, mark the message that killed us as deadly. - If our indexing log contains only folder information, mark the folder that was being indexed at our death as suspicious with an in-memory (non-persisted) flag. - If we believe a recent crash/indexing failure occurred, then defer all gloda indexing for 15 minutes. Once the timer elapses, allow gloda to perform an indexing sweep, but staying out of any suspicious folders. After another 15 minutes, clear the suspicious bits and trigger another indexing sweep. The net results are: - You can always use Thunderbird for at least 15 minutes before it explodes. An active DoS via malicious e-mails would be 15 minutes. A single folder full of infinitely many deadly messages would be 30 minutes. Two folders full of infinitely many deadly messages would be 15 minutes because we will never accumulate enough knowledge to stay out of both of them. - If someone is killing Thunderbird because of gloda (for legit or confused reasons), this will cause gloda to get out of their way for a while. - After the first explosion, every new explosion at least lets us mark deadly messages as deadly. - If the mechanism goes haywire and always activates itself, the side-effect is a limited slowdown to indexing, some approximately fixed disk usage bloat, and delayed gloda responsiveness. The main downside is that my proposal is fairly complex. I think the various bits can be made sufficiently orthogonal that it won't be a nightmare, at least. Feedback appreciated.

Mark Banner (:standard8)

Comment 13

•

15 years ago

So if we do an out-of-sync respin to enable us to fully push out the major update to 3.1.1, are we taking just the crash fix or the gloda robustness fix as well? I would suggest taking just the crash fix, if so, please can we move the discussion on gloda robustness to another bug? - which we probably want to do anyway.

Ludovic Hirlimann [:Usul]

Reporter

Comment 14

•

15 years ago

(In reply to comment #13) > I would suggest taking just the crash fix, if so, please can we move the > discussion on gloda robustness to another bug? - which we probably want to do > anyway. And it would allow more input than from the people cced on this bug and people with bugzilla's security bugs. (In reply to comment #13) > So if we do an out-of-sync respin to enable us to fully push out the major > update to 3.1.1, are we taking just the crash fix or the gloda robustness fix > as well? we should just get that one.

David :Bienvenu

Comment 15

•

15 years ago

The candidate strategy looks reasonable to me. Implied, I assume, is that we ignore deadly messages going forward, at least until the app version changes (perhaps the deadly marking can include the current app version). Does gloda stream more than one message at a time? If so, that complicates the task of figuring out which message is bad, as does the fact that the storage statements execute on a different thread. Or do we make sure to only index one message at a time, soup to nuts, when in paranoid mode?

Andrew Sutherland [:asuth] (he/him)

Comment 16

•

15 years ago

(In reply to comment #15) > The candidate strategy looks reasonable to me. Implied, I assume, is that we > ignore deadly messages going forward, at least until the app version changes > (perhaps the deadly marking can include the current app version). Right. Since we have 31 usable sentinel gloda id values (the first valid one is 32 and 0 means "does not have a gloda id"), we would just change what stands for deadly in the next rev. > Does gloda stream more than one message at a time? If so, that complicates the > task of figuring out which message is bad, as does the fact that the storage > statements execute on a different thread. Or do we make sure to only index one > message at a time, soup to nuts, when in paranoid mode? Gloda does not stream more than one message. The message indexing process undergoes a de facto synchronization with the async database thread when it issues a SELECT to try and see if the message is already indexed (and conversation lookup) using message-id's. As long as we write to the descriptor after we get that back and before we stream the message, we should be fine.

Andrew Sutherland [:asuth] (he/him)

Comment 17

•

15 years ago

(In reply to comment #13) > So if we do an out-of-sync respin to enable us to fully push out the major > update to 3.1.1, are we taking just the crash fix or the gloda robustness fix > as well? Realistically, I think just the crash fix given the current timeline. The flipside is that it doesn't make much sense to pursue a gloda paranoia strategy if we're not going to ship it imminently. Specifically, implementing a pure JS mime parser that meets gloda's needs is in the same ballpark of effort but would eliminate crash potential entirely, provide performance gains (at least when JITing works if not always because of the XPConnect call reduction), and be useful to the raindrop team. (Use of the phrase "meets gloda's needs" is significant in reducing the level of effort. We just want the headers, structure, and body bits as blobs. We do not need to resolve references from HTML subparts to other subparts or otherwise look inside the content blobs. We already post-process the output of libmime to convert HTML to text, and that is the only time we look in the HTML at all.)

Mark Banner (:standard8)

Updated

•

15 years ago

blocking-thunderbird3.1: --- → .2+

status-thunderbird3.1: ? → ---

Mark Banner (:standard8)

Updated

•

15 years ago

Attachment #458606 - Flags: approval-thunderbird3.1.2? → approval-thunderbird3.1.2+

Mark Banner (:standard8)

Comment 18

•

15 years ago

Checked in on trunk and comm-1.9.2, and relbranch for 3.1.2: http://hg.mozilla.org/comm-central/rev/25c94f70ecc4 http://hg.mozilla.org/releases/comm-1.9.2/rev/32877904325c http://hg.mozilla.org/releases/comm-1.9.2/rev/822aba7e0789 (apologies for getting patch attribution wrong on the comm-1.9.2 repository).

Status: ASSIGNED → RESOLVED

Closed: 15 years ago

status-thunderbird3.1: --- → .2-fixed

Resolution: --- → FIXED

Target Milestone: --- → Thunderbird 3.2a1

Nobody; OK to take it and work on it

Updated

•

14 years ago

Crash Signature: [@ normalize_character ]

Daniel Veditz [:dveditz]

Updated

•

14 years ago

Group: core-security