Closed Bug 543737 Opened 14 years ago Closed 14 years ago

gloda needs to avoid indexing ridiculously large message bodies (especially from offline store corruption)

Tracking

(blocking-thunderbird3.1 rc1+, thunderbird3.1 rc1-fixed)

Status:

RESOLVED FIXED

Milestone:

Thunderbird 3.1rc1

Tracking Flags:

Tracking

Status

blocking-thunderbird3.1

---

rc1+

thunderbird3.1

---

rc1-fixed

People

(Reporter: gaston, Assigned: asuth)

Details

(Keywords: perf, Whiteboard: [gloda key])

Attachments

(2 files, 2 obsolete files)

a body has limits 14 years ago Andrew Sutherland [:asuth] (he/him) 20.83 KB, patch		Details \| Diff \| Splinter Review
v2 a body has limits 14 years ago Andrew Sutherland [:asuth] (he/him) 22.65 KB, patch		Details \| Diff \| Splinter Review
log of error 14 years ago David :Bienvenu 26.15 KB, text/plain		Details
v3 a body has limits 14 years ago Andrew Sutherland [:asuth] (he/him) 23.84 KB, patch	Bienvenu : review+	Details \| Diff \| Splinter Review

Landry Breuil (:gaston)

Reporter

Description

•

14 years ago

User-Agent:       Midori/0.2.2 (X11; OpenBSD amd64; U; fr-fr) WebKit/531.2+
Build Identifier: Mozilla/5.0 (X11; U; OpenBSD amd64; en-US; rv:1.9.1.7) Gecko/20100129 Lightning/1.0b1 Thunderbird/3.0.1

I'm working on the port of Thunderbird 3.0.1 for OpenBSD, we currently ship 2.0.0.23. The 3.0.1 build is with the embedded sqlite 3.6.20, i also tried with systemwide sqlite 3.6.16 & 3.6.22, both with SQLITE_ENABLE_FTS3 set.

I have 3 imap accounts configured in TB2, with around 5000 messages total. On the server, total Maildir size for all accounts is around 2Gb, and the local cache in ImapMail is around 500M (i didn't browse to each and every dir in each account).
After getting TB3 to build successfully, it started migrating my profile (ie syncing all mails from the server for offline use) and indexing all of them.

Rough values : it indexes approx 1 message every 10 secs, the machine is _totally_ unusable, i stopped the indexing after two hours and the sqlite index is now around 5.6Gb.

Slow indexing, i can understand that, but the index size is just crazy. vacuuming it didn't help btw.

528M    .thunderbird/ci580w50.default/ImapMail                                                                                             
5.6G    .thunderbird/ci580w50.default/global-messages-db.sqlite                                                                            
                                                                                                                                           
echo 'select count(*) from messages;' | sqlite3 .thunderbird/ci580w50.default/global-messages-db.sqlite                                    
2646

I created a new clean profile with the exact same accounts, and after a little while (very short compared to the TB2 migration), syncing & indexing was done okay (around 10 msg/sec indexed), and here are the values :

1019M   .thunderbird/7yh39hls.Test/ImapMail                                                                                                
71.8M   .thunderbird/7yh39hls.Test/global-messages-db.sqlite                                                                               
                                                                                                                                           
echo 'select count(*) from messages;' | sqlite3 .thunderbird/7yh39hls.Test/global-messages-db.sqlite                                       
5121     

I'm fine with telling the users that they have to create a new profile and don't try to migrate their existing profile (or otherwise disable GLODA), but the index size just puzzles me.

Reproducible: Always

Steps to Reproduce:
1. Install and configure thunderbird 2
2. Upgrade to thunderbird 3
3.
Actual Results:  
Index is huge compared to relative ImapMail size :
528M	.thunderbird/ci580w50.default/ImapMail/
5.6G	.thunderbird/ci580w50.default/global-messages-db.sqlite

Expected Results:  
Index should have a more reasonable size.

Ludovic Hirlimann [:Usul]

Updated

•

14 years ago

Keywords: perf

Whiteboard: [223 Migration]

Ludovic Hirlimann [:Usul]

Updated

•

14 years ago

Component: Search → Database

Product: Thunderbird → MailNews Core

QA Contact: search → database

Version: unspecified → 1.9.1 Branch

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 1

•

14 years ago

From offline conversation, it appears the problem is that the 2.0 profile had some serious offline store corruption.

Landry indicates having "~140 messages which have a body size of ~15Mo" where the body "contains a concatenation of lots of messages and attachements".

I think it's pretty clear that gloda should have some form of heuristic limiting the maximum body size it indexes.

bienvenu, for the 2-to-3 migration, though, should we be considering some form of sanity-checking of the offline store?  Gloda could likely participate in that if need be.

Status: UNCONFIRMED → NEW

Ever confirmed: true

OS: OpenBSD → All

Summary: Migrating a Thunderbird 2 profile creates a huge GLODA index → gloda needs to avoid indexing ridiculously large message bodies (especially from offline store corruption)

Whiteboard: [223 Migration] → [gloda key]

Version: 1.9.1 Branch → Trunk

David :Bienvenu

Comment 2

•

14 years ago

So, sanity checking of the offline store message size would be great, but we have to deal with Exchange's often wildly incorrect message sizes (i.e., the sanest sanity check is to compare the offline msg size with the online message size).

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 3

•

14 years ago

For offline store checking we could leverage the fact that offline messages should not overlap each other in the store.  Of course, this only works if the  store was never compacted.  Once the compaction happens we just have absurdly large messages... and I think we don't escape From in the offline store so we can't super easily detect from the contents what has gone wrong?

For overlap detection the naive implementation would be to sort the messages by offset and then make sure that 1) no one else has the same offset and 2) the size does not take us beyond the start of the next message.  In the worst case, that could get expensive, but that could be mitigated by a data-structure like we use for newsrc ranges or selection that fuses adjacent ranges together.

David :Bienvenu

Comment 4

•

14 years ago

(In reply to comment #3)
> For offline store checking we could leverage the fact that offline messages
> should not overlap each other in the store.  Of course, this only works if the 
> store was never compacted.  Once the compaction happens we just have absurdly
> large messages... and I think we don't escape From in the offline store so we
> can't super easily detect from the contents what has gone wrong?

I think that's right.
> 
> For overlap detection the naive implementation would be to sort the messages by
> offset and then make sure that 1) no one else has the same offset and 2) the
> size does not take us beyond the start of the next message.  In the worst case,
> that could get expensive, but that could be mitigated by a data-structure like
> we use for newsrc ranges or selection that fuses adjacent ranges together.

Once compaction happens, I think we're horked, like you say. We could have a heuristic like the message can't have an offline size 3x the message size. Even Exchange doesn't get it that wrong...

Wayne Mery (:wsmwk)

Updated

•

14 years ago

Severity: normal → major

Andrew Sutherland [:asuth] (he/him)

Assignee

Updated

•

14 years ago

Assignee: nobody → bugmail

Status: NEW → ASSIGNED

blocking-thunderbird3.1: --- → rc1+

Target Milestone: --- → Thunderbird 3.1rc1

Dan Mosedale (:dmosedale, :dmose)

Comment 5

•

14 years ago

I'm having a hard time convincing myself that this meets either of our current blocking criteria:

a) make the upgrade experience from TB2 very painful for a large number of users

or 

b) be a new, reproducible, severe quality issue (eg dataloss, frequent crashes)

Am I missing something?

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 6

•

14 years ago

I think a non-trivial number of TB2 upgraders are going to have corrupt messages that end up disturbingly large and that will have performance and result quality fall-out.  (Memory usage during indexing, SQLite performance during querying, result quality problems in terms of false positives, et.c)

But honestly I was abusing the blocking flag as an important to-do list since this is definitely on the easy end of the spectrum with serious potential to eliminate a whole set of performance problems.  We can downgraded to needed if you like (or something else that is full of things we are really going to try and implement).

Andrew Sutherland [:asuth] (he/him)

Assignee

Updated

•

14 years ago

Whiteboard: [gloda key] → [eta Saturday May 8][gloda key]

Andrew Sutherland [:asuth] (he/him)

Assignee

Updated

•

14 years ago

Whiteboard: [eta Saturday May 8][gloda key] → [eta Monday May 10][gloda key]

Andrew Sutherland [:asuth] (he/him)

Assignee

Updated

•

14 years ago

Whiteboard: [eta Monday May 10][gloda key] → [eta Wed May 12][gloda key]

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 7

•

14 years ago

I have a fix with tests but unfortunately the tests are revealing a subtle glitch in the imap online-to-offline case that previous tests did not catch.  Fix by tomorrow afternoon.

Andrew Sutherland [:asuth] (he/him)

Assignee

Updated

•

14 years ago

Whiteboard: [eta Wed May 12][gloda key] → [final patch eta Thu May 13][gloda key]

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 8

•

14 years ago

Attached patch a body has limits (obsolete) — Details — Splinter Review

Attachment #445258 - Flags: review?(bienvenu)

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 9

•

14 years ago

I had to punt on the more thorough performance impact analysis of this, but I'll run with my cheap systemtap script to sanity check things.

Whiteboard: [final patch eta Thu May 13][gloda key] → [has patch needs review bienvenu][gloda key]

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 10

•

14 years ago

Attached patch v2 a body has limits (obsolete) — Details — Splinter Review

Whoops.  The bulk indexing I was doing for performance analysis revealed that the fix surfaced an underlying assumption violation going on...

We only wanted/thought we only got to stream the bodies of text/plain and text/html.  It turns out that all text/* is getting streamed.  In that case, there is no body attribute because we don't care about it, but "+=" when provided with a string on the right and null/undefined on the left just coerces null/undefined into a string.  This was observed on darcs patches which end up text/x-darcs-patch.

I just changed the logic to inside the JS mime emitter to guard against this.

Attachment #445258 - Attachment is obsolete: true

Attachment #445266 - Flags: review?(bienvenu)

Attachment #445258 - Flags: review?(bienvenu)

David :Bienvenu

Comment 11

•

14 years ago

Comment on attachment 445258 [details] [diff] [review]
a body has limits

punchline missing:

+   *  does; when indexing an already existing object, all mutations happen on
+   *  a clone of the existing object so that

xpcshell tests have been failing for me in gloda so it may be a bit tricky for me to test the tests.

20K is a pretty reasonable limit, IMO.

David :Bienvenu

Comment 12

•

14 years ago

Attached file log of error — Details

I tried just running test_mime_emitter.js and got a bunch of failures. This is on comm-central-1.9.2 and moz-central-1.9.2

David :Bienvenu

Comment 13

•

14 years ago

I get the same failures w/o your patch, but that doesn't help me run the tests :-(

David :Bienvenu

Comment 14

•

14 years ago

(In reply to comment #13)
> I get the same failures w/o your patch, but that doesn't help me run the tests
> :-(

er, scratch that - I don't see the failures w/o the patch.

David :Bienvenu

Comment 15

•

14 years ago

It looks like _writeBody isn't getting set correctly (or not read correctly). If I comment out that part of the conditional, test_mime_emitter.js succeeds.

  writeBody: function mime_emitter_writeBody(aBuf, aSize, aOutAmountWritten) {
    if (/* this._writeBody && */

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 16

•

14 years ago

I'll try a windows build...

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 17

•

14 years ago

It was a cross-platform control-flow bone-headed move on my part.  Sadly, with that fixed there's still an order of operations that is confusing me since most of my emitter memory is now in long term storage:

"Sad Llama In Your Car" is doing this:

*** CONTENT TYPE: multipart/related
***

*** CONTENT TYPE: text/html
***

*** CONTENT TYPE: image/png
***
&&& writeBody 0 bytes with write: false
&&& writeBody 6 bytes with write: false
&&& writeBody 0 bytes with write: false
&&& writeBody 6 bytes with write: false
&&& writeBody 0 bytes with write: false
&&& writeBody 7 bytes with write: false
&&& writeBody 0 bytes with write: false
&&& writeBody 6 bytes with write: false
&&& writeBody 16 bytes with write: false
&&& writeBody 7 bytes with write: false
&&& writeBody 0 bytes with write: false
&&& writeBody 7 bytes with write: false
&&& writeBody 1 bytes with write: false
&&& writeBody 1 bytes with write: false


I expect this is the normal multipart/related 'defer processing the body' logic, but I'm a bit confused as to how it ever worked if the image/png was clobbering...

I will research more.

Oh, and as you can tell from the debug, this suggests that the standard libmime function call costs are probably pretty high but the XPConnect crossing costs for HTML messages have to be off the chart.

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 18

•

14 years ago

ah, addHeaderField is fixing things up.  it needs to play a part in the writeBody game.

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 19

•

14 years ago

Attached patch v3 a body has limits — Details — Splinter Review

Sorry about the glitch on the previous patch; I must have done something dumb like run the tests without the patch applied.

My rationale for the _writeBody flag is that it's not really a surprise that a lot of calls get made to writeBody (although the 0-length calls in the excessively writey HTML logic are somewhat surprising) and I was trying to optimize that check.

Attachment #445266 - Attachment is obsolete: true

Attachment #445516 - Flags: review?(bienvenu)

Attachment #445266 - Flags: review?(bienvenu)

David :Bienvenu

Comment 20

•

14 years ago

Comment on attachment 445516 [details] [diff] [review]
v3 a body has limits

much better :-)

Attachment #445516 - Flags: review?(bienvenu) → review+

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 21

•

14 years ago

pushed to comm-1.9.2:
http://hg.mozilla.org/releases/comm-1.9.2/rev/0ab2eb35d388

pushed to comm-central despite the tree being seriously messed up:
http://hg.mozilla.org/comm-central/rev/991adaa32680

Status: ASSIGNED → RESOLVED

Closed: 14 years ago

status-thunderbird3.1: --- → rc1-fixed

Resolution: --- → FIXED

Whiteboard: [has patch needs review bienvenu][gloda key] → [gloda key]

Comment hidden (Intermittent Failures Robot)

You need to log in before you can comment on or make changes to this bug.