Closed Bug 901514 Opened 11 years ago Closed 11 years ago

Permanent Orange: TEST-UNEXPECTED-FAIL | log4moz.js | gloda.datastore ERROR got error in _asyncTrackerListener.handleError(): 10: disk I/O error

Categories

(MailNews Core :: Database, defect)

defect
Not set
normal

Tracking

(thunderbird24 fixed, thunderbird25 fixed)

RESOLVED FIXED
Thunderbird 26.0
Tracking Status
thunderbird24 --- fixed
thunderbird25 --- fixed

People

(Reporter: standard8, Assigned: standard8)

References

Details

(Keywords: intermittent-failure)

Attachments

(2 files)

Since resolving some of a big bustage, which included the xpcshell reporting more errors than it had previously, we've been seeing these types of failures in some of the gloda tests:

https://tbpl.mozilla.org/php/getParsedLog.php?id=26160366&tree=Thunderbird-Aurora

TEST-UNEXPECTED-FAIL | /Users/cltbld/talos-slave/test/build/xpcshell/tests/mailnews/db/gloda/test/unit/test_gloda_content_imap_offline.js | test failed (with xpcshell return code: 0), see following log:
...
TEST-INFO | (xpcshell/head.js) | test MAIN run_test pending (1)
2013-08-05 04:13:40	gloda.datastore	DEBUG	!! mapped 1 from mailbox://nobody@Local%20Folders/Trash
2013-08-05 04:13:40	gloda.datastore	DEBUG	!! mapped 2 from mailbox://nobody@Local%20Folders/Unsent%20Messages
2013-08-05 04:13:40	gloda.datastore	DEBUG	!! mapped 3 from imap://user@localhost/INBOX
2013-08-05 04:13:40	gloda.test	DEBUG	((( Index listener notified! aStatus = 0 waiting: false


TEST-INFO | (xpcshell/head.js) | test pending (2)

TEST-INFO | (xpcshell/head.js) | test _async_driver pending (3)

TEST-INFO | (xpcshell/head.js) | test MAIN run_test finished (3)

TEST-INFO | (xpcshell/head.js) | running event loop

TEST-UNEXPECTED-FAIL | resource:///modules/gloda/log4moz.js | 2013-08-05 04:13:40	gloda.datastore	ERROR	got error in _asyncTrackerListener.handleError(): 10: disk I/O error
 - See following stack:
JS frame :: resource:///modules/gloda/log4moz.js :: TApp_doAppend :: line 925
JS frame :: resource:///modules/gloda/log4moz.js :: App_append :: line 587
JS frame :: resource:///modules/gloda/log4moz.js :: Logger_log :: line 377
JS frame :: resource:///modules/gloda/log4moz.js :: Logger_error :: line 385
JS frame :: resource:///modules/gloda/datastore.js :: <TOP_LEVEL> :: line 1795
native frame :: <unknown filename> :: <TOP_LEVEL> :: line 0
Asuth, any idea where we can look here? I'm a bit lost as to where these are coming from or why.
Flags: needinfo?(bugmail)
could the test machines/a single test machine have full disks or have two xpcshell tests trying to run at the same time in the same dir and fighting each other or something like that?

the extended error codes (which we won't get told) are:
#define SQLITE_IOERR_READ              (SQLITE_IOERR | (1<<8))
#define SQLITE_IOERR_SHORT_READ        (SQLITE_IOERR | (2<<8))
#define SQLITE_IOERR_WRITE             (SQLITE_IOERR | (3<<8))
#define SQLITE_IOERR_FSYNC             (SQLITE_IOERR | (4<<8))
#define SQLITE_IOERR_DIR_FSYNC         (SQLITE_IOERR | (5<<8))
#define SQLITE_IOERR_TRUNCATE          (SQLITE_IOERR | (6<<8))
#define SQLITE_IOERR_FSTAT             (SQLITE_IOERR | (7<<8))
#define SQLITE_IOERR_UNLOCK            (SQLITE_IOERR | (8<<8))
#define SQLITE_IOERR_RDLOCK            (SQLITE_IOERR | (9<<8))
#define SQLITE_IOERR_DELETE            (SQLITE_IOERR | (10<<8))
#define SQLITE_IOERR_BLOCKED           (SQLITE_IOERR | (11<<8))
#define SQLITE_IOERR_NOMEM             (SQLITE_IOERR | (12<<8))
#define SQLITE_IOERR_ACCESS            (SQLITE_IOERR | (13<<8))
#define SQLITE_IOERR_CHECKRESERVEDLOCK (SQLITE_IOERR | (14<<8))
#define SQLITE_IOERR_LOCK              (SQLITE_IOERR | (15<<8))
#define SQLITE_IOERR_CLOSE             (SQLITE_IOERR | (16<<8))
#define SQLITE_IOERR_DIR_CLOSE         (SQLITE_IOERR | (17<<8))
#define SQLITE_IOERR_SHMOPEN           (SQLITE_IOERR | (18<<8))
#define SQLITE_IOERR_SHMSIZE           (SQLITE_IOERR | (19<<8))
#define SQLITE_IOERR_SHMLOCK           (SQLITE_IOERR | (20<<8))
#define SQLITE_IOERR_SHMMAP            (SQLITE_IOERR | (21<<8))
#define SQLITE_IOERR_SEEK              (SQLITE_IOERR | (22<<8))
#define SQLITE_IOERR_DELETE_NOENT      (SQLITE_IOERR | (23<<8))
#define SQLITE_IOERR_MMAP              (SQLITE_IOERR | (24<<8))

All of these things seem like bad things that shouldn't happen.
Flags: needinfo?(bugmail)
Parallel running of tests hasn't been enabled. The disk size shouldn't be an issue as we are running the 2GB and 4GB folder size tests on these same machines.
Blocks: 896738
Ok, so I've been doing some debugging. It looks like this error is being generated with these three INSERT statements:

TEST-INFO | (xpcshell/head.js) | test MAIN run_test pending (1)
2028134784[108119070]: Inited async statement 'INSERT INTO folderLocations (id, folderURI, dirtyStatus, name,                                     indexingPriority) VALUES         (?1, ?2, ?3, ?4, ?5)' (0x108116158)
196612096[10811bff0]: Initialized statement 'INSERT INTO folderLocations (id, folderURI, dirtyStatus, name,                                     indexingPriority) VALUES         (?1, ?2, ?3, ?4, ?5)' (0x1081341a0)
196612096[10811bff0]: sqlite3_trace on 10b730b70 for 'INSERT INTO folderLocations (id, folderURI, dirtyStatus, name,                                     indexingPriority) VALUES         (1, 'mailbox://nobody@Local%20Folders/Trash', 0, 'Trash', -1)'
2013-08-06 14:26:48	gloda.datastore	DEBUG	!! mapped 1 from mailbox://nobody@Local%20Folders/Trash
196612096[10811bff0]: sqlite3_trace on 10b730b70 for 'INSERT INTO folderLocations (id, folderURI, dirtyStatus, name,                                     indexingPriority) VALUES         (2, 'mailbox://nobody@Local%20Folders/Unsent%20Messages', 0, 'Unsent Messages', -1)'
2013-08-06 14:26:48	gloda.datastore	DEBUG	!! mapped 2 from mailbox://nobody@Local%20Folders/Unsent%20Messages
196612096[10811bff0]: sqlite3_trace on 10b730b70 for 'INSERT INTO folderLocations (id, folderURI, dirtyStatus, name,                                     indexingPriority) VALUES         (3, 'imap://user@localhost/INBOX', 0, 'INBOX', 50)'
2013-08-06 14:26:48	gloda.datastore	DEBUG	!! mapped 3 from imap://user@localhost/INBOX
2013-08-06 14:26:48	gloda.test	DEBUG	((( Index listener notified! aStatus = 0 waiting: false

Not quite sure why yet.
Attached file Run with debugging
This is debugging with NSPR_LOG_MODULES=mozStorage:5 and an additional line:

    dbConnection.executeSimpleSQL("PRAGMA vdbe_debug = 1");

added in _createDB in datastore.js

The interesting part is the three rc=1802 lines after each of the sqlite3_trace INSERT statements I referenced in comment 4 above.

Looking at the definitions, I think that equates to SQLITE_IOERR_FSTAT.

Checking the profile directory whilst the test is running, indeed shows that the file is no longer present - something has removed it.
Huh.  So, there's no good reason for the file to go away.  Gloda logging says things like "migrate" and "purged" when it decides to nuke the database.

I thought xpcshell created a physically new directory for each fake-run-dir it uses?  So it shouldn't be any type of lingering thing, right?  And the toolkit logic that helps provide automated vacuuming should still be opt-in exclusively, so this is a big stumpy to me.

Is it possible to run dtrace scripts?  The failure you reported was on OS X, and it can easily tell us who is being a jerk.  You can probably even add a ustack() command:

see:
http://superuser.com/questions/164749/how-does-one-find-who-or-what-service-deleted-a-particular-file-or-folder-in-s
http://chihungchan.blogspot.com/2009/05/which-process-deleted-my-file.html
Attached patch The fixSplinter Review
The problem here is that the initialize for mailDirService.js is being called twice - I think I centralized it in the head_gloda.js file, but didn't realise that gloda was also pulling in head_server.js from the imap tests.

This fixes a few of the failures for me.
Assignee: nobody → mbanner
Status: NEW → ASSIGNED
Attachment #786310 - Flags: review?(Pidgeot18)
Blocks: 901959
Attachment #786310 - Flags: review?(Pidgeot18) → review+
https://hg.mozilla.org/comm-central/rev/90e049357560
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 26.0
Comment on attachment 786310 [details] [diff] [review]
The fix

[Triage Comment]
Looking good on trunk (Mac debug failures down to 7!). Getting this onto branches.
Attachment #786310 - Flags: approval-comm-beta+
Attachment #786310 - Flags: approval-comm-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: