Last Comment Bug 901514 - Permanent Orange: TEST-UNEXPECTED-FAIL | log4moz.js | gloda.datastore ERROR got error in _asyncTrackerListener.handleError(): 10: disk I/O error
: Permanent Orange: TEST-UNEXPECTED-FAIL | log4moz.js | gloda.datastore ERROR g...
Status: RESOLVED FIXED
: intermittent-failure
Product: MailNews Core
Classification: Components
Component: Database (show other bugs)
: Trunk
: All All
: -- normal (vote)
: Thunderbird 26.0
Assigned To: Mark Banner (:standard8, afk until Dec)
:
:
Mentors:
Depends on:
Blocks: 896738 901959
  Show dependency treegraph
 
Reported: 2013-08-05 06:54 PDT by Mark Banner (:standard8, afk until Dec)
Modified: 2013-08-06 10:53 PDT (History)
2 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
fixed
fixed


Attachments
Run with debugging (539.29 KB, text/plain)
2013-08-06 07:49 PDT, Mark Banner (:standard8, afk until Dec)
no flags Details
The fix (1003 bytes, patch)
2013-08-06 08:25 PDT, Mark Banner (:standard8, afk until Dec)
Pidgeot18: review+
standard8: approval‑comm‑aurora+
standard8: approval‑comm‑beta+
Details | Diff | Splinter Review

Description Mark Banner (:standard8, afk until Dec) 2013-08-05 06:54:55 PDT
Since resolving some of a big bustage, which included the xpcshell reporting more errors than it had previously, we've been seeing these types of failures in some of the gloda tests:

https://tbpl.mozilla.org/php/getParsedLog.php?id=26160366&tree=Thunderbird-Aurora

TEST-UNEXPECTED-FAIL | /Users/cltbld/talos-slave/test/build/xpcshell/tests/mailnews/db/gloda/test/unit/test_gloda_content_imap_offline.js | test failed (with xpcshell return code: 0), see following log:
...
TEST-INFO | (xpcshell/head.js) | test MAIN run_test pending (1)
2013-08-05 04:13:40	gloda.datastore	DEBUG	!! mapped 1 from mailbox://nobody@Local%20Folders/Trash
2013-08-05 04:13:40	gloda.datastore	DEBUG	!! mapped 2 from mailbox://nobody@Local%20Folders/Unsent%20Messages
2013-08-05 04:13:40	gloda.datastore	DEBUG	!! mapped 3 from imap://user@localhost/INBOX
2013-08-05 04:13:40	gloda.test	DEBUG	((( Index listener notified! aStatus = 0 waiting: false


TEST-INFO | (xpcshell/head.js) | test pending (2)

TEST-INFO | (xpcshell/head.js) | test _async_driver pending (3)

TEST-INFO | (xpcshell/head.js) | test MAIN run_test finished (3)

TEST-INFO | (xpcshell/head.js) | running event loop

TEST-UNEXPECTED-FAIL | resource:///modules/gloda/log4moz.js | 2013-08-05 04:13:40	gloda.datastore	ERROR	got error in _asyncTrackerListener.handleError(): 10: disk I/O error
 - See following stack:
JS frame :: resource:///modules/gloda/log4moz.js :: TApp_doAppend :: line 925
JS frame :: resource:///modules/gloda/log4moz.js :: App_append :: line 587
JS frame :: resource:///modules/gloda/log4moz.js :: Logger_log :: line 377
JS frame :: resource:///modules/gloda/log4moz.js :: Logger_error :: line 385
JS frame :: resource:///modules/gloda/datastore.js :: <TOP_LEVEL> :: line 1795
native frame :: <unknown filename> :: <TOP_LEVEL> :: line 0
Comment 1 Mark Banner (:standard8, afk until Dec) 2013-08-05 06:55:48 PDT
Asuth, any idea where we can look here? I'm a bit lost as to where these are coming from or why.
Comment 2 Andrew Sutherland [:asuth] 2013-08-05 07:58:54 PDT
could the test machines/a single test machine have full disks or have two xpcshell tests trying to run at the same time in the same dir and fighting each other or something like that?

the extended error codes (which we won't get told) are:
#define SQLITE_IOERR_READ              (SQLITE_IOERR | (1<<8))
#define SQLITE_IOERR_SHORT_READ        (SQLITE_IOERR | (2<<8))
#define SQLITE_IOERR_WRITE             (SQLITE_IOERR | (3<<8))
#define SQLITE_IOERR_FSYNC             (SQLITE_IOERR | (4<<8))
#define SQLITE_IOERR_DIR_FSYNC         (SQLITE_IOERR | (5<<8))
#define SQLITE_IOERR_TRUNCATE          (SQLITE_IOERR | (6<<8))
#define SQLITE_IOERR_FSTAT             (SQLITE_IOERR | (7<<8))
#define SQLITE_IOERR_UNLOCK            (SQLITE_IOERR | (8<<8))
#define SQLITE_IOERR_RDLOCK            (SQLITE_IOERR | (9<<8))
#define SQLITE_IOERR_DELETE            (SQLITE_IOERR | (10<<8))
#define SQLITE_IOERR_BLOCKED           (SQLITE_IOERR | (11<<8))
#define SQLITE_IOERR_NOMEM             (SQLITE_IOERR | (12<<8))
#define SQLITE_IOERR_ACCESS            (SQLITE_IOERR | (13<<8))
#define SQLITE_IOERR_CHECKRESERVEDLOCK (SQLITE_IOERR | (14<<8))
#define SQLITE_IOERR_LOCK              (SQLITE_IOERR | (15<<8))
#define SQLITE_IOERR_CLOSE             (SQLITE_IOERR | (16<<8))
#define SQLITE_IOERR_DIR_CLOSE         (SQLITE_IOERR | (17<<8))
#define SQLITE_IOERR_SHMOPEN           (SQLITE_IOERR | (18<<8))
#define SQLITE_IOERR_SHMSIZE           (SQLITE_IOERR | (19<<8))
#define SQLITE_IOERR_SHMLOCK           (SQLITE_IOERR | (20<<8))
#define SQLITE_IOERR_SHMMAP            (SQLITE_IOERR | (21<<8))
#define SQLITE_IOERR_SEEK              (SQLITE_IOERR | (22<<8))
#define SQLITE_IOERR_DELETE_NOENT      (SQLITE_IOERR | (23<<8))
#define SQLITE_IOERR_MMAP              (SQLITE_IOERR | (24<<8))

All of these things seem like bad things that shouldn't happen.
Comment 3 Mark Banner (:standard8, afk until Dec) 2013-08-05 08:03:01 PDT
Parallel running of tests hasn't been enabled. The disk size shouldn't be an issue as we are running the 2GB and 4GB folder size tests on these same machines.
Comment 4 Mark Banner (:standard8, afk until Dec) 2013-08-06 06:32:31 PDT
Ok, so I've been doing some debugging. It looks like this error is being generated with these three INSERT statements:

TEST-INFO | (xpcshell/head.js) | test MAIN run_test pending (1)
2028134784[108119070]: Inited async statement 'INSERT INTO folderLocations (id, folderURI, dirtyStatus, name,                                     indexingPriority) VALUES         (?1, ?2, ?3, ?4, ?5)' (0x108116158)
196612096[10811bff0]: Initialized statement 'INSERT INTO folderLocations (id, folderURI, dirtyStatus, name,                                     indexingPriority) VALUES         (?1, ?2, ?3, ?4, ?5)' (0x1081341a0)
196612096[10811bff0]: sqlite3_trace on 10b730b70 for 'INSERT INTO folderLocations (id, folderURI, dirtyStatus, name,                                     indexingPriority) VALUES         (1, 'mailbox://nobody@Local%20Folders/Trash', 0, 'Trash', -1)'
2013-08-06 14:26:48	gloda.datastore	DEBUG	!! mapped 1 from mailbox://nobody@Local%20Folders/Trash
196612096[10811bff0]: sqlite3_trace on 10b730b70 for 'INSERT INTO folderLocations (id, folderURI, dirtyStatus, name,                                     indexingPriority) VALUES         (2, 'mailbox://nobody@Local%20Folders/Unsent%20Messages', 0, 'Unsent Messages', -1)'
2013-08-06 14:26:48	gloda.datastore	DEBUG	!! mapped 2 from mailbox://nobody@Local%20Folders/Unsent%20Messages
196612096[10811bff0]: sqlite3_trace on 10b730b70 for 'INSERT INTO folderLocations (id, folderURI, dirtyStatus, name,                                     indexingPriority) VALUES         (3, 'imap://user@localhost/INBOX', 0, 'INBOX', 50)'
2013-08-06 14:26:48	gloda.datastore	DEBUG	!! mapped 3 from imap://user@localhost/INBOX
2013-08-06 14:26:48	gloda.test	DEBUG	((( Index listener notified! aStatus = 0 waiting: false

Not quite sure why yet.
Comment 5 Mark Banner (:standard8, afk until Dec) 2013-08-06 07:49:07 PDT
Created attachment 786294 [details]
Run with debugging

This is debugging with NSPR_LOG_MODULES=mozStorage:5 and an additional line:

    dbConnection.executeSimpleSQL("PRAGMA vdbe_debug = 1");

added in _createDB in datastore.js

The interesting part is the three rc=1802 lines after each of the sqlite3_trace INSERT statements I referenced in comment 4 above.

Looking at the definitions, I think that equates to SQLITE_IOERR_FSTAT.

Checking the profile directory whilst the test is running, indeed shows that the file is no longer present - something has removed it.
Comment 6 Andrew Sutherland [:asuth] 2013-08-06 08:15:07 PDT
Huh.  So, there's no good reason for the file to go away.  Gloda logging says things like "migrate" and "purged" when it decides to nuke the database.

I thought xpcshell created a physically new directory for each fake-run-dir it uses?  So it shouldn't be any type of lingering thing, right?  And the toolkit logic that helps provide automated vacuuming should still be opt-in exclusively, so this is a big stumpy to me.

Is it possible to run dtrace scripts?  The failure you reported was on OS X, and it can easily tell us who is being a jerk.  You can probably even add a ustack() command:

see:
http://superuser.com/questions/164749/how-does-one-find-who-or-what-service-deleted-a-particular-file-or-folder-in-s
http://chihungchan.blogspot.com/2009/05/which-process-deleted-my-file.html
Comment 7 Mark Banner (:standard8, afk until Dec) 2013-08-06 08:25:49 PDT
Created attachment 786310 [details] [diff] [review]
The fix

The problem here is that the initialize for mailDirService.js is being called twice - I think I centralized it in the head_gloda.js file, but didn't realise that gloda was also pulling in head_server.js from the imap tests.

This fixes a few of the failures for me.
Comment 8 Mark Banner (:standard8, afk until Dec) 2013-08-06 08:47:09 PDT
https://hg.mozilla.org/comm-central/rev/90e049357560
Comment 9 Mark Banner (:standard8, afk until Dec) 2013-08-06 10:48:48 PDT
Comment on attachment 786310 [details] [diff] [review]
The fix

[Triage Comment]
Looking good on trunk (Mac debug failures down to 7!). Getting this onto branches.

Note You need to log in before you can comment on or make changes to this bug.