Thunderbird crash in walIndexTryHdr due to failures at the operating system level to read the page in
Categories
(Toolkit :: Storage, defect, P5)
Tracking
Release | Tracking | Status |
---|---|---|
firefox47 | --- | wontfix |
firefox48 | --- | wontfix |
firefox49 | --- | wontfix |
firefox-esr45 | --- | wontfix |
firefox50 | --- | wontfix |
firefox51 | --- | wontfix |
firefox52 | --- | wontfix |
firefox-esr52 | --- | wontfix |
firefox-esr60 | - | wontfix |
firefox53 | --- | wontfix |
firefox-esr115 | --- | affected |
firefox54 | --- | wontfix |
firefox55 | --- | wontfix |
firefox56 | --- | wontfix |
firefox57 | --- | wontfix |
firefox61 | --- | wontfix |
firefox62 | --- | wontfix |
firefox63 | --- | wontfix |
firefox64 | --- | wontfix |
firefox65 | --- | wontfix |
firefox66 | --- | verified |
firefox67 | --- | wontfix |
firefox119 | --- | wontfix |
firefox120 | --- | wontfix |
firefox121 | --- | wontfix |
People
(Reporter: wsmwk, Unassigned)
Details
(Keywords: crash, Whiteboard: [tbird crash][TB12 regression][wontfix?])
Crash Data
This bug was filed from the Socorro interface and is report bp-ca4b63fa-0a3b-4954-987b-3b4272120705.

Frame | Module | Signature | Source |
---|---|---|---|
0 | mozsqlite3.dll | walIndexTryHdr | db/sqlite3/src/sqlite3.c:47107 |
1 | mozsqlite3.dll | walIndexReadHdr | db/sqlite3/src/sqlite3.c:47165 |
2 | mozsqlite3.dll | walTryBeginRead | db/sqlite3/src/sqlite3.c:47299 |
3 | mozsqlite3.dll | pagerBeginReadTransaction | db/sqlite3/src/sqlite3.c:41335 |
4 | mozsqlite3.dll | sqlite3PagerSharedLock | db/sqlite3/src/sqlite3.c:43209 |
5 | mozsqlite3.dll | lockBtree | db/sqlite3/src/sqlite3.c:51524 |
6 | mozsqlite3.dll | sqlite3BtreeBeginTrans | db/sqlite3/src/sqlite3.c:51824 |
7 | mozsqlite3.dll | sqlite3VdbeExec | db/sqlite3/src/sqlite3.c:67638 |
8 | xul.dll | mozilla::PerformanceCounter | xpcom/ds/TimeStamp_windows.cpp:427 |
9 | mozsqlite3.dll | sqlite3Step | db/sqlite3/src/sqlite3.c:63043 |
10 | mozsqlite3.dll | sqlite3_step | db/sqlite3/src/sqlite3.c:63118 |
11 | xul.dll | mozilla::storage::Connection::stepStatement | storage/src/mozStorageConnection.cpp:893 |
12 | xul.dll | mozilla::storage::AsyncExecuteStatements::executeStatement | storage/src/mozStorageAsyncStatementExecution.cpp:400 |
13 | xul.dll | mozilla::storage::AsyncExecuteStatements::executeAndProcessStatement | storage/src/mozStorageAsyncStatementExecution.cpp:325 |
14 | xul.dll | mozilla::storage::AsyncExecuteStatements::bindExecuteAndProcessStatement | storage/src/mozStorageAsyncStatementExecution.cpp:307 |
15 | xul.dll | mozilla::storage::AsyncExecuteStatements::Run | storage/src/mozStorageAsyncStatementExecution.cpp:647 |

The crash appears to have started in version 12. It ranks #22 for TB 13.0.1, with almost no Firefox crashes.
Stack variation: bp-119f3dcc-873e-44ce-a67e-986762120705
Reporter
Comment 1•12 years ago
Firefox example: bp-11008807-7480-4be6-8d92-2e17d2120614
Updated•12 years ago
Comment 2•12 years ago
It's the #36 top crasher in TB 17.0.
Comment 3•8 years ago
Crash volume for signature 'walIndexTryHdr':
- nightly (version 50): 0 crashes from 2016-06-06.
- aurora (version 49): 4 crashes from 2016-06-07.
- beta (version 48): 51 crashes from 2016-06-06.
- release (version 47): 315 crashes from 2016-05-31.
- esr (version 45): 1095 crashes from 2016-04-07.

Crash volume on the last weeks:

Channel | W. N-1 | W. N-2 | W. N-3 | W. N-4 | W. N-5 | W. N-6 | W. N-7 |
---|---|---|---|---|---|---|---|
nightly | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
aurora | 0 | 1 | 0 | 2 | 0 | 1 | 0 |
beta | 8 | 4 | 12 | 9 | 5 | 11 | 2 |
release | 56 | 51 | 47 | 36 | 38 | 48 | 30 |
esr | 102 | 148 | 123 | 114 | 91 | 112 | 107 |

Affected platforms: Windows, Mac OS X, Linux
Comment 4•8 years ago
walIndexTryHdr is trying to read a memory-mapped page of the Write-Ahead-Log index (the WAL's -shm file). All of these crashes are due to failures at the operating system level to read the page in. This is one of the downsides of memory-mapped I/O: I/O errors become fatal if we don't explicitly handle the page faults and transform them into something non-fatal. Having said that, although these don't need to be fatal, it's quite likely that if we're encountering them, then most profile I/O going forward is going to be broken too, so trading the crash for everything breaking isn't likely a major improvement. As such, I don't think there's much to do about the bug.

== Details:

If we aggregate on "Reason" for >2% we get:

Rank | Reason | Count | Share |
---|---|---|---|
1 | EXCEPTION_IN_PAGE_ERROR_READ / STATUS_IN_PAGE_ERROR | 358 | 52.19 % |
2 | EXCEPTION_IN_PAGE_ERROR_READ / STATUS_CONNECTION_DISCONNECTED | 101 | 14.72 % |
3 | EXCEPTION_IN_PAGE_ERROR_READ / STATUS_OBJECT_NAME_NOT_FOUND | 60 | 8.75 % |
4 | EXCEPTION_IN_PAGE_ERROR_READ / STATUS_INVALID_PARAMETER | 50 | 7.29 % |
5 | EXCEPTION_IN_PAGE_ERROR_READ / STATUS_VOLUME_DISMOUNTED | 22 | 3.21 % |
6 | EXCEPTION_IN_PAGE_ERROR_READ / STATUS_FILE_INVALID | 15 | 2.19 % |

The first part, "EXCEPTION_IN_PAGE_ERROR_READ", specifically means there was an I/O error paging things in. The latter code is extracted from the exception record if available.

* STATUS_IN_PAGE_ERROR: This is the same actual code as EXCEPTION_IN_PAGE_ERROR. I'm not sure if this is an inability to be more specific or some layering scenario, like if loopback devices were involved.
* STATUS_CONNECTION_DISCONNECTED: Presumably the file was network-mounted and we lost the mount.
* STATUS_OBJECT_NAME_NOT_FOUND: Seems similar? The machine/drive is no longer around to service the UNC path or whatever?
* STATUS_INVALID_PARAMETER: This is a very generic error, like it sounds; likely a cascading error from some other I/O error, possibly involving a disconnect?
* STATUS_VOLUME_DISMOUNTED: Explicitly, the volume was dismounted.
* STATUS_FILE_INVALID: This generic-seeming error is actually very specific: someone externally messed with the opened file and it's no longer valid.

The others that didn't make the cut seem similarly of the form "the file system has betrayed us".
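The memory-mapped I/O trade-off described in comment 4 can be illustrated on the POSIX side (the bug also affects Mac OS X and Linux), where a page the kernel cannot page in raises SIGBUS instead of Windows' EXCEPTION_IN_PAGE_ERROR. This is a minimal sketch under stated assumptions, not Mozilla or SQLite code: it assumes Linux and CPython's `mmap` module, simulates the file system "betraying us" by truncating a file underneath a shared mapping, and shows that touching the mapped page kills a child process with SIGBUS. It then contrasts that with `os.pread()`, which reports the same condition as an ordinary empty result. The helper names (`mmap_read_after_truncate`, `pread_after_truncate`, `demo`) are hypothetical.

```python
import os
import signal
import subprocess
import sys
import tempfile

# Child program, run in a separate process so the crash doesn't take us down.
# It maps one page of a file, shrinks the file underneath the mapping, then
# touches the page again. On Linux the kernel can no longer back that page,
# so the read raises SIGBUS, whose default action kills the process.
CHILD = r"""
import mmap, os, sys
fd = os.open(sys.argv[1], os.O_RDWR)
os.ftruncate(fd, mmap.PAGESIZE)
m = mmap.mmap(fd, mmap.PAGESIZE, flags=mmap.MAP_SHARED, prot=mmap.PROT_READ)
assert m[0] == 0        # page is backed by the file: reads fine
os.ftruncate(fd, 0)     # simulate "the file system has betrayed us"
print(m[0])             # page can't be paged in any more -> SIGBUS
"""


def mmap_read_after_truncate(path):
    """Run CHILD against `path`; return its exit status (negative = killed by signal)."""
    proc = subprocess.run([sys.executable, "-c", CHILD, path], capture_output=True)
    return proc.returncode


def pread_after_truncate(path):
    """Same condition via pread(): it surfaces as an ordinary return value."""
    fd = os.open(path, os.O_RDWR)
    try:
        os.ftruncate(fd, 0)
        return os.pread(fd, 1, 0)   # b"" at end of file, no crash
    finally:
        os.close(fd)


def demo():
    """Return (child exit status, pread result) for a fresh temporary file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        return mmap_read_after_truncate(path), pread_after_truncate(path)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    status, data = demo()
    print("mmap reader killed by SIGBUS:", status == -signal.SIGBUS)
    print("pread after truncation returned:", data)
```

The contrast mirrors the comment's point: with read-style I/O the failure is a value the caller can inspect, while with a mapping it arrives as a fault that is fatal unless something converts it into an error (structured exception handling with `__try`/`__except` on Windows, or a SIGBUS handler on POSIX).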
Updated•8 years ago
Comment 5•7 years ago
Crash volume for signature 'walIndexTryHdr':
- nightly (version 54): 3 crashes from 2017-01-23.
- aurora (version 53): 0 crashes from 2017-01-23.
- beta (version 52): 15 crashes from 2017-01-23.
- release (version 51): 87 crashes from 2017-01-16.
- esr (version 45): 4528 crashes from 2016-08-10.

Crash volume on the last weeks (Week N is from 02-06 to 02-12):

Channel | W. N-1 | W. N-2 | W. N-3 | W. N-4 | W. N-5 | W. N-6 | W. N-7 |
---|---|---|---|---|---|---|---|
nightly | 2 | 1 | | | | | |
aurora | 0 | 0 | | | | | |
beta | 11 | 1 | | | | | |
release | 57 | 14 | 0 | | | | |
esr | 216 | 262 | 311 | 213 | 153 | 54 | 141 |

Affected platforms: Windows, Mac OS X, Linux

Crash rank on the last 7 days:

Channel | Browser | Content | Plugin |
---|---|---|---|
nightly | #415 | | |
aurora | | | |
beta | #1254 | | |
release | #698 | | |
esr | #78 | | |
Updated•7 years ago
Comment 6•7 years ago
A P5 critical bug seems like a contradiction in terms.
Updated•7 years ago
Comment 7•7 years ago
Gonna remove regression keyword since we've been shipping this for so long.
Reporter
Comment 8•7 years ago
(In reply to Mike Taylor [:miketaylr] from comment #7)
> Gonna remove regression keyword since we've been shipping this for so long.

I don't understand how time impacts whether this is a regression or not.
Comment 9•7 years ago
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #8)
> (In reply to Mike Taylor [:miketaylr] from comment #7)
> > Gonna remove regression keyword since we've been shipping this for so long.
>
> I don't understand how time impacts whether this is a regression or not.

Fair point. We use that keyword to help in regression triage, where we look for recent regressions (ideally to prevent shipping them to release). It can still be considered a regression without this keyword.
Comment 10•6 years ago
This is a fairly frequent crash on ESR builds. Modules from Sophos security software commonly show up in the app_init_dll field of crash reports and in correlations.
Reporter
Updated•6 years ago
Comment 11•6 years ago
This has higher volume on ESR than on release. Adam, can you try contacting Sophos?
Comment 12•5 years ago
Opened a support case with them. https://secure2.sophos.com/en-us/support/open-a-support-case.aspx
Comment 13•5 years ago
To help out anyone looking from Sophos, here are some links to relevant info in crash-stats.

Click through on the crash signatures from this bug, and then add an extra column to the resulting report. So, the first crash signature takes you to:
https://crash-stats.mozilla.com/signature/?signature=walIndexTryHdr

There is a little tab there that says "summary"; you want to look at the "reports" tab instead. I added the column for "app init dlls" and then clicked on the column heading to sort by that content. That shows me a bunch of reports with Sophos DLLs.

This link will take you to the summary page. It adds the column for DLLs, but you will still have to click through on the Reports tab and then sort on the column header:
https://crash-stats.mozilla.com/signature/?signature=walIndexTryHdr&date=%3E%3D2018-12-05T16%3A04%3A25.000Z&date=%3C2018-12-12T16%3A04%3A25.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=app_init_dlls&_sort=-app_init_dlls&_sort=-date&page=1

Hope that helps.
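The second link in comment 13 is just the signature page with a date range, an explicit column list and a sort order encoded as query parameters, so it can be regenerated for other signatures or weeks. This is a minimal sketch, assuming the parameter names copied from that link (`signature`, `date`, `_columns`, `_sort`, `page`) keep the meaning they appear to have there; `signature_report_url`, `iso_millis` and the `COLUMNS` default list are hypothetical helpers, not an official crash-stats client, and the reading of a leading "-" in `_sort` as descending order is an assumption.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Base URL and column names are taken from the link in comment 13.
BASE_URL = "https://crash-stats.mozilla.com/signature/"
COLUMNS = ["date", "product", "version", "build_id", "platform",
           "reason", "address", "install_time"]


def iso_millis(t):
    """Format a UTC datetime like 2018-12-05T16:04:25.000Z."""
    return t.strftime("%Y-%m-%dT%H:%M:%S.") + f"{t.microsecond // 1000:03d}Z"


def signature_report_url(signature, start, days=7,
                         extra_columns=("app_init_dlls",),
                         sort=("-app_init_dlls", "-date")):
    """Build a signature link with a date window, extra columns and a sort order."""
    end = start + timedelta(days=days)
    params = [
        ("signature", signature),
        ("date", ">=" + iso_millis(start)),   # repeated key: lower bound
        ("date", "<" + iso_millis(end)),      # repeated key: upper bound
    ]
    params += [("_columns", c) for c in COLUMNS + list(extra_columns)]
    params += [("_sort", s) for s in sort]    # assumed: leading "-" = descending
    params.append(("page", "1"))
    # urlencode percent-encodes ">=", "<" and ":" (">=" becomes %3E%3D).
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    start = datetime(2018, 12, 5, 16, 4, 25, tzinfo=timezone.utc)
    print(signature_report_url("walIndexTryHdr", start))
```

Passing a list of pairs rather than a dict is what allows the repeated `date`, `_columns` and `_sort` keys; with the start time from the link, the output matches the URL in the comment.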
Updated•5 years ago
Updated•5 years ago
Reporter
Comment 14•5 years ago
(In reply to [:philipp] from comment #10)
> this is a fairly frequent crash on esr buils. modules from sophos security
> software are commonly showing up in the app_init_dll field of crash reports
> and correlations.

Do you find that Sophos correlates with many Thunderbird crashes?

In a spot check of 10 Thunderbird crashes I found only two potentially AV-related (though I could be wrong). If Thunderbird crashes don't correlate well with AV, then we should move this back to the other component.
bp-783e9d0b-c52b-43cb-9fca-2c9ef0181212 TrendMicro
bp-b090cfb1-e316-4e40-bcf0-c96990181212 Sophos

Still a Thunderbird topcrash.
Comment 15•5 years ago
Yes, currently only 10% of those crashes on Thunderbird show involvement of Sophos, so I'm following your suggestion and moving the bug back to its original component.
Reporter
Comment 16•4 years ago
Crash rate of Thunderbird is up 20-30% compared to spring. Not possible to say yet whether that correlates to version 78 uptake.
Reporter
Comment 17•3 years ago
Still a topcrash: it ranks ~#25 in Thunderbird 78.12.0 (combined count of sqlite3WalFindFrame and walIndexTryHdr).
Comment 18•2 years ago
This happens after resume from suspend: https://crash-stats.mozilla.org/report/index/745c0252-578c-43d3-8e71-030b90220513
Reporter
Comment 19•2 years ago
In Thunderbird 102.0.2, the rank drops to #52
Updated•2 years ago
Updated•6 months ago
Updated•5 months ago