Closed
Bug 1204381
Opened 9 years ago
Closed 8 years ago
frequent hangs on Mac - looping reading cache file _CACHE_
Categories
(Thunderbird :: General, defect)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: calum.mackay, Unassigned)
References
Details
(Keywords: hang, regression, Whiteboard: [regression:TB42])
I've been having some "hang" issues with Earlybird on MacOS (10.9.5), and have just reproduced it with Daily 20150911. Note this is not a hibernation/sleep issue :) I tried Daily 20150911, with a fresh profile - it hangs about 50% of the time on startup. e.g when loading the "account sign up" box. Using dtruss I can see it seems to be mostly sitting in a loop of read() syscalls, about 2,000 per second. e.g: read(0x29, "\0", 0x2B0) = 0 0 read(0x29, "\0", 0x2B0) = 0 0 read(0x29, "\0", 0x2B0) = 0 0 The file descriptor in question is always one of the _CACHE_ files: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME thunderbi 1341 calum 41u REG 1,4 24912 23533586 /Users/calum/Library/Caches/Thunderbird/Profiles/ht2u32xx.cdm.daily/Cache/_CACHE_003_ It's not always the same file, e.g. it could be 001. This happens with both Earlybird and Daily recent builds. I've been seeing hangs on Earlybird for a few weeks, but only yesterday got around to tracing it, so can't say for sure it's the same issue.
Reporter | ||
Comment 1•9 years ago
|
||
Here's an example from today, Earlybird 20150913 EB fails to display a message when clicked on - blank body. cpu goes to 100%. run dtruss; ctrl-c it after 10 seconds: $ sudo dtruss -p $(pgrep thunderbird) -c read(0x25, "\0", 0x2D) = 0 0 read(0x25, "\0", 0x2D) = 0 0 read(0x25, "\0", 0x2D) = 0 0 read(0x25, "\0", 0x2D) = 0 0 read(0x25, "\0", 0x2D) = 0 0 read(0x0, "\0", 0x0) = 0 0 ^C dtrace: 154128 dynamic variable drops with non-empty dirty list CALL COUNT gettimeofday 6 psynch_cvbroad 15 psynch_cvwait 30 read 17416 Find out what fd 0x25 is; (0x25 == 37 decimal) $ lsof -p $(pgrep thunderbird) | grep 37u thunderbi 1606 calum 37u REG 1,4 25555 23587136 /Users/calum/Library/Caches/Thunderbird/Profiles/ag10ufyl.default/Cache/_CACHE_002_ This is happening a few times an hour at the moment.
Comment 2•9 years ago
|
||
A regression range would help. If you are not asserting it doesn't happen with daily 20150910, can you find the most recent daily that does not fail?
Severity: major → critical
Flags: needinfo?(calum.mackay)
Keywords: hang
Summary: frequent hangs - looping reading cache file _CACHE_ → frequent startup hangs - looping reading cache file _CACHE_
Reporter | ||
Comment 3•9 years ago
|
||
(In reply to Wayne Mery (:wsmwk, use Needinfo for questions) from comment #2) > A regression range would help. If you are not asserting it doesn't happen > with daily 20150910, can you find the most recent daily that does not fail? I can't find an archive of Mac Daily builds; would you point me at it please? I can only find the Mac Daily builds for "latest". The archives seems to be only Linux & win32/64. Failing that, should I use Earlybird builds...? They do seem to be available...
Reporter | ||
Comment 4•9 years ago
|
||
removed "startup" from Summary: this happens all the time, not just at startup.
Flags: needinfo?(calum.mackay)
Summary: frequent startup hangs - looping reading cache file _CACHE_ → frequent hangs - looping reading cache file _CACHE_
Reporter | ||
Updated•9 years ago
|
Flags: needinfo?(calum.mackay)
Reporter | ||
Updated•9 years ago
|
Flags: needinfo?(calum.mackay) → needinfo?(vseerror)
Reporter | ||
Comment 5•9 years ago
|
||
I've been able to narrow it down, using Earlybird Aurora builds. For each build, I created a fresh profile, and started EB, and selected "Set as default" on the default client box. On repeated subsequent startups, for this build & profile: - 0811 - BAD http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/2015/08/2015-08-11-00-40-04-comm-aurora/ hangs about 50% of the time, whilst displaying the "Would you like a new email address" box, without ever getting to listing any provider. Using dtruss/lsof, the result is always looping on read() of a _CACHE_ file (see earlier comments). - 0810 - GOOD http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/2015/08/2015-08-10-00-40-05-comm-aurora/ I'm unable to get this build to hang, no matter how many times I start it.
Flags: needinfo?(vseerror)
Reporter | ||
Comment 6•9 years ago
|
||
Interestingly, the first BAD build, 0811, was the switch-over (for EB Aurora) from 41.0a2 (0810) to 42.0a2 (0811).
Reporter | ||
Comment 7•9 years ago
|
||
Also, confirmed BAD is: Daily 0801 http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/2015/08/2015-08-01-03-02-12-comm-central/ which is also 42. This is the oldest Mac Daily build I could find.
Reporter | ||
Comment 8•9 years ago
|
||
The July Daily builds are a bit sketchy, but I was able to confirm: - 0722 - BAD http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/2015/07/2015-07-22-03-02-08-comm-central/ - 0713 - GOOD http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/2015/07/2015-07-13-03-02-07-comm-central/ I could not find any Daily builds in between 0713 & 0722.
Reporter | ||
Comment 9•9 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=comm-aurora&revision=102abb2cdd3c
Comment 10•9 years ago
|
||
Excellent work - *daily* builds is exactly what we need tested, not earlybird. It hurts that we are missing nightly builds, because we are now stuck with a wide regression range [1]. Fortunately Firefox is using cachev2, so in recent months there is almost no code change to old cache code. I've skimmed SOME of changelogs of http://hg.mozilla.org/mozilla-central/file/fba4b0cd3823/netwerk/cache (mostly the .cpp) and some of the change are: * bug 1170646 (ishikawa) * bug 354493 * bug 1048271 * bug 1182996 [1] http://hg.mozilla.org/comm-central/pushloghtml?startdate=2015-07-13+03%3A00&enddate=2015-07-22+05%3A00 http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2015-07-13+03%3A00&enddate=2015-07-22+05%3A00
Flags: needinfo?(ishikawa)
Keywords: regression
Comment 11•9 years ago
|
||
And I'm not finding anything any other worthwhile bug reports filed since July 13 that involve likely cache http://mzl.la/1OaREQM http://mzl.la/1OaS9dr
Reporter | ||
Comment 12•9 years ago
|
||
great, thanks much indeed, Wayne.
Reporter | ||
Comment 13•9 years ago
|
||
So, in https://hg.mozilla.org/mozilla-central/rev/f600f0cd7bb3 if the read() syscall returns 0, I presume that will be passed up to busy_beaver_PR_Read(), and that new function looks like it will loop on n==0? And the dtruss shows that read() is returning 0.
Comment 14•9 years ago
|
||
Mac users have also reported hanging in addons manager.
Blocks: 1170646
Flags: needinfo?(ishikawa)
Summary: frequent hangs - looping reading cache file _CACHE_ → frequent hangs on Mac - looping reading cache file _CACHE_
Updated•9 years ago
|
See Also: → mail-cache2
Reporter | ||
Comment 15•9 years ago
|
||
thanks very much indeed, Wayne, that looks promising. I'll test that backout as soon as I find a Daily build for macos. I can't find any Daily builds more recent than August 12th, at least at: http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/2015/ am I looking in the wrong place, please?
Flags: needinfo?(vseerror)
Comment 16•9 years ago
|
||
Mac nightlies are indeed broken. Hopefully will be better when you get the automated bugmail that bug 1195442 is fixed.
Depends on: 1195442
Flags: needinfo?(vseerror)
Reporter | ||
Comment 17•9 years ago
|
||
thanks again Wayne, much appreciated.
Comment 18•8 years ago
|
||
(In reply to Calum Mackay from comment #15) > thanks very much indeed, Wayne, that looks promising. > > I'll test that backout as soon as I find a Daily build for macos. Calum, is it still an issue?
Flags: needinfo?(calum.mackay)
Reporter | ||
Comment 19•8 years ago
|
||
Apols, I should have updated this sooner. No, the issue went away as expected, when I next got a build. thanks again.
Flags: needinfo?(calum.mackay)
Comment 20•8 years ago
|
||
Unfortunate that we won't know what fixed or caused it, but not your fault by any means, and it's one less thing to worry about. Thanks for the help and the update!
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
Whiteboard: [regression:TB42]
You need to log in
before you can comment on or make changes to this bug.
Description
•