Open Bug 1381485 Opened 7 years ago Updated 9 months ago

Hangs sending mail while copying message to Sent folder on Mac-only while displaying the progress bar. Deadlock in graphics on CGLClearDrawable. Workaround comment 349

Categories

(Thunderbird :: Message Compose Window, defect)

54 Branch
x86_64
macOS
defect
Not set
critical

Tracking

(thunderbird_esr60 wontfix, thunderbird_esr68? affected, thunderbird53 unaffected, thunderbird54 wontfix, thunderbird56 wontfix, thunderbird57 wontfix, thunderbird58 wontfix, thunderbird59 wontfix, thunderbird60 wontfix, thunderbird67 wontfix, thunderbird71 wontfix, thunderbird72 wontfix, thunderbird73 wontfix, thunderbird74 wontfix, thunderbird75 wontfix, thunderbird76 wontfix, thunderbird77 wontfix, thunderbird78 affected, thunderbird79 affected)

People

(Reporter: jamesrome, Unassigned)

Details

(Keywords: hang, regression, regressionwindow-wanted, Whiteboard: [regression:TB54?][duptome][workaround: comment 349])

User Story

Workaround: Comment 349

History:

* 2017-02-22 TB45 cayenne INCOMPLETE Bug 1341784 - hangs on sending mail (on one machine only, but has come and gone on others) (Unclear if this is the same issue, so unknown if this is the first report of this bug)

>> 2016-06 core graphics Bug 1207332 - skia content on OS X - landed.  Should be in TB48 (beta), and subsequently in TB52 ESR


>> 2017-02-14 core graphics Bug 1325227 - Use read locks instead of synchronous transactions for ContentClientRemoteBuffer - landed. Should be in TB54 (beta), and subsequently in TB60.


* 2017-03 TB45 simone WFM Bug 1343480 - Rare hang sending email with MacOSx with spotlight search enabled  (so perhaps unrelated)

>> 2017-05 FF?? OPEN Bug 1369207 - Firefox hang on CGLClearDrawable after quickly closing window after FEATURE_FAILURE_OPENGL_CREATE_CONTEXT


* 2017-07 TB54(beta) Rome OPEN Bug 1381485 - Hangs sending mail while copying message to Sent folder on Mac-only while displaying the progress bar.  Deadlock in graphics on CGLClearDrawable.
** 2018-03 m_kato is the first developer to comment; we then link to bug 1369207, which reads "On macOS installations where opening a window logs: |[GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT|, quickly closing the window can deadlock in CGLClearDrawable."
** 2018-10 Gene developer comments, and Glenn comments about mutex

* 2017-09 TB52(ESR) Reid DUPE Bug 1400568 - Thunderbird Mac 52.3.0 frequently hangs after Send
* 2017-12 TB52 cinymini DUPE Bug 1422251 - TB52 Freeze after mail has been sent, with zero cpu (sierra).  two imap accounts
* 2018-02 TB52 or earlier Heikki INCOMPLETE (but a more general networking issue) Bug 1440716 - Sits forever in "Connecting ". Hanging connections (imap and smtp) and hang on mac OS X
* more reports follow for Thunderbird 60

----------------------------------------------------------------------------------------------------------------------------------------------------

*** We need a one-day regression range using daily builds. Please pick a build date that might fail and test it. ***

* initially thought to be: http://archive.mozilla.org/pub/thunderbird/nightly/2017/03/2017-03-08-03-02-29-comm-central/  to http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-12-03-02-06-comm-central/
* We have no idea on what date the flaw starts; we only know some builds that work and some that don't.

Working backwards:
* fail: 2017-06-07 comment 193, comment 216  Scott, Christopher
* fail: 2017-06-06   Richard
* fail: 2017-06-05   Richard
* ?? : 2017-06-04   Chris
* fail: 2017-06-03  Scott
* fail: 2017-06-02  Scott
* fail: 2017-06-01  Scott
* fail: 2017-05-31  Scott
* fail: 2017-05-24  Christopher  
* ?? : 2017-05-17   ?
* ?? : 2017-05-14   ?
* ?? : 2017-05-01  Christopher
* fail: 2017-04-01  Scott
* fail: 2017-03-22  Scott
* ??: 2017-03-15  Scott
* ??: 2017-03-08  ??
* works for two weeks: 2017-03-01  Scott
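
The known results above can be fed to a small helper that suggests which nightly to test next. This is only a minimal sketch: the function names and the `RESULTS` layout are hypothetical, the "??" entries are left out, and because the hang is intermittent a "works" entry is weak evidence. The URL prefix follows the pattern of the archive links quoted in this bug.

```python
from datetime import date, timedelta

# Results copied from the "Working backwards" list; "??" entries are omitted.
RESULTS = {
    date(2017, 3, 1): "works",
    date(2017, 3, 22): "fail",
    date(2017, 4, 1): "fail",
    date(2017, 5, 24): "fail",
    date(2017, 5, 31): "fail",
    date(2017, 6, 1): "fail",
    date(2017, 6, 2): "fail",
    date(2017, 6, 3): "fail",
    date(2017, 6, 5): "fail",
    date(2017, 6, 6): "fail",
    date(2017, 6, 7): "fail",
}


def regression_window(results):
    """Return (last_good, first_bad): the earliest failing build and the
    latest working build before it."""
    bad = sorted(d for d, r in results.items() if r == "fail")
    if not bad:
        raise ValueError("need at least one failing build")
    first_bad = bad[0]
    good = [d for d, r in results.items() if r == "works" and d < first_bad]
    if not good:
        raise ValueError("need a working build before the first failure")
    return max(good), first_bad


def next_build_to_test(results):
    """Return the midpoint date of the window, or None once it is one day."""
    last_good, first_bad = regression_window(results)
    gap = (first_bad - last_good).days
    if gap <= 1:
        return None
    return last_good + timedelta(days=gap // 2)


def archive_month_url(day):
    """Month directory of the nightly archive for the given date."""
    return f"http://archive.mozilla.org/pub/thunderbird/nightly/{day:%Y}/{day:%m}/"


if __name__ == "__main__":
    candidate = next_build_to_test(RESULTS)
    print("window:", regression_window(RESULTS))
    print("test next:", candidate, archive_month_url(candidate))
```

Each new result narrows the window by roughly half, so about five more tests would get from the current three-week window to a one-day range, provided each build is run long enough for the hang to show up.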

User configs (name | computer | OS version | graphics | monitor(s)):
* robert.p | MacBookPro14,2 | 10.13.6 | 0x5927 | Built-In Retina LCD
* joduinn | MacBookPro15-2018 | 10.14.6 | Radeon Pro 560X 4 GB, Intel UHD Graphics 630 | ?
* Bob Shimizu | ? | 10.14.16 | <unknown graphics> | Apple Thunderbolt Display (3)
* Aaron | Mac Mini (Late 2014) | 10.14.6 | Intel Iris 1536 MB | 27" Thunderbolt display
* Robert Shimizu | Mac Pro | 10.14.6 | Unknown graphics | Thunderbolt 27" (3)
* Scott James | iMac Mid 2017 | 10.13.6 | Radeon Pro 575 | Integrated 5120 x 2880 Apple Display
* kimlove | iMac Retina 5K 27-inch 2017 | 10.13.6 | Radeon Pro 580 | imac desktop
* Marc De Graef | Mac Pro (Late 2013) | 10.13.6 | 3.5 GHz 6-core Intel Xeon E5 | LG Ultrawide Display
* Marc De Graef | Macbook Pro (Retina, 15" Mid 2015) | 10.13.6 | 2.8GHz Intel Core i7 | Built-in Display
* Ludovic Rousseau ?  "it takes a week"
* Christopher Schultz  ?
* Richard Leger ?

- yahoo and gmail imap
- spotlight doesn't matter
- bumping file handles doesn't help
> 54.0B3 works - no crashing.
> 55.0b2 dies


**beta feedback**
- James: still fails with 67 beta  (this bug)
- Scott: unknown  (this bug)
- degraef: works with 66? beta  (bug 1525001)
- cinymini: unknown  (bug 1422251) 

**Reports:**
https://support.mozilla.org/en-US/questions/1273927
https://discourse.mozilla.org/t/thunderbird-freezes-and-i-have-to-force-quit/46256
https://support.mozilla.org/en-US/questions/1265983
https://support.mozilla.org/en-US/questions/1256159 (claims safe mode helped)
https://support.mozilla.org/en-US/questions/1254261
https://support.mozilla.org/en-US/questions/1247137 (reverted to TB52 - good tester)
https://support.mozilla.org/en-US/questions/1246950
https://support.mozilla.org/en-US/questions/1246643  1/13/2019
https://support.mozilla.org/en-US/questions/1242885 **disabling send progress helps**
https://support.mozilla.org/en-US/questions/1241167 (when adding attachment)
https://support.mozilla.org/en-US/questions/1237929 (likely unrelated because this is on win10 and caused by signature file)
https://support.mozilla.org/en-US/questions/1234828 9/21/2018
https://support.mozilla.org/en-US/questions/1241380
https://support.mozilla.org/en-US/questions/1234041 - UCD reverted to TB52.0
https://support.mozilla.org/en-US/questions/1172364 - three users left Thunderbird
https://support.mozilla.org/en-US/questions/1170401 - 52.2.1 Arthur frequent hangs  8/7/2017 

Similar or Mac hang issue: 
- https://support.mozilla.org/en-US/questions/1246950
- https://support.mozilla.org/en-US/questions/1246494

Attachments

(24 files)

1.70 MB, text/plain
136.43 KB, application/zip
61.32 KB, text/plain
5.74 MB, text/plain
126.58 KB, application/zip
100.80 KB, text/plain
1.35 MB, text/plain
107.08 KB, text/plain
115.67 KB, application/zip
117.55 KB, application/zip
120.33 KB, application/zip
146.26 KB, application/zip
4.55 KB, image/png
6.28 KB, image/png
1.10 MB, text/plain
31.13 KB, image/png
18.76 KB, image/png
51.05 KB, text/plain
1.74 MB, text/plain
1.72 MB, text/plain
95.15 KB, image/jpeg
1.66 MB, text/plain
154.83 KB, application/x-gzip
90.08 KB, text/plain
Attached file TBjang.txt
see attached
Does "hangs frequently" mean it hangs and you must kill the process? Or does it mean it hangs for x minutes and then recovers? Please try 55.
Flags: needinfo?(jamesrome)
I have to kill the process.
Flags: needinfo?(jamesrome)
> Please try 55 ... started in safe mode
Severity: major → critical
Keywords: hang
It upgraded to 55. I'll see if it hangs still
It still hangs frequently in normal mode. And I have the latest MacOS 10.12.6.
New apple report attached.
Attached file TBHang2.txt.zip
Attached file TBfiles.txt
It hung again in safe mode. One issue may be the number of open files (see attached). TB seems to load every one of my fonts, and I have a huge number since I do desktop publishing. There is no reason for this, and certainly zonks the system, which (I think) has some limit on the number of open files. TB 55.0b2
Attached file TBSpindump.txt
It hung right away in normal mode. I attach a spindump.
Attached file TBHang3.txt.zip
It hung again just after sending mail.
It seems to be happening when I send google mail. It did it again.
It is now hanging every time I send mail. It's a lot worse since the upgrade to macOS 10.12.6
(In reply to James Rome from comment #11)
> It is now hanging every time I send mail. It's a lot worse since the upgrade
> to macOS 10.12.6

does that mean non-google mail?
with or without addons?

(In reply to James Rome from comment #7)
> Created attachment 8889048 [details]
> TBfiles.txt
> 
> It hung again in safe mode. One issue may be the number of open files (see
> attached). TB seems to load every one of my fonts, and I have a huge number
> since I do desktop publishing. There is no reason for this, and certainly
> zonks the system, 

Your fonts situation is a good piece of info.  That's the way gecko works (it's not Thunderbird code) and is unavoidable, so the same thing will happen in Firefox. It does mean many fd will be open, and perhaps cause high memory usage.

> which (I think) has some limit on the number of open files. TB 55.0b2

You can make a modest increase to ulimit. See bug 800279 comment 1.

But it is unclear whether this is related to your hanging situation.
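
For reference, the per-process open-file limit that `ulimit -n` controls can be inspected from a script. What follows is a minimal Python sketch using the standard `resource` module (Unix only). The helper names are hypothetical, and the change applies only to the Python process itself and anything it launches, not to an already running Thunderbird. The exact steps from bug 800279 comment 1 are not reproduced here.

```python
import resource


def fd_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def clamp_target(target, hard):
    """Largest soft limit not above `target` that the hard limit allows."""
    if hard == resource.RLIM_INFINITY:
        return target
    return min(target, hard)


def raise_soft_limit(target):
    """Raise the soft limit toward `target`; never lowers it.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = fd_limits()
    new_soft = clamp_target(target, hard)
    if soft != resource.RLIM_INFINITY and new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return fd_limits()[0]


if __name__ == "__main__":
    print("current (soft, hard):", fd_limits())
```

On macOS the kernel may reject very large soft limits even when the hard limit is reported as unlimited, so a modest target is the safer choice.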
Flags: needinfo?(jamesrome)
Summary: hangs frequently → Thunderbird hangs frequently
I reverted to the release channel after the last kerfuffle with Lightning, and this problem no longer occurs.
Flags: needinfo?(jamesrome)
Does this Mac have 13 imap accounts, or is that the Windows 10 system?
Did the hang occur only when sending?
Flags: needinfo?(jamesrome)
Summary: Thunderbird hangs frequently → Thunderbird beta hangs frequently while sending mail
My Mac has 11 IMAP accounts. I usually do not activate them all on Windows.
Flags: needinfo?(jamesrome)
James,
Please try an increased ulimit. See bug 800279 comment 1.
Is it more prone to happen with gmail, or happens only with gmail?
Did it also happen with 53 beta? Or, had you not used beta prior to 54?

(marking regression, because it does not happen for user with release build)
Flags: needinfo?(jamesrome)
Sorry, I reverted to the release build because I could not do anything...
Flags: needinfo?(jamesrome)
Had you been using TB53 beta prior to comment 0 and not had problems?
Flags: needinfo?(jamesrome)
Whiteboard: [regression:TB54?]
Don't remember. I gave it up when Lightning died.
Flags: needinfo?(jamesrome)
Similar report in bug 1343480 but it is version 45. And bug 1400568 version 52. But no one has ponied up with a regression range.

If you could retest - it would be useful to know whether it is more prone to happen with gmail, or happens only with gmail?

(bp-a454d2eb-107b-45cf-88de-d00c50170518 indicates you were at one time using 53 beta)
Well, after struggling to find provider for Google calendar (try Googling for the beta builds), I have 56.0 b4 running, and so far I can send mail. Why is Provider for Google Calendar called gdata provider? How is one supposed to find it? Change one name or the other.
56.0b4 just hung again sending gmail.
The latest hang was on 57.90b1. TB opens all of my hundreds of font files. Why???
57.0b1
(In reply to James Rome from comment #26)
> TB opens all of my hundreds of font files.
> Why???

Nothing to do with Thunderbird. That's the way mozilla Gecko handles Mac
That's bad. It might be running out of file handles.
Attached file TBHang.txt
TB57.0b2 hung again after sending mail via gmail.
The sample from today
Can you remove any font files from your system?
Do you have virtual folders that iterate over many other folders?
Flags: needinfo?(jamesrome)
No virtual folders. If I remove fonts, I can't get them back easily. It hung today without sending an e-mail.
But surely opening all the font files is a bug. It must slow things down and use more system resources.
Flags: needinfo?(jamesrome)
(In reply to James Rome from comment #33)
> No virtual folders. If I remove fonts, I can't get them back easily.

I am suggesting only removing fonts that you do not use.

> But surely opening all the font files is a bug. 

> It must slow things down 

Actually, no.

> and use more system resources.

no, not memory, and not cpu, afaik. only open file handles.


You should do comment 18
> 
> You should do comment 18

... and does disabling spotlight help?
Flags: needinfo?(jamesrome)
How do I disable spotlight?
Flags: needinfo?(jamesrome)
So far, adding file handles fixes things. There is no need to open all my font files, and IMHO that is a bug.
(In reply to James Rome from comment #36)
> How do I disable spotlight?

bug 1343480 comment 1
Sorry, it hung again after sending a Gmail, and with Spotlight disabled, and with the file numbers boosted.
Attached file TBHang.txt.zip
This time it hung after sending Yahoo IMAP mail
Attached file TBhang12-16.txt.zip
The Apple report
This still happens daily on 58.0b3
Attached file tbhang.txt.zip
Every day it hangs many times. This is getting annoying.
Probably the best chance of finding the cause is for you to find the regression range using nightly builds.
For example starting with version 54 dmg nightly found at http://archive.mozilla.org/pub/thunderbird/nightly/2017/03/2017-03-01-03-02-08-comm-central/
54.0B3 seems to work without crashing.
55.0b2 dies
> 54.0B3 seems to work without crashing.
> 55.0b2 dies

That's a great start. So we need your help to determine the one day range in the 55.0a1 series.
http://archive.mozilla.org/pub/thunderbird/nightly/2017/03/2017-03-08-03-02-29-comm-central/
http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-12-03-02-06-comm-central/
See Also: → 1422251, 1400568
I need a non-daily build. When I install Lightning and reboot, it updates to 5.8.
The whole way that TB and Lightning are stored needs fixing. The correct Lightning builds and calendar provider should be in the same directory as TB.  It is really hard to find what goes with what.
There is no 55.0b1 to test that is not a nightly that self-updates. So the problem happened from 54 to 55
I believe that is when I reported it. But looking at the above, it seems that 54 also crashed.
I have sent in many hang reports. Don't they point to the issue?
> I have sent in many hang reports. Don't they point to the issue?

If we had someone to read it. Right now we don't. 

But you have started to narrow a regression range, which could ultimately be more useful.  Unfortunately one cannot get a regression range without testing dailies.

> The whole way that TB and Lightning are stored needs fixing. The correct Lightning builds and calendar provider should be in the same directory as TB.  It is really hard to find what goes with what.

Have you used dailies on a regular basis?  I never have a problem with lightning in daily.  You just install the daily and if lightning doesn't behave install the correct nightly.  All thunderbird daily installs WITHIN THE SAME VERSION from then on should just work.
(I failed to finish the thought)

You just install the Thunderbird daily and if lightning doesn't behave install the correct version lightning.  (Or remove the currently installed lightning and install thunderbird daily a second time)
I have been using the candidate builds because the dailies self-update instantly. The candidate builds always say Lightning is incompatible, and I must find the correct one. The lightning-TB version page does not list gdata provider, so I must figure that out too. Yes, they are in the lightning directory, but I have downloaded every version, and there is no way to tell from the files which are correct. Also gdata provider throws all sorts of errors if it is the wrong version, but this is not detected in the install process. Unlike Lightning, gdata provider is not disabled.
It may be time for me to switch to outlook
> I have been using the candidate builds because the dailies self-update instantly

This is hardly a show stopper. One simply disables updates


(In reply to James Rome from comment #45)
> 54.0B3 seems to work without crashing.

Just backtracking a bit, you reported this issue on July 17.  It seems to me by then you would have run 54.0b1 and b2 without problems and first saw the problem using 54.0b3.  Is that correct?
User Story: (updated)
Summary: Thunderbird beta hangs frequently while sending mail → Thunderbird beta hangs frequently while sending imap mail
I would assume so, but it was a while ago.
(In reply to James Rome from comment #54)
> I would assume so [that 54.0b3 is the first that failed], but it was a while ago.

But that [if you were running 54.0b3 in comment 0] does not square with comment 45 where "54.0B3 seems to work without crashing."
Hopefully you are using https://releases.mozilla.org/pub/thunderbird/releases/

In any event, suggest your next step be setting the yahoo and gmail account to save sent messages to a local folder, so that we can determine whether this situation involves the imap sent folder.
Good suggestion. I changed the sent folder to local now on 58.b3. It always manages to send the mail, so you might be correct in your hunch.
Also remember that there has been a long-standing bug about copying IMAP mail to sent folder. Maybe when that was fixed, it caused this issue.
I do believe you have pinned down the problem. Not had a hang since I made the sent folders local. I am running 58.0b3
According to what I have read, gmail automatically saves outgoing mail to gmail Sent folder. So there should be no need for Thunderbird to be set to save to Sent (even though it is default).  I don't know about yahoo.
But go back to that bug about copying sent mail to imap folder. It always sends the mail successfully, but hangs after or during the next step. Still have not had a hang using local folders.
Sure, there is still a bug.  Which is why it is so important to us for you to determine a regression range.
Component: General → Networking: IMAP
Product: Thunderbird → MailNews Core
Summary: Thunderbird beta hangs frequently while sending imap mail → Hangs frequently while sending imap mail while copying message to Sent folder
Version: 54 Branch → 54
Blocks: 1402841
It has started to hang again on 58.0b3. Twice today after sending Google mail
Can you try a nightly build from http://archive.mozilla.org/pub/thunderbird/nightly/latest-comm-central/thunderbird-60.0a1.en-US.mac.dmg

If so, what are the results?
Flags: needinfo?(jamesrome)
It just hung again with 60 nightly. I did move the sent messages back to gmail from local.
Flags: needinfo?(jamesrome)
Something else is happening with TB 60 also. When it hangs, and I Force quit it, the app in my /Applications/Thunderbird Daily.app gets trashed. I cannot reopen it until I replace it with the version I downloaded. The same thing happens when the daily tries to update itself. TB never restarts, and I must update it manually.
And moving sent mail to local did not help the hangs.
(In reply to James Rome from comment #65)
>The same thing
> happens when the daily tries to update itself. TB never restarts, and I must
> update it manually.
I confirm this issue. For the past week it has happened every time I try to update Daily via About Daily > Check for updates.
Perhaps the two of you can conspire to find the regression range.


(In reply to James Rome from comment #64)
> It just hung again with 60 nightly. I did move the sent messages back to
> gmail from local.

(previously sent in PM) to get the gdata and lightning addons to match the nightly being tested
https://ftp.mozilla.org/pub/calendar/lightning/nightly/latest-comm-central/
m_kato, can you make anything of the stacks here, or in bug 1422251 and bug 1400568 (which are both version 52)


I don't know that it is related, but for completeness, ref Bug 1170646 - A few M-C fixes to handle short read in Cache code (from [META] Failure to deal with short read)

There is also a newly reported Bug 1440716 - Hanging (imap and smtp) connections and hang on mac OS X
Flags: needinfo?(m_kato)
See Also: → 1444739
(In reply to Wayne Mery (:wsmwk) from comment #68)
> m_kato, can you make anything of the stacks here, or in bug 1422251 and bug
> 1400568 (which are both version 52)

Maybe these are the same deadlock in CGLClearDrawable.  I don't know why this occurs.
Flags: needinfo?(m_kato)
FWIW bug 1369207 reads "On macOS installations where opening a window logs: |[GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT|, quickly closing the window can deadlock in CGLClearDrawable."
Summary: Hangs frequently while sending imap mail while copying message to Sent folder → Hangs frequently while sending imap mail while copying message to Sent folder on Mac. deadlock on CGLClearDrawable?
(In reply to Wayne Mery (:wsmwk) from comment #47)
> > 54.0B3 seems to work without crashing.
> > 55.0b2 dies
> 
> That's a great start. So we need your help to determine the one day range in
> the 55.0a1 series.
> http://archive.mozilla.org/pub/thunderbird/nightly/2017/03/2017-03-08-03-02-29-comm-central/
> http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-12-03-02-06-comm-central/

James,

We still need a better regression range than
http://archive.mozilla.org/pub/thunderbird/nightly/2017/03/2017-03-08-03-02-29-comm-central/
http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-12-03-02-06-comm-central/

It seems I forgot to mention the regression tool https://mozilla.github.io/mozregression/quickstart.html
Flags: needinfo?(jamesrome)
Alas, the tool is for Windows only. Because it is so difficult to know and get the correct version of Lightning and the Google provider, I have given up on nightlies. They should be packaged together in the same download directory IMHO.
Flags: needinfo?(jamesrome)
I've been seeing this occasionally, in Daily, for quite some time, including today's build.

I can't easily give a regression window: sometimes it happens a few times a day, sometimes once a week.

abridged hang txt:

OS Version:      Mac OS X 10.13.6 (Build 17G65)
Architecture:    x86_64h

Path:            /Applications/Thunderbird Daily.app/Contents/MacOS/thunderbird
Identifier:      org.mozilla.thunderbird daily
Version:         63.0a1 (63.0a1)

Duration:        4.30s (process was unresponsive for 10 seconds before sampling)

Hardware model:  MacBookPro14,1
Active cpus:     4

Heaviest stack for the main thread of the target process:
  43  start + 1 (libdyld.dylib + 4117) [0x7fff791f6015]
  43  main + 890 (thunderbird + 4474) [0x10681617a]
  43  ??? (<A5CA6BA1-E112-3B19-B3AB-EB1683128CAE> + 61943681) [0x10a9c8f81]
  43  ??? (<A5CA6BA1-E112-3B19-B3AB-EB1683128CAE> + 61941996) [0x10a9c88ec]
  43  ??? (<A5CA6BA1-E112-3B19-B3AB-EB1683128CAE> + 61939604) [0x10a9c7f94]
  43  ??? (<A5CA6BA1-E112-3B19-B3AB-EB1683128CAE> + 61180073) [0x10a90e8a9]
  43  ??? (<A5CA6BA1-E112-3B19-B3AB-EB1683128CAE> + 41233963) [0x109608e2b]
  43  -[NSApplication run] + 764 (AppKit + 223365) [0x7fff4e949885]
  43  ??? (<A5CA6BA1-E112-3B19-B3AB-EB1683128CAE> + 41229596) [0x109607d1c]
  43  -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 3044 (AppKit + 8224308) [0x7fff4f0eae34]
  43  _DPSNextEvent + 2085 (AppKit + 268915) [0x7fff4e954a73]
  43  _BlockUntilNextEventMatchingListInModeWithFilter + 64 (HIToolbox + 194692) [0x7fff506a3884]
  43  ReceiveNextEventCommon + 613 (HIToolbox + 195334) [0x7fff506a3b06]
  43  RunCurrentEventLoopInMode + 286 (HIToolbox + 195990) [0x7fff506a3d96]
  43  CFRunLoopRunSpecific + 483 (CoreFoundation + 545107) [0x7fff513b9153]
  43  __CFRunLoopRun + 1293 (CoreFoundation + 547053) [0x7fff513b98ed]
  43  __CFRunLoopDoSources0 + 300 (CoreFoundation + 550092) [0x7fff513ba4cc]
  43  __CFRunLoopDoSource0 + 108 (CoreFoundation + 1430572) [0x7fff5149142c]
  43  __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17 (CoreFoundation + 670225) [0x7fff513d7a11]
  43  __NSThreadPerformPerform + 334 (Foundation + 426677) [0x7fff534fd2b5]
  43  ??? (<A5CA6BA1-E112-3B19-B3AB-EB1683128CAE> + 41001366) [0x1095d0196]
  43  -[NSView removeFromSuperview] + 270 (AppKit + 157615) [0x7fff4e9397af]
  43  -[NSView _setWindow:] + 2356 (AppKit + 147783) [0x7fff4e937147]
  43  -[NSSurface setWindow:] + 53 (AppKit + 2268392) [0x7fff4eb3cce8]
  43  -[NSSurface _disposeSurface] + 152 (AppKit + 2269311) [0x7fff4eb3d07f]
  43  -[NSNotificationCenter postNotificationName:object:userInfo:] + 66 (Foundation + 26823) [0x7fff5349b8c7]
  43  _CFXNotificationPost + 599 (CoreFoundation + 358839) [0x7fff5138b9b7]
  43  -[_CFXNotificationRegistrar find:object:observer:enumerator:] + 1664 (CoreFoundation + 362624) [0x7fff5138c880]
  43  ___CFXNotificationPost_block_invoke + 225 (CoreFoundation + 633569) [0x7fff513ceae1]
  43  _CFXRegistrationPost + 458 (CoreFoundation + 634282) [0x7fff513cedaa]
  43  __CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER__ + 12 (CoreFoundation + 634588) [0x7fff513ceedc]
  43  CGLClearDrawable + 41 (OpenGL + 28651) [0x7fff5b8c1feb]
  43  _pthread_mutex_lock_slow + 253 (libsystem_pthread.dylib + 5320) [0x7fff7950c4c8]
  43  __psynch_mutexwait + 10 (libsystem_kernel.dylib + 117318) [0x7fff79346a46]
 *43  psynch_mtxcontinue + 0 (pthread + 31325) [0xffffff7f81967a5d] (blocked by pthread mutex owned by thunderbird (Thunderbird Daily) [3467] thread 0x20330)


Process:         thunderbird (Thunderbird Daily) [3467]
Path:            /Applications/Thunderbird Daily.app/Contents/MacOS/thunderbird
Architecture:    x86_64
Parent:          launchd [1]
UID:             501
Task size:       1273.60 MB
CPU Time:        0.055s (124.5M cycles, 25.3M instructions, 4.92c/i)
Note:            Unresponsive for 10 seconds before sampling
Note:            1 idle work queue thread omitted
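
To compare the many attached hang reports against this one, the stack lines can be scanned for the same signature: a `CGLClearDrawable` frame with a mutex wait below it. The following is a rough sketch only. The regular expression is written against the frame format shown in the excerpt above and may not cover every spindump or sample layout, and the function names are hypothetical.

```python
import re

# One frame looks like: "  43  CGLClearDrawable + 41 (OpenGL + 28651) [0x7fff5b8c1feb]"
# Unsymbolicated frames look like: "  43  ??? (<UUID> + 61943681) [0x10a9c8f81]"
FRAME = re.compile(
    r"^\s*\*?\d+\s+(?P<symbol>.+?)(?: \+ \d+)? "
    r"\((?P<module>[^()]+) \+ \d+\) \[0x[0-9a-fA-F]+\]"
)

# A few frames copied from the report above.
SAMPLE = """\
  43  -[NSView removeFromSuperview] + 270 (AppKit + 157615) [0x7fff4e9397af]
  43  -[NSSurface _disposeSurface] + 152 (AppKit + 2269311) [0x7fff4eb3d07f]
  43  CGLClearDrawable + 41 (OpenGL + 28651) [0x7fff5b8c1feb]
  43  _pthread_mutex_lock_slow + 253 (libsystem_pthread.dylib + 5320) [0x7fff7950c4c8]
  43  __psynch_mutexwait + 10 (libsystem_kernel.dylib + 117318) [0x7fff79346a46]
"""


def parse_frames(text):
    """Return (symbol, module) pairs for every line that looks like a frame."""
    frames = []
    for line in text.splitlines():
        match = FRAME.match(line)
        if match:
            frames.append((match.group("symbol"), match.group("module")))
    return frames


def blocked_in_cglcleardrawable(text):
    """True if a CGLClearDrawable frame is followed by a mutex-related frame."""
    symbols = [symbol for symbol, _ in parse_frames(text)]
    if "CGLClearDrawable" not in symbols:
        return False
    deeper = symbols[symbols.index("CGLClearDrawable") + 1:]
    return any("mutex" in symbol for symbol in deeper)


if __name__ == "__main__":
    print(blocked_in_cglcleardrawable(SAMPLE))
```

A match only says the main thread was waiting on a mutex inside `CGLClearDrawable` when sampled; finding the thread that owns the mutex still requires reading the full report.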
> They should be packaged together in the same download directory IMHO

Should not be difficult - it should only change every 10 weeks when version numbers change. And if you have them bookmarked, then you are good for when the version does change ...

Everyone in the other bugs seems to be gone for the summer :(  so I don't think we are going to make progress for some months.
Glenn from bug 1400568 writes "It’s a lot less frequent than it was, although I’m not sure if I’ve just learned to work around it. It does still happen once a week or so, when it used to happen three or four times a day."

So for most of you it has stopped, or it is better in version 60?
Summary: Hangs frequently while sending imap mail while copying message to Sent folder on Mac. deadlock on CGLClearDrawable? → Hangs frequently while sending imap mail while copying message to imap Sent folder on Mac. No problem if Sent is set to local folder. deadlock on CGLClearDrawable?
It is less frequent, but it still does it.
Not directly linked to this bug but may be linked to it as the closest I found that was updated recently... 

Today on Win 10 Pro, I got crash report bp-8017d419-a7d9-47ba-ba11-f9a3a0181002 (Thunderbird 60.0b11 Crash Report [@ shutdownhang | ntdll.dll@0x6a28c ]) after closing Thunderbird. It hung on closing because it was stuck trying to save a copy of a sent message to the Sent folder on the server via IMAP. The progress bar reached 100% and then hung, never finishing the processing.

Before closing TB, I tried moving between different folders (which in some cases helps by re-connecting/re-authenticating to the server, as indicated in the status bar), flushing DNS, switching to offline mode and back to online mode (does that re-initialise the socket/connection to the server?), etc., but nothing worked. TB was stuck saving the copy of the sent message to the Sent folder. I could still send another message, but the same issue arose with the second message. The messages were sent and received by the recipients, but no copy could be kept in Sent or in a local folder (it would be nice to have such an option when the copy to the Sent folder on the server is incomplete, to avoid losing messages).

The only thing that could be done was to close Thunderbird, which then crashed because it hung on closing.

As indicated before in some posts, I am a VPN user, which may cause the IP of the mail server to change. I am wondering whether TB fails to handle that situation by not updating its DNS/socket/cached server connection information, and so hangs in some sort of loop where the only way out is to close and re-open the application. In this case that means losing data (the copy of the message in the Sent folder). Stopping or restarting the VPN had no effect though.

I would expect that using the disconnect/reconnect button in Thunderbird would suffice to get out of such a processing/loop/hang situation, but it does not, unfortunately for the user, and data is lost :-)

I hope this information helps narrow the issue down so it can be fixed once and for all. Saving a copy of a sent message in the Sent folder is a basic feature that should not fail. If it does fail, Thunderbird should say why in a clear error message, and allow the user to retry or regain access to the message so it can be saved later, since it has already been sent.
Another possibility I have thought about is that the computer goes to sleep, and when it wakes up something in Thunderbird does not wake up or is no longer valid, causing the save-copy-to-Sent step to fail when sending a message. But since the issue is not systematic and happens randomly, it is hard to track down.
A similar bug has been reported:
Bug 413240 "Save to Sent was successful, but "Copying to sent folder" doesn't finish, or zombie "Copy complete""
My computer never goes to sleep, so it is not that.
I haven't looked at all the attachments to this bug but most seem to be Apple logs. Maybe a TB IMAP log would also be helpful since the problem seems to be saving to the IMAP Sent mailbox.

If an error is reported while saving to Sent, the patch here should allow the user to choose to save to Local Folders:
Bug 1366591. This also applies to saving Drafts and Templates

But not sure this fix is in the versions being tested by the reporter(s) of this bug.
As the original author of this bug report, I wanted to chime in with a bit of information in response to recent comments.  First, I'm glad it's getting some continued attention.  It's the kind of bug where you lose work -- it's effectively a crash bug, since you have to force quit.

1. It has repeated consistently with POP as well as IMAP.
2. Possibly related to Sent folder, but upon restart, the copy is always safely in the Sent folder (seems more like it deadlocks on cleanup, closing the progress window, rather than the task itself)
3. Earlier in this bug are extensive stack traces that point to the exact pthread mutexes which deadlock -- that should help to know where to look.
4. It seems like a race condition around mutexes -- always hard to find/fix.
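
To make point 4 concrete: one classic way two threads deadlock on mutexes is by taking the same two locks in opposite order. The sketch below is purely illustrative Python, not the Gecko or AppKit code path, and nothing in this bug confirms that lock ordering is the actual mechanism here. The thread names and the function `opposite_order_demo` are hypothetical. Timeouts are used so that the demo reports the stall instead of hanging.

```python
import threading


def opposite_order_demo(timeout=0.2):
    """Two threads take two locks in opposite order.

    Returns a dict saying whether each thread managed to get its second lock.
    With plain blocking acquires this pattern would hang forever.
    """
    lock_a, lock_b = threading.Lock(), threading.Lock()
    both_hold_first = threading.Barrier(2)
    both_tried = threading.Barrier(2)
    got_second = {}

    def worker(name, first, second):
        with first:
            both_hold_first.wait()       # both threads now own their first lock
            acquired = second.acquire(timeout=timeout)
            got_second[name] = acquired
            if acquired:
                second.release()
            both_tried.wait()            # keep `first` held until both have tried

    threads = [
        threading.Thread(target=worker, args=("main", lock_a, lock_b)),
        threading.Thread(target=worker, args=("other", lock_b, lock_a)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return got_second


if __name__ == "__main__":
    print(opposite_order_demo())
```

Neither thread obtains its second lock, which mirrors the "blocked by pthread mutex owned by ... thread" note in the hang report: the main thread waits on a mutex that another thread never releases.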
https://support.mozilla.org/en-US/questions/1237929 states the cause was a signature file.  

I wonder if that is the case with some other reports
User Story: (updated)
Depends on: 1398807

It is still doing this on 65.0b4

This bug was of course reported long before TB60, but bug 1525001 comment 4 states that using beta resolves the issue he reported against TB60.

Can you try the newer beta 66 from https://www.thunderbird.net/en-US/channel/ and report your results?
(fair warning, most addons won't work - calendar will)

(In reply to Wayne Mery (:wsmwk) from comment #92)

Eckard, feel free to try as well.

Note: You'll need to get it from https://www.thunderbird.net/en-US/channel/ AND it might start a new profile

Your link leads to the actual 67.0b1 beta-version.
Should I try the 66.0b3 beta-version from http://ftp.mozilla.org/pub/thunderbird/releases/66.0b3/mac/ or 67.0b1 in a new profile?

Eckard, I suspect both 66 and 67 should be tested. Ultimately, for whoever can reproduce the failure, we need to know the range in which it was fixed (if it was) so we can know what needs to be uplifted to esr.

There is a link for the Google Calendar on the Release Notes pages:
https://www.thunderbird.net/en-US/thunderbird/67.0beta/releasenotes/
https://www.thunderbird.net/en-US/thunderbird/66.0beta/releasenotes/
Second last item. Right-click to save the XPI file locally, then install add-on manually from file.

67b1 is working with gdata provider. But the crashes have grown more infrequent, so it will take time to know if the problem is fixed.

It is not fixed. 67b1 just hung after I sent gmail.

(In reply to Wayne Mery (:wsmwk) from comment #94)

I cannot reproduce the issue in new profiles with either TB 66.0b3 or TB 67.0b1 (two different IMAP accounts with the French ISP Free, no Gmail or Yahoo accounts).

Are any more crash reports needed? I have several from my machine (with truncated traces), and one non-truncated from a coworker's machine. Both are modern iMacs: one is almost brand new, with an old Thunderbird profile, while the other is an older machine with a very new Thunderbird profile.

Both report the same crash on CGLClearDrawable + 44. Most of the users in my office (all Mac based) have been suffering from this bug for the last 2-4+ months and it seems to be going up in frequency. We're eager to see a fix.

I can offer up some of my time for testing purposes too.

Thanks,

(In reply to James Rome from comment #98)

It is not fixed. 67b1 just hung after I sent gmail.

What happens if you point Sent folder to a local folder in account settings?

User Story: (updated)
Flags: needinfo?(jamesrome)
See Also: → 1551317

(In reply to Scott from comment #100)

...
Both report the same crash on CGLClearDrawable + 44. Most of the users in my office (all mac based) have been suffering ...
I can offer up some of my time for testing purposes too.

Great! Let's start at the same point as others:

Flags: needinfo?(sjames)

I pointed to local now. We shall see.

Flags: needinfo?(jamesrome)

https://support.mozilla.org/en-US/questions/1242885#answer-1179483 suggests disabling send progress helps, which might implicate graphics

User Story: (updated)
Flags: needinfo?(mozilla)

(In reply to Wayne from comment #102)

Hi Wayne,

#1 Most of my users do not save Sent mail at all, so I think this can be ruled out. We use O365, which rolled out automatic filing of sent emails to "Sent Items" folder. We had to disable Thunderbird saving an extra copy - to avoid duplicates - in Fall 2018 (around Sept 17th is when it started)

#2 I installed the beta version yesterday, I haven't had any hangs yet, but I haven't had to send many emails. In wait and see mode currently.

#3 I will look into this setting and report back.

Flags: needinfo?(sjames)

(In reply to Wayne from comment #102)

Further to #3:

My own preferences are set somewhat correctly, with sendInBackground set to False and show_send_progress set to True, but offline.send.unsent_messages is set to 0, not 1 (as recommended in the other bug thread). However, the reporter in that thread indicated he used the opposite settings and made his highly reproducible crash go away.

I have checked my own machine (both original v60 and beta) and both have matching settings. I also checked with one of my colleagues who is one of the only people in my office who does not experience the problem and he has the same settings as myself.

I will try toggling the settings once I experience another crash.

Thanks!

(In reply to Wayne from comment #102)

Beta Build 67.0b3 (64 Bit) just crashed for me.

It was running a fresh profile, IMAP with no local folders, and it was saving to an IMAP Sent folder (I hadn't thought to disable that; it is now disabled).

I have an un-truncated crash log for it too, but it reports the same CGLClearDrawable error (+41 instead of +44).

Excerpt:

55 CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER + 12 (CoreFoundation + 634300) [0x7fff3f561dbc]
55 CGLClearDrawable + 41 (OpenGL + 28651) [0x7fff49a56feb]
55 _pthread_mutex_lock_slow + 253 (libsystem_pthread.dylib + 5320) [0x7fff677924c8]
55 __psynch_mutexwait + 10 (libsystem_kernel.dylib + 117318) [0x7fff675cca46]
*55 psynch_mtxcontinue + 0 (pthread + 31325) [0xffffff7f82a10a5d] (blocked by pthread mutex owned by thunderbird (Thunderbird) [416] thread 0x189b)
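
For anyone comparing several of these logs, here is a minimal sketch that pulls the symbol, offset and mutex-owner thread out of lines shaped like the excerpt above. The line layout is inferred only from this excerpt, and the names `FRAME_RE`, `OWNER_RE` and `parse_frame` are hypothetical helpers, not part of any Apple or Mozilla tool.

```python
import re

# Layout assumed from the excerpt above:
#   [*]<count> <symbol> + <offset> (<module> + <module_offset>) [<address>]
FRAME_RE = re.compile(
    r"^\*?(?P<count>\d+)\s+(?P<symbol>\S+) \+ (?P<offset>\d+) "
    r"\((?P<module>\S+) \+ (?P<module_offset>\d+)\) \[(?P<address>0x[0-9a-f]+)\]"
)
# Trailing note naming the thread that owns the contended mutex.
OWNER_RE = re.compile(
    r"blocked by pthread mutex owned by (?P<owner>.+?) thread (?P<thread>0x[0-9a-f]+)"
)

def parse_frame(line):
    """Return a dict describing one frame line, or None if it does not match."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    frame = match.groupdict()
    owner = OWNER_RE.search(line)
    frame["mutex_owner_thread"] = owner.group("thread") if owner else None
    return frame

EXCERPT = """\
55 CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER + 12 (CoreFoundation + 634300) [0x7fff3f561dbc]
55 CGLClearDrawable + 41 (OpenGL + 28651) [0x7fff49a56feb]
55 _pthread_mutex_lock_slow + 253 (libsystem_pthread.dylib + 5320) [0x7fff677924c8]
55 __psynch_mutexwait + 10 (libsystem_kernel.dylib + 117318) [0x7fff675cca46]
*55 psynch_mtxcontinue + 0 (pthread + 31325) [0xffffff7f82a10a5d] (blocked by pthread mutex owned by thunderbird (Thunderbird) [416] thread 0x189b)
"""

frames = [parse_frame(line) for line in EXCERPT.splitlines()]
for frame in frames:
    print(frame["symbol"], "+", frame["offset"], frame["mutex_owner_thread"] or "")
```

Comparing the symbol and offset of the CGLClearDrawable frame across logs (+41 here, +44 in the earlier reports) and noting which thread owns the mutex may help decide whether two reports show the same hang.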

It hung again sending IMAP mail, and Sent mail is a local folder.

It has hung twice today. But I have put sent mail back to Google.

Just happened to me in TB 67.0b2 (64-bit) while sending a message: the save to the Sent folder on the IMAP server, while online, progressed up to 89% and then got stuck somehow... see attached... I left it as is, but after a few hours there was no progress and TB was still in processing status :-)

These are the errors appearing in the console, FYI, if that can be of any help...

NS_ERROR_ILLEGAL_VALUE: Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIAbDirectoryQuery.doQuery]
nsAbLDAPAutoCompleteSearch.js:261
TypeError: complistItem.occurrence is undefined
agenda-listbox.js:1041:13
TypeError: this.mItemInfoCache[aNewItem.id] is undefined
calDavCalendar.js:758:28
NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS: [JavaScript Error: "this.mItemInfoCache[aNewItem.id] is undefined" {file: "jar:file:///C:/Users/richard/AppData/Roaming/Thunderbird/Profiles/cnant748.default/extensions/%7Be2fda1a4-762b-4020-b5ad-a41df1933103%7D.xpi!/components/calDavCalendar.js" line: 758}]'[JavaScript Error: "this.mItemInfoCache[aNewItem.id] is undefined" {file: "jar:file:///C:/Users/richard/AppData/Roaming/Thunderbird/Profiles/cnant748.default/extensions/%7Be2fda1a4-762b-4020-b5ad-a41df1933103%7D.xpi!/components/calDavCalendar.js" line: 758}]' when calling method: [calICalendar::modifyItem] calAlarmService.js:174
Assert failed: aElement
calUtils.jsm:147
ASSERT resource://calendar/modules/calUtils.jsm:147
setElementValue chrome://calendar/content/calendar-ui-utils.js:32
setBooleanAttribute chrome://calendar/content/calendar-ui-utils.js:91
enableTimeIndicator chrome://calendar/content/calendar-multiday-view.xml:2573
view_XBL_Constructor chrome://calendar/content/calendar-multiday-view.xml:2534
Could not start Browser Toolbox, you need to enable it. ToolboxProcess.jsm:77:13
init resource://devtools/client/framework/ToolboxProcess.jsm:77
oncommand chrome://messenger/content/messenger.xul:1
TypeError: this.gViewSourceUtils is undefined
webconsole.js:168:5

Clicking on the red cross icon to close the previous prompt raises a new prompt: Save message > Your message was sent but a copy...etc... as per attached...

(In reply to Richard in comment #110)

That console report references an error occurring with your calendar extension (calDavCalendar.js). Does the crash log also report a "CGLClearDrawable" and/or mutex error? If not, this may be a distinct issue.

Wayne perhaps can provide more guidance.

TB does not crash in my case, it just hangs in processing mode (as shown by the processing icon, a blue circle, on the main tab)... Also worth mentioning: I am using TB on Windows... Sorry, I only just recalled that this bug refers to macOS...

(In reply to Richard in comment #113)

The bug is affecting both, but seems more prevalent in MacOS. Hang is perhaps the correct term, the app gets stuck, and doesn't continue during (?) or after the moment the sent email progress bar closes. In my experience Thunderbird stays open and does not close, but has hung/is no longer responding to user input. You have to do the equivalent of a force quit (or control alt delete) and manually force it to close, and then generate a bug report.

I switched back to the release version this morning (as it has my local email folders and message filters). I will keep an eye on it, but I feel like it's crashing more frequently than the beta does. The beta doesn't remove the problem, though (as others and I have pointed out).

It could also just be the transient nature of the bug, some days it happens more than others.

Anyone tried 68 beta?

Blocks: 1561990

In Reply to Wayne in comment #116

Upgraded today and got a crash a little while ago. No log was generated, but I would assume it is the same crash. Will confirm as soon as I see a log.

Are we discussing more than one issue here? When I see the problem I have, which happens about once per day (for months, and still, using current Daily), on MacOS, I get the spinning beachball, and I need to Force Quit. There’s no possibility to get a log here, is there?

Are people talking about getting logs, and seeing CGLClearDrawable, seeing a different issue than me? I am using IMAP Sent folder, and I only see it immediately after a successful mail send. But I send many emails per day, yet it only hangs once or twice per day.

(In reply to Calum Mackay from comment #119)
It could be, probably is?

I am getting logs intermittently now.

The crash always occurs during the sending process. Usually - if not always, when I have moved on to do something else whilst the email completes. Thunderbird will hang (with the beach ball) and needs to be force quit. Every log I have examined (this includes multiple workstations in our office) includes the CGLClearDrawable handle.

In my experience, your usage of a sent folder doesn't affect the crash as Thunderbird in our environment (O365) does not even save its own sent messages and still crashes. Crashes seem less frequent in the Beta releases, but still happen.

Some of my users have started (temporarily) using Apple Mail again to avoid incessant crashes.

(In reply to Scott from comment #120)

thanks Scott; how are you getting logs following the force quit?

(In reply to Calum Mackay from comment #121)

(In reply to Scott from comment #120)

thanks Scott; how are you getting logs following the force quit?

Good question. That explains why it doesn't give me a log every time, but it does intermittently. I assumed it was normal behavior.

(In reply to Scott from comment #122)

thanks Scott; how are you getting logs following the force quit?

Good question. That explains why it doesn't give me a log every time, but it does intermittently. I assumed it was normal behavior.

thanks. I never get logs when mine does this. Or perhaps I'm just not leaving it long enough before doing the Force Quit.

(In reply to Calum Mackay from comment #123)

thanks. I never get logs when mine does this. Or perhaps I'm just not leaving it long enough before doing the Force Quit.

I have gotten them in both situations. If it's been too long since the crash, the log is truncated and not very useful. If it's more recent, it will have more information on the thread that crashed.

I am regularly experiencing the same issue: Thunderbird hangs after sending an email with the spinning beach ball mouse cursor. I'm on MacOS 10.13.6. Sometimes, but not always, I receive an Apple bug report dialog along with a crash log after Force-Quitting. I'll attach my crash log to the ticket.

(In reply to Edmond from comment #125)

I am regularly experiencing the same issue: Thunderbird hangs after sending an email with the spinning beach ball mouse cursor. I'm on MacOS 10.13.6. Sometimes, but not always, I receive an Apple bug report dialog along with a crash log after Force-Quitting. I'll attach my crash log to the ticket.

Looks like the same bug to me - based on your crash report. Welcome :)

We may need a Mac-expert for this bug, and bug 1398807, and bug 1422251.

Also, anyone with a 4K or 5K monitor?

Flags: needinfo?(acdp)

(In reply to Calum Mackay from comment #119)

Are we discussing more than one issue here?

Yes, very possible (even probable) people are seeing more than one issue - even more so because the problem comments span two years, and multiple versions.

(In reply to James Rome from comment #74)

Alas, the [mozregression] tool is for windows only.

No, because there is a command line tool for Mac. So if this is a regression in the version 54 time frame and someone can reliably reproduce this, then the regression range should be easy to get.

User Story: (updated)

See also Bug 1334549... which looks like a similar issue... if that can help...

And also FYI Bug 1257235...

(In reply to Wayne Mery (:wsmwk) from comment #128)

We may need a Mac-expert for this bug, and bug 1398807, and bug 1422251.

Also, anyone with a 4K or 5K monitor?

What constitutes a Mac-expert?

I am running a Mac, w/ a built-in 5K display, and experiencing the bug.

Someone who can debug it and write a patch to fix it, I suppose.

Would recording Performance via DevTools (Ctrl+i on Windows to open it) help to identify the issue?

Any end-user encountering the issue on Mac may be able to do it:

  • prepare msg ready to send
  • open DevTools (you can keep it open in parallel with TB)
  • go to Performance tab (if missing, make sure the Performance option is ticked in DevTools settings)
  • start recording Performance
  • send msg
  • if msg sends ok, stop recording Performance, and delete the profile created. Repeat the process above till a msg fails to send.
    OR
  • if msg is not sending properly, wait a bit, then stop recording Performance. Then save the performance profile recorded and post it here.

Maybe that can help identify what TB is doing when sending but not completing process somehow...

Would that help?

(In reply to Scott from comment #132)

(In reply to Wayne Mery (:wsmwk) from comment #128)

We may need a Mac-expert for this bug, and bug 1398807, and bug 1422251.

Also, anyone with a 4K or 5K monitor?

What constitutes a Mac-expert?

The starting point would be someone who can specifically identify the steps to reproduce, or even better get the regression range.
https://mozilla.github.io/mozregression/quickstart.html and for Mac use command line.

I am running a Mac, w/ a built-in 5K display, and experiencing the bug.
If it were an external display I would ask whether changing the display to 4k or non-4k eliminates the problem. Which brings us back to regression range.

(In reply to Wayne Mery (:wsmwk) from comment #135)

The starting point would be someone who can specifically identify the steps to reproduce, or even better get the regression range.
https://mozilla.github.io/mozregression/quickstart.html and for Mac use command line.

What range do you want to be looked at?

Note that I originally filed bug 1400568 two years ago (https://bugzilla.mozilla.org/show_bug.cgi?id=1400568) which, in the followup comments, showed a mutex lock in CGLClearDrawable. It was not on a 5k monitor or even a 4k monitor. Recent comments show that it's still exactly the same bug.

What is needed is not so much a Mac expert, but a thread/mutex expert. These are hard bugs to find/fix, but not impossible. Usually it involves reading the code carefully around the mutexes, looking for race conditions and false assumptions.

I suspect that there is an assumption lurking in the mutex code, some intermediate operation (displaying the progress bar, most likely) that takes a variable amount of time, and sometimes the race condition causes the mutex never to unlock. I would try commenting out the progress window completely (who needs 'em anyway?).
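
To make the kind of bug described above concrete, here is a minimal, generic sketch of a lock-ordering deadlock between two threads. It is only an illustration of the general pattern, not Thunderbird's or Apple's code: `lock_a` and `lock_b` are hypothetical stand-ins, nothing in this thread confirms which locks are actually involved, and the timeout exists only so the demo terminates (a plain pthread mutex lock would wait forever, which is what the beach ball looks like).

```python
import threading

def demonstrate_lock_order_deadlock(timeout=0.5):
    """Two threads take the same two locks in opposite order and get stuck."""
    lock_a = threading.Lock()  # hypothetical: e.g. a lock guarding a drawing context
    lock_b = threading.Lock()  # hypothetical: e.g. a lock guarding a window's state
    both_hold_first = threading.Barrier(2)  # forces the unlucky interleaving
    both_tried = threading.Barrier(2)       # keeps the first lock held while the peer tries
    results = {}

    def worker(name, first, second):
        with first:
            both_hold_first.wait()
            # Each thread now waits for the lock the other thread is holding.
            acquired = second.acquire(timeout=timeout)
            results[name] = acquired
            both_tried.wait()
            if acquired:
                second.release()

    t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results

print(demonstrate_lock_order_deadlock())
```

In real code the unlucky interleaving is not forced by a barrier; it only happens when timing lines up, which would fit a hang that shows up once or twice a day rather than on every send.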

After two years of flailing around in Bugzilla, I have lost interest in this bug, unfortunately, but I hope somebody fixes it.

(In reply to Richard Leger from comment #136)

...
What range do you want to be looked at?

Try July 1 2017 to August 10 2017. Hard to be more exacting - and depends on whether this is reproducible in nightlies. If it doesn't fail on July 1, try a month or two earlier.

User Story: (updated)
Flags: needinfo?(richard.leger)
Summary: Hangs frequently while sending imap mail while copying message to imap Sent folder on Mac. No problem if Sent is set to local folder. deadlock on CGLClearDrawable? → Hangs frequently while sending imap mail while copying message to imap Sent folder on Mac. - displaying the progress bar. No problem if Sent is set to local folder. Deadlock on CGLClearDrawable.

(In reply to Glenn Reid from comment #137)

originally filed bug 1400568 two years ago [against 52.3.0 2017-09-16] which ... showed a mutex lock in CGLClearDrawable. It was not on a 5k monitor or even a 4k monitor.

Good point. It may not be a trigger, or requirement, for most.

There is an interesting development in bug 1422251 comment 18 - the reporter stopped having trouble after moving to beta 67. If that holds for others then version 68 should work better.

(In reply to Wayne Mery (:wsmwk) from comment #138)

(In reply to Richard Leger from comment #136)

...
What range do you want to be looked at?

Try July 1 2017 to August 10 2017. Hard to be more exacting - and depends on whether this is reproducible in nightlies. If it doesn't fail on July 1, try a month or two earlier.

Actually, I forgot that we had earlier determined a probable range of
http://archive.mozilla.org/pub/thunderbird/nightly/2017/03/2017-03-08-03-02-29-comm-central/
http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-12-03-02-06-comm-central/
based, IIRC, on James' report of seeing this in beta 54

I don't see any Thunderbird debug symbols in the various apple log dumps. Seeing TB functions show up in those logs might give some more clues? I'm not sure which builds (if any) have debug symbols in them - nightly, maybe?
I've no idea if the apple log will show up TB symbols even if they're there, but it should do.

Other random thoughts:

A GL mutex lock kind of implies that GUI code is being called from multiple threads, but I was under the impression that all the GUI code in TB was driven via main-thread-only javascript. And that has warning lights all over it which go off if it's called from the wrong thread.

But yeah, ultimately this kind of bug really needs someone running a debug build on a Mac, catching it in a debugger, and poking about to see what's locked up.

(In reply to Wayne Mery (:wsmwk) from comment #139)

(In reply to Glenn Reid from comment #137)

originally filed bug 1400568 two years ago [against 52.3.0 2017-09-16] which ... showed a mutex lock in CGLClearDrawable. It was not on a 5k monitor or even a 4k monitor.

Good point. It may not be a trigger, or requirement, for most.

The display resolution was speculation from me, trying to explain why it was affecting more Macs than Windows machines. They commonly have higher resolution screens and less powerful graphics cards, presumably leading to lower refresh rates, which is what I was trying to get at with the suggestion :)

Someone who better understands the mutex function may be able to simply rule this out though!

(In reply to Wayne Mery (:wsmwk) from comment #141)

There is an interesting development in bug 1422251 comment 18 - the reporter stopped having trouble after moving to beta 67. If that holds for others then version 68 should work better.

Still occurs in 69.0b3.

As a test I manually sent about 55 emails at random times over a period of several hours and haven't seen a problem. This is with a MacBook Air and TB version 60.8.0. I was sending from a non-Gmail account to a Gmail account. Save to Sent on the non-Gmail account worked fine with no lock-ups. The messages were not huge and mostly just old emails archived from a mailing list. Display on mbAir (running Mavericks 10.9.5) not very high res, 1280x800.
Does anyone cc'd on this bug see the problem on mbAir?

Another clue. I just sent an e-mail, and it hung while downloading a message--presumably the copy of the message I sent. See attached image.

(In reply to James Rome from comment #148)

Another clue. I just sent an e-mail, and it hung while downloading a message--presumably the copy of the message I sent. See attached image.

Any evidence this is the same crash (crash log showing same types of mutex locks/clear drawable issues)... I have personally experienced the crash probably close to 50 times now and not seen an issue downloading messages.

See Also: 1422251

fyi, in bug#1547339 before it was closed-as-DUP, I reported hitting this problem when running Thunderbird 60.8.0 on Mac OSX 10.14.6 (and list of other previous versions). If it helps, I had attached crash-dumps in bug#1547339 over the last few months.

(aside: Thanks to :wsmwk for connecting these two tickets)

Flags: needinfo?(richard.leger)
Flags: needinfo?(acdp)
Whiteboard: [regression:TB54?] → [regression:TB54?][duptome]

Thanks to everyone who contributed their system configs, noted in user story

User Story: (updated)

For what it's worth, the problem I was seeing seems recently (within the last month or so) to have stopped.

For a long time (a year or more), a few times a week, TB would stop responding, immediately after sending a message (with IMAP Sent folder). I had to Force Quit, and never once got a stack trace or report. It belatedly occurs to me that I've not seen this happen for several weeks now. I'd been away for a couple of weeks holiday; having now been back a few weeks, I don't think it's happened since before my holiday, so that's at least a month.

I'm running Daily builds, updated daily (or when they change), on MacOS 10.14.6 (currently).

I appreciate this doesn't help much; just as a data point.

(In reply to Wayne Mery (:wsmwk) from comment #142)

(In reply to Wayne Mery (:wsmwk) from comment #138)

(In reply to Richard Leger from comment #136)

...
What range do you want to be looked at?

Try July 1 2017 to August 10 2017. Hard to be more exacting - and depends on whether this is reproducible in nightlies. If it doesn't fail on July 1, try a month or two earlier.

Actually, I forgot that we had earlier determined a probable range of
http://archive.mozilla.org/pub/thunderbird/nightly/2017/03/2017-03-08-03-02-29-comm-central/
http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-12-03-02-06-comm-central/
based, IIRC, on James' report of seeing this in beta 54

As suggested, I ran a bisection from 2017-03-08 to 2017-06-13 with mozregression on Windows 10 Pro x64 with 32-bit Thunderbird...

Bisection setup

  • New profile (re-used)
  • IMAP/SMTP mailbox setup - keep message - sync most recent 30 days
  • Use each TB version for a full day before moving on to the next...

Results (see also attached):

Start 55.0a1 (2017-04-26) - good
55.0a1 (2017-05-20) - good
55.0a1 (2017-06-01) - good
55.0a1 (2017-06-07) - good
55.0a1 (2017-06-10) - bad (one time) - TB could not save a copy of the sent email to the Sent folder on the server and was stuck in processing; disconnecting the network on the computer and reconnecting caused a TB prompt to appear asking to Retry, after which it worked.
55.0a1 (2017-06-09) - good for sending items - but bad (one time) for "Copying message to Draft folder...". When trying to access mail folders while that happened, the popup error message "Could not connect to mail server xxx. Connection was refused." appeared... this one could have been an issue on the server side (temporarily disconnected due to a kernel update)

At the end there was a message saying it did not have enough info to establish a bisection of the code, or something like that... I'm not sure what it means...
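
As a rough illustration of what the results above imply, the sketch below derives the last-good/first-bad pair from the dated verdicts. It is not how mozregression itself works internally; `VERDICTS` and `regression_window` are hypothetical names, and counting 2017-06-09 as "good" (because its one failure concerned Drafts, not sending) is an assumption.

```python
from datetime import date

# Verdicts copied from the results above. 2017-06-09 is counted as good
# for sending; that classification is an assumption, not an established fact.
VERDICTS = {
    date(2017, 4, 26): "good",
    date(2017, 5, 20): "good",
    date(2017, 6, 1): "good",
    date(2017, 6, 7): "good",
    date(2017, 6, 9): "good",
    date(2017, 6, 10): "bad",
}

def regression_window(verdicts):
    """Return (last_good, first_bad) build dates from a {date: verdict} mapping."""
    goods = [day for day, verdict in verdicts.items() if verdict == "good"]
    bads = [day for day, verdict in verdicts.items() if verdict == "bad"]
    if not goods or not bads:
        raise ValueError("need at least one good and one bad build")
    last_good, first_bad = max(goods), min(bads)
    if last_good >= first_bad:
        # A good build dated after a bad one suggests an intermittent failure.
        raise ValueError("verdicts are not monotonic")
    return last_good, first_bad

last_good, first_bad = regression_window(VERDICTS)
print(last_good.isoformat(), "->", first_bad.isoformat())
```

With an intermittent hang, a build marked "good" after one day of use may simply not have hit the bug yet, so any window computed this way is only as reliable as the individual verdicts.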

Bisection Information for the bad one:

app_name: thunderbird
build_date: 2017-06-10
build_file: C:\Users\richard.mozilla\mozregression\persist\2017-06-10--comm-central--thunderbird-55.0a1.en-US.win32.zip
build_type: nightly
build_url: https://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-10-03-02-05-comm-central/thunderbird-55.0a1.en-US.win32.zip
changeset: 3e3745b52dc53eb74efd73d3107a81e2e13f94be
pushlog_url: https://hg.mozilla.org/comm-central/pushloghtml?fromchange=e915a8b2f1f505d7c4fa1820f35ce99d1164b293&tochange=3e3745b52dc53eb74efd73d3107a81e2e13f94be
repo_name: comm-central
repo_url: https://hg.mozilla.org/comm-central

Could it be that Thunderbird has an issue connecting to the IMAP server to file the copy of the message, and it does not retry properly or identify the issue so as to prompt the user to retry?

We use a dedicated CA cert for SSL validation, which is added to the Thunderbird certificate store prior to using the account...

I have just had another hang. The email has been copied to the (IMAP) Sent folder, so the hang must have appeared afterwards.

I’m on the beta update channel and running 70.0b1.

All:

I've been following this thread with interest.  TB is one of the main tools I use in my business, and these frequent hangs get in the way.  So to all of you: THANK YOU for thinking about this.

A few minutes ago, TB hung again.  After issuing a Force Quit, I captured the data that normally goes to Apple.  It may be huge, but it may also shed some light.  Here we go with an edit/paste.

Sincerely,
Bob

I hate to throw a wrench into this effort, but I moved sent mail to a local folder, and TB still hangs. Maybe not so often though...

User Story: (updated)

This specific signature seems to be missing in the report so far.

This is Thunderbird 60.9.0 (64-bit) on macOS High Sierra (10.13.6). I'm using IMAP4 to gmail, and a gmail SMTP server. This also applies to some recent Thunderbird versions.

After sending an email, Thunderbird sometimes stalls, requiring ForceQuit and restart. In all cases, the message gets sent, and gets copied into my "Sent Mail" folder. After the restart all is well (until the next time).

This is 100% reproducible: The stall happens ONLY if I click on the main window before the message window disappears, or within a few seconds. If I remember to wait at least 5 seconds after the send window disappears, it does not stall and all is well.

This has been happening for several months, over several Thunderbird updates. IIRC it was worse on earlier versions, but I have modified my behavior to minimize it.

I believe that this also applies to an NNTP send to giganews and IMAP4 save to gmail.
I also believe that it is OK to click on any other application's window without stalling Thunderbird.
I don't have good statistics on these, however.

Richard, thank you for your scientific study. And it is significant that it correlates with James's earlier findings. If accurate, others should find that https://ftp.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-10-03-02-05-comm-central/thunderbird-55.0a1.en-US.mac.dmg shows the problem.

Can we determine the cause from https://hg.mozilla.org/comm-central/pushloghtml?startdate=2017-06-07&enddate=2017-06-10 (blissfully small list) ?
Bug 1364977 ?

Flags: needinfo?(jorgk)

Hmm, we've got two slightly different ranges here:
pushlog_url: https://hg.mozilla.org/comm-central/pushloghtml?fromchange=e915a8b2f1f505d7c4fa1820f35ce99d1164b293&tochange=3e3745b52dc53eb74efd73d3107a81e2e13f94be
from comment #160 and
https://hg.mozilla.org/comm-central/pushloghtml?startdate=2017-06-07&enddate=2017-06-10
from comment #166.

If I read the bug summary, it talks about an IMAP issue when copying the sent message to the Sent folder. We also read:
No problem if Sent is set to local folder. Deadlock on CGLClearDrawable.

So it's possible that the regression range is correct, but if really CGLClearDrawable is a problem, then we shouldn't look for the issue in C-C. The equivalent date range based on dates is:
https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2017-06-07&enddate=2017-06-10
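
For anyone who wants to look at the same date window in both repositories, here is a small sketch that builds the two pushlog links. It only reproduces the URL pattern already used in this bug (`pushloghtml?startdate=...&enddate=...`); the function name `pushlog_url` is a hypothetical helper.

```python
from datetime import date
from urllib.parse import urlencode

def pushlog_url(repo, start, end):
    """Build a date-based pushlog link for a repository on hg.mozilla.org."""
    query = urlencode({"startdate": start.isoformat(), "enddate": end.isoformat()})
    return f"https://hg.mozilla.org/{repo}/pushloghtml?{query}"

# Same window for comm-central (C-C) and mozilla-central (M-C).
for repo in ("comm-central", "mozilla-central"):
    print(pushlog_url(repo, date(2017, 6, 7), date(2017, 6, 10)))
```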

Looking at bug 1364977 we see that for TB it added the removal of a command observer for housekeeping purposes. We note that this observer is also used in editor.js
https://searchfox.org/comm-central/search?q=obs_documentCreated&case=false&regexp=false&path=editor.js
where it's not removed.

I doubt that this change caused any problems, but I'm happy to provide a try build with that change removed. I guess you want a build for Mac, so would you like this to be based on TB 68 ESR, TB 70 beta or trunk? Note that the code added in bug 1364977 runs when the compose window closes, so it's not entirely impossible that that has some bad effect onto other things happening at the same time.

Flags: needinfo?(jorgk)

(In reply to Jorg K (GMT+2) from comment #167)

Hmm, we've got two slightly different ranges here:
pushlog_url: https://hg.mozilla.org/comm-central/pushloghtml?fromchange=e915a8b2f1f505d7c4fa1820f35ce99d1164b293&tochange=3e3745b52dc53eb74efd73d3107a81e2e13f94be
from comment #160 and
https://hg.mozilla.org/comm-central/pushloghtml?startdate=2017-06-07&enddate=2017-06-10
from comment #166.

The range in comment #166 is approximated from comment 160 so it could be applied to mozilla-central.

If I read the bug summary, it talks about an IMAP issue when copying the sent message to the Sent folder. We also read:
No problem if Sent is set to local folder. Deadlock on CGLClearDrawable.

So it's possible that the regression range is correct, but if really CGLClearDrawable is a problem, then we shouldn't look for the issue in C-C. The equivalent date range based on dates is:
https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2017-06-07&enddate=2017-06-10

I too suspect the issue will be in M-C. But a regression range of three days may be challenging to get a hit. Even a one day range is challenging.

Richard (or anyone else??), can you run regression tests to see if you can narrow 2017-06-07 thru 2017-06-10 to a one day range?

... I'm happy to provide a try build with that change removed.

Let's see first if Richard has positive results

Flags: needinfo?(richard.leger)

Attached is crash while sending email today using TB 60.9.0 on OSX 10.14.6.

I just noticed that I am on TB 60.9.0 (64-bit). While the about panel claims I am on "the release update channel", I see on https://www.thunderbird.net/en-US/thunderbird/releases/ that there are newer 68.1.x releases available for manual download, even though my TB installation does not see them. Will investigate here, but flagging in case that helps with narrowing the debugging range.

(In reply to John O'Duinn [:joduinn] (please use "needinfo?" flag) from comment #169)

...
I just noticed that I am on TB 60.9.0 (64-bit). While the about panel claims I am on "the release update channel", I see on https://www.thunderbird.net/en-US/thunderbird/releases/ that there are newer 68.1.x releases available for manual download, even though my TB installation does not see them.

That is because updates to version 68 from version 60 are not currently enabled. But that wouldn't help you with this issue anyway.

Still no reply re. comment #167: Do you want a try build? For Mac? Based on which version of TB?

I'm using 68.1.2 and mine crashes multiple times daily, thread dump includes the CGLClearDrawable call (and a few calls beneath that).

My messages ALWAYS send successfully, and the "Sent" message copy is also saved to the server correctly. So no data is lost for me. Seems to be related to mouse activity connected with the main window while sending a message -- not just clicking. I do a lot of scrolling with the touchpad, and I often don't wait for a message window to close before moving on with my life, so I'm surely generating mouse events on the main window during and after the message window closes.

I have only Lightning and Enigmail extensions installed. No funny business with custom x509 certificates.

I'm happy to try previous versions to see what happens. I'm also happy to try a custom build to test out a fix. I'll even run it in a debugger if that will help in any way. I'm desperate to get this resolved, but I'm not sure how best to help.

(In reply to Jorg K (GMT+2) from comment #167)

....
I doubt that this change caused any problems, but I'm happy to provide a try build with that change removed. I guess you want a build for Mac, so would you like this to be based on TB 68 ESR, TB 70 beta or trunk? Note that the code added in bug 1364977 runs when the compose window closes, so it's not entirely impossible that that has some bad effect onto other things happening at the same time.

Most reporters are using esr, so 68 ESR please. And if that doesn't work out then Beta 70.

Flags: needinfo?(jamesrome)

(previously wrong person)

Flags: needinfo?(jamesrome) → needinfo?(jorgk)

Mac try build started:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=189dafbb360eab163171568640014c3829d1f4dd
Since I've been doing merges and uplifts yesterday, this will be a TB 68.2.0 ESR pre-release. I'll paste the path to the binary here later.

What is the google provider that goes with this, and where do we get it?

The one from ATN for TB 68, it's just a regular add-on.

Sorry, this test build hung already.

(In reply to James Rome from comment #180)

Sorry, this test build hung already.

I had one too. Looked eerily similar but no crash report unfortunately.

(In reply to Wayne Mery (:wsmwk) from comment #168)

(In reply to Jorg K (GMT+2) from comment #167)

So it's possible that the regression range is correct, but if really CGLClearDrawable is a problem, then we shouldn't look for the issue in C-C. The equivalent date range based on dates is:
https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2017-06-07&enddate=2017-06-10

I too suspect the issue will be in M-C. But a regression range of three days may be challenging to get a hit. Even a one day range is challenging.

Richard (or anyone else??), can you [re]run regression tests to see if you can narrow 2017-06-07 thru 2017-06-10 to a one day range?

The builds that need to be tested by those who have been able to reliably reproduce are:

  1. http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-07-03-02-05-comm-central/thunderbird-55.0a1.en-US.mac.dmg
  2. http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-08-03-02-07-comm-central/thunderbird-55.0a1.en-US.mac.dmg
  3. http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-09-03-02-05-comm-central/thunderbird-55.0a1.en-US.mac.dmg
  4. http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-10-03-02-05-comm-central/thunderbird-55.0a1.en-US.mac.dmg

#1 presumably works and #4 presumably fails. So #2 and #3 should be tested first.
Back up your profile before testing.

Flags: needinfo?(sjames)
Flags: needinfo?(jamesrome)
Flags: needinfo?(chris)

Sorry, this test build hung already.

Sorry, I suggested that this wouldn't help, see comment #167. The issue that only Mac users are experiencing is clearly in some low-level Mac graphics library, see "Deadlock on CGLClearDrawable" in the summary of this bug.

So, Wayne and Jorg, what's the best way to make progress, here? Try the builds referenced by Wayne in comment #182, or try Jorg's build referenced in comment #177? I can probably try them all. While I can't reliably make it crash, it happens so often that if I went a whole day without crashing, I'd consider the bug "not present in that build".

Will running any of these builds damage my TB user profile? I'm currently running 68.1.2. I use IMAP for everything so I'm not worried about the messages themselves; mostly just the settings and all that. I can rebuild if necessary but would prefer to avoid it if possible.

As we heard (and expected), the build from comment #177 isn't any good.

Apparently the problem started to occur between 2017-06-07 and 2017-06-10. So in comment #182 Wayne suggests trying the builds of the 7th, 8th, 9th and 10th of June 2017.

Running "old" builds, in this case TB 55 on a fresh profile can lead to malfunctions if the profile has already been upgraded to a newer version. I don't think it will cause damage or changes to the profile, but Wayne suggested to do a backup just in case.

Personally I think it's important to also try the 7th and 10th to double/triple check that the issue really started there.

(In reply to Jorg K (GMT+2) from comment #167)

If I read the bug summary, it talks about an IMAP issue when copying the sent message to the Sent folder. We also read:
No problem if Sent is set to local folder. Deadlock on CGLClearDrawable.

To clarify: the bug definitely occurs regardless of Sent folder location, whether that is a local folder, no Sent folder at all, or an online folder. I do not even have TB configured to save sent messages (O365 creates duplicates on the back end if I do), and I experience the crash daily.

Flags: needinfo?(sjames)

Thanks for the additional info.

I look forward to everyone's test results of comment 182 in the next few days so we can finally nail this bugger.

  1. http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-07-03-02-05-comm-central/thunderbird-55.0a1.en-US.mac.dmg

I backed-up my profile and launched this version with no other modifications to my profile. I'm having trouble writing my first email... I'm getting the color-wheel about once a second for two seconds. Mouse clicks are ignored. Keyboard is ignored.

(In reply to Christopher Schultz from comment #188)

  1. http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-07-03-02-05-comm-central/thunderbird-55.0a1.en-US.mac.dmg

I backed-up my profile and launched this version with no other modifications to my profile. I'm having trouble writing my first email... I'm getting the color-wheel about once a second for two seconds. Mouse clicks are ignored. Keyboard is ignored.

This finally cleared-up. I've had other bouts of the color-wheel appearing, but they all eventually do clear-up and I'm able to continue my work. Still testing...

FYI, I am running a bisection from 2017-06-06 to 2017-06-11 on Windows 10 with 32-bit TB and will keep you posted. I'll remove the needinfo flag when I publish the result. Since the period and the number of versions are reduced, I plan to use each version for a few days in a row to maximise the chances of hitting the issue. As said in my previous test result, the issue was linked to a temporary loss of connection to the IMAP server during sending (the server was rebooting), with TB unable to resume the task once the server was back up and running. I don't know if that helps.

(Moving back to Thunderbird where it's more likely for users to find it.)

Component: Networking: IMAP → Message Compose Window
Product: MailNews Core → Thunderbird
Summary: Hangs frequently while sending imap mail while copying message to imap Sent folder on Mac. - displaying the progress bar. No problem if Sent is set to local folder. Deadlock on CGLClearDrawable. → Hangs sending imap mail while copying message to imap Sent folder on Mac while displaying the progress bar. Deadlock in graphics on CGLClearDrawable.
Whiteboard: [regression:TB54?][duptome] → [regression:TB54?][duptome][workaound: comment 104]
Version: 54 → 54 Branch

(In reply to Wayne Mery (:wsmwk) from comment #182)

The builds that need to be tested by those who have been able to reliably reproduce are:

  1. http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-07-03-02-05-comm-central/thunderbird-55.0a1.en-US.mac.dmg

I believe I have gotten this one to lock-up. ActivityMonitor.app says "Not Responding" and there's no CPU usage. I'm going to wait a good long time before killing it and hopefully I'll be able to get a thread dump.

But this was the "suspected good build," so maybe we have to cast a wider net.

Flags: needinfo?(chris)

(In reply to Christopher Schultz from comment #193)

(In reply to Wayne Mery (:wsmwk) from comment #182)

The builds that need to be tested by those who have been able to reliably reproduce are:

  1. http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-07-03-02-05-comm-central/thunderbird-55.0a1.en-US.mac.dmg

I believe I have gotten this one to lock-up. ActivityMonitor.app says "Not Responding" and there's no CPU usage. I'm going to wait a good long time before killing it and hopefully I'll be able to get a thread dump.

But this was the "suspected good build," so maybe we have to cast a wider net.

Yep, deadlocked at the same place:

[...]
11 __CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER__ + 12 (CoreFoundation + 646038) [0x7fff5133eb96] 1-11
11 CGLClearDrawable + 44 (OpenGL + 28513) [0x7fff5af37f61] 1-11
11 _pthread_mutex_firstfit_lock_slow + 222 (libsystem_pthread.dylib + 5325) [0x7fff7d4234cd] 1-11
11 __psynch_mutexwait + 10 (libsystem_kernel.dylib + 16134) [0x7fff7d368f06] 1-11
*11 psynch_mtxcontinue + 0 (pthread + 10172) [0xffffff7f827fa7bc] (blocked by pthread mutex owned by thunderbird [42207] thread 0x86053b) 1-11

(In reply to Richard Leger from comment #190)

FYI, I am running a bisection from 2017-06-06 to 2017-06-11 on Windows 10 with 32-bit TB and will keep you posted. I'll remove the needinfo flag when I publish the result. Since the period and the number of versions are reduced, I plan to use each version for a few days in a row to maximise the chances of hitting the issue.

Suggest we use Richard's idea of running multiple days with one build, and do it with multiple people, with each person taking one or more builds in a coordinated manner:
Richard http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-06-03-02-05-comm-central/ ?
test#2 http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-05-03-02-06-comm-central/
test#3 http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-04-03-02-08-comm-central/
test#4 http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-03-03-02-05-comm-central/
test#5 http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-02-03-02-06-comm-central/
test#6 http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-01-03-02-08-comm-central/

Although we do not know whether the regression is in fact in this range. What do you think?

Is it possible to create a debug version of TB that would print out the necessary information to diagnose this in real time? Logs do not work when it hangs.

Flags: needinfo?(jamesrome)

(In reply to James Rome from comment #196)

Is it possible to create a debug version of TB that would print out the necessary information to diagnose this in real time? Logs do not work when it hangs.

Perhaps. But we would need a developer familiar with this area of code to define the process, or even say whether such a thing is possible. Right now there isn't such a person.

What we CAN do now - without any special tools or knowledge - is FIND the regression range, which is the path suggested two years ago. This can ONLY be done by those of you who can reproduce this - the rest of us are powerless to help except encourage you on the path.

To elaborate on "we do not know whether the regression is in fact in this range" of June, it was originally thought maybe this began in nightly 54, so the range possibly includes many months before June.

(In reply to Christopher Schultz from comment #194)

(In reply to Christopher Schultz from comment #193)

(In reply to Wayne Mery (:wsmwk) from comment #182)

The builds that need to be tested by those who have been able to reliably reproduce are:

  1. http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-07-03-02-05-comm-central/thunderbird-55.0a1.en-US.mac.dmg

I believe I have gotten this one to lock-up. ActivityMonitor.app says "Not Responding" and there's no CPU usage. I'm going to wait a good long time before killing it and hopefully I'll be able to get a thread dump.

But this was the "suspected good build," so maybe we have to cast a wider net.

Yep, deadlocked at the same place:

[...]
11 __CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER__ + 12 (CoreFoundation + 646038) [0x7fff5133eb96] 1-11
11 CGLClearDrawable + 44 (OpenGL + 28513) [0x7fff5af37f61] 1-11
11 _pthread_mutex_firstfit_lock_slow + 222 (libsystem_pthread.dylib + 5325) [0x7fff7d4234cd] 1-11
11 __psynch_mutexwait + 10 (libsystem_kernel.dylib + 16134) [0x7fff7d368f06] 1-11
*11 psynch_mtxcontinue + 0 (pthread + 10172) [0xffffff7f827fa7bc] (blocked by pthread mutex owned by thunderbird [42207] thread 0x86053b) 1-11

Locked-up again. I also confirmed that (a) I had re-installed the Daily.app before launch and (b) disabled auto-update so it wouldn't keep upgrading itself.

For anyone who can't get a thread dump, try this: while the color-wheel is spinning and before you Force-Quit, run the "Activity Monitor" application, choose "Thunderbird" (or "Daily") in the list of processes, click the gear-icon in the upper-left-hand corner of the window and choose "Sample Process". After a few seconds, it will give you a thread dump as text with a bunch of header information. Every time this has happened to me, the offending thread was the first one listed. It shows every call starting with the thread-start at the top and the current work being done at the bottom, on the most-indented line. A few lines from the bottom, you'll see what I have posted above, including the call to CGLClearDrawable.

So even the assumed-good build is locking-up for me.

(In reply to Richard Leger from comment #190)

FYI, I am running a bisection from 2017-06-06 to 2017-06-11 on Windows 10 with 32-bit TB and will keep you posted. I'll remove the needinfo flag when I publish the result. Since the period and the number of versions are reduced, I plan to use each version for a few days in a row to maximise the chances of hitting the issue. As said in my previous test result, the issue was linked to a temporary loss of connection to the IMAP server during sending (the server was rebooting), with TB unable to resume the task once the server was back up and running. I don't know if that helps.

It seems today at 9:36am, I had one issue with saving copy of message to Sent with TB 2017-06-09 while sending a simple text message...
pushlog_url https://hg.mozilla.org/comm-central/pushloghtml?fromchange=cc0700686608ad42e5847abcfc10f1c25b644352&tochange=b8876205fa8dbf22f34ffffadce627327ad51f24

In Activity Manager there is one entry, "connection refused at 9:23am", but I checked that the server was available the whole time, and especially at 9:36am. The message was indeed sent, as I verified by other means. I could also access messages of any date in the Inbox.

I also noticed that when the issue arose, TB kept trying to save a copy without the process completing, and it was also trying to bring the Sent folder up to date and download some messages into it, but that process appeared "Paused". I then browsed between multiple folders to check for a server problem, but I could still access messages. Browsing back into the Sent folder seemed to trigger the update again, and the "bring folder up to date" activity resumed until it completed.

The message was then copied to the Sent folder on its own (the previously stalled process completed successfully at that point). All I had to do was browse back and forth to and from the Sent folder in the main TB UI, which may have re-triggered the connection to the server and the folder update.

Note that I have slightly changed my IMAP settings to sync only 1 day's worth of emails and not download emails larger than 50k.

I was also connected via VPN to the office at the time, but that should not make any difference to TB.

The 2017-06-10 build is impossible to use for more than a few hours because the UI keeps slowing down over time until it is unusable.

Hope this info can help...

Flags: needinfo?(richard.leger)

(In reply to Richard Leger from comment #199)

(In reply to Richard Leger from comment #190)

FYI, I am running a bisection from 2017-06-06 to 2017-06-11 on Windows 10 with 32-bit TB and will keep you posted. I'll remove the needinfo flag when I publish the result. Since the period and the number of versions are reduced, I plan to use each version for a few days in a row to maximise the chances of hitting the issue. As said in my previous test result, the issue was linked to a temporary loss of connection to the IMAP server during sending (the server was rebooting), with TB unable to resume the task once the server was back up and running. I don't know if that helps.

It seems today at 9:36am, I had one issue with saving copy of message to Sent with TB 2017-06-09 while sending a simple text message...
pushlog_url https://hg.mozilla.org/comm-central/pushloghtml?fromchange=cc0700686608ad42e5847abcfc10f1c25b644352&tochange=b8876205fa8dbf22f34ffffadce627327ad51f24

Forgot to mention that I was running a bisection to find related fixes and not regressions (by default)... to get the link above...

(In reply to Richard Leger from comment #199)

with TB 2017-06-09

FYI, about the TB version I referred to...

app_name: thunderbird
build_date: 2017-06-09
build_file: C:\Users\richard.mozilla\mozregression\persist\2017-06-09--comm-central--thunderbird-55.0a1.en-US.win32.zip
build_type: nightly
build_url: https://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-09-03-02-05-comm-central/thunderbird-55.0a1.en-US.win32.zip
changeset: 998749e6ed4e8c8a70b406fa421cf64e98f0977a
pushlog_url: https://hg.mozilla.org/comm-central/pushloghtml?fromchange=998749e6ed4e8c8a70b406fa421cf64e98f0977a&tochange=b8876205fa8dbf22f34ffffadce627327ad51f24
repo_name: comm-central
repo_url: https://hg.mozilla.org/comm-central

(In reply to Wayne Mery (:wsmwk) from comment #195)

Richard http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-06-03-02-05-comm-central/ ?

I'll try that next and, if the issue occurs, will attempt to get a thread dump as per the advice in comment 198 to see if that can help somehow.

(In reply to Christopher Schultz from comment #198)

For anyone who can't get a thread dump, try this: while the color-wheel is spinning and before you Force-Quit, run the "Activity Monitor" application, choose "Thunderbird" (or "Daily") in the list of processes, click the gear-icon in the upper-left-hand corner of the window and choose "Sample Process".

Worth mentioning this advice is only applicable on Mac OS X systems and not Windows ;-)

I first thought wrongly you were referring to TB Activity Manager ;-)

For Windows, the closest I could find is to open Task Manager, select the Thunderbird process, right click, create dump file (.DMP)... would that be of any use?

(In reply to Wayne Mery (:wsmwk) from comment #195)

(In reply to Richard Leger from comment #190)
Suggest we use Richard's idea of running multiple days with one build, and do it with multiple people, with each person taking one or more builds in a coordinated manner:
Richard http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-06-03-02-05-comm-central/ ?
test#2 http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-05-03-02-06-comm-central/
test#3 http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-04-03-02-08-comm-central/
test#4 http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-03-03-02-05-comm-central/
test#4 http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-02-03-02-06-comm-central/
test#5 http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-01-03-02-08-comm-central/

When testing as per above suggestion, one version at a time for three days, would it be worth then to activate IMAP logging at startup as per
https://wiki.mozilla.org/MailNews:Logging to maximum level of verbosity?

Would that be of any use to the dev team in identifying the issue?

The only imperative is to find the one day regression range where MACOS build N works and build N+1 fails. Nothing else matters.

Thanks for helping. I do hope the others jump in or this could take another two years :)

(In reply to Richard Leger from comment #203)

(In reply to Christopher Schultz from comment #198)

For anyone who can't get a thread dump, try this: while the color-wheel is spinning and before you Force-Quit, run the "Activity Monitor" application, choose "Thunderbird" (or "Daily") in the list of processes, click the gear-icon in the upper-left-hand corner of the window and choose "Sample Process".

Worth mentioning this advice is only applicable on Mac OS X systems and not Windows ;-)

I was under the impression that this whole issue was 100% MacOS. I don't think the Windows build of tb is using OpenGL, etc. Are Windows folks having deadlocks in OpenGL like the bug-title suggests? Or are Windows folks having otherwise unexplained lock-ups and just guessing that it's the same issue. GUI deadlocks across OSs almost never have the same root cause because the OSs are usually so different.

This is 100% Mac only

(In reply to Wayne Mery (:wsmwk) from comment #205)

The only imperative is to find the one day regression range where MACOS build N works and build N+1 fails. Nothing else matters.

Thanks for helping. I do hope the others jump in or this could take another two years :)

People may want to use the binary search.

Step 0:
Set Start Date (the version from this date is known to work).
Set End Date (the version from this date is known to be broken).

Step 1:
Duration = EndDate - StartDate (in days).

If Duration is 1, we are done (!). The software got broken by a patch set on the start date.

Choose a test date based on Start Date + Duration / 2. When Duration is odd, round either down or up.

Step 2: Check the version on the test date.

If the version is OK, then set Start Date to this test date.
If the version is NOT OK, then set End Date to this test date.

Go to Step 1.

This will take O(logN) as opposed to O(N) days as Wayne pointed out.

I think this is the strategy the mozilla utility for bisection uses.
(Yes "bi-" section.)
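
The steps above can be sketched in Python. This is only a minimal illustration, not the code of the Mozilla bisection utility: `bisect_builds` and the `is_good` callback are hypothetical names, and `is_good` stands in for "run the nightly of that date for a while and report whether it hung". The dates in the demo are made up for the example.

```python
from datetime import date, timedelta
from typing import Callable, Tuple


def bisect_builds(good: date, bad: date,
                  is_good: Callable[[date], bool]) -> Tuple[date, date]:
    """Narrow (good, bad) down to two consecutive days.

    `good` is a date whose build is known to work, `bad` is a later date
    whose build is known to be broken. `is_good` is a hypothetical callback
    that reports the tester's verdict for the build of a given date.
    """
    if good >= bad:
        raise ValueError("the known-good date must precede the known-bad date")
    while (bad - good).days > 1:
        # Step 1: pick the midpoint, rounding down when the duration is odd.
        test_day = good + timedelta(days=(bad - good).days // 2)
        # Step 2: move whichever end matches the observed result.
        if is_good(test_day):
            good = test_day
        else:
            bad = test_day
    return good, bad


# Demo with an invented "first bad" date, purely for illustration.
first_bad = date(2017, 6, 9)
print(bisect_builds(date(2017, 6, 1), date(2017, 6, 10),
                    lambda d: d < first_bad))
```

Each loop iteration halves the range, so roughly log2(N) builds need testing for a range of N days. The sketch assumes every verdict is correct; with an intermittent hang, a single wrong "good" verdict sends the search into the wrong half.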

Also, I am sorry that I don't own or use a Mac and so can't offer any insight on this bug.

I do not get the issue so often. Maybe 1 time per week or less.
With Thunderbird 68.2.0 I just got the problem and used Activity Monitor as documented in https://bugzilla.mozilla.org/show_bug.cgi?id=1381485#c198

Sampling process 424 for 3 seconds with 1 millisecond of run time between samples
Sampling completed, processing symbols...
Analysis of sampling thunderbird (pid 424) every 1 millisecond
Process:         thunderbird [424]
Path:            /Applications/Thunderbird.app/Contents/MacOS/thunderbird
Load Address:    0x10df81000
Identifier:      org.mozilla.thunderbird
Version:         68.2.0 (68.2.0)
Code Type:       X86-64
Parent Process:  ??? [1]

Date/Time:       2019-10-26 17:32:40.722 +0200
Launch Time:     2019-10-26 16:11:02.254 +0200
OS Version:      Mac OS X 10.14.6 (18G103)
Report Version:  7
Analysis Tool:   /usr/bin/sample

Physical footprint:         340.2M
Physical footprint (peak):  351.6M
----

Call graph:
    2267 Thread_3211   DispatchQueue_1: com.apple.main-thread  (serial)
    + 2267 ???  (in XUL)  load address 0x10e504000 + 0x2a1d5d0  [0x110f215d0]
    +   2267 -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:]  (in AppKit) + 1361  [0x7fff4304b46b]
    +     2267 _DPSNextEvent  (in AppKit) + 1135  [0x7fff4304c77d]
    +       2267 _BlockUntilNextEventMatchingListInModeWithFilter  (in HIToolbox) + 64  [0x7fff44cb3c76]
    +         2267 ReceiveNextEventCommon  (in HIToolbox) + 603  [0x7fff44cb3ee5]
    +           2267 RunCurrentEventLoopInMode  (in HIToolbox) + 292  [0x7fff44cb41ab]
    +             2267 CFRunLoopRunSpecific  (in CoreFoundation) + 455  [0x7fff45a5561e]
    +               2267 __CFRunLoopRun  (in CoreFoundation) + 1189  [0x7fff45a55d15]
    +                 2267 __CFRunLoopDoSources0  (in CoreFoundation) + 283  [0x7fff45a567a3]
    +                   2267 __CFRunLoopDoSource0  (in CoreFoundation) + 108  [0x7fff45a72d89]
    +                     2267 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__  (in CoreFoundation) + 17  [0x7fff45a72de3]
    +                       2267 ???  (in XUL)  load address 0x10e504000 + 0x29e0a01  [0x110ee4a01]
    +                         2267 -[NSView removeFromSuperview]  (in AppKit) + 164  [0x7fff430c5ee5]
    +                           2267 -[NSView _setWindow:]  (in AppKit) + 2621  [0x7fff430c533a]
    +                             2267 __21-[NSView _setWindow:]_block_invoke_2  (in AppKit) + 136  [0x7fff430dbd69]
    +                               2267 -[__NSArrayM enumerateObjectsWithOptions:usingBlock:]  (in CoreFoundation) + 219  [0x7fff45aa476b]
    +                                 2267 -[NSView _setWindow:]  (in AppKit) + 2309  [0x7fff430c5202]
    +                                   2267 -[NSSurface setWindow:]  (in AppKit) + 50  [0x7fff4337eb78]
    +                                     2267 -[NSSurface _disposeSurface]  (in AppKit) + 132  [0x7fff4337eefb]
    +                                       2267 -[NSNotificationCenter postNotificationName:object:userInfo:]  (in Foundation) + 66  [0x7fff47cafaab]
    +                                         2267 _CFXNotificationPost  (in CoreFoundation) + 732  [0x7fff45a293c7]
    +                                           2267 -[_CFXNotificationRegistrar find:object:observer:enumerator:]  (in CoreFoundation) + 1642  [0x7fff45a2a014]
    +                                             2267 ___CFXNotificationPost_block_invoke  (in CoreFoundation) + 87  [0x7fff45ac1688]
    +                                               2267 _CFXRegistrationPost  (in CoreFoundation) + 404  [0x7fff45ab91da]
    +                                                 2267 ___CFXRegistrationPost_block_invoke  (in CoreFoundation) + 63  [0x7fff45ab9270]
    +                                                   2267 __CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER__  (in CoreFoundation) + 12  [0x7fff45ab92f6]
    +                                                     2267 CGLClearDrawable  (in OpenGL) + 44  [0x7fff4f6b2f61]
    +                                                       2267 _pthread_mutex_firstfit_lock_slow  (in libsystem_pthread.dylib) + 222  [0x7fff71bc34cd]
    +                                                         2267 _pthread_mutex_firstfit_lock_wait  (in libsystem_pthread.dylib) + 96  [0x7fff71bc5d52]
    +                                                           2267 __psynch_mutexwait  (in libsystem_kernel.dylib) + 10  [0x7fff71b08f06]

I don't know what ??? (in XUL) load address 0x10e504000 + 0x29e0a01 [0x110ee4a01] is doing. It looks to be the last Thunderbird (XUL) frame on the stack before the deadlock.
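
For anyone collecting several of these samples, a small script can check whether a saved report shows the same signature as the call graph above. This is a rough sketch, assuming the report was saved as plain text; the function name `shows_cgl_deadlock` and the heuristic (a `CGLClearDrawable` frame followed later in the text by `__psynch_mutexwait`) are my own illustration, not an official diagnostic.

```python
def shows_cgl_deadlock(report: str) -> bool:
    """Return True if a CGLClearDrawable frame appears in the report and a
    __psynch_mutexwait frame appears on some later line.

    This is a plain text heuristic: it does not verify that both frames
    belong to the same thread.
    """
    seen_clear_drawable = False
    for line in report.splitlines():
        if "CGLClearDrawable" in line:
            seen_clear_drawable = True
        elif seen_clear_drawable and "__psynch_mutexwait" in line:
            return True
    return False


# Excerpt trimmed from the call graph in this comment.
excerpt = """\
    + 2267 __CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER__  (in CoreFoundation) + 12  [0x7fff45ab92f6]
    +   2267 CGLClearDrawable  (in OpenGL) + 44  [0x7fff4f6b2f61]
    +     2267 _pthread_mutex_firstfit_lock_slow  (in libsystem_pthread.dylib) + 222  [0x7fff71bc34cd]
    +       2267 __psynch_mutexwait  (in libsystem_kernel.dylib) + 10  [0x7fff71b08f06]
"""
print(shows_cgl_deadlock(excerpt))
```

Because the check ignores thread boundaries, a match is only a hint to look at the report by hand, not proof of this particular deadlock.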

Please see my comment #194. I believe I got the "known good" build to fail. Do I misunderstand something? Perhaps we need to rewind a bit for our "known good build"?

(In reply to Christopher Schultz from comment #210)

Please see my comment #194. I believe I got the "known good" build 2017-06-07 to fail.

Thanks for reemphasizing this

User Story: (updated)
Summary: Hangs sending imap mail while copying message to imap Sent folder on Mac while displaying the progress bar. Deadlock in graphics on CGLClearDrawable. → Hangs sending mail while copying message to Sent folder on Mac-only while displaying the progress bar. Deadlock in graphics on CGLClearDrawable.

People may want to use the binary search.

That would be great. Unfortunately I don't think anyone has reported this behavior as being reliably reproducible, which means false results happen unless a build is used for several days, and one individual doing a binary search might take several weeks to complete it.

Does anyone reliably reproduce this within a few hours? If not, then we need these reporters to divide and conquer, working back now from 2017-06-06.
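
To put rough numbers on "used for several days", here is a back-of-the-envelope sketch. It assumes hangs arrive independently at a constant average rate (a Poisson model), which is my assumption and not something established in this bug. Under that model the chance of seeing no hang in t days at a rate of r hangs per day is exp(-r * t), so a bad build can be ruled out with miss probability p only after ln(1/p) / r hang-free days. The function names and the two rates below are illustrative.

```python
import math


def prob_no_hang(rate_per_day: float, days: float) -> float:
    """Chance of zero hangs in `days`, assuming a Poisson process."""
    return math.exp(-rate_per_day * days)


def days_needed(rate_per_day: float, miss_prob: float = 0.05) -> float:
    """Hang-free days required so that a bad build is misjudged as good
    with probability at most `miss_prob` (same Poisson assumption)."""
    return math.log(1.0 / miss_prob) / rate_per_day


# Illustrative rates only: about one hang per week vs. ten per day.
for rate in (1 / 7, 10.0):
    print(f"rate={rate:.2f}/day -> {days_needed(rate):.1f} hang-free days")
```

With a hang about once a week, this model asks for roughly three weeks of clean use per build before calling it good at the 5% level, which is why splitting the builds among several reporters is attractive.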

We don't need traces. We don't need logs. We only need to know the dates of the daily builds that fail, and there is a list in comment 195 (which I have just corrected).

Richard took test#1 http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-06-03-02-05-comm-central/
We need others to pick test#2 - test#6.
Or maybe take an older one - http://archive.mozilla.org/pub/thunderbird/nightly/2017/05/2017-05-25-03-02-23-comm-central/

(In reply to Christopher Schultz from comment #206)

(In reply to Richard Leger from comment #203)

(In reply to Christopher Schultz from comment #198)

For anyone who can't get a thread dump, try this: while the color-wheel is spinning and before you Force-Quit, run the "Activity Monitor" application, choose "Thunderbird" (or "Daily") in the list of processes, click the gear-icon in the upper-left-hand corner of the window and choose "Sample Process".

Worth mentioning this advice is only applicable on Mac OS X systems and not Windows ;-)

I was under the impression that this whole issue was 100% MacOS. I don't think the Windows build of tb is using OpenGL, etc. Are Windows folks having deadlocks in OpenGL like the bug-title suggests? Or are Windows folks having otherwise unexplained lock-ups and just guessing that it's the same issue. GUI deadlocks across OSs almost never have the same root cause because the OSs are usually so different.

FYI, the deadlock in graphics on CGLClearDrawable may be 100% MacOS only (I am not in position to tell really), but the Hangs sending mail while copying message to Sent folder while displaying the progress bar is not a MacOS only issue... it has been experienced in various TB versions on Windows... that is why I help testing on Windows... but if you think that is not necessary, let me know and I'll stop testing and reporting...

(In reply to Wayne Mery (:wsmwk) from comment #211)

(In reply to Christopher Schultz from comment #210)

Please see my comment #194. I believe I got the "known good" build 2017-06-07 to fail.

Thanks for reemphasizing this

Wayne. It's worth noting that I've been running that build for several days, now, and I was only able to get it to fail a single time. It's been running without quitting the whole time and I've been using my email as usual. With the up-to-date builds, I was getting hangs maybe 10 times per day. So perhaps this is a race-condition or something like that where the old builds were susceptible, but the later builds are just MORE susceptible due to some combination of factors. Not really helpful, I know. :(

When mine hangs with the (likely) MacOS-specific CGLClearDrawable call in the thread dump, the message is 100% sent and the sent-message is 100% saved to my IMAP server, and both windows (the composition window and the "sending message" window) both close. It appears to only be a lock-up of the main window after all the other windows have closed.

(In reply to Christopher Schultz from comment #210)

Please see my comment #194. I believe I got the "known good" build to fail. Do I misunderstand something? Perhaps we need to rewind a bit for our "known good build"?

I have been able to get tb 55.0a1 daily (date: 2017-06-07) to lock-up a second time after sending a message. It took days to do it, but it finally happened. It is indeed the old familiar deadlock involving a call through CGLClearDrawable.

The two threads in deadlock are "Compositor" (running MessageLoop::Run()) and the unnamed thread making the call to CGLClearDrawable, which looks like the main event-dispatch thread for the application.

I guess I have to back-up to a previous build to try again. Shall I just grab the previous day? Or was this search based upon some educated guesses as to where the flaw may have been introduced?

Chris, thanks for volunteering to test another build.

Coordination is clearly difficult, so I've put more details and suggested date assignments in the user story - which everyone should consider to be the "diary" for this bug.

User Story: (updated)
Flags: needinfo?(sjames)
User Story: (updated)

(Commenting on User Story)

User configs (name, computer, OS version, graphics, monitor(s)) :

  • Christopher Schultz ?

MacBook Pro (15-inch, 2018), 10.14.6 Mojave, Intel UHD Graphics 630 w/1536MiB+Radeon Pro 560X w/4GiB; built-in display (15.4-inch 2880x1800)

(In reply to Wayne Mery (:wsmwk) from comment #217)

I thought I commented that the beta build crashed for me? I have found what I think is the 2017-06-03 you wanted me to test, will try it out.

Thanks.

Flags: needinfo?(sjames)

(In reply to Scott from comment #219)
CGLClearDrawable Mutex crash on
"thunderbird-55.0a1.en-US.mac.dmg 54M 03-Jun-2017 11:08"
after about 2 hours.

I'll try the next one.

(In reply to Wayne Mery (:wsmwk) from comment #217)

I got crashes in June 3rd and June 2nd within an hour. I jumped to May 31st and it has so far lasted longer than the others. I will give it some more time then try June 1st.

Using a blank profile w/ no sent folder saving.

(In reply to Scott from comment #221)

(In reply to Wayne Mery (:wsmwk) from comment #217)

I got crashes in June 3rd and June 2nd within an hour. I jumped to May 31st and it has so far lasted longer than the others. I will give it some more time then try June 1st.

Using a blank profile w/ no sent folder saving.

OK, I switched back to June 1st this morning and got it to crash. So to recap:

June 3rd, 2nd and 1st daily builds all crash in an hour or less (let's say fewer than half a dozen emails sent in each). I have been running May 31st for about a day and a half without crashes, so I will go back and continue to test that.

Anyone else want to try to confirm these findings by also testing the May 31st and June 1st daily builds?

Thanks for testing. Yes, we need to get the regression range down to one day. So confirming "31st May good, 1st June bad" would be very helpful.
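
A rough sketch of how the date range could be halved at each step instead of stepping back a day or a week at a time. The `is_bad` callback is hypothetical and stands for "a tester ran the nightly from that date and saw the hang". The sketch assumes every build after the first bad one is also bad, and that a "good" verdict is trustworthy, which is a weak assumption for an intermittent hang.

```python
from datetime import date, timedelta


def bisect_builds(good, bad, is_bad):
    """Return the earliest bad build date between a known-good and known-bad date.

    Assumes `good` really is good, `bad` really is bad, and that every
    build after the first bad one is also bad.
    """
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if is_bad(mid):
            bad = mid   # hang seen: the regression is at or before `mid`
        else:
            good = mid  # no hang seen (yet): look later
    return bad
```

With a range of about 100 days this needs roughly 7 test builds. Because a build that merely has not hung yet can be misjudged as good, each "good" verdict should be given a long soak time before moving on.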

(In reply to Jorg K (GMT+2) from comment #223)

Thanks for testing. Yes, we need to get the regression range down to one day. So confirming "31st May good, 1st June bad" would be very helpful.

Got a crash in the May 31st Build, I'm going to jump back a full week.

(In reply to Scott from comment #219)

(In reply to Wayne Mery (:wsmwk) from comment #217)

I thought I commented that the beta build crashed for me? I have found what I think is the 2017-06-03 you wanted me to test, will try it out.

To clarify again for others, we're at the stage where only testing of NIGHTLY builds is helpful. Thanks for actively working on this.

User Story: (updated)

I've gotten 2017-06-07 "daily" to crash a bunch of times, now. I know Scott has been getting this to happen more quickly than I have -- just reiterating that June 7th is definitely bad.

I'd like to try May 30th, but this directory only appears to contain log files and no actual builds:
http://archive.mozilla.org/pub/thunderbird/nightly/2017/05/2017-05-30-03-02-06-comm-central/
So I've backed-up to this build: http://archive.mozilla.org/pub/thunderbird/nightly/2017/05/2017-05-29-03-02-06-comm-central/thunderbird-55.0a1.en-US.mac.dmg

(In reply to Christopher Schultz from comment #226)

You can skip it. May 24th crashes too.

I'll go back another week.

Blocks: 1599553

How are results from versions older than May 17 or 24?

No longer blocks: 1599553
User Story: (updated)
Flags: needinfo?(sjames)
Flags: needinfo?(chris)
Blocks: 1599553

(In reply to Wayne Mery (:wsmwk) from comment #230)

How are results from versions older than May 17 or 24?

I was away on vacation for 10 days. I get failures all the way back to May 10th. I am currently testing the AM build (there was an AM and PM) of May 3rd.

Flags: needinfo?(sjames)

(In reply to Scott from comment #227)

How are results from versions older than May 17 or 24?

Just another confirmation that 2017-05-17 is locking up. It took several days (weeks?) to start, but today it's been locking up a lot.

I'll go back to 2017-05-01.

Flags: needinfo?(chris)

(In reply to Christopher Schultz from comment #232)

I'll go back to 2017-05-01.

I get crashes in 2017-04-01, I am currently testing March 1st.

User Story: (updated)

(In reply to Christopher Schultz from comment #232)

I'll go back to 2017-05-01.

2017-05-01 is unusable for me: every time I launch it, it re-downloads all email from all folders for all time from Gmail. I think I need to replace the CPU fan in my computer, now.

(In reply to Richard Leger from comment #213)

FYI, the deadlock in graphics on CGLClearDrawable may be 100% MacOS only (I am not in position to tell really), but the Hangs sending mail while copying message to Sent folder while displaying the progress bar is not a MacOS only issue... it has been experienced in various TB versions on Windows... that is why I help testing on Windows... but if you think that is not necessary, let me know and I'll stop testing and reporting...

If it's easily reproduced, we still need to find the regression range, so please do keep on testing.

(In reply to Scott from comment #233)

(In reply to Christopher Schultz from comment #232)

I'll go back to 2017-05-01.

I get crashes in 2017-04-01, I am currently testing March 1st.

Have you confirmed it is the same crash? How goes it with March 1?

(In reply to Christopher Schultz from comment #234)

(In reply to Christopher Schultz from comment #232)

I'll go back to 2017-05-01.

2017-05-01 is unusable for me: every time I launch it, it re-downloads all email from all folders for all time from Gmail. I think I need to replace the CPU fan in my computer, now.

Depending on Scott's results for March 1, can you coordinate the next calendar dates to test?

Flags: needinfo?(sjames)
Flags: needinfo?(chris)

MIGHT be on to something with March 1st. Haven't had a crash yet and I have been running it for close to 2 weeks. I think I'll go to weekly builds between April 1st and March 1st and see what I can find this week. I will be away the following 2 weeks.

All of my previous crashes have been CGLClearDrawable ones.

Flags: needinfo?(sjames)

Scott, which date are you going with first, so Christopher can pick a different date?

User Story: (updated)
Flags: needinfo?(sjames)

(In reply to Wayne Mery (:wsmwk) from comment #238)

Scott, which date are you going with first, so Christopher can pick a different date?

I'm starting with the 22nd... give me a day, I usually get crashes within an hour or two. So I should be able to narrow it down to a week of builds fairly quickly then we can break them up.

Flags: needinfo?(sjames)
See Also: → 413240

(Scott is making great progress. Hopefully we have a 1-2 day range by Wednesday or Thursday.)

Good results Scott?

Flags: needinfo?(sjames)

Back at it hopefully today. I seem to have lost track of whether I was testing the March 15th or March 8th build last. Hopefully I'll have it narrowed to a week soon.

Flags: needinfo?(sjames)

I got 2017-05-01 to lock-up, finally.

It's always fun re-downloading your whole email history from Gmail. I will eventually get banned. :(

I'm going to re-try with http://archive.mozilla.org/pub/thunderbird/nightly/2017/04/2017-04-08-00-40-03-comm-aurora/thunderbird-54.0a2.en-US.mac.dmg

2017-04-08 locked-up this morning in (well, beneath) CGLClearDrawable.

I'm backing up to http://archive.mozilla.org/pub/thunderbird/nightly/2017/04/2017-04-04-00-40-03-comm-aurora/thunderbird-54.0a2.en-US.mac.dmg

Hmm. I've looked back at the comments and I've apparently switched from "comm-central" to "comm-aurora". What is the difference, and should I be consistent?

Yes, you need to be using nightly consistently. Aurora is what used to be alpha, and that has different code.

But "nightly" has a bunch of options for each day. Which of the e.g. 2017-04-04-* should I be using?

http://archive.mozilla.org/pub/thunderbird/nightly/2017/04/

There are lots of choices:

Dir 2017-04-04-00-40-03-comm-aurora-l10n/
Dir 2017-04-04-00-40-03-comm-aurora/
Dir 2017-04-04-03-02-02-comm-central-l10n/
Dir 2017-04-04-03-02-02-comm-central/
Dir 2017-04-04-03-02-03-comm-esr45/
Dir 2017-04-04-03-02-03-comm-esr52/
Dir 2017-04-04-03-03-28-comm-central/
Dir 2017-04-04-03-03-28-comm-esr45/
Dir 2017-04-04-03-03-28-comm-esr52/

comm-central is where the development happens, so "2017-04-04-03-02-02-comm-central"
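
To make the selection rule concrete, here is a small sketch that picks the comm-central directories for a given day out of a listing like the one above. The function name is made up for illustration, and the directory-name pattern (date, time, branch) is inferred only from the names shown in this bug.

```python
import re

# Pattern inferred from the listing: YYYY-MM-DD-HH-MM-SS-<branch>/
_NIGHTLY_DIR = re.compile(r"^(\d{4}-\d{2}-\d{2})-(\d{2}-\d{2}-\d{2})-(.+?)/?$")


def comm_central_dirs(names, day):
    """Return the comm-central build directories for `day` (YYYY-MM-DD).

    Excludes aurora, ESR and l10n directories, which carry different code
    or only language packs.
    """
    matches = []
    for name in names:
        m = _NIGHTLY_DIR.match(name.strip())
        if m and m.group(1) == day and m.group(3) == "comm-central":
            matches.append(name.strip().rstrip("/"))
    return sorted(matches)
```

For the 2017-04-04 listing this leaves two candidates, the 03-02-02 and 03-03-28 builds; the first is the one named above.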

(In reply to Wayne Mery (:wsmwk) from comment #241)

Good results Scott?

Ok - I get March 8th (comm-central/ - for clarity) to crash. And March 1st to maybe not crash (I usually get crashes in a couple of hours and I ran it cleanly for almost 2 weeks.)

I'm currently downloading the daily builds for the 2nd, 3rd, 4th, 5th, 6th and 7th.

Interestingly the night of the 7th is when it rolls over from v54 to v55 - this might prove to be significant.

I will start my testing on the 7th and work backwards, as it's easier/quicker for me to eliminate candidates that prove they work successfully. If anyone else wants to double/triple check the March 1st build and work forwards, that would be great.

Unsurprisingly, I got 2017-04-08 to lock-up in a similar way. I'll go back to 2017-03-02 to help bracket Wayne's researches.

Happens twice a day or more, including today. Running 68.4.1 (64-bit). Really frustrating.

I got 2017-03-02 to lock up this evening, beneath CGLClearDrawable. Maybe I should try Communicator? ;)

(In reply to Christopher Schultz from comment #251)

I got 2017-03-02 to lock up this evening, beneath CGLClearDrawable. Maybe I should try Communicator? ;)

Interesting. Try March 1st. My 7th is still running after several days. We might need to go back further...!

I'm starting to think that this was introduced by a change in Macos and not a change in Thunderbird.

(In reply to Christopher Schultz from comment #253)

I'm starting to think that this was introduced by a change in Macos and not a change in Thunderbird.

I have a similar sentiment. I recall first encountering crashes in the fall... of I think 2018. But this could also be explained by not being up to date on Thunderbird releases. My office also has Macs with various different OS versions - as I am not on site anymore, I can't do a detailed analysis of who is getting crashes with what version, but I can also see this impacting people's update schedule.

I'm starting to think that this was introduced by a change in Macos and not a change in Thunderbird.

An interesting idea. It could even be hardware related. But if it is our code, that would be consistent with this being first reported with 54 beta and not seeing majority of reports until version 60 when newer version 5<something> code hit the larger user population.

To exhaust the code regression idea we'd need to test version 53 and 54.
http://archive.mozilla.org/pub/thunderbird/nightly/2017/01/2017-01-24-03-02-12-comm-central/ is roughly the earliest 54 nightly.
http://archive.mozilla.org/pub/thunderbird/nightly/2016/11/2016-11-15-03-02-11-comm-central/ is roughly earliest 53 nightly.

User Story: (updated)
Flags: needinfo?(sjames)
Flags: needinfo?(chris)

(In reply to Wayne Mery (:wsmwk) from comment #255)

I'm starting to think that this was introduced by a change in Macos and not a change in Thunderbird.

An interesting idea. It could even be hardware related. But if it is our code, that would be consistent with this being first reported with 54 beta and not seeing majority of reports until version 60 when newer version 5<something> code hit the larger user population.

To exhaust the code regression idea we'd need to test version 53 and 54.
http://archive.mozilla.org/pub/thunderbird/nightly/2017/01/2017-01-24-03-02-12-comm-central/ is roughly the earliest 54 nightly.
http://archive.mozilla.org/pub/thunderbird/nightly/2016/11/2016-11-15-03-02-11-comm-central/ is roughly earliest 53 nightly.

No big updates from me, I am still running March 7 stable. But Christopher has reported a crash on March 2nd.

Flags: needinfo?(sjames)
Attached file tb_v68_4_1_crash.txt

Another hang-while-sending. Eventually manually force-quit. On restart, confirmed that the email had been sent correctly (as usual). This has happened several times over this last week now, fyi, each with same pattern as before. Write email and click send without any problems. The "Sending" Dialog box finishes sending the email and disappears. Then Thunderbird immediately locks, with beach-ball, until I eventually give up waiting and force-quit.

TB: 68.4.1
MacOSX: 10.14.6

:wsmwk,

Hey there. I note that in all of my hangs, the "sending email" dialog box with progress bar successfully completes sending, and successfully disappears from screen. The hang for me happens immediately after that dialog box clears. However, I just noticed that the summary for this ticket describes having the dialog box still displayed when hanging!?!

Are these the same issue or two different issues?

Flags: needinfo?(vseerror)

Are these the same issue or two different issues?

Interesting observation. I have no technical expertise here. But I think your comment does further suggest this is a graphics issue and not a thunderbird issue.

Flags: needinfo?(vseerror)

This bug is 3 years old. I filed essentially the same bug, but I don't see it in my Dashboard so it may have been closed as a duplicate or something. At the time I provided stack dumps that matched these.

In my experience (which is considerable -- I led the engineering teams for iMovie and iPhoto at Apple), this is a "race condition" bug, not a graphics bug or an OS bug or whatever else has been proposed. It is deadlocking in a mutex, probably around the progress bar, not the mail delivery, which always succeeds.

I don't think you're going to find/fix this bug by regression analysis, as though the bug were somehow introduced at some point. I think it has been there for a long time, and is a design bug. Race conditions are like that.

The only way to fix it, in my opinion, is to look carefully at the code, specifically where the mutexes are established and released. It takes some hard thinking and careful looking, but mutexes are inherently hard to debug.

Suggestions:

  1. Remove the mutexes altogether. Are they really necessary? I've seen very few UI/graphics/progress bar interactions that require mutexes. Presumably they are used because of multiple threads, but if (as should be) only one thread -- the main thread, typically -- is handling UI updates, then a mutex shouldn't be necessary.
  2. Deliberately introduce a delay into one or the other of the threads that is locking the mutex, to see if it can be consistently reproduced.
  3. Add trace/log statements around the lock/unlock of the mutexes so you can see any close timing / race conditions (important to flush stdout as blocked threads that write to log files don't always show up in the log files due to output buffering).
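
Suggestions 2 and 3 can be sketched together as a lock wrapper. This is only an illustration in Python, not Thunderbird or Gecko code: `TracedLock`, its `sink` parameter and its `hold_delay` parameter are all hypothetical names. Each event is written and flushed immediately, so the last line before a hang still reaches the log.

```python
import sys
import threading
import time


def _stderr_sink(line):
    sys.stderr.write(line + "\n")
    sys.stderr.flush()  # flush so the last line survives a hang


class TracedLock:
    """A lock that logs waiting/acquired/releasing events per thread."""

    def __init__(self, name, sink=_stderr_sink, hold_delay=0.0):
        self._lock = threading.Lock()
        self._name = name
        self._sink = sink
        self._hold_delay = hold_delay  # suggestion 2: widen the race window

    def _trace(self, event):
        thread = threading.current_thread().name
        self._sink(f"{time.monotonic():.6f} [{thread}] {event} {self._name}")

    def __enter__(self):
        self._trace("waiting")
        self._lock.acquire()
        self._trace("acquired")
        if self._hold_delay:
            time.sleep(self._hold_delay)
        return self

    def __exit__(self, exc_type, exc, tb):
        self._trace("releasing")
        self._lock.release()
        return False
```

A "waiting" line with no matching "acquired" line at the end of the log points at the thread that is stuck and the lock it is stuck on.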

In your bug 1400568 you wrote "This is recent bug, as of 52.3.0, never happened before". So in that bug (which is duped to this one), and here, we have proceeded on the assumption it's a regression - regardless of whether it's a race condition or not.

Looking back, there is also bug 1422251 and bug 1440716. So you are not the only person to have reported this issue against version 52. Still, it was only 3-4 people to report the issue for all or most of version 52. So change(s) in newer versions make the situation worse. But certainly it's possible part of the underlying issue predates any of these bug reports, and the regression hunt is a waste.

It still may have been introduced in 52, but if that is the theory, then there's little point in testing 53, 54, etc.

Not sure what source code control system you guys use, but GitHub has pretty nice "diff" tools and inspecting code changes in/around the area in question in the timeframe of 52.3.0 might shed some clues.

(In reply to John O'Duinn [:joduinn] (please use "needinfo?" flag) from comment #258)

:wsmwk,

Hey there. I note that in all of my hangs, the "sending email" dialog box with progress bar successfully completes sending, and successfully disappears from screen. The hang for me happens immediately after that dialog box clears. However, I just noticed that the summary for this ticket describes having the dialog box still displayed when hanging!?!

I have never had a stray window hanging around when the lock-up occurs. The mail-send operation appears to have 100% completed. The lock-up occurs when trying to work with the main window after the completion of the send/copy-to-sent/etc. operation and all temporary windows have closed (for me).

Hello all (and John):

I'll add my two cents worth here. I also do not see any window hanging around after I send a message. It clears, and then if I wait a period of time (which I've not been able to determine) before trying ANYTHING else in TB, operation continues normally. If, on the other hand, I'm foolish enough to go right back into TB and start another message, I'm entertained by the beach-ball until I get bored and perform a force quit.

Sincerely,
Bob

(In reply to Glenn Reid from comment #262)

It still may have been introduced in 52, but if that is the theory, then there's little point in testing 53, 54, etc.

We've been working back in time, not forward in time.

Not sure what source code control system you guys use

hg

but GitHub has pretty nice "diff" tools and inspecting code changes in/around the area in question in the timeframe of 52.3.0 might shed some clues.

Good idea. Which two versions should we run a diff against?

(Hint: that's what we are trying to determine, so we can actually look at some targeted code changes instead of just "hey, what changed during the decade around release 52?")

(In reply to Christopher Schultz from comment #265)

(In reply to Glenn Reid from comment #262)

It still may have been introduced in 52, but if that is the theory, then there's little point in testing 53, 54, etc.

We've been working back in time, not forward in time.

Looking back through this bug's history, I see regression testing comments for 53, 54, and 55.

but GitHub has pretty nice "diff" tools and inspecting code changes in/around the area in question in the timeframe of 52.3.0 might shed some clues.

Good idea. Which two versions should we run a diff against?

(Hint: that's what we are trying to determine, so we can actually look at some targeted code changes instead of just "hey, what changed during the decade around release 52?")

Based on the stack trace(s) the section of code should be narrow. Can't be that many changes to that code. Pick a version before it (52.2) and after it (52.3).

I know you feel like I am intruding and telling you how to find the bug. I am also noting that in three years you have not found the bug, so I'm just trying to offer suggestions. I have found bugs like this in my career. It is not easy. My previous suggestions are probably more useful than diff'ing, based on your comments.

(In reply to Glenn Reid from comment #266)

I know you feel like I am intruding and telling you how to find the bug. I am also noting that in three years you have not found the bug, so I'm just trying to offer suggestions. I have found bugs like this in my career. It is not easy. My previous suggestions are probably more useful than diff'ing, based on your comments.

For what it's worth, I'm not a tb developer, just a user. So I've been responding to requests from the tb developers to get them more information; specifically, trying to narrow-down a before/after time where the bug can and cannot be reproduced.

My (wild) assertion about an underlying change in Macos "causing" this issue wasn't suggesting that Macos actually has the bug. It's much more likely that some change in Macos has simply made this existing bug in tb more obvious and to occur more frequently.

I agree with you that it's very likely to be an improper lock-management situation within tb.

Blocks: 1608733

March 2nd and 1st both crashed for me this morning within minutes. I will jump back to the first version of 53 and move back or forward from there.

(In reply to Wayne Mery (:wsmwk) from comment #261)

(In reply to Scott from comment #269)

Well that was quick... v53 and v52 from November 14-15th 2016 both give EXC_BAD_ACCESS crashes when opening any email. I would guess OS incompatibility.

Where should we go from here?

(In reply to Scott from comment #270)

(In reply to Wayne Mery (:wsmwk) from comment #261)

(In reply to Scott from comment #269)

Well that was quick... v53 and v52 from November 14-15th 2016 both give EXC_BAD_ACCESS crashes when opening any email. I would guess OS incompatibility.

Where should we go from here?

I think I have reached a dead end.

I get Jan 25th 2017 daily to crash on mutex.

Every build I have tried between December 2nd 2016 and Jan 21st 2017 either crashes when I open the app or crashes when I select any mail message to be read.

I have also tried builds from Nov 14th and 15th 2016 - the switch from v52 to v53... and both crash when opening an email.

Summary: Every build that I can get to run on my iMac crashes on Mutex.

Greetings;

I have been having this same bug for a couple of years, now. I would very much like it fixed.

I know, you're doing your best. Keep up the great work!

I'd like to suggest that maybe there is a race condition with the internal thunderbird search indexing service. Careful, this might be a red herring.

I tend to think the suggestion that something changed in OSX is a good thought.

Anyway, just wanted to voice my concern that, as a Mac user, this bug is very annoying.

I will try to poke around and provide additional testing data now that I know there is a fresh thread/bug devoted to this pesky bugger.

TB: 68.4.2 (64-bit)
Mac: 10.11.6 (15G22010), ATI Radeon HD 2600 Pro 256 MB

Thanks!

John

Potentially relevant:
https://forum.juce.com/t/opengl-deadlocks-mac/8933/3

This forum thread suggests that a window component may be causing trouble if its enclosing window is destroyed without first removing that component (and, maybe, disposing of it). I have never had tb lock-up on me just navigating around: getting it to lock-up always requires me to have just recently sent an email message. All the composition/sending/etc. windows have all closed and the main window is focused usually for a short time (.5 - 5 sec) before the color-wheel appears. So perhaps this is the source of the deadlock (?). It's clear that something is causing it to occur more frequently than in the past, but it doesn't appear that tb code is directly responsible for the increased frequency.

The stuck thread has this in its backtrace:

  • [ChildView delayedTearDown] [...]

Perhaps tearing-down a child view (window?) is tripping-over some resource that hasn't yet been cleaned-up. The "Compositor" thread clearly holds this lock and isn't giving it up.

Flags: needinfo?(chris)

This has been going on for years for me as well and seems to have gotten worse recently with latest Thunderbird 68.5.0 and MacOS 10.15.3 (19D76).
It seems to happen only after interacting with the main mail panel right after pressing send on a message.
I have tons of MacOS dumps I can share if it helps.
I love Thunderbird for its cross-platform support, but this really impairs my work on my laptop and I would hate to have to move to another mail client.
Can this be assigned to an Engineer to investigate?

Not assigned a severity, yet? This is a critical issue. Can we please have someone look at this? :D

I've been hoping that every new version of Thunderbird might include a fix for this issue. I'm currently using 68.5.0 on a system running macOS 10.13.6. My first few sessions worked fine, but now I'm in a cycle where every single message send beachballs Thunderbird and requires a Force Quit. Very frustrating. I know former Thunderbird users who are no more because of this issue and on days like today I can fully understand why they threw in the towel.

Hello All: I've been watching this bug report passively since I first signed on to Bugzilla in September of 2019. I am not a hobby user of TB, and I consider it my primary means of communication in my support role for a small mainframe software company.

Outlook is, in my opinion, the most often hacked platform, so I've been able to avoid using it by running Eudora until I went completely Mac, and installed Thunderbird. I evaluated both TB and Outlook, and came down on the side of Thunderbird. Great product. I even contributed a time or two, which I hope helped.

All was well with this setup until February of 2019, when I upgraded macOS to 10.14.3. TB began to crash, so I upgraded it to 45.8.0. I still experienced problems, so later that month I upgraded to TB 60.5.0. At the time I found the huge leap in versions remarkable, but there it is.

My OS is now at 10.14.6, and TB has received no further upgrades. Frankly, I'm afraid to upgrade either the OS or TB now. I experience this problem many times per day. I'm as annoyed as anyone else, but changing to Outlook would be a huge, and one-way migration.

I'm not an engineer, so all I can do is be a good reporter and let everyone know what I see. The smartest comment I've seen on this is that the bug may be due to a "race condition" (which I infer to mean something that manifests under system load). That person also suggested that it was a hidden bug that was revealed by the upgrade to macOS. That fits with my experience.

I agree with John Dale above. This sort of bug would drive away most people. I work in a software house myself, and I have a great deal of understanding about what's involved here. That TB is freeware and that it is community supported is also a factor, and I am grateful to those smarter individuals who make it "go". That said, I really wish that this were fixed, and in the near term. Thanks for your time spent reading this.

Sincerely,
Bob

This week I was doing some video editing. When my editing software was rendering and converting for output (CPU was pegged), the bug happened three times in a row.

Okay, so no progress on this bug in 3 years. Time to do something different, rather than hoping for different results.

This is not a MacOS bug. MacOS is not in control of any of the threads in the app, and they are simply deadlocking on semaphores. It's a classic race condition bug, as I've stated many times, and which my own bug report (closed as a duplicate of this) detailed.

People have been "looking for" this bug, apparently, but not, apparently, making code changes. You don't fix race conditions by looking for reproducible scenarios: by definition, race conditions can't be reliably reproduced. You fix bugs like this by changing the source code and trying to eliminate the race condition.

My recommendation would be to REMOVE the semaphore locks completely. Are they really needed? There are few situations which genuinely require locks -- even shared memory overwrites are often benign, whereas a deadlock is effectively a crash bug. So would you intentionally put code into an app that is proven to CRASH/DEADLOCK it on a regular basis, to try to prevent a theoretical situation for which you think perhaps a semaphore is the solution? Of course not.

I know for a fact that no human being is smart enough to truly imagine all the conditions that arise between parallel threads, correctly anticipate all the situations, and apply locks perfectly where needed. I've seen a lot of code that had no semaphores and probably should have, yet never had any actual bugs in it. And I've seen code that has locks to prevent hypothetical scenarios and locks up at random times due to race conditions.

So some developer at Mozilla, please, remove the offending semaphores. Comment them out. Release the app. And we will all watch to see if it's better or worse. I predict that the problem will simply go away and will not be replaced by whatever scenario the locks were originally imagined to prevent.

If I'm wrong, and something bad happens, well, then go fix whatever problems are observed.

I totally disagree to simply remove resource locks.

I am the poster of bug 1608733, which was marked as a duplicate of this one. I advocated leaving 1608733 active, but was told that the bug is easy to reproduce already without additional information.

If this bug is so easy to reproduce, I wonder why nobody has figured out the root cause yet. I was a senior software engineer for over 20 years. I solved some of the most intractable bugs in data communications system software – I know how tough bugs can be. I also have over 15 years of experience in web development.

I've been using TB for around 15 years. The only time I've ever seen TB hang is as described in 1608733, which is during sending an email, and then attempting to close the message I was replying to. I use a satellite internet connection, so sending takes some time.

Sending a message and closing another message are completely separate actions, so there may be some resource that the closing of the message is trying to grab, but can't because that resource is locked up – that's one explanation. Another explanation is that there is erroneous coding – i.e. someone wrote some code that simply doesn't do what it was intended to do, and hence the closing of the message causes the software to crash (yet the culprit code may have nothing to do with closing the message). Something of this nature can manifest as a sort of cascade of things going wrong, resulting in the SWOD. Sometimes code may just wind up in an endless loop, not related at all to a locked up resource.

If I were working on this bug, I would be hammering on reproducing it, and then strategically inserting debug output to check the values of variables, to see if they make sense. This then ultimately can lead to discovery. This is the preferred method in a lot of cases, because running under a debugger can affect the nature of the bug too much, and possibly could prevent its appearance in the testing. If during the debugging a variable is found to have a crazy value, then the question becomes, what code put that value in the variable?

If indeed this is simply a resource lock issue, then the point in the code can be found where a thread gets stuck waiting for the resource. Then the question becomes, why is that resource locked up? Then you go looking for all the other places in the code that grab that resource, and then you can find the culprit that never releases the resource. Etc. Another cause for a thread getting stuck could be erroneous coding of the lock mechanism itself – the code that implements the mutex for grabbing and releasing the lock.
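
As a sketch of the "why is that resource locked up?" step, a lock can record who currently holds it and give up with a report instead of waiting forever. This is illustrative Python with a hypothetical class name, not the mutex implementation used by Thunderbird or macOS.

```python
import threading


class OwnedLock:
    """A lock that remembers its holder and reports it when a wait times out."""

    def __init__(self, name):
        self.name = name
        self.owner = None  # name of the thread currently holding the lock
        self._lock = threading.Lock()

    def acquire(self, timeout=5.0):
        if not self._lock.acquire(timeout=timeout):
            waiter = threading.current_thread().name
            raise TimeoutError(
                f"{waiter} waited {timeout}s for {self.name}, "
                f"held by {self.owner}"
            )
        self.owner = threading.current_thread().name

    def release(self):
        self.owner = None
        self._lock.release()
```

The timeout message names both the stuck thread and the holder, which is the starting point for looking at every place in the code where the holder takes that lock and never lets go.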

I have found a total of 5 bugs in TB version 68, and have been using the same OS X for years, 10.11.6. I have only reported 4 other bugs since 2012, and they were all minor.

That's my 2 cents for today, and I appreciate all who are trying to solve this one. I love TB ... it has been very solid. I abandoned Mac Mail early on because it was unstable.

I also have a lot of experience ... 30+ years of building and shipping major software, including iMovie and iPhoto at Apple, and was Director of Applications Software at Apple a few years back.

It has been clearly established by countless stack traces that this bug is a deadlock on semaphores. It's even in the title of the bug. It is a race condition on the locks. Removing them will at least allow the race to continue. Keeping them will accomplish what, exactly? No one knows, because no one knows why they were put there in the first place.

You're right that it's a design/coding error. But it doesn't crash, it locks on the semaphores. One thread against another. I very much doubt that the semaphore is necessary, which is why I advocate removing it. It is well-established to be causing this hang. It is not well-established that it has any other value.

Your paragraph that starts with "If indeed this is simply a resource lock issue..." is right. But after three years, it seems that no one has actually gone looking for this bug. Someone should.

Instead of bragging about your experience, you should just download the source, build Thunderbird and fix the bug. See:
https://developer.thunderbird.net/the-basics/building-thunderbird

Good point. Or maybe one of the people who work at Mozilla should at least try to fix the bug, first. As far as I can see from following this bug for three years, no one has tried to fix it. Should it be me? Or maybe you?

(In reply to webmaster2 from comment #280)

If this bug is so easy to reproduce, I wonder why nobody has figured out the root cause yet.

I've been using TB for around 15 years. The only time I've ever seen TB hang is as described in 1608733, which is during sending an email, and then attempting to close the message I was replying to. I use a satellite internet connection, so sending takes some time.

I can reproduce this - typically, daily - but the only action it requires from me is to not "wait" in Thunderbird. Sometimes it will happen on every email message, sometimes it won't. On the very rare occasion I've had it run for several days, even upwards of a week before getting a crash.

The typical procedure is that I click "send" on an email, and I switch to a different window before the sending progress bar completes - usually another app, firefox, production software, sometimes even the main thunderbird 'inbox' window. Switching to a different window usually results in the progress bar being covered up behind whatever I am looking at. I am not manually attempting to close the message that is in the process of being sent, I am just attempting to get on with doing other things. Other users have reported this chain of events as well.

Has your experience been different?

(In reply to Glenn Reid from comment #283)

Good point. Or maybe one of the people who work at Mozilla should at least try to fix the bug, first. As far as I can see from following this bug for three years, no one has tried to fix it. Should it be me? Or maybe you?

Looks like you don't understand the governance structure. What you call "Mozilla" are in fact three legal entities: Mozilla Foundation (MoFo), Mozilla Corporation (MoCo) and MZLA, the administrative home of Thunderbird. Look these up on Wikipedia. Now, people at MoFo don't cut code, and the ones at MoCo produce Firefox and the so-called Mozilla platform which is the basis for both Firefox and Thunderbird.

MoCo have around 1000 employees and MZLA have about 12. Of those twelve, most do not develop on Mac, and the few who do can't reproduce the bug. So to answer your question: Should it be me? Or maybe you? Certainly not me, since I'm a Windows person. If you can reproduce the bug and you were "Director of Applications Software at Apple a few years back", you are most certainly the most qualified person for this job. Open source means: If it doesn't work for you or you don't like it, you can actually fix it yourself ;-)

Of course I don't understand the governance structure. Nor do I care. It's an app, and it has a bug. And you want me to fix it. Listen to yourself.

I don't know exactly why you're being combative here, nor your relationship to Thunderbird, but I was trying to be helpful, to offer some strategies to fix the bug. You're just being a troll.

On behalf of those of us who are only able to support Netscape's descendants through our advocacy and use at this time, we sure would like someone to take charge of this one. If it's true that the squeaky wheel gets the grease .. well .. squeak squeak. We need some help from someone at the project capable of assigning a resource, or a Maverick who is sick of seeing this thread grow and wants to save the day. Perhaps solving this could be rewarded with a cash prize, or with being hired to do more work with the Mac team on Thunderbird. I have an opening in late April. If nothing has been discovered by then, I will dig in. That said, I'm a Java dev with the luxury of robust and solid concurrent data structures .. I do top-to-bottom design/code/test/admin HTML5/J2E/RDB, so my learning curve will likely be extensive on a desktop app like TB. I don't have plans to switch to another client since I'm loyal to a fault, but I'm hopeful that this bug gets resolved and I can rely on a more bulletproof implementation for BSD .. er .. Mac. Sincerely, John

(In reply to Jorg K (GMT+1) from comment #282)
"Instead of bragging about your experience, you should just download the source, build Thunderbird and fix the bug."

Help me out here... who is responsible for the TB code base? How many people right now are in on debugging at this sort of level? How many people actually are authorized to modify the code base? Of these people, how many are actually getting paid by a Mozilla entity?

Look, you can't just go around removing semaphores and hoping for the best. That's a very bad idea. What you need to do is understand the code and understand why the original authors included those semaphores. If that's too much of a tall order, then someone who is very familiar with debugging this code base needs to get in there, reproduce the problem over-and-over, get some debug output going, and very meticulously zero in on what is happening. I have a feeling that the reason this bug is still around is that the effort required to debug it is stopping people.

Reward of a cash prize might be a good idea, if possible.

(In reply to Glenn Reid from comment #286)

Of course I don't understand the governance structure. Nor do I care. It's an app, and it has a bug. And you want me to fix it. Listen to yourself.

Well, I think it's valuable to understand the bigger picture. I'm sure there are difficult bugs in Firefox that 1000 MoCo employees haven't tackled in decades, so why would 12 Thunderbird staff be able to fix this in a timely fashion? Apart from 12 staff, Thunderbird has about 120 people doing all sorts of jobs on a voluntary basis. So you could be the 121st. Please remember that not all Mac users see this bug, so a prerequisite to fixing it, is to actually be able to reproduce it.

I don't know exactly why you're being combative here, nor your relationship to Thunderbird, but I was trying to be helpful, to offer some strategies to fix the bug. You're just being a troll.

I'm a Thunderbird and MailNews peer, I've been a volunteer since 2010, I started since there was something that didn't work for me, so I needed to fix it. I was also the overall maintainer and release manager from 2017-2019, now back to volunteer status.

This bug is dragging on, we're almost at comment 300 with no fix in sight. Neither your nor my comments help in any way since there are only two ways to fix the bug:

  • Find the regression. Some people have tried, but somehow they all stopped. Once you pinpoint when the code change occurred that caused the problem, you may have a chance to spot the code that changed. Or even then you won't be able to tell.
  • Debug it directly by adding, maybe, print statements, running with a self compiled debug version for a while, etc. I did that once to find and fix a bug that occurred every couple of weeks. It wasn't fun.
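For anyone unsure what "finding the regression" involves mechanically, it is a binary search over builds. A rough Python sketch follows, assuming the builds are ordered oldest to newest and that once the bug appears it stays in every later build. The function name and the `is_bad` callback are hypothetical; in practice `is_bad` means "install that build and use it long enough to decide".

```python
def first_bad_build(builds, is_bad):
    """Return the first build for which is_bad(build) is true.

    Assumes `builds` is ordered oldest to newest and that every build
    after the first bad one is also bad. Returns None if no build is bad.
    """
    low, high = 0, len(builds)
    while low < high:
        middle = (low + high) // 2
        if is_bad(builds[middle]):
            high = middle      # regression is at `middle` or earlier
        else:
            low = middle + 1   # regression is after `middle`
    return builds[low] if low < len(builds) else None


if __name__ == "__main__":
    # Made-up build numbers; pretend the bug first appears in build 6.
    print(first_bad_build(list(range(10)), lambda build: build >= 6))
```

Each step halves the window, so a few hundred nightlies need fewer than ten test rounds. The catch with an intermittent hang is that a "good" verdict is never certain, so each round needs enough sends to be convincing.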

I don't quite understand why giving you that information is being "combative" or a "troll", but if you read the etiquette, https://bugzilla.mozilla.org/page.cgi?id=etiquette.html, point 2, "No obligation", you'll see that there is no obligation to fix this bug, so if you want it fixed and have the right skills, it's a really good idea to fix it yourself, like I did years back.

I, for one, while indeed frustrated about this bug still being present and hitting me literally several times per day, am willing to be patient and try to help as best I can. I appreciate the efforts of Mozilla (by whatever name, covering all appropriate parties) to produce Thunderbird. I do not intend to switch to another email client. I will continue to suffer with this bug, and I will continue to try to help figure out what's going on.

The problem is that we have two separate groups with (apparently) zero overlap:

  1. People who can reproduce this bug (reliably!)
  2. People who are capable of debugging and, ultimately, fixing it

So how can we get some overlap, here? I can think of at least two possibilities:

  1. Get together over screen-sharing, reproduce, re-build, reproduce, etc. just like we were physically together

  2. Instrument the code (perhaps under a "find-that-damned-OSX-bug" flag) so the reproducers can produce some useful debugging information that the debuggers/fixers can actually use to locate what's going on

I will run daily/nightly builds. I will run in debug mode. I will upload huge trace logs. I will do it repeatedly, because that's literally all I can do to help. (A seasoned Java programmer is not useful when debugging native/cocoa/XUL/whatever.) But I will not harangue the only people who are able to effect any change, here. It's not helpful in the slightest.

So please let's all just take a breath and see how we can actually work together to solve this problem.

"I will run daily/nightly builds."

I'm down. Just let me know where that executable is .. I would prefer that there be a separate log file created so I don't have to fish through syslog.

Is it difficult getting the build up-and-running on Mac? Is it in git? I would be willing to pull hourly changes, build, and report back.

I will do my best to get this building in my environment .. who knows, maybe I'll be the dude who is able to fix it. :)

Sincerely,

John

(In reply to Christopher Schultz from comment #290)
(In reply to John Dale from comment #291)

Are either of you running an OSX version before High Sierra?

I have tested (I think) every version that I can get to run on my machine (Mid 2017 - running High Sierra), and can reproduce the crash. Versions before - roughly - December 2016 to January 2017 - won't run on my setup so I can't test anything older than that.

I run Daily builds on OSX, updated (mostly) daily.

I used to see this problem (or something like it; spinning beachball after sending email) a few times a month. Then some time back (6 months or more), I belatedly realised I'd not seen it for a few months, and I've never seen it since. I think I recorded this in one of the many bugs that was closed as a duplicate.

Have those who can reproduce it tried the Daily builds?

https://ftp.mozilla.org/pub/thunderbird/nightly/latest-comm-central/

https://ftp.mozilla.org/pub/thunderbird/nightly/latest-comm-central-l10n/

If it still happens there, are they running the latest OSX? I'm on 10.14.6 and, with Daily builds, haven't seen the problem in many months.

NOTE: if you try a more recent build, then go back to an earlier build, you will have to start it manually from the cmdline/xterm, just once, as:

/Applications/Thunderbird\ Daily.app/Contents/MacOS/thunderbird --allow-downgrade

(or similar path depending on version you're trying to go back to)

(In reply to Calum Mackay from comment #293)

Have those who can reproduce it tried the Daily builds?

Short answer: yes. If you read back even ~10 comments you will see discussion about them.

I have personally tested around 30-40+ daily builds, and all of the ones that are compatible with my system crash (these go back to December 2016).

Yup, sorry Scott, I think I did know that; I've been reading this bug for a long time, just forgot and failed to check, sorry.

My other concern was that there was more than one bug here; there seemed a spate of "closing as duplicate", despite there being little evidence to connect some of the bugs.

e.g. in my case, I never saw a stack trace; I always had to force quit, with no debugging info shown. It's hard to see how anyone can be sure that bugs are the same cause, in that situation. So perhaps the issues we saw are different, and mine is fixed, either within or without Thunderbird.

forgot to add: I had the impression that all the reproduction attempts detailed above were directed at finding a window where the problem first appeared, testing builds from a while back.

but of course that would have started with recent builds, which is what I forgot :)

(In reply to Calum Mackay from comment #296)

forgot to add: I had the impression that all the reproduction attempts detailed above were directed at finding a window where the problem first appeared, testing builds from a while back.

but of course that would have started with recent builds, which is what I forgot :)

I did try beta builds first, but it has been a while. I have been pretty aggressively testing old daily builds since November. And have experienced the bug on my system since before 2018. I have about 15 users in my office, a mixture of which experience the issue. It has proven difficult to confirm who is and who isn't at any given time as I no longer work on site.

As of a few minutes ago I have moved back to the regular release channel, since none of the test builds really offered any improvements, and I don't have physical access to a machine that can run builds from December 2016 or earlier.

I'm running 10.11.6 (15G22010)

I just thought of something .. I'm not a fan of xcode .. I'm running an older imac 24" - can I get things working without xcode with relative ease?

(In reply to John Dale from comment #298)

I'm running 10.11.6 (15G22010)

Could you try this build? (it is the last nightly build of v52 as far as I can find)

http://archive.mozilla.org/pub/thunderbird/nightly/2016/11/2016-11-14-03-02-09-comm-central/

I took my existing thunderbird folder and renamed it "Thunderbird Backup" ... then opening the daily build created a new folder and new profile. Set up my account, made sure to disable automatic updates and didn't bother downloading all my old email.

If it stops working and doesn't produce a crash dump - before force quitting, open activity monitor, select thunderbird (daily) and sample the process to check for the mutex lock.
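Once a sample has been saved as a text file, finding the interesting frames can be automated. The following is only a sketch in Python: it treats the saved report as plain text, one frame per line, and looks for the symbol named in this bug's title. The report excerpt in the example is made up, and other symbols to search for would have to come from a real sample.

```python
def find_symbol_lines(report_text, symbols=("CGLClearDrawable",)):
    """Return (line_number, line) pairs that mention any of the symbols.

    `report_text` is assumed to be the plain-text contents of a saved
    process sample; no particular layout is relied on beyond lines.
    """
    hits = []
    for number, line in enumerate(report_text.splitlines(), start=1):
        if any(symbol in line for symbol in symbols):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    # Invented excerpt, for illustration only.
    example = "\n".join([
        "Thread_1 main-thread",
        "  some_function_a",
        "    CGLClearDrawable",
        "Thread_2",
        "  some_function_b",
    ])
    for number, line in find_symbol_lines(example):
        print(number, line)
```

Attaching the full sample to the bug is still more useful than the filtered lines, since the frames around the match show who holds the lock.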

(In reply to Calum Mackay from comment #295)
"My other concern was that there was more than one bug here; there seemed a spate of "closing as duplicate", despite there being little evidence to connect some of the bugs."

I motion to reopen bug 1608733. It may not be the same exact bug as 1381485, and it provides instructions for reproducing. Anyone following that line of attack can then make their contributions on that thread. The more approaches and the more information, the better.

(In reply to webmaster2 from comment #301)

(In reply to Calum Mackay from comment #295)
"My other concern was that there was more than one bug here; there seemed a spate of "closing as duplicate", despite there being little evidence to connect some of the bugs."

I motion to reopen bug 1608733. It may not be the same exact bug as 1381485, and it provides instructions for reproducing. Anyone following that line of attack can then make their contributions on that thread. The more approaches and the more information, the better.

I just tried this half a dozen times with no luck. Open email in new window. Reply to email. Close window of original message after clicking send on the reply. Am I missing something?

Seems to happen most frequently with me when my system is under load .. streaming video in one window, maybe doing a backup at the same time.

(In reply to Scott from comment #302)
"I just tried this half a dozen times with no luck. Open email in new window. Reply to email. Close window of original message after clicking send on the reply. Am I missing something?"

I think it hangs for me because I'm on a satellite internet connection, and the send operation takes a long time. In other words, when I close the window of the original message, the send operation has not completed yet. Another way you could possibly precipitate the failure is to send a very large attachment. Fyi, it doesn't always hang for me. In fact, lately I haven't noticed this failure during normal use.

Comment 260 is certainly worth pursuing. But if the root cause is semaphores/mutex then the challenge here isn't just a matter of reproducing (which isn't difficult for some people), nor just getting a developer. It's getting the right developer, one that can reproduce the issue, and is willing to dive into graphics code. Plus we'll ultimately need a graphics developer because that's where the core Firefox code is failing and it is they that will need to approve any patch that gets applied to the code base. A further significant factor is we have yet to find a corresponding Firefox issue, which would then get the interest of Mozilla Firefox developers who have no vested interest in pursuing a Thunderbird-only issue. We don't pay them, we don't manage them, and they have other priorities.

We have only a couple Thunderbird C++ developers - one of them offered an opinion in comment 143. If they can identify some code suggestions, then we may have a shot at enlisting the core developers. We will check further to see whether any are able to address this.

THESE factors are why we have pursued the lower skill approach of looking for a regression range. That approach is often successful for us, and that it hasn't worked out so far in this case is unfortunate.

As for webmaster2's bug 1608733, there is now a possibility that it is not related to this bug.

I did considerable research this morning into the early history which I am putting in the story section, which includes some major graphics code landings - perhaps someone can make something of it. It's worth noting that we've had far more reports of this in Thunderbird 60 than version 52.

User Story: (updated)

Comes to mind: people seeing this bug - does setting layers.acceleration.disabled to false make the bug go away? (== Check the "Use hardware acceleration when available")

IIRC the default on Mac is enabled for Use hardware acceleration when available, no?

(In reply to Wayne Mery (:wsmwk) from comment #309)

IIRC the default on Mac is enabled for Use hardware acceleration when available, no?

On Mac it's disabled by default!

(In reply to Scott from comment #292)

Are either of you running an OSX version before High Sierra?

No. I primarily run on Mojave, but I have both Mojave and Catalina environments available for testing.

My Mojave environment is single-screen, and I can reproduce this with almost every email I send if I'm not careful. It seems that triggering mouse events over the main window as the composition window is closing triggers the issue. If I press CMD-ENTER to send, then touch nothing as the window closes, all is well. When moving the mouse pointer (which is common, since I'm usually "doing email" and not just sending a single message), I get the color-wheel.

My Catalina environment is dual-screen, and I keep my email main window on one display while the composition window is on another display. I can't remember when the last time TB crashed in that environment. I can try to get it to fail, there.

This issue has been hitting me for years, and is becoming more frequent as time goes on. Perhaps something changed in Mojave which makes it either easier (or even just possible) to trigger while older versions are less prone (or impossible).

(In reply to Wayne Mery (:wsmwk) from comment #309)

IIRC the default on Mac is enabled for Use hardware acceleration when available, no?

I said as much in an email I just sent you - when you look at the bug reports data you shared - 7 of the first 10 most reported GPUs are all Intel integrated graphics - and reasonably dated. This means the crashes are most frequently happening in lower end MacBooks/iMacs, or Thunderbird is not using the discrete GPU.

(In reply to Magnus Melin [:mkmelin] from comment #308)

Comes to mind: people seeing this bug - does setting layers.acceleration.disabled to false make the bug go away? (== Check the "Use hardware acceleration when available")

I am a frequent reproducer - and mine is currently set to "true". I will change it to false and test. (I crashed on the last email I sent).

(In reply to Magnus Melin [:mkmelin] from comment #308)

Does setting layers.acceleration.disabled to false make the bug go away? (== Check the "Use hardware acceleration when available")

Since changing this value from "true" (which is the default: DO NOT use acceleration) to "false", I have had no lock-ups, running 68.5.0 which used to lock-up with nearly every message I sent. So I'm guessing that either a different code-path is being taken, or the window of opportunity to be hit by this bug is very very small when hardware acceleration is enabled (or, rather, NOT DISABLED).

John, what's your hw acceleration set to?

(In reply to Christopher Schultz from comment #314)

(In reply to Magnus Melin [:mkmelin] from comment #308)

Does setting layers.acceleration.disabled to false make the bug go away? (== Check the "Use hardware acceleration when available")

Since changing this value from "true" (which is the default: DO NOT use acceleration) to "false", I have had no lock-ups, running 68.5.0 which used to lock-up with nearly every message I sent. So I'm guessing that either a different code-path is being taken, or the window of opportunity to be hit by this bug is very very small when hardware acceleration is enabled (or, rather, NOT DISABLED).

John, what's your hw acceleration set to?

I am seeing the same behavior so far (no crashes yet!). But it's too early to say for certain. Will keep testing.

(In reply to Scott from comment #315)
(In reply to Christopher Schultz from comment #314)

I switched a dozen of my users over to the new setting over lunch break (2 hours ago).

Already one user has reported a crash on sending email. This person is on a Mid 2015 MacBook Pro with a Radeon R9 M370X GPU. Typically runs with an external BenQ 24" LCD as well. But has reported that they get crashes at home too (not connected to the external display), regardless of which display Thunderbird is on, or whether the inbox and the messages being sent are on the same display.

I will wait for more reports. It's possible it will help some users and not others. But as this person's machine did have a discrete GPU I would not have expected them to still be in the problem camp - if indeed this change is helpful.

As many users reported, this bug seems to be correlated to the load on the host system, so I wouldn't expect to be able to reliably test any workaround positive with a single user in a small time window...

RE: Wayne Mery, "As for webmaster2's bug 1608733, there is now a possibility that it is not related to this bug."

I don't think bug 1608733 is related to system load. It seems to be related to the timing of the closing of the original message in relation to a message send operation taking a long time, for example because of network delay.

You may be right webmaster2, but the point was that it's not a simple test to verify a fix. There's some concurrency scenario that is not exactly clear how to reproduce consistently.

layers.acceleration.disabled;true

Flags: needinfo?(jcd)

I made no changes (this was my default setting).

(In reply to Scott from comment #316)

(In reply to Scott from comment #315)
(In reply to Christopher Schultz from comment #314)

So far so good. All 4 of my main sufferers have not reported any further crashes since the preference change.

Thunderbird 68.5.0 crash report on macOS 10.15.3.

Crash happened again, and yesterday.

I'm a qualified developer and tester. Please let me know if there is any testing I can do to help.

This is not something we could ship today in the product, but as a test affected users could force enable webrender https://wiki.mozilla.org/Platform/GFX/Quantum_Render#Build_instructions (also read the notes there about help > Troubleshooting to ensure it is enabled). I've been running it a few days in Thunderbird with no ill effects.

See Also: → 1623265

I have time for a quick note between projects ..

I was good for three days, then restarted Thunderbird after it took a gig of memory. It froze on me this morning and I did not have that flag set for hardware acceleration.

Still no luck! Please help!

(In reply to John Dale from comment #331)

I was good for three days, then restarted Thunderbird after it took a gig of memory. It froze on me this morning and I did not have that flag set for hardware acceleration.

You changed the setting to False, and are still crashing with Clear Drawable errors?

What kind of mac are you using? Many models do not even have discrete graphics cards, so I can imagine changing the setting for those will not help.

iMac (24-inch, Mid 2007)
2.8 GHz Intel Core 2 Duo
4 GB 667 MHz DDR2 SDRAM
ATI Radeon HD 2600 Pro 256 MB

(In reply to John Dale from comment #333)

iMac (24-inch, Mid 2007)
2.8 GHz Intel Core 2 Duo
4 GB 667 MHz DDR2 SDRAM
ATI Radeon HD 2600 Pro 256 MB

Oooofff. That is certainly an aging computer, it hails from the era before on board graphics. I can imagine how the preference change may not be helping and the machine will simply lack the power necessary.

Lack the power necessary to send and receive emails and paint in boxes?

;)

John

Hello all:

I have lurked on this issue because I'm neither a developer nor a tester. I'm just a loyal user out in the field who is STILL experiencing multiple crashes per day. I use TB as my main application, and it's become crash-ware. Thankfully, no emails are lost during the crash or I'd be "gone" as a user by now.

Some have characterized this three-year-and-counting bug as a "race condition". Others have suggested this bug was revealed by a macOS upgrade. These are indeed useful suggestions, but I wonder when Mozilla will deploy an engineer or engineers to FIX this thing.

Perhaps someone can produce a diagnostic script that traces the action during a crash. If anyone wants me to install such a thing I'd do it. As it is, lots of information is generated during these crashes and is passed to Apple - who presumably has no interest in it.

I work in software myself, and I can't imagine a critical error that went on this long without having a priority set. Perhaps we have arrived at the lethal mutation of open-source software. You can only beg for assistance and depend upon the kindness of this community - which is certainly present, but at this time - apparently - ineffective.

This is the first time I've felt at a disadvantage for migrating from Wintel to macOS. Mozilla, would you please allocate some resources to this?

Sincerely (and with a plaintive little whine),
Bob

I now have layers.acceleration.disabled set to false i.e. "Use hardware acceleration when available" is selected.
I do not have the problem any more.
So at least for me, this solution is working fine.

There looks to be a very strong correlation between TB locking up and the screen resolution or number of screens.
TB has been working great for the past few days while I've been working just from my MBP laptop without the additional display.
TB has just locked up on about the 6th email I sent since I've attached the 43" Philips HD monitor (specs mentioned in previous reports on this ticket).

(In reply to Mike from comment #328)

Created attachment 9132131 [details]
Thunderbird 68.5.0 crash report on macOS 10.15.3

Thunderbird 68.5.0 crash report on macOS 10.15.3.

Crash happened again, and yesterday.

I'm a qualified developer and tester. Please let me know if there is any testing I can do to help.

(In reply to Mike from comment #341)

There looks to be a very strong correlation between TB locking up and the screen resolution or number of screens.

Agreed. That has been my theory from nearly the beginning. Or more specifically screen resolution to GPU power.

To Bob, Mike and Matt, please try the possible "fix" a number of us have been testing for a couple of weeks now and are having success with. It is in comment #308.

I think the problem here is simply this.

Among the active contributors to TB source code, nobody seems to be using Mac as the PC for daily mail exchange. PERIOD.

I began contributing to bug fixes because TB literally ate my e-mail messages close to 12 years ago.
(Sorry, I am using x86-based linux PC.)

Were I using Mac and experiencing the kind of crashes or failures mentioned in this bug today, I would have gone nuts and

  • either found the real cause of the bug and sent a fix, OR
  • tried to run TB inside a linux running in a virtual PC to see if that would help. OR
  • ditched TB or Mac whichever is easier.

Given that I own an e-mail archive for more than dozen years in TB mail folders, I am afraid I would go with TB and ditch Mac.

I am curious if there is anyone inside Mozilla who uses a Mac and that person can spend a week or two to look into this
issue. But the problem is that person is probably an FF developer and clueless regarding a bug in TB.
(Maybe he/she is not and can point out the correct race/resource locking issue, etc.)
Last time I remember the note PC of choice for developers at Mozilla was a x86-PC of Dell or some such brand.

Just my two cents worth.

(In reply to ISHIKAWA, Chiaki from comment #344)

I think the problem here is simply this.

Among the active contributors to TB source code, nobody seems to be using Mac as the PC for daily mail exchange. PERIOD.

That is my understanding, as has been mentioned by others in this bug thread. There are not many Thunderbird software engineers, even fewer use macs, and none of the ones that do can reproduce the bug.

There are also not a massive number of bug reports about it (don't quote me on that, Wayne probably has a better idea how the hundreds for this compare to other bugs they experience).

All:

Thanks for the suggestion that I "set layers.acceleration.disabled to false". I see a cryptic reference to "(== Check the 'Use hardware acceleration when available'.

Now, what the HECK does that mean? That's nothing that I can find in TB Preferences, nor is a choice under "Settings" in my Mac Pro.

How very opaque. See my postscript for an example of other jargon.

I am willing to do anything I can to help this thread along, but blithely mentioning resources that are meaningless to the uninformed (me) isn't going to get the job done.

With respect,
Bob

PS: A half-diminished chord can always be substituted with a dominant 9th chord one third down. Try it. It's a great substitution. -B

This is what I did:
Go to SETTING > ADVANCED. On the last one "Edit Configuration" you will see a list of commands (or whatever they are). Look for the one called: layers.acceleration.disabled and double click on it until it reads FALSE on the last column.
Hope that helps!
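For those who would rather not click through the config editor, the same preference can be set from a user.js file in the profile folder, which Thunderbird reads at startup. Below is a small Python sketch that writes the line for you. The helper name is made up, the profile path in the example is a placeholder you must replace with your own, and Thunderbird should be closed while you run it.

```python
from pathlib import Path


def set_bool_pref(profile_dir, name, value):
    """Add or replace one boolean user_pref line in <profile_dir>/user.js.

    Returns the line that was written. Other lines in an existing
    user.js are kept as they are.
    """
    user_js = Path(profile_dir) / "user.js"
    literal = "true" if value else "false"
    new_line = 'user_pref("' + name + '", ' + literal + ");"
    prefix = 'user_pref("' + name + '",'

    lines = []
    if user_js.exists():
        lines = user_js.read_text(encoding="utf-8").splitlines()
    # Drop any earlier setting of the same preference, then append ours.
    kept = [line for line in lines if not line.strip().startswith(prefix)]
    kept.append(new_line)
    user_js.write_text("\n".join(kept) + "\n", encoding="utf-8")
    return new_line


if __name__ == "__main__":
    # Placeholder path: substitute your actual profile folder.
    profile = Path.home() / "Library" / "Thunderbird" / "Profiles" / "YOUR-PROFILE"
    if profile.is_dir():
        print(set_bool_pref(profile, "layers.acceleration.disabled", False))
    else:
        print("Profile folder not found; edit the path first.")
```

To undo it, delete the line from user.js and reset the preference in the config editor; removing the line alone does not revert a value that has already been applied.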

(In reply to Bob Shimizu from comment #347)

All:

Thanks for the suggestion that I "set layers.acceleration.disabled to false". I see a cryptic reference to "(== Check the 'Use hardware acceleration when available'.

Now, what the HECK does that mean? That's nothing that I can find in TB Preferences, nor is a choice under "Settings" in my Mac Pro.

How very opaque. See my postscript for an example of other jargon.

I am willing to do anything I can to help this thread along, but blithely mentioning resources that are meaningless to the uninformed (me) isn't going to get the job done.

With respect,
Bob

PS: A half-diminished chord can always be substituted with a dominant 9th chord one third down. Try it. It's a great substitution. -B

WORKAROUND CHOICES pick one (summarized for clarity):

  • use Thunderbird version 78 where hardware acceleration is enabled by default (or beta 76 or newer https://thunderbird.net/#channel)
  • enable "Use hardware acceleration" (HWA)
    ** version 68 - Thunderbird > Preferences > Advanced > General > mark the checkbox for "Use hardware acceleration"
    ** version 78 (or beta version 76 or newer): Thunderbird > Preferences > General > (go to the bottom) > mark the checkbox for "Use hardware acceleration"
  • set layers.acceleration.disabled to false - Thunderbird > Preferences > Advanced > General > Config editor > paste "layers.acceleration.disabled" > double click to toggle to false
    ** beta versions: Thunderbird > Preferences > General > (go to the bottom) > Config editor > paste "layers.acceleration.disabled" > double click to toggle to false
  • force enable webrender EXPERIMENTAL USE AT YOUR OWN RISK perhaps best only for users of Thunderbird beta builds - Webrender replaces the Gecko compositor (which is part of this problem). Read the Notes about double checking in help > Troubleshooting, to verify it is enabled. ref. comment 329 open bug reports
  • after clicking the "Write" button on Thunderbird's main window to compose a new message, minimize the main window. Once done composing, clicking "Send" works and does not freeze Thunderbird. (from comment 375)

We need your feedback only in cases where one of the above does NOT help you, and include details about your hardware. In other words if the workaround helps please refrain from commenting so we can focus on delivering a solution. Also, see comment 350 below.

Any errors or additions to the above instructions, email me direct and I'll edit this comment.
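
For anyone who would rather not click through the Config editor, the same pref can in principle be set from a user.js file in the profile folder, which Thunderbird reads at startup. Below is a minimal Python sketch of that idea. The helper names are made up for illustration, and the profile directory is something you must supply yourself (the path in the usage note is a placeholder, not a real location).

```python
from pathlib import Path


def format_user_pref(name: str, value) -> str:
    """Render one user.js line, e.g. user_pref("some.pref", false);"""
    if isinstance(value, bool):
        rendered = "true" if value else "false"
    elif isinstance(value, int):
        rendered = str(value)
    else:
        # Strings are double-quoted; escape backslashes and quotes.
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        rendered = f'"{escaped}"'
    return f'user_pref("{name}", {rendered});'


def set_user_pref(profile_dir, name: str, value) -> Path:
    """Add or replace a pref line in <profile_dir>/user.js (hypothetical helper)."""
    user_js = Path(profile_dir) / "user.js"
    lines = user_js.read_text().splitlines() if user_js.exists() else []
    prefix = f'user_pref("{name}",'
    # Drop any earlier line for the same pref so the file keeps one entry.
    kept = [line for line in lines if not line.strip().startswith(prefix)]
    kept.append(format_user_pref(name, value))
    user_js.write_text("\n".join(kept) + "\n")
    return user_js
```

Usage would look like set_user_pref("/path/to/your/profile", "layers.acceleration.disabled", False), followed by a restart of Thunderbird. As with the manual steps, this only requests acceleration; it does not guarantee that it ends up running.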

User Story: (updated)
Flags: needinfo?(cinymini)
Whiteboard: [regression:TB54?][duptome][workaound: comment 104] → [regression:TB54?][duptome][workaound: comment 349]

**75.0b3 has a workaround (mentioned above in comment 349) by enabling HWA [1] via bug 1623265. The patch will soon appear in 68.7.0, 68.7.1, or 68.8.0 - if you don't see it mentioned in the release notes then it hasn't been done in that release.**

NOTE, even with a manual or default setting of enabled, after startup you may find HWA is NOT running due to startup checks which ensure your hardware works with HWA. [2] In other words, you enabling HWA is not a 100% guarantee it will function on your PC.

Also, this is a workaround, not a code fix. I don't foresee further investigation to develop a code fix unless the workaround proves ineffective. Fingers crossed.

[1] HWA - HardWare Acceleration for graphics
[2] Mac acceleration requirements:

  • For WebGL, we require Mac OS version 10.6 or newer. See bug 636611
  • For layers acceleration (HWA), we require Mac OS version 10.6.3 or newer. See bug 629016. One exception is <video> acceleration, which is enabled on all Mac OS versions.
  • For layers acceleration (HWA), we also block all old graphics adapters that do not fully support OpenGL 2.1 in hardware (use slow software fallbacks), or that can't render to non-power-of-two texture-backed framebuffers. That includes the following generations of GPUs: ATI Radeon X1000 and older, NVIDIA Geforce FX and older, and Intel GMA 950 and older.
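
To make the OS-version part of [2] concrete, here is a rough Python sketch of the rules as listed. It is an illustration only, not the actual startup code: the function names and the returned dictionary are made up, and the GPU blocklist in the last bullet is not modeled.

```python
def parse_version(text: str) -> tuple:
    """Turn '10.6.3' into (10, 6, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def mac_acceleration_allowed(os_version: str) -> dict:
    """Apply the OS-version rules from the list above (hypothetical helper)."""
    version = parse_version(os_version)
    # Pad to three components so '10.6' compares as 10.6.0.
    version = version + (0,) * (3 - len(version))
    return {
        "webgl": version >= (10, 6, 0),
        "layers": version >= (10, 6, 3),
        # <video> acceleration is the stated exception: on for all versions.
        "video": True,
    }
```

For example, mac_acceleration_allowed("10.6") reports WebGL as allowed but layers acceleration as not allowed, because 10.6 is below the 10.6.3 threshold.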
Blocks: 1398807
No longer depends on: 1398807

Wayne and Matts: Thanks for showing me WHERE to affect TB's settings. I was able to enable "Use hardware acceleration when available". I then navigated to Config Editor only to find that layers.acceleration.disabled was already set to a false state.

I will see how TB performs.

Thanks for clarifying the settings for me.

Bob

I seem to have the same problem and have to force close thunderbird after about every other email sent. I have this problem already for months but it's getting worse and unbearable.

All:

I am pleased to report that after following Wayne Mery's advice in comment #349, all seems to be well.

Here's a snippet of Wayne's advice:

enable "Use hardware acceleration" - Thunderbird > Preferences > Advanced > General > mark the checkbox for "Use hardware acceleration"
** beta versions: Thunderbird > Preferences > General > (go to the bottom) > mark the checkbox for "Use hardware acceleration"

set layers.acceleration.disabled to false - Thunderbird > Preferences > Advanced > General > Config editor > paste "layers.acceleration.disabled" > double click to toggle to false

** beta versions: Thunderbird > Preferences > General > (go to the bottom) > Config editor > paste "layers.acceleration.disabled" > double click to toggle to false

In my case all I had to do was mark the checkbox for "Use hardware acceleration". The other choices had already been set in my otherwise un-altered copy of TB.

It's been over a week now and I've experienced no crashes. I'm not any part of an engineer, but "Use hardware acceleration" might indicate a timing problem. In the mainframe world, we might issue a POST to indicate the conclusion of some sub-task. I don't know the analog in the case of world-zilla, but perhaps a POST is or isn't being done.

In any case, thanks Wayne!

Bob

No luck .. tried variations of the suggested settings. Still crapping-out.

(In reply to webmaster2 from comment #355)

I am the poster of bug 1608733, which was marked as a duplicate of this one. I am still seeing the bug; however, I have not had time to try any of the workarounds (or the one workaround). TB 68.7.0 on Mac.

The workaround on 68 is trivial. If it doesn't help, please post an update here.

Tried workaround #2 above (enable "Use hardware acceleration"), and this seems to have fixed the problem for me. TYVM.

System description: MacBook Pro (16-inch, 2019), Catalina: 10.15.3 (19D76)

UPDATE: I am using workaround #2 and it appears to have fixed the problem.

TB 68.9.0. OS X 10.11.6.

I haven't tried the "workarounds".

I always (and only) experience this crash if I click away from a "sending" window back onto the main window while the outbound send is still in progress. If I wait until message delivery is completed I get no hangs.

p.s. TB 68.9.0, macOS 10.15.3. I've been experiencing this bug for years.

(In reply to Ray Bellis from comment #360)

I haven't tried the "workarounds".

I always (and only) experience this crash if I click away from a "sending" window back onto the main window while the outbound send is still in progress. If I wait until message delivery is completed I get no hangs.

That is expected behavior of this bug.

The workarounds work well and are easy to implement - go for it!

I have an end-user running TB 68.11 (64-bit) on Mac OS X 10.15.6 (new computer) who mentioned that his Thunderbird freezes/hangs 3-4 times per day, randomly, when sending a message with or without an attachment (IMAP/SMTP setup)... the compose message window remains open, and it seems his progress bar may be missing; when that happens he is unable to regain access to TB without killing it. After restarting TB, the message appears correctly in his IMAP mailbox Sent folder... so the message was effectively sent...

Do you think that could be related to this bug?

Would the best option be to wait for him to upgrade automatically to 78.2 when soon out, to fix his issue?

(In reply to Richard Leger from comment #364)

I have an end-user running TB 68.11 (64-bit) on Mac OS X 10.15.6 (new computer) who mentioned that his Thunderbird freezes/hangs 3-4 times per day, randomly, when sending a message with or without an attachment (IMAP/SMTP setup)... the compose message window remains open, and it seems his progress bar may be missing; when that happens he is unable to regain access to TB without killing it. After restarting TB, the message appears correctly in his IMAP mailbox Sent folder... so the message was effectively sent...

Do you think that could be related to this bug?

Yes, the symptoms match this bug - the message is sent but thunderbird is hung.

Would the best option be to wait for him to upgrade automatically to 78.2 when soon out, to fix his issue?

Yes, it is a good choice because one of the workarounds, hardware acceleration, is enabled by default in version 78.

(In reply to Scott from comment #345)

(In reply to ISHIKAWA, Chiaki from comment #344)

I think the problem here is simply this.

Among the active contributors to TB source code, nobody seems to be using Mac as the PC for daily mail exchange. PERIOD.

That is my understanding, as has been mentioned by others in this bug thread. There are not many Thunderbird software engineers, even fewer use macs, and none of the ones that do can reproduce the bug.

There are also not a massive number of bug reports about it (don't quote me on that, Wayne probably has a better idea how the hundreds for this compare to other bugs they experience).

I can contribute one every time it crashes if you need.
Thunderbird is my daily driver, on a Mac, OS 10.14.6.

Keep working the issue and if you can figure it out, bully for us! :D

In the meantime, I tread lightly and slowly when I'm sending and it seems to help.

I'm sorry I couldn't help more .. my Internetio broadcast is blowing-up and consuming me.

All of my correspondence with guests is done using Thunderbird. That will continue for as long as possible.

Thanks to everyone in the community for putting it forward.

Sincerely,

John Dale

DB2DOM.COM
PLAINSTRIBUNE.COM

For me, hardware acceleration fixed it.

OSX 10.15.6, daily driver.

TB is much more stable than the Mail.app that I finally gave up.

Thank you for the offer of crash reports. However, such actions will be a waste of time because ...

there is zero chance of this being fixed due to a) not affecting firefox, b) the problem being expected to go away in version 78 where HWA is enabled by default (automatic updates coming soon), and c) the rendering engine moving to webrender (in 2021?) which should resolve the issue for anyone not helped by version 78 ... and thus no effort will be invested in fixing the underlying cause, which thankfully we won't need.

Is there a way to turn on HWA in 68.12? On a Mac Mini that only has Intel HD Graphics 4000 (integrated)

Alternatively, when is version 78 expected to hit?

Because not expecting my mail app to randomly crash would be a nice thing.

(In reply to Anthony from comment #371)

Is there a way to turn on HWA in 68.12? On a Mac Mini that only has Intel HD Graphics 4000 (integrated)

comment 349 above

Alternatively, when is version 78 expected to hit?

you can get it today at https://www.thunderbird.net/

Summary: Hangs sending mail while copying message to Sent folder on Mac-only while displaying the progress bar. Deadlock in graphics on CGLClearDrawable. → Hangs sending mail while copying message to Sent folder on Mac-only while displaying the progress bar. Deadlock in graphics on CGLClearDrawable. Workaround comment 349
Attached file Sampling Process 440
Sorry, didn't get the e-mail notification of your response, 
Thunderbird had crashed again. 

Here's a sample:

I enabled HWA, and it seems persistent with app restart. Fingers crossed.
Thank you all.

I just discovered a simple workaround to this bug (these bugs?) that has worked for me nearly every time I've sent a message.

The workaround is, after I click the "Write" button on Thunderbird's main window to compose a new message, I minimize the main window. Once done composing, clicking "Send" works and does not freeze Thunderbird. Some more details...

I currently have Thunderbird 78 and first experienced these issues in version 68. Before that I had version 45.8.0 which did not have these issues.

In my case, because I've chosen not to keep this account's password permanently saved and don't want to use a Master Password, once I click the Send button, the main window automatically maximizes, now with a dialog overlaid asking for my email account's SMTP password. Normally this is when sending would freeze Thunderbird, but now I can enter my password and send without a problem. So it seems to me that this bug is triggered by a combination of which windows are open/visible, and when one, or perhaps multiple, popups get displayed.

This freezing bug only happens when interacting with my one normal password-authenticating account, which is my default account. I have two Oauth2 accounts which have sent with no problems so far, but I rarely use these so I can't be certain they wouldn't also have this problem.

I tried the workarounds suggested earlier in this thread, but I still had this problem whether setting layers.acceleration.disabled to true or false. Unfortunately I have an older Mac with an NVIDIA GeForce 9400M which doesn't appear to be supported by webrender. I did try force enabling webrender on both Thunderbird and Firefox and saw many graphical issues/artifacts. (Yes, I will upgrade my computer soon, likely not buying another Mac, but I feel it's still worth the underlying bug getting fixed in case it otherwise resurfaces despite the HWA workarounds!)

Though my workaround helps nearly every time, there are some instances after clicking Send, and despite entering my password correctly, that the compose window focuses again with yet another dialog box asking for my SMTP password. Sometimes Thunderbird freezes when this happens. When it doesn't freeze, I must minimize the main window again THEN type and submit my password on the compose window, otherwise I risk Thunderbird freezing at this point. There apparently is a separate problem (bug 1661337 ?) with Thunderbird not remembering a password throughout the session, which increases the likelihood of this Send bug resulting in a freeze. This also affects the automatic saving of drafts to the mail server, often with a message saying "Your draft message was not copied to your drafts folder (Drafts) due to network or file access errors. You can retry or save the draft locally to Local Folders/Drafts." To work around this I have either canceled saving drafts, or have sometimes been successful by maximizing the main window, clicking on the email account's "Drafts" folder which then prompts me for my account's password again, and then hopefully the draft gets saved successfully and I can minimize the main window again.

When I change my about:config setting for mailnews.show_send_progress to false, this bug doesn't happen as often, which reinforces the idea that this bug has to do with how Thunderbird manages its various windows and dialog popups. Just a big guess, but perhaps Thunderbird is creating multiple popups upon the click of "Send" which are competing for focus at the exact same time, triggering the crash?

Lastly, in case it helps to pinpoint the problem, my Mac's Console app shows many of these messages back-to-back while running Thunderbird:

thunderbird: NextSurface returning false because of invalid mSize (0, 0).

(In reply to kurtbarb from comment #375)

I just discovered a simple workaround to this bug (these bugs?) that has worked for me nearly every time I've sent a message.

Thanks for all the info.

I tried the workarounds suggested earlier in this thread, but I still had this problem whether setting layers.acceleration.disabled to true or false. Unfortunately I have an older Mac with an NVIDIA GeForce 9400M which doesn't appear to be supported by webrender. I did try force enabling webrender on both Thunderbird and Firefox and saw many graphical issues/artifacts. (Yes, I will upgrade my computer soon, likely not buying another Mac, but I feel it's still worth the underlying bug getting fixed in case it otherwise resurfaces despite the HWA workarounds!)

Can anyone on version 91 reproduce the crash? If yes, please post your crash ID(s) - it will be helpful.

(In reply to Wayne Mery (:wsmwk) from comment #376)

Can anyone on version 91 reproduce the crash? If yes, please post your crash ID(s) - it will be helpful.

I have set the value of layers.acceleration.disabled to TRUE (which is no longer the default in 91 and likely earlier, so it's set to FALSE aka DISABLE ACCELERATION) and restarted Thunderbird. I'll see if it locks-up in the near future.

Still reproduces?

Flags: needinfo?(chris)

(In reply to Wayne Mery (:wsmwk) from comment #378)

Still reproduces?

I just checked, and my current (102.5.0, 64-bit macOS 11.7/Big Sur) setup has:

layers.acceleration.disabled=true (non-default value)
layers.acceleration.force-enabled=false (default value)

I haven't experienced a crash in a good long time, so I suspect that disabling acceleration does in fact fix this.

But I haven't re-disabled disable-acceleration (to, in effect, ENABLE acceleration :) and re-tested it. If that would be helpful, I can do so and see if it starts locking-up again. I seem to recall it would happen multiple times daily so it shouldn't take long to reproduce. :)
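
Since the pref name is itself a negation, the double negative is easy to trip over. A tiny Python sketch of how the setting reads, under some loud assumptions: the function name is made up, it only models what the user requests, layers.acceleration.force-enabled is deliberately left out, and Thunderbird's own startup checks can still turn acceleration off.

```python
def acceleration_requested(layers_acceleration_disabled: bool) -> bool:
    """The pref name is negated: disabled=true means acceleration is OFF."""
    return not layers_acceleration_disabled


# The setup described above: disabled=true, so acceleration is not requested.
current = acceleration_requested(True)
# The default: disabled=false, so acceleration is requested.
default = acceleration_requested(False)
print(f"current setup requests HWA: {current}; default requests HWA: {default}")
```

So "re-disabling disable-acceleration" simply means setting layers.acceleration.disabled back to false.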

Flags: needinfo?(chris)

With version 102 and webrender graphics, we should be seeing a lot less of this.

Can anyone still reproduce?

Flags: needinfo?(web)
Flags: needinfo?(kurtbarb)
Flags: needinfo?(chris)

Wayne, I'm happy to re-configure and re-test. Can you tell me what settings you want for those two config values (and any other relevant ones)? I'll put my tb into that configuration and just run for a while to see if it still happens to me.

Flags: needinfo?(chris) → needinfo?(vseerror)

It has been a few years now since I have experienced this bug. I don't believe it is still affecting any of my office's machines.

See Also: → 1810216

Either enabling HWA in settings or layers.acceleration.force-enabled=true (default value). Although I'm not sure layers.acceleration.force-enabled is still hooked up.

Flags: needinfo?(web)
Flags: needinfo?(vseerror)
Flags: needinfo?(kurtbarb)
Flags: needinfo?(chris)

I've been running with layers.acceleration.disabled=true (not the default) and layers.acceleration.force-enabled=false (the current default) for months, now, and I see no crashes or hangs. Are you asking me to restore the default layers.acceleration.disabled=false and run for a while?

Flags: needinfo?(chris) → needinfo?(vseerror)

(In reply to Christopher Schultz from comment #384)

Are you asking me to restore the default layers.acceleration.disabled=false and run for a while?

Yes

Flags: needinfo?(vseerror) → needinfo?(chris)

Okay. Here are my settings, now:

version: 115.6.0 (64-bit)
OS: MacOS Ventura 13.6.1 x86-64
layers.acceleration.disabled false
layers.acceleration.force-enabled false

We'll see how things go.

Flags: needinfo?(chris)