Open Bug 1381485 Opened 3 years ago Updated 3 days ago

Hangs sending mail while copying message to Sent folder on Mac-only while displaying the progress bar. Deadlock in graphics on CGLClearDrawable.

Categories

(Thunderbird :: Message Compose Window, defect, critical)

54 Branch
x86_64
macOS
defect
Not set
critical

Tracking


Tracking Status
thunderbird_esr60 --- wontfix
thunderbird_esr68 ? affected
thunderbird53 --- unaffected
thunderbird54 --- wontfix
thunderbird56 --- wontfix
thunderbird57 --- wontfix
thunderbird58 --- wontfix
thunderbird59 --- wontfix
thunderbird60 --- wontfix
thunderbird67 --- wontfix
thunderbird71 --- wontfix
thunderbird72 --- affected

People

(Reporter: jamesrome, Unassigned, NeedInfo)

References

(Depends on 1 open bug)

Details

(Keywords: hang, regression, regressionwindow-wanted, Whiteboard: [regression:TB54?][duptome][workaround: comment 104])

User Story

*** We need a one-day regression range using daily builds. Please pick a build date that might fail and test it. ***

* initially thought to be: http://archive.mozilla.org/pub/thunderbird/nightly/2017/03/2017-03-08-03-02-29-comm-central/  to http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-12-03-02-06-comm-central/
* We have no idea on what date the flaw starts; we only know some builds that work and some that don't.

Working backwards:
* fail: 2017-06-07 comment 193, comment 216  Scott, Christopher
* fail: 2017-06-06   Richard
* fail: 2017-06-05   Richard
* ?? : 2017-06-04   Chris
* fail: 2017-06-03  Scott
* fail: 2017-06-02  Scott
* fail: 2017-06-01  Scott
* fail: 2017-05-31  Scott
* fail: 2017-05-24  Christopher  
* ?? : 2017-05-17   ?
* ?? : 2017-05-14   ?
* ?? : 2017-05-01  Christopher
* fail: 2017-04-01  Scott
* fail: 2017-03-22  Scott
* ??: 2017-03-15  Scott
* ??: 2017-03-08  ??
* works for two weeks: 2017-03-01  Scott
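
The list above amounts to a manual bisection over nightly build dates. As a minimal sketch (the function name `next_build_to_test` is hypothetical, and the example dates assume 2017-03-01 is the last known-good build and 2017-03-22 the earliest confirmed failure from the list), the next date to test could be picked like this:

```python
from datetime import date, timedelta
from typing import Optional


def next_build_to_test(last_good: date, first_bad: date) -> Optional[date]:
    """Return the midpoint nightly date between a good and a bad build.

    Returns None once the range is one day wide, meaning the regression
    landed in the build dated first_bad.
    """
    gap = (first_bad - last_good).days
    if gap <= 1:
        return None
    return last_good + timedelta(days=gap // 2)


# Dates taken from the "Working backwards" list above.
print(next_build_to_test(date(2017, 3, 1), date(2017, 3, 22)))  # 2017-03-11
```

Because the hang is intermittent (one build "works for two weeks"), a build should only be marked good after a long observation period. A premature "good" verdict would send the bisection in the wrong direction.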

User configs (name, computer, OS version, graphics, monitor(s)) :
* robert.p              MacBookPro14,2       10.13.6 0x5927  Built-In Retina LCD 
* joduinn              MacBookPro15-2018  10.14.6   Radeon Pro 560X 4 GB Intel UHD Graphics 630 
* Bob Shimizu                                         10.14.16  <unknown graphics>  Apple Thunderbolt Display(3) 
* Aaron                Mac Mini (Late 2014)  10.14.6   Intel Iris 1536 MB   27" Thunderbolt display
* Robert Shimizu             Mac Pro          10.14.6   Unknown graphics  Thunderbolt 27" (3)
* Scott James              iMac Mid 2017      10.13.6   Radeon Pro 575   Integrated 5120 x 2880 Apple Display
* kimlove  iMac Retina 5K 27-inch 2017  10.13.6  Radeon Pro 580  imac desktop
* Marc De Graef     Mac Pro (Late 2013)  10.13.6  3.5 GHz 6-core Intel Xeon E5  LG Ultrawide Display
* Marc De Graef   Macbook Pro (Retina, 15" Mid 2015)  10.13.6  2.8GHz Intel Core i7  Built-in Display 
* Ludovic Rousseau ?  "it takes a week"
* Christopher Schultz  ?
* Richard Leger ?

- yahoo and gmail imap
- spotlight doesn't matter
- bumping file handles doesn't help
> 54.0B3 works - no crashing.
> 55.0b2 dies


**beta feedback**
- James: still fails with 67 beta  (this bug)
- Scott: unknown  (this bug)
- degraef: works with 66? beta  (bug 1525001)
- cinymini: unknown  (bug 1422251) 

**Reports:**
https://support.mozilla.org/en-US/questions/1273927
https://discourse.mozilla.org/t/thunderbird-freezes-and-i-have-to-force-quit/46256
https://support.mozilla.org/en-US/questions/1265983
https://support.mozilla.org/en-US/questions/1256159 (claims safe mode helped)
https://support.mozilla.org/en-US/questions/1254261
https://support.mozilla.org/en-US/questions/1247137 (reverted to TB52 - good tester)
https://support.mozilla.org/en-US/questions/1246950
https://support.mozilla.org/en-US/questions/1246643  1/13/2019
https://support.mozilla.org/en-US/questions/1242885 **disabling send progress helps**
https://support.mozilla.org/en-US/questions/1241167 (when adding attachment)
https://support.mozilla.org/en-US/questions/1237929 (likely unrelated because this is on win10 and caused by signature file)
https://support.mozilla.org/en-US/questions/1234828 9/21/2018
https://support.mozilla.org/en-US/questions/1241380
https://support.mozilla.org/en-US/questions/1234041 - UCD reverted to TB52.0
https://support.mozilla.org/en-US/questions/1172364 - three users left Thunderbird
https://support.mozilla.org/en-US/questions/1170401 - 52.2.1 Arthur frequent hangs  8/7/2017 

Similar or Mac hang issue: 
- https://support.mozilla.org/en-US/questions/1246950
- https://support.mozilla.org/en-US/questions/1246494

Attachments

(22 files)

1.70 MB, text/plain
136.43 KB, application/zip
61.32 KB, text/plain
5.74 MB, text/plain
126.58 KB, application/zip
100.80 KB, text/plain
1.35 MB, text/plain
107.08 KB, text/plain
115.67 KB, application/zip
117.55 KB, application/zip
120.33 KB, application/zip
146.26 KB, application/zip
4.55 KB, image/png
6.28 KB, image/png
1.10 MB, text/plain
31.13 KB, image/png
18.76 KB, image/png
51.05 KB, text/plain
1.74 MB, text/plain
1.72 MB, text/plain
95.15 KB, image/jpeg
1.66 MB, text/plain
Attached file TBjang.txt
see attached
Does "hangs frequently" mean it hangs and you must kill the process? Or does it mean it hangs for x minutes and then recovers? Please try 55
Flags: needinfo?(jamesrome)
I have to kill the process.
Flags: needinfo?(jamesrome)
> Please try 55 ... started in safe mode
Severity: major → critical
Keywords: hang
It upgraded to 55. I'll see if it hangs still
It still hangs frequently in normal mode. And I have the latest MacOS 10.12.6.
New apple report attached.
Attached file TBHang2.txt.zip
Attached file TBfiles.txt
It hung again in safe mode. One issue may be the number of open files (see attached). TB seems to load every one of my fonts, and I have a huge number since I do desktop publishing. There is no reason for this, and certainly zonks the system, which (I think) has some limit on the number of open files. TB 55.0b2
Attached file TBSpindump.txt
It hung right away in normal mode. I attach a spindump.
Attached file TBHang3.txt.zip
It hung again just after sending mail.
It seems to be happening when I send google mail. It did it again.
It is now hanging every time I send mail. It's a lot worse since the upgrade to macOS 10.12.6
(In reply to James Rome from comment #11)
> It is now hanging every time I send mail. It's a lot worse since the upgrade
> to macOS 10.12.6

does that mean non-google mail?
with or without addons?

(In reply to James Rome from comment #7)
> Created attachment 8889048 [details]
> TBfiles.txt
> 
> It hung again in safe mode. One issue may be the number of open files (see
> attached). TB seems to load every one of my fonts, and I have a huge number
> since I do desktop publishing. There is no reason for this, and certainly
> zonks the system, 

Your fonts situation is a good piece of info.  That's the way gecko works (it's not Thunderbird code) and is unavoidable, so the same thing will happen in Firefox. It does mean many fd will be open, and perhaps cause high memory usage.

> which (I think) has some limit on the number of open files. TB 55.0b2

You can make a modest increase to ulimit. See bug 800279 comment 1.

But it is unclear whether this is related to your hanging situation.
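
To put numbers on the file-handle question, here is a rough sketch. It assumes the attached file listing is `lsof`-style text with the file path at the end of each row (the sample rows and helper name below are hypothetical). Note that `resource.getrlimit` reports the limit of the Python process itself, which is inherited from the shell; Thunderbird's own limit may differ.

```python
import resource

FONT_EXTENSIONS = (".ttf", ".otf", ".ttc", ".dfont")


def count_font_files(lsof_output: str) -> int:
    """Count rows whose trailing path ends in a common font extension."""
    return sum(
        1
        for line in lsof_output.splitlines()
        if line.strip().lower().endswith(FONT_EXTENSIONS)
    )


# Hypothetical lsof-style rows, for illustration only.
sample = """\
thunderbi 3467 user 45r REG 1,4 120000 111 /Library/Fonts/Example.ttf
thunderbi 3467 user 46r REG 1,4 340000 112 /Library/Fonts/Other.otf
thunderbi 3467 user 47u REG 1,4 900000 113 /Users/user/Mail/INBOX.msf
"""

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")
print(f"open font files in sample: {count_font_files(sample)}")
```

If the count of open files stays well below the soft limit while a hang occurs, descriptor exhaustion becomes a less likely explanation.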
Flags: needinfo?(jamesrome)
Summary: hangs frequently → Thunderbird hangs frequently
I reverted to the release channel after the last kerfuffle with Lightning, and this problem no longer occurs.
Flags: needinfo?(jamesrome)
Does this Mac have 13 imap accounts, or is that the Windows 10 system?
Did the hang occur only when sending?
Flags: needinfo?(jamesrome)
Summary: Thunderbird hangs frequently → Thunderbird beta hangs frequently while sending mail
My Mac has 11 IMAP accounts. I usually do not activate them all on Windows.
Flags: needinfo?(jamesrome)
James,
Please try an increased ulimit. See bug 800279 comment 1.
Is it more prone to happen with gmail, or happens only with gmail?
Did it also happen with 53 beta? Or, had you not used beta prior to 54?

(marking regression, because it does not happen for user with release build)
Flags: needinfo?(jamesrome)
Sorry, I reverted to the release build because I could not do anything...
Flags: needinfo?(jamesrome)
Had you been using TB53 beta prior to comment 0 and not had problems?
Flags: needinfo?(jamesrome)
Whiteboard: [regression:TB54?]
Don't remember. I gave it up when Lightning died.
Flags: needinfo?(jamesrome)
Similar report in bug 1343480 but it is version 45. And bug 1400568 version 52. But no one has ponied up with a regression range.

If you could retest - it would be useful to know whether it is more prone to happen with gmail, or happens only with gmail?

(bp-a454d2eb-107b-45cf-88de-d00c50170518 indicates you were at one time using 53 beta)
Well, after struggling to find provider for Google calendar (try Googling for the beta builds), I have 56.0 b4 running, and so far I can send mail. Why is Provider for Google Calendar called gdata provider? How is one supposed to find it? Change one name or the other.
56.0b4 just hung again sending gmail.
The latest hang was on 57.90b1. TB opens all of my hundreds of font files. Why???
57.0b1
(In reply to James Rome from comment #26)
> TB opens all of my hundreds of font files.
> Why???

Nothing to do with Thunderbird. That's the way mozilla Gecko handles Mac
That's bad. It might be running out of file handles.
Attached file TBHang.txt
TB57.0b2 hung again after sending mail via gmail.
The sample from today
Can you remove any font files from your system?
Do you have virtual folders that iterate over many other folders?
Flags: needinfo?(jamesrome)
No virtual folders. If I remove fonts, I can't get them back easily. It hung today without sending an e-mail.
But surely opening all the font files is a bug. It must slow things down and use more system resources.
Flags: needinfo?(jamesrome)
(In reply to James Rome from comment #33)
> No virtual folders. If I remove fonts, I can't get them back easily.

I suggest only removing fonts that you do not use.

> But surely opening all the font files is a bug. 

> It must slow things down 

Actually, no.

> and use more system resources.

no, not memory, and not cpu, afaik. only open file handles.


You should do comment 18
> 
> You should do comment 18

... and does disabling spotlight help?
Flags: needinfo?(jamesrome)
How do I disable spotlight?
Flags: needinfo?(jamesrome)
So far, adding file handles fixes things. There is no need to open all my font files, and IMHO that is a bug.
(In reply to James Rome from comment #36)
> How do I disable spotlight?

bug 1343480 comment 1
Sorry, it hung again after sending a Gmail, and with Spotlight disabled, and with the file numbers boosted.
Attached file TBHang.txt.zip
This time it hung after sending Yahoo IMAP mail
Attached file TBhang12-16.txt.zip
The Apple report
This still happens daily on 58.0b3
Attached file tbhang.txt.zip
Every day it hangs many times. This is getting annoying.
Probably the best chance of finding the cause is for you to find the regression range using nightly builds.
For example starting with version 54 dmg nightly found at http://archive.mozilla.org/pub/thunderbird/nightly/2017/03/2017-03-01-03-02-08-comm-central/
54.0B3 seems to work without crashing.
55.0b2 dies
> 54.0B3 seems to work without crashing.
> 55.0b2 dies

That's a great start. So we need your help to determine the one day range in the 55.0a1 series.
http://archive.mozilla.org/pub/thunderbird/nightly/2017/03/2017-03-08-03-02-29-comm-central/
http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-12-03-02-06-comm-central/
See Also: → 1422251, 1400568
I need a non-daily build. When I install Lightning and reboot, it updates to 5.8.
The whole way that TB and Lightning are stored needs fixing. The correct Lightning builds and calendar provider should be in the same directory as TB.  It is really hard to find what goes with what.
There is no 55.0b1 to test that is not a nightly that self-updates. So the problem happened from 54 to 55
I believe that is when I reported it. But looking at the above, it seems that 54 also crashed.
I have sent in many hang reports. Don't they point to the issue?
> I have sent in many hang reports. Don't they point to the issue?

If we had someone to read it. Right now we don't. 

But you have started to narrow a regression range, which could ultimately be more useful.  Unfortunately one cannot get a regression range without testing daily builds.

> The whole way that TB and Lightning are stored needs fixing. The correct Lightning builds and calendar provider should be in the same directory as TB.  It is really hard to find what goes with what.

Have you used dailies on a regular basis?  I never have a problem with lightning in daily.  You just install the daily and if lightning doesn't behave install the correct nightly.  All thunderbird daily installs WITHIN THE SAME VERSION from then on should just work.
(I failed to finish the thought)

You just install the Thunderbird daily and if lightning doesn't behave install the correct version lightning.  (Or remove the currently installed lightning and install thunderbird daily a second time)
I have been using the candidate builds because the dailies self-update instantly. The candidate builds always say Lightning is incompatible, and I must find the correct one. The Lightning-TB version page does not list gdata provider, so I must figure that out too. Yes, they are in the lightning directory, but I have downloaded every version, and there is no way to tell from the files which are correct. Also, gdata provider throws all sorts of errors if it is the wrong version, but this is not detected in the install process. Unlike Lightning, gdata provider is not disabled.
It may be time for me to switch to outlook
> I have been using the candidate builds because the dailies self-update instantly

This is hardly a show stopper. One simply disables updates


(In reply to James Rome from comment #45)
> 54.0B3 seems to work without crashing.

Just backtracking a bit, you reported this issue on July 17.  It seems to me by then you would have run 54.0b1 and b2 without problems and first saw the problem using 54.0b3.  Is that correct?
User Story: (updated)
Summary: Thunderbird beta hangs frequently while sending mail → Thunderbird beta hangs frequently while sending imap mail
I would assume so, but it was a while ago.
(In reply to James Rome from comment #54)
> I would assume so [that 54.0b3 is the first that failed], but it was a while ago.

But that [if you were running 54.0b3 in comment 0] does not square with comment 45 where "54.0B3 seems to work without crashing."
Hopefully you are using https://releases.mozilla.org/pub/thunderbird/releases/

In any event, suggest your next step be setting the yahoo and gmail account to save sent messages to a local folder, so that we can determine whether this situation involves the imap sent folder.
Good suggestion. I changed the sent folder to local now on 58.0b3. It always manages to send the mail, so you might be correct in your hunch.
Also remember that there has been a long-standing bug about copying IMAP mail to sent folder. Maybe when that was fixed, it caused this issue.
I do believe you have pinned down the problem. I have not had a hang since I made the sent folders local. I am running 58.0b3.
According to what I have read, gmail automatically saves outgoing mail to gmail Sent folder. So there should be no need for Thunderbird to be set to save to Sent (even though it is default).  I don't know about yahoo.
But go back to that bug about copying sent mail to imap folder. It always sends the mail successfully, but hangs after or during the next step. Still have not had a hang using local folders.
Sure, there is still a bug.  Which is why it is so important to us for you to determine a regression range.
Component: General → Networking: IMAP
Product: Thunderbird → MailNews Core
Summary: Thunderbird beta hangs frequently while sending imap mail → Hangs frequently while sending imap mail while copying message to Sent folder
Version: 54 Branch → 54
Blocks: 1402841
It has started to hang again on 58.0b3. Twice today after sending Google mail
Can you try a nightly build from http://archive.mozilla.org/pub/thunderbird/nightly/latest-comm-central/thunderbird-60.0a1.en-US.mac.dmg

If so, what are the results?
Flags: needinfo?(jamesrome)
It just hung again with 60 nightly. I did move the sent messages back to gmail from local.
Flags: needinfo?(jamesrome)
Something else is happening with TB 60 also. When it hangs, and I Force quit it, the app in my /Applications/Thunderbird Daily.app gets trashed. I cannot reopen it until I replace it with the version I downloaded. The same thing happens when the daily tries to update itself. TB never restarts, and I must update it manually.
And moving sent mail to local did not help the hangs.
(In reply to James Rome from comment #65)
>The same thing
> happens when the daily tries to update itself. TB never restarts, and I must
> update it manually.
I confirm this issue. For the past week it has happened every time I try to update Daily via About Daily > Check for updates.
Perhaps the two of you can conspire to find the regression range.


(In reply to James Rome from comment #64)
> It just hung again with 60 nightly. I did move the sent messages back to
> gmail from local.

(previously sent in PM) to get the gdata and lightning addons to match the nightly being tested
https://ftp.mozilla.org/pub/calendar/lightning/nightly/latest-comm-central/
m_kato, can you make anything of the stacks here, or in bug 1422251 and bug 1400568 (which are both version 52)


I don't know that it is related, but for completeness, ref Bug 1170646 - A few M-C fixes to handle short read in Cache code ( from [META] Failure to deal with short read

There is also a newly reported Bug 1440716 - Hanging (imap and smtp) connections and hang on mac OS X
Flags: needinfo?(m_kato)
See Also: → 1444739
(In reply to Wayne Mery (:wsmwk) from comment #68)
> m_kato, can you make anything of the stacks here, or in bug 1422251 and bug
> 1400568 (which are both version 52)

Maybe these are the same deadlock in CGLClearDrawable.  I don't know why this occurs.
Flags: needinfo?(m_kato)
FWIW bug 1369207 reads "On macOS installations where opening a window logs: |[GFX1-]: [OPENGL] Failed to init compositor with reason: FEATURE_FAILURE_OPENGL_CREATE_CONTEXT|, quickly closing the window can deadlock in CGLClearDrawable."
Summary: Hangs frequently while sending imap mail while copying message to Sent folder → Hangs frequently while sending imap mail while copying message to Sent folder on Mac. deadlock on CGLClearDrawable?
(In reply to Wayne Mery (:wsmwk) from comment #47)
> > 54.0B3 seems to work without crashing.
> > 55.0b2 dies
> 
> That's a great start. So we need your help to determine the one day range in
> the 55.0a1 series.
> http://archive.mozilla.org/pub/thunderbird/nightly/2017/03/2017-03-08-03-02-29-comm-central/
> http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-12-03-02-06-comm-central/

James,

We still need a better regression range than
http://archive.mozilla.org/pub/thunderbird/nightly/2017/03/2017-03-08-03-02-29-comm-central/
http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-12-03-02-06-comm-central/

It seems I forgot to mention the regression tool https://mozilla.github.io/mozregression/quickstart.html
Flags: needinfo?(jamesrome)
Alas, the tool is for windows only. Because it is so difficult to know and get the correct version of lightning and google provider, I have given up on nightlies. They should be packaged together in the same download directory IMHO.
Flags: needinfo?(jamesrome)
I've been seeing this occasionally, in Daily, for quite some time, including today's build.

I can't easily give a regression window: sometimes it happens a few times a day, sometimes once a week.

abridged hang txt:

OS Version:      Mac OS X 10.13.6 (Build 17G65)
Architecture:    x86_64h

Path:            /Applications/Thunderbird Daily.app/Contents/MacOS/thunderbird
Identifier:      org.mozilla.thunderbird daily
Version:         63.0a1 (63.0a1)

Duration:        4.30s (process was unresponsive for 10 seconds before sampling)

Hardware model:  MacBookPro14,1
Active cpus:     4

Heaviest stack for the main thread of the target process:
  43  start + 1 (libdyld.dylib + 4117) [0x7fff791f6015]
  43  main + 890 (thunderbird + 4474) [0x10681617a]
  43  ??? (<A5CA6BA1-E112-3B19-B3AB-EB1683128CAE> + 61943681) [0x10a9c8f81]
  43  ??? (<A5CA6BA1-E112-3B19-B3AB-EB1683128CAE> + 61941996) [0x10a9c88ec]
  43  ??? (<A5CA6BA1-E112-3B19-B3AB-EB1683128CAE> + 61939604) [0x10a9c7f94]
  43  ??? (<A5CA6BA1-E112-3B19-B3AB-EB1683128CAE> + 61180073) [0x10a90e8a9]
  43  ??? (<A5CA6BA1-E112-3B19-B3AB-EB1683128CAE> + 41233963) [0x109608e2b]
  43  -[NSApplication run] + 764 (AppKit + 223365) [0x7fff4e949885]
  43  ??? (<A5CA6BA1-E112-3B19-B3AB-EB1683128CAE> + 41229596) [0x109607d1c]
  43  -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 3044 (AppKit + 8224308) [0x7fff4f0eae34]
  43  _DPSNextEvent + 2085 (AppKit + 268915) [0x7fff4e954a73]
  43  _BlockUntilNextEventMatchingListInModeWithFilter + 64 (HIToolbox + 194692) [0x7fff506a3884]
  43  ReceiveNextEventCommon + 613 (HIToolbox + 195334) [0x7fff506a3b06]
  43  RunCurrentEventLoopInMode + 286 (HIToolbox + 195990) [0x7fff506a3d96]
  43  CFRunLoopRunSpecific + 483 (CoreFoundation + 545107) [0x7fff513b9153]
  43  __CFRunLoopRun + 1293 (CoreFoundation + 547053) [0x7fff513b98ed]
  43  __CFRunLoopDoSources0 + 300 (CoreFoundation + 550092) [0x7fff513ba4cc]
  43  __CFRunLoopDoSource0 + 108 (CoreFoundation + 1430572) [0x7fff5149142c]
  43  __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17 (CoreFoundation + 670225) [0x7fff513d7a11]
  43  __NSThreadPerformPerform + 334 (Foundation + 426677) [0x7fff534fd2b5]
  43  ??? (<A5CA6BA1-E112-3B19-B3AB-EB1683128CAE> + 41001366) [0x1095d0196]
  43  -[NSView removeFromSuperview] + 270 (AppKit + 157615) [0x7fff4e9397af]
  43  -[NSView _setWindow:] + 2356 (AppKit + 147783) [0x7fff4e937147]
  43  -[NSSurface setWindow:] + 53 (AppKit + 2268392) [0x7fff4eb3cce8]
  43  -[NSSurface _disposeSurface] + 152 (AppKit + 2269311) [0x7fff4eb3d07f]
  43  -[NSNotificationCenter postNotificationName:object:userInfo:] + 66 (Foundation + 26823) [0x7fff5349b8c7]
  43  _CFXNotificationPost + 599 (CoreFoundation + 358839) [0x7fff5138b9b7]
  43  -[_CFXNotificationRegistrar find:object:observer:enumerator:] + 1664 (CoreFoundation + 362624) [0x7fff5138c880]
  43  ___CFXNotificationPost_block_invoke + 225 (CoreFoundation + 633569) [0x7fff513ceae1]
  43  _CFXRegistrationPost + 458 (CoreFoundation + 634282) [0x7fff513cedaa]
  43  __CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER__ + 12 (CoreFoundation + 634588) [0x7fff513ceedc]
  43  CGLClearDrawable + 41 (OpenGL + 28651) [0x7fff5b8c1feb]
  43  _pthread_mutex_lock_slow + 253 (libsystem_pthread.dylib + 5320) [0x7fff7950c4c8]
  43  __psynch_mutexwait + 10 (libsystem_kernel.dylib + 117318) [0x7fff79346a46]
 *43  psynch_mtxcontinue + 0 (pthread + 31325) [0xffffff7f81967a5d] (blocked by pthread mutex owned by thunderbird (Thunderbird Daily) [3467] thread 0x20330)


Process:         thunderbird (Thunderbird Daily) [3467]
Path:            /Applications/Thunderbird Daily.app/Contents/MacOS/thunderbird
Architecture:    x86_64
Parent:          launchd [1]
UID:             501
Task size:       1273.60 MB
CPU Time:        0.055s (124.5M cycles, 25.3M instructions, 4.92c/i)
Note:            Unresponsive for 10 seconds before sampling
Note:            1 idle work queue thread omitted
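
For anyone comparing several of the attached hang reports, a small sketch along these lines could check whether a report shows the same pattern, that is, a mutex wait beneath `CGLClearDrawable`. The regular expression and helper names are assumptions based only on the frame layout visible in the excerpt above; frames without symbols (`???`) are skipped.

```python
import re

# Matches frames like: "  43  CGLClearDrawable + 41 (OpenGL + 28651) [0x...]"
FRAME_RE = re.compile(r"^\s*\*?\d+\s+(.+?) \+ \d+ \((.+?) \+ \d+\)")


def frames(report: str) -> list:
    """Return (symbol, library) pairs for every symbolicated frame."""
    found = []
    for line in report.splitlines():
        match = FRAME_RE.match(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


def waits_on_mutex_under(report: str, symbol: str) -> bool:
    """True if __psynch_mutexwait appears deeper in the stack than symbol."""
    names = [name for name, _ in frames(report)]
    if symbol not in names:
        return False
    return "__psynch_mutexwait" in names[names.index(symbol) + 1:]


# Three frames copied from the stack above.
SAMPLE = """\
  43  CGLClearDrawable + 41 (OpenGL + 28651) [0x7fff5b8c1feb]
  43  _pthread_mutex_lock_slow + 253 (libsystem_pthread.dylib + 5320) [0x7fff7950c4c8]
  43  __psynch_mutexwait + 10 (libsystem_kernel.dylib + 117318) [0x7fff79346a46]
"""

print(waits_on_mutex_under(SAMPLE, "CGLClearDrawable"))  # True
```

This only identifies the waiting side. Finding the thread that owns the mutex (thread 0x20330 in the excerpt) still requires reading the full report.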
> They should be packaged together in the same download directory IMHO

Should not be difficult - it should only change every 10 weeks when version numbers change. And if you have them bookmarked, then you are good for when the version does change ...

Everyone in the other bugs seems to be gone for the summer :(  so I don't think we are going to make progress for some months.
Duplicate of this bug: 1400568
Glenn from bug 1400568 writes "It’s a lot less frequent than it was, although I’m not sure if I’ve just learned to work around it. It does still happen once a week or so, when it used to happen three or four times a day."

So for most of you it has stopped, or it is better in version 60?
Summary: Hangs frequently while sending imap mail while copying message to Sent folder on Mac. deadlock on CGLClearDrawable? → Hangs frequently while sending imap mail while copying message to imap Sent folder on Mac. No problem if Sent is set to local folder. deadlock on CGLClearDrawable?
It is less frequent, but it still does it.
Not directly linked to this bug but may be linked to it as the closest I found that was updated recently... 

Today on Win 10 Pro, I got crash report bp-8017d419-a7d9-47ba-ba11-f9a3a0181002 (Thunderbird 60.0b11 Crash Report [@ shutdownhang | ntdll.dll@0x6a28c ]) after closing Thunderbird. It hung on closing because it had gotten stuck trying to save a copy of a sent message to the Sent folder on the server via IMAP. The progress bar reached 100% and then the processing never finished...

Before closing TB, I tried moving between different folders (which in some cases helps by reconnecting/re-authenticating to the server, as indicated in the status bar), flushing DNS, and switching to offline mode and back to online mode (does that re-initialise the socket/connection to the server?), etc., but nothing worked. TB was stuck saving the copy of the sent message to the Sent folder. I could still send another message, but the same issue arose with the second message. The messages were sent and received by the recipients, but no copy could be kept in Sent or in a local folder (it would be nice to have such an option when the copy to the Sent folder on the server is incomplete, to avoid losing messages).

The only thing that could be done was to close Thunderbird, which then crashed because it hung on closing.

As indicated before in some posts, I am a VPN user, which may cause the IP of the mail server to change. I wonder whether TB fails to handle that situation by not updating its DNS/socket/cached server connection information, causing it to hang in some sort of loop where the only way out is to close and re-open the application, which in this case means losing data (the copy of the message in the Sent folder). Stopping or restarting the VPN had no effect, though.

I would expect that using the disconnect/reconnect button in Thunderbird would suffice to get out of such a processing/loop/hang situation, but it does not... unfortunately for the user... and data is lost :-)

Hope this information may help sort out the issue in the future or narrow it down, so it can be fixed once and for all. Saving a copy of a sent message in the Sent folder is a basic feature that should not fail; if it does, Thunderbird should clearly indicate why with a clear error message, and allow the user to retry or regain access to the message so it can be saved later, since it has already been sent.
Another possibility I thought about is that the computer goes to sleep and, when it wakes up, something in Thunderbird does not wake up or is no longer valid, causing the issue when saving a copy to the Sent folder. But since the issue is not systematic and happens randomly, it is hard to track down.
A similar bug has been reported:
Bug 413240 "Save to Sent was successful, but "Copying to sent folder" doesn't finish, or zombie "Copy complete""
My computer never goes to sleep, so it is not that.
I haven't looked at all the attachments to this bug but most seem to be Apple logs. Maybe a TB IMAP log would also be helpful since the problem seems to be saving to the IMAP Sent mailbox.

If an error is reported while saving to Sent, the patch here should allow the user to choose to save to Local Folders:
Bug 1366591. This also applies to saving Drafts and Templates.

But not sure this fix is in the versions being tested by the reporter(s) of this bug.
As the original author of this bug report, I wanted to chime in with a bit of information in response to recent comments.  First, I'm glad it's getting some continued attention.  It's the kind of bug where you lose work -- it's effectively a crash bug, since you have to force quit.

1. It has repeated consistently with POP as well as IMAP.
2. Possibly related to Sent folder, but upon restart, the copy is always safely in the Sent folder (seems more like it deadlocks on cleanup, closing the progress window, rather than the task itself)
3. Earlier in this bug are extensive stack traces that point to the exact pthread mutexes which deadlock -- that should help to know where to look.
4. It seems like a race condition around mutexes -- always hard to find/fix.
https://support.mozilla.org/en-US/questions/1237929 states the cause was a signature file.  

I wonder if that is the case with some other reports
User Story: (updated)
Depends on: 1398807

It is still doing this on 65.0b4

Duplicate of this bug: 1531771

This bug report is of course reported long before TB60, but bug 1525001 comment 4 states using beta resolves the issue he reported against TB60.

Can you try the newer beta 66 from https://www.thunderbird.net/en-US/channel/ and report your results?
(fair warning, most addons won't work - calendar will)

(In reply to Wayne Mery (:wsmwk) from comment #92)

Eckard, feel free to try as well.

Note: You'll need to get it from https://www.thunderbird.net/en-US/channel/ AND it might start a new profile

Your link leads to the actual 67.0b1 beta-version.
Should I try the 66.0b3 beta-version from http://ftp.mozilla.org/pub/thunderbird/releases/66.0b3/mac/ or 67.0b1 in a new profile?

Eckard, I suspect both 66 and 67 should be tested. Ultimately, for whoever can reproduce the failure, we need to know the range in which it was fixed (if it was) so we can know what needs to be uplifted to esr.

There is a link for the Google Calendar on the Release Notes pages:
https://www.thunderbird.net/en-US/thunderbird/67.0beta/releasenotes/
https://www.thunderbird.net/en-US/thunderbird/66.0beta/releasenotes/
Second last item. Right-click to save the XPI file locally, then install add-on manually from file.

67b1 is working with gdata provider. But the crashes have grown more infrequent, so it will take time to know if the problem is fixed.

It is not fixed. 67b1 just hung after I sent gmail.

(In reply to Wayne Mery (:wsmwk) from comment #94)

I cannot reproduce the issue in new profiles with either TB 66.0b3 or TB 67.0b1 (two different IMAP accounts with the French ISP Free, no GMail or Yahoo accounts).

Are any more crash reports needed? I have several from my machine (with truncated traces), and one non-truncated from a coworker's machine. Both are modern iMacs - one is almost brand new, with an old Thunderbird profile, while the other is an older machine with a very new Thunderbird profile.

Both report the same crash on CGLClearDrawable + 44. Most of the users in my office (all Mac based) have been suffering from this bug for the last 2-4+ months and it seems to be going up in frequency. We're eager to see a fix.

I can offer up some of my time for testing purposes too.

Thanks,

(In reply to James Rome from comment #98)

> It is not fixed. 67b1 just hung after I sent gmail.

What happens if you point Sent folder to a local folder in account settings?

User Story: (updated)
Flags: needinfo?(jamesrome)
See Also: → 1551317

(In reply to Scott from comment #100)

...
Both report the same crash on CGLClearDrawable + 44. Most of the users in my office (all mac based) have been suffering ...
I can offer up some of my time for testing purposes too.

Great! Let's start at the same point as others:

Flags: needinfo?(sjames)

I pointed to local now. We shall see.

Flags: needinfo?(jamesrome)

https://support.mozilla.org/en-US/questions/1242885#answer-1179483 suggests disabling send progress helps, which might implicate graphics

User Story: (updated)
Flags: needinfo?(mozilla)

(In reply to Wayne from comment #102)

Hi Wayne,

#1 Most of my users do not save Sent mail at all, so I think this can be ruled out. We use O365, which rolled out automatic filing of sent emails to "Sent Items" folder. We had to disable Thunderbird saving an extra copy - to avoid duplicates - in Fall 2018 (around Sept 17th is when it started)

#2 I installed the beta version yesterday, I haven't had any hangs yet, but I haven't had to send many emails. In wait and see mode currently.

#3 I will look into this setting and report back.

Flags: needinfo?(sjames)

(In reply to Wayne from comment #102)

Further to #3:

My own preferences are set somewhat correctly, with sendInBackground set to False and show_send_progress set to True, but offline.send.unsent_messages is set to 0, not 1 (as recommended in the other bug thread). However, the reporter in that thread indicated he used the opposite settings and made his highly reproducible crash go away.

I have checked my own machine (both original v60 and beta) and both have matching settings. I also checked with one of my colleagues who is one of the only people in my office who does not experience the problem and he has the same settings as myself.

I will try toggling the settings once I experience another crash.

Thanks!

(In reply to Wayne from comment #102)

Beta Build 67.0b3 (64 Bit) just crashed for me.

It was running a fresh profile, IMAP with no local folders, and it was saving to an IMAP Sent folder (I hadn't thought to disable that; it is now disabled).

I have an un-truncated crash log for it too, but it reports the same CGLClearDrawable error (+41 instead of +44).

Excerpt:

55 CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER + 12 (CoreFoundation + 634300) [0x7fff3f561dbc]
55 CGLClearDrawable + 41 (OpenGL + 28651) [0x7fff49a56feb]
55 _pthread_mutex_lock_slow + 253 (libsystem_pthread.dylib + 5320) [0x7fff677924c8]
55 __psynch_mutexwait + 10 (libsystem_kernel.dylib + 117318) [0x7fff675cca46]
*55 psynch_mtxcontinue + 0 (pthread + 31325) [0xffffff7f82a10a5d] (blocked by pthread mutex owned by thunderbird (Thunderbird) [416] thread 0x189b)
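
For anyone comparing several of these reports, here is a minimal Python sketch that pulls the frame names and the "blocked by" owner out of an excerpt like the one above. The regular expressions are my own guess at the line layout based only on these five lines; they are not an official parser for Apple's report format and may need adjusting for other reports.

```python
import re

# Lines copied verbatim from the excerpt above.
EXCERPT = """\
55 CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER + 12 (CoreFoundation + 634300) [0x7fff3f561dbc]
55 CGLClearDrawable + 41 (OpenGL + 28651) [0x7fff49a56feb]
55 _pthread_mutex_lock_slow + 253 (libsystem_pthread.dylib + 5320) [0x7fff677924c8]
55 __psynch_mutexwait + 10 (libsystem_kernel.dylib + 117318) [0x7fff675cca46]
*55 psynch_mtxcontinue + 0 (pthread + 31325) [0xffffff7f82a10a5d] (blocked by pthread mutex owned by thunderbird (Thunderbird) [416] thread 0x189b)
"""

# Assumed layout: "<count> <symbol> + <offset> (<library> + <offset>) [<address>]"
FRAME_RE = re.compile(
    r"^\*?(?P<count>\d+)\s+(?P<symbol>\S+) \+ (?P<offset>\d+) "
    r"\((?P<library>.+?) \+ \d+\) \[(?P<address>0x[0-9a-f]+)\]"
)
# Assumed layout of the trailing note on the blocked frame.
OWNER_RE = re.compile(
    r"blocked by pthread mutex owned by .*? \[(?P<pid>\d+)\] thread (?P<thread>0x[0-9a-f]+)"
)


def summarize(report):
    """Return ([(symbol, library), ...], (pid, thread) or None) for a report excerpt."""
    frames = []
    owner = None
    for line in report.splitlines():
        frame = FRAME_RE.match(line.strip())
        if frame:
            frames.append((frame["symbol"], frame["library"]))
        blocked = OWNER_RE.search(line)
        if blocked:
            owner = (int(blocked["pid"]), blocked["thread"])
    return frames, owner


frames, owner = summarize(EXCERPT)
in_gl = any(symbol == "CGLClearDrawable" for symbol, _ in frames)
print("waiting inside CGLClearDrawable:", in_gl)
print("mutex owner (pid, thread):", owner)
```

The owner thread id is the useful part: the stack of that thread in the same report would show who is holding the mutex.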

It hung again sending IMAP mail, and Sent mail is a local folder.

It has hung twice today. But I have put sent mail back to Google.

This just happened to me in TB 67.0b2 (64-bit) while sending a message: the save to the Sent folder on the IMAP server (while online) progressed up to 89% and then got stuck somehow... see attached... I left it as is, but after a few hours there was no progress; TB is still in processing status :-)

These are the errors appearing in the console, FYI, in case they are of any help...

NS_ERROR_ILLEGAL_VALUE: Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIAbDirectoryQuery.doQuery]
nsAbLDAPAutoCompleteSearch.js:261
TypeError: complistItem.occurrence is undefined
agenda-listbox.js:1041:13
TypeError: this.mItemInfoCache[aNewItem.id] is undefined
calDavCalendar.js:758:28
NS_ERROR_XPC_JAVASCRIPT_ERROR_WITH_DETAILS: [JavaScript Error: "this.mItemInfoCache[aNewItem.id] is undefined" {file: "jar:file:///C:/Users/richard/AppData/Roaming/Thunderbird/Profiles/cnant748.default/extensions/%7Be2fda1a4-762b-4020-b5ad-a41df1933103%7D.xpi!/components/calDavCalendar.js" line: 758}]'[JavaScript Error: "this.mItemInfoCache[aNewItem.id] is undefined" {file: "jar:file:///C:/Users/richard/AppData/Roaming/Thunderbird/Profiles/cnant748.default/extensions/%7Be2fda1a4-762b-4020-b5ad-a41df1933103%7D.xpi!/components/calDavCalendar.js" line: 758}]' when calling method: [calICalendar::modifyItem] calAlarmService.js:174
Assert failed: aElement
calUtils.jsm:147
ASSERT resource://calendar/modules/calUtils.jsm:147
setElementValue chrome://calendar/content/calendar-ui-utils.js:32
setBooleanAttribute chrome://calendar/content/calendar-ui-utils.js:91
enableTimeIndicator chrome://calendar/content/calendar-multiday-view.xml:2573
view_XBL_Constructor chrome://calendar/content/calendar-multiday-view.xml:2534
Could not start Browser Toolbox, you need to enable it. ToolboxProcess.jsm:77:13
init resource://devtools/client/framework/ToolboxProcess.jsm:77
oncommand chrome://messenger/content/messenger.xul:1
TypeError: this.gViewSourceUtils is undefined
webconsole.js:168:5

Clicking the red cross icon to close the previous prompt raises a new prompt: Save message > Your message was sent but a copy...etc... as per attached...

(In reply to Richard in comment #110)

That console report references an error occurring with your calendar extension (calDavCalendar.js). Does the crash log also report a "CGLClearDrawable" and/or mutex error? If not, this may be a distinct issue.

Wayne perhaps can provide more guidance.

TB does not crash in my case, it just hangs in processing mode (as per the processing icon, a blue circle, appearing on the main tab)... Also worth mentioning: I am using TB on Windows... sorry, I only just recalled that this bug refers to macOS...

(In reply to Richard in comment #113)

The bug is affecting both, but seems more prevalent in MacOS. Hang is perhaps the correct term, the app gets stuck, and doesn't continue during (?) or after the moment the sent email progress bar closes. In my experience Thunderbird stays open and does not close, but has hung/is no longer responding to user input. You have to do the equivalent of a force quit (or control alt delete) and manually force it to close, and then generate a bug report.

I switched back to using the release version this morning (as it has my local email folders and message filters). I will keep an eye on it, but I feel like it's crashing more frequently than the beta does. The beta doesn't remove the problem, though (as I and others have pointed out).

It could also just be the transient nature of the bug, some days it happens more than others.

Anyone tried 68 beta?

Blocks: 1561990
Duplicate of this bug: 1561990

In Reply to Wayne in comment #116

Upgraded today and got a crash a little while ago. No log was generated, but I would assume it is the same crash. Will confirm as soon as I see a log.

Are we discussing more than one issue here? When I see the problem I have, which happens about once per day (for months, and still, using current Daily), on MacOS, I get the spinning beachball, and I need to Force Quit. There’s no possibility to get a log here, is there?

Are people talking about getting logs, and seeing CGLClearDrawable, seeing a different issue than me? I am using IMAP Sent folder, and I only see it immediately after a successful mail send. But I send many emails per day, yet it only hangs once or twice per day.

(In reply to Calum Mackay from comment #119)
It could be, probably is?

I am getting logs intermittently now.

The crash always occurs during the sending process. Usually - if not always, when I have moved on to do something else whilst the email completes. Thunderbird will hang (with the beach ball) and needs to be force quit. Every log I have examined (this includes multiple workstations in our office) includes the CGLClearDrawable handle.

In my experience, your usage of a sent folder doesn't affect the crash as Thunderbird in our environment (O365) does not even save its own sent messages and still crashes. Crashes seem less frequent in the Beta releases, but still happen.

Some of my users have started (temporarily) using Apple Mail again to avoid incessant crashes.

(In reply to Scott from comment #120)

thanks Scott; how are you getting logs following the force quit?

(In reply to Calum Mackay from comment #121)

(In reply to Scott from comment #120)

thanks Scott; how are you getting logs following the force quit?

Good question. That explains why it doesn't give me a log every time, but it does intermittently. I assumed it was normal behavior.

(In reply to Scott from comment #122)

thanks Scott; how are you getting logs following the force quit?

Good question. That explains why it doesn't give me a log every time, but it does intermittently. I assumed it was normal behavior.

thanks. I never get logs when mine does this. Or perhaps I'm just not leaving it long enough before doing the Force Quit.

(In reply to Calum Mackay from comment #123)

thanks. I never get logs when mine does this. Or perhaps I'm just not leaving it long enough before doing the Force Quit.

I have gotten them in both situations. If it's been too long since the crash, the log is truncated and not very useful. If it's more recent, it will have more information on the thread that crashed.

I am regularly experiencing the same issue: Thunderbird hangs after sending an email with the spinning beach ball mouse cursor. I'm on MacOS 10.13.6. Sometimes, but not always, I receive an Apple bug report dialog along with a crash log after Force-Quitting. I'll attach my crash log to the ticket.

(In reply to Edmond from comment #125)

I am regularly experiencing the same issue: Thunderbird hangs after sending an email with the spinning beach ball mouse cursor. I'm on MacOS 10.13.6. Sometimes, but not always, I receive an Apple bug report dialog along with a crash log after Force-Quitting. I'll attach my crash log to the ticket.

Looks like the same bug to me - based on your crash report. Welcome :)

We may need a Mac-expert for this bug, and bug 1398807, and bug 1422251.

Also, anyone with a 4K or 5K monitor?

Flags: needinfo?(acdp)

(In reply to Calum Mackay from comment #119)

Are we discussing more than one issue here?

Yes, very possible (even probable) people are seeing more than one issue - even more so because the problem comments span two years, and multiple versions.

(In reply to James Rome from comment #74)

Alas, the [mozregression] tool is for windows only.

No, because there is a command line tool for Mac. So if this is a regression in the version 54 time frame and someone can reliably reproduce this, then the regression range should be easy to get.

User Story: (updated)

See also Bug 1334549... which looks like a similar issue... if that can help...

And also FYI Bug 1257235...

(In reply to Wayne Mery (:wsmwk) from comment #128)

We may need a Mac-expert for this bug, and bug 1398807, and bug 1422251.

Also, anyone with a 4K or 5K monitor?

What constitutes a Mac-expert?

I am running a Mac, w/ a built-in 5K display, and experiencing the bug.

Someone who can debug it and write a patch to fix it, I suppose.

Would recording Performance via DevTools (Ctrl+i on Windows to open it) help to identify the issue?

Any end-user encountering the issue on Mac may be able to do it:

  • prepare a msg ready to send
  • open DevTools (you can keep it open in parallel with TB)
  • go to the Performance tab (if missing, make sure the Performance option is ticked in DevTools settings)
  • start recording Performance
  • send the msg
  • if the msg sends ok, stop recording Performance, and delete the profile created. Repeat the process above till a msg fails to send.
    OR
  • if the msg is not sending properly, wait a bit, then stop recording Performance. Then save the performance profile recorded and post it here.

Maybe that can help identify what TB is doing when sending but not completing process somehow...

Would that help?

(In reply to Scott from comment #132)

(In reply to Wayne Mery (:wsmwk) from comment #128)

We may need a Mac-expert for this bug, and bug 1398807, and bug 1422251.

Also, anyone with a 4K or 5K monitor?

What constitutes a Mac-expert?

The starting point would be someone who can specifically identify the steps to reproduce, or even better get the regression range.
https://mozilla.github.io/mozregression/quickstart.html and for Mac use command line.

I am running a Mac, w/ a built-in 5K display, and experiencing the bug.
If it were an external display I would ask whether changing the display to 4k or non-4k eliminates the problem. Which brings us back to regression range.

(In reply to Wayne Mery (:wsmwk) from comment #135)

The starting point would be someone who can specifically identify the steps to reproduce, or even better get the regression range.
https://mozilla.github.io/mozregression/quickstart.html and for Mac use command line.

What range do you want to be looked at?

Note that I originally filed bug 1400568 two years ago (https://bugzilla.mozilla.org/show_bug.cgi?id=1400568) which, in the followup comments, showed a mutex lock in CGLClearDrawable. It was not on a 5k monitor or even a 4k monitor. Recent comments show that it's still exactly the same bug.

What is needed is not so much a Mac expert, but a thread/mutex expert. These are hard bugs to find/fix, but not impossible. Usually it involves reading the code carefully around the mutexes, looking for race conditions and false assumptions.

I suspect that there is an assumption lurking in the mutex code, some intermediate operation (displaying the progress bar, most likely) that takes a variable amount of time, and sometimes the race condition causes the mutex never to unlock. I would try commenting out the progress window completely (who needs 'em anyway?).
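
To make the kind of bug I mean concrete, here is a small Python sketch of one classic way two threads end up waiting on each other forever: they take the same two locks in opposite order. This is purely illustrative and is not Thunderbird or OpenGL code; the names `ui`, `sender`, `lock_a` and `lock_b` are made up, and the timeout exists only so the demo finishes instead of hanging.

```python
import threading


def run_demo(timeout=0.5):
    """Two threads take two locks in opposite order; report whether each got its second lock."""
    lock_a = threading.Lock()  # hypothetical: e.g. a lock guarding drawing state
    lock_b = threading.Lock()  # hypothetical: e.g. a lock guarding progress state
    holding = threading.Barrier(2)    # both threads hold their first lock
    attempted = threading.Barrier(2)  # both threads have tried their second lock
    results = {}

    def worker(name, first, second):
        with first:
            holding.wait()
            # Without a timeout this call would block forever: the other
            # thread owns `second` and is waiting for `first`.
            got = second.acquire(timeout=timeout)
            results[name] = got
            # Keep `first` held until both attempts are over, so the outcome
            # does not depend on which thread times out first.
            attempted.wait()
            if got:
                second.release()

    threads = [
        threading.Thread(target=worker, args=("ui", lock_a, lock_b)),
        threading.Thread(target=worker, args=("sender", lock_b, lock_a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


outcome = run_demo()
print(outcome)  # neither thread obtains its second lock
```

In real code there is no timeout, so both threads would simply sit there, which is what a permanent beach ball looks like from the outside. The usual cure is to make every thread take the locks in one agreed order, or to avoid holding one lock while waiting for another.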

After two years of flailing around in Bugzilla, I have lost interest in this bug, unfortunately, but I hope somebody fixes it.

(In reply to Richard Leger from comment #136)

...
What range do you want to be looked at?

Try July 1 2017 to August 10 2017. Hard to be more exacting - and depends on whether this is reproducible in nightlies. If it doesn't fail on July 1, try a month or two earlier.

User Story: (updated)
Flags: needinfo?(richard.leger)
Summary: Hangs frequently while sending imap mail while copying message to imap Sent folder on Mac. No problem if Sent is set to local folder. deadlock on CGLClearDrawable? → Hangs frequently while sending imap mail while copying message to imap Sent folder on Mac. - displaying the progress bar. No problem if Sent is set to local folder. Deadlock on CGLClearDrawable.

(In reply to Glenn Reid from comment #137)

originally filed bug 1400568 two years ago [against 52.3.0 2017-09-16] which ... showed a mutex lock in CGLClearDrawable. It was not on a 5k monitor or even a 4k monitor.

Good point. It may not be a trigger, or requirement, for most.

Duplicate of this bug: 1551317

There is an interesting development in bug 1422251 comment 18 - the reporter stopped having trouble after moving to beta 67. If that holds for others then version 68 should work better.

(In reply to Wayne Mery (:wsmwk) from comment #138)

(In reply to Richard Leger from comment #136)

...
What range do you want to be looked at?

Try July 1 2017 to August 10 2017. Hard to be more exacting - and depends on whether this is reproducible in nightlies. If it doesn't fail on July 1, try a month or two earlier.

Actually, I forgot that we had earlier determined a probable range of
http://archive.mozilla.org/pub/thunderbird/nightly/2017/03/2017-03-08-03-02-29-comm-central/
http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-12-03-02-06-comm-central/
based, iirc, on James' report of seeing this in beta 54

I don't see any Thunderbird debug symbols in the various apple log dumps. Seeing TB functions show up in those logs might give some more clues? I'm not sure which builds (if any) have debug symbols in them - nightly, maybe?
I've no idea if the apple log will show up TB symbols even if they're there, but it should do.

Other random thoughts:

A GL mutex lock kind of implies that GUI code is being called from multiple threads, but I was under the impression that all the GUI code in TB was driven via main-thread-only javascript. And that has warning lights all over it which go off if it's called from the wrong thread.

But yeah, ultimately this kind of bug really needs someone running a debug build on a mac, catching it in a debugger, and poking about to see what's locked up.

(In reply to Wayne Mery (:wsmwk) from comment #139)

(In reply to Glenn Reid from comment #137)

originally filed bug 1400568 two years ago [against 52.3.0 2017-09-16] which ... showed a mutex lock in CGLClearDrawable. It was not on a 5k monitor or even a 4k monitor.

Good point. It may not be a trigger, or requirement, for most.

The display resolution was speculation from me, trying to explain why it was affecting more Macs than Windows machines. They commonly have higher resolution screens and less powerful graphics cards, presumably leading to lower refresh rates, which is what I was trying to get at with the suggestion :)

Someone who better understands the mutex function may be able to simply rule this out though!

(In reply to Wayne Mery (:wsmwk) from comment #141)

There is an interesting development in bug 1422251 comment 18 - the reporter stopped having trouble after moving to beta 67. If that holds for others then version 68 should work better.

Still occurs in 69.0b3.

As a test I manually sent about 55 emails at random times over a period of several hours and haven't seen a problem. This is with a MacBook Air and TB version 60.8.0. I was sending from a non-gmail account to a gmail account. Save to Sent on non-gmail worked fine with no lock-ups. The messages were not huge and were mostly just old emails archived from a mailing list. The display on the mbAir (running Mavericks 10.9.5) is not very high res, 1280x800.
Does anyone cc'd on this bug see the problem on mbAir?

Duplicate of this bug: 1575568

Another clue. I just sent an e-mail, and it hung while downloading a message--presumably the copy of the message I sent. See attached image.

(In reply to James Rome from comment #148)

Another clue. I just sent an e-mail, and it hung while downloading a message--presumably the copy of the message I sent. See attached image.

Any evidence this is the same crash (crash log showing same types of mutex locks/clear drawable issues)... I have personally experienced the crash probably close to 50 times now and not seen an issue downloading messages.

Duplicate of this bug: 1573497
Duplicate of this bug: 1527965
Duplicate of this bug: 1525001
See Also: 1422251
Duplicate of this bug: 1422251
Duplicate of this bug: 1547339
Duplicate of this bug: 1578784

fyi, in bug#1547339 before it was closed-as-DUP, I reported hitting this problem when running Thunderbird 60.8.0 on Mac OSX 10.14.6 (and list of other previous versions). If it helps, I had attached crash-dumps in bug#1547339 over the last few months.

(aside: Thanks to :wsmwk for connecting these two tickets)

Flags: needinfo?(richard.leger)
Flags: needinfo?(acdp)
Whiteboard: [regression:TB54?] → [regression:TB54?][duptome]

Thanks to everyone who contributed their system configs, noted in user story

User Story: (updated)

For what it's worth, the problem I was seeing seems recently (within the last month or so) to have stopped.

For a long time (a year or more), a few times a week, TB would stop responding, immediately after sending a message (with IMAP Sent folder). I had to Force Quit, and never once got a stack trace or report. It belatedly occurs to me that I've not seen this happen for several weeks now. I'd been away for a couple of weeks holiday; having now been back a few weeks, I don't think it's happened since before my holiday, so that's at least a month.

I'm running Daily builds, updated daily (or when they change), on MacOS 10.14.6 (currently).

I appreciate this doesn't help much; just as a data point.

(In reply to Wayne Mery (:wsmwk) from comment #142)

(In reply to Wayne Mery (:wsmwk) from comment #138)

(In reply to Richard Leger from comment #136)

...
What range do you want to be looked at?

Try July 1 2017 to August 10 2017. Hard to be more exacting - and depends on whether this is reproducible in nightlies. If it doesn't fail on July 1, try a month or two earlier.

Actually, I forgot that we had earlier determined a probable range of
http://archive.mozilla.org/pub/thunderbird/nightly/2017/03/2017-03-08-03-02-29-comm-central/
http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-12-03-02-06-comm-central/
based, iirc, on James' report of seeing this in beta 54

As suggested, I ran a bisection from 2017-03-08 to 2017-06-13 with MozRegression on Windows 10 Pro x64 with 32-bit Thunderbird...

Bisection setup

  • New profile (re-used)
  • IMAP/SMTP mailbox setup - keep message - sync most recent 30 days
  • Use each TB version for a full day before moving on to the next...

Results (see also attached):

Start 55.0a1 (2017-04-26) - good
55.0a1 (2017-05-20) - good
55.0a1 (2017-06-01) - good
55.0a1 (2017-06-07) - good
55.0a1 (2017-06-10) - bad (one time) - TB could not save a copy of the sent email to the Sent folder on the server and was stuck in processing; disconnecting the network on the computer and reconnecting caused a TB prompt to appear asking to Retry, after which it worked.
55.0a1 (2017-06-09) - good for sending items, but bad (one time) for "Copying message to Draft folder..." When trying to access mail folders while that happened, the popup error message "Could not connect to mail server xxx. Connection was refused." appeared... this one could have been an issue on the server side (temporarily disconnected due to a kernel update)

At the end there was a message saying it did not have enough info to establish a bisection of code or something like that... not sure what it means...

Bisection Information for the bad one:

app_name: thunderbird
build_date: 2017-06-10
build_file: C:\Users\richard.mozilla\mozregression\persist\2017-06-10--comm-central--thunderbird-55.0a1.en-US.win32.zip
build_type: nightly
build_url: https://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-10-03-02-05-comm-central/thunderbird-55.0a1.en-US.win32.zip
changeset: 3e3745b52dc53eb74efd73d3107a81e2e13f94be
pushlog_url: https://hg.mozilla.org/comm-central/pushloghtml?fromchange=e915a8b2f1f505d7c4fa1820f35ce99d1164b293&tochange=3e3745b52dc53eb74efd73d3107a81e2e13f94be
repo_name: comm-central
repo_url: https://hg.mozilla.org/comm-central

Could it be that Thunderbird has an issue connecting to the IMAP server to file the copy of the message, and it does not retry properly or identify the issue to prompt the user to retry?

We use a dedicated CA cert for SSL validation, which is added to Thunderbird's certificate storage prior to using the account...

I have just had another hang. The email has been copied to the (IMAP) Sent folder, so the hang must have appeared afterwards.

I’m on the beta update channel and running 70.0b1.

All:

I've been following this thread with interest.  TB is one of the main tools I use in my business, and these frequent hangs get in the way.  So to all of you: THANK YOU for thinking about this.

A few minutes ago, TB hung again.  After issuing a Force Quit, I captured the data that normally goes to Apple.  It may be huge, but it may also shed some light.  Here we go with an edit/paste.

Sincerely,
Bob

I hate to throw a wrench into this effort, but I moved sent mail to a local folder, and TB still hangs. Maybe not so often though...

User Story: (updated)

This specific signature seems to be missing in the report so far.

This is Thunderbird 60.9.0 (64-bit) on macOS High Sierra (10.13.6). I'm using IMAP4 to gmail, and a gmail SMTP server. This also applies to some recent Thunderbird versions.

After sending an email, Thunderbird sometimes stalls, requiring ForceQuit and restart. In all cases, the message gets sent, and gets copied into my "Sent Mail" folder. After the restart all is well (until the next time).

This is 100% reproducible: The stall happens ONLY if I click on the main window before the message window disappears, or within a few seconds. If I remember to wait at least 5 seconds after the send window disappears, it does not stall and all is well.

This has been happening for several months, over several Thunderbird updates. IIRC it was worse on earlier versions, but I have modified my behavior to minimize it.

I believe that this also applies to an NNTP send to giganews and IMAP4 save to gmail.
I also believe that it is OK to click on any other application's window without stalling Thunderbird.
I don't have good statistics on these, however.

Richard, thank you for your scientific study. And it is significant that it correlates with James' earlier findings. If accurate, others should find that https://ftp.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-10-03-02-05-comm-central/thunderbird-55.0a1.en-US.mac.dmg shows the problem.

Can we determine the cause from https://hg.mozilla.org/comm-central/pushloghtml?startdate=2017-06-07&enddate=2017-06-10 (blissfully small list) ?
Bug 1364977 ?

Flags: needinfo?(jorgk)

Hmm, we've got two slightly different ranges here:
pushlog_url: https://hg.mozilla.org/comm-central/pushloghtml?fromchange=e915a8b2f1f505d7c4fa1820f35ce99d1164b293&tochange=3e3745b52dc53eb74efd73d3107a81e2e13f94be
from comment #160 and
https://hg.mozilla.org/comm-central/pushloghtml?startdate=2017-06-07&enddate=2017-06-10
from comment #166.

If I read the bug summary, it talks about an IMAP issue when copying the sent message to the Sent folder. We also read:
No problem if Sent is set to local folder. Deadlock on CGLClearDrawable.

So it's possible that the regression range is correct, but if really CGLClearDrawable is a problem, then we shouldn't look for the issue in C-C. The equivalent date range based on dates is:
https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2017-06-07&enddate=2017-06-10
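
As a convenience, here is a minimal Python sketch that builds the date-based pushlog links for both repositories from one date range. It only reproduces the URL pattern already quoted in this comment; the helper name `pushlog_url` is made up for illustration.

```python
from urllib.parse import urlencode

HG_BASE = "https://hg.mozilla.org"


def pushlog_url(repo, start, end):
    """Build a date-range pushlog link for a repository such as 'comm-central'."""
    query = urlencode({"startdate": start, "enddate": end})
    return f"{HG_BASE}/{repo}/pushloghtml?{query}"


# Same date range, both repositories.
for repo in ("comm-central", "mozilla-central"):
    print(pushlog_url(repo, "2017-06-07", "2017-06-10"))
```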

Looking at bug 1364977 we see that for TB it added the removal of a command observer for housekeeping purposes. We note that this observer is also used in editor.js
https://searchfox.org/comm-central/search?q=obs_documentCreated&case=false&regexp=false&path=editor.js
where it's not removed.

I doubt that this change caused any problems, but I'm happy to provide a try build with that change removed. I guess you want a build for Mac, so would you like this to be based on TB 68 ESR, TB 70 beta or trunk? Note that the code added in bug 1364977 runs when the compose window closes, so it's not entirely impossible that that has some bad effect onto other things happening at the same time.

Flags: needinfo?(jorgk)

(In reply to Jorg K (GMT+2) from comment #167)

Hmm, we've got two slightly different ranges here:
pushlog_url: https://hg.mozilla.org/comm-central/pushloghtml?fromchange=e915a8b2f1f505d7c4fa1820f35ce99d1164b293&tochange=3e3745b52dc53eb74efd73d3107a81e2e13f94be
from comment #160 and
https://hg.mozilla.org/comm-central/pushloghtml?startdate=2017-06-07&enddate=2017-06-10
from comment #166.

The range in comment #166 is approximated from comment 160 so it could be applied to mozilla-central.

If I read the bug summary, it talks about an IMAP issue when copying the sent message to the Sent folder. We also read:
No problem if Sent is set to local folder. Deadlock on CGLClearDrawable.

So it's possible that the regression range is correct, but if really CGLClearDrawable is a problem, then we shouldn't look for the issue in C-C. The equivalent date range based on dates is:
https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2017-06-07&enddate=2017-06-10

I too suspect the issue will be in M-C. But a regression range of three days may be challenging to get a hit. Even a one day range is challenging.

Richard (or anyone else??), can you run regression tests to see if you can narrow 2017-06-07 thru 2017-06-10 to a one day range?

... I'm happy to provide a try build with that change removed.

Let's see first if Richard has positive results

Flags: needinfo?(richard.leger)

Attached is crash while sending email today using TB 60.9.0 on OSX 10.14.6.

I just noticed that I am on TB 60.9.0 (64-bit). While the about panel claims I am on "the release update channel", I see on https://www.thunderbird.net/en-US/thunderbird/releases/ that there are newer 68.1.x releases available for manual download, even though my TB installation does not see them. Will investigate here, but flagging in case that helps with narrowing the debugging range.

Duplicate of this bug: 1588772

(In reply to John O'Duinn [:joduinn] (please use "needinfo?" flag) from comment #169)

...
I just noticed that I am on TB 60.9.0 (64-bit). While the about panel claims I am on "the release update channel", I see on https://www.thunderbird.net/en-US/thunderbird/releases/ that there are newer 68.1.x releases available for manual download, even though my TB installation does not see them.

That is because updates to version 68 from version 60 are not currently enabled. But that wouldn't help you with this issue anyway.

Still no reply re. comment #167: Do you want a try build? For Mac? Based on which version of TB?

I'm using 68.1.2 and mine crashes multiple times daily, thread dump includes the CGLClearDrawable call (and a few calls beneath that).

My messages ALWAYS send successfully, and the "Sent" message copy is also saved to the server correctly. So no data is lost for me. Seems to be related to mouse activity connected with the main window while sending a message -- not just clicking. I do a lot of scrolling with the touchpad, and I often don't wait for a message window to close before moving on with my life, so I'm surely generating mouse events on the main window during and after the message window closes.

I have only Lightning and Enigmail extensions installed. No funny business with custom x509 certificates.

I'm happy to try previous versions to see what happens. I'm also happy to try a custom build to test out a fix. I'll even run it in a debugger if that will help in any way. I'm desperate to get this resolved, but I'm not sure how best to help.

(In reply to Jorg K (GMT+2) from comment #167)

....
I doubt that this change caused any problems, but I'm happy to provide a try build with that change removed. I guess you want a build for Mac, so would you like this to be based on TB 68 ESR, TB 70 beta or trunk? Note that the code added in bug 1364977 runs when the compose window closes, so it's not entirely impossible that that has some bad effect onto other things happening at the same time.

Most reporters are using esr, so 68 ESR please. And if that doesn't work out then Beta 70.

Flags: needinfo?(jamesrome)

(previously wrong person)

Flags: needinfo?(jamesrome) → needinfo?(jorgk)

Mac try build started:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=189dafbb360eab163171568640014c3829d1f4dd
Since I've been doing merges and uplifts yesterday, this will be a TB 68.2.0 ESR pre-release. I'll paste the path to the binary here later.

What is the google provider that goes with this, and where do we get it?

The one from ATN for TB 68, it's just a regular add-on.

Sorry, this test build hung already.

(In reply to James Rome from comment #180)

Sorry, this test build hung already.

I had one too. Looked eerily similar but no crash report unfortunately.

(In reply to Wayne Mery (:wsmwk) from comment #168)

(In reply to Jorg K (GMT+2) from comment #167)

So it's possible that the regression range is correct, but if really CGLClearDrawable is a problem, then we shouldn't look for the issue in C-C. The equivalent date range based on dates is:
https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2017-06-07&enddate=2017-06-10

I too suspect the issue will be in M-C. But a regression range of three days may be challenging to get a hit. Even a one day range is challenging.

Richard (or anyone else??), can you [re]run regression tests to see if you can narrow 2017-06-07 thru 2017-06-10 to a one day range?

The builds that need to be tested by those who have been able to reliably reproduce are:

  1. http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-07-03-02-05-comm-central/thunderbird-55.0a1.en-US.mac.dmg
  2. http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-08-03-02-07-comm-central/thunderbird-55.0a1.en-US.mac.dmg
  3. http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-09-03-02-05-comm-central/thunderbird-55.0a1.en-US.mac.dmg
  4. http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-10-03-02-05-comm-central/thunderbird-55.0a1.en-US.mac.dmg

#1 presumably works and #4 presumably fails. So #2 and #3 should be tested first.
Backup your profile before testing.

Flags: needinfo?(sjames)
Flags: needinfo?(jamesrome)
Flags: needinfo?(chris)

Sorry, this test build hung already.

Sorry, I suggested that this wouldn't help, see comment #167. The issue that only Mac users are experiencing is clearly in some low-level Mac graphics library, see "Deadlock on CGLClearDrawable" in the summary of this bug.

So, Wayne and Jorg, what's the best way to make progress, here? Try the builds referenced by Wayne in comment #182, or try Jorg's build referenced in comment #177? I can probably try them all. While I can't reliably make it crash, it happens so often that if I went a whole day without crashing, I'd consider the bug "not present in that build".

Will running any of these builds damage my tb user profile? I'm currently running 68.1.2. I use IMAP for everything so I'm not worried about the messages themselves; mostly just the settings and all that. I can re-build if necessary but would prefer to avoid it if possible.

As we heard (and expected), the build from comment #177 isn't any good.

Apparently the problem started to occur between 2017-06-07 and 2017-06-10. So in comment #182 Wayne suggests trying the builds of the 7th, 8th, 9th and 10th of June 2017.

Running "old" builds, in this case TB 55 on a fresh profile can lead to malfunctions if the profile has already been upgraded to a newer version. I don't think it will cause damage or changes to the profile, but Wayne suggested to do a backup just in case.

Personally I think it's important to also try the 7th and 10th to double/triple check that the issue really started there.

(In reply to Jorg K (GMT+2) from comment #167)

If I read the bug summary, it talks about an IMAP issue when copying the sent message to the Sent folder. We also read:
No problem if Sent is set to local folder. Deadlock on CGLClearDrawable.

To clarify: the bug definitely occurs regardless of Sent folder location, whether local folder, no Sent folder at all, or online folder. I do not even have TB configured to save sent messages (O365 creates duplicates on the back end if I do), and I experience the crash daily.

Flags: needinfo?(sjames)

Thanks for the additional info.

I look forward to everyone's test results of comment 182 in the next few days so we can finally nail this bugger.

  1. http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-07-03-02-05-comm-central/thunderbird-55.0a1.en-US.mac.dmg

I backed-up my profile and launched this version with no other modifications to my profile. I'm having trouble writing my first email... I'm getting the color-wheel about once a second for two seconds. Mouse clicks are ignored. Keyboard is ignored.

(In reply to Christopher Schultz from comment #188)

  1. http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-07-03-02-05-comm-central/thunderbird-55.0a1.en-US.mac.dmg

I backed-up my profile and launched this version with no other modifications to my profile. I'm having trouble writing my first email... I'm getting the color-wheel about once a second for two seconds. Mouse clicks are ignored. Keyboard is ignored.

This finally cleared-up. I've had other bouts of the color-wheel appearing, but they all eventually do clear-up and I'm able to continue my work. Still testing...

FYI, running bisection 2017-06-06 to 2017-06-11 on Windows 10 with TB 32 bits keep you posted... I'll remove the need info flag when I'll publish the result... as it is reduced period and number of version I plan to use each version for a few days in a row... to maximise chances to get the issue... as said in my previous test result, issue was linked to a temporary lost of connection to the IMAP server during the sending (server was rebooting)... with TB unable to resume task while server was back and running... I don't know if that help... as info...

Duplicate of this bug: 1590511

(Moving back to Thunderbird where it's more likely for users to find it.)

Component: Networking: IMAP → Message Compose Window
Product: MailNews Core → Thunderbird
Summary: Hangs frequently while sending imap mail while copying message to imap Sent folder on Mac. - displaying the progress bar. No problem if Sent is set to local folder. Deadlock on CGLClearDrawable. → Hangs sending imap mail while copying message to imap Sent folder on Mac while displaying the progress bar. Deadlock in graphics on CGLClearDrawable.
Whiteboard: [regression:TB54?][duptome] → [regression:TB54?][duptome][workaound: comment 104]
Version: 54 → 54 Branch

(In reply to Wayne Mery (:wsmwk) from comment #182)

The builds that need to be tested by those who have been able to reliably reproduce are:

  1. http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-07-03-02-05-comm-central/thunderbird-55.0a1.en-US.mac.dmg

I believe I have gotten this one to lock-up. ActivityMonitor.app says "Not Responding" and there's no CPU usage. I'm going to wait a good long time before killing it and hopefully I'll be able to get a thread dump.

But this was the "suspected good build," so maybe we have to cast a wider net.

Flags: needinfo?(chris)

(In reply to Christopher Schultz from comment #193)

(In reply to Wayne Mery (:wsmwk) from comment #182)

The builds that need to be tested by those who have been able to reliably reproduce are:

  1. http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-07-03-02-05-comm-central/thunderbird-55.0a1.en-US.mac.dmg

I believe I have gotten this one to lock-up. ActivityMonitor.app says "Not Responding" and there's no CPU usage. I'm going to wait a good long time before killing it and hopefully I'll be able to get a thread dump.

But this was the "suspected good build," so maybe we have to cast a wider net.

Yep, deadlocked at the same place:

[...]
11 CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER + 12 (CoreFoundation + 646038) [0x7fff5133eb96] 1-11
11 CGLClearDrawable + 44 (OpenGL + 28513) [0x7fff5af37f61] 1-11
11 _pthread_mutex_firstfit_lock_slow + 222 (libsystem_pthread.dylib + 5325) [0x7fff7d4234cd] 1-11
11 __psynch_mutexwait + 10 (libsystem_kernel.dylib + 16134) [0x7fff7d368f06] 1-11
*11 psynch_mtxcontinue + 0 (pthread + 10172) [0xffffff7f827fa7bc] (blocked by pthread mutex owned by thunderbird [42207] thread 0x86053b) 1-11

(In reply to Richard Leger from comment #190)

FYI, running bisection 2017-06-06 to 2017-06-11 on Windows 10 with TB 32 bits keep you posted... I'll remove the need info flag when I'll publish the result... as it is reduced period and number of version I plan to use each version for a few days in a row... to maximise chances to get the issue.

Suggest we use Richard's idea of running multiple days with one build, and do it with multiple people, with each person taking one or more builds in a coordinated manner:
Richard http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-06-03-02-05-comm-central/ ?
test#2 http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-05-03-02-06-comm-central/
test#3 http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-04-03-02-08-comm-central/
test#4 http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-03-03-02-05-comm-central/
test#5 http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-02-03-02-06-comm-central/
test#6 http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-01-03-02-08-comm-central/

Although we do not know whether the regression is in fact in this range. What do you think?

Is it possible to create a debug version of TB that would print out the necessary information to diagnose this in real time? Logs do not work when it hangs.

Flags: needinfo?(jamesrome)

(In reply to James Rome from comment #196)

Is it possible to create a debug version of TB that would print out the necessary information to diagnose this in real time? Logs do not work when it hangs.

Perhaps. But we would need a developer familiar with this area of code to define the process, or even say whether such a thing is possible. Right now there isn't such a person.

What we CAN do now - without any special tools or knowledge - is FIND the regression range, which is the path suggested two years ago. This can ONLY be done by those of you who can reproduce this - the rest of us are powerless to help except encourage you on the path.

To elaborate on "we do not know whether the regression is in fact in this range" of June, it was originally thought maybe this began in nightly 54, so the range possibly includes many months before June.

(In reply to Christopher Schultz from comment #194)

Locked-up again. I also confirmed that (a) I had re-installed the Daily.app before launch and (b) disabled auto-update so it wouldn't keep upgrading itself.

For anyone who can't get a thread dump, try this: while the color-wheel is spinning and before you Force-Quit, run the "Activity Monitor" application, choose "Thunderbird" (or "Daily") in the list of processes, click the gear-icon in the upper-left-hand corner of the window and choose "Sample Process". After a few seconds, it will give you a thread dump as text with a bunch of header information. Every time this has happened to me, the offending thread was the first one listed. It shows every call starting with the thread-start at the top and the current work being done at the bottom, on the most-indented line. A few lines from the bottom, you'll see what I have posted above, including the call to CGLClearDrawable.

So even the assumed-good build is locking-up for me.

(In reply to Richard Leger from comment #190)

FYI, running bisection 2017-06-06 to 2017-06-11 on Windows 10 with TB 32 bits keep you posted... I'll remove the need info flag when I'll publish the result... as it is reduced period and number of version I plan to use each version for a few days in a row... to maximise chances to get the issue... as said in my previous test result, issue was linked to a temporary lost of connection to the IMAP server during the sending (server was rebooting)... with TB unable to resume task while server was back and running... I don't know if that help... as info...

It seems today at 9:36am, I had one issue with saving copy of message to Sent with TB 2017-06-09 while sending a simple text message...
pushlog_url https://hg.mozilla.org/comm-central/pushloghtml?fromchange=cc0700686608ad42e5847abcfc10f1c25b644352&tochange=b8876205fa8dbf22f34ffffadce627327ad51f24

In Activity Manager it shows one entry "connection refused at 9:23am" but server was available at all time as I checked and especially at 9:36am... indeed the message was sent... as I checked by other means. I could also access any messages in Inbox from any dates...

But I also noticed that when the issue arises, TB keeps trying to save a copy without the process completing. It was also trying to bring the Sent folder up to date and download some messages in it... but that process appeared "Paused"... I then browsed between multiple folders to check whether there was an issue with the server, but I could still access messages... Browsing back to the Sent folder seemed to trigger the update again... and the "bring folder up to date" activity resumed until it completed.

Then the message was copied to the Sent folder fine by itself (the previously stalled process completed successfully at that point). All I had to do was browse back and forth to and from the Sent folder in the main TB UI, which may have retriggered the connection to the server and the update of the folder...

I have slightly changed my IMAP settings to sync only 1 day's worth of emails and not download emails larger than 50k. So you know!

I was also connected via VPN to the office at the time but that should not make any difference to TB...

2017-06-10 is impossible to use for more than a few hours because the UI keeps slowing down over time to the point it is unbearable/unusable...

Hope this info can help...

Flags: needinfo?(richard.leger)

(In reply to Richard Leger from comment #199)

(In reply to Richard Leger from comment #190)

FYI, running bisection 2017-06-06 to 2017-06-11 on Windows 10 with TB 32 bits keep you posted... I'll remove the need info flag when I'll publish the result... as it is reduced period and number of version I plan to use each version for a few days in a row... to maximise chances to get the issue... as said in my previous test result, issue was linked to a temporary lost of connection to the IMAP server during the sending (server was rebooting)... with TB unable to resume task while server was back and running... I don't know if that help... as info...

It seems today at 9:36am, I had one issue with saving copy of message to Sent with TB 2017-06-09 while sending a simple text message...
pushlog_url https://hg.mozilla.org/comm-central/pushloghtml?fromchange=cc0700686608ad42e5847abcfc10f1c25b644352&tochange=b8876205fa8dbf22f34ffffadce627327ad51f24

Forgot to mention that I was running a bisection to find related fixes and not regressions (by default)... to get the link above...

(In reply to Richard Leger from comment #199)

with TB 2017-06-09

FYI, about the TB version I referred to...

app_name: thunderbird
build_date: 2017-06-09
build_file: C:\Users\richard.mozilla\mozregression\persist\2017-06-09--comm-central--thunderbird-55.0a1.en-US.win32.zip
build_type: nightly
build_url: https://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-09-03-02-05-comm-central/thunderbird-55.0a1.en-US.win32.zip
changeset: 998749e6ed4e8c8a70b406fa421cf64e98f0977a
pushlog_url: https://hg.mozilla.org/comm-central/pushloghtml?fromchange=998749e6ed4e8c8a70b406fa421cf64e98f0977a&tochange=b8876205fa8dbf22f34ffffadce627327ad51f24
repo_name: comm-central
repo_url: https://hg.mozilla.org/comm-central

(In reply to Wayne Mery (:wsmwk) from comment #195)

Richard http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-06-03-02-05-comm-central/ ?

I'll try that next and, if the issue occurs, will attempt to get a thread dump as per the advice in comment 198 to see if that can help somehow...

(In reply to Christopher Schultz from comment #198)

For anyone who can't get a thread dump, try this: while the color-wheel is spinning and before you Force-Quit, run the "Activity Monitor" application, choose "Thunderbird" (or "Daily") in the list of processes, click the gear-icon in the upper-left-hand corner of the window and choose "Sample Process".

Worth mentioning this advice is only applicable on Mac OS X systems and not Windows ;-)

I first thought wrongly you were referring to TB Activity Manager ;-)

For Windows, the closest I could find is to open Task Manager, select the Thunderbird process, right click, create dump file (.DMP)... would that be of any use?

(In reply to Wayne Mery (:wsmwk) from comment #195)

(In reply to Richard Leger from comment #190)
Suggest we use Richard's idea of running multiple days with one build, and do it with multiple people, with each person taking one or more builds in a coordinated manner:
Richard http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-06-03-02-05-comm-central/ ?
test#2 http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-05-03-02-06-comm-central/
test#3 http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-04-03-02-08-comm-central/
test#4 http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-03-03-02-05-comm-central/
test#5 http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-02-03-02-06-comm-central/
test#6 http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-01-03-02-08-comm-central/

When testing as per above suggestion, one version at a time for three days, would it be worth then to activate IMAP logging at startup as per
https://wiki.mozilla.org/MailNews:Logging to maximum level of verbosity?

Would that be of any use to the dev team in identifying the issue?

The only imperative is to find the one day regression range where MACOS build N works and build N+1 fails. Nothing else matters.

Thanks for helping. I do hope the others jump in or this could take another two years :)

(In reply to Richard Leger from comment #203)

(In reply to Christopher Schultz from comment #198)

For anyone who can't get a thread dump, try this: while the color-wheel is spinning and before you Force-Quit, run the "Activity Monitor" application, choose "Thunderbird" (or "Daily") in the list of processes, click the gear-icon in the upper-left-hand corner of the window and choose "Sample Process".

Worth mentioning this advice is only applicable on Mac OS X systems and not Windows ;-)

I was under the impression that this whole issue was 100% MacOS. I don't think the Windows build of tb is using OpenGL, etc. Are Windows folks having deadlocks in OpenGL like the bug-title suggests? Or are Windows folks having otherwise unexplained lock-ups and just guessing that it's the same issue. GUI deadlocks across OSs almost never have the same root cause because the OSs are usually so different.

This is 100% Mac only

(In reply to Wayne Mery (:wsmwk) from comment #205)

The only imperative is to find the one day regression range where MACOS build N works and build N+1 fails. Nothing else matters.

Thanks for helping. I do hope the others jump in or this could take another two years :)

People may want to use the binary search.

Step 0:
Set Start Date (the version from this date is known to work).
Set End Date (the version from this date is known to be broken).

Step 1:
Duration = End Date - Start Date (in days).

If Duration is 1, we are done (!). The software got broken by a patch set on the start date.

Choose a test date of Start Date + Duration / 2. When Duration is odd, round either down or up.

Step 2: Check the version on test date.

If the version is OK, then set Start Date to this test date.
If the version is NOT OK, then set End Date to this test date.

Go to Step 1.

This will take O(logN) as opposed to O(N) days as Wayne pointed out.

I think this is the strategy the mozilla utility for bisection uses.
(Yes "bi-" section.)
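The steps above can be sketched in Python. This is only an illustration: `bisect_builds` and the `is_good` callback are hypothetical names, and in practice `is_good` stands for a person running that day's nightly long enough to judge it. The sketch also assumes every build before the regression is good and every build after it is bad, which may not hold for an intermittent hang.

```python
from datetime import date, timedelta
from typing import Callable, Tuple


def bisect_builds(
    good: date, bad: date, is_good: Callable[[date], bool]
) -> Tuple[date, date]:
    """Narrow a (known-good, known-bad) pair of build dates to adjacent days.

    `is_good` is a hypothetical callback that reports whether the nightly
    from the given date works; here it stands in for manual testing.
    """
    if good >= bad:
        raise ValueError("the known-good date must precede the known-bad date")
    # Step 1: stop once the range is a single day wide.
    while (bad - good).days > 1:
        # Midpoint of the current range, rounding down for odd durations.
        test_date = good + timedelta(days=(bad - good).days // 2)
        # Step 2: move whichever end the test result confirms.
        if is_good(test_date):
            good = test_date
        else:
            bad = test_date
    return good, bad
```

Each pass halves the range, so the callback is invoked roughly log2(N) times for a range of N days.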

Also, I am sorry that I don't own or use a Mac and so can't offer any insight on this bug.

I do not get the issue so often. Maybe 1 time per week or less.
With Thunderbird 68.2.0 I just got the problem and used Activity Monitor as documented in https://bugzilla.mozilla.org/show_bug.cgi?id=1381485#c198

Sampling process 424 for 3 seconds with 1 millisecond of run time between samples
Sampling completed, processing symbols...
Analysis of sampling thunderbird (pid 424) every 1 millisecond
Process:         thunderbird [424]
Path:            /Applications/Thunderbird.app/Contents/MacOS/thunderbird
Load Address:    0x10df81000
Identifier:      org.mozilla.thunderbird
Version:         68.2.0 (68.2.0)
Code Type:       X86-64
Parent Process:  ??? [1]

Date/Time:       2019-10-26 17:32:40.722 +0200
Launch Time:     2019-10-26 16:11:02.254 +0200
OS Version:      Mac OS X 10.14.6 (18G103)
Report Version:  7
Analysis Tool:   /usr/bin/sample

Physical footprint:         340.2M
Physical footprint (peak):  351.6M
----

Call graph:
    2267 Thread_3211   DispatchQueue_1: com.apple.main-thread  (serial)
    + 2267 ???  (in XUL)  load address 0x10e504000 + 0x2a1d5d0  [0x110f215d0]
    +   2267 -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:]  (in AppKit) + 1361  [0x7fff4304b46b]
    +     2267 _DPSNextEvent  (in AppKit) + 1135  [0x7fff4304c77d]
    +       2267 _BlockUntilNextEventMatchingListInModeWithFilter  (in HIToolbox) + 64  [0x7fff44cb3c76]
    +         2267 ReceiveNextEventCommon  (in HIToolbox) + 603  [0x7fff44cb3ee5]
    +           2267 RunCurrentEventLoopInMode  (in HIToolbox) + 292  [0x7fff44cb41ab]
    +             2267 CFRunLoopRunSpecific  (in CoreFoundation) + 455  [0x7fff45a5561e]
    +               2267 __CFRunLoopRun  (in CoreFoundation) + 1189  [0x7fff45a55d15]
    +                 2267 __CFRunLoopDoSources0  (in CoreFoundation) + 283  [0x7fff45a567a3]
    +                   2267 __CFRunLoopDoSource0  (in CoreFoundation) + 108  [0x7fff45a72d89]
    +                     2267 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__  (in CoreFoundation) + 17  [0x7fff45a72de3]
    +                       2267 ???  (in XUL)  load address 0x10e504000 + 0x29e0a01  [0x110ee4a01]
    +                         2267 -[NSView removeFromSuperview]  (in AppKit) + 164  [0x7fff430c5ee5]
    +                           2267 -[NSView _setWindow:]  (in AppKit) + 2621  [0x7fff430c533a]
    +                             2267 __21-[NSView _setWindow:]_block_invoke_2  (in AppKit) + 136  [0x7fff430dbd69]
    +                               2267 -[__NSArrayM enumerateObjectsWithOptions:usingBlock:]  (in CoreFoundation) + 219  [0x7fff45aa476b]
    +                                 2267 -[NSView _setWindow:]  (in AppKit) + 2309  [0x7fff430c5202]
    +                                   2267 -[NSSurface setWindow:]  (in AppKit) + 50  [0x7fff4337eb78]
    +                                     2267 -[NSSurface _disposeSurface]  (in AppKit) + 132  [0x7fff4337eefb]
    +                                       2267 -[NSNotificationCenter postNotificationName:object:userInfo:]  (in Foundation) + 66  [0x7fff47cafaab]
    +                                         2267 _CFXNotificationPost  (in CoreFoundation) + 732  [0x7fff45a293c7]
    +                                           2267 -[_CFXNotificationRegistrar find:object:observer:enumerator:]  (in CoreFoundation) + 1642  [0x7fff45a2a014]
    +                                             2267 ___CFXNotificationPost_block_invoke  (in CoreFoundation) + 87  [0x7fff45ac1688]
    +                                               2267 _CFXRegistrationPost  (in CoreFoundation) + 404  [0x7fff45ab91da]
    +                                                 2267 ___CFXRegistrationPost_block_invoke  (in CoreFoundation) + 63  [0x7fff45ab9270]
    +                                                   2267 __CFNOTIFICATIONCENTER_IS_CALLING_OUT_TO_AN_OBSERVER__  (in CoreFoundation) + 12  [0x7fff45ab92f6]
    +                                                     2267 CGLClearDrawable  (in OpenGL) + 44  [0x7fff4f6b2f61]
    +                                                       2267 _pthread_mutex_firstfit_lock_slow  (in libsystem_pthread.dylib) + 222  [0x7fff71bc34cd]
    +                                                         2267 _pthread_mutex_firstfit_lock_wait  (in libsystem_pthread.dylib) + 96  [0x7fff71bc5d52]
    +                                                           2267 __psynch_mutexwait  (in libsystem_kernel.dylib) + 10  [0x7fff71b08f06]

I don't know what ??? (in XUL) load address 0x10e504000 + 0x29e0a01 [0x110ee4a01] is doing. It looks to be the last Thunderbird frame on the stack before the deadlock.
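For anyone collecting several of these reports, a saved "Sample Process" text can be checked for the same signature with a few lines of Python. This is a rough sketch: `find_cgl_deadlock` is a hypothetical helper, it assumes the plain-text layout shown in the reports in this bug, and the four-frame window is an arbitrary choice.

```python
def find_cgl_deadlock(report: str, window: int = 4) -> bool:
    """Return True if a CGLClearDrawable frame is followed shortly by a mutex wait.

    `report` is the text saved from Activity Monitor's "Sample Process".
    `window` (an assumed value) is how many deeper frames to inspect.
    """
    lines = report.splitlines()
    for i, line in enumerate(lines):
        if "CGLClearDrawable" not in line:
            continue
        # Frames printed after this one are deeper in the same stack.
        deeper_frames = lines[i + 1 : i + 1 + window]
        if any("__psynch_mutexwait" in frame for frame in deeper_frames):
            return True
    return False
```

A match only says the report resembles the ones posted here; it does not identify which thread holds the mutex.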

Please see my comment #194. I believe I got the "known good" build to fail. Do I misunderstand something? Perhaps we need to rewind a bit for our "known good build"?

(In reply to Christopher Schultz from comment #210)

Please see my comment #194. I believe I got the "known good" build 2017-06-07 to fail.

Thanks for reemphasizing this

User Story: (updated)
Summary: Hangs sending imap mail while copying message to imap Sent folder on Mac while displaying the progress bar. Deadlock in graphics on CGLClearDrawable. → Hangs sending mail while copying message to Sent folder on Mac-only while displaying the progress bar. Deadlock in graphics on CGLClearDrawable.

People may want to use the binary search.

That would be great. Unfortunately I don't think anyone reports this behavior as being reliably reproducible. That means a bad build can be wrongly judged good unless it is used for several days, so one individual doing a binary search might take several weeks to complete it.

Does anyone reliably reproduce this within a few hours? If not, then we need these reporters to divide and conquer, working back now from 2017-06-06.

We don't need traces. We don't need logs. We only need to know the dates of the daily builds that fail, and there is a list in comment 195 (which I have just corrected).

Richard took test#1 http://archive.mozilla.org/pub/thunderbird/nightly/2017/06/2017-06-06-03-02-05-comm-central/
We need others to pick test#2 - test#6.
Or maybe take an older one - http://archive.mozilla.org/pub/thunderbird/nightly/2017/05/2017-05-25-03-02-23-comm-central/

(In reply to Christopher Schultz from comment #206)

(In reply to Richard Leger from comment #203)

(In reply to Christopher Schultz from comment #198)

For anyone who can't get a thread dump, try this: while the color-wheel is spinning and before you Force-Quit, run the "Activity Monitor" application, choose "Thunderbird" (or "Daily") in the list of processes, click the gear-icon in the upper-left-hand corner of the window and choose "Sample Process".

Worth mentioning this advice is only applicable on Mac OS X systems and not Windows ;-)

I was under the impression that this whole issue was 100% MacOS. I don't think the Windows build of tb is using OpenGL, etc. Are Windows folks having deadlocks in OpenGL like the bug-title suggests? Or are Windows folks having otherwise unexplained lock-ups and just guessing that it's the same issue. GUI deadlocks across OSs almost never have the same root cause because the OSs are usually so different.

FYI, the deadlock in graphics on CGLClearDrawable may be 100% MacOS only (I am not in position to tell really), but the Hangs sending mail while copying message to Sent folder while displaying the progress bar is not a MacOS only issue... it has been experienced in various TB versions on Windows... that is why I help testing on Windows... but if you think that is not necessary, let me know and I'll stop testing and reporting...

(In reply to Wayne Mery (:wsmwk) from comment #211)

(In reply to Christopher Schultz from comment #210)

Please see my comment #194. I believe I got the "known good" build 2017-06-07 to fail.

Thanks for reemphasizing this

Wayne. It's worth noting that I've been running that build for several days, now, and I was only able to get it to fail a single time. It's been running without quitting the whole time and I've been using my email as usual. With the up-to-date builds, I was getting hangs maybe 10 times per day. So perhaps this is a race-condition or something like that where the old builds were susceptible, but the later builds are just MORE susceptible due to some combination of factors. Not really helpful, I know. :(
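One rough way to reason about how long a build has to run before it can be called good is sketched below. It rests on an assumption that nothing in this bug establishes: that hangs occur independently at a constant average rate, so the chance of seeing no hang in t days is exp(-rate * t). The function name `days_without_hang_needed` is hypothetical.

```python
import math


def days_without_hang_needed(hangs_per_day: float, confidence: float = 0.95) -> float:
    """Days of hang-free use needed before a bad build would likely have hung.

    Assumes hangs arrive independently at a constant average rate
    (a Poisson assumption, not something measured in this bug).
    """
    if hangs_per_day <= 0 or not 0 < confidence < 1:
        raise ValueError("rate must be positive and confidence strictly between 0 and 1")
    # Solve exp(-rate * t) = 1 - confidence for t.
    return -math.log(1.0 - confidence) / hangs_per_day
```

Under that assumption, a build hanging about 10 times per day shows itself within roughly 0.3 days, while a build hanging once every three days needs about 9 days of use before a clean run means much.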

When mine hangs with the (likely) MacOS-specific CGLClearDrawable call in the thread dump, the message is 100% sent and the sent-message is 100% saved to my IMAP server, and both windows (the composition window and the "sending message" window) close. It appears to only be a lock-up of the main window after all the other windows have closed.

Duplicate of this bug: 1592480

(In reply to Christopher Schultz from comment #210)

Please see my comment #194. I believe I got the "known good" build to fail. Do I misunderstand something? Perhaps we need to rewind a bit for our "known good build"?

I have been able to get tb 55.0a1 daily (date: 2017-06-07) to lock-up a second time after sending a message. It took days to do it, but it finally happened. It is indeed the old familiar deadlock involving a call through CGLClearDrawable.

The two threads in deadlock are "Compositor" (running MessageLoop::Run()) and the unnamed thread making the call to CGLClearDrawable, which looks like the main event-dispatch thread for the application.

I guess I have to back-up to a previous build to try again. Shall I just grab the previous day? Or was this search based upon some educated guesses as to where the flaw may have been introduced?

Chris, thanks for volunteering to test another build.

Coordination is clearly difficult, so I've put more details and suggested date assignments in the user story - which everyone should consider to be the "diary" for this bug.

User Story: (updated)
Flags: needinfo?(sjames)
User Story: (updated)

(Commenting on User Story)

User configs (name, computer, OS version, graphics, monitor(s)) :

  • Christopher Schultz ?

MacBook Pro (15-inch, 2018), 10.14.6 Mojave, Intel UHD Graphics 630 w/1536MiB+Radeon Pro 560X w/4GiB; built-in display (15.4-inch 2880x1800)

(In reply to Wayne Mery (:wsmwk) from comment #217)

I thought I commented that the beta build crashed for me? I have found what I think is the 2017-06-03 you wanted me to test, will try it out.

Thanks.

Flags: needinfo?(sjames)

(In reply to Scott from comment #219)
CGLClearDrawable Mutex crash on
"thunderbird-55.0a1.en-US.mac.dmg 54M 03-Jun-2017 11:08"
after about 2 hours.

I'll try the next one.

(In reply to Wayne Mery (:wsmwk) from comment #217)

I got crashes in June 3rd and June 2nd within an hour. I jumped to May 31st and it has so far lasted longer than the others. I will give it some more time then try June 1st.

Using a blank profile w/ no sent folder saving.

(In reply to Scott from comment #221)

(In reply to Wayne Mery (:wsmwk) from comment #217)

I got crashes in June 3rd and June 2nd within an hour. I jumped to May 31st and it has so far lasted longer than the others. I will give it some more time then try June 1st.

Using a blank profile w/ no sent folder saving.

OK, I switched back to June 1st this morning and got it to crash. So to recap:

June 3rd, 2nd and 1st daily builds all crash in an hour or less (let's say fewer than half a dozen emails sent in each). I have been running May 31st for about a day and a half without crashes so I will go back and continue to test that.

Anyone else want to try and confirm these findings by also testing the May 31st and June 1st daily builds?

Thanks for testing. Yes, we need to get the regression range down to one day. So confirming "31st May good, 1st June bad" would be very helpful.

(In reply to Jorg K (GMT+2) from comment #223)

Thanks for testing. Yes, we need to get the regression range down to one day. So confirming "31st May good, 1st June bad" would be very helpful.

Got a crash in the May 31st Build, I'm going to jump back a full week.

(In reply to Scott from comment #219)

(In reply to Wayne Mery (:wsmwk) from comment #217)

I thought I commented that the beta build crashed for me? I have found what I think is the 2017-06-03 you wanted me to test, will try it out.

To clarify again for others: we're at the stage where only testing of NIGHTLY builds is helpful. Thanks for actively working on this.

User Story: (updated)

I've gotten 2017-06-07 "daily" to crash a bunch of times, now. I know Scott has been getting this to happen more quickly than I have -- just reiterating that June 7th is definitely bad.

I'd like to try May 30th, but this directory only appears to contain log files and no actual builds:
http://archive.mozilla.org/pub/thunderbird/nightly/2017/05/2017-05-30-03-02-06-comm-central/
So I've backed-up to this build: http://archive.mozilla.org/pub/thunderbird/nightly/2017/05/2017-05-29-03-02-06-comm-central/thunderbird-55.0a1.en-US.mac.dmg

(In reply to Christopher Schultz from comment #226)

You can skip it. May 24th crashes too.

I'll go back another week.

Duplicate of this bug: 1561615
Duplicate of this bug: 1599553
Blocks: 1599553

How are results from versions older than May 17 or 24?

No longer blocks: 1599553
User Story: (updated)
Flags: needinfo?(sjames)
Flags: needinfo?(chris)
Blocks: 1599553

(In reply to Wayne Mery (:wsmwk) from comment #230)

How are results from versions older than May 17 or 24?

I was away on vacation for 10 days. I get failures all the way back to May 10th. I am currently testing the AM build (there was an AM and PM) of May 3rd.

Flags: needinfo?(sjames)

(In reply to Scott from comment #227)

How are results from versions older than May 17 or 24?

Just another confirmation that 2017-05-17 is locking up. It took several days (weeks?) to start, but today it's been locking up a lot.

I'll go back to 2017-05-01.

Flags: needinfo?(chris)

(In reply to Christopher Schultz from comment #232)

I'll go back to 2017-05-01.

I get crashes in 2017-04-01, I am currently testing March 1st.

User Story: (updated)

(In reply to Christopher Schultz from comment #232)

I'll go back to 2017-05-01.

2017-05-01 is unusable for me: every time I launch it, it re-downloads all email from all folders for all time from Gmail. I think I need to replace the CPU fan in my computer, now.

(In reply to Richard Leger from comment #213)

FYI, the deadlock in graphics on CGLClearDrawable may be 100% macOS-only (I am not really in a position to tell), but the hang while sending mail and copying the message to the Sent folder with the progress bar displayed is not a macOS-only issue... it has been experienced in various TB versions on Windows... that is why I help with testing on Windows... but if you think that is not necessary, let me know and I'll stop testing and reporting...

If it's easily reproduced, we still need to find the regression range, so please do keep on testing

(In reply to Scott from comment #233)

(In reply to Christopher Schultz from comment #232)

I'll go back to 2017-05-01.

I get crashes in 2017-04-01, I am currently testing March 1st.

Have you confirmed it is the same crash? How goes it with March 1?

(In reply to Christopher Schultz from comment #234)

(In reply to Christopher Schultz from comment #232)

I'll go back to 2017-05-01.

2017-05-01 is unusable for me: every time I launch it, it re-downloads all email from all folders for all time from Gmail. I think I need to replace the CPU fan in my computer, now.

Depending on Scott's results about March 1, can you coordinate the next calendar dates to test?

Flags: needinfo?(sjames)
Flags: needinfo?(chris)

MIGHT be on to something with March 1st. Haven't had a crash yet and I have been running it for close to 2 weeks. I think I'll go to weekly builds between April 1st and March 1st and see what I can find this week. I will be away the following 2 weeks.

All of my previous crashes have been CGLClearDrawable ones.

Flags: needinfo?(sjames)

Scott, which date are you going with first, so Christopher can pick a different date?

User Story: (updated)
Flags: needinfo?(sjames)

(In reply to Wayne Mery (:wsmwk) from comment #238)

Scott, which date are you going with first, so Christopher can pick a different date?

I'm starting with the 22nd... give me a day, I usually get crashes within an hour or two. So I should be able to narrow it down to a week of builds fairly quickly then we can break them up.

Flags: needinfo?(sjames)
See Also: → 413240

(Scott is making great progress. Hopefully we have a 1-2 day range by Wednesday or Thursday.)

Good results Scott?

Flags: needinfo?(sjames)

Back at it hopefully today. I seem to have lost track of whether I was testing the March 15th or March 8th build last. Hopefully I'll have it narrowed to a week soon.

Flags: needinfo?(sjames)

I got 2017-05-01 to lock-up, finally.

It's always fun re-downloading your whole email history from Gmail. I will eventually get banned. :(

I'm going to re-try with http://archive.mozilla.org/pub/thunderbird/nightly/2017/04/2017-04-08-00-40-03-comm-aurora/thunderbird-54.0a2.en-US.mac.dmg

2017-04-08 locked-up this morning in (well, beneath) CGLClearDrawable.

I'm backing up to http://archive.mozilla.org/pub/thunderbird/nightly/2017/04/2017-04-04-00-40-03-comm-aurora/thunderbird-54.0a2.en-US.mac.dmg

Hmm. I've looked back at the comments and I've apparently switched from "comm-central" to "comm-aurora". What is the difference, and should I be consistent?

Yes you need to be using nightly consistently. Aurora is what used to be alpha, and that has different code.

But "nightly" has a bunch of options for each day. Which of the e.g. 2017-04-04-* should I be using?

http://archive.mozilla.org/pub/thunderbird/nightly/2017/04/

There are lots of choices:

Dir 2017-04-04-00-40-03-comm-aurora-l10n/
Dir 2017-04-04-00-40-03-comm-aurora/
Dir 2017-04-04-03-02-02-comm-central-l10n/
Dir 2017-04-04-03-02-02-comm-central/
Dir 2017-04-04-03-02-03-comm-esr45/
Dir 2017-04-04-03-02-03-comm-esr52/
Dir 2017-04-04-03-03-28-comm-central/
Dir 2017-04-04-03-03-28-comm-esr45/
Dir 2017-04-04-03-03-28-comm-esr52/

comm-central is where the development happens, so "2017-04-04-03-02-02-comm-central"
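
To make that choice mechanical, here is a small sketch that filters an archive listing down to the comm-central builds for one day. `comm_central_dirs` is a hypothetical helper; the directory names are the ones listed above, and the name pattern is an assumption based only on those examples.

```python
import re

# Directory names look like "2017-04-04-03-02-02-comm-central/":
# build date, build time, then the branch name.
DIR_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2})-(\d{2}-\d{2}-\d{2})-(.+?)/?$")


def comm_central_dirs(names, day):
    """Return the comm-central build directories for one day, earliest first.

    Matching the branch exactly excludes the -l10n, aurora and esr variants.
    """
    picked = []
    for name in names:
        match = DIR_PATTERN.match(name.strip())
        if match and match.group(1) == day and match.group(3) == "comm-central":
            picked.append(name.strip().rstrip("/"))
    return sorted(picked)


listing = [
    "2017-04-04-00-40-03-comm-aurora-l10n/",
    "2017-04-04-00-40-03-comm-aurora/",
    "2017-04-04-03-02-02-comm-central-l10n/",
    "2017-04-04-03-02-02-comm-central/",
    "2017-04-04-03-02-03-comm-esr45/",
    "2017-04-04-03-02-03-comm-esr52/",
    "2017-04-04-03-03-28-comm-central/",
    "2017-04-04-03-03-28-comm-esr45/",
    "2017-04-04-03-03-28-comm-esr52/",
]
print(comm_central_dirs(listing, "2017-04-04"))
```

Note that this day has two comm-central directories (03-02-02 and 03-03-28); the one recommended above is the first. Whichever is used, it helps to record the full directory name in the test report so results stay comparable.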

(In reply to Wayne Mery (:wsmwk) from comment #241)

Good results Scott?

Ok - I get March 8th (comm-central/ - for clarity) to crash. And March 1st to maybe not crash (I usually get crashes in a couple of hours and I ran it cleanly for almost 2 weeks.)

I'm currently downloading the daily builds for the 2nd, 3rd, 4th, 5th, 6th and 7th.

Interestingly the night of the 7th is when it rolls over from v54 to v55 - this might prove to be significant.

I will start my testing on the 7th and work backwards, as it's easier/quicker for me to eliminate candidates that prove they work successfully. If anyone else wants to double/triple check the March 1st build and work forwards, that would be great.

Unsurprisingly, I got 2017-04-08 to lock-up in a similar way. I'll go back to 2017-03-02 to help bracket Wayne's researches.

Happens twice a day or more including today.. Running 68.4.1 (64-Bit). Really frustrating.

I got 2017-03-02 to lock up this evening, beneath CGLClearDrawable. Maybe I should try Communicator? ;)

(In reply to Christopher Schultz from comment #251)

I got 2017-03-02 to lock up this evening, beneath CGLClearDrawable. Maybe I should try Communicator? ;)

Interesting. Try March 1st. My 7th is still running after several days. We might need to go back further...!

I'm starting to think that this was introduced by a change in macOS and not a change in Thunderbird.

(In reply to Christopher Schultz from comment #253)

I'm starting to think that this was introduced by a change in macOS and not a change in Thunderbird.

I have a similar sentiment. I recall first encountering crashes in the fall... of I think 2018. But this could also be explained by not being up to date on Thunderbird releases. My office also has Macs with various different OS versions - as I am not on site anymore, I can't do a detailed analysis of who is getting crashes with what version, but I can also see this impacting people's update schedules.

I'm starting to think that this was introduced by a change in macOS and not a change in Thunderbird.

An interesting idea. It could even be hardware related. But if it is our code, that would be consistent with this being first reported with 54 beta and not seeing majority of reports until version 60 when newer version 5<something> code hit the larger user population.

To exhaust the code regression idea we'd need to test version 53 and 54.
http://archive.mozilla.org/pub/thunderbird/nightly/2017/01/2017-01-24-03-02-12-comm-central/ is roughly the earliest 54 nightly.
http://archive.mozilla.org/pub/thunderbird/nightly/2016/11/2016-11-15-03-02-11-comm-central/ is roughly earliest 53 nightly.

User Story: (updated)
Flags: needinfo?(sjames)
Flags: needinfo?(chris)

(In reply to Wayne Mery (:wsmwk) from comment #255)

I'm starting to think that this was introduced by a change in macOS and not a change in Thunderbird.

An interesting idea. It could even be hardware related. But if it is our code, that would be consistent with this being first reported with 54 beta and not seeing majority of reports until version 60 when newer version 5<something> code hit the larger user population.

To exhaust the code regression idea we'd need to test version 53 and 54.
http://archive.mozilla.org/pub/thunderbird/nightly/2017/01/2017-01-24-03-02-12-comm-central/ is roughly the earliest 54 nightly.
http://archive.mozilla.org/pub/thunderbird/nightly/2016/11/2016-11-15-03-02-11-comm-central/ is roughly earliest 53 nightly.

No big updates from me, I am still running March 7 stable. But Christopher has reported a crash on March 2nd.

Flags: needinfo?(sjames)
Attached file tb_v68_4_1_crash.txt

Another hang-while-sending. Eventually manually force-quit. On restart, confirmed that the email had been sent correctly (as usual). This has happened several times over this last week now, fyi, each with same pattern as before. Write email and click send without any problems. The "Sending" Dialog box finishes sending the email and disappears. Then Thunderbird immediately locks, with beach-ball, until I eventually give up waiting and force-quit.

TB: 68.4.1
MacOSX: 10.14.6

:wsmwk,

Hey there. I note that in all of my hangs, the "sending email" dialog box with progress bar successfully completes sending, and successfully disappears from screen. The hang for me happens immediately after that dialog box clears. However, I just noticed that the summary for this ticket describes having the dialog box still displayed when hanging!?!

Are these the same issue or two different issues?

Flags: needinfo?(vseerror)

Are these the same issue or two different issues?

Interesting observation. I have no technical expertise here. But I think your comment does further suggest this is a graphics issue and not a thunderbird issue.

Flags: needinfo?(vseerror)

This bug is 3 years old. I filed essentially the same bug, but I don't see it in my Dashboard so it may have been closed as a duplicate or something. At the time I provided stack dumps that matched these.

In my experience (which is considerable -- I led the engineering teams for iMovie and iPhoto at Apple), this is a "race condition" bug, not a graphics bug or an OS bug or whatever else has been proposed. It is deadlocking in a mutex, probably around the progress bar, not the mail delivery, which always succeeds.
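
To make the failure mode concrete, here is a minimal sketch of the classic way two threads deadlock on mutexes: each takes the same two locks in the opposite order. This is an illustration only, not Thunderbird code; `lock_a` and `lock_b` are hypothetical stand-ins, and which locks are actually involved in this bug is not known. A barrier forces the unlucky interleaving every run, and a timeout stands in for "blocks forever" so the demo terminates.

```python
import threading

lock_a = threading.Lock()  # hypothetical stand-in, e.g. a UI-side lock
lock_b = threading.Lock()  # hypothetical stand-in, e.g. a graphics-side lock
barrier = threading.Barrier(2)
results = {}


def worker(name, first, second):
    with first:
        barrier.wait()  # both threads now hold their first lock
        # With a plain blocking acquire this would hang forever.
        got_second = second.acquire(timeout=0.5)
        results[name] = got_second
        barrier.wait()  # keep holding `first` until both attempts are done
        if got_second:
            second.release()


t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
t1.start()
t2.start()
t1.join()
t2.join()
print(sorted(results.items()))  # [('t1', False), ('t2', False)]
```

Without the barrier the two threads usually miss each other and everything works, which is why this kind of bug can sit unnoticed for a long time and then appear when timing changes.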

I don't think you're going to find/fix this bug by regression analysis, as though the bug were somehow introduced at some point. I think it has been there for a long time, and is a design bug. Race conditions are like that.

The only way to fix it, in my opinion, is to look carefully at the code, specifically where the mutexes are established and released. It takes some hard thinking and careful looking, but mutexes are inherently hard to debug.

Suggestions:

  1. Remove the mutexes altogether. Are they really necessary? I've seen very few UI/graphics/progress bar interactions that require mutexes. Presumably they are used because of multiple threads, but if (as should be) only one thread -- the main thread, typically -- is handling UI updates, then a mutex shouldn't be necessary.
  2. Deliberately introduce a delay into one or the other of the threads that is locking the mutex, to see if it can be consistently reproduced.
  3. Add trace/log statements around the lock/unlock of the mutexes so you can see any close timing / race conditions (important to flush stdout as blocked threads that write to log files don't always show up in the log files due to output buffering).
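
A rough sketch of suggestions 2 and 3 together, written in Python for brevity rather than in Thunderbird's own code: `TracedLock` is a hypothetical wrapper that logs every lock and unlock with the thread name, flushes after each line, and can optionally sleep while holding the lock to widen the race window.

```python
import io
import sys
import threading
import time


class TracedLock:
    """Wrap threading.Lock and trace each step (illustrative only)."""

    def __init__(self, name, out=sys.stderr, delay=0.0):
        self._lock = threading.Lock()
        self._name = name
        self._out = out
        self._delay = delay  # suggestion 2: deliberate delay while holding the lock

    def _trace(self, event):
        thread = threading.current_thread().name
        self._out.write(f"{time.monotonic():.6f} {thread} {event} {self._name}\n")
        self._out.flush()  # flush now; a thread that blocks next cannot do it later

    def __enter__(self):
        self._trace("waiting-for")
        self._lock.acquire()
        self._trace("acquired")
        if self._delay:
            time.sleep(self._delay)
        return self

    def __exit__(self, exc_type, exc, tb):
        self._trace("releasing")
        self._lock.release()
        return False  # do not swallow exceptions


log = io.StringIO()  # stands in for a log file or stderr
progress_lock = TracedLock("progress", out=log, delay=0.01)


def update_progress():
    with progress_lock:
        pass  # the protected work would go here


threads = [threading.Thread(target=update_progress, name=f"worker-{i}") for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(log.getvalue(), end="")
```

In a real hang, the useful evidence is a trailing "waiting-for" line with no matching "acquired" line: it names the blocked thread and the lock it was waiting on.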

In your bug 1400568 you wrote "This is recent bug, as of 52.3.0, never happened before". So in that bug (which is duped to this one), and here, we have proceeded on the assumption it's a regression - regardless of whether it's a race condition or not.

Looking back, there is also bug 1422251 and bug 1440716. So you are not the only person to have reported this issue against version 52. Still, only 3-4 people reported the issue for all or most of version 52. So changes in newer versions made the situation worse. But certainly it's possible that part of the underlying issue predates any of these bug reports, and the regression hunt is a waste.

It still may have been introduced in 52, but if that is the theory, then there's little point in testing 53, 54, etc.

Not sure what source code control system you guys use, but GitHub has pretty nice "diff" tools and inspecting code changes in/around the area in question in the timeframe of 52.3.0 might shed some clues.

(In reply to John O'Duinn [:joduinn] (please use "needinfo?" flag) from comment #258)

:wsmwk,

Hey there. I note that in all of my hangs, the "sending email" dialog box with progress bar successfully completes sending, and successfully disappears from screen. The hang for me happens immediately after that dialog box clears. However, I just noticed that the summary for this ticket describes having the dialog box still displayed when hanging!?!

I have never had a stray window hanging around when the lock-up occurs. The mail-send operation appears to have 100% completed. The lock-up occurs when trying to work with the main window after the completion of the send/copy-to-sent/etc. operation and all temporary windows have closed (for me).

Hello all (and John):

I'll add my two cents' worth here. I also do not see any window hanging around after I send a message. It clears, and then if I leave TB alone for a period of time (which I've not been able to determine), operation continues normally. If, on the other hand, I'm foolish enough to go right back into TB and start another message, I'm entertained by the beach-ball until I get bored and perform a force quit.

Sincerely,
Bob

(In reply to Glenn Reid from comment #262)

It still may have been introduced in 52, but if that is the theory, then there's little point in testing 53, 54, etc.

We've been working back in time, not forward in time.

Not sure what source code control system you guys use

hg

but GitHub has pretty nice "diff" tools and inspecting code changes in/around the area in question in the timeframe of 52.3.0 might shed some clues.

Good idea. Which two versions should we run a diff against?

(Hint: that's what we are trying to determine, so we can actually look at some targeted code changes instead of just "hey, what changed during the decade around release 52?")

(In reply to Christopher Schultz from comment #265)

(In reply to Glenn Reid from comment #262)

It still may have been introduced in 52, but if that is the theory, then there's little point in testing 53, 54, etc.

We've been working back in time, not forward in time.

Looking back through this bug's history, I see regression testing comments for 53, 54, and 55.

but GitHub has pretty nice "diff" tools and inspecting code changes in/around the area in question in the timeframe of 52.3.0 might shed some clues.

Good idea. Which two versions should we run a diff against?

(Hint: that's what we are trying to determine, so we can actually look at some targeted code changes instead of just "hey, what changed during the decade around release 52?")

Based on the stack trace(s) the section of code should be narrow. Can't be that many changes to that code. Pick a version before it (52.2) and after it (52.3).

I know you feel like I am intruding and telling you how to find the bug. I am also noting that in three years you have not found the bug, so I'm just trying to offer suggestions. I have found bugs like this in my career. It is not easy. My previous suggestions are probably more useful than diff'ing, based on your comments.

(In reply to Glenn Reid from comment #266)

I know you feel like I am intruding and telling you how to find the bug. I am also noting that in three years you have not found the bug, so I'm just trying to offer suggestions. I have found bugs like this in my career. It is not easy. My previous suggestions are probably more useful than diff'ing, based on your comments.

For what it's worth, I'm not a tb developer, just a user. So I've been responding to requests from the tb developers to get them more information; specifically, trying to narrow-down a before/after time where the bug can and cannot be reproduced.

My (wild) assertion about an underlying change in macOS "causing" this issue wasn't suggesting that macOS actually has the bug. It's much more likely that some change in macOS has simply made this existing bug in tb more obvious and made it occur more frequently.

I agree with you that it's very likely to be an improper lock-management situation within tb.
