Closed Bug 825513 Opened 7 years ago Closed Last year

crash in nsImapProtocol::GetMessageSize via nsIMAPBodyShell::Generate. m_runningUrl reset for unknown reason and we keep going through nsIMAPBodyShell::Generate

Categories

(MailNews Core :: Networking: IMAP, defect, critical)

x86
All
defect
Not set
critical

Tracking

(thunderbird_esr6063+ fixed, thunderbird63 wontfix, thunderbird64 fixed)

RESOLVED FIXED
Thunderbird 64.0
Tracking Status
thunderbird_esr60 63+ fixed
thunderbird63 --- wontfix
thunderbird64 --- fixed

People

(Reporter: wsmwk, Assigned: mkmelin)

References

(Blocks 1 open bug)

Details

(5 keywords, Whiteboard: [regression:TB19xx][needs deep code inspection])

Crash Data

Attachments

(1 file, 4 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-b10f5e8e-35b9-4bf4-a3f6-0b6572121229 .
============================================================= 
TB19.0a2

0	xul.dll	nsImapProtocol::GetMessageSize	mailnews/imap/src/nsImapProtocol.cpp:4489
1	xul.dll	nsIMAPBodyShell::Generate	mailnews/imap/src/nsIMAPBodyShell.cpp:220
2	xul.dll	nsImapServerResponseParser::ProcessOkCommand	mailnews/imap/src/nsImapServerResponseParser.cpp:430
3	xul.dll	nsImapServerResponseParser::ParseIMAPServerResponse	mailnews/imap/src/nsImapServerResponseParser.cpp:261
4	xul.dll	nsImapProtocol::ParseIMAPandCheckForNewMail	mailnews/imap/src/nsImapProtocol.cpp:1885
5	xul.dll	nsImapProtocol::FetchMessage	mailnews/imap/src/nsImapProtocol.cpp:3573
6	xul.dll	nsImapProtocol::FetchTryChunking	mailnews/imap/src/nsImapProtocol.cpp:3609
7	xul.dll	nsImapProtocol::FallbackToFetchWholeMsg	mailnews/imap/src/nsImapProtocol.cpp:3309
8	xul.dll	nsIMAPBodyShell::Generate	mailnews/imap/src/nsIMAPBodyShell.cpp:223
9	xul.dll	nsImapProtocol::ProcessSelectedStateURL	mailnews/imap/src/nsImapProtocol.cpp:2642
10	xul.dll	nsImapProtocol::ProcessCurrentURL	mailnews/imap/src/nsImapProtocol.cpp:1724
11	xul.dll	nsImapProtocol::ImapThreadMainLoop	mailnews/imap/src/nsImapProtocol.cpp:1345
12	xul.dll	nsImapProtocol::Run	mailnews/imap/src/nsImapProtocol.cpp:1023 

4489 else
4490 m_runningUrl->AllocateCanonicalPath(
4491 folderFromParser,kOnlineHierarchySeparatorUnknown,
4492 &folderName); 

TB19.0a1 bp-3c9b0f14-6649-4152-9d5f-4530a2121119

Earliest crash with this signature is bp-645be617-d667-47a7-9284-252fc2120524 from TB12.  I can't determine that this is a regression or not.
It's #9 crasher in TB 23.0b1 over the last four weeks.
#16 in TB24.0.  Definitely tracks to version 19.
But *might* be a limited number of users. Will update after socorro is fixed - Socorro column sorting broken and I'm not going to eyeball the list.

bp-75d542a7-032f-4052-b98a-2ff6f2130920
0	xul.dll	nsImapProtocol::GetMessageSize(char const *,bool)	mailnews/imap/src/nsImapProtocol.cpp
1	xul.dll	nsIMAPBodyShell::Generate(char *)	mailnews/imap/src/nsIMAPBodyShell.cpp
2	xul.dll	nsImapServerResponseParser::ProcessOkCommand(char const *)	mailnews/imap/src/nsImapServerResponseParser.cpp
3	xul.dll	nsImapServerResponseParser::ParseIMAPServerResponse(char const *,bool,char *)	mailnews/imap/src/nsImapServerResponseParser.cpp 

hg@0 4426 const char *folderFromParser = GetServerStateParser().GetSelectedMailboxName();
hg@0 4427 if (folderFromParser && messageId)
hg@0 4428 {
hg@0 4429 char *id = (char *)PR_CALLOC(strlen(messageId) + 1);
hg@0 4430 char *folderName;
ehsan@13324 4431 uint32_t size;
hg@0 4432
hg@0 4433 PL_strcpy(id, messageId);
hg@0 4434
mconley@13187 4435 nsIMAPNamespace *nsForMailbox = nullptr;
hg@0 4436 m_hostSessionList->GetNamespaceForMailboxForHost(GetImapServerKey(), folderFromParser,
hg@0 4437 nsForMailbox);
hg@0 4438
hg@0 4439
hg@0 4440 if (nsForMailbox)
hg@0 4441      m_runningUrl->AllocateCanonicalPath(
hg@0 4442           folderFromParser, nsForMailbox->GetDelimiter(),
hg@0 4443               &folderName);
Crash Signature: [@ nsImapProtocol::GetMessageSize(char const*, bool)] → [@ nsImapProtocol::GetMessageSize] [@ nsImapProtocol::GetMessageSize(char const*, bool)]
OS: Windows NT → All
Whiteboard: [regression:TB19xx]
I can't see anything specific in the bug list that would trigger this. I suppose it could be another symptom of the same underlying problem that caused bug 723788; as with that bug, we're tripping across a null pointer deep in a call tree where we expect the field to always have a value.

We could crash-proof nsImapProtocol::GetMessageSize() by null checking m_runningUrl before the AllocateCanonicalPath() call, but I expect that would just shift the failure somewhere else.

The most unusual thing about this call tree is that we're recursing through nsIMAPBodyShell::Generate around mailnews/imap/src/nsIMAPBodyShell.cpp:220-223, which seems deeply wrong to me, but I don't understand that code well enough to know what really should be happening.
Flags: needinfo?(irving) → needinfo?(mozilla)
(#1 imap crash)

(In reply to :Irving Reid from comment #4)
> I can't see anything specific in the bug list that would trigger this. I
> suppose it could be another symptom of the same underlying problem that
> caused bug 723788; as with that bug, we're tripping across a null pointer
> deep in a call tree where we expect the field to always have a value.
> 
> We could crash-proof nsImapProtocol::GetMessageSize() by null checking
> m_runningUrl before the AllocateCanonicalPath() call, but I expect that
> would just shift the failure somewhere else.

Yeah. 

I'll try to get someone to reproduce. Another hard slog - no one from my 10/29 contacts can do it. The only one who had a clue indicated it took a long time to "send" the message which crashed - he sent a couple others while it was in progress
Keywords: testcase-wanted
Blocks: 663328
See Also: → 723888
I am not able to find a user testcase thus far.
Summary: crash in nsImapProtocol::GetMessageSize → crash in nsImapProtocol::GetMessageSize via nsIMAPBodyShell::Generate
(In reply to :Irving Reid from comment #4)
 
> We could crash-proof nsImapProtocol::GetMessageSize() by null checking
> m_runningUrl before the AllocateCanonicalPath() call, but I expect that
> would just shift the failure somewhere else.

I can totally agree.

But nsImapUrl::AllocateCanonicalPath() can be rewritten without m_runningUrl instance.
The method uses only a member variable (m_serverKey) of its class, which is also held in nsImapProtocol.
What do you think?
Flags: needinfo?(irving)
That doesn't feel like the right approach here; I'm guessing that the problem is some other thread trying to cancel, close or time out this IMAP operation, and resetting m_runningUrl to NULL as part of that. If the operation we're trying to perform is partly cancelled, we should fail rather than trying to find another way to continue.
Flags: needinfo?(irving)
(In reply to :Irving Reid from comment #4)
> ...
> The most unusual thing about this call tree is that we're recursing through
> nsIMAPBodyShell::Generate around
> mailnews/imap/src/nsIMAPBodyShell.cpp:220-223, which seems deeply wrong to
> me, but I don't understand that code well enough to know what really should
> be happening.

Bienvenu isn't active so removing needinfo.  Irving, do you want to PM him?
Flags: needinfo?(mozilla) → needinfo?(irving)
Another one where I think we just need to wait for someone to step up and learn the code well enough to fix it.
Flags: needinfo?(irving)
In case it helps, notable fixes in 

TB19 (not including uplifts)
* bug 495318 add support for LIST (SUBSCRIBED) part of IMAP LIST-EXTENDED command (RFC 5258)
* bug 723888 [regression:tb10] SyncRunnable5<nsIImapMailFolderSink, nsIImapProtocol* ... typically working with folder subscribe
* from https://bugzilla.mozilla.org/buglist.cgi?f1=short_desc&list_id=12000050&o1=nowordssubstr&resolution=FIXED&classification=Client%20Software&classification=Components&query_format=advanced&f2=OP&component=Networking%3A%20IMAP&target_milestone=Thunderbird%2019.0&product=MailNews%20Core&product=Thunderbird

TB18 (not including uplifts)
* bug 471492 [bulkoperations][regression action in 3.1?] crash [@ nsImapMailFolder::CopyMessagesOffline(nsIMsgFolder*, nsIArray*, int, nsIMsgWindow*, nsIMsgCopyServiceListener*)] when moving/deleting/copying many messages
* bug 721316 Use Gmail IMAP Extension of X-GM-MSGID, X-GM-THRID, X-GM-LABELS in addition to XLIST
* from https://bugzilla.mozilla.org/buglist.cgi?f1=short_desc&list_id=12000052&o1=nowordssubstr&resolution=FIXED&classification=Client%20Software&classification=Components&query_format=advanced&f2=OP&component=Networking%3A%20IMAP&target_milestone=Thunderbird%2018.0&product=MailNews%20Core&product=Thunderbird
Whiteboard: [regression:TB19xx] → [regression:TB19xx][needs deep code inspection]
(In reply to :Irving Reid (No longer working on Firefox) from comment #4)
> 
> We could crash-proof nsImapProtocol::GetMessageSize() by null checking
> m_runningUrl before the AllocateCanonicalPath() call, but I expect that
> would just shift the failure somewhere else.

m_runningUrl is one of those variables, like the db in the folder, that gets reset for unknown reasons, and so code has typically added a check to prevent crashes. I don't see a good reason not to do that here as well. I'm not claiming that will fix the underlying bug, but in most new code we typically provide defensive checks, so why not here as well? It has very little risk.
m_kato, is this something you could do?
Flags: needinfo?(m_kato)
null check won't good fix.  Maybe, ReleaseUrlState is called by any reason, then m_runningURl is null.

But I cannot find root cause for this issue.
Flags: needinfo?(m_kato)
Alfred, is this something you could look at?

Most crash comments mention attachments
Flags: needinfo?(infofrommozilla)
Depends on: 1016524
pretty similar to bug 558452
Blocks: 558452
Flags: needinfo?(infofrommozilla)
FYI, issue still happening recently...
bp-de868212-79f2-4569-9829-5fc9e2170228   28/02/2017
See Also: → 628646
Richard, are you/your people still seeing crashes?
Flags: needinfo?(richard.leger)
Summary: crash in nsImapProtocol::GetMessageSize via nsIMAPBodyShell::Generate → crash in nsImapProtocol::GetMessageSize via nsIMAPBodyShell::Generate. m_runningUrl reset for unknown reason and we keep going through nsIMAPBodyShell::Generate
(In reply to Wayne Mery (:wsmwk) from comment #19)
> Richard, are you/your people still seeing crashes?

Since bug report bp-b7d62762-9429-46c9-8833-477de0180226 26/02/2018 I haven't seen a crash yet. 
The crash occurred while I was trying go move folders in my mailbox (IMAP)... also afterwards I had issue to list content of subfolder but I don't know if it is linked to it or not. Basically clicking on a folder was not changing the view to show the folder content. That happened coupled of time already randomly... forcing me to quit and re-start Thunderbird to fix issue...
Flags: needinfo?(richard.leger)
Do you have people on beta 59?
Flags: needinfo?(vseerror)
> Since bug report bp-b7d62762-9429-46c9-8833-477de0180226 26/02/2018 I haven't seen a crash yet. 

bp-b7d62762-9429-46c9-8833-477de0180226  is  [@ shutdownhang | arena_t::MallocSmall | nsCOMPtr_base::assign_with_AddRef | mozilla::EventQueue::PutEvent ] which does not have a bug report

Richard, do you have people on beta 59?
Flags: needinfo?(vseerror) → needinfo?(richard.leger)
I am using 59 beta branch on my laptop currently and will upgrade to 60 when available this week... will send bug report if any crash occurs...
Flags: needinfo?(richard.leger)
(In reply to Richard Leger from comment #23)
> I am using 59 beta branch on my laptop currently and will upgrade to 60 when
> available this week... will send bug report if any crash occurs...

How is it working out?
Flags: needinfo?(richard.leger)
No crash reported since 19/03/2018... 

I have been using 60.0b5 (32-bit) for one to two weeks (not recalling exactly when update occurred) and no crash yet. 

The LDAP Address-book issue (addressed in a separate bug) has not be solved yet though. 

Also noted but not necessarily introduced in the 60.x branch version Thunderbird performance is obviously not improving. Lag overtime and especially delay at start-up is a real pain... between Inbox indexing and Lightning network calendar items populating in the UI... it takes me easily 8-10mn waiting before I can do anything and read my first email :-(( and that also is not improving over time...

After May 9, 2017 when the Mozilla Foundation had finally decided to serve as the legal and fiscal home for the Thunderbird project, and have Thunderbird migrated off Mozilla Corporation infrastructure, separating the operational aspects of the project, I had great hope for Thunderbird development... but as of today, I don't feel this decision had any impact on improving TB, and I even start to wonder if I should consider testing other email clients and move away from Thunderbird all together... despite great works from developers to keep it afloat...
Flags: needinfo?(richard.leger)
(In reply to Richard Leger from comment #25)
> No crash reported since 19/03/2018... 
> 
> I have been using 60.0b5 (32-bit) for one to two weeks (not recalling
> exactly when update occurred) and no crash yet. 

That's a nice change :)


> Also noted but not necessarily introduced in the 60.x branch version
> Thunderbird performance is obviously not improving. Lag overtime and
> especially delay at start-up is a real pain... between Inbox indexing and
> Lightning network calendar items populating in the UI... it takes me easily
> 8-10mn waiting before I can do anything and read my first email :-(( and
> that also is not improving over time...

Not related to this bug so we should deal with this elsewhere, but I haven't seen you mention performance issues since 2016 in an old bug report.  A starting point is https://wiki.mozilla.org/Thunderbird:Testing:Memory_Usage_Problems  Note, a testcase where calendar addon is not involved would be more interesting.


> After May 9, 2017 when the Mozilla Foundation had finally decided to serve
> as the legal and fiscal home for the Thunderbird project, and have
> Thunderbird migrated off Mozilla Corporation infrastructure, separating the
> operational aspects of the project, 

It's off topic for this bug, but I don't know why you would think non-code related changes (which all of those are) would impact code development.
See Also: → 1264302
(In reply to Wayne Mery (:wsmwk) from comment #26)
> (In reply to Richard Leger from comment #25)
> > No crash reported since 19/03/2018... 
> > 
> > I have been using 60.0b5 (32-bit) for one to two weeks (not recalling
> > exactly when update occurred) and no crash yet. 
> 
> That's a nice change :)

20 days more and no reports as far as I can tell. That's impressive.

But still a #5 crash for 60.0b6 release :(
Several users with this signature also crash OOM. Example  bp-8f31ab8e-ec5d-420c-9bb6-2123e0180613 OOM | small  and bp-ce74a455-f4c0-431d-8bb0-931c40180531  nsImapProtocol::GetMessageSize
FYI, today I have reverted some settings in my TB to their factory default. In case that may have (had) an effect on this bug or not... https://bugzilla.mozilla.org/show_bug.cgi?id=1264302#c26
This crash bp-174944e6-3f59-42c9-b758-c9fcb0180823 occurred in TB 60.0b10 (32-bit) and seems linked to this bug.

After hitting reply button on a message, I was typing text in body while TB crashed unexpectedly...

Before that, TB was very slow and erratic - not loading (blank) or very slow loading of message content in the Inbox view...

Before that, I tried to send a big message that sent ok, but would not save to Sent items again... waited for a long time but still not saving or cancelling processing by itself... finally despite not being disconnected (I could still access server by other means - ping response time to server OK) I restarted my VPN connection (over 4G conn) successfully... and browse multiple folders in my TB Inbox to trigger IMAP conn and login to the server... after a while TB manage to delete my sent msg in Draft (as per status bar indication) which allowed TB to re-attempt automatically to save the sent msg into Sent folder (on the server) which this time worked successfully. No issue reported on the IMAP server at the time.

The slow loading of email content has also been noticed randomly on normal conn (ADSL without VPN), need to switch between multiple message to re-trigger the loading... happens to all type of message, but especially noticed if big attachment are involved in the message... msg content remain blank till attachments are loaded...

Also noted during the same day:
- TB main UI windows disappeared twice by itself (look like closed itself somehow) while other TB windows (edit msg, calender reminders, etc...) remained opened.
- When closing all the remaining Windows and try to re-opened TB, it failed saying it was already running... process still appearing in Task Manager with memory keeping increasing over time... had to kill process in order to be able to re-open TB.

One positive note is that I seems to have notice improvement in speed when TB starts up, Inbox still not updating while Calendar stuff loading but it now seems quicker till the point I can read emails... to be confirmed. Could also be due to some improvement in Windows OS (or video driver?) as my machine has slightly increase in speed recently especially to open software and windows in general (more snappy). Not directly relevant to this bug but worth mentioning...

Hope that help.
(In reply to Richard Leger from comment #31)

> One positive note is that I seems to have notice improvement in speed when
> TB starts up, Inbox still not updating while Calendar stuff loading but it
> now seems quicker till the point I can read emails... to be confirmed. 

Confirmed. It now takes about ~40seconds (still long but workable) from click on TB icon to being able to read first email... compared to many minutes previously... that is a huge improvement. 

I am wondering if reducing the number of emails from ~8000 to ~3500 in Inbox (and ~13000 to ~8000 in Sent) may have helped, as it reduce the indexing requirement at start-up... just a thought... 

Though as noticed, Activity Manager now remains empty during the start-up process so it might mean that TB no longer try to do indexing immediately at start-up anymore...

BIG Thanks to all the developers that work on improving TB performance... you make users happy!
Those numbers of messages are peanuts to TB, you can easily have 100000 per folder and opening such oflder should take 1-2 seconds. Also there should ideally be NO indexing activity on the startup. New messages should be indexed once upon arrival with the Global indexer (if you have it enabled in Options) and then no more.
Unless there is a bug :)
Or you mean some long synchronizing of IMAP folders with the server.
This and bug 558452 are just as strong in version 60 (no surprise).

As for performance, there should be further imap related speed improvements in the future - hopefully either 60.3.0 or 60.4.0.
Crash Signature: [@ nsImapProtocol::GetMessageSize] [@ nsImapProtocol::GetMessageSize(char const*, bool)] → [@ nsImapProtocol::GetMessageSize]
(In reply to Richard Leger from comment #31)
> This crash bp-174944e6-3f59-42c9-b758-c9fcb0180823 occurred in TB 60.0b10
> (32-bit) and seems linked to this bug.

https://hg.mozilla.org/releases/comm-esr60/annotate/tip/mailnews/imap/src/nsImapProtocol.cpp#l4575

->  m_hostSessionList->GetNamespaceForMailboxForHost(GetImapServerKey(), folderFromParser,


m_hostSessionList is null then
Add null checks and see where that takes us.
Assignee: nobody → mkmelin+mozilla
Status: NEW → ASSIGNED
Attachment #9014902 - Flags: review?(jorgk)
Attached patch 825513-crash.patch (obsolete) — Splinter Review
Can we do a bit of boy-scout clean-up while we're here? This function looked just horrific.

(Unrelated: What happened to bug 1444389. And any cycles for bug 1495898?)
Attachment #9014902 - Attachment is obsolete: true
Attachment #9014902 - Flags: review?(jorgk)
Attachment #9014906 - Flags: review+
Attachment #9014906 - Flags: feedback?(mkmelin+mozilla)
Comment on attachment 9014906 [details] [diff] [review]
825513-crash.patch

Review of attachment 9014906 [details] [diff] [review]:
-----------------------------------------------------------------

Heh, that's quite an ugly function

::: mailnews/imap/src/nsImapProtocol.cpp
@@ +4664,5 @@
>    const char *folderFromParser = GetServerStateParser().GetSelectedMailboxName();
> +  if (!folderFromParser || !messageId || !m_runningUrl || !m_hostSessionList)
> +    return 0;
> +
> +  char *id = (char *)PR_CALLOC(strlen(messageId) + 1);

If you like feel free to clean up the id and folderName to be nsAutoCString too
Attachment #9014906 - Flags: feedback?(mkmelin+mozilla) → feedback+
Attached patch 825513-crash.patch (v2) (obsolete) — Splinter Review
Hmm, I didn't change anything to auto-strings, but I removed the duplicating of the ID passed in :-(
Attachment #9014906 - Attachment is obsolete: true
Attachment #9015069 - Flags: review+
Attached patch 825513-crash.patch (v2b) (obsolete) — Splinter Review
Can be simplified even further.
Attachment #9015069 - Attachment is obsolete: true
Attachment #9015070 - Flags: review+
Initialising the folder name is also a good idea since there is no error checking on the call that populates it.
Attachment #9015070 - Attachment is obsolete: true
Attachment #9015071 - Flags: review+
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/cd7b757fff58
Fix crash in nsImapProtocol::GetMessageSize() by adding null check for m_runningUrl && m_hostSessionList. r=jorgk
Status: ASSIGNED → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 64.0
Attachment #9015071 - Flags: approval-comm-esr60?
Attachment #9015071 - Flags: approval-comm-beta+
Attachment #9015071 - Flags: approval-comm-esr60? → approval-comm-esr60+
Overall the crash rate has increased 5-10% over the past 6 months.

But, because of their high crash rates, we should have very quick insight with beta 64 (3-4 days) and 60.3.0 (within 1-2 days) as to whether this patch helps.
Flags: needinfo?(vseerror)
Preliminary, crsah rate might be down slightly, but certainly not significantly.  That is with only maybe 15-20% of users on 60.3.x.  We have more serious updating to happen
https://crash-stats.mozilla.com/signature/?product=Thunderbird&signature=nsImapProtocol%3A%3AGetMessageSize&date=%3E%3D2018-08-16T04%3A56%3A30.000Z&date=%3C2018-11-16T03%3A56%3A30.000Z#graphs
Flags: needinfo?(vseerror)
recent builds
bp-8acb8015-f617-4390-b678-096a20181115 60.3.1
bp-1a474dcf-5cc5-4538-8d42-8dd4f0181115 60.3.0
Flags: needinfo?(mkmelin+mozilla)
uint32_t nsImapProtocol::GetMessageSize(const char * messageId, bool idsAreUids)
{
  const char *folderFromParser = GetServerStateParser().GetSelectedMailboxName();
  if (!folderFromParser || !messageId || !m_runningUrl || !m_hostSessionList)
    return 0;

  char *folderName = nullptr;
  uint32_t size;

  nsIMAPNamespace *nsForMailbox = nullptr;
  m_hostSessionList->GetNamespaceForMailboxForHost(GetImapServerKey(), <=== line 4574
                                                   folderFromParser,
                                                   nsForMailbox);

I don't see how that line can crash now since we're testing for null right above.

But maybe m_hostSessionList has some stale/free'ed value, see the 0xffffffffe5e5e5e5 in the reports.

I don't particularly like this line in the code:
  m_hostSessionList = aHostSessionList; // no ref count...host session list has life time > connection
And in the .h file:
  nsIImapHostSessionList * m_hostSessionList;

Something like this is asking for trouble. So something our forebears didn't consider :-(

Shall we file a new bug now?
Yes please.
Flags: needinfo?(mkmelin+mozilla)
No longer blocks: 558452
Blocks: 1565517
You need to log in before you can comment on or make changes to this bug.