Closed Bug 1598587 Opened 4 years ago Closed 3 years ago

hitting assertion "selection count is wrong" in nsMsgDBView::GetIndicesForSelection

Categories

(MailNews Core :: Backend, defect)

defect
Not set
normal

Tracking

(thunderbird_esr78 wontfix)

RESOLVED FIXED
91 Branch
Tracking Status
thunderbird_esr78 --- wontfix

People

(Reporter: mkmelin, Assigned: benc)

Details

Attachments

(1 file)

Bug 1562158 would have been my first suspect too.
But looking through the code, I can't see anything obviously borked there.

I can't get that line to assert when I run the composition/test-send-button.js mozmill test locally (see below for the asserts I do get)

The assert occurs when the number of selected items we get from the selection ranges of a nsITreeSelection differs from the nsITreeSelections count attribute.

This could happen if:

  1. the nsITreeSelection has elements selected which are out of the view's range (ie non-existent elements)
  2. the nsITreeSelection is returning overlapping ranges.

My money is on 1. The nsTreeSelection implementation code aims to cope with merging overlapping ranges and it looks solid. I'm sure it's been well exercised over the years. I think overlapping ranges would have been noticed before. The implementation has a couple of oddities, but overall looks good.

I think things are being selected outside the view. Maybe something's being deleted and the selection hasn't been updated accordingly...

Anyway, I've just added a bunch of asserts for a try run to catch both 1 and 2 above:
https://hg.mozilla.org/try-comm-central/rev/9f5877ad6f66e63d35894d199e56b9a60059d65f

try run:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=a6a07fd24a9fc06fa93ba5697504f239cb59e7bb

The asserts I do get on composition/test-send-button.js:

[36909, Main Thread] ###!!! ASSERTION: Table inline-size is less than the sum of its columns' min inline-sizes: '!(aISizeType == BTLS_FINAL_ISIZE && aISize < guess_min)', file /fast/ben/tb/mozilla/layout/tables/BasicTableLayoutStrategy.cpp, line 760

[36909, Main Thread] ###!!! ASSERTION: didn't subtract all that we added: '(space == 0 || space == nscoord_MAX) && ((l2t == FLEX_PCT_LARGE) ? (-0.001f < basis.f && basis.f < 0.001f) : (basis.c == 0 || basis.c == nscoord_MAX))', file /fast/ben/tb/mozilla/layout/tables/BasicTableLayoutStrategy.cpp, line 987

Could conceivably be masking things. I haven't dug into these in detail, but from experience I've noticed treeview widgets tend to use table layouts... so it could be the same bug manifesting differently.

(In reply to Ben Campbell from comment #1)

Bug 1562158 would have been my first suspect too.
But looking through the code, I can't see anything obviously borked there.

I can't get that line to assert when I run the composition/test-send-button.js mozmill test locally (see below for the asserts I do get)

The assert occurs when the number of selected items we get from the selection ranges of a nsITreeSelection differs from the nsITreeSelections count attribute.

This could happen if:

  1. the nsITreeSelection has elements selected which are out of the view's range (ie non-existent elements)
  2. the nsITreeSelection is returning overlapping ranges.

My money is on 1. The nsTreeSelection implementation code aims to cope with merging overlapping ranges and it looks solid. I'm sure it's been well exercised over the years. I think overlapping ranges would have been noticed before. The implementation has a couple of oddities, but overall looks good.

I think things are being selected outside the view. Maybe something's being deleted and the selection hasn't been updated accordingly...

Anyway, I've just added a bunch of asserts for a try run to catch both 1 and 2 above:
https://hg.mozilla.org/try-comm-central/rev/b65b47c9a47990e275f80d1bd2f80c3515c1e57d

try run:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=afd243cd907b3de75750912b2432dc1cd9c5e013

The asserts I do get on composition/test-send-button.js:

[36909, Main Thread] ###!!! ASSERTION: Table inline-size is less than the sum of its columns' min inline-sizes: '!(aISizeType == BTLS_FINAL_ISIZE && aISize < guess_min)', file /fast/ben/tb/mozilla/layout/tables/BasicTableLayoutStrategy.cpp, line 760

[36909, Main Thread] ###!!! ASSERTION: didn't subtract all that we added: '(space == 0 || space == nscoord_MAX) && ((l2t == FLEX_PCT_LARGE) ? (-0.001f < basis.f && basis.f < 0.001f) : (basis.c == 0 || basis.c == nscoord_MAX))', file /fast/ben/tb/mozilla/layout/tables/BasicTableLayoutStrategy.cpp, line 987

Could conceivably be masking things. I haven't dug into these in detail, but from experience I've noticed treeview widgets tend to use table layouts... so it could be the same bug manifesting differently.

My observation in bug 1158471 comment 6
is related to case (1) above.
Basically, I observed that

- selection count is misused a little bit.
  It stands for different numbers in different places.

  In some places, it seems it stands (?) for the selected lines as in selection of message headers in the header pane. If you select, say, 30 headers, then this should be 30. 

  However, in some places, this number seems to be used for
  the lines that are currently *visible* in, say, header pane and selected.

  So even if we may have selected 30 header lines as a whole, suppose the header pane shows only
  10 headers, only 10 selected lines can be shown at maximum, 
  and if we scroll the headers, fewer selected lines are visible depending on the position of selected lines and where we look at the headers through the header pane.

No wonder assertion failures occur (!)

Good to know that someone has finally paid attention to this Assertion.

The last observation was done manually: you select some header lines in message list.
Shrink the message list window, or scroll back and forth.

Voila. You get assertions with DEBUG BUILD.

(In reply to ISHIKAWA, Chiaki from comment #2)

- selection count is misused a little bit.
  It stands for different numbers in different places.

  In some places, it seems it stands (?) for the selected lines as in selection of message headers in the header pane. If you select, say, 30 headers, then this should be 30. 

  However, in some places, this number seems to be used for
  the lines that are currently *visible* in, say, header pane and selected.

Hmm. I don't think I've seen this case. In all the cases I've seen, nsMSgDBView is a list of all the messages, not just the ones onscreen (it has to be all because it reorders them according to the current primary/secondary sorting criteria).

  So even if we may have selected 30 header lines as a whole, suppose the header pane shows only
  10 headers, only 10 selected lines can be shown at maximum, 
  and if we scroll the headers, fewer selected lines are visible depending on the position of selected lines and where we look at the headers through the header pane.

The last observation was done manually: you select some header lines in message list.
Shrink the message list window, or scroll back and forth.

I haven't been able to replicate this, at least not on the main Thunderbird window (I picked an Inbox, then highlighted a bunch of messages and resized the window). How should I go about reproducing this one?

Yay - my test asserts from comment 1 seem to have triggered on a few tests!

Assertion failure: startRange<viewSize, at /builds/worker/workspace/build/src/comm/mailnews/base/src/nsMsgDBView.cpp:2330

They are all startRange<viewSize, which means the treeSelection has got some selection(s) which are outside the range of nsMsgDBView. Which I'd guess implies something hasn't updated the treeSelection to keep it in sync with the nsMsgDBView...

Hopefully I'll be able to get the assert triggering locally... (otherwise: printf debugging via try builds.... shudder...)

Version: unspecified → 71

(In reply to Ben Campbell from comment #4)

Yay - my test asserts from comment 1 seem to have triggered on a few tests!
...
Hopefully I'll be able to get the assert triggering locally... (otherwise: printf debugging via try builds.... shudder...)

any luck?

Flags: needinfo?(benc)

OK - finally making some progress on this. No fix yet, but notes so far...

Easy way to replicate:

  1. set up a local folder with a single message in it.
  2. click on that message and delete it.
  3. Viola.

The main issue is that FolderDisplayWidget.selectedMessages (https://searchfox.org/comm-central/source/mail/base/content/folderDisplay.js#2248) is being accessed after the message has been deleted from the nsMsgDBView, but before the view's tree selection has been updated (via OnMessagesRemoved().)
This is getting a little to much into GUI land for my comfort here ;-)

There might also be an issue with nsMsgDBView::GetMsgToSelectAfterDelete(). It's called before the deletion, then the value returned is used in OnMessagesRemoved to set the selection afterwards. But in this case, of deleting the last message it returns 0, rather than nsMsgViewIndex_None (-1). I think the rest of the code would be tolerant of this out-of-bounds selection (0 is no longer a valid message after the delete), but it still seems like it should be fixed.

Flags: needinfo?(benc)

(In reply to Ben Campbell from comment #6)

OK - finally making some progress on this. No fix yet, but notes so far...

Easy way to replicate:

  1. set up a local folder with a single message in it.
  2. click on that message and delete it.
  3. Viola.

Great simple repeatable case (!)

I have had a bad days during April and early May due to cold-like symptoms (not Covid-19) and am trying to catch up on some remaining patches, but once I take care of them I would like to look into this. This has bothered me for quite a long time when I analyze the local test log file.

(In reply to ISHIKAWA, Chiaki from comment #7)

I have had a bad days during April and early May due to cold-like symptoms

Hope you're feeling much better now!

No fix yet, but I know what's happening - when you delete a message, it causes the various nsIFolderListener notifications to fire in order to update the message counts.
The GUI makes heavy use of these, and iterates over the various commands to see if they should be enabled/disabled. Most of these try to count the selected messages, which ends up in nsMsgDBView::GetIndicesForSelection(). But because the selection hasn't yet been updated to reflect the now-deleted message, the selection is invalid and that assert triggers.

I'm certain the actual fix will be something really simple, but I want to understand what's going on and that means picking through the copying code, which is rather... involved. So many listeners and notifications!

I'm most of the way there now, and hoping to sort out a fix tomorrow.

Well, I've got a WIP fix for it. There was already a mechanism for suppressing nsIFolderListener count notifications (totalmessages and unread count, via OnIntPropertyChanged()). But the notifications were being switched back on too soon. Handlers in the GUI for these notifications caused the problem, by accessing the view selection before it had been updated to account for the now-missing messages.

Obviously, there are other bits of code that depend on those notifications firing at the wrong time. In particular:

./mach xpcshell-test comm/mailnews/local/test/unit/test_pop3MultiCopy2.js

... now fails if this patch is applied. Sigh.
Anyway it doesn't look tooooo serious, so I'm confident of revising my patch.
There'll likely be a bunch of follow-up patches also, to smooth out a few of the rough edges I've encountered along the way so far.

The whole copy/move/delete mechanism is mindbendingly complex and brittle, spread out over large numbers of objects, jumping back and forth from C++ to javascript, and async using at least 5 or 6 separate listener systems. Just complexity creeping in over time.
Definitely some massive simplification is in order. I'll be trying to come up with some steps toward that goal.

Attachment #9226296 - Attachment description: WIP: Bug 1598587 - WIP selection sync fix → Bug 1598587 - Tweak folder notification during deletion to keep view selection in sync. r=mkmelin
Status: NEW → ASSIGNED
Target Milestone: --- → 91 Branch

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/216ae92d0317
Tweak folder notification during deletion to keep view selection in sync. r=mkmelin

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.