Closed Bug 1135588 Opened 5 years ago Closed 5 years ago

New conversations get indexed twice by gloda, leading to duplicate search results

Categories

(Thunderbird :: Instant Messaging, defect)

x86
macOS
defect
Not set
normal

Tracking

(thunderbird38+ fixed)

RESOLVED FIXED
Tracking Status
thunderbird38 + fixed

People

(Reporter: aleth, Assigned: aleth)

References

Details

Attachments

(1 file, 5 obsolete files)

STR
Start a conversation
Say something (X)
Close the conversation (context menu of the sidebar)
Run a search for X: Everything looks good, a result for X appears.
Now close and restart TB.
Run a search for X: The result for X appears twice.

Looks like the file is indexed again on restart, so some flag isn't getting set?
What is the plan to move forward with this?
There has been some progress on this. Some debugging has revealed that the source of the bug is a timer that is never reset. Seems like the issue has been around since chat landed in Tb. I'll be working on this and bug 1135291 this weekend and through next week.
Attached patch Patch (obsolete) — Splinter Review
I haven't tested this yet, clobber build running at the moment.
Assignee: nobody → nhnt11
Status: NEW → ASSIGNED
I remember testing this and it seemed a shutdown handler was required here so pending saves are ensured to haopen.
Attached patch cachesave.diff (obsolete) — Splinter Review
Since I'd already discussed this with nhnt11, it seemed easy to finish this patch so nhnt11 can focus on the chat reindexing bug.
Assignee: nhnt11 → aleth
Attachment #8577712 - Attachment is obsolete: true
Attachment #8578882 - Flags: review?(clokep)
Not actually a regression - this bug has existed since gloda IM indexing was added.
Keywords: regression
Attached patch cachesave.diff v2 (obsolete) — Splinter Review
I guess we should cancel the timeout too, just in case, though I don't think they fire any more at that stage in the shutdown.
Attachment #8578882 - Attachment is obsolete: true
Attachment #8578882 - Flags: review?(clokep)
Attachment #8578896 - Flags: review?(clokep)
Comment on attachment 8578896 [details] [diff] [review]
cachesave.diff v2

This won't do as _saveCacheNow uses NetUtil.asyncCopy.
Attachment #8578896 - Flags: review?(clokep) → review-
Attached patch cachesave.diff v3 (obsolete) — Splinter Review
Apart from having to convert the save to OS.File so the shutdown blocker could wait upon its completion, there was another bug causing duplicates: if autojoined channels exist, the first save is scheduled on startup before the existing cache file has been read. If this save overwrites the cache file, we end up with duplicate entries in the gloda index.
Attachment #8578896 - Attachment is obsolete: true
Attachment #8578982 - Flags: review?(clokep)
Comment on attachment 8578982 [details] [diff] [review]
cachesave.diff v3

Review of attachment 8578982 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good! I wrote a couple of nits asking for more comments. :) I'm going to do a build and test this now.

::: mail/components/im/modules/index_im.js
@@ +268,3 @@
>        return;
>      this._cacheSaveTimer = setTimeout(this._saveCacheNow, 5000);
> +    if (this._shutdownBlockerAdded)

Nit: Add an empty line here and add a comment about what this is doing?

@@ +285,5 @@
>        knownFiles: GlodaIMIndexer._knownFiles,
>        datastoreID: Gloda.datastoreID,
>      };
>  
> +    let path = OS.Path.join(OS.Constants.Path.profileDir, "logs", kCacheFileName);

Keep the comment "Asynchronously copy the data to the file." :)
Attached patch cachesave.diff v4 (obsolete) — Splinter Review
adds comments!
Attachment #8578982 - Attachment is obsolete: true
Attachment #8578982 - Flags: review?(clokep)
Attachment #8579020 - Flags: review?(clokep)
Comment on attachment 8579020 [details] [diff] [review]
cachesave.diff v4

I was able to still get duplicated indices with this. aleth was able to reproduce and has some ideas.
Attachment #8579020 - Flags: review?(clokep) → review-
Attached patch glodafix.diff v5Splinter Review
This was a little tricky to debug.
First I fixed
* autojoined channels got added to the cache before the cache was read from disk (which happened together with the log sweeping), which overwrote these new entries so they go missing
* cache save should work when there is no existing file
* we were receiving every new-text notification twice, scheduling twice as many indexing jobs as needed
* some minor inconsistencies which didn't really matter
but the problem was still there. It turned out to be a consequence of lastModifiedTime not being set on the indexing jobs of newly opened conversations (as opposed to old conversations found on log sweep). This led to weird behaviour and duplicates because if you set obj.property = undefined, hasOwnProperty.call(obj, "property") is true, but !!obj.property is false, and JSON.stringify(obj) doesn't include property at all. That way you could get conversations that were indexed but only sometimes found in the cache, and they would disappear on save.
Attachment #8579020 - Attachment is obsolete: true
Attachment #8580304 - Flags: review?(clokep)
Attachment #8580304 - Flags: review?(clokep) → review+
I looked over this and it seems sane. Fixes the issue on my build.
Thanks a ton for taking over this messy bug!
Comment on attachment 8580304 [details] [diff] [review]
glodafix.diff v5

[Approval Request Comment]
Regression caused by (bug #): 955292, plus longstanding bugs
Attachment #8580304 - Flags: approval-comm-aurora?
Comment on attachment 8580304 [details] [diff] [review]
glodafix.diff v5

https://hg.mozilla.org/releases/comm-aurora/rev/08f97de69ea3
Attachment #8580304 - Flags: approval-comm-aurora? → approval-comm-aurora+
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.