Open Bug 1798233 Opened 3 years ago Updated 7 months ago

Fast repeated spellcheck dictionary updates fail assertions in `SetFallbackDictionary`

Categories

(Core :: Spelling checker, defect, P3)

Desktop
All
defect

Tracking

()

ASSIGNED
Tracking Status
firefox108 --- affected

People

(Reporter: Gijs, Assigned: aiunusov)

References

Details

SetFallbackDictionary asserts that mUpdateDictionaryRunning. This assumes that only 1 update can run at a time. This assertion fails if two updates are made to run: https://treeherder.mozilla.org/jobs?repo=try&revision=b714c24881ec2c1791bfa5d31b713ee19a15f625&selectedTaskRun=J_dd3AIJRyGQV6ohzom4xQ.0 .

Logging in BeginUpdateDictionary and EndUpdateDictionary produces:

0:31.19 GECKO(4140) Started updating dict.
0:31.19 GECKO(4140) Started updating dict.
0:31.19 GECKO(4140) Finished updating dict.

After which we assert and thus crash the debug build.

I am not very familiar with this code, but from a brief look it would seem that:

  1. https://searchfox.org/mozilla-central/rev/73ff5f1ac45202c4621a1bcee4b35d20027e71ad/editor/spellchecker/EditorSpellCheck.cpp#728-729,770-771 is an interface-exposed method that doesn't protect against re-entry
  2. that method will initialize a DictionaryFetcher with an async completion callback
  3. once the fetcher completes we end up in the DictionaryFetched callback, which does have a re-entry protection: https://searchfox.org/mozilla-central/rev/73ff5f1ac45202c4621a1bcee4b35d20027e71ad/editor/spellchecker/EditorSpellCheck.cpp#827-835 which says:
  if (aFetcher->mGroup < mDictionaryFetcherGroup) {
    // SetCurrentDictionary was called after the fetch started.  Don't overwrite
    // that dictionary with the fetched one.
    EndUpdateDictionary();
    if (aFetcher->mCallback) {
      aFetcher->mCallback->EditorSpellCheckDone();
    }
    return NS_OK;
  }
  1. if the second attempt to update the dictionary (which also hits this code but where mGroup < mDictionaryFetcherGroup is not the case) then continues down this function and fails, it will invoke SetFallbackDictionary here and that will assert.

It's not 100% clear to me what the correct solution is supposed to be here, because although there is a comment about the point of mUpdateDictionaryRunning at https://searchfox.org/mozilla-central/rev/73ff5f1ac45202c4621a1bcee4b35d20027e71ad/editor/spellchecker/EditorSpellCheck.cpp#595-599 , I can't say I really follow it. So I can't tell which of the following (non-exhaustive) list of options we should pursue:

  1. get rid of the assert
  2. make mUpdateDictionaryRunning a nesting level counter instead of a bool (so we validate we have an equal number of entries and exits)
  3. avoid starting a second parallel call to UpdateCurrentDictionary

:dminor or :adw, can you help?

Meant to needinfo for comment 0.

Flags: needinfo?(dminor)
Flags: needinfo?(adw)

(note that I'm not 100% sure that the scenario/steps in comment 0 are what is causing the bug, because attempting to do stack logging from the begin/end update calls slowed things down sufficiently to stop reproducing the TV bug on my machine. 🙃)

It's been many years since I worked on this and I don't have the context anymore, so I can't be much help without spending a lot of time on it. I don't remember anything about who calls what and so forth. I did write the first sentence of that comment though, which I guess means if DictionaryFetched() calls that method, then mUpdateDictionaryRunning is true, so we can avoid doing a bunch of work. It sounds like an optimization. I don't know about the second sentence.

Flags: needinfo?(adw)

Although I made some changes around this code for the multiple dictionary support, I didn't look at the logic here in detail, so I don't think I can be much help on this either.

Flags: needinfo?(dminor)
Severity: -- → S3
Priority: -- → P3
Flags: needinfo?(smaug)
Flags: needinfo?(aiunusov)
Assignee: nobody → aiunusov
Status: NEW → ASSIGNED
Flags: needinfo?(aiunusov)

I could not reproduce this issue,
but suppose we can simply check if mUpdateDictionaryRunning at top level method UpdateCurrentDictionary(), and return NS_OK (to prevent calling it twice):
https://searchfox.org/mozilla-central/source/editor/spellchecker/EditorSpellCheck.cpp#729

I'm not sure about the consistency of the mUpdateDictionaryRunning variable, I see at least an attempt to do it correctly, but this class does not seem to be used (dead code): https://searchfox.org/mozilla-central/source/editor/spellchecker/EditorSpellCheck.cpp#58

I guess I know ho to fix it. But still could not reproduce it.
If you remember how to reproduce the issue, I would appreciate it.

Flags: needinfo?(gijskruitbosch+bugs)

(In reply to Artur Iunusov from comment #6)

I guess I know ho to fix it. But still could not reproduce it.
If you remember how to reproduce the issue, I would appreciate it.

Remove the skip-if at https://searchfox.org/mozilla-central/rev/9de332d5c8faac58dc1232b8a6383ce6cb1400f4/browser/base/content/test/contextMenu/browser.ini#50-52 and run ./mach test browser_contextmenu_spellcheck.js on a debug build should do it, if nothing has changed since October. Does that reproduce for you or not? :-)

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(aiunusov)

I can't really remember this code. What would it mean if we returned early in UpdateCurrentDictionary() ? What would happen to that callback parameter? Someone needs to call it at some point if the method returns NS_OK.

Could we somehow serialize the calls? Could EditorSpellCheck have a list/array for pending updates and process those one by one?
Or could the method return a Promise? And it would return the same promise until a fetch is done. One would just call .then() on it to get the callback.
I don't quite understand the need for multiple call to the method though.

(Feel free to file a new bug about that dead code ;) )

Flags: needinfo?(smaug)

(In reply to :Gijs (he/him) from comment #7)

(In reply to Artur Iunusov from comment #6)

I guess I know ho to fix it. But still could not reproduce it.
If you remember how to reproduce the issue, I would appreciate it.

Remove the skip-if at https://searchfox.org/mozilla-central/rev/9de332d5c8faac58dc1232b8a6383ce6cb1400f4/browser/base/content/test/contextMenu/browser.ini#50-52 and run ./mach test browser_contextmenu_spellcheck.js on a debug build should do it, if nothing has changed since October. Does that reproduce for you or not? :-)

Thanks for the hint! I guess this test should help to reproduce.

Flags: needinfo?(aiunusov)

Update: after multiple test runs, still could not reproduce assert.
(Just double checked MOZ_ASSERT(false) leads to failure, so, I have a proper debug build)

(In reply to Artur Iunusov from comment #10)

Update: after multiple test runs, still could not reproduce assert.
(Just double checked MOZ_ASSERT(false) leads to failure, so, I have a proper debug build)

Huh, yeah. Looks like try is happy too: https://treeherder.mozilla.org/jobs?repo=try&revision=e555604427a9beecd417369fe0641128036bf676&selectedTaskRun=K8P3oyuITgiB3UjfaFum2Q.0

Maybe the debug skip-if can just be removed at this point? Part of me would like to understand what fixed it, but doing the bisection seems potentially not worth it.

need to remove debug skip-if and check again

Flags: needinfo?(aiunusov)

./mach test ./browser/base/content/test/contextMenu/browser_contextmenu_spellcheck.js
mochitest-browser

Ran 689 checks (688 subtests, 1 tests)
Expected results: 689
Unexpected results: 0
OK
cat mozconfig
ac_add_options --enable-debug
Flags: needinfo?(aiunusov)

Results on my machine (which is Windows 11 on core i9)
But according to the code I understand why it could fail after enabling it.
So, I agree to remove an assert and enable the test or use int counter insted of bool

You need to log in before you can comment on or make changes to this bug.