Closed Bug 1310835 Opened 3 years ago Closed 3 years ago

Search DICPATH for Hunspell dictionaries

Categories

(Core :: Spelling checker, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: 166291, Assigned: 166291)

Details

Attachments

(1 file, 3 obsolete files)

It'd be nice to allow users and distros to specify dictionaries that overwrite the core dictionaries. This would allow easier testing of dictionaries as well as different users or environments being able to set at runtime where dictionaries are, which is useful if you're using a functional package manager like Nix or Guix. The way to do this which other applications (LibreOffice, Hunspell itself) use is to look in DICPATH and load dictionaries from there.
Flags: a11y-review+
Reporter, you need to request review by setting the "review" flag to "?" and putting :ehsan on the requestee.
https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/How_to_Submit_a_Patch#Getting_the_patch_reviewed
Attachment #8801898 - Flags: review?(ehsan)
Attachment #8801892 - Flags: review?(ehsan)
I'm leaving on vacation later today so unfortunately I won't be able to finish the reviews here but I'll do a first pass.  For the next iterations of the patches, please try asking review from :smaug.
Comment on attachment 8801892 [details] [diff] [review]
spellcheck: Look for dictionaries in DICPATH.

Review of attachment 8801892 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you for submitting these patches!

I have a few comments below.

::: extensions/spellcheck/hunspell/glue/mozHunspell.cpp
@@ +361,5 @@
> +  // find dictionaries in DICPATH
> +  char* dicEnv = PR_GetEnv("DICPATH");
> +  if (dicEnv) {
> +    // do a two-pass dance so dictionaries are loaded right-to-left as preference
> +    nsCOMArray<nsIFile> dirs;

Nit: nsCOMArray is an old type that shouldn't be used in new code.  Please use nsTArray<nsCOMPtr<nsIFile>> instead.

@@ +362,5 @@
> +  char* dicEnv = PR_GetEnv("DICPATH");
> +  if (dicEnv) {
> +    // do a two-pass dance so dictionaries are loaded right-to-left as preference
> +    nsCOMArray<nsIFile> dirs;
> +    nsDependentCString env(dicEnv); // mutable for strtok

Please avoid mutating the return value of PR_GetEnv.  Instead, you should copy the string here (for example you can switch to using an nsAutoCString.)

@@ +375,5 @@
> +    }
> +
> +    // load them in reverse order so they override each other properly
> +    for (int32_t i = dirs.Count() - 1; i >= 0; i--)
> +      LoadDictionariesFromDir(dirs[i]);

Hmm, shouldn't this be the opposite?  In the PATH environment variable, for example, the last directory wins.  Why is this one the reverse?
Attachment #8801892 - Flags: review?(ehsan)
Comment on attachment 8801898 [details] [diff] [review]
spellcheck: Ignore hyphen dictionaries when loading from DICPATH.

Review of attachment 8801898 [details] [diff] [review]:
-----------------------------------------------------------------

What are hyphen dictionaries?  I don't remember encountering this before.

Is there a better way to detect them rather than rely on file name conventions?  Also, is there any reason why we would want to support loading them otherwise?  We don't rely on that now.
Attachment #8801898 - Flags: review?(ehsan)
(In reply to Away, back on Nov 10 from comment #4)
> Comment on attachment 8801892 [details] [diff] [review]
> spellcheck: Look for dictionaries in DICPATH.
> 
> Review of attachment 8801892 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Thank you for submitting these patches!
> 
> I have a few comments below.
> 
> ::: extensions/spellcheck/hunspell/glue/mozHunspell.cpp
> @@ +361,5 @@
> > +  // find dictionaries in DICPATH
> > +  char* dicEnv = PR_GetEnv("DICPATH");
> > +  if (dicEnv) {
> > +    // do a two-pass dance so dictionaries are loaded right-to-left as preference
> > +    nsCOMArray<nsIFile> dirs;
> 
> Nit: nsCOMArray is an old type that shouldn't be used in new code.  Please
> use nsTArray<nsCOMPtr<nsIFile>> instead.
> 
> @@ +362,5 @@
> > +  char* dicEnv = PR_GetEnv("DICPATH");
> > +  if (dicEnv) {
> > +    // do a two-pass dance so dictionaries are loaded right-to-left as preference
> > +    nsCOMArray<nsIFile> dirs;
> > +    nsDependentCString env(dicEnv); // mutable for strtok
> 
> Please avoid mutating the return value of PR_GetEnv.  Instead, you should
> copy the string here (for example you can switch to using an nsAutoCString.)

Noted!

> @@ +375,5 @@
> > +    }
> > +
> > +    // load them in reverse order so they override each other properly
> > +    for (int32_t i = dirs.Count() - 1; i >= 0; i--)
> > +      LoadDictionariesFromDir(dirs[i]);
> 
> Hmm, shouldn't this be the opposite?  In the PATH environment variable, for
> example, the last directory wins.  Why is this one the reverse?

The last directory doesn't win, the first does. If your PATH is /bin:/usr/bin, and 'cat' exists in both, the one in /bin will be used.

(In reply to Away, back on Nov 10 from comment #5)
> Comment on attachment 8801898 [details] [diff] [review]
> spellcheck: Ignore hyphen dictionaries when loading from DICPATH.
> 
> Review of attachment 8801898 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> What are hyphen dictionaries?  I don't remember encountering this before.
> 
> Is there a better way to detect them rather than rely on file name
> conventions?  Also, is there any reason why we would want to support loading
> them otherwise?  We don't rely on that now.

Hyphen dictionaries are used for the 'hyphen' tool. I'm assuming someone might put them in their DICPATH along with thesaurus dictionaries, given they're dictionaries. I don't think there's a way to identify them not using file name, since this is how other programs do it.

There's a discussion here about it: https://github.com/hunspell/hunspell/issues/413

It probably might be an idea to just ignore them, but I don't want to be one to break any existing code or extensions that rely on them.

(In reply to Away, back on Nov 10 from comment #3)
> I'm leaving on vacation later today so unfortunately I won't be able to
> finish the reviews here but I'll do a first pass.  For the next iterations
> of the patches, please try asking review from :smaug.

Have a good vacation. Thanks for the review. :)

New patch coming 'soon'.
Okay, I've looked at the code a bit more. I don't quite get why I need to use nsAutoCString, since nsDependentCString is the same but deals with locales (which is important in environmental variable land). I had assumed BeginWriting would clone it. Should I strdup it before passing it to nsDependentCString?
Attachment #8801892 - Flags: review+
Attachment #8801898 - Flags: review?(bugs)
Attachment #8801892 - Flags: review?(bugs)
Attachment #8801892 - Flags: review+
nsDependentCString doesn't own its data, so whatever you're modifying, is modifying the data it doesn't worn.
And what you mean with locales? I'm not aware nsDependentCString doing anything special with locales.
Its implementation just inherits nsTString_CharT.
Comment on attachment 8801892 [details] [diff] [review]
spellcheck: Look for dictionaries in DICPATH.

>+  // find dictionaries in DICPATH
>+  char* dicEnv = PR_GetEnv("DICPATH");
>+  if (dicEnv) {
>+    // do a two-pass dance so dictionaries are loaded right-to-left as preference
>+    nsCOMArray<nsIFile> dirs;
>+    nsDependentCString env(dicEnv); // mutable for strtok
So this isn't yet following ehsan's review comment.
And he wanted to have nsTArray here, not nsCOMArray.

>+
>+    char* currPath = nullptr;
>+    char* nextPaths = env.BeginWriting();
>+    while (!!(currPath = NS_strtok(":", &nextPaths))) {
Do you actually need !! here.
Attachment #8801892 - Flags: review?(bugs) → review-
Comment on attachment 8801898 [details] [diff] [review]
spellcheck: Ignore hyphen dictionaries when loading from DICPATH.

Btw, using -p -U 8 when creating patches make them quite a bit easier to read.

> {
> #ifdef DEBUG
>   // There must be only one instance of this class: it reports memory based on
>@@ -361,6 +361,9 @@
>   // find dictionaries in DICPATH
>   char* dicEnv = PR_GetEnv("DICPATH");
>   if (dicEnv) {
>+    bool oldIncludeHyphenDicts = mIncludeHyphenDicts;
>+    mIncludeHyphenDicts = false; // DICPATH may include hyph_ files, ignore these
>+    
>     // do a two-pass dance so dictionaries are loaded right-to-left as preference
>     nsCOMArray<nsIFile> dirs;
>     nsDependentCString env(dicEnv); // mutable for strtok
>@@ -377,6 +380,8 @@
>     // load them in reverse order so they override each other properly
>     for (int32_t i = dirs.Count() - 1; i >= 0; i--)
>       LoadDictionariesFromDir(dirs[i]);
Wouldn't it be simpler to just pass a new param to LoadDictionariesFromDir to tell whether to include hyph_ * files.

But is the check even needed? There is a check for .aff files already, and I assume there aren't hyph_foobar.dic files for
foobar.aff, at least if the documentation I found is right: based on document the files are *.dic, *.aff and optional hyph_*.dic.


So, explain why we even need this patch. And if we need, just add a new param to LoadDictionariesFromDir?
Attachment #8801898 - Flags: review?(bugs) → review-
Attached patch Patch v2 (obsolete) — Splinter Review
Attachment #8801892 - Attachment is obsolete: true
Attachment #8801898 - Attachment is obsolete: true
You're right, the second patch can be junked. I fixed the first patch up too. I had wrongly assumed nsDependentCString would convert from the current locale to UTF-8. I can't find any functions that do this, so just assuming the environment variable is UTF-8 should do for now.
Attachment #8806122 - Flags: review?(bugs)
Comment on attachment 8806122 [details] [diff] [review]
Patch v2

>+    char* currPath = nullptr;
>+    char* nextPaths = env.BeginWriting();
>+    while ((currPath = NS_strtok(":", &nextPaths))) {
>+      nsCOMPtr<nsIFile> dir;
>+      rv = NS_NewNativeLocalFile(nsCString(currPath), true, getter_AddRefs(dir));
>+      if (NS_SUCCEEDED(rv))
>+        dirs.AppendElement(dir);
always {} with 'if'

>+    }
>+
>+    // load them in reverse order so they override each other properly
>+    for (int32_t i = dirs.Length() - 1; i >= 0; i--)
>+      LoadDictionariesFromDir(dirs[i]);
always {} with 'for'


(some old code may have still different coding style, but new code should follow the rules more strictly)
Attachment #8806122 - Flags: review?(bugs) → review+
Attached patch Patch v3Splinter Review
Added brackets to for and if statements.
Attachment #8806122 - Attachment is obsolete: true
Attachment #8807718 - Flags: review?(bugs)
Attachment #8807718 - Flags: review?(bugs) → review+
Assignee: nobody → 166291
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b9cd3d1a34e0
Look for spellcheck dictionaries in DICPATH. r=smaug
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/b9cd3d1a34e0
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
You need to log in before you can comment on or make changes to this bug.