Closed Bug 503502 Opened 15 years ago Closed 10 years ago

Inconsistency between description and behavior on the minimum tag length

Categories

(addons.mozilla.org Graveyard :: Public Pages, defect, P4)

Tracking

(Not tracked)

RESOLVED WONTFIX
Future

People

(Reporter: fcp2007, Unassigned)

The UI says that a tag must be two or more characters long, but this is incorrect because one-character tags are almost always allowed; the only exception is when the character is in US-ASCII ;-).  For example, “á” (U+00E1) and “海” (U+6D77) are allowed.  In fact, one-character tags can be useful in some languages, including Japanese, where there are many one-character words (e.g. “海” (U+6D77) means “the ocean”).

I do not know why one-character tags are disallowed.  Bug 502126 does not explain the reason for this.  But if they are disallowed because they are meaningless, the same reason applies to U+00E1 but does not apply to U+6D77.  I do not think it wise to try to tell if a tag is meaningful or not by looking at its length.
There are a few things here...

One character tags were disallowed because MySQL, with our current setup, can only search for words of 2 or more characters (ft_min_word_len).  If we allowed 1 character tags you wouldn't be able to search for them, so they wouldn't be very useful on the site.  This was meant to be a temporary measure while waiting on our new search engine, bug 498999.

However, I didn't tell mb_strlen() which encoding the string was in, assuming it was smarter than it is[1].  Because of that, it appears to be letting stuff like 海 through and claiming it is 3 characters long (its length in UTF-8 bytes) - I might as well be using strlen().  That said, apparently we're not setting the utf8 charset when we connect to MySQL[2], so MySQL is treating the tag the same way and actually running searches on it.
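
To make the byte-versus-character distinction concrete, here is a minimal sketch. It is an illustration only: the site code discussed in this bug is PHP, and this Python snippet merely mirrors the idea. The function names and the `MIN_TAG_LENGTH` constant are hypothetical, and the value 2 is assumed from the minimum described above.

```python
# Illustration only: not the site's PHP code. All names here are hypothetical.
MIN_TAG_LENGTH = 2  # assumed to match the 2-character minimum described above


def byte_length(tag: str) -> int:
    """Length in UTF-8 bytes, which is what a byte-oriented check counts."""
    return len(tag.encode("utf-8"))


def char_length(tag: str) -> int:
    """Length in Unicode code points, which is what the UI text describes."""
    return len(tag)


def passes_byte_check(tag: str) -> bool:
    """Accept the tag if its UTF-8 byte count reaches the minimum."""
    return byte_length(tag) >= MIN_TAG_LENGTH


def passes_char_check(tag: str) -> bool:
    """Accept the tag if its code point count reaches the minimum."""
    return char_length(tag) >= MIN_TAG_LENGTH


if __name__ == "__main__":
    # "a" (U+0061), "\u00e1" (U+00E1), "\u6d77" (U+6D77), and a two-letter tag.
    for tag in ["a", "\u00e1", "\u6d77", "ab"]:
        print(ascii(tag), byte_length(tag), char_length(tag),
              passes_byte_check(tag), passes_char_check(tag))
```

Under these assumptions, a single ASCII letter fails both checks, while U+00E1 (2 bytes) and U+6D77 (3 bytes) pass the byte-oriented check even though each is one character, which matches the behavior reported in this bug.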

Since the limit is temporary and we're making the same mistake in PHP and MySQL I'm going to leave this bug open for now (with the incorrect description on the site) and mark it dependent on our new search engine.  When we get our new search engine we can remove the text on the site and the 1 char restriction for regular ascii and close this bug.

[1] Bug 503520 filed to fix that everywhere
[2] Bug 503523
Status: UNCONFIRMED → NEW
Depends on: 498999
Ever confirmed: true
Thank you for the detailed explanation, Wil.  I did not think of the possibility that the limitation on one-character tags came from the MySQL setup.  I agree that the tags which we cannot search with are not very useful.

(In reply to comment #1)
> Since the limit is temporary and we're making the same mistake in PHP and MySQL
> I'm going to leave this bug open for now (with the incorrect description on the
> site) and mark it dependent on our new search engine.

I am fine with that.  I am glad to know that this inconsistency is merely the result of a temporary measure, and I hope you can replace the DB engine with one that has fewer restrictions in the future.  Thanks!
Well, we're connecting with UTF8 now, but that is apparently making MySQL merge tags that differ only in case. Sigh.  Anyway, this bug is still valid, but I'd keep an eye on bug 525271 to see what's going on with tags in the future.
Severity: normal → minor
Priority: -- → P4
Target Milestone: --- → Future
Thanks for filing this.  In an effort to not drown in existing reports we're aggressively closing old enhancements and bugs to get the buglist to a reasonable level so we can scope and process bug sprints in an effective manner.

Patches for this bug are still welcome.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Product: addons.mozilla.org → addons.mozilla.org Graveyard