Closed
Bug 1469010
Opened 6 years ago
Closed 6 years ago
[machinery] In TM, add support for querying strings longer than 255 characters
Categories
(Webtools Graveyard :: Pontoon, enhancement, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mathjazz, Assigned: jotes)
References
Details
Attachments
(3 files)
Internal Pontoon translation memory doesn't work for strings longer than 255 characters due to Postgres levenshtein limitations.
Comment 2•6 years ago
I checked the limitation and it's only a #define[1] that's meant to protect memory and CPU from abuse[2], so that nobody asks for a levenshtein of a big blob. It can be increased; they just used a "reasonable" varchar-like value.

I don't know which system you use, but on Debian (and derivatives) you can patch that, recompile postgresql (just the server) and the limitation is gone. It takes a couple of minutes. I just did it.

    # download the source
    $ apt-get build-dep postgresql-9.6
    $ apt-get source postgresql-9.6
    # you go into the directory with the source
    $ cd DIR_WITH_SOURCE
    # change the #define, I set it to 512
    $ vi src/backend/utils/adt/levenshtein.c
    # build and install
    $ dpkg-buildpackage -rfakeroot -uc -b
    $ cd ..
    $ dpkg -i NAME_OF_PACKAGE.deb
    # restart the server

I tested the attached file before and after the patch:

* Result before the patch:

     levenshtein
    -------------
               2
    (1 row)

    ERROR: levenshtein argument exceeds maximum length of 255 characters

     levenshtein
    -------------
               5
    (1 row)

* Result after the patch:

     levenshtein
    -------------
               2
    (1 row)

     levenshtein
    -------------
               5
    (1 row)

     levenshtein
    -------------
               5
    (1 row)

[1] https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/levenshtein.c#L26
[2] https://github.com/postgres/postgres/blob/master/src/backend/utils/adt/levenshtein.c#L122
Assignee
Comment 3•6 years ago
Hi Eduardo! First of all, thanks for your input. I also considered modifying this plugin myself some time ago. However, as far as I know, Pontoon is hosted on Heroku, and as far as I remember, it's not possible to add a custom precompiled extension there (feel free to correct me if my knowledge is out of date).
Comment 4•6 years ago
Would it be possible to run just those searches (the ones involving source strings longer than 255 characters) on an external copy of the database, at Mozilla for example, where you could add the extension? If they make up less than 1% of searches, the impact would, I guess, not be very noticeable.

Also, maybe searching and comparing only the first 255 characters would be helpful anyway, though a 100% match would have to be downgraded to some other value so that translators know it's not a true 100% match. That could work for many strings. It's better than an empty Machinery box.

I don't know the internals, so these ideas don't necessarily make sense. Just trying to help.
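The prefix idea above could look roughly like the sketch below. It is only an illustration: the names `MAX_LEN`, `distance` and `prefix_quality`, the cap of 99, and the quality formula (100 times one minus distance over the longer length) are all assumptions, not taken from Pontoon's code.

```python
MAX_LEN = 255  # Postgres levenshtein() argument limit reported in this bug


def distance(a, b):
    """Levenshtein distance using two rows of the dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def prefix_quality(query, candidate, cap=99.0):
    """Hypothetical helper: score only the first MAX_LEN characters.

    If either string had to be truncated, the score is capped below 100
    so a translator never sees a "perfect" match for a partial comparison.
    """
    truncated = len(query) > MAX_LEN or len(candidate) > MAX_LEN
    q, c = query[:MAX_LEN], candidate[:MAX_LEN]
    longest = max(len(q), len(c))
    quality = 100.0 if longest == 0 else 100.0 * (1 - distance(q, c) / longest)
    return min(quality, cap) if truncated else quality
```

Two strings that differ only after character 255 would score 99 rather than 100 here, which is the downgrade suggested above.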
Assignee
Comment 6•6 years ago
IMHO, the fastest way to solve this issue is to measure the Levenshtein ratio in Python (only for strings longer than 255 characters). It may take some time before we find better storage for the Translation Memory (or change the algorithm), and that probably goes beyond the scope of this bug. :mathjazz, what do you think about this approach?
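For illustration, a pure-Python Levenshtein ratio with no length limit might look like the sketch below. The function names and the ratio formula (one minus distance divided by the longer length) are assumptions made for this example; the patch that actually landed may compute it differently.

```python
def levenshtein_distance(a, b):
    """Edit distance between a and b (insert, delete, substitute cost 1)."""
    # Keep the shorter string in the inner loop so the row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def levenshtein_ratio(a, b):
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest
```

The cost is proportional to the product of the two lengths, so it seems reasonable to reserve this path for the long strings that Postgres rejects and keep the database function for everything else.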
Flags: needinfo?(m)
Assignee
Updated•6 years ago
Assignee: nobody → poke
Reporter
Updated•6 years ago
Status: NEW → ASSIGNED
Comment 8•6 years ago
Comment 9•6 years ago
Commit pushed to master at https://github.com/mozilla/pontoon

https://github.com/mozilla/pontoon/commit/ce9db869a6a91afa293fbd9f5c489a779d0d5c0d
Fix bug 1469010 - Process all strings above 255 characters length in (#1121)

* Fix bug 1469010 - Process all strings above 255 characters length in Python.
Updated•6 years ago
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Reporter
Comment 10•6 years ago
I've deployed this patch to stage: TM returns 500 for any input I try:
https://mozilla-pontoon-staging.herokuapp.com/machinery/

Internal Server Error: /translation-memory/
Traceback (most recent call last):
  File "/app/.heroku/python/lib/python2.7/site-packages/django/core/handlers/exception.py", line 41, in inner
    response = get_response(request)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/core/handlers/base.py", line 249, in _legacy_get_response
    response = self._get_response(request)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/core/handlers/base.py", line 187, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/core/handlers/base.py", line 185, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/app/.heroku/python/lib/python2.7/site-packages/newrelic/hooks/framework_django.py", line 544, in wrapper
    return wrapped(*args, **kwargs)
  File "/app/pontoon/machinery/views.py", line 62, in translation_memory
    .minimum_levenshtein_ratio(text)
  File "/app/pontoon/base/models.py", line 3030, in minimum_levenshtein_ratio
    max_dist,
  File "/app/pontoon/base/models.py", line 2934, in postgres_levenshtein_ratio
    output_field=models.DecimalField()
  File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/query.py", line 945, in annotate
    clone.query.add_annotation(annotation, alias, is_summary=False)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/sql/query.py", line 973, in add_annotation
    summarize=is_summary)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/expressions.py", line 217, in resolve_expression
    for expr in c.get_source_expressions()
  File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/expressions.py", line 411, in resolve_expression
    c.lhs = c.lhs.resolve_expression(query, allow_joins, reuse, summarize, for_save)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/expressions.py", line 411, in resolve_expression
    c.lhs = c.lhs.resolve_expression(query, allow_joins, reuse, summarize, for_save)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/expressions.py", line 548, in resolve_expression
    c.source_expressions[pos] = arg.resolve_expression(query, allow_joins, reuse, summarize, for_save)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/expressions.py", line 411, in resolve_expression
    c.lhs = c.lhs.resolve_expression(query, allow_joins, reuse, summarize, for_save)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/expressions.py", line 411, in resolve_expression
    c.lhs = c.lhs.resolve_expression(query, allow_joins, reuse, summarize, for_save)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/expressions.py", line 471, in resolve_expression
    return query.resolve_ref(self.name, allow_joins, reuse, summarize)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/sql/query.py", line 1477, in resolve_ref
    self.get_initial_alias(), reuse)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/sql/query.py", line 1417, in setup_joins
    names, opts, allow_many, fail_on_missing=True)
  File "/app/.heroku/python/lib/python2.7/site-packages/django/db/models/sql/query.py", line 1352, in names_to_path
    "Choices are: %s" % (name, ", ".join(available)))
FieldError: Cannot resolve keyword 'source_length' into field. Choices are: entity, entity_id, id, locale, locale_id, project, project_id, source, target, translation, translation_id
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 11•6 years ago
Comment 12•6 years ago
Commit pushed to master at https://github.com/mozilla/pontoon

https://github.com/mozilla/pontoon/commit/173b38f0389ffa1e2ed7d57db1e0bbcceb1fe3b1
Fix bug 1469010: Calculate source length in the quality expression (#1134)
Updated•6 years ago
Status: REOPENED → RESOLVED
Closed: 6 years ago → 6 years ago
Resolution: --- → FIXED
Updated•3 years ago
Product: Webtools → Webtools Graveyard