Closed Bug 849893 Opened 11 years ago Closed 11 years ago

Detect dodgy character encoding

Categories

(Marketplace Graveyard :: Payments/Refunds, enhancement, P5)

x86
macOS
enhancement

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: andy+bugzilla, Assigned: davidbgk)

Details

(Whiteboard: p=2)

This is an implementation of the OWASP AppSensor detection point EE2 (Unexpected Encoding Used):

https://www.owasp.org/index.php/AppSensor_DetectionPoints#EE2:_Unexpected_Encoding_Used

The goal is to detect requests that use an unexpected character encoding. This should go into django-paranoia so it can be reused.

https://django-paranoia.readthedocs.org/
Severity: normal → enhancement
Assignee: nobody → david
Different ways to detect encoding in Python:

* BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html#Beautiful%20Soup%20Gives%20You%20Unicode,%20Dammit) on top of chardet (https://pypi.python.org/pypi/chardet)
* python-magic (https://pypi.python.org/pypi/python-magic) on top of libmagic (http://linux.die.net/man/3/libmagic)
* PyICU (https://pypi.python.org/pypi/PyICU) on top of ICU (http://site.icu-project.org/)
* python-libguess (https://bitbucket.org/barro/python-libguess/wiki/Home) on top of libguess (http://www.atheme.org/project/libguess)

Django evaluates submitted content lazily: https://docs.djangoproject.com/en/dev/ref/unicode/#form-submission

Both Django forms and the test client use the DEFAULT_CHARSET setting to decode data: https://docs.djangoproject.com/en/dev/ref/settings/#default-charset
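
As a rough illustration of the check itself, independent of any of the libraries listed above: a minimal sketch that strictly decodes raw bytes against the expected charset and logs a failure. The names `EXPECTED_CHARSET` and `decodes_cleanly` are hypothetical, and the charset value is an assumption standing in for DEFAULT_CHARSET; this is not how Django or django-paranoia does it.

```python
import logging

logger = logging.getLogger("encoding_check")

# Assumption: stands in for a DEFAULT_CHARSET of "utf-8".
EXPECTED_CHARSET = "utf-8"


def decodes_cleanly(raw: bytes, charset: str = EXPECTED_CHARSET) -> bool:
    """Return True if `raw` decodes strictly as `charset`; log and return False otherwise."""
    try:
        raw.decode(charset, errors="strict")
    except UnicodeDecodeError as exc:
        # The exception message includes the offending byte and its position.
        logger.warning("Unexpected encoding in submitted data: %s", exc)
        return False
    return True
```

For example, `decodes_cleanly("café".encode("latin-1"))` returns False, because the byte 0xE9 is not valid UTF-8 on its own. A strict decode only tells you the data is not valid in the expected charset; it does not guess which encoding was actually used.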

Logging encoding errors will probably require monkeypatching both django.http.HttpRequest.body and django.http.QueryDict.
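
Without touching Django at all, the query-string side of this could be approximated by inspecting the raw query string before it is parsed. A minimal sketch, assuming UTF-8 is the expected charset; `undecodable_params` is a hypothetical helper, not an existing Django or django-paranoia function:

```python
from urllib.parse import unquote_to_bytes


def undecodable_params(query_string: str, charset: str = "utf-8") -> list:
    """Return the raw names of parameters whose percent-decoded bytes are invalid in `charset`."""
    flagged = []
    for pair in query_string.split("&"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        # Percent-decode to bytes so that invalid sequences are not silently replaced.
        raw_name = unquote_to_bytes(name.replace("+", " "))
        raw_value = unquote_to_bytes(value.replace("+", " "))
        try:
            raw_name.decode(charset)
            raw_value.decode(charset)
        except UnicodeDecodeError:
            flagged.append(name)
    return flagged
```

Calling `undecodable_params("q=caf%C3%A9&name=caf%E9")` flags only `name`, since `%E9` alone is Latin-1 rather than UTF-8. Where such a check would be hooked in (middleware, a patched QueryDict, or elsewhere) is left open here.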

To be discussed.
After discussing this with Andy, we concluded that logging this kind of error is not worth the investment.
Feel free to reopen this bug if you want to do something valuable with the logged data.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX