Closed Bug 712038 Opened 13 years ago Closed 13 years ago

The free text index should not look inside the object directory

Categories

(Webtools Graveyard :: DXR, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ehsan.akhgari, Assigned: christian)

Attachments

(1 file)

See the URL in the bug. The search is returning all sorts of garbage output from libxul.so...
OTOH http://dxr.mozilla.org/mozilla/search.cgi?tree=mozilla-central&string=Open also returns results from binary files. There's:
- services/sync/tests/unit/places_v10_from_v11.sqlite
- toolkit/crashreporter/google-breakpad/src/tools/windows/binaries/symupload.exe
- toolkit/components/places/tests/migration/places_v10_from_v11.sqlite
- content/media/test/bug516323.ogv
etc.
Maybe it'd be better if the indexer had a list of safe extensions, or a way to detect files that are apparently binary.
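Either idea from the comment above could be sketched roughly as follows. This is only an illustration, not DXR's code: `SAFE_EXTENSIONS`, `has_safe_extension`, and `looks_binary` are hypothetical names, the extension list is an arbitrary assumption, and the NUL-byte check is a common heuristic rather than something proposed in this bug.

```python
import os

# Hypothetical allow-list; the bug does not say which extensions would count as safe.
SAFE_EXTENSIONS = {".c", ".cpp", ".h", ".idl", ".js", ".py", ".txt", ".xml"}


def has_safe_extension(path):
    """Return True if the file name ends in an allow-listed extension."""
    return os.path.splitext(path)[1].lower() in SAFE_EXTENSIONS


def looks_binary(path, sample_size=4096):
    """Heuristic: treat a file as binary if its first bytes contain a NUL."""
    with open(path, "rb") as handle:
        return b"\0" in handle.read(sample_size)
```

An indexer could skip any file for which `looks_binary()` returns True. The heuristic is imperfect: UTF-16 text, for example, contains NUL bytes and would be skipped as well.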
The best approach here might be to actually skip non-text types. While I didn't find an existing library to tell "is this a text file", the second best is a small function on top of Xdg.Mime
Status: NEW → ASSIGNED
And perhaps my comment is slightly more useful if I include the pull request: https://github.com/kalikiana/dxr/pull/8
Assignee: nobody → christian.dywan
(In reply to Christian Dywan from comment #2)
> The best approach here might be to actually skip non-text types. While I
> didn't find an existing library to tell "is this a text file", the second
> best is a small function on top of Xdg.Mime
you mean a library like libmagic?
nm, just read the github pull. fyi, return valid in ['text', 'xml', 'shellscript', 'perl', 'm4', 'xbel']
(In reply to Taras Glek (:taras) from comment #5)
> return valid in ['text', 'xml', 'shellscript', 'perl', 'm4', 'xbel']
I don't follow. If you mean "return mimetype in" it won't work. "valid" is otherwise either from that list. Can you please elaborate?
(In reply to Christian Dywan from comment #6)
> (In reply to Taras Glek (:taras) from comment #5)
> > return valid in ['text', 'xml', 'shellscript', 'perl', 'm4', 'xbel']
>
> I don't follow. If you mean "return mimetype in" it won't work. "valid" is
> otherwise either from that list. Can you please elaborate?
nevermind, misread code
Apparently github was too smart and closed the pull request, so here we go again: https://github.com/kalikiana/dxr/pull/10
Please attach a patch so that I can review it.
Comment on attachment 587980
Use xdg.Mime to determine files if files can indexed
This makes DXR depend on Gnome, which is not a good idea. Please use the mimetypes library instead: <http://docs.python.org/library/mimetypes.html>
Attachment #587980 - Flags: review?(ehsan) → review-
(In reply to Ehsan Akhgari [:ehsan] from comment #11)
> This makes DXR depend on Gnome, which is not a good idea. Please use the
> mimetypes library instead: <http://docs.python.org/library/mimetypes.html>
No it doesn't. python-xdg depends on "any of the major linux desktops" if anything. Python's "mimetypes" is not up to the task, it only maps filenames to types and doesn't understand what a file contains.
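The limitation described in this comment can be reproduced with the standard library alone. The sketch below writes binary bytes into a file with a `.txt` name (the file name and contents are made up for illustration) and asks `mimetypes.guess_type()` about it.

```python
import mimetypes
import os
import tempfile

# Write bytes that are clearly not text into a file with a ".txt" name.
directory = tempfile.mkdtemp()
path = os.path.join(directory, "not_really_text.txt")
with open(path, "wb") as handle:
    handle.write(b"\x00\x01\x02\xff" * 16)

# guess_type() looks only at the name, so the binary payload goes unnoticed.
guessed_type, encoding = mimetypes.guess_type(path)
print(guessed_type)
```

The guess is `text/plain` because only the extension is consulted; the file is never opened.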
Ehsan > XDG is a freedesktop standard (which is no more related to GNOME than to KDE or anything else). python-xdg only implements some useful things for desktop applications but doesn't depend on anything desktop-related, AFAIK. It's the opposite: KDE and GNOME both depend on python-xdg. python-xdg is really common and installed by default in most distributions; I can't really find any reason not to depend on it.
Looking at the code, the mimetypes module looks at the system mimetype information repos. What happens with your patch when somebody tries to run DXR on, let's say Mac?
That's a very good question, Ehsan. XDG is a cross-platform specification. XDG libraries *should* run on most platforms. Now, it might be worth testing python-xdg on Mac and Windows to ensure it works correctly.
(In reply to Lionel Dricot from comment #15)
> That's a very good question, Ehsan. XDG is a cross-platform specification.
> XDG libraries *should* run on most platforms. Now, it might be worth testing
> python-xdg on Mac and Windows to ensure it works correctly.
I have a Mac so I can help here. Please let me know what kind of testing you would like me to perform.
I don't think we care about running DXR on mac.
(In reply to Taras Glek (:taras) from comment #17)
> I don't think we care about running DXR on mac.
We will at some point, because there is a lot of Mac-related code which the Linux compiler doesn't see at all. It's also useful to support people hacking on things locally, etc. So if there's no reason to break it, I'd rather we don't. A good thing to have here would be some code which tries to use the xdg stuff if available, and falls back to something sane otherwise.
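The "use xdg if available, fall back otherwise" idea might look something like the sketch below. It is not the merged patch: `is_text_file`, `is_text_fallback`, and `EXTRA_TEXT_SUBTYPES` are hypothetical names, the subtype set is an assumption, and the NUL-byte fallback is just one plausible reading of "something sane". It relies only on `xdg.Mime.get_type()` and the `media` and `subtype` attributes of the object it returns.

```python
try:
    import xdg.Mime as xdg_mime  # from python-xdg (pyxdg); optional
except ImportError:
    xdg_mime = None

# Assumption: non-"text/*" types that are still worth indexing as text.
EXTRA_TEXT_SUBTYPES = {"xml", "x-shellscript", "x-perl", "x-m4"}


def is_text_fallback(path, sample_size=4096):
    """Portable fallback: a file whose first bytes contain no NUL counts as text."""
    with open(path, "rb") as handle:
        return b"\0" not in handle.read(sample_size)


def is_text_file(path):
    """Prefer xdg.Mime when it can be imported, otherwise use the fallback."""
    if xdg_mime is not None:
        mime = xdg_mime.get_type(path)
        return mime.media == "text" or mime.subtype in EXTRA_TEXT_SUBTYPES
    return is_text_fallback(path)
```

With this shape, a platform without python-xdg still indexes something reasonable, and the xdg path stays a single optional import.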
Merged
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: Webtools → Webtools Graveyard