Closed
Bug 712038
Opened 13 years ago
Closed 13 years ago
The free text index should not look inside the object directory
Categories
(Webtools Graveyard :: DXR, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: ehsan.akhgari, Assigned: christian)
References
()
Details
Attachments
(1 file)
1.07 KB,
patch
|
ehsan.akhgari
:
review-
|
Details | Diff | Splinter Review |
See the URL in the bug. The search is returning all sorts of garbage output from libsul.so...
Comment 1•13 years ago
|
||
OTOH http://dxr.mozilla.org/mozilla/search.cgi?tree=mozilla-central&string=Open also returns results from binary files.
There's :
- services/sync/tests/unit/places_v10_from_v11.sqlite
- toolkit/crashreporter/google-breakpad/src/tools/windows/binaries/symupload.exe
- toolkit/components/places/tests/migration/places_v10_from_v11.sqlite
content/media/test/bug516323.ogv
etc.
Maybe it'd be better if the indexer had a list of safe extensions, or a way to detect files that are apparently binary.
Assignee | ||
Comment 2•13 years ago
|
||
The best approach here might be to actually skip non-text types. While I didn't find an existing library to tell "is this a text file", the second best is a small function on top of Xdg.Mime
Status: NEW → ASSIGNED
Assignee | ||
Comment 3•13 years ago
|
||
And perhaps my comment is slightly more useful if I include the pull request:
https://github.com/kalikiana/dxr/pull/8
Assignee | ||
Updated•13 years ago
|
Assignee: nobody → christian.dywan
Comment 4•13 years ago
|
||
(In reply to Christian Dywan from comment #2)
> The best approach here might be to actually skip non-text types. While I
> didn't find an existing library to tell "is this a text file", the second
> best is a small function on top of Xdg.Mime
you mean a library like libmagic?
Comment 5•13 years ago
|
||
nm, just read the github pull.
fyi,
return valid in ['text', 'xml', 'shellscript', 'perl', 'm4', 'xbel']
Assignee | ||
Comment 6•13 years ago
|
||
(In reply to Taras Glek (:taras) from comment #5)
> return valid in ['text', 'xml', 'shellscript', 'perl', 'm4', 'xbel']
I don't follow. If you mean "return mimetype in" it won't work. "valid" is otherwise either from that list. Can you please elaborate?
Comment 7•13 years ago
|
||
(In reply to Christian Dywan from comment #6)
> (In reply to Taras Glek (:taras) from comment #5)
> > return valid in ['text', 'xml', 'shellscript', 'perl', 'm4', 'xbel']
>
> I don't follow. If you mean "return mimetype in" it won't work. "valid" is
> otherwise either from that list. Can you please elaborate?
nevermind, misread code
Assignee | ||
Comment 8•13 years ago
|
||
Apparently github was too smart and closed the pull request, so here we go again:
https://github.com/kalikiana/dxr/pull/10
Reporter | ||
Comment 9•13 years ago
|
||
Please attach a patch so that I can review it.
Assignee | ||
Comment 10•13 years ago
|
||
Attachment #587980 -
Flags: review?(ehsan)
Reporter | ||
Comment 11•13 years ago
|
||
Comment on attachment 587980 [details] [diff] [review]
Use xdg.Mime to determine files if files can indexed
This makes DXR depend on Gnome, which is not a good idea. Please use the mimetypes library instead: <http://docs.python.org/library/mimetypes.html>
Attachment #587980 -
Flags: review?(ehsan) → review-
Assignee | ||
Comment 12•13 years ago
|
||
(In reply to Ehsan Akhgari [:ehsan] from comment #11)
> This makes DXR depend on Gnome, which is not a good idea. Please use the
> mimetypes library instead: <http://docs.python.org/library/mimetypes.html>
No it doesn't. python-xdg depends on "any of the major linux desktops" if anything. Python's "mimetypes" is not up to the task, it only maps filenames to types and doesn't understand what a file contains.
Comment 13•13 years ago
|
||
Ehsan > XDG is a freedesktop standard (which is no more related to GNOME than to KDE or whatever).
python-xdg only implement some useful stuffs for desktop application but don't depend on anything related to desktop AFAIK. It's the opposite: KDE and GNOME both depends on python-xdg.
python-xdg is really common and installed by default in most distributions, I can't really find any reason to not depend on it.
Reporter | ||
Comment 14•13 years ago
|
||
Looking at the code, the mimetypes module looks at the system mimetype information repos. What happens with your patch when somebody tries to run DXR on, let's say Mac?
Comment 15•13 years ago
|
||
That's a very good question Ehsan. XDG is a cross-platform specification. XDG libraries *should* run on most platform. Now, it might worth testing python-xdg on mac and windows to ensure it works correctly.
Reporter | ||
Comment 16•13 years ago
|
||
(In reply to Lionel Dricot from comment #15)
> That's a very good question Ehsan. XDG is a cross-platform specification.
> XDG libraries *should* run on most platform. Now, it might worth testing
> python-xdg on mac and windows to ensure it works correctly.
I have a Mac so I can help here. Please let me know what kind of testing you would like me to perform.
Comment 17•13 years ago
|
||
I don't think we care about running DXR on mac.
Reporter | ||
Comment 18•13 years ago
|
||
(In reply to Taras Glek (:taras) from comment #17)
> I don't think we care about running DXR on mac.
We will at some point, because there is a lot of Mac related code which the Linux compiler doesn't see at all. And it's also useful to support for people hacking on things locally, etc. So if there's no reason to break it, I'd rather we don't.
A good thing to have here is to have some code which tries to use the xdg stuff if available, and fall back to something sane otherwise.
Comment 19•13 years ago
|
||
Merged
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•4 years ago
|
Product: Webtools → Webtools Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•