Closed Bug 370427 Opened 17 years ago Closed 16 years ago

Don't display lines from binary files in search result list

Categories

(Webtools Graveyard :: MXR, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 432684

People

(Reporter: MatsPalmgren_bugz, Assigned: timeless)

Details

This is a bug report about mxr.mozilla.org; I hope this is the right place for it.

Don't display lines from binary files in the search result list, because they
are usually quite long and not that exciting to read.

First, a WARNING: the URL might CRASH in recent builds (see bug 370414).

STEPS TO REPRODUCE
1. load URL (a search at mxr.mozilla.org for "Home", case-sensitive)

ACTUAL RESULTS
The result list contains, for example:
plugin/oji/JEP/MRJPlugin.plugin/Contents/MacOS/MRJPlugin
plugin/oji/JEP/JavaEmbeddingPlugin.bundle/Contents/MacOS/JavaEmbeddingPlugin
Both are binary blobs as far as I can tell (they have -kb in the CVS repo).

EXPECTED RESULTS
Suggestion: just include the file names, saying something like
"binary file ... matched"
Assignee: bear → timeless
hrm.

the first person to complain was happy enough with:
http://mxr.mozilla.org/seamonkey/search?string=Home&case=on&find=&findi=&filter=^[^\0]*$&tree=seamonkey

(general solution, not actually applicable to your url.)

however, i explained to him that i didn't want to exclude those hits by default, because they might be useful (in fact they often were).

A few notes:
1. mxr often indexes things that have no repository (i have a tree called "specs" which is basically pdf, doc and converted doc: html+css+gif+jpg)
2. mxr indexes trees from things like svn (determining whether a file is binary from its metadata is doable)
3. internally i'm starting to index git repositories, although i don't understand their metadata yet, so that's a future problem

ignoring those notes:
do you have a definition of "binary files" that's better than the one i used (i.e. the line contains a null byte)?

Note that the definition I used is *horrible* for ucs2 files.
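
For illustration, the null-byte definition and its UCS-2 problem can be sketched in Python. This is a minimal sketch, not MXR's code; the function name and the sample byte strings are made up for the example.

```python
def line_looks_binary(line: bytes) -> bool:
    """Null-byte heuristic: any NUL byte marks the line as binary."""
    return b"\x00" in line

# Made-up sample lines, for illustration only.
ascii_line = b"Home directory"
blob_line = b"\x7fELF\x00\x01Home\x00"
ucs2_line = "Home directory".encode("utf-16-le")  # ASCII text stored as 2-byte units

ascii_is_binary = line_looks_binary(ascii_line)
blob_is_binary = line_looks_binary(blob_line)
ucs2_is_binary = line_looks_binary(ucs2_line)
print(ascii_is_binary, blob_is_binary, ucs2_is_binary)
```

The UCS-2 line is flagged as binary even though it is plain text, because every ASCII-range character is stored with a zero byte next to it.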

The main reason i didn't introduce that solution right away by default is that I didn't have time/energy to discriminate between binary and ucs2.

I think:
http://mxr.mozilla.org/seamonkey/search?string=Home&case=on&find=&findi=&filter=^([^\0]*|(\0[^\0])*|([^\0]\0)*)$&tree=seamonkey
would do it.

except it seems that none of my repositories like at least one of those characters (and it seems that one of my systems is entirely broken).
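
A rough Python rendering of the filter above, for illustration. It assumes the filter is applied to the raw bytes of each matching line; the regex engine MXR actually uses and its handling of \0 may differ, and keep_line is a hypothetical name.

```python
import re

# The three alternatives of the filter, applied to raw bytes:
#   no NUL at all | NUL first in every byte pair | NUL second in every byte pair
FILTER = re.compile(rb"([^\x00]*|(\x00[^\x00])*|([^\x00]\x00)*)")

def keep_line(line: bytes) -> bool:
    """Return True if the whole line satisfies the filter (hypothetical helper)."""
    return FILTER.fullmatch(line) is not None

# Made-up sample lines, for illustration only.
plain = b"Home directory"
utf16_le = "Home".encode("utf-16-le")   # H\0o\0m\0e\0
utf16_be = "Home".encode("utf-16-be")   # \0H\0o\0m\0e
blob = b"\x7fELF\x00\x01Home\x00"
mixed = "H\u65e5".encode("utf-16-le")   # ASCII plus a character with no zero byte

for name, sample in [("plain", plain), ("utf16_le", utf16_le),
                     ("utf16_be", utf16_be), ("blob", blob), ("mixed", mixed)]:
    print(name, keep_line(sample))
```

In this sketch the plain and both 2-byte-encoded ASCII lines are kept and the blob is dropped. The last sample is still dropped, so the pattern only helps for text whose characters all have one zero byte per pair.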
(In reply to comment #1)
> do you have a definition of "binary files" that's better than the one i used

A file that has -kb in CVS/RCS? I don't know about other repositories,
but it wouldn't surprise me if they have a similar flag.
Maybe the 'file' command could also be of use?
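
As a sketch of the -kb idea: in a CVS working copy, the keyword-expansion option of each file is recorded in the options field of the directory's CVS/Entries file, so a check could look roughly like the Python below. The function name is hypothetical and the sample entries (revisions, timestamps, the README file) are made up; this is not how MXR does it.

```python
def binary_names_from_entries(entries_text: str) -> set:
    """Collect file names whose CVS/Entries options field contains -kb."""
    binary = set()
    for line in entries_text.splitlines():
        # File entries look like /name/revision/timestamp/options/tagdate.
        # Directory entries start with "D/" and are skipped.
        if not line.startswith("/"):
            continue
        fields = line.split("/")
        if len(fields) >= 5 and "-kb" in fields[4]:
            binary.add(fields[1])
    return binary

# Made-up CVS/Entries content, for illustration only.
SAMPLE_ENTRIES = """\
/MRJPlugin/1.1/Thu Jan  1 00:00:00 1970/-kb/
/README/1.2/Thu Jan  1 00:00:00 1970//
D/Resources////
"""

binary_names = binary_names_from_entries(SAMPLE_ENTRIES)
print(binary_names)
```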
i ended up committing the filter bit... (upon reviewing some other code, it seems i didn't use lxr.conf properly to do this but...)

I'm wondering if you still want more than this.

note that running 'file' or checking CVS/ or .svn/ or anything similar is potentially an expensive operation (it's at least a couple of stats per search result, plus parsing).
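
One way such a check might be kept cheap is to do the lookup once per directory rather than once per hit. A minimal Python sketch, assuming a hypothetical load_binary_names(directory) callback that does the expensive stat-and-parse work; the loader below is a stand-in and notes.txt is a made-up file name:

```python
import os
from functools import lru_cache

def make_binary_checker(load_binary_names):
    """Wrap an expensive per-directory lookup so each directory is read once."""
    @lru_cache(maxsize=None)
    def binary_names(directory):
        return frozenset(load_binary_names(directory))

    def is_binary(path):
        directory, name = os.path.split(path)
        return name in binary_names(directory)

    return is_binary

calls = []  # records how often the expensive lookup actually runs

def fake_loader(directory):
    # Stand-in for parsing CVS/ or .svn/ metadata (hypothetical).
    calls.append(directory)
    return {"MRJPlugin"}

is_binary = make_binary_checker(fake_loader)
hits = [
    "plugin/oji/JEP/MRJPlugin.plugin/Contents/MacOS/MRJPlugin",
    "plugin/oji/JEP/MRJPlugin.plugin/Contents/MacOS/MRJPlugin",
    "plugin/oji/JEP/MRJPlugin.plugin/Contents/MacOS/notes.txt",
]
flags = [is_binary(hit) for hit in hits]
print(flags, len(calls))
```

Three hits in the same directory trigger a single lookup here. Whether that is enough to make the check affordable would still need measuring.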

if you want more, please reopen. if someone here thinks that the performance penalty of doing more is prohibitive, they should verify this bug as a duplicate (with my apologies for not using this bug when i implemented the change).
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → DUPLICATE
Product: Webtools → Webtools Graveyard