binary: rejigger ripgrep's handling of binary files
This commit attempts to surface binary filtering in a slightly more user friendly way. Namely, before, ripgrep would silently stop searching a file if it detected a NUL byte, even if it had previously printed a match. This can lead to the user quite reasonably assuming that there are no more matches, since a partial search is fairly unintuitive. (ripgrep has this behavior by default because it really wants to NOT search binary files at all, just like it doesn't search gitignored or hidden files.) With this commit, if a match has already been printed and ripgrep detects a NUL byte, then it will print a warning message indicating that the search stopped prematurely. Moreover, this commit adds a new flag, --binary, which causes ripgrep to stop filtering binary files, but in a way that still avoids dumping binary data into terminals. That is, the --binary flag makes ripgrep behave more like grep's default behavior. For files explicitly specified in a search, e.g., `rg foo some-file`, then no binary filtering is applied (just like no gitignore and no hidden file filtering is applied). Instead, ripgrep behaves as if you gave the --binary flag for all explicitly given files. This was a fairly invasive change, and potentially increases the UX complexity of ripgrep around binary files. (Before, there were two binary modes, where as now there are three.) However, ripgrep is now a bit louder with warning messages when binary file detection might otherwise be hiding potential matches, so hopefully this is a net improvement. Finally, the `-uuu` convenience now maps to `--no-ignore --hidden --binary`, since this is closer to the actualy intent of the `--unrestricted` flag, i.e., to reduce ripgrep's smart filtering. As a consequence, `rg -uuu foo` should now search roughly the same number of bytes as `grep -r foo`, and `rg -uuua foo` should search roughly the same number of bytes as `grep -ra foo`. (The "roughly" weasel word is used because grep's and ripgrep's binary file detection might differ somewhat---perhaps based on buffer sizes---which can impact exactly what is and isn't searched.) See the numerous tests in tests/binary.rs for intended behavior. Fixes #306, Fixes #855
This commit is contained in:
@@ -59,17 +59,12 @@ impl SubjectBuilder {
|
||||
if let Some(ignore_err) = subj.dent.error() {
|
||||
ignore_message!("{}", ignore_err);
|
||||
}
|
||||
// If this entry represents stdin, then we always search it.
|
||||
if subj.dent.is_stdin() {
|
||||
// If this entry was explicitly provided by an end user, then we always
|
||||
// want to search it.
|
||||
if subj.is_explicit() {
|
||||
return Some(subj);
|
||||
}
|
||||
// If this subject has a depth of 0, then it was provided explicitly
|
||||
// by an end user (or via a shell glob). In this case, we always want
|
||||
// to search it if it even smells like a file (e.g., a symlink).
|
||||
if subj.dent.depth() == 0 && !subj.is_dir() {
|
||||
return Some(subj);
|
||||
}
|
||||
// At this point, we only want to search something it's explicitly a
|
||||
// At this point, we only want to search something if it's explicitly a
|
||||
// file. This omits symlinks. (If ripgrep was configured to follow
|
||||
// symlinks, then they have already been followed by the directory
|
||||
// traversal.)
|
||||
@@ -127,6 +122,26 @@ impl Subject {
|
||||
self.dent.is_stdin()
|
||||
}
|
||||
|
||||
/// Returns true if and only if this entry corresponds to a subject to
|
||||
/// search that was explicitly supplied by an end user.
|
||||
///
|
||||
/// Generally, this corresponds to either stdin or an explicit file path
|
||||
/// argument. e.g., in `rg foo some-file ./some-dir/`, `some-file` is
|
||||
/// an explicit subject, but, e.g., `./some-dir/some-other-file` is not.
|
||||
///
|
||||
/// However, note that ripgrep does not see through shell globbing. e.g.,
|
||||
/// in `rg foo ./some-dir/*`, `./some-dir/some-other-file` will be treated
|
||||
/// as an explicit subject.
|
||||
pub fn is_explicit(&self) -> bool {
|
||||
// stdin is obvious. When an entry has a depth of 0, that means it
|
||||
// was explicitly provided to our directory iterator, which means it
|
||||
// was in turn explicitly provided by the end user. The !is_dir check
|
||||
// means that we want to search files even if their symlinks, again,
|
||||
// because they were explicitly provided. (And we never want to try
|
||||
// to search a directory.)
|
||||
self.is_stdin() || (self.dent.depth() == 0 && !self.is_dir())
|
||||
}
|
||||
|
||||
/// Returns true if and only if this subject points to a directory after
|
||||
/// following symbolic links.
|
||||
fn is_dir(&self) -> bool {
|
||||
|
||||
Reference in New Issue
Block a user