searcher: add option to disable BOM sniffing

This commit adds a new encoding feature where the -E/--encoding flag
will now accept a value of 'none'. When given this value, all encoding
related machinery is disabled and ripgrep will search the raw bytes of
the file, including the BOM if it's present.

Closes #1207, Closes #1208
This commit is contained in:
lesnyrumcajs
2019-03-04 17:18:45 +01:00
committed by Andrew Gallant
parent 1604a18db3
commit 5962abc465
9 changed files with 158 additions and 34 deletions

View File

@@ -984,7 +984,9 @@ Specify the text encoding that ripgrep will use on all files searched. The
default value is 'auto', which will cause ripgrep to do a best effort automatic
detection of encoding on a per-file basis. Automatic detection in this case
only applies to files that begin with a UTF-8 or UTF-16 byte-order mark (BOM).
No other automatic detection is performend.
No other automatic detection is performed. One can also specify 'none' which
will then completely disable BOM sniffing and always result in searching the
raw bytes, including a BOM if it's present, regardless of its encoding.
Other supported values can be found in the list of labels here:
https://encoding.spec.whatwg.org/#concept-encoding-get