It's not clear what exactly is happening here, but the Read implementation for text decoding appears a bit sensitive. Small pertubations in the code appear to have a nearly 100% impact on the overall speed of ripgrep when searching UTF-16 files. I haven't had the time to examine the generated code in detail, but `perf stat` seems to think that the instruction cache is performing a lot worse when the code slows down. This might mean that excessive inlining causes a different code structure that leads to less-than-optimal icache usage, but it's at best a guess. Explicitly disabling the inline for the cold path seems to help the optimizer figure out the right thing.
16 KiB
16 KiB