Search files by content on Linux
The power of the command line
The problem
Have you ever looked for a document on your PC and been unable to find it because you only remember its contents and not its name? If so, then this post may be for you!
First attempt: find + cat + grep
If you're not terribly fond of the command line, you might try something along these lines:
find ~ -type f -name "*.ext" | cat | grep -i "your match"
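Unfortunately, this doesn't work. The reason is that cat only receives the list of filenames on its standard input and echoes it back unchanged, so grep ends up searching the filenames themselves rather than the contents of the files: looks like we need another option.
Note: The -i option to grep just makes our match case-insensitive, so as to maximize our chances of finding something.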
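To see what the first attempt actually does, here is a minimal sketch run in a throwaway directory. The directory and file names are made up for illustration, and mktemp -d is assumed to be available (it is on typical Linux systems). The file whose name contains the search term is reported, while the file whose content contains it is not:

```shell
# Hypothetical sandbox: both file names below are invented for this demo.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'the secret recipe\n' > notes.ext    # the term is in the CONTENT
printf 'nothing to see\n' > recipe.ext      # the term is in the NAME

# cat just forwards the list of paths it receives on standard input,
# so grep filters the paths themselves and never opens the files.
by_name=$(find . -type f -name "*.ext" | cat | grep -i "recipe")
echo "$by_name"
```

Only ./recipe.ext is printed, which is the opposite of what we were after.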
Second attempt: find with -exec
Digging into the manual pages for find yields this:
[...]
-exec command ;
Execute command; true if 0 status is returned. [...]
This will allow us to execute a certain command* for each file found with the given extension. So, our command now becomes something like:
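find ~ -type f -name "*.ext" -exec grep -i "your match" {} \;
The curly braces are replaced by find with the current filename, while the semicolon terminates the argument to -exec (we just need to escape it using a backslash so the shell doesn't interpret it as a command separator. Using ";" would've worked too). Note that the options and the pattern come before the curly braces: grep treats its first non-option argument as the pattern and everything after it as files to search.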
This works! It will print the matched content of each file to the screen, but we're not done yet...
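As a quick check of the two ways of writing the terminator, here is a small sketch in a temporary directory (the file names and contents are hypothetical). It puts the options and the pattern before {}, which is the order grep expects, and compares the backslash-escaped and the quoted semicolon:

```shell
# Hypothetical sandbox with one matching and one non-matching file.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'Hello World\nsomething else\n' > a.ext
printf 'no match in here\n' > b.ext

# Backslash-escaped terminator.
escaped=$(find . -type f -name "*.ext" -exec grep -i "hello" {} \;)
# Quoted terminator: the shell hands find the same literal ";".
quoted=$(find . -type f -name "*.ext" -exec grep -i "hello" {} ";")

echo "$escaped"
```

Both variables end up holding the single matching line, Hello World.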
*: You can use the -exec flag multiple times to execute multiple commands!
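Here is a rough sketch of chaining two -exec flags, again with made-up file names. Because -exec is "true if 0 status is returned", a second -exec only runs for files where the first command succeeded; the example uses grep -q as a silent test and echo as the follow-up command:

```shell
# Hypothetical sandbox with one matching and one non-matching file.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'Hello World\n' > a.ext
printf 'no match in here\n' > b.ext

# First -exec: grep -q prints nothing and only reports success or failure.
# Second -exec: runs only for files where the first one returned status 0.
found=$(find . -type f -name "*.ext" \
    -exec grep -qi "hello" {} \; \
    -exec echo "found in" {} \;)
echo "$found"
```

Only a.ext passes the first command, so the output is the single line found in ./a.ext.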
Getting the filename
We are now able to find documents in our system by their content, but we're not getting any new information out of our command: after all, we already knew part of the file content anyway! What we need to do next is hack our way back from the file's content to its name.
The -l option of grep
Fortunately, man comes to the rescue again:
[...]
-l, --files-with-matches
Suppress normal output; instead print the name of
each input file from which output would
normally have been printed. [...]
So our command has now become:
find ~ -type f -name "*.ext" -exec grep -li "your match" {} \;
If we try it in our command line, we should get something like this out of it:
/path/to/file1.ext
/path/to/file2.ext
[...]
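If you'd rather try this without scanning your whole home directory, the following sketch builds a tiny sandbox first (all file names and contents are invented for the demo). The output is piped through sort only because find does not promise any particular order:

```shell
# Hypothetical sandbox: two files contain the phrase, one does not.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'Your Match is here\n' > file1.ext
printf 'also your match\n' > file2.ext
printf 'unrelated text\n' > file3.ext

# -l prints the name of each matching file instead of the matching lines.
names=$(find . -type f -name "*.ext" -exec grep -li "your match" {} \; | sort)
echo "$names"
```

This prints ./file1.ext and ./file2.ext, one per line, and leaves file3.ext out.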
Restricting the output size
So, now we can find files by their content, but what if we want to limit how many lines of output are printed? We can use the beauty of the POSIX shell and pipe our previous command into the useful head and tail utilities.
find ~ -type f -name "*.ext" -exec grep -i "your match" {} \; | tail -n 5
This only prints the last 5 matches. Things get interesting if we want to fetch the first 5 matches:
find ~ -type f -name "*.ext" -exec grep -i "your match" {} \; | head -n 5
If you tried running the command above, you'd get your filtered output, but you'd also get a bunch of errors that look like find: grep: interrupted by signal 13. Looking up what signal corresponds to number 13, we find it is SIGPIPE (broken pipe): this is because find (or, well, its subprocess running grep) was still trying to write to the pipe feeding head's standard input after the fifth match, but head's process had already exited, leaving the pipe without a reader.
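You don't have to leave the terminal to look up the signal number. As a small sketch, the shell's kill builtin can translate a number into a signal name; the mapping shown assumes Linux, where signal 13 is SIGPIPE:

```shell
# Ask the shell for the name of signal number 13 (the SIG prefix is omitted).
signal_name=$(kill -l 13)
echo "$signal_name"
```

On Linux this prints PIPE.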
According to this Stack Overflow answer, the solution is to pipe our command into tail -n +1 first (remember, tail did not give us any problems before!), which passes every line through starting from the first one, and then pipe that into head. If we do that, we get
find ~ -type f -name "*.ext" -exec grep -i "your match" {} \; | tail -n +1 | head -n 5
Go ahead, try it! You'll see that it produces the expected results.
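For a self-contained way to try it, here is a sketch that creates ten hypothetical files in a temporary directory and runs the pipeline written with tail -n +1 and head -n 5. It only checks that exactly five lines come out; whether the signal messages disappear may depend on your versions of find and grep:

```shell
# Hypothetical sandbox: ten files, each containing one matching line.
demo=$(mktemp -d)
cd "$demo" || exit 1
for i in 1 2 3 4 5 6 7 8 9 10; do
    printf 'your match, file %s\n' "$i" > "file$i.ext"
done

# tail -n +1 forwards every line starting from line 1; head keeps five.
first_five=$(find . -type f -name "*.ext" -exec grep -i "your match" {} \; \
    | tail -n +1 | head -n 5)

# Count the lines that made it through (tr strips padding some wc versions add).
count=$(printf '%s\n' "$first_five" | wc -l | tr -d ' ')
echo "$count"
```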
Conclusions
Hopefully you've learned something new from this post: I sure learned a lot by writing it! To close this article I'd like to point out that this is not very efficient if you're looking among a large number of files, as find launches a separate grep for each and every one of them, one after the other. There are specialized programs (of which I do not know the name, though) that perform more intelligent file indexing (on Windows and macOS this is done automatically) that are a much better fit for this, but it's nice knowing you can do this with just a plain POSIX shell!
To make things more spicy, you could try filtering the files by age using the -mtime option of find, or ordering them alphabetically by piping find's output into sort: the possibilities are endless!
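As one possible starting point, this sketch combines both ideas in a sandbox with invented file names. One file is backdated with touch -t so the age filter has something to exclude; -mtime -7 keeps files modified within roughly the last 7 days:

```shell
# Hypothetical sandbox: three files that all contain the phrase.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'your match\n' > b.ext
printf 'your match\n' > a.ext
printf 'your match\n' > old.ext
# Backdate one file to the year 2000 so it falls outside the age filter.
touch -t 200001010000 old.ext

# -mtime -7 keeps recently modified files; sort orders the names.
recent=$(find . -type f -name "*.ext" -mtime -7 \
    -exec grep -li "your match" {} \; | sort)
echo "$recent"
```

This lists ./a.ext and ./b.ext in alphabetical order and skips old.ext.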