Search files by content on Linux

The problem

Have you ever found yourself looking for a document on your PC and not being able to find it because you only remember its contents and not its name? If so, then this post may be for you!

First attempt: find + cat + grep

If you're not terribly fond of the command line, you might try something along the lines of this:

find <dir> -type f -name "*.ext" | cat | grep -ie "your match"

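Unfortunately, this doesn't work. The pipe hands cat a list of file names on its standard input, not the files themselves, so cat simply echoes those names and grep ends up searching the names rather than the contents. Looks like we need another option.

Note: The -i option to grep just makes our match case-insensitive so as to maximize our chances of finding something.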
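To see what the first attempt actually does, here is a minimal sketch in a throwaway sandbox. The directory and file names are made up for the demo; only the pipeline itself comes from the post. One file contains the text we're after, the other merely has it in its name.

```shell
# Hypothetical sandbox: one file matches by content, the other by name only.
demo_dir=$(mktemp -d)
printf 'your match is in here\n' > "$demo_dir/notes.ext"
printf 'nothing relevant\n' > "$demo_dir/your match.ext"

# find prints file names, cat copies them through unchanged,
# so grep filters the list of names instead of the file contents.
first_attempt=$(find "$demo_dir" -type f -name "*.ext" | cat | grep -ie "your match")
printf '%s\n' "$first_attempt"

# Clean up the sandbox.
rm -rf "$demo_dir"
```

Only the file with the matching name is listed, while notes.ext, the one we actually wanted, is silently skipped.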

Second attempt: find with -exec

Digging into the manual pages for find yields this:

[...]
-exec command ;
              Execute command; true if 0 status is returned. [...]

This will allow us to execute a certain command* for each file found with the given extension. So, our command now becomes something like this:

find ~ -type f -name "*.ext" -exec grep {} -ie "your match" \;

The curly braces are replaced by find with the current filename, while the semicolon terminates the argument to -exec (we need to escape it with a backslash so the shell passes it on to find instead of treating it as a command separator; quoting it as ";" would've worked too).

This works! It will print the matching lines of each file to the screen, but we're not done yet...

*: You can use the -exec flag multiple times to execute multiple commands!
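As a rough sketch of the "true if 0 status is returned" part and of chaining several -exec flags, we can let a first, silent grep act as a filter and only then run a second command. The sandbox and file names below are invented for the demo, and the options are placed before the {} operand, which is the more portable ordering.

```shell
# Hypothetical sandbox with one matching and one non-matching file.
demo_dir=$(mktemp -d)
printf 'Your Match lives here\n' > "$demo_dir/a.ext"
printf 'unrelated text\n' > "$demo_dir/b.ext"

# First -exec: grep -q prints nothing and exits 0 only when it finds a match,
# so it works as a test. Second -exec: runs only for files that passed it.
exec_chain=$(find "$demo_dir" -type f -name "*.ext" \
    -exec grep -q -i -e "your match" {} \; \
    -exec basename {} \;)
printf '%s\n' "$exec_chain"

# Clean up the sandbox.
rm -rf "$demo_dir"
```

Here basename is just a stand-in for whatever second command you'd like to run on each matching file.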

Getting the filename

We are now able to find documents in our system by their content, but we're not getting any new information out of our command: after all, we already knew part of the file content anyway! What we need to do next is hack our way back from the file's content to its name.

The -l option of `grep`

Fortunately, man comes to the rescue again:

[...]
-l, --files-with-matches
              Suppress normal output; instead print the name of 
              each input file from which output would
              normally have been printed. [...]

So our command has now become:

find <dir> -type f -name "*.ext" -exec grep {} -lie "your match" \;

If we try it in our command line, we should get something like this out of it:

/path/to/file1.ext
/path/to/file2.ext
[...]
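If you'd rather try this without touching your real files, the following sketch builds a small sandbox first. All file names are hypothetical, and the output is piped through sort only to make the order predictable.

```shell
# Hypothetical sandbox: two matching files, one miss, one wrong extension.
demo_dir=$(mktemp -d)
printf 'first line\nYOUR MATCH here\n' > "$demo_dir/file1.ext"
printf 'your match again\n' > "$demo_dir/file2.ext"
printf 'no luck\n' > "$demo_dir/file3.ext"
printf 'your match, wrong extension\n' > "$demo_dir/file4.txt"

# -l prints each matching file's name once instead of the matching lines.
matching_files=$(find "$demo_dir" -type f -name "*.ext" \
    -exec grep -l -i -e "your match" {} \; | sort)
printf '%s\n' "$matching_files"

# Clean up the sandbox.
rm -rf "$demo_dir"
```

Only file1.ext and file2.ext are listed: file3.ext has no match and file4.txt is filtered out by -name before grep ever sees it.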

Restricting the output size

So, now we can find files by their content, but what if we want to limit how many lines of output are printed? We can use the beauty of the POSIX shell and pipe our previous command into the useful head and tail utilities.

find <dir> -type f -name "*.ext" -exec grep {} -lie "your match" \; | tail -5

This only prints the last 5 matches. Things get interesting if we want to fetch the first 5 matches instead:

find <dir> -type f -name "*.ext" -exec grep {} -lie "your match" \; | head -5

If you tried running the command above, you'd get your filtered output, but you'd also get a bunch of errors that look like find: grep: interrupted by signal 13. Looking up signal number 13, we find that it is SIGPIPE (broken pipe): find (or, well, its subprocess running grep) was still trying to write to the pipe feeding head's standard input after the fifth match, but head had already exited, leaving nobody at the reading end of the pipe.
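You don't have to search the web to translate the signal number: the shell's own kill command can do it. A tiny sketch (the variable name is just for the demo):

```shell
# kill -l with a number prints the matching signal name, without the SIG prefix.
signal_name=$(kill -l 13)
printf 'signal 13 is SIG%s\n' "$signal_name"
```

On Linux this reports PIPE, i.e. SIGPIPE.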

According to this Stack Overflow answer, the solution is to pipe our command into tail -n +1 first (remember, tail did not give us any problems before!) and then pipe that into head. If we do that, we get:

find <dir> -type f -name "*.ext" -exec grep {} -lie "your match" \; | tail -n +1 | head -5

Go ahead, try it! You'll see that it produces the expected results.
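Here is a sandboxed version of that pipeline, as a sketch with made-up file names: eight files match, and we count how many names make it past head.

```shell
# Hypothetical sandbox with eight matching files.
demo_dir=$(mktemp -d)
for n in 1 2 3 4 5 6 7 8; do
    printf 'your match number %s\n' "$n" > "$demo_dir/file$n.ext"
done

# tail -n +1 passes every line through from the first one; head keeps five.
match_count=$(find "$demo_dir" -type f -name "*.ext" \
    -exec grep -l -i -e "your match" {} \; | tail -n +1 | head -5 | wc -l | tr -d ' ')
printf 'names printed: %s\n' "$match_count"

# Clean up the sandbox.
rm -rf "$demo_dir"
```

Which five of the eight names you get depends on the order in which find walks the directory, so the sketch only checks the count.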

Conclusions

Hopefully you've learned something new from this post: I sure learned a lot by writing it! To close this article, I'd like to point out that this is not very efficient if you're searching through a large number of files, as find visits each of them sequentially and starts a new grep process for every one. There are specialized programs (of which I do not know the name, though) that perform more intelligent file indexing (on Windows and macOS this is done automatically) and are a much better fit for this, but it's nice knowing you can do this with just a plain POSIX shell!
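One small thing that may help on the efficiency side, without leaving find: ending -exec with + instead of \; makes find hand many file names to a single grep invocation, rather than starting one grep per file. This is a variation on the commands above, not something the rest of the post relies on, and the sandbox below is hypothetical.

```shell
# Hypothetical sandbox: two matching files and one miss.
demo_dir=$(mktemp -d)
printf 'your match\n' > "$demo_dir/one.ext"
printf 'Your Match\n' > "$demo_dir/two.ext"
printf 'something else\n' > "$demo_dir/three.ext"

# With "{} +", find collects the file names and runs grep on them in batches.
# The {} must come right before the +.
batched=$(find "$demo_dir" -type f -name "*.ext" \
    -exec grep -l -i -e "your match" {} + | sort)
printf '%s\n' "$batched"

# Clean up the sandbox.
rm -rf "$demo_dir"
```

The output is the same list of file names as before; only the number of grep processes changes.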

To spice things up, you could try filtering the files by age using the -mtime option of find, or ordering them alphabetically by piping find's output into sort: the possibilities are endless!
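As a sketch of both ideas together, the command below only looks at files modified within the last 7 days and sorts the resulting names. The 7-day window and the file names are arbitrary choices for the demo; one file is backdated with touch -t so the age filter has something to exclude.

```shell
# Hypothetical sandbox: two fresh files and one backdated to the year 2000.
demo_dir=$(mktemp -d)
printf 'your match\n' > "$demo_dir/b.ext"
printf 'your match\n' > "$demo_dir/a.ext"
printf 'your match\n' > "$demo_dir/old.ext"
touch -t 200001010000 "$demo_dir/old.ext"

# -mtime -7 keeps files modified less than 7 days ago; sort orders the names.
recent_sorted=$(find "$demo_dir" -type f -name "*.ext" -mtime -7 \
    -exec grep -l -i -e "your match" {} \; | sort)
printf '%s\n' "$recent_sorted"

# Clean up the sandbox.
rm -rf "$demo_dir"
```

Even though old.ext contains the text, it is dropped by -mtime before grep runs, and a.ext is listed before b.ext thanks to sort.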
