The Linux find command is great at searching for files and directories. You can also pass its search results to other programs for further processing. We show you how.
The Linux find Command
The Linux find command is powerful and flexible. It can search for files and directories using a whole range of criteria, not just filenames. It can, for example, look for empty files, executable files, or files owned by a particular user. It can find and list files by their accessed or modified times, it supports regex patterns, it is recursive by default, and it works with pseudo-files such as named pipes (FIFO buffers).
All of this is very useful. The humble find command packs a lot of power. But there is a way to leverage that power and take things to another level. If we can automatically use the output of the find command as the input of other commands, we can make something happen to the files and directories that find uncovers for us.
The principle of piping the output of one command into the input of another is a core characteristic of Unix-derived operating systems. The “Unix philosophy” is usually summed up as making a program do one thing well and expecting its output to become the input of another program, even one that has not been written yet. Yet some core utilities, such as mkdir, do not accept piped input.
To work around this limitation, the xargs command can split up piped input and feed it into other commands as though it were command-line parameters. This achieves almost the same thing as straightforward piping. That’s “almost the same,” not “exactly the same,” because shell expansion and filename globbing can produce unexpected differences.
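As a small illustration of that point, here is a minimal sketch run in a throwaway directory. The directory names "one", "two", and "three" are made up for the example.

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# mkdir does not read names from standard input, so
#   echo "one two three" | mkdir
# fails. xargs turns the piped words into command-line arguments instead.
echo "one two three" | xargs mkdir

# Show the three directories that were created.
ls
```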
Using find With xargs
We can use find with xargs to perform an action on the files that are found. This is a long-winded way to go about it, but we could pipe the filenames found by find into xargs, which then passes them to tar as arguments to build an archive file containing those files. We’ll run this command in a directory containing many help system PAGE files.
The command is made up of several components.
find ./ -name "*.page" -type f -print0: The find action starts in the current directory, searching by name for files that match the “*.page” search string. Directories will not be listed because we specifically tell it to look for files only, with -type f. The -print0 option tells find to end each filename with a null character instead of treating whitespace as the end of a filename. This means that filenames containing spaces will be processed correctly.
xargs -0: The -0 option tells xargs to split its input on null characters, so it does not treat whitespace as the end of a filename either.
tar -cvzf page_files.tar.gz: This is the command that xargs passes the file list from find to. The tar tool will create an archive file named “page_files.tar.gz.”
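Put together, the components above form a single pipeline. The sketch below runs it in a scratch directory; the sample filenames are invented for illustration, and one of them contains a space to exercise the null-terminated handling.

```shell
# Scratch directory with a few made-up sample files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch "index.page" "getting started.page" "notes.txt"

# find emits null-terminated names; xargs appends them to the tar command line.
find ./ -name "*.page" -type f -print0 | xargs -0 tar -cvzf page_files.tar.gz

# List the archive contents to confirm that only the PAGE files went in.
tar -tzf page_files.tar.gz
```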
We can use ls to see the archive file that was created for us.
The archive file has been created. For this to work, all of the filenames must be passed to tar at the same time, which is what happened. All of the filenames were appended to the end of the tar command as one very long command line.
You can choose to have the final command run on all of the filenames at once, or once per filename. We can see the difference quite easily by having xargs pass the filenames to wc, the line, word, and character counting tool.
This command sends all filenames to wc at the same time. Xargs creates a lengthy command line for wc that includes each of the filenames.
Each file’s lines, words, and characters are printed, along with a total for all files.
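A minimal sketch of the all-at-once form, assuming a scratch directory with two made-up PAGE files:

```shell
# Scratch directory with two small sample files (names are illustrative).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one two\n' > a.page
printf 'three four five\nsix\n' > "b c.page"

# A single wc invocation receives every filename, so it also prints a total line.
find . -name "*.page" -type f -print0 | xargs -0 wc
```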
If we use xargs’ -I (replace string) option and define a replacement string token, in this case “{}”, the token is replaced in the final command by each filename in turn. This means that wc is called repeatedly, once for each file.
The output is not well aligned. Because wc runs on a single file, it has nothing with which to align the output. Each output line is a separate line of text.
We don’t obtain summary statistics since wc can only produce a total when it acts on numerous files at once.
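The once-per-file form can be sketched the same way. The sample files are again made up; "{}" is the replacement token named with -I.

```shell
# Scratch directory with two small sample files (names are illustrative).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one two\n' > a.page
printf 'three four five\nsix\n' > "b c.page"

# -I makes xargs run wc once per filename, substituting the name for {}.
# Each wc sees a single file, so no total line is printed.
find . -name "*.page" -type f -print0 | xargs -0 -I "{}" wc "{}"
```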
The find -exec Option
The find command has a built-in method of calling external programs to perform further processing on the filenames that it returns. The -exec (execute) option has a syntax similar to, but different from, that of the xargs command.
This command counts the words in the matching files. These are its components.
find . : Start the search in the current directory. Because the find command is recursive by default, subdirectories will be searched too.
-name "*.page": We’re looking for files with names that match the “*.page” search string.
-type f: We’re just interested in files, not directories.
-exec wc: The wc command will be executed on the filenames that match the search string.
-w: Any options you want to send to the command must come immediately after the command.
“{}”: Each filename is represented by the “{}” placeholder string, which goes at the end of the argument list.
\;: A semicolon “;” marks the end of the argument list. It must be escaped with a backslash, as “\;”, or the shell will interpret it.
When we run the command, we see the output of wc. The -w (word count) option restricts the output to the number of words in each file.
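Assembled from those components, the command looks like the sketch below. The scratch directory and sample files are assumptions made for the example.

```shell
# Scratch directory with two small sample files (names are illustrative).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one two\n' > a.page
printf 'three four five\nsix\n' > "b c.page"

# The escaped semicolon ends the -exec argument list.
# wc -w runs once per matching file.
find . -name "*.page" -type f -exec wc -w "{}" \;
```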
As may be seen, there is no total. The wc command is run once for each filename. We may adjust -exec’s behavior to work on all files at once by replacing the terminating semicolon “;” with a plus sign “+.”
We get a summary total and neatly tabulated results, which tells us that all of the files were passed to wc as one long command line.
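The batched variant differs only in its terminator. A minimal sketch, again with made-up sample files:

```shell
# Scratch directory with two small sample files (names are illustrative).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one two\n' > a.page
printf 'three four five\nsix\n' > "b c.page"

# Ending -exec with + makes find append as many filenames as it can to one
# wc command line, so wc prints a total for the batch.
find . -name "*.page" -type f -exec wc -w {} +
```

The plus sign needs no backslash because it has no special meaning to the shell at that position.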
exec Really Means exec
The -exec (execute) option does not run the command in your current shell. find launches the command directly, using the exec family of system calls in a child process, so the command that is launched does not run in a shell at all. Without a shell, you can’t use wildcard shell expansion, and you can’t use aliases or shell functions.
This machine has a shell function named words-only defined. This just counts the words in a file.
Although “words-only” is much longer to type than “wc -w,” it does mean you don’t need to know the wc command-line options. We can check that it works by doing the following:
That works perfectly with a standard command-line call. If we use find’s -exec option to call that function, it will fail.
The find command can’t find the shell function, and the -exec action fails.
To get around this, we can have find launch a Bash shell and pass the rest of the command line to it as arguments to the shell. The command line must be wrapped in double quotation marks. This means we need to escape the double quote marks around the “{}” replace string.
Before we can run the find command, we must export our shell function with the -f (as a function) option of export:
This now works as expected.
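The whole workaround can be sketched as follows. It needs Bash, because exported functions are a Bash feature. The function body is a guess at what words-only does, based on the description above, and the sample file is made up.

```shell
#!/usr/bin/env bash
# Scratch directory with one sample file (name is illustrative).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one two three\n' > a.page

# Assumed definition of the function: count the words in one file.
words-only() {
    wc -w "$1"
}

# Export the function so that child Bash processes inherit it.
export -f words-only

# find starts a new Bash for each match; that shell knows the exported function.
find . -name "*.page" -type f -exec bash -c "words-only \"{}\"" \;
```

Embedding “{}” inside the quoted command string mirrors the approach described above. A variant that passes the filename as a positional parameter, such as bash -c 'words-only "$1"' _ {} \;, is generally more robust with unusual filenames.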
Using the Filename More Than Once
You can chain several -exec actions together, and you can use the “{}” replace string in each command.
Because find searches recursively, if we cd up a level out of the “pages” directory and run that command, find will still locate the PAGE files. As before, the filename and its path are passed to our words-only function. To demonstrate -exec with two commands, we also call the basename command to show the file’s name without its path.
Both the basename command and the words-only shell function receive the filenames through a “{}” replace string.
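A minimal sketch of the two-command form, run from the parent of a “pages” directory. It needs Bash; the function body and the sample file are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Scratch tree: a "pages" subdirectory holding one sample file.
workdir=$(mktemp -d)
mkdir "$workdir/pages"
printf 'one two three\n' > "$workdir/pages/a.page"
cd "$workdir" || exit 1

# Assumed definition of the function: count the words in one file.
words-only() { wc -w "$1"; }
export -f words-only

# For each match, the first -exec prints the bare filename and the second
# -exec counts its words. Each action has its own {} and its own \; terminator.
find . -name "*.page" -type f \
    -exec basename "{}" \; \
    -exec bash -c "words-only \"{}\"" \;
```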
Horses for Courses
There is a CPU load and time cost for calling a command several times when you could call it once and send all the filenames to it at once. And if you have to start the command in a new shell each time, the overhead increases.
However, depending on what you’re trying to achieve, you may not have another option. Whichever method your situation requires, it should surprise no one that Linux offers enough options for you to find one that suits your particular needs.