Introduction to widely used commands and filters
This page focuses on a small set of commands, filters, and utilities.
Short list of commands/programs
find and xargs
Refer to the man pages and to the GNU Find Utilities Manual for details.
- Simplified synopsis: find [options for handling symlinked files] [starting-point...] [expression]
- The expression that find uses to select files consists of one or more primaries, each of which is a separate command-line argument to find. find evaluates the expression each time it processes a file.
- find [-H] [-L] [-P] [-D debugopts] [-Olevel] [starting-point...] [expression]
- find prints to stdout the names of files that match the criteria. It searches the directory tree rooted at 'starting-point' by evaluating the expression (left to right, using precedence rules) until the outcome is known.
- The part of the command line after the list of starting points is the expression. This is a kind of query specification describing how we match files and what we do with the files that were matched. An expression is composed of a sequence of things:
- Tests : Tests return a true or false value, usually on the basis of some property of a file we are considering. The -empty test for example is true only when the current file is empty.
- Actions: Actions have side effects (such as printing something on the standard output) and return either true or false, usually based on whether or not they are successful. The -print action for example prints the name of the current file on the standard output.
- Global options : Global options affect the operation of tests and actions specified on any part of the command line. Global options always return true. The -depth option for example makes find traverse the file system in a depth-first order.
- Positional options: Positional options affect only tests or actions which follow them. Positional options always return true. The -regextype option for example is positional, specifying the regular expression dialect for regular expressions occurring later on the command line.
- Operators: Operators join together the other items within the expression. They include for example -o (meaning logical OR) and -a (meaning logical AND). Where an operator is missing, -a is assumed.
- Example expressions
- The first set are options that affect how other expressions behave: -daystart, -maxdepth
- A set that add a test done on the file being considered: -executable, -gid, -newermt, -name
- And then a set that are actions: -print, -ls, -printf, -delete
- Numeric arguments for tests in expressions: +n, -n, n # greater than, less than, or exactly n
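- The pieces above (tests, operators, implicit -a) can be exercised on a throwaway directory. This is a minimal sketch; all file and directory names are invented for the demo.

```shell
# Build a small scratch tree (names invented for this demo).
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf 'hello\n' > "$demo/notes.txt"
printf 'data\n'  > "$demo/sub/values.dat"
: > "$demo/empty.txt"    # a zero-byte file

# Two tests joined by the implicit -a (AND): name must match AND file is empty.
find "$demo" -name '*.txt' -empty

# -o (OR) needs parentheses, escaped so the shell passes them to find.
find "$demo" \( -name '*.dat' -o -empty \)
```

Because -a binds tighter than -o, the escaped parentheses matter once expressions get longer.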
- xargs [option ...] [command [initial-arguments]]. xargs is another way to call commands or programs; the name is usually read as 'extended arguments', since it builds argument lists.
- It builds and executes command lines by gathering together arguments from standard input (which, as we will see, are often supplied by find)
- Explain the difference :
- find . -name '*.[dt][ax][t]'
- find . -name '*.[dt][ax][t]' | xargs
- find . -name '*.[dt][ax][t]' | xargs grep ser | wc
- find . -name '*.[dt][ax][t]' | grep ser | wc
- Answer: ?
- First lists all .dat and .txt files, one per line, a total of 18 lines
- Second: the same list as the first (xargs with no command runs echo), but all 18 file names are passed to xargs as 18 command-line args
- Third: lists the 38 occurrences of the string 'ser' in the .txt and .dat files
- Fourth: lists the three occurrences of the string 'ser' in the list of 18 file names
- ./PB1/myDirB/userdata.dat
./PB1/myDirA/userdata.dat
./PB1/myDirC/userdata.dat
- Summary: when xargs is used to invoke a command, it assembles the data from standard input and, rather than executing the command separately for each item, executes the command once, passing the list to the command as arguments. This is much more efficient !!
- Explain:
- find . -name *.dat -exec ls -l {} \;
-rw-rw-r-- 1 jjm jjm 660 Feb 10 01:22 ./PB1/myDirB/userdata.dat
-rw-rw-r-- 1 jjm jjm 660 Feb 10 01:22 ./PB1/myDirA/userdata.dat
-rw-rw-r-- 1 jjm jjm 660 Feb 10 01:22 ./PB1/myDirC/userdata.dat
jjm@jjm-VirtualBox:~/courses/cpsc424/Spring-2017/ex3$ find . -name *.dat -print | xargs ls -l
-rw-rw-r-- 1 jjm jjm 660 Feb 10 01:22 ./PB1/myDirA/userdata.dat
-rw-rw-r-- 1 jjm jjm 660 Feb 10 01:22 ./PB1/myDirB/userdata.dat
-rw-rw-r-- 1 jjm jjm 660 Feb 10 01:22 ./PB1/myDirC/userdata.dat
- Answer: Same output, but in the first case the -exec is done for each file in the list, one at a time (so three ls commands). In the second, ls is invoked once and is passed the list of files of interest as parameters.
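- The difference in invocation counts can be made visible with echo standing in for ls. This is a sketch on invented file names, not the transcript above.

```shell
# Three scratch .dat files (names invented for this demo).
d=$(mktemp -d)
touch "$d/a.dat" "$d/b.dat" "$d/c.dat"

# -exec ... \; runs echo once per file, so three output lines.
per_file=$(find "$d" -name '*.dat' -exec echo {} \; | wc -l)

# xargs hands all the names to a single echo, so one output line.
batched=$(find "$d" -name '*.dat' | xargs echo | wc -l)

echo "per-file runs: $per_file, batched runs: $batched"
```

Note that GNU find's `-exec ... +` terminator batches arguments the same way xargs does.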
wget:
wget is a very useful tool that effectively provides a command-line-driven browser. Syntax: wget [options]... [URL]...
- To download a specific web object (a URL):
- wget https://people.cs.clemson.edu/~jmarty/courses/Spring-2017/CPSC424/code/UDPEcho2.tar.gz
- Or....place a set of object names or URLs in file and 'wget -i file'
- To download all files in a directory
- wget -r --no-parent https://people.cs.clemson.edu/~jmarty/courses/Spring-2017/CPSC424/code/
- the -r is recursive, the --no-parent says don't download from the parent dir
- To see if a get will work
- wget --spider DOWNLOAD-URL
- To get a full website (including all links so be careful.....!!)
- wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL
- To reduce types of files downloaded, add to the above '--reject=gif' (to reject gif files)
- To restrict any wget command to a max bytes download (this limits to 5MBytes)
- wget -Q5m -i FILE-WHICH-HAS-URLS
- To specify what type of file:
- wget -r -A.pdf http://www.mycompany.com/pagewithPDFs
- To download a URL with a password:
- wget --ftp-user=USERNAME --ftp-password=PASSWORD DOWNLOAD-URL
- Advanced....have not explored these....
- To download an ISO and to compute the sha1 checksum (slow...as it's sequential)
- wget http://example.com/some.iso && sha1sum some.iso
- This interleaves the download and sha1 computation using process substitution (the '>(cmd)' form)
- wget -O - http://example.com/dvd.iso | tee >(sha1sum > dvd.sha1) > dvd.iso
- to download an entire Web site, perhaps for off-line viewing, something along the following will work:
- $ wget --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains website.org --no-parent www.website.org/tutorials/html/
- This command downloads the Web site www.website.org/tutorials/html/.
- The options are:
- --recursive: download the entire Web site.
- --domains website.org: don't follow links outside website.org.
- --no-parent: don't follow links outside the directory tutorials/html/.
- --page-requisites: get all the elements that compose the page (images, CSS and so on).
- --html-extension: save files with the .html extension.
- --convert-links: convert links so that they work locally, off-line.
- --restrict-file-names=windows: modify filenames so that they will work in Windows as well.
- --no-clobber: don't overwrite any existing files (used in case the download is interrupted and
resumed).
grep:
grep is invoked as grep [options] pattern [inputFiles]. Refer to the GNU Grep Manual for the most complete presentation of grep.
- Options
- -e pattern or --regexp=pattern # pattern match based on this regex
- -i : ignore case when doing a match
- -v invert the sense of the match (i.e., show the lines that don't match)
- -c : return the count of matches
- -H : print the filename with each match (default when there is >1 file to search)
- -l : return only file names with matches
- -r or --recursive : recursive; the file input must be a directory
- --exclude-from=file or --exclude-dir=glob
- Four variants, based primarily on how matching is done (note: egrep and fgrep should no longer be used)
- -G : basic regexp .... this is the default
- -E : extended regexp ....
- -F : interprets the pattern as a fixed string
- -P - regexp based on Perl
- Examples
- grep -rH --include='*.c' 'main' ~/devel # recursive search of all 'c' files starting at ~/devel for the string 'main'
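- A quick sketch of the options listed above, run against a throwaway file (its contents are invented for the demo):

```shell
f=$(mktemp)
printf 'User one\nuser two\nadmin\n' > "$f"

grep -c  'user' "$f"   # count matching lines (case sensitive)
grep -ci 'user' "$f"   # ignore case, so both 'User' and 'user' count
grep -v  'user' "$f"   # invert: print lines that do NOT contain 'user'
grep -l  'admin' "$f"  # print just the file name, since there is a match
```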
diff:
- diff set.sorted printenv.sorted | grep "<" | awk '{ print $2 }'
- diff -y ./file1v1.c ./newfiles/file1v1.c | more # extremely useful !!!!!!!
filters
any command that reads STDIN and writes STDOUT can be used as a filter (i.e., a component of a pipeline)
- cut : separate lines into fields
- od - displays data in many formats
- wc : counts words, lines, characters
- tr : translate characters in a stream
- tee: copy input to two places
- sed: stream editor for filtering and transforming text (online GNU manual)
- Usage: /pattern/action
- Operates on a file: sed [option] ... {script} [inputfile] ...
- sed -i '1d' myFile.txt
- The -i says operate on file inline, removes first line of myFile.txt
- Operates in a pipeline: .... | sed pattern/action | ....
- Performs pattern substitution and deletions using regex
- can add the following flags after the substitution
- s/old/new/gi
- if there are no flags (no gi), substitutes only the first occurrence on each line
- g replaces all matches
- gi replaces all matches, case-insensitively
- a number N replaces only the Nth match; Ng (a GNU extension) replaces the Nth match and all that follow
- Can specify an address- meaning specify a range of lines to act upon
- cat myfile.txt | sed '1,5d' | more # removes the first 5 lines and pipes to more
- cat myfile.txt | sed '1d' > newFile.txt # deletes the first line
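- A reproducible sketch of line addressing, using a scratch file of five known lines (contents invented for the demo):

```shell
f=$(mktemp)
printf 'line1\nline2\nline3\nline4\nline5\n' > "$f"

sed '1,3d' "$f"       # delete lines 1-3; line4 and line5 remain
sed -n '2,4p' "$f"    # -n suppresses default output; print only lines 2-4
remaining=$(sed '1,3d' "$f" | wc -l)
echo "$remaining"     # 2
```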
- regex highlights
- /^ matches beginning of lines
- /$ matches end of lines
- /. matches any single character
- /* matches 0 or more occurrences of the previous character
- /[chars] matches any one of the characters; can use c1-c2 to indicate a range of chars
- Character sets
- [a-z] matches a single lowercase letter
- [A-Z]
- [a-zA-Z] matches a single letter
- [0-9]
- [[:alnum:]] matches alphanumerics
- [[:blank:]] matches spaces or tabs
- [[:cntrl:]] matches control chars
- [[:digit:]] matches digits 0-9
- Examples:
- /^$/ Matches blank lines
- / */ Matches one or more spaces
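- The character classes and anchors can be checked with grep -c on a scratch file (sample lines invented for the demo):

```shell
f=$(mktemp)
printf 'abc\n123\n\nA1 b2\n' > "$f"

grep -c '[[:digit:]]' "$f"   # lines containing a digit: '123' and 'A1 b2'
grep -c '^$' "$f"            # blank lines: the empty third line
grep -c '[[:alnum:]]' "$f"   # lines with any alphanumeric character
```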
- The following bash script fragment modifies lines in the file numberCMs.tcl that begin with "set numUSFTPs <number>" so they become "set numUSFTPs $nextValue". There can be any number of spaces between the name and the digits, and the number being replaced might have more than one digit (e.g., "set numUSFTPs 223").
- number=8
- nextValue=$(( $number*2 ))
- echo "USFTPTCP: $nextValue "
- sed -i "s/^set numUSFTPs[[:blank:]]*[0-9]*/set numUSFTPs $nextValue/g" numberCMs.tcl
- editing commands:
- a\ : append text below current line
- c\: change text in current line
- d : delete text
- cat myfile.txt | sed '/^Time/d'
- deletes all lines starting with 'Time'
- cat myfile.txt | sed '/sh$/d'
- deletes all lines ending with 'sh'
- i\ : insert above current line
- p : print line
- r : read file
- s : search and replace
- w : write file
- Examples related to substitution/replacement:
- sed -n '/^This.*errors.$/p' file.txt
- # with -n (suppress default output), prints only lines that begin with 'This' and end in 'errors' followed by one more character (the unescaped '.' matches any character)
- cat myfile.txt | sed 's/abc//g'
- removes abc , no substitution
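- The substitution flags are easy to compare side by side on one invented input line (GNU sed; the i flag is a GNU extension):

```shell
line='abc ABC abc'
echo "$line" | sed 's/abc/xyz/'     # no flag: first match only
echo "$line" | sed 's/abc/xyz/g'    # g: every case-sensitive match
echo "$line" | sed 's/abc/xyz/gi'   # gi: every match, ignoring case
```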
- awk: pattern matching and text processing language (online GNU manual)
- Example: cat /proc/loadavg | awk '{ print $1 }'
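- awk splits each input line into fields $1, $2, ... on whitespace by default. A sketch mimicking the /proc/loadavg format (the numbers are invented):

```shell
sample='0.42 0.35 0.28 2/117 4032'
echo "$sample" | awk '{ print $1 }'               # first field: 0.42
avg=$(echo "$sample" | awk '{ print ($1 + $2 + $3) / 3 }')
echo "$avg"                                       # arithmetic on fields: 0.35
```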
Examples :
- The following finds the 5 largest files starting at the current directory - the -exec runs ls -s on each file, and the resulting size/name lines are passed to sort
- find . -type f -exec ls -s {} \; | sort -n -r | head -5
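- A reproducible sketch of the same pipeline on files of known size (names and sizes invented for the demo):

```shell
d=$(mktemp -d)
head -c 102400 /dev/zero > "$d/big"     # 100 KB
head -c 51200  /dev/zero > "$d/mid"     # 50 KB
head -c 1024   /dev/zero > "$d/small"   # 1 KB

# ls -s prints allocated size (in blocks) before each name; sort -n -r
# orders numerically, largest first.
largest=$(find "$d" -type f -exec ls -s {} \; | sort -n -r | head -1)
echo "$largest"
```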
- The following lists (long directory) files sorted by size; tr squeezes all runs of spaces down to one, and cut reduces each line to show just the file size and date
- ls -al --sort=size . | tr -s ' ' | cut -d ' ' -f 5,8
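- The tr -s / cut step can be seen on a single canned ls -l line (the line below is invented to mirror the transcript format):

```shell
line='-rw-rw-r-- 1 jjm jjm    660 Feb 10 01:22 userdata.dat'
echo "$line" | tr -s ' '            # squeeze runs of spaces to one
# After squeezing, field 5 is the size and field 9 is the name.
echo "$line" | tr -s ' ' | cut -d ' ' -f 5,9
```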
- # The following example kills the process listed as 'perfClient' - assume this is a process that is hung and has the PID shown below... and notice that the grep picks up its own process!
- ps aux | grep perfClient
jjm 2726 0.4 0.1 16016 2172 pts/4 S+ 15:23 0:02 ./perfClient netlabserver1.clemson.edu 5000 1000000 1000 1000 1000 0 10000 129
jjm 5669 0.0 0.0 21292 928 pts/19 S+ 15:33 0:00 grep --color=auto perfClient
- Knowing the PID, just issue 'kill -9 2726'
- To combine the steps of learning the PID with the kill in a single command line:
- kill -9 $(ps aux | grep 'perfClient' | awk '{print $2}')
bash: kill: (5950) - No such process
- So what happened?
- Let's decompose the command....and echo the output of the inner command
- echo $(ps aux | grep 'perfClient' | awk '{print $2}')
- 2726 5669 #So this returns the PID of the two processes that match 'perfClient'
- kill is actually OK with a list of PIDs to kill. The 'No such process' is because, by the time kill runs, the subprocess running the inner command (which includes the grep) is no longer active. So kill does terminate the perfClient.
- Fix #1: do a second grep to NOT include matches with grep.
- kill -9 $(ps aux | grep 'perfClient' | grep -v 'grep' | awk '{print $2}')
- Fix #2 : Use a regexp of '[p]erfClient'
- kill -9 $(ps aux | grep '[p]erfClient' | awk '{print $2}')
- When ps runs, the last field in the ps aux output for each process is the program name and parameters... so for the grep process it will show the literal '[p]erfClient'
- When grep actually runs, the parameter is a regexp - the [p] is synonymous with p. So the [] has no effect on the grep operation, but it serves to filter out the ps aux line corresponding to the grep.
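- The [p] trick can be simulated without a live process; the two canned 'ps' outputs below are invented to mirror the transcript:

```shell
plain='jjm 2726 ./perfClient 5000
jjm 5669 grep perfClient'
bracket='jjm 2726 ./perfClient 5000
jjm 5669 grep [p]erfClient'

# The plain pattern matches the grep's own ps line too:
echo "$plain" | grep -c 'perfClient'      # 2
# The regex [p]erfClient matches 'perfClient' but not the literal
# string '[p]erfClient' on the grep's own line:
echo "$bracket" | grep -c '[p]erfClient'  # 1
```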
- # the $() substitutes the result from the pipeline - it finds the PID of the program perfClient
Broader list of commands (built in to Bash, commands, filters, or applications) to be aware of:
- watch - runs the command periodically (2 seconds by default)
- watch 'cat /proc/loadavg'
- update-alternatives - maintains the set of symbolic links that the installation established
- Example - to change the default editor (nano by default):
- sudo update-alternatives --config editor (gives a menu)
List of misc. utilities or applications that might be worth looking into - most need to be installed with apt-get.
Network configuration or status
- in a shell, 'echo $HOSTNAME' to see the configured hostname of the computer
- ifconfig : user interface for configuring network interfaces
- tcpdump and wireshark : programs that can capture packets that
flow over a particular interface (must be root or sysadmin)
- bandwidth monitors:
- netstat: user interface for seeing the current state of the network stack (connections, interface statistics, routing)
- route: similar to netstat, but a bit more focused on accessing the IP forwarding table
- iptables : user interface to Linux's firewall rules (must be root)
- arp : user interface to access the ARP cache (try 'arp -a')
- dig or host : programs to interact with the DNS system (try 'host -v www.ibm.com')
- sysctl net.ipv4 | more # see TCP/IP settings for the machine (add -N to list just the parameter names)
- To restart networking, several ways...
- sudo bash -x /usr/sbin/service networking restart
- sudo bash -x /etc/init.d/networking restart
- To find the DNS nameserver (NS) a Linux host is using:
- On some systems, issue 'cat /etc/resolv.conf' (Ubuntu 14.04 and beyond no longer sets this file)
- On Ubuntu, issue 'nm-tool' and it should show you the interfaces, IPs, and configured DNS NSs
Last updated: 7/22/2017