Linux - Filters
Introduction
This session covers the most common filters on a Linux system. Commands that are designed to be used with a pipe are often called filters. Filters are small programs that do one specific thing very efficiently, so they can be used as building blocks: combining simple commands and filters in a long pipe lets you design elegant solutions.
cat
When between two pipes, the cat command does nothing (except putting stdin on stdout).
datasoft @ datasoft-linux ~$ tac count | cat | cat | cat |cat |cat
four
three
two
one
datasoft @ datasoft-linux ~$
tee
Writing long pipes in Unix is fun, but sometimes you may want intermediate results. The tee filter puts stdin on stdout and also into a file. So tee is almost the same as cat, except that it has two identical outputs.
datasoft @ datasoft-linux ~$ tac count | tee temp.txt | tac
one
two
three
four
datasoft @ datasoft-linux ~$ cat temp.txt
four
three
two
one
datasoft @ datasoft-linux ~$
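As a minimal sketch of the same idea in script form, the block below saves an intermediate stage of a pipe with tee while the pipe carries on. The scratch directory and the file name unsorted.txt are assumptions made for this illustration.

```shell
# Hypothetical scratch directory; any writable location works.
workdir=$(mktemp -d)

# Generate four lines, keep a copy of the unsorted stream with tee,
# then continue the pipe into sort.
sorted=$(printf 'one\ntwo\nthree\nfour\n' | tee "$workdir/unsorted.txt" | LC_ALL=C sort)

# The pipe result and the tee'd copy hold different stages of the pipeline.
echo "$sorted"
saved=$(cat "$workdir/unsorted.txt")
echo "$saved"

rm -rf "$workdir"
```

The file written by tee holds the data as it was before sort saw it, which is what makes tee handy for debugging long pipes.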
grep
In Linux, the grep command is used as a searching and pattern matching tool. The most common use of grep is to filter lines of text containing (or not containing) a certain string.
datasoft @ datasoft-linux ~$ cat xyz.txt
raju das
ayan roy
riju saha
dustu saha
ajoy das
datasoft @ datasoft-linux ~$ cat xyz.txt | grep saha
riju saha
dustu saha
datasoft @ datasoft-linux ~$
You can write this without the cat.
datasoft @ datasoft-linux ~$ grep saha xyz.txt
riju saha
dustu saha
datasoft @ datasoft-linux ~$ grep das xyz.txt
raju das
ajoy das
datasoft @ datasoft-linux ~$
One of the most useful options of grep is grep -i, which filters in a case-insensitive way. (The sample file is all lowercase, so the output here is the same with and without -i.)
datasoft @ datasoft-linux ~$ grep roy xyz.txt
ayan roy
datasoft @ datasoft-linux ~$ grep -i roy xyz.txt
ayan roy
datasoft @ datasoft-linux ~$
Another very useful option is grep -v which outputs lines not matching the string.
datasoft @ datasoft-linux ~$ grep -v dustu xyz.txt
raju das
ayan roy
riju saha
ajoy das
datasoft @ datasoft-linux ~$
And of course, both options can be combined to output all lines that do not contain a string, ignoring case.
datasoft @ datasoft-linux ~$ grep -vi das xyz.txt
ayan roy
riju saha
dustu saha
datasoft @ datasoft-linux ~$
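Because xyz.txt is all lowercase, the effect of -i is hard to see in the transcripts above. The sketch below uses hypothetical mixed-case data (not from the original file) to show how grep, grep -i and grep -vi differ.

```shell
# Hypothetical sample data with mixed case.
names=$(printf 'Ayan Roy\nayan roy\nRaju Das\n')

# Case-sensitive: matches only the lowercase line.
sensitive=$(printf '%s\n' "$names" | grep roy)

# Case-insensitive: matches both spellings.
insensitive=$(printf '%s\n' "$names" | grep -i roy)

# Inverted and case-insensitive: every line that does not mention roy.
inverted=$(printf '%s\n' "$names" | grep -vi roy)

printf '%s\n---\n%s\n---\n%s\n' "$sensitive" "$insensitive" "$inverted"
```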
With grep -A1 one line after the result is also displayed.
datasoft @ datasoft-linux ~$ grep -A1 raju xyz.txt
raju das
ayan roy
datasoft @ datasoft-linux ~$
With grep -B1 one line before the result is also displayed.
datasoft @ datasoft-linux ~$ grep -B1 riju xyz.txt
ayan roy
riju saha
datasoft @ datasoft-linux ~$
With grep -C1 (context) one line before and one after are also displayed. All three options (-A, -B and -C) can display any number of lines (using e.g. -A2, -B4 or -C20).
datasoft @ datasoft-linux ~$ grep -C1 riju xyz.txt
ayan roy
riju saha
dustu saha
datasoft @ datasoft-linux ~$
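To show the context options with counts other than 1, here is a small sketch on a hypothetical list of the numbers 1 to 10. The pattern '^5$' matches only the line that is exactly 5.

```shell
# Hypothetical input: the numbers 1 through 10, one per line.
numbers=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10)

# Two lines after the match.
after=$(printf '%s\n' "$numbers" | grep -A2 '^5$')

# Three lines before the match.
before=$(printf '%s\n' "$numbers" | grep -B3 '^5$')

# One line of context on each side.
around=$(printf '%s\n' "$numbers" | grep -C1 '^5$')

printf '%s\n---\n%s\n---\n%s\n' "$after" "$before" "$around"
```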
cut
The cut filter is used to cut out selected fields (columns) of each line of a file, depending on a delimiter or a count of bytes. The following code uses "cut" to filter the username and userid in the /etc/passwd file. It uses the colon as a delimiter, and selects fields 1 and 3.
datasoft @ datasoft-linux ~$ cut -d: -f1,3 /etc/passwd | tail -4
colord:113
hplip:114
pulse:115
datasoft:1000
datasoft @ datasoft-linux ~$
When using a space as the delimiter for cut, you have to quote the space.
datasoft @ datasoft-linux ~$ cut -d" " -f1 xyz.txt
raju
ayan
riju
dustu
ajoy
datasoft @ datasoft-linux ~$
This example uses cut to display the second to the seventh character of each line of /etc/passwd.
datasoft @ datasoft-linux ~$ cut -c2-7 /etc/passwd | tail -4
olord:
plip:x
ulse:x
atasof
datasoft @ datasoft-linux ~$
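The output of cut on /etc/passwd differs from machine to machine, so the sketch below repeats both styles of selection (fields and characters) on two made-up passwd-style records. The account names alice and bob are hypothetical.

```shell
# Hypothetical passwd-style records with the usual colon-separated layout.
records=$(printf 'alice:x:1001:1001::/home/alice:/bin/bash\nbob:x:1002:1002::/home/bob:/bin/sh\n')

# Fields 1 and 7: login name and shell.
fields=$(printf '%s\n' "$records" | cut -d: -f1,7)

# Characters 1 through 3 of every line.
chars=$(printf '%s\n' "$records" | cut -c1-3)

printf '%s\n---\n%s\n' "$fields" "$chars"
```

Note that cut keeps the delimiter between the selected fields in its output.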
tr
You can translate characters with tr. The following command translates all occurrences of 'e' to 'E'. (The sample file contains no 'e', so the output here is unchanged.)
datasoft @ datasoft-linux ~$ cat xyz.txt | tr 'e' 'E'
raju das
ayan roy
riju saha
dustu saha
ajoy das
datasoft @ datasoft-linux ~$
Here we set all letters to uppercase by defining two ranges.
datasoft @ datasoft-linux ~$ cat xyz.txt | tr 'a-z' 'A-Z'
RAJU DAS
AYAN ROY
RIJU SAHA
DUSTU SAHA
AJOY DAS
datasoft @ datasoft-linux ~$
Here we translate all newlines to spaces.
datasoft @ datasoft-linux ~$ cat count
one
two
three
four
datasoft @ datasoft-linux ~$ cat count | tr '\n' ' '
one two three four datasoft @ datasoft-linux ~$
The tr -s filter can also be used to squeeze multiple occurrences of a character to one.
datasoft @ datasoft-linux ~$ cat pqr.txt
apple mango orange guava lemon
datasoft @ datasoft-linux ~$ cat pqr.txt | tr -s ' '
apple mango orange guava lemon
datasoft @ datasoft-linux ~$
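Runs of spaces are easy to lose when a transcript is copied around, so here is a small sketch in which the repeated spaces are explicit. The input string is a hypothetical variant of the fruit list above.

```shell
# Hypothetical input with several runs of spaces.
spaced='apple   mango  orange     guava lemon'

# tr -s ' ' squeezes each run of spaces down to a single space.
squeezed=$(printf '%s\n' "$spaced" | tr -s ' ')

echo "$squeezed"
```

Squeezing first is often what makes a later cut -d' ' reliable, because cut treats every single space as a field separator.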
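You can also use tr to 'encrypt' texts with a simple substitution such as rot13 (which would be tr 'a-z' 'n-za-m'). The two commands below use arbitrary replacement sets instead of rot13; because the second set is shorter than the first, GNU tr pads it by repeating its last character, so several letters map to the same output.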
datasoft @ datasoft-linux ~$ cat count | tr 'a-z' 'khkasdkhaskdfhkahskfh'
khs
fhk
fhsss
dkhs
datasoft @ datasoft-linux ~$ cat count | tr 'a-z' 'k-sa-f'
feo
fff
frfoo
pfff
datasoft @ datasoft-linux ~$
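For comparison, a minimal sketch of an actual rot13 with tr follows. The function name rot13 is chosen for this example. Both sets have the same length, so each letter maps to exactly one other letter and applying the filter twice restores the input.

```shell
# rot13: shift every letter 13 places, wrapping around the alphabet.
rot13() {
  tr 'A-Za-z' 'N-ZA-Mn-za-m'
}

# Encode two lines, then decode them again with the same filter.
encoded=$(printf 'one\ntwo\n' | rot13)
decoded=$(printf '%s\n' "$encoded" | rot13)

printf '%s\n---\n%s\n' "$encoded" "$decoded"
```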
This last example uses tr -d to delete characters. (The sample file contains no 'e', so nothing is removed here.)
datasoft @ datasoft-linux ~$ cat xyz.txt | tr -d e
raju das
ayan roy
riju saha
dustu saha
ajoy das
datasoft @ datasoft-linux ~$
wc
The wc command counts lines, words and bytes in each file; by default it prints all three, in that order. The -l, -w and -c options select a single count.
datasoft @ datasoft-linux ~$ wc xyz.txt
5 10 48 xyz.txt
datasoft @ datasoft-linux ~$ wc -l xyz.txt
5 xyz.txt
datasoft @ datasoft-linux ~$ wc -w xyz.txt
10 xyz.txt
datasoft @ datasoft-linux ~$ wc -c xyz.txt
48 xyz.txt
datasoft @ datasoft-linux ~$
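When wc reads from a pipe instead of a named file, it prints only the numbers, which is convenient in scripts. The sketch below counts a hypothetical two-line sample; the tr -d ' ' step is an assumption to strip the padding that some wc implementations add.

```shell
# Hypothetical two-line sample taken from the names used above.
sample=$(printf 'raju das\nayan roy\n')

# Each count is read from stdin, so no file name appears in the output.
lines=$(printf '%s\n' "$sample" | wc -l | tr -d ' ')
words=$(printf '%s\n' "$sample" | wc -w | tr -d ' ')
bytes=$(printf '%s\n' "$sample" | wc -c | tr -d ' ')

echo "lines=$lines words=$words bytes=$bytes"
```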
sort
The sort filter (alphabetical sort) is used to sort lines of text files.
datasoft @ datasoft-linux ~$ cat xyz.txt
raju das
ayan roy
riju saha
dustu saha
ajoy das
datasoft @ datasoft-linux ~$ sort xyz.txt
ajoy das
ayan roy
dustu saha
raju das
riju saha
datasoft @ datasoft-linux ~$
But the sort filter has many options to tweak its usage. This example shows sorting different columns (column 1 or column 2).
datasoft @ datasoft-linux ~$ sort -k1 abc.txt
Bihar, Andrapradesh, 90
Burdwan, Bhubaneswar, 20
Delhi, Orrisa, 65
Goa, Gujrat, 45
Kolkata , Karnataka, 15
datasoft @ datasoft-linux ~$ sort -k2 abc.txt
Bihar, Andrapradesh, 90
Burdwan, Bhubaneswar, 20
Goa, Gujrat, 45
Kolkata , Karnataka, 15
Delhi, Orrisa, 65
datasoft @ datasoft-linux ~$
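The example below shows the difference between an alphabetical sort and a numerical sort (both on the third column). Note that the line "Kolkata , Karnataka, 15" has a space before its first comma, so with the default whitespace separator its third field is "Karnataka," rather than "15". That is why it lands last in the alphabetical sort, and it comes first in the numerical sort only because a non-numeric key counts as 0.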
datasoft @ datasoft-linux ~$ sort -k3 abc.txt
Burdwan, Bhubaneswar, 20
Goa, Gujrat, 45
Delhi, Orrisa, 65
Bihar, Andrapradesh, 90
Kolkata , Karnataka, 15
datasoft @ datasoft-linux ~$ sort -n -k3 abc.txt
Kolkata , Karnataka, 15
Burdwan, Bhubaneswar, 20
Goa, Gujrat, 45
Delhi, Orrisa, 65
Bihar, Andrapradesh, 90
datasoft @ datasoft-linux ~$
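Since the file is really comma-separated, one way to make the column choice explicit is to give sort the separator with -t. The sketch below uses a simplified, hypothetical version of the data (no spaces, and one value changed to 8) so that the two orderings differ visibly. LC_ALL=C is set only to make the ordering predictable.

```shell
# Hypothetical comma-separated data, simplified from abc.txt.
data=$(printf 'Goa,Gujrat,45\nKolkata,Karnataka,15\nBihar,Andrapradesh,90\nDelhi,Orrisa,8\n')

# Third field compared as text: "8" sorts after "45" because '8' > '4'.
by_text=$(printf '%s\n' "$data" | LC_ALL=C sort -t, -k3,3)

# Third field compared as a number.
by_number=$(printf '%s\n' "$data" | LC_ALL=C sort -t, -k3,3n)

printf '%s\n---\n%s\n' "$by_text" "$by_number"
```

Writing the key as -k3,3 limits the comparison to the third field; a bare -k3 would compare from the third field to the end of the line.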
uniq
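The uniq command omits repeated adjacent lines, which is why it is normally used on a sorted list. (abc.txt has no repeated lines, so the output of uniq below equals the sorted input.)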
datasoft @ datasoft-linux ~$ cat abc.txt
Kolkata , Karnataka, 15
Burdwan, Bhubaneswar, 20
Goa, Gujrat, 45
Delhi, Orrisa, 65
Bihar, Andrapradesh, 90
datasoft @ datasoft-linux ~$ sort abc.txt
Bihar, Andrapradesh, 90
Burdwan, Bhubaneswar, 20
Delhi, Orrisa, 65
Goa, Gujrat, 45
Kolkata , Karnataka, 15
datasoft @ datasoft-linux ~$ sort abc.txt |uniq
Bihar, Andrapradesh, 90
Burdwan, Bhubaneswar, 20
Delhi, Orrisa, 65
Goa, Gujrat, 45
Kolkata , Karnataka, 15
datasoft @ datasoft-linux ~$
uniq can also count occurrences with the -c option.
datasoft @ datasoft-linux ~$ sort abc.txt |uniq -c
1 Bihar, Andrapradesh, 90
1 Burdwan, Bhubaneswar, 20
1 Delhi, Orrisa, 65
1 Goa, Gujrat, 45
1 Kolkata , Karnataka, 15
datasoft @ datasoft-linux ~$
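The counts become more interesting when the input does contain duplicates. As a sketch, the block below counts a hypothetical list of surnames and ranks them by frequency. The sed step only strips the leading padding that uniq -c adds, so the result is easy to compare.

```shell
# Hypothetical input with repeated lines.
surnames=$(printf 'das\nsaha\ndas\nroy\nsaha\ndas\n')

# sort groups equal lines together, uniq -c counts each group,
# sort -rn ranks the groups by count, sed removes the leading blanks.
ranked=$(printf '%s\n' "$surnames" | LC_ALL=C sort | uniq -c | LC_ALL=C sort -rn | sed 's/^ *//')

echo "$ranked"
```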
comm
Comparing sorted streams (or files) can be done with comm. By default comm outputs three columns: lines only in the first file, lines only in the second file, and lines in both. In this example, Ape, Cat and Dog are in both lists, Bat, Mat, Sit and Zip are only in the first file, and Nest and vest are only in the second.
datasoft @ datasoft-linux ~$ cat > lebel1.txt
Ape
Bat
Cat
Dog
Mat
Sit
Zip
^C
datasoft @ datasoft-linux ~$ cat > lebel2.txt
Ape
Cat
Dog
Nest
vest
^C
datasoft @ datasoft-linux ~$ comm lebel1.txt lebel2.txt
		Ape
Bat
		Cat
		Dog
Mat
	Nest
Sit
	vest
Zip
datasoft @ datasoft-linux ~$
The output of comm can be easier to read when outputting only a single column. The digits point out which output columns should not be displayed.
datasoft @ datasoft-linux ~$ comm -12 lebel1.txt lebel2.txt
Ape
Cat
Dog
datasoft @ datasoft-linux ~$ comm -13 lebel1.txt lebel2.txt
Nest
vest
datasoft @ datasoft-linux ~$ comm -23 lebel1.txt lebel2.txt
Bat
Mat
Sit
Zip
datasoft @ datasoft-linux ~$
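The three single-column forms can be sketched in a script as follows. The file names a.txt and b.txt and their contents are hypothetical; both files must already be sorted, and LC_ALL=C is set so that comm and the data agree on the sort order.

```shell
# Hypothetical scratch directory with two small sorted lists.
workdir=$(mktemp -d)
printf 'Ape\nBat\nCat\n' > "$workdir/a.txt"
printf 'Ape\nCat\nDog\n' > "$workdir/b.txt"

# -12 hides columns 1 and 2: only lines present in both files remain.
both=$(LC_ALL=C comm -12 "$workdir/a.txt" "$workdir/b.txt")

# -23 hides columns 2 and 3: lines only in the first file.
only_first=$(LC_ALL=C comm -23 "$workdir/a.txt" "$workdir/b.txt")

# -13 hides columns 1 and 3: lines only in the second file.
only_second=$(LC_ALL=C comm -13 "$workdir/a.txt" "$workdir/b.txt")

printf '%s\n---\n%s\n---\n%s\n' "$both" "$only_first" "$only_second"
rm -rf "$workdir"
```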
od
Humans like to work with ASCII characters, but computers store files as bytes. The example below creates a simple file, and then uses od to show the contents of the file in hexadecimal bytes.
datasoft @ datasoft-linux ~$ cat > sample.txt
ABCDEFGHIJKL
123456789101112
^C
datasoft @ datasoft-linux ~$ od -t x1 sample.txt
0000000 41 42 43 44 45 46 47 48 49 4a 4b 4c 0a 31 32 33
0000020 34 35 36 37 38 39 31 30 31 31 31 32 0a
0000035
datasoft @ datasoft-linux ~$
The same file can also be displayed in octal bytes.
datasoft @ datasoft-linux ~$ od -b sample.txt
0000000 101 102 103 104 105 106 107 110 111 112 113 114 012 061 062 063
0000020 064 065 066 067 070 071 061 060 061 061 061 062 012
0000035
datasoft @ datasoft-linux ~$
And here is the file in ascii (or backslashed) characters.
datasoft @ datasoft-linux ~$ od -c sample.txt
0000000 A B C D E F G H I J K L \n 1 2 3
0000020 4 5 6 7 8 9 1 0 1 1 1 2 \n
0000035
datasoft @ datasoft-linux ~$
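od also works as a filter on a pipe. In the sketch below, -An suppresses the offset column, and tr -d removes the spacing so that only the byte values remain; the three-byte input is a hypothetical sample.

```shell
# Three bytes: 'A', 'B' and a newline.
hex=$(printf 'AB\n' | od -An -t x1 | tr -d ' \n')
octal=$(printf 'AB\n' | od -An -b | tr -d ' \n')

echo "hex=$hex octal=$octal"
```

The values match the first bytes in the transcripts above: 41 and 42 (octal 101 and 102) for A and B, and 0a (octal 012) for the newline.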
sed
Sed is a stream editor. A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline).
datasoft @ datasoft-linux ~$ echo level5 | sed 's/5/42/'
level42
datasoft @ datasoft-linux ~$ echo level5 | sed 's/level/high/'
high5
datasoft @ datasoft-linux ~$
Add g for global replacements (all occurrences of the string per line).
datasoft @ datasoft-linux ~$ echo level5 level6 | sed 's/level/high/'
high5 level6
datasoft @ datasoft-linux ~$ echo level5 level6 | sed 's/level/high/g'
high5 high6
datasoft @ datasoft-linux ~$
With d you can remove all lines that match a pattern from a stream.
datasoft @ datasoft-linux ~$ cat > cricket.txt
Sachin Tendulkar, Maharastra
Sourav Ganguly, Kolkata
Mahendra singh Dhoni, Jharkhand
Birat Kohili, Delhi
Birendra Sewag, Delhi
Anil Kumble, Chenni
^C
datasoft @ datasoft-linux ~$ cat cricket.txt | sed '/Delhi/d'
Sachin Tendulkar, Maharastra
Sourav Ganguly, Kolkata
Mahendra singh Dhoni, Jharkhand
Anil Kumble, Chenni
datasoft @ datasoft-linux ~$
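Substitution and deletion can also be combined in one sed call by giving each command its own -e option. This is a sketch on a hypothetical three-line excerpt of cricket.txt.

```shell
# Hypothetical excerpt of the player list above.
players=$(printf 'Sourav Ganguly, Kolkata\nAnil Kumble, Chenni\nBirat Kohili, Delhi\n')

# Delete only: drop every line that mentions Delhi.
no_delhi=$(printf '%s\n' "$players" | sed '/Delhi/d')

# Delete, then substitute on the lines that are left.
combined=$(printf '%s\n' "$players" | sed -e '/Delhi/d' -e 's/, / - /')

printf '%s\n---\n%s\n' "$no_delhi" "$combined"
```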
pipe examples
who | wc
How many users are logged on to this system?
datasoft @ datasoft-linux ~$ who
datasoft :0 2014-08-02 10:51 (:0)
datasoft pts/0 2014-08-02 10:54 (:0)
datasoft pts/7 2014-08-02 10:57 (:0)
datasoft pts/14 2014-08-02 14:10 (:0)
datasoft @ datasoft-linux ~$
datasoft @ datasoft-linux ~$ who | wc -l
4
who | cut | sort
Display a sorted list of logged on users.
datasoft @ datasoft-linux ~$ who | cut -d' ' -f1 | sort
datasoft
datasoft
datasoft
datasoft
datasoft @ datasoft-linux ~$
Display a sorted list of logged on users, but every user only once.
datasoft @ datasoft-linux ~$ who | cut -d' ' -f1 | sort | uniq
datasoft
datasoft @ datasoft-linux ~$
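The output of who depends on who is logged in, so the sketch below runs the same pipe on hypothetical who-style lines. It also shows that sort -u gives the same result as sort | uniq.

```shell
# Hypothetical who-style output with two users and three sessions.
sessions=$(printf 'datasoft :0 2014-08-02 10:51\nroot pts/1 2014-08-02 10:52\ndatasoft pts/0 2014-08-02 10:54\n')

# Keep the first space-separated field, sort it, and drop duplicates.
with_uniq=$(printf '%s\n' "$sessions" | cut -d' ' -f1 | LC_ALL=C sort | uniq)

# sort -u does the sorting and the duplicate removal in one step.
with_sort_u=$(printf '%s\n' "$sessions" | cut -d' ' -f1 | LC_ALL=C sort -u)

printf '%s\n---\n%s\n' "$with_uniq" "$with_sort_u"
```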
grep | cut
Display a list of all bash user accounts on this computer. User accounts are explained in detail later.
datasoft @ datasoft-linux ~$ grep bash /etc/passwd
root:x:0:0:root:/root:/bin/bash
datasoft:x:1000:1000:datasoft,,,:/home/datasoft:/bin/bash
datasoft @ datasoft-linux ~$ grep bash /etc/passwd | cut -d: -f1
root
datasoft
datasoft @ datasoft-linux ~$
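One caveat with grep bash is that it matches the string anywhere in the line, including inside a user name. The sketch below uses hypothetical passwd-style records (the account bashful is made up) to compare the loose match with a pattern anchored to the end of the line, where the login shell is stored.

```shell
# Hypothetical records: bashful has "bash" in its name but uses /bin/sh.
accounts=$(printf 'root:x:0:0:root:/root:/bin/bash\nbashful:x:1001:1001::/home/bashful:/bin/sh\ndatasoft:x:1000:1000:datasoft,,,:/home/datasoft:/bin/bash\n')

# Loose match: any line containing "bash".
loose=$(printf '%s\n' "$accounts" | grep bash | cut -d: -f1 | LC_ALL=C sort)

# Anchored match: only lines whose last field ends in /bash.
strict=$(printf '%s\n' "$accounts" | grep '/bash$' | cut -d: -f1 | LC_ALL=C sort)

printf '%s\n---\n%s\n' "$loose" "$strict"
```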
Exercise, Practice and Solution:
1. Put a sorted list of all bash users in bashusers.txt.
Code:
grep bash /etc/passwd | cut -d: -f1 | sort > bashusers.txt
2. Put a sorted list of all logged on users in onlineusers.txt.
Code:
who | cut -d' ' -f1 | sort > onlineusers.txt
3. Make a list of all filenames in /etc that contain the string samba in their name.
Code:
ls /etc | grep samba
4. Make a sorted list of all files in /etc whose names contain the string samba, ignoring case.
Code:
ls /etc | grep -i samba | sort
5. Look at the output of /sbin/ifconfig. Write a line that displays only the IP address and the subnet mask.
Code:
/sbin/ifconfig | head -2 | grep 'inet ' | tr -s ' ' | cut -d' ' -f3,5
6. Write a line that removes all non-letters from a stream.
Code:
datasoft @ datasoft-linux ~$ cat text
This is, yes really! , a text with ?&* too many str$ange# characters ;-)
datasoft @ datasoft-linux ~$ cat text | tr -d ',!$?.*&^%#@;()-'
This is yes really a text with too many strange characters
7. Write a line that receives a text file, and outputs all words on a separate line.
Code:
datasoft @ datasoft-linux ~$ cat text2
it is very cold today without the sun
datasoft @ datasoft-linux ~$ cat text2 | tr ' ' '\n'
it
is
very
cold
today
without
the
sun
8. Write a spell checker on the command line. (There may be a dictionary in /usr/share/dict/ .)
Code:
datasoft @ datasoft-linux ~$ echo "The zun is shining today" > text
datasoft @ datasoft-linux ~$ cat > DICT
is
shining
sun
the
today
datasoft @ datasoft-linux ~$ cat text | tr 'A-Z ' 'a-z\n' | sort | uniq | comm -23 - DICT
zun
You could also add the solution from question number 6 to remove non-letters, and tr -s ' ' to remove redundant spaces.
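Putting the solutions to exercises 6 and 8 together might look like the sketch below. It uses tr -cd, which deletes every character that is not in the given set, as an alternative to listing each unwanted symbol. The scratch directory, the sample sentence and the five-word dictionary are assumptions for this illustration.

```shell
# Hypothetical scratch directory holding a tiny, already sorted dictionary.
workdir=$(mktemp -d)
printf 'is\nshining\nsun\nthe\ntoday\n' > "$workdir/DICT"

text='The zun is shining, today!'

# 1. keep only letters, spaces and newlines
# 2. lowercase everything
# 3. put each word on its own line, squeezing repeated separators
# 4. sort and deduplicate, then keep the words missing from DICT
unknown=$(printf '%s\n' "$text" |
  tr -cd 'A-Za-z \n' |
  tr 'A-Z' 'a-z' |
  tr -s ' ' '\n' |
  LC_ALL=C sort -u |
  LC_ALL=C comm -23 - "$workdir/DICT")

echo "$unknown"
rm -rf "$workdir"
```

comm needs both inputs in the same sort order, which is why the word list and the dictionary are both handled under LC_ALL=C here.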