Linux Admin - sort Command



sort has several optimizations for sorting based on data types. This command writes the sorted concatenation of all input files to standard output. However, be wary: complex sort operations on large files of a few gigabytes can degrade system performance.

When running a production server with limited CPU or memory, it is recommended to offload the sorting of these larger files to a workstation during peak business hours.
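When offloading is not an option, GNU sort has a few switches that can help limit its footprint. The following is only a sketch, assuming GNU coreutils: the file names big.txt and sorted.txt and the scratch directory are hypothetical, and the 64M buffer size is an arbitrary illustration rather than a recommendation.

```shell
# Sketch only: big.txt, sorted.txt and the scratch directory are hypothetical.
workdir=$(mktemp -d)
seq 1000 -1 1 > "$workdir/big.txt"      # small stand-in for a large input file

# nice lowers the CPU priority of sort.
# -S caps the in-memory buffer; -T chooses where temporary spill files go.
nice -n 19 sort -n -S 64M -T "$workdir" -o "$workdir/sorted.txt" "$workdir/big.txt"

head -n 3 "$workdir/sorted.txt"
```

Suitable values for -S and -T depend on the memory and disk layout of the host, so treat the numbers above as placeholders.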

Switch   Action
-b       Ignore leading blanks (spaces and tabs)
-d       Dictionary order; consider only blanks and alphanumeric characters
-f       Ignore case, folding lowercase characters to uppercase
-g       General numeric sort
-M       Month sort (JAN < FEB < ... < DEC)
-h       Human-readable numeric sort (for example 2K, 1M, 1G)
-R       Random sort
-m       Merge already sorted files
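A few of these switches are easier to understand when seen in action. The sketch below feeds small made-up inputs (not part of names.txt) to sort and assumes GNU sort; LC_ALL=C is set so that the ordering does not depend on the local language settings.

```shell
# -h: compare human-readable sizes (K, M, G suffixes).
by_size=$(printf '1G\n512K\n20M\n' | LC_ALL=C sort -h)

# -M: compare three-letter month abbreviations (English names in the C locale).
by_month=$(printf 'Mar\nJan\nFeb\n' | LC_ALL=C sort -M)

# -f: fold lowercase to uppercase so that case does not affect ordering.
by_name=$(printf 'bob\nAlice\ncarol\n' | LC_ALL=C sort -f)

printf '%s\n' "$by_size" "$by_month" "$by_name"
```

Without -h, a plain sort would compare the size strings character by character and place 1G before 20M.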

Feel free to copy the tabular text below and follow along with our sort examples. Be sure each column is separated with a tab character.

first name last name office
Ted Daniel 101
Jenny Colon 608
Dana Maxwell 602
Marian Little 903
Bobbie Chapman 403
Nicolas Singleton 203
Dale Barton 901
Aaron Dennis 305
Santos Andrews 504
Jacqueline Neal 102
Billy Crawford 301
Rosa Summers 405
Kellie Curtis 903
Matt Davis 305
Gina Carr 902
Francisco Gilbert 101
Sidney Mack 901
Heidi Simmons 204
Cristina Torres 206
Sonya Weaver 403
Donald Evans 403
Gwendolyn Chambers 108
Antonia Lucas 901
Blanche Hayes 603
Carrie Todd 201
Terence Anderson 501
Joan Parsons 102
Rose Fisher 304
Malcolm Matthews 702

Using sort in its most basic, default form −

[root@centosLocal centos]# sort ./Documents/names.txt  
Aaron         Dennis         305 
Antonia       Lucas          901 
Billy         Crawford       301 
Blanche       Hayes          603 
Bobbie        Chapman        403 
Carrie        Todd           201 
Cristina      Torres         206 
Dale          Barton         901 
Dana          Maxwell        602 
Donald        Evans          403 
Francisco     Gilbert        101 
Gina          Carr           902 
Gwendolyn     Chambers       108 
Heidi         Simmons        204 
Jacqueline    Neal           102 
Jenny         Colon          608 
Joan          Parsons        102 
Kellie        Curtis         903 
Malcolm       Matthews       702 
Marian        Little         903 
Matt          Davis          305 
Nicolas       Singleton      203
Rosa          Summers        405
Rose          Fisher         304
Santos        Andrews        504
Sidney        Mack           901
Sonya         Weaver         403
Ted           Daniel         101
Terence       Anderson       501

[root@centosLocal centos]#

Sometimes we will want to sort a file on a column other than the first. A sort can be applied to other columns with the -t and -k switches.

-t : define the field delimiter
-k : key to sort by (think of this as a column number, counted from the delimiter)
-n : sort in numeric order
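One detail worth knowing is that a key written as -k 3 runs from field 3 to the end of the line, while -k 3,3 limits the key to field 3 alone, and several -k switches can be combined. The sketch below illustrates this with three rows borrowed from the sample data; the comma delimiter is an assumption made only so the separator is visible.

```shell
# Three sample rows, comma-separated for readability (an illustrative choice).
rows=$(printf 'Marian,Little,903\nKellie,Curtis,903\nTed,Daniel,101\n')

# Primary key: field 3 only, numeric. Secondary key: field 2 only, alphabetic.
sorted=$(printf '%s\n' "$rows" | LC_ALL=C sort -t ',' -k 3,3n -k 2,2)
printf '%s\n' "$sorted"
```

Here the two people in office 903 are ordered by last name because the second key breaks the tie.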

Note − In some examples, we have used cat piped into grep. This was to demonstrate the concept of piping commands. Piping cat into grep starts an extra process and pushes the whole file through a pipe, which adds avoidable load with large files, especially when complex sorting is added on top. This will make veteran Linux administrators cringe.

Now that we have a good idea of how the pipe character works, this poor practice will be avoided in the chapters to follow. The key to keeping system resource usage low with commands like sort is learning to use them efficiently.

[root@centosLocal centos]# sort -t '    ' -k 3n ./Documents/names.txt  
Ted           Daniel           101 
Francisco     Gilbert          101 
Jacqueline    Neal             102 
Joan          Parsons          102 
Gwendolyn     Chambers         108 
Carrie        Todd             201 
Nicolas       Singleton        203 
Heidi         Simmons          204 
Cristina      Torres           206 
Billy         Crawford         301 
Rose          Fisher           304 
Aaron         Dennis           305 
Matt          Davis            305 
Bobbie        Chapman          403 
Donald        Evans            403 
Sonya         Weaver           403 
Rosa          Summers          405 
Terence       Anderson         501 
Santos        Andrews          504 
Dana          Maxwell          602 
Blanche       Hayes            603 
Jenny         Colon            608 
Malcolm       Matthews         702
Antonia       Lucas            901 
Dale          Barton           901 
Sidney        Mack             901 
Gina          Carr             902 
Kellie        Curtis           903  
Marian        Little           903 

[root@centosLocal centos]#

Now we have our list sorted by office number. The astute reader will notice something out of the ordinary after the -t switch: single quotes separated by what appears to be a few spaces. This was actually a literal Tab character sent to the shell. A literal Tab can be typed in the Bash shell by pressing Ctrl+V, then Tab.

Most shells will interpret the Tab key as a command, for example auto-completion in Bash. The shell needs an escape sequence to recognize a literal Tab character. This is one reason why Tabs are not the best choice for delimiters with Linux. Generally speaking, it is best to avoid both spaces and tabs as delimiters, as they can cause issues when scripting a shell.
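In scripts, where typing a key combination is not practical, the Tab can be produced by a command instead. The following is a minimal sketch that uses a scratch file created with mktemp as a hypothetical stand-in for names.txt.

```shell
# Build a small tab-separated sample (rows from names.txt) in a scratch file.
sample=$(mktemp)
printf 'Dana\tMaxwell\t602\nTed\tDaniel\t101\nRose\tFisher\t304\n' > "$sample"

# printf '\t' produces a real Tab, so no special key combination is needed.
# In Bash, sort -t $'\t' ... is an equivalent spelling.
tab=$(printf '\t')
by_office=$(LC_ALL=C sort -t "$tab" -k 3n "$sample")
printf '%s\n' "$by_office"
```

Holding the delimiter in a variable also keeps the command readable, since a Tab between quotes is indistinguishable from spaces on screen.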

Let us fix our names.txt file.

[root@centosLocal centos]# sed -i 's/\t/:/g' ./Documents/names.txt && 
cat ./Documents/names.txt 
Ted:Daniel:101 
Jenny:Colon:608 
Dana:Maxwell:602 
Marian:Little:903 
Bobbie:Chapman:403 
Nicolas:Singleton:203 
Dale:Barton:901 
Aaron:Dennis:305 
Santos:Andrews:504 
Jacqueline:Neal:102 
Billy:Crawford:301 
Rosa:Summers:405 
Kellie:Curtis:903
Matt:Davis:305 
Gina:Carr:902 
Francisco:Gilbert:101 
Sidney:Mack:901 
Heidi:Simmons:204 
Cristina:Torres:206
Sonya:Weaver:403 
Donald:Evans:403 
Gwendolyn:Chambers:108 
Antonia:Lucas:901 
Blanche:Hayes:603 
Carrie:Todd:201 
Terence:Anderson:501 
Joan:Parsons:102 
Rose:Fisher:304 
Malcolm:Matthews:702
[root@centosLocal centos]#

Now the text file will be much easier to work with. If someone needs it returned to Tab-delimited form for another application (this is common), we can accomplish that task easily with −

sed -i 's/:/\t/g' ./Documents/names.txt

Common end-user applications work well with Tabs as a delimiter (an accountant does not want to see a colon separating data columns while working on spreadsheets). So learning to transform characters back and forth is good practice; it comes up often.
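The whole round trip can also be done without modifying the source file. The sketch below assumes GNU sed (which understands \t) and uses a scratch file from mktemp as a hypothetical stand-in for ./Documents/names.txt.

```shell
src=$(mktemp)   # hypothetical stand-in for ./Documents/names.txt
printf 'Dana\tMaxwell\t602\nTed\tDaniel\t101\n' > "$src"

# Tabs to colons, sort on the third colon-separated field, then back to tabs.
# Without -i, sed writes to standard output and leaves the source file alone.
result=$(sed 's/\t/:/g' "$src" | LC_ALL=C sort -t ':' -k 3,3n | sed 's/:/\t/g')
printf '%s\n' "$result"
```

This form suits cases where the Tab-delimited original must stay untouched for another application.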

Note − Office staff typically use word processors and spreadsheets with a graphical user interface, running on Windows. Hence, it is common for Linux administrators to become good at these transformations to accommodate office end users (most times, our boss will be an end user).

We introduced a command called sed. sed is a stream editor and can be used as a noninteractive text editor for manipulating streams of text and files. We will learn more about sed later. For now, keep in mind that by using sed we avoided the need to pipe several filter commands when changing our text file, thus making the most efficient use of the tools at hand.

We also introduced a Bash shell operator: &&. && will run the second command only if the first command completes with a successful exit status of 0.

[root@centosLocal centos]# ls /noDir &&  echo "You cannot see me" 
ls: cannot access /noDir: No such file or directory 
[root@centosLocal centos]# ls /noDir ;  echo "You cannot see me" 
ls: cannot access /noDir: No such file or directory 
You cannot see me 
[root@centosLocal centos]#

In the above code, note the difference between && and ;. The first runs the second command only when the first has completed successfully, while ; simply runs the commands one after another regardless of the result. More on this when we get to scripting shell commands.
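The same contrast can be captured in variables, which makes it easy to check from a script. This is a small sketch that assumes, as in the transcript above, that /noDir does not exist.

```shell
# /noDir is assumed not to exist, so ls exits with a non-zero status.
with_and=$(ls /noDir 2>/dev/null && echo "second command ran")
with_semicolon=$(ls /noDir 2>/dev/null ; echo "second command ran")

printf 'with &&: [%s]\n' "$with_and"
printf 'with ; : [%s]\n' "$with_semicolon"
```

The first variable stays empty because echo was skipped, while the second holds the echoed text.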
