
split Command in Linux
Large files can be awkward to handle, especially large datasets, backups, or logs that need to be divided into smaller, more manageable pieces. The split command on Linux is a simple and effective solution. It breaks a file into smaller chunks by size, line count, or number of pieces, and lets you control how the output files are named.
The split command is especially useful when moving large files between systems, processing data in parallel, or organizing data into manageable pieces.
Table of Contents
Here is a comprehensive guide to the options available with the split command −
- Overview of split Command
- Syntax of the split Command
- Options Available for split Command
- Examples of split Command in Linux
Overview of split Command
The split command is a command-line utility that divides one large file into smaller ones. Each chunk is an ordinary file that can be opened and processed on its own, which helps when memory is limited, transfers are slow, or administrative policies cap file sizes. It is part of GNU Core Utilities, so it is available on most Linux distributions.
Syntax of the split Command
The basic syntax of the split command is as follows −
split [options] [input_file] [prefix]
Where,
- [input_file] − The file to be split. If omitted or given as -, the command reads from standard input.
- [prefix] − The prefix used for the output file names. The default prefix is x, which produces xaa, xab, and so on.
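The syntax above can be tried on standard input with a minimal sketch like the following. It assumes GNU split; the prefix part_ and the scratch directory are made up for illustration.

```shell
# Work in a scratch directory so the demo leaves nothing behind elsewhere.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Generate 25 numbered lines and pipe them into split.
# "-" tells split to read standard input; "part_" is the output prefix.
seq 1 25 | split -l 10 - part_

# Three files are expected: part_aa (10 lines), part_ab (10), part_ac (5).
ls part_*
wc -l part_*
```

Because the prefix is the second operand, the - placeholder is needed whenever a custom prefix is combined with piped input.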
Options Available for split Command
The split command offers a number of options to suit different requirements −
Option | Description |
---|---|
-a, --suffix-length=N | Produces suffixes of length N for the output file names, and has a default length of 2 characters. |
--additional-suffix=SUFFIX | Appends a user-specified suffix (SUFFIX) to the generated file names so they can be distinguished or sorted more easily. |
-b, --bytes=SIZE | Splits the input file into pieces of SIZE bytes. SIZE accepts unit suffixes: K and M are powers of 1024 (5M is 5 MiB), while KB and MB are powers of 1000. |
-C, --line-bytes=SIZE | Puts at most SIZE bytes of complete lines into each output file, so a line is only broken across files if it is longer than SIZE on its own. |
-d | Uses numeric suffixes (starting from 0) instead of the default alphabetical suffix. |
--numeric-suffixes[=FROM] | Similar to -d, but with an option to set the starting value of the numeric suffix. |
-x | Uses hexadecimal suffixes starting from 0 (00, 01, ..., 0f, 10, ...) instead of the default alphabetical suffixes. |
--hex-suffixes[=FROM] | Same as -x but with the facility to set the starting value of the hexadecimal suffixes. |
-e, --elide-empty-files | Prevents the generation of empty output files during splitting. |
--filter=COMMAND | Pipes every split chunk through a shell command (COMMAND). |
-l, --lines=NUMBER | Splits the file into chunks based on the given number of lines. |
-n, --number=CHUNKS | Splits the input into CHUNKS output files of roughly equal size. CHUNKS can be N (split by bytes), l/N (split without breaking lines), or r/N (distribute lines round-robin). |
-t, --separator=SEP | Divides records with a user-specified separator (SEP) instead of a newline. |
-u, --unbuffered | Copies input to output immediately, without buffering, when used with the -n r/... (round-robin) form. |
--verbose | Prints a diagnostic message just before each output file is opened. |
--help | Displays a list of options and usage. |
--version | Outputs the version information of the split command. |
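As a rough illustration of how options from the table combine, the sketch below uses -n with the l/N form together with -d. It assumes GNU split; numbers.txt and the quarter_ prefix are hypothetical names.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: 100 short lines.
seq 1 100 > numbers.txt

# -n l/4 asks for exactly 4 output files without breaking any line;
# -d switches to numeric suffixes (00, 01, ...).
split -n l/4 -d numbers.txt quarter_

ls quarter_*

# Concatenating the pieces in name order reproduces the input.
cat quarter_* | cmp - numbers.txt && echo "pieces match the original"
```

The l/N form is convenient for parallel processing, since every worker receives only whole lines.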
Examples of split Command in Linux
The split command offers several options to customize file splitting for real-world use cases. Below are practical examples of the most commonly used options −
- Splitting a File into Chunks of Specific Size
- Splitting a File by Line Count
- Using Numeric Suffixes for Sorted Output
- Adding a Custom Suffix to File Names
- Splitting and Compressing Files Simultaneously
Splitting a File into Chunks of Specific Size
When dealing with large backup files, splitting them into manageable chunks simplifies storage and transfer. For instance, dividing a compressed archive into 10 MiB segments lets each piece be uploaded separately.
split -b 10M backup.tar.gz backup_chunk_
- The -b 10M option sets the size of each chunk to 10 MiB (10 × 1,048,576 bytes); the last chunk holds whatever remains.
- backup.tar.gz is the input file, and the output files are named with the prefix backup_chunk_. Output files will look like backup_chunk_aa, backup_chunk_ab, and so on.

This method is ideal when you need to upload a large backup file to a system with a file size limit, such as email attachments or cloud platforms.
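Chunks are only useful if they can be put back together on the receiving side. The following is a minimal sketch of the round trip, assuming GNU split; backup.bin is a hypothetical stand-in filled with random bytes and the sizes are scaled down.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for a real backup: 2500 KiB of random bytes (hypothetical file name).
head -c 2560000 /dev/urandom > backup.bin

# Split into 1 MiB pieces: two full chunks plus a smaller final one.
split -b 1M backup.bin backup_chunk_

# The shell expands the glob in sorted order (aa, ab, ac), which is
# the order the chunks were written in.
cat backup_chunk_* > restored.bin

# cmp prints nothing and exits 0 when the files are identical.
cmp backup.bin restored.bin && echo "restored file is identical"
```

Comparing the restored file against the original (or against a checksum recorded before the transfer) catches missing or reordered chunks.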
Splitting a File by Line Count
Log files or datasets often contain thousands of records. Splitting them into smaller sections improves debugging, analysis, and data processing.
split -l 5000 logs.txt log_part_
- The -l 5000 option puts 5,000 lines in each output file; the last file holds the remaining lines.
- The input file logs.txt is divided, and the output files are named starting with the prefix log_part_.

This approach is helpful when you need to analyze server logs or large data files in smaller sections.
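Once a log is split by line count, each piece can be handled on its own. A small sketch of that workflow follows; the log contents and the ERROR marker are invented for the example, and the line count is reduced to 5 to keep it short.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical log: 12 lines, every 4th one marked ERROR.
for i in $(seq 1 12); do
  if [ $((i % 4)) -eq 0 ]; then
    echo "line $i ERROR something failed"
  else
    echo "line $i INFO ok"
  fi
done > logs.txt

# Five lines per piece: log_part_aa (5), log_part_ab (5), log_part_ac (2).
split -l 5 logs.txt log_part_

# Process each piece on its own, here by counting ERROR lines.
for part in log_part_*; do
  printf '%s: %s lines, %s errors\n' "$part" "$(wc -l < "$part")" "$(grep -c ERROR "$part")"
done
```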
Using Numeric Suffixes for Sorted Output
By default, split generates alphabetical suffixes (aa, ab). For automated processing, switching to numeric suffixes makes sorting easier.
split -d -b 1M dataset.csv data_chunk_
- The -d option generates numeric suffixes like data_chunk_00, data_chunk_01, etc.
- Each output file contains 1 MiB of data from the input file, as specified by the -b 1M option. Because -b counts bytes, a line can be cut in two at a chunk boundary; -C 1M keeps lines whole.

This is particularly useful when you are automating tasks that require sorted file names, such as batch processing.
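Numeric suffixes can be tuned further with the --numeric-suffixes and -a options from the table. The sketch below assumes GNU split and uses a tiny generated file in place of a real dataset.csv.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 30 > dataset.csv   # hypothetical stand-in for a real CSV

# Start numbering at 1 and use three-digit suffixes.
split --numeric-suffixes=1 -a 3 -l 10 dataset.csv data_chunk_

# Lists data_chunk_001, data_chunk_002 and data_chunk_003.
ls data_chunk_*
```

A wider suffix leaves room for more output files, and fixed-width numbers keep plain alphabetical sorting in step with numeric order.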
Adding a Custom Suffix to File Names
Some tools only pick up files with a specific extension. The --additional-suffix option lets every chunk carry one.
split -b 2M input.log split_ --additional-suffix=.log
- The --additional-suffix=.log option appends .log to the end of each output file name.
- Files generated will look like split_aa.log, split_ab.log, etc.

Without this option, the chunks would be named split_aa, split_ab, and so on, with no extension.
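For text files such as logs, splitting by bytes with -b can cut a line in half at a chunk boundary. One possible variation, sketched here under the assumption of GNU split, swaps -b for -C so that every .log chunk contains only whole lines. The file contents are made up and the sizes are tiny on purpose.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical log with 6 lines of 10 bytes each (9 characters + newline).
for i in 1 2 3 4 5 6; do echo "entry-00$i"; done > input.log

# -C 25 packs as many whole lines as fit into 25 bytes (two per file here).
split -C 25 --additional-suffix=.log input.log split_

# Lists split_aa.log, split_ab.log and split_ac.log.
ls split_*.log
```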
Splitting and Compressing Files Simultaneously
When working with large datasets, it's best to compress each chunk as it is split to save storage space.
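split -b 5M datafile.dat chunk_ --filter='gzip > $FILE.gz'
- The --filter option pipes each chunk to the gzip command, which writes it out as a .gz file instead of an uncompressed chunk.
- $FILE is set by split to the name of the current chunk (chunk_aa, chunk_ab, and so on). The single quotes are required: inside double quotes the shell would expand $FILE before split runs, usually to an empty string.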

This is highly effective for saving storage space when splitting and compressing large datasets, such as database dumps.
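The compressed chunks can be checked by decompressing them back into a single stream. This is a minimal sketch assuming GNU split (for --filter) and gzip; datafile.dat is generated here as a stand-in, and -l is used instead of -b only to keep the example small.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 1000 > datafile.dat   # hypothetical stand-in for a database dump

# Single quotes keep the calling shell from expanding $FILE;
# split sets FILE to each chunk name before running the command.
split -l 400 --filter='gzip > $FILE.gz' datafile.dat chunk_

# Lists chunk_aa.gz, chunk_ab.gz and chunk_ac.gz.
ls chunk_*.gz

# gzip -dc decompresses to standard output, so the pieces can be
# concatenated back into one stream and compared with the original.
gzip -dc chunk_*.gz | cmp - datafile.dat && echo "round trip OK"
```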
Conclusion
The split command supports splitting files by size, line count, number of chunks, or a custom record separator. Because it handles large files efficiently, names output files flexibly, and can pipe each chunk through a shell command, split is an essential tool for managing large files on Linux. Knowing these options lets you tailor the split command to your own data handling requirements.