Split a File at a Given Line Number
The split command in Linux is a powerful utility used to divide large files into smaller, more manageable chunks. This is particularly useful when dealing with log files, databases, or large datasets that need to be processed in smaller portions or transferred across systems with size limitations.
How the Split Command Works
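The split command reads an input file (or standard input) and writes its contents to a series of output files, dividing it by number of lines or by size. Splitting at patterns is the job of the related csplit command. Unless told otherwise, split puts 1000 lines in each output file, uses the prefix x, and adds alphabetical suffixes starting from aa, ab, ac, and so on (xaa, xab, xac, ...).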
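A quick way to see these defaults is to split a generated file without any options. This is a minimal sketch assuming GNU coreutils split (as found on most Linux systems); the 2,500-line sample file is made-up data produced with seq, and the scratch directory keeps the example away from real files.

```shell
# Work in a scratch directory so existing files are not touched.
workdir=$(mktemp -d)
cd "$workdir"

# Sample input: 2500 lines, one number per line.
seq 1 2500 > bigfile.txt

# No -l, -b or prefix given: split uses 1000 lines per file and the prefix "x".
split bigfile.txt

# Expect xaa and xab with 1000 lines each and xac with the remaining 500.
wc -l xa*
```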
Basic Syntax
split [OPTIONS] [INPUT_FILE] [OUTPUT_PREFIX]
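| Option | Description |
|---|---|
| -l N | Put N lines in each output file |
| -b SIZE | Put SIZE bytes in each output file (e.g., 100K, 1M) |
| -d | Use numeric suffixes (00, 01, ...) instead of alphabetic ones |
| -a N, --suffix-length=N | Use suffixes that are N characters long |
| --additional-suffix=SUFFIX | Append SUFFIX (such as .txt) to every output file name |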
Examples
Split by Line Count
To split bigfile.txt into files containing 1000 lines each:
split -l 1000 bigfile.txt chunk_
This creates files: chunk_aa, chunk_ab, chunk_ac, etc.
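To check the result, count the lines in each piece and then glue the pieces back together. This is a minimal sketch assuming GNU split and a 3,500-line sample file generated with seq; it relies on the shell expanding chunk_* in alphabetical order, which is the same order in which split names the pieces.

```shell
workdir=$(mktemp -d)
cd "$workdir"
seq 1 3500 > bigfile.txt

split -l 1000 bigfile.txt chunk_

# chunk_aa, chunk_ab and chunk_ac hold 1000 lines; chunk_ad holds the last 500.
wc -l chunk_*

# chunk_* expands in sorted order, so concatenation restores the original.
cat chunk_* > rebuilt.txt
cmp bigfile.txt rebuilt.txt && echo "rebuilt.txt matches bigfile.txt"
```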
Split with Numeric Suffixes
To use numeric suffixes instead of alphabetic ones:
split -l 500 -d bigfile.txt part_
Output files: part_00, part_01, part_02, etc.
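GNU split also accepts --numeric-suffixes=FROM, which behaves like -d but starts counting at FROM instead of 0. The sketch below assumes GNU coreutils (other split implementations may not offer this long option) and uses a generated 2,000-line sample file.

```shell
workdir=$(mktemp -d)
cd "$workdir"
seq 1 2000 > bigfile.txt

# Like -d, but numbering starts at 1: part_01, part_02, part_03, part_04.
split -l 500 --numeric-suffixes=1 bigfile.txt part_

ls part_*
```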
Split with Custom Suffix Length
To create files with 4-digit numeric suffixes:
split -l 100 -d --suffix-length=4 bigfile.txt segment_
Output files: segment_0000, segment_0001, segment_0002, etc.
Split with File Extensions
To add file extensions to the split files:
split -l 2000 --additional-suffix=.txt bigfile.txt split_
Output files: split_aa.txt, split_ab.txt, split_ac.txt, etc.
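The extension is appended after the generated suffix, so it also combines with -d. A short sketch, assuming a GNU split recent enough to support --additional-suffix and a generated 5,000-line sample file:

```shell
workdir=$(mktemp -d)
cd "$workdir"
seq 1 5000 > bigfile.txt

# Numeric suffix first, then the extra suffix: split_00.txt, split_01.txt, split_02.txt
split -l 2000 -d --additional-suffix=.txt bigfile.txt split_

ls split_*
```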
Split by File Size
To split based on file size rather than line count:
split -b 10M largefile.log size_chunk_
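Each output file is exactly 10 MiB (10,485,760 bytes), except the last, which holds the remainder. Because -b cuts at exact byte offsets, a line can end up split across two files. GNU split's -C (--line-bytes) option instead packs as many complete lines as fit under the size limit.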
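The difference between -b and -C is easiest to see on a tiny file. In this sketch (assuming GNU split, with made-up data of 100 lines of 10 bytes each, 1000 bytes in total), -b 256 produces pieces of exactly 256 bytes that end mid-line, while -C 256 produces pieces made of whole lines, each at most 256 bytes.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# 100 lines of 10 bytes each: "line-0001\n" ... "line-0100\n"
printf 'line-%04d\n' $(seq 1 100) > largefile.log

# Exact byte boundaries: 256 + 256 + 256 + 232 bytes; lines get cut.
split -b 256 largefile.log bytes_

# Whole lines only: 25 lines (250 bytes) fit under 256, so 4 pieces of 250 bytes.
split -C 256 largefile.log lines_

wc -c bytes_* lines_*
```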
Alternative Commands
| Command | Use Case | Example |
|---|---|---|
| csplit | Split at line numbers or at lines matching a pattern | csplit file.txt '/pattern/' '{*}' |
| awk | Route lines to files based on field values | awk -F',' '{print > ($1 ".txt")}' file.csv |
| sed | Extract a line range or pattern range | sed -n '1,100p' file > part1.txt |
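Splitting at one particular line number does not need split at all: head and tail, or sed with two line ranges, can write the two halves directly. A minimal sketch, assuming a hypothetical 250-line file.txt generated with seq and a split point after line 100:

```shell
workdir=$(mktemp -d)
cd "$workdir"
seq 1 250 > file.txt

# Lines 1-100 go to part1.txt; line 101 through the end go to part2.txt.
head -n 100 file.txt > part1.txt
tail -n +101 file.txt > part2.txt

# The same two pieces with sed, printing only the requested line ranges.
sed -n '1,100p' file.txt > part1_sed.txt
sed -n '101,$p' file.txt > part2_sed.txt

wc -l part1.txt part2.txt
```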
Using csplit for Pattern-Based Splitting
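The csplit command splits a file at specific line numbers or before lines that match a regular expression:
csplit logfile.txt '/ERROR/' '{*}'
This starts a new output file at every line containing "ERROR". Everything before the first match goes into xx00, and each later piece (xx01, xx02, ...) begins with a matching line. The quotes stop the shell from treating {*} as a filename glob.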
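csplit can also split at line numbers instead of patterns: each number N starts a new piece at line N. The sketch below assumes GNU csplit and a generated 250-line file.txt; -f sets the output prefix in place of the default xx.

```shell
workdir=$(mktemp -d)
cd "$workdir"
seq 1 250 > file.txt

# One split point: xx00 gets lines 1-100, xx01 gets line 101 to the end.
# csplit prints the size in bytes of each piece it writes.
csplit file.txt 101

# Several split points: piece_00 = lines 1-50, piece_01 = 51-150, piece_02 = 151-250.
csplit -f piece_ file.txt 51 151
```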
Practical Use Cases
- Log file processing: breaking large log files into pieces for analysis
- Database exports: splitting large SQL dumps for easier import
- Data transfer: creating smaller files for network transfer
- Parallel processing: distributing work across multiple processes
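For the parallel-processing case, GNU split's -n l/N option divides a file into N pieces without breaking lines, and xargs -P can then run one job per piece. This sketch assumes GNU split and an xargs that supports -P (GNU and BSD versions do); the per-piece job, summing numbers with awk, is only a stand-in for real work.

```shell
workdir=$(mktemp -d)
cd "$workdir"
seq 1 10000 > bigfile.txt

# Four pieces of roughly equal size, whole lines only: work_00 .. work_03
split -n l/4 -d bigfile.txt work_

# Run up to 4 jobs at once; each writes its partial sum to <piece>.sum.
printf '%s\n' work_0? |
  xargs -P 4 -I{} sh -c 'awk "{ s += \$1 } END { print s }" "$1" > "$1.sum"' _ {}

# Combine the partial results into the total.
cat work_*.sum | awk '{ total += $1 } END { print total }'
```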
Conclusion
The split command provides an efficient and flexible way to divide large files into smaller, manageable pieces by line count or by size. For splitting at patterns or at specific line numbers, csplit is the better fit, while awk and sed handle more complex extraction and routing logic.
