Split a File at Given Line Number

The split command in Linux divides large files into smaller, more manageable chunks. This is particularly useful when log files, database dumps, or large datasets need to be processed in smaller portions or transferred across systems with size limits.

How the Split Command Works

The split command reads an input file and creates multiple output files based on a specified criterion such as number of lines or size in bytes. By default, it writes 1000 lines per output file and names the files with the prefix x followed by alphabetical suffixes: xaa, xab, xac, and so on.
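The default behavior can be checked with a minimal sketch. It assumes GNU coreutils split and uses a hypothetical 2500-line sample.txt created only for the demonstration. With no options and no prefix, split writes 1000 lines per piece and names the pieces with the default prefix x.

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: 2500 numbered lines.
seq 1 2500 > sample.txt

# No options: 1000 lines per piece, default prefix "x".
split sample.txt

# xaa and xab hold 1000 lines each; xac holds the remaining 500.
wc -l xaa xab xac
```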

Basic Syntax

split [OPTIONS] [INPUT_FILE] [OUTPUT_PREFIX]

Common options:

  • -l lines: Puts the given number of lines in each output file

  • -b size: Puts the given number of bytes in each output file (e.g., 1M, 100K)

  • -d: Uses numeric suffixes instead of alphabetic ones

  • --suffix-length=N (short form -a N): Sets the suffix length to N characters

Examples

Split by Line Count

To split bigfile.txt into files containing 1000 lines each:

split -l 1000 bigfile.txt chunk_

This creates files: chunk_aa, chunk_ab, chunk_ac, etc.
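Because the suffixes sort alphabetically, the pieces can be joined back into the original with cat. The following is a minimal sketch, assuming a hypothetical 3500-line bigfile.txt generated for the demonstration; rebuilt.txt is an illustrative name.

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: 3500 numbered lines.
seq 1 3500 > bigfile.txt

# 1000 lines per piece gives chunk_aa through chunk_ad (the last holds 500).
split -l 1000 bigfile.txt chunk_

# The shell expands chunk_* in sorted order, so cat restores the original order.
cat chunk_* > rebuilt.txt

# cmp prints nothing and succeeds when the two files are identical.
cmp bigfile.txt rebuilt.txt && echo "pieces match the original"
```

Comparing the rebuilt file against the original is a cheap sanity check before deleting the source file or after transferring the pieces.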

Split with Numeric Suffixes

To use numeric suffixes instead of alphabetic ones:

split -l 500 -d bigfile.txt part_

Output files: part_00, part_01, part_02, etc.

Split with Custom Suffix Length

To create files with 4-digit numeric suffixes:

split -l 100 -d --suffix-length=4 bigfile.txt segment_

Output files: segment_0000, segment_0001, segment_0002, etc.

Split with File Extensions

To add file extensions to the split files:

split -l 2000 --additional-suffix=.txt bigfile.txt split_

Output files: split_aa.txt, split_ab.txt, split_ac.txt, etc.
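The options shown so far can be combined in one call. This sketch assumes GNU coreutils split (--additional-suffix is a GNU extension) and a hypothetical 250-line bigfile.txt created for the demonstration.

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: 250 numbered lines.
seq 1 250 > bigfile.txt

# -d: numeric suffixes; -a 4: short form of --suffix-length=4;
# --additional-suffix=.txt: append an extension after the suffix.
split -l 100 -d -a 4 --additional-suffix=.txt bigfile.txt segment_

# Lists segment_0000.txt, segment_0001.txt, and segment_0002.txt.
ls segment_*
```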

Split by File Size

To split based on file size rather than line count:

split -b 10M largefile.log size_chunk_

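Creates files of exactly 10 MiB (10,485,760 bytes) each, except the last one, which holds the remainder. Because -b counts bytes rather than lines, a piece can end in the middle of a line.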
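When pieces must stay under a size limit but lines must remain intact, GNU split offers -C (--line-bytes), which fills each piece with whole lines up to the given size. The sketch below contrasts it with -b on a hypothetical log of 1000 ten-byte lines; the file and prefix names are illustrative, and -C is assumed to be available (it is part of GNU coreutils).

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: 1000 lines of exactly 10 bytes each ("line 0001" + newline).
awk 'BEGIN { for (i = 1; i <= 1000; i++) printf "line %04d\n", i }' > largefile.log

# -b cuts at exact byte offsets, so a piece may end mid-line.
split -b 1024 largefile.log bytes_

# -C keeps whole lines, putting at most 1024 bytes in each piece.
split -C 1024 largefile.log lines_

# bytes_aa is exactly 1024 bytes; lines_aa is 1020 bytes (102 whole lines).
wc -c bytes_aa lines_aa

# The last line of bytes_aa is cut off after the word "line".
tail -n 1 bytes_aa
```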

Alternative Commands

Command | Use Case                          | Example
csplit  | Split by patterns or line numbers | csplit file.txt /pattern/ '{*}'
awk     | Split based on field values       | awk -F',' '{print > ($1 ".txt")}' file.csv
sed     | Extract a range of lines          | sed -n '1,100p' file > part1.txt
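To split a file in two at one specific line number, csplit accepts a bare line number in place of a pattern, and head with tail gives the same result. The following is a minimal sketch, assuming a hypothetical 250-line file.txt and a split point at line 101; first.txt and rest.txt are illustrative names.

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: 250 numbered lines.
seq 1 250 > file.txt

# A bare number N starts a new piece at line N:
# xx00 gets lines 1-100, xx01 gets lines 101-250. -s hides the byte counts.
csplit -s file.txt 101

# The same split using head and tail.
head -n 100 file.txt > first.txt
tail -n +101 file.txt > rest.txt

# cmp succeeds silently when the files are identical.
cmp xx00 first.txt && cmp xx01 rest.txt && echo "both methods agree"
```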

Using csplit for Pattern-Based Splitting

The csplit command allows splitting at specific patterns:

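csplit logfile.txt /ERROR/ '{*}'

This splits the file before every line matching "ERROR" and writes the pieces to xx00, xx01, xx02, and so on. Quoting '{*}' keeps the shell from interpreting the braces or the asterisk.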
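A small worked example shows where the cuts fall. It assumes GNU csplit and a hypothetical six-line logfile.txt written just for the demonstration.

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical log with two ERROR lines.
printf '%s\n' \
    'INFO start' \
    'INFO ok' \
    'ERROR disk full' \
    'INFO retry' \
    'ERROR timeout' \
    'INFO done' > logfile.txt

# Start a new piece at each line matching ERROR; -s hides the byte counts.
csplit -s logfile.txt /ERROR/ '{*}'

# xx00 holds the two lines before the first match;
# xx01 and xx02 each begin with an ERROR line.
head -n 1 xx01 xx02
```

Each matching line becomes the first line of a new piece, so the text before the first match ends up alone in xx00.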

Practical Use Cases

  • Log file processing: Breaking large log files for analysis

  • Database exports: Splitting large SQL dumps for easier import

  • Data transfer: Creating smaller files for network transfer

  • Parallel processing: Distributing work across multiple processes
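The parallel-processing case can be sketched with plain shell background jobs. This is an illustration under stated assumptions, not a tuned pipeline: bigfile.txt is a hypothetical 4000-line input, and counting lines that contain the digit 7 stands in for whatever per-piece work is actually needed.

```shell
# Work in a scratch directory so the demo does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: 4000 numbered lines, split into four pieces.
seq 1 4000 > bigfile.txt
split -l 1000 bigfile.txt part_

# Start one background job per piece; each writes its own result file.
for f in part_??; do
    grep -c 7 "$f" > "$f.count" &
done

# Block until every background job has finished.
wait

# Combine the per-piece results into one total.
total=$(cat part_??.count | awk '{ sum += $1 } END { print sum }')
echo "lines containing 7: $total"
```

Writing each job's result to its own file avoids interleaved output, and the combined total should equal what grep -c 7 bigfile.txt reports on the unsplit file.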

Conclusion

The split command provides an efficient and flexible way to divide large files into smaller, manageable pieces. With options for line-based and size-based splitting, it covers most file processing needs. Alternative tools like csplit, awk, and sed handle pattern-based splitting and other more complex requirements.

Updated on: 2026-03-17T09:01:38+05:30
