PDFBox - Splitting a PDF Document


In the previous chapter, we saw how to add JavaScript to a PDF document. Let us now learn how to split a given PDF document into multiple documents.

Splitting the Pages in a PDF Document

You can split a given PDF document into multiple PDF documents using the class named Splitter.

Following are the steps to split an existing PDF document.

Step 1: Loading an Existing PDF Document

Load an existing PDF document using the static method load() of the PDDocument class. This method accepts a File object as a parameter. Since it is a static method, you can invoke it using the class name as shown below.

File file = new File("path of the document");
PDDocument document = PDDocument.load(file);

Step 2: Instantiate the Splitter Class

The class named Splitter contains the methods to split the given PDF document. Therefore, instantiate this class as shown below.

Splitter splitter = new Splitter();

Step 3: Splitting the PDF Document

You can split the given document using the split() method of the Splitter class. This method accepts an object of the PDDocument class as a parameter.

List<PDDocument> Pages = splitter.split(document);

The split() method splits each page of the given document as an individual document and returns all these in the form of a list.
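The one-document-per-page behaviour can be checked without any input file. The following is a minimal sketch, assuming PDFBox is on the classpath; the class name SplitCountSketch and the helper countSplitParts are hypothetical names used only for illustration. It builds a document of blank pages in memory, splits it, and returns the number of documents produced.

```java
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

import java.io.IOException;
import java.util.List;

public class SplitCountSketch {

   // Hypothetical helper: builds an in-memory document with the given
   // number of blank pages, splits it, and returns how many documents result.
   static int countSplitParts(int pageCount) throws IOException {
      try (PDDocument source = new PDDocument()) {
         for (int i = 0; i < pageCount; i++) {
            source.addPage(new PDPage());
         }

         // With default settings, split() returns one document per page
         List<PDDocument> parts = new Splitter().split(source);
         int count = parts.size();

         // Each returned document is independent and must be closed
         for (PDDocument part : parts) {
            part.close();
         }
         return count;
      }
   }

   public static void main(String[] args) throws IOException {
      System.out.println("Documents created: " + countSplitParts(3));
   }
}
```

Each element of the returned list is a separate PDDocument, so each one needs its own close() call in addition to the source document.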

Step 4: Creating an Iterator Object

In order to traverse the list of documents acquired in the above step, get an iterator object of the list using the listIterator() method as shown below.

Iterator<PDDocument> iterator = Pages.listIterator();
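The iterator returned here is the standard java.util one, so the traversal pattern can be tried without PDFBox. The following is a minimal sketch that uses strings as stand-ins for the PDDocument objects; the class name IteratorNamingSketch and the helper outputNames are hypothetical and only illustrate how a counter alongside the iterator can produce numbered file names such as sample1.pdf and sample2.pdf.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class IteratorNamingSketch {

   // Hypothetical helper: walks the list with an iterator and builds one
   // numbered output name per element, starting from 1.
   static List<String> outputNames(List<String> pages, String prefix) {
      List<String> names = new ArrayList<>();
      Iterator<String> iterator = pages.listIterator();
      int i = 1;
      while (iterator.hasNext()) {
         iterator.next(); // in a real program this would be the PDDocument to save
         names.add(prefix + i++ + ".pdf");
      }
      return names;
   }

   public static void main(String[] args) {
      List<String> pages = Arrays.asList("page with image", "page with text");
      System.out.println(outputNames(pages, "sample"));
   }
}
```

An enhanced for loop over the list would visit the same elements in the same order; the explicit iterator is shown only because it matches the step above.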

Step 5: Closing the Document

Finally, close the document using the close() method of the PDDocument class as shown below.

document.close();

Suppose there is a PDF document named sample.pdf in the path C:\PdfBox_Examples\ and this document contains two pages: one page containing an image and another page containing text.

This example demonstrates how to split the above-mentioned PDF document. Here, we will split the PDF document named sample.pdf into two different documents, sample1.pdf and sample2.pdf. Save this code in a file named SplitPages.java.

import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.Iterator;

public class SplitPages {
   public static void main(String[] args) throws IOException {

      //Loading an existing PDF document
      File file = new File("C:/PdfBox_Examples/sample.pdf");
      PDDocument document = PDDocument.load(file);

      //Instantiating Splitter class
      Splitter splitter = new Splitter();

      //Splitting the pages of a PDF document
      List<PDDocument> Pages = splitter.split(document);

      //Creating an iterator
      Iterator<PDDocument> iterator = Pages.listIterator();

      //Saving each page as an individual document
      int i = 1;
      while(iterator.hasNext()) {
         PDDocument pd = iterator.next();
         pd.save("C:/PdfBox_Examples/sample"+ i++ +".pdf");

         //Closing the single-page document once it is saved
         pd.close();
      }
      System.out.println("Multiple PDF’s created");

      //Closing the source document
      document.close();
   }
}

Compile and execute the saved Java file from the command prompt using the following commands.

javac SplitPages.java 
java SplitPages

Upon execution, the above program splits the given PDF document and displays the following message.

Multiple PDF’s created

If you verify the given path, you can observe that two PDF documents were created with the names sample1.pdf and sample2.pdf, each containing one page of the original document.