How to read data from a PDF file and display it on the console in Java?

Several Java libraries can read data from a PDF. Let us see how to read data from a PDF document and display it on the console using a library named PDFBox.

You can extract text using the getText() method of the PDFTextStripper class, which extracts all the text from a given PDF document. To use it:

  • Load an existing PDF document using the static method load() of the PDDocument class.

  • Instantiate the PDFTextStripper class.

  • Read the contents of the PDF document into a String using the getText() method of the PDFTextStripper class.

  • Finally, close the document using the close() method of the PDDocument class as shown below.


Assume we have a PDF named Sample.pdf in the root of the D: drive.

The following Java program reads the contents of the above-mentioned PDF document and displays them on the console.

import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
public class PdfToConsole {
   public static void main(String args[]) throws IOException {
      //Loading an existing document (PDFBox 2.x API; PDFBox 3.x uses Loader.loadPDF(file) instead)
      File file = new File("D://Sample.pdf");
      PDDocument document = PDDocument.load(file);
      //Instantiate PDFTextStripper class
      PDFTextStripper pdfStripper = new PDFTextStripper();
      //Retrieving text from PDF document
      String text = pdfStripper.getText(document);
      //Closing the document
      document.close();
      //Displaying the extracted text on the console
      System.out.println(text);
   }
}


Running the program prints the text of the PDF document on the console, for example:

Tutorials Point originated from the idea that there exists a class of readers who respond
better to online content and prefer to learn new skills at their own pace from the comforts 
of their drawing rooms.
The journey commenced with a single tutorial on HTML in 2006 and elated by the response it
generated, we worked our way to adding fresh tutorials to our repository which now proudly 
flaunts a wealth of tutorials and allied articles on topics ranging from
programming languages to web designing to academics and much more.
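
The steps listed earlier close the document with an explicit close() call, which is skipped if getText() throws an exception. As a possible variation, the sketch below wraps the same PDFBox 2.x calls in a try-with-resources block so the document is closed either way. The class name PdfTextExtractor and the helper extractText() are hypothetical names chosen for illustration and are not part of PDFBox. The sketch assumes PDFBox 2.x is on the classpath.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

final class PdfTextExtractor {

   // Hypothetical helper (not part of PDFBox): returns all text of the given PDF.
   static String extractText(File file) throws IOException {
      // PDDocument implements Closeable, so try-with-resources closes it
      // even when getText() throws.
      try (PDDocument document = PDDocument.load(file)) {
         PDFTextStripper pdfStripper = new PDFTextStripper();
         return pdfStripper.getText(document);
      }
   }

   public static void main(String[] args) throws IOException {
      // Defaults to the path used in this article; pass another path as the first argument.
      String path = args.length > 0 ? args[0] : "D://Sample.pdf";
      System.out.println(extractText(new File(path)));
   }
}
```

Keeping the extraction in a separate method also makes it easy to reuse the text for something other than console output, such as writing it to a file.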