pdfinfo Command in Linux



The pdfinfo command in Linux extracts metadata and other details from a PDF document, such as its title, author, subject, page count, and creation date.


Syntax of pdfinfo Command

The syntax of the pdfinfo command is as follows −

pdfinfo [options] [PDF-file]

The [options] field is used to specify various options to change the command's behavior, and the [PDF-file] field is used to specify the PDF file whose metadata needs to be extracted.
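Because the syntax takes a single PDF-file argument, examining several files means calling pdfinfo once per file. The following is a minimal sketch for bash. The helper name pdfinfo_all is hypothetical, and the sketch assumes pdfinfo is on the PATH.

```shell
# Hypothetical helper: run pdfinfo on each file given as an argument,
# printing a "== file ==" label before each report.
pdfinfo_all() {
  local f status=0
  for f in "$@"; do
    printf '== %s ==\n' "$f"
    # Keep going if one file fails, but remember the failure.
    pdfinfo "$f" || status=1
  done
  return "$status"
}

# Usage: pdfinfo_all *.pdf
```

The loop continues past unreadable files and returns a non-zero status if pdfinfo failed on any of them.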

pdfinfo Command Options

The options of the Linux pdfinfo command are listed below −

Option                  Description
-f number               Specify the first page to examine.
-l number               Specify the last page to examine.
-box                    Display the bounding boxes of each page: MediaBox, CropBox, BleedBox, TrimBox, and ArtBox.
-meta                   Display the document-level metadata stream from the PDF Catalog.
-custom                 Display both custom and standard metadata.
-js                     Display all JavaScript in the PDF.
-struct                 Display the logical document structure (Tagged PDF files only).
-struct-text            Display the text content along with the document structure (Tagged PDF files only).
-isodates               Display dates in ISO-8601 format.
-rawdates               Display the raw, undecoded date strings from the PDF file.
-dests                  Display all named destinations in the PDF.
-url                    Display URLs referenced by PDF objects such as link annotations (text content is not scanned).
-enc encoding-name      Set the encoding for text output (default: UTF-8).
-listenc                List the available text encodings.
-opw password           Supply the owner password, which bypasses all security restrictions.
-upw password           Supply the user password needed to open the file.
-v                      Display version and copyright information.
-h, -help, --help, -?   Display usage information.
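The -isodates option produces dates that are convenient to process in scripts. As a sketch, the hypothetical helper below lists PDF files oldest first by their CreationDate field. It assumes the "CreationDate:" label used in pdfinfo's default output, and plain text sorting orders ISO-8601 timestamps correctly only when they share the same time-zone offset.

```shell
# Hypothetical helper: print "date<TAB>file" for each PDF argument,
# sorted by the ISO-8601 CreationDate reported by pdfinfo -isodates.
# Files without a readable CreationDate are skipped.
by_creation_date() {
  local f d
  for f in "$@"; do
    # Keep only the value after the "CreationDate:" label.
    d=$(pdfinfo -isodates "$f" 2>/dev/null | sed -n 's/^CreationDate:[[:space:]]*//p')
    if [ -n "$d" ]; then
      printf '%s\t%s\n' "$d" "$f"
    fi
  done | sort
}

# Usage: by_creation_date *.pdf
```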

Examples of pdfinfo Command in Linux

In this section, the usage of the pdfinfo command will be discussed with examples −

Displaying Information of a PDF File

To display the information of a PDF file, pass the file name or path to the pdfinfo command −

pdfinfo document.pdf
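In scripts, you often need a single field rather than the whole report. A minimal sketch, assuming pdfinfo's usual layout of one "Field:   value" pair per line; the helper name pdf_field is hypothetical.

```shell
# Hypothetical helper: print the value of one field (e.g. "Pages",
# "Title", "Page size") from pdfinfo output read on standard input.
pdf_field() {
  local key=$1
  # Strip the "Key:" label and the padding after it; print matching lines only.
  sed -n "s/^${key}:[[:space:]]*//p"
}

# Usage: pdfinfo document.pdf | pdf_field Pages
```

Only the first colon after the field name is treated as the separator, so values that contain colons, such as titles, are printed intact.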

Displaying Information of a Range of Pages

To examine a specific range of pages, use the -f (first page) and -l (last page) options. When a range is given, pdfinfo prints the size of each page in the range; otherwise, only the first page is examined. For example, to examine pages 2 to 4, use the command given below −

pdfinfo -f 2 -l 4 document.pdf
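The per-page size lines can be reshaped into rows for further processing. The hypothetical helper below assumes the "Page N size: W x H pts" layout that pdfinfo uses when a page range is given; check the exact layout produced by your version before relying on it.

```shell
# Hypothetical helper: turn per-page "Page N size: W x H pts" lines
# (read on stdin) into "N W H" rows, with sizes in points.
page_sizes() {
  # Fields: $1="Page" $2=page number $3="size:" $4=width $5="x" $6=height
  awk '$1 == "Page" && $3 == "size:" { print $2, $4, $6 }'
}

# Usage: pdfinfo -f 2 -l 4 document.pdf | page_sizes
```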

Displaying Bounding Box

In a PDF file, a bounding box is a rectangular frame that defines the outer limits or bounds of an object, page, or graphic element. It is used to determine the size, position, and space an object occupies.

To display the bounding boxes for each page, such as MediaBox, CropBox, BleedBox, TrimBox, and ArtBox, use the -box option −

pdfinfo -box document.pdf

The above command outputs detailed information about the dimensions of the various bounding boxes used in the PDF layout. This is useful for printing and layout design.
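A box's width and height can be computed from its corner coordinates. A minimal sketch, assuming each box line ends with four numbers x1 y1 x2 y2 (lower-left and upper-right corners) and that the PDF uses the default unit of 1/72 inch (one point); box_size is a hypothetical name.

```shell
# Hypothetical helper: print width and height of one bounding box
# (MediaBox by default) from pdfinfo -box output read on stdin.
box_size() {
  awk -v box="${1:-MediaBox}" '
    index($0, box ":") {
      w = $(NF-1) - $(NF-3)   # x2 - x1
      h = $NF - $(NF-2)       # y2 - y1
      # Assumes the default PDF unit: 1 point = 1/72 inch.
      printf "%.2f x %.2f pts (%.2f x %.2f in)\n", w, h, w / 72, h / 72
    }'
}

# Usage: pdfinfo -box document.pdf | box_size CropBox
```

With a page range, one line is printed per matching page, in page order.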

Displaying Metadata

To display the document-level metadata stream of a PDF file, use the -meta option −

pdfinfo -meta document.pdf

To display custom and standard metadata, use the -custom option −

pdfinfo -custom document.pdf

Listing Supported Encodings

To list the text encodings that pdfinfo supports, use the -listenc option (no PDF file is required) −

pdfinfo -listenc

Setting Encoding

By default, the text output is UTF-8 encoded. To use a different encoding, pass one of the names listed by -listenc to the -enc option −

pdfinfo -enc UTF-16 document.pdf

This is useful when the output is consumed by tools or systems that expect an encoding other than UTF-8.
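The set of encoding names can differ between builds, so a script may confirm that an encoding is listed before passing it to -enc. A sketch with a hypothetical wrapper name, assuming -listenc prints one encoding name per line on standard output:

```shell
# Hypothetical wrapper: run "pdfinfo -enc ENC FILE" only if ENC appears
# as a whole line in the output of pdfinfo -listenc.
pdfinfo_with_enc() {
  local enc=$1 file=$2
  # -F: fixed string, -x: match the whole line, -q: no output.
  if ! pdfinfo -listenc | grep -Fqx -- "$enc"; then
    printf 'encoding not available: %s\n' "$enc" >&2
    return 2
  fi
  pdfinfo -enc "$enc" "$file"
}

# Usage: pdfinfo_with_enc Latin1 document.pdf
```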

Handling Password Protected PDFs

If the owner of a PDF has set security restrictions (for example, on printing or copying), supplying the owner password with the -opw option bypasses all of them −

pdfinfo -opw owner_password document.pdf

To open a PDF that is encrypted with a user password, supply that password with the -upw option −

pdfinfo -upw user_password document.pdf
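A PDF restricted only by an owner password can still be opened without any password, and pdfinfo then reports it with an "Encrypted: yes (...)" line listing the permissions. The hypothetical helper below detects that line. A file that requires a user password cannot be opened without -upw, so pdfinfo reports an error instead of printing this line.

```shell
# Hypothetical helper: succeed if pdfinfo output on stdin reports
# "Encrypted: yes ...". The bracketed permission flags vary by file.
is_encrypted() {
  sed -n 's/^Encrypted:[[:space:]]*//p' | grep -q '^yes'
}

# Usage:
#   pdfinfo document.pdf | is_encrypted && echo "document.pdf is encrypted"
```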

Displaying All URLs in a PDF File

To list the URLs referenced by objects in a PDF file, such as link annotations, use the -url option (URLs that appear only as page text are not detected) −

pdfinfo -url document.pdf
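To collect only the distinct URLs, the sketch below assumes a tabular -url layout in which each URL row starts with a page number and ends with the URL; verify this against your pdfinfo version. The helper name unique_urls is hypothetical.

```shell
# Hypothetical helper: print each distinct URL from pdfinfo -url output
# read on stdin. Header lines (not starting with a number) are skipped.
unique_urls() {
  awk '$1 ~ /^[0-9]+$/ && NF >= 3 { print $NF }' | sort -u
}

# Usage: pdfinfo -url document.pdf | unique_urls
```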

Displaying Help

To display help related to the pdfinfo command, use the -h option −

pdfinfo -h

Conclusion

The Linux pdfinfo command is a handy tool for extracting detailed metadata and information from PDF documents. It provides insights into attributes such as title, author, subject, page count, and creation date. Its behavior can be customized with various options, including examining specific page ranges, displaying bounding boxes, or accessing document-level metadata. It supports handling password-protected files and allows text encoding adjustments for compatibility.

In this tutorial, we explained the pdfinfo command, its syntax, options, and usage in Linux with examples.
