Apache Tika - Extracting ODP



Example - Extracting Content and Metadata from an ODP Presentation

Given below is a program to extract content and metadata from an ODP file.

TikaDemo.java

package com.tutorialspoint.tika;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.odf.OpenDocumentParser;
import org.apache.tika.sax.BodyContentHandler;

import org.xml.sax.SAXException;

public class TikaDemo {

   public static void main(final String[] args) throws IOException, SAXException, TikaException {

      // Handler that collects the text of the document body
      BodyContentHandler handler = new BodyContentHandler();
      Metadata metadata = new Metadata();
      ParseContext pcontext = new ParseContext();

      // OpenDocument parser
      OpenDocumentParser openofficeparser = new OpenDocumentParser();

      // try-with-resources closes the input stream even if parsing fails
      try (FileInputStream inputstream = new FileInputStream(new File("D:/projects/example.odp"))) {
         openofficeparser.parse(inputstream, handler, metadata, pcontext);
      }

      System.out.println("Contents of the document:" + handler.toString());
      System.out.println("Metadata of the document:");
      String[] metadataNames = metadata.names();

      for (String name : metadataNames) {
         System.out.println(name + " :  " + metadata.get(name));
      }
   }
}
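
As background to the program above: an ODP file is a ZIP package, and the OpenDocument format stores the media type of the document in an entry named mimetype. The sketch below uses only the Java standard library and is not how Tika itself is implemented; it just shows where a value such as application/vnd.oasis.opendocument.presentation can come from. The class name OdfPackageSketch, both helper methods, and the in-memory sample package are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of Apache Tika
class OdfPackageSketch {

   // Returns the text of the "mimetype" entry of a ZIP package, or null if there is none
   static String readMimeType(InputStream in) {
      try (ZipInputStream zip = new ZipInputStream(in)) {
         ZipEntry entry;
         while ((entry = zip.getNextEntry()) != null) {
            if ("mimetype".equals(entry.getName())) {
               // readAllBytes() stops at the end of the current ZIP entry
               return new String(zip.readAllBytes(), StandardCharsets.US_ASCII).trim();
            }
         }
         return null;
      } catch (IOException e) {
         throw new UncheckedIOException(e);
      }
   }

   // Builds a tiny stand-in package in memory (illustrative data, not a valid presentation)
   static byte[] buildSamplePackage() {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
         zip.putNextEntry(new ZipEntry("mimetype"));
         zip.write("application/vnd.oasis.opendocument.presentation".getBytes(StandardCharsets.US_ASCII));
         zip.closeEntry();
         zip.putNextEntry(new ZipEntry("meta.xml"));
         zip.write("<meta/>".getBytes(StandardCharsets.UTF_8));
         zip.closeEntry();
      } catch (IOException e) {
         throw new UncheckedIOException(e);
      }
      return bytes.toByteArray();
   }

   public static void main(String[] args) {
      byte[] sample = buildSamplePackage();
      System.out.println(readMimeType(new ByteArrayInputStream(sample)));
   }
}
```

To try it on a real file, pass a FileInputStream for the ODP file to readMimeType instead of the in-memory sample. The output listed in the Output section below belongs to TikaDemo.java, not to this sketch.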

Output

Given below is a snapshot of the example.odp file.

(Image: Presentation)

This document has the following properties:

(Image: Example2)

After compiling and running the program, you will get the following output.


Contents of the document:
Apache Tika	

	Apache Tika is a framework for content type detection and content extraction which was designed by Apache software foundation. It detects and extracts metadata and structured text content from different types of documents such as spreadsheets, text documents, images or PDFs including audio or video input formats to certain extent. 

Metadata of the document:
meta:paragraph-count :  2
meta:word-count :  57
odf:version :  1.3
dc:creator :  Mahesh Parashar
extended-properties:TotalTime :  PT150S
generator :  MicrosoftOffice/14.0 MicrosoftPowerPoint
dcterms:created :  2025-10-27T10:44:41Z
dcterms:modified :  2025-10-27T10:47:12Z
editing-cycles :  1
Content-Type :  application/vnd.oasis.opendocument.presentation
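
Tika returns every metadata value as a string. Several values in the output above appear to follow ISO-8601: PT150S reads as a duration of 150 seconds, and the dcterms:created and dcterms:modified values read as UTC timestamps. Assuming that format holds, the following minimal sketch converts them into typed values with java.time. The class MetadataValuesSketch and its methods are hypothetical names used only for illustration; the input strings are copied from the output above.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper class, not part of Apache Tika
class MetadataValuesSketch {

   // Parses an ISO-8601 duration such as "PT150S" and returns it in whole seconds
   static long durationSeconds(String isoDuration) {
      return Duration.parse(isoDuration).getSeconds();
   }

   // Returns the number of seconds between two ISO-8601 UTC timestamps
   static long secondsBetween(String earlier, String later) {
      return Duration.between(Instant.parse(earlier), Instant.parse(later)).getSeconds();
   }

   public static void main(String[] args) {
      // Value of extended-properties:TotalTime
      System.out.println(durationSeconds("PT150S")); // prints 150

      // Values of dcterms:created and dcterms:modified
      System.out.println(secondsBetween("2025-10-27T10:44:41Z", "2025-10-27T10:47:12Z")); // prints 151
   }
}
```

In a real program, the strings would come from metadata.get(name) rather than from literals. Both parse methods throw DateTimeParseException when a value does not match the expected format.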