Apache Tika - Extracting MS Office Files
Example - Extracting Content and Metadata from an Excel Sheet
The following program extracts the content and metadata of a Microsoft Excel (.xlsx) workbook using Tika's OOXMLParser.
TikaDemo.java
package com.tutorialspoint.tika;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.microsoft.ooxml.OOXMLParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;
public class TikaDemo {
   public static void main(final String[] args) throws IOException, TikaException, SAXException {
      // Handler that collects the text of the document body
      BodyContentHandler handler = new BodyContentHandler();
      Metadata metadata = new Metadata();
      ParseContext pcontext = new ParseContext();

      // OOXML parser for Office Open XML formats such as .xlsx
      OOXMLParser msofficeparser = new OOXMLParser();

      // try-with-resources closes the stream once parsing is done
      try (FileInputStream inputstream = new FileInputStream(new File("D:/projects/example.xlsx"))) {
         msofficeparser.parse(inputstream, handler, metadata, pcontext);
      }

      System.out.println("Contents of the document:" + handler.toString());
      System.out.println("Metadata of the document:");

      // Print every metadata name together with its value
      String[] metadataNames = metadata.names();
      for (String name : metadataNames) {
         System.out.println(name + ": " + metadata.get(name));
      }
   }
}
Output
Here we pass the following sample Excel file, example.xlsx, to the program.
[Figure: the sample Excel file]
The Excel file has the following properties:
[Figure: properties of the sample Excel file]
After executing the above program you will get the following output.
Contents of the document:
Sheet1
Name Age Designation Salary
Ramu 50 Manager 50000
Raheem 40 Assistant Manager 40000
Robert 30 Supervisor 30000
Sita 25 Clerk 25000
Sameer 25 Section Incharge 20000

Metadata of the document:
extended-properties:AppVersion: 16.0300
protected: false
extended-properties:Application: Microsoft Excel
meta:last-author: Mahesh Parashar
extended-properties:DocSecurityString: None
dc:creator: Mahesh Parashar
extended-properties:Company:
dcterms:created: 2025-10-27T10:56:20Z
dcterms:modified: 2025-10-27T10:58:35Z
X-TIKA:origResourceName: D:\Projects\
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
dc:publisher:
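Many of the metadata names in the output above carry a prefix before the first colon, such as dc, dcterms or extended-properties. The sketch below shows one possible way to group the names by that prefix so the output is easier to scan. MetadataGrouping, prefixOf and groupByPrefix are hypothetical helpers written for this illustration and are not part of the Tika API. The example uses only the JDK and a hard-coded subset of the names shown above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Tika: groups metadata names by the
// prefix that appears before the first colon (for example "dc").
class MetadataGrouping {

    // Returns the prefix of a metadata name, or "" when the name has no colon.
    static String prefixOf(String name) {
        int colon = name.indexOf(':');
        return colon < 0 ? "" : name.substring(0, colon);
    }

    // Groups names by prefix, preserving the order in which prefixes first appear.
    static Map<String, List<String>> groupByPrefix(String[] names) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String name : names) {
            groups.computeIfAbsent(prefixOf(name), k -> new ArrayList<>()).add(name);
        }
        return groups;
    }

    public static void main(String[] args) {
        // A subset of the names from the sample output. With Tika on the
        // classpath, the array returned by metadata.names() could be used instead.
        String[] names = {
            "extended-properties:AppVersion", "protected",
            "extended-properties:Application", "meta:last-author",
            "dc:creator", "dcterms:created", "dcterms:modified",
            "X-TIKA:origResourceName", "Content-Type", "dc:publisher"
        };
        groupByPrefix(names).forEach((prefix, members) ->
            System.out.println((prefix.isEmpty() ? "(no prefix)" : prefix) + " -> " + members));
    }
}
```

In the TikaDemo program, the same grouping could be applied by passing metadataNames to groupByPrefix, assuming the helper class is on the classpath.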