Extract Content from Excel Sheet


Problem Description

How to extract content from an Excel sheet using Java.

Solution

The following program extracts the text content and metadata of an Excel sheet using Java and the Apache Tika OOXMLParser.

import java.io.File;
import java.io.FileInputStream;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.microsoft.ooxml.OOXMLParser;
import org.apache.tika.sax.BodyContentHandler;

public class ExtractContentFromExcel {
   public static void main(String[] args) throws Exception {

      // Handler that collects the text content of the document body
      BodyContentHandler handler = new BodyContentHandler();

      // Container that the parser fills with the document metadata
      Metadata metadata = new Metadata();
      ParseContext pcontext = new ParseContext();

      // OOXML parser, which handles Office Open XML files such as .xlsx
      OOXMLParser msofficeparser = new OOXMLParser();

      // try-with-resources closes the input stream once parsing finishes
      try (FileInputStream inputstream = new FileInputStream(new File(
         "C:/tika/excelExample.xlsx"))) {
         msofficeparser.parse(inputstream, handler, metadata, pcontext);
      }

      System.out.println("Contents of the document:" + handler.toString());
      System.out.println("Metadata of the document:");
      String[] metadataNames = metadata.names();

      for(String name : metadataNames) {
         System.out.println(name + ": " + metadata.get(name));
      }
   }
}

Input

Excel Example (image of the input workbook, excelExample.xlsx)

Output

Contents of the document:Sheet1 
   ID   NAME     BRANCH     PERCENTAGE       EMAIL 
   1     Ram       IT          85         ram123@gmail.com 
   2    Rahim      EEE         95         rahim123@gmail.com 
   3    Robert     ECE         90         robert123@gmail.com  

Metadata of the document: 
date: 2017-05-19T09:35:57Z 
extended-properties:AppVersion: 16.0300 
meta:creation-date: 2015-06-05T18:17:20Z 
extended-properties:Application: Microsoft Excel 
extended-properties:Company:  
Creation-Date: 2015-06-05T18:17:20Z 
dcterms:created: 2015-06-05T18:17:20Z 
Last-Modified: 2017-05-19T09:35:57Z 
dcterms:modified: 2017-05-19T09:35:57Z 
Last-Save-Date: 2017-05-19T09:35:57Z 
Application-Version: 16.0300 
protected: false 
meta:save-date: 2017-05-19T09:35:57Z 
Application-Name: Microsoft Excel 
modified: 2017-05-19T09:35:57Z 
publisher:  
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet 
dc:publisher: 
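
The extracted content arrives as one plain-text string, so any further processing has to split it back into rows and cells. The following is a minimal sketch of that step using only the Java standard library. The class name ExtractedTextRows and the method parseRows are hypothetical names introduced for illustration, not part of Tika. It assumes that cells are separated by whitespace, that no cell value contains whitespace, and that a line with a single token (such as the sheet name) is not a data row; real workbooks may not satisfy these assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: not part of Apache Tika.
class ExtractedTextRows {

    // Splits extracted sheet text into rows of cell values.
    // Assumption: cells are separated by whitespace and contain none themselves.
    static List<String[]> parseRows(String text) {
        List<String[]> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = trimmed.split("\\s+");
            if (cells.length < 2) {
                continue; // assumption: a single token is a sheet name, not a row
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample text shaped like the output shown above
        String text = "Sheet1 \n"
            + "   ID   NAME     BRANCH     PERCENTAGE       EMAIL \n"
            + "   1     Ram       IT          85         ram123@gmail.com \n"
            + "   2    Rahim      EEE         95         rahim123@gmail.com \n";

        for (String[] row : parseRows(text)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

In the program above, the same method could be called as parseRows(handler.toString()). If cell values may contain spaces, splitting on whitespace is not reliable and a different delimiter strategy would be needed.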