
Apache POI Word - Text Extraction
This chapter explains how to extract plain text from a Word document using Java. If you need to extract metadata from a Word document, use Apache Tika.
For .docx files, we use the class org.apache.poi.xwpf.extractor.XWPFWordExtractor, which extracts and returns the plain text of a Word file. Similar approaches exist for extracting headings, footnotes, table data, etc. from a Word file.
The following code shows how to extract simple text from a Word file −
// create a document object from an existing Word document
XWPFDocument docx = new XWPFDocument(new FileInputStream("example.docx"));

// use the XWPFWordExtractor class
XWPFWordExtractor we = new XWPFWordExtractor(docx);

// extract the text
System.out.println(we.getText());
Example - Extracting Text from a Document
ApachePoiDocDemo.java
package com.tutorialspoint;

import java.io.FileInputStream;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class ApachePoiDocDemo {
   public static void main(String[] args) throws Exception {
      XWPFDocument docx = new XWPFDocument(new FileInputStream("example.docx"));

      // using XWPFWordExtractor Class
      XWPFWordExtractor we = new XWPFWordExtractor(docx);
      System.out.println(we.getText());
      we.close();
   }
}
Output
It will generate the following output −
At tutorialspoint.com, we strive hard to provide quality tutorials for self-learning purpose in the domains of Academics, Information Technology, Management and Computer Programming Languages. The endeavour started by Mohtashim, an AMU alumni, who is the founder and the managing director of Tutorials Point (I) Pvt. Ltd. He came up with the website tutorialspoint.com in year 2006 with the help of handpicked freelancers, with an array of tutorials for computer programming languages.
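Headings and table data, mentioned at the start of this chapter, can be read by walking the document model instead of using XWPFWordExtractor. The following is a minimal sketch rather than a definitive implementation. The class name TableTextDemo, the helpers buildSampleDocx, headings, and tableRows, and the sample content are all hypothetical. It also assumes that heading paragraphs carry a style ID starting with "Heading", which depends on the template the document was created from. The sketch builds a small .docx in memory so it can run without an input file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

public class TableTextDemo {

   // Hypothetical helper: builds a tiny .docx in memory (one heading, one 2x2 table).
   static byte[] buildSampleDocx() throws IOException {
      try (XWPFDocument doc = new XWPFDocument();
           ByteArrayOutputStream out = new ByteArrayOutputStream()) {
         XWPFParagraph heading = doc.createParagraph();
         heading.setStyle("Heading1"); // assumed style ID; real documents may differ
         heading.createRun().setText("Quarterly report");

         XWPFTable table = doc.createTable(2, 2);
         table.getRow(0).getCell(0).setText("Region");
         table.getRow(0).getCell(1).setText("Sales");
         table.getRow(1).getCell(0).setText("North");
         table.getRow(1).getCell(1).setText("120");

         doc.write(out);
         return out.toByteArray();
      }
   }

   // Collects the text of body paragraphs whose style ID starts with "Heading".
   static List<String> headings(XWPFDocument doc) {
      List<String> result = new ArrayList<>();
      for (XWPFParagraph p : doc.getParagraphs()) {
         String style = p.getStyle(); // null when the paragraph has no explicit style
         if (style != null && style.startsWith("Heading")) {
            result.add(p.getText());
         }
      }
      return result;
   }

   // Returns one tab-separated line per table row.
   static List<String> tableRows(XWPFDocument doc) {
      List<String> lines = new ArrayList<>();
      for (XWPFTable table : doc.getTables()) {
         for (XWPFTableRow row : table.getRows()) {
            List<String> cells = new ArrayList<>();
            for (XWPFTableCell cell : row.getTableCells()) {
               cells.add(cell.getText());
            }
            lines.add(String.join("\t", cells));
         }
      }
      return lines;
   }

   public static void main(String[] args) throws IOException {
      byte[] docx = buildSampleDocx();
      // Re-open the saved bytes, as one would open a file with FileInputStream.
      try (XWPFDocument doc = new XWPFDocument(new ByteArrayInputStream(docx))) {
         System.out.println("Headings: " + headings(doc));
         for (String line : tableRows(doc)) {
            System.out.println(line);
         }
      }
   }
}
```

To run it against a real file, replace the ByteArrayInputStream with new FileInputStream("example.docx"). The program prints the heading texts it finds, followed by one tab-separated line per table row.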