Apache POI Word - Text Extraction
This chapter explains how to extract simple text data from a Word document using Java. To extract metadata from a Word document, use Apache Tika instead.
For .docx files, we use the class org.apache.poi.xwpf.extractor.XWPFWordExtractor, which extracts and returns the plain text of a Word file. Similarly, other approaches exist for extracting headings, footnotes, table data, etc. from a Word file.
The following code shows how to extract simple text from a Word file −
import java.io.FileInputStream;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class WordExtractor {
   public static void main(String[] args) throws Exception {
      // Open the .docx file; try-with-resources closes the stream and the document
      try (FileInputStream in = new FileInputStream("createparagraph.docx");
           XWPFDocument docx = new XWPFDocument(in)) {
         // XWPFWordExtractor returns the text of the document as a single string
         XWPFWordExtractor we = new XWPFWordExtractor(docx);
         System.out.println(we.getText());
      }
   }
}
Save the above code as WordExtractor.java. Compile and execute it from the command prompt as follows −
$javac WordExtractor.java
$java WordExtractor
It will generate the following output −
At tutorialspoint.com, we strive hard to provide quality tutorials for self-learning purpose in the domains of Academics, Information Technology, Management and Computer Programming Languages.
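The chapter mentions that table data can also be extracted from a Word file. One possible way to do this, sketched below, is to walk the document model directly through XWPFDocument.getParagraphs() and XWPFDocument.getTables() instead of going through XWPFWordExtractor. The class name TableTextExtractor and the helper methods extractParagraphs and extractTableRows are hypothetical names chosen for this illustration, and the sample document is built in memory so that the sketch does not depend on an existing file.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

public class TableTextExtractor {

   // Hypothetical helper: returns the text of each body paragraph
   static List<String> extractParagraphs(XWPFDocument docx) {
      List<String> paragraphs = new ArrayList<>();
      for (XWPFParagraph p : docx.getParagraphs()) {
         paragraphs.add(p.getText());
      }
      return paragraphs;
   }

   // Hypothetical helper: returns one tab-separated string per table row
   static List<String> extractTableRows(XWPFDocument docx) {
      List<String> rows = new ArrayList<>();
      for (XWPFTable table : docx.getTables()) {
         for (XWPFTableRow row : table.getRows()) {
            List<String> cells = new ArrayList<>();
            for (XWPFTableCell cell : row.getTableCells()) {
               cells.add(cell.getText());
            }
            rows.add(String.join("\t", cells));
         }
      }
      return rows;
   }

   public static void main(String[] args) throws Exception {
      // Build a small sample document in memory (assumption: no input file is needed)
      try (XWPFDocument docx = new XWPFDocument()) {
         docx.createParagraph().createRun().setText("Course list");

         XWPFTable table = docx.createTable(2, 2);
         table.getRow(0).getCell(0).setText("Course");
         table.getRow(0).getCell(1).setText("Level");
         table.getRow(1).getCell(0).setText("Java");
         table.getRow(1).getCell(1).setText("Beginner");

         // Print body paragraphs first, then each table row
         for (String text : extractParagraphs(docx)) {
            System.out.println(text);
         }
         for (String row : extractTableRows(docx)) {
            System.out.println(row);
         }
      }
   }
}
```

To read an existing file instead, the document can be opened with new XWPFDocument(new FileInputStream("createparagraph.docx")) as in the example above. Note that this sketch only visits top-level tables and body paragraphs; nested tables, headers, and footnotes would need additional handling.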