TIKA - File Formats
File Formats Supported by Tika
The following table shows the file formats Tika supports.
| File format | Package / library | Class in Tika |
|---|---|---|
| XML | org.apache.tika.parser.xml | XMLParser |
| HTML | org.apache.tika.parser.html (uses the TagSoup library) | HtmlParser |
| MS Office compound documents (OLE2 up to 2007, OOXML from 2007 onwards) | org.apache.tika.parser.microsoft and org.apache.tika.parser.microsoft.ooxml (use the Apache POI library) | OfficeParser (OLE2), OOXMLParser (OOXML) |
| OpenDocument Format (OpenOffice) | org.apache.tika.parser.odf | OpenDocumentParser |
| Portable Document Format (PDF) | org.apache.tika.parser.pdf (uses the Apache PDFBox library) | PDFParser |
| Electronic Publication Format (digital books) | org.apache.tika.parser.epub | EpubParser |
| Rich Text format | org.apache.tika.parser.rtf | RTFParser |
| Compression and packaging formats | org.apache.tika.parser.pkg (uses the Apache Commons Compress library) | PackageParser, CompressorParser, and their subclasses |
| Text format | org.apache.tika.parser.txt | TXTParser |
| Feed and syndication formats | org.apache.tika.parser.feed | FeedParser |
| Audio formats | org.apache.tika.parser.audio and org.apache.tika.parser.mp3 | AudioParser, MidiParser, Mp3Parser (for MP3) |
| Image formats | org.apache.tika.parser.jpeg | JpegParser (for JPEG images) |
| Video formats | org.apache.tika.parser.mp4 and org.apache.tika.parser.video (the latter uses a simple internal algorithm to parse Flash video formats) | MP4Parser, FLVParser |
| Java class files and JAR files | org.apache.tika.parser.asm | ClassParser, CompressorParser |
| Mbox format (email messages) | org.apache.tika.parser.mbox | MboxParser |
| CAD formats | org.apache.tika.parser.dwg | DWGParser |
| Font formats | org.apache.tika.parser.font | TrueTypeParser |
| Executable programs and libraries | org.apache.tika.parser.executable | ExecutableParser |
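
Each row of the table pairs a package with a parser class, so the two columns together give a fully qualified class name. The following is a minimal sketch of that idea, not part of Tika itself: the class `TikaParserLookup`, its short format keys, and its helper methods are hypothetical names chosen for illustration. It uses only the Java standard library and checks by reflection whether a parser class is present, which assumes the Tika parser JARs are on the classpath for the check to succeed.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: maps a few formats from the table to parser class names.
final class TikaParserLookup {

    // Package + class taken from the table rows; the short keys are this sketch's own labels.
    private static final Map<String, String> PARSERS = new LinkedHashMap<>();
    static {
        PARSERS.put("xml",  "org.apache.tika.parser.xml.XMLParser");
        PARSERS.put("html", "org.apache.tika.parser.html.HtmlParser");
        PARSERS.put("pdf",  "org.apache.tika.parser.pdf.PDFParser");
        PARSERS.put("epub", "org.apache.tika.parser.epub.EpubParser");
        PARSERS.put("rtf",  "org.apache.tika.parser.rtf.RTFParser");
        PARSERS.put("txt",  "org.apache.tika.parser.txt.TXTParser");
    }

    // Returns the fully qualified parser class name for a format key, if listed.
    static Optional<String> classNameFor(String format) {
        return Optional.ofNullable(PARSERS.get(format.toLowerCase(Locale.ROOT)));
    }

    // Reports whether the named class can be found, without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, TikaParserLookup.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Print each listed parser and whether it is available in this JVM.
        for (Map.Entry<String, String> entry : PARSERS.entrySet()) {
            System.out.printf("%-5s %s (on classpath: %b)%n",
                    entry.getKey(), entry.getValue(), isOnClasspath(entry.getValue()));
        }
    }
}
```

Run without the Tika JARs, every entry reports `false`. With the parser modules on the classpath, the same names should resolve, though package layout may differ between Tika versions.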