TIKA - Referenced API



Users can embed Tika in their applications through the Tika facade class, which exposes methods covering Tika's main functionality. Because it is a facade, it hides the complexity behind those functions. Users can also work directly with the other Tika classes in their applications.

Tika Class (facade)

This is the most prominent class of the Tika library, and it follows the facade design pattern: it hides the internal implementation and provides simple methods for accessing Tika's functionality. The following table lists the constructors of this class along with their descriptions.

package − org.apache.tika

class − Tika

Sr.No. | Constructor | Description
1 | Tika() | Constructs the Tika facade using the default configuration.
2 | Tika(Detector detector) | Creates a Tika facade that uses the given detector instance.
3 | Tika(Detector detector, Parser parser) | Creates a Tika facade that uses the given detector and parser instances.
4 | Tika(Detector detector, Parser parser, Translator translator) | Creates a Tika facade that uses the given detector, parser, and translator instances.
5 | Tika(TikaConfig config) | Creates a Tika facade from the given TikaConfig object.
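
The constructors above can be exercised with a short sketch like the following. It assumes that Tika (at least tika-core) is on the classpath; the class name TikaConstruction, its helper methods, and the file name report.pdf are hypothetical and chosen only for illustration. The detector and parser are taken from the default TikaConfig, which is one reasonable choice among several.

```java
import org.apache.tika.Tika;
import org.apache.tika.config.TikaConfig;
import org.apache.tika.detect.Detector;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.Parser;

// Hypothetical demo class: three ways of building the Tika facade.
class TikaConstruction {

    // Constructor 1: default configuration.
    static Tika withDefaults() {
        return new Tika();
    }

    // Constructor 5: an explicit TikaConfig (here simply the default one).
    static Tika fromConfig() {
        return new Tika(TikaConfig.getDefaultConfig());
    }

    // Constructor 3: a detector and a parser supplied by the caller.
    static Tika fromParts() {
        TikaConfig config = TikaConfig.getDefaultConfig();
        Detector detector = config.getDetector();
        Parser parser = new AutoDetectParser(config);
        return new Tika(detector, parser);
    }

    public static void main(String[] args) {
        // detect(String) guesses the media type from a file name alone,
        // so no file needs to exist for this demonstration.
        System.out.println(withDefaults().detect("report.pdf"));
        System.out.println(fromConfig().detect("report.pdf"));
        System.out.println(fromParts().detect("report.pdf"));
    }
}
```

With the default configuration, all three facades should behave alike; the explicit forms become useful once a custom detector, parser, or configuration file is involved.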

Methods and Description

The following are the important methods of the Tika facade class −

Sr.No. | Method | Description
1 | String parseToString(File file) | Parses the given file and returns the extracted text content as a String. Overloaded variants accept other sources, such as an InputStream or a URL. By default, the length of the returned string is limited.
2 | int getMaxStringLength() | Returns the maximum length of the strings returned by the parseToString methods.
3 | void setMaxStringLength(int maxStringLength) | Sets the maximum length of the strings returned by the parseToString methods.
4 | Reader parse(File file) | Parses the given file and returns the extracted text content as a java.io.Reader object. Overloaded variants accept other sources.
5 | String detect(InputStream stream, Metadata metadata) | Detects the media type of the document in the given InputStream, using the given Metadata object as a hint, and returns the type name as a String. Overloaded variants accept other inputs. This method hides the detection mechanisms used by Tika.
6 | String translate(InputStream text, String targetLanguage) | Translates the given text into the target language, attempting to auto-detect the source language. Overloaded variants accept other inputs.
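
As a rough illustration of the detection and text-extraction methods listed above, the sketch below runs them against an in-memory document. It assumes Tika is on the classpath; text extraction additionally assumes the Tika parser modules are available (with tika-core alone, the extracted text may be empty). The class name TikaFacadeDemo and its helper methods are hypothetical, and checked exceptions are wrapped only to keep the demo short.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;

// Hypothetical demo class for the facade's detect and parseToString methods.
class TikaFacadeDemo {

    // Detects the media type of an in-memory document.
    static String detectType(byte[] document) {
        Tika tika = new Tika();
        try (InputStream in = new ByteArrayInputStream(document)) {
            return tika.detect(in, new Metadata());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Extracts text, capping the length of the returned string.
    static String extractText(byte[] document, int maxLength) {
        Tika tika = new Tika();
        tika.setMaxStringLength(maxLength);
        try (InputStream in = new ByteArrayInputStream(document)) {
            return tika.parseToString(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (TikaException e) {
            throw new IllegalStateException("Tika could not parse the document", e);
        }
    }

    public static void main(String[] args) {
        byte[] sample = "Hello from Tika".getBytes(StandardCharsets.UTF_8);
        System.out.println("type: " + detectType(sample));
        System.out.println("text: " + extractText(sample, 1000));
        System.out.println("default limit: " + new Tika().getMaxStringLength());
    }
}
```

For real files, the File-based variants shown in the table can be called the same way; the in-memory stream is used here only so the sketch does not depend on a file existing.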

Parser Interface

This interface is implemented by all the parser classes in Tika.

package − org.apache.tika.parser

Interface − Parser

Methods and Description

The following is the important method of the Parser interface −

Sr.No. | Method | Description
1 | void parse(InputStream stream, ContentHandler handler, Metadata metadata, ParseContext context) | Parses the given document into a sequence of XHTML SAX events. The extracted document content is delivered to the given ContentHandler, and the document metadata is stored in the given Metadata object.
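
A minimal sketch of calling parse directly is shown below. It assumes Tika is on the classpath and uses AutoDetectParser and BodyContentHandler, two classes shipped with Tika, as the parser and the content handler; the class name ParserDemo and its helper method are hypothetical. Body text is only produced when a parser for the detected type is available, so the sketch also assumes the Tika parser modules are present.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

// Hypothetical demo class for the Parser interface.
class ParserDemo {

    // Parses an in-memory document: content goes to the handler,
    // metadata is collected in the returned Metadata object.
    static Metadata parse(byte[] document, ContentHandler handler) {
        Parser parser = new AutoDetectParser();
        Metadata metadata = new Metadata();
        try (InputStream in = new ByteArrayInputStream(document)) {
            parser.parse(in, handler, metadata, new ParseContext());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (SAXException | TikaException e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return metadata;
    }

    public static void main(String[] args) {
        byte[] sample = "Hello from Tika".getBytes(StandardCharsets.UTF_8);
        // BodyContentHandler keeps only the text inside the XHTML body.
        BodyContentHandler handler = new BodyContentHandler();
        Metadata metadata = parse(sample, handler);
        System.out.println("content type: " + metadata.get("Content-Type"));
        System.out.println("body text: " + handler.toString().trim());
    }
}
```

Any SAX ContentHandler can be passed in place of BodyContentHandler, which is what makes the event-based design flexible: the same parse call can feed a text collector, an XHTML serializer, or a custom handler.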

Metadata Class

This class implements several interfaces, such as CreativeCommons, Geographic, HttpHeaders, Message, MSOffice, ClimateForcast, TIFF, TikaMetadataKeys, TikaMimeKeys, and Serializable, to support various metadata models. The following tables list the constructor and methods of this class along with their descriptions.

package − org.apache.tika.metadata

class − Metadata

Sr.No. | Constructor | Description
1 | Metadata() | Constructs a new, empty metadata object.

Sr.No. | Method | Description
1 | void add(Property property, String value) | Adds a value for the given metadata property of a document.
2 | void add(String name, String value) | Adds a value under the given metadata name. Values already stored under that name are kept, so one name can hold several values.
3 | String get(Property property) | Returns the value (if any) of the given metadata property.
4 | String get(String name) | Returns the value (if any) stored under the given metadata name.
5 | Date getDate(Property property) | Returns the value of the given date metadata property as a Date.
6 | String[] getValues(Property property) | Returns all the values of the given metadata property.
7 | String[] getValues(String name) | Returns all the values stored under the given metadata name.
8 | String[] names() | Returns the names of all the metadata elements in the metadata object.
9 | void set(Property property, Date date) | Sets the date value of the given metadata property.
10 | void set(Property property, String[] values) | Sets multiple values for the given metadata property.
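
The difference between adding and setting values can be seen in a small sketch like the one below. It assumes Tika is on the classpath. The class name MetadataDemo and the metadata name reviewer are hypothetical; DublinCore.SUBJECT and DublinCore.CREATED are property constants from Tika's DublinCore interface, used here as examples of a multi-valued property and a date property.

```java
import java.util.Arrays;
import java.util.Date;

import org.apache.tika.metadata.DublinCore;
import org.apache.tika.metadata.Metadata;

// Hypothetical demo class for the Metadata methods listed above.
class MetadataDemo {

    static Metadata sample() {
        Metadata metadata = new Metadata();

        // add(String, String) appends, so "reviewer" ends up with two values.
        metadata.add("reviewer", "alice");
        metadata.add("reviewer", "bob");

        // set(Property, String[]) stores several values in one call.
        metadata.set(DublinCore.SUBJECT, new String[] {"tika", "parsing"});

        // set(Property, Date) stores a date value for a date property.
        metadata.set(DublinCore.CREATED, new Date(0L));

        return metadata;
    }

    public static void main(String[] args) {
        Metadata metadata = sample();

        // get returns a single value; getValues returns all of them.
        System.out.println("first reviewer: " + metadata.get("reviewer"));
        System.out.println("all reviewers: " + Arrays.toString(metadata.getValues("reviewer")));
        System.out.println("subjects: " + Arrays.toString(metadata.getValues(DublinCore.SUBJECT)));
        System.out.println("created: " + metadata.getDate(DublinCore.CREATED));

        // names lists every metadata name currently stored.
        for (String name : metadata.names()) {
            System.out.println(name + " = " + Arrays.toString(metadata.getValues(name)));
        }
    }
}
```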

Language Identifier Class

This class identifies the language of the given content. The following tables list the constructors and methods of this class along with their descriptions.

package − org.apache.tika.language

class − LanguageIdentifier

Sr.No. | Constructor | Description
1 | LanguageIdentifier(LanguageProfile profile) | Instantiates the language identifier from the given LanguageProfile object.
2 | LanguageIdentifier(String content) | Instantiates the language identifier for the given text content, passed as a String.

Sr.No. | Method | Description
1 | String getLanguage() | Returns the language identified for the content of the current LanguageIdentifier object.
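
The String-based constructor and getLanguage can be combined as in the sketch below. It assumes a Tika 1.x release on the classpath, since the org.apache.tika.language.LanguageIdentifier class described here belongs to that line and may be deprecated or absent in later releases. The class name LanguageDemo, its helper method, and the sample sentence are hypothetical. Identification of short texts can be unreliable, so the sketch also prints the result of isReasonablyCertain.

```java
import org.apache.tika.language.LanguageIdentifier;

// Hypothetical demo class for LanguageIdentifier (Tika 1.x assumed).
class LanguageDemo {

    // Returns the language code Tika identifies for the given text.
    static String languageOf(String text) {
        LanguageIdentifier identifier = new LanguageIdentifier(text);
        return identifier.getLanguage();
    }

    public static void main(String[] args) {
        String text = "This is a reasonably long English sentence, written so that "
                + "the language identifier has enough material to work with.";
        LanguageIdentifier identifier = new LanguageIdentifier(text);
        System.out.println("language: " + identifier.getLanguage());
        // Indicates whether the identifier considers its own guess reliable.
        System.out.println("reasonably certain: " + identifier.isReasonablyCertain());
    }
}
```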
