OpenNLP - Overview



Natural Language Processing (NLP) is a set of tools and techniques used to derive meaningful and useful information from natural language sources such as web pages and text documents.

What is OpenNLP?

Apache OpenNLP is an open-source Java library for processing natural language text. You can use it to build an efficient text processing service.

OpenNLP provides services such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and co-reference resolution.
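To make the first two services concrete, here is a minimal sketch of what sentence segmentation and tokenization produce. It deliberately does not use OpenNLP: it relies on the rule-based `java.text.BreakIterator` from the Java standard library, so it runs without any model files. The class and method names (`SegmentationSketch`, `sentences`, `tokens`) are made up for this illustration. OpenNLP's own components are driven by trained models and may split text differently.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SegmentationSketch {

    // Cuts the text at every boundary reported by the iterator and
    // drops pieces that contain only whitespace.
    private static List<String> split(BreakIterator it, String text) {
        List<String> pieces = new ArrayList<>();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end).trim();
            if (!piece.isEmpty()) {
                pieces.add(piece);
            }
        }
        return pieces;
    }

    // Sentence segmentation: one list entry per sentence.
    public static List<String> sentences(String text) {
        return split(BreakIterator.getSentenceInstance(Locale.ENGLISH), text);
    }

    // Tokenization: words and punctuation marks become separate tokens.
    public static List<String> tokens(String sentence) {
        return split(BreakIterator.getWordInstance(Locale.ENGLISH), sentence);
    }

    public static void main(String[] args) {
        String text = "Hello world. How are you?";
        for (String sentence : sentences(text)) {
            System.out.println(sentence + " -> " + tokens(sentence));
        }
    }
}
```

The later services in the list (tagging, entity extraction, chunking, parsing) typically consume the sentences and tokens produced by steps like these.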

Features of OpenNLP

Following are notable applications of NLP, several of which OpenNLP supports directly −

  • Named Entity Recognition (NER) − OpenNLP supports NER, which lets you extract the names of people, locations, and other entities from text, including search queries.

  • Summarization − In NLP, summarization condenses paragraphs, articles, documents, or collections of documents into shorter text.

  • Searching − In OpenNLP, a given search string or its synonyms can be identified in a text even when the word is altered or misspelled.

  • Tagging (POS) − Part-of-speech tagging labels each word in the text with its grammatical category (noun, verb, adjective, and so on) for further analysis.

  • Translation − In NLP, translation converts text from one language into another.

  • Information grouping − In NLP, this groups the textual information in a document into categories, much as POS tagging groups words by part of speech.

  • Natural Language Generation − It is used to generate text from structured data such as a database, and to automate reports such as weather analyses or medical reports.

  • Feedback Analysis − As the name implies, feedback that people give about a product is collected and analyzed to determine how well the product has been received.

  • Speech recognition − Human speech is difficult to analyze, but NLP has some built-in features for this requirement.

OpenNLP API

The Apache OpenNLP library provides classes and interfaces for various natural language processing tasks such as sentence detection, tokenization, name finding, part-of-speech tagging, chunking, parsing, co-reference resolution, and document categorization.

In addition to these tasks, we can also train and evaluate our own models for any of these tasks.

OpenNLP CLI

In addition to the library, OpenNLP also provides a Command Line Interface (CLI), where we can train and evaluate models. We will discuss this topic in detail in the last chapter of this tutorial.

OpenNLP Models

To perform various NLP tasks, OpenNLP provides a set of predefined models. This set includes models for different languages.

Downloading the models

You can follow the steps given below to download the predefined models provided by OpenNLP.

Step 1 − Open the index page of the OpenNLP models at the following link − http://opennlp.sourceforge.net/models-1.5/.

Step 2 − On visiting the given link, you will see a list of components for various languages, along with the links to download them. This list covers all the predefined models provided by OpenNLP.

Download all these models to the folder C:/OpenNLP_models/ by clicking on their respective links. All these models are language dependent, so when using them you have to make sure that the language of the model matches the language of the input text.
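The language-matching requirement can be checked before a model is loaded. The sketch below assumes the file naming used on the 1.5 models page, where each file name starts with a two-letter language code followed by a hyphen (for example `en-token.bin` or `de-token.bin`). The class `ModelLanguageCheck` and its methods are hypothetical helpers written for this illustration; they are not part of the OpenNLP API.

```java
import java.util.Locale;

public class ModelLanguageCheck {

    // Returns the language prefix of a model file name such as "en-token.bin",
    // or an empty string when the name has no "<language>-" prefix.
    public static String languageOf(String modelFileName) {
        int dash = modelFileName.indexOf('-');
        return dash > 0
                ? modelFileName.substring(0, dash).toLowerCase(Locale.ROOT)
                : "";
    }

    // True when the model's language prefix equals the language code
    // of the input text (compared case-insensitively).
    public static boolean matches(String modelFileName, String inputLanguage) {
        return languageOf(modelFileName).equals(inputLanguage.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Assumed scenario: the input text is known to be English ("en").
        String inputLanguage = "en";
        for (String model : new String[] {"en-token.bin", "de-token.bin"}) {
            System.out.println(model + " usable for '" + inputLanguage + "': "
                    + matches(model, inputLanguage));
        }
    }
}
```

A check like this only inspects the file name, so it catches mix-ups between downloaded files but says nothing about the actual language of the input text, which has to be known or detected separately.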

History of OpenNLP

  • In 2010, OpenNLP entered the Apache incubation.

  • In 2011, Apache OpenNLP 1.5.2 Incubating was released, and in 2012, it graduated as a top-level Apache project.

  • In 2015, OpenNLP 1.6.0 was released.