What are the techniques of Text Indexing?
There are several popular indexing techniques for text retrieval, including inverted indices and signature files.
Inverted Index − An inverted index is an index structure that maintains two hash-indexed or B+-tree-indexed tables: document_table and term_table. document_table consists of a set of document records, each with two fields: doc_id and posting_list, where posting_list is a list of terms (or pointers to terms) that occur in the document, sorted according to some relevance measure.
term_table consists of a set of term records, each with two fields: term_id and posting_list, where posting_list specifies a list of identifiers of the documents in which the term occurs.
With this organization, an inverted index can answer queries such as "find all of the documents associated with a given set of terms" or "find all of the terms associated with a given set of documents." For example, to find all of the documents associated with a set of terms, we can first look up the list of document identifiers in term_table for each term, and then intersect these lists to obtain the set of relevant documents.
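The lookup-and-intersect step can be sketched in Python as follows. This is a minimal illustration, not a production design: it uses plain dictionaries in place of hash-indexed or B+-tree-indexed tables, splits text on whitespace, and keeps posting lists as unordered sets with no relevance ranking. The helper names `build_tables` and `find_documents` and the sample documents are assumptions made for this example.

```python
from collections import defaultdict


def build_tables(docs):
    """Build document_table and term_table from a {doc_id: text} mapping."""
    document_table = {}            # doc_id -> list of terms in the document
    term_table = defaultdict(set)  # term   -> set of doc_ids containing it
    for doc_id, text in docs.items():
        terms = text.lower().split()
        document_table[doc_id] = sorted(set(terms))
        for term in terms:
            term_table[term].add(doc_id)
    return document_table, term_table


def find_documents(term_table, query_terms):
    """Return the ids of documents that contain every query term."""
    posting_lists = [term_table.get(term, set()) for term in query_terms]
    if not posting_lists:
        return set()
    # Intersect the posting lists of all query terms.
    return set.intersection(*posting_lists)


# Hypothetical sample documents.
docs = {
    1: "text indexing with inverted index",
    2: "signature file for text retrieval",
    3: "inverted index and signature file",
}
document_table, term_table = build_tables(docs)
print(find_documents(term_table, ["inverted", "index"]))  # {1, 3}
```

The reverse query (all terms for a set of documents) would read document_table in the same way.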
Inverted indices are widely used in industry. They are simple to implement, but the posting lists can be rather long, making the storage requirement quite large. They are also not satisfactory at handling synonymy (where two very different words can have the same meaning) and polysemy (where a single word can have several meanings).
Signature File − A signature file is a file that stores a signature record for each document in the database. Each signature has a fixed size of b bits representing terms. A simple encoding scheme goes as follows. Each bit of a document signature is initialized to 0.
A bit is set to 1 if the term it represents appears in the document. A signature S1 matches another signature S2 if each bit that is set in signature S2 is also set in S1. Because there are generally more terms than available bits, several terms can be mapped into the same bit.
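The simple encoding and the matching rule can be sketched in Python as below. The signature width `B = 8`, the use of `zlib.crc32` to pick a bit position, and the helper names `bit_for`, `signature`, and `matches` are all assumptions chosen for illustration; the text does not prescribe a particular mapping from terms to bits.

```python
import zlib

B = 8  # signature size in bits (assumed; kept small so terms share bits)


def bit_for(term):
    """Map a term to one of B bit positions; many terms may share a bit."""
    return zlib.crc32(term.encode("utf-8")) % B


def signature(terms):
    """Start from all zeros and set the bit of every term that appears."""
    sig = 0
    for term in terms:
        sig |= 1 << bit_for(term)
    return sig


def matches(s1, s2):
    """S1 matches S2 if every bit set in S2 is also set in S1."""
    return (s1 & s2) == s2


doc_sig = signature(["text", "index", "file"])
query_sig = signature(["text", "file"])
print(matches(doc_sig, query_sig))  # True: the query terms are in the document
```

Because several terms can land on the same bit, `matches` may also return True for a query term the document does not contain, so a match only marks a candidate.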
Such many-to-one mappings make the search expensive because a document that matches the signature of a query does not necessarily contain the set of keywords of the query. The document has to be retrieved, parsed, stemmed, and checked. Improvements can be made by first performing frequency analysis, stemming, and filtering stop words, and then using a hashing technique and a superimposed coding technique to encode the list of terms into a bit representation.
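One way to picture the improved pipeline is the Python sketch below. It filters stop words, hashes each term to a few bit positions, superimposes (ORs) the term codes into one signature, and then checks each candidate against its actual terms to discard false matches. The width `B`, the number of bits per term `K`, the tiny stop-word list, the SHA-256-based hashing, and all function names are assumptions for this example. Frequency analysis and stemming are left out, and signatures are recomputed on each search instead of being stored in a file, to keep the sketch short.

```python
import hashlib

B = 64  # signature width in bits (assumed)
K = 3   # bit positions chosen per term (assumed)
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "with"}  # illustrative only


def preprocess(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return {word for word in text.lower().split() if word not in STOP_WORDS}


def term_code(term):
    """Hash a term to a B-bit code with up to K bits set."""
    code = 0
    for i in range(K):
        digest = hashlib.sha256(f"{i}:{term}".encode("utf-8")).digest()
        code |= 1 << (int.from_bytes(digest[:4], "big") % B)
    return code


def signature(terms):
    """Superimposed coding: OR the codes of all terms into one signature."""
    sig = 0
    for term in terms:
        sig |= term_code(term)
    return sig


def search(docs, query):
    """Filter by signature first, then verify candidates term by term."""
    query_terms = preprocess(query)
    query_sig = signature(query_terms)
    hits = []
    for doc_id, text in docs.items():
        doc_terms = preprocess(text)
        if (signature(doc_terms) & query_sig) != query_sig:
            continue  # the signature rules this document out
        if query_terms <= doc_terms:  # verification removes false matches
            hits.append(doc_id)
    return hits


# Hypothetical sample documents.
docs = {
    1: "text indexing with inverted index",
    2: "signature file for text retrieval",
    3: "inverted index and signature file",
}
print(search(docs, "signature file"))  # [2, 3]
```

The signature test can never reject a document that really contains all query terms, so the verification step only has to remove false matches.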
