How can the ‘Word2Vec’ algorithm be trained using TensorFlow?
TensorFlow is a machine learning framework developed by Google. It is an open-source framework used in conjunction with Python to implement algorithms, deep learning applications and much more. It is used both in research and in production. It has optimization techniques that help perform complicated mathematical operations quickly.
It operates on multi-dimensional arrays, also known as ‘tensors’, and interoperates closely with NumPy. The framework supports building and training deep neural networks. It is highly scalable and provides easy access to many popular datasets. It can run computations on GPUs and manages resources automatically.
The ‘tensorflow’ package can be installed with pip (on Windows as well as on other platforms) using the following command:
pip install tensorflow
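To confirm that the installation worked, and to see whether TensorFlow can use a GPU, a quick check such as the following can be run. This is a minimal sketch that assumes a TensorFlow 2.x release, where `tf.config.list_physical_devices` is available.

```python
import tensorflow as tf

# Print the installed version to confirm the package imports correctly.
print("TensorFlow version:", tf.__version__)

# List the GPUs TensorFlow can see; an empty list means it will run on the CPU.
gpus = tf.config.list_physical_devices("GPU")
print("GPUs visible to TensorFlow:", gpus)
```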
A tensor is the core data structure in TensorFlow. In a flow diagram known as a ‘data flow graph’, operations are the nodes and tensors are the edges that carry data between them. Tensors are essentially multidimensional arrays.
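The idea of tensors flowing along the edges of a data flow graph can be illustrated with a small sketch. Here `tf.function` traces an ordinary Python function into a graph whose node is a matrix multiplication; the function name `project` and the input values are made up for illustration.

```python
import tensorflow as tf

# A rank-2 tensor (a matrix) of shape (2, 3) holding float32 values.
x = tf.constant([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])
# A (3, 1) tensor of ones; tf.ones defaults to float32.
w = tf.ones((3, 1))

@tf.function  # Traces the Python function into a TensorFlow data flow graph.
def project(inputs, weights):
    # tf.matmul is an operation (a graph node); the tensors are its edges.
    return tf.matmul(inputs, weights)

y = project(x, w)
print(x.shape, x.dtype)
print(y.numpy())  # Each row of x summed: 6 and 15.
```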
The code below uses text8, a cleaned excerpt of English Wikipedia text, as training data for the model, and helps explain word embeddings. A word embedding is a vector representation of a word that captures its context in a document, its relation to other words, its syntactic similarity, and so on. These word vectors can be learned with the Word2Vec technique.
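Because embeddings are vectors, related words can be found by comparing their vectors, commonly with cosine similarity. The sketch below uses tiny hand-picked 4-dimensional vectors; they are hypothetical stand-ins for learned Word2Vec embeddings (which would have `embedding_size` dimensions), and the helper names are made up for illustration.

```python
import numpy as np

# Hypothetical toy embeddings; real Word2Vec vectors are learned from text.
embeddings = {
    "king":  np.array([0.8, 0.6, 0.1, 0.0]),
    "queen": np.array([0.7, 0.7, 0.1, 0.1]),
    "apple": np.array([0.0, 0.1, 0.9, 0.4]),
}

def cosine_similarity(u, v):
    # Cosine of the angle between u and v: 1.0 means same direction.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest(word, embeddings):
    # Return the other word whose vector is most similar to the query word's.
    query = embeddings[word]
    others = (w for w in embeddings if w != word)
    return max(others, key=lambda w: cosine_similarity(query, embeddings[w]))

print(nearest("king", embeddings))
```

In a trained model the same comparison is run against every row of the embedding matrix, which is how nearest-neighbor word lists are usually produced.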
Following is an example:
from __future__ import division, print_function, absolute_import

import collections
import os
import random
import urllib.request
import zipfile

import numpy as np
import tensorflow as tf

# Training parameters
learning_rate = 0.11
batch_size = 128
num_steps = 3000000
display_step = 10000
eval_step = 200000

# Evaluation parameters
# Note: text8 is read as bytes and split on whitespace, so vocabulary keys look
# like b'the', and a two-word entry such as 'new york' is not a single vocabulary item.
eval_words = ['eleven', 'the', 'going', 'good', 'american', 'new york']

# Word2Vec parameters
embedding_size = 200  # Dimension of the embedding vector.
max_vocabulary_size = 50000  # Total number of words in the vocabulary.
min_occurrence = 10  # Remove words that don't appear at least n times.
skip_window = 3  # How many words to consider to the left and right.
num_skips = 2  # How many times to reuse an input to generate a label.
num_sampled = 64  # Number of negative examples to sample.

# Download the text8 dataset if it is not already present.
url = 'http://mattmahoney.net/dc/text8.zip'
data_path = 'text8.zip'
if not os.path.exists(data_path):
    print("Downloading the dataset... (It may take some time)")
    filename, _ = urllib.request.urlretrieve(url, data_path)
    print("The data has been downloaded")

# The archive contains a single file; read it and split it into lowercase words.
with zipfile.ZipFile(data_path) as f:
    text_words = f.read(f.namelist()[0]).lower().split()

# Index 0 is reserved for rare words; its count is filled in below.
count = [('RARE', -1)]
count.extend(collections.Counter(text_words).most_common(max_vocabulary_size - 1))
# Remove words that occur fewer than min_occurrence times. The list is sorted
# by frequency, so scan from the end and stop at the first frequent word.
for i in range(len(count) - 1, -1, -1):
    if count[i][1] < min_occurrence:
        count.pop(i)
    else:
        break
vocabulary_size = len(count)

# Assign each vocabulary word an integer id.
word2id = dict()
for i, (word, _) in enumerate(count):
    word2id[word] = i

# Convert the corpus to ids; out-of-vocabulary words map to 0 ('RARE').
data = list()
unk_count = 0
for word in text_words:
    index = word2id.get(word, 0)
    if index == 0:
        unk_count += 1
    data.append(index)
count[0] = ('RARE', unk_count)
id2word = dict(zip(word2id.values(), word2id.keys()))

print("Word count is :", len(text_words))
print("Unique words:", len(set(text_words)))
print("Vocabulary size:", vocabulary_size)
print("Most common words:", count[:8])
Output:
Word count is : 17005207
Unique words: 253854
Vocabulary size: 47135
Most common words: [('RARE', 444176), (b'the', 1061396), (b'of', 593677), (b'and', 416629), (b'one', 411764), (b'in', 372201), (b'a', 325873), (b'to', 316376)]
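The vocabulary-building steps of the program can be reproduced on a tiny hypothetical corpus, which makes it easier to see where numbers such as the vocabulary size and the ‘RARE’ count come from. This is a simplified sketch, not part of the original program; the function name `build_vocab` is made up for illustration, and the threshold values are chosen to suit the toy input.

```python
import collections

def build_vocab(words, max_vocabulary_size, min_occurrence):
    # Index 0 is reserved for the 'RARE' placeholder; its count is set later.
    count = [("RARE", -1)]
    count.extend(collections.Counter(words).most_common(max_vocabulary_size - 1))
    # Entries are sorted by frequency, so trim from the end until the last
    # remaining word meets the threshold (never removing the placeholder).
    while len(count) > 1 and count[-1][1] < min_occurrence:
        count.pop()
    word2id = {word: i for i, (word, _) in enumerate(count)}
    # Words outside the vocabulary get id 0.
    data = [word2id.get(word, 0) for word in words]
    count[0] = ("RARE", data.count(0))
    return data, word2id, count

corpus = "the cat sat on the mat the cat ran".split()
data, word2id, count = build_vocab(corpus, max_vocabulary_size=5, min_occurrence=2)
print(data)
print(count)
```

Here only ‘the’ and ‘cat’ occur at least twice, so the vocabulary has three entries (including ‘RARE’), and the four remaining word occurrences are all mapped to id 0.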
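- The required packages are imported and aliased.
- The training, evaluation, and Word2Vec parameters are defined.
- The text8 archive is downloaded if it is not already present, and its single text file is read and split into lowercase words.
- Words that occur fewer than min_occurrence times are dropped from the vocabulary. Every out-of-vocabulary word is mapped to id 0, the ‘RARE’ placeholder, whose initial count of -1 is replaced by the number of such words.
- Each word in the data is converted to its integer id, and the total word count, number of unique words, vocabulary size, and most common words are displayed on the console.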
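The skip_window and num_skips parameters control how training pairs are drawn from the sequence of word ids. The following is a simplified sketch, assuming the skip-gram formulation in which each center word is used to predict words within skip_window positions of it; the helper name `skipgram_pairs` is hypothetical, and it builds one full list of pairs rather than the fixed-size batches of batch_size that a training loop would consume.

```python
import random

def skipgram_pairs(data, skip_window, num_skips, rng=random):
    """Return (center_id, context_id) pairs drawn in the skip-gram style."""
    if num_skips > 2 * skip_window:
        raise ValueError("num_skips cannot exceed the number of context words")
    pairs = []
    # Only use centers that have a full window on both sides.
    for center in range(skip_window, len(data) - skip_window):
        # Positions within skip_window of the center, excluding the center.
        context = [j for j in range(center - skip_window, center + skip_window + 1)
                   if j != center]
        # Reuse the center num_skips times, each with a different context word.
        for j in rng.sample(context, num_skips):
            pairs.append((data[center], data[j]))
    return pairs

pairs = skipgram_pairs(list(range(10)), skip_window=3, num_skips=2,
                       rng=random.Random(0))
print(pairs)
```

With ten ids and skip_window = 3, four positions have a full window, so num_skips = 2 yields eight pairs; each pair's context id lies within three positions of its center.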