How can the preprocessed data be shuffled using TensorFlow and Python?
TensorFlow is an open-source machine learning framework provided by Google. It is used in conjunction with Python to implement algorithms, deep learning applications, and much more, both in research and in production. It includes optimization techniques that help perform complicated mathematical operations quickly. These operations work on multi-dimensional arrays, which are known as ‘tensors’ and which interoperate with NumPy arrays. The framework supports working with deep neural networks.
The ‘tensorflow’ package can be installed on Windows using the below line of code −
pip install tensorflow
A tensor is the data structure used in TensorFlow. Tensors form the edges of a flow diagram known as the ‘data flow graph’: they carry data between the operations at the nodes of the graph. A tensor is essentially a multidimensional array.
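As a small illustration of a tensor as a multidimensional array, the sketch below builds a 2 x 3 tensor from a nested Python list and inspects it. The variable name `matrix` and the values are chosen only for this example.

```python
import tensorflow as tf

# A rank-2 tensor (2 rows, 3 columns) built from a nested Python list.
matrix = tf.constant([[1, 2, 3], [4, 5, 6]])

print(matrix.shape)             # static shape of the tensor
print(matrix.dtype)             # element type inferred from the Python ints
print(tf.rank(matrix).numpy())  # number of dimensions
print(matrix.numpy())           # the same data viewed as a NumPy array
```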
We will be using the Iliad dataset, which contains the text of three translations of Homer's Iliad by William Cowper, Edward (Earl of Derby) and Samuel Butler. The model is trained to identify the translator when a single line of text is given. The text files used have been preprocessed. This includes removing the document header and footer, line numbers and chapter titles.
We are using Google Colaboratory to run the below code. Google Colab or Colaboratory helps run Python code in the browser. It requires zero configuration and provides free access to GPUs (Graphics Processing Units). Colaboratory has been built on top of Jupyter Notebook.
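The shuffling step in this article works on a list named `labeled_data_sets`, with one labelled dataset per translator. The sketch below shows one way such a list could be built. It is an assumption-laden toy: the in-memory lines are hypothetical stand-ins for the real text files, and the helper name `make_labeled_dataset` is invented for this example.

```python
import tensorflow as tf

# Hypothetical stand-in lines; the real data comes from the three text files.
toy_translations = [
    ["first line by translator 0", "second line by translator 0"],
    ["first line by translator 1", "second line by translator 1"],
    ["first line by translator 2", "second line by translator 2"],
]

def make_labeled_dataset(lines, label):
    """Pair every line of text with the integer label of its translator."""
    lines_dataset = tf.data.Dataset.from_tensor_slices(lines)
    return lines_dataset.map(
        lambda text: (text, tf.constant(label, dtype=tf.int64)))

# One labelled dataset per translator, in label order 0, 1, 2.
labeled_data_sets = [
    make_labeled_dataset(lines, label)
    for label, lines in enumerate(toy_translations)
]

for dataset in labeled_data_sets:
    for text, label in dataset:
        print(label.numpy(), text.numpy())
```

Each element is a (text, label) pair, which is the same structure that appears in the sample output of this article.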
Following is the code snippet −
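# 'labeled_data_sets' is the list of labelled datasets (one per translator)
# created in the earlier preprocessing step.
print("Combine the labelled dataset and reshuffle it")
BUFFER_SIZE = 50000
BATCH_SIZE = 64
VALIDATION_SIZE = 5000

# Start from the first dataset and append the remaining ones to it.
all_labeled_data = labeled_data_sets[0]
for labeled_dataset in labeled_data_sets[1:]:
    all_labeled_data = all_labeled_data.concatenate(labeled_dataset)

# Shuffle once; the same order is reused on every later pass over the data.
all_labeled_data = all_labeled_data.shuffle(
    BUFFER_SIZE, reshuffle_each_iteration=False)

print("Displaying a few samples of input data")
for text, label in all_labeled_data.take(8):
    print("The sentence is : ", text.numpy())
    print("The label is :", label.numpy())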
Code credit − https://www.tensorflow.org/tutorials/load_data/text
Combine the labelled dataset and reshuffle it
Displaying a few samples of input data
The sentence is : b'But I have now both tasted food, and given'
The label is : 0
The sentence is : b'All these shall now be thine: but if the Gods'
The label is : 1
The sentence is : b'Their spiry summits waved. There, unperceived'
The label is : 0
The sentence is : b'"I pray you, would you show your love, dear friends,'
The label is : 1
The sentence is : b'Entering beneath the clavicle the point'
The label is : 0
The sentence is : b'But grief, his father lost, awaits him now,'
The label is : 1
The sentence is : b'in the fore-arm where the sinews of the elbow are united, whereon he'
The label is : 2
The sentence is : b'For, as I think, I have already chased'
The label is : 0
The three labelled datasets are combined into one with ‘concatenate’ and then shuffled, so that lines from the different translators are mixed together. Because ‘reshuffle_each_iteration’ is set to False, the shuffled order stays the same every time the dataset is iterated. A few samples from the shuffled dataset are then displayed on the console.
The data has not been batched yet, which means every entry in ‘all_labeled_data’ maps to one data point, that is, one line of text together with its label.
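The role of BUFFER_SIZE can be sketched in plain Python. According to the TensorFlow documentation, ‘shuffle’ fills a buffer with that many elements, repeatedly picks one at random, and replaces it with the next incoming element. The function below is only a conceptual sketch of that idea and not TensorFlow's implementation; the name `buffered_shuffle` and its `seed` parameter are made up for this example.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in a buffer-limited random order (conceptual sketch)."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Still filling the buffer; nothing is emitted yet.
            buffer.append(item)
            continue
        # Buffer is full: emit a random element and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input is exhausted: emit whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer

shuffled = list(buffered_shuffle(range(20), buffer_size=5, seed=0))
print(shuffled)
```

With a buffer of 1 the order does not change at all, and with a small buffer an element can only move a limited distance towards the front. A buffer at least as large as the dataset is needed for a full shuffle, which is why the article's code uses a large BUFFER_SIZE.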