Load NumPy Data in TensorFlow

TensorFlow provides seamless integration with NumPy arrays through the tf.data.Dataset.from_tensor_slices() function. This allows you to convert NumPy arrays into TensorFlow datasets, enabling efficient data processing and model training.

Prerequisites

Make sure your Python environment has NumPy and TensorFlow installed:

pip install numpy tensorflow

Basic NumPy Array Loading

The simplest way to load NumPy data into TensorFlow is with tf.data.Dataset.from_tensor_slices():

import numpy as np
import tensorflow as tf

# Create a NumPy array
numpy_data = np.array([1, 2, 3, 4, 5])

# Load the NumPy data into TensorFlow
tensor_dataset = tf.data.Dataset.from_tensor_slices(numpy_data)

# Print the TensorFlow dataset
for element in tensor_dataset:
    print(element)

Output:

tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(2, shape=(), dtype=int64)
tf.Tensor(3, shape=(), dtype=int64)
tf.Tensor(4, shape=(), dtype=int64)
tf.Tensor(5, shape=(), dtype=int64)
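If you need the elements back as plain NumPy values rather than tensors, a dataset also provides as_numpy_iterator(). A minimal sketch using the same array as above:

```python
import numpy as np
import tensorflow as tf

numpy_data = np.array([1, 2, 3, 4, 5])
dataset = tf.data.Dataset.from_tensor_slices(numpy_data)

# as_numpy_iterator() yields NumPy values instead of tf.Tensor objects
values = [int(x) for x in dataset.as_numpy_iterator()]
print(values)  # [1, 2, 3, 4, 5]
```

This is handy for debugging a pipeline or feeding results to code that expects plain Python/NumPy types.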

Loading Multi-Dimensional Arrays

For multi-dimensional arrays, the process remains the same. Each slice along the first dimension becomes a dataset element:

import numpy as np
import tensorflow as tf

# Create a 2D NumPy array
numpy_data = np.array([[1, 2], [3, 4], [5, 6]])

# Load the NumPy data into TensorFlow
tensor_dataset = tf.data.Dataset.from_tensor_slices(numpy_data)

# Print the TensorFlow dataset
for element in tensor_dataset:
    print(element)

Output:

tf.Tensor([1 2], shape=(2,), dtype=int64)
tf.Tensor([3 4], shape=(2,), dtype=int64)
tf.Tensor([5 6], shape=(2,), dtype=int64)

Loading Features and Labels

You can load multiple NumPy arrays simultaneously, which is useful for pairing features with labels:

import numpy as np
import tensorflow as tf

# Create feature and label arrays
features = np.array([[1, 2], [3, 4], [5, 6]])
labels = np.array(['A', 'B', 'C'])

# Load the NumPy data into TensorFlow
tensor_dataset = tf.data.Dataset.from_tensor_slices((features, labels))

# Print the TensorFlow dataset
for feature, label in tensor_dataset:
    print(f'Feature: {feature}, Label: {label}')

Output:

Feature: [1 2], Label: b'A'
Feature: [3 4], Label: b'B'
Feature: [5 6], Label: b'C'
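Notice that the string labels come back as byte strings (b'A'), because TensorFlow stores strings as bytes. A small sketch of decoding them back to Python strings during iteration:

```python
import numpy as np
import tensorflow as tf

features = np.array([[1, 2], [3, 4], [5, 6]])
labels = np.array(['A', 'B', 'C'])
dataset = tf.data.Dataset.from_tensor_slices((features, labels))

# label.numpy() returns bytes; decode to get a Python str
for feature, label in dataset:
    print(f'Feature: {feature.numpy()}, Label: {label.numpy().decode("utf-8")}')
```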

Data Preprocessing Operations

Batching Data

Use batch() to group elements into batches for efficient processing:

import numpy as np
import tensorflow as tf

# Create a NumPy array
numpy_data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Load with batching
tensor_dataset = tf.data.Dataset.from_tensor_slices(numpy_data).batch(3)

# Print batches
for batch in tensor_dataset:
    print(batch)

Output:

tf.Tensor([1 2 3], shape=(3,), dtype=int64)
tf.Tensor([4 5 6], shape=(3,), dtype=int64)
tf.Tensor([7 8 9], shape=(3,), dtype=int64)

Shuffling Data

Shuffle your data to prevent the model from learning the order of training examples:

import numpy as np
import tensorflow as tf

# Create a NumPy array
numpy_data = np.array([1, 2, 3, 4, 5, 6])

# Load with shuffling
tensor_dataset = tf.data.Dataset.from_tensor_slices(numpy_data).shuffle(buffer_size=6)

# Print shuffled elements
for element in tensor_dataset:
    print(element)

Output (the order will vary from run to run):

tf.Tensor(4, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
tf.Tensor(6, shape=(), dtype=int64)
tf.Tensor(3, shape=(), dtype=int64)
tf.Tensor(2, shape=(), dtype=int64)
tf.Tensor(5, shape=(), dtype=int64)
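Because shuffling is random, the order above will differ each time. If you need a reproducible order, shuffle() accepts a seed; a sketch that also disables reshuffling between iterations so repeated passes match:

```python
import numpy as np
import tensorflow as tf

numpy_data = np.array([1, 2, 3, 4, 5, 6])

# A fixed seed plus reshuffle_each_iteration=False gives a stable order
dataset = tf.data.Dataset.from_tensor_slices(numpy_data).shuffle(
    buffer_size=6, seed=42, reshuffle_each_iteration=False)

first_pass = [int(x) for x in dataset.as_numpy_iterator()]
second_pass = [int(x) for x in dataset.as_numpy_iterator()]
print(first_pass == second_pass)  # True
```

During actual training you usually want the default reshuffle_each_iteration=True, so each epoch sees a different order.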

Combined Operations

Chain multiple operations together for a complete preprocessing pipeline:

import numpy as np
import tensorflow as tf

# Create a NumPy array
numpy_data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Load with shuffling and batching
tensor_dataset = (tf.data.Dataset.from_tensor_slices(numpy_data)
                  .shuffle(buffer_size=9)
                  .batch(3))

# Print batched and shuffled data
for batch in tensor_dataset:
    print(batch)

Output (the exact order will vary):

tf.Tensor([7 2 4], shape=(3,), dtype=int64)
tf.Tensor([1 9 6], shape=(3,), dtype=int64)
tf.Tensor([8 3 5], shape=(3,), dtype=int64)

Best Practices

Operation    Purpose                       Recommendation
-----------  ----------------------------  --------------------------------------
shuffle()    Randomize data order          Use a buffer size >= the dataset size
batch()      Group elements into batches   Pick a batch size that fits in memory
prefetch()   Pipeline data loading         Use tf.data.AUTOTUNE
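The table mentions prefetch(), which overlaps data preparation with model execution so the input pipeline does not stall training. A minimal sketch combining all three operations on the same array as the previous example:

```python
import numpy as np
import tensorflow as tf

numpy_data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# shuffle -> batch -> prefetch is a common pipeline ordering;
# AUTOTUNE lets tf.data pick the prefetch buffer size dynamically
dataset = (tf.data.Dataset.from_tensor_slices(numpy_data)
           .shuffle(buffer_size=9)
           .batch(3)
           .prefetch(tf.data.AUTOTUNE))

for batch in dataset:
    print(batch.shape)  # each of the 3 batches has shape (3,)
```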

Conclusion

Loading NumPy data into TensorFlow using tf.data.Dataset.from_tensor_slices() provides a powerful way to create efficient data pipelines. Combine operations like shuffling and batching to optimize your machine learning workflows.

Updated on: 2026-03-27T08:24:08+05:30
