PyBrain - Working with Datasets



A dataset is the input data used to train, validate and test a network. The type of dataset to use depends on the machine learning task at hand. In this chapter, we are going to take a look at the following −

  • Creating Dataset
  • Adding Data to Dataset

We will first learn how to create a dataset, and then add sample data to it and print the result.

Creating Dataset

To create a dataset, we need to use the PyBrain datasets package: pybrain.datasets.

PyBrain supports dataset classes such as SupervisedDataSet, SequentialDataSet and ClassificationDataSet. The dataset to use depends on the machine learning task the user is trying to implement. SupervisedDataSet is the simplest one, so we are going to use it here to create our dataset.

A SupervisedDataSet needs two parameters: the input dimension and the target dimension. Consider an XOR truth table, as shown below −

A   B   A XOR B
0   0   0
0   1   1
1   0   1
1   1   0

Each row of the table has two inputs (A and B) and one output (A XOR B). So the input size is 2 and the target size is 1, and the values we pass when creating our dataset will be 2, 1.
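The two sizes can also be read off the truth table in code. The following is a minimal sketch in plain Python that does not use PyBrain; the names xor_rows, input_dim and target_dim are chosen here for illustration only.

```python
# XOR truth table as (input, target) pairs; plain tuples, no PyBrain needed.
xor_rows = [
    ((0, 0), (0,)),
    ((0, 1), (1,)),
    ((1, 0), (1,)),
    ((1, 1), (0,)),
]

# The widths of the first row give the two dataset dimensions.
input_dim = len(xor_rows[0][0])
target_dim = len(xor_rows[0][1])

# Every row must use the same widths, otherwise the table is malformed.
assert all(len(i) == input_dim and len(t) == target_dim for i, t in xor_rows)

print(input_dim, target_dim)  # prints: 2 1
```

These are the two numbers that are passed to SupervisedDataSet in the listing that follows.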

createdataset.py

from pybrain.datasets import SupervisedDataSet
sds = SupervisedDataSet(2, 1)
print(sds)

This is what we get when we execute the above code with python createdataset.py −

C:\pybrain\pybrain\src>python createdataset.py
input: dim(0, 2)
[]
target: dim(0, 1)
[]

The output shows an empty input array with 2 columns and an empty target array with 1 column, since no samples have been added yet.

Adding Data to Dataset

Let us now add the sample data to the dataset.

createdataset.py

from pybrain.datasets import SupervisedDataSet
sds = SupervisedDataSet(2, 1)
xorModel = [
   [(0,0), (0,)],
   [(0,1), (1,)],
   [(1,0), (1,)],
   [(1,1), (0,)],
]
for input, target in xorModel:
   sds.addSample(input, target)
print("Input is:")
print(sds['input'])
print("\nTarget is:")
print(sds['target'])

We have created the xorModel list as shown below −

xorModel = [
   [(0,0), (0,)],
   [(0,1), (1,)],
   [(1,0), (1,)],
   [(1,1), (0,)],
]

To add data to the dataset, we use the addSample() method, which takes an input and a target.

To pass each sample to addSample(), we loop through the xorModel list as shown below −

for input, target in xorModel:
   sds.addSample(input, target)
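For larger truth tables, writing the list by hand becomes tedious. As a rough sketch, the same rows could be generated with the standard library; build_xor_model is a hypothetical helper introduced here, not a PyBrain function, and the loop only prints each pair so that the snippet runs without PyBrain installed.

```python
from itertools import product

# Hypothetical helper, not part of PyBrain: builds rows in the same
# [(inputs), (target,)] shape as the hand-written xorModel list.
def build_xor_model():
    return [[(a, b), (a ^ b,)] for a, b in product((0, 1), repeat=2)]

xorModel = build_xor_model()

for sample_input, sample_target in xorModel:
    # With PyBrain installed, this is where
    # sds.addSample(sample_input, sample_target) would be called.
    print(sample_input, sample_target)
```

The loop variables are named sample_input and sample_target here to avoid shadowing Python's built-in input() function.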

After executing, the following is the output we get −

python createdataset.py

C:\pybrain\pybrain\src>python createdataset.py
Input is:
[[0. 0.]
[0. 1.]
[1. 0.]
[1. 1.]]
Target is:
[[0.]
[1.]
[1.]
[0.]]
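Row i of the input array belongs to row i of the target array. The sketch below pairs them up again and checks each pair against the XOR rule. It assumes the two arrays have been copied into plain lists (for example with the NumPy tolist() method on sds['input'] and sds['target']); the values are typed in by hand here so the snippet runs on its own.

```python
# Plain-list copies of the arrays printed above (assumed to come from
# sds['input'].tolist() and sds['target'].tolist()).
inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
targets = [[0.0], [1.0], [1.0], [0.0]]

# Pair row i of the inputs with row i of the targets.
samples = list(zip(inputs, targets))

# Collect every pair whose target does not match A XOR B.
mismatches = [(i, t) for i, t in samples if int(i[0]) ^ int(i[1]) != int(t[0])]

print(len(samples), "samples,", len(mismatches), "mismatches")
```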

You can get the input and target arrays from the dataset by indexing it with the 'input' and 'target' keys, as shown below −

print(sds['input'])
print(sds['target'])