Predicting Stock Price Direction using Support Vector Machines


In this article we are going to learn how to predict stock price direction using Support Vector Machines.

Machine Learning is an application of Artificial Intelligence that is changing how work is done in many disciplines. At its core, a machine learning model identifies patterns in a specific data set and then applies those learned patterns to make predictions on new data. In layman's terms, the machine learns a pattern and adjusts through experience to reach correct and repeatable conclusions. Let’s begin.

Installing libraries and importing them

First, install the required libraries and import them. The leading ! runs pip as a shell command from a notebook environment such as Jupyter or Google Colab.

!pip install pandas
!pip install numpy
!pip install scikit-learn
!pip install matplotlib
import pandas as pd
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import warnings

Downloading and reading stock dataset

The next step is to read the dataset from a CSV file. You can download the dataset from here; adjust the path below to wherever you saved the file. We use pandas to read the dataset.

Example

df = pd.read_csv('/content/sample_data/RELIANCE.csv')
df.head()

Output

Date	Symbol	Series	Prev Close	Open	High	Low	Last	Close	VWAP	Volume	Turnover	Trades	Deliverable Volume	%Deliverble
0	2000-01-03	RELIANCE	EQ	233.05	237.50	251.70	237.50	251.70	251.70	249.37	4456424	1.111319e+14	NaN	NaN	NaN
1	2000-01-04	RELIANCE	EQ	251.70	258.40	271.85	251.30	271.85	271.85	263.52	9487878	2.500222e+14	NaN	NaN	NaN
2	2000-01-05	RELIANCE	EQ	271.85	256.65	287.90	256.65	286.75	282.50	274.79	26833684	7.373697e+14	NaN	NaN	NaN
3	2000-01-06	RELIANCE	EQ	282.50	289.00	300.70	289.00	293.50	294.35	295.45	15682286	4.633254e+14	NaN	NaN	NaN
4	2000-01-07	RELIANCE	EQ	294.35	295.00	317.90	293.00	314.50	314.55	308.91	19870977	6.138388e+14	NaN	NaN	NaN

Data Preparation

Before analyzing the data, set the Date column as the index so that each row is identified by its trading day.

Example

# Use the Date column as a datetime index
df.index = pd.to_datetime(df['Date'])

# Drop the original Date column, which is now redundant
df = df.drop(['Date'], axis='columns')
df

Output

	Symbol	Series	Prev Close	Open	High	Low	Last	Close	VWAP	Volume	Turnover	Trades	Deliverable Volume	%Deliverble
Date														
2000-01-03	RELIANCE	EQ	233.05	237.50	251.70	237.50	251.70	251.70	249.37	4456424	1.111319e+14	NaN	NaN	NaN
2000-01-04	RELIANCE	EQ	251.70	258.40	271.85	251.30	271.85	271.85	263.52	9487878	2.500222e+14	NaN	NaN	NaN
2000-01-05	RELIANCE	EQ	271.85	256.65	287.90	256.65	286.75	282.50	274.79	26833684	7.373697e+14	NaN	NaN	NaN
2000-01-06	RELIANCE	EQ	282.50	289.00	300.70	289.00	293.50	294.35	295.45	15682286	4.633254e+14	NaN	NaN	NaN
2000-01-07	RELIANCE	EQ	294.35	295.00	317.90	293.00	314.50	314.55	308.91	19870977	6.138388e+14	NaN	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2020-05-22	RELIANCE	EQ	1441.25	1451.80	1458.00	1426.50	1433.00	1431.55	1442.31	17458503	2.518059e+15	388907.0	4083814.0	0.2339
2020-05-26	RELIANCE	EQ	1431.55	1448.15	1449.70	1416.30	1426.00	1424.05	1428.70	15330793	2.190317e+15	341795.0	7437964.0	0.4852
2020-05-27	RELIANCE	EQ	1424.05	1431.00	1454.00	1412.00	1449.85	1445.55	1430.20	16460764	2.354223e+15	348477.0	6524302.0	0.3964
2020-05-28	RELIANCE	EQ	1445.55	1455.00	1479.75	1449.00	1471.05	1472.25	1467.50	18519252	2.717698e+15	405603.0	8377100.0	0.4523
2020-05-29	RELIANCE	EQ	1472.25	1468.00	1472.00	1452.65	1470.00	1464.40	1462.79	18471770	2.702029e+15	300018.0	10292573.0	0.5572
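
As an alternative to the two steps above, pandas can parse the dates and set the index while reading the file. The following is a minimal sketch under stated assumptions: the inline CSV text is a stand-in for RELIANCE.csv with only a few columns, and the name sample is hypothetical.

```python
import io
import pandas as pd

# Tiny inline CSV standing in for RELIANCE.csv (assumption: only a few
# columns are shown; values are copied from the output above)
csv_text = """Date,Open,High,Low,Close
2000-01-03,237.50,251.70,237.50,251.70
2000-01-04,258.40,271.85,251.30,271.85
2000-01-05,256.65,287.90,256.65,282.50
"""

# parse_dates converts the Date column to datetime values;
# index_col makes it the index, so no separate drop is needed
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], index_col='Date')
print(sample.index)
```

With a real file, the same two arguments can be passed to pd.read_csv together with the file path.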

Explanatory factors

The response variable is predicted from explanatory (independent) variables. The variables used for prediction are stored in the dataset X, which here contains "Open-Close" and "High-Low". These can be viewed as indicators that the algorithm uses to forecast the price direction for the next day. Feel free to add more indicators and evaluate the results.

Example

# Create predictor variables
df['Open-Close'] = df.Open - df.Close
df['High-Low'] = df.High - df.Low
  
# Store all predictor variables in a variable X
X = df[['Open-Close', 'High-Low']]
X.head()

Output

	Open-Close	High-Low
Date		
2000-01-03	-14.20	14.20
2000-01-04	-13.45	20.55
2000-01-05	-25.85	31.25
2000-01-06	-5.35	11.70
2000-01-07	-19.55	24.90
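
To illustrate adding another indicator, here is a small sketch on the first three rows of the dataset. The Range-Pct column is a hypothetical extra predictor chosen only for illustration, not one used in this article, and the names toy and X_toy are assumptions.

```python
import pandas as pd

# First three rows of the dataset, copied from the output shown earlier
toy = pd.DataFrame(
    {
        'Open': [237.50, 258.40, 256.65],
        'High': [251.70, 271.85, 287.90],
        'Low': [237.50, 251.30, 256.65],
        'Close': [251.70, 271.85, 282.50],
    },
    index=pd.to_datetime(['2000-01-03', '2000-01-04', '2000-01-05']),
)

# The two predictors used in this article
toy['Open-Close'] = toy.Open - toy.Close
toy['High-Low'] = toy.High - toy.Low

# Hypothetical extra predictor: the day's range relative to the closing price
toy['Range-Pct'] = (toy.High - toy.Low) / toy.Close

# Store all predictors, including the extra one, in X_toy
X_toy = toy[['Open-Close', 'High-Low', 'Range-Pct']]
print(X_toy.round(4))
```

Whether an extra indicator helps can only be judged by retraining the model and comparing results on the test data.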

Target variable

The target dataset y contains the trade signal that the machine learning algorithm will try to predict: 1 if the next day's closing price is higher than the current day's closing price, and 0 otherwise.

y = np.where(df['Close'].shift(-1) > df['Close'], 1, 0)
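
To see how this labeling behaves, the sketch below applies the same expression to a made-up series of four closing prices (the values and the names close and y_toy are assumptions for illustration).

```python
import numpy as np
import pandas as pd

# Made-up closing prices for four consecutive days
close = pd.Series([10.0, 11.0, 10.5, 12.0], name='Close')

# shift(-1) aligns each day with the NEXT day's close; the last day gets NaN
next_close = close.shift(-1)

# 1 when the next close is higher, 0 otherwise
y_toy = np.where(next_close > close, 1, 0)
print(y_toy)  # [1 0 1 0]
```

Note that the last day has no following close, so the comparison with NaN is False and the label defaults to 0. Depending on the use case, it may be reasonable to drop that final row before training.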

Splitting the data into train and test

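The data is divided into separate training and test sets. The first 80% of the rows are used for training and the remaining 20% for testing, so the rows stay in chronological order and the model is tested on later dates than it was trained on.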

split_percentage = 0.8
split = int(split_percentage*len(df))
# Train data set
X_train = X[:split]
y_train = y[:split]
# Test data set
X_test = X[split:]
y_test = y[split:]
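
The same slicing can be checked on a small made-up dataset. This is only a sketch: the ten rows, their values, and the names X_toy and y_toy are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Ten made-up daily rows with a datetime index
idx = pd.date_range('2020-01-01', periods=10, freq='D')
X_toy = pd.DataFrame(
    {'Open-Close': np.arange(10.0), 'High-Low': np.arange(10.0)},
    index=idx,
)
y_toy = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])

# Same split logic as above: the first 80% of rows train, the rest test
split = int(0.8 * len(X_toy))
X_train, X_test = X_toy[:split], X_toy[split:]
y_train, y_test = y_toy[:split], y_toy[split:]

print(len(X_train), len(X_test))
# Every training date precedes every test date
print(X_train.index.max() < X_test.index.min())
```

Slicing by position rather than shuffling keeps future rows out of the training set, which matters for time-ordered data such as prices.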

Support Vector Classifier

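Now it’s time to use the support vector classifier. We fit SVC with its default parameters on the training data and then predict the signal for every row in X.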

Example

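# Fit a support vector classifier with default parameters on the training data
cls = SVC().fit(X_train, y_train)
# Predict the signal for every row, covering both the training and test periods
df['prediction'] = cls.predict(X)
print(df['prediction'])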

Output

Date
2000-01-03    1
2000-01-04    1
2000-01-05    1
2000-01-06    1
2000-01-07    1
             ..
2020-05-22    1
2020-05-26    1
2020-05-27    1
2020-05-28    1
2020-05-29    1
Name: prediction, Length: 5075, dtype: int64
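
The accuracy_score function imported at the start can be used to compare predictions with the true labels on the training and test sets. The sketch below uses synthetic data so that it runs on its own; the random features, the labeling rule, and the names X_syn and y_syn are assumptions, not the stock dataset.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the predictors and target (assumption: not real prices)
rng = np.random.default_rng(0)
n = 300
X_syn = rng.normal(size=(n, 2))
# Label depends on the first feature plus noise
y_syn = np.where(X_syn[:, 0] + 0.5 * rng.normal(size=n) < 0, 1, 0)

# Ordered 80/20 split, as in the article
split = int(0.8 * n)
X_train, X_test = X_syn[:split], X_syn[split:]
y_train, y_test = y_syn[:split], y_syn[split:]

# Fit on the training data only, then score both sets
cls = SVC().fit(X_train, y_train)
train_acc = accuracy_score(y_train, cls.predict(X_train))
test_acc = accuracy_score(y_test, cls.predict(X_test))
print(f'Train accuracy: {train_acc:.2f}')
print(f'Test accuracy:  {test_acc:.2f}')
```

In the notebook from this article, the same two accuracy_score calls could be applied to the existing X_train, y_train, X_test, and y_test. The test accuracy is the more meaningful figure, because the model has not seen those rows during fitting.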

Conclusion

Support Vector Machine is a popular, memory-efficient approach for classification and regression that uses geometric concepts to separate classes. In this article we used an SVM classifier to forecast the direction of stock price movement. Stock price forecasting is important in the financial sector, and automating it makes the process repeatable.

Jay Singh

Updated on: 01-Dec-2022
