Predicting Stock Price Direction using Support Vector Machines

In this article we are going to learn how to predict stock price direction using Support Vector Machines.

Machine learning is a branch of artificial intelligence that is changing how work is done in many disciplines. At its core, a machine learning model identifies patterns in a specific data set and then applies those learned patterns to make predictions on new data. In layman's terms, a machine learns a pattern and adjusts through experience to reach correct and repeatable conclusions. Let's begin.

Installing libraries and importing them

In the first step we just need to install the libraries and import them.

<div class="code-mirror  language-python" contenteditable="plaintext-only" spellcheck="false" style="outline: none; overflow-wrap: break-word; overflow-y: auto; white-space: pre-wrap;">!pip install pandas
!pip install numpy
!pip install scikit<span class="token operator">-</span>learn matplotlib
<span class="token keyword">import</span> pandas <span class="token keyword">as</span> pd
<span class="token keyword">import</span> numpy <span class="token keyword">as</span> np
<span class="token keyword">from</span> sklearn<span class="token punctuation">.</span>svm <span class="token keyword">import</span> SVC
<span class="token keyword">from</span> sklearn<span class="token punctuation">.</span>metrics <span class="token keyword">import</span> accuracy_score
<span class="token keyword">import</span> matplotlib<span class="token punctuation">.</span>pyplot <span class="token keyword">as</span> plt
<span class="token keyword">import</span> warnings
</div>

Downloading and reading stock dataset

The next step is to read the dataset from a file. You can download the dataset from here; the example below expects it at /content/sample_data/RELIANCE.csv, so adjust the path to wherever you saved the file. We use pandas to read it.

Example

<div class="code-mirror  language-python" contenteditable="plaintext-only" spellcheck="false" style="outline: none; overflow-wrap: break-word; overflow-y: auto; white-space: pre-wrap;">df <span class="token operator">=</span> pd<span class="token punctuation">.</span>read_csv<span class="token punctuation">(</span><span class="token string">'/content/sample_data/RELIANCE.csv'</span><span class="token punctuation">)</span>
df<span class="token punctuation">.</span>head<span class="token punctuation">(</span><span class="token punctuation">)</span>
</div>

Output

	Date	Symbol	Series	Prev Close	Open	High	Low	Last	Close	VWAP	Volume	Turnover	Trades	Deliverable Volume	%Deliverble
0	2000-01-03	RELIANCE	EQ	233.05	237.50	251.70	237.50	251.70	251.70	249.37	4456424	1.111319e+14	NaN	NaN	NaN
1	2000-01-04	RELIANCE	EQ	251.70	258.40	271.85	251.30	271.85	271.85	263.52	9487878	2.500222e+14	NaN	NaN	NaN
2	2000-01-05	RELIANCE	EQ	271.85	256.65	287.90	256.65	286.75	282.50	274.79	26833684	7.373697e+14	NaN	NaN	NaN
3	2000-01-06	RELIANCE	EQ	282.50	289.00	300.70	289.00	293.50	294.35	295.45	15682286	4.633254e+14	NaN	NaN	NaN
4	2000-01-07	RELIANCE	EQ	294.35	295.00	317.90	293.00	314.50	314.55	308.91	19870977	6.138388e+14	NaN	NaN	NaN

Data Preparation

Before analyzing the data, convert the Date column to datetime values and use it as the index, so that every row is identified by its trading day.

Example

<div class="code-mirror  language-python" contenteditable="plaintext-only" spellcheck="false" style="outline: none; overflow-wrap: break-word; overflow-y: auto; white-space: pre-wrap;"><span class="token comment"># Use the Date column, parsed as datetime, as the index</span>
df<span class="token punctuation">.</span>index <span class="token operator">=</span> pd<span class="token punctuation">.</span>to_datetime<span class="token punctuation">(</span>df<span class="token punctuation">[</span><span class="token string">'Date'</span><span class="token punctuation">]</span><span class="token punctuation">)</span>

<span class="token comment"># Drop the original Date column, which is now redundant</span>
df <span class="token operator">=</span> df<span class="token punctuation">.</span>drop<span class="token punctuation">(</span><span class="token punctuation">[</span><span class="token string">'Date'</span><span class="token punctuation">]</span><span class="token punctuation">,</span> axis<span class="token operator">=</span><span class="token string">'columns'</span><span class="token punctuation">)</span>
df
</div>

Output

	Symbol	Series	Prev Close	Open	High	Low	Last	Close	VWAP	Volume	Turnover	Trades	Deliverable Volume	%Deliverble
Date														
2000-01-03	RELIANCE	EQ	233.05	237.50	251.70	237.50	251.70	251.70	249.37	4456424	1.111319e+14	NaN	NaN	NaN
2000-01-04	RELIANCE	EQ	251.70	258.40	271.85	251.30	271.85	271.85	263.52	9487878	2.500222e+14	NaN	NaN	NaN
2000-01-05	RELIANCE	EQ	271.85	256.65	287.90	256.65	286.75	282.50	274.79	26833684	7.373697e+14	NaN	NaN	NaN
2000-01-06	RELIANCE	EQ	282.50	289.00	300.70	289.00	293.50	294.35	295.45	15682286	4.633254e+14	NaN	NaN	NaN
2000-01-07	RELIANCE	EQ	294.35	295.00	317.90	293.00	314.50	314.55	308.91	19870977	6.138388e+14	NaN	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2020-05-22	RELIANCE	EQ	1441.25	1451.80	1458.00	1426.50	1433.00	1431.55	1442.31	17458503	2.518059e+15	388907.0	4083814.0	0.2339
2020-05-26	RELIANCE	EQ	1431.55	1448.15	1449.70	1416.30	1426.00	1424.05	1428.70	15330793	2.190317e+15	341795.0	7437964.0	0.4852
2020-05-27	RELIANCE	EQ	1424.05	1431.00	1454.00	1412.00	1449.85	1445.55	1430.20	16460764	2.354223e+15	348477.0	6524302.0	0.3964
2020-05-28	RELIANCE	EQ	1445.55	1455.00	1479.75	1449.00	1471.05	1472.25	1467.50	18519252	2.717698e+15	405603.0	8377100.0	0.4523
2020-05-29	RELIANCE	EQ	1472.25	1468.00	1472.00	1452.65	1470.00	1464.40	1462.79	18471770	2.702029e+15	300018.0	10292573.0	0.5572
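
The same indexing can also be done in a single step while reading the file. The following is a minimal sketch, assuming a small inline CSV (three rows and four columns copied from the output above) in place of the real file; the name `demo` is only for illustration.

```python
import io
import pandas as pd

# Three rows copied from the output above, inlined so the sketch runs without the file.
csv_text = """Date,Symbol,Open,Close
2000-01-03,RELIANCE,237.50,251.70
2000-01-04,RELIANCE,258.40,271.85
2000-01-05,RELIANCE,256.65,282.50
"""

# parse_dates converts the column to datetime values; index_col makes it the
# index and removes it from the regular columns.
demo = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(demo.index.dtype)
print(list(demo.columns))
```

With the real dataset, the file path would take the place of `io.StringIO(csv_text)`.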

Explanatory factors

The response variable is predicted from explanatory (independent) variables. These predictors are stored in the X dataset. Here X contains two of them: "Open-Close", the opening price minus the closing price, and "High-Low", the day's high minus its low. The algorithm uses these as indicators to forecast the direction for the following day. Feel free to include more indicators and compare the results.

Example

<div class="code-mirror  language-python" contenteditable="plaintext-only" spellcheck="false" style="outline: none; overflow-wrap: break-word; overflow-y: auto; white-space: pre-wrap;"><span class="token comment"># Create predictor variables</span>
df<span class="token punctuation">[</span><span class="token string">'Open-Close'</span><span class="token punctuation">]</span> <span class="token operator">=</span> df<span class="token punctuation">.</span>Open <span class="token operator">-</span> df<span class="token punctuation">.</span>Close
df<span class="token punctuation">[</span><span class="token string">'High-Low'</span><span class="token punctuation">]</span> <span class="token operator">=</span> df<span class="token punctuation">.</span>High <span class="token operator">-</span> df<span class="token punctuation">.</span>Low
  
<span class="token comment"># Store all predictor variables in a variable X</span>
X <span class="token operator">=</span> df<span class="token punctuation">[</span><span class="token punctuation">[</span><span class="token string">'Open-Close'</span><span class="token punctuation">,</span> <span class="token string">'High-Low'</span><span class="token punctuation">]</span><span class="token punctuation">]</span>
X<span class="token punctuation">.</span>head<span class="token punctuation">(</span><span class="token punctuation">)</span>
</div>

Output

	Open-Close	High-Low
Date		
2000-01-03	-14.20	14.20
2000-01-04	-13.45	20.55
2000-01-05	-25.85	31.25
2000-01-06	-5.35	11.70
2000-01-07	-19.55	24.90
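
As an illustration of including more indicators, here is a minimal sketch that adds two extra predictors. It runs on synthetic prices rather than the real dataset, and the extra columns `Return` and `Close-MA5` are hypothetical choices, not part of the original example.

```python
import numpy as np
import pandas as pd

# Synthetic prices standing in for the real dataset (assumption for illustration).
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=30, freq="B")
close = pd.Series(100 + rng.normal(0, 1, 30).cumsum(), index=dates)
demo = pd.DataFrame({
    "Open": close.shift(1).fillna(close.iloc[0]),
    "Close": close,
})
demo["High"] = demo[["Open", "Close"]].max(axis=1) + 0.5
demo["Low"] = demo[["Open", "Close"]].min(axis=1) - 0.5

# The two predictors used in the article.
demo["Open-Close"] = demo["Open"] - demo["Close"]
demo["High-Low"] = demo["High"] - demo["Low"]

# Two hypothetical extra predictors: the daily return and the gap between
# the close and its 5-day moving average.
demo["Return"] = demo["Close"].pct_change()
demo["Close-MA5"] = demo["Close"] - demo["Close"].rolling(5).mean()

# pct_change and rolling leave NaN in the first rows, and SVC does not accept NaN,
# so those rows are dropped.
features = demo[["Open-Close", "High-Low", "Return", "Close-MA5"]].dropna()
print(features.shape)
```

If rows are dropped from the predictors like this, the target must be restricted to the same dates so that both stay aligned.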

Target variable

The target dataset y contains the trade signal that the machine learning algorithm will try to predict. It is 1 when the next day's closing price is higher than the current day's closing price, and 0 otherwise.

y = np.where(df['Close'].shift(-1) > df['Close'], 1, 0)
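
To see how this labeling behaves, here is a small worked sketch on five made-up closing prices (the numbers are illustrative, not taken from the dataset). Note that the last row has no following day, so the comparison with the missing value is false and the row is labeled 0 by default.

```python
import numpy as np
import pandas as pd

close = pd.Series([10.0, 12.0, 11.0, 11.0, 13.0])

# shift(-1) moves each next-day close up one row, so row i is compared with row i+1.
next_close = close.shift(-1)
labels = np.where(next_close > close, 1, 0)

print(labels.tolist())  # [1, 0, 0, 1, 0]
```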

Splitting the data into train and test

The data is split chronologically into separate training and test sets: the first 80% of the rows are used for training and the remaining 20% for testing.

<div class="code-mirror  language-python" contenteditable="plaintext-only" spellcheck="false" style="outline: none; overflow-wrap: break-word; overflow-y: auto; white-space: pre-wrap;">split_percentage <span class="token operator">=</span> <span class="token number">0.8</span>
split <span class="token operator">=</span> <span class="token builtin">int</span><span class="token punctuation">(</span>split_percentage<span class="token operator">*</span><span class="token builtin">len</span><span class="token punctuation">(</span>df<span class="token punctuation">)</span><span class="token punctuation">)</span>
<span class="token comment"># Train data set</span>
X_train <span class="token operator">=</span> X<span class="token punctuation">[</span><span class="token punctuation">:</span>split<span class="token punctuation">]</span>
y_train <span class="token operator">=</span> y<span class="token punctuation">[</span><span class="token punctuation">:</span>split<span class="token punctuation">]</span>
<span class="token comment"># Test data set</span>
X_test <span class="token operator">=</span> X<span class="token punctuation">[</span>split<span class="token punctuation">:</span><span class="token punctuation">]</span>
y_test <span class="token operator">=</span> y<span class="token punctuation">[</span>split<span class="token punctuation">:</span><span class="token punctuation">]</span>
</div>
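
Because the rows are ordered by date, slicing without shuffling keeps every training day earlier than every test day. The sketch below wraps the same slicing in a hypothetical helper, `chronological_split`, and checks that property on a small made-up dataset.

```python
import numpy as np
import pandas as pd

def chronological_split(X, y, train_fraction=0.8):
    """Split without shuffling: the first rows train, the remaining rows test."""
    split = int(train_fraction * len(X))
    return X[:split], X[split:], y[:split], y[split:]

# Made-up data: ten consecutive days with one feature and arbitrary labels.
dates = pd.date_range("2020-01-01", periods=10, freq="D")
X_demo = pd.DataFrame({"feature": np.arange(10.0)}, index=dates)
y_demo = np.array([0, 1, 0, 1, 1, 0, 1, 0, 1, 1])

X_tr, X_te, y_tr, y_te = chronological_split(X_demo, y_demo)

print(len(X_tr), len(X_te))
# True when the latest training date precedes the earliest test date.
print(X_tr.index.max() < X_te.index.min())
```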

Support Vector Classifier

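Now it's time to use the support vector classifier. We fit scikit-learn's SVC with its default parameters on the training set and then predict the direction for every row of X.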

Example

<div class="code-mirror  language-python" contenteditable="plaintext-only" spellcheck="false" style="outline: none; overflow-wrap: break-word; overflow-y: auto; white-space: pre-wrap;">cls <span class="token operator">=</span> SVC<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">.</span>fit<span class="token punctuation">(</span>X_train<span class="token punctuation">,</span> y_train<span class="token punctuation">)</span>
df<span class="token punctuation">[</span><span class="token string">'prediction'</span><span class="token punctuation">]</span> <span class="token operator">=</span> cls<span class="token punctuation">.</span>predict<span class="token punctuation">(</span>X<span class="token punctuation">)</span>
<span class="token keyword">print</span><span class="token punctuation">(</span>df<span class="token punctuation">[</span><span class="token string">'prediction'</span><span class="token punctuation">]</span><span class="token punctuation">)</span>
</div>

Output

Date
2000-01-03    1
2000-01-04    1
2000-01-05    1
2000-01-06    1
2000-01-07    1
             ..
2020-05-22    1
2020-05-26    1
2020-05-27    1
2020-05-28    1
2020-05-29    1
Name: prediction, Length: 5075, dtype: int64
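
Predictions alone do not show how well the classifier performs. One way to check is to compare training and test accuracy with accuracy_score. The following is a minimal sketch on synthetic random-walk prices, not the real dataset; the majority-class baseline is an added point of comparison and an assumption of this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic random-walk prices standing in for the real dataset (assumption).
rng = np.random.default_rng(42)
n = 300
close = 100 + rng.normal(0, 1, n).cumsum()
open_ = close + rng.normal(0, 0.5, n)
high = np.maximum(open_, close) + rng.uniform(0, 1, n)
low = np.minimum(open_, close) - rng.uniform(0, 1, n)
demo = pd.DataFrame({"Open": open_, "High": high, "Low": low, "Close": close})

# Same predictors and target as in the article.
X_demo = pd.DataFrame({
    "Open-Close": demo["Open"] - demo["Close"],
    "High-Low": demo["High"] - demo["Low"],
})
y_demo = np.where(demo["Close"].shift(-1) > demo["Close"], 1, 0)

# The last row has no next day, so its label is undefined; leave it out.
X_demo, y_demo = X_demo[:-1], y_demo[:-1]

# Chronological 80/20 split.
split = int(0.8 * len(X_demo))
X_tr, X_te = X_demo[:split], X_demo[split:]
y_tr, y_te = y_demo[:split], y_demo[split:]

model = SVC().fit(X_tr, y_tr)

train_acc = accuracy_score(y_tr, model.predict(X_tr))
test_acc = accuracy_score(y_te, model.predict(X_te))

# Baseline: accuracy of always predicting the more frequent class in the test set.
majority_acc = max(y_te.mean(), 1 - y_te.mean())

print(f"Train accuracy: {train_acc:.3f}")
print(f"Test accuracy: {test_acc:.3f}")
print(f"Majority-class baseline: {majority_acc:.3f}")
```

A test accuracy close to the baseline suggests the model has learned little beyond predicting the more common direction.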

Conclusion

The Support Vector Machine is a popular, memory-efficient approach to classification and regression that uses geometric concepts, separating the classes with a boundary in feature space. In this article we used an SVM classifier to forecast the direction of stock price movement from two simple predictors. Stock price forecasting is important in the corporate sector, and automating it in this way gives a repeatable process that can be evaluated and refined.

Updated on: 2022-12-01T05:34:17+05:30
