Predicting Stock Price Direction using Support Vector Machines
In this article we are going to learn how to predict stock price direction using Support Vector Machines.
Machine learning is a branch of artificial intelligence that is changing how work is done in many disciplines. At its core, a machine learning model is an algorithm that identifies patterns in a specific dataset and then applies the learned patterns to new, unseen data. In layman's terms, the machine learns patterns from experience and adjusts itself so that it can draw accurate and repeatable conclusions. Let's begin.
Installing libraries and importing them
First, install the required libraries and import them.
<div class="code-mirror language-python" contenteditable="plaintext-only" spellcheck="false" style="outline: none; overflow-wrap: break-word; overflow-y: auto; white-space: pre-wrap;"># Install the required packages (the "!" prefix runs a shell command from a notebook cell)
!pip install pandas numpy scikit-learn matplotlib

import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

# Suppress library warnings to keep the notebook output readable
warnings.filterwarnings('ignore')
</div>
Downloading and reading the stock dataset
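The next step is to read the dataset. You can download the dataset from here. The code below assumes the CSV file has been uploaded to /content/sample_data/ (the default sample-data folder in Google Colab), so adjust the path to wherever you saved it. We use pandas to read the file.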
Example
<div class="code-mirror language-python" contenteditable="plaintext-only" spellcheck="false" style="outline: none; overflow-wrap: break-word; overflow-y: auto; white-space: pre-wrap;"># Read the CSV into a DataFrame and show the first five rows
df = pd.read_csv('/content/sample_data/RELIANCE.csv')
df.head()
</div>
Output
         Date    Symbol Series  Prev Close    Open    High     Low    Last   Close    VWAP    Volume      Turnover  Trades  Deliverable Volume  %Deliverble
0  2000-01-03  RELIANCE     EQ      233.05  237.50  251.70  237.50  251.70  251.70  249.37   4456424  1.111319e+14     NaN                 NaN          NaN
1  2000-01-04  RELIANCE     EQ      251.70  258.40  271.85  251.30  271.85  271.85  263.52   9487878  2.500222e+14     NaN                 NaN          NaN
2  2000-01-05  RELIANCE     EQ      271.85  256.65  287.90  256.65  286.75  282.50  274.79  26833684  7.373697e+14     NaN                 NaN          NaN
3  2000-01-06  RELIANCE     EQ      282.50  289.00  300.70  289.00  293.50  294.35  295.45  15682286  4.633254e+14     NaN                 NaN          NaN
4  2000-01-07  RELIANCE     EQ      294.35  295.00  317.90  293.00  314.50  314.55  308.91  19870977  6.138388e+14     NaN                 NaN          NaN
Data Preparation
To make the data easier to analyze, use the Date column as the DataFrame's index.
Example
<div class="code-mirror language-python" contenteditable="plaintext-only" spellcheck="false" style="outline: none; overflow-wrap: break-word; overflow-y: auto; white-space: pre-wrap;"># Use the Date column, parsed as datetimes, as the index
df.index = pd.to_datetime(df['Date'])

# Drop the original Date column, which is now redundant
df = df.drop(['Date'], axis='columns')
df
</div>
Output
              Symbol Series  Prev Close     Open     High      Low     Last    Close     VWAP    Volume      Turnover    Trades  Deliverable Volume  %Deliverble
Date
2000-01-03  RELIANCE     EQ      233.05   237.50   251.70   237.50   251.70   251.70   249.37   4456424  1.111319e+14       NaN                 NaN          NaN
2000-01-04  RELIANCE     EQ      251.70   258.40   271.85   251.30   271.85   271.85   263.52   9487878  2.500222e+14       NaN                 NaN          NaN
2000-01-05  RELIANCE     EQ      271.85   256.65   287.90   256.65   286.75   282.50   274.79  26833684  7.373697e+14       NaN                 NaN          NaN
2000-01-06  RELIANCE     EQ      282.50   289.00   300.70   289.00   293.50   294.35   295.45  15682286  4.633254e+14       NaN                 NaN          NaN
2000-01-07  RELIANCE     EQ      294.35   295.00   317.90   293.00   314.50   314.55   308.91  19870977  6.138388e+14       NaN                 NaN          NaN
...              ...    ...         ...      ...      ...      ...      ...      ...      ...       ...           ...       ...                 ...          ...
2020-05-22  RELIANCE     EQ     1441.25  1451.80  1458.00  1426.50  1433.00  1431.55  1442.31  17458503  2.518059e+15  388907.0           4083814.0       0.2339
2020-05-26  RELIANCE     EQ     1431.55  1448.15  1449.70  1416.30  1426.00  1424.05  1428.70  15330793  2.190317e+15  341795.0           7437964.0       0.4852
2020-05-27  RELIANCE     EQ     1424.05  1431.00  1454.00  1412.00  1449.85  1445.55  1430.20  16460764  2.354223e+15  348477.0           6524302.0       0.3964
2020-05-28  RELIANCE     EQ     1445.55  1455.00  1479.75  1449.00  1471.05  1472.25  1467.50  18519252  2.717698e+15  405603.0           8377100.0       0.4523
2020-05-29  RELIANCE     EQ     1472.25  1468.00  1472.00  1452.65  1470.00  1464.40  1462.79  18471770  2.702029e+15  300018.0          10292573.0       0.5572
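As an alternative to converting and dropping the column afterwards, pandas can parse the dates and set the index while reading the file, using the parse_dates and index_col parameters of read_csv. The sketch below reads a small inline CSV (a stand-in holding only the first two rows and a few columns of the dataset) so that it runs without the file; with the real data you would pass the file path instead. The name df_small is illustrative only.

```python
import io

import pandas as pd

# Inline stand-in for RELIANCE.csv: first two rows, subset of the columns
csv_text = """Date,Symbol,Series,Open,High,Low,Close
2000-01-03,RELIANCE,EQ,237.50,251.70,237.50,251.70
2000-01-04,RELIANCE,EQ,258.40,271.85,251.30,271.85
"""

# parse_dates converts Date to datetime64 and index_col makes it the index,
# so no separate to_datetime / drop step is needed
df_small = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], index_col='Date')
print(df_small)
print(df_small.index.dtype)
```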
Explanatory factors
Explanatory (independent) variables are used to predict the value of the response variable. The variables used for prediction are stored in the dataset X, which here contains "Open-Close" (the day's opening price minus its closing price) and "High-Low" (the day's trading range). These can be viewed as indicators that the algorithm uses to forecast the next day's price direction. Feel free to add more indicators and compare the results.
Example
<div class="code-mirror language-python" contenteditable="plaintext-only" spellcheck="false" style="outline: none; overflow-wrap: break-word; overflow-y: auto; white-space: pre-wrap;"># Create predictor variables
df['Open-Close'] = df.Open - df.Close
df['High-Low'] = df.High - df.Low

# Store all predictor variables in a variable X
X = df[['Open-Close', 'High-Low']]
X.head()
</div>
Output
            Open-Close  High-Low
Date
2000-01-03      -14.20     14.20
2000-01-04      -13.45     20.55
2000-01-05      -25.85     31.25
2000-01-06       -5.35     11.70
2000-01-07      -19.55     24.90
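To see where these numbers come from, the sketch below recomputes both indicators for the first two rows of the dataset output shown earlier (only the four price columns are copied; the DataFrame name prices is illustrative). A negative Open-Close means the stock closed above its open, and High-Low measures how far the price moved during the day.

```python
import pandas as pd

# Open/High/Low/Close for 2000-01-03 and 2000-01-04, copied from the dataset output
prices = pd.DataFrame(
    {'Open': [237.50, 258.40], 'High': [251.70, 271.85],
     'Low': [237.50, 251.30], 'Close': [251.70, 271.85]},
    index=pd.to_datetime(['2000-01-03', '2000-01-04']),
)

# Negative when the day closed higher than it opened
prices['Open-Close'] = prices['Open'] - prices['Close']
# The day's trading range, a rough volatility measure
prices['High-Low'] = prices['High'] - prices['Low']
print(prices[['Open-Close', 'High-Low']].round(2))
```

The rounded values match the first two rows of X.head() above.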
Target variable
The target variable y holds the trade signal that the machine learning algorithm will try to predict: 1 if the next day's closing price is higher than the current day's, and 0 otherwise.
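<div class="code-mirror language-python" contenteditable="plaintext-only" spellcheck="false" style="outline: none; overflow-wrap: break-word; overflow-y: auto; white-space: pre-wrap;"># 1 if the next day's close is higher than today's close, otherwise 0.
# The last row has no next-day close (shift(-1) gives NaN), so it is labelled 0.
y = np.where(df['Close'].shift(-1) &gt; df['Close'], 1, 0)
</div>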
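Here is how the shift(-1) comparison labels each day, using the closing prices of the first five trading days from the dataset output. shift(-1) lines each day up with the following day's close, and because the last day has no following close, its comparison is against NaN and evaluates to False.

```python
import numpy as np
import pandas as pd

# Close prices for 2000-01-03 .. 2000-01-07, copied from the dataset output
close = pd.Series([251.70, 271.85, 282.50, 294.35, 314.55],
                  index=pd.to_datetime(['2000-01-03', '2000-01-04', '2000-01-05',
                                        '2000-01-06', '2000-01-07']))

# Align each day with the NEXT day's close; the last entry becomes NaN
next_close = close.shift(-1)

# 1 = next close is higher, 0 = otherwise (comparisons with NaN are False)
y = np.where(next_close > close, 1, 0)
print(y)  # [1 1 1 1 0]
```

The final 0 is not a genuine "down" signal, only the missing next-day value. In the full dataset this affects just the last row, which you may prefer to drop before training.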
Splitting the data into training and test sets
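Next, split the data into separate training and test sets. Because this is time-series data, the split is chronological rather than random: the first 80% of the rows are used for training and the remaining 20% for testing.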
<div class="code-mirror language-python" contenteditable="plaintext-only" spellcheck="false" style="outline: none; overflow-wrap: break-word; overflow-y: auto; white-space: pre-wrap;"># Chronological 80/20 split: no shuffling, so the test period comes after the training period
split_percentage = 0.8
split = int(split_percentage*len(df))

# Train data set
X_train = X[:split]
y_train = y[:split]

# Test data set
X_test = X[split:]
y_test = y[split:]
</div>
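The 80/20 split above gives a single train/test estimate. One option not used in this article is scikit-learn's TimeSeriesSplit, which produces several expanding-window folds in which each test block always comes after its training block. The sketch below compares the two on ten hypothetical, time-ordered samples (X_demo is toy data).

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Ten time-ordered toy samples (one feature each)
X_demo = np.arange(10).reshape(-1, 1)

# The article's single chronological split: first 80% train, last 20% test
split = int(0.8 * len(X_demo))
train_idx, test_idx = np.arange(split), np.arange(split, len(X_demo))
print('80/20 split:', train_idx, test_idx)

# Expanding-window folds: the training window grows, the test block moves forward
tscv = TimeSeriesSplit(n_splits=3)
folds = list(tscv.split(X_demo))
for fold_train, fold_test in folds:
    print('fold:', fold_train, fold_test)
```

Averaging a metric over such folds gives a less noisy estimate than one split, while still never training on data from the future.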
Support Vector Classifier
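Now it's time to use the support vector classifier. We fit scikit-learn's SVC on the training data and then use it to predict the direction for every day in the dataset.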
Example
<div class="code-mirror language-python" contenteditable="plaintext-only" spellcheck="false" style="outline: none; overflow-wrap: break-word; overflow-y: auto; white-space: pre-wrap;"># Fit a support vector classifier (default RBF kernel) on the training data
cls = SVC().fit(X_train, y_train)

# Predict the next-day direction for every row (training and test periods)
df['prediction'] = cls.predict(X)
print(df['prediction'])
</div>
Output
Date
2000-01-03 1
2000-01-04 1
2000-01-05 1
2000-01-06 1
2000-01-07 1
..
2020-05-22 1
2020-05-26 1
2020-05-27 1
2020-05-28 1
2020-05-29 1
Name: prediction, Length: 5075, dtype: int64
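The article imports accuracy_score but does not use it. A natural next step is to score the classifier on the held-out test set and compare it with a majority-class baseline: the rows printed above are all 1, and a model that mostly predicts "up" can look accurate simply because up days are common. The sketch below assumes hypothetical random-walk prices so it runs without RELIANCE.csv; with the real data, skip the synthetic block and reuse the X, y and split prepared earlier. It also tries a variant not in the article, standardizing the features with StandardScaler before the SVC, since the RBF kernel is sensitive to feature scale.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical synthetic prices (a random walk), only so the sketch is runnable
rng = np.random.default_rng(0)
n = 500
close = 100 + np.cumsum(rng.normal(0, 1, n))
open_ = close + rng.normal(0, 0.5, n)
high = np.maximum(open_, close) + rng.uniform(0, 1, n)
low = np.minimum(open_, close) - rng.uniform(0, 1, n)
df = pd.DataFrame({'Open': open_, 'High': high, 'Low': low, 'Close': close},
                  index=pd.bdate_range('2020-01-01', periods=n))

# Same features, target and chronological 80/20 split as in the article
df['Open-Close'] = df.Open - df.Close
df['High-Low'] = df.High - df.Low
X = df[['Open-Close', 'High-Low']]
y = np.where(df['Close'].shift(-1) > df['Close'], 1, 0)
split = int(0.8 * len(df))
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y[:split], y[split:]

# SVC with default settings (RBF kernel), as in the article
cls = SVC().fit(X_train, y_train)
train_acc = accuracy_score(y_train, cls.predict(X_train))
test_acc = accuracy_score(y_test, cls.predict(X_test))

# Majority-class baseline: accuracy of always predicting the more frequent test label
baseline = max(y_test.mean(), 1 - y_test.mean())

# Variant (not in the article): scale features before the RBF-kernel SVC
scaled = make_pipeline(StandardScaler(), SVC()).fit(X_train, y_train)
scaled_test_acc = accuracy_score(y_test, scaled.predict(X_test))

print(f'train accuracy:          {train_acc:.3f}')
print(f'test accuracy:           {test_acc:.3f}')
print(f'majority-class baseline: {baseline:.3f}')
print(f'scaled test accuracy:    {scaled_test_acc:.3f}')
```

Because the synthetic features carry no information about the next day's direction, any accuracy this sketch reports reflects chance rather than skill. On the real data, a test accuracy that does not clearly beat the baseline suggests the two indicators are not predictive on their own.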
Conclusion
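Support Vector Machines are a popular, memory-efficient approach to classification and regression. They rely on geometry: the classifier chooses the decision boundary that separates the classes with the largest possible margin, and only the training points closest to that boundary (the support vectors) are needed to define it. In this article we used an SVM classifier to forecast the direction of the next day's stock price movement from two simple price-based indicators. Stock price forecasting is important in the corporate and financial sector, and automating it with a model like this makes the process systematic and repeatable.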
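Geometrically, a linear SVM looks for the hyperplane w·x + b = 0 that separates the two classes with the widest margin, and that margin has width 2 / ||w||. The sketch below fits a linear-kernel SVC to a handful of hypothetical, linearly separable 2-D points and reads the margin from coef_. A large C is used so the soft-margin solution matches the hard-margin one. For these points the closest members of the two classes are (1, 0), (0, 1) and (3, 3), so the margin should come out close to 5/√2 ≈ 3.54. Note that the article's SVC() uses the default RBF kernel, where the same idea is applied in an implicit transformed feature space and coef_ is not available.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: class 0 near the origin, class 1 near (3, 3)
X_toy = np.array([[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]], dtype=float)
y_toy = np.array([0, 0, 0, 1, 1, 1])

# Linear kernel with a large C approximates the hard-margin SVM
svm = SVC(kernel='linear', C=1000).fit(X_toy, y_toy)

w = svm.coef_[0]                 # normal vector of the separating hyperplane
margin = 2 / np.linalg.norm(w)   # distance between the two margin boundaries
print('support vectors:\n', svm.support_vectors_)
print('margin width:', round(margin, 3))
print('predictions:', svm.predict([[0.5, 0.5], [3.5, 3.5]]))
```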
