
Time Series Forecast of Yahoo Finance Data

Yahoo Finance is by far the most popular source of stock market data and other financial information. It is a site that offers financial news, information, and analysis about businesses, markets, and industries. Additionally, users can access stock quotes, stock charts, and other financial data to track their portfolios and get individualized financial advice. Yahoo Finance also provides tools such as financial calculators and other resources to help consumers make educated investing decisions.

Prerequisites

To follow this article, you will need the following installed and activated on your PC. Also, feel free to fork the GitHub Repo for the code and other necessary materials.

yfinance Library

The yfinance library in Python allows you to easily interact with the Yahoo Finance API to retrieve historical financial data such as stock market information, financial statements, and stock quotes. The library provides access to data such as historical prices, dividends, and splits for a given stock.

The yfinance library can be installed in your Python environment with a simple command: pip install yfinance.
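As a quick sanity check, here is a minimal sketch (the “GOOGL” ticker is simply the same ticker used later in this article) that installs the library and pulls the dividend and split history mentioned above:


#Install the library first (run in a terminal or notebook cell):
# pip install yfinance
import yfinance as yf

#Quick sanity check: pull dividend and split history for a sample ticker
ticker = yf.Ticker("GOOGL")
print(ticker.dividends.tail())  #most recent dividend payments (empty Series if none)
print(ticker.splits.tail())     #most recent stock splits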

Now that the yfinance library is installed, let’s get into the project at hand.

Disclaimer: I am not a financial expert. Therefore, this article is for educational purposes only and aims to show how to pull Yahoo Finance API data using the Python library.

Import all Necessary Libraries

The following libraries are needed to pull price data directly from the Yahoo Finance API and build a forecasting model on top of it.


#Import the libraries
import yfinance as yf
import math
import pandas_datareader as web
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt
import datetime
plt.style.use("fivethirtyeight")


 

Get Ticker Data from the Yahoo Finance site

A ticker symbol in finance is a unique identifier used to identify a publicly traded company and its stock.

Stocks are listed on various stock exchanges such as NASDAQ, the New York Stock Exchange (NYSE), the Tokyo Stock Exchange (TSE), the London Stock Exchange (LSE), and many more.
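As an illustration, here is a minimal sketch that looks up basic information for a ticker symbol (the metadata keys shown are assumptions and can vary between tickers, which is why .get() is used):


#Look up basic metadata for a ticker symbol
goog = yf.Ticker("GOOGL")
info = goog.info  #dictionary of company metadata
print(info.get("shortName"), "-", info.get("exchange"))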

Create a Ticker variable

Let’s pull Google’s financial data from the Yahoo Finance API using the yfinance library, from a start date of 1 January 2010 through today’s date. Google’s ticker symbol is “GOOGL”.


# Create Ticker variables
goog = yf.Ticker("GOOGL")
#Set the time range
goog_hist = goog.history(start=datetime.datetime(2010, 1, 1),end=datetime.datetime.today())
goog_hist.head(20)


Visualization

Create a line chart to check the trend of Google’s closing price over the whole time period.


#Visualization of the Closing price
plt.figure(figsize=(16,8))
plt.title("Closing Price History")
plt.plot(goog_hist["Close"])
plt.xlabel("Date", fontsize=18)
plt.ylabel("Closing Price USD $", fontsize=18)
plt.show()

Create a New DataFrame of the “Close” Column

Keep only the “Close” column and drop the rest, as our analysis will be based on it.


#Create a new dataframe with only the Close column
data = goog_hist.filter(["Close"])
#Convert the dataframe to a numpy array
dataset = data.values
#Get the number of rows to train the model on (80% of the dataset)
training_data_len = math.ceil( len(dataset) * .8)
training_data_len


Standardize our Data

Data standardization transforms data so that it has a mean of zero and a standard deviation of one. The purpose of scaling data is to put all values on a uniform scale so that they can be compared and modeled consistently.

Here we will use Min-Max scaling, a related technique that transforms the data to lie within a specific range, for example between 0 and 1.
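For intuition, Min-Max scaling maps each value x to (x - min) / (max - min). Here is a tiny sketch with made-up prices (illustrative values only, not real market data):


#Illustrative only: Min-Max scaling by hand on made-up prices
import numpy as np
prices = np.array([[100.0], [150.0], [200.0]])
scaled = (prices - prices.min()) / (prices.max() - prices.min())
print(scaled)  #[[0.], [0.5], [1.]]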


#Scale the data to the 0-1 range
scaler = MinMaxScaler(feature_range=(0,1))
scaled_data = scaler.fit_transform(dataset)
scaled_data
#fit_transform learns the min and max from the data and maps every value into the 0-1 range


Create the Training Dataset

The data will be split into x_train and y_train using a sliding window of 60 time steps: each x_train sample contains the previous 60 scaled closing prices, and the corresponding y_train value is the price on the following day.


#Create the scaled training dataset
train_data = scaled_data[0:training_data_len , :]
#Split the data into x_train & y_train
x_train = []
y_train = []
for i in range(60, len(train_data)):
    #Each sample holds the previous 60 scaled prices; the label is the next day's price
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])
    if i <= 61:
        print(x_train)
        print(y_train)
        print()

#Convert the x_train and y_train to NumPy arrays
x_train, y_train = np.array(x_train), np.array(y_train)

 


#Reshape the data to the 3-dimensional shape (samples, time steps, features) expected by the LSTM
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
x_train.shape
#Now x_train has a 3-dimensional shape


Create your Model

The model used for this forecast will be an LSTM model. A Long Short-Term Memory (LSTM) network is an advanced recurrent neural network capable of learning order dependence in sequence prediction problems. In addition, it mitigates the vanishing gradient problem of standard RNNs.

Let’s build the LSTM Model

Let’s start by importing all the necessary classes for the neural network.

from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.models import Sequential

 

Create a Simple LSTM Model


#Build the LSTM model
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(x_train.shape[1], 1)))#50 LSTM units; return the full sequence to the next LSTM layer
model.add(LSTM(50, return_sequences=False))
model.add(Dense(25))
model.add(Dense(1))#Final output: a single predicted price
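To verify the stacked architecture, you can print a summary of the layers and parameter counts for the model defined above:


#Inspect the layer stack and parameter counts
model.summary()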


Compile Model

Before training a model in TensorFlow Keras, the learning process is configured using the compile() method. It takes arguments such as the optimizer and the loss function.

  • Optimizer: the optimization algorithm to be used during training. Common choices include ‘adam’, ‘sgd’, and ‘rmsprop’.
  • Loss: the loss function to be used during training. Popular options include mean squared error, categorical cross-entropy, and binary cross-entropy.


#Compile the model
model.compile(optimizer="adam", loss="mean_squared_error")


Fit Model

At this stage, we train the model using the fit() method.


#Train the model
model.fit(x_train, y_train, batch_size=1, epochs=5)#batch_size is the number of samples per gradient update, while epochs is the number of full passes over the training data


Create the Test Data

We need to split the test data from the training data. The test data will be used to evaluate how accurately the model predicts prices it has not seen during training.


#Create the testing data set
#Create a new array containing the scaled values from index training_data_len - 60 to the end
test_data = scaled_data[training_data_len - 60: , :]
#Create the data sets x_test and y_test
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, 0])


Convert the Data to a NumPy Array

We need to convert the data to a NumPy array because it allows for efficient numerical operations and is the input format the model expects.


#Convert the data to a numpy array
x_test = np.array(x_test)
#Reshape the data to the 3-dimensional shape (samples, time steps, features)
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))


#Get the model's predicted price values and undo the scaling back to dollar prices
predictions = model.predict(x_test)
predictions = scaler.inverse_transform(predictions)


 


#Evaluate the model: compute the root mean squared error (RMSE)
rmse = np.sqrt(np.mean((predictions - y_test)**2))
rmse
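As a cross-check, the same value can be computed with scikit-learn (a small sketch assuming the predictions and y_test arrays created above):


#Cross-check the RMSE with scikit-learn
from sklearn.metrics import mean_squared_error
rmse_check = np.sqrt(mean_squared_error(y_test, predictions))
print(rmse_check)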


Prediction Chart

We create a line chart to compare the model’s predictions against the validation data. You will notice the model tracks the actual prices reasonably well.


#Plot the data
train = data[:training_data_len]
valid = data[training_data_len:].copy()  #copy to avoid a SettingWithCopyWarning when adding a column
valid["Predictions"] = predictions
#Visualize the data
plt.figure(figsize=(16,8))
plt.title("Model")
plt.xlabel("Date", fontsize=18)
plt.ylabel("Close Price USD ($)", fontsize=18)
plt.plot(train["Close"])
plt.plot(valid[["Close", "Predictions"]])
plt.legend(["Train", "Val", "Predictions"], loc="lower right")
plt.show()


#Show the actual and predicted prices
valid
#This compares the actual "Close" prices with the model's "Predictions"


Conclusion

In this article, you learned how to connect Python to the Yahoo Finance API to pull historical finance data. Additionally, we built a model with TensorFlow Keras to forecast prices based on that historical data.

For this project, I will mention that the model is not 100% accurate, and I would advise further improvements such as hyperparameter tuning or the use of a pre-trained model to achieve higher accuracy; a minimal tuning sketch is shown below.
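As one possible starting point, here is a minimal sketch of manual hyperparameter tuning (the unit counts, epochs, and batch size below are illustrative assumptions, not tested settings) that compares a few LSTM sizes on a held-out validation split:


#Illustrative only: try a few LSTM sizes and keep the one with the lowest validation loss
def build_model(units):
    m = Sequential()
    m.add(LSTM(units, return_sequences=True, input_shape=(x_train.shape[1], 1)))
    m.add(LSTM(units, return_sequences=False))
    m.add(Dense(25))
    m.add(Dense(1))
    m.compile(optimizer="adam", loss="mean_squared_error")
    return m

results = {}
for units in [32, 50, 64]:
    candidate = build_model(units)
    history = candidate.fit(x_train, y_train, batch_size=32, epochs=3, validation_split=0.1, verbose=0)
    results[units] = history.history["val_loss"][-1]
print(results)  #pick the unit count with the lowest validation loss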

Feel free to connect with me on LinkedIn: Temidayo Omoniyi & Twitter: Kiddojazz.
