Yahoo Finance is by far one of the most popular sources of stock market data and general financial information. The site offers financial news, data, and analysis about businesses, markets, and industries. Users can also access stock quotes, stock charts, and other financial data to track their portfolios and get individualized financial advice. Yahoo Finance also provides tools such as financial calculators and other resources to help users make informed investing decisions.
Prerequisites
To follow this article, you will need the following installed and activated on your PC. Feel free to fork the GitHub repo for the code and other necessary materials.
yfinance Library
The yfinance library in Python lets you easily interact with the Yahoo Finance API to retrieve historical financial data such as stock market information, financial statements, and stock quotes. The library gives you access to data such as historical prices, dividends, and splits for a given stock.
The yfinance library can be installed from your terminal or Python IDE with a single command: pip install yfinance.
Now that the yfinance library is installed, let’s get into the project at hand.
Disclaimer: I am not a financial expert. Therefore, this article is for educational purposes and only aims to show how to connect to Yahoo Finance API data using the Python library.
Import all Necessary Libraries
The following libraries are needed to pull price data directly from Yahoo Finance and to build a predictive model on it.
#Import the Libraries
import yfinance as yf
import math
import pandas_datareader as web
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, LSTM
import matplotlib.pyplot as plt
import datetime
plt.style.use("fivethirtyeight")
Get Ticker Data from the Yahoo Finance site
A ticker symbol is a unique identifier for a publicly traded company and its stock.
Stocks are usually listed on stock exchanges such as the NASDAQ, the NYSE (New York Stock Exchange), the Tokyo Stock Exchange (TSE), the London Stock Exchange (LSE), and many more.
Create a Ticker variable
Let’s pull Google’s financial data from the Yahoo Finance API using the yfinance library, covering the period from 1st January 2010 to today’s date. The Google ticker symbol is “GOOGL”.
#Create Ticker variables
goog = yf.Ticker("GOOGL")
#Set the time range
goog_hist = goog.history(start=datetime.datetime(2010, 1, 1), end=datetime.datetime.today())
goog_hist.head(20)
Visualization
Create a line chart to see how Google’s closing price has changed over time.
#Visualization of the Closing price
plt.figure(figsize=(16,8))
plt.title("Closing Price History")
plt.plot(goog_hist["Close"])
plt.xlabel("Date", fontsize=18)
plt.ylabel("Closing Price USD $", fontsize=18)
plt.show()
Create a New DataFrame of the “Close” Column
Drop other columns except for the close column, as our analysis will be based on it.
#Create a new dataframe with only the Close column
data = goog_hist.filter(["Close"])
#Convert the dataframe to a numpy array
dataset = data.values
#Get the number of rows to train the model on (80% of the dataset)
training_data_len = math.ceil(len(dataset) * .8)
training_data_len
Standardize our Data
Data standardization transforms data so that it has a mean of zero and a standard deviation of one. The purpose of rescaling data is to give it a uniform scale across all measurement units so that it can be compared and modeled consistently.
We will be using Min-Max scaling (strictly speaking a form of normalization rather than standardization), a technique that rescales the data into a specific range, for example, between 0 and 1.
#Scale the data to the range 0 to 1
scaler = MinMaxScaler(feature_range=(0,1))
scaled_data = scaler.fit_transform(dataset)
scaled_data
#Scaling puts every closing price on the same 0 to 1 scale
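To make the scaling step less of a black box, here is a minimal sketch of the arithmetic behind Min-Max scaling in plain Python. The helper names min_max_scale and min_max_inverse and the sample prices are made up for illustration; they are not part of scikit-learn, and the sketch assumes the values are not all identical.

```python
def min_max_scale(values, feature_range=(0.0, 1.0)):
    """Linearly rescale values so the minimum maps to low and the maximum to high."""
    low, high = feature_range
    v_min, v_max = min(values), max(values)
    span = v_max - v_min  # assumption: not all values are equal, so span != 0
    return [low + (v - v_min) / span * (high - low) for v in values]


def min_max_inverse(scaled, v_min, v_max, feature_range=(0.0, 1.0)):
    """Undo the scaling, given the original minimum and maximum."""
    low, high = feature_range
    return [v_min + (s - low) / (high - low) * (v_max - v_min) for s in scaled]


prices = [10.0, 15.0, 20.0, 30.0]  # made-up closing prices
scaled = min_max_scale(prices)
restored = min_max_inverse(scaled, min(prices), max(prices))

print(scaled)    # every value now lies between 0 and 1
print(restored)  # back on the original price scale
```

The inverse step is the same idea we rely on later, when predictions on the 0 to 1 scale are converted back to prices.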
Create Training Model
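The data will be split into x_train and y_train. Each x_train sample holds 60 consecutive scaled closing prices, and the matching y_train value is the price that immediately follows them.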
#Create the scaled training dataset
train_data = scaled_data[0:training_data_len, :]
#Split the data into x_train & y_train
x_train = []
y_train = []
for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])
    #Print the first two samples to inspect them
    if i <= 61:
        print(x_train)
        print(y_train)
        print()
#Convert the x_train and y_train to NumPy arrays
x_train, y_train = np.array(x_train), np.array(y_train)
#Reshape the data to a 3 dimensional shape: (samples, time steps, features)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
x_train.shape
#x_train is now 3 dimensional, which is the input shape the LSTM layer expects
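If the windowing logic feels abstract, the toy sketch below repeats it on a six-value series with a window of 3 instead of 60. The function name make_windows and the numbers are hypothetical, and plain lists stand in for the NumPy arrays.

```python
def make_windows(series, window):
    """Build (input window, next value) pairs from a 1-D sequence."""
    x, y = [], []
    for i in range(window, len(series)):
        x.append(series[i - window:i])  # the previous `window` values
        y.append(series[i])             # the value to predict
    return x, y


series = [1, 2, 3, 4, 5, 6]  # toy stand-in for the scaled closing prices
x, y = make_windows(series, window=3)

# Mimic the reshape to (samples, time steps, features) with one feature per step
x_3d = [[[value] for value in sample] for sample in x]

print(x)  # three overlapping windows
print(y)  # the value that follows each window
print(len(x_3d), len(x_3d[0]), len(x_3d[0][0]))  # samples, time steps, features
```

Each window overlaps the previous one by all but one value, which is why a long price history yields almost as many training samples as it has rows.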
Create your Model
The model used for this forecast is an LSTM. A Long Short-Term Memory (LSTM) network is an advanced recurrent neural network (RNN) capable of learning order dependence in sequence prediction problems. In addition, it mitigates the vanishing gradient problem that plain RNNs suffer from.
Let’s build the LSTM Model
Let’s start by importing all the modules needed for the neural network.
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.models import Sequential
Create a Simple LSTM Model
#Build the LSTM model
model = Sequential()
#50 is the number of LSTM units in the layer
model.add(LSTM(50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(LSTM(50, return_sequences=False))
model.add(Dense(25))
model.add(Dense(1)) #Final output
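To get a feel for the size of this network, the sketch below counts its trainable parameters by hand. It assumes the standard Keras layer definitions with default settings (biases enabled); the helper names lstm_params and dense_params are made up for this illustration.

```python
def lstm_params(units, input_dim):
    """An LSTM has 4 gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(units, input_dim):
    """A Dense layer has one weight per input-output pair plus one bias per unit."""
    return units * input_dim + units


layers = [
    ("LSTM(50), 1 input feature", lstm_params(50, 1)),
    ("LSTM(50), 50 inputs", lstm_params(50, 50)),
    ("Dense(25), 50 inputs", dense_params(25, 50)),
    ("Dense(1), 25 inputs", dense_params(1, 25)),
]

for name, count in layers:
    print(f"{name}: {count}")

total = sum(count for _, count in layers)
print("Total trainable parameters:", total)
```

You can compare these numbers with the output of model.summary() after building the model.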
Compile Model
Before training a model in TensorFlow Keras, the learning process is set up using the compile() method. It takes some arguments like optimizer and loss.
- Optimizer: The optimization algorithm to be utilized during training. Common choices include ‘adam’, ‘sgd’, and ‘rmsprop’.
- Loss: The loss function to be minimized during training. Mean squared error, categorical cross-entropy, and binary cross-entropy are popular options.
#Compile the model
model.compile(optimizer="adam", loss="mean_squared_error")
Fit Model
At this stage, we are going to train the model using the fit() method.
#Train the model
#batch_size is the number of samples per gradient update; epochs is the number of full passes over the training data
model.fit(x_train, y_train, batch_size=1, epochs=5)
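To see what batch_size and epochs mean for training time, here is a small sketch that counts weight updates. The sample count of 1000 is hypothetical, and updates_per_epoch is a helper invented for this example, not a Keras function.

```python
import math


def updates_per_epoch(n_samples, batch_size):
    """Number of gradient updates in one pass over the data (last batch may be smaller)."""
    return math.ceil(n_samples / batch_size)


n_samples = 1000  # hypothetical number of training windows
epochs = 5

for batch_size in (1, 32):
    steps = updates_per_epoch(n_samples, batch_size)
    print(f"batch_size={batch_size}: {steps} updates per epoch, {steps * epochs} in total")

total_updates_batch_1 = updates_per_epoch(n_samples, 1) * epochs
```

With batch_size=1 the weights are updated after every single sample, which is slow but simple. A larger batch size trades update frequency for speed.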



Create the Test Data
We need to split the test data from the training data. The test data will be used to check how accurately the model predicts prices it was not trained on.
#Create the testing data set
#Create a new array of scaled values, starting 60 rows before the end of the training data
test_data = scaled_data[training_data_len - 60:, :]
#Create the data sets x_test and y_test
x_test = []
y_test = dataset[training_data_len:, :]
for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, 0])
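The test slice starts 60 rows before the end of the training data because the first test prediction needs the 60 prices that precede it. The sketch below walks through the index arithmetic on a hypothetical history of 2003 rows; the row count is an assumption for illustration, and a plain list of row indices stands in for scaled_data.

```python
import math

WINDOW = 60     # look-back length used throughout the article
n_rows = 2003   # hypothetical number of rows in the price history

rows = list(range(n_rows))  # stand-in for the row indices of scaled_data

training_data_len = math.ceil(n_rows * 0.8)    # first 80% of rows, rounded up
train_rows = rows[:training_data_len]
test_rows = rows[training_data_len - WINDOW:]  # reach back 60 rows into the training range

n_test_samples = len(test_rows) - WINDOW   # one window per row after the first 60
n_targets = len(rows[training_data_len:])  # rows that y_test would cover

print("training rows:", len(train_rows))
print("test slice starts at row:", test_rows[0])
print("test windows:", n_test_samples, "targets:", n_targets)
```

The number of test windows equals the number of target rows, so every row in y_test gets exactly one prediction.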
Convert Data to NumPy Array
We need to convert the data to a NumPy array because it allows for efficient numerical operations on the data.
#Convert the data to a numpy array
x_test = np.array(x_test)
#Reshape the data
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))
#Get the model predicted price values
predictions = model.predict(x_test)
predictions = scaler.inverse_transform(predictions)
#Evaluate the model: get the root mean squared error (RMSE)
rmse = np.sqrt(np.mean((predictions - y_test) ** 2))
rmse
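RMSE is the square root of the mean of the squared errors, so each error has to be squared before the average is taken. The plain-Python sketch below uses made-up numbers to show why the order matters: averaging first lets positive and negative errors cancel out. The function names are chosen for this illustration only.

```python
import math


def root_mean_squared_error(predictions, actuals):
    """Square each error, average the squares, then take the square root."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def abs_mean_error(predictions, actuals):
    """Average the raw errors first: opposite signs cancel each other."""
    errors = [p - a for p, a in zip(predictions, actuals)]
    return abs(sum(errors) / len(errors))


actuals = [100.0, 100.0]      # made-up true prices
predictions = [103.0, 97.0]   # one prediction too high, one too low

print(root_mean_squared_error(predictions, actuals))  # reflects that each prediction is off by 3
print(abs_mean_error(predictions, actuals))           # errors cancel and hide the misses
```

A lower RMSE means the predictions sit closer to the actual prices, and a value of 0 would mean a perfect match.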
Prediction Chart
We create a line chart to compare the predicted prices with the actual validation prices. The closer the two lines track each other, the better the model performs.
#Plot the data
train = data[:training_data_len]
#Copy the slice so that adding a column does not trigger a pandas SettingWithCopyWarning
valid = data[training_data_len:].copy()
valid["Predictions"] = predictions
#Visualize the data
plt.figure(figsize=(16,8))
plt.title("Model")
plt.xlabel("Date", fontsize=18)
plt.ylabel("Close Price USD ($)", fontsize=18)
plt.plot(train["Close"])
plt.plot(valid[["Close", "Predictions"]])
plt.legend(["Train", "Val", "Predictions"], loc="lower right")
plt.show()
#Show the valid and predicted prices
valid
#This compares the "Close" and "Predictions" columns
Conclusion
In this article, you learned how to connect Python to the Yahoo Finance API to pull historical financial data. Additionally, we created a model using TensorFlow Keras to forecast prices based on historical data.
Note that the model is not 100% accurate. Further improvements, such as hyperparameter tuning or the use of a pre-trained model, may yield higher accuracy.
Feel free to connect with me on LinkedIn: Temidayo Omoniyi & Twitter: Kiddojazz.