StockGram : An Intelligent Portfolio Manager

Research Paper

Please find my research paper on Intelligent Portfolio Management via NLP Analysis of Financial 10-K Statements, published in the November issue of the International Journal of Artificial Intelligence and Applications.

Overview

The project analyzes whether the sentiment stability of a company's financial 10-K reports over time can predict its future mean returns. A diverse portfolio of stocks was selected to test this hypothesis. The proposed framework downloads the companies' 10-K reports from the SEC's EDGAR database and passes them through a preprocessing pipeline that extracts the critical sections of each filing for NLP analysis. Using the Loughran and McDonald sentiment word list, the framework generates sentiment TF-IDF vectors from the 10-K documents, calculates the cosine similarity between each pair of consecutive 10-K reports, and proposes this cosine similarity as an alpha factor. To evaluate how well the alpha factor predicts future returns, the framework uses the alphalens library to perform factor return analysis and turnover analysis, and to compare the Sharpe ratios of the candidate alpha factors. The results show a strong correlation between the sentiment stability of our portfolio's 10-K statements and its future mean returns.
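The core signal described above is the cosine similarity between sentiment TF-IDF vectors of consecutive filings. The following is a minimal standard-library sketch of that idea, not the author's implementation. It assumes a single sentiment category represented by a tiny hypothetical word list (SENTIMENT_WORDS), raw term counts for TF, and a smoothed IDF computed across one company's filings. In practice, scikit-learn's TfidfVectorizer with a fixed vocabulary would be the more idiomatic choice.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stand-in for one Loughran-McDonald category (e.g. "negative").
# The real lists are much larger; load them from the official dictionary instead.
SENTIMENT_WORDS = ["loss", "decline", "impairment", "litigation", "adverse"]


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def sentiment_tfidf(docs, vocabulary):
    """Return one TF-IDF vector per document, restricted to `vocabulary`.

    TF is the raw term count; IDF uses the smoothed form log((1 + n) / (1 + df)) + 1.
    """
    counts = [Counter(tokenize(doc)) for doc in docs]
    n_docs = len(docs)
    idf = {
        word: math.log((1 + n_docs) / (1 + sum(1 for c in counts if c[word] > 0))) + 1
        for word in vocabulary
    }
    return [[c[word] * idf[word] for word in vocabulary] for c in counts]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def consecutive_similarities(filings, vocabulary):
    """Similarity of each filing to the previous one; `filings` must be chronological."""
    vectors = sentiment_tfidf(filings, vocabulary)
    return [cosine_similarity(prev, cur) for prev, cur in zip(vectors, vectors[1:])]


# Hypothetical filing texts, oldest first.
filings = [
    "Revenue grew and no impairment was recorded.",
    "Revenue grew and no impairment was recorded.",
    "A large loss, pending litigation and an adverse decline in demand.",
]
similarities = consecutive_similarities(filings, SENTIMENT_WORDS)
```

Here the first similarity is (up to floating-point rounding) 1.0 because the first two filings are identical, and the second is 0.0 because those filings share no sentiment words. A stable sentiment profile keeps the value close to 1.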

Quandl Dataset

Quandl End of Day US Stock Prices database (accessed 2020-10).

How to use Quandl data?

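```python
# Install the client first: pip install quandl (in a notebook: !pip install quandl)
import quandl

# Replace with your own Quandl API key
quandl.ApiConfig.api_key = "YOURAPIKEY"

# Fetch end-of-day prices for Amazon and Nike from the EOD database
data = quandl.get(['EOD/AMZN', 'EOD/NKE'])

# Show the first few rows
print(data.head())
```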
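The evaluation uses year-end adjusted closing prices rather than daily ones. The following minimal sketch reduces daily data to one price per calendar year. It assumes you have already extracted (date, adjusted close) pairs for a single ticker, because the adjusted-close column name in the downloaded data is not shown here. The function name year_end_prices is hypothetical. With a pandas DataFrame indexed by date, resampling by year and taking .last() achieves the same result.

```python
from datetime import date


def year_end_prices(daily_prices):
    """Keep the last available adjusted close for each calendar year.

    `daily_prices` is an iterable of (date, adjusted_close) pairs for one ticker,
    in any order. Returns {year: (last_trading_date, adjusted_close)}.
    """
    last = {}
    for day, price in daily_prices:
        current = last.get(day.year)
        if current is None or day > current[0]:
            last[day.year] = (day, price)
    return dict(sorted(last.items()))


# Hypothetical daily adjusted closes for one ticker.
prices = [
    (date(2019, 12, 30), 100.0),
    (date(2019, 12, 31), 101.5),
    (date(2020, 6, 1), 90.0),
    (date(2020, 12, 31), 120.0),
]
annual = year_end_prices(prices)
```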

Portfolio

We test our hypothesis, that the sentiment stability of financial 10-K reports can be a potential trading signal, on a diverse portfolio of 7 stocks, listed below:

The SEC EDGAR Database

To extract the financial 10-K reports of the stocks in our universe, we use a predefined SEC API together with each stock's CIK (Central Index Key) number. Details on how to extract a 10-K report from the SEC EDGAR database and preprocess it can be found in this notebook.
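The README does not name the SEC API it uses. The following sketch assumes EDGAR's public submissions endpoint (https://data.sec.gov/submissions/CIK##########.json), which expects the CIK zero-padded to 10 digits. It also shows a simple cleaning step that turns a raw HTML filing into lowercase tokens. Both helper names are hypothetical, the code makes no network request, and extracting specific sections such as Item 1A or Item 7 is left out. If you do fetch from EDGAR, the SEC asks automated clients to identify themselves with a descriptive User-Agent header.

```python
import html
import re


def submissions_url(cik):
    """EDGAR JSON endpoint listing a company's filings; the CIK is zero-padded to 10 digits."""
    return f"https://data.sec.gov/submissions/CIK{int(cik):010d}.json"


def clean_filing(raw_html):
    """Strip markup from a raw 10-K document and return lowercase word tokens."""
    text = re.sub(r"(?is)<(script|style).*?</\1>", " ", raw_html)  # drop embedded code/styles
    text = re.sub(r"<[^>]+>", " ", text)  # remove the remaining tags
    text = html.unescape(text)  # decode entities such as &amp; and &nbsp;
    return re.findall(r"[a-z]+", text.lower())
```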

Loughran McDonald Sentiment Word List
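The framework needs each Loughran-McDonald sentiment category as a set of words. The following minimal sketch loads them. It assumes a CSV export of the dictionary with a Word column and one column per category, where any nonzero value marks membership. The column names in CATEGORIES and the function name load_word_lists are assumptions; check them against the file you download.

```python
import csv
import io

# Assumed category column names; verify them against your copy of the dictionary.
CATEGORIES = ["Negative", "Positive", "Uncertainty", "Litigious", "Constraining", "Interesting"]


def load_word_lists(csv_file, categories=CATEGORIES):
    """Map each sentiment category to the set of lowercase words that belong to it.

    Assumes a `Word` column plus one column per category, where any nonzero
    value marks membership. Categories missing from the file yield empty sets.
    """
    lists = {category: set() for category in categories}
    for row in csv.DictReader(csv_file):
        word = row["Word"].lower()
        for category in categories:
            value = row.get(category) or "0"
            if float(value) != 0:
                lists[category].add(word)
    return lists


# Usage with an in-memory sample; for a real file pass open(path, newline="").
sample = io.StringIO("Word,Negative,Positive\nLOSS,2009,0\nGAIN,0,2009\n")
word_lists = load_word_lists(sample)
```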

Code

You can find the PyTorch implementation of the framework here

Evaluation and Results

Factor Returns

Factor returns directly measure the returns of our portfolio if its weights were determined purely by the alpha factor. Alphalens requires two mandatory inputs to compute forward returns: factor data and prices. In this project, we use the cosine similarity between two consecutive 10-K reports as the factor data and the year-end adjusted closing prices of the stocks in our portfolio as the pricing data to run against it.
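To make "weights determined purely by the alpha factor" concrete, the following minimal sketch computes one date's factor return. It demeans the factor across assets, scales the weights so their absolute values sum to 1 (a dollar-neutral long-short portfolio), and takes the weighted sum of forward returns. This is similar in spirit to alphalens' default demeaned weighting, but it is a hand-written illustration. The asset names and numbers are made up.

```python
def factor_weights(factor_values):
    """Demean one date's factor values and scale them so absolute weights sum to 1."""
    mean = sum(factor_values.values()) / len(factor_values)
    demeaned = {asset: value - mean for asset, value in factor_values.items()}
    gross = sum(abs(w) for w in demeaned.values())
    if gross == 0:  # every asset has the same factor value: take no positions
        return {asset: 0.0 for asset in factor_values}
    return {asset: w / gross for asset, w in demeaned.items()}


def factor_return(factor_values, forward_returns):
    """Return of the factor-weighted portfolio over the next period."""
    weights = factor_weights(factor_values)
    return sum(weight * forward_returns[asset] for asset, weight in weights.items())


# Hypothetical cosine-similarity factor values and next-year returns for three assets.
similarity = {"A": 0.9, "B": 0.5, "C": 0.1}
next_year_returns = {"A": 0.10, "B": 0.02, "C": -0.04}
portfolio_return = factor_return(similarity, next_year_returns)
```

With these inputs the portfolio is long A and short C with equal weight, so the factor return is about 0.07.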

After generating the factor data frame and setting up the pricing data, we pass both to the alphalens function get_clean_factor_and_forward_returns, which accepts factor data, pricing data, quantiles, bins, and periods. It returns a merged data frame with a MultiIndex of date at level 0 and stock/asset at level 1. This data frame contains the values of a single alpha factor, the forward returns for each period, and the quantile or bin to which each factor value belongs.
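Two of the things get_clean_factor_and_forward_returns produces are the forward returns for each period and a quantile label per asset on each date. The following is a conceptual, hand-rolled illustration of both, not alphalens' code. Alphalens works on pandas objects, and its bucket boundaries may differ from the equal-count ranking used here. Prices are assumed to be year-end values on a common date grid, so a period of 1 means one year ahead. Ties in assign_quantiles are broken arbitrarily.

```python
def forward_returns(prices, period=1):
    """prices: {asset: [p_0, p_1, ...]} on a common date grid (here, year ends).

    Returns {asset: [r_0, r_1, ...]} where r_t = p_{t+period} / p_t - 1.
    """
    return {
        asset: [series[t + period] / series[t] - 1 for t in range(len(series) - period)]
        for asset, series in prices.items()
    }


def assign_quantiles(factor_values, quantiles=5):
    """Bucket one date's factor values into 1..quantiles by rank (1 = lowest)."""
    ordered = sorted(factor_values, key=factor_values.get)
    n = len(ordered)
    return {asset: rank * quantiles // n + 1 for rank, asset in enumerate(ordered)}


# Hypothetical year-end prices and one year's factor values.
returns = forward_returns({"A": [100.0, 110.0, 121.0], "B": [50.0, 45.0, 45.0]})
buckets = assign_quantiles({"A": 0.9, "B": 0.2, "C": 0.5, "D": 0.7}, quantiles=2)
```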

Figure 1 plots factor returns over time. As the graph shows, the 10-K financial reports expressing the interesting and positive sentiments yield the highest returns, while the filings that convey constraining, negative, and litigious sentiment yield the lowest. This observation aligns with our hypothesis that performing NLP analysis on financial 10-K statements could predict future mean returns.

Turnover Analysis

Since liquidity and transaction costs depend on market conditions at the time of the trade, it is difficult to simulate actual transaction costs when evaluating an alpha factor. Turnover is therefore a useful proxy for these real-world constraints: turnover analysis estimates the fraction of the portfolio's total value traded in each period. One way to measure turnover is factor rank autocorrelation, which measures how stable the alpha factor's ranks are, that is, how little the ranking of assets changes from one period to the next. Since trading is costly, all else being equal we prefer an alpha factor whose ranks do not change significantly between periods. A high factor rank autocorrelation indicates lower turnover, while a low or even negative autocorrelation indicates higher turnover. If two alpha factors have similar quintile performance and similar factor returns, we prefer the one with lower turnover.
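Factor rank autocorrelation is the Spearman rank correlation between the factor values of the same assets in consecutive periods. The following minimal standard-library sketch is similar in spirit to alphalens' factor_rank_autocorrelation, but it does not reproduce that function. Ties receive average ranks, assets missing from either period are ignored, and the example values are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def factor_rank_autocorrelation(previous, current):
    """Spearman correlation of two periods' factor values over the assets in both."""
    assets = sorted(set(previous) & set(current))
    return pearson(ranks([previous[a] for a in assets]), ranks([current[a] for a in assets]))


# Hypothetical factor values for two consecutive years.
previous = {"A": 0.91, "B": 0.85, "C": 0.40, "D": 0.62}
current = {"A": 0.88, "B": 0.90, "C": 0.35, "D": 0.60}
autocorrelation = factor_rank_autocorrelation(previous, current)
```

Only A and B swap places between the two years, so the autocorrelation is high (0.8), which would suggest low turnover.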

Choosing an alpha factor with lower turnover reduces transaction costs and makes the required trades practical to execute, provided the stocks are liquid. Excessive turnover could also imply that the alpha factor is only capturing noise.

Sharpe Ratio

The Sharpe ratio, or risk-adjusted return, is a critical metric for evaluating alpha factors. It measures a portfolio's excess return over the risk-free rate relative to the standard deviation of that excess return. The Sharpe ratio lets us compare the relative performance of alpha factors; what matters is the Sharpe ratio, not the magnitude of the factor returns. Table 1 shows the Sharpe ratios of our alpha factors.
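Following the definition above, S = mean(R - R_f) / std(R - R_f). It is optionally multiplied by the square root of the number of periods per year to annualize it. The following minimal sketch assumes per-period factor returns and a constant per-period risk-free rate. The README does not state how Table 1 was computed (for example, which risk-free rate or annualization was used), so this does not reproduce those numbers.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=1):
    """Mean excess return divided by its sample standard deviation, annualized.

    `returns` are per-period returns (yearly here, since the factor comes from
    annual 10-K filings); `risk_free_rate` is the per-period risk-free rate.
    """
    excess = [r - risk_free_rate for r in returns]
    std = statistics.stdev(excess)  # sample standard deviation; needs >= 2 values
    if std == 0:
        return float("nan")
    return statistics.mean(excess) / std * math.sqrt(periods_per_year)
```

For example, yearly returns of 0.1, 0.2, and 0.3 with a zero risk-free rate have a mean of 0.2 and a sample standard deviation of 0.1, giving a Sharpe ratio of 2.0.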

A Sharpe ratio under 1.0 is usually considered sub-optimal, and investors regard a ratio above 1.0 as acceptable to good. A Sharpe ratio above 2.0 is good, and investors deem a ratio of 3.0 or higher excellent. Looking at the Sharpe ratios of our alpha factors, the 10-K filings that convey the interesting sentiment have the highest Sharpe ratio, 4.10, followed by the 10-K documents that express a positive sentiment, with a Sharpe ratio of 1.02.

Contributor

Purva Singh

Contributing

Please feel free to open a pull request to contribute to this repository. If you think any section needs more or better explanation, please use the issue tracker to let me know.

Support

If you like this repo and find it useful, please consider starring it (★, at the top right of the page) so that it can reach a broader audience.
