RobertHennings/OpenCampus_Sentiment_Project

OpenCampus_Sentiment_Project

This repository applies Natural Language Processing (NLP) methods as part of the Opencampus course "Machine Learning with TensorFlow", offered in the summer semester 2023. The materials uploaded here represent the final project of the course. As my own project, I chose a scientific problem: Are journalists' opinions influenced by short sale data, and can trading strategies be derived from it?

The original plan was to carry out the investigation on the German (home) market, but the data quality obtainable from the Bundesanzeiger website was found to be insufficient for an exhaustive analysis. I therefore switched to the US capital market, where data quality and availability are better. I investigated the research questions by first scraping the US short sale data, file by file, from the FINRA site with a web scraper. Subsequently, various NLP models were set up to evaluate the sentiment of the news articles written by journalists.

The ultimate plan is to create a numeric sentiment score from the categorical sentiment classes (positive, neutral, negative), for which the model outputs have to be transformed into numerical values. Examples can be found in this markdown file, which follows approaches similar to those of the data and analytics vendor RavenPack. The desired output for creating such a score and observing it over time would look as follows:

Positive sentiment: Scores in the range 60 - 100

Neutral sentiment: Scores in the range 40 - 60

Negative sentiment: Scores in the range 0 - 40
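One possible way to turn class probabilities into such a score is sketched below. This is only an illustration, not the RavenPack methodology and not necessarily the mapping used in this project: the function names `sentiment_score` and `score_band`, the linear formula, and the choice to count the boundary values 40 and 60 as neutral are all assumptions.

```python
from typing import Dict


def sentiment_score(probs: Dict[str, float]) -> float:
    """Map class probabilities to a score in [0, 100] (assumed linear mapping).

    A fully positive prediction gives 100, a fully negative one gives 0,
    and a fully neutral or perfectly balanced one gives 50.
    """
    p_pos = probs.get("positive", 0.0)
    p_neu = probs.get("neutral", 0.0)
    p_neg = probs.get("negative", 0.0)
    total = p_pos + p_neu + p_neg
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    # Normalise so that inputs which do not sum exactly to 1 are still handled.
    return 50.0 + 50.0 * (p_pos - p_neg) / total


def score_band(score: float) -> str:
    """Assign a score to a band; 40 and 60 are treated as neutral (assumption)."""
    if score > 60:
        return "positive"
    if score < 40:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    example = {"positive": 0.7, "neutral": 0.2, "negative": 0.1}
    score = sentiment_score(example)
    print(round(score, 2), score_band(score))
```

With this mapping the band of a score does not always coincide with the most probable class, because the score reflects the balance between the positive and negative probabilities rather than the single largest one.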

From this company-specific sentiment time series, inferences can be drawn by comparing it to the short sale time series of the corresponding company.
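A simple first comparison of the two series could be a (lagged) correlation, as in the sketch below. The helper names `pearson` and `lagged_correlation`, the use of a short ratio (short volume divided by total volume) as the short sale series, and the numbers in the example are all illustrative assumptions, not project data or results.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(
    sentiment: Sequence[float], short_ratio: Sequence[float], lag: int = 0
) -> float:
    """Correlate sentiment on day t with the short ratio on day t + lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag:
        sentiment = sentiment[:-lag]
        short_ratio = short_ratio[lag:]
    return pearson(sentiment, short_ratio)


if __name__ == "__main__":
    # Made-up daily values for one hypothetical company.
    sentiment = [70.0, 65.0, 55.0, 45.0, 35.0]
    short_ratio = [0.30, 0.35, 0.45, 0.55, 0.65]
    print(lagged_correlation(sentiment, short_ratio))
    print(lagged_correlation(sentiment, short_ratio, lag=1))
```

A correlation alone says nothing about the direction of influence; varying `lag` only gives a first hint of whether one series tends to lead the other.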

In terms of the models, I followed four approaches:

  1. Setting up my own model, trained on the Financial Phrasebank dataset.
  2. Using the baseline model of the Hugging Face Transformers pipeline.
  3. Using the BERT baseline model.
  4. Optimization and extension of the FinBERT model.

These trained and/or optimized models should then classify the news article text and output a numeric score, which will be compared with the behaviour of the respective short sale quotes. Finally, portfolios (long and short) can be formed from these results, and inferences can be drawn from the achieved returns to assess a potential effect, ultimately answering the research questions.
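The portfolio step could, for example, be sketched as follows. This assumes an equal-weighted long/short sort on the sentiment score, which is only one of several possible designs; the function name `long_short_return`, the tickers, and all numbers are hypothetical.

```python
from typing import Dict


def long_short_return(
    scores: Dict[str, float], returns: Dict[str, float], n: int = 1
) -> float:
    """Return of a portfolio that is long the n highest-scored stocks and
    short the n lowest-scored stocks, equal-weighted within each leg."""
    # Only use tickers with both a score and a return; ties are broken by
    # ticker name so the ordering is deterministic.
    tickers = sorted(scores.keys() & returns.keys(), key=lambda t: (scores[t], t))
    if n < 1 or len(tickers) < 2 * n:
        raise ValueError("need at least 2 * n tickers with a score and a return")
    shorts = tickers[:n]
    longs = tickers[-n:]
    long_ret = sum(returns[t] for t in longs) / n
    short_ret = sum(returns[t] for t in shorts) / n
    return long_ret - short_ret


if __name__ == "__main__":
    # Hypothetical sentiment scores and next-period returns.
    scores = {"AAA": 80.0, "BBB": 50.0, "CCC": 20.0}
    returns = {"AAA": 0.02, "BBB": 0.00, "CCC": -0.01}
    print(long_short_return(scores, returns, n=1))
```

The same function could be applied with a short sale measure as the sorting signal instead of the sentiment score, which would allow the returns of the two sorts to be compared.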

Useful Links

Bundesanzeiger Short Sale Files

FINRA Daily Short Sale Files

Sentiment Score

Example Usage

To Do

  • Scrape more single short sale files from FINRA website
  • Code the Sentiment Score Methodology
  • Get News articles
  • Train NLP Model with decent accuracy and precision metrics
  • Tweak NLP Model output to numerical scale
