-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathREADME.Rmd
120 lines (89 loc) · 3.78 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
<!-- Usage: Rscript -e 'library(rmarkdown); rmarkdown::render("README.Rmd", NULL)' -->
```{r, include = FALSE}
knitr::opts_chunk$set(
python.reticulate = FALSE,
engine.path = list(
python = '/Users/gp/opt/miniconda3/bin/python'
),
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
# Scraper for SEC Form 13F-HR
## Overview
Securities and Exchange Commission (SEC) describes form 13F-HR as “Quarterly
report filed by institutional managers, Holdings”. SEC requires professional
investment managers (who manage over 100MM USD) to disclose their holdings
every quarter by filing form 13F-HR.
Scraping form 13F-HR allows us to find out the holdings of an investment firm,
and track the trading pattern if we compare the holdings to previous quarters.
Form 13F-HR does not track derivative exposures though. There are a few portals
that aggregate 13F-HR data, but SEC website is fairly easy to scrape. SEC
publishers [daily reports](https://www.sec.gov/Archives/edgar/daily-index/) as
well as [quarterly aggregates](https://www.sec.gov/Archives/edgar/full-index/).
The scrapers in this folder scrape using quarterly index. It is trivial to
adapt the scrapers to scrape using daily index files. Scraping is a two-step
process: First, index file is scraped to build a list of links to follow. Index
file contains a list of filings. Each entry contains the type of form filed,
CIK (central index key) number of the filing firm, name of the firm, and a
relative URL to the filed report. URLs lead to text files containing 13F-HR
forms (with embedded XML content). Each holding entry has name of stock, number
of stocks held, their USD value, and few other details. Index file for
quarterly aggregates can be big (over 50MB). This file is downloaded into a
local copy to reduce burden on SEC servers and to make incremental scraping
faster.
## Usage
#### Soup
The scraper found in *soup* directory uses BeautifulSoup to parse tags. Specify
year, quarter, and (optionally) number of links (CIKs) to follow or a list of
CIK numbers to include.
```{zsh, engine.opts='-l'}
python soup/sec13F.py -h
```
__Scrapers in *scrapy* and *selenium* directories produce the same output as
the one based on *BeautifulSoup*.__
#### Scrapy
```
#> Usage: scrapy crawl sec13Fspider -a year=<year> -a quarter=<1-4> \
#> -a n=<integer> -o file.csv -t csv
#> (place the spider file, sec13F_spider.py, inside a scrapy project)
```
#### Selenium
Place the geckodriver executable where $PATH can find it.
```{zsh, engine.opts='-l'}
python selenium/sec13F.py -h
```
## Example
Berkshire Hathaway's CIK is 1067983. We can download their holdings as of 1st
quarter 2021 into a local file.
```{zsh, engine.opts='-l'}
if [ ! -f "brk_2021_1.csv" ]; then
python soup/sec13F.py --year 2021 --quarter 1 1067983 > brk_2021_1.csv
fi
```
We can find out the top holdings by value (happens to be Apple).
```{python, message=F}
import pandas as pd
brk2021 = pd.read_csv('brk_2021_1.csv')
gr2021 = brk2021.groupby(['issuer', 'cusip']).agg({'value':'sum', 'quantity':'sum'})
print(gr2021.sort_values('value', ascending=False).head())
```
Similarly, top holdings as of the 4th quarter of 2020 are as follows.
```{zsh, engine.opts='-l'}
if [ ! -f "brk_2020_4.csv" ]; then
python soup/sec13F.py --year 2020 --quarter 4 1067983 > brk_2020_4.csv
fi
```
```{python, message=F}
import pandas as pd
brk2020 = pd.read_csv('brk_2020_4.csv')
gr2020 = brk2020.groupby(['issuer', 'cusip']).agg({'value':'sum', 'quantity':'sum'})
print(gr2020.sort_values('value', ascending=False).head())
```
It is apparent that they have sold 57160000 shares (6%) of Apple (AAPL) during
the 1st quarter of 2021, when the market was hitting all-time highs!