An ETL application that processes legislation data, deployed on Azure.
- Deploying locally processes files from the legislation_files folder and inserts them into a local SQL Server instance
- The Azure deployment processes files already present in a blob container and inserts them into a serverless Azure SQL instance
- legislation_files - contains JSON files with legislation data
- local - Docker files for building and running the local dockerised app
- trigger_etl - core functionality
  - azure_etl.py - Azure app entry point
  - etl.py - core functionality for both Azure and local deployment
- Legislation: represents a real-world legal document that contains rules and regulations.
- Legislation version: a specific version of a piece of legislation.
- Issuing body: the organization responsible for creating and issuing the legislation.
- Jurisdiction: the geographic area where a particular set of laws and regulations applies.
- Parts: the various components of legislation that organize and structure the document into sections, chapters, paragraphs, and clauses.
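The exact schema of the input files isn't documented here, but as a rough sketch (field names are illustrative assumptions, not the actual schema), a single legislation JSON file might map onto these entities like so:

```python
import json

# Illustrative sketch only: the field names are assumptions, not the real schema
# used by the ETL. It shows how one legislation JSON file might be shaped
# around the entities described above.
sample_legislation = {
    "title": "Consumption Tax Act",          # the legislation itself
    "version": "2023-04-01",                 # a specific legislation version
    "issuing_body": "National Diet",         # organisation that issued it
    "jurisdiction": "Japan",                 # where the law applies
    "parts": [                               # sections, chapters, paragraphs, clauses
        {
            "type": "section",
            "number": "1",
            "content": "This Act shall come into force on...",
        }
    ],
}

print(json.dumps(sample_legislation, indent=2))
```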
Bringing up SQL Server:
```bash
cd local
docker-compose up
```
Building the app image:
```bash
docker build -f local/Dockerfile -t cube-etl .
```
The image was developed on a Linux machine, so there is no guarantee that the ODBC setup script will work on Windows. If you hit any issues:
- Replace the setup_odbc.sh file with a suitable ODBC setup script (and update the Dockerfile line `RUN bash -c "./setup_odbc.sh"` accordingly), or
- Call the CLI directly with Python instead (see below)
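To sanity-check the ODBC setup before running the CLI, you can list the SQL Server drivers that pyodbc can see (this assumes pyodbc is installed in your environment):

```python
import pyodbc

# Print any installed ODBC drivers that look like SQL Server drivers;
# an empty list means the ODBC setup did not take effect.
print([d for d in pyodbc.drivers() if "SQL Server" in d])
```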
Running the app:
```bash
# Running the Docker container (volume for data access: -v <dir of data files>:/app/legislation_files)
docker run -v $(pwd)/legislation_files:/app/legislation_files -it cube-etl -P <password> -S <IP Address> -U sa

# Running without Docker
python3 local_etl_cli.py -P <password> -S <IP Address> -U sa -f legislation_files
```
Checking the app CLI params:
```bash
python3 local_etl_cli.py --help
docker run -it cube-etl --help
```
The API can be called either from a web browser or via an HTTP request, as demonstrated below.
To run the ETL (note: this takes ~5 minutes, so expect to wait about that long before receiving an HTTP response):
```python
import requests

url = "https://cube-etl.azurewebsites.net/api/trigger_etl?function=etl"
response = requests.get(url)
print(response.content)
```
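Since the call blocks for roughly five minutes, it can be worth setting an explicit client-side timeout so the script fails cleanly if the service never responds; the value below is just an illustration, not a documented requirement:

```python
import requests

url = "https://cube-etl.azurewebsites.net/api/trigger_etl?function=etl"
# Allow a generous timeout for the ~5 minute ETL run (illustrative value).
response = requests.get(url, timeout=600)
print(response.content)
```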
To drop the database (~10 seconds):
```python
import requests

url = "https://cube-etl.azurewebsites.net/api/trigger_etl?function=drop"
response = requests.get(url)
print(response.content)
```
To say hello:
```python
import requests

url = "https://cube-etl.azurewebsites.net/api/trigger_etl?name=John"
response = requests.get(url)
print(response.content)
```
- search_legislation (at least one parameter required): searches legislation parts based on four optional criteria:
- Jurisdiction (optional)
- IssuingBody (optional)
- Title (optional)
- Content (optional)
```sql
EXEC search_legislation
    @Jurisdiction = 'Japan',
    @Content = 'Enforcement'
```
- search_any: searches legislation for a substring in the following fields: legislation title, jurisdiction name, issuing body name, parts content.
- Search
```sql
EXEC search_any
    @Search = 'tax'
```
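For reference, here is a minimal sketch of calling these procedures from Python with pyodbc. The connection details are placeholders and the driver name assumes the Microsoft ODBC Driver 17 for SQL Server is installed; adjust for your deployment:

```python
import pyodbc

# Placeholder connection details; swap in your own server, database, and credentials.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<IP Address>;DATABASE=<database>;UID=sa;PWD=<password>"
)
cursor = conn.cursor()

# Search parts by jurisdiction and a content substring.
cursor.execute("EXEC search_legislation @Jurisdiction = ?, @Content = ?", "Japan", "Enforcement")
for row in cursor.fetchall():
    print(row)

# Substring search across title, jurisdiction, issuing body, and parts content.
cursor.execute("EXEC search_any @Search = ?", "tax")
for row in cursor.fetchall():
    print(row)

conn.close()
```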
- Unit and integration testing
- Move DB credentials to environment variables (create config files for other hard-coded vars)
- Event trigger for the ETL when new files are placed in the blob container
- Possibly move processed files into a different container
- Data validation for input JSON files
- Power BI dashboard for insights
- Look for any inefficiencies (container/Azure Function memory, query performance, etc.)
- CI/CD (GitHub Actions / Jenkins)
- Develop / automate infrastructure deployment with Terraform