Notice: all scripts below (excluding the Python installation on the server) can be run at once with `sudo bash start_application.sh [file_id]`.
Important: if you are initializing the project on your local device rather than on a server, you can skip some steps from this file: just clone the git project and then follow the steps for local initialization.
The first step is to clone the git project onto the local device. Before doing that, git needs to be installed with the command `sudo apt install git`. Cloning the project repository is then easy and can be done with the command `git clone https://github.com/jmajaca/infobot-public.git`.
Notice: Skip this step if you are not planning to develop the application or create hotfixes directly on the server. An alternative to installing Python directly on the server is to run the Python application in a Docker container, which is described in one of the following chapters.
To install Python 3.7 and its dependencies, as well as the packages for the infobot-public project, run the following command: `sudo bash python_setup.sh`. The script follows the instructions for installing Python 3.7 on a Debian 9 device from this step-by-step tutorial.
The script installs these Python packages:
- slack
- slackclient
- requests
- bs4
- sqlalchemy
- psycopg2-binary
- flask
- flask_wtf
Tools that are required for the project to work correctly are:
- git (already installed)
- docker
- unzip
- curl
Docker is installed from a package, based on this official tutorial. Run the command `sudo bash tools_setup.sh` to install the listed tools.
To initialize the database and application docker containers you will need access to the files `config.cfg` and `docker-compose.yml`, both of which contain sensitive data. To get access to these two files you will need to contact the project maintainer, who can give you the id of a zip file containing them. You will then pass that id as a parameter of the bash script call so that the script knows which file to download from Google Drive. When you have the file id, run the command `sudo bash containers_setup.sh [file_id]`.
This part of the setup is based on this and this tutorial. For the script to work you will need the `docker-compose.yml` file in which the options are defined. The file can be accessed as described earlier.
The command will create two databases: one for production and one for testing. Both databases will be initialized with empty tables from the infobot project. The production database runs on the default postgres port 5432, while the test database runs on port 5431.
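For illustration, with the sqlalchemy and psycopg2-binary packages listed earlier, the two databases could be reached with connection URLs like the following. This is a minimal sketch; the user, password and database name are placeholders, not the real values from `config.cfg`.

```python
from sqlalchemy import create_engine

# Hypothetical connection URLs; the real credentials and database name
# come from config.cfg, not the placeholders shown here.
prod_engine = create_engine('postgresql+psycopg2://user:password@localhost:5432/infobot')
test_engine = create_engine('postgresql+psycopg2://user:password@localhost:5431/infobot')
```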
In order to activate a cron job that snapshots the database every day at midnight, open the root crontab with `sudo crontab -e` and paste the following line into it: `0 0 * * * {path_to_git_repo}/infobot-public/src/resources/bash_scripts/database_backup.sh > /tmp/script.log 2>&1`.
The docker bash script will install the requirements (as described in the first chapter of this file regarding the installation of Python) and start the application with port 9000 exposed. For the script to work you will need the `config.cfg` file in which the options are defined. The file can be accessed as described earlier. The entry method is located in the `start.py` file.
The VM instance is created with the following settings.
Setting | Value |
---|---|
Name | optional |
Region | europe-west3 (Frankfurt) |
Zone | europe-west3-c |
Machine Family | General-purpose |
Machine Series | N1 |
Machine Type | f1-micro (1 vCPU, 614 MB memory) |
Boot Disk | Debian GNU/Linux 9 (stretch) |
To connect to the server from your local machine you need an SSH key, which can easily be added. To edit the instance, click Edit on the VM instance details page. In edit mode, under the section SSH Keys, click Show and edit, which opens a form for adding or deleting SSH keys.
To expose ports to the outside world you will need to create firewall rules. Follow this link for more information. You will want to create rules that expose ports 5432, 5431 and 9000 to all (IP range: 0.0.0.0/0).
Logs can be found in the `src/log` folder. Here is a table of the logs found in that folder.
Log | Description |
---|---|
docker_db_log.log | Logs from docker container that contains test and production database |
docker_app_log.log | Logs from docker container that contains python application |
application_log.log | Logs from python application |
application_trace.log | Trace stack from errors that are logged in application_log.log |
For the test workspace URL please contact the project maintainer.
Infobot has the following permissions on the Slack workspace:
OAuth Scope | Description |
---|---|
app_mentions:read | View messages that directly mention @infobot in conversations that the app is in |
channels:read | View basic information about public channels in the workspace |
chat:write | Send messages as @infobot |
chat:write.public | Send messages to channels @infobot isn't a member of |
im:read | View basic information about direct messages that Infobot has been added to |
im:write | Start direct messages with people |
pins:read | View pinned content in channels and conversations that Infobot has been added to |
pins:write | Add and remove pinned messages and files |
reactions:read | View emoji reactions and their associated content in channels and conversations that Infobot has been added to |
users:read | View people in the workspace |
For the program to be able to run locally you will need the file `config.cfg` stored at the path `src/resources/config.cfg`. If you don't have the config file please contact the project maintainer. The next step is to install the Python requirements, which can easily be done with the command `pip install -r src/resources/requirements.txt` from the project root folder. The project entry point is located in `start.py` in the project root folder.
The last thing is to create the log files, which can be done from the project root folder with the command `mkdir src/log && touch src/log/{application_log.log,application_trace.log}`.
The application is started by running `start.py`. With the default config the application runs on http://localhost:9000 and is connected to the test database on port 5431. If the config is not the default these options can differ.
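As a rough sketch, a `config.cfg` in Python's INI format could be read like this; the section and key names below are assumptions, not necessarily the file's actual layout.

```python
from configparser import ConfigParser

# Hedged sketch: section and key names are hypothetical.
config = ConfigParser()
config.read('src/resources/config.cfg')

port = config.getint('app', 'port', fallback=9000)          # default app port
db_port = config.getint('database', 'port', fallback=5431)  # test database by default
```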
This table contains all relevant data of a notification that is being scraped.
name | type | description |
---|---|---|
id | SERIAL | unique autogenerated notification identification |
title | VARCHAR | title of the notification |
author | INTEGER | foreign key that is unique identification of the notification author |
site | INTEGER | foreign key that is unique identification of the notification course |
publish_date | TIMESTAMP | date and time when the notification was published |
text | VARCHAR | text of the notification parsed for the Slack messaging |
link | VARCHAR | link to the original notification that was scraped |
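As a rough illustration, the notification table above could map to a SQLAlchemy model like the one below. This is a minimal sketch, assuming declarative mapping; the class name, table name and foreign-key targets are assumptions, not necessarily those used in the project.

```python
from sqlalchemy import Column, Integer, String, TIMESTAMP, ForeignKey
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

# Minimal sketch of the notification table; names are assumptions.
class Notification(Base):
    __tablename__ = 'notification'

    id = Column(Integer, primary_key=True)             # SERIAL
    title = Column(String)                             # notification title
    author = Column(Integer, ForeignKey('author.id'))  # notification author
    site = Column(Integer, ForeignKey('course.id'))    # assumed to reference the course table
    publish_date = Column(TIMESTAMP)                   # when it was published
    text = Column(String)                              # text parsed for Slack messaging
    link = Column(String)                              # link to the original notification
```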
This table represents autogenerated reminders from scraped notifications.
name | type | description |
---|---|---|
id | SERIAL | unique autogenerated reminder identification |
text | VARCHAR | text of the reminder (one notification paragraph) parsed for the Slack messaging |
end_date | TIMESTAMP | date and time when the event in the reminder is meant to happen |
timer | INTERVAL | time before end_date when reminder is meant to be sent as a Slack message |
posted | BOOLEAN | flag that tells if reminder has already been sent |
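A reminder is due once `end_date - timer` has passed while `posted` is still false. A hedged sketch of such a query follows; the model mapping and function name are assumptions, not the project's actual code.

```python
from datetime import datetime
from sqlalchemy import Column, Integer, String, TIMESTAMP, Boolean, Interval
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

# Minimal mapping of the reminder table above (names are assumptions).
class Reminder(Base):
    __tablename__ = 'reminder'
    id = Column(Integer, primary_key=True)
    text = Column(String)
    end_date = Column(TIMESTAMP)
    timer = Column(Interval)
    posted = Column(Boolean)

# Hedged sketch: reminders whose send time (end_date - timer) has passed
# and which have not been posted yet.
def due_reminders(session):
    now = datetime.now()
    return (session.query(Reminder)
            .filter(Reminder.posted == False)
            .filter(Reminder.end_date - Reminder.timer <= now)
            .all())
```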
This table represents Slack channels from selected Slack Workspace.
name | type | description |
---|---|---|
id | VARCHAR | unique channel identification generated by Slack |
tag | VARCHAR | channel tag |
creator_id | VARCHAR | foreign key that is unique identification of Slack user who created the channel |
created | TIMESTAMP | date and time when the channel was created |
This table represents a faculty course.
name | type | description |
---|---|---|
id | SERIAL | unique autogenerated course identification |
name | VARCHAR | course name |
channel_tag | VARCHAR | foreign key that connects course and channel via channel tag |
url | VARCHAR | course url on the internet |
watch | BOOLEAN | flag that tells if course is on the watchlist, if course is on the watchlist the scraper is going to scrape notifications for the course via url |
This table represents an author of a notification that was scraped.
name | type | description |
---|---|---|
id | SERIAL | unique autogenerated author identification |
first_name | VARCHAR | author's first name |
last_name | VARCHAR | author's last name |
This table represents a message pin from the Slack Workspace.
name | type | description |
---|---|---|
id | SERIAL | unique autogenerated pin identification |
creation_date | TIMESTAMP | date and time when the pin was created |
timer | INTERVAL | time for how long the message is going to be pinned |
notification | INTEGER | foreign key of the notification that was pinned |
timestamp | FLOAT | unique message identification represented as timestamp |
done | BOOLEAN | flag that tells if pin is done, if pin is done then it is unpinned |
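For illustration, the `done` flag flips once the pin's lifetime has elapsed. A minimal sketch of that check follows; the function name is hypothetical.

```python
from datetime import datetime

# Hedged sketch: a pin should be unpinned once creation_date + timer
# has passed; the project's actual check may differ.
def pin_expired(creation_date, timer):
    """creation_date: datetime, timer: timedelta (maps to INTERVAL)."""
    return datetime.now() >= creation_date + timer
```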
This table represents a Slack user from the Slack Workspace.
name | type | description |
---|---|---|
id | VARCHAR | unique user identification generated by Slack |
name | VARCHAR | Slack username generated from the user's e-mail |
This table represents a reaction from the Slack Workspace.
name | type | description |
---|---|---|
id | SERIAL | unique autogenerated reaction identification |
name | VARCHAR | reaction's name in the Slack Workspace |
timestamp | FLOAT | unique message identification represented as timestamp on which the reaction was given |
sender | VARCHAR | foreign key that is unique identification of Slack user who gave the reaction |
receiver | VARCHAR | foreign key that is unique identification of Slack user who received the reaction |
This table represents a user that can log in to the web application. It is not to be confused with the Slack User table.
name | type | description |
---|---|---|
id | SERIAL | unique autogenerated user identification |
name | VARCHAR | username |
password | VARCHAR | hashed user's password |
Notice: this table is yet to be clearly defined; as of now it is not used in the application.
This table represents filters for scraping notifications.
name | type | description |
---|---|---|
id | SERIAL | unique autogenerated filter identification |
ban_title | VARCHAR | title regex that, if matched, means the notification is not scraped |
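For illustration, applying the `ban_title` patterns could look like this minimal sketch; the function name is hypothetical.

```python
import re

# Hedged sketch: a notification is skipped when its title matches any
# ban_title regex stored in the filter table.
def is_banned(title, ban_patterns):
    return any(re.search(pattern, title) for pattern in ban_patterns)
```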
The navigation bar has 6 main buttons: Home, Courses, Reminders, Reactions, Scan and a button containing the username of the logged-in user. Clicking on the application icon or the Infobot text next to it redirects the user to the Home page.
There is also a scraper status on the right side of the navigation bar, near the logged-in user. The scraper status changes dynamically. The scraper can have 4 statuses (each in a different color): Off, Scraping, Sleeping, Error. Clicking on the status redirects the user to the Home page.
The Home, Courses, Reminders and Reactions buttons redirect to the corresponding pages. The Scan button opens a dropdown with the following options: Scan for users, Scan for channels, Scan for reactions and Complete scan, which starts all the other scans with one click. All scans are done on the Slack Workspace. For scanning users and channels the class `Scanner` is used, and for the reactions scan the class `ReactionManager` is used, as sketched below. When a scan is clicked, a spinner appears for the duration of the scan. If the scan finishes without errors a check mark replaces the spinner; otherwise an error icon appears.
On the far right there is the username of the logged-in user along with a user icon. Clicking on the username opens a dropdown with the following options: Settings and Logout. Clicking on Settings opens the user's profile settings, where they can change their user settings. Clicking on Logout logs the user out and redirects them to the login page.
The navigation bar HTML code is located in `base.html`, as is the JavaScript code for the navigation bar actions. The backend logic is located in `nav_bar_view.py`. Scraper progress is stored in localStorage in `base.html` from the endpoint `progress` in `base_view.py` (a rough sketch follows below). The scraper status card gets its value from that stored value in localStorage.
The home page shows the most important things about Infobot: its scraping status and log entries.
The progress bar tells the user the progress of scraping along with a description of the current scraping phase. The progress bar gets its information from the same place as the scraping status in the navigation bar. There are two states of the progress bar: running (blue, with a progress percentage) and error (red, with an ERROR message).
Below the progress bar are two buttons: START and STOP. The START button starts the scraping process in the `Scraper` class, while the STOP button sends SIGINT to the process started by the `Scraper` class, as sketched below.
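A rough sketch of the STOP mechanism described above, assuming the scraper's process id is tracked somewhere; the function and variable names are hypothetical.

```python
import os
import signal

# Hedged sketch: interrupt the scraper process the same way Ctrl+C would.
# scraper_pid would have to be recorded when the Scraper process is started.
def stop_scraper(scraper_pid):
    os.kill(scraper_pid, signal.SIGINT)
```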
Logs are also a very important aspect of the Infobot application. Creating and managing logs is the duty of the `Logger` class. The `Logger` class does not only create logs for the scraping process but for the scanning process too.
There are three types of logs:
type | description |
---|---|
INFO | events like starting or terminating the scraping/scanning process and inserting a new element into the database |
WARNING | events that do not crash the scraping/scanning process but disturb it, like not being able to log in on the web page to scrape data |
ERROR | events that cause the scraping/scanning process to terminate |
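A minimal sketch of how such a three-level logger could be set up with Python's standard logging module, writing to the log files listed earlier; the configuration details are assumptions, not the project's actual `Logger` implementation.

```python
import logging

# Hedged sketch: INFO/WARNING/ERROR entries go to application_log.log;
# the project's Logger additionally writes stack traces to application_trace.log.
logging.basicConfig(
    filename='src/log/application_log.log',
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(message)s',
)

logging.info('scraping process started')
logging.warning('could not log in on the web page to scrape data')
logging.error('scraping process terminated unexpectedly')
```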
Important: the logs listed on the Home page are not only logs from the `Scraper`, but logs for the whole Infobot application.
Logs are sorted newest first, so fresh logs are always at the top. INFO and WARNING logs can only be read, while an ERROR log can be clicked, which opens a popup with the stack trace of the error for more detailed information.
The data in the popup is located in `src/log/application_trace.log` and the log data from the home page is located in `src/log/application_log.log`. The HTML and JavaScript for the home page are located in the `home.html` file. The backend logic is located in `home_view.py`.
On the courses page there are three tables: Watch list, Unwatched list and Archived list.
list | description |
---|---|
Watch | Courses for which a corresponding channel exists and which the scraper is scraping |
Unwatched | Courses for which a corresponding channel exists and which the scraper is not scraping |
Archived | Courses for which a corresponding channel exists but is archived and which the scraper is not scraping |
Every list has the following elements.
element | description |
---|---|
Name | Course name |
Tag | Slack channel tag of the channel to which Infobot is going to send notifications and reminders |
Url | Course url from which Infobot is scraping data |
Watch | Toggle that defines if Infobot is going to scrape data from the provided url for the course and post messages to the Slack channel |
Actions | Action buttons for the course (Archive and Delete) |
The HTML and JavaScript for this page are located in `course.html` and the backend in `course_view.py`.
The list shows all watched courses with the option to edit certain attributes of each course. If the user changes the Watch attribute of a course, that course will move to the Unwatched list, and if the user archives the channel, that course will go to the Archived list. The last row is made for saving a new course to the Watch/Unwatched list, so it does not have action buttons for Archive and Delete. If there is currently no corresponding channel for a new course, the user can click on the + icon in the Tags section, which opens a popup for creating a new Slack channel.
When creating a channel the user can define the channel tag, topic and users. The creator of the channel will be the Infobot application in the Slack Workspace.
If the user wishes not to create a special channel for the course, they can select one of the channel tags in the dropdown without creating a new channel.
The channel tag dropdown shows only channels that have no course connected to them, plus the channel `#general`. The channel `#general` can have as many courses connected to it as the user wants; this behaviour is only present for the `#general` channel.
The Unwatched list is almost the same as the Watch list, except that the Unwatched list shows only courses that have the Watch attribute set to Off.
There is still the option to edit courses, but not to add a new course like on the Watch list.
If the user changes the Watch attribute of a course, that course will move to the Watch list, and if the user archives the channel, that course will go to the Archived list.
All list elements have all actions available, except for courses that have the channel tag `#general`, which is to prevent the user from archiving the `#general` channel.
The Archived list contains courses whose channels are archived. For that reason every course has the attribute Watch set to Off. The user can edit every attribute of a course except the attribute Watch, which is read-only. In the actions section there is no archive action because the channel is already archived, but there is an option for unarchiving the channel. The action that unarchives the channel is not currently possible because of the Slack API and its limitations for bots.