feat(event schema) Basic infrastructure to typecheck processor code out of the jsonschema definition. #1192

Closed · wants to merge 24 commits

Commits
b852ebd
wip on jsonschema project for snuba
untitaker Jul 31, 2020
26dd251
Replace event with data (#1197)
fpacifici Aug 4, 2020
3ae5ba4
Merge branch 'master' into wip/jsonschema-mypy
fpacifici Aug 4, 2020
0d216d8
Merge branch 'wip/jsonschema-mypy' of https://github.com/getsentry/sn…
fpacifici Aug 4, 2020
1354941
Re-fix tests (#1200)
fpacifici Aug 4, 2020
0bfb660
Hackweek: Added Makefile entry to fetch schema; eliminate type errors…
dbennett-sentry Aug 4, 2020
f5823b2
fix(hackweek): Use the correct event Typed Dict in event processor ba…
fpacifici Aug 5, 2020
9fc5fee
Take the schema from the schema repo (#1203)
fpacifici Aug 6, 2020
ce6d4e2
feat(hackweek) Remove merged jsonschema and validate it is downloaded…
fpacifici Aug 7, 2020
d11e6dc
Merge branch 'master' into wip/jsonschema-mypy
fpacifici Sep 10, 2020
6ada6bf
Fix import and python version
fpacifici Sep 10, 2020
016323e
update readme
fpacifici Sep 10, 2020
fd3d2aa
Merge branch 'master' into wip/jsonschema-mypy
fpacifici Sep 22, 2020
69c83b1
Code review comments
fpacifici Sep 22, 2020
c9fda8e
Add post merge hook
fpacifici Sep 22, 2020
5aa582a
Post merge hook
fpacifici Sep 22, 2020
d415dc0
Merge branch 'master' into wip/jsonschema-mypy
fpacifici Sep 24, 2020
117b37d
Temporarily commit the schema
fpacifici Sep 24, 2020
74afa60
Temporarily commit the schema 2
fpacifici Sep 24, 2020
69368b9
Temporarily commit the schema 3
fpacifici Sep 24, 2020
e1d7703
Merge branch 'master' into wip/jsonschema-mypy
fpacifici Oct 1, 2020
c19c91e
Try to import pip
fpacifici Oct 2, 2020
bfc8f2f
Merge branch 'master' into wip/jsonschema-mypy
fpacifici Dec 31, 2020
60e8e50
Add script to update version
fpacifici Jan 4, 2021
2 changes: 2 additions & 0 deletions .pre-commit-config.yaml
@@ -13,6 +13,8 @@ repos:
- id: check-yaml
- id: detect-private-key
- id: end-of-file-fixer
# TODO: Remove this exclusion when we stop committing the schema.
exclude: ^schemas/
- id: trailing-whitespace
- id: fix-encoding-pragma
args: ["--remove"]
8 changes: 7 additions & 1 deletion Makefile
@@ -1,6 +1,6 @@
.PHONY: develop setup-git test install-python-dependencies

develop: install-python-dependencies setup-git
develop: install-python-dependencies setup-git fetch-and-validate-schema

setup-git:
mkdir -p .git/hooks && cd .git/hooks && ln -sf ../../config/hooks/* ./
@@ -12,3 +12,9 @@ test:

install-python-dependencies:
pip install -e .

fetch-and-validate-schema:
mkdir -p schemas
curl https://raw.githubusercontent.com/getsentry/sentry-data-schemas/main/relay/event.schema.json -o schemas/event.schema.json
mypy snuba > /dev/null || (if [ "$$?" -gt 1 ]; then exit 1; fi)
Contributor:

What cases is this trying to check?

Never mind, figured this one out — a short comment here that explains the significance of the exit code would be useful.

.PHONY: fetch-and-validate-schema
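The `|| (if [ "$$?" -gt 1 ]; then exit 1; fi)` construct answers the reviewer's question: mypy exits 0 when there are no type errors, 1 when type errors were found, and 2 when mypy itself could not run (bad config, crash, missing files such as an undownloaded schema). The rule tolerates ordinary type errors but fails the build on a fatal error. A minimal Python sketch of that gate (the function name is mine, not from the PR):

```python
def schema_check_ok(mypy_returncode: int) -> bool:
    # mypy exit codes: 0 = clean, 1 = type errors found,
    # 2 = fatal error (e.g. the schema file was never downloaded).
    # Type errors are expected while schema types are adopted
    # incrementally, so only a fatal error should fail the build.
    return mypy_returncode <= 1
```

This mirrors the Makefile logic: only an exit code greater than 1 aborts `fetch-and-validate-schema`.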
3 changes: 1 addition & 2 deletions README.md
@@ -38,8 +38,7 @@ Note that Snuba assumes that everything is running on UTC time. Otherwise you ma

mkvirtualenv snuba --python=python3.8
workon snuba
make install-python-dependencies
make setup-git
make develop

# Run API server
snuba api
8 changes: 8 additions & 0 deletions config/hooks/post-merge
@@ -4,7 +4,15 @@ red="$(tput setaf 1)"
bold="$(tput bold)"
reset="$(tput sgr0)"

# Ensures up-to-date event schemas are downloaded
mkdir -p schemas
curl https://raw.githubusercontent.com/getsentry/sentry-data-schemas/main/relay/event.schema.json -o schemas/event.schema.json
mypy snuba > /dev/null
if [ $? -gt 1 ]; then
echo "[${red}${bold}!!!${reset}] Could not download schemas."
fi
Comment on lines +7 to +13 (Contributor):

Isn't this just the same as the make step?

Contributor:

Pretty much the same (the syntax differs because that one lives in a Makefile). I added it in both places because the hook runs automatically and you cannot easily trigger it yourself (technically you can, since the hook is just a shell script), while the Makefile target is something you can always run manually.


# Ensures migrations are executed
files_changed_upstream="$(mktemp)"
trap "rm -f ${files_changed_upstream}" EXIT

2 changes: 2 additions & 0 deletions mypy.ini
@@ -1,4 +1,6 @@
[mypy]
plugins = jsonschema_typed.plugin
python_version = 3.8
ignore_missing_imports = False

[mypy-_strptime]
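The `jsonschema_typed.plugin` entry makes mypy synthesize a TypedDict from the downloaded JSON schema, which is what gives `data["received"]` and friends static types. As a rough hand-written illustration of the kind of type the plugin produces (field names are guessed from the accesses in this diff, not taken from the real `event.schema.json`):

```python
from typing import Any, Mapping, Optional, TypedDict


# Hand-written sketch of what the plugin derives from the schema;
# the real EventData in this PR is generated at mypy time from
# schemas/event.schema.json, not written out like this.
class EventData(TypedDict, total=False):
    received: float
    tags: Optional[Mapping[str, Any]]
    sdk: Optional[Mapping[str, Any]]


data: EventData = {"received": 1_600_000_000.0}
```

With the plugin active, accessing a key that the schema does not define becomes a mypy error instead of a silent `Any`.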
3 changes: 2 additions & 1 deletion requirements.txt
@@ -19,13 +19,14 @@ honcho==1.0.1
idna==2.7
itsdangerous==0.24
Jinja2==2.10.1
jsonschema==3.0.1
jsonschema==3.2.0
lz4==2.0.0
Markdown==2.6.11
MarkupSafe==1.1.1
mywsgi==1.0.3
more-itertools==4.2.0
mypy>=0.761
git+https://github.com/getsentry/sentry-data-schemas.git@76c6870d4b81e9c7a3a983cf4f591aeecb579521#subdirectory=py
packaging==17.1
parsimonious==0.8.1
py==1.5.3
9 changes: 9 additions & 0 deletions scripts/update-schemas.sh
@@ -0,0 +1,9 @@
#!/bin/bash

SCHEMAS_REPO='https://github.com/getsentry/sentry-data-schemas.git'
LATEST_VERSION=$(git ls-remote $SCHEMAS_REPO HEAD | awk '{ print $1}')

REGEX='^git\+https\:\/\/github\.com\/getsentry\/sentry\-data\-schemas\.git\@([a-f0-9]+)\#subdirectory=py'
TEMP='git+https://github.com/getsentry/sentry-data-schemas.git@c123123#subdirectory=py'

echo "$TEMP" | sed -E "s/@[a-f0-9]+#/@${LATEST_VERSION}#/"
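The hash substitution the script attempts can be sketched in Python (the function name is mine; the requirement string follows the pinned line in requirements.txt):

```python
import re


def pin_latest(requirement: str, latest_sha: str) -> str:
    # Replace the hex commit ref between '@' and '#subdirectory' with
    # the SHA reported by `git ls-remote ... HEAD`.
    return re.sub(r"@[0-9a-f]+#", f"@{latest_sha}#", requirement)


REQ = "git+https://github.com/getsentry/sentry-data-schemas.git@c123123#subdirectory=py"
```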
1 change: 0 additions & 1 deletion snuba/datasets/errors_processor.py
@@ -47,7 +47,6 @@ def extract_custom(
) -> None:
data = event.get("data", {})
user_dict = data.get("user", data.get("sentry.interfaces.User", None)) or {}

user_data: MutableMapping[str, Any] = {}
extract_user(user_data, user_dict)
output["user_name"] = user_data["username"]
13 changes: 8 additions & 5 deletions snuba/datasets/events_processor.py
@@ -1,7 +1,6 @@
from typing import Any, Mapping, MutableMapping

import logging
import _strptime # NOQA fixes _strptime deferred import issue
import logging
tkaemming marked this conversation as resolved.
from typing import Any, Mapping, MutableMapping

from snuba.clickhouse.columns import ColumnSet
from snuba.consumer import KafkaMessageMetadata
@@ -47,11 +46,15 @@ def extract_custom(
metadata: KafkaMessageMetadata,
) -> None:
data = event.get("data", {})

output["message"] = _unicodify(event["message"])

# USER REQUEST GEO
user = data.get("user", data.get("sentry.interfaces.User", None)) or {}
user = (
data.get(
"user", data.get("sentry.interfaces.User", None) # type: ignore
)
or {}
)
extract_user(output, user)

geo = user.get("geo", None) or {}
28 changes: 17 additions & 11 deletions snuba/datasets/events_processor_base.py
@@ -1,10 +1,9 @@
import logging
from abc import ABC, abstractmethod
from datetime import datetime
from typing import Any, Mapping, MutableMapping, Optional, Sequence

from typing_extensions import TypedDict
from typing import Any, cast, Mapping, MutableMapping, Optional, Sequence, TypedDict

from schemas import EventData
from snuba import settings
from snuba.consumer import KafkaMessageMetadata
from snuba.datasets.events_format import (
@@ -30,7 +29,6 @@

logger = logging.getLogger(__name__)


REPLACEMENT_EVENT_TYPES = frozenset(
[
"start_delete_groups",
@@ -55,9 +53,10 @@ class InsertEvent(TypedDict):
message: str
platform: str
datetime: str # snuba.settings.PAYLOAD_DATETIME_FORMAT
data: MutableMapping[str, Any]
data: EventData
primary_hash: str # empty string represents None
retention_days: int
search_message: Any


class EventsProcessorBase(MessageProcessor, ABC):
@@ -116,6 +115,7 @@ def extract_required(

# This is not ideal but it should never happen anyways
timestamp = _ensure_valid_date(
# TODO: Switch to using event["data"]["timestamp"]
datetime.strptime(event["datetime"], settings.PAYLOAD_DATETIME_FORMAT)
)
if timestamp is None:
@@ -151,7 +151,7 @@ def process_message(
type_, event = message[1:3]
if type_ == "insert":
try:
row = self.process_insert(event, metadata)
row = self.process_insert(cast(InsertEvent, event), metadata)
except EventTooOld:
return None

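The `cast(InsertEvent, event)` call is purely a static-typing device: the payload arrives as untyped deserialized JSON, and `cast` is a runtime no-op that tells mypy to treat it as the TypedDict. A minimal sketch using a subset of the InsertEvent fields from this diff:

```python
from typing import Any, Tuple, TypedDict, cast


class InsertEvent(TypedDict):
    platform: str
    retention_days: int


def handle(message: Tuple[int, str, Any]) -> int:
    type_, event = message[1], message[2]
    # cast() does nothing at runtime; it only tells mypy to treat the
    # deserialized payload (typed as Any) as an InsertEvent, since the
    # schema cannot be proven statically at this boundary.
    row = cast(InsertEvent, event)
    return row["retention_days"]
```

Because `cast` performs no validation, a malformed payload still fails at the point of use, which is why the surrounding code keeps its `EventTooOld`-style runtime checks.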
@@ -168,6 +168,7 @@ def process_message(
def process_insert(
self, event: InsertEvent, metadata: KafkaMessageMetadata
) -> Optional[Mapping[str, Any]]:

if not self._should_process(event):
return None

@@ -189,7 +190,7 @@ def process_insert(
self.extract_common(processed, event, metadata)
self.extract_custom(processed, event, metadata)

sdk = data.get("sdk", None) or {}
sdk = data.get("sdk", None) or {} # type: ignore
self.extract_sdk(processed, sdk)

tags = _as_dict_safe(data.get("tags", None))
@@ -205,9 +206,13 @@ def process_insert(
processed["tags.key"], processed["tags.value"] = extract_extra_tags(tags)

exception = (
data.get("exception", data.get("sentry.interfaces.Exception", None)) or {}
data.get(
"exception",
data.get("sentry.interfaces.Exception", None), # type: ignore
)
or {}
)
stacks = exception.get("values", None) or []
stacks: Sequence[Any] = exception.get("values", None) or []
self.extract_stacktraces(processed, stacks)

processed["offset"] = metadata.offset
@@ -223,11 +228,12 @@ def extract_common(
metadata: KafkaMessageMetadata,
) -> None:
# Properties we get from the top level of the message payload
data = event.get("data", {})
output["platform"] = _unicodify(event["platform"])

# Properties we get from the "data" dict, which is the actual event body.
data = event.get("data", {})
received = _collapse_uint32(int(data["received"]))

received = _collapse_uint32(data["received"])
output["received"] = (
datetime.utcfromtimestamp(received) if received is not None else None
)