Replies: 4 comments 5 replies
-
My favoured options are 1 or 2. I think the additional cost is outweighed by the reduction in implementation complexity for both of these options. Option 1 has the advantage of logging additional unauthorised requests. Option 2 has the advantage of being customisable, so the logs won't be bloated with unhelpful data. There is a bit more cost associated with it, but at $0.10 per 100,000 events logged that will be negligible if we target logging of data events only.
-
Can you explain what cross-account considerations there are in using MP's trails? I think I like option 2 but I want to understand why we'd want to create a trail instead of reusing the one MP has. |
-
Thanks Matthew for kick-starting the discussion. My preferred options are:
Option 1: Utilise the existing CloudTrail trail on the Cloud Platform, assuming we possess sufficient permissions to access the bucket. Configure a data event as illustrated in the example provided here: [https://dsdmoj.atlassian.net/wiki/spaces/DE/pages/4121460902/ADR-3+Data+uploader+-+Audit]
Option 2: Consider implementing both a data-event CloudTrail and server-side logging, especially if there are specific use-case advantages associated with having server-side logging.
-
We are going to proceed with Option 2: configuring our own CloudTrail trail within the data platform account.
-
Context
Logging of actions performed on data is critical to the Data Platform service, both to diagnose issues with the service and to ensure we have captured critical information about an incident, whether a data breach or otherwise.
There are two ways in which object level logging can be enabled for an S3 bucket, neither of which is enabled as standard. This page explores these two methods.
Ideally, logs should be saved to S3 in JSON (or another Athena-compatible) format, making them more easily searchable via Athena queries and consistent with our Python Lambda container logging.
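As a hedged illustration of the JSON log format described above, a minimal Python logging formatter that emits one JSON object per line; the logger name and field set here are illustrative assumptions, not the platform's actual configuration:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON object per line,
    the shape Athena's JSON SerDe expects when querying files in S3."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "message": record.getMessage(),
        })


# Hypothetical wiring; names are illustrative only.
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("data-platform")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
```

Each emitted line is then a standalone JSON document, so a saved log file could be registered as an Athena table without further transformation.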
Our accounts sit within the Modernisation Platform, and it appears all buckets are set up as standard to log object-level events to a central account through a CloudTrail trail. We do not have permissions to interact with any of these logs. There would be some duplication of logging if we were to collate our own object-level S3 logs; this is undesirable in terms of cost, efficiency and clarity.
We need to decide on the approach for object-level S3 logging in the Data Platform.
Options
1. Set up our own server access logging, saving logs within the data platform AWS account.
2. Set up our own CloudTrail trail, capturing event data for only a subset of the available API calls (initially PutObject), saving logs within the data platform AWS account.
3. Use the Modernisation Platform's CloudTrail logs. We would need to agree cross-account permissions to access the S3 logs, held within the central Modernisation Platform bucket, from the data platform account.
Server access logging
This method writes log files to a designated target bucket (which must be in the same account as the source bucket). The logs are space-separated text files which can be queried via Athena. Every API call to the bucket is logged, and this cannot be configured to filter for certain actions, e.g. PutObject or GetObject.
Cost: the only charge incurred is the S3 storage cost of the log files.
Pros
Cons
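For illustration, a minimal sketch of splitting one server access log line into its fields, keeping the bracketed timestamp and quoted strings intact. The example line is shortened and hypothetical; real lines carry additional trailing fields:

```python
import re

# A token is either [...] (timestamp), "..." (quoted), or a bare word.
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')


def parse_access_log_line(line: str) -> list[str]:
    """Split one S3 server access log line into space-separated fields."""
    return [t.strip('"') for t in TOKEN.findall(line)]


# Hypothetical, shortened example line (owner, bucket, time, IP, requester,
# request ID, operation, key, request URI, status, ... trailing fields).
line = (
    '79a59df9 mybucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 '
    'arn:aws:iam::123456789012:user/test 3E57427F3EXAMPLE '
    'REST.PUT.OBJECT uploads/file.txt '
    '"PUT /mybucket/uploads/file.txt HTTP/1.1" 200 - - 1234 12 10 '
    '"-" "aws-cli/2.x" -'
)
fields = parse_access_log_line(line)
```

This is only a sketch of why the format is awkward compared with JSON; in practice Athena's own regex-based SerDe would do this parsing at query time.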
CloudTrail
CloudTrail offers two different methods for logging S3 events:
Create a trail - this enables customisation of what is logged: e.g. you can log events from specific buckets and for specific API calls. You are also able to pass the logs to CloudWatch and to save log files to a specified S3 bucket.
Create an event data store - this enables the same customisation but does not give access to logs via CloudWatch or save log files to S3. Logs must be queried through CloudTrail Lake or linked to a CloudTrail dashboard.
The limitations of event data stores make a Trail the more suitable CloudTrail option.
Cost: $0.10 per 100,000 data events delivered, plus S3 storage costs for saved log files.
Pros
Cons
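A hedged sketch of what the trail option could look like with boto3: create a trail, scope it to PutObject data events on one bucket via advanced event selectors, and start logging. Trail and bucket names are hypothetical, and the log bucket would need a CloudTrail bucket policy in place before create_trail succeeds:

```python
def put_object_selectors(bucket_arn: str) -> list[dict]:
    """Advanced event selectors capturing only PutObject data events
    on one bucket (the shape PutEventSelectors expects)."""
    return [{
        "Name": "PutObject data events only",
        "FieldSelectors": [
            {"Field": "eventCategory", "Equals": ["Data"]},
            {"Field": "resources.type", "Equals": ["AWS::S3::Object"]},
            {"Field": "eventName", "Equals": ["PutObject"]},
            {"Field": "resources.ARN", "StartsWith": [bucket_arn + "/"]},
        ],
    }]


def create_put_object_trail(trail_name: str, log_bucket: str,
                            source_bucket_arn: str) -> None:
    """Create the trail, scope it to PutObject events, start logging."""
    import boto3  # imported here so the selector builder stays standalone

    ct = boto3.client("cloudtrail")
    ct.create_trail(Name=trail_name, S3BucketName=log_bucket)
    ct.put_event_selectors(
        TrailName=trail_name,
        AdvancedEventSelectors=put_object_selectors(source_bucket_arn),
    )
    ct.start_logging(Name=trail_name)
```

The event-name filter is what keeps the trail's volume (and the $0.10 per 100,000 data events charge) down compared with logging every S3 call.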
A more comprehensive CloudTrail vs server access logging comparison can be seen at Logging options for Amazon S3 - Amazon Simple Storage Service.
Test Logging Outputs
I have created two test log Athena tables (using saved log files) to demonstrate the outputs of each approach; both logged the same S3 events.
The CloudTrail log table contains the logs from a CloudTrail trail filtered to capture only PutObject events: 10 rows of data.
The server access log table contains the logs produced with server access logging enabled on a source bucket: 225 rows of data.
Grafana Integration
Observability and interrogation of logs is critical.
One element of the plan for our logs is to use Grafana to create visualisations of metrics from the logs.
Both of these options give the ability to save log files queryable by Athena, and Grafana has an Athena plugin available, which will make developing monitoring metrics achievable through standard SQL queries against the log tables. See Query and analyze Amazon S3 data with the new Amazon Athena plugin for Grafana | Grafana Labs.