All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Toolchain updated to `nightly-2024-12-26`
- Core changes from the Private Datasets epic (kamu CLI `0.219.1`)
- Telemetry-driven fixes in flow listings (kamu CLI `0.217.3`)
- Batched loading of flows and tasks (kamu CLI `0.217.2`)
- Extend database config (kamu CLI `0.217.1`)
- Add missing `RemoteStatusServiceImpl` service to catalog
- Env var and flow API changes (kamu CLI `0.217.0`)
- Flight SQL authentication (see kamu-data/kamu-cli#1012)
- `/verify` endpoint hot fix (kamu CLI `0.215.1`)
- Flow configuration separation (kamu CLI `0.215.0`)
- Improved FlightSQL session state management (kamu CLI `0.214.0`)
- Regression in FlightSQL interface related to database-backed `QueryService`
- Less aggressive telemetry for key dataset services, like ingestion (kamu CLI `0.213.1`)
- Eliminated regression crash on metadata queries
- Upgrade kamu-cli version to `0.213.0`
  - Upgrade to `datafusion v43`
  - Upgrade to `alloy v0.6`
  - Planners and executors in key dataset manipulation services
- Environment variables are automatically deleted if the dataset they refer to is deleted.
- Upgrade kamu-cli version to `0.211.0`:
  - Dataset dependency graph is now backed by a database, removing the need for dependency scanning at startup.
- Upgrade kamu-cli version to `0.210.0`:
  - Improved OpenAPI integration
  - Replaced Swagger with Scalar for presenting OpenAPI spec
  - `kamu-api-server`: error if specialized config is not found
  - Separated runtime and dynamic UI configuration (such as feature flags)
- Upgrade kamu-cli version to `0.208.1` (minor updates in data image)
- Introduced `DatasetRegistry` abstraction, encapsulating listing and resolution of datasets (kamu-cli version to `0.208.0`):
  - Registry is backed by database-stored dataset entries, which are automatically maintained
  - Scope for `DatasetRepository` is now limited to support `DatasetRegistry` and in-memory dataset dependency graph
  - New concept of `ResolvedDataset`: a wrapper around `Arc<dyn Dataset>`, aware of dataset identity
  - Query and Dataset Search functions now consider only the datasets accessible for current user
- Core services now explicitly separate planning (transactional) and execution (non-transactional) processing phases
- Similar decomposition introduced in task system execution logic
- Batched form for dataset authorization checks
- Ensuring correct transactionality for dataset lookup and authorization checks all over the code base
- Passing multi/single tenancy as an enum configuration instead of boolean
- Renamed outbox "durability" term to "delivery mechanism" to clarify the design intent
- Upgrade kamu-cli version to `0.207.3` (Outbox versions)
- Upgrade kamu-cli version to `0.207.1`
- Correct image version
- Upgrade kamu-cli version to `0.207.0`
- Upgrade kamu-cli version to `0.206.5`
- Upgrade kamu-cli version to `0.206.3`:
  - GraphQL: Removed deprecated `JSON_LD` in favor of `ND_JSON` in `DataBatchFormat`
  - GraphQL: In `DataBatchFormat` introduced `JSON_AOS` format to replace the now deprecated JSON in an effort to harmonize format names with REST API
- GraphQL: Fixed invalid JSON encoding in `PARQUET_JSON` schema format when column names contain special characters
- Improved telemetry for dataset entry indexing process
- Corrected recent migration related to outbox consumptions of old dataset events
- Upgrade kamu-cli version to `0.206.1`:
  - `DatasetEntryIndexer`: guarantee startup after `OutboxExecutor` for a more predictable initialization
  - Add `DatasetEntry`'s re-indexing migration
- Introduced OpenAPI spec generation
  - `/openapi.json` endpoint now returns the generated spec
  - `/swagger` endpoint serves an embedded Swagger UI for viewing the spec directly in the running server
  - OpenAPI schema is available in the repo `resources/openapi.json` beside its multi-tenant version
- Added endpoint to read a recently uploaded file (`GET /platform/file/upload/{upload_token}`)
- Upgrade kamu-cli version to `0.205.0`:
  - Simplified organization of startup initialization code over different components
  - Postgres implementation for dataset entry and account Re-BAC repositories
  - `DatasetEntry` integration that will allow us to build dataset indexing
  - Added REST API endpoints:
    - `GET /info`
    - `GET /accounts/me`
    - `GET /datasets/:id`
- Upgrade kamu-cli version to `0.203.1`:
  - Added database migration & scripting to create an application user with restricted permissions
  - Support `List` and `Struct` arrow types in `json` and `json-aoa` encodings
- Upgrade kamu-cli version to `0.202.0`:
  - Major dependency upgrades:
    - DataFusion 42
    - HTTP stack v.1
    - Axum 0.7
    - latest AWS SDK
    - latest versions of all remaining libs we depend on
  - Outbox refactoring towards true parallelism via Tokio spawned tasks instead of futures
- Re-enabled missing optional features for eth, ftp, mqtt ingest and JSON SQL extensions
- Failed flows should still propagate `finishedAt` time
- Eliminate `span.enter`, replaced with instrument everywhere
- REST API: New `/verify` endpoint allows verification of query commitment
- Upgrade kamu-cli version to `0.201.0`:
  - Outbox main loop was revised to minimize the number of transactions
  - Detecting concurrent modifications in flow and task event stores
  - Improved and cleaned handling of flow abortions at different stages of processing
  - Revised implementation of flow scheduling to avoid in-memory time wheel
- Added application name prefix to Prometheus metrics
- API Server now exposes Prometheus metrics
- FlightSQL tracing
- Oracle Provider Prometheus metrics names changed to conform to the convention
- Oracle Provider: Updated to use V2 `/query` REST API
- Oracle Provider: Added ability to scan back only a certain interval of past blocks
- Oracle Provider: Added ability to ignore requests by ID and from certain consumers
- Identity config registration bug that prevented response signing from working
- REST API: The `/query` endpoint now supports response proofs via reproducibility and signing (#816)
- REST API: New `/{dataset}/metadata` endpoint for retrieving schema, description, attachments etc. (#816)
- Upgrade kamu-cli version to `0.199.2`
- Hot fixes in persistent Tasks & Flows
- Upgrade kamu-cli version to `0.199.1`
- Persistent Tasks & Flows
- Database schema breaking changes
- Get Data Panel: use SmTP for pull & push links
- GQL API method `setConfigCompaction` allows setting `metadataOnly` configuration for both root and derived datasets
- GQL API `triggerFlow` allows triggering `HARD_COMPACTION` flow in `metadataOnly` mode for both root and derived datasets
- Critical errors were not logged because the logging guard was destroyed before the call to tracing
- Upgrade kamu-cli version to `0.198.2`
- ReBAC: in-memory & SQLite components
- Smart Transfer Protocol: breaking changes
- Upgrade kamu-cli version to `0.198.0` (address RUSTSEC-2024-0363)
- Add missing `ResetService` dependency
- Upgrade kamu-cli version to `0.197.0`
- Missing initialization issue for outbox processor
- Upgrade kamu-cli version to 0.195.1 (DataFusion 41, Messaging outbox)
- Upgrade kamu-cli version to 0.194.0 and add `DatasetKeyValueSysEnv` service if encryption key was not provided
- Upgrade kamu-cli version to 0.191.5 and add init of new `DatasetKeyValueService` in catalog
- Exposed new `engine`, `source`, and `protocol` sections in the `api-server` config (#109)
- Dropped "bunyan" log format in favor of standard `tracing` JSON logs (#106)
- The `oracle-provider` now exposes Prometheus metrics via `/system/metrics` endpoint (#106)
- All apps now support exporting traces via Open Telemetry protocol (#106)
- The `api-server` now supports graceful shutdown (#106)
- All apps now support `/system/health?type={liveness,readiness,startup}` health check endpoint using Kubernetes probe semantics (#106)
- Make dataset env vars encryption key optional
- Upgraded to kamu `0.191.4`
- Upgraded to kamu `0.191.3`
- Integrated the `DatasetEnvVars` service that allows configuring custom variables and secrets to be used during the data ingestion
- Upgraded to new `rustc` version and some dependencies
- Upgraded to kamu `0.191.2`
- Regression where oracle provider stopped respecting the `block_stride` config
- Upgraded to kamu `0.189.7` which fixes the operation of SmTP along with database transactions
- Integrating modes of RDS password access
- Upgraded to kamu `0.188.3` which fixes the file ingestion feature
- Upgraded to kamu `0.188.1` that includes a fix for transactions getting "stuck" in data queries
- Fixed invalid REST response decoding by `oracle-provider`
- Fixed invalid REST request encoding by `oracle-provider`
- Upgraded to kamu `0.188.1`
- Improve `oracle-provider`:
  - Dataset identity support
  - SQL errors and missing dataset handling
  - Reproducibility state support
- Upgraded to kamu `0.188.0`
- Oracle provider was migrated from deprecated `ethers` to `alloy` crate
- Upgraded to kamu `0.186.0`
- Upgraded `kamu` from `0.181.1` to `0.185.1` (changelog)
- Hotfix: upgrade to Kamu CLI v0.181.1 (dealing with unresolved accounts)
- HTTP API: add `/platform/login` handler to enable GitHub authorization inside Jupyter Notebook
- Fix startup: correct config parameter name (`jwt_token` -> `jwt_secret`)
- Upgraded `kamu` from `0.177.0` to `0.180.0` (changelog)
- Read settings from config file, absorb:
  - `--repo-url` CLI argument
  - environment variables used for configuration
- Introduced new `kamu-oracle-provider` component which can fulfil data requests from any EVM compatible blockchain, working in conjunction with `OdfOracle` contracts defined in `kamu-contracts` repository
- Upgraded `kamu` from `0.176.3` to `0.177.0` (changelog)
- CI improvements:
  - use `cargo-udeps` to prevent unused dependencies
  - use `cargo-binstall` to speed up CI jobs
- Missing compacting service dependency
- Synchronized with latest `kamu-cli` `v0.176.3`
- Fixed startup failure caused by a missing DI dependency
- The `/ingest` REST API endpoint also supports event time hints via odf-event-time header
- Removed paused from `setConfigCompacting` mutation
- Extended GraphQL `FlowDescriptionDatasetHardCompacting` empty result with a resulting message
- GraphQL Dataset Endpoints object: fixed the query endpoint
- OData API now supports querying by collection ID/key (e.g. `account/covid.cases(123)`)
- Fixed all pedantic lint warnings
- Fixed CI build
- Updated to `kamu v0.171.2` to correct the CLI push command in the Data access panel
- Updated to `kamu v0.171.1` to correct the web link in the Data access panel
- Updated to `kamu v0.171.0` to put in place endpoints for the Data Access panel
- Enable local FS object store for push ingest to work
- Made number of runtime threads configurable
- Incorporate FlightSQL performance fixes in `kamu v0.168.0`
- Incorporate FlightSQL location bugfix in `kamu-adapter-flight-sql v0.167.2`
- Incorporate dataset creation handle bugfix in `kamu-core v0.167.1`
- Changed config env var prefix to `KAMU_API_SERVER_CONFIG_` to avoid collisions with Kubernetes automatic variables
- Support for metadata object caching on local file system (e.g. to avoid too many calls to S3 repo)
- Support for caching the list of datasets in a remote repo (e.g. to avoid very expensive S3 bucket prefix listing calls)
- OData adapter will ignore fields with unsupported data types instead of crashing
- Experimental support for OData protocol
- Updated to `kamu v0.165.0` to bring in flow system latest demo version
- Updated to `kamu v0.164.0` to bring in new REST data endpoints
- Introduced a `ghcr.io/kamu-data/kamu-api-server:latest-with-data-mt` image with multi-tenant workspace
- Updated to `kamu v0.162.1` to bring in more verbose logging on JWT token rejection reason
- Startup crash in Flow Service that started to require admin token to operate
- Updated to `kamu v0.162.0`
- Upgraded Rust toolchain and minor dependencies
- Synced with `kamu` `v0.158.0`
- Upgraded to major changes in ODF and `kamu`
- Push ingest API
- Introduced a config file for configuring the list of supported auth providers
- FlightSQL endpoint
- Integrated multi-tenancy support: authentication & authorization for public datasets
- Keeping a CHANGELOG
- Integrated latest core with engine I/O strategies - this allows `api-server` to run ingest/transform tasks for datasets located in S3 (currently by downloading necessary inputs locally)