Allow users to collaborate while using experiment tracking #1218
Comments
This functionality would also help some internal teams that would like to use Kedro experiment tracking; they cannot adopt it because they cannot configure the storage location.
Very relevant, @limdauto pointed this out recently: https://fly.io/blog/all-in-on-sqlite-litestream/
Technical Design Discussion on 11/01/2023

Options 1, 2, and 3 were evaluated for their advantages and feasibility, and Option 2 was selected as the most feasible; next steps for Option 2 were agreed.
Another possibility could be to draw inspiration from what Prefect is doing with their Orion UI. In my opinion, it's the best of both worlds: you can set it up with a local SQLite db and use it locally (just like Kedro-Viz currently works), but you also have the option to set it up with a PostgreSQL backend db and run it as a remote web server. In both cases, the server tracks metadata about your runs (e.g. the DAG, how long each step runs, inputs/outputs generated). All of this metadata is already available in Kedro at runtime, so it should be easy enough to expose it through an API call in a hook!
FWIW, what @MatthiasRoels mentioned was the original idea, and was also why we bothered with SQLAlchemy in the first place: it should be backend*-agnostic. Unless anything has changed recently, this is literally the only place we need to change to enable a different db than SQLite: https://github.com/kedro-org/kedro-viz/blob/main/package/kedro_viz/database.py#L15, and make that configurable by the end user. I was planning to fork viz to do a PoC for a problem I'm facing at work that will require a different backend than SQLite. I could report some learnings back if I get to it. *: SQLAlchemy-compatible backend, not S3
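The one-line change described above could be made user-configurable with an environment override. A minimal sketch, assuming a hypothetical `KEDRO_VIZ_DATABASE_URL` variable (not an actual kedro-viz setting):

```python
import os


def database_url(default_sqlite_path: str) -> str:
    """Resolve the connection string to pass to SQLAlchemy's create_engine.

    KEDRO_VIZ_DATABASE_URL is a hypothetical override, not an actual
    kedro-viz setting; when it is unset, behaviour stays local-SQLite.
    """
    return os.environ.get(
        "KEDRO_VIZ_DATABASE_URL",
        f"sqlite:///{default_sqlite_path}",
    )
```

Because SQLAlchemy accepts any supported database URL (e.g. `postgresql://...`), a change along these lines would make the store backend-agnostic, as the comment suggests.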
This PR addresses Issue 3 (#1218) from the user research for experiment tracking. The update enables users to store their session store and tracking data in the cloud. Here's an overview of the collaborative experiment tracking implementation:

In settings.py, the user specifies the S3 bucket location, which triggers the upload of their session_store.db (a SQLite database) to the cloud using fsspec. The upload is executed via the SQLiteStore._upload() function during a Kedro run, when a user creates a new experiment.

When a user launches Kedro-Viz, the session_store.db files from all other users are downloaded to that user's local machine through the SQLiteStore._download() function. The downloaded databases are then merged into the user's local session_store.db through the SQLiteStore._merge() function. As a result, the local session_store.db contains not only the user's own experiments but also those run by other team members.

Every user collaborating on an experiment tracking project therefore maintains a copy of everyone's experiments, both locally and in the cloud. This synchronization is achieved through the SQLiteStore._sync() function, which downloads, merges, and re-uploads the session_store.db.
Closing this ticket, as we have many others in the works for this feature. Follow along here.
Description
This is the third highest priority issue resulting from the experiment tracking adoption user research. Users want to be able to write their experiments to storage that is not on their local computer and share their experiments with other team members.
This is important as it enables a team of users to collaborate and see each other's results as they iterate on a pipeline, compared to the experience of being limited to one user's local machine. Hence, this is a deciding factor for the adoption of Kedro Experiment Tracking.
This pain point also came up in the experiment tracking user testing sessions.
Context
What is the problem?
Users can only perform a model run on a local machine, making it difficult to collaborate on a project with the rest of their team because experiment results are on multiple computers.
Additionally, users have raised other related concerns.
Who are the users of this functionality?
Users are primarily data scientists, and data engineers are secondary users.
Why do our users currently have this problem?
We designed it this way to launch a simpler version of experiment tracking in Kedro, even though its predecessor (PerformanceAI) had this functionality.
Currently, users can only store their runs on a local machine via SQLite:
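For reference, the local-only setup is configured in the project's settings.py roughly as follows (per the kedro-viz documentation at the time of writing; the exact path is project-specific):

```python
from pathlib import Path

from kedro_viz.integrations.kedro.sqlite_store import SQLiteStore

# Store the session database locally; the path points at a
# directory inside the Kedro project (here, its data folder).
SESSION_STORE_CLASS = SQLiteStore
SESSION_STORE_ARGS = {"path": str(Path(__file__).parents[2] / "data")}
```

Because the store location is a local filesystem path, the resulting session_store.db only ever exists on one user's machine, which is exactly the limitation this issue describes.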
What is the impact of solving this problem?
It would be possible to view all user experiments across a team in one place and also solve an outstanding adoption issue for Kedro Experiment Tracking:
How could we implement this functionality?
Option 1
Open the browser and see it: run a shared server that your Kedro-Viz instance, and other users' instances, can connect to, and share runs from that server.
Option 2
Create a mechanism where only the data is shared: Kedro-Viz still runs locally but has access to a shared data service, for example a database stored in S3, which provides the new data.
Option 3
Connect to other solutions (MLflow and Weights & Biases): these tools provide this functionality natively, so we would be relying on their implementations.
What important considerations do we have?
All of the options above would require us to redesign the backend data model. How do we contain everything currently shown on Kedro-Viz in a single database, versus always deriving the data from code in the Kedro framework?
Currently, we read the data directly from the Kedro project. We need to solve the data model first (the SQLite store) before considering any of these options.
What other related issues can I read?
This is related to other open issues: #1217, #1039, and #1116