RFC: Establish SIG OpenXLA #419

Merged: 4 commits, Jul 25, 2022

theadactyl (Contributor) commented Jul 13, 2022

RFC: Establish SIG OpenXLA

| Status | Accepted |
| :-- | :-- |
| RFC # | 419 |
| Author(s) | Thea Lamkin, Mehdi Amini |
| Sponsor | Thea Lamkin |
| Updated | 2022-07-13 |

We propose to create SIG OpenXLA to facilitate development of an open, state-of-the-art ML compiler, built collaboratively with ML hardware & framework developers, using the best of XLA & MLIR.

Objective

OpenXLA will be a community-driven and modular open source compiler. It will enable efficient lowering, optimization, and deployment of ML models from most major frameworks to any hardware backend, notably CPUs, GPUs, and ML ASICs. This work will be done collaboratively with major ML frameworks and hardware vendors.

SIG OpenXLA will focus on creating the OpenXLA project, including the extraction of XLA from TensorFlow into a standalone project. SIG discussions will facilitate coordination around roadmap, design evolution, and new workflows to be created in OpenXLA.

Goals

  • Accelerate industry collaboration around XLA and build a vibrant OSS community.
  • Share and receive feedback on the technical direction for OpenXLA and ensure it meets the needs of major users and contributors.
  • Set up a new XLA repository or organization with independent build/test, with infra to more easily accept PRs, and that is hardware and framework independent.
  • Ensure the extraction of XLA from TensorFlow is minimally disruptive to existing users and contributors.
  • Create a product identity with its own brand, website, docs, and communication channels.
  • Discuss establishment of governance outside TensorFlow.

theadactyl requested a review from ematejska as a code owner, July 13, 2022 20:35
theadactyl changed the title from "Establish SIG OpenXLA" to "RFC: Establish SIG OpenXLA", Jul 13, 2022

> * Accelerate industry collaboration around XLA and build a vibrant OSS community.
> * Share and receive feedback on the technical direction for OpenXLA and ensure it meets the needs of major users and contributors.
> * Set up a new XLA repository or organization with independent build/test, with infra to more easily accept PRs, and that is hardware and framework independent.

Review comment (Contributor):

I suggest evaluating the pros and cons of using an independent GitHub org, also in light of the Keras migration experience.

One of the main issues:

Reply:

What is this all about?

bhack (Contributor) commented Jul 20, 2022

It would be nice if new SIGs like this one could adopt, and eventually improve, the README.md and CONTRIBUTING.md templates.


> OpenXLA will be a community-driven and modular open source compiler. It will enable efficient lowering, optimization, and deployment of ML models from most major frameworks to any hardware backend, notably CPUs, GPUs, and ML ASICs. This work will be done collaboratively with major ML frameworks and hardware vendors.
>
> SIG OpenXLA will focus on creating the OpenXLA project, including the extraction of XLA from TensorFlow into a standalone project. SIG discussions will facilitate coordination around roadmap, design evolution, and new workflows to be created in OpenXLA.

Review comment (Contributor):

Do you already have an idea of which TF folders will be involved in this process?

Reply (Contributor):

Also, I hope that we are not going to just mirror folders in TF, as with the MHLO repo, and that we can clearly isolate the components.

joker-eph (Contributor), Jul 21, 2022:

The short-term plan is to "vendor" OpenXLA inside TensorFlow/third_party, as MLIR was before it moved to LLVM.
Isolating components would require designing stable APIs, release processes, and upgrade processes: this may happen in the future, but it'll take time and it isn't obvious how to do it at the C++ level.

MHLO isn't the same setup: it lives primarily inside TensorFlow, and the standalone repo is a read-only mirror (basically the opposite of vendoring).

The folders involved are:

  • tensorflow/compiler/xla -> will be the new OpenXLA repository root
  • tensorflow/compiler/mlir/hlo -> will move into OpenXLA prior to the split
  • A new "support" repo that will contain the platform abstractions and utilities (tensorflow/core/lib/ and tensorflow/core/platform/ for example, but also some of the profiler runtime).

Reply (Contributor):

> Isolating components would require designing stable APIs, release processes, and upgrade processes: this may happen in the future, but it'll take time and it isn't obvious how to do it at the C++ level.

I think that this is the most important part. With just a monolithic approach in third_party, we are not going to solve the build invalidation (and some TF breakages) that we experience every day.

> The folders involved are:
> ....

I've recently contributed to TF2XLA, with a lot of friction between OSS and the internal infra.
As this folder is not included in your list, are these contributions still going to be done in the TF main repo?

Reply (Contributor):

> I think that this is the most important part. With just a monolithic approach in third_party, we are not going to solve the build invalidation (and some TF breakages) that we experience every day.

Yes, absolutely: this is just an entirely different track of work, with a different motivation from what drives OpenXLA right now.
Also, on the topic of build invalidation, LLVM/MLIR will continue to be used in TensorFlow independently of XLA, and this won't change. The build invalidation problem will remain an issue for this. I'm not sure what we can do about it, though?

> I've recently contributed to TF2XLA, with a lot of friction (tensorflow/build#122) between OSS and the internal infra.

Ouch... this kind of difference between Bazel and the internal Google checks seems really annoying; we should be able to align this, though?

> As this folder is not included in your list, are these contributions still going to be done in the TF main repo?

OpenXLA won't have any dependency on TensorFlow, so the TF/XLA bridge will naturally continue to be part of TensorFlow moving forward.
(Regardless of where the code goes, the kind of problem you refer to will exist, and we should address it!)

bhack (Contributor), Jul 21, 2022:

> Also, on the topic of build invalidation, LLVM/MLIR will continue to be used in TensorFlow independently of XLA, and this won't change. The build invalidation problem will remain an issue for this. I'm not sure what we can do about it, though?

This really depends on your vision for the productization roadmap.
If TF master/nightly relies on OpenXLA "rolling sha" commits, which in turn rely on LLVM "rolling sha" commits, then we are not really relying on releases, API versioning, etc. I think that would be a really weak modularization, and not something that could much improve the current status quo.

Some positive side effects could be retrieved by disentangling the targets' dependency graph:
#238

But I think that the main impact is still related to OpenXLA's own roadmap/vision.

Reply:

> The short-term plan is to "vendor" OpenXLA inside TensorFlow/third_party, as MLIR was before it moved to LLVM. Isolating components would require designing stable APIs, release processes, and upgrade processes: this may happen in the future, but it'll take time and it isn't obvious how to do it at the C++ level.
>
> MHLO isn't the same setup: it lives primarily inside TensorFlow, and the standalone repo is a read-only mirror (basically the opposite of vendoring).
>
> The folders involved are:
>
>   • tensorflow/compiler/xla -> will be the new OpenXLA repository root
>   • tensorflow/compiler/mlir/hlo -> will move into OpenXLA prior to the split
>   • A new "support" repo that will contain the platform abstractions and utilities (tensorflow/core/lib/ and tensorflow/core/platform/ for example, but also some of the profiler runtime).

Hi, I have a question about the plan for the existing XLA compiler (currently based on HLO IR) and MHLO (based on MLIR).

I see you mentioned that MHLO will also be moved into OpenXLA. What will the relationship between the XLA compiler and MHLO (based on MLIR) be in the future? Will the XLA compiler be re-implemented based on MHLO?

Reply (Contributor):

> Hi, I have a question about the plan for the existing XLA compiler (currently based on HLO IR) and MHLO (based on MLIR).
>
> I see you mentioned that MHLO will also be moved into OpenXLA. What will the relationship between the XLA compiler and MHLO (based on MLIR) be in the future? Will the XLA compiler be re-implemented based on MHLO?

Mostly yes: HLO isn't going away anytime soon, but for the current targets publicly supported by XLA (CPU/GPU) we're pledging to use MLIR (and MHLO) end-to-end in the long term, and for it to be the preferred way to add new high-level optimizations to XLA. We're also planning to continue developing most of the codegen inside MLIR/LLVM itself (Linalg in particular) and to use it inside XLA. This offers opportunities to share a large part of it with other projects, IREE for example.

tdb-alcorn commented:

Excited to see this develop!

ematejska merged commit 1c34d56 into master, Jul 25, 2022
mihaimaruseac deleted the theadactyl-patch-11 branch, July 25, 2022 17:58

sanjoy (Contributor) commented Jul 25, 2022

Super exciting!

Will OpenXLA be under open governance (i.e. similar to the LLVM model)? Or will it be governed under the TensorFlow / Google umbrella?

joker-eph (Contributor) commented:

We touched on this in the RFC, see this section: https://github.com/tensorflow/community/blob/master/rfcs/20220713-sig-open-xla.md#collaboration--governance

We aim to evolve toward a model that is as open as LLVM's in terms of governance. It'll be a gradual process, and we want to consult with the members/contributors to help us define good governance for the project. This will be an important aspect of the SIG.

bhack (Contributor) commented Jul 25, 2022

@sanjoy Other than this, another governance point that was discussed was the related sub-governance of MHLO:
llvm/torch-mlir#999 (comment)

burmako commented Jul 25, 2022

As far as MHLO goes, we've been internally working on something called StableHLO - a version of HLO/MHLO that will provide stability guarantees, a specification, a test suite and a reference implementation.

In the near future, StableHLO will be switching to a GitHub-first development process - the code will be developed via pull requests, there will be a GitHub-based test suite, GitHub Issues will be used to track the work, and GitHub Discussions / Discord will be used for discussions. We're in the final stages of approvals for all this, and I expect that we'll be able to tell (and show) more shortly.

The overall goal for StableHLO is to create a community to build an amazing portability layer between ML frameworks and ML compilers. HLO/MHLO provide a good foundation, but there are a lot of good ideas beyond that, and I can't wait to start working this all out together.

tanyokwok commented:

> As far as MHLO goes, we've been internally working on something called StableHLO - a version of HLO/MHLO that will provide stability guarantees, a specification, a test suite and a reference implementation.
>
> In the near future, StableHLO will be switching to a GitHub-first development process - the code will be developed via pull requests, there will be a GitHub-based test suite, GitHub Issues will be used to track the work, and GitHub Discussions / Discord will be used for discussions. We're in the final stages of approvals for all this, and I expect that we'll be able to tell (and show) more shortly.
>
> The overall goal for StableHLO is to create a community to build an amazing portability layer between ML frameworks and ML compilers. HLO/MHLO provide a good foundation, but there are a lot of good ideas beyond that, and I can't wait to start working this all out together.

Then what is the relationship between OpenXLA and StableHLO? @burmako @joker-eph @theadactyl

JamesTheZ commented:

What about JAX? Will the XLA part also be extracted?

burmako commented Jul 26, 2022

@fortianyou "Then what is the relationship between OpenXLA and StableHLO?" There is a plan for StableHLO to be used as input for XLA, and StableHLO has its roots in HLO, which comes from XLA, so I expect that OpenXLA and StableHLO will have a close relationship.

That said, our goal with StableHLO is to build a portability layer between ML frameworks and ML compilers, which means that we will avoid coupling StableHLO with particular compilers, e.g. XLA, so that other compilers could pick it up as well if they are interested.

As we bootstrap StableHLO in the near future, we'll be reviewing which parts of HLO/MHLO can become part of StableHLO right away and which parts are XLA-specific (and should stay internal to XLA or should be generalized before being included in StableHLO).

E.g. should ops like mhlo.fusion be in StableHLO? What about advanced functionality like bounded dynamism - should we make it part of the compiler interface, or should that be an implementation detail of XLA? We have done an internal review, so we have some thoughts on all this already, but the whole point of StableHLO is to build a community, so let's discuss together! (Let's just wait until StableHLO is open-sourced, which I expect to happen by next week at the latest.)

We believe that OpenXLA will be a great forum for these discussions, so we decided that we will be open-sourcing StableHLO under OpenXLA's GitHub organization and will be using OpenXLA's Discord server to chat about StableHLO. Hopefully, this answers your question!

penpornk (Member) commented:

@wchao1115 FYI.

joker-eph (Contributor) commented:

Just to follow up, feel free to subscribe to this repo: https://github.com/openxla/community

We're using GitHub Discussions right now; see the announcement for the first public meeting (next Tuesday) here: openxla/community#5
