[Proposal] Mojo project manifest and build tool #1785

modocache · 2024-02-20T17:30:05Z

Hi all, please check out this proposal for a Mojo project manifest and build tool.

As mentioned on the proposal itself, we're looking to hear from the Mojo community:

Do you agree with the motivations and guiding principles in the proposal?
Which project manifest formats and build tools do you love, and why?
Should we adopt the build server protocol?
Should we define the project manifest as an executable program (such as project.mojo)?
Do you have any other thoughts you'd like to contribute?

We are build systems and language tooling nerds, and would love to hear from you! Please comment here to join the discussion -- thanks!

ReXase27 · 2024-02-20T19:47:13Z

The proposal is exciting, and I vote for config as code simply because of the potentially better IDE smartness. As for the format of the manifest, anything goes, but YAML and JSON aren't that great in my opinion. TOML seems like a great option as it's super readable. Even Apple's new language Pkl looks decent, but that is probably overreaching.

An important question I have is for the future package manager, as it defines how we interact with the tool.

I'll give you two examples, Cargo and Swift Package Manager. The reason cargo is so good for dependencies is that you can simply do cargo add <pkg> -F <features...> from the terminal, making it a very simple process. SPM is a little more tedious: dependencies are added from within Xcode (I think, not a Mac owner) or by editing Package.swift yourself. The difference is that Rust has crates.io, a centralized package index, which Swift does not have, relying instead on GitHub or similar.

Honestly, it's important to think this through now and decide if a Cargo approach is more suitable, or something different, such as go get.

Zig and Rust have nailed it with build.zig and build.rs respectively (although build.rs is more uncommon as you often don't need it).

So I'd put Go and Cargo as the examples to follow simply because they are what I'm familiar with. I'd avoid anything like CMake and/or Gradle as they're oftentimes viewed as pain points.

I prefer things to be batteries included, and a build/project.mojo file seems like the right choice.

2 replies

modocache Feb 20, 2024
Author

Thanks a ton for the response and all of these great suggestions! I appreciate you taking the time out.

The reason cargo is so good for dependencies is that you can simply do cargo add <pkg> -F <features...> from the terminal making it a very simple process.

On the topic of convenient commands, I'm definitely anticipating we will implement a few of these. I like cargo init for creating new projects, for example, and I anticipate Mojo will have an analogue that makes it a one-liner to create a new project.

I mentioned dependencies as outside the scope of this proposal, but I think it's clearly something we'll do eventually. At that time, a cargo add-like command is, I think, very nice to have. So, I am definitely in agreement with the spirit of this suggestion!

However, you also mention that you'd like a Mojo program to act as the project manifest, like build.zig -- a project.mojo, or similar. We might have some trouble there, because if you can write an arbitrary Mojo program to define your dependencies, it might be tricky for a cargo add-like command to parse the Mojo program and understand where to add the dependency. It's a little easier with Cargo.toml, because the command knows to add to the [dependencies] section, end of story. Anyway, it's not impossible, but it adds some complexity.
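To make the contrast concrete, here is a minimal Python sketch of how an add-like command could edit a static manifest. The manifest layout, the `add_dependency` helper, and the package names are all hypothetical; nothing here is an existing Mojo command or format. It also assumes every line starting with `[` is a table header, so it ignores multi-line arrays.

```python
def add_dependency(manifest_text: str, name: str, version: str) -> str:
    """Insert `name = "version"` at the end of the [dependencies] table.

    Hypothetical helper: it works on plain text, so comments and ordering
    in the rest of the file are left untouched.
    """
    lines = manifest_text.splitlines()
    entry = f'{name} = "{version}"'

    # Locate the [dependencies] header, if any.
    start = next(
        (i for i, line in enumerate(lines) if line.strip() == "[dependencies]"),
        None,
    )
    if start is None:
        # No table yet: append one at the end of the file.
        if lines and lines[-1].strip():
            lines.append("")
        lines += ["[dependencies]", entry]
        return "\n".join(lines) + "\n"

    # The table ends at the next header or at end of file.
    end = len(lines)
    for i in range(start + 1, len(lines)):
        if lines[i].lstrip().startswith("["):
            end = i
            break

    # Skip trailing blank lines so the entry sits with its siblings.
    insert_at = end
    while insert_at > start + 1 and not lines[insert_at - 1].strip():
        insert_at -= 1

    lines.insert(insert_at, entry)
    return "\n".join(lines) + "\n"


MANIFEST = """\
[project]
name = "demo"

[dependencies]
tensorlib = "1.2"

[build]
opt = "3"
"""

updated = add_dependency(MANIFEST, "jsonlib", "0.4")
print(updated)
```

Doing the same edit on an arbitrary program would require understanding its control flow, which is the extra complexity described above.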

Another comment on this proposal asks about Turing-complete project manifest formats (like build.zig) vs. static configuration language manifests (like Cargo.toml), so I'll add more of my opinions as a reply to that comment soon.

So I'd put Go and Cargo as the examples to follow simply because they are what I'm familiar with. I'd avoid anything like CMake and/or Gradle as they're oftentimes viewed as pain points.

We're definitely going to take inspiration from Go and Cargo. On the CMake/Gradle point, I think I'll just point out that in the proposal, one of our guiding principles is to integrate with these as best as possible. So, I'm anticipating that we'll have a single command, like cargo build, that builds a Mojo project. But, if there are interfaces that we can provide that make it easier to define a CMake function or Bazel rule to include a Mojo project as part of a build graph, then we will add those.

An important question I have is for the future package manager, as it defines how we interact with the tool.

Yup, I definitely think this has far-reaching implications. I really wanted to solicit feedback on the concept of a project manifest and build tool first, before getting into package management, since it's a huge topic. But, it sounds like you're a fan of the proposal, so, that's good common ground to have. Again, another commenter brings up package management and build dependencies, so I'll reply in more detail there.

ReXase27 Feb 21, 2024

However, you also mention that you'd like a Mojo program to act as the project manifest, like build.zig -- a project.mojo, or similar. We might have some trouble there, because if you can write an arbitrary Mojo program to define your dependencies, it might be tricky for a cargo add-like command to parse the Mojo program and understand where to add the dependency. It's a little easier with Cargo.toml, because the command knows to add to the [dependencies] section, end of story. Anyway, it's not impossible, but it adds some complexity.

You make a great point. I think I worded my original idea poorly. Having dependencies in an easy-to-parse format like TOML makes sense. IDE features could even be supported with schemas.

Perhaps it would be best to have two files: one for declaring dependencies, versions, and project metadata (author/s, copyright, etc.), and another, potentially build.mojo, for custom build commands. Like build.rs, this could be optional. I acknowledge this is an idealistic vision and may not be fully feasible.

On the CMake/Gradle point, I think I'll just point out that in the proposal, one of our guiding principles is to integrate with these as best as possible.

Understood. My earlier comment was about CMake and Gradle workflows, which can be less popular than alternatives in other languages. Tooling (formatters, linters, LSP support) is often lacking in those ecosystems. Compatibility with these tools makes sense given Mojo's goals.

Ideally, this proposal would result in cohesive and well-designed tooling. That "batteries included" feel is important to me – good workflows are crucial for language adoption.

Yup, I definitely think this has far-reaching implications. I really wanted to solicit feedback on the concept of a project manifest and build tool first, before getting into package management, since it's a huge topic. But, it sounds like you're a fan of the proposal, so, that's good common ground to have. Again, another commenter brings up package management and build dependencies, so I'll reply in more detail there.

I'm curious if Mojo/Modular is interested in a centralized package index. There are advantages, but also significant costs. This decision would heavily impact build tool design (repositories, versioning, discoverability, etc.).

I'm not familiar with the build server protocol in detail, but at first glance, it seems like it could be a valuable addition. Implementing it might enhance the overall experience. However, I'd defer to those with more experience for a better-informed opinion.

gabrieldemarmiesse · 2024-02-20T20:47:02Z

Do you agree with the motivations and guiding principles in the proposal?

Yes, agree, especially the support for interfacing with other build tools. It's critical for the adoption of Mojo.
I also fully agree with the "written in Mojo" part. I don't see any reason not to since the stdlib will be open-source too. We'll be able to ship improvements there.

Which project manifest formats and build tools do you love, and why?

I dislike:

Make and friends
setup.py
Bazel

I like much more:

Cargo.toml
pyproject.toml
docker compose (if this can be considered a build tool, but I think so)
docker buildx bake (awesome, even if it overlaps a bit with docker compose)

Should we adopt the build server protocol?

I don't have enough knowledge to answer.

Should we define the project manifest as an executable program (such as project.mojo)?

I believe this is one of the most important questions here. Before taking a decision, may I suggest that we gather the pros/cons of this so that the community can make a good decision. Very few people have interacted with multiple build systems. Even fewer have interacted enough to answer the question of having an executable program as the main build system file or not. I'll put here some pros and cons I can see:

Pros of an executable program:

Have full flexibility, can't go higher than that.
Enable complex patterns when declaring things that might not be possible in json/yaml/etc...
Allow users to avoid duplication of configuration

Pros of a json/yaml/toml... file:

Can be read by someone who doesn't understand the language (readers don't need to understand the stdlib or Mojo basic syntax)
Doesn't need Mojo to be installed to parse the file (might be useful for third-party tools)
The file can be easily shared by the registry when we have the package manager without security concerns
Easy to version (see docker-compose.yaml, it has a version as a main key)
We can provide tools for easy upgrade when there are breaking changes, which is a big plus for innovating in the future. Maybe it's possible with an executable program but it's more complex.
Less "magic", though this can be avoided by thinking harder about the API (setup.py has a bad API in my opinion).
No bootstrapping (if this is an executable program, which tool will build the manifest? :p )

Something interesting that we saw is the Python community slowly moving away from setup.py and progressively migrating to pyproject.toml. Some projects still use both. It would be very interesting to know why the move was made.

Something interesting too is that Rust uses Cargo.toml as the main configuration manifest, and users are so far very happy with it. It may be because there are escape hatches like build.rs. They really went for "simple things should be simple, hard things should be possible".

@modocache Since you are a "build systems and language tooling nerd" 😆 I would appreciate having more takes on the pros/cons of the manifest as an executable program. It would help the community choose better. I'm myself not sure what is preferable here.

Do you have any other thoughts you'd like to contribute?

If we go for a non-executable program as manifest, let's stay as close as possible to the pyproject.toml, without binding ourselves completely to it, obviously.

4 replies

modocache Feb 21, 2024
Author

Thanks for the feedback! I think you nailed a lot of the benefits of Turing-complete vs. static configuration project manifest formats. I'll add a few of my own as well:

Benefits of a Turing-complete project format (for example Package.swift or build.zig)
- The Mojo language server could provide perfect syntax highlighting, autocompletion, and documentation. The Mojo extension for VS Code already activates for files that end in .mojo. Users would be able to hover over build system functions, structs, and traits to read their documentation, for example. (However, I believe this is possible to implement for pre-existing file formats such as .toml or .json as well, provided we had a specific name such as project.mojo.toml or something, and provided the Mojo language server implemented responses to requests for those kinds of files.)
- A project.mojo file would be "just another Mojo program." The Mojo debugger is launching soon, and it could be used to debug such a program. The ability to step through and debug the build system is powerful.
- Several people have brought up build.rs and other "escape hatches" for introducing Turing-complete logic for more complex builds. At the point that such a system becomes necessary, you essentially get two formats: the configuration file format, and the Turing-complete format. Using a Turing-complete project.mojo format from the start ensures you only end up with a single format -- a single way of doing things.
Benefits of using a configuration language format (for example Cargo.toml or pyproject.toml)
- As @bethebunny points out in his comment, when we eventually consider build dependencies and graphs, the execution of a large graph of project.mojo files could require a great deal of computation. There are also security and sandboxing considerations -- let's say some transitive dependency of a project you're building includes arbitrary Mojo code to compute primes or send emails -- should the package manager prevent that from happening? I'll comment more on that post, but, I think there's a great deal of benefit to be had from greatly restricting what can be done in a Mojo project manifest at first, and then very gradually opening that up. Allowing for arbitrary Mojo programs to be executed from day one provides a very powerful, broad interface -- and IMHO it's harder to pare down interfaces than it is to expand them over time.

akirchhoff-modular Feb 21, 2024
Collaborator

@modocache

However, I believe this is possible to implement for pre-existing file formats such as .toml or .json as well, provided we had a specific name such as project.mojo.toml or something, and provided the Mojo language server implemented responses to requests for those kinds of files.

For some formats in VSCode (JSON and YAML at least I believe -- maybe not for TOML or others, haven't checked) I believe VSCode has built-in support for providing completions and documentation from an associated JSON Schema file. So it might not even be necessary to add anything to the Mojo language server -- I think we could just ship a schema with embedded documentation, and sufficient configuration in the Mojo plugin to allow VSCode to identify files for which this schema should be used. Of course, adding support to the language server might allow richer behavior in other ways -- e.g. being able to autocomplete available versions of a package, or something.
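As a rough illustration of "a schema with embedded documentation", here is a small Python sketch. The manifest keys and the `describe` helper are hypothetical and not part of any proposal; the only real convention used is the JSON Schema `description` keyword, which is where such documentation would live.

```python
# Hypothetical schema for a hypothetical manifest; none of these keys are
# defined by the proposal. "description" is the standard JSON Schema
# keyword that tooling can surface as documentation.
MANIFEST_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Name of the project."},
        "dependencies": {
            "type": "object",
            "description": "Map of package name to version requirement.",
            "additionalProperties": {"type": "string"},
        },
    },
    "required": ["name"],
}


def describe(schema: dict, *path: str) -> str:
    """Return the documentation string for a (possibly nested) property."""
    node = schema
    for key in path:
        node = node["properties"][key]
    return node.get("description", "")


# Roughly what an editor could show when hovering over the key.
print(describe(MANIFEST_SCHEMA, "dependencies"))
```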

gabrieldemarmiesse Feb 21, 2024

From what I see in the answers, I believe the community is leaning more toward Rust's solution: something declarative, plus an escape hatch with a Turing-complete script if needed. I also like this. What I would recommend if we go in this direction is:

Use toml as the main manifest format. It will be used by 95%+ of projects.
Allow a Mojo file as a manifest for complicated builds. This will likely be 5% of projects (and I think those are power users who will accept the loss of some tooling functionality; for example, they'll likely edit the file manually instead of doing a cargo add ... equivalent).
Provide a tool that will convert the toml into Mojo code. This can help the transition toml -> Mojo, and can also help debugging for very complex toml files (convert it to Mojo + debug it). People who need to transition from toml to Mojo will likely have very big manifest toml files.

Long story short, toml users will be first-class citizens, and we allow a Mojo file too, but since users who make use of Mojo files have the power to do everything, and are likely power users, it's not necessary to adapt the "optional" tooling for them.

E.g: Let's say that the community makes a tool that will find unused dependencies/imports in the project, by reading the manifest. It's much easier to do this by parsing a toml. This tool can be released as a small binary, which doesn't have Mojo. Users who have a manifest as a Mojo file won't be able to use it, but since they're power users that should be fine.
The support for "simple" Mojo manifest files can be added later.
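A minimal sketch of the kind of tool described in this example, written in Python to stress that it needs no Mojo toolchain. The manifest layout and package names are hypothetical, and it assumes Mojo import statements can be matched with the same pattern as Python's.

```python
import re

# Matches `import foo` and `from foo import bar` at the start of a line.
# Assumption: Mojo imports look like Python's for this purpose.
IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+([A-Za-z_]\w*)", re.MULTILINE)


def declared_dependencies(manifest_text: str) -> set[str]:
    """Collect the keys of the [dependencies] table of a TOML-like manifest."""
    names, in_table = set(), False
    for raw in manifest_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            in_table = line == "[dependencies]"
        elif in_table and "=" in line and not line.startswith("#"):
            names.add(line.split("=", 1)[0].strip())
    return names


def unused_dependencies(manifest_text: str, sources: list[str]) -> set[str]:
    """Return declared dependencies that no source file imports."""
    imported = {name for src in sources for name in IMPORT_RE.findall(src)}
    return declared_dependencies(manifest_text) - imported


manifest = """
[dependencies]
tensorlib = "1.2"
jsonlib = "0.4"
"""
sources = ["from tensorlib import Tensor\n\nfn main():\n    pass\n"]
print(unused_dependencies(manifest, sources))
```

With a manifest written as a program, the first function would have to execute or interpret that program instead of scanning a table.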

We'll have much shorter iteration cycles if we can release functionalities for toml first, especially concerning the tooling.

gryznar Feb 21, 2024

The 3rd idea is great! Having such a tool could be very helpful :)

bethebunny · 2024-02-20T20:51:32Z

Collaborator

I want to present a specific counterpoint to allowing fully programmatic (and therefore undecidable) dependencies, regardless of how other parts of setup are declared. Dependency resolution is NP-hard, and there's lots of prior art on package managers attempting to tractably deal with this problem:

For any sufficiently large project, in order to update a dependency safely with Turing-complete dependency definitions, you need to do the following:

for each set of supported platforms/feature flags/etc:
  for each dependency:
    for each acceptable version of that dependency:
      download that package version in its entirety
      execute its configuration script to list its dependencies
      add those dependencies as dependencies
  run an NP-hard satisfiability solver to generate version bounds for each dependency, if any legal solution exists

In practice this often takes many hours for large projects, and updating even one dependency will result in conflicts where no solution to the dependency resolution exists, so dependencies will need to be "pinned" or "overridden", or other packages will need to be manually updated as well, until the problem has a solution.

I think there's a lot of good discussion to be had later/elsewhere about how to help create valid solutions to version conflicts. However, fully static dependency configuration allows the package repository to store pre-computed dependency version bounds for each version of each package, which allows maintainers to avoid having to walk transitive dependencies at each step, as well as having to pull and execute package versions at all, and the algorithm simplifies to:

for each set of supported platforms/feature flags/etc:
  for each dependency:
    for each acceptable version of that dependency:
      ask the package repository for the transitive dependency version bounds for that dependency
  run the satisfiability solver
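The static variant can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `INDEX` table stands in for the pre-computed bounds a package repository would store, versions are plain integers, and the solver is a naive backtracking search rather than anything a real package manager uses.

```python
# Hypothetical package index: for each package and version, the inclusive
# (low, high) range of versions it accepts for each dependency.
INDEX = {
    "app": {1: {"a": (1, 3), "b": (2, 2)}},
    "a": {1: {}, 2: {"c": (1, 1)}, 3: {"c": (2, 2)}},
    "b": {2: {"c": (1, 1)}},
    "c": {1: {}, 2: {}},
}


def resolve(pending, chosen):
    """Backtracking search for one version per package satisfying all bounds.

    pending: list of (package, (low, high)) constraints still to satisfy.
    chosen:  dict of package -> version picked so far.
    Returns a complete assignment, or None if the constraints conflict.
    """
    if not pending:
        return chosen
    (pkg, (low, high)), rest = pending[0], pending[1:]
    if pkg in chosen:
        # Already picked: the existing choice must fit this range too.
        return resolve(rest, chosen) if low <= chosen[pkg] <= high else None
    # Try newest first; metadata comes from the index, nothing is executed.
    for version in sorted(INDEX[pkg], reverse=True):
        if low <= version <= high:
            deps = list(INDEX[pkg][version].items())
            result = resolve(rest + deps, {**chosen, pkg: version})
            if result is not None:
                return result
    return None


print(resolve([("app", (1, 1))], {}))
```

In this toy index, version 3 of `a` conflicts with `b` over `c`, so the search backs off to version 2 of `a`. Every lookup is a dictionary read; no package is downloaded or executed to discover its dependencies.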

I suspect the concerns about config-as-code are more about clarity and simplicity of configuration than actually wanting Turing-complete configurations. For instance, if I have a dependency that exists for platforms X and Y but not Z, I want to be able to clearly state that in my config rather than listing 100 slightly different dependencies 3 times, but dependencies shouldn't need to depend on the output of (for example) a network call or random number generator. For that purpose I think a configuration language such as Jsonnet might be a better fit than full scriptability, or a design where the project.mojo file outputs a single structured format like JSON as a static config, and that file can then be a static source of truth for dependency resolution.
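A minimal sketch of the second design, using Python in place of a project.mojo file. The platform names come from the example above; the package names and the layout of the emitted JSON are hypothetical.

```python
import json

PLATFORMS = ["X", "Y", "Z"]  # platform names from the example above


def build_manifest() -> dict:
    """Compute the dependency table once, deterministically."""
    common = {f"lib{i}": "1.0" for i in range(3)}  # hypothetical packages
    manifest = {}
    for platform in PLATFORMS:
        deps = dict(common)
        if platform != "Z":
            # Declared for X and Y only, without repeating the common list.
            deps["gpu_kernels"] = "2.1"
        manifest[platform] = deps
    return manifest


# The emitted JSON, not the program, is what a resolver would read.
static_config = json.dumps(build_manifest(), indent=2, sort_keys=True)
print(static_config)
```

The program removes the duplication, while the resolver only ever sees static data that can be stored and indexed by a package repository.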

6 replies

modocache Feb 21, 2024
Author

Semantic versioning is generally a mess, or rather a wishful thinking of control, where in reality most maintainers can't identify breaking changes.

In general I agree. I think that, where semantic versioning is used, tools that verify that the delta between two interfaces is versioned correctly are a must-have -- for example, something like obi1kenobi/cargo-semver-checks.

That being said, I believe there's utility in having projects specify a range of dependency versions that they function properly with. #1401 is a really interesting concept, but let's say my application depends on library_a at v3.1, and on library_b, which in turn depends on library_a at v3 or greater. I think we'd want the compiler to NOT auto-patch imports in library_b to be import library_a_3, since in this case import library_a (using v3.1) would also work, and auto-patching could lead to a larger binary with two versions of some function being included. (I think auto-patching imports is a good idea, I'm just saying it doesn't render specifying dependency version ranges obsolete -- there are still valid use cases, I think.)
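The library_a example can be checked with a tiny Python sketch. The `parse` and `satisfies` helpers are hypothetical and only handle dotted numeric versions with a lower bound.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn "3.1" into (3, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, minimum: str) -> bool:
    """True if `version` is at least `minimum` ("v3 or greater")."""
    return parse(version) >= parse(minimum)


# Requirements from the example: the application uses library_a at 3.1,
# and library_b accepts library_a at 3 or greater.
app_version = "3.1"
library_b_minimum = "3"

# One copy of library_a can serve both, so no second copy is needed.
print(satisfies(app_version, library_b_minimum))
```

Without the declared range, a tool has no way to know that the single 3.1 copy is acceptable to library_b.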

So, assuming we do allow specifying a range of compatible versions for dependencies someday, then I agree that fully programmatic methods of specifying dependencies make dependency resolution difficult. Put another way, set aside the concept of "semantic versioning" for now, and I think it's still the case that programmatic project manifest makes dependency resolution of versions in general more expensive and difficult.

Taken further, processing projects en masse (such as for a project index, analogous to crates.io) in general becomes more computationally expensive when the project manifest format is a Mojo program -- you have to execute the program just to learn basic information about it, like its version or the names of its maintainers.

bethebunny Feb 21, 2024
Collaborator

I haven't spent a ton of time thinking about this. I agree that if you could solve all of the problems with it, allowing multiple versions of the same library would be an amazing option, but there seem to be deeply non-trivial issues to resolve. For instance:

I have library A depending on C_1 and library B depending on C_2. I get a C_1.Foo from library A. How do I safely convert it to a C_2.Foo to pass to functions in B?
Library A and library B both depend on a library C that has some global state. For instance, say C sets a global event loop that A and B expect to share, or sets a global allocator instance, or has static initialization which initializes a filesystem cache. How do C_1 and C_2 resolve this? What if they have different expectations about that global state (say the filesystem cache changed file format between C_1 and C_2?)

bethebunny Feb 21, 2024
Collaborator

I like the general shape of the thinking of different kinds of dependencies, eg. "dependency A is a stable dependency, update for minor versions but not major versions", "dependency B is an experimental dependency, allow RC versions + aggressively update", "dependency C is a transitive dependency, I don't care about it just any version that works", etc., vs having one fixed resolution strategy.

mzaks Feb 22, 2024

@bethebunny regarding non-trivial issues, they will appear gradually as you develop the code base: you pull in lib A, then you pull in lib B, and you are notified that C is a conflict. The compiler should then figure out whether it can find a non-breaking version of C which fits both A and B (maybe through tree shaking of C) or whether it needs to resolve to a side-by-side installation. With side-by-side installation you will end up with two types of Foo which you would need to convert, through casting (potentially the compiler could help with that by evaluating the layout and telling you if it is safe), or by writing mappings, which is tedious, but it's on you to decide how badly you need B and A as dependencies in your project. The good thing is, if the dependencies are transitive and not leaking in third party libraries, they will not leave the boundary and side-by-side installation should just work.

The global state issue is a tough one, I agree. In this case, we could either hope for static code analysis to find libs which should not be side-by-side installed, or the lib developer explicitly mark the lib as not side-by-side installable in project manifest.

Regarding resolution strategy: when I am developing code gradually, I either say give me the latest, or fix the version of the following lib. If there are conflicts, first put them side-by-side and let me code, but you can also spawn a parallel process to figure out a common version, tree shaking strategy etc. I don't have to sit and "watch the paint dry" ;). When the resolution process is done it will provide a report which I can examine. Something like the Terraform plan command.

mzaks Feb 22, 2024

let's say my application depends on library_a at v3.1, and on library_b, which in turn depends on library_a at v3 or greater. I think we'd want the compiler to NOT auto-patch imports in library_b to be import library_a_3, since in this case import library_a (using v3.1) would also work, and auto-patching could lead to a larger binary with two versions of some function being included.

@modocache I think in the best case the developer defines the dependencies they need. The build tool downloads the dependencies and downloads pinned or latest dependencies of dependencies. Everything is done side by side with import patching first. If everything compiles, the developer can already proceed with coding, and the build tool spawns a dedicated process which plans dependency optimizations, figuring out the best possible DAG of dependencies based on the unfolded dependency tree. Hence the folding of library_a_3 and library_a_3_1 can be determined automatically by the compiler. The dependency resolution plan can be stored in a separate file and applied explicitly by the developer. It should work in a similar way if the developer wants to check whether they can update their dependencies to a newer version. Let the build tool try stuff out and write a plan / report of what does not work and what needs to be fixed. Currently the process is very tedious: as a developer, if I want to update a dependency I set a version, run package update, and let the build tool / compiler run just to see that I have compilation errors, or that my tests break. I am babysitting a process which can be done by a tool periodically, while I am doing something else.

One note regarding the version ranges. I think it is best if the lib developer has an option to define a version as incompatible with previous ones in the project manifest, or even incompatible for side-by-side installation. This strategy allows SemVer, CalVer, or even just a monotonically increasing number. Say I have transitive dependencies on A where the versions range from 7 to 23: the build tool does its magic to check that there are no breaking API changes between 7 and 23. If not, it needs to scan through project manifests 23 .. 8 to check that there are no breaking change flags in any of the manifests. And yeah, in this case having the project manifest defined as code is less ergonomic.
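The manifest scan can be sketched as follows in Python. The `breaking` flag and the per-version manifest table are hypothetical names for the idea described above, with version 15 arbitrarily chosen as the breaking one.

```python
# Hypothetical per-version manifests for library A. `breaking` marks a
# version as incompatible with the versions before it.
MANIFESTS = {v: {"breaking": False} for v in range(7, 24)}
MANIFESTS[15]["breaking"] = True


def compatible(oldest: int, newest: int) -> bool:
    """True if no version in (oldest, newest] is flagged as breaking."""
    # Scans newest .. oldest + 1, for example 23 .. 8 for the range 7 to 23.
    return not any(MANIFESTS[v]["breaking"] for v in range(newest, oldest, -1))


print(compatible(7, 23))   # the flag on version 15 falls inside this range
print(compatible(15, 23))  # nothing after 15 is flagged
```

Reading a flag like this is a cheap lookup when manifests are static data; with manifests defined as code, each of the scanned versions would have to be executed first.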

melodyogonna · 2024-02-20T20:51:54Z

I think TOML works nicely for the manifest. I have tried Zig's build manifest; it had a lot going on and was confusing to get started with. I think build manifests should be declarative, like Rust has with Cargo.toml, or Node's package.json. It should just describe what dependencies are there and some other meta information, and the build system should handle the builds the best way it sees fit. This is also the way Bazel is designed to work. It creates a nice separation of concerns: the build system can change internal algorithms without breaking the manifest; in fact, you can replace the build system entirely with something else without touching the manifest.

1 reply

modocache Feb 26, 2024
Author

I understand your point about project manifests that use simple configuration languages, but, to be fair to Zig, I believe its build.zig file is an imperative program that, when executed, defines a declarative representation of a build graph. That is, it too describes dependencies and metadata, and the internal Zig build system does the actual building.
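A small Python sketch of that two-phase idea: an imperative script that only records steps, and a separate consumer that decides the order. The `add_step` helper and the step names are hypothetical and are not Zig's actual build API.

```python
from graphlib import TopologicalSorter

# Phase 1: an imperative script *describes* the build. Nothing is built here.
graph: dict[str, set[str]] = {}


def add_step(name: str, depends_on: tuple[str, ...] = ()) -> None:
    """Record a build step and the steps it depends on."""
    graph[name] = set(depends_on)


# Loops and other logic are allowed while describing the graph.
for module in ("parser", "codegen"):
    add_step(f"compile_{module}")
add_step("link", depends_on=("compile_parser", "compile_codegen"))
add_step("test", depends_on=("link",))

# Phase 2: the build system receives plain data and picks the order.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

The result of running the script is ordinary data, so the separation between description and execution is kept even though the description is produced by code.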

That being said, your point is well taken that project manifests that use configuration languages may be more approachable.

gryznar · 2024-02-20T23:11:48Z

How about taking both? I assume that .toml file would be essential in most cases, but some may require advanced processing. It may be cool to specify build pipeline in .toml based on calling additional scripts (e.g. placed in .\build directory)

1 reply

modocache Feb 26, 2024
Author

I assume that a configuration language-based project format will only go so far, and that choosing it will eventually require an additional, program-based method to complete more complex builds. One thing I was curious about was whether we could go with a single approach of just the program-based manifest; this discussion has definitely brought up a lot of arguments for and against! There will certainly be more to write on this topic soon.

geo-mak · 2024-02-21T01:08:23Z

Cargo stands out as a prime exemplar of an all-encompassing solution. Its familiarity also makes it an excellent 'starting point'.

While 'build.zig' and 'Package.swift' are functional, integration with CLI poses challenges. Similar to setuptools, I find them uncomfortable to deal with. I frequently use Poetry and appreciate the progress being made in Rye.

In this scenario, advocating for a structure akin to Cargo seems prudent to me. It offers fair tradeoffs that can particularly benefit mojo projects. While there are limitations with static files like TOML, using a CLI is much easier with them, as mentioned in the comments.

Dynamic and programmable solutions are only relevant for heterogeneous and complex build workflows and there are already many solutions in this area.

0 replies

Brian-M-J · 2024-02-21T04:15:23Z

Another feature I'd like to see is being able to easily turn my Mojo project into an extension for Python, and integrate it with Python's package ecosystem. Though this seems really hard with the Python packaging story being a complete mess. To illustrate why, I will name all the Python packaging tools I know of:

pip, venv, virtualenv, pipenv, poetry, hatch, conda, pixi, uv
...plus many others, and I'm sure you can find more by looking up "list of all Python packaging tools".

You shouldn't need an entire website to explain how to manage a project.

Edit: I don't know whether to laugh or cry on reading this:

Building your understanding of Python packaging is a journey. Patience and continuous improvement are key to success.

3 replies

modocache Feb 21, 2024
Author

This is something I'm very interested in as well. I believe Mojo can help improve the Python programming experience as a healthy part of its ecosystem, and I think Python packaging, distribution, and runtime environment management is definitely a big part of how people use Python (indeed, how people develop software written in any language). To truly be helpful, though, I really want to make sure we don't just add "yet another packaging tool" -- as the now-famous xkcd comic goes:

Mojo can import Python modules, and obviously it must determine where to find these modules. We have docs on how Mojo interacts with a person's Python environment here, but it can get quite tricky, in part because of all the tools you describe. As we flesh out the Mojo project manifest, distributing artifacts built from Mojo projects becomes a big part of that, and so the natural question is "how does one distribute a Mojo executable or package that depends on a Python module, such as numpy?" A standardized solution for managing the Python environment for Mojo programs would be desirable -- but should we use an existing system, like Conda, or invent something new?

There are a ton of questions to unpack here, so I'm thinking that we'll tackle these in future proposals, coming soon. For now, I'm reading your comment as generally supportive of what's in the proposal, which is great! There'll definitely be much more to discuss soon.

mzaks Feb 21, 2024

"how does one distribute a Mojo executable or package that depends on a Python module, such as numpy?"

This is also something I was wondering about in my comment. I don't think it is ergonomic to say, here is an executable, but you can't run it until you install python, conda and all the dependencies you need. IMHO a binary should be self-sufficient by default; opting for a slim binary should be an explicit choice.

Brian-M-J Feb 22, 2024

@modocache I'm aware that package management is a topic for a later discussion. I wasn't really talking about that.

I was thinking of something like maturin that would automate the tedium of figuring out how to integrate one's Mojo package with the PyPI and Anaconda ecosystems.

Edit: I mean, technically this might fall under "package management" but I'm looking at this from the perspective of completing the bidirectional interop between Mojo and Python.

mkitti · 2024-02-21T06:00:19Z

mkitti
Feb 21, 2024

I've really enjoyed Julia's split (Julia)Project.toml and (Julia)Manifest.toml:
https://pkgdocs.julialang.org/dev/toml-files/#Manifest.toml

The Project.toml describes the immediate dependencies and compatibilities. The Manifest.toml describes all transitive dependencies allowing for reproducible environments.
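To make the split concrete, here is a minimal Python sketch of how a manifest-style file could be derived from the direct dependencies. It is not Julia's actual resolver; the `INDEX` table and package names are made up for illustration, and a real Manifest.toml also pins exact versions.

```python
# Hypothetical package index: each package maps to its direct dependencies.
INDEX = {
    "app": ["http", "json"],
    "http": ["sockets", "json"],
    "json": [],
    "sockets": [],
}

def transitive_closure(direct, index):
    """Return every package reachable from `direct`, sorted for stable output."""
    seen = set()
    stack = list(direct)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(index[name])
    return sorted(seen)

project = INDEX["app"]                          # what a Project.toml would list
manifest = transitive_closure(project, INDEX)   # what a Manifest.toml would record
```

The project file stays short and hand-edited, while the manifest is generated and is what makes the environment reproducible.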

0 replies

SyseAdmine · 2024-02-21T08:34:30Z

SyseAdmine
Feb 21, 2024

Just a heads up for 2 projects:

https://github.com/cps-org/cps

and

https://github.com/apple/pkl

They want to achieve interoperability between different build systems. They use their own meta language to describe packages.

Probably worth having a look at them.

3 replies

ReXase27 Feb 21, 2024

I really like Pkl. While it's new and lacks extensive real-world testing, my initial experience has been very positive. I recognize there will likely be trade-offs with this solution, but it also offers the potential for much-desired flexibility.

modocache Feb 22, 2024
Author

Thanks for the links! Pkl is on my radar, I think if we go with a static configuration language like TOML or JSON then we could look into it. FWIW, Zig started out with .ini files, then moved to their own format, .zon, described here: ziglang/zig#14290

I don't want to flood the Zig repository's issues with backlinks to our discussion here, but, I consider the Zig package manager pull request ziglang/zig#14265 and issue ziglang/zig#943 great background reading. That project does a great job documenting the rationale behind each of their decisions -- something I hope to emulate with these project manifest & build tool discussions.

I hadn't heard of CPS, I will look into it as well. I'll be frank and say that it worries me that this common specification doesn't seem to be used by very many software projects, as far as I can tell...? But I'm being ignorant here; I'll look into it more.

SyseAdmine Feb 22, 2024

CPS is still WIP, mainly written and maintained by M. Woehlke, a Kitware Engineer.
DCBaker, another volunteer there, is an Intel Engineer and one of the maintainers of the Meson build system, AFAIK.

There are hopes to replace pkg-config and CMake text files with .cps files at some point - following the discussions in their issues.

Edit:

I found 3 interesting CppCon 2023 talks about CPS:

https://www.youtube.com/watch?v=IwuBZpLUq8Q

https://www.youtube.com/watch?v=s0q6s5XzIrA

https://www.youtube.com/watch?v=ZTjG8fy6Bek

mzaks · 2024-02-21T11:46:07Z

mzaks
Feb 21, 2024

This is great news, open source Mojo based build tool is ❤️‍🔥 .

Few opinions:

I like SwiftPM, but IMHO it is not the way to go. The internal DSL approach has multiple downsides, which were already mentioned. One additional point: have a look at the Package.swift of the package manager itself. It is long and uses a bit of scripting capability, but nothing super fancy; everything else is rather verbose, repetitive declaration
Project manifest should be boring, not smart, hence declarative is better than imperative
I am a fan of convention over declaration
If you need smart you can have a script to enhance or generate a YAML/TOML/... based project manifest file
I think it would be great to collect requirements for the project manifest file and for Mojo projects in general. I think they will dictate how complex the manifest needs to be. E.g. a manifest for a simple lib with just one module or a single CLI app is simple, but what about being able to build just some modules? How does the Python interop work? Say I want to build a Mojo app which depends on Python libs. Should I be able to define the libs? Should I be able to define the Python version? Do we want to build a "fat binary" which contains all the dependencies and can run on systems which do not have Python installed?
Build Server protocol is a great idea to allow better integration of build system with different IDEs, but complicates stuff

Few ideas for the build tool:

consider tracing of build times
consider multi version support to avoid dependency hell. I wrote my ideas here
debugging the build would be great, I think it should be possible to map breakpoints in the project manifest to the build script
have a declarative project manifest which allows executing Mojo build scripts, @gryznar had a similar idea, additionally it can be done in a more object-oriented way where the Structs in the /build module can be referenced by name, think SwiftUI or JavaFX style
add formatting, linting, etc... to the build tool, so that project manifest can reference the formatting and linting configs
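As a rough illustration of the build-time tracing idea in the list above, here is a minimal Python sketch. `BuildTrace` and its step names are hypothetical and not part of any existing Mojo tooling; the clock is injectable only so the behavior is easy to check.

```python
import time
from contextlib import contextmanager

class BuildTrace:
    """Records how long each named build step takes."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.spans = []  # list of (step name, elapsed seconds)

    @contextmanager
    def step(self, name):
        start = self.clock()
        try:
            yield
        finally:
            # Record the span even if the step raised.
            self.spans.append((name, self.clock() - start))

    def slowest(self):
        """Name of the step with the largest elapsed time."""
        return max(self.spans, key=lambda span: span[1])[0]

trace = BuildTrace()
with trace.step("parse"):
    sum(range(1000))      # stand-in for real work
with trace.step("codegen"):
    sum(range(100000))    # stand-in for real work
```

A build tool could dump `trace.spans` in a standard trace format so existing viewers can show where the time goes.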

0 replies

marcom · 2024-02-21T11:49:43Z

marcom
Feb 21, 2024

Just a quick note: from a security perspective, anything that is in a declarative format (e.g. mojoproject.toml) is much easier to safeguard than fully executable code (e.g. build.mojo).

0 replies

lsh · 2024-02-21T15:35:06Z

lsh
Feb 21, 2024
Collaborator

It's worth keeping in mind that Mojo's goals of solving the "two language problem" and "meeting users where they are" means that it will have to cover certain use cases that other build systems may not have to consider. Off the top of my head, Mojo plans to be used in:

Jupyter Notebooks.
Single file scripts.
Projects where it is not the primary language (and therefore integrate with Bazel/CMake/Buck2/etc).
Projects where it is the primary language but makes significant use of FFI.
More traditional single language projects.

With that context, we can consider some of the tradeoffs that come with a given packaging decision.

Notebooks & Single File Scripts (In Python)

In a notebook, one might install dependencies using a cell like:

!pip install numpy

since there is no way to handle dependencies from within Python itself.

Meanwhile in single file scripts, users are basically out of luck. There is PEP 723 now, which will at least include some metadata for single file scripting use cases, but part of that decision comes from the fact that Python does not ship its build system. The question of how to use that metadata is left up to individual tools in the ecosystem. Neither example even touches build system isolation, so either a user is setting up a virtual environment or wrapping their script in something like a Docker container.

Since Mojo will ship with a batteries included build system, it should be able to offer something more ergonomic here. That could look something like Julia's Pkg.add. So with a configuration language solution in a notebook or single file script, it could look like (though to stress this syntax/spelling is bikeshed as a way to talk about the user flow):

from buildsystem import Build

Build("""
[dependencies]
max = { git = "...", version = "..." }

[python-dependencies]
requests = { source = "pypi", version = "2.31.0" }
""")

and with an executable based solution it might look like:

from buildsystem import Build, Package, PyPackage
Build(
	deps=[
		Package(name="max", git="...", version="..."),
		PyPackage(name="requests", source="pypi", version="2.31.0")
	]
)

One weakness of the configuration language approach here is that we give up editor support (highlighting and completion) when embedded in a script like this. One could make the argument that a script shouldn't have a long list of dependencies, but that's where the value trade-off comes in.

More traditional projects with FFI

Mojo plans to have first class FFI with C and C++. To have that support, the build system will definitely need a way to feed information about those projects (such as linker paths and arguments). It would also be nice if the build system had the option to specify the source for those dependencies, otherwise we're right back to writing a build script and wrapping it in a Docker container.

Rust tries to handle the FFI case with build.rs, but arguably not in a particularly clean way. The user basically calls println!() to add arguments to rustc with linker invocations. Zig's build.zig is better off, since it wraps this behavior programmatically, but the sharp edges in my experience come from the lack of documentation and editor support, which makes adding packages in the "just Zig code" path more work than one would prefer.

When doing serious FFI work, usually a build script will be more than just listing a series of dependencies. Users may want to search for system libraries, check hardware options, etc. Specifying this control flow is at least difficult in a configuration language.
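To illustrate the kind of control flow meant here, a minimal Python sketch of probing the host before deciding what to link. It uses only the standard library; the library names are examples, and `probe_build_environment` is a hypothetical helper rather than anything a Mojo build tool provides.

```python
import ctypes.util
import os
import platform

def probe_build_environment(libraries):
    """Collect host facts a build script might branch on."""
    return {
        "arch": platform.machine(),
        "cpus": os.cpu_count(),
        # find_library returns a name/path if the library is found, else None.
        "libraries": {name: ctypes.util.find_library(name) for name in libraries},
    }

env = probe_build_environment(["ncurses", "definitely-not-a-real-lib"])

# Only emit linker flags for libraries that were actually found.
link_flags = [f"-l{name}" for name, found in env["libraries"].items() if found]
```

Expressing the "only if found" branch is trivial in code, and it is exactly the part that tends to need special syntax in a static configuration language.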

I would also be curious to see what potential manifests look like for this with a configuration language based approach, since I imagine it would be slightly trickier. It could look something like:

[dependencies]
max = {type = "mojo", git = "...", version = "..." }
requests = { type = "python", source = "pypi", version = "..." }
libncurses = { type = "c", git = "...", commit = "..." }

# or

[dependencies]
max = {type = "mojo", git = "...", version = "..." }

[python-dependencies]
requests = { source = "pypi", version = "..." }

[c-dependencies]
libncurses = { git = "...", commit = "..." }

This is another area where the executable based approach would shine, since the various package types would be able to be members of a Variant.

This use case also complicates adding packages. cargo add heavily prioritizes the crates.io package index which only serves Rust projects. Is Modular going to host a package index? If not, how does one specify to add a package from git? What about a Python or C package?

On the flip side, if Mojo is not the primary language, reading a configuration language to ingest Mojo code will probably be easier. This weakness with the executable approach could be semi-alleviated via some sort of build hook system, but it is there nonetheless.

Task Runners

Another place that is at least interesting to explore is how the build system will possibly make use of task runners.

Take code generation as an example (such as generating Vulkan bindings from an xml file). In Rust, this was sometimes handled in build.rs and using some macros pointing to the build target files, but increasingly users make use of the cargo-xtask pattern to handle script and task use cases due to some of the pain points with build.rs.

With an executable based build configuration, this could look something like:

from buildsystem import Build, Task
from .proj import another_task

fn generate_structs() raises:
   ...

Build(
	# ...
	tasks=[
		Task[generate_structs](name="build:types"),
		Task[another_task](name="run:another_task")
	]
)
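The same idea can be sketched in Python as a small task registry. This is only an illustration of the user flow: the `Build` class, the decorator, and the task name are hypothetical, and the body of `generate_structs` is a stand-in for real code generation.

```python
class Build:
    """Minimal task registry: maps task names to callables."""

    def __init__(self):
        self.tasks = {}

    def task(self, name):
        def register(fn):
            self.tasks[name] = fn
            return fn
        return register

    def run(self, name):
        if name not in self.tasks:
            raise KeyError(f"unknown task: {name}")
        return self.tasks[name]()

build = Build()

@build.task("build:types")
def generate_structs():
    # Stand-in for real code generation, e.g. emitting bindings from an XML spec.
    return ["struct Instance", "struct Device"]

generated = build.run("build:types")
```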

Some closing thoughts

The above are just several ideas about some of the use cases where a build script will be necessary, and are not specific enough to cover the wide variety of user needs. There are tradeoffs that can make some of the above use cases easier, at the disadvantage of others. The Modular team and the community will have to make decisions about which use cases take priority, and what tradeoffs are acceptable.

0 replies

mkitti · 2024-02-21T16:28:14Z

mkitti
Feb 21, 2024

It would be quite neat if you could collaborate with https://prefix.dev/ as a default package manager.

3 replies

Brian-M-J Feb 21, 2024

It would be quite neat if you could collaborate with https://prefix.dev/ as a default package manager.

There's also Astral, the people who made Ruff and now uv.

mkitti Feb 21, 2024

uv, with its continuation of pip-style package management, is exactly what I want to avoid here. I would much prefer a mojo-forge or perhaps an Anaconda-compatible-clone by Modular. You should still be able to use pip, but its usage should be clearly deprecated. It's really not appropriate for what needs to be done here.

Brian-M-J Feb 22, 2024

conda already interoperates with pip, so I don't see the problem here. PyPI is still a huge part of the Python ecosystem and is in no way "deprecated".

There are plenty of Python users for whom the Anaconda ecosystem is completely overkill. There is no reason to abandon the main interaction point with third-party packages for most Python users.

P.S.: Adopting uv in pixi

mkitti · 2024-02-21T16:31:33Z

mkitti
Feb 21, 2024

Since notebooks came up, https://github.com/fonsp/Pluto.jl 's integration of Julia's Project.toml and Manifest.toml into the notebook file itself is a game changer. The notebook file is itself just plain Julia source code, making it easy to commit the code into source control. Graphs and other artifacts are included in exports only.

Also, it would be great to have a reactive notebook like Pluto.jl and IPyFlow.

0 replies

walter-erquinigo · 2024-02-21T17:38:26Z

walter-erquinigo
Feb 21, 2024
Collaborator

Following @bethebunny's opinion, the dependency resolution problem with Turing-complete manifests indeed seems like a burden we might want to avoid for as long as we can. I'm also a bit worried about the future complexities we'll face when we interact with the Python package ecosystem, if we ever work on that. Striving for simplicity might be key to our success.

Aside from that, I see 3 alternatives for the manifest

TOML-like
Turing-complete Mojo
Sandboxed Mojo or a similar alternative

It seems to me like a nice progression would be 1 -> 3 -> 2, or even 1-> 2. The biggest reason is that starting with 1 leaves a foundation of an ecosystem of manifests that are very simple and easy to deal with, keeping simple things simple. Then, 3 or 2 could be implemented as a backdoor for the more complex cases that will eventually happen.

0 replies

ivellapillil · 2024-02-22T09:05:25Z

ivellapillil
Feb 22, 2024

Whether you agree with the motivations and guiding principles in this proposal.

Absolutely!

Which project manifest formats and build tools you love, and why. We’re drawing inspiration from a broad set of language ecosystems, including Rust, Zig, Swift, and especially Python.

Rust Cargo is a good reference.

Whether to adopt the build server protocol. We think doing so may help with our guiding principle to integrate well into the existing ecosystem of tools.

I think this would be great.

Whether to define the project manifest as an executable program. Analogous to how build.zig and Package.swift are programs that define a project, should we define a project.mojo or similar construct? There are many arguments in favor of doing so, but on the other hand, we see tradeoffs as well, and a purely declarative form could be used.

I support some of the comments above favoring a TOML-based (or even StrictYAML 🏃🏻‍♂️) manifest. In most cases in my experience, declarative is enough, and declarative formats also encourage a standardized approach to builds. Mojo build files would result in projects implementing their own approaches, and each project ends up being slightly different.

An added benefit is that we can build simple tools to parse TOML files, while if the build is in the Mojo language, we need more complex tools to process it. That raises the barrier to entry for simple tools within, for example, an enterprise setup.

There were only a very few cases in my decades of experience where I really needed to debug build systems deeply. So I would strongly support declarative builds, with maybe an escape hatch for "special" cases. The existence of such escape hatches indicates to a newcomer to the project that there is something special going on. My experience with Gradle, which uses a Turing-complete language, also has not been that great - I need to worry if there is something "special" happening behind the file. Declarative files come with some restrictions that are really helpful.

Let's optimize for the 80% of cases - which I strongly believe are simple builds, and therefore benefit from TOML/StrictYAML (YAML has the benefit of broader schema support) based build files.

0 replies

River707 · 2024-02-22T10:45:44Z

River707
Feb 22, 2024

There are a lot of really really great threads going on in here. There are a lot of requirements that bubble up from the various avenues in which Mojo gets used, and it's important to keep those in mind in these discussions. For myself, outside of TOML vs Executable vs X vs Y, I'm more interested in what we actually want to achieve:

Easy to read and Easy to write:
This is one of the most important aspects, because it can have a dramatic impact on new users coming into the ecosystem, and on the types of use cases that lean towards mojo (nobody wants to draft a new cmake/make/etc. file to write a simple script or even a cli). This is also form agnostic: I've googled random "insert language" TOML things just as much or more than I have random CMake/Package.swift/etc. things.
Meet customers where they are:
lsh has a really nice comment upthread about all of the different ways that Mojo can be used and integrated into different projects. We talk a lot about Python interop, but C/C++/etc. interop are just as important and also foundational for a lot of other things Modular (and the community I'd expect) are interested in building. In a lot of these cases as well, it's not a mojo entry point, but linking mojo into something else. There are a lot of dragons in this world, but we need some sensible principles for approaching this and it can't be fully ignored from the beginning.
An amazing tooling experience:
This part is really important to me, and there is a ton we can innovate here: IDE documentation, code completion, build snippets, diagnostics and hints, debugging, etc. Using mojo as the definition mechanism (ala program) means that we can do effectively anything we want, having another form can present limitations for any of the former.

If I were to see that from the beginning, it'd involve using a Mojo API to define things (and have a structured form to communicate with tools, actually distribute things, etc.). I could also see a world mentioned above where we have a TOML-ish thing for the basics and a Mojo API for the escape hatch "I need to do something fun" path. If we were to walk that path, it would be good to have a TOML->Mojo generation from the beginning to be used for debugging/etc. (going from declarative to imperative can help spill out a lot of details in way that is easier to grok)

2 replies

lsh Feb 22, 2024
Collaborator

I agree with the values you laid out in this post. To expand on what I wrote above, there are some questions I think should be answered to shape the design space of the build system.

A few of the comments in this thread seem to allude to some idea of a "common" case. This common case seems fairly fuzzy to me, especially since Mojo doesn't really have a common build case yet.

If we look at Cargo, the "common" case is a Rust project that has a list of dependencies that are all Rust projects, and the main interaction is adding dependencies or bumping dependency versions. One literally can't depend on a Python or C project in a Cargo.toml, only a Rust project that wraps those Python or C projects. I want to push back on the idea that this setup is (or should be) the common case for Mojo.

Do we consider relying on Python projects (or at least having the ability to rely on Python projects) to be a common case?
Similarly, is having a target that allows Python to import one's project a common case?
Is it possible to specify the Python virtual env a project uses?
Is it possible to create the Python virtual env the project uses?
Can one make use of a pyproject.toml to manage Python parts of their build?
Are any of the above cases advanced (and therefore require one to move away from whatever "common" build format people keep alluding to)?
How much of the build process should Mojo be able to own?
- For example, Zig build scripts are literally able to replace C build systems and serve as a testing framework for C code even if one doesn't plan to use Zig for any other part of their project.
- Should we be able to specify dependencies that are not written in Mojo?
- Should we theoretically be able to build a Python/C/C++ project without any Mojo code in it using the build system?
Is adding a C or C++ library an "advanced" or "uncommon" use case?
Is adding Mojo to a project that is managed with Bazel or CMake an "uncommon" use case?

Brian-M-J Feb 22, 2024

There are also Python extensions written in Rust, so there needs to be some sort of interop story there too.

ivellapillil · 2024-02-22T11:13:41Z

ivellapillil
Feb 22, 2024

It would be great if there is one-to-one correspondence between terms used in TOML and structs used within the build system. That is:

-- build.toml --

[dependencies]
gtk = {type = "mojo", git = "...", version = "..." }

translates to (assuming dictionary literal, top level expressions, etc):

-- build.mojo --

Dependencies({
        "gtk": Dependency(type="", git="", version="")
    })

There were some ideas in threads above that Mojo files within the /build directory be given special treatment. I think this would be great, as they could then be used to implement custom handlers/plugins for the build pipeline. So basically any file within the /build directory can define Terms used within the TOML.

1 reply

mzaks Feb 22, 2024

Yeah I was thinking in the similar direction. Actually Apache Ant allows this kind of stuff:

package com.mydomain;

import org.apache.tools.ant.BuildException;
import org.apache.tools.ant.Task;

public class MyVeryOwnTask extends Task {
    private String msg;

    // The method executing the task
    public void execute() throws BuildException {
        System.out.println(msg);
    }

    // The setter for the "message" attribute
    public void setMessage(String msg) {
        this.msg = msg;
    }
}

can be executed as following:

<?xml version="1.0"?>

<project name="OwnTaskExample" default="main" basedir=".">
  <taskdef name="mytask" classname="com.mydomain.MyVeryOwnTask"/>

  <target name="main">
    <mytask message="Hello World! MyVeryOwnTask works!"/>
  </target>
</project>

Reduce the verbosity of XML and Java and it is an interesting solution to explore.

Sharktheone · 2024-02-22T13:18:16Z

Sharktheone
Feb 22, 2024

I think general dependency management should not be done in an executable file.
However, Mojo should also have the option to execute build files like build.rs. It would also be nice to be able to specify additional custom build files in the equivalent of Cargo.toml.
Like most here, I think Rust did the right thing with Cargo. Go's package format is also nice, because it is based around git.
I really hate maven or gradle to build a Java Project, when you have a large build file that is split up into different modules, it becomes quite difficult, to see what it does.
I assume Mojo is able to do all that better, even if it is in an executable file.
Mojo also shouldn't go the CMakeLists.txt path; CMake is also a horrible format / build tool.

However, what I really dislike about Cargo, or rather its target/ directory, is this:

I had one target directory that was 12GB large. Mojo should really look into this so it doesn't happen.

I also really like you can extend cargo with own executables, mojo should also adopt this. I'm actually not sure if Mojo should go the JavaScript way with NPX. I think NPX is really good for temporary things like creating a new project, however for custom Cargo functionality it isn't really great.

2 replies

modocache Feb 22, 2024
Author

I really hate maven or gradle to build a Java Project, when you have a large build file that is split up into different modules, it becomes quite difficult, to see what it does.

That's an interesting point, thanks for bringing this up! For what it's worth, I was thinking that it would be nice to at least allow large projects to split up their project manifests. I anticipate most projects will use a single project manifest file (although, as you can see, the debate is still on as to whether that should be a static configuration file or a Mojo program), but for the truly massive projects, the ability to split things up would be helpful. For configuration files, that means a Rust Cargo-style "workspace" concept, and for Mojo programs that means the ability to import other Mojo modules.

Mojo also shouldn't go the CMakeLists.txt path; CMake is also a horrible format / build tool.

Personally, I don't hate CMake, but I think what many people dislike about it is that the CMake language has quirks that make it difficult to work with. For Mojo, I think the only real language options being deliberated in this discussion are static configuration language vs. Mojo, so I think we'll end up with a language that people like :)

However, what I really dislike about Cargo, or rather its target/ directory, is this [...] I had one target directory that was 12GB large. Mojo should really look into this so it doesn't happen.

I think the target directory contains not only the final build products (executables and libraries), but also debug symbols, data used for incremental compilation, and docs (if they're built) -- and these exist for each build configuration (for example, debug and release). So, it can get pretty big.

The Mojo compiler does make extensive use of caching -- you can check out the .mojo_cache directory on your local filesystem, for example. Thankfully, for now, build products for Mojo programs are relatively small, but we will have to continue to invest to make sure that this remains the case. We could also provide handy tools to prune build output -- for example, if we build for multiple configurations, a command to "delete everything for this configuration I'm no longer planning on using" could be nice.

I also really like you can extend cargo with own executables, mojo should also adopt this.

This is definitely on my loooooong-term roadmap, but, I am pretty certain we will not build plugins into the build tool in any initial releases. I feel it's something we can build affordances for early on, but flesh it out much later with its own proposal.

Sharktheone Feb 23, 2024

I was thinking that it would be nice to at least allow large projects to split up their project manifests.

Splitting the manifest up into different files is most likely also a good thing. The project I meant used a custom gradle "module" (or whatever it is called in gradle). The module defined tasks, and when you wanted to see where task X was defined, you couldn't. That's what I meant when I said that mojo can definitely do better than gradle.

I think the target directory contains not only the final build products (executables and libraries), but also debug symbols, data used for incremental compilation, and docs (if they're built) -- and these exist for each build configuration (for example, debug and release). So, it can get pretty big.

The problem was that the directory contained also very old build artifacts that were a few months old and with each dependency update it became more and more. The solution would be to delete old artifacts from versions that are already replaced. So when you update mojo, you probably should delete all artifacts, when you update a dependency from version v1.1.2 to v1.1.3 the artifact from v1.1.2 should be automatically deleted.
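A minimal Python sketch of that pruning rule, assuming a made-up cache layout with one file per artifact named `<package>-<version>.bin` and a lock mapping of package name to the version currently in use. Neither the layout nor `prune_stale` reflects how Cargo or the Mojo cache actually work.

```python
import tempfile
from pathlib import Path

def prune_stale(cache_dir, locked):
    """Delete cached artifacts whose version is no longer in `locked`."""
    removed = []
    # sorted() materializes the listing before any file is deleted.
    for artifact in sorted(Path(cache_dir).glob("*.bin")):
        name, _, version = artifact.stem.rpartition("-")
        if locked.get(name) != version:
            artifact.unlink()
            removed.append(artifact.name)
    return removed

with tempfile.TemporaryDirectory() as cache:
    for filename in ("dep-v1.1.2.bin", "dep-v1.1.3.bin", "other-v0.4.0.bin"):
        (Path(cache) / filename).touch()
    # "dep" was just updated from v1.1.2 to v1.1.3.
    removed = prune_stale(cache, {"dep": "v1.1.3", "other": "v0.4.0"})
    remaining = sorted(path.name for path in Path(cache).iterdir())
```

Running this automatically after every dependency update keeps the cache bounded by the size of the current lock set rather than by project history.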

This is definitely on my loooooong-term roadmap, but, I am pretty certain we will not build plugins into the build tool in any initial releases. I feel it's something we can build affordances for early on, but flesh it out much later with its own proposal.

Yes, it was meant more as a suggestion. This can probably be implemented when Mojo implements macros, since macros like Rust's proc-macros also do similar things.

rarebreed · 2024-02-23T02:32:17Z

rarebreed
Feb 23, 2024

Here's my two cents:

Do you agree with the motivations and guiding principles in the proposal?

There definitely needs to be a way to build mojo projects, and a way to specify how to build, package and distribute the build artifact(s). However, as can be seen from the number of replies there's a lot more that needs to be addressed in additional proposals (which I'll cover in the last question)

Which project manifest formats and build tools do you love, and why?

I've got some experience with cargo, poetry, maven, gradle, leiningen, and yarn, and I think cargo is the bar to aim for (with poetry in second). Why?

toml as the specification format
cargo plugins
build.rs (on one hand, it's nice to have custom builds, OTOH, it's a security risk waiting to happen)
workspaces are trivial compared to other languages (parent POM or gradle multiproject anyone?)
publishing is a snap

There are a few shortcomings though that I will discuss in the last point.

Should we adopt the build server protocol?

Reading the docs, while it seems like a worthwhile endeavor, since the standard is still evolving, you might wind up doing a lot of refactoring.

Should we define the project manifest as an executable program (such as project.mojo)?

While toml is a nice specification format for the manifest, I do also really like the concept of the manifest being written in a full turing complete language. Examples of this are clojure for leiningen or kotlin DSL for gradle. Since mojo will (eventually) be dynamic but with strong typing, I would strongly consider having the manifest itself be plain old mojo code. If mojo ever gets a macro system, it should be possible to build a DSL for the specification too.

A long time ago, I used waf as a build system for C++ projects (I think CMake was still in its infancy), which basically was a python script that was executed to perform the building and linking of the project. waf, descended from scons, was orders of magnitude better than Makefiles, and honestly, I kind of preferred it to CMake.

Do you have any other thoughts you'd like to contribute?

As seen from many comments, there's many aspects of a build system that will need a lot more proposals to flesh out.

Dependency Management
- Not just source dependency management, but precompiled object files/archives/libraries too
- I want mojo to have a better dynamic linking story than rust has
- Dependency security auditing (some companies require things like snyk approval)
- Good tracking of duplicate or version conflicts for the dependency resolver (maven enforcer = bad, poetry = good)
- lock files are a good idea
Build profilers (what is taking the longest to build?)
- while not necessarily the responsibility of the build tool, make sure the mojo compiler is fast (rustc truly is too slow)
Hooks/API for test frameworks
- This is one of the shortcomings of cargo/rustc, as the test harness is baked in and hard to customize
Make it easy to create plugins (ie, good documentation)
While not specifically a manifest or a build tool, the modular CLI should become like rustup:
- If mojo will support cross compiles, you'll need a way to specify target triples
- You'll need a way to support different build toolchains (like rust's stable, beta and nightly)
- You'll need a way to install other plugins or components (like rust's clippy, rust-analyzer or rustfmt)
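On the dependency management point above about tracking duplicate or conflicting versions, here is a minimal Python sketch of the bookkeeping a resolver could report. The input triples and package names are invented, and real resolvers work with version ranges rather than exact versions.

```python
from collections import defaultdict

def find_conflicts(requirements):
    """Group requests by package and keep packages requested at more than one version.

    `requirements` is an iterable of (requested_by, package, version) triples.
    """
    versions = defaultdict(dict)
    for requested_by, package, version in requirements:
        versions[package].setdefault(version, []).append(requested_by)
    return {
        package: by_version
        for package, by_version in versions.items()
        if len(by_version) > 1
    }

conflicts = find_conflicts([
    ("app", "json", "2.0"),
    ("http", "json", "1.4"),
    ("app", "http", "3.1"),
])
```

Reporting who asked for each version is the useful part: it tells the user which dependency to bump instead of only stating that a conflict exists.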

1 reply

modocache Feb 23, 2024
Author

As seen from many comments, there's many aspects of a build system that will need a lot more proposals to flesh out.

This is absolutely the case, and you can expect many more proposals here! I will call out one of your ideas specifically:

While not specifically a manifest or a build tool, the modular CLI should become like rustup

I honestly could not agree more with the idea that Mojo would benefit from a tool like rustup. rustup is, I believe, a gold standard for compiler toolchain managers. I do think it's an open question as to whether the modular executable should be responsible for Mojo compiler toolchain management, but, I will say that you and I definitely agree that a Mojo compiler toolchain management solution ought to exist.

strangemonad · 2024-02-23T19:06:32Z

strangemonad
Feb 23, 2024

I'm very excited to see this coming together and what it represents, not just for mojo, but my hope for what it could also mean for the broader ecosystem that mojo could interact with.

For simplicity, I'm assuming that the key concern with the immediate design is about defining, building, and possibly executing targets (in Blaze/Bazel/Buck terminology, BuildTarget in BSP terminology).

High-level feature requirements

Community and Interop: Must not exacerbate (python) packaging ecosystem fragmentation (ideally it can help improve it)
Modular design
Intuitive: The happy path should require no thought. Extending should always be possible but be exposed harmoniously.

Let me elaborate each a little.

1) Community and Interop

This is probably the most important one in my mind. There are lots of different ways I'd want (or need) mojo to interoperate with the existing library and tooling ecosystem.

Future desired use cases
I know this isn't the immediate goal of this design proposal, but I think sketching out future interop use cases is worthwhile.

I need to be able to use existing python packages in mojo
I need to be able to use mojo packages in python (hopefully this is a much better experience than consuming C native extensions in python today)
I need to be able to consume existing C / C++ "packages" in mojo. Is there a simple way to do this? Bazel/Buck effectively make me vendor the 3rd party source and rewrite the target build definition. Zig is noteworthy here. Possibly OK if this is limited to only MLIR/Clang/Flang interop.
I need to be able to consume arbitrary binary dependencies. Punting on the "how" to identify compatibility, the stretch goal that comes to mind is being able to handle SciPy libs containing Fortran compiled code.
Nice to have would be something that could even support or simplify the hoops NumPy and SciPy have to jump through to build in the first place https://labs.quansight.org/blog/building-scipy-with-flang
Nice to have - I can consume existing rust packages. There's lots of high-quality rust libs out there at this point, it would suck to not be able to leverage them. Maybe it's limited to anything exposed with a C binding, bonus if it's more.

Notable existing tools and workflows
A quick pre-mortem / worst fear. I'm trying to gradually get mojo adopted into an existing code base. A likely place to start is by writing a few high-performance mojo libraries and letting the team use them in regular python projects and workflows. Especially with data scientists, there would likely be a lot of friction learning entirely new tools. Maybe we got them comfortable with conda or poetry, but there are loads of online examples that just say "pip install".

Nice to have: some ability to drop this in to a pyproject.toml build-system "build backend" (PEP 517) to ease incremental adoption of mojo into existing codebases. Maturin gets a notable mention here for how it allows PyO3 + cargo + python builds to interoperate as far as the consumers of the package are concerned. Pydantic has massive adoption and Pola.rs has growing adoption, and folks are none the wiser about the rust underpinnings (except that it runs faster). What's less ideal about maturin is that I don't see myself reaching for it for non-PyO3 projects, e.g. in a larger team and codebase there'd probably be a mix of maturin and poetry/pip-tools/(...pick your other pypa tooling)
Nice to have: less competition for mindshare on the build tooling front. Astral is working on a "cargo for python", maybe there's opportunity to combine forces for some aspects of this?

2) Modular design

I think clearly calling out the separation between the different layers of the build system will be really important to meet the ecosystem where it is.

To me, this is less about whether you should support BSP and more about defining good abstractions that would allow some level of interop. For example, one of the great ideas in Bazel/Buck is the concept of a target, but you can't easily interoperate with these targets without going all in on one of those systems. llbuild is probably a little too low-level. BSP might be a good starting point but probably still needs to evolve, e.g. BuildTargetSources and BuildTargetDeps are probably not quite the right concepts. You don't want to end up with an explosion of concepts here, e.g. build-time (linking .o) vs run-time dependencies (dll), but you often find yourself still needing some form of compile-time dependency module signature or requiring full source code (e.g. .swiftinterface/.swiftmodule, rust HIR modules, and Gradle and Maven have a concept of a "provided jar" expected to be on the jvm classpath at runtime).

More than a build server protocol, I'm probably yearning for something like a build-tool-protocol (BTP). e.g. given a definition of a target, the BTP probably defines something like (requires_build, build, ...)
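
To sketch what such a "build-tool-protocol" could look like, here is a small Python illustration. Everything in it is hypothetical: the `Target` fields, the `BuildTool` interface, and the toy `InMemoryTool` are assumptions made for the example, taking only the `requires_build` and `build` names from the paragraph above. It is not part of BSP or of any Mojo proposal.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Target:
    """A hypothetical build target: a name, its sources, and its dependencies."""
    name: str
    sources: tuple[str, ...]
    deps: tuple[str, ...] = ()


class BuildTool(Protocol):
    """The two operations named above; any tool providing them fits the protocol."""

    def requires_build(self, target: Target) -> bool: ...

    def build(self, target: Target) -> list[str]: ...


class InMemoryTool:
    """Toy implementation: 'builds' a target once and remembers that it did."""

    def __init__(self) -> None:
        self._built: set[str] = set()

    def requires_build(self, target: Target) -> bool:
        return target.name not in self._built

    def build(self, target: Target) -> list[str]:
        self._built.add(target.name)
        return [f"{target.name}.out"]  # pretend artifact path


def build_if_needed(tool: BuildTool, target: Target) -> list[str]:
    """Driver code that only depends on the protocol, not on a concrete tool."""
    if tool.requires_build(target):
        return tool.build(target)
    return []


tool = InMemoryTool()
lib = Target(name="mylib", sources=("mylib.mojo",))
first = build_if_needed(tool, lib)   # builds and returns the artifact list
second = build_if_needed(tool, lib)  # already built, so nothing to do
print(first, second)
```

The point of the sketch is the shape: a driver (an IDE, a workspace tool, another build system) can work against the protocol without going all in on one implementation.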

It's probably a good idea to also follow the dual-use CLI/Library approach. LLVM, Rustc/Cargo and golang are all good examples of the type of community tooling ecosystem that can emerge when you allow programmatic access.

3) Intuitive

There's 2 aspects to this, consuming vs extending.

When consuming, give me a standard set of verbs for project lifecycle. I love that I can go to a rust-cargo project or golang project and know what to do. Once you're over the hurdle of learning Bazel/Buck, the regularity of targets either being libs or executables that can be built or run is refreshing. There's only a few build system verbs I need to learn. The "nouns" are all the targets in the current project that I can easily query.

When extending, don't make me learn every nuance of your internal implementation philosophy. Gradle and Maven are probably at the top of the list of bad offenders here. Basic things shouldn't require any extensions (e.g. if there's a crisp definition of a target, simple extensions like building the artifacts for a static doc site could be as simple as defining a few command line invocations). More complex extensions should be possible.

Implementation Concerns

Config vs Code

A few (maybe contrarian) thoughts here.

I'd say start with config for build target definitions. It's always possible to evolve this to full code later. It's probably easier to go from config to code than the other way around (e.g. older python projects that did more complex things in setup.py are harder to migrate to static pyproject.toml). Separate from what you might think about the complexity of kubernetes (the system), I do like the regularity of resources being defined as structs that can be serialized, e.g. I can jump to the code that defines each type of resource, deserialize them and manipulate them programmatically.
Config for build target definitions, Code for extensions. If you look at the majority of places where you actually need the full power of Starlark in Bazel/Buck, it's usually when you're writing a new type of target builder that doesn't exist. I wish that separation were more strict. A good chunk of work for keeping builds fast is quickly serializing / deserializing the build target starlark defs. Additionally, a good amount of user confusion comes from thinking they can run arbitrary logic during build target definition evaluation.
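
As a rough illustration of "targets as structs that can be serialized", here is a Python sketch. The `LibraryTarget` type and its fields are hypothetical, and JSON is used only because it ships with Python; the same idea applies to TOML or any other data format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class LibraryTarget:
    """Hypothetical target definition: plain data, no logic."""
    name: str
    sources: list[str]
    deps: list[str] = field(default_factory=list)
    kind: str = "library"


def dump_targets(targets):
    """Serialize target definitions to a JSON string."""
    return json.dumps([asdict(t) for t in targets], indent=2, sort_keys=True)


def load_targets(text):
    """Deserialize target definitions back into typed structs."""
    return [LibraryTarget(**raw) for raw in json.loads(text)]


targets = [
    LibraryTarget(name="core", sources=["core.mojo"]),
    LibraryTarget(name="app", sources=["main.mojo"], deps=["core"]),
]

text = dump_targets(targets)
restored = load_targets(text)

# Because targets are plain data, tools can edit them programmatically.
for target in restored:
    if target.name == "app":
        target.deps.append("logging")

print(dump_targets(restored))
```

Since evaluating a definition never runs user logic, a tool can load, rewrite, and save targets without executing anything from the project.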

Workspace friendly

I know orchestrating workspaces of multiple projects and targets isn't the concern of this design proposal. The only thing I'd like to call out is that whatever is designed here shouldn't preclude the ability to support Blaze/Bazel/Buck, Yarn/NPM, Cargo/Rust style workspaces or Maven-Bom / Gradle multi-module distributions.

Note, I'm not suggesting being prescriptive about mono-repo vs multi-repo cloned and stitched into a workspace vs submodules, merely that wherever the design of this layer of the system lands, it shouldn't prevent later developing a workspace style workflow tool.

0 replies

modocache · 2024-02-23T19:39:20Z

modocache
Feb 23, 2024
Author

Hey all -- first of all let me just say how amazing it is that all of you helped contribute to this discussion. I think of my work on Mojo as a great opportunity, but also a deep responsibility -- to get this stuff right, and to build something that the world needs to have built. All of your input is a really important part of that, so I want to thank you all for taking time out of your day to contribute to the discussion.

I'm still reading through many of your comments, but the primary question I wanted to answer here was "do community members agree that a Mojo-specific project manifest be defined, and a Mojo-specific build tool be developed?" Reading your comments, it seems the answer is a resounding "yes!" I don't believe a single person has explicitly proposed otherwise (for example, that Mojo should adopt Bazel as its standard project manifest format -- if you do feel this way, let us know!).

Let's keep this discussion going until Monday, at which point I'll close this out. More specific proposals will follow, with feedback from this discussion incorporated into them 🚀

0 replies

ryanmiville · 2024-02-24T02:08:35Z

ryanmiville
Feb 24, 2024

Do you agree with the motivations and guiding principles in the proposal?

Absolutely.

Which project manifest formats and build tools do you love, and why?

Like many others, Cargo and TOML are the standouts to me. I also like the Go build tool, but not the go.mod manifest.

Should we adopt the build server protocol?

I'm primarily a Scala developer. BSP was created for, and almost exclusively used by, the Scala community. It solved a particular problem that I think Mojo can and will avoid: many competing build tools.

The biggest thing BSP gave Scala was an LSP that worked for all major build tools. I don't see any value in a Mojo BSP if we have a single build tool.

Should we define the project manifest as an executable program (such as project.mojo)?

I do not like this idea. It enables complexity in the build that is probably not necessary. If it really becomes a pain point, build scripts like build.rs could always be added later. Keeping the manifest in something like TOML will push the community to standardize on a general layout and make it easier to hop in to new codebases. It would also be easier for tools to parse.

Do you have any other thoughts you'd like to contribute?

It may be worthwhile to connect with the folks at Astral about their vision for uv and how much it aligns with yours for Mojo. They released with a pip-compatible API for adoption, but plan for it to become a "Cargo for Python". Of course there are several other Python package managers competing, but since their Ruff tool has been widely adopted, this one may end up standing out. If uv does deliver a Cargo-like experience, and becomes popular, we may be able to ensure an easy migration to Mojo.

2 replies

strangemonad Feb 24, 2024

+1 for seeing if there's potential alignment with astral / uv.

modocache Feb 26, 2024
Author

I'll definitely look into Astral and uv -- thanks for the links!

rkallal · 2024-02-25T19:17:07Z

rkallal
Feb 25, 2024

Should we define the project manifest as an executable program (such as project.mojo)?

Yes - A reduced version of Mojo.

Thinking:

TOML, YAML, JSON, etc. end up being yet another language that the Mojo developer needs to learn. A simple domain specific language (DSL) subset of Mojo removes the need for a second dialect.
The Mojo user experience for source code files, build files, and manifest files (and any other source-like files) should be the same. That experience includes linting, auto formatting, auto completion, and AI code generation. TOML, YAML, and JSON are simple data-only dialects but having to fully support them at the same quality as project.mojo is a cost. These other source-like files generally do not receive the same tooling quality found in the main programming language.
This path might also help the Mojo Compiler team define how EDSLs will work in Mojo.

1 reply

rarebreed Feb 26, 2024

I find the idea of a mojo build file written in mojo itself to be interesting enough to discuss and potentially pursue. I also agree that it could further enhance discussion around additional metaprogramming aspects of mojo itself (i.e., do they want lisp-like macros like rust has, plugins to the compiler like zig, or something different?).

It is a double-edged sword to have the manifest in a standard file format like JSON, YAML or TOML though. On the bad side, it means it needs additional work to create a library for the data format. I also believe it should not be a 3rd party-maintained package. This also begs the question of what mojo should have in its standard library. On the good side, it means it would be a well-maintained library since crucial infrastructure relies on it (and possibly even be part of the standard library).

One thing I very much dislike about YAML (or at least how so many people have abused it) is that everyone tries to turn it into some kind of Turing-complete language, replete with "functions" that get run by some other tool that reads in and parses the file. It also often requires 3rd party templating to do its work, or some extra tooling to merge many YAML files into a bigger one. At that point of complexity, just use a real language, because the whole point of a data-only markup is gone and it's no longer easy to learn. Look at the differences between kubernetes, cloudformation, ansible, salt, etc.

The reality of builds has shown that they do need more than just a data format to specify things. For example:

cargo has build.rs
maven has plugins
npm has build scripts

The only thing I'm a bit concerned about if mojo has "manifest-as-code", and it's a somewhat large concern, is that it can be hard for new people to understand what exactly the build is doing. Effectively, they have to learn a program (that builds the project). The big pro of data-only formats is that they are constrained in what they do, and how they do it. That's the good side. The bad side is that one-size-does-not-fit-all. The reality is that build tools need a way to change or add new behavior to the build system.

For example, gradle can be hard to understand depending on what is going on. The DSL is very powerful, which means you have to learn a new tool just to build your project. While this is true for every build system, cargo has shown how remarkably simple a build system can be. I even argue that rust really isn't that hard to learn once you factor in things like build tooling. The several month learning curve to become proficient in rust for newbies, is balanced by the learning curve cost of learning a gradle-based kotlin project (for any real-world non-trivial project). Gradle might be an egregious example since it tries (unsuccessfully in my opinion) to be a universal build tool for many languages, but the complexity of learning a DSL for a build system is something that needs to be considered.

The other disadvantage of "manifests-as-code" is security. It's hard enough auditing all the dependencies for CVEs. Now throw in the fact that any downstream dependency can, in effect, do whatever it wants during the build process. I don't know if there's a good answer around this. Cargo hasn't found one. Maybe isolate the equivalent project.mojo in a container?

modocache · 2024-02-26T18:26:14Z

modocache
Feb 26, 2024
Author

Once again, thank you all for the great discussion! I'll close this discussion out for now -- there will be much more to share here soon, including additional proposals and discussions. Thanks!

0 replies

Replies: 25 comments · 32 replies
