Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nim Roadmap 2025 and beyond #556

Open
1 task
Araq opened this issue Dec 4, 2024 · 19 comments
Open
1 task

Nim Roadmap 2025 and beyond #556

Araq opened this issue Dec 4, 2024 · 19 comments

Comments

@Araq
Copy link
Member

Araq commented Dec 4, 2024

Life has a melody, Gaius. A rhythm of notes that become your existence once
they're played in harmony with God's plan. Come. See the face of the shape of things to come.
-- "6" from Battlestar Galactica

NOTE: This plan covers 2025 and beyond as it is very ambitious.

Nil/not nil

The plan for this feature changed again and it is now:

  • ref/ptr T: Remains to be unchecked so that it is backwards compatible with every Nim version ever released. But eventually this will be written as unchecked ref/ptr T.
  • ref/ptr T not nil: Checked at compile-time (we need to fix the bugs in the implementation).
  • nil ref/ptr T: Can be nil and it is checked at compile-time that every deref operation is within a guard like if x != nil.

This feature is not scheduled for any particular release.

Version 2.4

- [ ] Enables strict "definite assignment" analysis which enforces at compile-time that variables have been initialized.

  • Offers an experimental "type-bound operations" mode.

Version 3

Version 3 will be achieved via a combination of compiler phase rewrites, code reuse, refactorings and porting of compiler code. The primary goal is to finally give us a Nim that offers:

  1. Incremental recompilations.
  2. No forward declarations for procs and types required.
  3. Allow for explicit cyclic module dependencies.
  4. Type-checked generics.
  5. Avoid the phase ordering problems that plagued Nim for a long time: Destructors
    and other =hooks can be invoked before they have been synthesized successfully
    which is hard for users to understand.

Implementation

The implementation will use NIF for everything: It is the data format used for communication between the different compiler phases.

Internally most (if not all) phases work on streams of NIF tokens, no tree constructions are required. It is expected that this reduces the amount of memory allocations the compiler has to perform by an order of magnitude.

Arguably a token stream enforces a principled approach to compiler development where by design subtrees cannot be forgotten to be traversed and handled. Roughly a NIF token corresponds to a PNode in the old compiler. A NIF token takes 8 bytes in total, a PNode of today's compiler takes 40 bytes. Therefore it is expected that the new compiler takes 5 times less memory than the current compiler. Since precompiled modules are loaded lazily, this factor should be even higher.

The phases of compilation are:

  1. Pure parsing (nifler): Turn Nim code into a dialect of NIF.
  2. Semantic checking phase 1 (nimony): symbol lookups, type checking, template&macro expansions.
  3. Semantic checking phase 2 (nimony): Effect inference.
  4. Iterator inlining (lowerer).
  5. Lambda lifting (lowerer).
  6. Inject derefs (and the corresponding mutation checking) (lowerer).
  7. Inject dups (lowerer).
  8. Lower control flow expressions to control flow statements (elminate the expr/nkStmtListExpr construct) (lowerer).
  9. Inject destructors (lowerer).
  10. Map builtins like new and + to "compiler procs" (lowerer).
  11. Translate exception handling (lowerer).
  12. Generate NIFC code (gear3).

These phases have been collected into different tools with dedicated names.

NIF

NIF is a general purpose text format designed for compiler construction. Think of it as a "JSON for compilers". NIF has a short precise specification and a Nim library implementation.

While NIF is almost a classical Lisp, it innovates in these aspects:

  1. It uses a separate namespace for tags ("builtins") and source code identifiers. It is most extensible and supports a wide range of abstraction levels. Code that is very high level can be represented effectively as well as code that is close to machine code.
  2. Line information is carried around during all phases of compilation for easy debugging and code introspection tools. The line information is based on the difference between a parent and its child node so that the resulting text representation is kept small.
  3. Declaration and use of a symbol are clearly distinguished in the syntax allowing for many different tasks to be implemented with a fraction of the usual complexity: "find definition", "find all uses" and "inline this code snippet" are particularly easy to implement.
  4. There is an additional format called Nif-index that allows for the lazy on-demand loading of symbols. This is most essential for incremental compilations.

Status: Implemented in 2024. No further changes anticipated.

Nifler

The Nifler tool encapsulates the initial Nim-to-NIF translation step and is generally useful for other tools that want to process Nim code without importing the Nim compiler as a library.

Nifler can also evaluate Nim's configuration system, the nim.cfg files and the NimScript files so that tools like nimsuggest get precise --path information and configuration settings without having to import the Nim compiler as a library.

Status: Implemented in 2024. Minor adjustments required.

Nimony

The primary point of Nifler is to shield "Nimony" from Nim's compiler internals. Nimony is a new frontend for Nim, designed from day one for:

  • Low memory consumption.
  • Incremental compilation.
  • Tooling. The compiler continues after an error and supports "find all usages" and "goto definition"
    which work much more reliably since generics and macros are type-checked too.
  • Efficient handling of code bases that make heavy use of generics, type computations and macros.
  • Reducing the bug count by orders of magnitute. A "Minimal Redundancy Internal Representation" is used, ensuring that there is only one access path to a piece of data. The internally used data structures cannot get out of sync.

Status: In heavy development.

Lowerer

The lowerer's job is to "lower" high level Nim code to low level Nim code that does not use features such as closures, iterators and automatic memory management. It is planned to only support Nim's ARC/ORC scheme of doing memory management. In the old compiler ARC/ORC was very complex to support as the problem was not as well understood as it is now: A key insight here is to split up the tasks into multiple well-defined subtasks:

  1. Inject dups/copies: This can produce weird constructs like while (;let tmp = f(); tmp.value).
  2. Lower control flow expressions to statements. This means if and case do not produce values anymore.
  3. Inject destructors: Now that values have been bound to temporaries explicitly and the control flow has been simplified it is rather easy to inject =destroy(x) calls at scope exists.

As previously mentioned, the lowerer also does:

  • Map builtins to Nim's runtime.
  • Iterator inlining.
  • Eliminate closures by performing "lambda lifting".
  • Inject pointer derefs and implement "pass by reference".
  • Translate exception handling constructs to NIFC's supported error handling.

Status: In heavy development.

Expander

The expander ("Gear 3") performs backend tasks that need to operate on multiple NIF files at once:

  • It copies used imported symbols into the current NIF file. As a fix point operation
    until no foreign symbols are left.
  • importc'ed symbols are replaced by their .c variants.
  • importc'ed symbols might lead to (incl "file.h") injections.
  • Nim types must be translated to NIFC types.
  • Types and procs must be moved to toplevel statements.

Status: Implemented in 2024. Major adjustments expected.

NIFC: C/C++ Backends based on NIF

NIFC is a dialect of NIF designed to be very close to C. Its benefits are:

  • NIFC is easier to generate than generating C/C++ code directly because:
    1. It uses NIF's regular syntax.
    2. It allows for an arbitrary order of declarations without the need for forward declarations.
  • NIFC improves upon C's quirky array and pointer type confusion by clearly distinguishing
    between array which is always a value type, ptr which always points to a
    single element and aptr which points to an array of elements.
  • Inheritance is modelled directly in the type system as opposed to C's quirky type aliasing
    rule that is concerned with aliasings between a struct and its first element.
  • NIFC can also produce C++ code without information loss because inheritance and exception handling are directly supported.

Status: Implemented in 2024. Bugfixes required.

@Araq Araq pinned this issue Dec 4, 2024
@Araq Araq mentioned this issue Dec 4, 2024
18 tasks
@d4h0
Copy link

d4h0 commented Dec 4, 2024

ref/ptr T: Remains to be unchecked so that it is backwards compatible with every Nim version ever released.

Sad to read that. Why not have some kind of "legacy" mode for code that can't be updated?

All ref/ptr T would be exposed to non-legacy code as nil ref/ptr T, so that we can at least be sure that the non-legacy code is correct.

Also, wouldn't it be possible to automatically convert most ref/ptr T to nil ref/ptr T, with the compiler complaining about places without guards? If so, wouldn't that make it relatively easy to upgrade "nil-unsafe" code to "nil-safe" code?

To me, code correctness seems more important than being backwards-compatible with old, unmaintained code. I also believe that a strong focus on correctness would be better for the growth of the Nim ecosystem (see Rust).

I'd love something that is focused on correctness as Rust, but is as expressive as Nim. Obviously, there would need to be compromises, but having Nil in regular code just for backwards-compatibility shouldn't be one.

@Araq
Copy link
Member Author

Araq commented Dec 4, 2024

Sad to read that. Why not have some kind of "legacy" mode for code that can't be updated?

Sure but we need to be able to write code that works with all sort of Nim versions and my proposed way accomplishes that. Changing the default via a mode-switch will come too, but it is much harder to pull off in practice, mostly for bootstrapping reasons.

@d4h0
Copy link

d4h0 commented Dec 4, 2024

Changing the default via a mode-switch will come too, but it is much harder to pull off in practice, mostly for bootstrapping reasons.

And is it the goal to have Nil-safety as the default in the long run?

I think, most of the benefits of such features are only gained if they are the default.

For example, pattern matching can be implemented in libraries in Nim, but because pattern matching isn't used by a significant amount of the ecosystem, it's not remotely as useful as in languages that have it natively.

@Araq
Copy link
Member Author

Araq commented Dec 4, 2024

And is it the goal to have Nil-safety as the default in the long run?

Yes, sure.

@Zectbumo
Copy link

Zectbumo commented Dec 4, 2024

For legacy code couldn't we default to nil ref/ptr T and have any unchecked deref be a compile warning?

@Araq
Copy link
Member Author

Araq commented Dec 4, 2024

I think we continue to need unchecked ref/ptr T for good but we can write it with the explicit unchecked in the long run.

@Clonkk
Copy link

Clonkk commented Dec 5, 2024

Offers an experimental "type-bound operations" mode.

Could you expand on that ? I assume this is related to #380 . Do you have an idea of which form it's going to take

@Araq
Copy link
Member Author

Araq commented Dec 6, 2024

It's already available via --experimental:typeBoundOps.

@d4h0
Copy link

d4h0 commented Dec 6, 2024

Yes, sure.

That's fantastic.

I observe Nim already for more than 5 years, and really love many things about it. Unfortunately, there are also a few reasons why I don't use it at the moment.

It seems, most of these issues will be addressed in the near future, which is great.

For me the ideal situation would be Nim + more compile-time guarantees + sum types + pattern matching + better tooling (which all seems to be planned) + an ergonomic (ideally, automatic) way to use Rust libraries.

With the new architecture described above, is there any realistic chance that there will be a new backend that targets the Rust compiler?

If not, is there any chance that Nim will natively support crABI once it is implemented/stabilized?

(crABI is basically a proposal for a new ABI for languages with nice type systems).

Making Rust libraries natively usable would solve one of the biggest issues Nim currently has (there are not many libraries).

To me, it seems, Rust is the only option that offers this opportunity. It is compiled, type-safe, has many libraries, has a build tool that basically all Rust projects use (so should be relatively easy to integrate automatically) - I don't think there is anything else that has all of these properties.

I'm sure, I'm not the only one in the Rust community who would absolutely love to use Nim and leverage the Rust ecosystem.

@arnetheduck
Copy link

arnetheduck commented Dec 6, 2024

I'm sure, I'm not the only one in the Rust community who would absolutely love to use Nim and leverage the Rust ecosystem.

See also https://github.com/arnetheduck/nbindgen/ - this is mostly seamless for our usage of rust using the "old world" C FFI support - crABI would make a few things a little bit more smooth but it is by no means a blocker.

For compiler-level interop, there's https://github.com/arnetheduck/nlvm which lets you do cross-language LTO between nim and rust (and anything else llvm-based) - this is better than crABI in many aspects, ie basically it's using the LLVM IR as ABI interop layer and can deduce many of the things crABI encodes).

@bpr
Copy link

bpr commented Dec 6, 2024

Inject destructors: Now that values have been bound to temporaries explicitly and the control flow has been simplified it is rather easy to inject =destroy(x) calls at scope exists.

I've been watching progress in the Mojo language and I'm curious about support for ASAP destruction in a future Nim. How much harder is it to destroy at last usage? What are thoughts on the reported benefits of ASAP destruction over scope based RAII?

@Spirarel
Copy link

Spirarel commented Dec 6, 2024

Inject destructors: Now that values have been bound to temporaries explicitly and the control flow has been simplified it is rather easy to inject =destroy(x) calls at scope exists.

I've been watching progress in the Mojo language and I'm curious about spport for ASAP destruction in a future Nim. How much harder is it to destroy at last usage? What are thoughts on the reported benefits of ASAP destruction over scope based RAII?

Hard real-time apps aren't my area of expertise, but wouldn't you want destruction to be predictable and planned in those cases since it occupies a core? This seems like it would share some of the issues of mark-and-sweep in such apps. Maybe destruction could be ASAP unless there's an explicit call to delete(?)

@Araq
Copy link
Member Author

Araq commented Dec 6, 2024

How much harder is it to destroy at last usage?

Not much harder but I have never implemented it so I could be wrong. Also, the resulting edge cases make me a bit nervous...

What are thoughts on the reported benefits of ASAP destruction over scope based RAII?

Seems like a natural extension but the benefit is marginal for Nim which already has fine grained scopes. Things are different for Python/Mojo where the benefit is larger as Python's scoping rules are terrible.

@Araq
Copy link
Member Author

Araq commented Dec 7, 2024

With the new architecture described above, is there any realistic chance that there will be a new backend that targets the Rust compiler?

Compiling Nim or NIFC to Rust seems to be very hard and unrealistic. Instead there should be a NIFC to LLVM translation step, much like nlvm does it today.

That said, I'm more interested in interop with Python than with Rust for the time being.

@bpr
Copy link

bpr commented Dec 8, 2024

Seems like a natural extension but the benefit is marginal for Nim which already has fine grained scopes. Things are different for Python/Mojo where the benefit is larger as Python's scoping rules are terrible.

I was more interested in the reported benefits with respect to tail recursion and various move optimizations.

Yeah Python scoping rules have always been bad. I wish Guido had copied Scheme, or Pascal, which Scheme copied.

@Araq
Copy link
Member Author

Araq commented Dec 9, 2024

Well Nim does elide wasMoved+destroy pairs so it's quite comparable and should also enable more tail recursions. But tail recursion should have a dedicated syntax as relying on it as an "optimization" is too subtle, in my personal opinion. In fact, using recursion for everything is a "principle of least power" violation.

Having said that, Mojo's way of doing it might even be simpler to implement than Nim's wasMoved+destroy elision so I'm taking a closer look...

@mratsim
Copy link
Collaborator

mratsim commented Dec 9, 2024

Inject dups/copies: This can produce weird constructs like while (;let tmp = f(); tmp.value)

There has been a significant amount of work done to remove copies, use moves instead and also remove useless initialization (zeroMem), it would be helpful to know what requires a copy or maybe have some metrics to track that.

But tail recursion should have a dedicated syntax as relying on it as an "optimization" is too subtle, in my personal opinion.

I agree. And it would allow using them in Nim in debug-mode without running afoul of the 2000 stacktrace recursion limit.

@Araq
Copy link
Member Author

Araq commented Dec 9, 2024

it would be helpful to know what requires a copy or maybe have some metrics to track that.

Well the compiler follows the spec, https://nim-lang.org/docs/destructors.html#rewrite-rules but there is also ensureMove and a "cursor optimizer" which can elide copies

@bpr
Copy link

bpr commented Dec 16, 2024

In fact, using recursion for everything is a "principle of least power" violation.

I agree and would welcome a keyword for explicitly introducing tail call optimization. I'll say TCO rather than "recursion " because IMO most tail self recursions can be written better as loops, and the optimization becomes more valuable when we have mutual (tail) recursion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants