diff --git a/docs/design/README.md b/docs/design/README.md index aed9c2bf4d6e2..eb86b4c1cf58a 100644 --- a/docs/design/README.md +++ b/docs/design/README.md @@ -24,6 +24,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Floating-point literals](#floating-point-literals) - [String types](#string-types) - [String literals](#string-literals) +- [Value categories and value phases](#value-categories-and-value-phases) - [Composite types](#composite-types) - [Tuples](#tuples) - [Struct types](#struct-types) @@ -40,6 +41,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Variable `var` declarations](#variable-var-declarations) - [`auto`](#auto) - [Functions](#functions) + - [Parameters](#parameters) - [`auto` return type](#auto-return-type) - [Blocks and statements](#blocks-and-statements) - [Assignment statements](#assignment-statements) @@ -61,6 +63,9 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception - [Inheritance](#inheritance) - [Access control](#access-control) - [Destructors](#destructors) + - [`const`](#const) + - [Unformed state](#unformed-state) + - [Move](#move) - [Mixins](#mixins) - [Choice types](#choice-types) - [Names](#names) @@ -384,6 +389,53 @@ are available for representing strings with `\`s and `"`s. > - Proposal > [#199: String literals](https://github.com/carbon-language/carbon-lang/pull/199) +## Value categories and value phases + +**FIXME:** Should this be moved together with +[Types are values](#types-are-values)? + +Every value has a +[value category](), +similar to [C++](https://en.cppreference.com/w/cpp/language/value_category), +that is either _l-value_ or _r-value_. Carbon will automatically convert an +l-value to an r-value, but not in the other direction. + +L-values have storage and a stable address. They may be modified, assuming their +type is not [`const`](#const). + +R-values may not have dedicated storage. This means they cannot be modified and +their address generally cannot be taken. 
R-values are broken down into three +kinds, called _value phases_: + +- A _constant_ has a value known at compile time, and that value is available + during type checking, for example to use as the size of an array. These + include literals ([integer](#integer-literals), + [floating-point](#floating-point-literals), [string](#string-literals)), + concrete type values (like `f64` or `Optional(i32*)`), expressions in terms + of constants, and values of + [`template` parameters](#checked-and-template-parameters). +- A _symbolic value_ has a value that will be known at the code generation + stage of compilation when + [monomorphization](https://en.wikipedia.org/wiki/Monomorphization) happens, + but is not known during type checking. This includes + [checked-generic parameters](#checked-and-template-parameters), and type + expressions with checked-generic arguments, like `Optional(T*)`. +- A _runtime value_ has a dynamic value only known at runtime. + +Carbon will automatically convert a constant to a symbolic value, or any value +to a runtime value: + +```mermaid +graph TD; + A(constant)-->B(symbolic value)-->C(runtime value); + D(l-value)-->C; +``` + +Constants convert to symbolic values and to runtime values. Symbolic values will +generally convert into runtime values if an operation that inspects the value is +performed on them. Runtime values will convert into constants or to symbolic +values if constant evaluation of the runtime expression succeeds. + ## Composite types ### Tuples @@ -457,19 +509,15 @@ not support the only pointer [operations](#expressions) are: - Dereference: given a pointer `p`, `*p` gives the value `p` points to as an - [l-value](). - `p->m` is syntactic sugar for `(*p).m`. -- Address-of: given an - [l-value]() - `x`, `&x` returns a pointer to `x`. + [l-value](#value-categories-and-value-phases). `p->m` is syntactic sugar for + `(*p).m`. +- Address-of: given an [l-value](#value-categories-and-value-phases) `x`, `&x` + returns a pointer to `x`. 
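The two pointer operations listed above can be sketched as follows. This is a minimal illustration, assuming the declaration syntax used elsewhere in this document; the function and variable names are hypothetical and not part of the design.

```carbon
// Hypothetical example; names are for illustration only.
fn PointerExample() {
  // `x` is a `var` binding, so it is an l-value with a stable address.
  var x: i32 = 5;

  // Address-of: `&x` requires an l-value and yields an `i32*`.
  let p: i32* = &x;

  // Dereference: `*p` is an l-value, so it can be assigned through.
  *p = 6;
}
```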
There are no [null pointers](https://en.wikipedia.org/wiki/Null_pointer) in Carbon. To represent a pointer that may not refer to a valid object, use the type `Optional(T*)`. -Pointers are the main Carbon mechanism for allowing a function to modify a -variable of the caller. - **TODO:** Perhaps Carbon will have [stricter pointer provenance](https://www.ralfj.de/blog/2022/04/11/provenance-exposed.html) or restrictions on casts between pointers and integers. @@ -537,6 +585,7 @@ Some common expressions in Carbon include: - [Indexing](#arrays-and-slices): `a[3]` - [Function](#functions) call: `f(4)` - [Pointer](#pointer-types): `*p`, `p->m`, `&x` + - [Move](#move): `~x` - [Conditionals](expressions/if.md): `if c then t else f` - Parentheses: `(7 + 8) * (3 - 1)` @@ -639,14 +688,25 @@ Binding patterns default to _`let` bindings_. The `var` keyword is used to make it a _`var` binding_. - The result of a `let` binding is the name is bound to an - [non-l-value](). - This means the value can not be modified, and its address cannot be taken. + [r-value](#value-categories-and-value-phases). This means the value cannot + be modified, and its address generally cannot be taken. - A `var` binding has dedicated storage, and so the name is an - [l-value]() - which can be modified and has a stable address. + [l-value](#value-categories-and-value-phases) which can be modified and has + a stable address. + +A `let`-binding may trigger a copy of the original value, or a move if the +original value is a temporary, or the binding may be a pointer to the original +value, like a +[`const` reference in C++](). +Which option is chosen must not be observable to the programmer. For example, +Carbon will not allow modifications to the original value while it is bound +through a pointer. This choice may also be influenced by the type. For example, +types that don't support being copied will be passed by pointer instead. 
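The difference between `let` and `var` bindings described above can be sketched as follows. This is a minimal illustration, assuming the declaration syntax used elsewhere in this document; the function and variable names are hypothetical, and the commented-out lines mark uses that the text says are rejected.

```carbon
// Hypothetical example; names are for illustration only.
fn BindingExample() {
  // `let` binding: `a` names an r-value.
  let a: i64 = 42;
  // a = 7;             // Rejected: an r-value cannot be modified.
  // let p: i64* = &a;  // Rejected: its address generally cannot be taken.

  // `var` binding: `b` has dedicated storage, so it is an l-value.
  var b: i64 = a;
  b = 7;
  let q: i64* = &b;
}
```

Whether `a` is implemented as a copy or as a pointer to the original value is left to the compiler, which is why neither choice may be observable from code like this.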
A [generic binding](#checked-and-template-parameters) uses `:!` instead of a -colon (`:`) and can only match compile-time values. +colon (`:`) and can only match +[constant or symbolic values](#value-categories-and-value-phases), not run-time +values. The keyword `auto` may be used in place of the type in a binding pattern, as long as the type can be deduced from the type of a value in the same @@ -725,18 +785,17 @@ introduced into the enclosing [scope](#declarations-definitions-and-scopes). ### Variable `var` declarations A `var` declaration is similar, except with `var` bindings, so `x` here is an -[l-value]() with -storage and an address, and so may be modified: +[l-value](#value-categories-and-value-phases) with storage and an address, and +so may be modified: ```carbon var x: i64 = 42; x = 7; ``` -Variables with a type that has -[an unformed state](https://github.com/carbon-language/carbon-lang/pull/257) do -not need to be initialized in the variable declaration, but do need to be -assigned before they are used. +Variables with a type that has [an unformed state](#unformed-state) do not need +to be initialized in the variable declaration, but do need to be assigned before +they are used. > References: > @@ -784,8 +843,8 @@ Breaking this apart: - `fn` is the keyword used to introduce a function. - Its name is `Add`. This is the name added to the enclosing [scope](#declarations-definitions-and-scopes). -- The parameter list in parentheses (`(`...`)`) is a comma-separated list of - [irrefutable patterns](#patterns). +- The [parameter list](#parameters) in parentheses (`(`...`)`) is a + comma-separated list of [irrefutable patterns](#patterns). - It returns an `i64` result. Functions that return nothing omit the `->` and return type. @@ -801,17 +860,8 @@ fn Add(a: i64, b: i64) -> i64 { ``` The names of the parameters are in scope until the end of the definition or -declaration. 
- -The bindings in the parameter list default to -[`let` bindings](#binding-patterns), and so the parameter names are treated as -[r-values](). If -the `var` keyword is added before the binding, then the arguments will be copied -to new storage, and so can be mutated in the function body. The copy ensures -that any mutations will not be visible to the caller. - -The parameter names in a forward declaration may be omitted using `_`, but must -match the definition if they are specified. +declaration. The parameter names in a forward declaration may be omitted using +`_`, but must match the definition if they are specified. > References: > @@ -825,6 +875,28 @@ match the definition if they are specified. > - Question-for-leads issue > [#1132: How do we match forward declarations with their definitions?](https://github.com/carbon-language/carbon-lang/issues/1132) +### Parameters + +The bindings in the parameter list default to +[`let` bindings](#binding-patterns), and so the parameter names are treated as +[r-values](#value-categories-and-value-phases). This is appropriate for input +parameters. This binding will be implemented using a pointer, unless it is legal +to copy and copying is cheaper. + +If the `var` keyword is added before the binding, then the arguments will be +copied (or moved from a temporary) to new storage, and so can be mutated in the +function body. The copy ensures that any mutations will not be visible to the +caller. + +Use a [pointer](#pointer-types) parameter type to represent an +[input/output parameter](), +allowing a function to modify a variable of the caller's. This makes the +possibility of those modifications visible: by taking the address using `&` in +the caller, and dereferencing using `*` in the callee. + +Outputs of a function should prefer to be returned. Multiple values may be +returned using a [tuple](#tuples) or [struct](#struct-types) type. 
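The parameter styles described in this section can be sketched as follows. This is a minimal illustration, assuming the function syntax shown earlier in this document; all function names are hypothetical and not part of the design.

```carbon
// Hypothetical functions; names are for illustration only.

// Input parameters: default `let` bindings, so `a` and `b` are r-values.
fn Sum(a: i64, b: i64) -> i64 {
  return a + b;
}

// `var` parameter: `n` is copied to new storage and may be mutated locally.
fn Doubled(var n: i64) -> i64 {
  n += n;
  return n;
}

// Input/output parameter: a pointer lets the callee modify the caller's
// variable by dereferencing with `*`.
fn Increment(p: i64*) {
  *p += 1;
}

// Multiple outputs are returned as a tuple instead of through pointers.
fn DivMod(a: i64, b: i64) -> (i64, i64) {
  return (a / b, a % b);
}

fn Caller() {
  var count: i64 = 41;
  // Taking the address with `&` signals that `count` may be modified.
  Increment(&count);
}
```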
+ ### `auto` return type If `auto` is used in place of the return type, the return type of the function @@ -882,8 +954,8 @@ fn Foo() { ### Assignment statements Assignment statements mutate the value of the -[l-value]() -described on the left-hand side of the assignment. +[l-value](#value-categories-and-value-phases) described on the left-hand side of +the assignment. - Assignment: `x = y;`. `x` is assigned the value of `y`. - Increment and decrement: `++i;`, `--j;`. `i` is set to `i + 1`, `j` is set @@ -1306,14 +1378,14 @@ class Point { var dy: i32 = y2 - me.y; return Math.Sqrt(dx * dx - dy * dy); } - // Mutating method + // Mutating method declaration fn Offset[addr me: Self*](dx: i32, dy: i32); var x: i32; var y: i32; } -// Out-of-line definition of method declared inline. +// Out-of-line definition of method declared inline fn Point.Offset[addr me: Self*](dx: i32, dy: i32) { me->x += dx; me->y += dy; @@ -1337,7 +1409,9 @@ two methods `Distance` and `Offset`: modifying the `Point`. This is signified using `[me: Self]` in the method declaration. - `origin.Offset(`...`)` does modify the value of `origin`. This is signified - using `[addr me: Self*]` in the method declaration. + using `[addr me: Self*]` in the method declaration. Since calling this + method requires taking the address of `origin`, it may only be called on + [non-`const`](#const) [l-values](#value-categories-and-value-phases). - Methods may be declared lexically inline like `Distance`, or lexically out of line like `Offset`. @@ -1517,6 +1591,89 @@ type, use `UnsafeDelete`. > - Proposal > [#1154: Destructors](https://github.com/carbon-language/carbon-lang/pull/1154) +#### `const` + +**Note:** This is provisional, no design for `const` has been through the +proposal process yet. + +For every type `MyClass`, there is the type `const MyClass` such that: + +- The data representation is the same, so a `MyClass*` value may be implicitly + converted to a `(const MyClass)*`. 
+- A `const MyClass` [l-value](#value-categories-and-value-phases) may + automatically convert to a `MyClass` r-value, the same way that a `MyClass` + l-value can. +- If member `x` of `MyClass` has type `T`, then member `x` of `const MyClass` + has type `const T`. +- The API of a `const MyClass` is a subset of `MyClass`, excluding all methods + taking `[addr me: Self*]`. + +Note that `const` binds more tightly than postfix-`*` for forming a pointer +type, so `const MyClass*` is equal to `(const MyClass)*`. + +This example uses the definition of `Point` from the +["methods" section](#methods): + +```carbon +var origin: Point = {.x = 0, .y = 0}; + +// ✅ Allowed conversion from `Point*` to +// `const Point*`: +let p: const Point* = &origin; + +// ✅ Allowed conversion of `const Point` l-value +// to `Point` r-value. +let five: f32 = p->Distance(3, 4); + +// ❌ Error: mutating method `Offset` excluded +// from `const Point` API. +p->Offset(3, 4); + +// ❌ Error: mutating method `AssignAdd.Op` +// excluded from `const i32` API. +p->x += 2; +``` + +#### Unformed state + +Types indicate that they support unformed states by +[implementing a particular interface](#interfaces-and-implementations), +otherwise variables of that type must be explicitly initialized when they are +declared. + +An unformed state for an object is one that satisfies the following properties: + +- Assignment from a fully formed value is correct using the normal assignment + implementation for the type. +- Destruction must be correct using the type's normal destruction + implementation. +- Destruction must be optional. The behavior of the program must be equivalent + whether the destructor is run or not for an unformed object, including not + leaking resources. + +A type might have more than one in-memory representation for the unformed state, +and those representations may be the same as valid fully formed values for that +type. 
For example, all values are legal representations of the unformed state +for any type with a trivial destructor like `i32`. Types may define additional +initialization for the [hardened build mode](#build-modes). For example, this +causes integers to be set to `0` when in unformed state in this mode. + +Any operation on an unformed object _other_ than destruction or assignment from +a fully formed value is an error, even if its in-memory representation is that +of a valid value for that type. + +> References: +> +> - Proposal +> [#257: Initialization of memory and variables](https://github.com/carbon-language/carbon-lang/pull/257) + +#### Move + +Carbon will allow types to define if and how they are moved. This can happen +when returning a value from a function or by using the _move operator_ `~x`. +This leaves `x` in an [unformed state](#unformed-state) and returns its old +value. + #### Mixins Mixins allow reuse with different trade-offs compared to @@ -2123,6 +2280,9 @@ templates. Constraints can then be added incrementally, with the compiler verifying that the semantics stay the same. Once all constraints have been added, removing the word `template` to switch to a checked parameter is safe. +The [value phase](#value-categories-and-value-phases) of a checked parameter is +a symbolic value whereas the value phase of a template parameter is constant. + Although checked generics are generally preferred, templates enable translation of code between C++ and Carbon, and address some cases where the type checking rigor of generics are problematic. @@ -2555,12 +2715,25 @@ The interfaces that correspond to each operator are given by: - **TODO:** [Assignment](#assignment-statements): `x = y`, `++x`, `x += y`, and so on - **TODO:** Dereference: `*p` +- **TODO:** [Move](#move): `~x` - **TODO:** Indexing: `a[3]` - **TODO:** Function call: `f(4)` The [logical operators can not be overloaded](expressions/logical_operators.md#overloading). 
+Operators that result in [l-values](#value-categories-and-value-phases), such as +dereferencing `*p` and indexing `a[3]`, have interfaces that return the address +of the value. Carbon automatically dereferences the pointer to get the l-value. + +Operators that can take multiple arguments, such as the function call operator +`f(4)`, have a [variadic](generics/details.md#variadic-arguments) parameter +list. + +Whether and how a value supports other operations, such as being copied, +swapped, or set into an [unformed state](#unformed-state), is also determined by +implementing corresponding interfaces for the value's type. + > References: > > - [Operator overloading](generics/details.md#operator-overloading) @@ -2704,7 +2877,7 @@ A C++ library header file may be [imported](#imports) into Carbon using an ```carbon // like `#include "circle.h"` in C++ -import Cpp library "circle.h" +import Cpp library "circle.h"; ``` This adds the names from `circle.h` into the `Cpp` namespace. If `circle.h`