Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design overview update part 7: values #1378

Merged
merged 23 commits into from
Jul 15, 2022
Merged
Changes from 21 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
247 changes: 209 additions & 38 deletions docs/design/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
- [Floating-point literals](#floating-point-literals)
- [String types](#string-types)
- [String literals](#string-literals)
- [Value categories and value phases](#value-categories-and-value-phases)
- [Composite types](#composite-types)
- [Tuples](#tuples)
- [Struct types](#struct-types)
Expand All @@ -40,6 +41,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
- [Variable `var` declarations](#variable-var-declarations)
- [`auto`](#auto)
- [Functions](#functions)
- [Parameters](#parameters)
- [`auto` return type](#auto-return-type)
- [Blocks and statements](#blocks-and-statements)
- [Assignment statements](#assignment-statements)
Expand All @@ -61,6 +63,9 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
- [Inheritance](#inheritance)
- [Access control](#access-control)
- [Destructors](#destructors)
- [`const`](#const)
- [Unformed state](#unformed-state)
- [Move](#move)
- [Mixins](#mixins)
- [Choice types](#choice-types)
- [Names](#names)
Expand Down Expand Up @@ -384,6 +389,52 @@ are available for representing strings with `\`s and `"`s.
> - Proposal
> [#199: String literals](https://github.com/carbon-language/carbon-lang/pull/199)

## Value categories and value phases

**FIXME:** Should this be moved together with
[Types are values](#types-are-values)?

Every value has a
[value category](<https://en.wikipedia.org/wiki/Value_(computer_science)#lrvalue>),
similar to [C++](https://en.cppreference.com/w/cpp/language/value_category),
that is either _l-value_ or _r-value_. Carbon will automatically convert an
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Ivalue" and "rvalue" are usually spelled without a hyphen (https://trends.google.com/trends/explore?geo=US&q=lvalue,l-value,rvalue,r-value). If this is a conscious deviation from that, I don't mind it, but just wanted to make sure we weren't deviating by accident :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I find the hyphen to help me with readability quite a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the hyphen was what I learned and what was in wikipedia. I saw lvalue in the C++ docs I saw, so I suspect the C++ community might skip the hyphen. I'm happy to switch it if there is agreement, it is not that important to me one way or the other.

l-value to an r-value, but not in the other direction.

L-values have storage and a stable address. They may be modified, assuming their
type is not [`const`](#const).

R-values may not have dedicated storage. This means they can not be modified and
josh11b marked this conversation as resolved.
Show resolved Hide resolved
their address generally cannot be taken. R-values are broken down into three
kinds, called _value phases_:

- A _constant_ has a value known at compile time, and that value is available
chandlerc marked this conversation as resolved.
Show resolved Hide resolved
during type checking, for example to use as the size of an array. These
include literals ([integer](#integer-literals),
[floating-point](#floating-point-literals), [string](#string-literals)),
concrete type values (like `f64` or `Optional(i32*)`), expressions in terms
of constants, and values of
[`template` parameters](#checked-and-template-parameters).
- A _symbolic value_ has a value that will be known at the code generation
stage of compilation when
[monomorphization](https://en.wikipedia.org/wiki/Monomorphization) happens,
but is not known during type checking. This includes
[checked-generic parameters](#checked-and-template-parameters), and type
expressions with checked-generic arguments, like `Optional(T*)`.
- A _runtime value_ has a dynamic value only known at runtime.

Carbon will automatically convert a constant to a symbolic value, or any value
to a runtime value:

```mermaid
graph TD;
A(constant)-->B(symbolic value)-->C(runtime value);
D(l-value)-->C;
```

Symbolic values will generally convert into runtime values if an operation is
performed on them. Operations on just constant values will generally result in
constants.
josh11b marked this conversation as resolved.
Show resolved Hide resolved

## Composite types

### Tuples
Expand Down Expand Up @@ -457,19 +508,15 @@ not support
the only pointer [operations](#expressions) are:

- Dereference: given a pointer `p`, `*p` gives the value `p` points to as an
[l-value](<https://en.wikipedia.org/wiki/Value_(computer_science)#lrvalue>).
`p->m` is syntactic sugar for `(*p).m`.
- Address-of: given an
[l-value](<https://en.wikipedia.org/wiki/Value_(computer_science)#lrvalue>)
`x`, `&x` returns a pointer to `x`.
[l-value](#value-categories-and-value-phases). `p->m` is syntactic sugar for
`(*p).m`.
- Address-of: given an [l-value](#value-categories-and-value-phases) `x`, `&x`
returns a pointer to `x`.

There are no [null pointers](https://en.wikipedia.org/wiki/Null_pointer) in
Carbon. To represent a pointer that may not refer to a valid object, use the
type `Optional(T*)`.

Pointers are the main Carbon mechanism for allowing a function to modify a
variable of the caller.

**TODO:** Perhaps Carbon will have
[stricter pointer provenance](https://www.ralfj.de/blog/2022/04/11/provenance-exposed.html)
or restrictions on casts between pointers and integers.
Expand Down Expand Up @@ -537,6 +584,7 @@ Some common expressions in Carbon include:
- [Indexing](#arrays-and-slices): `a[3]`
- [Function](#functions) call: `f(4)`
- [Pointer](#pointer-types): `*p`, `p->m`, `&x`
- [Move](#move): `~x`

- [Conditionals](expressions/if.md): `if c then t else f`
- Parentheses: `(7 + 8) * (3 - 1)`
Expand Down Expand Up @@ -639,14 +687,25 @@ Binding patterns default to _`let` bindings_. The `var` keyword is used to make
it a _`var` binding_.

- The result of a `let` binding is the name is bound to an
[non-l-value](<https://en.wikipedia.org/wiki/Value_(computer_science)#lrvalue>).
This means the value can not be modified, and its address cannot be taken.
[r-value](#value-categories-and-value-phases). This means the value can not
josh11b marked this conversation as resolved.
Show resolved Hide resolved
be modified, and its address generally cannot be taken.
- A `var` binding has dedicated storage, and so the name is an
[l-value](<https://en.wikipedia.org/wiki/Value_(computer_science)#lrvalue>)
which can be modified and has a stable address.
[l-value](#value-categories-and-value-phases) which can be modified and has
a stable address.

A `let`-binding may trigger a copy of the original value, or a move if the
original value is a temporary, or the binding may be a pointer to the original
value, like a
[`const` reference in C++](<https://en.wikipedia.org/wiki/Reference_(C%2B%2B)>).
Which option must not be observable to the programmer. For example, Carbon will
not allow modifications to the original value when it is through a pointer. This
choice may also be influenced by the type. For example, types that don't support
being copied will be passed by pointer instead.

A [generic binding](#checked-and-template-parameters) uses `:!` instead of a
colon (`:`) and can only match compile-time values.
colon (`:`) and can only match
[constant or symbolic values](#value-categories-and-value-phases), not run-time
values.

The keyword `auto` may be used in place of the type in a binding pattern, as
long as the type can be deduced from the type of a value in the same
Expand Down Expand Up @@ -725,18 +784,17 @@ introduced into the enclosing [scope](#declarations-definitions-and-scopes).
### Variable `var` declarations

A `var` declaration is similar, except with `var` bindings, so `x` here is an
[l-value](<https://en.wikipedia.org/wiki/Value_(computer_science)#lrvalue>) with
storage and an address, and so may be modified:
[l-value](#value-categories-and-value-phases) with storage and an address, and
so may be modified:

```carbon
var x: i64 = 42;
x = 7;
```

Variables with a type that has
[an unformed state](https://github.com/carbon-language/carbon-lang/pull/257) do
not need to be initialized in the variable declaration, but do need to be
assigned before they are used.
Variables with a type that has [an unformed state](#unformed-state) do not need
to be initialized in the variable declaration, but do need to be assigned before
they are used.

> References:
>
Expand Down Expand Up @@ -784,8 +842,8 @@ Breaking this apart:
- `fn` is the keyword used to introduce a function.
- Its name is `Add`. This is the name added to the enclosing
[scope](#declarations-definitions-and-scopes).
- The parameter list in parentheses (`(`...`)`) is a comma-separated list of
[irrefutable patterns](#patterns).
- The [parameter list](#parameters) in parentheses (`(`...`)`) is a
comma-separated list of [irrefutable patterns](#patterns).
- It returns an `i64` result. Functions that return nothing omit the `->` and
return type.

Expand All @@ -801,17 +859,8 @@ fn Add(a: i64, b: i64) -> i64 {
```

The names of the parameters are in scope until the end of the definition or
declaration.

The bindings in the parameter list default to
[`let` bindings](#binding-patterns), and so the parameter names are treated as
[r-values](<https://en.wikipedia.org/wiki/Value_(computer_science)#lrvalue>). If
the `var` keyword is added before the binding, then the arguments will be copied
to new storage, and so can be mutated in the function body. The copy ensures
that any mutations will not be visible to the caller.

The parameter names in a forward declaration may be omitted using `_`, but must
match the definition if they are specified.
declaration. The parameter names in a forward declaration may be omitted using
`_`, but must match the definition if they are specified.

> References:
>
Expand All @@ -825,6 +874,27 @@ match the definition if they are specified.
> - Question-for-leads issue
> [#1132: How do we match forward declarations with their definitions?](https://github.com/carbon-language/carbon-lang/issues/1132)

### Parameters

The bindings in the parameter list default to
[`let` bindings](#binding-patterns), and so the parameter names are treated as
[r-values](#value-categories-and-value-phases). This is appropriate for input
parameters. This binding will be implemented using a pointer, unless it is legal
to copy and copying is cheaper.

If the `var` keyword is added before the binding, then the arguments will be
copied to new storage, and so can be mutated in the function body. The copy
ensures that any mutations will not be visible to the caller.
josh11b marked this conversation as resolved.
Show resolved Hide resolved

Use a [pointer](#pointer-types) parameter type to represent an
[input/output parameter](<https://en.wikipedia.org/wiki/Parameter_(computer_programming)#Output_parameters>),
allowing a function to modify a variable of the caller's. This makes the
possibility of those modifications visible: by taking the address using `&` in
the caller, and dereferencing using `*` in the callee.

Outputs of a function should prefer to be returned. Multiple values may be
returned using a [tuple](#tuples) or [struct](#struct-types) type.

### `auto` return type

If `auto` is used in place of the return type, the return type of the function
Expand Down Expand Up @@ -882,8 +952,8 @@ fn Foo() {
### Assignment statements

Assignment statements mutate the value of the
[l-value](<https://en.wikipedia.org/wiki/Value_(computer_science)#lrvalue>)
described on the left-hand side of the assignment.
[l-value](#value-categories-and-value-phases) described on the left-hand side of
the assignment.

- Assignment: `x = y;`. `x` is assigned the value of `y`.
- Increment and decrement: `++i;`, `--j;`. `i` is set to `i + 1`, `j` is set
Expand Down Expand Up @@ -1306,14 +1376,14 @@ class Point {
var dy: i32 = y2 - me.y;
return Math.Sqrt(dx * dx - dy * dy);
}
// Mutating method
// Mutating method declaration
fn Offset[addr me: Self*](dx: i32, dy: i32);

var x: i32;
var y: i32;
}

// Out-of-line definition of method declared inline.
// Out-of-line definition of method declared inline
fn Point.Offset[addr me: Self*](dx: i32, dy: i32) {
me->x += dx;
me->y += dy;
Expand All @@ -1337,7 +1407,9 @@ two methods `Distance` and `Offset`:
modifying the `Point`. This is signified using `[me: Self]` in the method
declaration.
- `origin.Offset(`...`)` does modify the value of `origin`. This is signified
using `[addr me: Self*]` in the method declaration.
using `[addr me: Self*]` in the method declaration. Since calling this
method requires taking the address of `origin`, it may only be called on
[non-`const`](#const) [l-values](#value-categories-and-value-phases).
- Methods may be declared lexically inline like `Distance`, or lexically out
of line like `Offset`.

Expand Down Expand Up @@ -1517,6 +1589,89 @@ type, use `UnsafeDelete`.
> - Proposal
> [#1154: Destructors](https://github.com/carbon-language/carbon-lang/pull/1154)

#### `const`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I've missed something, this hasn't been through the proposal process, and this is a pretty contested and uncertain area, so it might be better to be clear that we don't have a design for this yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a disclaimer.


**Note:** This is provisional, no design for `const` has been through the
proposal process yet.

For every type `MyClass`, there is the type `const MyClass` such that:

- The data representation is the same, so a `MyClass*` value may be implicitly
converted to a `(const MyClass)*`.
- A `const MyClass` [l-value](#value-categories-and-value-phases) may
automatically convert to a `MyClass` r-value, the same way that a `MyClass`
l-value can.
- If member `x` of `MyClass` has type `T`, then member `x` of `const MyClass`
has type `const T`.
- The API of a `const MyClass` is a subset of `MyClass`, excluding all methods
taking `[addr me: Self*]`.

Note that `const` binds more tightly than postfix-`*` for forming a pointer
type, so `const MyClass*` is equal to `(const MyClass)*`.

This example uses the definition of `Point` from the
["methods" section](#methods):

```carbon
var origin: Point = {.x = 0, .y = 0};

// ✅ Allowed conversion from `Point*` to
// `const Point*`:
let p: const Point* = &origin;

// ✅ Allowed conversion of `const Point` l-value
// to `Point` r-value.
let five: f32 = p->Distance(3, 4);

// ❌ Error: mutating method `Offset` excluded
// from `const Point` API.
p->Offset(3, 4);

// ❌ Error: mutating method `AssignAdd.Op`
// excluded from `const i32` API.
p->x += 2;
```

#### Unformed state

Types indicate that they support unformed states by
[implementing a particular interface](#interfaces-and-implementations),
otherwise variables of that type must be explicitly initialized when they are
declared.

An unformed state for an object is one that satisfies the following properties:

- Assignment from a fully formed value is correct using the normal assignment
implementation for the type.
- Destruction must be correct using the type's normal destruction
implementation.
- Destruction must be optional. The behavior of the program must be equivalent
whether the destructor is run or not for an unformed object, including not
leaking resources.

A type might have more than one in-memory representation for the unformed state,
and those representations may be the same as valid fully formed values for that
type. For example, all values are legal representations of the unformed state
for any type with a trivial destructor like `i32`. Types may define additional
initialization for the [hardened build mode](#build-modes). For example, this
causes integers to be set to `0` when in unformed state in this mode.

Any operation on an unformed object _other_ than destruction or assignment from
a fully formed value is an error, even if its in-memory representation is that
of a valid value for that type.

> References:
>
> - Proposal
> [#257: Initialization of memory and variables](https://github.com/carbon-language/carbon-lang/pull/257)

#### Move

Carbon will allow types to define if and how they are moved. This can happen
when returning a value from a function or by using the _move operator_ `~x`.
This leaves `x` in an [unformed state](#unformed-state) and returns its old
value.

#### Mixins

Mixins allow reuse with different trade-offs compared to
Expand Down Expand Up @@ -2123,6 +2278,9 @@ templates. Constraints can then be added incrementally, with the compiler
verifying that the semantics stay the same. Once all constraints have been
added, removing the word `template` to switch to a checked parameter is safe.

The [value phase](#value-categories-and-value-phases) of a checked parameter is
a symbolic value whereas the value phase of a template parameter is constant.

Although checked generics are generally preferred, templates enable translation
of code between C++ and Carbon, and address some cases where the type checking
rigor of generics are problematic.
Expand Down Expand Up @@ -2555,12 +2713,25 @@ The interfaces that correspond to each operator are given by:
- **TODO:** [Assignment](#assignment-statements): `x = y`, `++x`, `x += y`,
and so on
- **TODO:** Dereference: `*p`
- **TODO:** [Move](#move): `~x`
- **TODO:** Indexing: `a[3]`
- **TODO:** Function call: `f(4)`

The
[logical operators can not be overloaded](expressions/logical_operators.md#overloading).

Operators that result in [l-values](#value-categories-and-value-phases), such as
dereferencing `*p` and indexing `a[3]`, have interfaces that return the address
of the value. Carbon automatically dereferences the pointer to get the l-value.

Operators that can take multiple arguments, such as function calling operator
`f(4)`, have a [variadic](generics/details.md#variadic-arguments) parameter
list.

Whether and how a value supports other operations, such as being copied,
swapped, or set into an [unformed state](#unformed-state), is also determined by
implementing corresponding interfaces for the value's type.

> References:
>
> - [Operator overloading](generics/details.md#operator-overloading)
Expand Down Expand Up @@ -2704,7 +2875,7 @@ A C++ library header file may be [imported](#imports) into Carbon using an

```carbon
// like `#include "circle.h"` in C++
import Cpp library "circle.h"
import Cpp library "circle.h";
```

This adds the names from `circle.h` into the `Cpp` namespace. If `circle.h`
Expand Down