Hannes Harnisch edited this page Mar 27, 2024 · 19 revisions

💠 Cero language design notes

Built-in types

All built-in types are distinct types, and not aliases of other built-in types. They are not keywords but always accessible as global names, unless noted otherwise.

Integer types

Unsigned integer types

Types     Notes
uint8     unsigned 8-bit integer
uint16    unsigned 16-bit integer
uint32    unsigned 32-bit integer
uint64    unsigned 64-bit integer
uint128   unsigned 128-bit integer
uintptr   unsigned pointer-sized integer
usize     unsigned integer for memory amounts, array indexing and object sizes; analogue of size_t in C and C++

Signed integer types

All of these use two's complement.

Types    Notes
int8     signed 8-bit integer
int16    signed 16-bit integer
int32    signed 32-bit integer
int64    signed 64-bit integer
int128   signed 128-bit integer
intptr   signed pointer-sized integer
isize    signed integer for memory offsets, pointer differences and array indexing with negative values; analogue of ptrdiff_t and ssize_t in C and C++
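
As a rough illustration of what two's complement means for these types, the following sketch maps a signed 8-bit value to its bit pattern. It is written in C++ rather than Cero, and the helper name bitsOf is made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Returns the 8-bit two's complement bit pattern of a signed value.
// Conversion to an unsigned type is defined as reduction modulo 2^8,
// which yields exactly the two's complement encoding.
constexpr std::uint8_t bitsOf(std::int8_t value) {
    return static_cast<std::uint8_t>(value);
}

// Compile-time checks of a few well-known patterns.
static_assert(bitsOf(-1) == 0xFF, "-1 is all ones");
static_assert(bitsOf(-128) == 0x80, "minimum value has only the top bit set");
```

The same relationship would be expected to hold for the wider types, with the modulus growing to 2^16, 2^32 and so on.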

Floating-point types

Types      Notes
float16    IEEE 754 binary16 format (5 exponent bits, 10 fraction bits)
float32    IEEE 754 binary32 format (8 exponent bits, 23 fraction bits)
float64    IEEE 754 binary64 format (11 exponent bits, 52 fraction bits)
float128   IEEE 754 binary128 format (15 exponent bits, 112 fraction bits)
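
To make the exponent and fraction bit counts concrete, here is a small C++ sketch (not Cero) that splits a binary32 value into its fields. The names Float32Parts and decompose are hypothetical, and the code assumes that the C++ float type is binary32, which is common but not guaranteed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Fields of an IEEE 754 binary32 value.
struct Float32Parts {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, biased by 127
    std::uint32_t fraction;  // 23 bits
};

// Splits a float into its fields by copying out its bit pattern.
inline Float32Parts decompose(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return {bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}
```

For example, 1.0 has a biased exponent of 127 and a zero fraction, and 1.5 sets only the topmost fraction bit.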

These types must be imported:

Types       Notes
BFloat16    Brain 16-bit format (hardware support with ARMv8.6-A extensions and Intel AVX-512 BF16 extensions)
XFloat80    x86 80-bit extended format (hardware support on x87)
DFloat128   double-double arithmetic (hardware support on PowerPC)

Array types

Vector types

Pointer types

bool

Type representing values that can only be false or true. Its size and alignment are 1. When used as the type of a bit field, its minimum bit size is 1. Can be cast to any integer type, in which case false becomes 0 and true becomes 1.
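
The cast and bit field rules can be sketched with the closest C++ equivalents. This is C++, not Cero, and the names toInteger, Flags and countSet are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Converting a boolean to an integer yields 0 for false and 1 for true.
constexpr std::uint32_t toInteger(bool value) {
    return static_cast<std::uint32_t>(value);
}

// A bool bit field needs only a single bit.
struct Flags {
    bool visible : 1;
    bool enabled : 1;
};

// Counts how many of the two flags are set by summing their integer values.
constexpr std::uint32_t countSet(Flags flags) {
    return toInteger(flags.visible) + toInteger(flags.enabled);
}
```

With these definitions, countSet(Flags{true, false}) evaluates to 1.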

Creating a bool with a bit pattern other than 0x0 or 0x1, through memcpy or other mechanisms, is undefined behavior. Whether only the use of such a bool should be undefined (and not its creation) still has to be investigated. This UB is justified because existing C and C++ ABIs also leave this behavior undefined.

void

The unit type, which is also the default return type of functions, indicating that the function doesn't return anything. It is generally compatible with void from C and C++ but has none of its restrictions, so using it in a generic context involves no exceptional behavior that requires awkward workarounds. The size of void is zero. Objects of type void are useless, and it is probably reasonable for constructing them outside of a generic context to generate a warning.
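
The kind of workaround meant here can be seen in C++ (C++17 or later), where void is not a regular object type. In the following sketch, the hypothetical helper countedCall has to branch on whether the callable returns void, because a void result cannot be stored in a variable.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Calls `callable`, counts the call, and forwards its result.
// The void case needs its own code path in C++.
template<typename F>
auto countedCall(F&& callable, int& callCount) {
    using Result = decltype(std::forward<F>(callable)());
    if constexpr (std::is_void_v<Result>) {
        std::forward<F>(callable)();
        ++callCount;
    } else {
        Result result = std::forward<F>(callable)();
        ++callCount;
        return result;
    }
}
```

With a void that behaves like any other type, as proposed above, the second branch alone should be enough for both cases.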

Because zero-sized types are allowed, an empty struct has a layout identical to void.

It is probably unnecessary to give any special semantics to ^void. It would be fine to keep the convention of using it to represent type-erased pointers, as in C and C++. The lack of an implicit conversion from any pointer type to ^void will probably not be sorely missed, but that remains to be seen. Loads and stores of ^void are guaranteed to be no-ops (even when volatile).

It could be useful to make pointer arithmetic on ^[]void operate in units of 1. This would prevent a potential footgun in containers using pointer arithmetic for element count computations. This has precedent and seems to work fine in practice, since pointer-to-void arithmetic using a unit of 1 is provided by a GNU extension for C, seemingly without issues.
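
Since the GNU extension is not standard, the following C++ sketch uses const char* to get the same byte-granular arithmetic on type-erased pointers. The helper names byteDistance and elementCount are assumptions made for this example, not part of any Cero design.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Distance between two pointers in bytes, computed through `const char*`,
// the portable C++ way to do pointer arithmetic in units of 1.
inline std::ptrdiff_t byteDistance(const void* first, const void* last) {
    return static_cast<const char*>(last) - static_cast<const char*>(first);
}

// Element count of the range [first, last), given the element size in bytes.
inline std::size_t elementCount(const void* first, const void* last, std::size_t elementSize) {
    return static_cast<std::size_t>(byteDistance(first, last)) / elementSize;
}
```

A type-erased container could use this kind of computation to derive its length from two stored pointers and an element size.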

never

The bottom type that indicates that an operation never completes to produce a value. It is compatible with [[noreturn]] from C and C++ for the purpose of return types in function signatures. Because it is a bottom type, no object of type never can ever be legally formed at runtime. The type system ensures this, and tricking the compiler into breaking this guarantee causes undefined behavior. However, it is still an object type: it is possible to declare variables and fields of type never and to use it in other contexts where object declarations of otherwise normal types are accepted. They are just effectively unreachable. The representation of the never type is that of void: its size is 0 and its alignment is 1.

Function call expressions calling a never-returning function, break, continue, throw and return expressions have this type. Because it is the bottom type, it coerces to any other type. Expressions of this type are useful because they can be used in place of expressions where any other type would be expected, for example:

killProgram(String message) -> never {
    print(message);
    exit(-1); // exit also returns never
}

describeSomeEnumValue(SomeEnum e) -> String {
    return switch e {
        Value1 => { "First value" }
        Value2 => { "Second value" }
        Value3 => { "Third value" }
        else   => { killProgram("invalid enum value") }
    };
}
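
For comparison, a rough C++ counterpart of the example above using [[noreturn]]. In this sketch killProgram throws instead of exiting so that the behavior can be observed in a test, which is a deviation from the Cero version.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

enum class SomeEnum { Value1, Value2, Value3 };

// Never returns normally: it always throws.
[[noreturn]] inline void killProgram(const std::string& message) {
    throw std::runtime_error(message);
}

inline std::string describeSomeEnumValue(SomeEnum e) {
    switch (e) {
        case SomeEnum::Value1: return "First value";
        case SomeEnum::Value2: return "Second value";
        case SomeEnum::Value3: return "Third value";
    }
    // Reached only for values outside the enumerators. Because
    // killProgram is [[noreturn]], no return statement is needed here.
    killProgram("invalid enum value");
}
```

Unlike never, a [[noreturn]] function in C++ still has the return type void, so a call to it cannot stand in for a value inside an expression the way the else branch does in the Cero example.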

Constant types

Built-in types whose values only exist at compile-time and can only be directly used as constants.

User-defined types

struct types

Structs are composed of named fields made from other types. An empty struct has size 0 and alignment 1. A struct can inherit the interface and data of another struct by declaring a field without a name, which is an easy way to reuse code:

struct Image {
    uint32 width,
    uint32 height,
    ByteBuffer data
}

struct UserImage {
    Image,
    String userName
}

getWidth(UserImage userImage) -> uint32 {
    return userImage.width;
}

However, this will not make that struct a subtype of the other struct. To make it a subtype in addition to inheritance, name the field this:

struct Image {
    uint32 width,
    uint32 height,
    ByteBuffer data
}

struct UserImage {
    Image this,
    String userName
}

treatAsSupertype(^UserImage userImage) -> ^Image {
    return userImage;
}
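
The difference between the two forms resembles private versus public inheritance in C++ (C++17 or later), although the analogy is only approximate. In this sketch the type names EmbeddedUserImage and SubtypeUserImage are invented, and the data field is left out for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>

struct Image {
    std::uint32_t width = 0;
    std::uint32_t height = 0;
};

// Analogue of the unnamed `Image` field: members are reused, but
// private inheritance blocks the conversion to the base type.
struct EmbeddedUserImage : private Image {
    using Image::width;
    using Image::height;
    std::string userName;
};

// Analogue of `Image this`: public inheritance also makes it a subtype.
struct SubtypeUserImage : Image {
    std::string userName;
};

inline std::uint32_t getWidth(const EmbeddedUserImage& userImage) {
    return userImage.width;
}

inline const Image* treatAsSupertype(const SubtypeUserImage* userImage) {
    return userImage; // implicit derived-to-base conversion
}

static_assert(!std::is_convertible_v<EmbeddedUserImage*, Image*>);
static_assert(std::is_convertible_v<SubtypeUserImage*, Image*>);
```

The two static assertions capture the distinction: both types expose width and height, but only the second converts to a pointer to Image.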

enum types

Operators

Unless noted otherwise, operators marked as built-in for some category of types will also be built-in for vectors of such types.

Name                   Syntax           Built-in for                       Notes
Assignment             a = b            all built-in types                 Not overloadable.
Addition               a + b, a += b    integers, floats, ^[]T             May cause overflow for integers.
Subtraction            a - b, a -= b    integers, floats, ^[]T             May cause overflow for integers.
Negation               -a               signed integers, floats            May cause overflow for integers.
Multiplication         a * b, a *= b    integers, floats                   May cause overflow for integers.
Division               a / b, a /= b    integers, floats                   May cause overflow for signed integers. May cause division-by-zero for integers. May cause division-by-zero for floats in fast-float mode.
Remainder              a % b, a %= b    integers, floats                   May cause division-by-zero for integers. May cause division-by-zero for floats in fast-float mode.
Exponentiation         a ** b, a **= b  integers, floats                   May cause overflow for integers.
Pre-increment          ++a              integers, floats, ^[]T             May cause overflow for integers.
Pre-decrement          --a              integers, floats, ^[]T             May cause overflow for integers.
Post-increment         a++              integers, floats, ^[]T             May cause overflow for integers.
Post-decrement         a--              integers, floats, ^[]T             May cause overflow for integers.
Bitwise AND            a & b, a &= b    integers, bool
Bitwise OR             a | b, a |= b    integers, bool
XOR                    a ~ b, a ~= b    integers, bool
NOT                    ~a               integers, bool
Left shift             a << b, a <<= b  integers
Right shift            a >> b, a >>= b  integers
Logical AND            a && b, a &&= b  bool                               Not overloadable. Not available for bool vectors. Only evaluates b if a is true.
Logical OR             a || b, a ||= b  bool                               Not overloadable. Not available for bool vectors. Only evaluates b if a is false.
Equal                  a == b           all built-in types                 Always returns bool.
Not equal              a != b           all built-in types                 Always returns bool.
Less than              a < b            integers, floats, ^[]T             Always returns bool. Not available for vectors.
Less than or equal     a <= b           integers, floats, ^[]T             Always returns bool. Not available for vectors.
Greater than           a > b            integers, floats, ^[]T             Always returns bool. Not available for vectors.
Greater than or equal  a >= b           integers, floats, ^[]T             Always returns bool. Not available for vectors.
Address of             &a               all non-constant types, functions  Not overloadable.
Dereference            a^               pointers
Function call          a(b)             functions
Indexing               a[b]             arrays, vectors, ^[]T
Member access          a.b              all types                          Not overloadable.
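
The notes on division name two failure cases for signed integers. For a two's complement type, the only overflowing division is the minimum value divided by -1. The following C++17 sketch detects both cases. The name checkedDiv is hypothetical, and how Cero itself would react to these cases is not decided here.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Divides two signed 32-bit integers, returning std::nullopt for the two
// problematic cases: a zero divisor, and INT32_MIN / -1, whose
// mathematical result 2^31 does not fit into a signed 32-bit integer.
constexpr std::optional<std::int32_t> checkedDiv(std::int32_t a, std::int32_t b) {
    if (b == 0) {
        return std::nullopt;
    }
    if (a == std::numeric_limits<std::int32_t>::min() && b == -1) {
        return std::nullopt;
    }
    return a / b;
}
```

Unsigned division has no overflow case, which matches the table restricting that note to signed integers.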

Precedence

Operator precedence should be intuitive and lead to obvious behavior. When operators have no conventionally agreed upon precedence in mathematical notation or are otherwise ambiguous, combining them will lead to syntax errors as noted in the table below with footnotes. Those cases can be resolved by adding parentheses to explicitly specify the intent.

This is intended to prevent pitfalls when combining operators that are not frequently combined with each other, and therefore a user should not feel the need to consult the precedence table while programming.

Level  Category        Operators                                     Associativity
1      Postfix         a.b  a^  a(b)  a[b]  a++  a--                 left-to-right
2      Prefix          &a  -a  ~a  ++a  --a  a ** b                  right-to-left (1)
3      Multiplicative  a * b  a / b  a % b                           left-to-right (2)
       Bitwise         a & b  a | b  a ~ b  a << b  a >> b
4      Additive        a + b  a - b
5      Comparison      a == b  a != b  a < b  a <= b  a > b  a >= b  left-to-right (3)
6      Logical         a && b  a || b                                left-to-right (4)
7      Assignment      a = b  a += b  a -= b  a *= b  a /= b         right-to-left
                       a %= b  a **= b  a &= b  a |= b  a ~= b
                       a <<= b  a >>= b  a &&= b  a ||= b

(1): Associating unary - with ** is a syntax error, since there is no accepted convention on whether -a**b should mean (-a)**b or -(a**b).
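
The two readings give different results whenever the exponent is even, which a short C++ sketch can show. C++ has no ** operator, so the helper power and the two wrapper names are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Integer exponentiation by repeated multiplication (no overflow checks).
constexpr std::int64_t power(std::int64_t base, std::uint32_t exponent) {
    std::int64_t result = 1;
    for (std::uint32_t i = 0; i < exponent; ++i) {
        result *= base;
    }
    return result;
}

// The reading (-a)**b.
constexpr std::int64_t negateThenPower(std::int64_t a, std::uint32_t b) {
    return power(-a, b);
}

// The reading -(a**b).
constexpr std::int64_t powerThenNegate(std::int64_t a, std::uint32_t b) {
    return -power(a, b);
}

static_assert(negateThenPower(2, 2) == 4, "(-2)**2");
static_assert(powerThenNegate(2, 2) == -4, "-(2**2)");
```

For odd exponents both readings agree, which makes the ambiguity easy to miss in testing.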

(2): A bitwise operator cannot be combined without parentheses with another arithmetic operator that isn't itself. For example, the binary & operator associates left with itself, so a & b & c is valid to write, but it does not associate with other arithmetic operators. Therefore expressions like a & b | c, a & b * c or a + b & c result in a syntax error and require parentheses.
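
C++ accepts such mixed expressions and resolves them by precedence, which is the pitfall this rule avoids. The following sketch spells out both groupings of a + b & c with explicit parentheses. The function names are invented for this example.

```cpp
#include <cassert>
#include <cstdint>

// C++ parses `a + b & c` as `(a + b) & c`, because + binds tighter than &.
constexpr std::uint32_t addThenAnd(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    return (a + b) & c;
}

// The other grouping, which a reader might have expected instead.
constexpr std::uint32_t andThenAdd(std::uint32_t a, std::uint32_t b, std::uint32_t c) {
    return a + (b & c);
}

static_assert(addThenAnd(1, 1, 1) == 0, "(1 + 1) & 1");
static_assert(andThenAdd(1, 1, 1) == 2, "1 + (1 & 1)");
```

Requiring the parentheses makes the chosen grouping visible at the point of use.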

(3): When comparison operators are associated with each other, like a <op> b <op> c, they behave as a <op> b && b <op> c, except that operand b is only evaluated once. Only transitive comparison chains are allowed, meaning those where if a <op> b is true and b <op> c is true, it implies a <op> c. The transitive comparisons are:
a < b < c
a < b <= c
a <= b < c
a <= b <= c
a == b == c
a > b > c
a > b >= c
a >= b > c
a >= b >= c

Non-transitive comparison chains, such as a < b > c, a == b > c or a != b != c, will result in a syntax error, since they are less useful, less clear and likely to cause confusion or bugs due to falsely expecting some kind of transitive relation to hold. If parentheses are used in any of these cases, the special chaining behavior does not occur, so (a == b) == c is not the same as a == b == c. It will instead equality-compare the boolean result of comparing a and b, with c.
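
The chaining semantics can be approximated in C++ with ordinary functions. This is a sketch, not how a Cero compiler would implement it, and all names in it are hypothetical. Passing the middle operand as a function argument gives the single evaluation described above.

```cpp
#include <cassert>

// Sketch of the chain `a < b <= c`: equivalent to `a < b && b <= c`,
// with the middle operand evaluated once at the call site.
constexpr bool chainLessThenLessEqual(int a, int b, int c) {
    return a < b && b <= c;
}

// Sketch of the chain `a == b == c`.
constexpr bool chainEqual(int a, int b, int c) {
    return a == b && b == c;
}

// What `(a == b) == c` computes in C++: the bool result of `a == b`
// is converted to 0 or 1 and then compared with c.
constexpr bool parenthesizedEqual(int a, int b, int c) {
    return static_cast<int>(a == b) == c;
}

// Helper with a side effect, used to observe how often it is evaluated.
inline int nextValue(int& counter) {
    return ++counter;
}
```

With a, b and c all equal to 2, chainEqual yields true while parenthesizedEqual yields false, which is the difference the paragraph above warns about.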

(4): Associating && with || without parentheses is a syntax error, so a && b && c is valid to write, but a && b || c requires parentheses.

Comments

Line comments

use ce.io;

main() {
    // A line comment begins with `//` and continues until the next new-line character.
    print("Hello world!"); // Another comment.
}

Block comments

use ce.io;

main() {
    /* A block comment begins with `/*` and must be closed with `*/`.
     * It's conventional to put a star at the beginning of every new block comment line.
     */
    print("Hello world!" /* Block comments don't end at
                            the end of the line. */);

    /* Block comments can be /*/* nested */*/
     * as many times
     * /*/*/**/*/*/
     * as you want. */
}
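
Nesting means that a lexer has to track a depth counter instead of stopping at the first closing delimiter. A minimal C++17 sketch of that idea follows. The function name skipBlockComment is an assumption, and a real lexer would also have to report errors and track line numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Precondition: `pos` is the position of an opening "/*" in `source`.
// Returns the position just past the matching "*/", taking nesting into
// account, or std::string_view::npos if the comment is never closed.
inline std::size_t skipBlockComment(std::string_view source, std::size_t pos) {
    std::size_t depth = 0;
    while (pos + 1 < source.size()) {
        if (source[pos] == '/' && source[pos + 1] == '*') {
            ++depth; // a nested comment opens
            pos += 2;
        } else if (source[pos] == '*' && source[pos + 1] == '/') {
            --depth; // the innermost open comment closes
            pos += 2;
            if (depth == 0) {
                return pos;
            }
        } else {
            ++pos;
        }
    }
    return std::string_view::npos;
}
```

Each delimiter consumes two characters, so a sequence like the one in the example above is read as three openings followed by three closings.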

Documentation comments

Documentation comments might be added either as a distinct language feature or as a variant of block comments when a certain format for the comment text is used.

Syntax alternatives

It might make sense to change some parts of the syntax to these alternative designs.

Alternate built-in type names

uint8   => u8          int8   => i8
uint16  => u16         int16  => i16       float16  => f16
uint32  => u32         int32  => i32       float32  => f32
uint64  => u64         int64  => i64       float64  => f64
uint128 => u128        int128 => i128      float128 => f128
usize   => usize       isize  => isize
uintptr => uptrsize    intptr => iptrsize

Undesired features

Features that I would avoid adding to the language at all costs, in decreasing order of undesirability:

  • Garbage collection, because it defeats the purpose of the language, which is full control over memory and static memory management; even opt-in GC is detrimental because it splits the ecosystem (see the D language as a case study)
  • Preprocessor, because it's just unnecessary complexity
  • Undefined behavior without specially designated code blocks where it can be caused
  • Built-in high-level data types like dynamic arrays, maps, strings, because the language should be expressive enough to implement these things in libraries efficiently and conveniently
  • Uniform function call syntax (UFCS), because it adds unnecessary complexity for little benefit, and introducing arbitrary choices in writing function calls adds nothing of value; a library author should be able to decide how their facilities are used syntactically by users; note that this doesn't exclude a feature like extension methods or traits, UFCS itself is just syntax sugar that complicates name lookup
  • User-defined operator syntax, because it makes parsing much harder and encourages write-only code
  • Invisible C++-style exception handling
  • C-style macros, because they are unhygienic and other metaprogramming features should be powerful enough to make them unnecessary
  • Rust-style macros, because of their complexity and being very divorced from the rest of the language, and also because metaprogramming features should be powerful enough to make them unnecessary
  • First-class built-in tuple types, because their primary use cases are provided by other mechanisms and usually normal structs are preferable anyway because giving a name to each tuple member and the tuple as a whole makes code more readable; multiple return values can just be provided as-is and destructuring has to be provided for convenience for struct types anyway
  • goto
  • defer
  • Named arguments, because changing parameter names should not lead to API breaks