Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: unify float and int as simply number #253

Open
cueckoo opened this issue Jul 3, 2021 · 14 comments
Open

Proposal: unify float and int as simply number #253

cueckoo opened this issue Jul 3, 2021 · 14 comments
Labels
pre v1.0 Proposal roadmap/language-changes Specific tag for roadmap issue #339

Comments

@cueckoo
Copy link
Collaborator

cueckoo commented Jul 3, 2021

Originally opened by @rudolph9 in cuelang/cue#253

Background

JSON has no concepts of float or int only number. The current behavior of cuelang has float and int as completely disjoint branches on the value lattice. Further, any concrete number with a dot . implicitly is a float (e.g. 0.0) and any number without a dot implicitly is an int (e.g. 0).

This causes an error in the following example:

foo: 0
foo: float

A couple tickets related to this behavior:
cuelang/cue#252
cuelang/cue#233

Objective

Unify float and int as simply number which would make for example {foo: 0.0} & {foo: 0} valid cuelang.

Transition

Phase 1:

  • Change the behavior of float to be synonymous with number.
  • Change the behavior of int to declare a number with a constraint that does not allow decimal values.
  • Make any concrete decoration of a number (e.g. 0, 0.0, 1.2, 5`, etc.) implicitly declare a number.

Phase 2:

  • declare float deprecated
  • Configure cue fmt to change all usages of float to number

Phase 3:

  • remove builtin support for float
@cueckoo cueckoo added Proposal roadmap/language-changes Specific tag for roadmap issue #339 labels Jul 3, 2021
@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @rudolph9 in cuelang/cue#253 (comment)

related slack thread

@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @mpvl in cuelang/cue#253 (comment)

This does simplify things considerably, which is good.

For good measure, let me bring up some clarifications and concerns:

Change the behavior of int to declare a number with a constraint that does not allow decimal values.

Binary floating points (what CUE uses) allow integer values with decimals. (e.g. 0.00). I presume you would allow such numbers to be assignable to int.

This also means that, say 1/ Infinity or any other computation that results in an integer after some precision cutoff point will be treated as an integer. In Go this works fine.

One concern is that this may make configurations somewhat unstable: for instance, a configuration may have a field a: int & (x / y) which is valid for some values of x and y and not for other. When distinguishing between these types it forces the user to specify how to handle the invalid cases upfront. Not that currently, int & (x / y) is not a valid expression in cue

Another concern is that non-integer values may unify with int due to rounding errors. Experience with arbitrary-precision constants in Go teaches us this is not much of a concern in practice.

An alternative is to remove the ability to require a field to be of type float. The predeclared identifier float could then be equated to number. In other words, 0 unifies with either number or int, but 0.0 can only unify with number. This solves some of your concerns.

But overall, this is definitely worth considering. It simplifies things considerably at various places. NCL, a Google-internal configuration language does something similar and this seems to work fairly well. The main drawback there is the inability to constrain a number to int, which is solved by this proposal by keeping int as a constraint on number.

@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @mpvl in cuelang/cue#253 (comment)

I can see the following options:

  1. keep as is (int and float are both instances of number)
  2. keep as is, but remove float as a predeclared identifier. I.e. 0.0 and 0 would not unify. In other words, internally float and int exist, but fields can only be of type number or int.
  3. only have number and int as type, where int is an instance of number. 0.0 would be of type number while 0 would be of type int. 0.0 would still not be assignable to int, but 0 would be assignable to number and 0.0 and 0 would unify to 0.0.
  4. the proposal of this issue: reduce to one number type, and let int be a constraint to specify only integral values. 0.0 and 0 would be identical.

Option 3 would have most of the protections of the current CUE implementation, while being simpler, while Option 4 would be the simplest and offers slightly better JSON compatibility (allowing 0.0 to be assigned to integers). The latter could be addressed with JSON interpretation, but this may be confusing as CUE is a superset of JSON.

Allowing literals to be of multiple types (e.g. 0 is of type int|float) is not an option. It was considered in the past but was rejected due to implications in the lattice model.

@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @rudolph9 in cuelang/cue#253 (comment)

One consideration (particularly for options 3 and 4), importing cue specs into other languages is to use field attribute to indicate a float type.

foo: 1.0 @golang('type', float) //I think this is the right syntax? 

@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @rudolph9 in cuelang/cue#253 (comment)

  1. only have number and int as type, where int is an instance of number. 0.0 would be of type number while 0 would be of type int. 0.0 would still not be assignable to int, but 0 would be assignable to number and 0.0 and 0 would unify to 0.0.

@mpvl is there an example analogous to "0.0 would still not be assignable to int, but 0 would be assignable to number and 0.0 and 0 would unify to 0.0"?

If I'm reading this correctly (please correct me if I'm mistaken) the following would be true:

a: int // => int
b: number // => number
c: number & int // => number
d: 0 & int // => 0
e: 0 & number => 0.0
f: 0 & 0.0 => 0.0
g: int & 0.0 => _|_
h: number & 0.0 => 0.0

If my understanding is indeed correct the following three seem out of place since int is an instance of number

c: number & int // => number
f: 0 & 0.0 => 0.0
g: int & 0.0 => _|_

int by definition is more specific so I would think f: 0 & 0.0 => 0 and g: int & 0.0 => 0 and c: number & int => int.

@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @mpvl in cuelang/cue#253 (comment)

@rudolph9 c: number & int // => number this would currently result in int.

There are various strong reasons to adopt this proposal in some form:

  • JSON does not distinguish between 0.0 and 0
  • Otherwise, either it will not be possible to derive a numeric return type for a builtin, or one would have to have two implementations for each type.

There are some design issues still to be solved. Like having number and int seems inconsistent. Aside from naming, though, it seems good to adopt this.

@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @mpvl in cuelang/cue#253 (comment)

Another data point: The Go json unmarshaler doesn't allow decoding 1.0 into an int.

Atm, I'm leaving to option 3.

Open to evidence it should be 4 though, especially if this is required by JSON schema compatibility for instance.

@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @mpvl in cuelang/cue#253 (comment)

There seems to be no consistency in the JSON schema world on this matter:

The precise treatment of the “integer” type may depend on the implementation of your JSON Schema validator. JavaScript (and thus also JSON) does not have distinct types for integers and floating-point values. Therefore, JSON Schema can not use type alone to distinguish between integers and non-integers. The JSON Schema specification recommends, but does not require, that validators use the mathematical value to determine whether a number is an integer, and not the type alone. Therefore, there is some disagreement between validators on this point. For example, a JavaScript-based validator may accept 1.0 as an integer, whereas the Python-based jsonschema does not.

http://json-schema.org/understanding-json-schema/reference/numeric.html

But based on this I see further evidence to chose 3.

@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @rudolph9 in cuelang/cue#253 (comment)

0.0 would still not be assignable to int, but 0 would be assignable to number and 0.0 and 0 would unify to 0.0.

0.0 ultimatley constrains the same number as 0 so it seems odd that 0.0 & int // => _|_. It almost feels like we'd be better off getting rid of both int and float since they both carry common system notions of types and how the values are stored on the underlying system (which JSON has no notion of). Perhaps we could decommission both float and int and introduce something more mathematically pure to truly represent a constraint on number for valid integers

@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @mpvl in cuelang/cue#253 (comment)

0.0 ultimatley constrains the same number as 0
That is not necessarily true.

A big benefit to keep number separate from int is that one can represent "imprecise" calculations syntactically. For instance, a division may result in an inexact number that is 0 after rounding. Such a number should not be considered to be an integer, as it can result in unpredictable and inconsistent outcomes. One could argue that this problem goes away when there is only one numeric type to start with, but the reality is that CUE is used in a world where it needs to roundtrip with systems that do make these distinctions. And thus will lose the ability to use typing to ensure that a result is indeed an integer.

In general, it is a good idea that if CUE interprets a value in a certain way, it can also be represented that way in CUE itself. Right now, that can be done as 0., 0.0, etc, where the number decimals indicates the precision.

When we collapse the two numeric types, there is now way to represent this in CUE itself.

so it seems odd that 0.0 & int // => _|_

That really depends on where you are coming from and I don't agree this is generally true. The decision on this should be guided by usability, practice, and how things fit in the value lattice. I think there are good arguments to allow 0 to be interpreted as a float and even to get rid of the float type. But I haven't really seen good arguments to adopt only a single numeric type, while finding good arguments to keep number and int.

@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @extemporalgenome in cuelang/cue#253 (comment)

Allowing literals to be of multiple types (e.g. 0 is of type int|float) is not an option. It was considered in the past but was rejected due to implications in the lattice model.

But wouldn't options 3 and 4 change this arrangement, and thus reopen the discussion?

0 could be of type number|int (which if I understand correctly, unifies to just type number). In option 3, since 0 is an instance of int which is in turn an instance of number there should no longer be any conflict? 0 & int & number works today. Thus 0 and 0.0 would both be valid literals for any field constrained to number, but as soon as that same field is also constrained to int, 0.0 would produce an error.

Would it be a problem to defer inferring the type of an integer literal until it is determined that no other constraints prohibit the unification with int (or alternatively, it is determined that there are any constraints that explicitly constrain it to an int)?

  1. the proposal of this issue: reduce to one number type, and let int be a constraint to specify only integral values. 0.0 and 0 would be identical.

This seems much cleaner and more flexible to me, particularly if arbitrary precision constraints can be encoded somehow, perhaps with a new operator (or clever mathematical use of existing syntax), for example: "#Currency: " or "#Even: <this number has -1 fractional digits of precision)"

This would also work well with the internal number representation switching from arbitrary-position base-2 floats to arbitrary precision rationals, since JSON does not mandate any particular interpretation of range or precision (and before 2014, none of the JSON RFCs I've seen even provide guidance on interoperability). I've seen applications that do treat some JSON numbers as having arbitrary decimal precision, for example (sometimes encoded in strings, sometimes not), where by contract or agreement, client and server agree on the arbitrary precision interpretation. It would be quite powerful if CUE could drop the binary number assumption in the core type, leaving the role of precision and base to just constraints. I am not a mathematician, so I do not know what the practical implications of this would be: would we limit arithmetic to having no more precision, even temporary, than the constraint allows, or would the result merely need to be no more precise (no rounding) than the constraint dictates?

I believe some hard, but solvable, problems with making int a precision constraint are:

  1. Error messages could be bad unless cue implementations can recognize and special case "zero digits of precision" during error reporting (whether arrived at via a builtin int constraint or arrived at via a free-form precision constraint): we should continue to say "3.1 is not an int" rather than "3.1 has more than 0 fractional digits of precision" (we could use the precision-style error for non-integer precision constraints).
  2. encoding/* and 3rd party tooling should also be able to easily recognize an integer constraint, so that exporting to formats like JSON Schema can continue to express the type as "int"

If the above could be solved and are worth solving, I believe option 4 would lead to a better language than option 3, though option 3 sounds like it would also result in a better language than we have today.

@myitcv
Copy link
Member

myitcv commented Aug 26, 2022

  • declare float deprecated

Raising a question in the context of #1883. In that issue, I have proposed a mode in which any dropping down to float precision should result in an error (unless explicitly cast to float in the arguments to that function/expression). There is a lot of detail still missing from that sketch, but flagging that point in the context of this proposal.

ptMcGit pushed a commit to ptMcGit/cue that referenced this issue Jan 22, 2023
The playground was initially developed as a proof of concept over a
weekend, in a separate repo/module. However, it has proved to be quite
popular.

The current situation involves a rather awkward "dance" of running
scripts from the playground module within the cuelang.org main module.
Having this additional dependency is unncessary, and introduces more
development effort.

Moving the playground code to the same repository helps eliminate a
dependency, and hence the admin associated with maintaining that
additional dependency.

As part of some enhancements for the GerritHub -> GitHub setup for the
main CUE repositroy, we need to establish a new serverless function.
By eliminating the playground dependency and housing the code alongside
the cuelang.org site, we can simplify our approach to serverless
functions, making the addition of future serverless functions easier,
including that required to improve the GerritHub -> GitHub setup.

This change is incorporates the playground alongside the cuelang.org
site. The contents of play/ very closely mirror the contents of
github.com/cue-sh/[email protected]. We
detail exceptions below. There is still tidy up that can be done, but in
the interests of making minimal changes to the playground code, that is
deferred for a later date.

* .gitignore and .gitattributes files from both repositories have been
  consolidated.
* The GitHub test workflow from both repositories have been combined.
* /serve.sh is added to make it easier to locally serve the cuelang.org
  Hugo site.
* /play/run.sh is added to make it easier to locally serve the
  playground.
* /build.sh is changed to reflect the fact the playground is no longer a
  dependency.
* /play/README.md is identical to the playground source version, but for
  dropping the instructions on how to develop and run the playground
  locally, instructions which are now in /README.md
* /play/dist.sh is slimmed down to reflect the code that was extracted
  to /play/run.sh.
* /play/go.{mod,sum} are exact copies of the playground originals, but
  appear as diffs because /play/go.{mod,sum} existed before in order to
  capture the playground dependency separate from the cuelang.org
  dependencies. This previously allowed the playground to depend on a
  different version of CUE to the main cuelang.org site. This separation
  remains now the playground is incorporated with the main cuelang.org
  site for the same reason.
* /play/package.json is changed slightly to drop the "serve" command for
  the playground in favour of "watch". Because with the playground
  incorporated into the cuelang.org repo, the hugo command serves the site
  and playground. Hence the watch command allows for hot-reload-like
  behaviour of the playground portion.
* /play/tools.go is deleted; this was previously used to declare the
  dependency on the playground module.
* /play/webpack.config.js is updated from the playground original to
  alter the distRoot.
* The cuelang.org README is updated to reflect how to develop and run
  the entire site locally, including the playground.

Further tidy up is captured in cue-lang#253.

For reference, the diff between the playground repo at 4c9bd3cf5202 and
the play directory in this commit is:

    Only in /home/myitcv/cue/playground/: .git
    Only in /home/myitcv/cue/playground/: .gitattributes
    Only in /home/myitcv/cue/playground/: .github
    Only in /home/myitcv/cue/playground/: .gitignore
    Only in /home/myitcv/cue/playground/: LICENSE
    diff -ruw /home/myitcv/cue/playground/README.md ./play/README.md
    --- /home/myitcv/cue/playground/README.md	2021-11-22 12:33:13.436985000 +0000
    +++ ./play/README.md	2022-07-05 19:41:07.728977874 +0100
    @@ -20,16 +20,6 @@

     The output is the JSON-marshalled result of the CUE input.

    -### Developing locally
    -
    -To develop the application locally, within the `play` directory at the repo root:
    -
    -```bash
    -# Running dist.sh outside of the netlify environment works in a "dev"
    -# mode which ultimately results in running npm run serve
    -./dist.sh
    -```
    -
     ### Details

     * The TypeScript single-page application entry point is `src/index.tsx`
    diff -ruw /home/myitcv/cue/playground/dist.sh ./play/dist.sh
    --- /home/myitcv/cue/playground/dist.sh	2022-07-05 09:28:18.932827261 +0100
    +++ ./play/dist.sh	2022-07-12 05:35:19.806465471 +0100
    @@ -15,12 +15,6 @@
     #   be a cache directory
     # * CUELANG_ORG_DIST - the directory into which we should run dist

    -if [ "${NETLIFY:-}" != "true" ]
    -then
    -	trap "trap - SIGTERM && kill -- -$$" SIGINT SIGTERM EXIT
    -	echo "running in 'dev' mode"
    -fi
    -
     # cd to the directory containing the script
     cd "$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"

    @@ -48,27 +42,13 @@
            rsync -a node_modules/ $NETLIFY_BUILD_BASE/cache/playground_node_modules
     fi

    -if [ "${NETLIFY:-}" == "true" ]
    -then
            # Dist
            echo "Install serverless functions"
            go install -tags netlify github.com/cue-sh/playground/functions/snippets
    -else
    -	echo "Running serverless functions (background)"
    -	go run github.com/cue-sh/playground/functions/snippets >/dev/null 2>&1 &
    -fi

     echo "Building WASM backend"
     GOOS=js GOARCH=wasm go build -o main.wasm
     cp $(go env GOROOT)/misc/wasm/wasm_exec.js ./src

    -
    -if [ "${NETLIFY:-}" == "true" ]
    -then
            echo "Running dist into $CUELANG_ORG_DIST"
            npm run dist
    -else
    -	echo "Running npm run serve"
    -	npm run serve
    -fi
    -
    diff -ruw /home/myitcv/cue/playground/package.json ./play/package.json
    --- /home/myitcv/cue/playground/package.json	2022-07-05 20:38:34.100123583 +0100
    +++ ./play/package.json	2022-07-05 14:08:15.559549887 +0100
    @@ -31,6 +31,6 @@
            "name": "CUEplayground",
            "scripts": {
                    "dist": "webpack --mode production",
    -		"serve": "webpack serve --mode development --open --hot --host 0.0.0.0"
    +		"watch": "webpack watch --mode development"
            }
     }
    Only in ./play/: run.sh
    diff -ruw /home/myitcv/cue/playground/webpack.config.js ./play/webpack.config.js
    --- /home/myitcv/cue/playground/webpack.config.js	2022-07-05 20:38:34.100123583 +0100
    +++ ./play/webpack.config.js	2022-07-12 05:35:19.810465740 +0100
    @@ -2,7 +2,7 @@

     const HtmlWebpackPlugin = require('html-webpack-plugin');

    -let distRoot = __dirname;
    +let distRoot = path.join(__dirname, "..");

     if (process.env.NETLIFY == "true") {
            if (process.env.CUELANG_ORG_DIST == "") {

Signed-off-by: Paul Jolly <[email protected]>
@myitcv myitcv added the zGarden label Jun 15, 2023
@mvdan mvdan removed the zGarden label Feb 8, 2024
@jbcpollak
Copy link

Just found this because it was extremely confusing that an integer was not validating with the following cue:

given:

scaling_factor: float | *0.5

this was not valid:

{
   "scaling_factor": 3
}

which seems wrong to me.

I appreciate the complexity of this issue and typing, but its a bit frustrating to have to specify scaling_factor: int | float | *0.5, since all ints are implicitly floats.

@beyondan
Copy link

beyondan commented Sep 4, 2024

Curious if any work is being done regarding this? We are currently having to educate everyone to use number instead of float everywhere, but it's obviously not a clean solution.

Or is there a way to parse all float types as number at compile time using go sdk? Something like...

ctx := cuecontext.New()
instances := load.Instances([]string{"foo.cue"}, nil)
v := ctx.BuildInstance(instances[0])

// Everything below is from imagination, looking for something like this:
v.Walk(func(x cue.Value) {
    if x.Kind() == cue.FloatKind {
        x.Kind = cue.NumberKind
    }
})

I've tried digging through the source code for a bit, and it seems like

a: number

compiles to an Ident node with Name = "number", and I assume this node can be conjuncted with BasicLit node with either Kind = FloatKind or Kind = IntKind (total guess). In any case, I'm having trouble coming up with a simple workaround for this issue, and would greatly appreciate any help if possible.

cueckoo pushed a commit that referenced this issue Sep 10, 2024
JSON does not make a distinction between floats and ints,
and there are tests in the jsonschema test suite that check that
it's OK to use floating point literals in primitives that expect
an integer.

So accept whole floats anywhere we need a uint.

Relevant issue: #253.

Signed-off-by: Roger Peppe <[email protected]>
Change-Id: I44632a6eadd2aeaff626b27ca6536d4aa47035aa
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1200939
Unity-Result: CUE porcuepine <[email protected]>
TryBot-Result: CUEcueckoo <[email protected]>
Reviewed-by: Daniel Martí <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pre v1.0 Proposal roadmap/language-changes Specific tag for roadmap issue #339
Projects
None yet
Development

No branches or pull requests

5 participants