Builtin SHA256 hashing #6977

Draft. Wants to merge 57 commits into base: main.

Commits (57):
81ac895 Adds files and first function (Jun 26, 2024)
48fffa0 zig functions (MatthewJohnHeath, Jul 1, 2024)
fff57ae Adds crypt to main.zig (MatthewJohnHeath, Jul 2, 2024)
1bde780 Converts to pointers (MatthewJohnHeath, Jul 4, 2024)
8168545 Zig functions export (MatthewJohnHeath, Jul 5, 2024)
881b1be WIP broken. trying to do plumbing (MatthewJohnHeath, Jul 6, 2024)
cb294ee WIP filling in missing match cases (MatthewJohnHeath, Jul 6, 2024)
07e0c28 Filling in missing matches before rebase (MatthewJohnHeath, Jul 7, 2024)
883d580 Fixed rebase (MatthewJohnHeath, Jul 7, 2024)
0ab4f45 Adds case staements for lowlevel to compile (MatthewJohnHeath, Jul 7, 2024)
315c396 switch out bad pointer rep (MatthewJohnHeath, Aug 9, 2024)
0824029 Fix Crypt builtin (smores56, Aug 30, 2024)
277bc0b Fix formatting (smores56, Aug 30, 2024)
2f986f6 Adding digest inspectors (MatthewJohnHeath, Sep 2, 2024)
2ef3af3 Revert "Adding digest inspectors" (MatthewJohnHeath, Sep 3, 2024)
44243ba Adds functions to access digest (MatthewJohnHeath, Sep 4, 2024)
a37dc00 add missing module import (MatthewJohnHeath, Sep 4, 2024)
ebbc440 Fixes structured binding in digest256ToBytes (MatthewJohnHeath, Sep 9, 2024)
eae0c5e Name changes (MatthewJohnHeath, Sep 10, 2024)
61db153 Tidy formatting (MatthewJohnHeath, Sep 10, 2024)
302d5c3 Attempt to fix formatting (MatthewJohnHeath, Sep 10, 2024)
43bfd8a Ran fomatter (MatthewJohnHeath, Sep 10, 2024)
34fb4cb tests (MatthewJohnHeath, Sep 12, 2024)
75f3420 Docs for exposed functions and types (MatthewJohnHeath, Sep 12, 2024)
32fa02b Fix spelling and zig fmt (MatthewJohnHeath, Sep 12, 2024)
2acbbd2 Response to review on comment (MatthewJohnHeath, Sep 13, 2024)
5fe4d61 Fixes typo in test (MatthewJohnHeath, Sep 13, 2024)
638d214 update mono tests (smores56, Oct 8, 2024)
5e3f502 rename Crypt to Crypto (lukewilliamboswell, Nov 19, 2024)
d920853 fix LLVM issues (lukewilliamboswell, Nov 20, 2024)
e54ff61 add test (MatthewJohnHeath, Nov 29, 2024)
0f37191 Removed unwanted line (MatthewJohnHeath, Nov 29, 2024)
6a07cb4 correct test (MatthewJohnHeath, Nov 29, 2024)
307d6a7 test doing its job. alignment is broken (MatthewJohnHeath, Nov 29, 2024)
0028062 location type switched. local test added (MatthewJohnHeath, Nov 30, 2024)
1cc7a8e test add bytes (MatthewJohnHeath, Nov 30, 2024)
070a312 comment (MatthewJohnHeath, Nov 30, 2024)
5e7cc8d added length check to sameBytesAsHex (MatthewJohnHeath, Dec 1, 2024)
463d694 Test for digest (MatthewJohnHeath, Dec 1, 2024)
c034933 Applied hint (MatthewJohnHeath, Dec 1, 2024)
4035953 Switches u128Bytes to little-endian (MatthewJohnHeath, Dec 1, 2024)
861c75f roc format Crypto.roc (lukewilliamboswell, Dec 7, 2024)
608f787 syntax and mono updates (lukewilliamboswell, Dec 7, 2024)
056b918 Removed alloc calls from zig code (MatthewJohnHeath, Dec 8, 2024)
cd51cdb Switched type in Roc (MatthewJohnHeath, Dec 8, 2024)
8c47be1 removed unwrap from rust glue (MatthewJohnHeath, Dec 9, 2024)
e614ffa switched hasher ot be passed as owned (MatthewJohnHeath, Dec 9, 2024)
19ffe1a cargo fmt (MatthewJohnHeath, Dec 9, 2024)
60379ce roc format (MatthewJohnHeath, Dec 9, 2024)
b8f429f back to pointer version (MatthewJohnHeath, Dec 9, 2024)
bbc7b58 changes produced in test suite (MatthewJohnHeath, Dec 9, 2024)
40eb6fc removes unwrap (MatthewJohnHeath, Dec 9, 2024)
83ae431 both versions of zig (MatthewJohnHeath, Dec 9, 2024)
f80f009 WIP threading through in place version (MatthewJohnHeath, Dec 10, 2024)
bed0497 re added call with unwrap of struct type (MatthewJohnHeath, Dec 10, 2024)
cc9be6d more "dummy" experiment nonsense (MatthewJohnHeath, Dec 10, 2024)
1f30fe0 roll back to pointer only (MatthewJohnHeath, Dec 10, 2024)
@@ -4,8 +4,8 @@ expression: cli_test_out.normalize_stdout_and_stderr()
---
Compiled in <ignored for test> ms.

Direct.roc:
0 failed and 2 passed in <ignored for test> ms.

Transitive.roc:
0 failed and 1 passed in <ignored for test> ms.

Direct.roc:
0 failed and 2 passed in <ignored for test> ms.
91 changes: 91 additions & 0 deletions crates/compiler/builtins/bitcode/src/crypto.zig
@@ -0,0 +1,91 @@
const std = @import("std");
const builtin = @import("builtin");
const crypto = std.crypto;
const sha2 = crypto.hash.sha2;
const list = @import("list.zig");
const utils = @import("utils.zig");
const testing = std.testing;

const Sha256 = extern struct {
    location: *sha2.Sha256,
};

Review comment (contributor author) on the location field:

A pointer because:

  • The native Zig sha256 object doesn't have a defined layout, so Zig makes it (at least) difficult to pass the actual bytes in and out through FFI.
  • Given the possible use case of adding streaming data a few bytes at a time, it seems like a good candidate for opportunistic mutation.
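The second bullet, opportunistic mutation, refers to the reference-counting pattern of updating a value in place when the caller holds the only reference and copying it otherwise. The Rust sketch below illustrates that pattern with `std::rc::Rc::make_mut`, which clones only when the `Rc` is shared. `HasherState` and `add_bytes` are hypothetical stand-ins that merely record the bytes they receive; they are not the Zig hasher, and `sha256AddBytes` as written in this diff always copies.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for a streaming hasher's state:
/// it only records the bytes it has absorbed so far.
#[derive(Clone, Debug, PartialEq)]
struct HasherState {
    absorbed: Vec<u8>,
}

/// Value-semantics "add bytes": the caller hands over its handle and gets one back.
/// If `state` is the only reference, `Rc::make_mut` mutates it in place;
/// if other references exist, it clones the state first so they are unaffected.
fn add_bytes(mut state: Rc<HasherState>, data: &[u8]) -> Rc<HasherState> {
    Rc::make_mut(&mut state).absorbed.extend_from_slice(data);
    state
}
```

With a unique handle the returned `Rc` points at the same allocation as before; with a second handle alive, the update lands in a fresh copy and the other handle still sees the old bytes. Passing the hasher as owned, rather than borrowed, is what gives the callee that choice.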

fn create(comptime T: type) *T {
    // test_roc_alloc ignores alignment, so tests use the testing allocator instead
    if (builtin.is_test) {
        return std.testing.allocator.create(T) catch unreachable;
    }
    return @alignCast(@ptrCast(utils.allocateWithRefcount(@sizeOf(T), @alignOf(T), false)));
}

pub fn emptySha256() callconv(.C) Sha256 {
    const location: *sha2.Sha256 = create(sha2.Sha256);
    location.* = sha2.Sha256.init(.{});
    return Sha256{
        .location = location,
    };
}

test "emptySha256" {
    const empty_sha = emptySha256();
    defer std.testing.allocator.destroy(empty_sha.location);
    const empty_hash = empty_sha.location.*.peek();
    try std.testing.expect(sameBytesAsHex("e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855", empty_hash[0..empty_hash.len]));
}

pub fn sha256AddBytes(sha: Sha256, data: list.RocList) callconv(.C) Sha256 {
    const out = emptySha256();
    out.location.* = sha.location.*;
    if (data.bytes) |bytes| {
        const byteSlice: []u8 = bytes[0..data.length];
        out.location.*.update(byteSlice);
    }
    return out;
}

test "sha256AddBytes" {
    const empty_sha = emptySha256();
    defer std.testing.allocator.destroy(empty_sha.location);
    const abc = list.RocList.fromSlice(u8, "abc", false);
    defer abc.decref(@alignOf(u8), @sizeOf(u8), false, rcNone);
    const abc_sha = sha256AddBytes(empty_sha, abc);
    defer std.testing.allocator.destroy(abc_sha.location);
    const abc_hash = abc_sha.location.*.peek();
    try std.testing.expect(sameBytesAsHex("ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad", abc_hash[0..abc_hash.len]));
}

pub const Digest256 = extern struct {
    first_half: u128,
    second_half: u128,
};

Review comment (MatthewJohnHeath, contributor author, Dec 1, 2024) on Digest256:

Currently a struct with as few fundamental values in it as possible, because that is more convenient on the Roc side, and it is maybe clearest to keep them the same. I can see a case for [16]u8 instead.

pub fn sha256Digest(sha: Sha256) callconv(.C) Digest256 {
    return @bitCast(sha.location.*.peek());
}
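`sha256Digest` uses `@bitCast` to reinterpret the 32-byte digest as two `u128` fields, so the numeric value of each half depends on the host's byte order. The Rust sketch below shows the same reinterpretation with `u128::from_ne_bytes` (native order, like the bit cast) next to an explicit `u128::from_le_bytes`; all function names are hypothetical and not part of the PR. On a little-endian host the two agree, which is what the Roc side's `u128Bytes` currently assumes.

```rust
/// Split a 32-byte digest into two u128 halves in *native* byte order,
/// mirroring the Zig `@bitCast` into `Digest256 { first_half, second_half }`.
fn digest_halves_native(digest: [u8; 32]) -> (u128, u128) {
    let mut first = [0u8; 16];
    let mut second = [0u8; 16];
    first.copy_from_slice(&digest[..16]);
    second.copy_from_slice(&digest[16..]);
    (u128::from_ne_bytes(first), u128::from_ne_bytes(second))
}

/// Same split, but with the byte order pinned to little-endian on every host.
fn digest_halves_le(digest: [u8; 32]) -> (u128, u128) {
    let mut first = [0u8; 16];
    let mut second = [0u8; 16];
    first.copy_from_slice(&digest[..16]);
    second.copy_from_slice(&digest[16..]);
    (u128::from_le_bytes(first), u128::from_le_bytes(second))
}

/// Inverse of `digest_halves_le`: recover the original 32 bytes.
fn halves_to_digest_le(first: u128, second: u128) -> [u8; 32] {
    let mut out = [0u8; 32];
    out[..16].copy_from_slice(&first.to_le_bytes());
    out[16..].copy_from_slice(&second.to_le_bytes());
    out
}
```

Pinning the order on the producing side is one way to make the Roc-side byte extraction host-independent.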

test "sha256Digest" {
    const empty_sha = emptySha256();
    defer std.testing.allocator.destroy(empty_sha.location);
    const digest = sha256Digest(empty_sha);
    const first_half_bytes: [16]u8 = @bitCast(digest.first_half);
    const second_half_bytes: [16]u8 = @bitCast(digest.second_half);
    try std.testing.expect(sameBytesAsHex("e3b0c44298fc1c149afbf4c8996fb924", first_half_bytes[0..first_half_bytes.len]));
    try std.testing.expect(sameBytesAsHex("27ae41e4649b934ca495991b7852b855", second_half_bytes[0..second_half_bytes.len]));
}
//----------------test utilities ------------------------
fn rcNone(_: ?[*]u8) callconv(.C) void {}

fn sameBytesAsHex(comptime expected_hex: [:0]const u8, input: []const u8) bool {
    if (expected_hex.len != 2 * input.len) {
        return false;
    }

    for (input, 0..) |input_byte, i| {
        const hex_byte = std.fmt.parseInt(u8, expected_hex[2 * i .. 2 * i + 2], 16) catch unreachable;
        if (hex_byte != input_byte) {
            return false;
        }
    }

    return true;
}
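`sameBytesAsHex` first checks that the hex string is exactly twice as long as the byte slice, then parses each pair of hex characters as one byte and compares position by position. A rough Rust equivalent, offered as a sketch (the name `same_bytes_as_hex` is hypothetical), can use `u8::from_str_radix` on two-character slices; unlike the Zig helper, which reaches `unreachable` on malformed input, it simply returns `false`.

```rust
/// Returns true if `expected_hex` (two hex digits per byte) encodes exactly `input`.
fn same_bytes_as_hex(expected_hex: &str, input: &[u8]) -> bool {
    // Length check first, as in the Zig helper: two hex characters per byte.
    if expected_hex.len() != 2 * input.len() {
        return false;
    }
    input.iter().enumerate().all(|(i, &byte)| {
        // `str::get` returns None (instead of panicking) if the range is not on a char boundary.
        match expected_hex.get(2 * i..2 * i + 2) {
            Some(pair) => u8::from_str_radix(pair, 16).ok() == Some(byte),
            None => false,
        }
    })
}
```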
11 changes: 11 additions & 0 deletions crates/compiler/builtins/bitcode/src/main.zig
@@ -10,6 +10,14 @@ const ROC_BUILTINS = "roc_builtins";
const NUM = "num";
const STR = "str";

// Crypto Module
const crypto = @import("crypto.zig");
comptime {
    exportCryptoFn(crypto.emptySha256, "emptySha256");
    exportCryptoFn(crypto.sha256AddBytes, "sha256AddBytes");
    exportCryptoFn(crypto.sha256Digest, "sha256Digest");
}

// Dec Module
const dec = @import("dec.zig");

@@ -389,6 +397,9 @@ fn exportListFn(comptime func: anytype, comptime func_name: []const u8) void {
fn exportDecFn(comptime func: anytype, comptime func_name: []const u8) void {
    exportBuiltinFn(func, "dec." ++ func_name);
}
fn exportCryptoFn(comptime func: anytype, comptime func_name: []const u8) void {
    exportBuiltinFn(func, "crypto." ++ func_name);
}

fn exportUtilsFn(comptime func: anytype, comptime func_name: []const u8) void {
    exportBuiltinFn(func, "utils." ++ func_name);
133 changes: 133 additions & 0 deletions crates/compiler/builtins/roc/Crypto.roc
@@ -0,0 +1,133 @@
module [
    emptySha256,
    sha256AddBytes,
    sha256Digest,
    hashSha256,
    digest256ToBytes,
    Sha256,
    Digest256,
]

import Bool exposing [Eq]
import List
import Num exposing [U8, U64, U128]
import Result
import Str

## Represents the state of a SHA-256 cryptographic hashing function, after some (or no) data has been added to the hash.
Sha256 := { location : U64 }

## Represents the digest of some data produced by the SHA-256 cryptographic hashing function as an opaque type.
##
## `Digest256` implements the `Eq` ability.
Digest256 := { firstHalf : U128, secondHalf : U128 } implements [Eq]

## Returns an empty SHA-256 hasher.
emptySha256 : {} -> Sha256

## Adds bytes of data to be hashed by a SHA-256 hasher.
sha256AddBytes : Sha256, List U8 -> Sha256

## Returns the digest of the cryptographic hashing function represented by a SHA-256 hasher.
sha256Digest : Sha256 -> Digest256

## Applies the SHA-256 cryptographic hashing function to some bytes.
hashSha256 : List U8 -> Digest256
hashSha256 = \bytes -> emptySha256 {} |> sha256AddBytes bytes |> sha256Digest

# Assumes little-endian. Probably shouldn't.
u128Bytes : U128 -> List U8
u128Bytes = \number ->
    loop = \n, bytes, place ->
        if place == 16 then
            bytes
        else
            newByte = n |> Num.bitwiseAnd 255 |> Num.toU8
            loop (Num.shiftRightBy n 8) (List.append bytes newByte) (place + 1)
    loop number [] 0

expect
    bytes1 = u128Bytes 1
    bytes1 == [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

expect
    bytes257 = u128Bytes 0x000102030405060708090a0b0c0d0e0f
    bytes257 == [15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Review comment (contributor author) on "# Assumes little-endian. Probably shouldn't.":

I assume Roc should be able to compile to big-endian architectures too. Since I can't see any way of getting this information in Roc-programmer space, I think some of this functionality will need pushing to Zig.
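The endianness concern can be narrowed down. `u128Bytes` peels bytes off with a mask and a shift, which yields the least significant byte first on any host; what actually varies with the host is the Zig `@bitCast` that produced the `U128` halves. A Rust transliteration of the loop (the name `u128_bytes` is hypothetical) produces exactly what `u128::to_le_bytes` does, and the assertions reuse the two `expect` cases above.

```rust
/// Transliteration of the Roc `u128Bytes` loop: take the low byte with a mask,
/// then shift the number right by 8 bits, 16 times in total.
fn u128_bytes(number: u128) -> Vec<u8> {
    let mut n = number;
    let mut bytes = Vec::with_capacity(16);
    for _place in 0..16 {
        bytes.push((n & 0xff) as u8);
        n >>= 8;
    }
    bytes
}
```

Because this arithmetic is host-independent, only the way the halves are filled in on the Zig side decides whether the final byte list is correct on a big-endian target.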

## Returns the bytes of a SHA-256 digest as a list.
digest256ToBytes : Digest256 -> List U8
digest256ToBytes = \@Digest256 { firstHalf, secondHalf } ->
    List.concat (u128Bytes firstHalf) (u128Bytes secondHalf)

# test data taken from https://ziglang.org/documentation/0.11.0/std/src/std/crypto/sha2.zig.html#L434
digestBytesOfEmpty : List U8
digestBytesOfEmpty = fromHexString "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

digestBytesOfAbc : List U8
digestBytesOfAbc = fromHexString "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"

digestBytesOfLong : List U8
digestBytesOfLong = fromHexString "cf5b16a778af8380036ce59e7b0492370b249b11e8f07a51afac45037afee9d1"

expect
    data : List U8
    data = []
    want = digestBytesOfEmpty
    got = data |> hashSha256 |> digest256ToBytes
    want == got

expect
    data = ['a', 'b', 'c']
    want = digestBytesOfAbc
    got = data |> hashSha256 |> digest256ToBytes
    want == got

expect
    data = Str.toUtf8 "abcdefghbcdefghicdefghijdefghijkefghijklfghijklmghijklmnhijklmnoijklmnopjklmnopqklmnopqrlmnopqrsmnopqrstnopqrstu"
    want = digestBytesOfLong
    got = data |> hashSha256 |> digest256ToBytes
    want == got

expect
    want = digestBytesOfEmpty
    got = emptySha256 {} |> sha256Digest |> digest256ToBytes
    want == got

expect
    data = ['a', 'b', 'c']
    want = digestBytesOfAbc
    got =
        emptySha256 {}
        |> sha256AddBytes data
        |> sha256Digest
        |> digest256ToBytes
    want == got

expect
    want = digestBytesOfAbc
    got =
        emptySha256 {}
        |> sha256AddBytes ['a']
        |> sha256AddBytes ['b']
        |> sha256AddBytes ['c']
        |> sha256Digest
        |> digest256ToBytes
    want == got
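The last `expect` checks the streaming contract: feeding 'a', 'b', 'c' in separate calls must give the same digest as feeding "abc" at once. The Rust sketch below is a self-contained, unoptimized SHA-256 (following FIPS 180-4) written only to illustrate that contract; it is not the Zig `std.crypto.hash.sha2.Sha256` the builtin actually calls. It mirrors the Roc API with hypothetical names: `sha256_add_bytes` copies the state before updating, as the Zig `sha256AddBytes` does, and `sha256_digest` finalizes a copy, like Zig's `peek()`, so the original state can keep absorbing data.

```rust
// SHA-256 round constants (FIPS 180-4, section 4.2.2).
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

/// Hasher state: chaining value, buffered bytes (< 64 between calls) and total length in bytes.
#[derive(Clone)]
struct Sha256State {
    h: [u32; 8],
    pending: Vec<u8>,
    total_len: u64,
}

/// Compress one 64-byte block into the chaining value.
fn compress(h: &mut [u32; 8], block: &[u8]) {
    let mut w = [0u32; 64];
    for i in 0..16 {
        w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
    }
    for i in 16..64 {
        let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
        let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
    }
    let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut hh] = *h;
    for i in 0..64 {
        let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
        let ch = (e & f) ^ (!e & g);
        let t1 = hh.wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
        let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
        let maj = (a & b) ^ (a & c) ^ (b & c);
        let t2 = s0.wrapping_add(maj);
        (hh, g, f, e, d, c, b, a) = (g, f, e, d.wrapping_add(t1), c, b, a, t1.wrapping_add(t2));
    }
    for (x, y) in h.iter_mut().zip([a, b, c, d, e, f, g, hh]) {
        *x = x.wrapping_add(y);
    }
}

fn sha256_empty() -> Sha256State {
    Sha256State {
        h: [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19],
        pending: Vec::new(),
        total_len: 0,
    }
}

/// Copy the state, then absorb `data`, compressing every complete 64-byte block.
fn sha256_add_bytes(sha: &Sha256State, data: &[u8]) -> Sha256State {
    let mut out = sha.clone();
    out.total_len += data.len() as u64;
    out.pending.extend_from_slice(data);
    let full = out.pending.len() / 64 * 64;
    for block in out.pending[..full].chunks(64) {
        compress(&mut out.h, block);
    }
    out.pending.drain(..full);
    out
}

/// Pad a copy of the state (0x80, zeros, 64-bit big-endian bit length) and return the digest.
fn sha256_digest(sha: &Sha256State) -> [u8; 32] {
    let mut s = sha.clone();
    let bit_len = s.total_len.wrapping_mul(8);
    s.pending.push(0x80);
    while s.pending.len() % 64 != 56 {
        s.pending.push(0);
    }
    s.pending.extend_from_slice(&bit_len.to_be_bytes());
    for block in s.pending.chunks(64) {
        compress(&mut s.h, block);
    }
    let mut out = [0u8; 32];
    for (i, word) in s.h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

fn hash_sha256(bytes: &[u8]) -> [u8; 32] {
    sha256_digest(&sha256_add_bytes(&sha256_empty(), bytes))
}

fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```

The test vectors below are the ones the Roc `expect`s use; splitting the long message across a 64-byte boundary exercises the buffering path.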

fromHexString : Str -> List U8
fromHexString = \hex ->
    fromHexDigit = \smallNumber ->
        if smallNumber <= '9' then
            smallNumber - '0'
        else
            smallNumber - 'a' + 10

    fromHexDigits = \pair ->
        first = pair |> List.first |> Result.withDefault 0
        second = pair |> List.get 1 |> Result.withDefault 0
        16 * (fromHexDigit first) + (fromHexDigit second)

    hex
    |> Str.toUtf8
    |> List.chunksOf 2
    |> List.map fromHexDigits
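`fromHexString` maps each ASCII digit to its value with arithmetic on character codes and combines each chunk of two digits as `16 * high + low`. It assumes a well-formed, even-length, lowercase string. A Rust sketch of the same decoding (the names `from_hex_digit` and `from_hex_string` are hypothetical) that reports malformed input as `None` instead:

```rust
/// Value of one lowercase hex digit, or None for any other byte.
fn from_hex_digit(c: u8) -> Option<u8> {
    match c {
        b'0'..=b'9' => Some(c - b'0'),
        b'a'..=b'f' => Some(c - b'a' + 10),
        _ => None,
    }
}

/// Decode pairs of hex digits into bytes as 16 * high + low.
fn from_hex_string(hex: &str) -> Option<Vec<u8>> {
    let digits = hex.as_bytes();
    if digits.len() % 2 != 0 {
        return None; // an odd trailing digit has no partner
    }
    digits
        .chunks(2)
        .map(|pair| Some(16 * from_hex_digit(pair[0])? + from_hex_digit(pair[1])?))
        .collect() // Option<Vec<u8>>: None if any digit was invalid
}
```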
1 change: 1 addition & 0 deletions crates/compiler/builtins/roc/main.roc
@@ -12,4 +12,5 @@ package [
    Box,
    Inspect,
    Task,
    Crypto,
] {}
4 changes: 4 additions & 0 deletions crates/compiler/builtins/src/bitcode.rs
@@ -417,6 +417,10 @@ pub const DEC_ROUND: IntrinsicName = int_intrinsic!("roc_builtins.dec.round");
pub const DEC_FLOOR: IntrinsicName = int_intrinsic!("roc_builtins.dec.floor");
pub const DEC_CEILING: IntrinsicName = int_intrinsic!("roc_builtins.dec.ceiling");

pub const CRYPTO_EMPTY_SHA256: &str = "roc_builtins.crypto.emptySha256";
pub const CRYPTO_SHA256_ADD_BYTES: &str = "roc_builtins.crypto.sha256AddBytes";
pub const CRYPTO_SHA256_DIGEST: &str = "roc_builtins.crypto.sha256Digest";

pub const UTILS_DBG_IMPL: &str = "roc_builtins.utils.dbg_impl";
pub const UTILS_TEST_PANIC: &str = "roc_builtins.utils.test_panic";
pub const UTILS_ALLOCATE_WITH_REFCOUNT: &str = "roc_builtins.utils.allocate_with_refcount";
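These three constants have to match the symbols exported from main.zig: `exportCryptoFn` prefixes each function name with `crypto.`, and, judging by `ROC_BUILTINS` in main.zig and the other constants in this file, every exported builtin lives under the `roc_builtins` namespace. A small, hypothetical Rust helper that assembles the name from its parts makes that correspondence explicit; it is an illustration, not code from the PR.

```rust
/// Namespace shared by the exported builtins (assumed from `ROC_BUILTINS` in main.zig).
const ROC_BUILTINS: &str = "roc_builtins";

/// Hypothetical helper: full intrinsic name for a function exported through one of
/// the `export*Fn` wrappers, e.g. ("crypto", "emptySha256").
fn intrinsic_name(module: &str, func_name: &str) -> String {
    format!("{}.{}.{}", ROC_BUILTINS, module, func_name)
}

// Copies of the constants added in bitcode.rs.
const CRYPTO_EMPTY_SHA256: &str = "roc_builtins.crypto.emptySha256";
const CRYPTO_SHA256_ADD_BYTES: &str = "roc_builtins.crypto.sha256AddBytes";
const CRYPTO_SHA256_DIGEST: &str = "roc_builtins.crypto.sha256Digest";
```

A mismatch between the Zig export name and the Rust constant would only surface as a missing symbol at link time, so checks like the ones below are cheap insurance.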
2 changes: 2 additions & 0 deletions crates/compiler/builtins/src/roc.rs
@@ -17,6 +17,7 @@ pub fn module_source(module_id: ModuleId) -> &'static str {
        ModuleId::HASH => HASH,
        ModuleId::INSPECT => INSPECT,
        ModuleId::TASK => TASK,
        ModuleId::CRYPTO => CRYPTO,
        _ => internal_error!(
            "ModuleId {:?} is not part of the standard library",
            module_id
@@ -37,3 +38,4 @@ const DECODE: &str = include_str!("../roc/Decode.roc");
const HASH: &str = include_str!("../roc/Hash.roc");
const INSPECT: &str = include_str!("../roc/Inspect.roc");
const TASK: &str = include_str!("../roc/Task.roc");
const CRYPTO: &str = include_str!("../roc/Crypto.roc");
4 changes: 4 additions & 0 deletions crates/compiler/can/src/builtins.rs
@@ -210,6 +210,10 @@ map_symbol_to_lowlevel_and_arity! {
    NumF32FromParts; NUM_F32_FROM_PARTS; 1,
    NumF64FromParts; NUM_F64_FROM_PARTS; 1,

    CryptoEmptySha256; CRYPTO_EMPTY_SHA_256; 1,
    CryptoSha256AddBytes; CRYPTO_SHA256_ADD_BYTES; 2,
    CryptoSha256Digest; CRYPTO_SHA256_DIGEST; 1,

    Eq; BOOL_STRUCTURAL_EQ; 2,
    NotEq; BOOL_STRUCTURAL_NOT_EQ; 2,
    And; BOOL_AND; 2,
15 changes: 15 additions & 0 deletions crates/compiler/gen_dev/src/lib.rs
@@ -2216,6 +2216,21 @@ trait Backend<'a> {
                self.build_fn_call(sym, intrinsic, args, arg_layouts, ret_layout)
            }

            LowLevel::CryptoEmptySha256 => {
                let intrinsic = bitcode::CRYPTO_EMPTY_SHA256.to_string();
                self.build_fn_call(sym, intrinsic, args, arg_layouts, ret_layout);
            }

            LowLevel::CryptoSha256AddBytes => {
                let intrinsic = bitcode::CRYPTO_SHA256_ADD_BYTES.to_string();
                self.build_fn_call(sym, intrinsic, args, arg_layouts, ret_layout);
            }

            LowLevel::CryptoSha256Digest => {
                let intrinsic = bitcode::CRYPTO_SHA256_DIGEST.to_string();
                self.build_fn_call(sym, intrinsic, args, arg_layouts, ret_layout);
            }

            x => todo!("low level, {:?}", x),
        }
    }
27 changes: 27 additions & 0 deletions crates/compiler/gen_llvm/src/llvm/lowlevel.rs
@@ -1431,6 +1431,33 @@ pub(crate) fn run_low_level<'a, 'ctx>(

            call_bitcode_fn(env, &[], bitcode::UTILS_DICT_PSEUDO_SEED)
        }
        CryptoEmptySha256 => call_bitcode_fn(env, &[], bitcode::CRYPTO_EMPTY_SHA256),
        CryptoSha256AddBytes => {
            // Crypto.sha256AddBytes : Sha256, List U8 -> Sha256
            arguments!(sha, data);

            let list_ptr = create_entry_block_alloca(env, data.get_type(), "list_alloca");
            env.builder.new_build_store(list_ptr, data);

            call_bitcode_fn(
                env,
                &[sha, list_ptr.into()],
                bitcode::CRYPTO_SHA256_ADD_BYTES,
            )
        }
        CryptoSha256Digest => {
            // Crypto.sha256Digest : Sha256 -> Digest256
            arguments!(sha);

            call_bitcode_fn_fixing_for_convention(
                env,
                layout_interner,
                env.module.get_struct_type("crypto.Digest256").unwrap(),
                &[sha],
                layout,
                bitcode::CRYPTO_SHA256_DIGEST,
            )
        }

        ListIncref | ListDecref | SetJmp | LongJmp | SetLongJmpBuffer => {
            unreachable!("only inserted in dev backend codegen")

Review comment (contributor author) on the env.module.get_struct_type("crypto.Digest256").unwrap() line: "Oh wait, it is this."
9 changes: 9 additions & 0 deletions crates/compiler/gen_wasm/src/low_level.rs
@@ -2163,6 +2163,15 @@ impl<'a> LowLevelCall<'a> {
            NumF64ToParts => self.load_args_and_call_zig(backend, bitcode::NUM_F64_TO_PARTS),
            NumF32FromParts => self.load_args_and_call_zig(backend, bitcode::NUM_F32_FROM_PARTS),
            NumF64FromParts => self.load_args_and_call_zig(backend, bitcode::NUM_F64_FROM_PARTS),
            // Crypto
            CryptoEmptySha256 => self.load_args_and_call_zig(backend, bitcode::CRYPTO_EMPTY_SHA256),
            CryptoSha256AddBytes => {
                self.load_args_and_call_zig(backend, bitcode::CRYPTO_SHA256_ADD_BYTES)
            }
            CryptoSha256Digest => {
                self.load_args_and_call_zig(backend, bitcode::CRYPTO_SHA256_DIGEST)
            }

            And => {
                self.load_args(backend);
                backend.code_builder.i32_and();
1 change: 1 addition & 0 deletions crates/compiler/load/build.rs
@@ -26,6 +26,7 @@ const MODULES: &[(ModuleId, &str)] = &[
    (ModuleId::HASH, "Hash.roc"),
    (ModuleId::INSPECT, "Inspect.roc"),
    (ModuleId::TASK, "Task.roc"),
    (ModuleId::CRYPTO, "Crypto.roc"),
];

fn main() {