Change to deterministic 64 bit type_ids for cross-binary serialisation #2949

dipinhora · 2018-11-28T14:44:20Z

The goal of this PR/change is to make cross binary serialisation between different programs built using the same Pony compiler version possible via deterministic 64 bit type_ids.

This PR has the following open questions:

* Performance impact of runtime type information lookup, which used to be based on type_id numbering and bitwise operations on type_ids but is now based on a field in the type descriptor
* An edge case in pony_serialise_offset (see the TODO added to that function)
* type_ids are currently based on a hash of the reach_type_t->name but should ideally be based on the AST of the type instead
* What to do about/how to handle 32 bit platforms
* Possibly other items I haven't considered

I've marked this as DO NOT MERGE until the open questions have been addressed.


This commit changes to using deterministic 64 bit type_ids for 64 bit
platforms.

NOTE: 32 bit platforms inherit most of the changes except for
the deterministic 64 bit type_ids (mainly due to serialisation /
deserialisation issues due to pointers on 32 bit platforms being
only 32 bits). Fixing this requires changing the serialisation
format for 32 bit platforms to not match the in-memory format
for types. This change is left as future work due to time
constraints. Hashing the AST tree for a type to generate its
type_id has also been left as future work due to time
constraints.

Changes include:
* changing to 64 bit type_ids
* changing all compile time and run time tests based on type_id
  numbering to use `descriptor` based checks instead (the
  `descriptor` now holds additional information accordingly).
  This change will likely have a performance impact, which I have
  not quantified.
* removing the numeric size table (the info is now part of the type
  descriptor)
* hashing the reach_type_t->name to generate the 64 bit type_id for
  64 bit platforms; 32 bit platforms get sequential numbers for
  type_ids
* changing the `key` for siphash to be based on the `md5` of the
  pony version
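
To illustrate the last two bullets, here is a minimal Python sketch of how a deterministic 64 bit id could be derived from a type name. It is not the code in this PR: it assumes the siphash variant is SipHash-2-4, that the 16 byte md5 digest of the version string is used directly as the key, and that the name is hashed as UTF-8 bytes. `siphash24` and `type_id_for` are hypothetical names used only for this example.

```python
import hashlib

MASK64 = 0xFFFFFFFFFFFFFFFF


def _rotl(x: int, b: int) -> int:
    """Rotate a 64 bit value left by b bits."""
    return ((x << b) | (x >> (64 - b))) & MASK64


def siphash24(key: bytes, data: bytes) -> int:
    """SipHash-2-4 of data under a 16 byte key, returned as a 64 bit integer."""
    if len(key) != 16:
        raise ValueError("key must be exactly 16 bytes")
    k0 = int.from_bytes(key[:8], "little")
    k1 = int.from_bytes(key[8:], "little")
    v = [
        k0 ^ 0x736F6D6570736575,
        k1 ^ 0x646F72616E646F6D,
        k0 ^ 0x6C7967656E657261,
        k1 ^ 0x7465646279746573,
    ]

    def rounds(n: int) -> None:
        # One SipRound per iteration, operating on the shared state v.
        for _ in range(n):
            v[0] = (v[0] + v[1]) & MASK64
            v[1] = _rotl(v[1], 13) ^ v[0]
            v[0] = _rotl(v[0], 32)
            v[2] = (v[2] + v[3]) & MASK64
            v[3] = _rotl(v[3], 16) ^ v[2]
            v[0] = (v[0] + v[3]) & MASK64
            v[3] = _rotl(v[3], 21) ^ v[0]
            v[2] = (v[2] + v[1]) & MASK64
            v[1] = _rotl(v[1], 17) ^ v[2]
            v[2] = _rotl(v[2], 32)

    # Full 8 byte words, then a final word holding the tail bytes and the length.
    full = len(data) - len(data) % 8
    words = [int.from_bytes(data[i:i + 8], "little") for i in range(0, full, 8)]
    words.append(int.from_bytes(data[full:], "little") | ((len(data) & 0xFF) << 56))
    for m in words:
        v[3] ^= m
        rounds(2)
        v[0] ^= m
    v[2] ^= 0xFF
    rounds(4)
    return v[0] ^ v[1] ^ v[2] ^ v[3]


def type_id_for(type_name: str, compiler_version: str) -> int:
    """Hypothetical helper: 64 bit id from a type name, keyed by the compiler version."""
    key = hashlib.md5(compiler_version.encode("utf-8")).digest()  # 16 bytes
    return siphash24(key, type_name.encode("utf-8"))
```

With this scheme, two programs built with the same compiler version compute the same id for the same type name, while a different compiler version changes the key and therefore every id.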


jemc · 2018-11-28T14:49:41Z

Performance impact of runtime type information lookup that used to be based on type_id numbering and bitwise operations on type_ids but is now based on a field in the type descriptor

Instead of changing the meaning of the type_id field, have you considered adding a new field to act as the deterministic id for serialisation purposes, and leaving the current type_id logic alone?
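
A rough Python sketch of that idea, to make the split concrete. It assumes the existing type_id stays a sequential number assigned per program and the new field is a hash of the type name; `TypeDescriptor`, `stable_hash`, `build_descriptors` and `index_by_serialise_id` are made-up names, and BLAKE2b is only a stand-in for whatever hash the compiler would actually use.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeDescriptor:
    name: str
    type_id: int       # sequential per program, existing runtime logic untouched
    serialise_id: int  # deterministic, only written into serialised data


def stable_hash(name: str) -> int:
    """Stand-in 64 bit hash of a type name (BLAKE2b, not what ponyc uses)."""
    digest = hashlib.blake2b(name.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def build_descriptors(type_names):
    """Assign sequential type_ids in program order plus a hashed serialise_id."""
    return [TypeDescriptor(n, i, stable_hash(n)) for i, n in enumerate(type_names)]


def index_by_serialise_id(descriptors):
    """Lookup table a deserialiser could use; refuses to build on a collision."""
    table = {}
    for d in descriptors:
        if d.serialise_id in table:
            raise ValueError("serialise_id collision for " + d.name)
        table[d.serialise_id] = d
    return table
```

Two programs that reach the same types in a different order would disagree on type_id but agree on serialise_id, so only the latter would need to appear in the serialised form.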

dipinhora · 2018-11-28T14:54:24Z

@jemc I have not. That's a good idea. I'll think about it and see how that might work.

dipinhora · 2018-12-04T17:17:22Z

src/libponyrt/gc/serialise.c

+  // twiddles.
+  // See TODO in pony_deserialise_offset for related issue
+
+  // If we are not in the map, we are an untraced primitive. Return the


@sylvanc would you be able to provide input on when this code path might be taken? (it would also be great if you were able to review this PR overall)

sylvanc · 2018-12-11

The high bit is set when serialising a primitive. In pony_deserialise_offset, we use that to know that we can return the constant instance for the primitive type. So this can definitely happen.
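
A small Python sketch of that tagging scheme, for readers unfamiliar with it. It assumes 64 bit offsets and that the bits below the high bit carry an identifier for the primitive's type; `HIGH_BIT`, `encode_primitive` and `decode_offset` are illustrative names here, not the runtime's actual functions.

```python
HIGH_BIT = 1 << 63  # assumption: offsets are 64 bits wide


def encode_primitive(type_id: int) -> int:
    """Tag a primitive's type id so it cannot be mistaken for a buffer offset."""
    if type_id & HIGH_BIT:
        raise ValueError("type id already uses the high bit")
    return type_id | HIGH_BIT


def decode_offset(offset: int, primitive_instances: dict, objects_by_offset: dict):
    """Resolve a serialised offset to an object.

    If the high bit is set, the remaining bits name a primitive type and the
    constant instance for that type is returned. Otherwise the value is an
    ordinary offset into the serialised data.
    """
    if offset & HIGH_BIT:
        return primitive_instances[offset & ~HIGH_BIT]
    return objects_by_offset[offset]
```

One consequence is that an id stored this way can only use 63 bits, which is worth keeping in mind when the id becomes a full 64 bit hash.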

SeanTAllen · 2019-01-22T17:19:34Z

The debug builds on Windows appear to be failing.

chalcolith · 2019-01-28T02:58:17Z

There's an LLVM type assertion that's thrown at gendesc.c:114:

Assertion failed: CastInst::castIsValid(Instruction::BitCast, C, DstTy) && "Invalid constantexpr bitcast!", file C:\Users\Gordon\Dev\Pony\ponyc-windows-libs\build\src\llvm-3.9.1.src\lib\IR\Constants.cpp, line 1703

I'm investigating this...

Also, the changes to wscript need Python 3 support:

diff --git a/wscript b/wscript
index 373c8c5a9..95a17dbe1 100644
--- a/wscript
+++ b/wscript
@@ -37,7 +37,10 @@ with open('VERSION') as v:

# build ponyc version md5
import hashlib
-temp_md5 = hashlib.md5(VERSION).hexdigest()
+if (sys.version_info > (3,0)):
+    temp_md5 = hashlib.md5(VERSION.encode('utf-8')).hexdigest()
+else:
+    temp_md5 = hashlib.md5(VERSION).hexdigest()
VERSION_FORMATTED_MD5 = "0x" + ",0x".join([temp_md5[i:i+2] for i in range(0, len(temp_md5), 2)])

# source and build directories
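
For context on why the branch is needed: under Python 3, `hashlib.md5` only accepts bytes-like input, so passing a `str` raises `TypeError`. The standalone sketch below reproduces the formatting step on its own. The helper name `version_md5_c_bytes` is made up for this example, and it assumes the version string is plain text that can be encoded as UTF-8.

```python
import hashlib


def version_md5_c_bytes(version: str) -> str:
    """Format the md5 of a version string as comma separated hex byte literals."""
    # Encode first: hashlib.md5 rejects str under Python 3.
    digest = hashlib.md5(version.encode("utf-8")).hexdigest()
    return "0x" + ",0x".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

The result is a list of 16 byte literals, which matches the shape of `VERSION_FORMATTED_MD5` in the patch above.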

chalcolith · 2019-01-28T02:59:01Z

The reason the CI fails is that Windows usually pops up a dialog for assertions in debug mode. This is actually fixed in master, but it's fortuitous in this case :-)

chalcolith · 2019-01-31T02:15:06Z

@dipinhora the following patch fixes Windows builds for me:

diff --git a/src/libponyc/codegen/gendesc.c b/src/libponyc/codegen/gendesc.c
index 54a06cf2d..c0393094f 100644
--- a/src/libponyc/codegen/gendesc.c
+++ b/src/libponyc/codegen/gendesc.c
@@ -539,7 +539,7 @@ void gendesc_table_lookup(compile_t* c)
  codegen_finishfun(c);

  c->desc_table_offset_lookup_fn = make_desc_ptr(desc_lkp_fn,
-    c->descriptor_offset_lookup_type);
+    c->descriptor_offset_lookup_fn);
}

static LLVMValueRef desc_field(compile_t* c, LLVMValueRef desc, int index)
diff --git a/wscript b/wscript
index 373c8c5a9..95a17dbe1 100644
--- a/wscript
+++ b/wscript
@@ -37,7 +37,10 @@ with open('VERSION') as v:

# build ponyc version md5
import hashlib
-temp_md5 = hashlib.md5(VERSION).hexdigest()
+if (sys.version_info > (3,0)):
+    temp_md5 = hashlib.md5(VERSION.encode('utf-8')).hexdigest()
+else:
+    temp_md5 = hashlib.md5(VERSION).hexdigest()
VERSION_FORMATTED_MD5 = "0x" + ",0x".join([temp_md5[i:i+2] for i in range(0, len(temp_md5), 2)])

# source and build directories

SeanTAllen · 2019-05-19T21:18:30Z

Did you intend to close this @dipinhora?


dipinhora · 2019-05-20T06:46:07Z

@SeanTAllen yes. Might get reopened or a new one created at some point in the future if I get back to this again.

SeanTAllen · 2019-05-20T19:48:20Z

Was there a bug in it other than the Windows issue that Gordon came up with a patch for?


dipinhora · 2019-05-20T20:34:52Z

There's the Windows thing. And there are the open questions I listed in the original comment when I opened this PR.

Aside from the progress Gordon made regarding Windows, there hasn't been much progress or clarification around the open questions. Given that I am not likely to look at this in the near future, it seemed best to close the PR rather than leave it sitting stale but open.

dipinhora added the "do not merge" label (This PR should not be merged at this time) Nov 28, 2018

dipinhora force-pushed the deterministic_typeids branch from 7f2981c to 1aa0ea3 November 30, 2018 05:11

Revert type_id to 32 bit; add 64 bit serialise_id

3b9c8bb

dipinhora force-pushed the deterministic_typeids branch from 1aa0ea3 to 3b9c8bb November 30, 2018 05:15

dipinhora commented Dec 4, 2018


chalcolith mentioned this pull request Feb 5, 2019

Change to deterministic 64 bit type_ids for cross-binary serialisation #3002

Closed

dipinhora closed this May 19, 2019

dipinhora mentioned this pull request Dec 5, 2024

Deterministic serialisation for cross binary communication #4567

Merged

sylvanc Dec 11, 2018
