Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pointers / links in ASDL #1167

Open
certik opened this issue Sep 30, 2022 · 5 comments
Open

Pointers / links in ASDL #1167

certik opened this issue Sep 30, 2022 · 5 comments

Comments

@certik
Copy link
Contributor

certik commented Sep 30, 2022

@czgdp1807 had a great idea here how to improve GoTo / GoToTarget: #1163 (comment)

Change GoTo(int target_id, identifier name) to GoTo(stmt_t goto_target). Then one can do ASR::down_cast<ASR::GoToTarget>(goto_object->goto_target)->m_name. So it would look like this:

diff --git a/src/libasr/ASR.asdl b/src/libasr/ASR.asdl
index 2f120d29e..2e71a694e 100644
--- a/src/libasr/ASR.asdl
+++ b/src/libasr/ASR.asdl
@@ -169,10 +169,10 @@ stmt
         -- GoTo points to a GoToTarget with the corresponding target_id within
         -- the same procedure. We currently use `int` IDs to link GoTo with
         -- GoToTarget to avoid issues with serialization.
-    | GoTo(int target_id)
+    | GoTo(stmt goto_target)
         -- An empty statement, a target of zero or more GoTo statements
         -- the `id` is only unique within a procedure
-    | GoToTarget(int id)
+    | GoToTarget(int id, identifier name)
     | If(expr test, stmt* body, stmt* orelse)
     | IfArithmetic(expr test, int lt_label, int eq_label, int gt_label)
     | Print(expr? fmt, expr* values, expr? separator, expr? end)

Except that this would not work, as GoTo would contain a new (copy) of GoToTarget, not related to the actual target.

The reason is that we can't represent links/pointers in ASDL. For the symbol table we implemented pointers implicitly: the ASDL->C++ translation script recognizes heuristically when the value is needed, and when a pointer is needed.

The way out of this could be to create an addition to ASDL, that when _ptr is appended after a type, it will be a pointer to this type, not the value.

So the final change would look something like this:

diff --git a/src/libasr/ASR.asdl b/src/libasr/ASR.asdl
index 2f120d29e..e10350585 100644
--- a/src/libasr/ASR.asdl
+++ b/src/libasr/ASR.asdl
@@ -89,20 +89,20 @@ symbol
         access access, deftype deftype, string? bindc_name, bool elemental,
         bool pure, bool module, bool inline)
     | GenericProcedure(symbol_table parent_symtab, identifier name,
-        symbol* procs, access access)
+        symbol_ptr* procs, access access)
     | CustomOperator(symbol_table parent_symtab, identifier name,
-        symbol* procs, access access)
+        symbol_ptr* procs, access access)
     | ExternalSymbol(symbol_table parent_symtab, identifier name,
-        symbol external, identifier module_name, identifier* scope_names,
+        symbol_ptr external, identifier module_name, identifier* scope_names,
         identifier original_name, access access)
     | DerivedType(symbol_table symtab, identifier name, identifier* members,
-        abi abi, access access, symbol? parent)
+        abi abi, access access, symbol_ptr? parent)
     | Variable(symbol_table parent_symtab, identifier name, intent intent,
         expr? symbolic_value, expr? value, storage_type storage, ttype type,
         abi abi, access access, presence presence, bool value_attr)
     | ClassType(symbol_table symtab, identifier name, abi abi, access access)
     | ClassProcedure(symbol_table parent_symtab, identifier name, identifier
-        proc_name, symbol proc, abi abi)
+        proc_name, symbol_ptr proc, abi abi)
     | AssociateBlock(symbol_table symtab, identifier name, stmt* body)
     | Block(symbol_table symtab, identifier name, stmt* body)
 
@@ -158,9 +158,9 @@ stmt
     | Associate(expr target, expr value)
     | Cycle()
     -- deallocates if allocated otherwise throws a runtime error
-    | ExplicitDeallocate(symbol* vars)
+    | ExplicitDeallocate(symbol_ptr* vars)
     -- deallocates if allocated otherwise does nothing
-    | ImplicitDeallocate(symbol* vars)
+    | ImplicitDeallocate(symbol_ptr* vars)
     | DoConcurrentLoop(do_loop_head head, stmt* body)
     | DoLoop(do_loop_head head, stmt* body)
     | ErrorStop(expr? code)
@@ -169,10 +169,10 @@ stmt
         -- GoTo points to a GoToTarget with the corresponding target_id within
         -- the same procedure. We currently use `int` IDs to link GoTo with
         -- GoToTarget to avoid issues with serialization.
-    | GoTo(int target_id)
+    | GoTo(stmt_ptr goto_target)
         -- An empty statement, a target of zero or more GoTo statements
         -- the `id` is only unique within a procedure
-    | GoToTarget(int id)
+    | GoToTarget(int id, identifier name)
     | If(expr test, stmt* body, stmt* orelse)
     | IfArithmetic(expr test, int lt_label, int eq_label, int gt_label)
     | Print(expr? fmt, expr* values, expr? separator, expr? end)
@@ -193,15 +193,15 @@ stmt
     | Select(expr test, case_stmt* body, stmt* default)
     | Stop(expr? code)
     | Assert(expr test, expr? msg)
-    | SubroutineCall(symbol name, symbol? original_name, call_arg* args, expr? dt)
+    | SubroutineCall(symbol_ptr name, symbol_ptr? original_name, call_arg* args, expr? dt)
     | Where(expr test, stmt* body, stmt* orelse)
     | WhileLoop(expr test, stmt* body)
-    | Nullify(symbol* vars)
+    | Nullify(symbol_ptr* vars)
     | Flush(int label, expr unit, expr? err, expr? iomsg, expr? iostat)
     | ListAppend(expr a, expr ele)
-    | AssociateBlockCall(symbol m)
+    | AssociateBlockCall(symbol_ptr m)
     | CPtrToPointer(expr cptr, expr ptr, expr? shape)
-    | BlockCall(int label, symbol m)
+    | BlockCall(int label, symbol_ptr m)
     | SetInsert(expr a, expr ele)
     | SetRemove(expr a, expr ele)
     | ListInsert(expr a, expr pos, expr ele)
@@ -215,9 +215,9 @@ expr
         -- Such as: (x, y+z), (3.0, 2.0) generally not known at compile time
     | ComplexConstructor(expr re, expr im, ttype type, expr? value)
     | NamedExpr(expr target, expr value, ttype type)
-    | FunctionCall(symbol name, symbol? original_name,
+    | FunctionCall(symbol_ptr name, symbol_ptr? original_name,
             call_arg* args, ttype type, expr? value, expr? dt)
-    | DerivedTypeConstructor(symbol dt_sym, expr* args, ttype type, expr? value)
+    | DerivedTypeConstructor(symbol_ptr dt_sym, expr* args, ttype type, expr? value)
     | ImpliedDoLoop(expr* values, expr var, expr start, expr end,
                     expr? increment, ttype type, expr? value)
     | IntegerConstant(int n, ttype type)
@@ -263,7 +263,7 @@ expr
     | DictConstant(expr* keys, expr* values, ttype type)
     | DictLen(expr arg, ttype type, expr? value)
 
-    | Var(symbol v)
+    | Var(symbol_ptr v)
 
     | ArrayConstant(expr* args, ttype type)
     | ArrayItem(expr v, array_index* args, ttype type, expr? value)
@@ -277,7 +277,7 @@ expr
     | ArrayReshape(expr array, expr shape, ttype type, expr? value)
 
     | BitCast(expr source, expr mold, expr? size, ttype type, expr? value)
-    | DerivedRef(expr v, symbol m, ttype type, expr? value)
+    | DerivedRef(expr v, symbol_ptr m, ttype type, expr? value)
     | OverloadedCompare(expr left, cmpop op, expr right, ttype type, expr? value, expr overloaded)
     | OverloadedBinOp(expr left, binop op, expr right, ttype type, expr? value, expr overloaded)
     | Cast(expr arg, cast_kind kind, ttype type, expr? value)
@@ -325,8 +325,8 @@ ttype
     | Set(ttype type)
     | List(ttype type)
     | Tuple(ttype* type)
-    | Derived(symbol derived_type, dimension* dims)
-    | Class(symbol class_type, dimension* dims)
+    | Derived(symbol_ptr derived_type, dimension* dims)
+    | Class(symbol_ptr class_type, dimension* dims)
     | Dict(ttype key_type, ttype value_type)
     | Pointer(ttype type)
     | CPtr()
@@ -370,7 +370,7 @@ cast_kind
 
 dimension = (expr? start, expr? length)
 
-alloc_arg = (symbol a, dimension* dims)
+alloc_arg = (symbol_ptr a, dimension* dims)
 
 attribute = Attribute(identifier name, attribute_arg *args)
@certik
Copy link
Contributor Author

certik commented Sep 30, 2022

There is one issue with pointers: they require special handling in serialization and in printing. Currently every symbol is actually a pointer, and it is printed with the symbol table ID and a name; and serialized with the same approach. Deserialization requires special handling of this too.

So the stmt_ptr would require special handling as well and does not seem worth it.

There are two ways out:

  • keep the current GoTo/GoToTarget, where they "link" using the ID (and add a name for each as well).
  • Use the symbol table as the only place where links are allowed (from a node to a symbol table). To do that, we would create a symbol Label(identifier name) with a unique string ID (for Fortran that uses integers, we can use a string representation of the integer). Then GoToTarget and GoTo would just reference this symbol.

The second approach would look like this:

diff --git a/src/libasr/ASR.asdl b/src/libasr/ASR.asdl
index 2f120d29e..9817e9ad5 100644
--- a/src/libasr/ASR.asdl
+++ b/src/libasr/ASR.asdl
@@ -105,6 +105,7 @@ symbol
         proc_name, symbol proc, abi abi)
     | AssociateBlock(symbol_table symtab, identifier name, stmt* body)
     | Block(symbol_table symtab, identifier name, stmt* body)
+    | GoToLabel(symbol_table parent_symtab, identifier name)
 
 storage_type = Default | Save | Parameter | Allocatable
 access = Public | Private
@@ -169,10 +170,10 @@ stmt
         -- GoTo points to a GoToTarget with the corresponding target_id within
         -- the same procedure. We currently use `int` IDs to link GoTo with
         -- GoToTarget to avoid issues with serialization.
-    | GoTo(int target_id)
+    | GoTo(symbol label)
         -- An empty statement, a target of zero or more GoTo statements
         -- the `id` is only unique within a procedure
-    | GoToTarget(int id)
+    | GoToTarget(symbol label)
     | If(expr test, stmt* body, stmt* orelse)
     | IfArithmetic(expr test, int lt_label, int eq_label, int gt_label)
     | Print(expr? fmt, expr* values, expr? separator, expr? end)

I think that should work, but it seems it might be more complicated than the first approach.

@certik
Copy link
Contributor Author

certik commented Sep 30, 2022

Here is how Wolfram does it: https://reference.wolfram.com/language/ref/Goto.html

@certik
Copy link
Contributor Author

certik commented Sep 30, 2022

It seems that all things considered, the simplest way seems to be this diff, applied to the current main:

diff --git a/src/libasr/ASR.asdl b/src/libasr/ASR.asdl
index a7721a7a7..eec1e8bac 100644
--- a/src/libasr/ASR.asdl
+++ b/src/libasr/ASR.asdl
@@ -173,10 +173,10 @@ stmt
         -- GoTo points to a GoToTarget with the corresponding target_id within
         -- the same procedure. We currently use `int` IDs to link GoTo with
         -- GoToTarget to avoid issues with serialization.
-    | GoTo(int target_id, identifier name)
+    | GoTo(identifier target_id)
         -- An empty statement, a target of zero or more GoTo statements
         -- the `id` is only unique within a procedure
-    | GoToTarget(int id, identifier name)
+    | GoToTarget(identifier id)
     | If(expr test, stmt* body, stmt* orelse)
     | IfArithmetic(expr test, int lt_label, int eq_label, int gt_label)
     | Print(expr? fmt, expr* values, expr? separator, expr? end)

And just use the string id. Fortran would just convert the integer to a string. Python would use the string directly.

@rebcabin
Copy link
Contributor

rebcabin commented Oct 1, 2022

I'm rereading this old classic for ideas.

https://dspace.mit.edu/bitstream/handle/1721.1/5753/AIM-443.pdf?sequence=2&isAllowed=y

@czgdp1807
Copy link
Collaborator

#1167 (comment) makes sense to me and should be simple enough. Does LLVM allow using string IDs for labels? If so then using string IDs makes more sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants