user-defined aggregate types #609

Open
YashasSamaga opened this issue Nov 11, 2020 · 13 comments

@YashasSamaga
Member

YashasSamaga commented Nov 11, 2020

Issue description:

Most developers (ab)use enums to create aggregate user-defined types. There are a few disadvantages to this method:

  • it's a hack! enum structs are handled in an unconventional way and have absurd rules and special cases to make them work
  • identifiers are spilled into global namespace

We can implement a new system to support user-defined aggregate types once and for all. To proceed, we need to draft a spec for the syntax and semantics, and we also need consensus on the feature.

Here is my proposal (ad hoc and not well thought out) to kick off discussions:

Tags are currently used for user-defined types. We can extend it to allow user-defined aggregate types.

/* static */ struct Player {
    id,
    playerid,
    Float:health,
    bool:spawned,
    
    bool:isValid() { return id != 0; }
    spawn(Float:pX, Float:pY, Float:pZ) { /* do something */ }

    Player:setHealth(Float:hp) { SetPlayerHealth(playerid, hp); return this; }
    Player:setArmour(Float:ar) { SetPlayerArmour(playerid, ar); return this; }

    static Player:create() { /* no 'this' here */ } // just syntactic sugar: Player.create()? do we even need this in the first draft?
    static const Float:MAX_HEALTH = 100.0; // Player.MAX_HEALTH?
};

bool:operator==(Player:a, Player:b) { return a.id == b.id; } // operator overloads as member functions isn't required

sizeof(Player); // returns the size of the struct (number of cells)
tagof(Player:); // returns the tag id

new Player:x; // allocates a block of memory of sizeof(Player) -- practically equivalent to x[sizeof(Player)]
new Player:pInfo[MAX_PLAYERS]; // equivalent to pInfo[MAX_PLAYERS][sizeof(Player)];

x.setHealth(100.0).setArmour(100.0);

doPlayerStuff(&Player:p) { } // passes the address
doPlayerStuff(Player:p) { } // passes a copy of the object in stack

We can have two classes of tags: aggregate-type tags and primitive-type tags. We can use the second-most-significant bit to distinguish them (the MSB is currently used to mark STRONG and WEAK tags). Tag overrides between aggregate types might trigger a warning.
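
A rough sketch of how that classification could look on the compiler side, written in C. It assumes 32-bit tag ids with the MSB marking strong tags, as described above. The macro and function names are hypothetical and are not taken from the compiler source; the warning rule is only one possible reading of "might trigger a warning".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names and bit positions, for illustration only. */
#define TAG_STRONG_BIT    UINT32_C(0x80000000) /* MSB: strong vs. weak tag */
#define TAG_AGGREGATE_BIT UINT32_C(0x40000000) /* next bit: aggregate vs. primitive tag */

/* True when the tag id is flagged as an aggregate-type tag. */
bool tag_is_aggregate(uint32_t tag) { return (tag & TAG_AGGREGATE_BIT) != 0; }

/* True when the tag id is flagged as a strong tag. */
bool tag_is_strong(uint32_t tag) { return (tag & TAG_STRONG_BIT) != 0; }

/* Strips both flag bits and returns the plain tag number. */
uint32_t tag_index(uint32_t tag) { return tag & ~(TAG_STRONG_BIT | TAG_AGGREGATE_BIT); }

/* One possible rule: warn when one aggregate tag is overridden with a different one. */
bool tag_override_warns(uint32_t from, uint32_t to) {
    return tag_is_aggregate(from) && tag_is_aggregate(to)
        && tag_index(from) != tag_index(to);
}
```

With this layout the strong/weak flag and the aggregate/primitive flag are independent, so existing checks on the MSB would not need to change.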

Questions:

  1. what should be printed by the following code:
    new Player:x;
    main () { printf("%d", _:x); }
    
  2. Does pawn allow references to be returned?
  3. Does pawn allow operator overloads to accept references?
  4. What is this? What should be the output of printf("%d", _:this);?
  5. Is call-by-value as default newbie friendly?
  6. What does new Player:a = Player:0; mean?
  7. How to embed debug information?
  8. What happens to an empty struct?

We could in principle implement the whole thing as an enum based struct internally and add special handling to prevent identifiers from being spilled out. This would mean passing an object by default would pass by reference.
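
To make the enum-based layout concrete, here is a minimal model in C rather than Pawn. It assumes 32-bit cells; the `PLAYER_*` constants, the helper functions and the `MAX_PLAYERS` value are hypothetical names chosen for illustration. Field names become offsets into a flat block of cells, and an object is passed by address like any other array.

```c
#include <stdint.h>

typedef int32_t cell; /* assumption: 32-bit cells */

/* Field names are just offsets into a flat block of cells. */
enum { PLAYER_ID, PLAYER_PLAYERID, PLAYER_HEALTH, PLAYER_SPAWNED, PLAYER_SIZE };

#define MAX_PLAYERS 4 /* illustrative value only */

/* The object is PLAYER_SIZE cells, so it arrives by address, never as a copy. */
int player_is_valid(const cell *self) { return self[PLAYER_ID] != 0; }

/* Writes through the address, so the caller's object is modified. */
void player_set_id(cell *self, cell id) { self[PLAYER_ID] = id; }
```

Here `PLAYER_SIZE` plays the role of sizeof(Player), and an array of players is simply `cell pInfo[MAX_PLAYERS][PLAYER_SIZE]`. The special handling mentioned above would amount to keeping the offset names out of the global namespace.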


Bonus: simulate namespaces with static members

struct Vector3f {
    Float:x,
    Float:y,
    Float:z,

    Float:norm() { return VectorSize(x, y, z); }
};

struct Vehicle {
    static const INVALID_ID = 65535;

    static Create(modelid, Vector3f:pos) { return CreateVehicle(modelid, pos.x, pos.y, pos.z); }
    static Create(modelid, Float:x, Float:y, Float:z) { /* ... */ } // in case we support function overloading for non-publics some day
    static Destroy(id) { return DestroyVehicle(id); }

    static bool:IsValid(id) { }
};

new Vector3f:position;
position.x = 123.0;
position.y = 555.0;
position.z = 1234.0;

new id = Vehicle.Create(420, position);

While this does simulate a namespace, it uses a . instead of the _ convention for naming modules. I don't like this syntax but this is a possibility.


Bonus: unions with some tag overrides

struct TypeA {
    Float:x
};

struct TypeB {
    x
};

new TypeA:something;

new x = (TypeB:something).x;

We can similarly make variants by having a type member variable.


Bonus: abuse to support polymorphism (needs more work but it seems to be possible)

struct BaseClass {
    type,
    something() {
        switch(type) {
            case 0: { /* something for base class -- maybe crash if u want base to be abstract? very bad but ... */ }
            case 1: (DerivedClass:this).something();
        }
     }
};

struct DerivedClass {
    type,
    something() { }
};


Related issues:

@Daniel-Cortez
Contributor

Daniel-Cortez commented Nov 11, 2020

While I like this idea (and would love to try implementing it when #234 is done), the draft itself definitely needs a few corrections.

/* static */ struct Player {

Obviously, we can't use struct as a keyword, as it's a valid identifier and thus can impact compatibility with existing code.
Maybe we could use __struct instead? (Or *struct, but the only keywords using the "*" prefix - *begin, *end and *then - may be removed in the future.)

doPlayerStuff(Player:p) { } // passes a copy of the object in stack

Not sure if this is a good idea. If I understand this correctly, internally p would be an array (as adding a new symbol type instead of reusing iARRAY/iREFARRAY would be a pain), and currently Pawn doesn't allow passing arrays by value.

bool:operator==(Player:a, Player:b) { return a.id == b.id; }

Not sure if this would be possible either, as operators can't have array arguments (except for destructors, but this is so multiple variables could be destroyed with a single destructor call).

what should be printed by the following code:

new Player:x;
main () { printf("%d", _:x); }

The address of x, probably? As internally x would be an array.

Does pawn allow references to be returned?
Does pawn allow operator overloads to accept references?

No for both.

What is this?

I think it must be the first (hidden) argument of a method, as described in #234.
Since we'll want to be able to map methods to existing functions, this would need to be of type iVARIABLE for single values, but for aggregate-tagged values we'll have to use iREFARRAY, as they are essentially arrays.

What should be the output of printf("%d", _:this);?

Same as for question 1.

Is call-by-value as default newbie friendly?

Can you please elaborate on what exactly you mean by "call-by-value"?

What does new Player:a = Player:0; mean?

I think we should error when a primitive-tagged single value is overridden with an aggregate-type tag, as we can't convert a single value into an array.

What happens to an empty struct?

They probably shouldn't be allowed, as arrays can't have a zero length.

@Y-Less
Member

Y-Less commented Nov 11, 2020

OK, I've not read too carefully yet - I'll get to that. But one thing is that I really don't think that whatever the feature is should hide the size of the data. For example:

bool:operator==(Player:a, Player:b) { return a.id == b.id; }

There's nothing in the function declaration there to expose the fact that Player is a structure, not a simple cell. Maybe enums have other problems, but at least they kept that clear and explicit.

@Y-Less
Member

Y-Less commented Nov 11, 2020

  1. what should be printed by the following code:

    new Player:x;
    main () { printf("%d", _:x); }

Without major reworking, that would print the value of the first field. The same thing that happens with an enum in that case. Anything else would be almost impossible and require printf to be aware of the parameter type.

@Y-Less
Member

Y-Less commented Nov 11, 2020

Something to consider, as a middle-ground between abandoning enums entirely for totally new syntax, and keeping the good parts without issues like name scope:

https://www.nextptr.com/tutorial/ta1423015134/scoped-class-enums-fundamentals-and-examples

@Y-Less
Member

Y-Less commented Nov 11, 2020

If we did have enum struct instead, we could have enum const as well for enums explicitly only for use as values, with the same scoping rules. I.e. these would be different:

enum const Tag1
{
    A, // 0
    B,  // 1
}

enum const Tag2
{
    _NONE, // 0
    A, // 1
    B,  // 2
}

main()
{
    new Tag1:a = A; // 0
    new Tag2:b = A; // 1
    new Tag3:c = A; // Error
}

@Y-Less
Member

Y-Less commented Nov 11, 2020

Sort of inspired by a hybrid of C++ enum struct and Pawn 4 const structs.

@Y-Less
Member

Y-Less commented Nov 11, 2020

Because C++ enum struct is more like what I just suggested for enum const, I'll make my idea for pawn enum struct more explicit.

Old code:

enum E_HOOK_NAME_REPLACEMENT_DATA
{
	E_HOOK_NAME_REPLACEMENT_SHORT[16],
	E_HOOK_NAME_REPLACEMENT_LONG[16],
	E_HOOK_NAME_REPLACEMENT_MIN,
	E_HOOK_NAME_REPLACEMENT_MAX
}
static stock
	YSI_g_sReplacements[MAX_HOOK_REPLACEMENTS][E_HOOK_NAME_REPLACEMENT_DATA];
		if ((pos = strfind(name, YSI_g_sReplacements[idx][E_HOOK_NAME_REPLACEMENT_SHORT], false, pos + 1)) == -1)
		{
			++i,
			idx = YSI_g_sReplacementsShortOrder[i];
		}

New code:

enum struct E_HOOK_NAME_REPLACEMENT_DATA
{
	E_HOOK_NAME_REPLACEMENT_SHORT[16],
	E_HOOK_NAME_REPLACEMENT_LONG[16],
	E_HOOK_NAME_REPLACEMENT_MIN,
	E_HOOK_NAME_REPLACEMENT_MAX
}
static stock
	YSI_g_sReplacements[MAX_HOOK_REPLACEMENTS][E_HOOK_NAME_REPLACEMENT_DATA];
		if ((pos = strfind(name, YSI_g_sReplacements[idx][E_HOOK_NAME_REPLACEMENT_SHORT], false, pos + 1)) == -1)
		{
			++i,
			idx = YSI_g_sReplacementsShortOrder[i];
		}

Idiomatic code:

enum struct E_HOOK_NAME_REPLACEMENT_DATA
{
	SHORT[16],
	LONG[16],
	MIN,
	MAX
}
static stock
	YSI_g_sReplacements[MAX_HOOK_REPLACEMENTS][E_HOOK_NAME_REPLACEMENT_DATA];
		if ((pos = strfind(name, YSI_g_sReplacements[idx][SHORT], false, pos + 1)) == -1)
		{
			++i,
			idx = YSI_g_sReplacementsShortOrder[i];
		}

99% of existing tutorials are kept exactly the same. What people know is the same. You can literally upgrade just by adding the word struct. But, if you want to go a step further you can remove all the decollision name mangling as in the final example with no worries.

As for the .field syntax, I'd say that's independent of this. You could have it or not, it doesn't really matter (.SHORT is only one character less than [SHORT]).

@BitFros7y

OK, I've not read too carefully yet - I'll get to that. But one thing is that I really don't think that whatever the feature is should hide the size of the data. For example:

bool:operator==(Player:a, Player:b) { return a.id == b.id; }

There's nothing in the function declaration there to expose the fact that Player is a structure, not a simple cell. Maybe enums have other problems, but at least they kept that clear and explicit.

Why would we need to declare in the function header whether it's a structure or something else? It's tagged, that tag is used for a structure, and the compiler should figure out that it's that exact structure, along with its members and size. If anything, we could use some other and/or additional special character:
bool:operator==(Player::a, Player::b) { return a.id == b.id; } should be enough to tell everyone and everything that it's actually a structure.
Or we could even go with something like this (just a stupid example, but still...):
bool:operator==(Player@a, Player@b) { return a.id == b.id; }
And to be honest, I personally would not mind if it were a double tag:
bool:operator==(Struct:Player:a, Struct:Player:b) { return a.id == b.id; }

@Y-Less
Member

Y-Less commented Nov 20, 2020

Pawn is based on cells. One variable is one cell. Anything more than one cell is an array, using []. This syntax is confusing the situation, because now just looking at a variable you can no longer tell if it is a single cell or more than one, nor can you tell how the data is passed (by-value vs by-reference). Fortunately, we already have syntax to show this - []; no need for any new syntax like Struct: or ::.
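
The same split exists in C, which makes for a compact analogy (this is C, not Pawn; the `cell` typedef and both function names are hypothetical). A single cell is copied into the callee, while anything declared with [] is reached through its address:

```c
#include <stdint.h>

typedef int32_t cell; /* assumption: 32-bit cells */

/* Single cell: the callee increments its own copy, the caller's value is untouched. */
void bump_cell(cell value) { value += 1; (void)value; }

/* Array: the [] in the signature signals that the caller's storage is modified. */
void bump_array(cell values[]) { values[0] += 1; }
```

After `bump_cell(a)` the caller's `a` is unchanged, whereas after `bump_array(arr)` the caller sees `arr[0]` incremented. The [] in the declaration is what tells the reader which of the two behaviours to expect.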

@BitFros7y

You yourself said that in Pawn anything more than one cell is an array, and arrays are passed by reference. That is stated in the Pawn docs, and that's how we know it. So why not just expand that documentation and define what a structure is and how it's passed? For example: "A structure is a custom-decorated, named array that can have one or more cells called members. Members should be accessed with the . operator. A structure can have other structures as members, and its size depends directly on its members."

That is a vague and probably wrong description, but I hope you get my point. All that is needed is to define the syntax and its meaning. If we choose :: for structures, then every time a user sees Player::a, they will know that it is a structure called a, of type Player; they check (or the IDE tells them) what members the Player structure contains, and they can access them with a.membername.

By the way, you mentioned confusing syntax, but Pawn already has confusing syntax by default. For example, without looking at what the function does, how can you tell that it returns an array?

stock pName(playerid)
{
	new PlayerName[MAX_PLAYER_NAME];
	GetPlayerName(playerid, PlayerName, MAX_PLAYER_NAME);
	return PlayerName;
}

@Y-Less
Member

Y-Less commented Nov 22, 2020

"There's already confusing syntax" is not a good argument for adding more confusing syntax.

@Y-Less
Member

Y-Less commented Nov 22, 2020

There are already three different ways to pass a variable by reference. You're proposing a fourth, which is no shorter than any of the others; I just don't think it is worth the overhead at all.

@Y-Less
Member

Y-Less commented Nov 30, 2020

Another common convention that already exists is using <> for things that are almost, but not quite, array-like.
