Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new features (April~May 2024) #63

Merged
merged 61 commits into from
Oct 18, 2024

Conversation

StanHash
Copy link
Member

@StanHash StanHash commented Apr 23, 2024

I've been hacking new features into ColorzCore for the past few weeks or so. This is getting close to ready.

I release binaries for this branch whenever here: https://github.com/StanHash/ColorzCore/releases.

Features

Some of these were suggested by other users.

Offsets are now Addresses

  • ORG now accepts addresses: ORG 0x08000010 is now equivalent to ORG 0x10 (if the rom base address is 0x08000000)
  • POIN (and friends) now only converts offsets to addresses for non-zero offsets that are within ROM boundaries (1~0x1FFFFFF by default)
    • this allows feeding non-ROM addresses to POIN statements, possibly output by tools such as lyn
  • new "--maximum-size" option for configuring ranges relevant to the above two features
  • labels values are now memory addresses rather than offsets within the ROM binary.
    • ORG 0x123 ; Label: ; MESSAGE "{Label:X8}" prints 08000123, not 00000123 (yes there strings now get format specifiers, see below).
    • This breaks the "SetSymbol" hack. If you need to set a Label to an arbitrary value, use the new symbol assignment syntax (Symbol := value, see below).
    • ColorzCore will emit a warning when it finds on such macro being used.
  • CURRENTOFFSET now expands to an address.
    • This breaks many existing ASSERTs, but those specific asserts are diagnosed by ColorzCore and will result in warnings rather than assertion failures

New operators

  • new operators: '==', '!=', '<', '>', '<=', '>=', '&&', '||', unary '!'. They work mostly like in C
    • || and && return "last meaningful value", which might not be booleans. So "0 || A" evaluates to A, "A || B" evaluates to A unless A is 0, and "A && C" evaluates to C if A is non-zero.
    • ASSERT behaves differently for compare and conditional operators (fails on 0 rather than negative)
  • "undefined-coalescing" operator '??' ('L ?? R' evaluates to L if L does not contain undefined symbols, R otherwise).
    • '??' behaves differently when used in data raws (its evaluation is delayed) than when used in directives (its evaluation is immediate).
    • one can define IsSymbolDefined macro that expands to 0 or 1 depending on whether that symbol is visible in terms of '??': #define IsSymbolDefined(name) "(((name) || 1) ?? 0)"
  • Add binary negation operator ('~') (closes Suggestion: unary not operator #42)
  • A ternary conditional operator ('A ? B : C') was considered but not added yet due to the implied parsing complexity (and laziness). Some of its uses can be emulated with '&&' and '||'.

Directives and Macros

  • New generic '#if' conditional directive (self-explanatory).
  • #undef now accepts multiple parameters.
  • Allow define to take any arbitrary sequence of tokens directly. Previously, such tokens needed to be enclosed within a string to work.
    • for example: #define MyData(a, b, c) ALIGN 4 ;WORD a ; SHORT b ; BYTE c (no quotes!)
    • this will break some existing macros that were expanded in non intuitive ways due to a quirk in the previous version (some of the new warnings help finding such cases).
  • Make object-like macros (aka "definitions") defined to be expanded into exactly their own name (#define MyName MyName) not participate in macro expansion. This allows defining macros for use with #ifdef and friends while still keeping the name available for symbols.
    • example: #define ItemTable ItemTable, one can do ItemTable: later without causing an infinite loop, yet also #ifdef ItemTable is validated.

Symbol assignment

  • Symbol assignment using ':=' (closes Implement Symbol Assignment #48)
  • Symbols are a generalization of Labels, and have the same properties:
    • They are visible to raws even before they are assigned.
    • They respect {} local scopes (unlike macros).
    • They are output in the .sym file if the --nocash-sym switch is given.

Formatted Interpolated strings

  • MESSAGE, ERROR and WARNING now print strings properly, and accept format bits: {expr:spec} which are expanded within.
    • format specifier are the same as the standard C# ones for integers (for example: {MyOffset:X8})

ReadByteAt

  • Add ReadByteAt(offset), ReadShortAt(offset) and ReadWordAt(offset) built-in macros. They allow reading data from the working binary. It only sees what was there before assembly.
    • They are only enabled when assembled to a ROM ('A' mode).

STRING and custom .TBL encodings

  • Add STRING statement, which emits the given string encoded into a given encoding (UTF-8 by default). This makes the String(...) builtin macro redundant.

Misc.

  • if no raws folder is given, and 'Language Raws' does not exist, a set of 7 built-in "fallback" raws will be available (those being BYTE, SHORT, WORD, POIN, SHORT2, WORD2 and POIN2).
    • old behavior was just an error
  • Add working __LINE__ and __FILE__ special identifiers (closes Position macros not implemented #61)
  • ALIGN now takes an optional second parameter that defines an alignment offset. Basically, ALIGN Align Offset skips bytes until CURRENTOFFSET % Align == Offset is true.
  • --nocash-sym now also outputs labels from local scopes (closes I want to create symbol information for local label. #56).
  • Add BASE64 statements. This renders Base64 #59 redundant.

Diagnostics

  • Better error messages in the very specific case where the expansion of a macro produces an invalid statement.
  • Partially address QoL Improvement: More descriptive errors on wrong parameter type. #22 (on raw param error, print the "signature" of that/those raw(s))
  • Any message/warning/error that was produced from within a macro now follows the whole macro expansion chain.
  • New command line options: --warnings:.... --warnings:... allows you to disable or enable some warnings.
    • Example: --warnings:no-redefine will turn off warnings when you redefine a label.
  • A new warning (enabled by default, disabled by --warnings:no-nonportable-pathnames) is emitted on non-portable paths (paths with incorrect capitalization) in include/incbin/inctbl directives on Windows (closes Feature Request: Add warning when including files with nonmatching case #16).
  • A new warning (enabled by --warnings:unguarded-expression-macros), will be emitted when a macro is expanded that features operators that aren't guarded by parenthesis.
  • A new warning (enabled by default, disabled by --warnings:no-unintuitive-expression-macros) will be emitted when a macro that looks like a self-contained expression will be expanded in a way that makes it merge with its surroundings in unintuitive ways.
    • for example: #define MyMacro(a, b) a + b BYTE MyMacro(1, 2) * 2 will result in BYTE 1 + 2 * 2 which would not result in what one would intuitively expect. This is particularly useful as it helps find cases where the changes in edge-case macro behavior described earlier would cause a change. An example of such case can be seen explained here: Address unintuitive macro expansion SkillSystem_FE8#637.
    • This is kinda a less aggressive version of the previous warning. This is why this one is enabled by default while the other isn't.
  • A new warning (enabled by --warnings:legacy) is emitted when using legacy constructs kept in for compatibility (this is only emitted when using the String macro for now).

Bug fixes

  • make macro([a, b, c]) result in it having 1 parameter instead of 3.
  • local Symbols that should shadow Symbols already defined in enclosing scopes would only take effect after their definition.
    • For example: MySymbol : ; WORD 0 1 2 3 ... { POIN MySymbol ; ... ; MySymbol : } the POIN would refer to the first MySymbol rather than the second.
  • Many other small fixes I don't recall

Technical changes

  • main project targets "modern" .NET with nullable annotations enabled, with an alternative projet provided that targets traditional .NET Framework. The Visual Studio Solution was updated to contain both of them.
  • 'Maybe' has been removed in favor of extensions that work with nullable types (for example: IfJust works on any 'T?')
    • for reference types, this works on any type (as nullable for reference types is just an annotation)
  • Definitions have been refactored to read directly from the token stream rather than accepting IParamNodes. Previous behavior has been moved to new abstract SimpleDefiniton base class.
    • Most directives still derive from it.
    • Directives that care about macro names (#define, #undef, and if[n]def) or arbitrary tokens (#define, maybe should be expanded to tools directives) are defined in terms of the input token stream directly.
  • Thanks to changes to directives interfaces, ParseParam and ParseAtom no longer need to do macro expansion conditionally. This simplifies their interface and implantation slightly.
  • Location now hold an optional MacroLocation field that link to the location of a potential macro invocation.
  • Some class renaming and splitting
    • Log is now Logger
    • Maybe (or what's left of it) is now NullableExtensions
    • EAInterpreter is now EADriver
    • The previously monolithic EAParser class was split into EAParser, EAInterpreter, StringProcessor and LoggerExtensions. Also the ParseAtom method was moved to an extension class called AtomParser.
  • Some namespace reshuffling
    • the Macros namespace was moved to Preprocessor (was in Parser).
    • many bits of Parser (that don't have much to do with parsing) were moved to new Interpreter namespace.
  • The Parse -> AST -> Evaluation chain was reworked. The new EAInterpreter "consumes" the parse results (AST nodes) and builds the list of ILineNode that would eventually be obtained by EADriver for evaluation. The ParseXyz functions don't return ILineNodes anymore.
  • Many unused or redundant methods were removed. Especially within AST classes.
  • CaseInsensitiveString was removed (it was unused)
  • directive classes for #ifdef and #ifndef were merged.
  • Many other small things.

Uncertain design points

  • The format specifiers within format strings use the stock .NET Int32 formatting capabilities, which does work, but we could consider define more thoroughly what we do accept, and if needed implement extras ourselves (it is noteworthy that the binary format 'B' specifier is not available on older .NET, which would be nice to continue supporting I think).

Testing script

In the Tests directory lies a single python script named run_tests.py. I have been working on this very basic test program in an attempt of making sure I (and possibly future contributors) don't break anything important as new stuff gets added.

Things that break

  • Changes to label values means that the "DefineSymbol" hack doesn't work anymore, because ORG 1; Label: would result in the value of Label being 0x08000001 and not 1. Of course, one wouldn't need one such hack because Label := 1 is now legal.
  • Changes to '#define' behavior means that definitions with unquoted arithmetic expressions as body don't get expanded early (they may have been before), which may result in weird (but well-defined and predictable) precedence related bugs showing up where they weren't before (as an example of this, see Address unintuitive macro expansion SkillSystem_FE8#637).
  • ASSERT UpperBound - CURRENTOFFSET would break if UpperBound is an offset rather than an address. This is detected and will produce a warning rather than an error if that assertion would not fail if CURRENTOFFSET was an offset.
  • "Marked pointers" (for example the setText macro that sets the anti-huffman mark bit) will not work when given a marked offset. Basically setText now only works when one gives it a label or a raw address.

Warnings have been introduced to try and diagnose most of these.

@Crazycolorz5
Copy link
Member

Crazycolorz5 commented Apr 23, 2024

This looks very impressive and well documented. I can merge this when you feel it's ready, though lemme give my 2 cents on a few things that you're (or I'm) unsure of:

Add ReadByteAt(offset), ReadShortAt(offset) and ReadWordAt(offset) built-in macros. They allow reading data from the working binary. It only sees what was there before assembly. Those macros are disabled in AA mode.

This makes me very uncomfortable. I think a strong point of the EA language is that it's agnostic to the file being edited. That said, AA mode is the default (correct me if I'm wrong?) -- which I guess then begs the further question of what would this be used for? It's not inherently bad but I think there's a thing to be thought of for complexity bloat and this could lead to very unintuitive conditionals being written that would be better to be handled elsewhere in tooling or have some thought put into it to avoid.

'Maybe' has been removed in favor of extensions that work with nullable types (for example: IfJust works on any 'T?')
for reference types, this works on any type (as nullable for reference types is just an annotation)

I don't really mind this if it's functionally equivalent and the nullable annotation on references is enforced by the type system. The only thing we really lose is the ability to next the constructor (e.g. Maybe<Maybe<Int>>) but that's not really useful if we have bind anyway.

The ?? operator seems like it would be redundant with IsSymbolDefined. However, they both have different semantics: a ?? expression is always evaluated at the very end of assembly (when data is inserted) while IsSymbolDefined is evaluated immediately, so it's not that simple. I wonder if there's something that can be done here to make this more elegant regardless.

My reading of your documentation makes me think of ?? as being useful for preprocessor stuff (or vice-versa). Since some preprocessor stuff care about immediate evaluation (for parameter passing) I can see having both be useful.

All Read...At macros could be defined in terms of any one other. I don't know if we want to keep all three or just one and let the user add their own.

Keep all of them as language defined. Same feel as BYTE SHORT WORD being definable in terms of BYTE, but having all 3 just makes it nicer.

the ternary operator A ? B : C is not implemented, but (A && B) || C would be functionally equivalent

This makes me deeply uncomfortable and I am honestly not very on board with && and || making sense for how preprocessors would handle things, but not with how a language would handle things, but I think your definitions are sensible so I won't object to this.

In parallel to this, I have been working on a very basic test program and set, in an attempt in making sure I (and possibly future contributors) don't break anything important. Idk if I should include this here or no.

Test cases/scripts should be included with the project, please do include them here if you have them.

Also I admittedly have not been hacking or maintaining this as much in recent years, so ... don't expect the most thorough of reviews or tests for functionality or any grand visions for design. Comments I give above are mostly just my opinions / what I've seen as commonly done in project/language design. If the people using this want certain features and they're implemented, I'll merge it, the most I'll do is bugfixing here and there.

@StanHash
Copy link
Member Author

AA mode is the default (correct me if I'm wrong?)

AA is the mode that outputs ASM+LDS that was contributed a while ago I don't think it sees much use beyond the one user. The default mode is just A, so the Read...At macros would be enabled basically always, I just thought I would note the technicality.

The ability to check existing values in the binary was brought up when I asked for suggestions (by Vesly). I was not a fan of it conceptually either but I do understand the potential uses (I believe Vesly is working on randomizer mods to be applied to existing mods? I which case this could be used to check whether some known patch was installed or no).

I was thinking making it require enabling the macros explicitly (probably through a command line flag) but couldn't reasonably convince myself it wouldn't just be annoying.

@StanHash
Copy link
Member Author

StanHash commented May 4, 2024

Last three things I want to (attempt to) implement before freezing this feature-wise:

  • Address Feature Request: Custom string encodings (table loading) #54. I'm thinking a #inctbl "encoding_name" "path" directive that loads a tbl file (https://datacrystal.romhacking.net/wiki/Text_Table) and gives it a name and a STRING "string" "encoding_name" statement that dumps the encoded string. We could also introduce some built-in encodings (UTF-8, Windows-1252 (Latin-1), CP932 (Shift-JIS) would be potentially useful for existing EA users).
  • Add "inline blocks" that would be a more "rigorous" alternative to AddToPool and pool. This is something Cam suggested and I like this approach more than what my 6 year younger self came up with.
  • Add the ternary conditional operator anyway.

Then I will do some larger scale testing (I've been periodically testing on SkillSystem to make sure nothing breaks too horribly but I'm looking to test on other (perhaps more involved) public buildfiles) and if nothing unexpected breaks this will be ready.

I guess a friendlier summary of changes could also be on the TODO list.

@StanHash StanHash changed the title Add new features (April 2024) Add new features (April~May 2024) May 4, 2024
@StanHash StanHash marked this pull request as ready for review May 5, 2024 11:21
@StanHash
Copy link
Member Author

StanHash commented May 5, 2024

This is ready feature-wise. Any further changes would be bugfixes.

Tested on Skill System and Pokemblem. Skill System builds with no changes, and Pokemblem (after addressing the diagnosed breakages) builds with some changes that relate to bugfixes which don't (seem to) break anything.

I also put up a build here: https://github.com/StanHash/ColorzCore/releases/tag/20240505

Copy link
Member

@CT075 CT075 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It strikes me that this never got merged, so I want to re-open this conversation before it bitrots any further.

It seems like @Crazycolorz5 isn't too interested in continuing to actively maintain this project,, so if it's been tested and seems to work, I don't see much of a reason not to merge it. There were some philosophical comments on whether the proposed features are a good idea (and I, for the record, agree with Colorz on most of them), but since this project has been borderline abandoned, it's doing users a bit of a disservice to hold up the entire PR over them if the project isn't otherwise moving forwards.

@Crazycolorz5
Copy link
Member

Sorry, I must've missed the email saying the changes were review-ready! This is good to merge, and I'll update release posts with the updated build.

@Crazycolorz5 Crazycolorz5 merged commit 6637809 into FireEmblemUniverse:master Oct 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants