Add new features (April~May 2024) #63

StanHash · 2024-04-23T17:15:03Z

I've been hacking new features into ColorzCore for the past few weeks or so. This is getting close to ready.

I release binaries for this branch whenever here: https://github.com/StanHash/ColorzCore/releases.

Features

Some of these were suggested by other users.

Offsets are now Addresses

ORG now accepts addresses: ORG 0x08000010 is now equivalent to ORG 0x10 (if the rom base address is 0x08000000)
POIN (and friends) now only converts offsets to addresses for non-zero offsets that are within ROM boundaries (1~0x1FFFFFF by default)
- this allows feeding non-ROM addresses to POIN statements, possibly output by tools such as lyn
new "--maximum-size" option for configuring ranges relevant to the above two features
labels values are now memory addresses rather than offsets within the ROM binary.
- ORG 0x123 ; Label: ; MESSAGE "{Label:X8}" prints 08000123, not 00000123 (yes there strings now get format specifiers, see below).
- This breaks the "SetSymbol" hack. If you need to set a Label to an arbitrary value, use the new symbol assignment syntax (Symbol := value, see below).
- ColorzCore will emit a warning when it finds on such macro being used.
CURRENTOFFSET now expands to an address.
- This breaks many existing ASSERTs, but those specific asserts are diagnosed by ColorzCore and will result in warnings rather than assertion failures

New operators

new operators: '==', '!=', '<', '>', '<=', '>=', '&&', '||', unary '!'. They work mostly like in C
- || and && return "last meaningful value", which might not be booleans. So "0 || A" evaluates to A, "A || B" evaluates to A unless A is 0, and "A && C" evaluates to C if A is non-zero.
- ASSERT behaves differently for compare and conditional operators (fails on 0 rather than negative)
"undefined-coalescing" operator '??' ('L ?? R' evaluates to L if L does not contain undefined symbols, R otherwise).
- '??' behaves differently when used in data raws (its evaluation is delayed) than when used in directives (its evaluation is immediate).
- one can define IsSymbolDefined macro that expands to 0 or 1 depending on whether that symbol is visible in terms of '??': #define IsSymbolDefined(name) "(((name) || 1) ?? 0)"
Add binary negation operator ('~') (closes Suggestion: unary not operator #42)
A ternary conditional operator ('A ? B : C') was considered but not added yet due to the implied parsing complexity (and laziness). Some of its uses can be emulated with '&&' and '||'.

Directives and Macros

New generic '#if' conditional directive (self-explanatory).
#undef now accepts multiple parameters.
Allow define to take any arbitrary sequence of tokens directly. Previously, such tokens needed to be enclosed within a string to work.
- for example: #define MyData(a, b, c) ALIGN 4 ;WORD a ; SHORT b ; BYTE c (no quotes!)
- this will break some existing macros that were expanded in non intuitive ways due to a quirk in the previous version (some of the new warnings help finding such cases).
Make object-like macros (aka "definitions") defined to be expanded into exactly their own name (#define MyName MyName) not participate in macro expansion. This allows defining macros for use with #ifdef and friends while still keeping the name available for symbols.
- example: #define ItemTable ItemTable, one can do ItemTable: later without causing an infinite loop, yet also #ifdef ItemTable is validated.

Symbol assignment

Symbol assignment using ':=' (closes Implement Symbol Assignment #48)
Symbols are a generalization of Labels, and have the same properties:
- They are visible to raws even before they are assigned.
- They respect {} local scopes (unlike macros).
- They are output in the .sym file if the --nocash-sym switch is given.

Formatted Interpolated strings

MESSAGE, ERROR and WARNING now print strings properly, and accept format bits: {expr:spec} which are expanded within.
- format specifier are the same as the standard C# ones for integers (for example: {MyOffset:X8})

ReadByteAt

Add ReadByteAt(offset), ReadShortAt(offset) and ReadWordAt(offset) built-in macros. They allow reading data from the working binary. It only sees what was there before assembly.
- They are only enabled when assembled to a ROM ('A' mode).

STRING and custom .TBL encodings

Add STRING statement, which emits the given string encoded into a given encoding (UTF-8 by default). This makes the String(...) builtin macro redundant.
- Usage: STRING "my string" "<encoding name>"
- The only encoding supported by default is "UTF-8". Spelling is case-insensitive and dash is optional (basically: "utf8" is also correct).
- Custom encodings can be imported using .TBL files (see https://datacrystal.romhacking.net/wiki/Text_Table) via the new #inctbl directive.
  - Usage: #inctbl "<new encoding name>" "path/to.tbl"
  - Closes Feature Request: Custom string encodings (table loading) #54.

Misc.

if no raws folder is given, and 'Language Raws' does not exist, a set of 7 built-in "fallback" raws will be available (those being BYTE, SHORT, WORD, POIN, SHORT2, WORD2 and POIN2).
- old behavior was just an error
Add working __LINE__ and __FILE__ special identifiers (closes Position macros not implemented #61)
ALIGN now takes an optional second parameter that defines an alignment offset. Basically, ALIGN Align Offset skips bytes until CURRENTOFFSET % Align == Offset is true.
--nocash-sym now also outputs labels from local scopes (closes I want to create symbol information for local label. #56).
Add BASE64 statements. This renders Base64 #59 redundant.

Diagnostics

Better error messages in the very specific case where the expansion of a macro produces an invalid statement.
Partially address QoL Improvement: More descriptive errors on wrong parameter type. #22 (on raw param error, print the "signature" of that/those raw(s))
Any message/warning/error that was produced from within a macro now follows the whole macro expansion chain.
New command line options: --warnings:.... --warnings:... allows you to disable or enable some warnings.
- Example: --warnings:no-redefine will turn off warnings when you redefine a label.
A new warning (enabled by default, disabled by --warnings:no-nonportable-pathnames) is emitted on non-portable paths (paths with incorrect capitalization) in include/incbin/inctbl directives on Windows (closes Feature Request: Add warning when including files with nonmatching case #16).
A new warning (enabled by --warnings:unguarded-expression-macros), will be emitted when a macro is expanded that features operators that aren't guarded by parenthesis.
A new warning (enabled by default, disabled by --warnings:no-unintuitive-expression-macros) will be emitted when a macro that looks like a self-contained expression will be expanded in a way that makes it merge with its surroundings in unintuitive ways.
- for example: #define MyMacro(a, b) a + b BYTE MyMacro(1, 2) * 2 will result in BYTE 1 + 2 * 2 which would not result in what one would intuitively expect. This is particularly useful as it helps find cases where the changes in edge-case macro behavior described earlier would cause a change. An example of such case can be seen explained here: Address unintuitive macro expansion SkillSystem_FE8#637.
- This is kinda a less aggressive version of the previous warning. This is why this one is enabled by default while the other isn't.
A new warning (enabled by --warnings:legacy) is emitted when using legacy constructs kept in for compatibility (this is only emitted when using the String macro for now).

Bug fixes

make macro([a, b, c]) result in it having 1 parameter instead of 3.
local Symbols that should shadow Symbols already defined in enclosing scopes would only take effect after their definition.
- For example: MySymbol : ; WORD 0 1 2 3 ... { POIN MySymbol ; ... ; MySymbol : } the POIN would refer to the first MySymbol rather than the second.
Many other small fixes I don't recall

Technical changes

main project targets "modern" .NET with nullable annotations enabled, with an alternative projet provided that targets traditional .NET Framework. The Visual Studio Solution was updated to contain both of them.
'Maybe' has been removed in favor of extensions that work with nullable types (for example: IfJust works on any 'T?')
- for reference types, this works on any type (as nullable for reference types is just an annotation)
Definitions have been refactored to read directly from the token stream rather than accepting IParamNodes. Previous behavior has been moved to new abstract SimpleDefiniton base class.
- Most directives still derive from it.
- Directives that care about macro names (#define, #undef, and if[n]def) or arbitrary tokens (#define, maybe should be expanded to tools directives) are defined in terms of the input token stream directly.
Thanks to changes to directives interfaces, ParseParam and ParseAtom no longer need to do macro expansion conditionally. This simplifies their interface and implantation slightly.
Location now hold an optional MacroLocation field that link to the location of a potential macro invocation.
Some class renaming and splitting
- Log is now Logger
- Maybe (or what's left of it) is now NullableExtensions
- EAInterpreter is now EADriver
- The previously monolithic EAParser class was split into EAParser, EAInterpreter, StringProcessor and LoggerExtensions. Also the ParseAtom method was moved to an extension class called AtomParser.
Some namespace reshuffling
- the Macros namespace was moved to Preprocessor (was in Parser).
- many bits of Parser (that don't have much to do with parsing) were moved to new Interpreter namespace.
The Parse -> AST -> Evaluation chain was reworked. The new EAInterpreter "consumes" the parse results (AST nodes) and builds the list of ILineNode that would eventually be obtained by EADriver for evaluation. The ParseXyz functions don't return ILineNodes anymore.
Many unused or redundant methods were removed. Especially within AST classes.
CaseInsensitiveString was removed (it was unused)
directive classes for #ifdef and #ifndef were merged.
Many other small things.

Uncertain design points

The format specifiers within format strings use the stock .NET Int32 formatting capabilities, which does work, but we could consider define more thoroughly what we do accept, and if needed implement extras ourselves (it is noteworthy that the binary format 'B' specifier is not available on older .NET, which would be nice to continue supporting I think).

Testing script

In the Tests directory lies a single python script named run_tests.py. I have been working on this very basic test program in an attempt of making sure I (and possibly future contributors) don't break anything important as new stuff gets added.

Things that break

Changes to label values means that the "DefineSymbol" hack doesn't work anymore, because ORG 1; Label: would result in the value of Label being 0x08000001 and not 1. Of course, one wouldn't need one such hack because Label := 1 is now legal.
Changes to '#define' behavior means that definitions with unquoted arithmetic expressions as body don't get expanded early (they may have been before), which may result in weird (but well-defined and predictable) precedence related bugs showing up where they weren't before (as an example of this, see Address unintuitive macro expansion SkillSystem_FE8#637).
ASSERT UpperBound - CURRENTOFFSET would break if UpperBound is an offset rather than an address. This is detected and will produce a warning rather than an error if that assertion would not fail if CURRENTOFFSET was an offset.
"Marked pointers" (for example the setText macro that sets the anti-huffman mark bit) will not work when given a marked offset. Basically setText now only works when one gives it a label or a raw address.

Warnings have been introduced to try and diagnose most of these.

…tement

Crazycolorz5 · 2024-04-23T18:05:25Z

This looks very impressive and well documented. I can merge this when you feel it's ready, though lemme give my 2 cents on a few things that you're (or I'm) unsure of:

Add ReadByteAt(offset), ReadShortAt(offset) and ReadWordAt(offset) built-in macros. They allow reading data from the working binary. It only sees what was there before assembly. Those macros are disabled in AA mode.

This makes me very uncomfortable. I think a strong point of the EA language is that it's agnostic to the file being edited. That said, AA mode is the default (correct me if I'm wrong?) -- which I guess then begs the further question of what would this be used for? It's not inherently bad but I think there's a thing to be thought of for complexity bloat and this could lead to very unintuitive conditionals being written that would be better to be handled elsewhere in tooling or have some thought put into it to avoid.

'Maybe' has been removed in favor of extensions that work with nullable types (for example: IfJust works on any 'T?')
for reference types, this works on any type (as nullable for reference types is just an annotation)

I don't really mind this if it's functionally equivalent and the nullable annotation on references is enforced by the type system. The only thing we really lose is the ability to next the constructor (e.g. Maybe<Maybe<Int>>) but that's not really useful if we have bind anyway.

The ?? operator seems like it would be redundant with IsSymbolDefined. However, they both have different semantics: a ?? expression is always evaluated at the very end of assembly (when data is inserted) while IsSymbolDefined is evaluated immediately, so it's not that simple. I wonder if there's something that can be done here to make this more elegant regardless.

My reading of your documentation makes me think of ?? as being useful for preprocessor stuff (or vice-versa). Since some preprocessor stuff care about immediate evaluation (for parameter passing) I can see having both be useful.

All Read...At macros could be defined in terms of any one other. I don't know if we want to keep all three or just one and let the user add their own.

Keep all of them as language defined. Same feel as BYTE SHORT WORD being definable in terms of BYTE, but having all 3 just makes it nicer.

the ternary operator A ? B : C is not implemented, but (A && B) || C would be functionally equivalent

This makes me deeply uncomfortable and I am honestly not very on board with && and || making sense for how preprocessors would handle things, but not with how a language would handle things, but I think your definitions are sensible so I won't object to this.

In parallel to this, I have been working on a very basic test program and set, in an attempt in making sure I (and possibly future contributors) don't break anything important. Idk if I should include this here or no.

Test cases/scripts should be included with the project, please do include them here if you have them.

Also I admittedly have not been hacking or maintaining this as much in recent years, so ... don't expect the most thorough of reviews or tests for functionality or any grand visions for design. Comments I give above are mostly just my opinions / what I've seen as commonly done in project/language design. If the people using this want certain features and they're implemented, I'll merge it, the most I'll do is bugfixing here and there.

StanHash · 2024-04-23T20:01:33Z

AA mode is the default (correct me if I'm wrong?)

AA is the mode that outputs ASM+LDS that was contributed a while ago I don't think it sees much use beyond the one user. The default mode is just A, so the Read...At macros would be enabled basically always, I just thought I would note the technicality.

The ability to check existing values in the binary was brought up when I asked for suggestions (by Vesly). I was not a fan of it conceptually either but I do understand the potential uses (I believe Vesly is working on randomizer mods to be applied to existing mods? I which case this could be used to check whether some known patch was installed or no).

I was thinking making it require enabling the macros explicitly (probably through a command line flag) but couldn't reasonably convince myself it wouldn't just be annoying.

…m macros

…fset, nocash sym prints locals

…ations

StanHash · 2024-05-04T09:00:13Z

Last three things I want to (attempt to) implement before freezing this feature-wise:

Address Feature Request: Custom string encodings (table loading) #54. I'm thinking a #inctbl "encoding_name" "path" directive that loads a tbl file (https://datacrystal.romhacking.net/wiki/Text_Table) and gives it a name and a STRING "string" "encoding_name" statement that dumps the encoded string. We could also introduce some built-in encodings (UTF-8, Windows-1252 (Latin-1), CP932 (Shift-JIS) would be potentially useful for existing EA users).
Add "inline blocks" that would be a more "rigorous" alternative to AddToPool and pool. This is something Cam suggested and I like this approach more than what my 6 year younger self came up with.
Add the ternary conditional operator anyway.

Then I will do some larger scale testing (I've been periodically testing on SkillSystem to make sure nothing breaks too horribly but I'm looking to test on other (perhaps more involved) public buildfiles) and if nothing unexpected breaks this will be ready.

I guess a friendlier summary of changes could also be on the TODO list.

…ific ASSERTs that got broken with CURRENTOFFSET changes

…reter to EADriver and EAParseConsumer to EAInterpreter

… codes are not handled)

StanHash · 2024-05-05T11:26:10Z

This is ready feature-wise. Any further changes would be bugfixes.

Tested on Skill System and Pokemblem. Skill System builds with no changes, and Pokemblem (after addressing the diagnosed breakages) builds with some changes that relate to bugfixes which don't (seem to) break anything.

I also put up a build here: https://github.com/StanHash/ColorzCore/releases/tag/20240505

CT075

It strikes me that this never got merged, so I want to re-open this conversation before it bitrots any further.

It seems like @Crazycolorz5 isn't too interested in continuing to actively maintain this project,, so if it's been tested and seems to work, I don't see much of a reason not to merge it. There were some philosophical comments on whether the proposed features are a good idea (and I, for the record, agree with Colorz on most of them), but since this project has been borderline abandoned, it's doing users a bit of a disservice to hold up the entire PR over them if the project isn't otherwise moving forwards.

Crazycolorz5 · 2024-10-18T17:04:23Z

Sorry, I must've missed the email saying the changes were review-ready! This is good to merge, and I'll update release posts with the updated build.

StanHash added 27 commits January 31, 2021 11:30

port to .NET 5 and enable Nullable

f3712e4

remove build output

54bb84d

Merge remote-tracking branch 'feu/master' into dotnet

97eaa50

set target framework to .NET 6 (latest LTS)

b2d224a

Merge remote-tracking branch 'feu/master' into dotnet

d9c49f8

fix bad merge

4b72686

update gitignore because I think sme broke it

67af0b2

ORG allows addresses, POIN allows anything + extra options

7a5df38

fix some warnings

3f87c6c

refactor use of Maybe

840086d

add conditional operators, fallback raws, some qol

f1c9656

add #if directive

2d56cd7

add alternative projet targetting traditional .NET framework

f425f9f

basic symbol assignment

4d820ac

add binary negation (bit flip) operator

5a732ca

symbol assignment can be lazy

d421357

'??' operator + ASSERT changes + fixes

2c6f67f

purge remains of Maybe

4187197

fix macros treating vectors as multiple parameters

f11127a

IsSymbolDefined built-in macro

b2dca34

better error message when macro expansion doesn't result in valid sta…

2afb563

…tement

print better messages using format strings

194e24f

fixes to format string

69e5025

don't simplify parameters too early (broke ASSERT hack)

ee90e8f

refactor directive input

c289b64

merge both ifdef with ifndef into one class

2ba1770

add ReadByteAt (and friends) built-in macro

bd103bb

relocate macros to preprocessor

07eb1a6

StanHash added 13 commits April 27, 2024 16:49

allow tuning of warnings and extensions via commandline

7a4d069

attempt at a better diagnostic reguarding unintuitive expressions fro…

3608375

…m macros

adjustments

55fdf64

fixes

b7596c7

major parser restructuration

0f48430

decouple parse from interpretation

2efee5b

change AST production

1493cf2

UTF8, BASE64, directive refactor, better undef , less leaky CurrentOf…

898ceee

…fset, nocash sym prints locals

add more tests + fix PROTECT 0

8181f3f

add tests gitignore

fe6d4aa

fixes

cf37d5d

add extra tests

ccc0350

SetSymbol macro detection, parse fixes, many small internal simplific…

9487fc3

…ations

StanHash added 6 commits May 4, 2024 12:03

fix compile, UTF8 -> STRING, replace error with warning for some spec…

0b10055

…ific ASSERTs that got broken with CURRENTOFFSET changes

put parts Parser goes into new Interpreter namespace. rename EAInterp…

e0072f0

…reter to EADriver and EAParseConsumer to EAInterpreter

tbl and custom encoding support

f443830

token location fixes

4eee176

do not translate offset 0 to address 0 except for POIN

79d803f

fix STRING with no encoding + fix tests

0a474d0

StanHash changed the title ~~Add new features (April 2024)~~ Add new features (April~May 2024) May 4, 2024

StanHash added 4 commits May 4, 2024 20:22

fix major a oopsie (multi-line comments broke)

3c2f369

fix "garbage" characters printed on Windows cmd (ANSI terminal escape…

08dff62

… codes are not handled)

remove many unused classes and methods

5e5126a

add more tests

ead413d

StanHash marked this pull request as ready for review May 5, 2024 11:21

CT075 approved these changes Oct 18, 2024

View reviewed changes

Crazycolorz5 merged commit 6637809 into FireEmblemUniverse:master Oct 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add new features (April~May 2024) #63

Add new features (April~May 2024) #63

StanHash commented Apr 23, 2024 •

edited

Loading

Crazycolorz5 commented Apr 23, 2024 •

edited

Loading

StanHash commented Apr 23, 2024

StanHash commented May 4, 2024 •

edited

Loading

StanHash commented May 5, 2024

CT075 left a comment

Crazycolorz5 commented Oct 18, 2024

Add new features (April~May 2024) #63

Add new features (April~May 2024) #63

Conversation

StanHash commented Apr 23, 2024 • edited Loading

Features

Offsets are now Addresses

New operators

Directives and Macros

Symbol assignment

Formatted Interpolated strings

ReadByteAt

STRING and custom .TBL encodings

Misc.

Diagnostics

Bug fixes

Technical changes

Uncertain design points

Testing script

Things that break

Crazycolorz5 commented Apr 23, 2024 • edited Loading

StanHash commented Apr 23, 2024

StanHash commented May 4, 2024 • edited Loading

StanHash commented May 5, 2024

CT075 left a comment

Choose a reason for hiding this comment

Crazycolorz5 commented Oct 18, 2024

StanHash commented Apr 23, 2024 •

edited

Loading

Crazycolorz5 commented Apr 23, 2024 •

edited

Loading

StanHash commented May 4, 2024 •

edited

Loading