Meta Data Handling #7

Closed
oliverschmidt opened this issue Feb 12, 2018 · 27 comments · Fixed by #13

@oliverschmidt

oliverschmidt commented Feb 12, 2018

Hi,

I've pointed out the below several times in the past, so in case it bothers you please just close this issue, but I thought I'd give it another try here...

There are from my POV several options on how to transport meta data from a cross dev tool that produces Apple II files in some foreign filesystem (like Merlin32) to a tool that incorporates the files into a ProDOS / GS/OS filesystem (like Cadius):

  1. Sidecar file (like _FileInformation.txt):
    The lifecycle of the file doesn't match the lifecycle of the main file. It gets forgotten when copying/moving the main file. It doesn't get deleted when the main file is deleted.

  2. Filename extension:
    The name of the file changes when e.g. the load address of a plain BIN file is changed in its source code.

    2.1 Either the extended filename is part of the Makefile, then the Makefile needs to be adjusted. Given that the Cadius-like tool is most likely called from within Makefiles this means that there is no real gain over command line arguments for providing the meta data in the first place.

    2.2 Or the extended filename is not part of the Makefile but replaced with some $(wildcard) "hack". Then it gets pretty ugly when the Makefile is supposed to know the time stamp of that file to construct the right order of re-build commands.

  3. Proprietary foreign filesystem features:
    MacOS HFS+, Windows Alternate Datastreams, Linux XATTR etc. for sure allow to transport the meta data from the dev tool to the Cadius-like tool - if and only if they both run on the same filesystem. But as soon as it comes to sending a file to someone else, moving it to another filesystem, uploading it to some cloud space, putting it into an archive, <...> the meta data is lost.

  4. Proprietary file header:
    The dev tools and the Cadius-like tools should agree on a common format to allow for interoperability.

From my POV the last option is in fact the only really viable option!

So I'd say if we agree on some proprietary file header and ...

  • You implement it both in Merlin32 and Cadius

  • I implement it in cc65

  • I talk to the AppleCommander and Exomizer authors to have it implemented

... then we have for sure the momentum to create a standard and have this issue solved once and for all!

Just my two cents,
Oliver

@mach-kernel
Owner

Hello again! 😄

I've pointed out the below several times in the past, so in case it bothers you please just close this issue, but I thought I'd give it another try here...

Never feel bad to point something out or otherwise suggest an improvement! I agree with what you're saying here, because:

  1. I dislike the sidecar file too. I spent an hour trying to deploy my toy app for the first time because I didn't know you were supposed to use the file to set the type!

  2. The filename suffix is pretty ugly too but it appears to share some standardization with CiderPress. But this is not a good reason nor is it anything that I want to preserve if we can get this to be a uniform standard.

  3. Agree: no portability.

  4. Let's do it!

I would be happy to implement this in Cadius and Merlin! Correct me if I am wrong, but in addition to the file header we would also likely need to add a feature to dump files and strip header (e.g. if you want to transfer a source file for editing)?

@oliverschmidt
Author

Let's do it!

:-))

Correct me if I am wrong, but in addition to the file header we would also likely need to add a feature to dump files and strip header (e.g. if you want to transfer a source file for editing)?

Okay, I see sort of a misunderstanding (?)...

From the perspective of Cadius (and alike) it's all the way from "exporting" a ProDOS file from the ProDOS filesystem to a foreign filesystem and back. There people may have very well different ideas on how they want things. If you look e.g. at CiderPress it asks you on exporting (aka extracting) a ProDOS file if you want to "preserve Apple II formats" or "easy access in Windows". I'm afraid there's no one-size-fits-all-approach here.

However from the perspective of Merlin32 (and alike) it's way simpler. The binary file it generates doesn't serve any purpose at all in the foreign filesystem. The only reasonable thing to do with it is to "import" it into a ProDOS filesystem. Therefore there's no downside in generating a proprietary header in Merlin32 (or alike) that is stripped in Cadius (or alike).

What does that mean from my POV?

  • The header should start with a 4-byte magic number (aka file signature).

  • Dev tools like Merlin32 should (by default) generate the header.

  • Tools like Cadius should when asked to import a file into a ProDOS filesystem check for the header and use it if found.

  • Tools like Cadius likely need several options (like CiderPress) when it comes to exporting a file from a ProDOS filesystem. Creating the header should be one of them (-> CiderPress: "preserve Apple II formats").

  • It may very well be desirable to additionally have some way to remove the header from a file that was already exported from a ProDOS filesystem.

To summarize:

  • Whether such a header is desirable for general file export/import actions is something on which you'll find many opinions. And to make it a standard in this broader sense e.g. the author of CiderPress would need to support it.

  • But for the very narrow use case of Apple binaries created by cross dev tools I think it comes close to a no-brainer.

I'm personally only maintaining a cross dev tool so for me personally both perspectives are identical.

AppleCommander already has two special options for importing files created by cross dev tools. It doesn't have options to preserve meta data on file export. I can't estimate if its author is interested to add support for writing a header on export.

Exomizer is targeting cross dev tools and I'm in contact with the author so I'm pretty sure he will both read and write such a header.

@mach-kernel
Owner

Yes! I think we are saying the same thing 😃. I agree: default behavior should be to create the header unless the user explicitly tells it not to (via some kind of flag or option or similar).

The header should be an easy change to make here and in Merlin. I haven't made any edits to Merlin yet so I'll push it to source control sometime this week and start making the changes. I'm also going to open another issue for targeting block device support, since this is something that would be great to have when writing to our CF cards!

@oliverschmidt
Author

Yes! I think we are saying the same thing 😃.

:-))

So we're left with the task to actually define the header. The cc65 toolchain is focusing on 8-bit development and only generates BIN and SYS files. So from that perspective the 1-byte filetype and the 2-byte auxtype would be all that's needed. However, I presume that you have (much) more needs.

I guess it's best when we come up with some extendable format, something with name (aka ID) - value pairs. I could imagine 2-byte IDs with e.g.

  • $0001 -> ProDOS 8 filetype
  • $0002 -> ProDOS 8 auxtype
  • $0000 -> EndOfHeader

The 2-byte ID is followed by a 2-byte size. This allows a header parser to easily skip IDs it doesn't understand and/or isn't interested in.

After the 2-byte ID and the 2-byte size there's data specific to the ID in a format specific to the ID.

For the ID $0001 the format is always a single byte.
For the ID $0002 the format is always two bytes.

So just presuming for a moment the 4-byte file signature would be A1 B2 C3 D4 e.g. a classic BIN file to be loaded at $803 would have the header:

A1 B2 C3 D4 01 00 01 00 06 02 00 02 00 03 08 00 00
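
The ID/size/data layout proposed above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the signature A1 B2 C3 D4 is the placeholder from the example, the little-endian byte order is inferred from the example bytes, the EndOfHeader record is taken to be a bare 2-byte ID without a size (as the example shows), and the names `build_header` and `parse_header` are hypothetical.

```python
import struct

MAGIC = bytes.fromhex("A1B2C3D4")  # placeholder signature from the example, not a real standard
ID_END, ID_FILETYPE, ID_AUXTYPE = 0x0000, 0x0001, 0x0002


def build_header(filetype: int, auxtype: int) -> bytes:
    """Serialize filetype and auxtype as little-endian ID/size/data records."""
    out = bytearray(MAGIC)
    out += struct.pack("<HHB", ID_FILETYPE, 1, filetype)  # ID, size = 1, one data byte
    out += struct.pack("<HHH", ID_AUXTYPE, 2, auxtype)    # ID, size = 2, two data bytes
    out += struct.pack("<H", ID_END)                      # assumption: terminator is the ID only
    return bytes(out)


def parse_header(blob: bytes):
    """Return ({record_id: raw_bytes}, payload_offset).

    Records with unknown IDs are stepped over via their size field,
    so a reader does not need to understand every ID.
    """
    if blob[:4] != MAGIC:
        raise ValueError("missing file signature")
    pos, fields = 4, {}
    while True:
        (rec_id,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        if rec_id == ID_END:
            return fields, pos
        (size,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        fields[rec_id] = blob[pos:pos + size]
        pos += size
```

With these assumptions, `build_header(0x06, 0x0803)` yields the 17 bytes of the example, and `parse_header` returns the offset at which the actual file content would start.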

Regards,
Oliver

@oliverschmidt
Author

oliverschmidt commented Feb 12, 2018

John pointed out to me that Apple specified the AppleSingle format for similar use cases.

It seems a bit heavy weight, especially as the length of the data fork needs to be placed in its entry descriptor. This might be a problem for tools which want to "stream" the data fork into the file without knowing the final length as they need to rewind to set the length.

Anyhow, for cc65 that's not a problem so if you strongly prefer AppleSingle over a homebrew approach I'm personally fine with that.

@mach-kernel
Owner

The only thing that would make AppleSingle attractive is integration with existing tools (or maybe Apple development tools like old APW / MPW / if there are even any that use it?). I prefer the lighter homebrew solution.

@mach-kernel
Owner

It does appear that there are a fair amount of tools out there that use AppleSingle. I spent some time searching for software and found everything from disk imaging utilities to font conversion software. Netatalk also supports AppleSingle/AppleDouble.

Here's the RFC.
Apple official header spec is here

While I still prefer the lighter (and IMO clearer) format, I think that the legacy support argument wins this one. @oliverschmidt, is that cool with you?

@oliverschmidt
Author

At the time I wrote #7 (comment) I didn't know of AppleSingle.

When I learned about AppleSingle I wondered: Is this just some old spec we can re-use instead of re-inventing the wheel or are there actual synergy effects with existing software supporting AppleSingle? Several packages were linked to so far in this thread but none of them is a free package that would allow one to do something with an AppleSingle file on a non-MacOS machine :-(

Beside the - at least according to my current knowledge - missing synergy effect on Linux/Windows that means that it's hard to do interoperability tests without a MacOS machine. But maybe John is willing/able to help here when we send him AppleSingle files to check.

Apart from all that I see an issue with AppleSingle when compared to #7 (comment). It contains more than it needs / more than it should for our use case - and those things don't seem to be really optional.

Both the PDF I linked and the source file linked say Entry IDs 1, 3, and 8 are typically created for all files. Those IDs are:

  • 1: Data Fork
  • 3: Real Name
  • 8: File Dates Info

Data Fork is obvious but Real Name is an issue. Both from the implementer perspective and the user perspective: A modern tool chain likely consists of an assembler and a linker. The header is rather created by the assembler but the output filename is rather known to the linker. So there's no clean way to put the name of the actual AppleSingle file into the 'Real Name'. And even if there would be a good way to do it the AppleSingle filename might not conform to ProDOS name restrictions (like the 15 char length). So the assembler/linker would need to contain logic to shorten the AppleSingle filename for the Real Name. But I really don't see such logic in those tools. Such logic belongs only to tools like Cadius. So the better alternative would be that the user needs to specify the Real Name "somehow" to the assembler (e.g. as part of the source file). But that may require an extension of the assembler and that may not be what the user expects. Additionally I think that the user has the right to expect that when he renames the AppleSingle file before giving it to Cadius that then the ProDOS file is renamed too.

File Dates Info is to some extent a similar issue. What values should be written by the assembler? Why are they different to the ones of the AppleSingle file? Why should they ever be different? The PDF says "When initially created, a file’s backup time and any unknown entries are set to $80000000 or 0x80000000, the earliest reasonable time." but the best case is that the resulting ProDOS file shows up as <NO DATE> which is a pity given that the AppleSingle file has nice date values.

So what does that mean from my POV? I don't see me generating AppleSingle files with Entry IDs 3 and 8. I see for our use case AppleSingle files with those IDs:

  • 1: Data Fork (obligatory)
  • 11: ProDOS File Info (obligatory)
  • 2: Resource Fork (optional)

https://tools.ietf.org/html/rfc1740 says "Each entry is optional and may or may not appear in the file." so the files as I see them are in general valid AppleSingle files. However, if the existing tools presume ID 3 and 8 to be present then the compatibility we were after doesn't exist. The conclusion could be to rather go for something like #7 (comment) in the first place.

Opinions?

@mach-kernel
Owner

So the assembler/linker would need to contain logic to shorten the AppleSingle filename for the Real Name

Merlin32 has a special mnemonic (DSK) for specifying the output file name, but it is also its own linker. Most modern solutions expose this as two different tools.

Additionally I think that the user has the right to expect that when he renames the AppleSingle file before giving it to Cadius that then the ProDOS file is renamed too.

I suppose we could put the entirety of this burden on the imaging utility. CADIUS can create the name & date fields of the file (e.g. as sourced from the local filesystem) if we feed it raw data (not AppleSingle) while writing it to the ProDOS partition. If there is an existing AppleSingle header we can add a CLI flag to overwrite the filename contained in the header. Does this seem crazy?

@oliverschmidt @JohnMBrooks, are you guys interested in making a Slack or similar for Apple II development so we can chat about this with a slightly faster feedback loop (and possibly more people interested in this standardization discussion)?

@oliverschmidt
Author

I'm sort of waiting for feedback from @JohnMBrooks on my #7 (comment) but feel it's inappropriate to not answer...

If there is an existing AppleSingle header we can add a CLI flag to overwrite the filename contained in the header. Does this seem crazy?

Hm, I'd say it's at least not very anticipation-compliant.

I've given this some more thought in the meanwhile. I'm personally only responsible for cc65, a cross dev tool only generating files with meta data, but not consuming them. I see for cc65 only two viable options:

  1. Some proprietary, simple but extensible header (like the one pointed out above).
  2. AppleSingle with exactly only the Entry IDs (in that order): 11, 1

However, you are working both on Cadius and Merlin32 and thus see both sides of the table. Apart from that I understand that Merlin32 creates Resource Forks (and potentially other meta data) which cc65 doesn't so you may have a different perspective.

I'd primarily like to see cc65 files to be consumable by Cadius. So if you have a strong preference for my 1.) or 2.) then I'll follow that. If you are open for both alternatives then I'd rather opt for 2.) for two reasons:

  • Nobody has to write/host a spec, one can just refer to the RFC ;-)
  • Without the unnecessary/duplicating Entry IDs 3 and 8 AppleSingle is just a little cumbersome (big endian, data fork length field, ...) but no actual issue.
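
For illustration, option 2 (entries 11 and 1 only, in that order) might look roughly like this in Python. It is a sketch based on the AppleSingle layout in RFC 1740 (big-endian fields, a 26-byte fixed header, 12-byte entry descriptors); the function name `build_applesingle` and the default access value of $C3 (destroy, rename, write, read) are assumptions made for the example, not something any of the tools discussed here defines.

```python
import struct

APPLESINGLE_MAGIC = 0x00051600
APPLESINGLE_VERSION = 0x00020000
ENTRY_DATA_FORK = 1
ENTRY_PRODOS_INFO = 11


def build_applesingle(data: bytes, filetype: int, auxtype: int, access: int = 0xC3) -> bytes:
    """Wrap `data` in an AppleSingle file holding entries 11 and 1, in that order."""
    # ProDOS File Info: access (2 bytes), file type (2 bytes), auxiliary type (4 bytes).
    info = struct.pack(">HHI", access, filetype, auxtype)
    header_len = 4 + 4 + 16 + 2 + 2 * 12  # magic, version, filler, count, two descriptors
    info_off = header_len
    data_off = info_off + len(info)
    out = struct.pack(">II16xH", APPLESINGLE_MAGIC, APPLESINGLE_VERSION, 2)
    out += struct.pack(">III", ENTRY_PRODOS_INFO, info_off, len(info))
    out += struct.pack(">III", ENTRY_DATA_FORK, data_off, len(data))
    return out + info + data
```

Note that the data fork length has to be known before the descriptors are written, which is the "cumbersome" part mentioned above for tools that stream their output.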

... making a Slack or similar ...

I'm willing to join but I'm not keen.

... possibly more people ...

More opinions don't necessarily yield better results ;-)

@JohnMBrooks

My preference would be using as lightweight an AppleSingle as possible, and then add fields only if/when they are useful. As Oliver recommended:

1: Data Fork (obligatory)
11: ProDOS File Info (obligatory)

I agree that embedding a ProDOS name (3) or file dates (8) doesn't really help us since the local filesystem metadata will be as-good or better than what we would put into the AppleSingle file anyway, so better to avoid the extra size and complexity.

I agree AppleSingle is heavyweight for what will often amount to 3 metadata bytes (filetype byte & auxtype short). But it also allows resource forks, icons, and finder info if needed.

AppleSingle also has the chance to be compatible with other tools, which makes it a net win IMO.

However, if the existing tools presume ID 3 and 8 to be present then the compatibility we were after doesn't exist.

If this turns out to be a problem, there could be an option to add name & date metadata simply to improve compatibility.

I suppose we could put the entirety of this burden on the imaging utility. CADIUS can create the name & date fields of the file (e.g. as sourced from the local filesystem) if we feed it raw data (not AppleSingle) while writing it to the ProDOS partition.

Yes, or we could require that AppleSingle files with ProDOS metadata must use ProDOS-valid file names, and then report any invalid names as an error so the user can 'fix' naming problems rather than auto-mangling the file name (or an option to auto-mangle vs warn vs error).
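
A name check along those lines could be sketched as below (Python, purely illustrative). The rule used is the ProDOS 8 one: 1 to 15 characters, a letter first, then letters, digits or periods. The `policy` parameter and its three values are hypothetical, and treating "warn" as "mangle, but tell the user" is just one possible reading of the options listed above.

```python
import re

# ProDOS 8 names: 1-15 characters, a letter first, then letters, digits or periods.
_PRODOS_NAME = re.compile(r"[A-Za-z][A-Za-z0-9.]{0,14}")


def is_valid_prodos_name(name: str) -> bool:
    return _PRODOS_NAME.fullmatch(name) is not None


def check_name(name: str, policy: str = "error") -> str:
    """Apply a hypothetical naming policy: 'error', 'warn' or 'mangle'."""
    if is_valid_prodos_name(name):
        return name.upper()  # ProDOS 8 stores names in upper case
    if policy == "error":
        raise ValueError(f"not a valid ProDOS name: {name!r}")
    # Replace every disallowed character with a period, then fix start and length.
    mangled = re.sub(r"[^A-Za-z0-9.]", ".", name)
    if not mangled[:1].isalpha():
        mangled = "A" + mangled
    mangled = mangled[:15].upper()
    if policy == "warn":
        print(f"warning: {name!r} stored as {mangled!r}")
    return mangled
```

With the default policy, an invalid host filename stops the import so the user can rename the file; the other two policies trade that strictness for convenience.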

It would also be very handy if Cadius had an option to convert Ciderpress metadata format to AppleSingle and vice versa. This would allow quick 'native' access to simple ProDOS txt & bin files without having to use a separate AppleSingle extract/pack utility.

@oliverschmidt @JohnMBrooks, are you guys interested in making a Slack or similar for Apple II development so we can chat about this with a slightly faster feedback loop (and possibly more people interested in this standardization discussion)?

Sure. I haven't used Slack, but happy to try it. I primarily use Skype, Google Hangouts & Twitter for dev-team discussions.

-JB
@JBrooksBSI

@mach-kernel
Owner

mach-kernel commented Feb 16, 2018

It would also be very handy if Cadius had an option to convert Ciderpress metadata format to AppleSingle and vice versa. This would allow quick 'native' access to simple ProDOS txt & bin files without having to use a separate AppleSingle extract/pack utility.

Is their metadata format documented somewhere? I searched but didn't find much, wanted to ask before RE-ing it. I agree with this and we can do it 👍

@JohnMBrooks

Is their metadata format documented somewhere? I searched but didn't find much, wanted to ask before RE-ing it. I agree with this and we can do it 👍

It's just the #FFAAAA name suffix that you've already added to Cadius. 🥇

See "Extracting Files":
http://a2ciderpress.com/tutorial/index.htm

If you open the "cpt" folder in Windows Explorer (open My Documents from the desktop or Windows Start menu, then open "cpt"), you will see three new files:

DBL.BESSEL.PIC#062000
SAMPLE.AWP#1ac0fd
sample.text#040000
The junk starting with "#" that was added to the filename is a file attribute preservation sequence. The first two digits are the ProDOS file type, the next four are the ProDOS aux type. For DBL.BESSEL.PIC, it's $06 ("BIN") and $2000 (the typical load location of a hi-res image).
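
Reading that suffix back is straightforward. A minimal Python sketch, assuming exactly two hex digits of file type followed by four hex digits of aux type as described above (the function name `split_preserved_name` is made up for this example):

```python
import re

# Name, '#', two hex digits of file type, four hex digits of aux type.
_SUFFIX = re.compile(r"(?P<name>.+)#(?P<ftype>[0-9A-Fa-f]{2})(?P<aux>[0-9A-Fa-f]{4})")


def split_preserved_name(filename: str):
    """Return (name, filetype, auxtype), or (filename, None, None) without a suffix."""
    m = _SUFFIX.fullmatch(filename)
    if m is None:
        return filename, None, None
    return m["name"], int(m["ftype"], 16), int(m["aux"], 16)
```

For DBL.BESSEL.PIC#062000 this gives the name DBL.BESSEL.PIC with file type $06 and aux type $2000, which are the same three metadata bytes an AppleSingle ProDOS File Info entry would carry.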

-JB

@oliverschmidt
Author

@mach-kernel: What is the current status from your POV? Do we have an agreement on AppleSingle with Entry IDs 11 and 1? Or do you want/need to include more people into your decision making process? Or do you just need some more time to think about it? Or <...>?

@mach-kernel
Owner

mach-kernel commented Feb 18, 2018

Yes, it looks like we have agreement! 👍 Let's use AppleSingle and IDs 11,1. I'll also try to add support to CADIUS for 3,8 in certain situations like described above.

@oliverschmidt
Author

oliverschmidt commented Feb 18, 2018

Yes, it looks like we have agreement! 👍 Let's use AppleSingle and IDs 11,1.

Great :-)

I've attached my first AppleSingle file. It's a BIN file to be loaded at $803. As the name suggests it prints "Hello, World". Please check it out and tell me if it meets your expectations / your understanding of our agreement.

For easier understanding / communication here's the assembler source I created the header from. The input the file needs is __FILETYPE__ (2-byte value with the high-byte ignored by P8), __MAIN_START__ (load address) and __MAIN_LAST__ (last address loaded to).

        .import         __FILETYPE__
        .import         __MAIN_START__, __MAIN_LAST__

; Data Fork
ID01_LENGTH = __MAIN_LAST__ - __MAIN_START__
ID01_OFFSET = ID01 - START

; ProDOS File Info
ID11_LENGTH = ID01 - ID11
ID11_OFFSET = ID11 - START

START:  .byte           $00, $05, $16, $00                  ; Magic number
        .byte           $00, $02, $00, $00                  ; Version number
        .res            16                                  ; Filler
        .byte           0, 2                                ; Number of entries
        .byte           0, 0, 0, 1                          ; Entry ID 1 - Data Fork
        .byte           0, 0, >ID01_OFFSET, <ID01_OFFSET    ; Offset
        .byte           0, 0, >ID01_LENGTH, <ID01_LENGTH    ; Length
        .byte           0, 0, 0, 11                         ; Entry ID 11 - ProDOS File Info
        .byte           0, 0, >ID11_OFFSET, <ID11_OFFSET    ; Offset
        .byte           0, 0, >ID11_LENGTH, <ID11_LENGTH    ; Length
ID11:   .byte           0, %11000011                        ; Access - Destroy, Rename, Write, Read
        .byte           >__FILETYPE__, <__FILETYPE__        ; File Type
        .byte           0, 0                                ; Auxiliary Type high
        .byte           >__MAIN_START__, <__MAIN_START__    ; Auxiliary Type low
ID01:

The header is $3A bytes long. For your convenience here are those $3A bytes from the AppleSingle file attached.

0005160000020000000000000000000000000000000000000002000000010000003a00000b600000000b000000320000000800c3000600000803
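
For anyone who wants to check such a file on the consuming side, here is a rough Python sketch of reading the header back. It follows the layout in the assembler source above (big-endian, 26-byte fixed header, 12-byte entry descriptors); `parse_applesingle` and `prodos_info` are hypothetical helper names and this is not how Cadius itself is implemented.

```python
import struct


def parse_applesingle(blob: bytes) -> dict:
    """Map each entry ID to its raw bytes; raises ValueError if the magic is wrong."""
    magic, version, count = struct.unpack_from(">II16xH", blob, 0)
    if magic != 0x00051600:
        raise ValueError("not an AppleSingle file")
    entries = {}
    for i in range(count):
        # Each descriptor is entry ID, offset and length, 4 bytes each.
        entry_id, offset, length = struct.unpack_from(">III", blob, 26 + 12 * i)
        entries[entry_id] = blob[offset:offset + length]
    return entries


def prodos_info(blob: bytes):
    """Return (access, filetype, auxtype) from entry 11 (ProDOS File Info)."""
    return struct.unpack(">HHI", parse_applesingle(blob)[11])
```

Because the descriptors carry explicit offsets, the reader does not depend on the order in which the entries are listed or stored. Fed the header above followed by the $0B60 bytes of program data, `prodos_info` should report access $C3, file type $06 and aux type $0803.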

@oliverschmidt
Author

In the meantime...

Any progress on the Cadius / Merlin32 side of things?

@mach-kernel
Owner

Not yet, have been very busy with work. I'll try to sit down this weekend and get it done (or at the very least start).

@oliverschmidt
Author

Thanks for the quick intermediate feedback :-) Sorry if I appear to be pushy. I was just curious if we're still on the same page...

@mach-kernel
Owner

@oliverschmidt I started, will open a PR soon. Should be done by this weekend. We're all good! 😸

@mach-kernel
Owner

@oliverschmidt, give #13 a test run when you have some time! I did some basic tests and everything seems OK, but would appreciate an extra set of eyes. 😸

@oliverschmidt
Author

For sure I'm willing to test it with cc65 output! However, it would be nice if you would provide me with a Windows binary to do so.

@mach-kernel
Owner

@oliverschmidt here you go!
cadius.zip

@oliverschmidt
Author

here you go!

Thanks :-)

I did three tests:

  1. Create an AppleSingle BIN file with cc65 and do an ADDFILE. Worked :-)
  2. Create an AppleSingle SYS file with cc65 and do an ADDFILE. Worked :-)
  3. Do an EXTRACTFILE -A and use AppleCommander to add the AppleSingle file to another disk image. Worked :-)

So everything worked just fine out of the box! However, I have one wish: When doing an EXTRACTFILE -A no _FileInformation.txt should be created.

@mach-kernel
Owner

@oliverschmidt consider it done! I'm going to package everything up to get it ready for release.

@oliverschmidt
Author

Thanks for making this happen :-)
