-
It was unexpected to me too that Forth-94 does not provide a portable way to read or write octets (not only in a file, but in memory as well). You can only detect the character size and use a different custom implementation depending on it. Characters were intentionally made independent of octets; ditto for address units. An API to read and write octets is possible, but none was designed. Forth-2012 does not provide any means for octet-oriented I/O either. A character is at least one octet in size and may be larger. In the next version of the standard a character is always one address unit, but an address unit can still be more than one octet. In some Forth implementations 1 chars = 1 cells = 1 address unit, which is 4 octets (for example, jsForth). If a program assumes that a character is 1 octet in size, then it has an environmental dependency.
The notion of a primitive character was introduced in Forth-2012, but there are still a number of lacunae and inconsistencies.
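Detecting the character size and the address-unit size, as described above, might look roughly like the following. This is only a sketch: the word names AU-BITS, CHAR=OCTET? and REQUIRE-OCTET-CHARS are made up for illustration, and it assumes the system answers the standard ADDRESS-UNIT-BITS environment query.

```forth
\ Size of one address unit in bits, taken from the standard
\ environment query. Aborts if the system does not answer it.
: AU-BITS ( -- n )
  S" ADDRESS-UNIT-BITS" ENVIRONMENT? 0=
  ABORT" ADDRESS-UNIT-BITS is not known on this system" ;

\ True when one character occupies exactly one 8-bit address unit.
: CHAR=OCTET? ( -- flag )
  1 CHARS 1 =  AU-BITS 8 =  AND ;

\ Hypothetical guard: state the environmental dependency explicitly
\ instead of silently assuming that 1 CHARS is 1 octet.
: REQUIRE-OCTET-CHARS ( -- )
  CHAR=OCTET? 0= ABORT" this program assumes 1 CHARS = 1 octet" ;
```

A program that treats C@ and C! as octet accessors could call REQUIRE-OCTET-CHARS once at load time, so that the dependency fails loudly on a system such as the 4-octet case mentioned above.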
-
I was half expecting to see Anyway, I see how cumbersome it can be to solve when the address unit is greater than 8 bits.
I had considered a new mode too and it probably would be easiest; just figured
Guess that is the big question. Would anyone else see this as useful enough to implement?
I see now how that makes sense. @ruv Thank you for your insight and time.
-
In Forth 2012 draft 19.1, the section 11 File Access words, such as READ-FILE and WRITE-FILE, talk about reading and writing characters. I suspect the text was meant to talk in terms of bytes (octets), since there are no other words in section 11 to read or write bytes. Referring to "characters" could imply UTF-8 or ASCII (a UTF-8 subset), which involves additional considerations.
The READ-LINE and WRITE-LINE words also refer to characters instead of bytes. The size of a buffer in bytes can be very different from its size counted in UTF-8 characters, and multibyte UTF-8 sequences have to be handled as well.
Given that the File Access words mirror similar functions of C stream I/O or POSIX file I/O, both of which operate with bytes in mind, should the terminology used by section 11 be rephrased? I suspect the reference to "characters" is legacy, but with UTF-8 support the distinction between characters and bytes needs to be clearer.
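To illustrate how far a byte count and a UTF-8 character count can drift apart, here is a minimal sketch that counts code points in a buffer by skipping continuation bytes. The name UTF8-LENGTH is hypothetical, and the code assumes that 1 CHARS is 1 octet (itself an environmental dependency), that BASE is decimal, and that the buffer holds well-formed UTF-8.

```forth
DECIMAL

\ Count UTF-8 code points in a buffer of u octets.
\ ASSUMPTION: 1 CHARS = 1 octet, so C@ fetches exactly one octet.
\ Continuation bytes have the bit pattern 10xxxxxx and are not counted.
: UTF8-LENGTH ( c-addr u -- n )
  0 ROT ROT                \ n c-addr u
  OVER + SWAP              \ n end start
  ?DO
    I C@ 192 AND 128 <>    \ true unless this octet is a continuation byte
    IF 1+ THEN
  LOOP ;
```

For pure ASCII the result equals the octet count that READ-FILE reports; as soon as the data contains multibyte sequences the two numbers differ, which is why the "characters" wording in section 11 is ambiguous.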