FCC is an interpreter and compiler for ANS Forth. It has a portable C implementation, and a bootstrapping Forth library that runs on top of it.
- Fast - ideally competitive with Gforth.
- Currently the performance is respectable, but it doesn't rival
gforth-fast
.
- Currently the performance is respectable, but it doesn't rival
- Portable to many platforms both rich and embedded.
- It should be portable to most things with a C compiler, and translating the C to assembly by hand is tractable. Most of the heavy lifting is in Forth.
- The library is (barring bugs) flexible regarding the sizes of cells, chars and address units. It does assume cells are big enough for a pointer, so it probably won't work on 8-bit machines.
- Standard - follow the ANS Forth standard.
- Currently the Forth 2012 version, see below for compliance details.
- Interoperable with C libraries, similar to Gforth's technique.
- One of my projects is a Gameboy emulator; that requires SDL.
- No progress on this front so far.
- Unencumbered by the GPL. Gforth being licensed under the GPL and the
bootstrapping nature of Forth conspire to make it tricky, and sometimes
impossible, to release a Forth binary built with Gforth without including its
full source.
- For some of my projects, that is an unacceptable restriction.
- FCC is released under the Apache license 2.0, and therefore does not limit your ability to release FCC, or binaries built with it.
The portable C version is fairly complete. (Standards compliance details below.)
It has been tested in Linux on arm32
and x86_64
, built using gcc
. Your
mileage may vary on other platforms, processors or compilers.
One of my side projects is writing a Forth OS for the
DCPU-16,
the fake 16-bit CPU invented by Notch for the now-defunct game 0x10c
. A
partial attempt to assemble this Forth for it is in dcpu/
.
This section details the compliance of FCC with the Forth 2012 standard
FCC is a Forth-2012 System.
All standard CORE words are implemented.
Providing .(
, .R
, 0<>
, 0>
, 2>R
, 2R>
, 2R@
, :NONAME
, <>
, ?DO
,
ACTION-OF
, AGAIN
, BUFFER:
, C"
, CASE
, COMPILE,
, DEFER
, DEFER!
,
DEFER@
, ENDCASE
, ENDOF
, ERASE
, FALSE
, HEX
, HOLDS
, IS
, MARKER
,
NIP
, OF
, PAD
, PARSE
, PARSE-NAME
, PICK
, REFILL
, RESTORE-INPUT
,
ROLL
, SAVE-INPUT
, SOURCE-ID
, TO
, TRUE
, TUCK
, U.R
, U>
, UNUSED
,
VALUE
, WITHIN
, \
from the Core Extensions word set.
(Everything but S\"
and [COMPILE]
.)
Providing .S
, ?
, DUMP
, and WORDS
from the Tools word set.
(Everything but SEE
.)
Providing AHEAD
, SYNONYM
, N>R
, NR>
, [DEFINED]
, and [UNDEFINED]
from
the Tools Extensions word set.
Providing +FIELD
, BEGIN-STRUCTURE
, CFIELD:
, END-STRUCTURE
, and FIELD:
from the Facility Extensions word set.
As required in Section 4.1.1, and appearing in the same order.
- Cell-aligned addresses are aligned to the pointer size of the host machine (eg. 32 bits on 32-bit machines, 64 bits on 64-bit machines).
EMIT
will try to print whatever you give it; the result depends on the output device, generally the terminal.ACCEPT
uses GNU readline to support editing.- The character set for
EMIT
andKEY
is standard ASCII. - All addresses are character-aligned, since characters and address units are the same size (bytes, 8 bits).
- Spaces (
0x20
,' '
) and tabs (0x09
,'\t'
) are treated as spaces. - The control-flow stack is the data stack during compilation of a definition.
- Digits larger than 35 are not converted and will be treated as the end of the number being parsed.
ACCEPT
echoes the entered text. It does not put the newline character into the input, but it does display a newline.ABORT
is equivalent toQUIT
: it clears the stacks, sets the input source to the user input device, and continues interpreting.- Uses the GNU
readline
library for reading input, so line endings are abstracted. Whatever your terminal accepts, essentially. - Counted strings have a maximum length of 255 characters.
- Parsed strings have a maximum length of 255 characters.
- Definition names have a maximum length of 255 characters.
- No limit on the length of
ENVIRONMENT?
queries. - File names can be given on the command line; they will be loaded in order. Then the input device will remain the user's terminal.
- The only supported output device is the user's terminal. (It can be redirected to a file or other process, if the shell supports that.)
- Dictionary definitions take this form:
- link pointer (points to the previous definition, forming a linked list)
- metadata cell. The name is in the low byte.
0x100
indicates hidden,0x200
indicates an "immediate" word. - pointer to the name
- code field
- An address unit is that of the host machine, generally an 8-bit byte.
- Number representation is that of the host machine, generally 2s complement.
- Ranges of numeric types, where m is the number of bits in a pointer/cell
on the host machine:
n
: -(2^(m-1)) to 2^(m-1) - 1+n
: 0 to 2^(m-1) - 1u
: 0 to 2^m - 1d
: -(2^(2m-1)) to 2^(2m-1) - 1+d
: 0 to 2^(2m-1) - 1ud
: 0 to 2^(2m) - 1
- Writing to any part of data space is permitted. (Though changing eg. the dictionary's link pointers might result in ambiguous conditions.)
WORD
usesHERE
as its buffer; therefore it is large (and very transient).- A cell is the same size as a pointer on the host machine. Generally, an address unit is a byte, so a cell is 4 units on a 32-bit machine and 8 on a 64-bit machine.
- A character is a single address unit (generally a byte).
- The keyboard input buffer is dynamically allocated by
readline
, and is limited only by system RAM. (The parse buffer will truncate the input to 256 characters, however.) - The pictured numeric string area is
HERE
, therefore very large (and very transient). PAD
returns an area of 1024 address units (bytes, usually).- FCC is case-insensitive.
- The prompt looks like
" ok\n> "
. - Uses the host system's division routines. (Usually libc, which is symmetric.)
STATE
is either0
(interpreting) or1
(compiling).- Arithmetic overflow is that of the host system. Generally, wrapping around and not throwing exceptions.
- After a
DOES>
, the current definition is hidden.
General conditions, in the same order as Section 4.1.2.
-
A name is neither a valid definition name nor a valid number during text interpretation (3.4)
- The error message
*** Unrecognized word: xyz
is written to the standard error stream.
- The error message
-
A definition name exceeded the maximum length allowed (3.3.1.2)
- When a definition name is too long (more than 255 bytes) it might be considered immediate or hidden wrongly. Results will be unpredictable.
- TODO: This error case could be checked and reported nicely.
-
Addressing a region not listed in 3.3.3 Data space
- Addressing outside the data space might work normally, might segfault (or similar) or might do something else (bus errors, maybe).
- In other words, memory accesses are native system memory accesses, and trigger system errors.
-
Argument type incompatible with the specified input parameter (3.1)
- Types are not checked, and most types (eg. flags) are cell-sized integers.
- Passing wrong types therefore might result in odd behavior, but not in a checked type error.
-
Attempting to obtain the execution token of a definition with undefined interpretation semantics
- Asking for the
xt
of a word without interpretation semantics generally will return anxt
, but executing it will be an ambiguous condition (probably a segfault).
- Asking for the
-
Dividing by zero
- Dividing by zero will cause a system exception and exit FCC.
- TODO: Catch and handle that more gracefully, probably by
QUIT
ting.
-
Insufficient data-stack space or return-stack space (stack overflow)
- Stack overflows might cause a segfault (or similar) but might also overwrite other memory.
-
Insufficient space for loop-control parameters
- Loop-control parameters are on the return stack, so see above.
-
Insufficient space in the dictionary
- The dictionary headers and data space are the same block. The portable C
version
malloc()
s 4 megabytes by default; overflowing it will probably cause a segfault or similar. - TODO: Make
ALLOT
check this condition and allocate more space when possible.
- The dictionary headers and data space are the same block. The portable C
version
-
Interpreting a word with undefined interpretation semantics
- Interpreting a word with undefined interpretation semantics (like
IF
orWHILE
) will usually consume and/or add junk on the stack, and may read or write memory unpredictably, and therefore may cause a segfault.
- Interpreting a word with undefined interpretation semantics (like
-
Modifying the contents of the input buffer or a string literal (3.3.3.4, 3.3.3.5)
- Modifying the input buffer is not formally supported, but it will work as one would expect: the edited text is what gets parsed. Likewise, editing string literals should work sanely, though it's not formally supported.
-
Overflow of a pictured numeric output string
- Pictured numeric output uses data space after
HERE
, so overflow is unlikely (and described above).
- Pictured numeric output uses data space after
-
Parsed string overflow
- Parsed strings are dynamically allocated by
readline
, so overflow there is usually impossible (other than exhausting the system RAM). A maximum of 256 bytes of each input line are copied to the Forth parse buffer, so no overflow is possible there.
- Parsed strings are dynamically allocated by
-
Producing a result out of range
- Out-of-range results are treated like any arithmetic overflow. Usually that means the result is truncated to fit in a cell.
-
Reading from an empty data stack or return stack (stack underflow)
- Not checked. Might return nonsense results, or might cause a segfault.
-
:
checks that the definition name has nonzero length, and outputs*** Colon definition with no name
to standard error if a 0 length is found.CREATE
and others do not check, and will compile a word with a 0-length name, which can never be executed.- TODO:
CREATE
should check that properly.
- TODO:
Conditions specific to particular words, in the same order as Section 4.1.2.
>IN
greater than size of input buffer (3.4.1)- Should be handled as though it were equal to the parse length, ie. as an empty parse area.
RECURSE
appears afterDOES>
- Unknown. Probably a segfault or infinite loop.
- argument input source different than current input source for
RESTORE-INPUT
RESTORE-INPUT
is not implemented.
- Data space containing definition is de-allocated (3.3.3.2)
- De-allocating space containing definitions will do nothing in the near term - the latest-definition pointer is unchanged, and the definitions still exist. However, when anything is written to data space, including a new definition, the dictionary's linked list will be broken, with unpredictable results (probably an infinite loop, but maybe not).
- Data space read/write with incorrect alignment (3.3.3.1)
- Reading and writing from unaligned addresses might work normally, but might also cause a bus error or other system error. It depends on the operating system and architecture.
- Data-space pointer not properly aligned for
C,
or,
- A misaligned data pointer in
,
,C,
etc. is the same as above. This condition is not checked. - TODO:
,
could check this and give a good error message.
- A misaligned data pointer in
- Less than u+2 stack items with
PICK
andROLL
PICK
andROLL
with too few stack items is likely to cause a segmentation fault (see stack underflow), but might read/write other memory at random.
- Loop-control parameters not available (
+LOOP
,I
,J
,LEAVE
,LOOP
,UNLOOP
)- Loop-control parameters go on the return stack at runtime. If that stack has changed, then both the loop and returning will be broken unpredictably.
- Most recent definition does not have a name (
IMMEDIATE
)IMMEDIATE
works whether the previous definition has a name or not.
TO
not followed directly by a name defined by a word with "TO
name runtime" semantics (VALUE
,(LOCAL)
)TO
without a following name will emit an error message, do nothing, and continue.
- name not found (
'
,POSTPONE
,[']
,[COMPILE]
)- An
xt
of0
is silently returned. That will segfault if passed toEXECUTE
or compiled into a definition which is later executed. - (
[COMPILE]
is obsolete and not implemented.) - TODO: This condition is easily checked.
- An
- Parameters are not of the same type (
DO
,?DO
,WITHIN
)- Parameter mismatch to
DO
,DO?
, andWITHIN
is ambiguous. It's unchecked, and unknown what might happen.
- Parameter mismatch to
POSTPONE
,'
or[']
is applied toTO
- Attempting to get the
xt
ofTO
(with'
,POSTPONE
or[']
) should succeed silently. Executing thatxt
is ambiguous (probably segfault).
- Attempting to get the
- String longer than a counted string returned by
WORD
- Lengths longer than 255 will be reduced modulo 256.
- u greater than or equal to the number of bits in a cell (
LSHIFT
,RSHIFT
)- Shifting by more than the width of a word will probably result in 0, harmlessly.
- That might not be true on architectures that handle arithmetic overflows with errors rather than truncating to fit in a cell.
- word not defined via
CREATE
in>BODY
,>DOES
>BODY
for words not defined byCREATE
will return the address of the second cell after their code field. Harmless, but not useful in general.DOES>
for words not defined byCREATE
will overwrite the cell 2 cells after the code field, which might be any other data, like the start of the next word.
- Pictured numeric output words improperly used outside
<#
and#>
- Unpredicable in general. Usually, this will mangle the top few cells on
the stack, and write into data space after
HERE
.
- Unpredicable in general. Usually, this will mangle the top few cells on
the stack, and write into data space after
- Access to a deferrred word which has yet to be assigned to an xt
- Attempts to deference
0
, causing a segfault.
- Attempts to deference
- Accessing a non-deferred word like it was deferred
- Generally will treat the word like a variable. That is, the first cell after its code field will be read and written.
- TODO This could be checked, probably.
POSTPONE
,'
or[']
toACTION-OF
orIS
- Will probably work normally.
ACTION-OF
andIS
are pretty ordinary (immediate) words.
- Will probably work normally.
\x
is not followed by two hexadecimal characters (S\"
)S\"
is not supported.
- a
\
is placed before any character other than those defined inS\"
S\"
is not supported.
- There are no nonstandard words that use
PAD
. It belongs to the user. - The terminal input is read with
readline
, so line editing it supported nicely.- History is not supported, however.
- TODO: History would be handy.
- Data/dictionary space is allocated as 2^12 address units (usually 4 megabytes)
by default.
- (That's roughly 1 million cells on a 32-bit machine, or half a million cells on a 64-bit machine.)
- More can be allocated with the nonstandard phrase
nnn (ALLOCATE) (>HERE) !
if desired (wherennn
is some number of address units).
- The return stack holds 1024 cells.
- The data stack holds 16,384 (16K) cells.
- The system uses less than 2000 cells of data space currently.