lpForth
A Linux Forth
This page contains materials I wrote for the first and second releases as well as the current one. While most of the statements apply to the current release, a few are no longer valid for the current version. If you only care about the new material, scroll down to the end of the page. If you are not sure about something, try it out or look at the source; the source is the ultimate documentation. You can also write to me, and I will see what I can do.
Readme.txt from first release
The purpose of releasing this alpha version of my Forth is to get some help with further development and to invite people who are interested in this project to join. I set up some goals to achieve. Some of them have been accomplished, but there is still a lot of work to be done. I hope people here can help me improve this Forth, or, even better, join the effort. I hope someday this system will become useful to somebody. For now, it is just fun for me to do it.
The goals I am trying to accomplish in this Forth implementation are as follows -
. A Forth system running under the Linux OS.
. Be able to call functions (procedures) in dynamically linked libraries (.so files).
. Practice metacompiling to move from one platform to another platform.
. Follow the eForth spirit of keeping low-level words as few as possible.
. Keep the system simple and easy to understand first; worry about speed and elegance later.
. Include utility words to make the system easy to explore.
. Include a debugger to help system and application development.
. Include an assembler for metacompiling and for improving speed.
. Include a metacompiler to complete the development cycle.
x Conform to ANS Forth requirements.
x Include floating point math.
x Hopefully, be able to do X Window programming later.
So far, I have been able to pirate from various sources and put them together to accomplish the goals listed under ".". The ones under "x" are those I hope you can help me with, so I can learn how to do them. Because I was trying to complete the full development cycle as soon as possible, there are many places that can be vastly improved. Sometimes I just worked around a problem without giving it a proper solution. Those are also places where I hope you can help me out. I will list the questions I can think of, but any kind of suggestion is welcome, including style, concepts, typos, etc.
1. How can I call a .so file without help from a C wrapper? To call a function in a .so file, the "dlopen" and "dlsym" functions are needed to get the starting address of the given function. Once the address is obtained, a call can be issued. Most of the time, the parameters are passed on the stack, but in reverse order. Calling "dlopen" and "dlsym" before we have a mechanism to call functions in "libc.so" became a problem for me. To get around this, I stole the concept of a C wrapper from Win32Forth and Marcel Hendrix's implementation of Linux eForth. By calling into the C wrapper, "dlopen" and "dlsym" become available. This is about the only place where the C wrapper is necessary. If Forth can do what "dlopen" and "dlsym" do, then the C wrapper is no longer needed. The problem is how to do "dlopen" and "dlsym" in Forth. I tried to look at the code of these two functions in the C library (glibc) source code, but it was too tough for me. Can somebody help? Or are there better methods to accomplish this goal?
2. How are "double" numbers passed by "gcc"? When I wrote the floating point math for this Forth system, I took the lazy way and used the equivalent functions available in the C libraries. A floating point number (double type) is a 64-bit number in this case. I don't know how "gcc" passes the parameters to the math functions or how it passes the result back. Hence, I had to implement each math function as a new C function, written so that I can pass the parameters from the stack and get the result back onto the stack. I hope someone can explain to me how "gcc" passes doubles into and out of math functions.
3. What happened to the Tcl/Tk libraries in the Red Hat Linux 4.2 distribution? In an attempt to add windowing ability to this Forth system, I tried to use the Tcl/Tk libraries. When I attempted to load the libraries with "dlopen", Linux complained about many unresolved symbols. I also tried the Windows version of the Tcl/Tk DLL files with Win32Forth, and it worked just fine. I wonder why the Linux version of the Tcl/Tk .so files cannot be loaded just like "libc.so" or the Windows DLL files. Does anyone have a clue what is happening here?
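For readers unfamiliar with the two calls in question 1, here is a minimal C sketch of roughly what the wrapper has to provide: open a shared library by name, look up a symbol, and hand back an address the Forth side can call. The name `wrapper_lookup` is hypothetical and is not lpForth's actual wrapper interface. Passing NULL as the library name asks for the objects already loaded into the process (the program and libc), which keeps the example independent of library file names. On glibc older than 2.34 the program must be linked with `-ldl`.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper entry point: return the address of `symbol` in
   `library`, or NULL on failure. A NULL `library` searches the objects
   already loaded into the process (the program and libc). */
static void *wrapper_lookup(const char *library, const char *symbol)
{
    assert(symbol != NULL);
    void *handle = dlopen(library, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }
    dlerror();                        /* clear any stale error state */
    void *addr = dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym: %s\n", err);
        return NULL;
    }
    return addr;                      /* handle stays open so addr stays valid */
}
```

The Forth side would then push the arguments and call the returned address; only this lookup step needs `dlopen` and `dlsym`.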
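On question 2: under the 32-bit i386 System V calling convention that gcc uses on Linux, a `double` argument occupies two consecutive 32-bit stack slots (low-order word at the lower address), and a `double` result comes back in the x87 register st(0) rather than in eax. On x86-64, doubles are passed in xmm0 to xmm7 and returned in xmm0 instead. The sketch below covers only the part a 32-bit Forth has to handle itself: splitting a 64-bit double into two 32-bit cells and joining them again. The names `df_split` and `df_join` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Split a double into two 32-bit cells. cells[0] is the low-order word,
   which on little-endian i386 is the one at the lower address. */
static void df_split(double d, uint32_t cells[2])
{
    uint64_t bits;
    assert(cells != NULL);
    memcpy(&bits, &d, sizeof bits);   /* reinterpret without aliasing problems */
    cells[0] = (uint32_t)(bits & 0xFFFFFFFFu);
    cells[1] = (uint32_t)(bits >> 32);
}

/* Rebuild the double from its two cells. */
static double df_join(const uint32_t cells[2])
{
    uint64_t bits = ((uint64_t)cells[1] << 32) | cells[0];
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

For example, 1.0 is the bit pattern 0x3FF0000000000000, so it splits into the cells 0x00000000 (low) and 0x3FF00000 (high).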
The History
This project started as an attempt to implement eForth without using MASM. I just wanted to implement a Forth and then experiment with some ideas. I didn't use MASM because I didn't have it at hand; the assemblers I have are the ones that come with F-PC and Win32Forth. I know this defies the original idea of eForth, but I just wanted to try writing a metacompiler. I used eForth as a model because it is simple.
My first implementation was an eForth produced by F-PC. It ran under DOS. Later, as I gained more and more interest in Linux, I thought it might be fun to port this implementation to Linux. Because Linux is a 32-bit OS, I first ported this Forth system to Win32Forth and made it a 32-bit system. At first, I made it conform to the ELF file format and run as a stand-alone program. When I tried to add the ability to call C functions, I could not help but use a C wrapper, which is also used in Win32Forth. The Forth became an image that is loaded after the C wrapper starts running.
At this point, I had a Forth running in Linux. It can call the C libraries available in Linux if they are in .so file format. Because so many C libraries are available, it became pretty easy to add more and more functionality to this Forth system. I tried to add the memory word set, the file word set, and the floating point word set. Although it is not complete yet, it started to become a useful system for me. I ported my filedump utility, which Tom Zimmer put in Win32Forth, to the system without a lot of trouble. This is where this Forth system stands now. I hope that, by releasing it to the public, it can be nurtured by everyone who is interested and someday become useful to some people. At least, I hope I can learn lessons from the many people out there.
The Mechanism
Metacompiling
In order to make a target system, I first allocate a block of memory for the target. When the metacompiler runs, it puts the target code in this target memory. After the whole system is done, the target system is saved as a file. In DOS, it is a .com file that conforms to the .com file format. In Linux, it follows the ELF file format. When it is an image file loaded by the C wrapper, the format is not important as long as the C wrapper can find the entry point. There are metacompiling examples in F-PC and Win32Forth, but they are kind of hard to understand. I decided to do it my own way so I would have full control over it.
I used the word "tname" to do the target compiling for the low-level words. "tname" is a defining word.
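The target-memory idea can be sketched in C as a byte buffer with its own "here" offset. `t_c_comma` and `t_comma` play the role of the `t_C,` and `t_,` words used in the metacompiler definitions, and `t_save` writes the finished image to a file. All names and the fixed 64 KB size are assumptions for illustration. Cells are stored little-endian as on the i386 target, and the load base address and file format (ELF, .com, or raw image) are ignored here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define TARGET_SIZE 65536u             /* assumed size of the target area */

static uint8_t  target[TARGET_SIZE];   /* image being built */
static uint32_t t_here_off = 0;        /* next free offset, like t_here */

/* Append one byte to the target image (compare t_C,). */
static void t_c_comma(uint8_t b)
{
    assert(t_here_off < TARGET_SIZE);
    target[t_here_off++] = b;
}

/* Append one 32-bit cell, least significant byte first (compare t_,). */
static void t_comma(uint32_t cell)
{
    for (int i = 0; i < 4; i++)
        t_c_comma((uint8_t)(cell >> (8 * i)));
}

/* Write the used part of the target area to `path`; 0 on success. */
static int t_save(const char *path)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t n = fwrite(target, 1, t_here_off, f);
    int rc = (n == t_here_off) ? 0 : -1;
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```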
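: tname ( name | -- ) \ make link field, fill name, keep cfa
  check-stack
  >in @            \ save the input position
  bl word "head    \ parse the name and build the target header
  >in !            \ rewind so CREATE sees the same name again
  t_here           \ target CFA of the new word
  create ,         \ host word keeps the target CFA
  does> @ t_, ;    \ executing it compiles that CFA into the target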
example: tname abc
When "tname abc" is executed, it creates abc both as a word in the host Forth dictionary and as a word in the target memory area, and keeps the current target "here" value (t_here) in the host word's parameter field. That t_here value will be the CFA of the target word abc. Later, when the host word abc itself is executed, it simply compiles that CFA into the target area. The example for "dup" is as follows -
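tname dup
code l_dup
begin-cdef
  mov ebx, esp      \ ebx points to the top of the data stack
  push 0 [ebx]      \ push a copy of the top item
  lodsd             \ next: fetch the following CFA into eax
  jmp eax           \ and jump to it
c; end-cdef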
"tname" makes a new word "dup" in the Forth dictionary we are using. Meanwhile, it sets up a word "dup" in the target memory area and stores the target word's CFA in the parameter field of the new host "dup". At this point, we have three "dup"s: the old "dup", which duplicates the top of the stack; the new "dup", which compiles the CFA of the target "dup" into the target memory area; and the third "dup", which stays in the target memory area. The third "dup" will not be executed until the target system is finished and runs as an independent program. I then make a code word "l_dup" with the definition of "dup" that we want to put on the target system, using the assembler in Win32Forth. The code of "l_dup" stays in the current Forth system. To move the code to the target system, I use "begin-cdef" to mark the beginning of the code and "end-cdef" to copy the code to the target area. At this point, I have an "l_dup" in the current system and a "dup" in the target system, whose definitions are the same. The definition of "l_dup" can be different from the "dup" in the current system if you wish.
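The three-"dup" mechanism can be modelled in a few lines of C: `tname` records the target "here" under a name, `begin_cdef`/`end_cdef` copy whatever the host assembler emitted in between to that target address, and executing the host entry later compiles the recorded CFA (the `does> @ t_,` part). This is a simplified model under stated assumptions, not lpForth code: all names are hypothetical, the target header built by `"head` is left out, and the byte array holds one valid 32-bit x86 encoding of the four `l_dup` instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint8_t target[256];        /* target memory (index = target address) */
static size_t  t_here;             /* next free target byte */
static uint8_t host_code[256];     /* where the host assembler emits code */
static size_t  host_here;
static size_t  cdef_start;         /* recorded by begin_cdef */

struct shadow { const char *name; uint32_t cfa; };   /* the "new dup" */
static struct shadow shadows[32];
static int n_shadows;

/* tname: create a host entry holding the current target here (its CFA). */
static void tname(const char *name)
{
    assert(n_shadows < 32);
    shadows[n_shadows].name = name;
    shadows[n_shadows].cfa = (uint32_t)t_here;
    n_shadows++;
}

/* Stand-in for the host assembler appending machine code. */
static void host_emit(const uint8_t *bytes, size_t n)
{
    assert(host_here + n <= sizeof host_code);
    memcpy(host_code + host_here, bytes, n);
    host_here += n;
}

static void begin_cdef(void) { cdef_start = host_here; }

/* end-cdef: copy the code assembled since begin_cdef to target here. */
static void end_cdef(void)
{
    size_t n = host_here - cdef_start;
    assert(t_here + n <= sizeof target);
    memcpy(target + t_here, host_code + cdef_start, n);
    t_here += n;
}

/* Executing a host entry: compile its target CFA as a little-endian cell. */
static void run_shadow(const char *name)
{
    for (int i = n_shadows - 1; i >= 0; i--) {       /* newest entry wins */
        if (strcmp(shadows[i].name, name) == 0) {
            uint32_t cfa = shadows[i].cfa;
            assert(t_here + 4 <= sizeof target);
            for (int b = 0; b < 4; b++)
                target[t_here++] = (uint8_t)(cfa >> (8 * b));
            return;
        }
    }
    assert(!"undefined target word");
}

/* mov ebx,esp / push dword [ebx] / lodsd / jmp eax */
static const uint8_t l_dup_code[] = { 0x89, 0xE3, 0xFF, 0x33, 0xAD, 0xFF, 0xE0 };
```

Copying the bytes verbatim works here only because `l_dup` contains no relative jumps or absolute addresses pointing back into the host system.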
For the high level words, I use the word "t_:".
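: t_: ( name | -- ) \ make link field, fill name, keep cfa
  check-stack
  >in @                        \ save the input position
  bl word "head                \ parse the name and build the target header
  >in !                        \ rewind so CREATE sees the same name again
  t_here                       \ target CFA of the new word
  create ,                     \ host word keeps the target CFA
  232 t_C,                     \ 232 = $E8, the x86 "call rel32" opcode
  dolist-t t_HERE cell+ - t_,  \ displacement to dolist, relative to the end of the call
  does> @ t_, ;                \ executing it compiles that CFA into the target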
"t_:" works in a way similar to "tname", except that the CFA of the target word holds a segment of code ("call dolist") in front of the list of CFAs that we will fill in later. To define "2drop", we use code like this -
t_: 2drop drop drop t_;
"t_:" sets up a "2drop" in the current system and another "2drop" in the target system. Again, we have three "2drop"s at this point: the old "2drop", which drops two integers from the stack; the new "2drop", which compiles the CFA of the target "2drop" into the target memory area; and the third "2drop", which stays in the target memory area. The first "drop" puts the CFA of the target "drop" in the parameter field of the target "2drop", following "call dolist". The second "drop" does the same thing, but this time the CFA of the target "drop" follows the CFA compiled by the first "drop". "t_;" compiles the CFA of the target "exit" into the list. So the whole thing looks like a compiler, but it actually stays in interpret mode all the time.
This is the mechanism of my metacompiler. It is not elegant, but it works. After I wrote my metacompiler my own way, the metacompilers in F-PC and Win32Forth started to make sense to me. It seems that writing a metacompiler is a way to understand another one.
There are a lot of other things that need some explaining. I hope you can figure them out from the source code and the limited comments. I welcome any questions about this system. Please contact me at jpai@rocketmail.com, and I will try to answer. If a question is of general interest, I will put the answer here, too. If there is interest, I will add more material explaining how I did it.
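The bytes that `t_:` lays down can be checked with a little arithmetic. 232 is 0xE8, the x86 `call rel32` opcode, and the displacement stored after it is `dolist - (address of displacement + 4)`, i.e. relative to the end of the 5-byte call instruction, which is what `t_HERE cell+ -` computes. The C sketch below builds the same layout for `2drop` in a byte array; the function names are hypothetical and the addresses used for `dolist`, `drop`, and `exit` are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t  mem[512];        /* target area; index == target address */
static uint32_t t_here;

static void t_c_comma(uint8_t b) { assert(t_here < sizeof mem); mem[t_here++] = b; }

/* Append a little-endian 32-bit cell (compare t_,). */
static void t_comma(uint32_t v)
{
    for (int i = 0; i < 4; i++)
        t_c_comma((uint8_t)(v >> (8 * i)));
}

/* Read a target cell back. */
static uint32_t t_fetch(uint32_t addr)
{
    return (uint32_t)mem[addr] | (uint32_t)mem[addr + 1] << 8 |
           (uint32_t)mem[addr + 2] << 16 | (uint32_t)mem[addr + 3] << 24;
}

/* t_: -- lay down "call dolist" and return the new word's CFA. */
static uint32_t t_colon(uint32_t dolist)
{
    uint32_t cfa = t_here;
    t_c_comma(0xE8);                    /* call rel32 */
    t_comma(dolist - (t_here + 4));     /* wraps mod 2^32 for backward calls */
    return cfa;
}
```

Compiling `t_: 2drop drop drop t_;` at address 0x100 then yields E8 and a displacement at 0x100 to 0x104, followed by the cells drop, drop, exit at 0x105, 0x109, and 0x10D.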
The Documentation
After unpacking the file, you can type "lpforth" to start the system. The C wrapper then loads "lpforth.img", which is the image file of the Forth system. You will get an "ok" prompt, and you can start doing what you normally do in Forth. The system is generally case-insensitive. Not all of the ANS Forth words are available; this is also a part where I hope you can join and contribute. Some of the words are very useful for exploring the system.
"words" - To see the words in the system, with substring matching. You can put a substring after words: "words move" lists only the words with the substring "move" in them. "words" alone lists all the words in the context.
"ll" - To see the source code of a Forth word. "ll words" shows the source code of "words". It also shows which file the source code is in, so if you need to change the source, you know which file to edit.
"see" - To see the definition of a high-level word by decompiling it. For each item, it shows the address and then the name if one can be found; otherwise the content is shown as it is. It is a good tool to peek inside this Forth system. Many under-the-hood operations can be revealed by this tool, such as immediate words.
"dump" - To dump a segment of memory.
".libs" - To show the libraries that have been loaded.
".procs" - To show the procedures that have been used or set.
"debug" - To trace a word's execution. For example, after typing "debug words", next time you execute "words", debugger will stop at each step.
To make a C function call, first load the library by typing "library <library-file-name>". For example, "library libm.so" loads the math library. In interpret mode, you can then issue the call as "<n1> <n2> .. <nn> n ncall <proc-name>". For example, if you type -
z" taygeta.com" 1 ncall gethostbyname
you will get a pointer to the hostent structure for taygeta.com. The 1 here indicates that only one item on the stack is input for "gethostbyname". Because a C function call in Linux does not remove its input items from the stack, we have to tell "ncall" explicitly how many items to clear after the call is done.
If you plan to use C functions in compiling mode, you first have to declare the ones you want to use with "set-proc". "set-proc gethostbyname" prepares the C function "gethostbyname" so that you can place it in the definition of a high-level word. The only reason for this is that it was easier to implement. Does anyone want to improve this?
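The stack bookkeeping behind `ncall` can be sketched as: pop n cells, pass them to the C function, then push its return value, since with the C calling convention the caller, not the callee, removes the arguments. This is a hypothetical model, not lpForth's implementation. It handles only 0 to 2 arguments, assumes the first cell pushed becomes the first C argument, and calls through function-pointer types whose parameters are all `intptr_t` cells, so it is valid only for functions actually defined with that signature (the two example functions are illustrative).

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t cell;

static cell stack[64];     /* Forth-style data stack */
static int  sp;            /* number of items on the stack */

static void push(cell x) { assert(sp < 64); stack[sp++] = x; }
static cell pop(void)    { assert(sp > 0);  return stack[--sp]; }

typedef cell (*fn0)(void);
typedef cell (*fn1)(cell);
typedef cell (*fn2)(cell, cell);

/* "<n1> .. <nn> n ncall": consume n arguments, push the result. */
static void ncall(void (*fn)(void), int n)
{
    cell a[2] = { 0, 0 };
    assert(n >= 0 && n <= 2 && sp >= n);
    for (int i = n - 1; i >= 0; i--)
        a[i] = pop();                  /* first pushed ends up in a[0] */
    cell r = 0;
    switch (n) {
    case 0: r = ((fn0)fn)();           break;
    case 1: r = ((fn1)fn)(a[0]);       break;
    case 2: r = ((fn2)fn)(a[0], a[1]); break;
    }
    push(r);
}

/* Example C functions with cell-sized parameters. */
static cell c_negate(cell x)      { return -x; }
static cell c_sub(cell x, cell y) { return x - y; }
```

With this model, `10 3 2 ncall c_sub` would leave 7 on the stack and nothing else.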
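One simple way to get the effect of `set-proc` is to resolve each name once, keep the address in a table, and let compiled code refer to the table entry; a table like this is also what a listing word such as `.procs` could walk. The sketch below is a hypothetical C model, not lpForth's code. It calls `dlsym` on the process-wide handle returned by `dlopen(NULL, ...)`, and glibc older than 2.34 needs `-ldl`.

```c
#include <assert.h>
#include <dlfcn.h>
#include <string.h>

#define MAX_PROCS 128

struct proc {
    char  name[64];
    void *addr;            /* resolved once, reused by compiled code */
};

static struct proc procs[MAX_PROCS];
static int         n_procs;

/* set-proc: return the table entry for `name`, resolving it on first use. */
static struct proc *set_proc(const char *name)
{
    static void *self;                 /* handle for the whole process */
    assert(name != NULL);
    for (int i = 0; i < n_procs; i++)
        if (strcmp(procs[i].name, name) == 0)
            return &procs[i];          /* already declared */
    if (n_procs == MAX_PROCS || strlen(name) >= sizeof procs[0].name)
        return NULL;
    if (self == NULL && (self = dlopen(NULL, RTLD_NOW)) == NULL)
        return NULL;
    void *addr = dlsym(self, name);
    if (addr == NULL)
        return NULL;                   /* unknown symbol: nothing recorded */
    struct proc *p = &procs[n_procs++];
    strcpy(p->name, name);
    p->addr = addr;
    return p;
}
```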
To metacompile the system, you can do it from Win32Forth or from lpforth itself. Before I had a metacompiler for lpforth, I used Win32Forth to do the metacompiling. In Win32Forth, type "fload lpforth" and it will produce an image file called "pforth.img". You can then switch to Linux and type "lpforth" to run lpforth. This version of lpforth is minimal; you can type "fload kernel.f" to build a complete system. In the beginning, I had two computers, one running Windows 95 and the other running Linux, and I used floppies to transfer "lpforth.img" back and forth. Later, I ran WINE in Linux so I could run Win32Forth in the Linux environment. It worked pretty well, actually.
After the system became more sophisticated, I ported the metacompiler to lpforth itself. To metacompile lpforth after you make your modifications, type "fload meta.f" to get a new minimal "lpforth.img" file. You can then quit lpforth and start it again. This time, you will get a minimal lpforth. You can type "fload mkernel.f" to get a complete lpforth. I recommend that you back up the old "lpforth.img" file before you do any metacompiling, in case something serious happens.
When you add your application on top of lpforth system, you can use "simage pforth.img" to save a new lpforth image which includes your application.
Like I said, if there is need, I will put more materials here later.
For now, that is it. Hope this will be interesting to you, too.
References:
F83
F-PC
Win32Forth
eForth
Linux eForth
Update since first release
Since the last release of lpForth, I have added some features and fixed some bugs. The original goal is still the same: to have a Forth running on a PC Linux box with the ability to call dynamically linked libraries. lpForth depends on the C wrapper for only a few functions. If someone can find a way to code dlopen and dlsym in Forth, the C wrapper will become unnecessary. The major changes are as follows:
. Window programming. With the ability to call .so files, it is possible to do window programming in lpForth. Showing a window is not a big problem; managing callbacks is more of an issue. I include three examples in this release to show three ways of programming with windows. The first one (hello.f) uses the X11 library directly. No callback is implemented yet in this case; the example just shows that you can display a window from lpForth. The second example uses Tcl/Tk to display a window. Callbacks have been implemented to communicate between Tcl/Tk and lpForth. This example (tkman.f) is also handy when you need to look up the manual for functions. The third example (ghello.f) uses GTK as a vehicle to display a window. Callbacks have been implemented so you can code reactions in lpForth when certain events occur. All these examples are still very crude; the purpose is just to show that you can do window programming with lpForth.
. The floating point math is now more complete. With a few low-level words to access the floating point stack, it became relatively easy to code the floating point words without the C wrapper.
. Stack comments have been added in meta.f file.
. lpForth no longer takes up much CPU time when it is idle.
. "fsave" is implemented to save the system as another name. Usage: "fsave <name>"
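The callback difficulty mentioned above comes from the fact that a C library can only call a C-callable address. A common workaround, shown here as a hypothetical C sketch rather than what lpForth does, is a single fixed C function registered with the library that receives a small integer through the library's user-data pointer and dispatches to the handler stored under that index (in a Forth system the table would hold execution tokens). GTK signal handlers, for instance, receive such a user-data pointer; `fake_library_emit` merely stands in for the toolkit firing an event.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*handler_fn)(int event);

#define MAX_HANDLERS 16
static handler_fn handlers[MAX_HANDLERS];   /* in Forth: execution tokens */
static int        n_handlers;

/* Register a handler; the returned index is what we pass as user data. */
static intptr_t register_handler(handler_fn h)
{
    assert(n_handlers < MAX_HANDLERS);
    handlers[n_handlers] = h;
    return n_handlers++;
}

/* The one C-callable trampoline handed to the library. */
static void trampoline(int event, void *user_data)
{
    intptr_t idx = (intptr_t)user_data;
    assert(idx >= 0 && idx < n_handlers);
    handlers[idx](event);
}

/* Stand-in for a library that stores (callback, user_data) and fires it. */
static void fake_library_emit(void (*cb)(int, void *), void *user_data, int event)
{
    cb(event, user_data);
}

/* Example handler. */
static int last_event = -1;
static void on_click(int event) { last_event = event; }
```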
In general, the whole system is not mature yet. I welcome any kind of suggestions, discussions, and questions. I hope you will find it useful or interesting.
Jih-tung Pai
January, 1999
Current release
There are several additions in this release. The most interesting one to me is the target compiler. When I have finished a Forth program and made it an executable, one thing that frequently bothers me is that the system contains a lot of words that are no longer needed. One way to get rid of them is to manually remove the files that the application no longer needs. This procedure is tedious and still leaves a lot of unused words in the system, just because some used words reside in the same file. Wouldn't it be better if I could pick up all the words that are needed, put them in the final executable, and leave all the unused words behind? That way, the executable can be much smaller. By today's standards, even the whole Forth system is considered relatively small for a program, and it seems few people care about program size any more. However, the idea of leaving a lot of junk in the final executable leaves me a little unsatisfied. The simpler the program is, the worse I feel about having a big system full of junk. TCOM seems to be the system that does what I just described on the DOS platform. Although you can translate a program from F-PC to TCOM fairly easily, TCOM is essentially a different system with a different design. Back in my DOS days, I always wondered why I could not just develop my program in F-PC using its convenient utilities and make a small executable when it was done, all in F-PC. I didn't know F-PC well enough to write such a target compiler. Now that I have written lpforth from the ground up, I thought I should know this system well enough to write one. I tried exactly that, and now I have a primitive version that seems to run well in regular cases.
To use the target compiler, type "fload extract.f" to load the program. Afterward, typing "target <target-word> <target-filename>" will generate a file named <target-filename> in the same directory. Under the shell, you should be able to run <target-filename> to do the work <target-word> is designed to do.
When you run the target compiler, the main steps are as follows. First, it tags all the words used by <target-word>. All the words used by those words also get tagged, and so on, until every word that is used has been tagged. The next step builds a table of all the used words. Finally, one by one, the target compiler moves each used word to the target memory area. All the code field addresses inside the high-level words are replaced with the new addresses in the target area by looking them up in the table. Once all that is done, the target compiler writes the target memory area into a new file and makes it an executable.
You can try out the target compiler on the GTK tutorial example that I rewrote in lpforth, gtk14.4.f. It was translated directly from the C code, so it is not Forth style at all; it is written so that it is easier to compare with the C code. After you load the example, load "extract.f". Running "target main gtk14.4" will produce an executable named "gtk14.4". Under the shell, you can execute "gtk14.4" directly.
I have said this before, but I would like to say it again. Writing a GTK program, I found it very straightforward except for figuring out the data structures. For the time being, when I need to access a field, I usually examine the C header file and calculate the offset to that field manually. Sometimes this is very easy, but it can be hard in other cases. When a data structure is convoluted or has many compile-time conditions, it is hard to figure out by looking at the header file. In some extreme cases, I even have to write a small program to test nearby addresses just to figure out by trial and error which one is correct. It is a stupid and tedious way to do it. I would like to know a better way to deal with this, if you know one. I would appreciate it if you could share it.
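The three steps can be illustrated with a small C model: mark every word reachable from the target word, assign the marked words new, consecutive addresses in dictionary order (the "table"), and then translate each reference through that table while copying. The model is a simplification under stated assumptions: words are array entries with an old address, a size, and a list of referenced word indices, and it ignores headers, literals, and code that embeds absolute addresses in other ways. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_REFS 8

struct word {
    uint32_t old_addr;           /* CFA in the development system */
    uint32_t size;               /* bytes occupied by the word */
    int      refs[MAX_REFS];     /* indices of words it compiles */
    int      n_refs;
    int      used;               /* set by the marking pass */
    uint32_t new_addr;           /* CFA in the target image */
};

/* Step 1: tag `w` and, recursively, every word it uses. */
static void mark(struct word *words, int w)
{
    assert(w >= 0);
    if (words[w].used)
        return;                  /* already tagged; also stops on cycles */
    words[w].used = 1;
    for (int i = 0; i < words[w].n_refs; i++)
        mark(words, words[w].refs[i]);
}

/* Step 2: lay out the used words back to back from `base`;
   returns the total size of the target image. */
static uint32_t assign_addresses(struct word *words, int n, uint32_t base)
{
    uint32_t here = base;
    for (int i = 0; i < n; i++) {
        if (words[i].used) {
            words[i].new_addr = here;
            here += words[i].size;
        }
    }
    return here - base;
}

/* Step 3: translate an old CFA into the new one through the table;
   returns 0 if the address does not belong to a used word. */
static uint32_t relocate(const struct word *words, int n, uint32_t old_addr)
{
    for (int i = 0; i < n; i++)
        if (words[i].used && words[i].old_addr == old_addr)
            return words[i].new_addr;
    return 0;
}
```

In a four-word dictionary where `main` uses `square` and `dup`, `square` uses `dup`, and `drop` is unused, marking from `main` leaves `drop` behind, and the remaining three words are packed contiguously.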
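One way to avoid counting bytes by hand is to let the C compiler do it: a tiny generator program can include the real header, use the standard `offsetof` macro from `<stddef.h>`, and print the offsets as Forth constants to be loaded into lpForth. As long as the generator is compiled with the same headers and flags the application uses, compile-time conditionals are resolved exactly as they are for C code. The struct below is a made-up stand-in; with GTK you would include the GTK header and name a real structure and field. The `<offset> constant <type>.<field>` naming scheme and the `EMIT` macro are only suggestions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a real library structure. */
struct demo_widget {
    char  flags;
    int   width;
    short depth;
    void *parent;
};

/* Format "<offset> constant <name>" as a line of Forth source. */
static int emit_offset(char *buf, size_t len, const char *name, size_t off)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%zu constant %s\n", off, name);
}

/* EMIT(buf, len, demo_widget, width) -> "<n> constant demo_widget.width" */
#define EMIT(buf, len, type, field) \
    emit_offset((buf), (len), #type "." #field, offsetof(struct type, field))
```

A real generator's `main` would call `EMIT` once per needed field and redirect the output to a .f file.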
You can download the current release here:
Please use it at your own risk. I will not be responsible for any damage it may cause to your system. If it makes you feel better, I have used it constantly for many months without it causing any crash on my Linux box.
Unzip it to your desired directory. Type "./lpforth" in your shell to call up lpforth. Look into "meta.f" file to see how you can metacompile the whole system.
Have fun!
Jih-tung Pai
Last updated 05/07/2001