So I have been working on a side project over the past couple of weeks: designing a programming language. At this point everything is still somewhat crude. I do have the tokenizer and parser complete, although the latter doesn't actually build an AST (abstract syntax tree) just yet. Before doing that I really wanted to nail down the opcode format. So I now have a simple interpreter set up which reads binary files and executes them. The code is currently being generated "by hand" in a C program (basically just writing the opcodes and data to an array which then gets written to a file). I can already write simple programs with loops and such. Here is the output of one of them:
Value of `Index`: 0
Value of `Limit`: 3
Entering loop...
Value of `Index`: 0
Value of `Index`: 1
Value of `Index`: 2
Done!
Pretty cool, but there is still SO much to do, and I have many concerns about the bytecode format being abused or otherwise allowing the interpreter to enter an invalid state. For example, badly crafted bytecode could currently jump OVER the initialization of a variable. The problem with detecting that is that the bytecode is not easy to scan in the FORWARD direction at that particular point of execution. To do that I would basically have to run a separate instance of the interpreter in some special mode just to be able to look ahead and detect potentially invalid jumps.
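To make the jump-over-initialization worry concrete, here is a rough sketch of a load-time check that does not need a second interpreter instance. It is not my actual format: it assumes a hypothetical encoding where every instruction is exactly two bytes (opcode, operand), jump operands are instruction indices, and there are at most 32 variable slots. The opcode names and `check_init` are made up for illustration. The idea is a forward pass that tracks which slots have been stored on every path to each instruction, and rejects the program if any reachable load reads a slot that might not have been stored yet.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; every instruction is two bytes: opcode, operand. */
enum { OP_HALT, OP_STORE, OP_LOAD, OP_JMP, OP_JZ };

#define MAX_INSTRS 256

/* Returns 0 if every reachable OP_LOAD reads a slot that has been stored on
   all paths leading to it, -1 otherwise. Slots are limited to 0..31 here. */
static int check_init(const uint8_t *code, size_t len)
{
    uint32_t in[MAX_INSTRS];          /* slots definitely stored before instr i */
    uint8_t seen[MAX_INSTRS] = {0};   /* instr i is reachable from the entry */
    size_t n = len / 2;
    int changed = 1;

    if (len % 2 != 0 || n == 0 || n > MAX_INSTRS)
        return -1;
    for (size_t i = 0; i < n; i++)
        in[i] = UINT32_MAX;           /* optimistic start, narrowed below */
    in[0] = 0;                        /* nothing is stored at program entry */
    seen[0] = 1;

    /* Iterate to a fixed point: sets only shrink, so this terminates. */
    while (changed) {
        changed = 0;
        for (size_t i = 0; i < n; i++) {
            if (!seen[i])
                continue;
            uint8_t op = code[2 * i], arg = code[2 * i + 1];
            uint32_t out = in[i];
            size_t succ[2];
            int nsucc = 0;

            switch (op) {
            case OP_HALT:
                break;
            case OP_STORE:
                if (arg >= 32) return -1;
                out |= 1u << arg;
                succ[nsucc++] = i + 1;
                break;
            case OP_LOAD:
                if (arg >= 32) return -1;
                succ[nsucc++] = i + 1;
                break;
            case OP_JMP:
                succ[nsucc++] = arg;
                break;
            case OP_JZ:
                succ[nsucc++] = arg;
                succ[nsucc++] = i + 1;
                break;
            default:
                return -1;            /* unknown opcode */
            }
            for (int k = 0; k < nsucc; k++) {
                size_t s = succ[k];
                if (s >= n)
                    return -1;        /* jumps or falls outside the code */
                uint32_t merged = in[s] & out;  /* keep only slots set on all paths */
                if (!seen[s] || merged != in[s]) {
                    in[s] = merged;
                    seen[s] = 1;
                    changed = 1;
                }
            }
        }
    }

    for (size_t i = 0; i < n; i++)
        if (seen[i] && code[2 * i] == OP_LOAD && !(in[i] & (1u << code[2 * i + 1])))
            return -1;                /* load of a possibly unset slot */
    return 0;
}
```

With this sketch, a program like `JMP 2; STORE 0; LOAD 0; HALT` is rejected because the jump skips the store, while `STORE 0; LOAD 0; HALT` is accepted. A real format with variable-length instructions or a value stack would need more bookkeeping, but the overall shape of the pass should carry over.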
Another related worry is that I don't want the code to be able to make arbitrary jumps into a raw data section (which is of course stored within the bytecode itself) and continue execution there. I would also like to avoid the possibility of anyone abusing the bytecode format to store unreachable code or data, as I think that would most likely only aid hackers and malware authors, so I would really like to explicitly disallow such constructs. But again, that of course means a much more complex loading process.
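As a sketch of how much extra loading work that might be, here is one possible shape for such a check. Everything in it is an assumption for illustration, not my real format: the image is laid out as one length byte, then the code section, then the data section; opcodes are followed by a fixed number of operand bytes; jump operands are one-byte offsets into the code section. The names `verify_image`, `operand_len` and the `V_*` opcodes are hypothetical. It decodes the code section linearly to find instruction boundaries, then walks the control flow from offset 0, rejecting jumps that land in data or mid-instruction, and finally rejects any instruction that was never reached.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration only. */
enum { V_HALT, V_PUSH, V_JMP, V_JZ, V_OPCOUNT };

/* Operand bytes following each opcode (jump targets are 1-byte code offsets). */
static const uint8_t operand_len[V_OPCOUNT] = { 0, 1, 1, 1 };

#define MAX_CODE 256

/* Assumed layout: [code_len][code bytes...][data bytes...]
   Returns 0 if the image passes all checks, -1 otherwise. */
static int verify_image(const uint8_t *image, size_t image_len)
{
    uint8_t is_start[MAX_CODE] = {0};  /* offset begins an instruction */
    uint8_t reached[MAX_CODE] = {0};   /* offset is reachable from entry */
    size_t stack[MAX_CODE];
    size_t sp = 0;

    if (image_len < 1)
        return -1;
    size_t code_len = image[0];
    const uint8_t *code = image + 1;
    if (code_len == 0 || code_len > image_len - 1)
        return -1;

    /* Pass 1: linear decode to record where each instruction starts. */
    for (size_t pc = 0; pc < code_len; ) {
        uint8_t op = code[pc];
        if (op >= V_OPCOUNT)
            return -1;                 /* unknown opcode */
        if (pc + 1 + operand_len[op] > code_len)
            return -1;                 /* instruction runs into the data section */
        is_start[pc] = 1;
        pc += 1 + (size_t)operand_len[op];
    }

    /* Pass 2: follow control flow from offset 0. */
    stack[sp++] = 0;
    reached[0] = 1;
    while (sp > 0) {
        size_t pc = stack[--sp];
        uint8_t op = code[pc];
        size_t succ[2];
        int nsucc = 0;

        if (op == V_JMP || op == V_JZ)
            succ[nsucc++] = code[pc + 1];                       /* jump target */
        if (op != V_JMP && op != V_HALT)
            succ[nsucc++] = pc + 1 + (size_t)operand_len[op];   /* fall through */

        for (int k = 0; k < nsucc; k++) {
            size_t s = succ[k];
            if (s >= code_len || !is_start[s])
                return -1;             /* lands in data or mid-instruction */
            if (!reached[s]) {
                reached[s] = 1;
                stack[sp++] = s;       /* each offset is pushed at most once */
            }
        }
    }

    /* Pass 3: every decoded instruction must have been reached. */
    for (size_t pc = 0; pc < code_len; pc++)
        if (is_start[pc] && !reached[pc])
            return -1;                 /* unreachable code */
    return 0;
}
```

This does not say anything about unused bytes in the data section; catching those would need instructions that reference data by offset, so the loader could mark which ranges are actually used. It also assumes all jump targets are constants in the bytecode, which is what makes the whole thing checkable at load time.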
I realize that these are really quite abstract problems and perhaps I should just ignore them for now and move on with the project. I guess it just feels like if I don't address these issues NOW, they will only hinder things later on down the road and may even force an entire rewrite of the bytecode format (and I REALLY don't want to have to do that).
Anyone here have any thoughts on how I might proceed from here?