Memory management in FluffOS for the VM

I still think we should migrate the virtual machine to some form of garbage collection, or perhaps to the std::shared_ptr<> interface. The readability and maintainability of the current code are eroded by the presence of ref bumping: even in relatively straightforward code, reasoning about these details is not always easy.

My thoughts on the garbage collector:

Add a mark field to the relevant values and use it to support a mark-and-sweep collector.
Remove all the ref count fields.

Use the VM stack and the otable/callout/living hash tables as the root set (along with other things stored internally by the driver).

I’ve been dying to know which situations cause leaks in the current implementation.

Instead of implementing GC (which I assume is a huge undertaking), I would love to see the following work done:

  1. 1-depth or 2-depth cycle detection on assignment. Would this help catch the biggest mistakes?
  2. Lifetime accounting: an interface letting users write code to account for the lifetime of dangling items, so they can find the leak and have a means to deallocate it from within LPC.
  3. Even if we do implement a mark-and-sweep GC, do the mark phase first. Put in the necessary fields, add a repeating tick event that marks objects and reports the results, and provide an interface hooking into 2), to make the problem more apparent to the user.

I have implemented a mark-and-sweep collector before, in LPC. I think the key to getting it right is having a bug-free trace routine and correctly identifying the root set. One additional requirement is that all allocations must be recorded, since you need a double-pointer system to sweep up the unreferenced data.

I looked it over just now and it doesn’t seem particularly difficult to do. There are still some details I need to figure out, including where to hook the mark/sweep function in and how often it should be called.

A bigger problem: I think it would be good to have some sort of plan for how to “evolve” the VM. For example, if one were to support an LLVM IR based JIT, this would have to be planned for, since the data structures underlying LLVM bitcode are in some ways closer to C than to C++.

My current thoughts are to try to collect svalue and the various other LPC types into a single hierarchy. I probably need to spend some effort studying VM tech a bit more. V8 looks good as an example, though it is much more complex than FluffOS since it carries a lot of optimization machinery.

Problems of note from an OO perspective: it would be nice if the internal LPC datatypes mirrored their LPC counterparts in terms of usage. In LPC everything is a mixed type, though some unsound “static” type checking makes it look like that isn’t the case. I think operator overloading and custom type constructors (initializer lists?) could be used to get the same effect in C++, with perhaps the exception of anonymous function expressions.

The main difficulty here will be discriminating between hand-managed data and VM data. I looked over the GC inside LDMud, and it seems it isn’t really a true GC but more of a hybrid, since it still uses the ref counts.