Off Heap Memory

Concurnas provides support for managing data off heap. Since Concurnas is an object oriented, garbage collected language, data, in the form of objects, is managed in a subsection of the machine's RAM called the heap. The heap is generally only a portion of the RAM available to the machine, and a small fraction of its persistent storage (SSDs, disk drives etc).

Overview

The off heap memory management functionality provided by Concurnas affords us the following key advantages:

  • We are able to work with datasets which are significantly larger than what it is possible to store on heap. For instance, we may be running our program with an 8GB heap, 128GB of RAM and many terabytes of physical disk based storage - with off heap memory management our data can reside in, and be worked with from, this RAM and physical disk seamlessly.

  • We can perform our own memory management. This is often preferable in cases where we are working with large datasets which are resident in memory for long durations of time and/or have access patterns which we are aware of, and in control of, ahead of runtime. This frees up our garbage collector to focus on other areas of our program's operation.

This functionality is provided in the form of off heap stores and key value pair maps which can be backed by either RAM or disk. These are intuitive, easy to use data structures and are already a popular, well established means of working with large datasets in industry.

Serialization of objects

All objects in Concurnas are serializable to and from binary format - this extends naturally to object graphs (i.e. the total data structure referenced by an object and all its fields, and its fields' fields etc). Concurnas is even able to cater for cycles in serialized object graphs by default. It is through this mechanism of serialization that objects are able to be managed off heap, being marshalled back into object form when they are required in heap memory, and marshalled into binary form for storage/persistence off heap.

This default serialization scheme is added to all classes executed by Concurnas at runtime.
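
To make the idea of round-tripping an object graph that contains a cycle concrete, here is a minimal sketch in plain Java. It uses the JVM's standard java.io serialization as a stand-in, not the Concurnas binary format, and the CyclicGraphSketch and Node names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class CyclicGraphSketch {
    // Hypothetical node type, used only for this illustration.
    static final class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Node next; // may point back at an earlier node, forming a cycle

        Node(String name) {
            this.name = name;
        }
    }

    // Marshal an object graph into its binary form.
    static byte[] toBytes(Serializable graph) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(graph);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Marshal the binary form back into a fresh on-heap object graph.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a; // cycle: a -> b -> a

        Node copy = (Node) fromBytes(toBytes(a));
        System.out.println(copy.next.next == copy); // true: the cycle survives the round trip
        System.out.println(copy == a);              // false: the copy is a distinct object
    }
}
```

The deserialized graph is a copy: it has the same shape as the original, including the back reference, but shares no objects with it.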

sizeof

We can make use of the sizeof keyword in order to determine the number of bytes a serialized object graph will consume off heap:

anArray = [1 2 3 4]

sz = sizeof anArray //sz == 37

This can be useful when working with large objects in an environment where we have a limited amount of off heap memory.
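
The same kind of check can be sketched in plain Java by serializing a graph and counting the bytes produced. This is only an analogy: it measures the JVM's java.io serialization format, so the byte counts will differ from those reported by sizeof, and the SerializedSizeSketch class and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializedSizeSketch {
    // Number of bytes the graph occupies in java.io serialized form.
    static int serializedSize(Serializable graph) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(graph);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    // True if the serialized graph would fit in the given number of free bytes.
    static boolean fits(Serializable graph, long freeBytes) {
        return serializedSize(graph) <= freeBytes;
    }

    public static void main(String[] args) {
        int[] anArray = {1, 2, 3, 4};
        System.out.println("serialized bytes: " + serializedSize(anArray));
        System.out.println("fits in 1 KiB: " + fits(anArray, 1024));
    }
}
```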

Custom Serialization

Sometimes the default serialization mechanism is not adequate. In these instances a custom, user defined serialization mechanism may be provided. This takes the form of implementing either the java.io.Serializable interface or the java.io.Externalizable interface.

Implementing the Serializable interface

A class may implement the java.io.Serializable interface. In doing so it will use the default serialization strategy employed by Java in order to serialize its graph. For example:

class MyClass(firstName String, secondName String) ~ java.io.Serializable

Optionally, as per Serializable, one may define a pair of writeObject and readObject methods in order to perform the serialization. For example, on a custom ArrayList:

class MyArrayList ~ java.io.Serializable{
  items String[]
  highwatermark = 0
	
  this(){
    items = new String[10]
  }
	
  this(startsize int){
    items = new String[startsize]
  }
	
  def add(what String){
    if(highwatermark >== items.length){
      newitems = new String[Math.ceil(items.length * 1.2)as int]
      System.arraycopy(items, 0, newitems, 0, items.length)
      items = newitems
    }
    items[highwatermark++] = what
  }
	
  private def substr(){
    items[0 ... highwatermark] if items  <> null else 'its null'
  }
	
  override toString() => "" + substr()
	
  private def writeObject(s java.io.ObjectOutputStream) void{
    //write the non transient fields using the default mechanism
    s.defaultWriteObject()
    //then write the number of live elements, followed by each live element
    s.writeInt(highwatermark)
    for (i=0; i<highwatermark; i++) {
      s.writeObject(items[i])
    }
  }

  private def readObject(s java.io.ObjectInputStream) void {
    //read back in the same order as written above
    s.defaultReadObject()
    highwatermark = s.readInt()
    items = new String[highwatermark]

    for (i=0; i<highwatermark; i++) {
      items[i] = s.readObject() as String
    }
  }
}

Implementing the Externalizable interface

A class may implement the java.io.Externalizable interface. In doing so we are obliged to define a pair of writeExternal(outx java.io.ObjectOutput) and readExternal(inx java.io.ObjectInput) methods which perform our serialization and deserialization. For example, on a custom ArrayList:

class MyArrayList ~ java.io.Externalizable{
  items String[]
  highwatermark = 0
	
  this(){
    items = new String[10]
  }
	
  this(startsize int){
    items = new String[startsize]
  }
	
  def add(what String){
    if(highwatermark >== items.length){
      newitems = new String[Math.ceil(items.length * 1.2)as int]
      System.arraycopy(items, 0, newitems, 0, items.length)
      items = newitems
    }
    items[highwatermark++] = what
  }
	
  private def substr(){
    items[0 ... highwatermark] if items  <> null else 'its null'
  }
	
  override toString() => "" + substr()
	
  public def writeExternal(outx java.io.ObjectOutput){
    outx.writeInt(highwatermark)
    for(n=0; n < highwatermark; n++){
      outx.writeUTF(items[n])
    }
  }
	
	
  public def readExternal(inx java.io.ObjectInput){
    highwatermark = inx.readInt()
    items = new String[highwatermark]
		
    for(n=0; n < highwatermark; n++){
      items[n] = inx.readUTF()
    }
  }
}
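
For comparison, here is a minimal sketch of the same Externalizable pattern in plain Java, together with a round trip through a byte array. The NameList class and its roundTrip helper are hypothetical names. Note that on the JVM an Externalizable class needs a public no-argument constructor, since deserialization calls it before invoking readExternal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class NameList implements Externalizable {
    private String[] items = new String[10];
    private int highWaterMark = 0;

    // Required by Externalizable: called reflectively during deserialization.
    public NameList() {
    }

    void add(String what) {
        if (highWaterMark >= items.length) {
            items = Arrays.copyOf(items, items.length * 2);
        }
        items[highWaterMark++] = what;
    }

    // Copy of the live elements only.
    String[] toArray() {
        return Arrays.copyOf(items, highWaterMark);
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(highWaterMark);
        for (int n = 0; n < highWaterMark; n++) {
            out.writeUTF(items[n]);
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        highWaterMark = in.readInt();
        // Keep at least one slot so that add() can still grow the array.
        items = new String[Math.max(highWaterMark, 1)];
        for (int n = 0; n < highWaterMark; n++) {
            items[n] = in.readUTF();
        }
    }

    // Serialize to a byte array and deserialize into a new instance.
    static NameList roundTrip(NameList original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (NameList) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        NameList names = new NameList();
        names.add("hello");
        names.add("world");
        System.out.println(Arrays.toString(roundTrip(names).toArray()));
    }
}
```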

Unserializable Objects

All objects in Concurnas are serializable with the exception of:

  • Actors.

  • Any class marked as transient.

Transient fields

Classes may have transient fields declared within them. These behave like regular fields except that when serialized via the default strategy provided by Concurnas, or the default java.io.Serializable strategy, they are not converted to/from binary format. A field may be declared transient by using the transient keyword as follows:

class MyClass(transient firstName String, secondName String){
  transient yearOfBirth int
}

Transient fields are skipped upon serialization and are not populated upon deserialization. Thus, in a deserialized object, any non primitive, non array type transient field will hold the default value of null, and any primitive type transient field the equivalent of 0. It is because of this behaviour that non primitive, non array type transient fields are always nullable.

This can be useful in instances where a local resource, for instance a database connection, is tied to an object which needs to be persisted or otherwise managed off heap. Note that excessive use of transient fields can be a code smell indicating unorthodox design.

Default Transient fields

When it comes to the serialization of transient fields with default values, the behaviour depends upon the variant of serialization used. For example:

class MyClass(transient firstName String = "dave", secondName String){
  transient yearOfBirth int = 1970 //transient field with default value
}

The fields firstName and yearOfBirth will be deserialized to their respective default values if either:

  • The default serialization strategy offered by Concurnas is used.

  • Explicit defaulting of the fields is performed within the appropriate methods of a class implementing either java.io.Externalizable or java.io.Serializable.
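
The second case can be sketched in plain Java. With java.io.Serializable, field initializers of the serializable class are not run during deserialization, so a transient field with a default value comes back as null or 0 unless readObject restores it explicitly. The class names below are hypothetical and the field names simply mirror the example above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class TransientDefaultsSketch {
    // No custom readObject: transient defaults are lost on deserialization.
    static final class PlainPerson implements Serializable {
        private static final long serialVersionUID = 1L;
        transient String firstName = "dave";
        transient int yearOfBirth = 1970;
        String secondName;

        PlainPerson(String secondName) {
            this.secondName = secondName;
        }
    }

    // Custom readObject: transient defaults are restored explicitly.
    static final class DefaultingPerson implements Serializable {
        private static final long serialVersionUID = 1L;
        transient String firstName = "dave";
        transient int yearOfBirth = 1970;
        String secondName;

        DefaultingPerson(String secondName) {
            this.secondName = secondName;
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject(); // restores secondName
            firstName = "dave";     // explicit defaulting of the transient fields
            yearOfBirth = 1970;
        }
    }

    // Serialize to a byte array and deserialize into a new instance.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PlainPerson plain = roundTrip(new PlainPerson("smith"));
        DefaultingPerson defaulting = roundTrip(new DefaultingPerson("smith"));
        System.out.println(plain.firstName + " " + plain.yearOfBirth);           // null 0
        System.out.println(defaulting.firstName + " " + defaulting.yearOfBirth); // dave 1970
    }
}
```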

Off Heap Stores

Concurnas offers Off Heap Stores. These may reside either in memory, via the com.concurnas.lang.offheap.storage.OffHeapRAM class, or on disk, via the com.concurnas.lang.offheap.storage.OffHeapDisk class. Both are subtypes of the com.concurnas.lang.offheap.storage.OffHeapPutGettable class. They allow us to store object graphs off heap and provide us with objects we can use in order to interact with those off heap objects.

Creating Off Heap RAM Stores

In order to create an off heap RAM store we must specify the size of the store in bytes. For instance, 10MB is:

10meg = 10 * (1024**2)

This can then be used within our OffHeapRAM store, with generic qualification to store an array of Strings as follows:

from com.concurnas.lang.offheap.storage import OffHeapRAM
from com.concurnas.lang.offheap import OffHeapObject

msg1 = ["hello" "world"]
msg2 = ["nice" "day"]

10meg = 10 * (1024**2)

offHeapRamStore = new OffHeapRAM<String[]>(10meg)
offHeapRamStore.start()

offHeapObj1 OffHeapObject<String[]> = offHeapRamStore.put(msg1)
offHeapObj2 OffHeapObject<String[]> = offHeapRamStore.put(msg2)

gotMsg1 = offHeapRamStore.get(offHeapObj1)
gotMsg2 = offHeapRamStore.get(offHeapObj2)

offHeapRamStore.close()

assert gotMsg1 == msg1
assert gotMsg2 == msg2//equal by value

assert gotMsg1 &<> msg1
assert gotMsg2 &<> msg2//different by reference

There are a few things going on with the OffHeapRAM store above:

  1. The OffHeapRAM store is explicitly started via a call to the start method.

  2. Then we store objects within it via the put method. This returns to us an object reference of type OffHeapObject which we can use in order to obtain a copy of the object from the store.

  3. We then obtain a copy of the stored objects from the store using the get method. Note that these objects are copies, so they are (by default) equal by value, but different by reference.

  4. We then shut down the OffHeapRAM store using the close method. It is important that this is done so as to avoid a memory/resource leak.
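
The put/get flow described in the steps above can be modelled in plain Java with a direct ByteBuffer, which the JVM allocates outside of the garbage collected heap. This is a conceptual sketch only and not how OffHeapRAM is implemented: the DirectStringStore class is hypothetical, it stores Strings rather than arbitrary object graphs, its handle is just a byte offset, and it never reclaims space.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class DirectStringStore {
    private final ByteBuffer memory;

    DirectStringStore(int capacityBytes) {
        // Direct buffers live outside the garbage collected Java heap.
        memory = ByteBuffer.allocateDirect(capacityBytes);
    }

    // Copies the value off heap and returns a handle (here: its byte offset).
    int put(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (memory.remaining() < Integer.BYTES + bytes.length) {
            throw new IllegalStateException("store is full");
        }
        int handle = memory.position();
        memory.putInt(bytes.length).put(bytes); // length prefix, then payload
        return handle;
    }

    // Builds a fresh on-heap copy of the value referred to by the handle.
    String get(int handle) {
        int length = memory.getInt(handle); // absolute read, position unchanged
        byte[] bytes = new byte[length];
        ByteBuffer view = memory.duplicate(); // independent position, shared content
        view.position(handle + Integer.BYTES);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Bytes not yet handed out.
    int getFreeSpace() {
        return memory.remaining();
    }

    public static void main(String[] args) {
        DirectStringStore store = new DirectStringStore(1024);
        String msg = "hello world";
        int handle = store.put(msg);
        String got = store.get(handle);
        System.out.println(got.equals(msg)); // true: equal by value
        System.out.println(got == msg);      // false: different by reference
    }
}
```

Because get always rebuilds the value from its off heap bytes, every call yields a new object, which is why retrieved objects are equal by value but different by reference.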

Using OffHeapObjects

The OffHeapObject object references returned from our object store can be passed around our program as per normal objects. They may be deleted by using the del keyword or by calling the delete method - this will remove their referenced object from the object store. Similarly, when OffHeapObject object references go out of scope and are garbage collected, the object to which they refer is removed from their host object store. However, it is still best practice to explicitly delete the object when it is known to no longer be of use. Here is an example:

del offHeapObj1
offHeapObj2.delete()

When working with OffHeapObject objects it is not necessary to have immediate knowledge of the object store to which they belong, since they have a getManager method which can provide this information. Additionally, the get method may be called in order to obtain a copy of the object to which the OffHeapObject object refers. For example:

gotMsg1a = offHeapObj1.getManager().get(offHeapObj1)
gotMsg1b = offHeapObj1.get()

assert gotMsg1a == gotMsg1b //equal by value
assert gotMsg1a &<> gotMsg1b//different by reference

Creating Off Heap Disk Stores

The Off Heap Disk Store is a good mechanism for storing large amounts of temporary data off heap outside of RAM. The Off Heap Disk Store is backed by a memory mapped file which greatly enhances performance as spatially localized data is cached in memory.

As with the OffHeapRAM store, the OffHeapDisk store must be provided with a store size. Additionally, a file path may be provided. This file will be used to store the data held in the OffHeapDisk store. If a file path is not provided, a temporary file will be created. If a file already exists at the file path it will be erased. The file will be removed when the close method of the store is called.

The OffHeapDisk store exposes an additional method setPreallocate which if called with true before the store is started, will result in the temporary file used to back the data of the store being fully allocated on disk. If this is not set then the file will grow as required.
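
The idea of backing data with a memory mapped temporary file can be sketched in plain Java using FileChannel.map. This illustrates the general technique only, not the internals of OffHeapDisk. The MappedFileSketch class is a hypothetical name, and the sketch does not model setPreallocate.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedFileSketch {
    // Writes a value into a memory mapped temporary file and reads it back.
    // sizeBytes must be at least 8 so that one long fits in the mapped region.
    static long writeThenRead(long value, int sizeBytes) {
        try {
            Path file = Files.createTempFile("offheap-sketch", ".bin");
            file.toFile().deleteOnExit(); // temporary backing file, removed at JVM exit
            try (FileChannel channel = FileChannel.open(
                    file, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Map the first sizeBytes of the file into memory for reading and writing.
                MappedByteBuffer region =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, sizeBytes);
                region.putLong(0, value); // write through the mapping
                region.force();           // ask for the change to be flushed to the file
                return region.getLong(0); // read back through the mapping
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThenRead(42L, 4096));
    }
}
```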

Here is an example of the OffHeapDisk store in action. It is very similar to the OffHeapRAM store:

from com.concurnas.lang.offheap.storage import OffHeapDisk
from com.concurnas.lang.offheap import OffHeapObject

msg1 = ["hello" "world"]

10meg = 10 * (1024**2)

offHeapDiskStore = new OffHeapDisk<String[]>(10meg)
offHeapDiskStore.start()

offHeapObj1 OffHeapObject<String[]> = offHeapDiskStore.put(msg1)

gotMsg1 = offHeapDiskStore.get(offHeapObj1)

offHeapDiskStore.close()

assert gotMsg1 == msg1//equal by value

assert gotMsg1 &<> msg1//different by reference

The Off Heap Disk Store is not designed for permanent object persistence, since at the point of shutdown of a process with an OffHeapDisk store the necessary handles (such as OffHeapObject objects) are lost. For true persistence, an Off Heap Disk Map is recommended, since it provides a key reference that can be used in order to refer to objects after process shutdown and resumption.

Managing Off Heap Stores

We can examine the amount of space we have allocated and have remaining in the store via the getCapacity and getFreeSpace methods respectively.

Over time an object store may become fragmented (see: Defragmentation). As such, although there may appear to be plenty of space available for an object allocation, there may in fact not be, due to fragmentation. In this situation Concurnas will automatically compact and defragment the store in order to free up space. This is a slow operation. For this reason it is recommended that stores be monitored for becoming close to capacity. A good rule of thumb is to allocate 50% more space than is expected to be required.
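
The difference between total free space and usable contiguous free space can be shown with a toy model in plain Java. Here a store is represented as an array of equally sized slots. The FragmentationSketch class and its methods are hypothetical, and the model says nothing about how Concurnas actually lays out or compacts its stores.

```java
import java.util.Arrays;

final class FragmentationSketch {
    // Total number of unused slots (true marks a slot in use).
    static int freeSlots(boolean[] used) {
        int free = 0;
        for (boolean slotUsed : used) {
            if (!slotUsed) {
                free++;
            }
        }
        return free;
    }

    // Length of the longest run of adjacent unused slots.
    static int largestFreeRun(boolean[] used) {
        int best = 0;
        int current = 0;
        for (boolean slotUsed : used) {
            current = slotUsed ? 0 : current + 1;
            best = Math.max(best, current);
        }
        return best;
    }

    // Compaction: slide all used slots to the front, leaving one free run at the end.
    static boolean[] compact(boolean[] used) {
        boolean[] result = new boolean[used.length];
        Arrays.fill(result, 0, used.length - freeSlots(used), true);
        return result;
    }

    // Rule of thumb from the text: allocate 50% more than the expected requirement.
    static long recommendedCapacity(long expectedBytes) {
        return expectedBytes + expectedBytes / 2;
    }

    public static void main(String[] args) {
        boolean[] store = {true, false, true, false, true, false, true, false};
        System.out.println("free slots: " + freeSlots(store));
        System.out.println("largest free run: " + largestFreeRun(store));
        System.out.println("largest free run after compaction: " + largestFreeRun(compact(store)));
    }
}
```

In this model half of the store is free, yet an object needing three adjacent slots cannot be placed until the store has been compacted.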

We can adjust the amount of space allocated to the store by calling the setCapacity(size long) method. This method may be called on the store before or after it has been started via a call to the start method. If it is called after the start method, and the amount of space is reduced, the store will be compacted and defragmented so as to ensure that it can fit into the newly allocated reduced space. If it cannot, an exception will be thrown.

If an OffHeapRAM or OffHeapDisk store is closed (by calling the close method) all outstanding OffHeapObject objects are invalidated. Attempting to extract an object referenced by an OffHeapObject will then result in an exception. Additionally, objects may not be persisted after the store has been closed.

Off heap stores and OffHeapObjects cannot be shared between isolates. In cases where shared access to an off heap store is required, it is recommended that an actor be used in order to achieve this.

Off Heap Map

Now let us examine the core of the off heap memory support in Concurnas: off heap key value pair maps. These may reside either in memory, via the com.concurnas.lang.offheap.storage.OffHeapMapRAM class, or on disk, via the com.concurnas.lang.offheap.storage.OffHeapMapDisk class. Both of these classes implement the java.util.Map interface. They both behave in a very similar manner to the Off Heap stores we have previously explored.

The OffHeapMapDisk class offers all the same functionality as the OffHeapMapRAM class but with the added benefit of being disk backed, enabling permanent persistence of off heap objects. Although the OffHeapMapDisk implementation is not as fast as the RAM based one, since it is disk backed, it does make use of memory mapped files in order to improve access times to data.

Creating Off Heap RAM Maps

In order to create an off heap map we must specify the size of our off heap structure in bytes, for instance, 100 MB is:

100meg = 100 * (1024**2)

This can then be used within our OffHeapMapRAM store, with generic qualification to map from a String as key to String[] as value as follows:

from com.concurnas.lang.offheap.storage import OffHeapMapRAM

msg1 = ["hello" "world"]
100meg = 100 * (1024**2)

offHeapRamStore = new OffHeapMapRAM<String, String[]>(100meg)
offHeapRamStore.start()

offHeapRamStore.put('msg1', msg1)

gotMsg1 = offHeapRamStore.get('msg1')

offHeapRamStore.close()

assert gotMsg1 == msg1//equal by value
assert gotMsg1 &<> msg1//different by reference

As with the off heap stores, there are a few things going on above:

  1. The OffHeapMapRAM map is explicitly started via a call to the start method.

  2. Then we store objects within the map via the put method. This returns a copy of the object previously persisted against the key, if any.

  3. We then obtain a copy of the stored objects from the store using the get method. Note that these objects are copies, so they are (by default) equal by value, but different by reference.

  4. We then shut down the OffHeapMapRAM map using the close method. It is important that this is done so as to avoid a memory/resource leak.

Notice that, unlike off heap stores, OffHeapObject objects are not returned from the put method calls. We do not need OffHeapObject objects because we can use the keys we have referenced.

Using Off Heap Disk Maps

The Off Heap Disk Map is a good mechanism for storing large amounts of data, whether temporary or persistent, off heap outside of RAM. The Off Heap Disk Map is backed by a memory mapped file which greatly enhances performance as spatially localized data is cached in memory.

As with the OffHeapMapRAM store, the OffHeapMapDisk store must normally be provided with a map size. Additionally, a file path may be provided. This file will be used to store the data held in the OffHeapMapDisk store. If a file path is not provided, a temporary file will be created. If the file path is provided and points to a file which already exists, this file will be used as the backing store, and any previously persisted mappings within the file will be accessible. It is through this means that persistence of data may be achieved.

The OffHeapMapDisk store exposes a number of additional methods:

  • setPreallocate - if called with true before the store is started, will result in the temporary file used to back the data of the store being fully allocated on disk. If this is not set then the file will grow as required.

  • setRemoveOnClose - if called with true, will result in the backing file used for the map being removed upon the close method being called on the map.

  • setCleanOnStart - if called with true before the store is started, will result in the file used to back the data of the store being erased when the map is started.
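
The lifecycle implied by these options, where an existing backing file is reused unless told otherwise, can be modelled with a small plain Java sketch. The FileBackedMapModel class is hypothetical: it borrows the setRemoveOnClose and setCleanOnStart names from the text purely for illustration, stores String pairs in a java.util.Properties file, and holds its entries on heap, so it models only the file handling and not off heap storage.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

final class FileBackedMapModel implements AutoCloseable {
    private final Path file;
    private final Properties entries = new Properties();
    private boolean cleanOnStart = false;
    private boolean removeOnClose = false;

    FileBackedMapModel(Path file) {
        this.file = file;
    }

    void setCleanOnStart(boolean value) {
        cleanOnStart = value;
    }

    void setRemoveOnClose(boolean value) {
        removeOnClose = value;
    }

    // Loads any previously persisted mappings, unless asked to start clean.
    void start() {
        try {
            if (cleanOnStart) {
                Files.deleteIfExists(file);
            }
            if (Files.exists(file)) {
                try (InputStream in = Files.newInputStream(file)) {
                    entries.load(in);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void put(String key, String value) {
        entries.setProperty(key, value);
    }

    String get(String key) {
        return entries.getProperty(key);
    }

    // Persists the mappings to the backing file, or removes the file if requested.
    @Override
    public void close() {
        try {
            if (removeOnClose) {
                Files.deleteIfExists(file);
            } else {
                try (OutputStream out = Files.newOutputStream(file)) {
                    entries.store(out, null);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A map started against the same path after an earlier one has been closed sees the earlier mappings, unless the clean-on-start flag was set or the earlier map removed its file on close.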

Here is an example of the OffHeapMapDisk store in action. It is very similar to the OffHeapMapRAM store:

from com.concurnas.lang.offheap.storage import OffHeapMapDisk

msg1 = ["hello" "world"]
100meg = 100 * (1024**2)

offHeapDiskMap = new OffHeapMapDisk<String, String[]>(100meg)
offHeapDiskMap.start()

offHeapDiskMap.put('msg1', msg1)

gotMsg1 = offHeapDiskMap.get('msg1')

offHeapDiskMap.close()

assert gotMsg1 == msg1//equal by value
assert gotMsg1 &<> msg1//different by reference

Off heap map management

The same points regarding management apply to off heap maps as to off heap stores:

We can examine the amount of space we have allocated and have remaining in the map via the getCapacity and getFreeSpace methods respectively.

Over time an object map may become fragmented (see: Defragmentation). As such, although there may appear to be plenty of space available for an object allocation, there may in fact not be, due to fragmentation. In this situation Concurnas will automatically compact and defragment the map in order to free up space. This is a slow operation. For this reason it is recommended that maps be monitored for becoming close to capacity. A good rule of thumb is to allocate 50% more space than is expected to be required.

We can adjust the amount of space allocated to the map by calling the setCapacity(size long) method. This method may be called on the map before or after it has been started via a call to the start method. If it is called after the start method, and the amount of space is reduced, the map will be compacted and defragmented so as to ensure that it can fit into the newly allocated reduced space. If it cannot, an exception will be thrown.

Objects may not be persisted to an off heap map after it has been closed.

Off heap maps cannot be shared between isolates. In cases where shared access to an off heap map is required, it is recommended that an actor be used in order to achieve this.

Schema evolution

One of the strongest points of the Concurnas off heap map implementation is its support for schema evolution. We use the term schema evolution to mean changes made to a class after instances of it have been persisted, for instance adding a new data field or changing the type of a field. This turns out to be a surprisingly common operation in enterprise computing, and unfortunately the data migration it subsequently requires consumes a lot of time and effort. Traditionally this would cause a problem for us upon deserialization, since the persisted version of the class code would not match that of the current 'live' version, but Concurnas is largely able to account for these sorts of evolutionary changes.

Additionally, Concurnas is able to store multiple evolved versions of the same class within an off heap data structure (either the map or store objects above). In this way Objects which have been serialized in a previous format usually do not require explicit migration to a new format.

Supported evolutions

Concurnas is able to support the following evolutions to Objects in isolation and in combination:

Removing a field

class MyClass(firstName String, sirName String)
//Later version:
class MyClass(firstName String)

When we deserialize a class having an evolved definition with a removed field, this removed field will simply be omitted from the deserialized object.

Adding a field

class MyClass(firstName String)
//Later version:
class MyClass(firstName String, sirName String)

When we deserialize a class having an evolved definition with an additional field, this additional field will be set to its default/initial value, the equivalent of 0 for a non array primitive type, and null otherwise.

If a default value/initial value for the new field is specified then this value will be populated for the new field in deserialized objects.
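
One way to picture why adding and removing fields can be tolerated is to imagine each persisted object as a set of values keyed by field name. The plain Java sketch below works under that assumption only. It is not a description of the Concurnas storage format, and the FieldNameEvolutionSketch class and its load method are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class FieldNameEvolutionSketch {
    // The "later version" of the class: sirName was added after data had been stored.
    static final class MyClass {
        final String firstName;
        final String sirName;

        MyClass(String firstName, String sirName) {
            this.firstName = firstName;
            this.sirName = sirName;
        }
    }

    // Rebuilds the later version of the class from fields stored by an earlier version.
    static MyClass load(Map<String, Object> storedFields, String sirNameDefault) {
        String firstName = (String) storedFields.get("firstName");
        // Added field: absent from old records, so fall back to the supplied default.
        String sirName = (String) storedFields.getOrDefault("sirName", sirNameDefault);
        // Removed fields: any other keys in storedFields are simply ignored.
        return new MyClass(firstName, sirName);
    }

    public static void main(String[] args) {
        Map<String, Object> oldRecord = new HashMap<>();
        oldRecord.put("firstName", "dave");
        oldRecord.put("nickname", "d"); // a field the later version no longer declares

        MyClass withoutDefault = load(oldRecord, null);
        MyClass withDefault = load(oldRecord, "unknown");
        System.out.println(withoutDefault.firstName + " " + withoutDefault.sirName); // dave null
        System.out.println(withDefault.firstName + " " + withDefault.sirName);       // dave unknown
    }
}
```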

Changing the type of a field

class MyClass(firstName String, sirName String, userid byte)
//Later version:
class MyClass(firstName String, sirName String, userid long)

When we deserialize a class having an evolved definition with a field with a different type from that of its persisted version the behaviour we encounter is contingent upon the variant of type evolution employed. This is summarized below:

Variant                                           Example                  Behaviour
------------------------------------------------  -----------------------  ------------------------------
Boxed? primitive to Boxed? primitive              int -> double            Equivalent to a cast operation
Boxed? primitive array to Boxed? primitive array  int[] -> Double[]        Equivalent to a cast operation
Any array to scalar                               Integer[] -> Integer     Cannot be converted
Any scalar to array                               Integer -> Double[]      Cannot be converted
Any type to String                                MyClass -> String        Equivalent to toString
Any array to String array                         MyClass[] -> String[]    Equivalent to array^toString
Class to trait                                    Child -> MyTrait         Equivalent to a cast operation
Subclass to superclass                            Child -> Parent          Equivalent to a cast operation
Superclass to subclass                            Parent -> Child          Cannot be converted
Unrelated classes                                 MyClass -> MyOtherClass  Cannot be converted

In situations in which a value cannot be converted, the default value for the type (0 for a non array primitive type, and null otherwise) will be used unless a default value/initial value for the field is specified.
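
The conversion rules can be restated as a small plain Java function. This is a hedged sketch of the rules as described, not the Concurnas implementation. The FieldTypeEvolutionSketch class and its convert method are hypothetical, only boxed types and object arrays are handled, and null is returned to signal "cannot be converted" so that the caller can apply the field's default.

```java
import java.lang.reflect.Array;

final class FieldTypeEvolutionSketch {
    // Converts a stored value to the evolved field type, or returns null if it cannot.
    static Object convert(Object stored, Class<?> target) {
        if (stored == null) {
            return null;
        }
        // Same type, subclass to superclass, or class to interface: a plain cast.
        if (target.isInstance(stored)) {
            return stored;
        }
        // Any type to String: equivalent to toString.
        if (target == String.class) {
            return stored.toString();
        }
        // Object array to object array: convert element by element.
        if (stored instanceof Object[] && target.isArray()
                && !target.getComponentType().isPrimitive()) {
            Object[] source = (Object[]) stored;
            Class<?> elementType = target.getComponentType();
            Object[] result = (Object[]) Array.newInstance(elementType, source.length);
            for (int i = 0; i < source.length; i++) {
                result[i] = convert(source[i], elementType);
            }
            return result;
        }
        // Boxed numeric to boxed numeric: equivalent to a numeric cast.
        if (stored instanceof Number) {
            Number number = (Number) stored;
            if (target == Long.class) {
                return number.longValue();
            }
            if (target == Integer.class) {
                return number.intValue();
            }
            if (target == Double.class) {
                return number.doubleValue();
            }
        }
        // Array to scalar, scalar to array, superclass to subclass, unrelated classes.
        return null;
    }

    public static void main(String[] args) {
        System.out.println(convert(Byte.valueOf((byte) 7), Long.class));  // 7
        System.out.println(convert(Integer.valueOf(3), Double.class));    // 3.0
        System.out.println(convert(new Integer[] {1}, Integer.class));    // null
    }
}
```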