Hi all, I was recently asked about the possibility to store large amounts of linked data efficiently in DXL.
So in this post we want to take a look at two questions: MotivationA very simple example would be to simply store the contents (or the links) of a module from DXL. Most people would probably use a "Nested Skip" approach. The code could look like this:
// ********** library that caches our objects **********
Skip skObjects = create() // key: AbsNo, value: AttributeSki
void cacheObject(Object o, Skip skAttrs) {
int nr = o."Absolute Number";
// key: Attribute Name, Value: Attribute Value
Skip skValues = createString()
// collect attribute values in skValues
string s; for s in skAttrs do put(skValues, s, o.s "");
// put skip in skip
put(skObjects, nr, skValues)
}
Skip getValues(int nr) {
Skip sk = null;
find(skObjects, nr, sk);
return sk;
}
// ****************************************************
Module m = current
Skip attrList = create();
AttrDef ad;
for ad in m do {
if (ad.object) {
string sName = ad.name;
put(attrList, sName, sName);
}
}
Object o; for o in entire m do cacheObject(o, attrList);
// Get attribute values of object 4
Skip sk = getValues(4)
string s; for s in sk do { print ((key sk) string) ":" s "\n"; }
The Problem of allocations and complexity of nested structures
One can imagine that this becomes very complex, once more complex data is being stored, e.g. store all links of an object. The same is true for storing "class instances", i.e. object like data in DXL. Whenever a large amount of instances will be created, Skip lists are a bad choice for storage. While for more recent DOORS versions the allocations and deallocations are much quicker, the effect of having a large number of allocations is still relevant. You can read about allocations here: or run the attached example script (allocation_problem.dxl), to see the effect. Example ProblemSo imagine you want to store the structure of a complete DOORS project, with all the links inside, so you can make your awesome traceability analysis, without constantly closing and opening modules. So normally you will come up with a class structure like this (classes prefixed with S to avoid name clashes with DOORS types):
class SModule {
// modules have top level objects, and module attributes
SObject[] topLevel;
SAttrValue[] moduleAttributeValues;
}
// Objects have childs, links and attribute values
class SObject {
SObject[] childs;
SLink[] outlinks;
SAttrValue[] attributeValues;
}
// Links have a source object, a target object and link attributes
class SLink {
SObject source;
SObject target;
SAttrValue[] linkAttributes;
}
// an attribute value has an attribute name and an attribute value
class SAttrValue {
string name;
string value;
}
As you can see, with this structure there a two problems: Using classes - how to use arrays to reduce allocations
Most of you should by now know, that we can translate the above class hierarchy to DXL code, by creating custom structs. I already described how to make structs at several places in the forum, e.g. here the first time 2010: So to create a class like this:
class myType {
string StringProperty;
}
instead of doing this:
struct myType {}
DxlObject DxlObjectOf (myType T) { return (addr_ T) DxlObject }
string getStringProperty (myType T) { return ((DxlObjectOf T)->"stprop") string }
void setStringProperty (myType T,string s) { (DxlObjectOf T)->"stprop" = s }
myType create_myType () {
DxlObject x = new()
myType mt = (addr_ x) myType
setStringProperty (mt, "Initial Value")
return mt
}
we will be doing this:
struct myType {}
Array garmyTypeInstances = create(100000, 1) // one column
int gmyTypeInstanceCount = 0;
string getStringProperty (myType T) { return return (get(garmyTypeInstances, (addr_ T) int, 0)) string }
void setStringProperty (myType T,string s) { put(garmyTypeInstances, s, (addr_ T) int, 0) }
myType create_myType () {
myType mt = (addr_ gmyTypeInstanceCount) myType
setStringProperty (mt, "Initial Value")
gmyTypeInstanceCount++
return mt
}
What have we won?
- Instead of allocating one DxlObject per instance, we only allocate 1 (!) array for all instances! So the following template can be used for ALL classes, you just need to replace <<<TYPE NAME>>> by the name of the class:
// atomatic generated interface for class <<<CLASSNAME>>>
int gi<<<CLASSNAME>>>PropertyCount = 20; // Number of properties
Array gar<<<CLASSNAME>>>PropertyMemory = create(10000, gi<<<CLASSNAME>>>PropertyCount );
// The maximum allocated object count
int gi<<<CLASSNAME>>>HandleCount = 0;
// Stack of Free Indices to be reused on next allocation
Skip skFree<<<CLASSNAME>>>Nodes = create();
int giFree<<<CLASSNAME>>>Nodes = 0;
struct <<<CLASSNAME>>> {};
// constructor & destructor for List
<<<CLASSNAME>>> create<<<CLASSNAME>>>_ () {
<<<CLASSNAME>>> x = null
// Do we have a free node to reassign?
if (giFree<<<CLASSNAME>>>Nodes > 0) {
//print "Reallocating Stack Nr " giFree<<<CLASSNAME>>>Nodes "\n"
if (!find(skFree<<<CLASSNAME>>>Nodes, giFree<<<CLASSNAME>>>Nodes--, x)) error "<<<CLASSNAME>>> memory error!"
} else {
// allocate a new node
x = (addr_ (++gi<<<CLASSNAME>>>HandleCount) ) <<<CLASSNAME>>>;
}
//print "Allocated <<<CLASSNAME>>> node " ((addr_ x) int) "\n"
return x
}
void delete<<<CLASSNAME>>>_ (<<<CLASSNAME>>> &t) {
// put in free list nodes
<<<CLASSNAME>>> deref = t
if (null deref) {
print "Trying to delete null <<<CLASSNAME>>> at : " dxlHere()
halt
}
put (skFree<<<CLASSNAME>>>Nodes, ++giFree<<<CLASSNAME>>>Nodes, deref, true)
// print "Freeing <<<CLASSNAME>>> Slot: " (<<<CLASSNAME>>>Handle t) " Value " (strVal t) " in Stack Nr: " (giFree<<<CLASSNAME>>>Nodes-1) "\n"
}
Note the following though:
- When a node is deleted and then later reallocated it will still have the values of the previous allocation, so make sure you properly initialize all values in the real constructur. While this template solves the problem of not creating allocations for class instances we still need to look on how to create properties for the class and how to solve the nesting problem (i.e. storing arrays inside a class instance without creating allocations). Class PropertiesProperties can be added to an array managed class in an easy way (as already shown above):
string getMyProperty ( MyClass x ) { return (get(garMyPropertyPropertyMemory, (addr_ x) int, 3)) string }
void setMyProperty ( MyClass x, string val) { put(garMyPropertyPropertyMemory, val, (addr_ x) int, 3) }
So making this a generic template we can do:
// Simple property of type <<<INDEX>>>
// Getter for <<<PROPERTY NAME>>>
<<<PROPERTY TYPE>>> get<<<PROPERTY NAME>>> ( <<<CLASSNAME>>> x ) { return (get(gar<<<CLASSNAME>>>PropertyMemory, (addr_ x) int, <<<INDEX>>>)) <<<PROPERTY TYPE>>> }
// Setter for <<<PROPERTY NAME>>>
void set<<<PROPERTY NAME>>> ( <<<CLASSNAME>>> x, <<<PROPERTY TYPE>>> val) { put(gar<<<CLASSNAME>>>>PropertyMemory, val, (addr_ x) int, <<<INDEX>>>) }
where Making a "virtual function" template is left as an exercise for the reader (See github tutorial for virtual functions). Solving the problem of array propertiesOk, now lets take a look on how we can implement those array properties efficiently. As already mentioned using a Skip property is not indicated since it will again create one allocation per instance. Therefore what we will do is replace the Skip by our own implementation called "List" which has the same efficient memory management as shown above.
class ListNode {
_x value
ListNode nextNode;
ListNode currentNode; // for iteration over the list
}
Note that we have an "underscore" type parameter in the list, so our list can use any data type. For these kinds of properties the property definition is a bit different (see the attached List.dxl file) We added a currentNode property to our list so we can iterate it later from DXL with an easy syntax. Unfortunately in DXL we cannot implement a custom for loop, for iterating over the list. To implement an iteration protocol for DXL with a nice syntax I propose:
ListNode iterator = iterate(myList);
int value;
while (next(iterator, value)) { print "Value: " value "\n"; }
which is pretty similar to iterating over a skip list:
Skip sk;
int value;
for value in sk do { ...}
Feel free to adapt the example list implementation in the attached file as you which (fast append to the end of the list, etc.). The important stuff is, that now having a List which items do not produce any allocation we are free to implement our classes that will not use any allocation. Applying the approach to the Example ProblemSo assuming that we have made our classes SObject, SModule, SLink and SAttrValue according to the above description we can now easily code our complete project cache script:
Project p = current;
Item I;
Skip skModules = createString();
for I in P do {
if (type I != "Formal") continue;
Module mod = read(fullName I, false);
SModule smod = cacheModule(mod);
put(skModules, fullName sMod, smod);
}
with the cacheModule code looking like this:
void cacheLink(Link L) {
// ... we need some cache here to make sure the link target modules and objects are not created twice ...
// ... left as excercise for the reader ...
}
void cacheObject(Object obj) {
SObject sobj = createSObject();
Object oChild; for oChild in all obj do {
appendNode(getChilds sobj, cacheObject oChild);
}
Link L; for L in all obj->"*" do {
appendNode(getLinks sobj, cacheLink L);
}
AttrDef ad; for ad in module obj do {
if (ad.object) {
string sName = ad.name;
string sValue = obj.sName;
appendNode(getAttributeValues sobj, createSAttrValue(sName, sValue));
}
}
}
void cacheModule(Module mod) {
SModule smod = createSModule();
appendNode(getLinks obj, createListNode(slink))
AttrDef ad; for ad in module obj do {
if (ad.module) {
string sName = ad.name;
string sValue = mod.sName;
appendNode(getAttributeValues smod, createSAttrValue(sName, sValue));
}
}
}
Imagine now the difference in the code if we had used Skips or DXLObjects to store all values. Look how nice and easy the code looks, using our classes. We have complete type safety, so that we will never have a wrong type inside a skip list and get crashes. This is a huge benefit! ConclusionAs we saw, with the above approach we can
- store arbitrary nested data type safe with an easy syntax For all that made it to the end, thanks for reading. Like the post if you read it, so I know there is still DXL programmers interesting in tutorials. As always, have phun, regards, Mathias
P.S: For all of those who need to store tree data and are looking for an optimized Tree implementation look here: Mathias Mamsch - Wed Apr 12 07:36:02 EDT 2017 |