If you create a utility to operate on buffers, there is a hidden danger that I have found a way to circumvent. Consider a useful routine like "Trim". We all know what a really useful "Trim" routine should do: it should "top-and-tail" a buffer to remove leading and trailing whitespace. For example:

    Buffer x = create
    x = " a string with too much whitespace "
    Buffer b = Trim( x )

This is all hunky-dory so far, since the Trim function will be a CONSTRUCTOR of a buffer. We will realise that we later have to delete both 'x' and 'b'. But now a problem arises if we write:

    x = Trim( x )

Why? Because the original content of x will become leaked memory!
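For comparison, the conventional way to avoid this leak without any special utility is to keep a second handle to the original buffer and delete it by hand. The lines below are only a sketch: they assume, as stated elsewhere in this thread, that slicing a Buffer constructs a new Buffer, and the variable name 'old' is purely illustrative.

```dxl
Buffer x = create
x = "Hello, world"

Buffer old = x    // second handle to the original buffer
x = x[0:4]        // assumption from the thread: slicing constructs a new Buffer
delete old        // free the original, which x no longer refers to

print stringOf(x) "\n"
delete x          // free the sliced buffer when finished with it
```

This works, but it relies on every caller remembering to do it, which is exactly the burden the rest of this post tries to remove.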
So to deal with this little problem, I have devised a function-result pointer utility, derived from some code-snippets that Mathias has left lying around the forum. So here we go:

    //------------------------------------------------
    void *FunctionResult(void &ptr, int Parameters)
    //------------------------------------------------
    {
        void *result = *(addr_ ((int addr_ ptr)-4*(1+Parameters)))
        return result
    }

This little beauty will allow you to write something like this:

    Buffer ExampleSlicingRtn( int FirstParam, Buffer &SecondParam )
    {
        void StackMark;
        Buffer &Result = addr_ FunctionResult(StackMark, 2 )
        if (int addr_ Result)==(int addr_ SecondParam) then {
            print "You are overwriting your input with your output!\n"
            print "So I shall delete it first!\n"
            Buffer temp = SecondParam[0:FirstParam]
            delete Result
            return temp
        }
        return SecondParam[0:FirstParam]
    }

You will notice the '2' in the call to FunctionResult. This tells our utility that there are two parameters on the stack above where our function result pointer lies. And the StackMark 'variable', despite being of type 'void', will provide our utility with a starting point to count backwards from. The outcome of all this is that we get a pointer to where the function result will be written by the return statement!

Thus we can use this pointer to find out if it points to the same place as our input buffer parameter! If that is the case, then we know that we are in grave danger of leaking memory if we allow a result to overwrite that result location! Because once that happens, the lost memory can no longer be deleted after this routine returns!

So now we are in a position to write a half-decent trimming function:

    //----------------------
    bool iswhite(char ch)
    //----------------------
    {
        return (ch==' ') || iscntrl(ch)
    }

    //-----------------------
    Buffer Trim(Buffer &s)
    //-----------------------
    // NB: Requires named Buffers only - can't be called with slices or inline create.
    {
        void StackMark;
        Buffer &Result = addr_ FunctionResult(StackMark,1)
        int firstc=0,lastc=length(s)-1
        for (lastc=lastc; lastc>=0; lastc--) if !iswhite(s[lastc]) then break
        for (firstc=firstc; firstc<=lastc; firstc++) if !iswhite(s[firstc]) then break
        if (int addr_ Result)==(int addr_ s) then {
            Buffer temp = s[firstc:lastc]
            delete Result
            return temp
        }
        return s[firstc:lastc]
    }
Voila!
Incidentally, many thanks to Mathias for his little countAllocatedObjects() utility he provided elsewhere on the forum, which helped me check that this all has the desired outcome.
AlexTidmarsh - Tue Nov 18 16:29:28 EST 2014 |
Re: A USEFUL FUNCTION RESULT POINTER

In fact - here is a better version still! This will detect when the result buffer already has data in it, whether or not it is the same as the input buffer! This gives even better memory-leak avoidance!

    //----------------------
    bool unassigned(_ &x)
    //----------------------
    // Unassigned Variable Test: Courtesy of Mathias Mamsch
    {
        int *ptrx = addr_ x
        int unassigned
        int *ptrUnassigned = &unassigned // This will be the value DXL always uses!
        return (*ptrUnassigned) == (*ptrx)
    }

    //-------------------
    bool novalue(_ &x)
    //-------------------
    // Returns true if unassigned or null - gets around the problem that certain
    // unassigned types (such as unassigned buffers) simply cannot be tested for null.
    {
        return unassigned(x) || null(x)
    }

    //-----------------------
    Buffer Trim(Buffer &s)
    //-----------------------
    // NB: Requires named Buffers only - can't be called with slices or inline create.
    {
        void StackMark;
        Buffer &Result = addr_ FunctionResult(StackMark,1)
        int firstc=0,lastc=length(s)-1
        for (lastc=lastc; lastc>=0; lastc--) if !iswhite(s[lastc]) then break
        for (firstc=firstc; firstc<=lastc; firstc++) if !iswhite(s[firstc]) then break
        if !novalue(Result) then {
            Buffer temp = s[firstc:lastc]
            delete Result
            return temp
        }
        return s[firstc:lastc]
    }

So now we can safely do stuff like:

    Buffer x = create
    x = " Text with stuff to trim "
    Buffer b = x[0:27] // Creates a new buffer, because slicing perm is a constructor!
    x = Trim(b)        // original x will be deleted, not leaked!
    b = Trim(b)        // original b will be deleted, not leaked!
    delete x
    delete b

I shall leave it to the reader to further develop this Trim example into other useful buffer utility routines. For example, a "safer" slicer routine such as:

    Buffer slice( int start, end, Buffer &b )

which doesn't cause the problems that b[x:y] can cause!
|
Re: A USEFUL FUNCTION RESULT POINTER (in reply to AlexTidmarsh - Tue Nov 18 17:47:51 EST 2014)
Ok, I needed to read that three times and actually try it, until I finally realized the geniousity / madness of what you are doing ;-) Will need to experiment with that myself, before I comment on it. Just one remark:

    bool novalue(_ &x) {
        return null(x)             // returns true for assigned values ???
        // return null(x) || false // returns false, should return the same as above???
    }
    string s = "a"
    print "NoValue Perm:" (novalue s) "\n" // WTF?

Edit: I just noticed that the call null (refVar) will actually return true if you have an unassigned reference, not an unassigned value. To check for an unassigned value, you would need to dereference, and then you would probably need to decide which null perm to use. |
Re: A USEFUL FUNCTION RESULT POINTER

I really like this idea. One remark about FunctionResult:

    void *result = *(addr_ ((int addr_ ptr)- 4 *(1+Parameters)))

This "4" is probably only valid in 32 bit versions of DOORS. When you port your script to a 64 bit version, you might have to double the value. But when moving to 64 bit, you should in any case very thoroughly test all scripts that mess around with the internal data structures of DOORS. |
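One way to make that dependency visible is to pull the "4" out into a named constant, so there is a single place to change when porting. This is only a sketch of the same routine from the first post: the constant name STACK_SLOT_BYTES is made up for illustration, and the value needed on a 64 bit build is an untested assumption.

```dxl
// Assumption: one stack slot is 4 bytes on 32 bit DOORS.
// A 64 bit build might need 8 here; that has not been verified.
const int STACK_SLOT_BYTES = 4

// Same arithmetic as the FunctionResult routine in the first post,
// with the slot size named instead of hard-coded.
void *FunctionResult(void &ptr, int Parameters)
{
    void *result = *(addr_ ((int addr_ ptr) - STACK_SLOT_BYTES * (1 + Parameters)))
    return result
}
```

Naming the constant does not make the trick portable; it only marks the spot that has to be re-tested on each platform.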
Re: A USEFUL FUNCTION RESULT POINTER (in reply to Mike.Scharnow - Wed Nov 19 07:17:22 EST 2014)

Yes - I was well aware of that. That is always a problem when you go to the native implementation details, which is what this was doing. Did I mention somewhere that I already assumed this would be specific to Windows? I haven't a clue if it works the same way on Linux, but I'd guess that it probably does.

The leap to 64-bit is probably going to break a lot more things than this though. But I'd be very surprised if the 64-bit DXL (if that ever happens) is going to be anything like the same language as the 32-bit. Hopefully by then a lot of this won't be needed, because moving to 64-bit would only be worth doing if at the same time they "repair" the DXL language from the roots up.

Remember that I am only using this trick to prevent "accidentally" leaked memory. It would have been nicer if the interpreter raised an error in the described circumstances, or dealt with the outcome and prevented leaks - since the line:

    x = x[0:5] // where x is a buffer

just seems like such a logical thing to do. Most people will be totally unaware just how much memory they are leaking! |
Re: A USEFUL FUNCTION RESULT POINTER (in reply to Mathias Mamsch - Wed Nov 19 06:45:17 EST 2014)

Yeh - you are right, Mathias, I probably went a little too far by incorporating that unassigned / no-value feature that we both experimented with a while back. But the novalue checker was simply wrapping the test I used in a specific situation into a general purpose routine. I must admit I hadn't fully tested its suitability for such general purpose use when I did that. I take your point that it probably means the "wrong null test" is being used! So really, it will have to be re-done as a lot of type-specific overloads.

P.S. I appreciate the fact that you actually took the time to read my posts three times, Mathias! Much appreciated, as I am given to somewhat madcap ideas at times. :-)

... The rest of this post was deleted, because it was superseded .... |
Re: A USEFUL FUNCTION RESULT POINTER

Incidentally, folks. I did get some weird behaviour when I tried the FunctionResult routine's trick on an overload of "new". I suspect that when the DXL interpreter encounters "new", it applies some special semantics to it - a little bit like the general constructor implementation in Delphi does.

The only "built-in" implementation (perm) for new appears to be the one to create a DxlObject. Interestingly, it uniquely has an avoidance strategy that appears to prevent memory loss if you reapply it to an existing DxlObject. The create constructors for all the other complex types do not have this.

Anyways, the result of this is that if you try to overload new, then the FunctionResult routine's trick does not appear to work properly. I haven't discovered exactly why yet. An overloaded new routine will work in itself, but it appears that the DXL interpreter is doing things in a strange way - perhaps starting to apply the semantics of a DxlObject new and then patching to compensate for the fact that it has recognised that it wasn't the DxlObject new after all. Certainly, an overload of new does not appear to do the same trick of checking whether it has to first delete the target, whereas this does appear to happen in the original DxlObject new.

I have a suspicion that the DXL designers changed paradigms somewhere between "create" as a universally applied constructor concept and the concept behind the DxlObject new constructor. This of course also happened with Borland, when they went from the Object paradigm to the Class paradigm. The result being that the pre-call stack is built somewhat differently for class constructors compared to other routines.
|
Re: A USEFUL FUNCTION RESULT POINTER (in reply to Mathias Mamsch - Wed Nov 19 06:45:17 EST 2014)

Your comment on "novalue" explains why my set of conditional destructors didn't work! That's why you are a genius. I had created a set of routines that are all called "free" that would be used like the universal "FreeAndNil" in Delphi. Some were not working because they called "novalue". I can now fix this by creating type-specific overloads for novalue - or by doing the work explicitly in each "free" overload. So thanks for spotting that one, Mathias. |
Re: A USEFUL FUNCTION RESULT POINTER (in reply to Mathias Mamsch - Wed Nov 19 06:45:17 EST 2014)

I deleted this post because it is superseded.... |
Re: A USEFUL FUNCTION RESULT POINTER (in reply to AlexTidmarsh - Thu Nov 20 13:37:00 EST 2014)

Ok - sanity check here! :-) Just realised how many types there actually are in DXL! So I did a little bit more work on finding out what the EXCEPTIONS are to the general solution. It turns out that this is more realistic than creating a novalue test for every type! There are FAR fewer exceptions than there are types!

So here's the latest novalue test, upon which I did a C-UNIT style test harness for the following types: int, real, bool, char, string, Skip, Buffer, DxlObject, Array, struct types, AttrDef, AttrType, Baseline, BaselineSet, BaselineSetDefinition, Column, Group, IntegrityResultsData, IPC, Item, Folder, Project, Link, LinkRef, Linkset, LockList, Lock, OleAutoArgs, DBE, PartitionDefinition, Regexp, Stat, Trigger, User, ViewDef, Object, Discussion, Stream, Module, Template and finally ModName_.

The great thing about C-Unit style testing (look up "Extreme Programming") is that once you have created a good harness, you pretty much copy and paste it for all your scenarios - so it was fairly quick to do. Anyways...

    //---------------
    // NO-VALUE TESTS
    //---------------
    // Note: For the three simplest types, null is zero or false, which ARE values, thus novalue is just a renamed "unassigned" call!
    // For strings and skips, special work is required to detect nulls, so special novalue overloads are present.
    // For char, I have opted to treat '\0' as novalue: this is useful in string termination tests.
    // In all other cases, the generic novalue routines do the job!
    // All pointers work generically - but note that 'unassigned' refers to the POINTER, not the value pointed to.
    // Furthermore, since DXL does NOT ALLOW assignment of 'null' to a pointer, there's no point in looking for it!
    // To test a value POINTED to, pass the dereferenced value: i.e. novalue(*x).
    // I have tested all routines with virtually all available types.
    // Novalue cannot be called directly on function results (disallowed by DXL)
    //
    const int UNASSIGNED

    // Generic solution for variables of all types bar the special cases later below.
    bool novalue(_&x)  {int*px=addr_ x;int*pCompare=&UNASSIGNED;return*pCompare==*px||addr_ x==0} // VARIABLES
    bool novalue(_*&x) {int*px=addr_ x;int*pCompare=&UNASSIGNED;return*pCompare==*px}             // POINTERS

    // Special cases
    bool novalue(int &x)   {int*px =addr_ x;int*pCompare=&UNASSIGNED;return*pCompare==*px}
    bool novalue(real &x)  {int*px =addr_ x;int*pCompare=&UNASSIGNED;return*pCompare==*px}
    bool novalue(bool &x)  {int*px =addr_ x;int*pCompare=&UNASSIGNED;return*pCompare==*px}
    bool novalue(char &x)  {int*px =addr_ x;int*pCompare=&UNASSIGNED;return*pCompare==*px||x==null}
    bool novalue(string &x){int*px =addr_ x;int*pCompare=&UNASSIGNED;return*pCompare==*px||null x}
    bool novalue(Skip &x)  {int*px =addr_ x;int*pCompare=&UNASSIGNED;bool IsNull(Skip x){return null x};return*pCompare==*px||IsNull x}

    //-----------------
    // UNASSIGNED TESTS
    //-----------------
    // Generic solutions work here, because this does not include a check for null!
    // That said, for many types novalue and unassigned are exactly the same thing!
    // For types that require deletion to avoid memory leak, novalue is actually BETTER, because 'null' NEVER requires deletion!
    // Indeed, for certain types, attempting to delete a value set to 'null' will crash DOORS!
    // Unassigned cannot be called directly on function results (disallowed by DXL)
    //
    bool unassigned(_&x)  {int*px=addr_ x;int*pCompare=&UNASSIGNED;return*pCompare==*px}
    bool unassigned(_*&x) {int*px=addr_ x;int*pCompare=&UNASSIGNED;return*pCompare==*px}

Note that the case for Skip really did require some careful handling (an embedded helper routine, because testing null on a reference Skip doesn't work, and simply testing for zero doesn't either). As Mathias predicted, string also needed a special-to-type routine.

Of the other simple types, I opted to recognize the special status of the null char, but the other simple types are not actually checked for 'null' because this is a meaningful zero in these cases. So for those, I'm only testing for the unassigned value. I simply haven't found any other types in DXL that cannot be handled by the generic routine (which must be defined first, once, and never again).

ModName_ was a (pleasant) surprise: I had always thought of it as some kind of special string - but it turns out it handles generically! So I guess that kinda answers the unasked question of whether ModName_ would in itself pollute the string table. I suspect if you load ModName_ from a buffer via tempStringOf, you would get zero pollution!

The Item type is also peculiar. Trying to assign a null to it will raise a DXL error. Suppress this with noError, and you will STILL get unassigned!
As I have mentioned (tirelessly) before, this kind of utility is best presented as a "perm" by pre-loading the DXL via the -C switch when starting DOORS. That way, no-one will get screwed by DXL behavioural bugs that occur when stuff like this is accidentally included more than once in random sequence. To help prevent this, the UNASSIGNED constant is quite deliberately a global constant - which should ensure you are reminded about this!
P.S. As Jeremy Clarkson (UK Top Gear presenter) would say: "For those of you who like to know this sort of thing..." the speed of the novalue test is about 5usec on my AMD FX 6-core 3.5GHz machine. This falls to about 2usec for the simple types. An empty for loop is just 0.1uSec. So it pretty much falls within the realm of "not worth optimising". (Figures obtained from 1 million iterations)
I've finished the "extreme programming" tests - for those who may be interested, the CodeSite log output is enclosed as a PDF. Attachment: Pointer Utils.pdf |
Re: A USEFUL FUNCTION RESULT POINTER I only browsed this thread. Instead of this:
What about this:
-Louie |
Re: A USEFUL FUNCTION RESULT POINTER (in reply to llandale - Sun Nov 23 16:23:33 EST 2014)

I kinda see what you are getting at in your pseudo code (I think). You appear to be suggesting using a combination of setempty and combine to work around accidentally destroying a pre-existing buffer (and so leaking its memory), if (and I'm guessing here) that buffer was passed as a reference? Or did you mean to be using my special function result pointer and merely omitted that in your pseudo-code by accident? In which case I'm guessing that your point is actually that setempty and combine are probably faster than deleting the function's result target variable. I must admit, I hadn't considered that. So thanks either way, Louie!
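Since Louie's code did not survive in this thread, the following is only one possible reading of the setempty / combine idea, not his actual suggestion: trim the caller's buffer in place, so no new Buffer is ever handed back and nothing can be overwritten. The name TrimInPlace is hypothetical; iswhite is the helper from the first post, repeated so the sketch stands alone.

```dxl
// Helper from the first post, repeated here so the sketch is self-contained.
bool iswhite(char ch) { return (ch == ' ') || iscntrl(ch) }

// Hypothetical name: rewrites the contents of s rather than returning a new Buffer.
void TrimInPlace(Buffer s) {
    int firstc = 0, lastc = length(s) - 1
    for (lastc = lastc; lastc >= 0; lastc--) {
        if (!iswhite(s[lastc])) break
    }
    for (firstc = firstc; firstc <= lastc; firstc++) {
        if (!iswhite(s[firstc])) break
    }

    Buffer temp = create
    if (firstc <= lastc) combine(temp, s, firstc, lastc)  // copy the kept range
    setempty(s)                                           // reuse the caller's buffer
    if (length(temp) > 0) combine(s, temp, 0)             // copy the kept text back
    delete temp                                           // the only temporary is freed here
}

Buffer x = create
x = "  Text with stuff to trim  "
TrimInPlace(x)
print "[" stringOf(x) "]\n"
delete x
```

The trade-off is that the call no longer reads as x = Trim(x), and the input is always modified, so a caller who wants to keep the untrimmed text has to copy it first.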
|
Re: A USEFUL FUNCTION RESULT POINTER

As I said, I need to think about this ... I did now. Let me try to state precisely and clearly what this is about and what the advantages and risks are. The intriguing/new/mad thing here is that a function can know about the variable it is assigned to, without that variable being passed to the function. Let's take an example:

    int function () {
        void StackMark;
        int &var = addr_ FunctionResult(StackMark, 0 )
        print "The previous value was: " var " but now I am assigning 10 to it!\n"
        return 10
    }
    int a = 15;
    a = function () // -> The previous value was: 15 but now I am assigning 10 to it!

    // ...
    int a = 15;
    a = 1 + function () // crash, boom, bang .. Exception
    // ...
    // or just:
    function () // unassigned variable result ...

    void PrintStack (int &varStart, int &varEnd) {
        int *ptr = &varStart;
        print "\n"
        while ( ((addr_ ptr) int) <= ((addr_ varEnd) int) ) {
            print ptr " / " (*ptr) "\n"
            ptr += 4
        }
    }
    int a = 15;
    int function () { int local = 99; PrintStack (a, local); return 15; }

    a = function()
    /* Stack looks like this:
       ------------------------------
       178774784 / 15           -> a
       178774788 / -2023406815  -> (not assigned)
       178774792 / 178774784    -> &a
       178774796 / 99           -> local

       Operations Executed (Pseudocode):
       ---------------------------------
       push &a
       call function()  // will put the return value on the stack!
       call ::=         // will pop the value and the reference from the stack and assign
    */

    function()
    /* Stack looks like this:
       ------------------------------
       178774784 / 15           -> a
       178774788 / -2023406815  -> (not assigned)
       178774792 / 99           -> local

       Operations Executed (Pseudocode):
       ---------------------------------
       call function()  // will put the return value on the stack!
       pop              // cleanup the return value from the stack ...
    */

    a = 1+function ()
    /* Stack looks like this:
       ------------------------------
       178774784 / 15           -> a
       178774788 / -2023406815  -> (not assigned)
       178774792 / 178774784    -> result of expression = &a
       178774796 / 1            -> expression
       178774800 / 99           -> local

       Operations Executed (Pseudocode):
       ---------------------------------
       push &a
       push 1
       call function()  // will put the return value on the stack!
       call ::+         // will pop two values from the stack, add and push the result
       call ::=         // will assign the result value to &a
    */

So DXL works like most other languages:
Now there comes the problem:
So there is no 'parser logic' inside the assignment. And that means that the stack is entirely controlled by the caller, and there is no way for a function to know how the stack outside the function looks.

So my conclusion: While it is a nice trick for a function to get access to the 'assignee', it is not stable to do so. For me it is much harder to understand the concept of the function result reference than it is to understand that I need to deallocate the result of certain functions in my library. In DXL it is, for me, kind of a known catch that for a function that has a specification like this:

    Buffer trim (Buffer) { ... }

I need to check if it will actually return the same buffer or a new one. There are examples in the native DOORS code for both. And I think most developers are aware of this, so the problem of creating memory leaks is a known issue.

The best way I have found so far for finding and eliminating memory leaks is to use the allocated objects code, in conjunction with a technique of "hooking" ... To check for Buffer "substring" leaks, I would do something like this:

    Buffer indexOld(Buffer x, int iFrom, int iTo) {
        Buffer bufResult = x[iFrom:iTo];
        return bufResult;
    }

    Buffer ::[](Buffer b, Range_ x) {
        // call indexOld and pass parameters ...
        // (pseudocode: bufResult below stands for the buffer returned by indexOld)
        indexOld (b, x) // beware of stack overwrite bug!
        // This function stores the DXL location of the assignment using dxlHere()
        // and the memory address of the buffer ...
        addAllocatedObject ((addr_ bufResult) int, "Buffer");
        return bufResult;
    }

    void deleteBufferOld (Buffer &buf) { delete buf }

    void delete (Buffer& buf) {
        Buffer x = buf;
        removeAllocatedObject ((addr_ x) int, "Buffer");
        deleteBufferOld (buf)
    }

So these functions will now track any substring buffer allocations and deallocations. After my program finishes (when I assume all variables should be cleaned up again), I can inspect which variables are left allocated and where they have been allocated. This way I can immediately find any memory leak in my code.
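The addAllocatedObject and removeAllocatedObject helpers are not shown in this post. As a rough illustration of what such a registry could look like, here is a minimal sketch built on a Skip list keyed by address. The bodies, the global name allocatedObjects and the reportAllocatedObjects routine are all assumptions for illustration; the real utility also records the DXL location via dxlHere(), which is left out here.

```dxl
// Hypothetical registry: key = address of the object, value = type name.
Skip allocatedObjects = create

// Record an allocation (the real version would also store the call location).
void addAllocatedObject(int address, string typeName) {
    put(allocatedObjects, address, typeName)
}

// Forget an allocation once the object has been deleted.
void removeAllocatedObject(int address, string typeName) {
    delete(allocatedObjects, address)
}

// Anything still registered at the end of the run was never deleted.
void reportAllocatedObjects() {
    string typeName
    for typeName in allocatedObjects do {
        int address = (int key allocatedObjects)
        print typeName " still allocated at " address "\n"
    }
}
```

Calling reportAllocatedObjects() at the end of a script then lists whatever was created but never deleted.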
I think this approach is more successful, than trying to avoid memory leaks beforehand:
Hope this helps someone, regards, Mathias |
Re: A USEFUL FUNCTION RESULT POINTER (in reply to Mathias Mamsch - Mon Nov 24 08:39:42 EST 2014)

Wow! Excellent study! I shall have a long think about this.
Regards, Alex |
Re: A USEFUL FUNCTION RESULT POINTER Mathias Mamsch - Mon Nov 24 08:39:42 EST 2014 As I said, I need to think about this ... I did now. Let me try to state precise and clearly, what this is about and what the advantages and risks are. The intriguing/new/mad thing here is, that a function can know about a variable it is assigned to, without that variable being passed to the function. Lets take an example: int function () { void StackMark; int &var = addr_ FunctionResult(StackMark, 0 ) print "The previous value " var " but now I am assigning 10 to it!\n" return 10 } int a = 15; a = function () // -> The previous value was: 15 but now I am assigning 10 to it!
// ... int a = 15; a = 1 + function () // crash, boom, bang .. Exception // ... // or just: function () // unassigned variable result ...
void PrintStack (int &varStart, int &varEnd) { int *ptr = &varStart; print "\n" while ( ((addr_ ptr) int) <= ((addr_ varEnd) int) ) { print ptr " / " (*ptr) "\n" ptr += 4 } } int a = 15; int function () { int local = 99; PrintStack (a, local); return 15; } a = function() /* Stack looks like this: ------------------------------ 178774784 / 15 -> a 178774788 / -2023406815 -> (not assigned) 178774792 / 178774784 -> &a 178774796 / 99 -> local Operations Executed (Pseudocode): --------------------------------- push &a call function() // will put the return value on the stack! call ::= // will pop the value and the reference from the stack and assign */ function() /* Stack looks like this: ------------------------------ 178774784 / 15 -> a 178774788 / -2023406815 -> (not assigned) 178774792 / 99 -> local Operations Executed (Pseudocode): --------------------------------- call function() // will put the return value on the stack! pop // cleanup the return value from the stack ... */ a = 1+function () /* Stack looks like this: ------------------------------ 178774784 / 15 -> a 178774788 / -2023406815 -> (not assigned) 178774792 / 178774784 -> result of expression = &a 178774796 / 1 -> expression 178774800 / 99 -> local Operations Executed (Pseudocode): --------------------------------- push &a push 1 call function() // will put the return value on the stack! call ::+ // will pop two values from the stack, add and push the result call ::= // will assign the result value to &a */ So DXL works like most other languages:
Now there comes the problem:
So there is no 'parser logic' inside the assignment. And that means that the stack is entirely controlled by the caller, and there is no way for a function to know what the stack outside the function looks like. So my conclusion: while it is a nice trick for a function to get access to the 'assignee', it is not stable to do so. For me, the concept of the function result reference is much harder to understand than the fact that I need to deallocate the result of certain functions in my library. In DXL it is, for me, a known catch that for a function with a specification like this:

Buffer trim (Buffer) { ... }

I need to check whether it will actually return the same buffer or a new one. There are examples of both in the native DOORS code. I think most developers are aware of this, so the problem of creating memory leaks is a known issue. The best way I have found so far for finding and eliminating memory leaks is to use the allocated objects code, in conjunction with a technique of "hooking" ... To check for Buffer "substring" leaks, I would do something like this:

Buffer indexOld(Buffer x, int iFrom, int iTo) {
    Buffer bufResult = x[iFrom:iTo];
    return bufResult;
}

Buffer ::[](Buffer b, Range_ x) {
    // call indexOld and pass parameters ...
    indexOld (b, x) // beware of stack overwrite bug!
    // This function stores the DXL location of the assignment using dxlHere()
    // and the memory address of the buffer ...
    addAllocatedObject ((addr_ bufResult) int, "Buffer");
    return bufResult;
}

void deleteBufferOld (Buffer &buf) { delete buf }

void delete (Buffer& buf) {
    Buffer x = buf;
    removeAllocatedObject ((addr_ x) int, "Buffer");
    deleteBufferOld (buf)
}

These functions will now track any substring buffer allocations and deallocations. After my program finishes (when I assume all variables should be cleaned up again), I can inspect which variables are left allocated and where they were allocated. This way I can immediately find any memory leak in my code.
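The bookkeeping behind this hooking idea can be modelled outside DXL. What follows is only a rough Python sketch, not DXL and not the actual allocated objects code: the class `AllocationTracker`, its methods, the `Buffer` stand-in, the `where` strings (playing the part of dxlHere()) and the fake addresses are all assumptions made up for illustration.

```python
import itertools


class AllocationTracker:
    """Toy registry of live objects, keyed by a (fake) address."""

    def __init__(self):
        self._live = {}  # address -> (type name, place of allocation)

    def add_allocated_object(self, address, type_name, where):
        self._live[address] = (type_name, where)

    def remove_allocated_object(self, address, type_name):
        self._live.pop(address, None)

    def leaks(self):
        # Whatever is still registered was never deleted.
        return dict(self._live)


tracker = AllocationTracker()
_addresses = itertools.count(1000, 4)  # invented addresses, 4 apart


class Buffer:
    """Stand-in for a DXL Buffer whose creation is hooked."""

    def __init__(self, text, where):
        self.text = text
        self.address = next(_addresses)
        tracker.add_allocated_object(self.address, "Buffer", where)

    def substring(self, start, stop, where):
        # Hooked "substring": always returns a new, tracked Buffer.
        return Buffer(self.text[start:stop], where)


def delete(buf):
    # Hooked delete: unregister the object before releasing it.
    tracker.remove_allocated_object(buf.address, "Buffer")


x = Buffer("  padded  ", "line 1")
b = x.substring(2, 8, "line 2")
x = x.substring(2, 8, "line 3")  # the Buffer from line 1 is now unreachable
delete(b)
delete(x)

remaining = tracker.leaks()
print(remaining)  # {1000: ('Buffer', 'line 1')}
```

The one entry left over is exactly the x = Trim( x ) style leak from the start of the thread, and the report says where the lost buffer was created.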
I think this approach is more successful than trying to avoid memory leaks beforehand:
Hope this helps someone, regards, Mathias

Having had more time (and some new toolkit routines) to study this stuff regarding stacks, I now understand somewhat more why the trick worked for specific cases, but simply cannot be used "universally".
I should warn any reader that the following contains a mix of "seen" behaviour and some lashings of conjecture about what this all means. In the end, only the IBM technicians who actually maintain the DXL interpreter could state whether it is accurate. That, however, is probably a closely held trade secret - to avoid third parties creating free replacements, etc. (i.e. competition).

So.... I reckon that the "stack" in DXL is not really a stack in the "traditional" microprocessor sense. For instance: it does not hold subroutine return addresses. It only holds the data used by what is in effect a (presumably reverse-Polish) stack-based calculator. It's a bit like that used in that old language "Forth". The routines expected to do the work on that data probably have their return addresses placed on a separate stack (invisible and inaccessible to the DXL programmer). You will NEVER see these. You will NEVER be able to access or manipulate them. Indeed it is highly probable (although not necessarily so) that this is the "real" machine stack.

Indeed, any information about "memory addresses" as seen in DXL data is probably a white lie: these are merely offsets into frames created by the interpreter to hold things like string tables, stacks, etc. The addresses have no need to (and thus most probably don't) relate to actual physical machine memory addresses. The address is (after all) something the DXL interpreter "lets you know" via one of its implementing kernel routines. You could not (for example) pass that to an external C DLL and expect that DLL to correlate it to the physical machine. It does not have to have any meaning in that outside world, and therefore most likely does not. This explains, somewhat, why DXL does not implement the direct calling of functions via pointers - even though it still implements the C-based parsing to allow the creation of function pointer variables. But you can't use them!
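To make the stack-calculator picture a bit more concrete, here is a small Python model (not DXL) of the caller-side operations from Mathias's pseudocode. It is a sketch under my own assumptions: the operation names are copied from his listing, the tuples are invented labels for stack cells, and `stack_inside_function` and `slot_below_locals` are hypothetical helpers. All it shows is that the cell sitting just below the callee's first local depends entirely on what the caller pushed.

```python
GLOBALS = [("a", 15), ("unassigned", None)]  # cells already on the stack


def stack_inside_function(caller_ops):
    """Replay the caller's ops and return the stack as the callee sees it."""
    stack = list(GLOBALS)
    for op in caller_ops:
        if op == "push &a":
            stack.append(("ref", "a"))
        elif op == "push 1":
            stack.append(("int", 1))
        elif op == "call function":
            stack.append(("local", 99))  # the callee's first local
            return stack                 # later ops run after the callee
    raise ValueError("function was never called")


def slot_below_locals(stack):
    """The cell a callee finds by counting back one slot from its local."""
    return stack[-2]


assignment = ["push &a", "call function", "call ::="]
bare_call = ["call function", "pop"]
expression = ["push &a", "push 1", "call function", "call ::+", "call ::="]

for name, ops in (("a = function()", assignment),
                  ("function()", bare_call),
                  ("a = 1 + function()", expression)):
    print(name, "->", slot_below_locals(stack_inside_function(ops)))
# a = function() -> ('ref', 'a')
# function() -> ('unassigned', None)
# a = 1 + function() -> ('int', 1)
```

Only the first call shape leaves a reference to a where the trick expects it. In the other two the callee would be treating an unassigned cell or the operand 1 as a pointer, which fits the failures reported earlier.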
I note with interest in this regard the work done by Mathias and others in creating function aliases, which have to rely on subroutine parameter definitions to create "callable" data! This parameter-only mechanism was obviously created specifically to allow call-backs for dialogs, but made highly restrictive nonetheless. Again, this must mainly be because the virtual machine implemented by the DXL interpreter lacks the kind of memory map that would allow direct application of the original C-style function pointers (amongst other things). There is not "one memory" in that sense, and allowing "full C behaviour" simply is not feasible in that kind of scenario.

Indeed, it seems highly likely to me that the memory frames created for menu and view code, and indeed any code not in the "top context", are basically discarded in full when that particular context ends. Thus a lot of the worry about "lost memory" should be held in this light: it only becomes a serious issue if that memory is "taken" from the top-level frame - which cannot be discarded until DOORS ends. The "eval_" operation is a prime example of such a discardable context. I strongly suspect that menu code and view code have such contexts created for them when they "run", which are discarded the very moment that they "halt". I would go even further and suggest that attribute DXL cannot see data from other attribute DXL simply because their code does not execute in the same "time and space". When they finish, their frames are simply dumped.

It would even explain the rather lazy and "unrecoverable" way in which memory is consumed (by strings, floating point variables, etc.) - the intent is always to "ditch the lot" as soon as the context ends! How else would you explain the total lack of effort to make identical string literals share the same memory location? It is because, in the end, it really doesn't matter - the whole lot will get dumped.
One thing is however clear: you are NEVER executing native code until you get the DXL interpreter to do that "by proxy" on your behalf. The DXL does not compile down at any point to machine code. It is merely a large data set of frames containing lists that is manipulated by core routines in the DXL interpreter. Where I'm going with this is that the "virtual machine" that the DXL code runs on literally has nothing whatsoever to do with the underlying processor architecture. This should not surprise us: the DXL is not "compiled" as such - it is merely turned into intermediate code specific to its own-design VM. This allows it to run on any target system that supports that VM in an essentially identical manner. That is VERY desirable. And of course, this means that I (and others) can stop worrying about the following, when it comes to the essential "internal" DXL code:

Will the code run differently under Linux or Windows? Nope! (Except where it requests external stuff from the OS)
Will it run differently on 64-bit and 32-bit machines? Nope!
Is it likely to change between 32-bit and 64-bit DOORS? Probably not.
Will data always be stored and manipulated in 4-byte units? Highly likely.
Will DXL change fundamentally before end-of-life? Very unlikely.
P.S. The likes of Mathias and Louie will probably read all this and think "duh! what's new?". Surely all this was obvious? However, I do wonder whether a lot of the concerns about memory leaks are sometimes a bit "overboard". The behaviour I saw regarding willy-nilly duplication of identical string literals that the DXL interpreter does would only make any sense if the intention is to dump the stuff at the first opportunity. This would mean that it only becomes relevant to code that really does have to process the entire content of whole modules without ever exiting its execution context. This is not the case for the vast majority of attribute DXL. It would surely only be relevant for menu-driven tools that do massive amounts of pan-module work?
|
Incidentally, Mathias? I have now bottomed out the control of the behaviour of the novalue routine.
What was needed to make it work "universally" was to essentially stop the DXL interpreter from trying to apply heuristics when it encountered the use of "addr_". The "undocumented" but "widely understood" behaviour of this perm is (it appears) a bit of a lie. What I did was constrain its use inside a strongly-typed pointer mechanism (which I based on a struct called DWORD, and thus had a strongly-typed DWORD * pointer as a result). This allowed me to implement constrained math overloads that work exclusively with the type DWORD* and thus not get hindered by the heuristics that DXL tries to apply between integer pointers and integers, courtesy of those addr_, * and & perms. This gave a level playing field. The primary outcome of this is a small set of routines that truly allow access to data and addresses without being "confused" by interpreter interventions. As evidence, I provide the following subset:

// THE STRONG POINTER TYPE
//------------------------
struct DWORD {}
DWORD*AddressOf(_&v){return addr_ v} // Address of values
DWORD*AddressOf(_*&v){return addr_ v} // Address of pointers

// STRONGLY TYPED POINTER ARITHMETIC
//----------------------------------
DWORD*::+(DWORD*p,int ofs){return addr_(int addr_(*p)+4*ofs)} // Add offset (named and anonymous)
DWORD*::-(DWORD*p,int ofs){return addr_(int addr_(*p)-4*ofs)} // Sub offset (named and anonymous)
void ::++ (DWORD*&p) {p=addr_(int addr_(*p)+4)} // Post-increment (named only)
DWORD*::++ (DWORD*&p) {p=addr_(int addr_(*p)+4);return addr_(*p)} // Post-increment (named only)
void ::-- (DWORD*&p) {p=addr_(int addr_(*p)-4)} // Post-decrement (named only)
DWORD*::-- (DWORD*&p) {p=addr_(int addr_(*p)-4);return addr_(*p)} // Post-decrement (named only)
void ::_++ (DWORD*&p) {p=addr_(int addr_(*p)+4)} // Pre-increment (named only)
DWORD*::_++(DWORD*&p) {p=addr_(int addr_(*p)+4);return addr_(*p)} // Pre-increment (named only)
void ::_-- (DWORD*&p) {p=addr_(int addr_(*p)-4)} // Pre-decrement (named only)
DWORD*::_--(DWORD*&p) {p=addr_(int addr_(*p)-4);return addr_(*p)} // Pre-decrement (named only)
void ::+= (DWORD*&p,int ofs){p=addr_(int addr_(*p)+4*ofs)} // Direct add
void ::-= (DWORD*&p,int ofs){p=addr_(int addr_(*p)-4*ofs)} // Direct sub

// UNCHECKED CONVERSION
//---------------------
int dwordIn(DWORD *x){int*px=addr_ x;return (*px)+0}
DWORD dwordIn(DWORD *x){DWORD*px=addr_ x;DWORD v=(*x);return (v)}
int dword(_ &x){int*px=addr_ x;return (*px)+0}

// ADDRESS IN ...
//---------------
_*AddressIn(_*&p){return addr_ (**&p)}
_**AddressIn(_**&p){return addr_ (**&p)}
_***AddressIn(_***&p){return addr_ (**&p)}

// WORKING NOVALUE ROUTINE
//------------------------
// Note - "null" is meaningless for "int" and "bool", but still detected.
// Null is never detected for string or real, due to indirection.
// Nonetheless, this works for ALL OTHER TYPES (about 40 tested, including
// the obvious ones like, Buffer, Skip, Array, char, Date, Module, Object)
const int UNASSIGNED
const int *pUNASSIGNED=&UNASSIGNED
bool novalue(_&x){DWORD*px=AddressOf x;return*pUNASSIGNED==dwordIn(px)||0==dwordIn(px)}

// Check
int x
print novalue(x)

You will notice that the use of "AddressOf" and "dwordIn" in the "novalue" routine ought to be easily replaceable by the code in those routines. HOWEVER: the moment you do this, you have to resort to using at least two addr_ executions. Whichever way you do this, you start a running battle with the interpreter, because it fails to consistently adjust the behaviour of addr_ according to the universal type and so gets it wrong at least some, if not all, of the time. It most certainly never behaves coherently for all types. By hiding the use of addr_ behind my strongly typed "firewall" routines "AddressOf" and "dwordIn" I essentially stop this annoying behaviour by the DXL interpreter. The result is, it now behaves as it says on the tin. I did about 200 repeatable C-UNIT style tests covering the 40-odd types in different scenarios to verify this.
I could only get it to work consistently once I introduced the "firewall" routines! It really is the case that "null" for all but two of those types is represented by storing ZERO in the variable. The only two that are actually different are "real" and "string". In the case of both of these, the ADDRESS of a memory location is always stored rather than the value - even when that value is null. However, ALL variables, including real and string, store the hex value 87654321h directly into the variable when that variable is unassigned. All variables, except those of real and string, store null directly in the variable, even when the normal practice for some (most!) types is to store an implicit pointer to any non-null value. The problem with addr_ is that it tries to hide the indirection for the majority of variable types. So the only way you can really get actual pointer processing is to hide the use of addr_ behind routines like "AddressOf" and "dwordIn". i.e. behind the firewall of a strong type.
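The two raw values that novalue looks for can be illustrated with a short Python model (again not DXL). This is a sketch with assumptions of my own: memory is a plain byte string, little-endian byte order is assumed, and `as_signed32`, `dword_at` and this Python `novalue` are hypothetical helpers that only mimic the check "raw dword is 87654321h or zero".

```python
import struct

UNASSIGNED = 0x87654321  # marker value quoted in the post


def as_signed32(value):
    """Reinterpret an unsigned 32-bit pattern as a signed integer."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


def dword_at(memory, address):
    """Read one 4-byte unit at a byte offset (little-endian assumed)."""
    return struct.unpack_from("<I", memory, address)[0]


def novalue(memory, address):
    """True if the cell holds the unassigned marker or a zero 'null'."""
    raw = dword_at(memory, address)
    return raw == UNASSIGNED or raw == 0


# Four consecutive cells: an int 15, an unassigned cell, a null, an int 99.
memory = struct.pack("<4I", 15, UNASSIGNED, 0, 99)

print(as_signed32(UNASSIGNED))                      # -2023406815
print([novalue(memory, 4 * i) for i in range(4)])   # [False, True, True, False]
```

The first printed number is the same -2023406815 that shows up against "(not assigned)" in the stack dumps earlier in the thread.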
P.S. I also have a rather nice memory dumper that uses the above firewall type. This allows you to dump memory a little bit like most debuggers do: as lines of hex dwords, wrapped at a specific address increment. Putting marker data into variables (like DEADBEEFh) lets you easily see what is really going on.
P.P.S. I can almost hear you asking: "What is this lie about addr_?". In a nutshell, the values of most types are stored as implicit DWORD pointers to variables. However, some are not (int, char, bool). This is why EVERYTHING on the stack is always "entirely" stored in DWORD units. The types that can't be stored in a DWORD (which includes the QWORD real type) are stored using implicit pointers. When you use addr_, it tries to take this indirection into account. However, when you have a routine using a universal type, addr_ can no longer work properly. There is no type information stored for it to do this. The "lie" is that addr_ simply applies untyped conversion - but this does not always mean what you think it might. Also, this adverse behaviour is compounded by anything you try to do in a return statement that includes a dereference - for instance: return addr_ (*x). This will nearly always go wrong! I only "get away with it" in my "AddressIn" routines because I also have an "&" symbol in the mix and addr_ is truly aware of what type it started out with - in this case: what is to all intents and purposes the equivalent of a void* by the time addr_ gets its grubby little hands on it. So it ends up "doing the right thing".
|
Incidentally, the realisation that "real" does not store "UNASSIGNED" into the same place as it normally stores its working values, .... (deep breath) ...has made me realise that DXL can do coherent math after all!
real x = (-5 * 113 * 3581251)
// -> -2023406815.000000

You just have to make sure that you only do "real" math (involving full number ranges) using the real type! (How convenient is that aide-mémoire? Only do real math using real!) All that's now needed is a routine that will print that as an integer without actually using an integer! :-)
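For anyone wondering why that particular product was picked, a few lines of Python (not DXL) show the arithmetic. This is just a check of the numbers under the assumption that a DXL int is a 32-bit two's-complement value; the variable names are mine.

```python
product = -5 * 113 * 3581251

# The same value seen as a raw 32-bit two's-complement bit pattern.
pattern = product & 0xFFFFFFFF

# A floating point value holds the number exactly, printed with 6 decimals.
as_real = "%f" % float(product)

print(product)       # -2023406815
print(hex(pattern))  # 0x87654321
print(as_real)       # -2023406815.000000
```

So the product is exactly the integer whose bit pattern is the 87654321h marker, which is presumably why an int holding it cannot be told apart from an unassigned variable, while a real holding it can.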
|
Re: A USEFUL FUNCTION RESULT POINTER AlexTidmarsh - Thu Nov 27 15:15:42 EST 2014 Having had more time (and some new toolkit routines) to study this stuff regarding stacks, I now understand somewhat more why the trick worked for specific cases, but simply cannot be used "universally".
Hi Alex! On the relevance of memory leaks: yes, you're right, this topic is highly relevant if you write scripts that process a lot of modules, like traceability matrices, data model consistency checkers, module comparison scripts and so on. In these scripts you always stay in the context of the "main" DXL program, and every Buffer you don't get rid of will eventually crash your script as soon as you reach the 3 GB level (well, with 32-bit DOORS at least...)
"Is it likely to change between 32-bit and 64-bit DOORS? Probably not. Will data always be stored and manipulated in 4-byte units? Highly likely."

I agree that there should not be an explicit need to change DOORS' behavior for 64-bit, but nevertheless there are some changes in internal data structures. Try for yourself: the 64-bit DOORS Windows client has been out for some months now (one fix pack for 9.5.2.2(?) and a client for 9.6). When we ported our scripts to 64-bit, we had to make some changes, mainly because some data is indeed stored in 8-byte units now. For example, in 32-bit DOORS, the name of a Trigger is stored 4 bytes after the addr_ of the Trigger. In 64-bit, the name is stored 8 bytes after the addr_. And the data structure of a DB struct has been changed; it is now also arranged in 8-byte blocks (plus some other minor changes in the struct).
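The Trigger example boils down to "the second field lives one word after the base address, and the word size changed". A small Python sketch (not DXL) of that effect follows. The two-field record layout, the values in it and the helper `read_word` are invented for illustration; the real Trigger and DB structures are not documented here.

```python
import struct


def read_word(memory, base, index, word_size):
    """Read the index-th word of the given size after a base offset."""
    fmt = {4: "<I", 8: "<Q"}[word_size]
    return struct.unpack_from(fmt, memory, base + index * word_size)[0]


# Invented record: a header word followed by a "name" word.
record32 = struct.pack("<2I", 0x11111111, 0xCAFE)  # 4-byte units
record64 = struct.pack("<2Q", 0x11111111, 0xCAFE)  # 8-byte units

print(hex(read_word(record32, 0, 1, 4)))  # 0xcafe
print(hex(read_word(record64, 0, 1, 8)))  # 0xcafe
print(hex(read_word(record64, 0, 1, 4)))  # 0x0
```

The last line is the porting bug in miniature: a script that keeps stepping 4 bytes on the 8-byte layout reads the upper half of the header instead of the name.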
Keep up the good work!
|
Re: A USEFUL FUNCTION RESULT POINTER AlexTidmarsh - Thu Nov 27 16:37:52 EST 2014 Incidentally, Mathias? I have now bottomed out the control of the behaviour of the novalue routine.
What was needed to make it work "universally" was to essentially stop the DXL interpreter from trying to apply heuristics when it encountered the use of "addr_". The "undocumented" but "widely understood" behaviour of this perm is (it appears) a bit of a lie. What I did was constrain its use inside a strongly-typed pointer mechanism (which I based on a struct called DWORD, and thus had a strongly-typed DWORD * pointer as a result). This allowed me to implement constrained math overloads that work exclusively with the type DWORD* and thus not get hindered by the heuristics that DXL tries to apply between integer pointers and integers, courtesy of those addr_, * and & perms. This gave a level playing field. The primary outcome of this is a small set of routines that truly allow access to data and addresses without being "confused" by interpreter interventions. As evidence, I provide the following subset:

    // THE STRONG POINTER TYPE
    //------------------------
    struct DWORD {}
    DWORD*AddressOf(_&v){return addr_ v}   // Address of values
    DWORD*AddressOf(_*&v){return addr_ v}  // Address of pointers

    // STRONGLY TYPED POINTER ARITHMETIC
    //----------------------------------
    DWORD*::+(DWORD*p,int ofs){return addr_(int addr_(*p)+4*ofs)}       // Add offset (named and anonymous)
    DWORD*::-(DWORD*p,int ofs){return addr_(int addr_(*p)-4*ofs)}       // Sub offset (named and anonymous)
    void ::++ (DWORD*&p) {p=addr_(int addr_(*p)+4)}                     // Post-increment (named only)
    DWORD*::++ (DWORD*&p) {p=addr_(int addr_(*p)+4);return addr_(*p)}   // Post-increment (named only)
    void ::-- (DWORD*&p) {p=addr_(int addr_(*p)-4)}                     // Post-decrement (named only)
    DWORD*::-- (DWORD*&p) {p=addr_(int addr_(*p)-4);return addr_(*p)}   // Post-decrement (named only)
    void ::_++ (DWORD*&p) {p=addr_(int addr_(*p)+4)}                    // Pre-increment (named only)
    DWORD*::_++(DWORD*&p) {p=addr_(int addr_(*p)+4);return addr_(*p)}   // Pre-increment (named only)
    void ::_-- (DWORD*&p) {p=addr_(int addr_(*p)-4)}                    // Pre-decrement (named only)
    DWORD*::_--(DWORD*&p) {p=addr_(int addr_(*p)-4);return addr_(*p)}   // Pre-decrement (named only)
    void ::+= (DWORD*&p,int ofs){p=addr_(int addr_(*p)+4*ofs)}          // Direct add
    void ::-= (DWORD*&p,int ofs){p=addr_(int addr_(*p)-4*ofs)}          // Direct sub

    // UNCHECKED CONVERSION
    //---------------------
    int dwordIn(DWORD *x){int*px=addr_ x;return (*px)+0}
    DWORD dwordIn(DWORD *x){DWORD*px=addr_ x;DWORD v=(*x);return (v)}
    int dword(_ &x){int*px=addr_ x;return (*px)+0}

    // ADDRESS IN ...
    //---------------
    _*AddressIn(_*&p){return addr_ (**&p)}
    _**AddressIn(_**&p){return addr_ (**&p)}
    _***AddressIn(_***&p){return addr_ (**&p)}

    // WORKING NOVALUE ROUTINE
    //------------------------
    // Note - "null" is meaningless for "int" and "bool", but still detected.
    // Null is never detected for string or real, due to indirection.
    // Nonetheless, this works for ALL OTHER TYPES (about 40 tested, including
    // the obvious ones like, Buffer, Skip, Array, char, Date, Module, Object)
    const int UNASSIGNED
    const int *pUNASSIGNED=&UNASSIGNED
    bool novalue(_&x){DWORD*px=AddressOf x;return*pUNASSIGNED==dwordIn(px)||0==dwordIn(px)}

    // Check
    int x
    print novalue(x)

You will notice that the use of "AddressOf" and "dwordIn" in the "novalue" routine ought to be easily replaceable by the code in those routines. HOWEVER: the moment you do this, you have to resort to using at least two addr_ executions. Whichever way you do this, you start a running battle with the interpreter, because it fails to consistently adjust the behaviour of addr_ according to the universal type and so gets it wrong at least some, if not all, of the time. It most certainly never behaves coherently for all types. By hiding the use of addr_ behind my strongly typed "firewall" routines "AddressOf" and "dwordIn" I essentially stop this annoying behaviour by the DXL interpreter. The result is that it now does what it says on the tin. I did about 200 repeatable C-UNIT style tests covering the 40-odd types in different scenarios to verify this.
I could only get it to work consistently once I introduced the "firewall" routines! It really is the case that "null" for all but two of those types is represented by storing ZERO in the variable. The only two that are actually different are "real" and "string". In the case of both of these, the ADDRESS of a memory location is always stored rather than the value - even when that value is null. However, ALL variables, including real and string, store the hex value 87654321h directly into the variable when that variable is unassigned. All variables, except those of real and string, store null directly in the variable, even when the normal practice for some (most!) types is to store an implicit pointer to any non-null value. The problem with addr_ is that it tries to hide the indirection for the majority of variable types. So the only way you can really get actual pointer processing is to hide the use of addr_ behind routines like "AddressOf" and "dwordIn". i.e. behind the firewall of a strong type.
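To make the storage rules above concrete, here is a small model written in Python rather than DXL, so that it can be run outside DOORS. It treats a variable's slot as a plain 32-bit number. The names `slot_state` and the sample values are made up for illustration, and the rules are only the ones observed above on 32-bit DOORS; nothing here is a DOORS API.

```python
# Model of the observations above for 32-bit DXL; this is not DXL code.
UNASSIGNED = 0x87654321  # marker reported above for unassigned variables


def novalue(slot: int) -> bool:
    """Mirror the novalue test: the slot holds the unassigned marker or zero."""
    return slot == UNASSIGNED or slot == 0


def slot_state(slot: int, type_name: str) -> str:
    """Classify a raw slot dword according to the rules described above."""
    if slot == UNASSIGNED:
        return "unassigned"
    if slot == 0:
        return "null"              # per the post, never seen for real or string
    if type_name in ("int", "bool", "char"):
        return "direct value"      # small types live in the slot itself
    return "implicit pointer"      # real, string and all other types


print(slot_state(0x87654321, "Buffer"))   # unassigned
print(slot_state(0x00000000, "Skip"))     # null
print(slot_state(0x0000002A, "int"))      # direct value
print(slot_state(0x0E533BA0, "real"))     # implicit pointer
```

The last case is why a null real is not caught by the zero test: the slot holds a pointer to the zero, not the zero itself.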
P.S. I also have a rather nice memory dumper that uses the above firewall type. This allows you to dump memory a little bit like most debuggers do: as lines of hex dwords, wrapped at a specific address increment. Putting marker data into variables (like DEADBEEFh) lets you easily see what is really going on.
P.P.S. I can almost hear you asking: "What is this lie about addr_?". In a nutshell, the values of most types are stored as implicit DWORD pointers to variables. However, some are not (int, char, bool). This is why EVERYTHING on the stack is always "entirely" stored in DWORD units. The types that can't be stored in a DWORD (which includes the QWORD real type) are stored using implicit pointers. When you use addr_, it tries to take this indirection into account. However, when you have a routine using a universal type, addr_ can no longer work properly. There is no type information stored for it to do this. The "lie" is that addr_ simply applies untyped conversion - but this does not always mean what you think it might. Also, this adverse behaviour is compounded by anything you try to do in a return statement that includes a dereference - for instance: return addr_ (*x). This will nearly always go wrong! I only "get away with it" in my "AddressIn" routines because I also have an "&" symbol in the mix and addr_ is truly aware of what type it started out with - in this case: what is to all intents and purposes the equivalent of a void* by the time addr_ gets its grubby little hands on it. So it ends up "doing the right thing".
Goody for the day: Replace your "4" with getDoorsWordSize(), as in

    int giIs64 = null
    bool is64BitPlatform () {
        if (null giIs64) {
            string DxlCode_Is64Bit = "
                // if function to determine platform is defined and it returns true, 64 Bit is true. Otherwise 32 Bit
                if (isRunOn64BitPlatform()) { return_ \"1\" } else { return_ \"\" }
            "
            string ErrMess = checkDXL (DxlCode_Is64Bit)
            string sIs64 = ""
            if (null ErrMess) {
                noError()
                sIs64 = eval_(DxlCode_Is64Bit)
                lastError()
            } else {
                // print ErrMess
            }
            lastError()
            if (null sIs64) then giIs64 = -1 else giIs64 = 1
        }
        return (giIs64 > 0)
    }
    int getDoorsWordSize () {if is64BitPlatform() then return 8 else return 4}

(yes, I know that it's not elegantly written) :-) |
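The effect of that substitution on the pointer arithmetic can be sketched as follows. This is a Python model, not DXL: addresses are plain integers, `add_offset` is a hypothetical stand-in for the DWORD* offset operators earlier in the thread, and the `word_size` argument plays the role of the value returned by getDoorsWordSize().

```python
# Model only: shows offsets scaled by the word size instead of a hard-coded 4.
def add_offset(address: int, units: int, word_size: int) -> int:
    """Step an address by a number of machine words."""
    return address + units * word_size


base = 0x0E4C2BB0                     # sample address, for illustration only
print(hex(add_offset(base, 1, 4)))    # one word further on 32-bit
print(hex(add_offset(base, 1, 8)))    # one word further on 64-bit
print(hex(add_offset(base, -2, 4)))   # two words back on 32-bit
```

This matches the Trigger example mentioned earlier: the name sits one word after the address, which is 4 bytes on 32-bit and 8 bytes on 64-bit.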
Re: A USEFUL FUNCTION RESULT POINTER Mike.Scharnow - Thu Nov 27 19:53:21 EST 2014
As I said, take my suggested overview with large quantities of salt - because a lot of it is conjecture based on observed behaviour and some experiences with other emulators, languages, and processors. However, what you say about the 64-bit DOORS is fascinating. Unfortunately, I am unlikely to be in a position to use it for some time because my customer is not keen to rush in. And I have to use what my customer provides. If what you say about 8-byte units replacing the 4-byte ones is true, then I am surprised! The memory "reach" that this would in theory give you will surely be far outweighed by the "wasted space" of storing small integer, character and Boolean data in 8-byte words. (Not to mention the pointers themselves, which DXL uses heavily!) Although, of course, it would allow them to remove the indirection implicit in storing QWORD elements such as real. Do you know whether the same implicit pointers are used for all types that cannot be stored in a DWORD? See my later post below for details about how variables are stored in 32-bit DXL. Also take note of what I say there about addr_ - it tries its level best to hide these implicit pointers, because it wants to give you the actual value, not the implicit pointer. So you have to take care when using it to "inspect" the actual memory usage. |
Re: A USEFUL FUNCTION RESULT POINTER Mike.Scharnow - Thu Nov 27 19:59:22 EST 2014
Excellent advice! And thanks for the code, however elegant (or not). :-)
|
Re: A USEFUL FUNCTION RESULT POINTER Mike.Scharnow - Thu Nov 27 19:59:22 EST 2014
Goody for the day, on my part :-) For anyone who, like me, likes to experiment... The usual caveat emptor applies: use with extreme care!
NOTE: Designed for use on 32-bit DXL. Probably requires rework for 64-bit. Attachments Pointer Utils.dxl |
Re: A USEFUL FUNCTION RESULT POINTER Mike.Scharnow - Thu Nov 27 19:59:22 EST 2014
Just as an illustration of the kind of investigative work you can get up to with this stuff (see uploaded file in the last post):

    void Test_a_type( real &x )
    {
        int stackmark = -559038737 // =DEADBEEF
        print "stackmark >> "; MemDump( AddressOf(stackmark)-1,2 )
        print "real &x >> "; MemDump( AddressOf(x) )
    }
    real MyValue
    Test_a_type( MyValue )
    MyValue = null
    Test_a_type( MyValue )
    DWORD *p= addr_ dword( MyValue ) // Dirty part!
    print "Actual zero >> ";MemDump( AddressIn(p) )

This produces the output:

    1a. stackmark >> 0E4C2BB0: 0E4C2AB8 DEADBEEF
    1b. real &x >> 0E4C2AB8: 87654321
    2a. stackmark >> 0E4C2BB0: 0E4C2AB8 DEADBEEF
    2b. real &x >> 0E4C2AB8: 0E533BA0
    3. Actual zero >> 0E533BA0: 00000000

Which, taken line by line, demonstrates the following. In 1a, the stack, walking back from 'DEADBEEF', shows a pointer for the variable 'x'. This we can confirm via the address in 1b, where the "value" is 87654321h (the unassigned constant). Upon the second call (after setting the variable to 0.0 - the same as calling null, incidentally), 2a shows exactly the same thing, but 2b shows not the value 00000000, but a pointer! Line 3, which comes courtesy of our dirty trick of "promoting" a pointer, allows us to inspect what that pointer is referencing. And here we find the "null" (or at least, half of it - remember a "real" is actually a QWORD). Remember that this pointer is an implicit one used purely by DXL to get around the fact that a QWORD real won't fit into a DWORD pot!
If you repeat this experiment with other types, (EXCEPT for string, which behaves the same way as real), you will find the null is stored in the same place as the 87654321h. There is no indirection for the null. Furthermore, doing the experiment for int, bool and char, you will find that other values are stored directly where the 87654321h was stored. For all other types, non-null values always use the indirection pointer, because their values essentially won't fit into the DWORD pot. For strings, for instance, the indirection pointer points directly at the first character of a null-terminated string. You can see from this why addr_, which hides the indirection when it knows what type it is accessing, can get things badly wrong when that type information is lost - either via pointer processing, or as the result of a "universal type" parameter (which is treated essentially as a dword pointer to "something unknown").
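For readers without DOORS to hand, the chain of indirection in that dump can be replayed with a small Python model. Memory is just a dictionary from address to 32-bit dword, filled with the values from the output of the second call. `mem_dump` and `to_dword` are illustrative names; this is not the MemDump routine from the attached file, only a sketch of the same idea.

```python
# Model of the dump above: memory is a dict from address to 32-bit dword.
def to_dword(value: int) -> int:
    """View a signed 32-bit integer as its unsigned bit pattern."""
    return value & 0xFFFFFFFF


def mem_dump(mem: dict, address: int, count: int = 1) -> str:
    """Format `count` dwords starting at `address`, debugger style."""
    words = " ".join(f"{mem[address + 4 * i]:08X}" for i in range(count))
    return f"{address:08X}: {words}"


mem = {
    0x0E4C2BB0: 0x0E4C2AB8,            # reference parameter: address of MyValue's slot
    0x0E4C2BB4: to_dword(-559038737),  # stackmark, i.e. DEADBEEF
    0x0E4C2AB8: 0x0E533BA0,            # MyValue's slot after "MyValue = null"
    0x0E533BA0: 0x00000000,            # first half of the real value 0.0
}

print(mem_dump(mem, 0x0E4C2BB0, 2))    # 0E4C2BB0: 0E4C2AB8 DEADBEEF
slot_address = mem[0x0E4C2BB0]         # follow the reference parameter
print(mem_dump(mem, slot_address))     # 0E4C2AB8: 0E533BA0
value_address = mem[slot_address]      # follow the implicit pointer
print(mem_dump(mem, value_address))    # 0E533BA0: 00000000
```

Two hops are needed to reach the zero of a real, whereas for an int the value would already be visible after the first hop.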
P.S. From what Mike has said, if you are already using 64-bit DXL, all this may look very different. AND you will probably have to rework the Pointer Utils.DXL file to make it work at all! |
Re: A USEFUL FUNCTION RESULT POINTER AlexTidmarsh - Thu Nov 27 18:32:13 EST 2014 Incidentally, the realisation that "real" does not store "UNASSIGNED" into the same place as it normally stores its working values, .... (deep breath) ...has made me realise that DXL can do coherent math after all!
    real x = (-5 * 113 * 3581251)
    -2023406815.000000

You just have to make sure that you only do "real" math (involving full number-ranges) using the real type! (How convenient is that aide-memoire? Only do real math using real!) All that's now needed is a routine that will print that as an integer without actually using an integer! :-)
Works for me:

    real x = (-5 * 113 * 3581251)
    print intOf(x ) // -> -2023406815
    // don't do:
    // int y = intOf x
    // print y // --> unassigned variable y ...

Nice way to "unset" a variable without any voodoo/magic ;-) Mathias |
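The arithmetic behind this can be checked outside DOORS. The short Python sketch below (the helper name `as_signed32` is made up for illustration) shows that the product in question is exactly the signed 32-bit reading of 87654321h, the unassigned marker described earlier in the thread, which is presumably why an int holding it is reported as unassigned.

```python
# Check that the product equals the unassigned marker read as a signed 32-bit int.
UNASSIGNED = 0x87654321  # marker described earlier in the thread


def as_signed32(dword: int) -> int:
    """Interpret an unsigned 32-bit pattern as a two's-complement integer."""
    return dword - 0x100000000 if dword & 0x80000000 else dword


product = -5 * 113 * 3581251
print(product)                             # -2023406815
print(as_signed32(UNASSIGNED))             # -2023406815
print(f"{product & 0xFFFFFFFF:08X}")       # 87654321
```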
Re: A USEFUL FUNCTION RESULT POINTER Mathias Mamsch - Fri Nov 28 04:35:19 EST 2014
Damned good point! And quite probably faster than the "AsUNASSIGNED" routines in my pointer utils! I shall have to look into that. (Stops himself...) But optimising the utils was never going to be the point: they are for investigation, debugging and repeatable test-harness (DUNIT) work, rather than for production code. But nice lateral thinking there! :-)
|