CS 3723: Storage Management

	CS 3723 Programming Languages
Storage Management

Introduction: Run-time refers to the time when an application actually executes. In this discussion compile-time means everything before run-time, that is, compilation, linking, and loading. As programming languages and environments have become more complicated, managing the storage at run-time has gotten extremely difficult indeed. Systems and algorithms to manage run-time storage are now among the most difficult in existence.

For a compiled program, its static structure is the structure of the source program, how it is organized. The dynamic structure is the structure that evolves during run-time.

Static Storage Allocation: With static storage, the location of every variable is fixed, allocated and known at compile-time. In principle, every variable has a fixed constant machine address.

One uses the word binding to mean an association of a property with an entity in a programming language. All binding of storage locations to program names occur at compile-time. The bindings are fixed and unchanged throughout run-time.

This is the storage model used for the FORTRAN language up to 1977. It is a simple model, easy to set up, with very little to manage at run-time. In fact the main changes at run-time are with the parameters. In FORTRAN, parameters are always passed by reference, that is, the location (or reference) of the parameter is passed. FORTRAN passes arrays just the same as C or Java, but FORTRAN also passes simple variables by reference, rather than by value, as C and Java do. In a FORTRAN program, this is the main thing that changes during run-time, since a function may be passed different addresses for its parameters during different calls at run-time.

These features of FORTRAN can lead to a curious error: If one passes a constant, say 2, as a parameter, what is actually passed is a reference to the constant. Suppose inside the function, the formal parameter (call it x) is incremented, say with x = x + 1. Then this can result in a change of the constant itself, so that a = 2 would then actually give a the value of 3.

The static storage model allows simple allocation (at compile-time) and allows efficient execution (at run-time). So why don't we still use this method? Why not just use static storage?

Answers to "Why not use static storage?":

Historically storage was expensive, and one wanted to use it sparingly. The static method is simple to implement and efficient in execution, but it is more wasteful of storage than other methods. (Though now large cheap memory is available.)
One can't implement recursion statically.
One can't have dynamic data structures, that is, the structure of the data can't be decided on and created at run-time.

All three of C, Java, and C++ allow the declaration of static variables, so at run time, each of these languages has a static portion of storage for these variables.

Stack Allocation: A run-time stack is a simple and efficient way to provide the storage needed for function calls. When a function is called, all storage needed for the function is allocated on the stack in a section called the activation record. The storage includes room for the return address, the return value (possibly a pointer), actual parameters passed to the function, and any variables declared within the function, including static arrays (C and C++ only), static structs and classes (C and C++ only), and variable-sized arrays (GNU C and Algol, but not standard C, C++, or Java). The activation record also holds locations for temporary values, and pointers to other parts of the stack (to facilitate access and deallocation). On return from the function, the storage on the stack is deallocated. It is always an error to make use of the stack after returning from the function (except perhaps to retrieve a value returned). For an example of the misuse of stack storage, see dangle example.

Allocation on the stack can be as simple as modifying a single stacktop pointer, and deallocation is typically just as easy. Implementing functions using only static storage (as was done in FORTRAN) does not allow recursive functions, whereas the stack allocation supports recursion. This method has just a little extra run-time overhead compared with static allocation.

This method uses storage more efficiently, since local variables for all defined functions are allocated using static allocation, whereas the stack allocation only needs to allocate storage for currently executing functions. For example, if there are three functions A, B, and C, each with a 100-byte local array, then all three arrays must be allocated (300 bytes) in the static model, while only the arrays for executing functions are allocated in the stack model.

This stack allocation method was introduced in 1960 for Algol 60 by Peter Naur. (Of course he worked on it before then.) It might seem plausible that Naur developed Algol 60, and then hit on the clever way to implement function calls. The reality is more impressive though: Naur thought of the stack allocation method and then invented a language that would utilize this method. Since 1960 all "standard" compiled languages have used this method, including later dialects of Algol, Pascal and descendants, PL/I, C and C++, and Java.

Note that actual parameters (possibly expressions) are evaluated in the environment of the calling function. Then each resulting value of this evaluation is copied into the spot in the called function's activation record for the corresponding formal parameter (so that the formal parameter behaves like a local variable). Then all the additional information is copied into the activation record and the function is put into execution.

The Pascal language allowed the definition of a "helper" function to be buried inside the definition of the function it is helping. The provided a rudimentary version of the "object-oriented" approach, since the information in the helper function was hidden from view (so-called information hiding). Function definitions could be buried within other functions to an arbitrary depth of nesting, and this feature made Pascal implementation much more difficult than it otherwise would have been. (The stack needed extra pointers to access variables at different levels of nesting efficiently.) These mechanisms provided interesting examples for implementation in compiler construction classes, but in practice they were seldom used. The developers of C, and those of C++ and Java who followed, did not allow a function definition to be nested inside another function.

Dynamic Allocation, from the Heap: Long before C, language designers recognized the need for true dynamic storage allocation, to obtain new storage during run-time from somewhere besides a stack. This storage would be explicitly allocated and would remain until explicitly deallocated. The source of the storage is now commonly called a heap (an Algol 68 term). The Lisp language, among others needed similar storage. So Pascal, C, C++, and Java made all stack allocation relatively simple, not allowing an array to be allocated that was of a size determined at run-time. (Java doesn't allow any arrays on the stack. When C and C++ allowed only sizes of arrays that were known at compile time on the stack, that made the stack management simpler.) Then all these languages allowed special storage of any size (dynamic) to be allocated at run-time.

In each case, a desired amount of storage is allocated and a pointer to the storage is returned. Later it is possible to return the storage to the heap (sort of like recycling) to be used during other allocations. Pascal, C, and C++ have explicit deallocation as the norm, whereas automatic deallocation, called garbage collection, is possible. In Java automatic garbage collection is the norm, although explicit deallocation is possible.

Here are the different functions for allocating and deallocating in the different languages:

Language	Allocate	Deallocate	Automatic Garbage Collection
Pascal	`new()`	`dispose()`	No
C	`malloc()`	`free()`	Not normally
C++	`new`	`delete`	Not normally
Java	`new`	`System.gc()`	Yes

(Note: In Pascal and C, the above are functions. In C++, new and delete are operators, although you can also use the C functions, assuming that you do not mix them with the C++ operators (causing a catastrophe). In Java, System.gc() is a function, while new is a keyword, a part of the language used to allocate storage.)

Implementation of allocation/deallocation: malloc and free in C: Even in C it is very tricky to write functions malloc and free. An clever and relatively short implementation (considering how hard the task is) is found in the text The C Programming Language 2nd Ed., by Kernighan and Ritchie, pages 185-189. We will go over this material in class. Notice that this C implementation is partly machine-dependent.

Implementation of malloc and free: here.

Pitfalls with allocation and explicit deallocation: There are two mistakes one can make when using explicit allocation and deallocation in a language like C or C++. Notice that neither of these mistakes is possible in Java.

Mistake 1: Memory Leak.

garbage

memory leak

Memory leaks are a serious problem in C, but they are even more of a problem in C++, because of the complexity of this language, particularly the complexity of hidden allocations and deallocations. Bjarne Stroustrup, the inventor of C++, had this to say in in his FAQ about C++:

Question posed to Bjarne Stroustrup: "How do I deal with memory leaks?"

Answer: By writing code that doesn't have any. Clearly, if your code has new operations, delete operations, and pointer arithmetic all over the place, you are going to mess up somewhere and get leaks, stray pointers, etc. This is true independently of how conscientious you are with your allocations: eventually the complexity of the code will overcome the time and effort you can afford. It follows that successful techniques rely on hiding allocation and deallocation inside more manageable types. Good examples are the standard containers. They manage memory for their elements better than you could without disproportionate effort.

(Several complex examples here) ...

If systematic application of these techniques is not possible in your environment (you have to use code from elsewhere, part of your program was written by Neanderthals, etc.), be sure to use a memory leak detector as part of your standard development procedure, or plug in a garbage collector.

Mistake 2: An active pointer to garbage. A much more serious mistake than the previous one is to deallocate storage when the program still has an active pointer to the storage. Mistake 1 just leads to programs that keep using more memory as they run; they have to be restarted every now and then. But Mistake 2 can lead to a program that crashes -- a potential catastrophe.

Revision date: 2014-02-25. (Please use ISO 8601, the International Standard.)