
At the level of compiled code (what a computer actually runs), a program is just a series of numbers stored in the computer’s memory. The program acts on data by manipulating the numbers that represent the data, which are also stored in the computer’s memory. In principle, then, a program could modify itself by manipulating the numbers that represent its own instructions. This is called ‘self-modifying code’, and most modern programming languages and operating systems aggressively prevent it.

    Since both the instructions that a computer executes and the data it manipulates are numbers stored in memory, it is possible to imagine a program manipulating the numbers that represent its own instructions: self-modifying code. Essentially all modern programming languages are structured so that this is impossible, and modern operating systems include memory-management systems to prevent it. For example, one way to get the C programmer’s dreaded “segmentation fault” error on a modern operating system is to try to write to program memory. In addition, most microcontrollers maintain a strict distinction between the program, which lives in ROM, and data, which lives in RAM. The reason is that self-modifying code is difficult to write, very difficult to debug, and, if not done with extreme care, can overwrite the operating system or cause many other forms of large-scale computational chaos.
   In the 1950s, however, self-modifying code was not taboo. It was essential for even trivial tasks because of the limitations of early CPUs. To understand this, it is important to know that an instruction in an early computer was often a single word, with an opcode specifying the action in the high-order bits and the memory address the action applied to in the low-order bits. For example, the instruction LD100 would mean “load the contents of location 100 into the arithmetic unit”. To step through an array of numbers, programmers had to write a loop that incremented the address part of that instruction, so that it would execute LD100, LD101, LD102, and so on. Similar tricks were necessary to call a subroutine with parameters and return properly. Without the ability of a program to modify itself, none of this would have been possible.
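The address-incrementing loop can be sketched with a small simulator. This is only an illustration under stated assumptions: the machine below is a made-up one-address accumulator machine, not the Whirlwind instruction set. The opcode numbers, the 5-bit opcode and 11-bit address split, the memory layout, and the function names are all hypothetical. The toy program sums an array by repeatedly loading its own ADD instruction as data, adding 1 to it, and storing it back.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy machine: 16-bit words, opcode in the high 5 bits,
   address in the low 11 bits. Not a real instruction set. */
enum { OP_HALT = 0, OP_LD = 1, OP_ADD = 2, OP_SUB = 3, OP_ST = 4, OP_JNZ = 5 };
enum { ADDR_BITS = 11, MEM_SIZE = 1 << ADDR_BITS, ADDR_MASK = MEM_SIZE - 1 };
enum { SUM = 50, ONE = 51, COUNT = 52, ARRAY = 100 }; /* assumed data layout */

static uint16_t encode(unsigned op, unsigned addr) {
    return (uint16_t)((op << ADDR_BITS) | (addr & ADDR_MASK));
}

/* Fetch-decode-execute loop. Program and data share one memory array,
   so a store (OP_ST) can overwrite an instruction. */
static void run(uint16_t mem[MEM_SIZE]) {
    uint16_t acc = 0;
    unsigned pc = 0;
    for (;;) {
        assert(pc < MEM_SIZE);
        uint16_t word = mem[pc++];
        unsigned op = word >> ADDR_BITS;
        unsigned addr = word & ADDR_MASK;
        switch (op) {
        case OP_LD:  acc = mem[addr]; break;
        case OP_ADD: acc = (uint16_t)(acc + mem[addr]); break;
        case OP_SUB: acc = (uint16_t)(acc - mem[addr]); break;
        case OP_ST:  mem[addr] = acc; break;
        case OP_JNZ: if (acc != 0) pc = addr; break;
        default:     return; /* OP_HALT or an unknown opcode stops the machine */
        }
    }
}

/* Sums values[0..n-1] on the toy machine. If final_add_word is non-NULL,
   it receives the ADD instruction as it looks after the loop has run. */
uint16_t sum_with_self_modifying_loop(const uint16_t *values, unsigned n,
                                      uint16_t *final_add_word) {
    assert(n >= 1 && n <= (unsigned)(MEM_SIZE - ARRAY));
    uint16_t mem[MEM_SIZE] = {0};

    mem[0]  = encode(OP_LD,  SUM);    /* acc = running sum                  */
    mem[1]  = encode(OP_ADD, ARRAY);  /* acc += array element (patched!)    */
    mem[2]  = encode(OP_ST,  SUM);    /* store the running sum              */
    mem[3]  = encode(OP_LD,  1);      /* load the ADD instruction as data   */
    mem[4]  = encode(OP_ADD, ONE);    /* bump its address field by one      */
    mem[5]  = encode(OP_ST,  1);      /* write the modified instruction     */
    mem[6]  = encode(OP_LD,  COUNT);  /* acc = remaining iterations         */
    mem[7]  = encode(OP_SUB, ONE);
    mem[8]  = encode(OP_ST,  COUNT);
    mem[9]  = encode(OP_JNZ, 0);      /* loop while the count is non-zero   */
    mem[10] = encode(OP_HALT, 0);

    mem[ONE] = 1;
    mem[COUNT] = (uint16_t)n;
    for (unsigned i = 0; i < n; i++) mem[ARRAY + i] = values[i];

    run(mem);
    if (final_add_word) *final_add_word = mem[1];
    return mem[SUM];
}
```

Because the address occupies the low-order bits, adding 1 to the whole instruction word advances the address without disturbing the opcode, as long as the address field does not overflow. After summing a five-element array, the instruction at location 1 has changed from "ADD 100" to "ADD 105", which also shows why such a program cannot simply be run a second time without first restoring the instruction.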
   This article will describe some of the programming tricks needed to implement for loops and subroutine calls with 1950s technology and self-modifying code. I will then discuss how these can be accomplished more safely with modern CPUs, which have index registers and stack pointers, and with compilers that make use of them. I will use examples from the 1950s-era Whirlwind computer as well as modern C code. This should be interesting from several points of view: the challenges early programmers faced, the innovations these prompted, and what actually happens when you write a for loop or call a subroutine.
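As a point of comparison, here is a minimal sketch of the same kind of array loop in modern C. The function name is hypothetical. On a typical CPU a compiler can keep the index `i` in a register and form each element's address from the array base plus that index, so the loop's instructions never change. The exact machine code depends on the target and the optimization level, so this is an illustration rather than a statement of what any particular compiler emits.

```c
#include <assert.h>
#include <stddef.h>

/* Sums values[0..n-1]. The varying part of the address lives in the
   index variable i, not in the instruction stream. */
long sum_with_index(const int *values, size_t n) {
    assert(values != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += values[i]; /* effective address: base of values + i * sizeof(int) */
    }
    return sum;
}
```

Unlike a loop that patches its own instructions, this function can be called repeatedly, and from read-only program memory, without any restoring step.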