A Pathological Case of Self Reference. (Hmm, maybe that's stretching the term SR too far?)
Self-modifying code is a piece of software which achieves its goal by rewriting itself as it goes along. It is almost universally Considered Harmful now, but it used to be a neat trick in the toolbox of programmers of small or old computers.
But hang about, people work by modifying their own code!! Isn't it going to rather hamstring the development of Artificial Intelligence if we prohibit Self Modifying Code? -- David Elsdon
We're self-modifying, sure, but Are We Code?
Some older computers (circa 1960) had subroutine returns that worked by changing an address in an instruction. Thus it wasn't a trick but an essential part of the architecture. The architecture was, needless to say, "quaint" by modern standards -- Robert Field
It was also used to step through an array, by changing the address part of a memory reference instruction inside a loop. This was done on the Edsac before it got index registers. The PDP-1 had an instruction for overwriting just the address part of a word in memory from the address part of the accumulator.
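To make the trick concrete, here is a minimal C sketch (purely illustrative -- it simulates a machine with a mutable address field rather than patching real instructions): the loop steps through the array by bumping the address part of its own LOAD instruction, much as on the index-register-less Edsac.

  #include <stdio.h>

  /* Toy instruction with an explicit address field, as on machines
     without index registers. The loop "steps through the array" by
     modifying the address part of the LOAD instruction itself. */
  struct insn { int opcode; int address; };

  int main(void) {
      int memory[5] = { 10, 20, 30, 40, 50 };
      struct insn load = { 1 /* LOAD */, 0 };

      for (int i = 0; i < 5; i++) {
          int acc = memory[load.address];  /* execute the LOAD */
          printf("%d\n", acc);
          load.address++;   /* modify the instruction's address part */
      }
      return 0;
  }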
Now it is mostly used for fun, or to confuse people.
Usually, the term refers to code which causes the compiled instruction stream for the underlying processor to be mutated; it does not (usually) refer to any of the following:
the use of Meta Circular Interpreters (i.e. the "eval" function in languages like Lisp Language or Java Script) to execute code fragments -- including code fragments that may have been computed at runtime.
Programs which perform Manual Library Loading -- using functions like Dll Import (in the Microsoft Windows Api) or the equivalent elsewhere, to manually cause other libraries to be loaded into the address space. (For many simple programs, the loader does this automatically when the program starts. But for many complicated programs, Manual Library Loading is commonplace, often to get around Dll Hell issues.)
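For contrast, here's a minimal sketch of Manual Library Loading on a POSIX system (dlopen and dlsym are the standard calls there; the library name assumes a typical GNU/Linux box, and you may need to link with -ldl). Nothing in the running program's instruction stream gets modified, which is why this doesn't count as Self Modifying Code:

  #include <dlfcn.h>
  #include <stdio.h>

  int main(void) {
      /* Load the math library by hand instead of linking it at build
         time. "libm.so.6" assumes a typical GNU/Linux system. */
      void *lib = dlopen("libm.so.6", RTLD_LAZY);
      if (!lib) { fprintf(stderr, "%s\n", dlerror()); return 1; }

      double (*cosine)(double) = (double (*)(double))dlsym(lib, "cos");
      if (cosine) printf("cos(0) = %f\n", cosine(0.0));

      dlclose(lib);
      return 0;
  }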
But Dynamic Compilation is Self Modifying Code -- for a suitable definition of "self". Or must the compilation be sufficiently advanced (read: generic)? I think Self Modifying Code carries the air of an ad-hoc solution, a hack. But a) that need not be so, and b) the principle is the same: write an instruction that is later executed by the same executable (by the way, "executable" is a very high-level concept to use to define such a simple thing).
Examples
Someone suggested a Turing Machine (presumably meaning a Universal Turing Machine). I don't think this need be an example; the "code" of a Turing Machine is in its state table, which can't be changed. If you use this Turing Machine to emulate another one, I don't think there's any reason why you can't do so without separating "code" and "data".
Some old games used self-modifying code to confuse disassemblers and make it harder to disable copy protection.
It was commonly used as part of program initialization on the Bbc Micro to save memory, although strictly speaking there isn't much actual modification in "shuffling yourself down to overwrite the disk filing system storage area". Other tricks included deleting the assembler source from the BASIC part of the program once it had been assembled to machine code.
The practice was knocked on the head somewhat, at least in the Acorn Computer timeline, midway through the Acorn Archimedes machines. The early ARM chips handled self-modifying machine code without any problems (Von Neumann Architecture), but the StrongARM upgrade, with its Harvard Architecture, broke the assumptions made by many programmers. (argh, memory fog! What was the problem the ARM3 had, caused by caching?)
Someone suggested:
Variables that are set and changed for the purpose of controlling the behavior of code are commonly called flags. Flags Are Self Modifying Code.
That is NOT self-modifying code. Flags control which pre-existing code gets run. They do not modify the pre-existing code. They do control behavior, but it is possible to usefully distinguish running immutable code whose behavior varies with its inputs from direct mutation of the code itself.
Machine code controls which pre-existing behaviors of the CPU get run. How is changing one different from changing the other?
Huh? Neither of those is necessarily Self Modifying Code. If something else changes one or the other, it clearly wasn't a self-modification. For machine code, Self Modifying Code would be machine code running on the CPU that reaches into its own in-memory or on-disk representation and tweaks that representation. And even in the case of FPGAs and Transmeta toolsets, where you have a CPU that is modifiable, it generally isn't self-modifying.
In many situations commonly accepted as Self Modifying Code, the machine code being modified is a different part of the program than that doing the modifying.
But still the same program, right? i.e. the running program already contains reachable code whose execution will pass through the location being modified. Self Modifying Code isn't the same as runtime compilation or dynamic link loading.
Still, for the sake of argument, going by your definition, what's the difference between a flag that instructs the program to modify the flag doing the instructing and a machine code instruction that instructs the CPU to modify the machine code instruction doing the instructing?
The answer to that is the same as the answer to: what's the difference between flags and code? Code is for describing and composing behaviors. Flags offer neither capability: they cannot compose, because by nature flags cannot reference one another, and the only 'behavior' they can describe is whatever primitive behavior is imbued into them -- every flag is a new primitive. Flags don't constitute code. Code may be data, but not all data is code.
Game over
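To put the flag-versus-code distinction in concrete terms, here is a toy sketch in C (the "machine" and its opcodes are invented for illustration). The first loop's instruction stream never changes -- only the data it reads does; the second loop overwrites one of its own instructions:

  #include <stdio.h>

  /* A toy machine, invented for illustration: each step executes
     program[pc] until it hits HALT. */
  enum { HALT, PRINT_A, PRINT_B, CHECK_FLAG };

  int main(void) {
      /* Flag version: the code never changes; a data cell steers it. */
      int flag = 1;
      int prog1[] = { CHECK_FLAG, HALT };
      for (int pc = 0; prog1[pc] != HALT; pc++)
          if (prog1[pc] == CHECK_FLAG)
              puts(flag ? "A" : "B");   /* behavior varies, code does not */

      /* Self-modifying version: the program rewrites its own opcodes. */
      int prog2[] = { PRINT_A, PRINT_A, HALT };
      for (int pc = 0; prog2[pc] != HALT; pc++) {
          if (prog2[pc] == PRINT_A) { puts("A"); prog2[pc + 1] = PRINT_B; }
          else if (prog2[pc] == PRINT_B) puts("B");
      }
      return 0;
  }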
The Harvard Architecture, at least as the term is used here, has separate caches for data and instructions. This means that modifications to code just about to be executed may not yet have been written to main memory, or may be obscured by an older copy in the instruction cache. (The x86 processor is a notable exception to this.) In addition, many operating systems disallow execution of user-writeable pages (and conversely disallow user modification of the Code Segment); any attempt to create Self Modifying Code on such a platform results in a General Protection Fault.
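Both points can be illustrated in C -- a sketch only, assuming x86-64 Linux, a GCC-compatible compiler, and hand-assembled machine code: asking the OS for a page that is explicitly writable and executable sidesteps the protection fault, and the explicit cache flush after patching is what a split-cache (Harvard-style) architecture needs to stay honest.

  #include <stdio.h>
  #include <string.h>
  #include <sys/mman.h>

  int main(void) {
      /* x86-64 machine code: mov eax, 42 ; ret (assumes x86-64 Linux) */
      unsigned char code[] = { 0xB8, 42, 0, 0, 0, 0xC3 };

      /* Writing into the normal code segment would fault, so map a
         page that is explicitly writable AND executable. */
      unsigned char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (buf == MAP_FAILED) return 1;
      memcpy(buf, code, sizeof code);

      int (*fn)(void) = (int (*)(void))buf;
      printf("%d\n", fn());   /* prints 42 */

      buf[1] = 99;            /* patch the immediate operand in place */
      /* On a split-cache CPU this flush is essential; x86 keeps its
         caches coherent automatically, at some pipeline cost. */
      __builtin___clear_cache((char *)buf, (char *)buf + sizeof code);
      printf("%d\n", fn());   /* prints 99 */

      munmap(buf, 4096);
      return 0;
  }

Patching an immediate operand at runtime like this is, in miniature, what dynamic code generators do when they specialize code on values that only become known at runtime.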
Sudden changes like banning self-modifying code make backwards compatibility difficult in some cases. They generally seem to spawn numerous Vicious Hacks to provide the backwards compatibility at some other cost.
Switching the caches off avoids the problems but cripples performance.
Mac Os support(s|ed) multiple platform binaries, to allow for the change from 68000 series to PowerPC. (It did this by emulating the 68000; this didn't always hurt as much as you might think, because all the system routines - the Toolbox, for instance - were in native code. But for applications like numerical software, chess playing, etc., you really wanted a native version.)
Nameless Concept: employing horrible kludges in the name of backwards compatibility. All systems I've seen have done it. Win16 --> Win32 is not one I have suffered from directly, but I hear many complain about it. Shoehorning Compatibility? Backwards Contortability? Compatibility Barnacles?
There are, of course, cases for SMC. I wrote a program several years ago that would run on an 8080/8085 and, naturally, the Z80. There were places in the code, however, where using the Z80's LDDR/LDIR and other high-speed block instructions was a desirable thing, if available.
Part of the program's startup procedure was a bit of sword swallowing to determine which flavor of CPU was in use and, if it was a Z80, patch the jump/call vectors of certain routines.
A similar technique was used to patch the jumps/calls to certain video handling routines based on the kind of display.
The rationale was simple: don't continually evaluate things whose state can be known at the start and which will not change during execution; instead, pre-determine such execution paths and let the thing just run.
Exactly. This can be one way of implementing Once And Only Once. My example was the inner loop of a Text Editor search routine. It turned out that forward and reverse searches had inner loops that were identical but for a couple of machine instructions. It ended up saving code and duplication to have a single search routine and copy in the appropriate instructions when the forward or reverse function was called. Of course, this is serious micro-optimization; I only bothered because this was written in Assembly Language already, and it was meant to be a "twee" editor (< 4K). -- Ian Osgood
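In a higher-level language, the same "decide once at startup, then just run" rationale is usually met without touching any code, by patching a table of function pointers instead. A hedged C sketch (the routine names and the CPU probe are invented placeholders):

  #include <stdio.h>
  #include <string.h>

  /* Two implementations of the same routine; names invented for
     illustration. */
  static void copy_generic(char *dst, const char *src, size_t n) {
      while (n--) *dst++ = *src++;   /* 8080-style byte-at-a-time loop */
  }
  static void copy_fast(char *dst, const char *src, size_t n) {
      memcpy(dst, src, n);           /* stands in for the Z80's LDIR */
  }

  /* The "jump vector": patched once during startup, then simply called. */
  static void (*block_copy)(char *, const char *, size_t) = copy_generic;

  static int cpu_has_block_move(void) { return 1; }  /* placeholder probe */

  int main(void) {
      if (cpu_has_block_move())
          block_copy = copy_fast;    /* patch the vector, not the code */

      char out[6];
      block_copy(out, "hello", 6);
      puts(out);
      return 0;
  }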
To lower the maintainability cringe-factor, this sort of stuff is only done in extreme cases these days. It's hard enough to debug a normal program, but debugging a self-modifying program can be plain impossible.
A notable exception to this trend, however, is Just In Time Compilation. The JIT can simply create different code depending on factors that are constant at runtime. While this is different from all-out SMC, it still gets you a lot of the same benefits.
From coding on the Atari 800 up until DOS TSRs, I wrote dozens of self-modifying programs. This was occasionally intentional.
See original on c2.com