Virtual Machine

A *Virtual Machine* is a synthetic computer. It behaves as a real machine would, if only there were such a machine. Because there isn't, people use a virtual machine to emulate the one they wish they really had. Virtual machines are interesting only because of what they execute. For a Smalltalk virtual machine, that something is called the *image*. -- LIU, Chamond, 2000. Smalltalk, objects, and design: the classic guide, now in paperback. Reprinted from the original 1996 ed. with minor corrections. San Jose: toExcel. ISBN 978-1-58348-490-6, p. 181

⇒ Image

A virtual machine, sense 1, is an abstraction that defines a Computing Model. Also called an Abstract Machine.

A virtual machine, sense 2, is also an implementation, done over another virtual machine, sense 2, or directly into hardware, of a virtual machine, sense 1.

In recent years the term has come to have the connotation of a software platform that exists on multiple hardware platforms and provides an abstraction layer that allows programmers to write code without considering the hardware that it will run on.

Examples throughout the history of our field:

For general-purpose Programming Languages:

Java Virtual Machine

Microsoft's Common Language Runtime

Pascal Language Pea Code (PCODE)

Smalltalk Virtual Machine

Parrot Code Virtual Machine

Self Language Virtual Machine

Inferno Os Virtual Machine (Dis)

Low Level Virtual Machine (LLVM)

Special purpose machines:

Zed Machine (used to power the Infocom text adventures)

Zend Engine 2 (runs Php Language bytecode)

Interesting subtype of Virtual Machine:

Safe Virtual Machine (guarantees, that code cannot violate its typing, thus enforcing security):

Ms Virtual Pc considered an example of Virtual Machine implementation?

I just added Virtual Computer to try to clarify the distinction, at least on this Wiki.

Previous definitions

def.Virtual Machine: fancy name for computer program, see also: Turing Machine

A Virtual Machine is a program that translates a standard series of instruction codes into hardware specific Machine Language. In recent years the term has come to have the connotation of a software platform that exists on multiple hardware platforms and provides an abstraction layer that allows programmers to write code without considering the hardware that it will run on.

That sounds more like a definition of a compiler/assembler than a Virtual Machine. Particularly in the context of an interpreter (or microcode), Machine Language instructions, while executed to perform the higher level functions, are not necessarily first emitted as the result of some translation process. And I'm not sure about calling a Virtual Machine a program, instead it is the non-tangible specification of a device, which can be used only through an emulation program.

Any interpreted language and most operating systems operate as virtual machines but the term has gained prominence in part due to the popularity of the Java Language.

The term has gained prominence in the past, due to Pcode and Byte Codes

What about the following definition:

A virtual machine is a hypothetical device that is capable of executing a specific instruction set. Such an instruction set is commonly called Byte Code. A common implementation strategy for programming languages is to compile code into instructions for a specialized Virtual Machine. Subsequently, these instructions can be actually executed in a number of different ways:

Interpretation by software

Translation into the instruction set of a real processor

Direct interpretation in hardware In the last case, the virtual machine isn't "virtual" anymore, though the name may still stick if there's a long history during which the machine was really

"virtual".

Confusingly, the software interpreter for a virtual machine is itself also sometimes called a virtual machine. It is more correctly called a Byte Code Interpreter.

An implementation strategy which compiles Byte Code into the instruction set of the real processor just prior to execution, is called Just In Time Compilation.

Not all virtual machines are interpreters, not all virtual machines use a stack-based paradigm, and not all software interpreters (sic) for virtual machines are Byte Code Interpreters. I suggest that we therefore broaden the definition of virtual machine.

At Component Software Corporation, we built a virtual machine that ran compiled code in a virtual machine that used a RISC-based register-intensive paradigm. The virtual machine was therefore a generic RISC engine for which we defined intermediate code (like that of a portable compiler). Compilers (we supported Smalltalk and Objective Cee) emitted intermediate code for each method, which was then persisted (instead of Byte Codes). Native code was compiled by a platform-specific code generator as each method was imported into the environment or compiled after being edited.

Thus, the Component Virtual Machine always ran compiled code, it only required Just In Time Compilation for performs and similar exotica, and its performance was dramatically better than comparable byte-code approaches. The code generator for RISC machines (our prototype ran on MacII's and Sparc's) could generate blazing native code.

--Tom Stambaugh

I would like to point out that a virtual machine can be done in such a way to run fast, the Java Virtual Machine is not a very good example.

For example, being stack based is very nice in theory. It is a clean design and computer scientists love clean designs. But they are a bad idea. See this example of translating "a = b + c":

push b LS push c LS add LLS pop a LS

Compare with a more flexible approach, where operators can use memory directly for example:

add b, c, a LLS

Pushes and pops have to store/load, because most of the time they are implemented with a piece of regular memory. This makes the stack-based example do 5 loads and 4 stores, spanning 4 instructions (think instruction decoding overhead). A saner (and not as clean, design-wise) way to do this is to have an 'add' operator that will add two memory locations and put the result in a third memory location. A single instruction, two loads, one store.

The stack example can be rewritten using modern compiler techniques and simplified to your 3-operand example. Static Single Assignment Form is your friend! I believe this is what most native-code Java Language compilers do, and with quite admirable results. This allows a restoration of using the clean design of the stack, while factoring out the brutal realities of register-based architectures. --Samuel Falvo

This may be true for stack architectures made by register-familiar engineers, but true stack architectures (e.g., actual hardware that runs stack-oriented code, a la Forth CPUs) are actually notably faster than register-oriented architectures, by virtue of the fact that the interconnects to/from the ALU are fixed and substantially shorter, going through fewer gates, and therefore, can be driven at substantially higher clock rates (Chuck Moore's MuP21 achieved 500MIPS peak performance with 0.8u technology, for example). Of course, that being said, implementing a stack virtual machine on a register architecture will slow things down considerably, just as implementing a register-based VM on a stack architecture will bog it down considerably.

Also, stack architectures with small, finite stack depths are able to be 'register-optimized' with moderate ease. For example, anyone here can probably write an optimizing virtual machine implementation for the Steamer16 microprocessor that runs close to, if not at, native hardware speed on a register-based architecture.

Do this with other instructions (complicate them) and try it. Speed! Still less than native code, of course, but getting better. And those "complex instructions" have more context to them, so that a Just In Time Compiler could possibly have more success optimizing them. For example, a very complex instruction (think really stupidly complex, like a 'sort' operator!) could be compiled to a function call into some hand-optimized C or assembler code, as tight as you could want. This would be much faster than implementing it in the virtual machine.

For an example of such a Virtual Machine, look at www.cs.bell-labs.com , by Rob Pike. --Pierre Phaneuf

Actually, this is very common (Sun just seems to miss all of these things). If you look at IBM's BAL for the 360/370/390 lineage or MI for the AS/400 [As Four Hundred Machine Interface] you will see this kind of an approach. IBM has always used an "abstract" assembler, which is to say that you aren't programming the processor in a direct fashion. On the AS/400 all program objects run in a virtual machine environment. On the mainframe this is less the case, but still close. Many of the opcodes are optimized for minimum load/stores. Also, you will tend to find "higher level" opcodes than in say pure RISC or other CISC assemblers. -- Jeff Panici

It's still not good enough for a portable virtual machine (not necessarily the goal of the IBM/3?0 group). See sunir.org and sunir.org for a discussion of this.

I wonder if the best form for a VM instruction set might not be some sort of Static Single Assignment Form. SSA is elegant enough to satisfy the Computer Science propellerheads, and is easy to compile into good code - SSA is used as an intermediate form in many (most?) modern compilers. The only serious problem i can think of is running it by interpretation or fast compilation (where proper register allocation is too expensive); this might be ameliorated by doing some sort of register allocation at source-compile-time, and including the results as a hint to the interpreter. -- Tom Anderson

Low Level Virtual Machine (LLVM) uses this, I believe. -- Jebadiah Moore

I think compressed Abstract Syntax Trees might be better. I found a reference (although I'm not going to look right now) that showed that due to the growing latency gap between processors and disks, loading compressed abstract syntax trees off disk and compiling them on the fly was faster than loading a purely native program off disk.

There was also another operating system (Inferno Os?) that stored all the programs and system libraries as ASTs and recompiled each function on the fly as it was invoked. They saw something like a magnitude speed improvement over purely native call (think dynamic constant folding).

Then again, www.bytecodes.com claims their embedded JVM is comparable to embedded JVMs with JIT compilers because they can stuff the VM runtime into the instruction cache and most programs into the data cache. JIT compilers would thrash both caches very badly. -- Sunir Shah

True, but they also say "Large desktop and server platforms with megabytes of memory to spare and processors running over 2 Gigahertz with 256K on-chip caches are better served by using the fully optimizing versions of server and desktop Just-In-Time Compilers (JITs) coupled with our EBCI Interpreters. JITs use our interpreters to jump start at a faster speed because JITs start out interpreting Java code before compiling it."

TAOS (now Tao Intent Os) was designed to operate on heterogenous clusters (eg a mix of PC and sparc hardware on an ethernet), so it defined programs in terms of a virtual processor (and so an associated Virtual Machine); it was translated into native code at load time. They say: "It would be wrong to think that this slows the system down. Most processors are able to translate VP into native code faster than the VP code can be loaded from disk and sent across the network, so there is no visible overhead. Indeed, VP code is often more compact than the native code; therefore less disk space is used and code is loaded faster than if it were the native for the processor."

www.uruk.org

-- Tom Anderson

Parrot has a register-based design for its JIT VM.

www.parrotcode.org

One of the world's earliest and most widely ported virtual machines is the z-code interpreter from Infocom. In various incarnations it was used to implement the 35 or so text adventure games they released allowing their games, including the famous Zork Trilogy, to be platform independent. Z-code interpreters exist for platforms as diverse as the Commodore64, AppleII, PalmOS, KayproII, IBMPC, TRS-80 and the TI-89 calculator. A compiler for this virtual machine, Inform, is freely available although the original games were not written with this compiler.

-- Dan Piponi

UCSD Pascal was earlier and more widely ported, note. Mainframe emulators were widely used back in the 1960s to maintain backward compatibility. Z-code might hold a record for the product of EARLIEST * WIDELY_PORTED; back then it was unusual for anything at all to be widely ported.

(See Ucsd Pcode)

There are probably more people served by Pick Language in Manufacturing Systems and Human Relations Systems, even now in 2006. On a Virtual Machine within many corporate operating systems, including Windows, Unix, MVS.

Category Computer Architecture Category Operating System (because of the close associations)

See original on c2.com