Transparent Persistence

Also known as orthogonal persistence, transparent persistence is a mechanism for bringing data into RAM implicitly, i.e. without expressing input procedures explicitly, and writing data back to its persistent form implicitly, i.e. without expressing output procedures explicitly. Typically this mechanism is in the context of an object-oriented programming language, but this may simply be a coincidence since most popular languages are object-oriented.

A transparent Persistence Mechanism may be fairly simple with one operating system process associated with one Persistence Store, the store having one Persistence Root, and no Garbage Collection other than that provided by the Programming Language Runtime.
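The simple end of that spectrum can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name SimpleStore, the file layout, and the explicit checkpoint() call are all invented here, and Java serialization stands in for whatever storage format a real mechanism would use.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical single-process store with one persistence root (illustrative names only). */
final class SimpleStore {
    private final Path file;
    private final HashMap<String, Object> root;

    /** Opening the store implicitly brings the whole object graph into RAM. */
    SimpleStore(Path file) throws IOException, ClassNotFoundException {
        this.file = file;
        this.root = load();
    }

    /** The single persistence root: anything serializable reachable from it is persistent. */
    Map<String, Object> root() {
        return root;
    }

    @SuppressWarnings("unchecked")
    private HashMap<String, Object> load() throws IOException, ClassNotFoundException {
        if (!Files.exists(file)) {
            return new HashMap<>(); // first run: start with an empty root
        }
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (HashMap<String, Object>) in.readObject();
        }
    }

    /** Writes the graph back; a real mechanism would trigger this implicitly, e.g. at shutdown. */
    void checkpoint() throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(root);
        }
    }
}
```

Application code only ever touches root(); no input or output procedures appear in it. Garbage collection is left entirely to the Java runtime, since objects no longer reachable from the root are simply not written at the next checkpoint.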

A transparent Persistence Mechanism may be fairly complex with many distributed processes associated with one or more instances of a Persistence Store, each store having more than one Persistence Root, and a Persistence Aware distributed Garbage Collection mechanism beyond that provided by the Programming Language Runtime.

Transparent persistence can be implemented in many ways, using an Object Oriented Database, Serialized Data, Relational Database, or other form.



Transparent persistence imposes a Programming Model.


Apparently the Zope Application Server, written in the Python Language, has some form of transparent persistence.


GemStone/Smalltalk and GemStone/J are two implementations of Transparent Persistence. Both are fairly complex mechanisms, but not identical. Their differences are due mainly to the relative simplicity of the Smalltalk Language and the relative complexity of the Java Language, respectively.

The complexity of Java in this case has to do with the assumptions built into the language regarding the Compile Time features of the language vs. the Run Time features of the language as well as how static aspects of the language are supposed to be implemented. (Static Means Class In Java)


The Productivity Environment For Java (PE:J) provides an implementation of Transparent Persistence. It generates a Persistence Layer based on the Java Data Objects (JDO) specification.


EROS implements the idea of Transparent Persistence. Operating system and application state are saved (checkpointed) at regular intervals, and restored when the system is started. Programmers allocate memory (or create objects) and put data there, and it is automatically saved to disk; it is thus made persistent implicitly (and transparently) rather than explicitly.


I've never seen a transparent Persistence Mechanism that performs adequately, unless, that is, it was hand-coded to the domain it was serving. The problem with transparent persistence is that you usually lose all the cool things about relational calculus that make data queries fast: left outer joins, searching, and such. The worst thing about transparent persistence is the traffic between the data engine and the application model. For example, you end up either retrieving objects one row at a time or retrieving the entire set of objects; when you are dealing with millions of records, that is a drag. -- Robert Di Falco

Hmm... if you had a Turing-complete query language, wouldn't one's typical RDBMS count as Transparent Persistence?

It's not that simple. You can make SQL Turing-complete by wrapping the right SQL statements in an infinite loop, e.g. if the SQL is such as to emulate a CPU. That is quite easy to do as a demonstration. But the result would be vastly slower than the regular way of programming, which goes to Robert's point: the problem isn't doing persistence, it's getting all three of (1) persistence, (2) transparency, and (3) performance simultaneously.

Because of this, you probably don't really want utterly transparent persistence (with ACID guarantees, that is), you just want it to be easy. Then you can get it when you want it, without much effort, and take the performance hit, or avoid it when you don't absolutely need it, to run at normal speed.

If you don't care about ACID guarantees, then the above comment about ErosOS possibly makes it a solved issue.

Once upon a time, when everyone was using non-volatile core memory, some of these issues were a lot simpler; part of the complication today comes from needing to map data from volatile RAM to non-volatile disk. -- Doug Merritt



Orthogonal Persistence means you don't have to "save" and "load" files; the Operating System just swaps out memory to disk when it shuts down. (EROS does this; it also resembles the Smalltalk image.)

Actually Eros writes all changes to a log immediately, and snapshots periodically. You can (as you should be able to) pull the plug and you're safe to a very recent point.

Is this the same as Transparent Persistence?


Orthogonal Persistence sounds cool, but what happens if the OS crashes or if there's a power cut? It can't just save when you shutdown, it has to be able to recover the state just before a power outage. And what about when you want to force a reload from some known good state (because something has corrupted memory)? Or when you want to revert to a previously saved copy of something (because you realize your changes made things worse)? -- Gavin Lambert

Notice how lacking orthogonal persistence doesn't help you on any of these issues. So bringing them up is essentially asking "How can we make Orthogonal Persistence even cooler than it already is?" One answer is to have a Logging File System underneath everything, making regular snapshots of the log's metastate (to make reboots extremely fast). Another thing is to allow the user to specify the granularity of snapshots (every hour, every minute, every second?) and to be able to force snapshots.
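To make the log-plus-snapshot idea concrete, here is a rough Java sketch. It is not how Eros or any Logging File System is actually implemented; the class LoggedState, the file names, and the key=value line format are assumptions made up for the example, and it does not force writes to disk, which a real system would need to survive a power cut.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical state that logs every change and snapshots on demand (illustrative only). */
final class LoggedState {
    private final Path snapshot;
    private final Path log;
    private final Map<String, String> state = new TreeMap<>();

    /** Recovery: load the last snapshot, then replay the changes logged since. */
    LoggedState(Path dir) throws IOException {
        snapshot = dir.resolve("snapshot.txt");
        log = dir.resolve("changes.log");
        for (Path p : List.of(snapshot, log)) {
            if (Files.exists(p)) {
                for (String line : Files.readAllLines(p, StandardCharsets.UTF_8)) {
                    apply(line);
                }
            }
        }
    }

    // Lines without '=' (e.g. a write cut off by a crash) are ignored.
    // Keys must not contain '=' and neither keys nor values may contain newlines.
    private void apply(String line) {
        int eq = line.indexOf('=');
        if (eq > 0) {
            state.put(line.substring(0, eq), line.substring(eq + 1));
        }
    }

    /** Each change is appended to the log before it is applied in memory. */
    void put(String key, String value) throws IOException {
        Files.writeString(log, key + "=" + value + "\n", StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        state.put(key, value);
    }

    String get(String key) {
        return state.get(key);
    }

    /** Forced snapshot: write the full state to a temp file, swap it in, then drop the log. */
    void snapshot() throws IOException {
        StringBuilder sb = new StringBuilder();
        state.forEach((k, v) -> sb.append(k).append('=').append(v).append('\n'));
        Path tmp = snapshot.resolveSibling("snapshot.tmp");
        Files.writeString(tmp, sb.toString(), StandardCharsets.UTF_8);
        Files.move(tmp, snapshot, StandardCopyOption.REPLACE_EXISTING);
        Files.deleteIfExists(log);
    }
}
```

Snapshot granularity then becomes a policy question: a timer could call snapshot() every hour, minute, or second, and the user could force one at any time. Replaying the log over a snapshot is harmless here because a later put simply overwrites an earlier one.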

Huh? If you don't have orthogonal persistence, you have the currently "normal" way to do things: you have load images on disk that load various bits of code into memory on startup until everything is running again. By definition, the bits of code on disk are in a "good" state, and when you're loading them all into "clean" memory you're basically loading a known good configuration. If later on in the session one of those programs completely bollixes memory, you can clear out the memory and reload from the known good configuration; in other words, whack the reset button. -- Gavin Lambert

And of course, all of this is done manually. The user or programmer has to manually specify that this can and should be done with programs X, Y and Z. Everything you can do without Orthogonal Persistence, you can do with it. But in the general case, you can't just restart a bunch of processes and hope everything magically works out. What if they were using a very rare and expensive resource? What if they were using a one-time accessible resource (a remote process that just terminated or a user that just had a coronary)? What if they modified a remote resource during processing? If you want to do things the dirty way and cross your fingers then Orthogonal Persistence can do that too. But it should be able to do better.

I think that a perfect implementation of orthogonal persistence requires you to do away with the notion of a "disk" altogether, somewhat like Palm OS does. Palm OS uses database records that are in essence nothing more than memory blocks: you control reads and writes by requesting access to a database record, accessing it, and then handing it back to the operating system. If the hypothetical new operating system used this technique coupled with full transaction management with logs and rollback, we would have something really cool. It would even be possible to roll the database back to what it was earlier, thereby getting system-wide undo functionality. -- Elias Martenson

What does 'undo' mean in a distributed multi-user system?

Undo means that you can go back to an earlier state. For something like the above described undo functionality to work in a distributed environment naturally means that you need some form of transaction manager working across the involved systems. Of course, the overhead of having such a transaction manager running might outweigh the advantages of having transactionally safe execution. These things become evident once you go from theory to implementation. The perfect design rarely translates to a perfect implementation. -- Elias Martenson

That addresses distribution, but what about multi-user? What are the semantics of undo? Who can force the entire system to go back to which earlier states?

Consider, is the undo queue public property? There's no security in that. Is it owned individually? You have to deal with conflicts somehow. Is there something between public and private? Perhaps you can have ownership of undo queues match ownership of objects, so that an object's owner can undo what others (non-owners and owners alike) did with impunity.

And I don't think you need a transaction manager at all. If you can't undo your changes on another system, then this is equivalent to that system's owner forbidding you from doing so. A system can never be entirely owned by a foreign user. I guess the semantics of having multiple users makes most distribution problems irrelevant. Just consider a lightning bolt a superuser ....

If you want a transactionally safe distributed operating system, then I would presume you need a transaction manager. The reason is, that if you want to be able to do rollback, it's an all or nothing approach. Compare this to a database. The only person that is able to do a system wide rollback is the systems administrator, and he is only able to do it while the system is down.

Consider the undo functionality more of a type of backup rather than something that would be used all the time. The traditional rollback, however, could be very useful. Imagine being able to try out a program whose behavior you are not sure of by starting a transaction, running the program, and then rolling back.
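As a toy illustration of "start a transaction, run something, roll back", here is a minimal in-memory sketch in Java. The class UndoableMap and its methods are hypothetical names for this example; it covers a single map in a single process and says nothing about the distributed or multi-user cases discussed above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical state holder with begin/commit/rollback backed by an undo log. */
final class UndoableMap {
    private final Map<String, String> data = new HashMap<>();
    private Deque<Runnable> undo; // null while no transaction is open

    void begin() {
        undo = new ArrayDeque<>();
    }

    void put(String key, String value) {
        if (undo != null) {
            // Record how to restore the previous state before changing anything.
            boolean existed = data.containsKey(key);
            String old = data.get(key);
            undo.push(() -> {
                if (existed) {
                    data.put(key, old);
                } else {
                    data.remove(key);
                }
            });
        }
        data.put(key, value);
    }

    String get(String key) {
        return data.get(key);
    }

    /** Keeps the changes and forgets how to undo them. */
    void commit() {
        undo = null;
    }

    /** Replays the undo log newest-first, restoring the state seen at begin(). */
    void rollback() {
        if (undo == null) {
            throw new IllegalStateException("no open transaction");
        }
        while (!undo.isEmpty()) {
            undo.pop().run();
        }
        undo = null;
    }
}
```

Trying out an untrusted operation would then look like begin(), run the operation against the map, inspect the result, and rollback(). Side effects outside the map (network calls, messages to users) are exactly what this sketch cannot undo.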

Now, please note that I'm not saying that this kind of operating system really is the Killer Operating System. I'm just saying it's an interesting idea worth studying a bit further. Perhaps it's material for a page of its own? -- Elias Martenson


For security reasons, it is necessary for the OS to provide memory which is not versioned. Passwords may survive a reboot only if there is a big shiny button a user may push to "delete all passwords from system NOW" when the cops come crashing through the door with a search warrant. And it might be better for that big shiny button to be the computer's reset switch.


Here is a transparent persistence problem that crops up occasionally.

Say I am using a transparent persistence framework such as Hibernate. Now, generally, I want to model my domain with POJOs that have no persistence semantics and use the framework to implement persistence of those objects. All well and good and easily implemented using Hibernate for typical models.

But say I have this interface for a model entity.

    public interface Region {
        public Set getOrders();
        public Set getOrdersBetween(int low, int high);
    }

getOrdersBetween(...) will select orders from the entire set that match the criterion. But, to do that in a POJO, I have to write search code into the application, e.g. an iterator that checks each order or a binary tree that is maintained by the Region implementer as orders are added to the relationship. But, any conceivable persistence mechanism that I eventually hook this to will have no trouble performing the selection directly. So, implementing the search in the POJO is a waste of effort and could easily be less efficient than using the selection facilities of the eventual RDBMS.
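For reference, the two in-application options mentioned above might look roughly like this in plain Java. The Order class and its number field are assumptions (the interface does not say what low and high range over), and PojoRegion is a hypothetical stand-in for the Region implementer.

```java
import java.util.LinkedHashSet;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical order; an integer order number is assumed as the search key. */
final class Order {
    final int number;

    Order(int number) {
        this.number = number;
    }
}

/** Hypothetical POJO region with no persistence semantics at all. */
final class PojoRegion {
    // A sorted tree maintained as orders are added (order numbers assumed unique).
    private final NavigableMap<Integer, Order> byNumber = new TreeMap<>();

    void addOrder(Order order) {
        byNumber.put(order.number, order);
    }

    Set<Order> getOrders() {
        return new LinkedHashSet<>(byNumber.values());
    }

    /** Option 1: iterate and check every order. */
    Set<Order> getOrdersBetweenByScan(int low, int high) {
        Set<Order> result = new LinkedHashSet<>();
        for (Order o : byNumber.values()) {
            if (o.number >= low && o.number <= high) {
                result.add(o);
            }
        }
        return result;
    }

    /** Option 2: range lookup on the tree, bounds inclusive. */
    Set<Order> getOrdersBetween(int low, int high) {
        if (low > high) {
            return new LinkedHashSet<>(); // subMap would reject an inverted range
        }
        return new LinkedHashSet<>(byNumber.subMap(low, true, high, true).values());
    }
}
```

Either way the search logic lives in application code and only sees orders already loaded into memory, which is the duplication of effort being complained about.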

So, to avoid the problem, I either add persistence code into the Region implementer or add an appropriate finder to the associated DAO interface ( e.g. regiondao.getOrdersBetween(Region forRegion, int low, int high) ) and eliminate the business method from the interface.

Neither of these solutions feels right. The first violates transparency. The second moves the operation to a less logical object.

Is there a data access pattern that solves this?

Use a smarter "data aware set" that looks and behaves as a normal set in the domain, but when given a query such as between, generates an appropriate query and sends it to the DBMS. That preserves the illusion of clean domain code while letting SQL do what SQL does well. SQL has Select, so there is no reason your Sets and Lists can't have the same. In fact, why not more? Just go look at a real collection hierarchy like Smalltalk's. Consider making at least a few basics standard on all your collections, say: forEach, find, findAll, findNot, collect, matches, anyTrue, allTrue, first, second, third, fourth, fifth, rest, and maybe a few others you find handy. It's not so difficult to make a SQL DBMS look pretty much like just any collection.
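One way to picture such a "data aware set" is the following Java sketch. Everything in it is hypothetical: the RangeSet interface, the QueryRunner callback (standing in for JDBC, a DAO, or a framework hook), and the table and column names passed in by the caller. It generates a parameterized query string but does not talk to a real database.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToIntFunction;
import java.util.stream.Collectors;

/** Hypothetical collection interface; domain code depends only on this. */
interface RangeSet<T> {
    Set<T> all();

    Set<T> between(int low, int high);
}

/** In-memory version, usable in tests without a database. */
final class InMemoryRangeSet<T> implements RangeSet<T> {
    private final Set<T> items;
    private final ToIntFunction<T> key;

    InMemoryRangeSet(Collection<T> items, ToIntFunction<T> key) {
        this.items = new LinkedHashSet<>(items);
        this.key = key;
    }

    public Set<T> all() {
        return Collections.unmodifiableSet(items);
    }

    public Set<T> between(int low, int high) {
        return items.stream()
                .filter(t -> {
                    int k = key.applyAsInt(t);
                    return k >= low && k <= high;
                })
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }
}

/** Assumed hook for executing SQL and mapping rows to objects. */
interface QueryRunner<T> {
    Set<T> run(String sql, List<Object> params);
}

/** Pass-through version: each operation becomes a parameterized query. */
final class SqlRangeSet<T> implements RangeSet<T> {
    private final String table;
    private final String column;
    private final QueryRunner<T> runner;

    // table and column are trusted identifiers supplied by the mapping, never user input.
    SqlRangeSet(String table, String column, QueryRunner<T> runner) {
        this.table = table;
        this.column = column;
        this.runner = runner;
    }

    public Set<T> all() {
        return runner.run("SELECT * FROM " + table, List.of());
    }

    public Set<T> between(int low, int high) {
        String sql = "SELECT * FROM " + table + " WHERE " + column + " BETWEEN ? AND ?";
        return runner.run(sql, List.of(low, high));
    }
}
```

A Region could then expose its orders as a RangeSet and implement getOrdersBetween by delegating to between, with the choice between the two implementations made wherever the object is constructed or injected.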

I like that idea. Of course, I'd still have to implement the non-passthru versions of those operations so that code could be tested without a DB. But if it was done in a standardized way (that is, implemented once :) ), it might not be so painful. Then, in deployment, the passthru versions could be injected. But first, I'll check that there isn't already an available Java library that implements smart collections. The logical next step is to extend a persistence framework to implement those smart collections (e.g. generate SQL) or at least provide a standardized integration mechanism for custom SQL implementations. It's possible that some of the persistence frameworks already do this, so I'll look into it. -- Greg Wiley

Today I started experimenting with an AspectJ approach. I have found that I can use it to solve a couple of transparency problems in Hibernate. First, one thing that bugs me is the requirement that model classes contain a primary key field. For an in-memory object, its reference is its identity, so adding a field for that purpose is anti-transparent in the pure domain model. My experimental approach to solving it in AspectJ involves static crosscutting and looks like:

    public aspect HasPrimaryKey {
        private interface hpk { }
        declare parents: model.*Impl implements hpk;

        private Long hpk.key;
        public Long hpk.getKey() { return key; }
        public void hpk.setKey(Long key) { this.key = key; }
    }

Second, the problem I originally proposed can be solved with an advice that replaces finder methods in the model:

    public aspect DatabaseFind {
        OrderDao orderdao = new OrderDao();

        pointcut ordersInRange(Region region, int low, int high) :
            execution(protected Set model.RegionImpl.findOrdersInRange(..))
            && args(low, high) && this(region);

        Set around(Region region, int low, int high) : ordersInRange(region, low, high) {
            return orderdao.findRegionOrdersInRange(region, low, high);
        }
    }

Now, what I'd really like to do is to use a wildcard in that pointcut and have the advice base its DAO method selection on a decoding of the join point's method name. But I don't think that's doable in AspectJ. I'll keep looking. Meanwhile, all the finder methods have to be entered into the aspect manually (some typing could be saved by using anonymous pointcuts). -- Greg Wiley

I came up with a way to do the wildcard substitution above. The code is long so I'll just describe the approach.

1. Establish a convention for protected finder helper method signatures in the model, e.g. model.Impl.findBy(...)

2. Establish a convention for DAO finder method names, e.g. dao..findForBy( e, ...)

3. In an aspect, define a pointcut that selects execution of the finder helpers, e.g. pointcut finder() : execution(* model.*Impl.find*By*(..))

4. In the same aspect, define an advice that uses thisJoinPoint to:

   - parse the method name to obtain the entity and target
   - obtain the entity instance, method arguments, and return type
   - select the appropriate DAO
   - produce a method name conforming to the DAO finder convention
   - reflect on the DAO to discover the finder method with the appropriate signature
   - call the DAO's finder and return the result
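The name-mapping and reflection part of those steps can be sketched in plain Java, leaving the AspectJ wiring aside. The naming convention used here is an assumption for the sake of the example (helper findOrdersByRange on RegionImpl maps to DAO method findRegionOrdersByRange, mirroring the findRegionOrdersInRange example above), and FinderDispatch, RegionImpl, and OrderDao are hypothetical classes. In an around advice, the helper name, arguments, and entity would come from thisJoinPoint.getSignature().getName(), thisJoinPoint.getArgs(), and thisJoinPoint.getThis().

```java
import java.lang.reflect.Method;
import java.util.Set;

/** Hypothetical dispatcher from model finder helpers to DAO finders. */
final class FinderDispatch {
    /** Assumed convention: insert the entity name after "find". */
    static String daoMethodName(String entity, String helperName) {
        if (!helperName.startsWith("find")) {
            throw new IllegalArgumentException("not a finder helper: " + helperName);
        }
        return "find" + entity + helperName.substring("find".length());
    }

    /** Finds the DAO method by name and arity, then calls it with the entity plus the helper's arguments. */
    static Object dispatch(Object dao, Object entity, String helperName, Object... args)
            throws ReflectiveOperationException {
        // Entity name is derived from the implementation class, e.g. RegionImpl -> Region.
        String entityName = entity.getClass().getSimpleName().replaceFirst("Impl$", "");
        String name = daoMethodName(entityName, helperName);

        Object[] full = new Object[args.length + 1];
        full[0] = entity;
        System.arraycopy(args, 0, full, 1, args.length);

        // Overloads with the same name and parameter count are not distinguished here.
        for (Method m : dao.getClass().getMethods()) {
            if (m.getName().equals(name) && m.getParameterCount() == full.length) {
                return m.invoke(dao, full);
            }
        }
        throw new NoSuchMethodException(name);
    }
}

/** Hypothetical model class, present only to exercise the dispatcher. */
final class RegionImpl {
}

/** Hypothetical DAO; a real one would query the database. */
final class OrderDao {
    public Set<String> findRegionOrdersByRange(RegionImpl region, int low, int high) {
        return Set.of("orders " + low + ".." + high);
    }
}
```

Selecting the appropriate DAO is skipped here (the caller passes it in); a registry keyed by the target name parsed from the helper would be one way to fill that gap.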

I am increasingly of the mind that Transparent Persistence Is Impossible. -- Moe Aboulkheir


Moved issues on definition to Persistence Definition


Using database tools, one stops thinking in terms of files. When doing most of the processing in collection-oriented systems, such as Ex Base, one tends to see a pattern of a distinction between "cataloged" info and non-cataloged. Cataloging means that one explicitly gives something a name and/or identity, which may be a "path". I've used a lot of "temp tables" which are not assumed to be "kept" and are treated more or less like locally-scoped variables. I've used other non-DB systems, such as statistical packages, that have "session" collections that are only "saved" if you explicitly go through a "save" step of some kind, or pre-designate the collection as a cataloged collection. When using a file system explicitly, the process of "saving" is generally the cataloging step. It's not so much about disks, but about the human action of "cataloging". Temporary collections may also use disk, but it's a matter of having some kind of "index" or registering step so that it can be found again later. -t

Consider a different User Story: you power down your computer in the middle of an Ex Base process that involves a lot of "temp tables". When you power up your computer again, the Ex Base process fires up and continues that process right from where it left off (or close enough that you can't tell the difference). You didn't need to restart that operation. You didn't need to rebuild those "temp tables" from scratch. You didn't even need to give those 'temp tables' any names for an index or catalog. This would widely be considered Transparent Persistence of process (likely provided by the Operating System) despite the fact that those tables and the whole Ex Base operation are 'volatile' compared to the other processes.

But you may have that with everything if you had something similar to bubble memory. Powering down wouldn't wipe the variable stack and program position pointer. However, there may be operations in an undetermined state if the CPU suddenly stopped due to loss of power. To get around that, we'd need some kind of ACID system.

Why do you state that as an objection? There is no known fundamental reason that we cannot have widespread Transparent Persistence. There are many reasons we don't widely have Transparent Persistence, of which you name just one: consistency issues. Others include recovery of networked services, dealing with upgrades (if everything is persistent, then you are essentially forced to treat all upgrades as occurring at runtime, which requires that we revise our software distribution models), and achieving all this without sacrificing performance or security. That's a lot to bring together at one time, and it would help a lot to have support from the hardware (Bubble Memory, MMU), OS, language implementations, language and service design (idempotence, Live Programming), and network protocols. I will note that ACID (Atomic, Consistent, Isolated, Durable) is stronger than necessary: for persistence we don't care about 'atomic' or 'isolated'; we do want 'durable' and 'consistent'. 'Consistent', in this case, means that the power disruption is either invisible to the program or has well-defined semantics. You might not be willing to pay for the atomicity or isolation properties, though they may be useful for other purposes.

Our software architectures have reflected hardware architecture for so long that we simply haven't explored alternatives sufficiently. But "work-space" versus "storage" is a common idiom in both manual work and the human brain.

Transparent Persistence helps separate the logical lifetime from the physical power-cycle. Logical organizations of storage vs. workspace are still very useful for understanding system behavior under concurrency, security, system modularity, extensibility, and various forms of partial failure. We still benefit from understanding temporary variables/tables as distinct from Blackboard Metaphor or Publish Subscribe Model or Data Base or Tuple Space or other high-level organizing principles for code and data.


See original on c2.com