Use Tracing

Distributed embedded multi-threaded exception-throwing code can be mighty hard to understand with a debugger. Sometimes the debugger misbehaves. Sometimes you misunderstand what happens when someone catches or fails to catch an exception. Sometimes you get confused about threads. Sometimes you can't attach a debugger to a remote process. Sometimes you can't get things synced up. Sometimes you have a bug that occurs only when the system is deployed and under load.

Therefore,

Most languages will either trace for you or permit you to build little tracer objects to stick in your code. You can construct these so that they will be compiled out and/or turned off when they're inconvenient. Ideally, what you want is to be able to turn them on selectively without recompiling. You want to be able to turn them on in a particular thread, in a particular object, at a particular severity, in a particular module, or in a particular method. You also want to be able to specify tracing for all threads, all objects, all modules, all severities or all methods. You want the tracing output to be mutex'ed and/or synchronized so you can read it. And you want to be able to redirect it to a particular client, file, or database.

You want to be able to trace method entry and exit. You want to be able to leave customized trace statements in the code (remember they'll get compiled out when you don't want 'em) and associate them with severity levels. Perhaps you'll even use some Purify/Quantify style external tracing to characterize memory leakage and performance issues. This is all in addition to any assert statements you might think to build in.

Most importantly, you want to build all this crap into your code from the get-go. It'll do you no good to try to retrofit it after the fact - too much stuff going on in too many places for that. Where you are tempted to throw in a little debug printf as you develop, don't just pull it back out again - you never know when it'll be relevant again. Turn it into one of these fancy trace statements and leave it there. You'll be very glad you did. -- Peter Merel


For a useful pattern for implementing tracing, see [1] from PLOP2.


I don't agree that you should build it into your code from the beginning. That leads to masses of output which takes forever to understand. With a few exceptions, trace statements should be put in on the fly, directed to the task at hand, to show information that you know will be useful. It doesn't take long to write a trace statement and recompile, not if you're doing it right. (At least in my environment.)

That does mean you must acquire or build a good trace engine. Mine can give activity from different threads different indentation, and time-stamps each line in milliseconds, which is sometimes useful. I have a routine called "Trace.here()" which adds the current source file name and line number to the trace log - I can paste in dozens of these very quickly to see where the code is getting to, and each makes itself unique without further editing.

Also, it's worth getting objects to print themselves nicely. In Java this probably means overriding toString(). (If nothing else, the default version is too verbose - it includes the entire package name, like "uk.co.bhresearch.Pixie." which too much text.) However, you should again feel free to edit the code to print just the information that you need. A print routine that's useful for one bug may not be best for a different one 6 months later.

Why try to parameterize this stuff with options and switches, when you can edit the code on the fly?

Because you often want the capability of having a customer throw some switches based on the symptoms they're seeing, so that you can get a log file that's small enough to email. Looks like a database problem? Then don't trace GUI events.

Every successful project I've been on the the past decade has built in tracing from day one. By exercising judgement, and by doing any on-the-fly debugging code in private builds, we've usually avoided the mass-of-useless info problem.

Having seen how beneficial this can be both in problem turnaround and in maintaining a healthy relationship between engineering and tech support, I strongly recommend planning for traceability from the outset. -- Dave Smith

Seconded. One of the nicest examples I've seen is in ILU, an ORB from Xerox PARC (ftp://parcftp.xerox.com/pub/ilu/ilu.html). I had an awful problem finding a memory smash in their multi-threaded C++ version ages ago and simply could not have found it without excellent tracing. As a user, it would have been awful to have had to read through all the ILU code to understand where to add my own tracing - to say nothing of the delayed turnaround while recompiling.


One nice Java trick that may be appropriate for some situations is to make your trace engine an output stream and redirect System.err and/or System.out to it. I'm using that approach in the server side of a distributed app. The nice thing is that "vanilla" code is giving me nice timestamped-to-the-millisecond messages. And by using the Composite pattern, I get the messages in a command prompt window and saved in a file. -- Kiel Hodges


Why try to parameterize this stuff with options and switches, when you can edit the code on the fly?

In distributed code editing/recompiling can be less trivial, as it may require compilation/installation on multiple machines; there can be similar issues with embedded code, and distributed embedded code can compound that problem further. With distributed code one is often only actively working on one part, with the rest of the larger system running production code (but tracing in those components is helpful too).

There's no question that selectively editing in tracing during active development/debugging can be helpful, but a solid infrastructure, always in place, allowing selective tracing, can be invaluable. Plus, in such complex environments this sort of tracing can help solve problems even in the field - you may think you'll compile this stuff out for production, but that may not be a good idea in practice. (We have a three-level system: all tracing compiled in, no tracing compiled in, only non-"development" tracing compiled in; in practice we've hardly ever built other than the first). -- Jim Perry


Having tracing code in shipping applications also can help isolate environmental problems outside of the app. If you've ever had a customer swear up and down that "it's not working now and I didn't change a thing!" you'll appreciate the value of having the ability to turn on a trace feature to record versions of DLLs, OS version info, checksums of configurations files that the user shouldn't touch, etc. -- Dave Smith


OK - I'm convinced. I hadn't realized you were talking about a shipped product, in the field. In fact, the mention of a debugger, and compiling out, suggests you are not in the field at all. Can someone edit the opening statement to make the context more clear? -- Dave Harris


There is a lot of confusion regarding the difference between development and debugging, especially with regards to logging errors and other messages. For argument's sake, let me define the two. Development logging means pre-release and debug logging means post-release. The two can be very different in that developers log lots of stuff during development, but when problems occur in production systems, a consistent cohesive set of messages needs to be generated. -- Georgia Mc Namara


The Lisp TRACE macro is nice, too. -- Simon Heath


See original on c2.com