Xp Challenge Compilers


From Extreme Programming Challenge ... Here is a scenario that takes XP out of its element. How would we do XP in this circumstance?


Scenario 1:

The phone rings.. it is the development manager of a startup company that wants to make an E++ development suite for a new platform. E++ is a new language. No one at the company has written a compiler for it. Time to market is critical. You have six developers, three of which have extensive compiler writing experience. Two of them have programmed extensively in E++ throughout its definition stage. The compiler must be correct and it must outperform the stats of several competing products which are under development at other companies. Correctness and speed of compilation are critical issues. The ISO E++ standards committee has just created the final standards document set for E++, in record time, by adding 2 to each character in the C++ standards documents. The team is charged with making the compiler portion of a development tool. No UI, just raw text input and code generation. The ISO committee is not available for User Story sessions. The company wants you to come in and show their experienced compiler hacks how to make a compiler the XP way.

Can the spirit of XP be drawn in here? How?


I'm changing my mind. I think this project has set its expectations too high (outperforming existing products in compile time and compiled code performance out of the box). If that's what is intended, this project needs more than XP, it needs reasonable objectives. It is possible to read Alistair's unilateral reduction in requirements as a similar response. If this is a Death March waiting to happen, this XPer wouldn't sign up. The point of XP is to win, not die bravely. -- Ron Jeffries


The emphasis on User Story is misleading in XP writeup (imho, and let me know if I'm off base, XP wiz). Instead, use a card for every piece of deliverable functionality. I am not a compiler writer, but I can guess that the compiler grows by what pieces of the language it compiles, and to which platforms it delivers. Create a delivery card (my substituted word for User Story) for each chunk of the language and each output form, for the various levels of error message reports, etc. Now you have a large pile of tasks, upon which you can work Extreme Planning to build release envelopes. Regression testing is regression testing, and easily applies to compilers. Teaming issues I leave the XP wiz to recommend. -- Alistair Cockburn

Power-sharing between user and developer is a cornerstone of XP. I personally believe that the techniques can be used in situations where the stories have come from some other source, even the developers with another hat on, but the user component is key to a lot of the process: the planning interactions, and the determination of what is really meant by a story. Kent, what do you think? -- Ron Jeffries


I just have trouble believing that Do The Simplest Thing That Could Possibly Work is appropriate in this scenario. I suspect that production compiler development is a Black Art, like game programming and cryptography in which the faster algorithms and better approaches are not the simplest. Especially when market pressure is factored in. -- Michael Feathers

I've written commercial-quality compilers, operating systems, database management systems, and even a couple of games. (Why am I not rich?, he cried.) In all of these cases there were tiny places where the Black Art served us. As a Master Programmer with Lightning Crackling From Your Fingertips, you used the Art as often as you can. It bit us more often than it helped ... and it consumes mass quantities of that most important resource, time. It's good to be able to dig deep into the bag of tricks once in a while, but 90% of everything can, and in my experience should be,

simple. -- Ron Jeffries

Do The Simplest Thing That Could Possibly Work depends partly on knowing what will and will not work. For example, I would never write a compiler without separating out the parser as a separate component. Strictly speaking, this is not necessary, and there are probably some compilers for which it is simpler to combine it with everything else. However, I always like to think of the parser as a separate component, and there are lots of times where it is important to think that way. Reusing the standard design of a compiler saves time because it eliminates thinking, it usually works well, and it makes it easier for other people who know about compilers to read my code.

My point is that knowing more about the problem and having more experience with the problem will change what you think will work. So, it will change what you do. A compiler expert will have an outline of a compiler in his or her head and will ask lots of questions to find out the details. Someone who doesn't know as much will have to build a prototype and find out the hard way.

None of this argues against Do The Simplest Thing That Could Possibly Work. It just means that different people will interpret that differently.


For me, XP is 4 values (Communication, Simplicity, Testing, Aggressiveness) plus a slew of principles and practices (Pair Programming, Do The Simplest Thing That Could Possibly Work, Continuous Integration, Refactor Mercilessly, etc.). From what I've read, I would have trouble saying "XP and 30 people", or "XP and COBOL", but I don't see why "XP and compiler" causes any hardship. Still build the simplest thing that can work at any instant, still regression test, still pair program, still continuously integrate, still don't document outside the code (until you get to the user manual). Why would anyone think a non-interactive program runs against anything XP? Why are "users" as actual real-bodied program users an essence of XP? The word User Story is to me a placeholder for Unit Of Delivery and Balance Of Power: Who Sets Functional Priorities; the word User denotes Who Sets Functional Priorities.

So for the above scenario, the compiler, I read the XP challenge as - who sets priorities on what goes out in the first release of the compiler?


Yes. In this scenario the main issue is how to deal with a process that is not user driven. In some ways, it is simpler. You know what you need up front. It is task driven. Although it would help to have language experts play the role of user. Never saw a completely unambiguous spec. -- Michael Feathers


I would say that 'compile it all' is not a proper project management instruction to the team. I would reject that instruction, even without XP. I would ask the team, "what is the smallest thing we can deliver that will show us and our marketing team that we are delivering this beast?" I would then ask some unknown one, possibly my team, "What is the smallest usable compiler that people will pay money for?" With these two questions answered, I would create two internal delivery milestones, the first to measure our process and gain confidence, the second to give the marketing team a fall-back position (marketing people can always use such a one). -- Alistair Cockburn

I can see not making performance paramount in the first market release much more easily than implementing only 80-90% of a standardized language. But, I agree fully that many internal milestones would be needed and the team needs focus on each of those rather than the whole banana. I agree about the unreality of hitting #1 in the first iteration with a newly formed team. What the heck was I thinking? -- Michael Feathers


By the way, with this scenario I was really trying to get whether there is a formulation of XP which works with development that is not user-centered, and where Do The Simplest Thing That Could Possibly Work may be problematic. There are oodles of simple inefficient ways to make compilers. And it seems that Do The Simplest Thing That Could Possibly Work mitigates risk that comes from spec changes. And, we all know that languages never change their specs (haha) (see tying comment in Extreme Programming Challenge). -- Michael Feathers


Re: the simplest thing is "problematic". I worked with the great Jay Fenton, a fabulous game programmer. All those clever tricks you mention helped in maybe 10% of the cases he used them. The rest of the time the best I can say is that they didn't kill us, not quite. I am a reasonably experienced programmer, but every time I get away from Do The Simplest Thing That Could Possibly Work I get clobbered. I see no reason that compilers should be any different.

Re: balance of power. You certainly have conflicting goals here- time to market, performance, conformance, and probably optimization and integration with the rest of a programming environment. There is a role in XP that we call Customer, or sometimes just Business. The person in that role is responsible for deciding what to do first. If you asked me to coach XP1, I would want a person I could rely on to answer such questions as "we could fix this obscure bug or we could add this optimization, which should we do first?"


I agree that Do The Simplest Thing That Could Possibly Work is true for compilers, as well. My problem is that people seem to think that the simplest thing is obvious, where my experience is that it is often not obvious. One of the things that makes someone an expert is that they KNOW the simplest thing that will really work.

We all know that you don't prove that a program is correct by testing it. If reliability is important, you need a way of ensuring that other than by testing. (Testing is crucial, of course!) For example, how do you make sure that a compiler accepts exactly the grammar described in the specification?

The best way is to make the parser match the specification as closely as possible, and to inspect them to make sure they are the same. You can either have a parser generator that takes the grammar and automatically produces the parser (in which case you have to decide whether it is correct) or you use some technique like recursive descent parsing in such a way that any of your developers can compare the grammar and the parser and decide whether or not it is correct. Each production of the grammar can be a programming task, though a better task is a set of productions that don't reference any undefined nonterminals. A pair of programmers can write some test cases and write the code to handle the test cases as usual. But they also compare the code they write with the grammar to make sure that they can defend it. Later on a separate group (probably another pair) will inspect the entire parser to make sure that every production in the grammar is in the parser, and that the parser has nothing that is not in the grammar.

I think this is true of other parts of the compiler, as well. The simplest way to handle register allocation for most computers is to have most of the compiler assume an infinite number of logical registers. There is a register allocation phase that uses one of the graph coloring algorithms to assign the logical registers to physical registers. This technique is overkill if your machine has only a few registers, or if you don't really need to optimize register usage. But it is the standard solution if you need to optimize registers on a modern RISC or CISC machine.

Using a separate register allocation phase makes the rest of the compiler simpler, and the register allocator itself isn't all that complicated, but register allocators seem to be hard to make reliable. I think it is because it is hard to come up with a complete set of test cases. Maybe there are people who are able to write register allocators that are correct by inspection, but we had trouble with that. We kept having obscure bugs in register allocation until we wrote a small routine that would audit the output of the register allocator to make sure that two logical registers were never in the same physical register at the same time. Once we did that, we had no more problems.

There are straightforward ways of generating and optimizing code for a single machine that don't scale well if you have to support multiple machines. But it seems like this system only has to support one platform and that optimization is not important, just compilation speed. So, I'd treat the target like a stack machine and generate code for it that way, with some peephole optimization. That should be fast. Over half your time will be in scanning, over half of the rest of your time will be in parsing. So, if speed is the issue then I'd pick the programmers best at writing fast code and have them make the scanner fast. After it is right, of course. I'd make the scanner separate from the parser, as usual, so that the ugliness in the scanner would not affect the rest of the compiler. I'd first make the scanner and parser work, and then worry about producing code and making the scanner fast.

I'd also make a separate code generator. For a simple compiler, it is tempting to just embed code generation in the parser. This will be a bad idea later when you get told to add an optimization phase. I like for the parser to produce abstract syntax trees, in which case the code generator is a set of methods you add to the AST classes, or perhaps a Visitor. (I am often telling people not to use the Visitor pattern, but this is one place where it can actually make things simpler.) It might seem more complex to separate code generation, but it makes the system easier to test (you can test parsing separately from code generation), easier to understand (because classes are smaller and do one thing), and might even reduce code duplication. And it will make your life easier when they ask for code optimization! This might seem to be violating YAGNI, but I don't think it is.

The XP practices will work fine for writing an E++ compiler. However, I think there will need to be some other practices, such as comparing code to specification, as well as appreciating the fact that you sometimes must be an expert to know what simple things will work and which won't.


There is another interesting question here.. could someone end up at a phased compiler purely through refactoring, or is it something which requires forethought when you choose a System Metaphor? Everyone knows that phases are the way to go, but seeing what happens when they are not used and the target language changes might be a good thing to think about.

Side note.. I've only designed a compiler once.. actually a translator for a domain-specific structured language.. we wanted to be able to translate it to C++, C etc. I looked all over the place for information about object-oriented compilers. I even accosted Peter Coad at a training (this was around the time of the release of his OOA book) and asked him to model a compiler as an example. I could tell by the look on his face that he really didn't expect that one! I used the traditional phased approach because that is the way that everyone does it and nothing else seemed to make sense.

I wonder what it would have been like to refactor if I had said You Arent Gonna Need It to the abstract representation.

I've seen Kent Beck refactor a procedural 99 Bottles program into objects. I can believe anything. -- Ron Jeffries

A case of not knowing When To Stop Refactoring? -- An Aspirant


"There is another interesting question here.. could someone end up at a phased compiler purely through refactoring, or is it something which requires forethought when you choose a System Metaphor? Everyone knows that phases are the way to go, but seeing what happens when they are not used and the target language changes might be a good thing to think about."

I wrote a simple compiler when I was preparing the lab sessions for our compiler construction course. I deliberately tried to avoid introducing abstractions before they became obvious. I have the cvs repository on my home page, in case anyone wants to take a look.

You can see the compiler concepts emerging naturally. If you apply Once And Only Once to the code generation for evaluating expressions on a register machine, you naturally end up with three address code. If you did this for a stack machine, you'd probably come up with a stack based intermediate language.

Of course I had read The Dragon Book before I started programming, so that probably influenced the direction the refactorings took, but I think much of the refactoring came naturally (as they should). Some of the abstractions were a little different from those the dragon book. For example, I use a simpler abstraction than LALR parsers because I haven't had to pay attention to compilation speed yet.

See original on c2.com