Software engineering research, its critics claim, is poorly validated and ultimately of little use to anyone. Darrel Ince assesses the evidence.
I am external examiner for the computing department at Strathclyde University. I thoroughly enjoy doing the job: it gives me an opportunity to get out of my own department to see what others are doing; I am regally entertained at Strathclyde; it is an opportunity to visit Glasgow, a city which has been transformed over the past decade; and it enables me to talk to electronic engineers.
The degrees at Strathclyde have a high electronics component and the department sensibly brings all its external examiners up at the same time.
Of the pleasures I have enumerated, the greatest is talking to staff engaged in a real engineering task: it is a novel experience for a computing academic to hear engineers talking about building electronic artefacts that are faster or smaller, consume less power or do more than existing artefacts, and about their use of theory to accomplish this.
It is a novel experience because I very rarely hear software researchers doing the same thing. In the conversations I have had with electronic engineers at Strathclyde and at other universities where I have been an external examiner, I have felt some degree of discomfort in not being able to indulge in a similar discourse.
Usually I have comforted myself with the thought that what I have been doing is more theoretical. However, a number of events over the past year have convinced me that we have a real problem with software research, and software engineering research in particular.
The first event was the publication of a survey by the researchers Tichy, Lukowicz, Prechelt and Heinz of the computing research contained in 400 recently published papers from some very prestigious computing journals.
The question they were interested in was: do computer scientists, as evidenced by their publications, carry out proper validations of their work? In order to test this they examined the papers and judged them against some relatively mild criteria which covered validation.
They discovered that overall, 40 per cent of the papers published contained no evaluative component, even judged by the very generous criteria they set up, and that the figure was as high as 50 per cent for software engineering papers.
In order to counter the criticism that computing is a new subject and still groping its way towards some degree of respectability, they used two new subject areas for comparison.
These were optical engineering and neural computing - two subjects which have a very short pedigree. Of the papers in neural computing 88 per cent contained a validation of the research while for optical engineering it was slightly lower at 85 per cent.
My own experiences have been similar. After I read the Tichy paper I examined about 30 papers drawn randomly from a major British journal on software engineering. The results from my limited sample were even more depressing than the Tichy results: I felt that only two of the papers had done an adequate job of validation while six more had done a partial job.
This is also reflected in many of the grant proposals that I am asked to referee for bodies such as the Engineering and Physical Sciences Research Council.
The second event, or rather series of events, that has convinced me that we have some major problems with our software research is the increased incidence of academic computer scientists worrying about the small amount of technology transfer that has occurred from our subject - something I believe is closely linked to the issue of validation. The two most dramatic examples of this occurred recently.
David Parnas is regarded as one of the world's foremost academic software engineers. At the 17th International Conference on Software Engineering Parnas shocked the assembled delegates with a speech which effectively pointed out that software engineering research had contributed little, if anything, to industrial software development.
The conference, the largest and most prestigious gathering of software engineers in the world, has a tradition of looking back ten years and selecting papers presented at previous conferences that have stood the test of time. Parnas examined the papers selected by the conference as the best of their year over a seven-year period - two of which he had authored - and made an effective case that these publications, which represent the peak of software engineering research, had achieved nothing: the techniques, ideas and methods detailed in them had passed industry by.
A second, equally shocking, admission was made by Tony Hoare, an Oxford academic who is generally recognised as the foremost authority on mathematical or formal methods for developing critical systems. Addressing a workshop on failure in safety-critical systems, he stated that software developers seem to have coped quite well in developing safe systems without using the results of research into formal methods - a claim backed up by Donald MacKenzie of Edinburgh University, whose review of computer-related failures which had caused death found that the vast majority arose not from faults in the software but from human factors.
Both Parnas's and Hoare's testimonies can be backed up by my own experience of British industry. For the past ten years I have given seminars on technical subjects to industrial staff. At the beginning of each seminar, in order to find out what mix of people is attending, I give the participants a short questionnaire asking whether they have heard of a particular advanced topic, such as object-oriented programming, and whether their company uses it extensively. After giving this questionnaire to over 400 industrial staff I stopped the practice because it had an increasingly depressing effect on me. I usually listed about a dozen research topics: the vast majority of participants had never heard of them, and a minuscule number had ever experienced a project which used them. For example, only one participant had come across a project which used formal methods of software development, and that was a non-profit-making research project that had been mainly paid for by the Department of Trade and Industry.
What is even more worrying is that when software academics turn to developing software there is little evidence that they use their own research. I use quite a lot of software that has been developed in university departments. For example, two software systems which I and many tens of thousands of academic and non-academic users rely on, literally every day, are Lycos and TeX. The former enables me to scan the massive resources of the Internet to find information, and the latter is an excellent document processing system that produces very high-quality mathematical typesetting. Both are fabulous pieces of software initially developed by academics, but both draw upon virtually none of the research carried out by software engineers over the past two decades.
Backing for the sort of personal testimonies I have quoted in this article is now emerging via empirical studies of software companies. One thing which should worry all computer scientists is that those studies which have been carried out on successful, money-making software companies have indicated that such companies do not seem to draw upon research.
In a study of the way in which Microsoft has become a massive force in software development and marketing, Michael Cusumano, an academic at MIT, listed six principles of success that apply to Microsoft. All of these principles go against the orthodoxies with which we as academics have permeated both our teaching and research.
Erran Carmel's research on successful package developers seems to indicate that the best way to make money initially out of software package development is to let some talented hackers work with a good marketing department, and not to worry about academic concerns such as software development methods, specification notations and design methods.
Obviously there are exceptions to this lack of technology transfer, but there are so few of them that we as software engineering academics must start to feel some degree of concern.
There are a number of potential explanations for why software engineering research is not being exploited in the industrial world. One possible reason is the timid investment policies of companies: I believe that many of the ills of British companies would be reduced if we started regarding accountancy as a crime rather than a profession, and recognised that the accountants who run British companies are very good at telling us about the past but pretty hopeless at formulating policies for the future.
However, this could just be a local effect: American companies have pretty good capital investment records and yet most of the evidence of lack of technology transfer comes from American sources.
Rather than debate the reasons for this lack of technology transfer - something an article of this nature is not really equipped to do - it is perhaps worth focusing on validation as a precondition for technology transfer: if we as academics do not validate our work in engineering terms, then we should not expect industry to accept it. If we improve our validation, then at worst we can feel a sense of moral superiority that we have done everything required of us and that the fault now lies purely with industry; and at best we might just be more successful in convincing some of the computing industry that what we do is worth adopting.
How can we as academics start the process? First, I believe that there are now too many journals. If you are asked to serve on the board of a new computing journal you should refuse unless the publisher launching it agrees to close down at least one other journal. There is now an insatiable demand for computing articles from too many journals, a demand which can only be satisfied by relaxing some of our critical faculties. Many of the growing areas of computing can quite adequately be covered in existing journals.
Second, the funding agencies should insist that at least half of every bid for cash addresses the topic of validation. Third, we should all be tougher referees. No paper should leave our hands with an acceptance recommendation unless the author has addressed the nature of his or her research - whether it is a scientific or an engineering piece of work - and devoted a major part of the paper to explaining why it should be regarded as a proper contribution to the scientific or engineering literature.
It is certainly not easy to carry out validation, but it is not impossible. One area where I frequently find it carried out is in PhD theses - often it is done qualitatively, but it is still done. Successful PhD students do it because their supervisors have usually drummed into them that there is such a thing as a scientific method or an engineering research method, and that their thesis will be examined against it. We as computing academics should be saying precisely these things to our colleagues, and adopting measures such as the three above, in order to establish the same type of culture as exists for computing PhDs.
I have outlined a selection of palliatives. What is perhaps needed as a first step is a re-examination of the nature of computing research. I believe that the problems we have in computing research stem from two sources: first, a lack of clarity about what our subject is - whether it is a science, a branch of mathematics or an engineering discipline; second, a haziness about what exactly is meant by scientific or engineering method.
Perhaps to start the debate it is worth stating my own view: the "theoretical" research carried out in computing, since it involves the use of mathematics to analyse man-made systems, is applied mathematics, and everything else we do is engineering. The fact that we coined the term "computer science" to gain academic respectability does not mean that we are scientists or that what we do is science.
I believe that this is not a popular message, since the standards of validation used by applied mathematicians, and by researchers in electronic, civil and mechanical engineering, are immensely tough. However, it might just get us away from the "Look at this, I think it's a good idea" Tinkertoy attitude that bedevils current computing research.
Darrel Ince is professor of computer science at the Open University.