These two books present problems associated with the related areas of risk and safety in the context of the increasing use of computers throughout our society.
Safety concerns in computer systems are even more confusing than normal. Such systems consist of many subcomponents which are tightly coupled and have highly complex interactions. There are of the order of 10^20 unique end-to-end paths even in a moderate-sized program. Standards for safety-critical systems, where lives are at risk, can demand system failure rates of 10^-9 per hour, or even lower. However, software testing appears able to demonstrate failure rates of only around 10^-4 per hour. Sometimes, and worryingly, hazard analysis assumes a failure rate of zero in software components. Software does not wear out in the way that hardware does, but normally contains bugs which can take many years to manifest themselves in practice.
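The gap between the failure rate that standards demand and the rate that testing can support may be easier to appreciate with some rough arithmetic. The sketch below is illustrative only and is not taken from either book: it assumes a constant failure rate (an exponential model) and asks how many hours of failure-free operation would be needed before one could claim, with 99 per cent confidence, that the true rate is no worse than a given target. The function name, the confidence level and the model itself are assumptions made for the example.

```python
import math

HOURS_PER_YEAR = 24 * 365.25


def failure_free_hours(target_rate_per_hour: float, confidence: float = 0.99) -> float:
    """Hours of failure-free running needed to support a failure-rate claim.

    Assumption (not from the books): failures arrive at a constant rate, so the
    chance of seeing no failure in t hours is exp(-rate * t). If the true rate
    were worse than the target, a failure-free run of this length would occur
    with probability at most (1 - confidence).
    """
    return -math.log(1.0 - confidence) / target_rate_per_hour


if __name__ == "__main__":
    for rate in (1e-4, 1e-9):
        hours = failure_free_hours(rate)
        years = hours / HOURS_PER_YEAR
        print(f"{rate:.0e} per hour: {hours:,.0f} hours (about {years:,.1f} years)")
```

Under these assumptions, the 10^-4 figure corresponds to around five years of continuous failure-free testing, while the 10^-9 figure corresponds to over half a million years, which suggests why testing alone cannot be expected to justify the rates that safety-critical standards require.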
The use of computers within safety-critical systems has increased by around an order of magnitude over the past decade and this rate of increase is set to continue. Perhaps surprisingly, and fortunately, computers have been directly responsible for relatively few deaths so far, compared to car accidents, for example. The Therac-25 radiation therapy machine (covered in both books) is often cited as a prime example of a set of fatal overdose incidents where incorrect software was a major contributing factor; but even in this iatrogenic example, only six people were known to be affected since the problem occurred so rarely. This compares to thousands killed in non-computer-related disasters such as the Bhopal atmospheric release of toxic chemicals in India (covered in Safeware). However, this situation could easily change as the use of computers increasingly permeates our lives.
Computer-Related Risks documents many incidents and associated discussions reported in issues of the highly successful on-line RISKS electronic mailing list digest and its associated comp.risks newsgroup. The forum is moderated by the book's author, Peter Neumann of SRI International in California. Some of the material has previously been published within regular sections of the ACM Software Engineering Notes and Communications of the ACM.
Founded in 1985, the RISKS forum has recently celebrated its tenth anniversary, and debate continues apace. It is one of the most widely read and professionally respected online discussions, with an estimated readership of more than 100,000 worldwide. This is in no small measure due to Dr Neumann's dedicated editorial efforts over the years. This book represents a distillation of the discussion, with the addition of Dr Neumann's expert insight and analysis. As well as safety, the book also covers the related (although distinct) area of security, where unauthorised access to information must be prevented. Issues such as reliability, privacy and human well-being are also covered. Although the book is technically oriented, the non-specialist can also glean much by scanning its pages. The index is excellent, and a glossary of terms is provided, making it a good reference book.
Safeware covers risk in the first part of the book. It then goes on to introduce the concept of system safety, followed by some important terminology and accident models. The final part, forming the majority of the book, presents a more personal (but very expert) view of desirable elements in the production of a "safeware" program. Central to the approach is hazard analysis, together with other techniques such as verification and validation. Professor Leveson draws on intimate knowledge and experience of real and relevant projects such as the United States Traffic Alert and Collision Avoidance System (TCAS) for aircraft. The appendices provide fascinating and lucid accounts of a number of disasters in the medical, aerospace, chemical and nuclear sectors. While computers were not involved in many of these (although they were major contributing factors to some), lessons can be learnt by perusing the facts in each case.
Normally a whole set of events is necessary to cause a disaster. This is why disasters are so difficult to predict. These reports detail the causes in a factual way, providing useful and accessible case studies on how disasters occur. Normally the "safety culture" of the people involved is seriously deficient. If computers are involved, this can extend to the production of software, which is far more error-prone than most other engineering disciplines, even at the best of times.
Both books provide a US view of the subject matter. A European perspective could be slightly different. Here, for example, formal methods, which apply mathematical specification techniques and reasoning to software, are more widely advocated for application in safety-critical systems. But they are only very briefly mentioned in each book. The references in both books are also somewhat biased towards US authors.
There is great concern for standards in safety-critical systems. These are changing rapidly, so any specific information presented in the books would date very quickly. Standards are only briefly covered in Safeware, and are not even in the index of Computer-Related Risks. Legal aspects are also only briefly mentioned in Computer-Related Risks. A fuller discussion of the issues involved in the introduction and application of standards could have been worthwhile.
In the United Kingdom, the draft Ministry of Defence 00-55 Interim Defence Standard has created quite a stir in industrial and academic circles by potentially mandating the use of formal methods in safety-critical applications, but you will find no mention of this debate here.
These books will quickly become classics in their respective overlapping fields of computer risk and safety. Computer-Related Risks is more accessible to the lay reader as well as being of interest to the specialist. It provides excellent background reading for Safeware, which is intended more particularly for professionals, both technical and managerial, be they practitioners, researchers or regulators.
The aim of these books is to educate and inform, rather than train or provide ready answers to the problems, which are unlikely to be solved satisfactorily in the foreseeable future. Both authors are foremost and outspoken experts in their field. All professionals and advanced students working in the area of safety-critical systems should read and learn from these books. I hope they will help to avoid some serious computer-related accidents in the future.
Jonathan Bowen is a senior research officer at Oxford University Computing Laboratory. He is about to take up a lecturer post at the University of Reading.
Author - Nancy G. Leveson
ISBN - 0 201 11972 2
Publisher - Addison Wesley
Price - £37.95
Pages - 680