Common sense software engineering – Part III: Risk analysis
In the third part of his common sense software engineering series, blogger Steve Naidamast takes us through risk analysis and the techniques you’ll need to estimate risk exposure, including a handy risk exposure calculation.
The following piece describes a process for performing “Risk Analysis”, also known as “Risk Management”. What the reader will find is that, contrary to popular development paradigms, true software engineering practices require quite a bit of upfront analysis for new project development, as the prior piece on Requirements Analysis demonstrated.
In the frenzy of so-called “new development environments”, many technical managers as well as professional developers have attempted and still are attempting to find techniques that will allow them to avoid such in-depth processes and yet still create quality software deliverables. No matter how much marketing, PR, and other technical propaganda is thrown over the issue of quality analysis, without it, quality will never be part of the end result.
The reader will also notice that this work, like the other articles in the series, relies on Steve McConnell and other software engineering analysts of years ago. Steve McConnell’s 1996 classic, “Rapid Development”, has to this day never been refuted and is in fact still being corroborated by subsequent studies in this arena. As a result, for many quality business technical personnel, it is still considered the Bible of software engineering.
An introduction to Risk Analysis
One of the most important aspects of the management of any software project, large or small, is the management of risk. Risk, in terms of project management, is any situation that either prevents a project from reaching a successful and timely conclusion or delays an otherwise well-run project on its way to that conclusion.
Overwhelmingly, IT project management ignores this very crucial aspect of software development. As a result, project development schedules are affected negatively with the same managers putting pressure on their staff to make up for lost time, when it should have been properly planned for.
Every software project is faced with risks of all types and severities. In fact, any endeavour is faced with similar potential obstacles to completion. If you attempt to climb a mountain you risk the possibility of breaking a limb and in the more treacherous climbs, your life.
Even simply commuting or driving to work in the morning presents commuters with any number of potential risks from simple schedule delays to derailment. Though such situations tend to have low levels of occurrence they do in fact happen.
Every software project, when initiated, begins with one primary risk to timely completion: technical support for setting up development resources when needed, as well as for moving the completed project to production. Though many companies have extensive technical support infrastructures, their response to development requirements is always measured against the needs of the production environment, which makes it difficult to schedule around such requests. Production support is somewhat better. However, here too, as with development support, the outcome depends on who is assigned to your task and their experience level.
By no means does this mean that every project will experience technical support delays but it can happen and must be planned for accordingly. Planning for project risk usually falls into one of five categories with the most credible and most valuable finding itself relegated to the bottom of the heap:
- Crisis management: “Firefighting” – addresses risks only after they have become problems
- Fix on failure: Detect and react to risks quickly but only after they have occurred
- Risk mitigation: Plan ahead of time to provide resources to cover risks if they occur, but do nothing to eliminate them initially
- Prevention: Implement and execute a plan as part of the software project to identify risks and prevent them from becoming problems
- Elimination of root causes: Identify and eliminate factors that make it possible for risks to exist at all
Planning for the elimination of potential risk factors is by far the best cost and time saving process a software project manager can possibly infuse into his or her planning of a project. To ignore this most valuable asset in favour of any other level of risk management is simply to increase the gambling quotient against a project. Here, a project manager begins to enter wishful thinking syndrome as he or she gambles that no risk factor will crop up that can’t be handled properly.
Risk management, outside of schedule estimation, is the most complex issue facing software project managers. Risk management, also known as risk assessment, requires quite a bit of upfront effort prior to the beginning of any technical aspects of a project. As such, it is an effort that most managers would prefer to avoid, and the still high number of project failures in the industry demonstrates this.
This effort can then be broken down into the following components which the following chart demonstrates:
The most common project risks also fall into the classic mistakes category and can be summed up in the following list:
- Feature creep
- Requirements or developer gold-plating
- Shortchanged quality
- Overly optimistic schedules
- Inadequate design
- Silver-bullet syndrome
- Research oriented development
- Weak personnel
- Contractor failure
- Friction between developers and users/customers
Risks to a successful project life-cycle can come in all different sizes and shapes, making software development one of the most risk-laden technical endeavours. Some observers who have studied the field believe that gambling in a Las Vegas casino offers better odds of success than those currently proffered for the successful completion of a project.
As a result, any project that has no risk management planning in place is all but guaranteed to fail, and those that do not fail will have succeeded by sheer luck. However, it is surprising to find out just how many project managers who have succeeded with a few small projects tend to use them as their yardstick when planning more complex and difficult projects.
It is also just as surprising to see technicians come out of very difficult development environments convinced that the arduousness of the scheduling and the monumental stress are actually part and parcel of successful development stratagems, when instead such environments are simply examples of poor planning, poor scheduling practices, and poor management.
Though not listed in the preceding group, stress is actually a terrible risk to quality software development and has been found to be a major factor in high rates of defects; sometimes up to 40 percent of errors are due to the stress levels placed on developers.
Even so, many managers and even developers have come to believe that the risk factor of stress is simply part and parcel of the software industry, so much so that it is not even considered a risk factor when planning projects. Since management likes to associate high stress levels with a reduction in costs and development time, many technicians have come to believe that without such working conditions projects cannot be completed on time. However, just the opposite has been found: stress increases both the cost of a given project and the time it takes to complete, for a variety of reasons. For example, high levels of stress reduce developer creativity, lower morale, lower the ability to concentrate, and simply fatigue technicians to a point where they cannot produce optimally (see chapter 9.1, “Overly Optimistic Scheduling”, Rapid Development, Steve McConnell – 1996).
Most such stress comes from schedules determined by management without any understanding of what they are asking. Here again we can turn to the trading rooms of financial organisations, where stress levels are often so enormous that technicians typically leave after around 12 to 18 months. The resulting high turnover drives up the costs of such companies and their project development dramatically, yet few give it any thought and some even plan it into their budgets.
Despite the reputations of the quality of the technicians that work for American financial organisations, such organisations that do engage development with high levels of stress do not turn out high quality products because they simply can’t.
Excessive or irrational schedules are probably the single most destructive influence in all of software (Jones – 1991, 1994).
To get an idea just how many risk factors are involved in software development beyond the few already mentioned, please take a look at the detailed risk-factor chart that follows:
|Risk type||Risk detail|
|Schedule creation||– Schedule, resources, and project definition have all been dictated by the customer or upper management and are not in balance
– Schedule is optimistic, “best case” (rather than realistic, “expected case”)
– Schedule omits necessary tasks
– Schedule was based on the use of specific team members, but those team members were not available
– Cannot build a project of the size specified in the time allocated
– “Product” of the project is larger than estimated (in lines of code, function points, or percentage of a previous project’s size)
– Effort is greater than estimated
– Re-estimation in response to schedule slips is overly optimistic or ignores project history
– Excessive schedule pressure reduces productivity
– Target date is moved up with no corresponding adjustments to the project scope or available resources
– A delay in one task causes cascading delays in dependent tasks
– Unfamiliar areas of the product take more time than expected to design and implement
|Organisation and management||– Project lacks an effective top-management sponsor
– Project languishes too long in fuzzy front end
– Layoffs and cutbacks reduce team’s capacity
– Management or marketing insists on technical decisions that lengthen the schedule
– Inefficient team structure reduces productivity
– Management review/decision cycle is slower than expected
– Budget cuts upset project plans
– Management makes decisions that reduce the project team’s motivation
– Nontechnical third-party tasks take longer than expected (budget approval, equipment purchase approval, legal reviews, security clearances etc)
– Planning is too poor to support the desired project speed
– Project plans are abandoned under pressure, resulting in chaotic, inefficient development
– Management places more emphasis on heroics than accurate status reporting, which undercuts its ability to detect and correct problems
|Development environment||– Facilities are not available on time
– Facilities are available but inadequate (e.g. no phones, networking wiring, furniture, office supplies etc)
– Facilities are crowded, noisy or disruptive
– Development tools are not in place by the desired time
– Development tools do not work as expected; developers need time to create workarounds or to switch to new tools
– Development tools are not chosen based on their technical merits and do not provide the planned productivity
– Learning curve for new development tool is longer or steeper than expected
|End-users||– End-user insists on new requirements
– End-user ultimately finds product to be unsatisfactory, requiring redesign and rework
– End-user doesn’t buy into the project and consequently doesn’t provide the needed support
– End-user input is not solicited, so product ultimately fails to meet user expectations and must be reworked
|Customer||– Customer insists on new requirements
– Customer review/decision cycle for plans, prototypes and specifications is slower than expected
– Customer will not participate in review cycles for plans, prototypes and specifications or is incapable of doing so, resulting in unstable requirements and time-consuming changes
– Customer communication time (e.g., time to answer requirements-clarification questions) is slower than expected
– Customer insists on technical decisions that lengthen the schedule
– Customer micro-manages the development process, resulting in slower progress than planned
– Customer-furnished components are a poor match for the product under development, resulting in extra design and integration work
– Customer-furnished components are poor quality, resulting in extra testing, design and integration work and in extra customer-relationship management
– Customer-mandated support tools and environments are incompatible, have poor performance or have inadequate functionality, resulting in reduced productivity
– Customer will not accept the software as delivered even though it meets all specifications
– Customer has expectations for development speed that developers cannot meet
|Contractors||– Contractor does not deliver components when promised
– Contractor delivers components of unacceptably low quality, and time must be added to improve quality
– Contractor does not buy into the project and consequently does not provide the level of performance needed
|Requirements||– Requirements have been baselined but continue to change
– Requirements are poorly defined and further definition expands the scope of the project
– Additional requirements are added
– Vaguely specified areas of the product are more time-consuming than expected
|Product||– Error-prone modules require more testing, design and implementation work than expected
– Unacceptably low quality requires more testing, design and implementation work to correct than expected
– Pushing the computer science state-of-the-art in one or more areas lengthens the schedule unpredictably
– Development of the wrong software functions requires redesign and implementation
– Development of the wrong software interface results in redesign and implementation
– Development of extra software functions that are not required (gold-plating) extends the schedule
– Meeting product’s size or speed constraints requires more time than expected, including time for redesign and reimplementation
– Strict requirements for compatibility with existing system require more testing, design and implementation than expected
– Requirements for interfacing with other systems, other complex systems, or other systems that are not under the team’s control result in unforeseen design, implementation and testing
– Requirement to operate under multiple operating systems takes longer to satisfy than expected
– Operation in an unfamiliar or unproved software environment causes unforeseen problems
– Operation in an unfamiliar or unproved hardware environment causes unforeseen problems
– Development of a kind of component that is brand new to the organization takes longer than expected
– Dependency on a technology that is still under development lengthens the schedule
|External environment||– Product depends on government regulations, which change unexpectedly
– Product depends on draft technical standards, which change unexpectedly
|Personnel||– Hiring takes longer than expected
– Task prerequisites (e.g. training, completion of other projects, acquisition of work permit) cannot be completed on time
– Poor relationships between developers and management slow decision making and follow through
– Team members do not buy into the project and consequently do not provide the level of performance needed
– Low motivation and morale reduce productivity
– Lack of needed specialisation increases defects and rework
– Personnel need extra time to learn unfamiliar software tools or environment
– Personnel need extra time to learn unfamiliar hardware environment
– Personnel need extra time to learn unfamiliar programming language
– Contract personnel leave before project is complete
– Permanent employees leave before project is complete
– New development personnel are added late in the project, and additional training and communications overhead reduces existing team members’ effectiveness
– Team members do not work together efficiently
– Conflicts between team members result in poor communication, poor designs, interface errors and extra rework
– Problem team members are not removed from the team, damaging overall team motivation
– The personnel most qualified to work on the project are not available for the project
– The personnel most qualified to work on the project are available for the project but are not used for political or other reasons
– Personnel with critical skills needed for the project cannot be found
– Key personnel are available only part time
– Not enough personnel are available for the project
– People’s assignments do not match their strengths
– Personnel work slower than expected
– Sabotage by project management results in inefficient scheduling and ineffective planning
– Sabotage by technical personnel results in lost work or poor quality and requires rework
|Design and implementation||– Overly simple design fails to address major issues and leads to redesign and reimplementation
– Overly complicated design requires unnecessary and unproductive implementation overhead
– Poor design leads to redesign and reimplementation
– Use of unfamiliar methodology results in extra training time and rework to fix first time misuses of the methodology
– Product is implemented in a low-level language (e.g. Assembler) and productivity is lower than expected
– Necessary functionality cannot be implemented using the selected code or class libraries; developers must switch to new libraries or custom-build the necessary functionality
– Code or class libraries have poor quality, causing extra testing, defect correction and rework
– Schedule savings from productivity enhancing tools are overestimated
– Components developed separately cannot be integrated easily, requiring redesign and rework
|Process||– Amount of paperwork results in slower progress than expected
– Inaccurate progress tracking results in not knowing the project is behind schedule until late in the project
– Upstream quality assurance activities are shortchanged, resulting in time-consuming rework downstream
– Inaccurate quality tracking results in not knowing about quality problems that affect the schedule until late in the project
– Too little formality (lack of adherence to software policies and standards) results in miscommunications, quality problems and rework
– Too much formality (bureaucratic adherence to software policies and standards) results in unnecessary, time-consuming overhead
– Management-level progress reporting takes more developer time than expected
– Half-hearted risk management fails to detect major project risks
– Software project risk management takes more time than expected
Estimating risk exposure
Estimating risk exposure is a rather subjective form of analysis but nonetheless must be performed in order to be able to understand the severity of risk factors to a project. However, even subjectivity can be made more accurate by the use of basic risk exposure analysis.
Risk exposure analysis comprises two estimates: the size of the potential loss (in time) from an identified risk factor and the probability that the loss will actually occur. A simple formula is standard in the industry for determining a risk-exposure calculation: (probability of loss) * (size of loss in weeks) = risk exposure factor, where the probability is expressed as a fraction (75% becomes 0.75).
For example, if we were to use any development project at one major financial organisation that the author worked at a number of years ago as a baseline, we would have to assume that there is upwards of a 75% probability of some impact on a project by delays from technical support. If we assume a conservative estimate of a loss in time of approximately 4 weeks, then the risk exposure would be 0.75 * 4 = 3 (weeks).
The result is a risk exposure factor of a possible loss of 3 weeks. Since we are only estimating a 75% probability of this occurring we are then not expecting to lose the complete 4 weeks in time.
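The calculation above is simple enough to sketch in a few lines of Python. This is only an illustration of the formula as described in the text; the function name `risk_exposure` and its input checks are assumptions made for the example, not part of any standard tool.

```python
def risk_exposure(probability: float, loss_weeks: float) -> float:
    """Return the risk exposure in weeks: probability of loss times size of loss.

    `probability` is a fraction between 0 and 1 (75% is written as 0.75).
    The name and the validation are illustrative assumptions.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if loss_weeks < 0:
        raise ValueError("loss_weeks must not be negative")
    return probability * loss_weeks


# The technical-support example from the text: 75% chance of a 4-week delay.
exposure = risk_exposure(0.75, 4)
print(f"Risk exposure: {exposure:.1f} weeks")  # prints "Risk exposure: 3.0 weeks"
```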
Estimating the size of a loss is much easier than estimating the probability of a loss. As a result, there are several possible activities that can be performed to determine the probability part of the equation more accurately.
- Have the person most familiar with the system as well as the development environment and political infrastructure estimate the probability of each risk and then convene a risk-estimate review.
- Use the Delphi approach where each project team-member estimates each risk-factor individually. Then convene risk-estimate reviews to discuss and determine the most likely probability of each risk-factor until the entire team is satisfied with a final analysis.
- Use simple betting analogies with personally significant amounts of money. For example, “If the facilities are ready on time you would win $125.00; if they are not you would lose $100.00.” The risk probability would then be the dollar amount on the downside divided by the total dollar amount at stake: $100.00 / ($100.00 + $125.00) = 0.44 (paraphrased from “Rapid Development”, Steve McConnell – 1996).
- Use adjective calibration where each team member would select a risk level in terms of a verbal scale of phrases from “highly likely” through “highly unlikely”. Then convert the verbal assessments to quantitative assessments (Boehm – 1989).
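Two of the techniques in the list above lend themselves to a short sketch: the betting analogy and adjective calibration. The Python below is a rough illustration only. The function names are hypothetical, and the verbal scale is an assumption: the text names just the endpoints “highly likely” and “highly unlikely”, so the intermediate phrases and every numeric value here are invented for the example and would need to be agreed by the team.

```python
from statistics import median

# Hypothetical verbal scale. Only the two endpoint phrases come from the text;
# the phrases in between and all numeric values are illustrative assumptions.
ADJECTIVE_SCALE = {
    "highly likely": 0.9,
    "likely": 0.7,
    "even chance": 0.5,
    "unlikely": 0.3,
    "highly unlikely": 0.1,
}


def probability_from_bet(win_amount: float, lose_amount: float) -> float:
    """Betting analogy: downside amount divided by the total amount at stake."""
    total = win_amount + lose_amount
    if win_amount < 0 or lose_amount < 0 or total == 0:
        raise ValueError("amounts must be non-negative and not both zero")
    return lose_amount / total


def calibrate(phrases: list[str]) -> dict[str, float]:
    """Convert each team member's verbal rating to a number and summarise.

    A wide spread suggests the estimate needs discussion in a review.
    """
    values = [ADJECTIVE_SCALE[phrase.lower()] for phrase in phrases]
    return {"median": median(values), "spread": max(values) - min(values)}


# The facilities example from the text: win $125 or lose $100.
print(round(probability_from_bet(125.00, 100.00), 2))  # prints 0.44

# Three team members rate the same risk.
print(calibrate(["likely", "highly likely", "unlikely"]))
```

The summary is meant only as a starting point for the risk-estimate review; it does not replace the discussion that the Delphi approach calls for.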
These methodologies may look rather unscientific. However, such estimation initially has no basis in fact so such a process would be only a “best estimate” at the time this is accomplished. Estimating the effort and time for a complete project life-cycle is theoretically impossible at initiation unless there is a good amount of metrics available from previous projects that allow a new project to be compared with the measurements of a similar project that has already been accomplished.
Once a list of risk factors has been determined along with their loss probabilities, loss sizes in weeks, and risk exposure (RE) values, a listing sorted by RE should be laid out for planning purposes. A sample of such a listing can be found below:
Example of a prioritised risk assessment table
Setting up the table in the above sorted manner produces a listing of risks that are ordered in terms of exposure, the greatest amount of exposure topping the list. If the top five risks in the above table were to be planned for successfully, 9.8 weeks of schedule overruns could be eliminated.
The table above also provides information on those risks that could be considered low priority as a result of their minimal chance of occurrence. By documenting such information in such a way then, time wasted on planning for low-exposure risks is also eliminated.
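For readers who want to build such a listing themselves, the following Python sketch sorts a set of risks by exposure and totals the exposure of the top entries. The probabilities and loss sizes are hypothetical figures chosen for illustration; they are not the values from the article's table, and the names `Risk`, `prioritise`, and `exposure_of_top` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float  # fraction between 0 and 1
    loss_weeks: float   # estimated size of loss in weeks

    @property
    def exposure(self) -> float:
        """Risk exposure in weeks: probability times size of loss."""
        return self.probability * self.loss_weeks


def prioritise(risks: list[Risk]) -> list[Risk]:
    """Return the risks ordered from greatest to least exposure."""
    return sorted(risks, key=lambda risk: risk.exposure, reverse=True)


def exposure_of_top(risks: list[Risk], count: int) -> float:
    """Total exposure, in weeks, of the `count` highest-exposure risks."""
    return sum(risk.exposure for risk in prioritise(risks)[:count])


# Hypothetical figures for illustration only; NOT the article's table.
risks = [
    Risk("Facilities are not available on time", 0.10, 2),
    Risk("Technical support delays", 0.75, 4),
    Risk("Contract personnel leave before project is complete", 0.25, 6),
    Risk("Additional requirements are added", 0.50, 8),
]

print(f"{'Risk':<55}{'Prob.':>7}{'Loss':>7}{'RE':>7}")
for risk in prioritise(risks):
    print(f"{risk.name:<55}{risk.probability:>7.0%}"
          f"{risk.loss_weeks:>7.1f}{risk.exposure:>7.1f}")

print(f"Exposure covered by the top two risks: {exposure_of_top(risks, 2):.1f} weeks")
```

Keeping the exposure as a derived property means the listing stays consistent whenever a probability or loss estimate is revised after a review.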
It should be noted that risk assessment as the chart above demonstrates is only a subjective estimation of the possibilities that can affect the outcome of a project. Because such an assessment is performed in a subjective manner the accuracy is completely dependent on the quality of the input given to it. The more effort on both the part of the project manager as well as the team given to this process, the more likely the resulting assessments will be within the assigned time ranges thus providing a clearer understanding as to how much effort could be saved by proper planning for such project delays. However, it cannot be stressed enough that no matter how well thought out and performed, project risk assessment will always remain a best estimate scenario.
The question that arises once such a process has been completed is how one goes about managing such risk factors. If we were to apply risk resolution to the top 10 risk factors listed earlier that could affect any software development project, then the common approaches and recommendations for controlling such problems can be found below.
Means of controlling the most common schedule risks
Feature creep
- Use customer oriented approaches
- Use incremental development practices
- Control the feature-set
- Design for change
Requirements or developer gold-plating
- Scrub requirements
- Timebox development
- Feature-set control
- Use of staged-delivery
- Use throw-away prototyping
- Design to schedule
Shortchanged quality
- Allow time for QA activities and use QA fundamentals
Overly optimistic schedules
- Use multiple estimation practices, multiple estimators, and automated estimation tools
- Use principled negotiation
- Design to schedule
- Use incremental development practices
Inadequate design
- Use an explicit design activity and schedule enough time for it
- Hold design inspections
Silver-bullet syndrome
- Be skeptical of product claims
- Set up a software tools group
Research oriented development
- Don’t try to do research and maximise development speed at the same time
- Use risk-oriented life-cycle
Weak personnel
- Staff project with top talent
- Recruit and schedule team members long before project begins
Contractor failure
- Check references
- Assess contractor’s ability before hiring
- Actively manage relationship with the contractor
Friction between developers and customers
- Use customer-oriented practices
This article first appeared on Tech Notes, Black Falcon Software’s technical articles for .NET Development.