Software Quality: 2008

Importance of Peer Review

The Software Maintenance blog entry on Peer Review outlines an example of bad code being eliminated during code inspection. This team has an advantage over average software development teams. They regularly conduct code reviews. It seems to happen for every bug they fix.

Sometimes it is not enough for software to perform correctly. You also want the implementation to be clean. Most software spends many years being maintained. It is not success if you hack together something that will cause a disaster later.

The goal of code/peer review is to improve the quality of the code. This may involve detecting subtle bugs. It may also mean that the code itself is improved. Once you get to that level, you have a better chance of producing that high quality code.

Build Issues

We are delivering the first drop of next year’s new functionality to internal test next week. So a configuration management guy was testing out our build scripts. Nothing seemed to be working. At first he did not have his database connection set up correctly (the build requires a connection to the database). Then there were some problems with the setup of Visual Studio directories. Next there were some bugs with the Visual Studio projects created by development. Finally the configuration management guy declared success.

So I finished up my work. The CM guy asked the whole team to verify his builds. I ran the install he produced for one of the applications I worked on. Nothing happened. Apparently the build did not work after all. He just ran the build script, looked in the log file for errors, and decided it must have been successful. Nice try, guy. It is really not this dude’s fault. The development team writes the build scripts. Where were the build script people? They had either left for the day, or not even come in. That was a bit disturbing.
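A clean log file is not proof of a working build. As a rough sketch of what a slightly stronger check could look like, the Python below also confirms that the files the build is supposed to produce actually exist and are not empty. The function names, the error markers, and the artifact list are all hypothetical. Nothing here reflects our real build scripts.

```python
from pathlib import Path


def log_has_errors(log_text, markers=("error", "failed")):
    """Return True if any (assumed) error marker appears in the build log."""
    lowered = log_text.lower()
    return any(marker in lowered for marker in markers)


def find_bad_artifacts(expected_artifacts):
    """Return (path, reason) pairs for artifacts that are missing or empty."""
    problems = []
    for artifact in expected_artifacts:
        path = Path(artifact)
        if not path.is_file():
            problems.append((str(path), "missing"))
        elif path.stat().st_size == 0:
            problems.append((str(path), "empty"))
    return problems


def build_looks_good(log_text, expected_artifacts):
    """A build passes only if the log is clean AND every artifact is present."""
    return not log_has_errors(log_text) and not find_bad_artifacts(expected_artifacts)
```

Even this only shows that the build produced something. Somebody still has to run the install and smoke test the application.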

Another developer and I, both of whom have been around a while, wondered how they would do the actual builds to send to test. I asked the team lead and the software development manager. They said another CM guy would build off our development branch. I had thought the process was for CM to build off the CM branch in source control. Apparently the CM guy thought this was the case as well. I recommended we not try to build off the CM branch next week. There were enough problems with just getting the build working off the development branch.

Once we got an application or two to build, the senior developer and I tried to smoke test the applications. We found a lot of problems, and we are both starting to fix them. I told the software manager that things seemed sketchy overall here. He told me I could go work with the CM guy when they do the actual builds next week. I guess my reward for worrying about how things would work is that I get to go make sure things work. What a crock.

I am getting the feeling that we are going to be in for a tough transition to test. Where is the quality in this system? Is the fix to implement some better processes? Or do we need to get some people in charge who have a clue?

Lines of Code

Our configuration management team sent me an unusual request. They had identified all the files in our source code control system. They wanted to know which file types can be counted to arrive at a source line of code count. The idea was to tally up the numbers and give a report to our client. Now this task alone was not difficult. Any developer worth his salt knows which files are source files and which are not. Actually the CM guys should be able to figure this out. Does anybody really think a bitmap (bmp) file contains any source code?

The real concern was why somebody wanted to count the lines of code. Such a metric is not evil in and of itself. But if you do not know what you are doing, you can become dangerous having such a metric at your disposal. Lines of code is a relatively unambiguous metric to collect. Interpreting it is the hard part. For example, you might refactor your code and arrive at a lower count. Somebody looking solely at the count might think that you have then made negative progress toward some development goal. Or you might see that there are 20k SLOC which took 20 months to develop, and assume that a new 1k SLOC change will then take just 1 month to complete. You get the picture.
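To make the two misreadings above concrete, here is a small Python sketch of the arithmetic somebody might do with nothing but the counts. The function names are made up for illustration, and the 18,500 figure is an invented example of a post-refactoring count. The point is that the math is trivially easy and the conclusions are still wrong.

```python
def naive_months_estimate(history_sloc, history_months, change_sloc):
    """Linear extrapolation: assumes every line costs the same effort."""
    sloc_per_month = history_sloc / history_months
    return change_sloc / sloc_per_month


def sloc_delta(count_before, count_after):
    """Change in line count, which a naive reader treats as 'progress'."""
    return count_after - count_before


# 20k SLOC in 20 months, so a 1k SLOC change "must" take 1 month.
print(naive_months_estimate(20_000, 20, 1_000))  # 1.0

# A refactoring that removes 1,500 lines shows up as negative "progress".
print(sloc_delta(20_000, 18_500))  # -1500
```

Neither number says anything about how hard the change is or whether the refactored code is better. That part still takes a developer who knows the system.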

I do not like to second guess our CM team. They are pretty sharp technical guys. If they have a task to count up the lines of code, then I can support them. I rely on them often to help me out with Rational ClearCase. So I answered their questions regarding source files that should be counted. In the end they came up with a count of 215,000 lines of code in our two largest applications. This seemed to be the right order of magnitude. A couple of years ago I did a line of code count using the UNIX wc command. I came up with 270,000 lines of code. My count may have included the source code for some third party tools we were using. So my count may have been artificially high.
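For what it is worth, a count like this can be sketched in a few lines of Python. The extension list and the third party directory name below are assumptions for illustration only. They are not the actual list I gave the CM team. The sketch counts physical lines, much like wc -l, and skips anything under an excluded directory so third party code does not inflate the total.

```python
from pathlib import Path

# Hypothetical choices: adjust to whatever your project treats as source.
SOURCE_EXTENSIONS = {".c", ".cpp", ".h", ".sql"}
EXCLUDED_DIRS = {"third_party"}


def count_source_lines(root):
    """Return a dict mapping file extension to total physical line count."""
    root = Path(root)
    totals = {}
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        extension = path.suffix.lower()
        if extension not in SOURCE_EXTENSIONS:
            continue  # e.g. a .bmp file is never counted
        if EXCLUDED_DIRS.intersection(path.relative_to(root).parts):
            continue  # skip third party code
        with path.open("rb") as handle:
            line_count = sum(1 for _ in handle)
        totals[extension] = totals.get(extension, 0) + line_count
    return totals
```

Summing the values of the returned dict gives the overall count. Blank lines and comments are included, so this is a raw line count and not a count of statements.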

CMMI

Recently my company announced that it was not seeking to achieve Capability Maturity Model Integration (CMMI) level 4. Instead we have decided to remain at CMMI level 3. This decision was made at the company level. It does not mean that certain projects may not individually aspire to the higher levels. The reasoning behind the decision was the result of a detailed cost benefit analysis. I guess the heavy cost to get to the next level was not expected to be offset by any gains it would produce. My company is very methodical about things like this. They are not afraid to spend money. But it is only done when there is a definite return on investment perceived by the decision makers.

Personally I was happy with this decision. As a developer, I am not really interested in ultra high CMMI certification. Getting to level 4 would have most likely meant that I would have to collect a lot more metrics on a recurring basis. That does not sound like fun. At a deeper level, I doubt it would yield any positive results either. Developers can game the system once they identify how the metrics are being collected. This can only lead to improper results.

It is not that I am against process per se. Process is good if it is applied properly and in moderation. I was surprised to find out that our company had been awarded a level 3 certification anyway. The contract for our current project was recently won. But I do not sense that we are performing at CMMI level 3 maturity. There are a number of important processes which are not documented for development. Luckily for me I have been on the project for a long time. But the new guys often have trouble figuring out what to do. I wish there was a document I could point them to in order to explain how things are done.

I suspect our project will follow the company lead in remaining at CMMI level 3. Our job should be to correctly document the current processes in place. That would be a good start to improving the process around here. We have some good developers on the team. I am hoping that this will result in a positive development experience. I have been through enough failures to know that nobody likes the pain of a botched project. Our team has ambitious goals for the next year of development on this project. Extra work to collect metrics for a CMMI level 4 certification might put those goals in jeopardy. I will plan to report back in 6 months with how we did.

Duplicate the Problem

Our customer submitted a trouble ticket on one of the applications in our software suite. I immediately called the sys admin at the site where the problem was happening. I was given the set of steps that had been followed and that resulted in the problem. So I used these steps to trace through the code in a development environment. I spent a couple days doing this. My conclusion was that there was no way for the specified problem to occur. It was just not possible.

Then I got a call from the sys admin. When the sys admin stood over watching the users follow the documented procedures for using the app, the problem never happened. That was the key that I needed to make some progress. I started thinking about other ways the users are not supposed to use the application. Then I was able to recreate the problem in development. The fix for this problem was not far behind.

So I got a build ready to be released to our customers with the fix. And I was out sick the day our test team was supposed to test and release the software. But I left another developer responsible for this. When I got back, I was relieved that the software shipped on time. But I got a visit from a tester and from the project coordinator. Apparently the test team was unable to duplicate the problem. But they shipped the release out anyway. The tester said he followed all the steps from my unit test plan. However he could not make the problem happen with the old version of the code.

Now this scenario was wrong on multiple levels. First of all, a tester should not rely on a developer's unit test plan. Otherwise they are just repeating what a developer did and adding little to no value. Second it was troubling that they could not duplicate the problem. If you cannot make a problem happen, then you certainly cannot tell whether the problem is still happening with a new build. Third, the software should not have been shipped out if we were unsure whether it was right. There are a lot of things wrong in this world. But if your job is to test software fixes, you should at least make sure you know what you are doing. Otherwise you not only do not add value, you waste time in the process.

These are the last days on this particular project since another company won the contract away from us. However I at least went over to the tester, and helped him duplicate the problem. Then he was able to have a set of steps to execute against the new build with the fix. This still is not optimal, since I brought all my biases into the test situation by telling the tester what to do. However it was better than nothing. I hope we have a more thorough independent test team on my next project.

Independent Testing

We receive a lot of trouble tickets written against our application suite. When we develop a fix, we write and execute unit test procedures. This step alone forces the developer to do more unit testing than normal. It also helps that we have a peer review process which reviews the unit test documentation. However you should also have independent testing.

Luckily our project has a whole team dedicated to doing internal testing. This team reports to a different management than the developers, so there is no conflict of interest. The team is tasked with conducting independent tests of any software changes we perform. Since this is an internal team, they do their work before any software changes are released to the customer. They hold veto authority to hold back any software delivery due to problems found in test.

This independent testing creates the most value when it is truly independent. That is, the testers read the trouble tickets and independently duplicate the problems documented. Then they can take our fix and make sure it corrects the problem and does not break anything else. When the testers have to rely on developers to explain what changes they made, and how they unit tested the code, the testing sometimes degrades to a repeat of a unit test. Normally a repeated unit test is a waste of resources.

They say that the earlier you detect a bug, the cheaper it is to fix. When our independent testing team discovers a bug before we ship software, all kinds of problems are averted. We as a team look better. And in the end we have to deal with fewer reworked problems. The overall trouble ticket count opened by the customer decreases too. In this age of cost cutting, our customer has been asking tough questions about where we could trim costs in the project. One of the areas they question is our independent testing team. The only result of eliminating this team, in my opinion, would be a lot of pain. Software development is tough enough without extra grief. Here's to your impact on the bottom line, test team.

Quality Job 1

I work in the hectic world of software maintenance. When there is a high priority problem, there is pressure to cut corners to quickly ship out a fix. This works great when there are no problems. But we get in trouble when this leads to even more problems. And you know that is bound to happen.

The same goes for new features we implement in our legacy system. It seems no matter how many new managers we get, there is still always a rush to implement new features and deliver. Then things get tough and developers take short cuts to ship on time. The result is some seriously buggy code. More often it is code that does not meet customer expectations.

It is time to pay more than lip service to software quality. You need to make a large initial investment and take the hard line on this. In the end you will benefit. At least that is my premise. The question is how do you implement quality improvement when there is the insane desire to ship software fast? This is the question I want to answer in the upcoming posts of this blog.
