Code Analysis Techniques
There are a number of code analysis tools available for .NET developers, including some stats that are built into the pricier SKUs of Visual Studio. Recently, I’ve been playing with a relatively new product (released earlier this year by Microsoft agile consulting shop NimblePros.com) called Nitriq. Nitriq is a bit like LINQPad for your code. If you’re not familiar with it, go download LINQPad now – it’s a great tool worth paying for. I’ll wait until you’re back…
Back? Great! I think we were discussing using Nitriq for code analysis, which by the way you can run for free on a single assembly. Before going any further, it’s worth mentioning something that many of us often forget or perhaps never learned, and that is that code is data. Some languages, like Lisp, make it very easy to manipulate code as if it were data (and vice versa), but even in static, managed languages like C# it is worth remembering that our code itself is data that we can query, analyze, and from which we can learn.
So before we go any further, what kinds of questions might we ask of our code? One of the best use cases for such code analysis is to find code smells, anti-patterns, and other things that would generally indicate a lack of quality. Static Cling is one-such anti-pattern I’ve described previously that it would be great if we could detect it.
Static Methods That Instantiate Objects
var results =
from m in Methods
let ConstructorCalls = m.Calls.Where(
callMethod => callMethod.IsConstructor).Count()
where m.IsStatic && !m.IsConstructor && ConstructorCalls > 0
select new { m.MethodId, m.Name, m.FullName, ConstructorCalls };
There’s nothing inherently wrong with static methods from a testability and code quality standpoint provided they are leaf nodes in your object graph. It’s when they start newing up objects that things tend to become tightly coupled and you start down the path toward the Big Ball of Mud architecture. So, being able to find static methods that instantiate objects would be a worthwhile query to run. It’s likely there are a few static methods that deserve to be made into instance methods on classes, and whatever objects they’re instantiating could probably be passed into the method or the new class’s constructor using dependency injection. (I’m a big fan of keeping things loosely coupled by explicitly declaring dependencies – if you’re interested I have a few posts on dependencies).
I know from experience that finding these kinds of static methods and cleaning them up is a very worthwhile exercise and will result in a better system. The time to clean it up is usually either when you have some extra time in an iteration, or when you’re already touching the code in question – I don’t generally advocate spending large amounts of time purely on cleanup. Customers like to see fixed bugs and new features, and delivering customer value needs to be the top priority. But being able to maintain the ability to deliver that value requires keeping our codebase tidy – so clean up these dust bunnies when you encounter them.
General Stats
The code I’m running Nitriq on at the moment has the following stats (see image at right). I’m analyzing 3 related assemblies in an ASP.NET web forms application. You can see that these 3 “core” assemblies being analyzed have a total of 81 namespaces, 1067 types, 8876 methods, etc. The total physical line count is about 64k. These assemblies call other assemblies that aren’t included in the analysis, including third party DLLs and of course the .NET framework itself. All told, that includes another 40 assemblies and another 2608 methods that aren’t directly included in the analysis, but are used by my assemblies.
Running the query above to seek out static methods that instantiate objects, I get 228 such methods. This works out to about 2.5% of the total codebase, which is pretty small, but in looking at the actual methods returned I immediately recognize several that I’ve struggled to test or refactor in the past, so I know I can make it better.
Trouble Methods
A number of computer science papers have linked cyclomatic complexity and line count of methods and files to the number of bugs occurring in these methods or files. More recently, some papers have attempted to link design flaws with defect incidence, with some success[1]. There Naturally if the likelihood of bugs per lines of code is constant, one would expect there to be more of them in longer files than in shorter ones, but in fact the bug rate per line of code increases with the length of the file[1]. Unfortunately, accurate prediction of software defects is a difficult problem, as this 11-year-old IEEE Critique of Software Defect Prediction Models indicates[2]. But until we have better tools, it remains useful for us to attempt to keep our code neat, clean, and as simple as possible (but no simpler).
One of the design flaws noted in D’Ambros’ paper is Dispersed Coupling, which relates to the number of other types used by a particular class (or method). We can pull out this information, along with other useful data points like LOC, cyclomatic complexity, and parameter count using the following LINQ query against our code:
Methods to Refactor
var results =
from method in Methods
where (method.Cyclomatic > 25 ||
method.PhysicalLineCount > 200 ||
method.TypesUsed.Count > 30 ||
method.ParameterCount > 7)
&& method.Type.IsInCoreAssembly
select new { method.MethodId, method.Name,
method.Cyclomatic, method.PhysicalLineCount,
OutTypes = method.TypesUsed.Count,
method.ParameterCount };
The intuition underlying the Henderson-Sellers method of calculating Lack of Cohesion of Methods (LCOM) is that in a cohesive class C, many methods access the same fields of C. Formally, let
- M = set of methods in class
- F = set of fields in class
- r(f) = number of methods that access field f
- ar = mean of r(f) over f in F
We then define LCOM of the class under consideration to be
LCOM = (ar - |M|) / (1 - |M|)
var results =
from type in Types
let methodCount = type.Methods.Count
let instanceFields = type.Fields.Where(f => !f.IsStatic)
let fieldAccesses = instanceFields.Select(f => f.GotByMethods.Union(f.SetByMethods).Distinct()
.Where(m => m.Type == type).Count())
let accessAverage = fieldAccesses.Count() == 0 ? 0 : fieldAccesses.Average().Round(2)
let lcomHS = ((accessAverage - methodCount) / (1 - methodCount)).Round(2)
where lcomHS > .9 && instanceFields.Count() > 0 && type.IsInCoreAssembly
orderby lcomHS descending
select new { type.TypeId, type.Name, lcomHS, methodCount, fieldCount = instanceFields.Count(), accessAverage, type.FullName };
&& !type.FullName.Contains("Web")
Piece of Chalk $1.00
Knowing What To Mark With It $49,999.00
References
[1] On the Impact of Design Flaws on Software Defects, D’Ambros, Bacchelli, Lanza
[2] A Critique of Software Defect Prediction Modules, Fenton