Static Code Analysis

In a recent post James Robertson claims that static typing doesn't help with refactoring. The original claim:

What's the real reason I get ClassCastExceptions? Because I refactor a lot. After a refactoring the wrapper class to use B instead of A you might be putting a type B into the collection and incorrectly casting it to the old A in a wrapper class method

James Robertson replies:

You don't have this problem in Smalltalk because it's not the sort of problem you tend to get yourself into, period. Say I had a collection holding Foos. If I refactor, and I end up having a collection holding Bars (completely incompatible), then I have bigger problems. Even so, I would have had to refactor all the surrounding code that accesses the collection elements - and if I didn't have tests under those circumstances, I'm in trouble whether I have static typing or dynamic typing. To be brutal, if you trust the compiler to solve this for you, then you shouldn't be writing code.

Well, that's a little harsh. While Ryan Lowe's specific examples don't really apply to a dynamically typed language, the idea is valid. Refactoring tools are easier to write and more reliable with static typing. A tool can analyze statically typed code and say with some confidence exactly where a class or method is used -- in Smalltalk, Python, or other dynamically typed languages, refactoring is just a string match. Good naming practices can make that string search more reliable, but it's still just strings, not a fully type-annotated source.

This is one of those unfortunate places where you can't have it both ways. Dynamic typing and late binding is resistant to static analysis, and static analysis can be used for good things. (And Python is actually more resistant than most dynamic languages.)

It would be interesting in these dynamic languages if you could assert a type (or interface) not just at runtime, but in a way that can be statically determined. I don't know if programmers would bother to put in those assertions reliably enough to make it useful, or if the assertions would ultimately become fragile and unwieldy. Maybe a more appropriate goal would be to watch the code as it runs (particularly unit tests) and view types there; then while you wouldn't be absolutely guarenteed that you have correct type information, you'd usually have good information for analysis. We do this for JIT and other optimizations, but I haven't seen it used a great deal as a programming tool.

Still, while static typing isn't useless, it is rather limited. You can say "float", but you can't say "float between 0 and 1, inclusive". Contracts are a more general idea, with much more potential for error detection. Static analysis of contracts would be very interesting indeed.

Created 10 Aug '04
Modified 14 Dec '04

Comments:

"in Smalltalk, Python, or other dynamically typed languages, refactoring is just a string match"

No.

The Rewrite Tool in the Smalltalk Refactoring Browser is a parse-tree matcher - this is *not* textual search and replace.

http://www.whysmalltalk.com/Smalltalk_Solutions/ss2003/pdf/roberts.pdf

# Isaac Gouy

'You can say "float", but you can't say "float between 0 and 1, inclusive"'

We *can* define Float subranges in Ada

http://goanna.cs.rmit.edu.au/~dale/ada/aln/4_basic_types.html
# Isaac Gouy

Yes, yes, I know no real refactoring tool works on the source as text. Even Real Programmers don't call sed a refactoring tool ;) But it is working on the source in a relatively isolated manner, where only the name is available, and that name is compared as a string. Whereas in Java the name can be fully identified as a particular method of a particular class.

Re: Ada; the problem with this is that the limited operators available don't work well for complicated systems. You can have number ranges. What about only odd numbers? What about invariants? What about objects that can have many valid states, but must be in a particular state when passed to a specific function? That's more restrictive than an interface, but a very common restriction. Realistic requirements are complex enough to require a full language, not a limited type definition language.
# Ian Bicking

Yes, yes, I know...
We have to make do with what you write - what you may or may not know is a mystery.

But it is working on the source...
Again, no; the refactoring tool is working on ASTs.

Ada; the problem with this...
Before we move the goal-posts, are we agreed that for the last 25 years, in a mainstream language, we've been able to say "float between 0 and 1, inclusive"?

# Isaac Gouy

The AST (Abstract Syntax Tree) is just a data structure describing the source. It may lose a bit of information (e.g., whitespace), or it may not. It's not magic, it carries no extra information. It's very handy when you need to programmatically read the source, but it doesn't endow any special powers or information that original text didn't already have. But that's the same as what Java is doing, too, it's just that the Java source contains some pertinant information that Python or Smalltalk does not contain. So "text" vs. "whatever it is Java uses to represent source" is probably a bad way of describing the difference. It's really a question of what kind of reasoning you can do given the source.

As far as Ada, sure. But then, it was an example of a statically defined contract -- while Ada satisfies the example, that's not a case of contracts. I don't know much about Ada, so maybe it does provide all this contractural information, but whatever it is, it's more complex than can be expressed in terms of types.
# Ian Bicking

One of the problems of static type checking in Python would be the plasticity of runtime introspection. Take, for example, one of my favorite ways to generically wrap methods, used in the example of a constructor:

def __init__(self, *args, **kargs): superclass.__init__(self, *args, **kargs)

There are countless places where this is desirable, such as dynamic xmlrpc servers or object brokers, or meta programming. I would say that overall, the plasticity of Python will forever prevent any kind of meaningful type checking.

An alternative I would prefer to subclass based type checking would be interfaces. Checking to make sure an object passed implements an "iterable" interface instead of checking to see if it's a list or it has whatever methods you use.

# Python and Static Typing Problems

Typechecking and Typechecking are different things. You have a full spectrum of different ideas when you look at type systems. You can have fully manifest typing (as Python or Smalltalk have) and still can have usefull type informations - by using type inference. So actually the ASTs a refactoring tool can be enriched by information that goes beyound the source. Type inference isn't thant new - several Lisp implementations use it to optimize their code. Of course you could it to make better refactoring tools, too.

Of course it is more problematic to have enriched information for refactoring in fully dynamic languages with strong focus on manifest typing. But it's still up for discussion wether that really is needed. By gut feeling I would go with James Robertsons assertion that static typing in the long run doesn't help. But then, I do Smalltalk since somewhere in the 80s and Lisp since the end of the 80s, so I am a bit biased.

# Georg Bauer

IMO people are mixing apples and orange. the fact that Java's type ssytem sux does not imply that all static typing systems should be like that. Nice is a nice example of a type system where you add very little informations to those that can be inferred automagically.

Also, I won't bet that you can't define an Even or Odd class in haskell, it should be something like:
class Even n where check_even :: n

I think we must face the simple law that actual "dynalanguages" like python or ruby are just a mix of
1 simplicity
2 expressivity
3 large simple+expressive libraries

and that static typed languages that provide #1 and #2 massively lack #3.
# verbat

oh, btw haskell's type system is turing complete, people say :)
# verbat

So moving the goal posts: "What about only odd numbers?"

See todays LtU posting. Although you should probably dig into dependent types for a more satisfactory answer. It's all a mystery to me ;-)
# Isaac Gouy

static typed languages that provide #1 and #2 massively lack #3
Does Java satisfy #3?
If so then presumably languages (Nice, Scala) that use standard Java libraries also satisfy #3?
# Isaac Gouy

String matching would not distinguish a local variable or instance variable named "X" from a method named "X" from a class name "X".

My understanding of the Refactoring tools (Smalltalk and otherwise) is that they are operating on the same level of information the compiler operates on: it does distinguish between a class named X, a method named X, a local variable named X, etc.

There is a danger when strings (perhaps read from an external file) are used to look up classes, methods, or whatever... the exact same danger exists in Java. On the other hand, C++ doesn't have any way to go from a string to a class or method, so it's "safe", though the syntax of C++ is so hairy, and the lack of reflection such a big problem, that there are no refactoring tools for C++ yet. [I hear one is in development, but that's it.]

# keith ray

isaac: I don't think so. The java library is an overengineered beast and I won't say that it is simple and expressive. This is a side effect of it being a library for an "enterprise" language like java.
The scripting attitude is to ignore details and have something working even if it's not 'perfect', while the java lib pushes the "do it the exact way, then bother about pleasure of use" mood.

Anyway I have to specify that #1 and #2 are provided from many static languages but a static inference able to encompass the expressivity of python/ruby/whatever is yet to come. I bet it is possible, just not yet done.
i.e I'm not sure but IIRC in nice you still had to specify an ad hoc interface explicitly, or can this be inferred automagically?
I'm sorry I don't know Scala enough to comment on it. I was just confused from seing it has traits+mixins :)

# verbat

Perhaps, you are looking for a interface checker like this one:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/204349
# ch3m4

Ian Bicking: the old part of his blog

Static Code Analysis

Comments: