Friday, August 10, 2007

Line Of Code Counter

Someone was asking for a way to count lines of code, so I had a look in my toolbox and found the following. By providing this I do not wish to support such code metrics in any but the most general sense.

For what can be learned from counting lines of code? Should we assume that the programmer who produces more bulk is more productive? Or that the language that produces less is more efficient? I wouldn't want to (re)start any of those flame wars!

This function uses a variant on my directory tree walker, as published in the Python Cookbook. It's a straightforward but nonetheless helpful function that wraps os.walk with some useful extra functionality.

The LOC function itself has some basic logic to skip comment lines. It treats a line that begins with triple double quotes as the start or end of a docstring, so it can get confused by multiline strings that use the same quotes. To prevent that, do not start a line of such a string with the quotes.

import os
import fnmatch

def Walk(root='.', recurse=True, pattern='*'):
    """
    Generator for walking a directory tree.
    Starts at specified root folder, returning files
    that match our pattern. Optionally will also
    recurse through sub-folders.
    """
    for path, subdirs, files in os.walk(root):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                yield os.path.join(path, name)
        # os.walk yields the root folder first, so stop after it
        if not recurse:
            break

def LOC(root='.', recurse=True):
    """
    Counts lines of code in two ways:
        maximal size (source LOC) with blank lines and comments
        minimal size (logical LOC) stripping same

    Sums all Python files in the specified folder.
    By default recurses through subfolders.
    """
    count_mini, count_maxi = 0, 0
    for fspec in Walk(root, recurse, '*.py'):
        skip = False  # True while inside a triple-quoted block
        with open(fspec) as source:
            for line in source:
                count_maxi += 1

                line = line.strip()
                if line:
                    if line.startswith('#'):
                        continue
                    if line.startswith('"""'):
                        # A one-line docstring opens and closes on the
                        # same line, so it must not toggle the flag.
                        if not (len(line) >= 6 and line.endswith('"""')):
                            skip = not skip
                        continue
                    if not skip:
                        count_mini += 1

    return count_mini, count_maxi


And here's how you use it on the current directory:

print('%d : %d' % LOC('.', False))
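
To see the triple-quote caveat in action, here is a minimal sketch that applies the same basic skipping rules to a string instead of to files. The helper `count_lines` and the three sample strings are made up for illustration and are not part of the recipe.

```python
def count_lines(text):
    """Return (logical, source) line counts for a string of Python source."""
    logical, source = 0, 0
    in_block = False  # True after a line that starts with triple quotes
    for line in text.splitlines():
        source += 1
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        if line.startswith('"""'):
            in_block = not in_block
            continue
        if not in_block:
            logical += 1
    return logical, source

# A comment, a blank line and a triple-quoted block are all stripped.
sample = '''\
# a comment
x = 1

"""
a block that is skipped
"""
y = 2
'''

# The string opens at the start of a line and closes mid-line,
# so the flag is switched on and never switched off again.
tricky = '''\
text = (
"""first line
second line""")
z = 3
'''

# Same idea, but the opening quotes do not start the line.
safe = '''\
text = """first line
second line"""
z = 3
'''

print(count_lines(sample))  # (2, 7)
print(count_lines(tricky))  # (1, 4)
print(count_lines(safe))    # (3, 3)
```

In the `tricky` sample the line `z = 3` goes uncounted because the skip flag is still set; keeping the opening quotes away from the start of the line, as in `safe`, avoids this.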


I know there are common cross-language tools to do this, but whipping up a pure-Python implementation was quick and saves me installing YAT (Yet Another Tool). Also, this code is freely available under the Python license. It has also been published as recipe 527746.

2 comments:

Zach said...

This is also available on a Unix system as the following, without any nonstandard tools:

find . -iname '*.py' | xargs grep -Ev '^[[:space:]]*(#|$)' | wc -l

In order, this:
- Finds all PY files
- Uses grep to remove all lines:
  - Whose first non-space character is #
  - That are empty
- Counts the number of remaining lines

All of the above tools are very useful in many situations (find, xargs, and grep).

robin said...

Thanks Zach -- right you are! And one could also use a grep function in Python, but I am allergic to regular expressions. I think my time with Perl burnt out those particular brain cells.
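
For anyone curious what that regular-expression approach might look like in Python, here is a rough sketch of the find, grep and wc pipeline above. It assumes Python 3; `count_code_lines` and `SKIP` are names made up for illustration, and unlike `*py` in the shell command it matches only names ending in `.py`. It also drops whitespace-only lines, which is slightly broader than "empty".

```python
import os
import re
import fnmatch

# Matches lines that are blank or whose first non-space character is '#'.
SKIP = re.compile(r'^\s*(#|$)')

def count_code_lines(root='.'):
    """Count lines in *.py files under root that are neither blank nor comments."""
    total = 0
    for path, subdirs, files in os.walk(root):
        for name in files:
            # Lower-casing the name mimics the case-insensitive -iname test.
            if fnmatch.fnmatch(name.lower(), '*.py'):
                fspec = os.path.join(path, name)
                with open(fspec, errors='replace') as source:
                    total += sum(1 for line in source if not SKIP.match(line))
    return total

if __name__ == '__main__':
    print(count_code_lines('.'))
```

Like the shell version, this makes no attempt to skip docstrings, so its total will usually be higher than the logical count from LOC.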
