Monday, April 10, 2006

Tabs Versus Spaces

There are several famous holy wars in the history of computing. There's emacs versus vi, Mac versus PC, and tabs versus spaces. These are commonly referred to as "religious issues", meaning that they are a matter of belief only. Now, the first I couldn't care less about, since I use neither text editor, and preferences for keyboard mappings seem a relic of the days when dinosaurs walked the punch-card-strewn floors of computing. Mac versus PC is an easy one too: use a Mac if you like spending more money for other people to make up your mind for you; otherwise, use a PC. (Actually, Macs are becoming PCs[1], so what's the difference?)

But on the tab versus spaces issue I must take a stand and make a few declarations. First, the common wisdom, as espoused most emblematically by Jamie Zawinski, is dead wrong. And second, it is not a religious issue at all, because there is a correct answer.

This relates to Python in a very direct way. Python is unique among programming languages in mandating indentation as part of the language syntax: every code block must be indented.
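For anyone who hasn't written Python, here is a small sketch of what indentation-as-syntax looks like. The function `classify` is a made-up example. The second half relies on the fact that Python 3 rejects a block whose indentation mixes tabs and spaces in a way that depends on the tab width, raising `TabError`.

```python
# Each block's extent is defined by its indentation alone:
# no braces, no "end" keywords.
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"


# One line indented with a tab, the next with eight spaces. Whether they
# "line up" depends on the tab width, so Python 3 refuses to guess and
# raises TabError (a subclass of SyntaxError).
mixed_source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(mixed_source, "<mixed>", "exec")
    mixed_accepted = True
except TabError as err:
    mixed_accepted = False
    print("rejected:", err)

print(classify(-3), classify(0), classify(5))
```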

Some slack-brains take one look at this and run away screaming "Whitespace as syntax! Whitespace as syntax! Garrrrrrr!" But anyone who actually uses Python for longer than 0.3 seconds realises that this rule simply enforces what any good programmer already practices. Correct indentation then serves as a guide to the compiler, so curly braces and other excess syntactic cruft are not needed. It's a neat, simple solution that results in more readable code: a real win-win scenario, as managers say on television. (And possibly elsewhere. I wouldn't know; I avoid managers[2].)

But back to the issue at hand. First, why is tab versus spaces not a trivial issue? I suppose that for any given programmer it is, but if you work on a team you must come to some sort of agreement or be constantly converting back and forth. This is possibly time-consuming, possibly error-prone, and possibly not even that easy depending on your tools. So it's best avoided by agreeing on a standard.

Before going any further, we should acknowledge and read the most important article on the subject: jwz's "Tabs versus Spaces: An Eternal Holy War."

Jamie breaks the issue down into three parts. He notes that hitting the tab key does different things on different systems, but as you can configure this behaviour it's not the main problem at hand. Likewise, what an editor does when it encounters a tab character in a file is configurable: you can have it display the tab however you wish.

The third part of the issue (actually it's his #1) must then be the real problem: people care about how many columns a tab/indent represents. He calls this a "religious war" because he has misidentified the problem. His solution is to have tabs expand to spaces before writing a file to disk, so tabs never exist in interchanged files.
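For concreteness, the expand-before-save policy can be sketched in a few lines of Python. This is my own illustration, not jwz's setup: `save_untabified` is a hypothetical save hook, and the 4-column tab width is an arbitrary assumption. `str.expandtabs` is column-aware, so each tab becomes however many spaces reach the next tab stop.

```python
import tempfile
from pathlib import Path


def save_untabified(path, text, tab_width=4):
    """Hypothetical save hook: expand every tab to spaces before writing,
    so the file on disk never contains a tab character."""
    untabified = text.expandtabs(tab_width)
    Path(path).write_text(untabified)
    return untabified


source = "def f():\n\treturn 1\n"

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "f.py"
    save_untabified(target, source)
    on_disk = target.read_text()

print(repr(on_disk))  # 'def f():\n    return 1\n'
```

Once the file is written, nothing in it records that those four spaces were ever a tab.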

This assumes one will never have to open a file containing spaces and intuit what tabs are supposed to be there, or how the spaces should be interpreted. Perhaps jwz never has bugs or has to re-edit his code. Or perhaps he's used to write-only languages like C++ and Perl, where tabs are the least of your problems because you are already committed to spending inordinate amounts of your precious life wading through code trying to find the crucial lines that actually do something.

Then he spends the second half of the rant talking about specific settings in arcane software.

Where did jwz go wrong?

First, his three points are not separable, at least not in the way he would like. Discussing the tab key, he says "this is an editor user interface issue" in order to dismiss the problem. Editors treat the tab key here as "indent to a column position", there as "add so many spaces", and in a third case as a single character of value "ASCII 9". So, according to jwz, it's not an important part of the problem.
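To make the three editor behaviours concrete, here is a sketch of each as a Python function, in the order listed above. The function names are hypothetical and the 4-column width is an assumption; for simplicity the cursor column is treated as a plain character index, which only holds when the line contains no earlier tabs.

```python
TAB = "\t"  # the single character ASCII 9


def indent_to_next_stop(line, col, width=4):
    """'Indent to a column position': pad with spaces up to the next
    multiple of `width`."""
    pad = width - (col % width)
    return line[:col] + " " * pad + line[col:]


def insert_fixed_spaces(line, col, count=4):
    """'Add so many spaces': always insert `count` spaces."""
    return line[:col] + " " * count + line[col:]


def insert_tab_character(line, col):
    """'ASCII 9': insert one tab character at the cursor."""
    return line[:col] + TAB + line[col:]


line, col = "ab", 2
print(repr(indent_to_next_stop(line, col)))   # 'ab  '
print(repr(insert_fixed_spaces(line, col)))   # 'ab    '
print(repr(insert_tab_character(line, col)))  # 'ab\t'
```

Only the last of these leaves anything in the file that a later reader can recognise as a tab; the other two store spaces.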

But how is this different from the point he says is the most important, that when reading and writing code, people "care about how many screen columns by which the code tends to indent when a new scope (or sexpr, or whatever) opens"? (Strangulated syntax in original.)

Answer: it is not different. There are not three points; there are two.

A tab has both syntax, a representation, and semantics, a meaning.

With this plain and simple approach it is dead obvious that using spaces to represent tabs at the level of encoding is inherently wrong because we are throwing away meaning. A tab no longer has its own representation but is instead subsumed into how we represent spaces.

An example illustrates the problem. If you save all tabs as two spaces and the next person who opens the file instead wants to see tabs as indenting to a given column position, how are they going to do that? First they'll have to assume that all two-space sequences are tabs, and then they can interpret those tabs. But the assumption is dangerous and wrong. Everywhere that two spaces do not in fact represent a tab there will be an error of interpretation. Meaning will have been lost.
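The round trip in that example can be sketched directly. Assume, as in the text, that tabs were saved as two spaces, and that the next reader's tool naively turns every two-space run back into a tab; both helper functions are hypothetical. The two spaces inside the string literal were never a tab, yet they come back as one.

```python
def save_tabs_as_spaces(text, width=2):
    """Writer's side: store each tab as `width` spaces."""
    return text.replace("\t", " " * width)


def guess_tabs(text, width=2):
    """Reader's side: assume every run of `width` spaces was a tab."""
    return text.replace(" " * width, "\t")


original = 'if x:\n\tprint("a  b")\n'
saved = save_tabs_as_spaces(original)
recovered = guess_tabs(saved)

print(repr(saved))            # 'if x:\n  print("a  b")\n'
print(repr(recovered))        # 'if x:\n\tprint("a\tb")\n'
print(recovered == original)  # False
```

Restricting the guess to leading whitespace avoids this particular mistake, but it still has to assume a width, and it still misreads any line whose leading spaces were alignment rather than indentation.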

Spaces cannot represent tabs as encoding. They can in a display, but that is the choice of the viewer at the instant and not something to be persisted eternally in the file.

Preserving tab characters allows them to mean something different to each user. This makes no assumptions about the capabilities of the tools used. It does not require everyone to use the same editor with particular macros set up to convert automatically, or any such nonsense.
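A sketch of what "each user decides" can look like: the stored text keeps its tab characters, and expansion happens only at display time, at whatever width the viewer prefers. `render` is a hypothetical name; `str.expandtabs` is the standard Python string method.

```python
def render(text, tab_width):
    """Display-time choice: expand tabs for viewing only; `text` is unchanged."""
    return text.expandtabs(tab_width)


stored = "def f(x):\n\tif x:\n\t\treturn 1\n\treturn 0\n"

for width in (2, 4, 8):
    print(f"--- viewed at tab width {width} ---")
    print(render(stored, width), end="")
```

The file is identical for every reader; only the view differs.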

Furthermore tabs have the following advantages:
* one key to hit instead of up to 8[3]
* no risk of error when 7 or 6 spaces are typed instead of 8
* makes diffing files easier (because of the previous two points)
* smallest file size
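Two of those advantages are easy to check mechanically. The sketch below, using a made-up sample function, compares the size of the same code indented with tabs and with 8-column spaces, and includes a hypothetical `misindented_lines` helper that flags the "hit 7 spaces instead of 8" mistake, which cannot arise when every indent is a single tab.

```python
def misindented_lines(text, width=8):
    """Return 1-based numbers of lines whose leading-space count
    is not a multiple of `width`."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        leading = len(line) - len(line.lstrip(" "))
        if leading % width:
            bad.append(number)
    return bad


tabbed = "def f(x):\n\tif x:\n\t\treturn 1\n\treturn 0\n"
spaced = tabbed.expandtabs(8)

# All ASCII, so character counts equal byte counts.
print(len(tabbed), len(spaced))  # 38 66

# Someone typed 15 spaces instead of 16 on line 3.
typo = "def f(x):\n" + " " * 8 + "if x:\n" + " " * 15 + "return 1\n"
print(misindented_lines(typo))    # [3]
print(misindented_lines(tabbed))  # []
```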

Use tabs not spaces. Why would anyone be so foolish as to suggest any different?

[1] Not only do they use Intel processors and ATI video cards, they can also boot Windows.

[2] For that matter, I avoid television as well.

[3] You think you won't ever have to do this because you have your tabs set to 4 characters and inserted automatically? Just wait until you look at someone else's code and all you have are spaces as guides. Sucker.



wx said...

your cause is futile. reality does not work like you say it should.

robin said...

Since I deny the "obvious futility", I deny your conclusion. Further, I see that Python is not a mistake, so your hypothesis is wrong.

Thanks for spotting Occam, though. Besides that, many languages that used punch cards needed correct column positioning of certain code elements, though this was not a matter of code block indentation.

wx: I await with bated breath your attempt to explain how in fact "reality" "works". I was under the impression that the reality principle stopped working some time ago.

Anonymous said...

Pretty much right on. I really don't understand the hatred of using tabs to represent tabs.

The only problem is horrible programmers who mix tabs and spaces.

"Uh, let me figure out what weird tab stop they were editing this file with" *fiddles with tabstops for 15 seconds until it looks right*

The solution is to get rid of the spaces, and let the user display it at whatever tabstop he wants.

Skotty said...

I love this line:

"A tab has both syntax, a representation, and semantics, a meaning."

I've been trying to draw up a good software metaphor in support of tabs. Lately I've been relating it to a program with a model and a view, two things that should be kept separate. In tabs vs spaces, spaces are the view, and tabs should be the model. The line I quoted is short and succinct, embodying the core of the concept.

robin said...

Thanks, Skotty. It was fun to revisit this old polemic of mine by way of your recent comment.
