NOTE: This website is deprecated. All the same blogs + comments are now available at http://blog.quaddmg.com. You can go to the same article by going to http://blog.quaddmg.com/articles/YYYY/MM/DD/article-name

10/28/2005

The terrible lie of intuition

Julian has suggested something bordering on heresy, so rather than just leaving a comment on his blog, I figure it's appropriate subject matter here. In addition, this is probably going to take a while, and I think it's going to go beyond tackling Julian's assertion.

He's suggesting that case-preserving case-insensitivity is appropriate for programming languages. Case-insensitive, case-preserving here means don't change the case of identifiers, but match all identifiers regardless of case (eg: "Blah" is still displayed as "Blah", but is the same as "bLah", which is still displayed as "bLah"). You'd have to read his blog to get the entire argument, but I'll reproduce (hopefully) the critical bits here. He first makes some arguments against case-sensitivity (or rather, against the claimed benefits of case sensitivity), but he does not mention the strongest reasons for case sensitivity. I'll give my reasons later; first, I'll address the case for case insensitivity.
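To pin down what "case insensitive, case preserving" would mean mechanically, here's a minimal sketch in Python of identifier matching under CICP. All the names here (`bindings`, `assign`, `resolve`) are mine, purely illustrative; each spelling stays as typed in the source (that's the "preserving" part), and the compiler simply folds case when resolving which variable is meant.

```python
# Illustrative sketch of CICP identifier resolution (hypothetical names).
bindings = {}

def assign(name, value):
    # Fold case when storing: "Blah" and "bLah" share one slot.
    bindings[name.lower()] = value

def resolve(name):
    # Fold case when looking up; the source text itself is never rewritten.
    return bindings[name.lower()]

assign("Blah", 42)
print(resolve("bLah"))  # 42 -- "Blah" and "bLah" are the same identifier
```

Note that under this scheme nothing ever flags the two spellings as inconsistent, which is exactly the complaint developed below.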

Taking apart the argument

The basic gist of his point is that KEANU REEVES is understood as being the same as KeAnu ReEvEs by humans, so it should be understood as being the same by computers. However, humans assign significance to capitalisation, so the existing capitalisation should be preserved. This makes the programming language more human. However, there are certain cases in human language where capitalisation or context is the discriminator, and computers don't always have context. Let's say Keanu's first name was 'Boing' ("Mr Boing Reeves"). In this case, 'Boing' would clearly be different to 'boing'. While in English this kind of case does not crop up often, in programming it crops up all the time. While Julian does mention the class Foo foo("bar"); case, he claims that the practice is indefensible. I think it's perfectly fine. Say 'Foo' is a singleton; it's often acceptable to call the instance 'foo'. Even for classes which tend to have only a single instance in a program, I think it's perfectly acceptable to call the object by the class name. In fact, we often do this in English ("the clock" to refer to an unspecified "Clock", or "bike" for a particular "Bike"), whereas multiple clocks will be given more specific names.
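That "Foo foo" pattern is ordinary in any case-sensitive language. A minimal sketch in Python (the class and variable names are mine, not from either post): case alone separates the kind from the instance, just as English separates "Clock" from "the clock".

```python
class Clock:
    """A program-wide clock; typically only one instance exists."""
    def __init__(self):
        self.ticks = 0

    def tick(self):
        self.ticks += 1

# The single instance borrows the class name, distinguished only by case.
clock = Clock()
clock.tick()
print(clock.ticks)  # 1
```

Under case-preserving case-insensitivity, `clock` and `Clock` would collide, and this naming style would simply be illegal.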

The worst thing about his argument is that it runs against the rule of engineering: consistency is good, and forced consistency is good. This is why we have coding conventions, checkstyle, and indent. Any sane coding convention is going to force consistent capitalisation anyway. Worse, because the language allows inconsistent naming, and naming is difficult to check in a formatter or something like checkstyle, warnings can only come from the compiler. Should the compiler warn that inconsistent naming has been used? If so, what's wrong with the same warning in a case-sensitive compiler?

The real reason for his argument comes from something I saw when he was setting up his framework. As soon as he mentioned case, I thought "he's been using PHP, and is pissed because you don't have to declare variables and messing up the case will leave a nasty bug with no warnings". I know that sounds like a long shot, but it's happened to me, and I went through the same thing, so I just went "if it were me, this is where I'd be going". The place where he leaves his argument open is where he says:
The two most common capitalisation errors I make are: HOlding DOwn THe SHift KEy TOo LOng, and being inconsistent in CamelCasing the term “fileName” (I never did resolve satisfactorily whether it was one word or two!)
It's unforgivable that you could have a spelling mistake in your code ("KEanu") and it would still run correctly, giving no warnings. There's no actual advantage to allowing improper spelling (or is it grammar?). In addition, someone else reading the code could actually wonder if you meant different things when you said "filename" and "fileName". The solution is declaring variables, or having warnings for inconsistent case.
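The PHP failure mode described above is easy to reproduce in any language that neither requires declarations nor checks case consistency. A small sketch in Python (function and variable names invented for the example): a one-character capitalisation slip silently creates a brand-new variable rather than updating the one you meant.

```python
def build_report_path():
    fileName = "report.txt"
    # A capitalisation slip on assignment creates a *new* variable instead
    # of updating the old one -- no declaration required, no warning given.
    filename = "report.csv"   # typo: meant fileName
    return fileName           # still "report.txt"; the update went nowhere

print(build_report_path())  # report.txt
```

The bug runs "correctly" in the sense that nothing crashes; it just quietly returns the wrong value, which is exactly the nasty no-warnings failure described above.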

In case no one believes that someone might think "filename" and "fileName" were different, I'll give you a story from uni. At uni we'd often get answers to questions which were wrong. People with a clue often figured out that the answer had a silly mistake and would continue nonetheless. People with less of a clue just got plain confused until year 3 or 4, when they realised that they'd seen this stuff so often that there had to be mistakes in the answers. People with little or no clue would construct alternate abstract mathematical universes where the answers would somehow become correct. It was really quite scary to see them solve problems sometimes. In the same way, if code that looks funny executes correctly, we're going to see people develop strange voodoo consistency which they won't dare play with. This is most definitely not good.

The correct solution is declaring variables. I was always undecided about declaring variables. I thought there was no need, and no point; I thought it was just there to make things easy for the compiler. I thought anime was lame when the characters declared their attacks. Then I saw Martian Successor Nadesico, and now I know that you declare attacks for more than just letting the audience know what you're doing. You do it for style, and you do it because it's what you believe in. It's the same with variables. It's not just for the compiler; it's for style, and it's what you want the variable to be...
gekigan punch;
gekigan flare;

Why case sensitivity is good

Julian mentions some lame reasons for why case sensitivity is good, and then takes them down like burnt effigies. The only thing I can salvage (other than the "Foo foo" thing) is his mention that the difference between A and a is minuscule. If you had a variable named 'a' and another named 'A', you would think of them as different. 'a' sounds like a scalar, or a vector, whereas 'A' sounds like a matrix. 'Ax' is "intuitively" a matrix multiplication. Surely case here is more important than the actual identifier used: 'Ax' or 'By' is still just a matrix multiplication.
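The mathematical convention carries over directly into code. A small sketch in plain Python, where the names follow the maths rather than anything in either blog post: 'A' reads as a matrix, 'x' as a vector, and the case alone carries that meaning.

```python
# 'A' is a matrix, 'x' a vector -- the capitalisation tells the reader which.
A = [[1, 0],
     [0, 2]]
x = [3, 4]

# Ax: the matrix-vector product, spelt the way a mathematician would write it.
Ax = [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]
print(Ax)  # [3, 8]
```

In a case-insensitive language, `A` and `Ax` would collide with any lower-case `a` or `ax` in scope, and the notation would have to be abandoned.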

I've already made the point about forced consistency. I can extend that point by saying that you can be sure a particular capitalisation has the connotations you attach to it. This has already been mentioned in one of the comments on Julian's blog entry, but it bears repeating. THIS_IDENTIFIER is clearly a constant, ThisIdentifier is clearly a class, thisIdentifier is clearly a variable. You can't accidentally type thisIdentifier and get a class in a case-sensitive world, whereas you can in the case-insensitive world. Worse, a case-insensitive world may push people toward a sort of "Hungarian notation", which is evil. For example: "c_this_identifier" for a constant, "clThisIdentifier" for a class, or "vThisIdentifier" for a variable.
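To illustrate how much a reader gets for free from those conventions, here's a tiny sketch (all names invented for the example). Without reading a single declaration, the case tells you which name is the constant, which is the class, and which is the variable.

```python
MAX_RETRIES = 3          # SHOUTING_CASE: clearly a constant

class RetryPolicy:       # CapitalisedWords: clearly a class
    def __init__(self, limit):
        self.limit = limit

retryPolicy = RetryPolicy(MAX_RETRIES)   # camelCase: clearly a variable
print(retryPolicy.limit)  # 3
```

In a case-insensitive world, `RetryPolicy` and `retryPolicy` would name the same thing, and the only way to recover the distinction would be prefixes of the Hungarian sort.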

The final point is important, but subtle. Case preserving, case insensitive identifiers encourage "more human" thinking. The problem is, when you're thinking human, you're almost definitely thinking wrong. The only reason people zone out when coding is that they're thinking in the problem domain, and in the language of the problem domain. When you're writing in C, you're thinking in C. When you're writing in something "intuitive", you're thinking "intuitively", which is to say, less precisely. I can only speak for myself, but I find it hard to zone out in languages that are imprecise, like SQL or BASIC. I believe a part of that can be attributed to the imprecise nature of the language itself.

The fallacy of intuition

The real problem I have with his proposal is the ending. Julian ends with:
There is no longer any excuse for making humans learn and handle the quirks of the way computers store upper- and lower-case characters. Instead, software should handle the quirks of human language.
It is time for integration of the cases! Case-Preserving Case-Insensitivity: equal and yet different!
It sounds a lot like:
"Why won't the machine just do what I want"
which sounds to me like:
I cant type properly and ny shuft ky is stuk itd be good if the puter fixed all my typing an dint crash all my 1338 code LOL!!!1
I occasionally have to type my password in two or three times to get in, because I get it wrong the first time. At times like that I think "maybe it'd be nice if it'd let me pass if I was close enough, or had a couple of close-enough guesses". Then I come to my senses. LOL indeed.

Nothing against Julian in that last bit, btw. He certainly doesn't type like that.

I'm a person who spends a lot of time thinking about how one should interact with the PC. I'm really keen on tablet PCs. I think "intuitiveness" is a load of fucking shit. A fallacy, a lie, a failure of higher thinking. It's what happens when you've stayed up too long and your body is trying to hurt you so you'll get some rest. I wish I had stronger words, but I don't. Every intuitive program I've ever seen is a piece of shit. It's always non standard, slower, and less flexible than whatever "less intuitive" thing was before it. I remember programs that had pictures of a virtual room which you could click on to do things. A desk on which you'd work on documents, a briefcase, a calculator, walls and TVs and shit.

Those programs don't exist anymore.

You know why they ship solitaire with every copy of windows? So you'll learn how to use a mouse. If you didn't, I'm betting people would've stayed with whatever they were using before. Microsoft may or may not have known it, but they were probably betting that people would while away hours playing minesweeper and solitaire, honing their mousing skills before they'd ever want to do anything "intuitive" on their machines.

I can't use macs. Never have. I thought those buggers were meant to make sense. I went to Nathan's house and started using his mini while he was in the shower. I felt really uncomfortable until I found the terminal.

Anyone who ever says anything is intuitive is probably lying. Try picking up CAD and figuring out how to use it. I guarantee you'll give up unless you've used some other CAD program, regardless of how "intuitive" the program claims to be. Hell, even go from the "drawing" model CAD programs to the CSG ones, and you'll probably be screwed. This is because programs deal with concepts. If you don't grasp the concept already, you think the program is not intuitive. Most people have written a letter, so they think they "get" word processing packages. Most people haven't designed something to be built on a lathe, so they can't "get" CAD.

In conclusion, case insensitivity is bad because it allows inconsistency, allows errors, and makes reading code harder. Case sensitivity is good because it's consistent, gives more information to both the compiler and the reader, and allows for better "zoning". Intuitiveness is bad because nothing touted as intuitive is ever standard, flexible, and powerful, and the idea of intuition as a goal is a fucking lie. Power is good because it allows professionals to do their jobs properly.

I think it's time to expose intuition-loving hippies for the frauds they are. Power to the people! Olé!
 Comments (4)
Anonymous
Sunny,

I am glad to have provoked some thought on the matter - I knew I was treading into controversial territory, so I spent some time preparing the arguments.

Unfortunately, I left one minor point out. My praise for "Dictionary Definition Canonical Form" support in an IDE was added as a comment to my original post, just minutes before you posted this to your blog. To some extent, this shows that I agree with many of your objections, and I propose that we can rely on simple technology to overcome them.

Let's say Keanu's first name was 'Boing' ("Mr Boing Reeves"). In this case, 'Boing' would clearly be different to 'boing'.

I am not convinced it would be *that* clearly different, as I am sure many people with names that are homonyms with English words might attest!

While in English this kind of case does not crop up often, in programming this is untrue.

I am not saying that it is true for programming. I am saying it *should* be true for programming!

Say 'Foo' is a singleton, it's often acceptable to call the instance 'foo'.

So, if I had my way, this wouldn't be possible. Is it such a great loss, in return for the benefits? I argue the answer is "No".

"the clock" to refer to an unspecified "Clock", or "bike" for a particular "Bike"

I strongly agree with the first example. The variable could be called the_clock, and we would both be happy. I don't think the second example is true. We don't say "Pick up bike". Call it the_bike, my_bike or a_bike, but bike is a type of object not an instance of an object. (I can think of two minor exceptions to this: calling a dog "Dog" or calling a boy "Boy". I don't think they invalidate my argument.)

The worst thing about his argument is the rule of engineering: Consistency is good. Forced consistency is good. This is why we have coding conventions, checkstyle, indent.

You should explain why consistency is good. Where it promotes clarity, where it allows higher level abstractions because you agree on the lower-level definitions, where it makes items more likely to plug together, consistency is good.

Where it is forcing the user to jump through hoops to make the computer understand you, I don't agree.

Any sane coding convention is going to force consistent capitalisation anyway. Worse, because the language allows inconsistent naming, and naming is difficult to check for a formatter or something like checkstyle, warnings can only possibly show up by the compiler. Should the compiler show a warning that inconsistent naming has been used? If so, what's wrong with the same warning in a case-sensitive compiler?

Here's where I point to powers of a decent IDE to say "let it take care of this for you".

As soon as he mentioned case, I thought "he's been using PHP, and is pissed because you don't have to declare variables and messing up the case will leave a nasty bug with no warnings". I know that sounds like a long shot, but it's happened to me, and I went through the same thing

Sure that's happened to me. Sure that's happened to you too. It happens to everyone who uses PHP (and Python, and Perl, and CSS, and...). So let's fix our development environments, so it doesn't.

It's unforgivable that you could have a spelling mistake in your code "KEanu" and it still runs correctly, giving no warnings. There's no actual advantage to being able to have improper spelling (or is it grammar?).

We could make it illegal to write x = 1.200. We could insist that it be written as x = 1.2, but it wouldn't add any value to be so pedantic, because 1.200 and 1.2 are both considered legal. So why isn't KEanu just as legitimate? Why do you consider it a spelling error? (My argument would be more forceful here if we weren't using a human name, and stuck to a typical function name - e.g. get_URL versus get_url.)

In addition, someone else reading the code could actually wonder if you meant different things when you said "filename" and "fileName".

Some languages are case-insensitive when you write keywords. You can write IF, if or If. Do people wonder what is meant by the different capitalisations? No.

Sure, people who are still hung up on case-sensitivity will take a while to get used to it, just like they have to get used to != versus <>.


The correct solution is declaring variables.

Amen, brother! Smalltalk has brought a great evil upon our world, and it has spread to many scripting languages. But this is a different topic, and one which is at least as controversial.

'a' sounds like a scalar, or a vector, whereas 'A' sounds like a matrix.

An interesting point. For mathematicians, the world is case-sensitive. I guess I would add physicists too, and their units.

It is not surprising their worlds are case-sensitive. They are writing out equations on the blackboard so often that they want to cram the symbols in quite densely.

I don't think that is a typical case for programmers though. Soon, we would end up back at Fortran, where the name of the variable determines its type, because that is how it works in maths!


THIS_IDENTIFIER is clearly a constant, ThisIdentifier is clearly a class, thisIdentifier is clearly a variable.

Oh, so by your own arguments, 'A' must therefore be a constant matrix class? :-)

These are conventions (and I argue deplorable ones) that have arisen from case-sensitive languages. These case conventions can still be used! You just can't have the identifiers overlap.

You can't accidentally type thisIdentifier and get a class in a case-sensitive world, whereas you can in the case-insensitive world.

I think that you have this backwards! In the case-sensitive world, a simplistic case-shift (which I argue is an easy-to-overlook change - proof) could lead you to the wrong item. In a case-insensitive world, you would need to change the name of the variable in order to get the wrong type.


In addition, this may form a sort of "hungarian notation", which is evil.

The most common form of Hungarian notation is, indeed, evil. There are some spirited defences of some versions based closely on the original. But, whatever your reason for arguing that the Hungarian notation is evil, the exact same arguments apply to the very case conventions that you were just defending!

You go on to attack "intuitive" software which will attempt to "do what I want". That's a straw-man. I am not an "intuition-loving hippie"; I am not asking for a computer with super-intelligent powers.

I wrote "If a computer can also disambiguate [variable names] accurately, it should do so". You are arguing that computers can't disambiguate woolly thinking. I agree, but I am not asking for that. All I am asking is that the compiler call the to_upper function.

I was using an IDE that took care of all of this back in 1991. Now that I have a computer in front of me that is over 1000 times more powerful, I don't understand why I can't have the same simplicity that I had back then. Give that power to the people!
 
You go on to attack "intuitive" software which will attempt to "do what I want". That's a straw-man. I am not an "intuition-loving hippie"; I am not asking for a computer with super-intelligent powers.

Clearly, you are not an intuition-loving hippie. I did mention at the start that the post went beyond a reply to yours. It's a straw-man only insofar as I'm no longer saying anything about your blog post. Those UI lunatics just piss me off, is all.

I am not convinced [Boing] would be *that* clearly different [from boing], as I am sure many people with names that are homonyms with English words might attest!

When you're speaking, they're not clearly different. When you're writing, they're more clearly different. However, when speaking you have a lot of other input to determine context, and when writing you try and make it clear which is which. When you're coding, what can you do? Your solution is never to name someone Boing. My fear is that this is sometimes impossible, impractical, or unnatural. I don't have an example (and you would likely give alternate suggestions for names anyway), but that's the limitation I think I'll be hitting. Further, to work around the limitation, I think I'll have to use some hungarianish notation.

I strongly agree with the first example. The variable could be called the_clock, and we would both be happy.

Actually, I wouldn't. I wouldn't complain, but I wouldn't be happy. "the" is not part of the name, so it shouldn't really be there. Anytime you refer to an instance, it should be clear that you're referring to an instance, and that's the "the".

I understand that this means that, since instances can be differentiated from classes anyway, you can be case-insensitive and the compiler knows what you mean, but it's still not right, because it's not consistent.

You should explain why consistency is good.

Consistency (other than for the reasons you mention) lowers entropy. I'll talk more about this later.

Oh, so by your own arguments, 'A' must therefore be a constant matrix class? :-)

These are conventions (and I argue deplorable ones) that have arisen from case-sensitive languages. These case conventions can still be used! You just can't have the identifiers overlap.


A convention allows you to control entropy, which I'll talk more about later. From the smiley I'm guessing you understand that the conventions are "soft", or "for our own information".

We could make it illegal to write x = 1.200. We could insist that it be written as x = 1.2, but it wouldn't add any value to be so pedantic, because 1.200 and 1.2 are both considered legal.

Ah, but what does it mean to say "1.200" as opposed to "1.2"? I think I can tie this into the entropy point too.

There are some spirited defences of some versions based closely on the original.

After reading the "spirited defense", I have to say I agree with the article. I quite like the OpenGL convention of glVector3d(). Again, a point of entropy. Let's begin that point:

Code has entropy. Let's define this here as "information that is beyond syntax". Consistency lowers entropy, which is good, because you know what to skip. If you see something inconsistent, you know to look closer. A coding convention allows you to increase consistency and "control" entropy. This way you know that inconsistent things can be dangerous. Maybe it's a bug, or maybe something that needs to be documented. Your coding convention could use case to help identification of variable names. This increases entropy on case, but you know what this "means", so that's a good thing. It tells you something about the variable. Hungarian notation (as described in the link) also increases entropy in the variable name, but in the leading characters.

It's ridiculous to argue about whether your proposal plus coding conventions would produce code with consistent case, because then it'd compile on a case-sensitive compiler as well, so let's assume that case is at least sometimes inconsistent.

Your proposal tells you less about, say, a variable name, because case is no longer tied to some feature of the variable. It also increases entropy because the case is inconsistent. "x = 1.200" tells you something about x (it's probably correct to 3 decimal places). I fail to see how your proposal tells you anything more about the code. How can you mangle case to make something clearer?

With great power comes great responsibility
 
Anonymous
Those UI lunatics just piss me off, is all.

Okay, fair enough.

when writing you try and make it clear which is which.

Upper- versus lower-case is a helpful way to distinguish between "Boing" and "boing" (or "Sunny" and "sunny"). I don't want to lose this ability to subtly highlight the difference. However, it isn't sufficient, because English-speakers don't put enough emphasis on the difference between upper and lower case to notice or prevent mistakes.

To clarify: Despite the fact that I have been as guilty as the next person in the past, we should avoid having instances and classes of the same name, even if the language remains case-sensitive.

If Apps Hungarian is the way you do this, so be it.

"the" is not part of the name, so it shouldn't really be there. Anytime you refer to an instance, it should be clear that you're referring to an instance, and that's the "the".

Yes, it should be clear, and that's why I recommend avoiding sharing the same name as the class! For the same reason that you don't call integers "i", you don't call Accounts "account". You call them "account_to_be_deleted" or "singleton_account". I actually am finding it hard to defend "the" - I would prefer "my" as a degenerate case, to indicate it is a data member of a class, and that's why it exists. However, I would prefer "the" to leaving the name of the variable the same as the type. Heck, I would even prefer the dreaded underscore prefix (despite the fact that it is a fugitive from the C++ language lawyers).

Code has entropy. Let's define this here as "information that is beyond syntax".

I am afraid your choice of words and definition needs some work here. I found that it clouded your argument. "Entropy" is normally defined by the amount of disorder in a system. Your definition seems to describe the word "semantics", but you don't use it that way.

A coding convention allows you to increase consistency and "control" entropy.

I am not arguing against clear coding conventions. In fact, I find myself arguing for some new ones! I am arguing that I don't need to learn unnecessary skills in shift-key accuracy, or learn to tell the difference between SsSsSsS and SsSSSsS when skimming code quickly. Make these identifiers the same, and let my IDE deal with making the display consistent.

It's ridiculous to argue if your proposal + coding conventions would produce code that had consistent case, because then it'd compile on a case sensitive compiler as well, so let's assume that case is at least sometimes inconsistent.

I think this is probably our key sticking point. Let me explain.

As long as we write code like "class Foo foo(25);", we will make common mistakes, but, far worse, we will also continue to think we need to use case-sensitive languages to support it.

As long as we use case-sensitive languages, our development tools MAY NOT help us out, by correcting case to follow the coding conventions. That would change the semantics, and while an IDE can help out with the formatting, it may not change the semantics.

I have used case-insensitive languages, and tools that smash case in the right way (Dictionary Definition canonical form; I plan to blog about them in the future), and they are surprisingly liberating and productive. I need to work with the languages that other people use, so I am calling for others to throw off their shackles too.

Your proposal tells you less about, say, a variable name, because case is no longer tied to some feature of the variable.

I've got a new analogy to try out here.

UI designers recommend that you do not use colour alone to indicate some fact - too many humans are colour-blind.

In the same way, I recommend that you do not use case alone to indicate the kind of identifier - too many humans are (temporarily) case-blind when skimming code.

If you start using other indicators (like a "my" prefix) - perhaps in conjunction with case hints - then this objection is removed.

So, I can type "foo myfoo(25);", and the IDE will fix it up to "Foo myFoo(25);" and everyone is happy. I'll settle for happy enough not to complain!


p.s. I've just noticed that your stylesheet is non-case preserving for my name!
 
Man, this thing is huge now, and I'm tired of reading. Here's what I get from your message:

You think repeated names are bad, no matter what they are. Even Hungarian notation (or "the" or "my" as specific prefixes) is preferable.

This is because English speakers don't care about case.

You think I'm talking about semantics. I was afraid my definition would result in that misconception. It's basically a "what's the programmer trying to tell me?"

We agree on strong coding conventions.

Let's get to the sticking point, which I think the final point is related to: what (I think) you're doing by clobbering case is limiting the ability to re-use the same name (in different cases) as enforced by the compiler. What I'm doing by keeping case is limiting the ability to have inconsistent naming in the code.

I think we'd both agree that both things are bad. We disagree in which is worse, which is kinda hard to argue.

I'm going to fall back on the Intel rule: make the CPU simple and make the compiler handle the complexity. In the same way, make the language really simple so the IDE can handle the complexity. I'd easily believe that an IDE could produce code that we'd both be happy with, and that would compile on both a case-sensitive compiler and a CICP compiler.
 
