CTI Programmer's Guide
While I was employed at CTI, I noticed that there was quite a lot of stuff about programming that the CTI courses didn't cover: things like good coding habits, tips on being a better programmer, and which of the programming languages was best suited to each student. So I wrote this essay to try and explain all of this stuff, and distributed it to my students. I compiled this from various sources, all of which are listed at the bottom of the page.

Why programming?

Why would anyone want to learn programming? Well, apart from the fact that picking up any kind of useful skill is a good idea, learning to tell a computer what to do and how to do it means learning a lot about problem-solving, logical ways of thinking about things, and clarity in expressing what you're trying to say. These are all worth knowing in the real world, even if you never write anything more complex than "hello world". More directly, being able to hack up a quick program to automate some tedious task (be it adding all the marks in a set of tests, calculating how much interest you'll pay on the car you're about to buy, automatically generating SQL code, etc.) can save a lot of time and, even if it doesn't, can turn a tedious job into something more interesting and creative. I originally taught myself to program so that my mom could work out her petrol consumption, but I was about 10 years old at the time so that kind of reason was enough. Perhaps it's enough for you as well.

Which programming language?

You've got a choice of four: C++, C#, Java, and Visual Basic .NET. I've had experience with all four of them, so I'm in a good position to tell you what you should go for. Visual Basic .NET is a decent enough language for those of you reading this who haven't programmed before, as the syntax is quite close to everyday English. This gives it the advantage of being very easy to read.
As an example, my own programming experience in VB .NET is very limited, but put some code in front of me and I can probably figure out what the coder is doing. It's a good start to get into the industry (there's a large VB .NET market in Durban), and it's less strict than the other languages. This, however, is a bit of a Bad Thing, as you can pick up bad coding habits this way. Eventually you should move to something more substantial, but it's a very good way to get going if you're new, you can do most things with it (even if the other languages are more powerful), and there is a definite job market for it.

The other .NET framework language, C#, is far more powerful. It's also not that difficult to use (although the syntax can be confusing to first-timers, once you've got it, you've got it - and besides, it's quite close to pseudocode, which you learn in Program Design). C# is a very new language; it inherits qualities from both C++ and Java. In fact, the Java programmers call it Microsoft's version of Java (which isn't quite true). Until recently, you would have been locked into Microsoft's libraries and tools, as with VB .NET (which is also a Bad Thing); however, an open source program called Mono has recently come out of beta. It enables you to develop C# applications on other platforms such as Linux and Mac OS X. C# may be a better choice than VB .NET, although both definitely have their uses. There is one problem with doing C#, and that is the job market, which is still in its infancy. To be fair, C# is a new language and its job market is growing.

If you want to do some serious programming in your career, sooner or later you will have to learn some C or C++. C++ is incredibly powerful (much of Windows is written in C and C++), while its older brother C (not offered at CTI) is the working language of the Unix/Linux world. In fact, many of the C++ jobs that I've seen require some form of Linux experience.
If you decide to do C++, it's worth finishing a few days ahead so that you can do Administering Linux or Advanced Linux as an extra course. (This is less important for the ISE students, who do Administering Linux anyway.) This is more difficult than it sounds, as C++ has the disadvantage of being difficult to learn and get around. You see, C++ is by far the most efficient and powerful language that CTI offers, but it gets that efficiency and power by requiring the programmer (you) to do a lot of low-level tasks, such as memory management, by hand. If you're a novice programmer, stay well away from C++. In fact, you should only try it if you've programmed before (and you're good at it).

Part of the problem with C++ is that it tries to be backwards compatible with C. (The two languages are very similar - the biggest difference is that C is procedural, while C++ also supports object-orientated programming. If you know one, you can easily pick up the other.) Bjarne Stroustrup, the creator of C++, said in his retrospective book The Design and Evolution of C++, "Within C++, there is a much smaller and cleaner language struggling to get out". Many programmers would now add "Yes, and it's called Java".

Java's main advantage is that it's portable. The same code will run on Windows, Linux, Mac OS X - on any system that supports the Java Runtime Environment. Well, that's the theory anyway. In practice, there are enough differences in the runtime environment between the various platforms to create some subtle (and not so subtle) issues with portability. Regardless, this is Java's big plus, and I'm sure that most of you reading this now have cellphones that can run Java applications. Perhaps this is why Java draws most of the programming students. Java's portability comes at a price, however. As far as runtime resources go, it's very inefficient. It's also not quite as powerful as C++ or even C# (although it was derived from C++ to a certain extent).
That being said, it has different uses from those two languages, and it's good at what it does.

So, you may be wondering, where does that leave SQL (which you have to do)? SQL isn't a general-purpose programming language; it's a query language for databases. You'll learn Microsoft's implementation of SQL, but be aware that there are many others out there; the open-source MySQL is one of the most popular. Take the time to become familiar with SQL when you do it, because you will be writing programs that interact with databases at some point in your career, which means that you will have to know it.

Finally, don't be limited to what CTI offers you. There are many other programming languages for you to try. Take Python, as an example. Although not that powerful, it's a wonderfully scalable language that doesn't require you to write a lot of boilerplate code around a simple program. If you haven't programmed before and would like to do some programming before you get on to your main language, learn Python. Perl is another language that you can try, especially if you're planning on doing scripting. If you're even remotely interested in Web development, learn some PHP. A programmer with only one programming language is like a builder with only one tool. You need a hammer and a screwdriver to do most things worth doing.

Good Coding Habits

When you get out into the industry, you will be working on projects as a team with other programmers. This means that these other programmers need to figure out what you've done (or tried to do). Readability and understandability are the name of the game here. If other programmers have to spend a day trying to figure out what your code does, that's a day that they've wasted when they could have been doing something more productive (like adding nice new features to your code). Besides, I enjoy taking lots of marks off code that I can't read. So, how do you make your code readable and understandable? Here's how:
Perhaps I should stress indentation a little more. Consider the difference between these two code samples. Don't worry about what the code does if you don't know C++. #include <iostream> #include <iostream>
There are some development tools out there that will indent your code for you. They can make you become complacent in your indenting if you're not careful. Don't let this happen to you. Rather, pay close attention to how your code is indented. You may need this later on. This is particularly important for the VB .NET and C# students, as you go from using an IDE that indents your code for you in the first four units to one that doesn't in the last two.

Indenting your code is important, but it's not the only thing that is. Variable names and method names are also significant. Which one of these two code samples makes more sense?

#include <iostream>
#include <iostream>

Of course, if you comment your code, it becomes even more readable and understandable:

#include <iostream>

Comments can be added after a block is written (although many people find it easier to write the comment first to clarify in their minds what the block does), but indentation and meaningful names must be part of the code from the start. After writing good code for a while, it becomes habit. Writing bad code, however, is also habit-forming. Choose to develop good habits early on and you will have fewer problems later.

Of course, this is only part of programming. Even if you follow good coding habits, somewhere along the line your program will either not compile, or break badly when it does. Which brings us to our next point...

How to fix bugs effectively

Learn to Debug

Debugging is the cornerstone of being a programmer. The first meaning of the verb to debug is to remove errors, but the meaning that really matters is to see into the execution of a program by examining it. A programmer who cannot debug effectively is blind. Idealists who think design, or analysis, or complexity theory, or whatnot, are more fundamental are not working programmers. The working programmer does not live in an ideal world.
Even if you are perfect, you are surrounded by and must interact with code written by major software companies, organizations like GNU, and your colleagues. Most of this code is imperfect and imperfectly documented. Without the ability to gain visibility into the execution of this code the slightest bump will throw you permanently. Often this visibility can only be gained by experimentation, that is, debugging. Debugging is about the running of programs, not programs themselves. If you buy something from a major software company, you usually don't get to see the program. But there will still arise places where the code does not conform to the documentation (crashing your entire machine is a common and spectacular example), or where the documentation is mute. More commonly, you create an error, examine the code you wrote and have no clue how the error can be occurring. Inevitably, this means some assumption you are making is not quite correct, or some condition arises that you did not anticipate. Sometimes the magic trick of staring into the source code works. When it doesn't, you must debug. To get visibility into the execution of a program you must be able to execute the code and observe something about it. Sometimes this is visible, like what is being displayed on a screen, or the delay between two events. In many other cases, it involves things that are not meant to be visible, like the state of some variables inside the code, which lines of code are actually being executed, or whether certain assertions hold across a complicated data structure. These hidden things must be revealed. The common ways of looking into the innards of an executing program can be categorized as:
using a debugging tool; printlining, that is, making a temporary change to the program that prints information out; and logging, that is, building a permanent window into the program's execution in the form of a log.
Debugging tools are wonderful when they are stable and available, but printlining (which I use most frequently) and logging are even more important. Debugging tools often lag behind language development, so at any point in time they may not be available. In addition, because the debugging tool may subtly change the way the program executes, it may not always be practical. Finally, there are some kinds of debugging, such as checking an assertion against a large data structure, that require writing code and changing the execution of the program. It is good to know how to use debugging tools when they are stable, but it is critical to be able to employ the other two methods.

Some beginners fear debugging when it requires modifying code. This is understandable - it is a little like exploratory surgery. But you have to learn to poke at the code and make it jump; you have to learn to experiment on it, and understand that nothing that you temporarily do to it will make it worse. If you feel this fear, seek out a mentor, friend, instructor, whoever - we lose a lot of good programmers at the delicate onset of their learning to this fear.

How to Debug by Splitting the Problem Space

Debugging is fun, because it begins with a mystery. You think it should do something, but instead it does something else. It is not always quite so simple - any examples I can give will be contrived compared to what sometimes happens in practice. Debugging requires creativity and ingenuity. If there is a single key to debugging, it is to use the divide and conquer technique on the mystery.

Suppose, for example, you created a program that should do ten things in a sequence. When you run it, it crashes. Since you didn't program it to crash, you now have a mystery. When you look at the output, you see that the first seven things in the sequence were run successfully. The last three are not visible from the output, so now your mystery is smaller: "It crashed on thing #8, #9, or #10."
Can you design an experiment to see which thing it crashed on? Sure. You can use a debugger, or you can add printline statements (or the equivalent in whatever language you are working in) after #8 and #9. When you run it again, your mystery will be smaller, such as "It crashed on thing #9." I find that bearing in mind exactly what the mystery is at any point in time helps keep one focused. When several people are working together under pressure on a problem, it is easy to forget what the most important mystery is.

The key to divide and conquer as a debugging technique is the same as it is for algorithm design: as long as you do a good job splitting the mystery in the middle, you won't have to split it too many times, and you will be debugging quickly. But what is the middle of a mystery? This is where true creativity and experience come in.

To a true beginner, the space of all possible errors looks like every line in the source code. You don't have the vision you will later develop to see the other dimensions of the program, such as the space of executed lines, the data structure, the memory management, the interaction with foreign code, the code that is risky, and the code that is simple. For the experienced programmer, these other dimensions form an imperfect but very useful mental model of all the things that can go wrong. Having that mental model is what helps one find the middle of the mystery effectively.

Once you have evenly subdivided the space of all that can go wrong, you must try to decide in which space the error lies. In the simple case where the mystery is: "Which single unknown line makes my program crash?", you can ask yourself: "Is the unknown line executed before or after this line that I judge to be executed at about the middle of the running program?" Usually you will not be so lucky as to know that the error exists in a single line, or even a single block.
Often the mystery will be more like: "Either there is a pointer in that graph that points to the wrong node, or my algorithm that adds up the variables in that graph doesn't work." In that case you may have to write a small program to check that the pointers in the graph are all correct in order to decide which part of the subdivided mystery can be eliminated.

How to Remove an Error

I've intentionally separated the act of examining a program's execution from the act of fixing an error. But of course, debugging does also mean removing the bug. Ideally you will have perfect understanding of the code and will reach an "aha!" moment where you perfectly see the error and how to fix it. But since your program will often use insufficiently documented systems into which you have no visibility, this is not always possible. In other cases the code is so complicated that your understanding cannot be perfect.

In fixing a bug, you want to make the smallest change that fixes the bug. You may see other things that need improvement; but don't fix those at the same time. Attempt to employ the scientific method of changing one thing and only one thing at a time. The best process for this is to be able to easily reproduce the bug, then put your fix in place, and then rerun the program and observe that the bug no longer exists. Of course, sometimes more than one line must be changed, but you should still conceptually apply a single atomic change to fix the bug.

Sometimes, there are really several bugs that look like one. It is up to you to define the bugs and fix them one at a time. Sometimes it is unclear what the program should do or what the original author intended. In this case, you must exercise your experience and judgment and assign your own meaning to the code. Decide what it should do, and comment it or clarify it in some way and then make the code conform to your meaning.
This is an intermediate or advanced skill that is sometimes harder than writing the original function in the first place, but the real world is often messy. You may have to fix a system you cannot rewrite.

How to Deal with Intermittent Bugs

The intermittent bug is a cousin of the 50-foot-invisible-scorpion-from-outer-space kind of bug. This nightmare occurs so rarely that it is hard to observe, yet often enough that it can't be ignored. You can't debug because you can't find it. I've heard of a bug in a program that only occurred on the 31st day of each month. It would then be assigned to a developer who would fiddle around with it for a few hours before marking it "impossible to reproduce".

Although after 8 hours you will start to doubt it, the intermittent bug has to obey the same laws of logic that everything else does. What makes it hard is that it occurs only under unknown conditions. Try to record the circumstances under which the bug does occur, so that you can guess at what the variability really is. The condition may be related to data values, such as "This only happens when we enter Fish Hoek as a value." If that is not the source of variability, the next suspect should be improperly synchronized concurrency.

Try, try, try to reproduce the bug in a controlled way. If you can't reproduce it, set a trap for it by building a logging system, a special one if you have to, that can log what you guess you need when it really does occur. Resign yourself to the fact that if the bug only occurs in production and not at your whim, this may be a long process. The hints that you get from the log may not provide the solution but may give you enough information to improve the logging. The improved logging system may take a long time to be put into production. Then, you have to wait for the bug to reoccur to get more information. This cycle can go on for some time.

Beta-testing

You may think that your program is complete, that your bugs are fixed.
However, it's often said that the worst person to test a program is the one who wrote it. That's why any programmer who's thinking will tell you that good beta-testers (who know how to describe symptoms clearly, localize problems well, can tolerate bugs in a quickie release, and are willing to apply a few simple diagnostic routines) are worth their weight in rubies. Even one of these can make the difference between a debugging phase that's a protracted, exhausting nightmare and one that's merely a salutary nuisance. Find a friend of yours and get him or her to play around with your program. They may cause it to crash by doing something that you would never have thought of doing yourself, and you can then fix these bugs. All of this makes your program much more stable and secure.

General tips

As much as I hate to say this, your programming course won't automatically make you a good programmer. Researchers have shown it takes about ten years to develop expertise in any of a wide variety of areas, including chess playing, music composition, painting, piano playing, swimming, tennis, and research in neuropsychology and topology. There appear to be no real shortcuts: even Mozart, who was a musical prodigy at age 4, took 13 more years before he began to produce world-class music. In another genre, the Beatles seemed to burst onto the scene with a string of #1 hits and an appearance on the Ed Sullivan Show in 1964. But they had been playing small clubs in Liverpool and Hamburg since 1957, and while they had mass appeal early on, their first great critical success, Sgt. Pepper's, was released in 1967. Peter Norvig, Google's director of Search Quality, has published the following recipe for programming success:
If you don't have functional English, take the time and effort to improve your English. Originally, I was reluctant to add this in, lest it be taken as some sort of cultural imperialism. However, English is the working language of the programmer culture and the Internet, and you will need to know it to function correctly. I've recently learned that many programmers who have English as a second language use it in technical discussions even when they share a birth tongue; it was reported to me that English has a richer technical vocabulary than any other language and is therefore simply a better tool for the job. For similar reasons, translations of technical books written in English are often unsatisfactory (when they get done at all). Here's an example. Linus Torvalds, the author of Linux, is Finnish, yet he comments his code in English. It apparently never occurred to him to do otherwise. His fluency in English has been an important factor in his ability to recruit a worldwide community of developers for Linux. It's an example worth following. There's a common misconception that you need to be good at maths to program. Programming uses very little formal mathematics or arithmetic. In particular, you won't usually need trigonometry, calculus or analysis (there are exceptions to this in a handful of specific application areas like 3D computer graphics). Knowing some formal logic and Boolean algebra is good. That's why you do PLC. Much more importantly: you need to be able to think logically and follow chains of exact reasoning, the way mathematicians do. While the content of most maths won't help you, you will need the discipline and intelligence to handle maths. If you lack the intelligence, there is little hope for you as a programmer; if you lack the discipline, you'd better grow it. I think a good way to find out if you have what it takes is to pick up a copy of Raymond Smullyan's book What Is The Name Of This Book?. 
Smullyan's playful logical conundrums are very much in the programming spirit. Being able to solve them is a good sign; enjoying solving them is an even better one.

Lastly, I should mention something about cheating. I've noticed a disturbing tendency for students to copy code from each other (recently I came across two SQL assignments that were near-identical and handed in within 15 minutes of each other). Don't do that. The only real way to learn programming is to do it yourself, figure it out yourself, fix your bugs yourself, and so forth. The skills that doing it all yourself teaches you will be needed once you get out into the industry. If you decide to ignore this advice and copy code anyway, then at least make sure you understand what you're copying, otherwise you'll be creating more problems than you're solving.

References

The Jargon File by various authors
How To Become A Hacker by Eric Raymond
How to be a Programmer: A Short, Comprehensive, and Personal Summary by Robert Read
Teach Yourself Programming in Ten Years by Peter Norvig
Teach Yourself Programming by Jeremy Thurgood
How to survive Computer Methods pracs by Jeremy Thurgood