Delphi, Java and Visual Basic! Oh my!

Yesterday, MyBroadband published an editorial piece regarding the Department of Basic Education’s choice of programming languages for the high school curriculum (and for the coders who come here, it’s well worth a read).  Specifically, the direction seems to be Delphi — this resulted in a storm of comments describing Delphi as “outdated”, “obsolete”, “antiquated” and similar.  While those are perfectly valid points, there is something important that those commentators have overlooked.

Delphi is an object-oriented derivative of Pascal, a language created by Niklaus Wirth in the late 1960s, and what a lot of people don’t realize is that Pascal (and hence, by extension, Delphi) was primarily created as a language to teach students structured programming.  The language lacks features that would make it useful in a commercial/production environment (I certainly wouldn’t use it to pay the bills!), but it’s just fine for teaching basic concepts — perhaps not the absolute best choice (as I’ll go into a little later), but a solid choice nonetheless.  True, it’s not what’s being used out there in the Real World, but as the DBE (correctly, in my opinion) puts it, their aim is not vocational training, but “to lay a solid foundation to enable a learner to pursue further education at [a higher education institution] in the IT field”.  Delphi, Pascal, and several other languages do just that.

This is something that I can most definitely attest to from personal experience.  I made my first foray into programming with Turbo Pascal, when my age was still in single figures, and it was the language I used when I completed high school in 2002.  (We were the last class to use Turbo Pascal, though; the 2003 class were on Delphi.)  I have neither seen nor written a line of Pascal code since, but the concepts it taught served me well when I moved on to “Real World” languages (C, C++, C#, Java, PHP and plenty of others).  A few years later, when working as an instructor at a private college, I noticed a distinct pattern: the people who had those concepts instilled in them in high school (mainly Pascal and Delphi folks filtering through) generally handled the subject matter satisfactorily, whereas the people who hadn’t were jumping straight into C#, Java and Visual Basic, and finding themselves well out of their depth.

The last sentence above is worthy of further elaboration and dissection, as a lot of people over on the MyBroadband thread believe Java to be a worthy first language.  I strongly disagree, and I’m not the only one.  In January 2008, Dr. Robert B.K. Dewar and Dr. Edmond Schonberg published in the Journal of Defense Software Engineering a piece entitled “Computer Science Education: Where Are the Software Engineers of Tomorrow?” (freely downloadable as a PDF here), in which Java comes in for a particularly savage mauling (search the paper for “The Pitfalls of Java as a First Programming Language”).  As they brutally put it, Java “encourages the [first time] programmer to approach problem-solving like a plumber in a hardware store: by rummaging through a multitude of drawers (i.e. packages) we will end up finding some gadget (i.e. class) that does roughly what we want”.  There’s also a lot of boilerplate code that one has to write in Java around a simple “Hello World!” program: a few folks over on the MyBroadband thread lamented the fact that they had to parrot-learn “public static void main()” without understanding what “public”, “static” and “void” did and, more importantly, why they were there at all.  That’s perfectly fine if you already have the concepts and are using the language in a production environment.  Not so fine when you’re learning how to program for the first time.

Eric S. Raymond, in his “How To Become A Hacker” essay, makes a point that I find very hard to disagree with:

There is perhaps a more general point here. If a language does too much for you, it may be simultaneously a good tool for production and a bad one for learning. It’s not only languages that have this problem; web application frameworks like RubyOnRails, CakePHP, Django may make it too easy to reach a superficial sort of understanding that will leave you without resources when you have to tackle a hard problem, or even just debug the solution to an easy one.

Having said that, however, I have some concerns about the Department of Basic Education’s approach.  From the MyBroadband article, it looks like the curriculum will be primarily based on using wizards; I may be a bit old-school, but this approach makes me uncomfortable.  To me, it’s just a different type of boilerplate (a different iteration of “public static void main()”, in a way) — great for production, where time is a factor, but for learning and educational purposes you want people to know (0) what the wizard is doing, and (1) why it’s doing it.  Nothing that I read in the original article gives me any confidence that pupils will be taught this.

Finally, while I consider Pascal/Delphi good teaching languages, I don’t consider them to be the best.  That accolade, for me, goes to Python.  From a beginner’s point of view, it’s cleanly designed, well documented and, compared to a lot of other languages out there, relatively kind to newcomers — and yet the language itself is powerful, flexible and scales to far larger projects.  Moreover, the language is free (both free as in freedom and free as in beer), which was one of the original requirements of the Department of Basic Education but which seems to have been kicked to the curb at some point.  For those interested, ESR has written a detailed critique of Python, and the Python website itself has some very good tutorials.
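To illustrate the contrast with the “public static void main()” incantation discussed earlier, here’s roughly what a complete first Python program can look like.  This is purely an illustrative sketch of my own, not something taken from any curriculum:

```python
# A complete, runnable beginner program: no class declaration, no
# "public static void main()" to parrot-learn before the concepts make sense.
print("Hello, world!")

# And a first taste of structured programming: a loop and a decision.
for number in range(1, 11):
    if number % 2 == 0:
        print(number, "is even")
    else:
        print(number, "is odd")
```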

Download All The Things, Round II

Those of you who have been reading this blog for a while may recall Download All The Things!, where I investigated the feasibility of downloading the entire Internet (lolcats included, of course).  I’ve decided to revisit this, but with one small (or not so small) difference: a revised estimate of the size of the Internet.

For Round II, I’m going with one yottabyte (or “yobibyte” to keep the SI religious happy, since I’m using 1024-based units).  This is a massive amount of data: 1024 to the power 8 (or 2 to the power 80) bytes (and no, I’m not typing the full figure out on account of word-wrapping weirdness); it’s just short of 70,000 times the size of my original 15 EB estimate.  To give a more layman-friendly example: you know those 1 terabyte external hard drives that you can pick up at reasonable prices from just about any computer store these days?  Well, one yottabyte is equivalent to roughly one trillion of those drives.  A yottabyte is so large that no-one has yet coined a term for the next order of magnitude.  (Suggestion for those wanting to do so: please go all Calvin and Hobbes on us and call 1024 yottabytes a “gazillabyte”!)
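For anyone who wants to check my maths, here’s a quick back-of-the-envelope sketch in Python, sticking to the binary (1024-based) units used above:

```python
# Back-of-the-envelope figures for one yottabyte, using binary units throughout.
ONE_YB = 1024 ** 8   # 2**80 bytes
ONE_EB = 1024 ** 6   # 2**60 bytes
ONE_TB = 1024 ** 4   # 2**40 bytes -- one "1 TB" external drive, give or take

print(f"1 YB = {ONE_YB:,} bytes")
print(f"Times larger than the original 15 EB estimate: {ONE_YB / (15 * ONE_EB):,.0f}")  # ~69,905
print(f"1 TB drives needed to hold it: {ONE_YB / ONE_TB:,.0f}")                         # ~1.1 trillion
```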

There are two reasons why I wanted to do this:

  • Since writing the original post, I’ve suspected that my initial estimate of 15 EB (later revised to 50 EB) may have been way, way too small.
  • In March 2012, it was reported that the NSA was planning on constructing a facility in Utah capable of storing/processing data in the yottabyte range.  Since Edward Snowden’s revelations regarding NSA shenanigans, it’s a good figure to investigate for purposes of tin foil hat purchases.

Needless to say, changing the estimated size of the internet has a massive effect on the results.

You’re not going to download 1 YB via conventional means.  Not via ADSL, not via WACS, not via the combined capacity of every undersea cable.  (It would take you tens of thousands of years (roughly 60,000, in fact) to download 1 YB even at the full 5.12 Tbps design capacity of WACS.)  This means that, this time around, we’re going to have to go with something far more exotic.
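Here’s where that figure comes from: a quick sanity check, again assuming binary units and the cable’s full design capacity with zero overhead:

```python
# How long one yottabyte would take over WACS's full 5.12 Tbps design capacity.
ONE_YB_BITS = (1024 ** 8) * 8      # 2**80 bytes, 8 bits each
WACS_BPS = 5.12e12                 # design capacity, bits per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600

years = ONE_YB_BITS / WACS_BPS / SECONDS_PER_YEAR
print(f"{years:,.0f} years")       # roughly 60,000 years
```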

What would work is Internet Protocol over Avian Carriers — and yes, this is exactly what you think it is.  However, the avian carriers described in RFC 1149 won’t quite cut it, so we’ll need to submit a new RFC which includes a physical server in the definition of “data packet” and a Boeing 747 freighter in the definition of “avian carrier”.  While this is getting debated and approved by the IETF, and we sort out the logistical requirements around said freighter fleet, we can get going on constructing a data centre for the entire internet.

As for the data centre requirements, we can use the NSA’s Utah DC as a baseline once more.  The blueprints for the data centre indicate that around 100,000 square feet of the facility will be for housing the data, with the remainder being used for cooling, power, and making sure that we mere mortals can’t get our prying eyes on their prying eyes.  Problem is, once the blueprints were revealed/leaked/whatever, we realised that such a data centre would likely only be able to hold a volume of data in the exabyte range.

TechCrunch told us just how far out the yottabyte estimate was:

How far off were the estimates that we were fed before? Taking an unkind view of the yottabyte idea, let’s presume that it was the implication that the center could hold the lowest number of yottabytes possible to be plural: 2. The smaller, and likely most reasonable, claim of 3 exabytes of storage at the center is directly comparable.

Now, let’s dig into the math a bit and see just how far off early estimates were. Stacked side by side, it would take 666,666 3-exabyte units of storage to equal 2 yottabytes. That’s because a yottabyte is 1,000 zettabytes, each of which contain 1,000 exabytes. So, a yottabyte is 1 million exabytes. The ratio of 2:3 in our example of yottabytes and exabytes is applied, and we wrap with a 666,666:1 ratio.

I highlight that fact, as the idea that the Utah data center might hold yottabytes has been bandied about as if it was logical. It’s not, given the space available for servers and the like.

Yup, we’re going to need to build a whole lot of data centres.  I vote for building them up in Upington, because (1) there’s practically nothing there, and (2) the place conveniently has a 747-capable runway.  Power is going to be an issue though: each data centre is estimated to use 65 MW of power.  Multiply this by 666,666, and… yeah, this is going to be a bit of a problem.  Over 43 terawatts are required here, and when one considers that xkcd’s indestructible hair dryer was “impossibly” consuming more power than every other electrical device on the planet combined when it hit 18.7 TW, we’re going to have to think outside of the box.  (Pun intended for those who have read the indestructible hair dryer article.)
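The sum itself is short and depressing (using the reported 65 MW estimate for the Utah facility):

```python
# Total power needed to run 666,666 Utah-sized data centres.
DATA_CENTRES = 666_666
MW_PER_CENTRE = 65                                       # reported estimate for the Utah facility

total_tw = DATA_CENTRES * MW_PER_CENTRE / 1_000_000      # MW -> TW
print(f"{total_tw:.1f} TW")                              # ~43.3 TW, vs the hair dryer's 18.7 TW
```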

Or not… because the internet already exists and is stored somewhere today without anything like 43 TW of data-centre power behind it, which means that our estimate of one yottabyte for the size of the internet is way too high.  So, we can do this in phases: build 10,000-50,000 or so data centres, fill them up, power them up, then rinse and repeat until we’ve got the entire Internet.  You’ll need every construction crew in the world working around the clock to build the data centres and power stations, and every electrical engineer in the world working on re-routing power from elsewhere — especially when one considers that, thanks to advancements in technology (sometimes Moore’s Law is not in our favour), the size of the Internet will keep increasing while we’re doing this.  But it might just about be possible.

That said: even given the practical impossibility of the task, don’t underestimate the NSA.

Or, for that matter, Eskom:

When is that extra 0.9% important?

The question in the post title was asked on Quora recently (in the form of “when is the difference between 99% accuracy and 99.9% accuracy important”), and while it mainly attracted some stock-standard responses, such as service level agreements, Alex Suchman told us when it’s really important: when it can stop a zombie apocalypse.

It’s 2020, and every movie buff and video gamer’s worst fear has become reality. A zombie outbreak, originating in the depths of the Amazon but quickly spreading to the rest of the world (thanks a lot, globalization) threatens the continued existence of the human race. The epidemic has become so widespread that population experts estimate one in every five hundred humans has been zombified.

The zombie infection (dubbed “Mad Human Disease” by the media) spreads through the air, meaning that anyone could succumb to it at any moment. The good news is that there’s a three day asymptomatic incubation period before the host becomes a zombie. A special task force made of the best doctors from around the world has developed a drug that cures Mad Human, but it must be administered in the 72-hour window. Giving the cure to a healthy human causes a number of harmful side effects and can even result in death. No test currently exists to determine whether a person has the infection. Without this, the cure is useless.

As a scientist hoping to do good for the world, you decide to tackle this problem. After two weeks of non-stop lab work, you stumble upon a promising discovery that might become the test the world needs.

Scenario One: The End of Mankind

Clinical trials indicate that your test is 99% accurate (for both true positives and true negatives). Remembering your college statistics course, you run the numbers and determine that someone testing positively will have Mad Human only 16.6% of the time [1]. Curse you, Thomas Bayes! You can’t justify subjecting 5 people to the negative effects of the cure in order to save one zombie, so your discovery is completely useless.

With its spread left unchecked, Mad Human claims more and more victims. The zombies have started taking entire cities, and the infection finally reaches Britain, the world’s last uncontaminated region. Small tribal groups survive by leaving civilization altogether, but it becomes clear that thousands of years of progress are coming undone. After the rest of your family succumbs to Mad Human, you try living in isolation in the hope that you can avoid the epidemic. But by this point, nowhere is safe, and a few months later you join the ranks of the undead. In 2023, the last human (who was mysteriously immune to Mad Human) dies of starvation.

Scenario Two: The Savior

Clinical trials indicate that your test is 99.9% accurate. Remembering Bayes’ Theorem from your college statistics course, you run the numbers and determine that someone testing positively will have Mad Human 66.7% of the time [2]. This isn’t ideal, but it’s workable and can help slow the zombies’ spread.

Pharmaceutical companies around the world dedicate all of their resources to producing your test and the accompanying cure. This buys world leaders precious time to develop a way to fight back against the zombies. Four months after the release of your test, the U.S. military announces the development of a new chemical weapon that decomposes zombies without harming living beings. They fill Earth’s atmosphere with the special gas for a tense 24-hour period remembered as The Extermination. The operation is successful, and the human race has been saved!

Following the War of the Dead, you gain recognition as one of the greatest scientific heroes in history. You go on to win a double Nobel Prize in Medicine and Peace. Morgan Freeman narrates a documentary about your heroics called 99.9, which sweeps the Academy Awards. Your TED Talk becomes the most-watched video ever (yeah, even more than Gangnam Style). You transition into a role as a thought leader, and every great innovator of the next century cites you as an influence.

Life is good.

That is when the difference between 99% and 99.9% matters.

[1] A 99% accurate test doesn’t mean that someone who tests positive has a 99% chance of actually being positive. Because the event of having the infection is so relatively rare (only 1 in 500) and the event of not having the disease is so common (499 in 500), even though the test is rarely wrong, it turns out to be more likely that a positive test comes from a healthy person than a sick one. To compute this we use Bayes’ Theorem, which states that

P(A|B) = \frac{P(B|A)P(A)}{P(B)}

We let A be the event that the person is sick and B be the event that the person tests positive, so we have

P(sick|+ test) = \frac{P(+ test|sick)P(sick)}{P(+ test)}
In this situation,

P(+ test|sick) = .99
P(sick) = .002 (that’s 1 in 500)

To compute P(+ test) we have to condition on whether the person is sick or not. So

P(+ test) = P(+|sick)P(sick) + P(+|not)P(not) = (.99)(.002) + (.01)(.998) = 0.01196

Plug everything in and we get

P(sick|+ test) = \frac{.99*.002}{0.01196} = 0.16555

[2] This time,

P(+ test|sick) = .999
P(+ test) = (.999)(.002) + (.001)(.998) = 0.002996

P(sick|+ test) = \frac{.999*.002}{0.002996} = 0.66689
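If you’d like to check the footnote arithmetic yourself, here is a minimal Python sketch of the same Bayes’ Theorem calculation (the function name and defaults are my own, chosen for illustration):

```python
# Reproducing the two footnote calculations with Bayes' Theorem.
def p_sick_given_positive(accuracy, prevalence=1 / 500):
    """P(sick | + test) for a test whose true-positive and true-negative
    rates both equal 'accuracy', with the given infection prevalence."""
    p_positive = accuracy * prevalence + (1 - accuracy) * (1 - prevalence)
    return accuracy * prevalence / p_positive

print(f"99%   accurate: {p_sick_given_positive(0.99):.1%}")   # ~16.6%, Scenario One
print(f"99.9% accurate: {p_sick_given_positive(0.999):.1%}")  # ~66.7%, Scenario Two
```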

Perpetual motion

Anyone who claims perpetual motion to be impossible has obviously never encountered an argument on the internet

Considering that perpetual motion is defined as “motion that continues indefinitely without any external source of energy; impossible in practice because of friction”, I think we’ve finally found something that meets the definition without being restricted in any way by that impossibility.

Download all the things!

One of my friends over on my crappy little forum recently received the following support ticket (and, quite understandably, facepalmed):

Can you please download internet on my system?

Rather than partake in some sympathetic facepalming of my own, I thought I’d come up with a quite literal answer, in xkcd’s “What If?” style.*

The first question we have to answer is: what is the size of the Internet?  Any answer will be an estimate at best (and wild speculation at worst), because the cold, hard truth is that no-one knows.  That’s because of the distributed nature of the Internet (as well as the underlying TCP/IP protocol suite that the Internet is built on) — with quite possibly millions of servers connected across the world, it’s hard to measure anything for sure.  The other problem: what would count towards the size?  Content served over HTTP/HTTPS certainly would, but what about FTP? SMTP? NNTP? Peer-to-peer filesharing?  And would content that is only accessible indirectly (such as data stored in a backend database) count?

The only thing we have to go on is an estimate that Eric Schmidt, Google’s executive chairman, made back in 2005, when he put the size of the Internet at around five million terabytes (which I’ll round up to 5 exabytes).  At the time, Google only indexed 200 terabytes of data, so Schmidt’s estimate probably took e-mail, newsgroups, etc. into consideration.  With our world having become far more connected in the intervening 8 years, that figure has likely shot up, particularly with sites such as YouTube, Facebook, Netflix, The Pirate Bay et al coming into the equation.  I’m going to throw a rough guesstimate together and put the figure at 15 EB today, based on my gut feeling alone.  (Yes, I know it’s not terribly scientific, and I’ve probably shot way too low here, but let’s face it — what else is there to go on?)

Currently, down here on the southern tip of Africa, our fastest broadband connection is 10 Mbps ADSL.  In reality, our ISPs would throttle the connection into oblivion if one were to continually hammer their networks trying to download the Internet like that (contention ratios causing quality of service for everyone else to be affected and all of that), but let’s assume that, for the purposes of this exercise, we can sweet-talk them into giving us guaranteed 10 Mbps throughput.  15 EB of data works out to a staggering 138,350,580,552,821,637,120 bits of data, and given that we can download 10,000,000 of those bits every second (in reality, it will be lower than this due to network overhead, but let’s leave this out of the equation), it would take almost 440,000 years to download the Internet over that connection.
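For the sceptical, here is the back-of-the-envelope version of that calculation in Python (binary exabytes, a year of 365.25 days, and no allowance for overhead):

```python
# 15 EB (binary) over a guaranteed 10 Mbps ADSL line.
BITS = 15 * (1024 ** 6) * 8            # 138,350,580,552,821,637,120 bits
ADSL_BPS = 10_000_000                  # 10 Mbps, ignoring network overhead
SECONDS_PER_YEAR = 365.25 * 24 * 3600

print(f"{BITS / ADSL_BPS / SECONDS_PER_YEAR:,.0f} years")   # just short of 440,000 years
```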

But actually, with that length of time, you’d never be able to download the Internet.  Considering that the Internet went into widespread public use in the early 1990s (not considering the decades before when the Internet was pretty much a research plaything), the Internet is growing at a faster rate than one can download it using a 10 Mbps connection.  Plus, given the timeframe involved, the constant status update requests on the support ticket would drive all involved to suicide, even if (actually, particularly if) we discover a way of making human immortality a possibility in the interim.  Clearly, we need something a lot faster.

Enter the WACS cable system.  It’s a submarine cable that links us up to Europe via the west coast of Africa, cost US$650 million to construct, and has a design capacity of 5.12 Tbps.  If we could secure the entire bandwidth of this cable to download the Internet, we could do it in a little over 10 months.  While we may still have the aforementioned suicide problem, this is far more like it.
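The same sum, this time at WACS’s full design capacity (again, overhead conveniently ignored):

```python
# The same 15 EB over WACS's 5.12 Tbps design capacity.
BITS = 15 * (1024 ** 6) * 8
WACS_BPS = 5.12e12

days = BITS / WACS_BPS / 86_400
print(f"{days:.0f} days, or roughly {days / 30.44:.1f} months")   # ~313 days, a little over 10 months
```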

But of course, what point would we have downloading the Internet if we can’t store the data we just downloaded?

Currently, the highest capacity hard drives have a capacity of 4 TB (here’s an enterprise-level example from Western Digital).  We’d need a minimum of 3,932,160 such drives to store the Internet (in the real world, we’d need more for redundancy, but once again, let’s not worry about that here).  Our enterprise-level drives use 11.5 watts of power each, so we’d need ~45 MW of power to simply power the hard drives alone; we’d need plenty more (and I’m thinking around 10 to 15 times more!) to power the hardware to connect all of this up, the building where this giant supercomputer will be housed, and the cooling equipment to keep everything running at an acceptable temperature.  We’d need to build a small power plant to keep everything running.
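And the storage and power side of things, treating a “4 TB” drive as 4 TiB so the units stay consistent with the 15 EB figure above:

```python
# Drive count and power draw for storing 15 EB on 4 TB enterprise drives.
ONE_EB = 1024 ** 6
DRIVE_BYTES = 4 * (1024 ** 4)          # "4 TB" treated as 4 TiB
WATTS_PER_DRIVE = 11.5                 # quoted figure for the enterprise drive above

drives = 15 * ONE_EB / DRIVE_BYTES
print(f"{drives:,.0f} drives")                           # 3,932,160
print(f"{drives * WATTS_PER_DRIVE / 1e6:.1f} MW")        # ~45 MW for the drives alone
```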

So yes, you can download the Internet.  You just need a major submarine communications cable, tens of millions of hard drives, and a small power plant to provide enough electricity to run it all.  If you get started now, you can give someone the present of One Internet** when next Christmas rolls around.  The question of dealing with bandwidth and electricity bills is one that I will leave to the reader.

Now get going, dammit!

* Randall, if you’ve somehow stumbled upon this and you think you could do a better job than myself, go for it!

** Though, depending on who the recipient is, you may or may not want to include 4chan’s /b/ board.

UPDATE #1: I was asked to up it to 50 EB, which in retrospect may be a more realistic size for the Internet than the 15 EB I put forward earlier.  That would take almost 3 years to download over WACS and would require 13,107,200 hard drives, with a significantly increased power requirement.  The Koeberg Nuclear Power Station (not too far away from the WACS landing site at Yzerfontein) has two reactors, each capable of producing 900 MW, so if we take Koeberg off the national grid (which will cause the rest of the country to experience rolling blackouts, but hey, it’s in the name of progress!) and use the entire nuke plant’s capacity to power our supercomputer and related infrastructure, that should just about do it.
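For completeness, the revised numbers, using the same binary-unit assumptions as before:

```python
# The revised 50 EB estimate: WACS download time and drive count.
BITS = 50 * (1024 ** 6) * 8
WACS_BPS = 5.12e12
SECONDS_PER_YEAR = 365.25 * 24 * 3600

print(f"{BITS / WACS_BPS / SECONDS_PER_YEAR:.1f} years")      # ~2.9 years
print(f"{50 * (1024 ** 6) // (4 * 1024 ** 4):,} drives")      # 13,107,200
```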

Real life questing

With a public holiday yesterday, and with an old friend from Durban finally coming down to visit, I decided to take some people on a tour through the Winelands.  Also joining me were my two future housemates and a new friend of mine, a visiting master’s student from Canada (the only non-nerd in the group).

Of course, in true nerd style (much to our non-nerd’s bemusement), we did this in the style of a World of Warcraft quest chain:

  1. Assemble a party of fellow questers.  Your fellow party members may be found in Mowbray, Sunningdale and Gordon’s Bay.
  2. Journey to the Boschendal Wine Estate and acquire 1 Bottle of Fine Red Wine.  Completing this quest requires 50 gold.
  3. Journey to Fairview and acquire 2 Cheese Platters and 1 Loaf of Freshly-Baked Bread.  Completing this quest requires 100 gold.
  4. Prepare a Banquet of the Winelands to feed your party.  A Banquet of the Winelands may only be prepared at the Afrikaanse Taalmonument.  Party members that spend at least 10 minutes eating and drinking will be Well Fed and will receive the buff “Scribble Big Bang Theory Quotes on Ron’s Car” for 6 hours. (My fault for not washing it!)

I also took everyone to Nederburg and through the Huguenot Tunnel afterwards before the group disbanded.  Muchness of fun.

Lunar landings, 25 km away

NASA’s Lunar Reconnaissance Orbiter (which is on a mission to, amongst other things, map the moon’s surface as a precursor to future manned missions) was recently moved into an orbit 25 km above the moon’s surface.  One of the things the orbiter did while in that orbit was take the following image of the Apollo 17 landing site:

Apollo 17 landing site

More information (and more images) can be found in the NASA press release here.

SVN breakages: Wrath of the Sith!

We’ve previously had some issues within our department with the misuse of SVN.  In particular, some folks were merging changes from their development branches into trunk, then forgetting to commit the merged trunk working copy – which would result in hell breaking loose a bit further down the line.

In an attempt to sort this problem out, we set up a war board system that monitors the trunk SVN repositories for any uncommitted stuff.  When it finds anything, it starts flashing alerts, and the developer responsible then has to drop whatever he’s doing and fix it.  It’s worked well.

Until today, when we rigged it to additionally play the Star Wars Imperial March…