Disclaimer: I want to be clear about my intent in this post. I am absolutely not arguing that programming, software engineering, testing, quantitation and other practical tasks or related fields are any less important than Computer Science. These closely related disciplines are every bit as important and useful as CS. Rather, what I’m trying to convey is that there exists, largely as a product of popular culture and the perpetuation of common misconceptions, a widespread misunderstanding of what Computer Science encompasses. In particular, people overlook the deeply theoretical core that forms the foundation of Computer Science, and that core cannot and should not be ignored. Further, although many students (including myself at the time) don’t fully appreciate this theory when they learn it, it is every bit as important as other, more practical considerations.
Nor do I mean to imply that areas of Computer Science outside computability theory and complexity theory are somehow not real CS, or are any less interesting or worthy of study. However, I do wish to point out that the work done by people in almost every branch of Computer Science consists partly of things that are “Computer Science proper” and partly of things that reside in a different domain. This doesn’t mean that these people aren’t real Computer Scientists, or that the things in a different domain are any less important. For example, I do research in Computational Biology. Part of what I do on a daily basis is Computer Science, part of it is software design and programming, part of it is Biology and a lot of it is Statistics or other branches of Mathematics. These other tasks are just as interesting, fulfilling and important as the CS I do; they just aren’t CS. In the following post, I don’t mean to offend anyone in CS or otherwise, but rather to help clarify some common public misperceptions of what CS actually is.
I was going to title this post “What is Computer Science?” However, the writeup attached to such a title would more likely be a book, or at least a manifesto, than a blog post. Instead, I figured that I could write a reasonably sized post (or a few) about what Computer Science (CS) isn’t: specifically, the things that people usually, but incorrectly, conflate with CS.
In my experience, the two most prevalent misconceptions involve two practices that people often conflate with Computer Science. The first (and this hits home if you are yourself a computer scientist with family members who aren’t) is “tech support.” In particular, people think that Computer Science is about learning how specific computers or pieces of software work and how to fix them when they break. Now, it’s probably true that the typical Computer Scientist has more luck installing a printer driver than the typical lay person, but that has nothing whatsoever to do with their formal training in Computer Science and everything to do with the fact that they (typically, but not always) spend a lot more time dealing with computers.
However, the conflation of tech support with CS, though very common, isn’t very interesting. The two are so different that the confusion should be easy to clear up, potentially with the help of some analogies. Here are two great quotes (often attributed, possibly incorrectly, to Edsger Dijkstra) that really cut to the heart of the matter:
“Computer science is not about machines, in the same way that astronomy is not about telescopes. There is an essential unity of mathematics and computer science” – Michael R. Fellows (1991), “Computer SCIENCE and Mathematics in the Elementary Schools”
“What would we like our children – the general public of the future – to learn about computer science in schools? We need to do away with the myth that computer science is about computers. Computer science is no more about computers than astronomy is about telescopes, biology is about microscopes or chemistry is about beakers and test tubes. Science is not about tools, it is about how we use them and what we find out when we do.” – Michael R. Fellows, Ian Parberry (1993), “SIGACT trying to get children excited about CS”, in: Computing Research News, January 1993.
Okay, so the take-away message is that Computer Science isn’t about computers or “tools” in general in the same way that astronomy (or astrophysics) isn’t about telescopes. You might expect that an astrophysicist knows more about a telescope than, say, a librarian, simply because they might use them substantially more frequently. However, the heart of what they do has nothing, whatsoever, to do with the telescope itself. The same is true of Computer Scientists and computers.
Another thing that people commonly conflate with CS, and one that requires a more subtle distinction, is computer programming (CP). This misconception is harder to rectify because the vast majority of computer scientists do program. In fact, many of them program a lot. Yet the practice of programming does not, by itself, constitute Computer Science. Let me motivate this distinction with an anecdote.
My advisor created a new class, which he is teaching this semester, called “Algorithms and Data Structures (for Scientists).” The title should make the aim fairly clear: the goal of the course is to teach scientists (i.e. technically competent and mathematically mature graduate students) how to understand, develop and analyze algorithms and data structures. As computational techniques become more prevalent in different scientific domains, the need for computational specialists will increase, but so too will the need for domain scientists to acquire a base level of understanding of the techniques they employ. So there was much interest in the class, and many people registered.
About three quarters of the way through the course, my friend (one of the TAs) and I were talking over lunch, and he explained to me that a number of the students were concerned or confused by the lack of programming assignments. The course syllabus was clear that the class itself would involve very little programming (~2 assignments) and would instead be heavy on written homework problems. It was also clear that the class is primarily about the design and analysis of algorithms: the goal is that by the end of the course, students should be able to design efficient algorithms for the problems they encounter, analyze (i.e. prove) the correctness of those algorithms and characterize their running times. Yet, despite the fact that the course content and goals were clearly described, one can still understand the students’ confusion. Amongst those who are not themselves Computer Scientists (e.g. scientists in other fields), the idea that Computer Scientists are programmers and that Computer Science is programming is rampant.
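As an illustration of what “characterize their running times” can look like on paper, consider merge sort (my own example, not necessarily one from the course). Assume that splitting a list of n elements and merging the two sorted halves costs at most cn for some constant c, and take n to be a power of two for simplicity. The running time then satisfies a recurrence that can be unrolled by hand:

```latex
\begin{aligned}
T(1) &= c, \qquad T(n) = 2\,T(n/2) + c\,n \\
T(n) &= 4\,T(n/4) + 2\,c\,n \\
     &= 2^{i}\,T(n/2^{i}) + i\,c\,n \\
     &= n\,T(1) + c\,n\log_2 n \qquad (\text{taking } i = \log_2 n) \\
     &= c\,n + c\,n\log_2 n = \Theta(n \log n).
\end{aligned}
```

No code is involved at any point; the bound follows from the recurrence alone.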
Addendum: So as not to misrepresent the class or my opinion, I want to clarify my description of the class above with the following: I’m not arguing that one shouldn’t implement the algorithms one learns in such a class. In fact, one should, and the students are encouraged to do so. What I’m arguing is that, because of a fundamental misconception about what “algorithm design and analysis” means, it’s not uncommon for people to sign up for a class expecting something different from what they get (and different from algorithm design and analysis). Implementing a shortest path algorithm and verifying that it computes shortest paths on a large number of test cases is great, and it will most likely aid your understanding of the algorithm. However, it doesn’t constitute a formal proof of correctness, and the proof of correctness, just like the practical implementation, is also important. That’s what I’m trying to get at here: not that the practical things are unimportant (they are, in fact, of the utmost importance), but that the theoretical things are not a silly, pedantic waste of time; they are fundamentally important in their own right (and worthy of study). Furthermore, the class does incorporate programming assignments; they are simply outnumbered, in both frequency and focus, by written homework assignments.
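To make the shortest-path example concrete, here is a small Python sketch of that kind of testing. It checks Dijkstra’s algorithm against Floyd–Warshall, which serves as a slow reference. These two algorithms, the function names and the random-graph parameters are my own choices for illustration, not anything from the course. Agreement on every random graph is evidence of correctness, not a proof. The tests cover only graphs with at most eight vertices and non-negative weights, and nothing in them rules out a failure on the next graph.

```python
import heapq
import random

INF = float("inf")


def dijkstra(n, edges, source):
    """Single-source shortest paths for non-negative edge weights.

    n: number of vertices, labelled 0..n-1
    edges: list of directed edges (u, v, w) with w >= 0
    """
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
    dist = [INF] * n
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: u was already settled with a shorter distance
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def floyd_warshall(n, edges):
    """All-pairs shortest paths by dynamic programming (slow reference)."""
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def random_test(trials=200, seed=0):
    """Compare both methods on random small graphs; return the number of mismatches."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        n = rng.randint(1, 8)
        edges = [(rng.randrange(n), rng.randrange(n), rng.randint(0, 10))
                 for _ in range(rng.randint(0, 20))]
        reference = floyd_warshall(n, edges)
        for s in range(n):
            if dijkstra(n, edges, s) != reference[s]:
                failures += 1
    return failures


# Zero mismatches is supporting evidence, not a proof of correctness.
print("mismatches:", random_test())
```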
Yet, apart from their connection to the theoretical basis of computation (e.g. the lambda calculus), real programming languages have as little to do with Computer Science as, again, telescopes have to do with astronomy. Programming languages are hugely important, and their speed, expressiveness and features make an immense practical difference in the construction of efficient and scalable software.
From the perspective of Computer Science proper, however, Haskell and BrainF*&k are equivalent: both are Turing complete languages capable of performing the same set of computations* (see below). More to the point, an algorithm is an object of design and study that is distinct from any particular implementation of it. This realization is the basis for understanding the difference between CP and CS. Computer Science is math; it is the study of what is computable and what is efficiently computable, and it is about the design and analysis of procedures that compute things efficiently. One can design an algorithm, prove its correctness and characterize its runtime without entering so much as a character into a source code file. All of those tasks fit squarely within the realm of Computer Science, yet none of them requires (or admits) one bit of programming. To make the designed algorithm useful, of course, one has to implement it and take into account not just theoretical but also practical considerations (e.g. Is the algorithm asymptotically efficient, or actually efficient? What design and engineering decisions need to be made so that the implementation is practical for real-world data?).
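Here is one example of a practical decision that the asymptotic analysis leaves open. The sketch below is a merge sort that hands small subranges to insertion sort. The cutoff value of 16 and all of the names are hypothetical choices of mine, and a real implementation would tune the cutoff by measurement. The switch does not change the Θ(n log n) bound. There are O(n / CUTOFF) small blocks, and each costs O(CUTOFF²), so they add only O(n · CUTOFF) work. Whether the switch actually makes the code faster depends on the machine and the data, which is an engineering question rather than an analytical one.

```python
import random

# Hypothetical tuning constant: ranges this small are sorted by insertion sort.
# 16 is an assumed value; in practice it would be chosen by benchmarking.
CUTOFF = 16


def insertion_sort(a, lo, hi):
    """Sort a[lo:hi] in place (quadratic in hi - lo, but cheap for tiny ranges)."""
    for i in range(lo + 1, hi):
        x = a[i]
        j = i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]  # shift larger elements one slot to the right
            j -= 1
        a[j + 1] = x


def hybrid_merge_sort(a, lo=0, hi=None):
    """Sort a[lo:hi] in place: merge sort on large ranges, insertion sort on small ones."""
    if hi is None:
        hi = len(a)
    if hi - lo <= CUTOFF:
        insertion_sort(a, lo, hi)
        return
    mid = (lo + hi) // 2
    hybrid_merge_sort(a, lo, mid)
    hybrid_merge_sort(a, mid, hi)
    # Merge the two sorted halves through a temporary list.
    merged = []
    i, j = lo, mid
    while i < mid and j < hi:
        if a[i] <= a[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(a[j])
            j += 1
    merged.extend(a[i:mid])
    merged.extend(a[j:hi])
    a[lo:hi] = merged


data = [random.randint(0, 1000) for _ in range(1000)]
expected = sorted(data)
hybrid_merge_sort(data)
assert data == expected
```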
So, while the programming and implementation that follow are of the utmost importance, and while they constitute due diligence for most CS research, these tasks are not themselves Computer Science proper. Perhaps we can draw another analogy here as well. The design and analysis of an algorithm (CS) might be akin to the discovery of Bernoulli’s principle (putting aside the momentous but philosophical issues regarding invention vs. discovery), while the engineering and efficient implementation of the algorithm (CP) might be akin to the construction of an airfoil that operates based on that principle. It’s not a great analogy (certainly not as good as the one involving the telescope), but it does convey the substantial difference between the related but distinct disciplines of Computer Science and Computer Programming.
Unfortunately, I don’t think that this confusion is going to go away anytime soon, as Computer Science programs themselves continue to mix CS and CP more heavily in their curricula. It’s not that this mixing is bad, per se. In fact, without the ability to design and construct software (a CP task), graduating CS majors would be much less employable than they currently are, though perhaps many more of them would go on to do theoretical work in graduate school. However, this mixing further blurs the distinction between the two disciplines, and it makes it even more difficult to explain to people in other fields what Computer Science is and what it’s about (i.e. that it’s not about Python, or Matlab, or even programming in general). I don’t know how to solve this problem, but I do know that it exists.
*As gasche pointed out in the Reddit post linking to this blog entry, the original sentence doesn’t convey my actual intent. While Haskell and BrainF*&k are equivalent from a computability theory perspective, since both represent a Turing complete model of computation, there are real theoretical differences between them that can’t be swept under the rug. Note, however, that given a BrainF*&k interpreter, one could write a Haskell compiler (the other way around can also be done, and it has been). There is also, obviously, a world of practical difference between them, but that was the point of raising these two examples in the first place. What I really wanted to get at with the difference between an algorithm and its implementation is what is commonly captured by Landau (big-O) notation, and the core idea is put rather well by gasche’s comment on Reddit:
This conveys the idea that even though a given program may be consistently running ten time faster on this different machine or with this different compiler, we decided to abstract those details out and look at the way the performance evolve on large inputs, and this allows to draw conclusion that resist rapid advance in machine power or other technologies (but, of course, sometimes you want to do finer analyses than that, doing less approximations in a still scientific manner, as you do when work e.g. on cache-oblivious algorithms with a refined abstract machine model).
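One way to see this abstraction at work is to count basic steps instead of measuring seconds. The Python sketch below counts comparisons for linear search and binary search on a sorted list. The function names and the use of comparisons as the unit of cost are my own assumptions. Each search looks for a value that is absent, so both procedures hit their worst case. The counts come out the same on any machine. A faster computer changes the seconds per step by roughly a constant factor, but it does not change how the counts grow: n for the scan versus roughly log2 n for binary search.

```python
def linear_search_steps(items, target):
    """Return (found, number of comparisons) for a left-to-right scan."""
    steps = 0
    for x in items:
        steps += 1
        if x == target:
            return True, steps
    return False, steps


def binary_search_steps(items, target):
    """Return (found, number of loop iterations) for binary search on sorted items."""
    lo, hi = 0, len(items)
    steps = 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            return True, steps
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return False, steps


for n in (10, 1_000, 100_000):
    data = list(range(n))
    # -1 is never present, which forces the worst case for both procedures.
    _, lin = linear_search_steps(data, -1)
    _, bs = binary_search_steps(data, -1)
    print(f"n={n:>7}: linear={lin:>7} binary={bs:>3}")
```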
So the argument is that an algorithm describes a procedure for carrying out some calculation that is largely independent of any particular language (formal semantics) and is applicable in a wide range of models of computation. The design and analysis of the algorithm deal directly with questions about the computability, and the efficient (in the asymptotic sense) computability, of the solution. Obviously, to apply the algorithm to real-world problems, a specific implementation and the quality of that implementation are of the utmost importance. However, the design and analysis of the algorithm can be carried out independently of any particular implementation. While theory and practice are often studied together and act synergistically, they frequently have distinct goals and sometimes answer distinct questions.
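As a small illustration of one algorithm existing apart from any single implementation, here are two Python renderings of Euclid’s algorithm, one recursive and one iterative. Euclid’s algorithm is my own example, not one from the discussion above. The same correctness argument covers both versions. It rests on two facts: gcd(a, b) = gcd(b, a mod b) with gcd(a, 0) = a, and the second argument strictly decreases, so the procedure terminates. The comparison against Python’s standard `math.gcd` is, like any test, only supporting evidence.

```python
import math
import random


def gcd_recursive(a, b):
    """Euclid's algorithm as a direct transcription of the recurrence
    gcd(a, b) = gcd(b, a mod b), with gcd(a, 0) = a (for a, b >= 0)."""
    return a if b == 0 else gcd_recursive(b, a % b)


def gcd_iterative(a, b):
    """The same algorithm, with the recursion replaced by a loop."""
    while b != 0:
        a, b = b, a % b
    return a


rng = random.Random(1)
for _ in range(1000):
    a, b = rng.randint(0, 10**6), rng.randint(0, 10**6)
    assert gcd_recursive(a, b) == gcd_iterative(a, b) == math.gcd(a, b)
print("all implementations agree")
```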