Mathematical intuition

Note: this post is a translation/adaptation of a post I wrote in French a few years ago: Intuition mathématique.

I initially wrote that post as a “warning” to my French-reading readers, saying “I might not manage to avoid annoying language tics such as ‘intuitively’, ‘obviously’, ‘it’s easy to see that’”. I think I’ve gotten slightly better at that since then (or at least slightly better at noticing it and correcting it), but it’s still probably something I do.

I do try to avoid the “intuitive argument” when I explain something, because it used to make me quite anxious when I was on the receiving end of it and I had a hard time understanding why it was intuitive. But still, it does happen – it happened to me in an exam once, to fail at explaining “why it’s intuitive”: something had felt so brightly obvious that I had forgotten why it was obvious in the first place. It’s quite annoying when someone points it out to you… especially when the “someone” is the examiner.

One of the most interesting articles I read on the topic was from Terry Tao, There’s more to mathematics than rigour and proofs. He distinguishes between three “stages” in math education:

  • the “pre-rigorous” stage – you make a lot of approximations and analogies, and you probably spend more time computing than theorizing
  • the “rigorous” stage – you try to do things properly, in a precise and formal way, and you work with abstract objects without necessarily having a deep understanding of what they mean (but you do work with them properly)
  • the “post-rigorous” stage – you know enough of your domain to know which approximations and which analogies are indeed valid, you get a good idea fairly quickly of what something is going to yield once the proof/computation is done, and you actually understand the concepts you work with.

Six years ago, when I wrote the original version of this post, I considered myself a mix of the three. I did start to get some “intuition” (and I’ll explain later what I meant by that), but I was still pretty bad at finishing a complex computation properly. And, obviously, it did (and still does) depend on the domain: in “my” domain, I was approximately there; if you ask me right now to solve a differential equation or to work on complicated analysis, I’m probably going to feel very pre-rigorous. I’ve been out of university for a few years now, and there are definitely some things that have regressed, that used to be post-rigorous and are now barely rigorous anymore (or that require a lot more effort to do properly, let’s say). One of the nice things that stayed, though, is that I believe I’m far less prone to confusing “pre-rigorous” and “post-rigorous” than I used to be (a confusion which got me qualified as “sloppy” on more than one occasion).

Anyway, I believe the categories are more fluid than what Tao describes, but I also believe he’s right. And there’s no need to panic when the person in front of you says “it’s obvious/intuitive that”: she probably just has more experience than you do. And it’s sometimes hard to explain what became, with experience, obvious. If I say “it’s obvious that 2 + 3 = 5”, we’ll probably agree on it; if I ask “yeah, but why does 2 + 3 = 5?”, I’ll probably get an answer, but I may also get some blank stares for a few seconds. It’s a bit of a caricature, but I think it’s roughly the idea.

In everyday language, intuition is somewhat magical, a bit of “I don’t know why, but I believe that things are this way or that way, and that this or that is going to happen”. I tend to be very wary of intuition in everyday life, because I tend to be wary of what I don’t understand. In maths, the definition is roughly the same: “I think the result is going to look like that, and I feel that if I finish the proof it’s going to work”. The main advantage of mathematical intuition is that you can check it, and understand why it’s correct or why it’s wrong. In my experience, (correct) intuition (or whatever you put behind that word) comes with practice, with having seen a lot of things, with linking things to one another. I believe that what people put behind “intuition” is often linking something new (a new problem) to something you’ve already seen, and doing so correctly. It’s also a matter of pattern matching. When it comes to algorithm complexity, which is the topic of the next two posts that I’ll translate, it’s “I believe this algorithm is going to take that amount of time, because it’s basically the same thing as this other super-well-known thing that takes that amount of time”. The thing is, the associations are not always fully conscious – you may end up seeing the result first and explaining it later.

It doesn’t mean that you can never be awfully wrong. Thinking that a problem is going to take a lot of time to solve (algorithmically speaking), when there is a condition that makes it much easier. Thinking that a problem is not very hard, and realizing it’s going to take way more time than you thought. It still happens to me, in both directions. I believe that making mistakes is actually better than the previous stage, which was “but how can you even have an idea on the topic?”

I don’t know how this intuition eventually develops. I know mine was much better after a few years at ETH than it was at the beginning. I also know it’s somewhat worse right now than when I had just graduated. It’s still in enough of a flux that I still get amazed that some things are now completely natural to me whereas they were a complete unknown a few years ago. Conversely, it’s sometimes pretty hard to realize that you once knew how to do things, and forgot, for lack of practice (that’s a bit of my current state with my fractal generator).

But it’s also (still) new enough that I do remember the anxiety of not understanding how things can be intuitive for some people. So, I promise: I’m going to try to avoid saying things are intuitive or obvious. But I’d like you to tell me if I err in my ways, especially if it’s a problem for you. Also, there’s a fair chance that if I say “it’s intuitive”, it’s because I’m too lazy to get into explanations that seem obvious to me (but that may not be for everyone else). So, there: I’ll try to not be lazy 🙂

(Ironically: I did hesitate to translate this blog post from French because it seemed like I was only saying completely obvious things in it and that it wasn’t very interesting. My guess is – it was less obvious to me 6 years ago when I wrote it, so it’s probably not that obvious to people who have spent less time than I did thinking about that sort of thing 😉 )

So… What’s a logarithm?

This is the translation and update of a blog post originally written in French: “Et donc, c’est quoi, un logarithme ?”.

It’s quite hard to write a series of articles about theoretical computer science without talking about logarithms. Why? Because it’s one of the “useful” functions when one talks about algorithm complexity. So, to make sure that everyone is on the same page, this is a remedial class about logarithms. People who have bad memories about high school maths probably also have bad memories of logarithms; however, logarithms are cool.

Speaking of logarithms, I don’t know how they were taught in your maths classes; for me, I’m pretty sure they were defined from the integral of the reciprocal function. STAY HERE, THAT’S NOT WHAT I’M GOING TO DO. Well, not yet. First, let me try to give some intuition about that thing.

Let us first consider the figure I just successfully inserted in my blog post. I started from a point; from this point I made two branches, at the end of which I added a point; on each of these points, I added two branches, at the end of which I added a point, and so on. I could have continued like that for a while, conceptually – I’d have issues with space and fatigue (it’s going to be annoying really fast to draw points), but I think you can imagine a tree like that with as many levels as you want. Let’s also suppose, because we do computer science, that the “root” of the tree (the very first point I added on my picture at the bottom of it) is at level 0.

Now suppose that I want to count the number of “leaves” of my tree, that is to say the number of points that I have on the highest level of my tree.

It’s pretty clear that the number of leaves depends on the level at which I stop drawing my tree, and that it increases with each level. If I stop drawing at level 0, I have 1 leaf. If I stop at level 1, I multiply that by 2 (because I made two branches), so that’s 2. If I stop drawing at level 2, I multiply the number of leaves at level 1 by 2 again, so that’s 4. And for every level, I take the number of leaves from the previous level and I multiply it by two again. At level 3, I’ll have 2×2×2 = 8 leaves, and at level 4, 2×2×2×2 = 16 leaves. To know the number of leaves at level n, where n is the number of my level, I multiply n copies of 2 together, which can be written 2^n (read “two to the power of n”).
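If you like seeing that kind of argument as code, here is a tiny Python sketch of the same counting (the function name is mine, purely for illustration):

```python
# Count the leaves of the tree: start from 1 leaf at level 0,
# and double the count once per level - which is exactly 2**n.
def leaves_at_level(n: int) -> int:
    leaves = 1              # level 0: just the root
    for _ in range(n):
        leaves *= 2         # each new level doubles the number of leaves
    return leaves

for level in range(5):
    print(level, leaves_at_level(level), 2 ** level)
# prints: 0 1 1, 1 2 2, 2 4 4, 3 8 8, 4 16 16
```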

Now suppose that I don’t want to know the number of leaves corresponding to a level, but the number of levels corresponding to a given number of leaves. For instance, I have 2 leaves: that’s level 1. 16 leaves: that’s level 4. 32 leaves, level 5. And if I have 128 leaves, that’s level 7. It gets a bit more complicated if I have, say, 20 leaves. 20 leaves, that’s level “4 and a bit”: I started drawing level 5, and then I stopped because I got too tired to finish.
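The inverse question can be asked directly of Python’s math.log2, including the “4 and a bit” case:

```python
import math

# Given a number of leaves, at which level of the tree am I?
for leaves in (2, 16, 32, 128, 20):
    print(leaves, math.log2(leaves))
# 2 -> 1.0, 16 -> 4.0, 32 -> 5.0, 128 -> 7.0, 20 -> about 4.32 ("4 and a bit")
```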

This operation (finding the level for a given number of leaves) is the inverse function of the previous “power” operation (finding the number of leaves for a given level), and that’s a logarithm. I say it’s the inverse function because it allows me to “undo” the previous operation. If I take a number n and raise 2 to the power n, that yields 2^n, and if I take the logarithm of that (the one corresponding to powers of 2 – more on “bases” in a minute), I get

\log(2^n) = n

Similarly, if I take a number n, take its logarithm, and raise two to the power of the result, I get

2^{\log n} = n

Alright, everything’s nice and shiny, but what happens if, instead of making two branches at each step, I make 3? With the same reasoning as before, at level n, I have 3×3×…×3 leaves, that is 3^n. And, well, in the same way, I can define a logarithm that would be the inverse of this 3^n. But I do want to be able to tell one from the other, so I write the base of the corresponding power as a subscript, like this:

\log_2, \log_3

with

3^{\log_3 n} = n

and

\log_3(3^n) = n

That subscript is the “base” of the logarithm. Allow me a small remark about the logarithm in base 10 (it also works for other bases, at least integer ones, but let me not go there). It’s very easy to get a rough estimate of the base-10 logarithm of a number: it’s the number of digits of said number, minus 1. We have \log_{10}(10) = 1, \log_{10}(100) = 2 (because 10^2 = 100); the base-10 logarithm of all the numbers between 10 and 100 is between 1 and 2. In the same way, you can say that the base-10 logarithm of 14578 is between 4 and 5, because 14578 is between 10000 = 10^4 and 100000 = 10^5, which allows us to conclude on the value of the logarithm. (I’m hiding a number of things here, including the reasons that make that reasoning actually correct.)
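If you want to check that rough estimate yourself, here’s a small Python sketch using the numbers from the paragraph above:

```python
import math

# Rough estimate of log10 for a positive integer: its number of digits, minus 1.
for n in (10, 100, 14578):
    estimate = len(str(n)) - 1
    print(n, estimate, math.log10(n))
# 10 -> 1 vs 1.0 ; 100 -> 2 vs 2.0 ; 14578 -> 4 vs about 4.16
```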

As an aside, as you may see, one interesting property of the logarithm is that it can “compress” orders of magnitude, which is handy if you want to represent, on a single sheet of paper, quantities that span a very large range. For example, in chemistry, you may want to represent concentrations that go from 10^{-10} to 0.1, and on a “normal” scale, you wouldn’t be able to distinguish between 10^{-10} and 10^{-9}. You can however use a logarithmic scale, so that you represent with the same amount of space “things that happen between 10^{-10} and 10^{-9}” and “things that happen between 0.01 and 0.1”. For large scales, xkcd made a very nice drawing with the observable universe seen at log scale: Height.

Back to the previous point – now I have defined the concept of “base” for my logarithm – that’s the number corresponding to the power function that I invert to get my logarithm. The question is – what prevents me from using “exotic” bases for my logarithms? The answer is “nothing”. I can define a logarithm in base 3.5 (corresponding to the power at which I raise 3.5 to get the number for which I’m computing the logarithm base 3.5), \displaystyle \frac 5 3 (corresponding to the power at which I raise \displaystyle \frac 5 3 to get the number for which I’m computing the logarithm base \displaystyle \frac 5 3), or even \pi (corresponding to… okay, you get the idea) if I want to. It’s less “intuitive” than the explanation with the tree and the number of levels (because it’s pretty hard to draw \pi branches), but if you see it as the inverse of the power of the same number, I hope you get the idea.

Now the next question you can ask is whether all these logarithms are somehow linked, or whether you can express them in some common way. The answer is yes. There exists the following relation between logarithms of any three bases a, b and c:

\displaystyle \log_a(x) = \frac{\log_c b}{\log_c a}\log_b(x)

(Yes, that’s typically the kind of thing that I had on my exam formula sheet because I always get confused, especially when I’m stressed out… like I am in an exam 😉 )

Also observe that the base c of that last logarithm absolutely does not matter: the ratio between the logarithms of two numbers stays the same regardless of the base.
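Here’s a quick numerical sanity check of that relation in Python – the bases a, b, c and the value x are arbitrary choices of mine:

```python
import math

# Change of base: log_a(x) = (log_c(b) / log_c(a)) * log_b(x), for any base c.
a, b, c, x = 2, 7, 10, 42.0
lhs = math.log(x, a)
rhs = (math.log(b, c) / math.log(a, c)) * math.log(x, b)
print(lhs, rhs)   # both about 5.392, and changing c does not change rhs
```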

The important thing to remember here is that all logarithms are equal “up to a constant”; they have the same “asymptotic behavior” (I’m giving the terms here, but I’ll write a specific post on the topic, because it’s a bit long to explain). For theoretical computer science, it’s interesting because we’re mostly interested in behaviors “up to a constant” when we’re talking about execution time or memory consumption of an algorithm. Again – I’ll come back to this later – take it as a “spoiler” of the next episodes 🙂

People from different backgrounds tend to prefer different bases for their logarithms; the three most common bases are 2, 10 and e \approx 2.71828. Here, I feel that it’s possible someone just read that and went “wait, what?”. As far as powers go, there is a “special” one: the powers of e. e is approximately equal to 2.71828, and the function x \mapsto e^x has its own name and its own notation: it’s the exponential function, and e^x = \exp(x). It’s special because of an interesting property: it is equal to its derivative. And because the exponential function is often used, its inverse, the “natural logarithm” (logarithm in base e) is also used a lot, and we write \ln(x) = \log_e(x). It’s also been brought to my attention that some conventions (and, I presume, authors) use \text{lg}(x) = \log_2(x) and use \log(x) = \log_{10}(x). Wikipedia has more opinions on the question.
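If you want to “see” that derivative property numerically, here’s a small sketch using a centered finite difference (the step size h is an arbitrary choice):

```python
import math

# The exponential function is its own derivative: a centered finite
# difference of exp at a few points stays very close to exp itself.
h = 1e-6
for x in (0.0, 1.0, 2.5):
    derivative = (math.exp(x + h) - math.exp(x - h)) / (2 * h)
    print(x, derivative, math.exp(x))   # the last two columns almost coincide
```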

It also turns out that the natural logarithm is related to the reciprocal function \displaystyle x \mapsto \frac 1 x. Formally, we write that

\displaystyle \ln(x) = \int_1^x \frac 1 t \:\text{d}t

And this is what it means, graphically:

The red curve represents the function \displaystyle x \mapsto \frac 1 x. The grey zone corresponds here to the integral from 1 to 6: the area of that zone is equal to \ln(6). And you can represent the natural logarithm of any value (greater than 1) by the area of the zone between the x-axis and the \displaystyle x \mapsto \frac 1 x curve from 1 to that value. Side remark: this area is equal to 1 when you take it from 1 to e (because \log_b(b) = 1 for all b, so \ln(e) = \log_e(e) = 1).
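If you’d rather check this numerically than graphically, here’s a crude Python sketch that approximates the area with a midpoint Riemann sum (the function name and the number of steps are mine, just for illustration):

```python
import math

# Approximate ln(x) as the area under 1/t between 1 and x,
# by summing many thin rectangles (midpoint rule).
def ln_as_area(x: float, steps: int = 100_000) -> float:
    width = (x - 1) / steps
    return sum(width / (1 + (i + 0.5) * width) for i in range(steps))

print(ln_as_area(6.0), math.log(6.0))   # both about 1.7918
print(ln_as_area(math.e))               # about 1.0, as promised above
```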

And, to conclude, a nice property of the logarithm function: you can write (in any logarithm base):

\log(a \times b) = \log(a) + \log(b)
\displaystyle \log \left(\frac a b\right) = \log(a) - \log(b)

This is probably easiest to see via the power functions. Let us write things down. We have, on the one hand:

2^{\log_2(a\times b)} = a \times b

and, on the other hand,

2^{\log_2 a + \log_2 b} = 2^{\log_2 a} \times 2^{\log_2 b} = a \times b

Since, if 2^m = 2^n, then m = n, putting everything together yields the first result. The second equality (with the minuses) can be derived in exactly the same way (and is left as an exercise to the reader 😉 ).

Note that it’s because of that kind of property that logarithms were used in the first place. If you have a large table with a lot of “number/logarithm” correspondences, and you want to multiply two large numbers easily, you look up in the table the logarithms of both numbers, you add them (which is much easier than multiplying them), and you look up the corresponding value (by reading the table in the other direction) to get the result of the multiplication. Historically, that kind of table appeared in the 17th century (thank you, Mr. Napier) to make astronomical computations easier; slide rules also work on that principle; and all of this only started to disappear when, quite recently, mechanical and later electronic calculators appeared…
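Here’s what that trick looks like in Python, with math.log10 standing in for the table (the numbers are arbitrary):

```python
import math

# The log-table / slide-rule trick: to multiply, add logarithms and go back.
a, b = 3141.0, 2718.0
log_sum = math.log10(a) + math.log10(b)   # "look up" both logs and add them
product = 10 ** log_sum                   # "read the table backwards"
print(product, a * b)                     # both about 8537238, up to rounding
```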

Anyway. Logarithms are cool. And now I hope you’re also convinced of that 🙂

What IS Theoretical Computer Science?

(Note: this is a “present-day version” of an article that I published in French around 5 years ago.)

As I mentioned when starting this blog, there’s a few blog posts from my French blog that I kind of want to translate to English. This is part of it: I wrote a series of articles about trying to explain a bit of math and theoretical computer science, and I believe these are pretty good candidates for that.

A few years ago, I was studying at ETH Zürich, and I got asked quite a lot “So, what are you doing these days?” – which was a question that was fairly hard to answer – typically, I said “Theoretical computer science. Basically maths.” It wasn’t very satisfying, but getting into the details of things when answering a “social” question wasn’t necessarily a good idea either. But maybe there are some people who ARE interested in a bit more detail (if you’re one of the people to whom I gave that answer at the time, and you’re still interested, now is your chance 😉 ), and maybe I can explain things to people who haven’t been swimming in that area for a few years. To give an idea, I’m aiming at the “I haven’t done maths since high school” level. I’m not sure I managed it – it’s been a while since I left high school, and I did do a bit of maths since then 🙂

So this is first a very general post to give a few elements, and then I want to give a bit more detail. The idea is that I still love what I studied a few years ago, and I want to explain why I love it. It’s going to take a bit of work – the first thing I did in the French series was to write a post about “but, but, but… what IS a logarithm?”. At that point I was a bit afraid it’d be an “insult” to the scientific knowledge of my readers, but I still did it because I thought it was fun and because I thought that logarithms were still a very useful notion. It so happened that said logarithm blog post was, by far, my most popular post ever (it’s still getting hits from search engines these days), so. And then, there are a few blog posts depending on the mood. A bit of complexity theory, a bit of graph theory (I like graph theory), a bit of probabilities (lol), and the “prerequisites” for all of that (my goal was to be able to explain P vs NP and how to define what NP-complete means; I eventually got there, and it was a nice exercise).

But still, let’s get to the point. What is theoretical computer science? The first answer I have is that it’s the theory of everything you can do with a computer. In the middle of that, you have the notion of algorithm. An algorithm is just a series of instructions. You can think of a cooking recipe as an algorithm:

  1. To make béchamel sauce, you need butter, flour and milk.
  2. Measure out identical quantities of butter and flour.
  3. In a pan over medium heat, melt the butter and add the flour; mix until you get a thick paste.
  4. While the béchamel sauce does not have the desired texture, add milk and mix.
  5. If you wish, add salt, pepper, nutmeg or cheese.

You have instructions (which are numbered) that can be repeated (while the sauce doesn’t have the desired texture) or that you only execute under certain conditions (you only add nutmeg if you want to).
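To make that concrete, here is a deliberately silly Python rendering of the recipe – the function, the “texture” numbers and the step names are all made up, the only point is to show a loop and a conditional:

```python
# A toy version of the béchamel "algorithm": a sequence of instructions,
# a loop (step 4) and a conditional (step 5).
def bechamel(desired_texture: float, wants_cheese: bool) -> list[str]:
    steps = ["melt butter", "add flour", "mix into a thick paste"]
    texture = 0.0
    while texture < desired_texture:   # step 4: repeat until it looks right
        steps.append("add milk and mix")
        texture += 0.25
    if wants_cheese:                   # step 5: only under some condition
        steps.append("add cheese")
    return steps

print(bechamel(desired_texture=1.0, wants_cheese=True))
```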

For another process that is more commonly described as an algorithm, you can think of the long division algorithm you learnt at school.
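And if you want to see that one as code too, here is a sketch of the digit-by-digit schoolbook division (the function is mine, and it assumes a positive dividend and a positive divisor):

```python
# Schoolbook long division: go through the digits of the dividend from left
# to right, carrying the remainder along, and collect the quotient digits.
def long_division(dividend: int, divisor: int) -> tuple[int, int]:
    quotient_digits = []
    remainder = 0
    for digit in str(dividend):
        remainder = remainder * 10 + int(digit)
        quotient_digits.append(str(remainder // divisor))
        remainder = remainder % divisor
    return int("".join(quotient_digits)), remainder

print(long_division(7489, 3))   # (2496, 1), since 7489 = 3 * 2496 + 1
```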

I’m a bit biased on the categories that you can fit into theoretical computer science, because there are domains that I know better than others, but here’s a strictly non-exhaustive list (if I forgot your favorite domain… that’s because I don’t know what to say about it, don’t hesitate to explain in the comments). For a more complete list, I’ll refer you to the Wikipedia article about theoretical computer science, which is itself based on the list from SIGACT (the ACM special interest group on algorithms and computation theory). As for the fact that “I do maths” – theoretical computer science is often considered part of discrete mathematics and, in general, proving that something works often means writing pretty proofs full of pretty maths (or full of ugly maths, but that’s far less satisfying).

I’m going to allow myself a fair amount of approximations here (in particular, I’m confusing problems and problem instances) – I’m trying to write things simply without writing overlong paragraphs. Those who know will be able to correct on their own; for the others: it’s a first approximation, it’s on some level wrong by definition, but I do intend to make it a bit less wrong in the following posts.

So, to me, theoretical computer science contains among others:

  • Algorithmics. Given a problem, how do you solve it with an algorithm, how do you prove that your solving method (the algorithm) is correct, and how much time do you need to get a result?
  • Complexity theory. It includes, but does not restrict itself to, the “how much time do you need to get a result?” from the previous point. We’re also interested in the question “given a problem, what are the minimum, average and worst case time and memory consumption that will allow me to solve it?”.
  • Graph theory. A graph is a mathematical structure made of points (the “vertices” of the graph) and lines (the “edges” of the graph). One example could be a subway map: the stops are the vertices, the edges are the connections from one stop to another with a subway line. In graph theory, we look at graph properties (for example: can I go from one point to another; what is the shortest path from one point to another; can I still go from point A to point B if I remove this vertex – or, in my previous example, if I shut down this subway station), and at algorithms to decide these properties (how do I find the shortest path from one point to another in the graph – there’s a small code sketch of exactly that right after this list).
  • Data structures. How do I store the data of the problem I’m interested in in the most efficient way possible (from a time and space perspective) for the problem I want to solve? Or, to go back to the previous example, how do I store my subway map so that I can compute stuff about it? Data structures and algorithms go hand in hand – you need data on which to run your algorithm, and your data without algorithms is quite often just storage.
  • Computational geometry. How to represent and solve with a computer problems that are related to geometry? A common example is the post office problem: given the location of all post offices in a city, how do I efficiently find the one closest to my home? If I’m only interested in the one that’s closest to my home, I can compute the distance between my home and all post offices, but how can I improve on that if I’m interested in that information for all houses in the city? And what do I do if I shut down or open a new post office?
  • Randomness. What is the impact of randomness on all this? For some algorithms, it’s interesting to add some randomness to the computation (I flip a coin; if I get heads the algorithm does operation X, if I get tails the algorithm does operation Y). What impact does it have on algorithms – in particular on their correctness – and how do we assess the time that a randomized algorithm will need to (correctly) solve a problem? Random data structures can also be interesting. For instance, you can define random graphs: you take a set of points, and you add edges between them with some probability. Depending on how you define your random graph, it can be a fairly good model of social networks (who’s friends with whom)… or of the human brain.
  • Cryptography. If Alice wants to send a message to Bob without Eve being able to read it, and if Bob wants to be sure that the message he just read does come from Alice, how can they do that? And how to prove that Eve cannot read the message, and that Bob can be sure of where the message comes from.
  • Computability. It turns out you can’t solve everything with an algorithm, and you can even prove that some things cannot be computed with an algorithm. The most famous example is the “halting problem”, which is a bit meta, but I hope you’ll forgive me. You can prove that there is no program that can decide whether a given program eventually stops and gives a result or whether it just loops forever. The proof is very pretty, but it kind of hurts the brain the first time you see it; maybe I’ll eventually write a blog post about it. (Note: I said that “maybe” five years ago, maybe I should do that for real 😉 )
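Since the graph theory point above promised a sketch: here is a tiny “subway map” as a graph, and a breadth-first search that finds a shortest path between two stops (the map and the function are made up for illustration):

```python
from collections import deque

# A toy subway map: vertices are stops, edges are direct connections.
subway = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

def shortest_path(graph: dict, start: str, goal: str) -> list[str]:
    queue = deque([[start]])   # paths to explore, shortest first
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in graph[path[-1]]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return []                  # no path: the two stops are not connected

print(shortest_path(subway, "A", "E"))   # ['A', 'B', 'D', 'E']
```

Breadth-first search explores the map one “ring” of stops at a time, which is why the first path that reaches the goal is also a shortest one (in number of stops).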

There, those are all things for which I have at least some notions. Theoretical computer science includes some more stuff for which my knowledge is mostly zero: information theory (I’m somewhat ashamed of that), machine learning (or how do you make a computer recognize pictures of cats), formal methods (or how do I prove that my code does not have bugs), quantum computing (would a quantum computer be better at recognizing pictures of cats than a normal computer), computational biology (how do I check whether two DNA sequences are close to one another or not), and I probably still forget a few.

Also note that different people may have different opinions about what is or is not theoretical computer science. I’m not necessarily convinced that most machine learning as it is applied today is theoretical computer science, because it still looks very experimental, as in “we tweaked that parameter and consequently the computer is 2.2% better at recognizing cats”. (Note: I’d probably be less “critical” now than I was five years ago on that topic. Maybe. Probably.) Wikipedia also lists distributed computing as theoretical computer science. It most definitely intersects with it, but I wouldn’t say that distributed computing is a subset of theoretical computer science. Same for computational biology: some elements are part of it (the example of the DNA strings), but I’m pretty sure that not all computational biologists would define themselves as theoretical computer scientists 🙂

I thought this would be a pretty short blog post: I think that’s a failure. I hope it was still somewhat interesting; if there are things you think I can talk about in a future blog post in this category… well, don’t hesitate. As I mentioned, I actually have some amount of content that I aim to translate from my archives; I am open to suggestions still, my goal is to Blog More 🙂