Serpentine Squiggles

The Lord of the Rings is a well-known story. It’s real, and we can say a few things about it: for instance, that it includes a character named ‘Frodo’, that it is typically split into three books, that it is written in English, that it is over 570 thousand words long.

One of these is not like the others, however. The Lord of the Rings has been translated many times: into French, into Russian. There are omnibus editions of the work which are bound as a single book.

And, of course, it has a different wordcount in different languages.

Nevertheless, each one would have a character we would identify as Frodo, if not necessarily by those symbols. Indeed, in languages with different character sets, like Japanese or Korean, almost no symbol of the original text is likely to remain — yet we would still identify it as The Lord of the Rings.

(There may be nihilists who will bite the bullet and say no, a translation of TLotR is a different story. No, a differently divided print run is a different story. No, each copy of TLotR is a different, albeit similar, story. I don’t care, and nothing stops us from defining a notion of equivalence between editions, whether we call it ‘identity’ or not.)

So this spiritual essence of TLotR-ness that persists through translations — we can of course say things like ‘Frodo exists’ or ‘the Battle of Helm’s Deep occurred’. What all can we say? There are certain tempting statements, like “TLotR is a long story” — yet the wordcount fluctuates across versions. I know from correspondence with a Russian friend that Russian versions of his stories typically have fewer words — yet the character count remains intriguingly similar, as if there’s some conservation of information density.

Indeed, we should expect that even in the tersest human language, TLotR is still longer than, say, “The Last Question.”

What counts as a language, though? Plaintext compresses well — so in the ‘language’ of LZMA-compressed text, TLotR would be quite short. Do conlangs count? Do hypothetical languages you could make count? I can imagine a language so efficiently and specifically optimized for talking about fantasy journeys and battles that entire chapters of TLotR could be written in a few sentences — but for which the notion of ‘computer’ is completely foreign, requiring entire books’ worth of groundwork and elaboration on its feasibility and cultural context.

What about a language of aliens so deeply appreciative of Tolkien’s work that the entirety of any book is communicable with but a single word — not as an abbreviation, but such that you don’t even truly speak the language if the word fehewa doesn’t call to mind the entirety of The Fellowship of the Ring, just as you aren’t using English correctly if you hear ‘blue’ and don’t comprehend that it is an experiential color, that it has connotations of sadness, that it’s typically a property of light, etc.

Of course, these examples feel mightily cheaty. Surely even if you could summarize a climactic scene in four words, barely audible, it would rob it of so much emotional impact — and if one scene stirs the heart and another does not, is it really a translation?

But we aren’t here to talk about that, not really.


Our analogy is no longer doing us favors, so let’s step back and speak in more precise terms.

Only the original target of this post knows this, but I’m not talking about The Lord of the Rings at all. I’m really talking about functions, or computer programs.

(It’s all the same, in truth, since there is a computer program which prints the source of TLotR in some language, or answers questions like “what happens first” and “after x, what happens?”)

We need two notions: syntax and semantics.

In a word each: syntax is extrinsic, semantics intrinsic. Semantics is “this function multiplies two numbers”; syntax is “this function requires 15 SLOC to write.”
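To make that concrete, here is a minimal sketch (Python, chosen purely for illustration): two programs whose syntax differs wildly, but whose semantics, at least on integers, agree.

```python
# A sketch, not a claim about the 'right' way to multiply: two programs
# with visibly different syntax but (on integers) the same semantics.

def mul_terse(a: int, b: int) -> int:
    return a * b

def mul_verbose(a: int, b: int) -> int:
    # Repeated addition: far more syntax, the very same function.
    total = 0
    for _ in range(abs(b)):
        total += a
    return total if b >= 0 else -total

assert mul_terse(6, 7) == mul_verbose(6, 7) == 42
assert mul_terse(6, -7) == mul_verbose(6, -7) == -42
```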

Formally, syntax can change when you translate a function from one language to another, while semantics cannot.

(For clarity’s sake, from here on out I use “function” to refer to the mathematical object, the platonic essence, and “program” to refer to the function as implemented in a language. The semantics of a program is a function, and the syntax of a function in a particular language is a program.)

Here we run into a similar issue as with Tolkien translations: sure, maybe this language can express multiplication in 20 instructions while that one takes 22, but what’s the harm? Especially if, as is generally the case, addition takes fewer instructions in both. Addition is simple, and multiplication is complex. This seems intuitive: is it true?

(For an analogy, consider the triangle. It has three vertices and three sides enclosing an area. A particular triangle might have an angle of 2π/3 between each pair of lines, and a perimeter of 60 miles. Now, every triangle has 3 lines, but only some triangle instances have lines that are 20 miles long. The line-count is intrinsic to triangles, but the perimeter is not.

So one way to frame our question: is function complexity more like the number of sides a shape has, or the area it encloses?

But I would be remiss not to point out where this analogy fails: what’s intrinsic and what’s not is contingent on the shape definition. We could just as well be talking about 173-square-mile areas, whether they are shaped like triangles, hexagons, or gerrymandered congressional districts. In which case the analogous properties are reversed.)

First, we need a way to compare the complexity within a language. And here we run into our first roadblock: you can’t. If you were very bad at programming, you might write a multiplication with 100 lines of code. (That sounds like a lot, but never underestimate a CS grad student.) Yet the fact that someone is bad at programming shouldn’t matter to the function — there is a simplest way to write a multiplication function; that’s a fact of the language.

But knowing whether any given program is the simplest requires knowing that no shorter program has the same semantics — and if you could decide that, you could exploit it to build an oracle for the halting behavior of any program. That contradicts the proof of the unsolvability of the halting problem — unless you want to believe humans are hypercomputers, a superpower we have yet to exhibit.
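To sketch how that exploit might go (everything here is hypothetical; same_semantics is precisely the decider that cannot exist):

```python
# Hedged sketch: `same_semantics(p, q)` is the impossible decider for
# 'these two program texts compute the same partial function', which is
# exactly the comparison a shortest-program check would need to make.

def halts(program: str, same_semantics) -> bool:
    """Decide whether a no-input `program` halts, given the oracle."""
    looper = "while True: pass"  # a program that certainly diverges
    # A no-input program halts exactly when its behavior differs from
    # the eternal loop's (the everywhere-undefined function).
    return not same_semantics(program, looper)
```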

But let’s wave that aside. We can define this notion of simplicity, even if we can never know if something exhibits it. What can we say about cross-language simplicity? The shortest program in one language may be longer than the shortest program with the same semantics in a different language.

We can put a limit on the difference, but that limit is yet another roadblock. See, every general-purpose language can emulate any other general-purpose language. Therefore, if you know the simplest way to write a program in language A, you can write an emulator for A in language B, then write a program which runs the emulator with the aforementioned simplest program as its input. The semantics of this program is the same, therefore we have an upper bound on the length of the shortest implementation of that program in B. Nice trick, isn’t it? It means we can say with mathematical certainty that the complexity difference between two languages cannot be greater than an additive constant: the length of the shortest emulator of A in B.
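Here is the trick reduced to its arithmetic, with both program strings assumed rather than constructed:

```python
# The bound, as arithmetic. Both strings are hypothetical: language A's
# shortest program for f, and an emulator for A written in language B.

def upper_bound_in_B(shortest_in_A: str, emulator_of_A_in_B: str) -> int:
    # 'Run this A-source under the emulator' is itself a valid B program
    # with the same semantics, so its length bounds B's shortest version.
    program_in_B = emulator_of_A_in_B + shortest_in_A
    return len(program_in_B)  # = K_A(f) + len(emulator): additive, not multiplicative
```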

Except that’s the rub. It’s a constant dependent on the languages chosen! We can imagine a language that lets you write simple mathematical statements, but requires a 2gb-long compiler directive to access advanced control flow like loops or recursion.

Moreover, examine that bound more closely. No greater than a constant! That means it can be less than it! Which is to say, in language A, program x > program y, but in language B it can be the case that program y > program x, if for no other reason than that B includes y as a language primitive.

That should be the nail in the coffin for this idea, but some will seek some way to dismiss the languages themselves, to hold on to the dream of objective complexity. Surely you can see that there’s something unnatural about these languages? 2gb compiler directives? Complex functions given as primitives? Isn’t there a sense in which these languages are intrinsically complex, in a way that ought to stain their claims about simplicity?

Before we go further, let’s remain aware of our terminology. The length of a program is syntactic. The function implemented is semantic. Languages can be identified with the functions that emulate them. That is, if we say functions are mappings from lists of natural numbers (the arguments) to natural numbers (the result), then a language is a function such that for every function f, there is a number (the index) where calling the language-function with an argument list that starts with the index produces the same result as calling f with the rest of the list.
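A toy, finite caricature of that definition (a real language indexes every computable function, not just three invented primitives):

```python
# A toy 'language': a function whose first argument selects which
# function to run on the remaining arguments.

def toy_language(args):
    index, rest = args[0], args[1:]
    primitives = {
        0: lambda xs: xs[0] + 1,      # successor
        1: lambda xs: xs[0] + xs[1],  # addition
        2: lambda xs: xs[0] * xs[1],  # multiplication
    }
    return primitives[index](rest)

assert toy_language([2, 6, 7]) == 42  # index 2 happens to name multiplication
```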

Put another way, a language is a way of numbering functions, assigning each one a natural number index. (We already know this is true; files are just 0s and 1s, so every source code is a number — the concatenation of its UTF-8 bytes, say.)
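Concretely, with an arbitrary example source:

```python
# Any source text is already one (huge) natural number.
src = "def mul(a, b):\n    return a * b\n"
index = int.from_bytes(src.encode("utf-8"), "big")
```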

The natural numbers are trivially ordered, so we can say that every language induces a total order on the set of computable functions: sort each function by its least index.

The dream of simplicity, then, is this: can we put a total order on the set of functions without privileging one language arbitrarily?

Sure, you might find a particular language (say, Haskell, or the SKI combinator calculus) particularly elegant. But how do you implement SKI? You can omit the I, as is popularly done to further minimize it. But then you have four symbols (s, k, and the parentheses), which you can assign arbitrarily to any of the four length-2 bitstrings (which trivially translate to natural numbers). Or why choose length-2? Let s be 1, and k 01, etc. Why think SK(I) is simplest at all, and not choose a one-combinator language like iota or jot? (What if you believe Mr. Wolfram’s claims that you don’t even need k, and s alone is universal?)
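To see just how arbitrary the choice is, here are two serviceable encodings of the same SK term; neither is more natural than the other, and they hand the term different numbers:

```python
# Two encodings of the same SK term: a fixed-width table and a
# prefix-free one. Both are fine; nothing privileges either.
term = "(SK)"
fixed_width = {"S": "00", "K": "01", "(": "10", ")": "11"}
prefix_free = {"S": "1", "K": "01", "(": "001", ")": "0001"}

def encode(t: str, table: dict) -> int:
    return int("".join(table[symbol] for symbol in t), 2)

print(encode(term, fixed_width))  # 135
print(encode(term, prefix_free))  # 209
```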

These seem like pedantic objections. After all, if it came down to whether one combinator was 2 bits or 3, at the scale of actual programs the differences are going to be remarkably minute. Even if there are programs which are 3 bits shorter in one version of the language, we can still create a partial order instead of a total order — it should still be obvious that an increment function is simpler than a JavaScript interpreter.

We could try to use the fact that, as a consequence of totally ordering all functions, each language also totally orders the set of all languages. It’s easier to write a SKI interpreter in C++ than it is to write a C++ interpreter in C++, so in a very true sense, C++ thinks SKI is simpler than itself. Potentially there might be a language even SKI thinks is simpler. Does this walk down the language graph end in one place?

No, it doesn’t. By Kleene’s second recursion theorem, a program can incorporate its own description (viz., quines are possible). Therefore it is possible to create a program such that for input 0, it emulates itself, while for all other inputs, it emulates another language. In more familiar terms, it’s possible to write a compiler that, when given the empty string as input, will output its own machine code.
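Here is a hedged sketch of such a self-favoring language. Python’s ordinary recursion stands in for the self-reference Kleene’s theorem guarantees, and run_host is an invented stand-in for the wrapped language:

```python
# Sketch of the self-favoring language: index 0 means 'emulate myself',
# any other index defers (shifted by one) to an invented host language.

def run_host(args):
    index, rest = args[0], args[1:]
    table = {0: lambda xs: xs[0] + 1, 1: lambda xs: xs[0] * xs[1]}
    return table[index](rest)

def self_favoring(args):
    index, rest = args[0], args[1:]
    if index == 0:
        return self_favoring(rest)       # index 0: emulate itself
    return run_host([index - 1] + rest)  # any other index: defer to host

assert self_favoring([0, 0, 2, 6, 7]) == 42  # two layers of self-emulation
```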

This language, by our established metric, thinks that it is the simplest possible language.

Still, intuitively this program is cheating. It’s just a wrapper around a “proper” language, and the essence of that proper language, which will recognize the beauty of SKI calculus, just needs to be freed.

If the language it’s wrapping is C++ or SKI, then it’s clear the next simplest language is the ‘true’ simplest language. We could write some sort of weighted probability walk of the language graph, where the simpler languages have more weight but not all of it. A simple way to do it is to walk up the total order of languages, flipping a coin at each one and jumping to that language if it lands on heads. Given enough timesteps, it’s clear that this wrapper language might hang around itself for a bit, but it will jump to the simpler languages soon enough.
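As a sketch, assuming a hypothetical ordering function that yields a language’s ranking of languages, simplest first:

```python
import random

# The coin-flip walk. `ordering(lang)` is a hypothetical iterable over
# the languages that `lang` ranks simplest-first; we jump to the first
# candidate that wins a flip, a geometric choice down the order.

def step(current, ordering):
    for candidate in ordering(current):
        if random.random() < 0.5:  # heads: jump here
            return candidate
    return current  # only reachable if the ordering is finite
```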

Is this behavior general? Are all complex languages just perversions of simple languages, and does this probabilistic algorithm eventually converge on a family of objectively simple languages?

First of all, we have to be more specific about what we’re doing. The best way of thinking about what’s happening is that a language has a certain probability mass, and at each step it gives half its probability mass to its simplest language, half of what remains to the next simplest, etc. So sure, our wrapper language keeps half of its mass at each step, but the rest is redirected at SKI, and SKI gives it back to itself, meaning that over time the probability mass goes where it rightfully belongs.

Problem is, this applies to SKI too. Every single language diminishes in probability mass, because a fraction is wafted off to intricately bizarre languages which only give a fraction back. There is an infinite class of ‘circlejerk’ languages whose total ordering consists mostly of other circlejerk languages, with non-circlejerk languages as sparsely distributed as hyperexponents of a googolplex. This means that a portion of SKI’s probability mass is siphoned off to these circlejerk languages, and by construction they give a dramatically smaller portion of the probability back. (I don’t know precisely how much code it takes to define a hyperexponential circlejerk language, but given how few words it took to describe them, I’m convinced they are simple enough that they will be given MUCH more probability from SKI than the (less than!) 1/2^googolplex they give back.)
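A finite caricature of the leak, with three invented orderings:

```python
# Each step, a language sends 1/2 of its mass to its top pick, 1/4 to
# the next, and the rest to the last. 'circle' plays the circlejerk
# cluster: it hoards whatever it receives.

orderings = {
    "SKI":     ["SKI", "wrapper", "circle"],
    "wrapper": ["wrapper", "SKI", "circle"],
    "circle":  ["circle", "circle", "circle"],  # hoards everything it gets
}
shares = [0.5, 0.25, 0.25]
mass = {"SKI": 1.0, "wrapper": 1.0, "circle": 1.0}

for _ in range(100):
    flow = {lang: 0.0 for lang in mass}
    for lang, m in mass.items():
        for target, share in zip(orderings[lang], shares):
            flow[target] += m * share
    mass = flow

print(mass)  # the circlejerk ends up holding essentially all the mass
```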

Which isn’t even to address the problem of how this probability is normed. How are you even going to assign every member of an infinite set equal probability? Or maybe you could say it’s not probability, but a kind of abstract weight that doesn’t need to sum to 1 — in which case you immediately run into the issue that every single language has an infinite number of “admirer languages” which order that particular language first, meaning that every language is immediately drowned in infinite weight.

Or maybe you do it on a case-by-case basis — giving only one language a unit of weight, seeing where that weight ends up distributed, and checking whether most languages converge on the cluster of objective simplicity. But again, every language behaves the same in this case: a fraction of weight siphoned off to the outer darkness of circlejerk languages, never to be heard from again.


It becomes very clear how futile this endeavor is when we take a step back and look at what we are trying to do.

You have an infinite class of ways of sorting a list of elements. Pick the correct one.

Correct by what metric? What is this a list of?

The very thing you’re trying to do is subjective. Even the valuation of simplicity is contingent — consider a universe where it takes less energy to perform a computation the longer the program is. Simplicity would be the opposite of a good thing there.

There is no natural total order on the set of program semantics. Sure, there is structure on the set — if two functions exist, their composition exists, their primitive recursion exists, an unbounded search over both of them exists. So what if we exploit that — clearly the composition of two functions is more complex than either of them? But this isn’t the case. Compose +5 with −5 and you have an intuitively simpler function: the identity function. It’s similarly easy to think of recursions or loops over complex program bodies that implement simple operations (remember the CS grad’s 100-line multiply() function?).
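The counterexample, spelled out:

```python
# Composing two non-trivial functions lands on something simpler than
# either of them.
add5 = lambda x: x + 5
sub5 = lambda x: x - 5
compose = lambda f, g: (lambda x: f(g(x)))

same_as_identity = compose(add5, sub5)  # semantically just x -> x
assert same_as_identity(37) == 37
```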

Furthermore, there is no objective way to choose a basis function. The SKI combinators seem simple, but they’re hardly the only functions that can generate the set of computable functions under the operations of composition, recursion, and mu-recursion. And if it’s a single universal function that generates the set of computables, how do you determine that one is “better” than another? What actual, objective difference is there?

And once again, our assumptions betray us, because why should we even privilege composition, recursion, and mu-recursion as the generators of the computable functions? They aren’t the only relationships that exist between computable functions!

And ultimately, that’s the defeater for the whole project. There are relationships between computable functions — an infinite plurality of them! — and there is no universal, objective, semantic total order on the set of computable functions by reference to those relationships.

There is no function that determines the complexity of a program, and there is no function that generates a strictly more complex program. Meaning there is no computable semantic relationship you can define between functions which preserves complexity-ordering. (I.e., if f R g, either one can be more complex. And if f R g R′ h (e.g., h = compose(f, g)), then there are no computable, semantic values of R and R′ which place an absolute constraint on their ordering.)

There is one caveat, one olive branch. If you have a complete infinite total ordering of the computable functions, then any other complete infinite total order necessarily has only finitely many ordering exceptions with respect to any one element. Meaning that, given any two languages, you can pick a function and find that there are infinitely many functions both languages agree are more complex than the picked function. (Don’t forget that finite is still extremely big: it could mean one, two, ten, a billion, a googol, Graham’s number — there are an infinite number of finite numbers. And moreover, this is only with respect to one element; an infinite number of finitely many exceptions is an infinite number of exceptions!)

This holds even if you increase the number of languages considered to any finite number — pick a function, and there are only finitely many cases where there isn’t consensus about which functions are more complex than it. But this breaks when you scale to infinity. If you have infinitely many total orderings, then you can have infinitely many disagreements. The computable functions are countable, and the only infinity that fits inside a countable infinity is a countable infinity.

Consider: for any large finite number, there are more composite numbers less than it than there are primes less than it. But as soon as you have the entire set, it reverses: there are infinitely many primes, and infinitely many composites, and thus one prime for every composite — indeed, one prime for every single natural number! The finite cases might fool us into thinking there are more composites than primes. But this is an illusion.
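The illusion is easy to exhibit at any particular cutoff:

```python
# The density illusion, checked at one (arbitrary) cutoff: below N the
# composites swamp the primes, though both sets are countably infinite.

def is_prime(n: int) -> bool:
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

N = 10_000
primes = sum(is_prime(n) for n in range(2, N))
composites = (N - 2) - primes
print(primes, composites)  # 1229 primes vs 8769 composites below 10,000
```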

I feel like I’m repeating myself, and I don’t know how many different ways I can say it. There is no natural total ordering of the set of computable functions. There is no natural total ordering of the set of computable functions!

How about a final analogy to the natural numbers: what’s the correct way to represent them? Is it the Peano way, an ever-lengthening list of the form S(S(S(S(0)))) = 4? Is it the ZFC way, where every number is the set of the numbers smaller than it — {{},{{}},{{},{{}}},{{},{{}},{{},{{}}}}} = 4? Is it binary numbers, ternary, decimal? Is it plain English? bigints? floating points??
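A few of those spellings of four, side by side (in Python, for want of a neutral notation):

```python
# The same number under three of the encodings above.

def peano(n: int) -> str:
    return "0" if n == 0 else f"S({peano(n - 1)})"

def von_neumann(n: int) -> frozenset:
    s = frozenset()
    for _ in range(n):
        s = s | frozenset([s])  # successor: n + 1 = n ∪ {n}
    return s

print(peano(4))             # S(S(S(S(0))))
print(bin(4))               # 0b100
print(len(von_neumann(4)))  # 4 elements: the encodings of 0 through 3
```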

THERE IS NO OBJECTIVE ANSWER!

You can only answer it by begging the question, taking what you care about for granted, deriving it from your axioms. Without assumptions, nothing follows.
