Andrei's blog

Programming Languages are Fake

TLDR: Many programming languages are using the same compiler (LLVM) under the hood.

People talk about programming languages like they are all very different, and fight over which one is the best. In reality many new shiny languages are more similar than you might expect.

Creating a programming language has 2 parts: the syntax and the machine code.

People always think about the syntax, because it is the stuff you see. The shiny keywords and notations. This is the Design side of the task, making code easy to write, understand, and manage.

What you rarely hear about is what's under the hood: the actual compiler converting the language into fast machine code. This is the Engineering side of the problem, and it is hard. Languages need to compile for different CPUs (x86 and arm64) which work differently, meaning lots of manual work which must be done properly.

┌─────┐   ┌──────┐   ┌────────┐   ┌────────────┐
│Human├──>│Syntax├──>│Compiler├──>│Machine Code│
└─────┘   └──────┘   └────────┘   └────────────┘

Many "modern" languages (such as Swift, Rust, Zig, and Nim) don't bother or can't afford to write a good compiler from scratch. The simplest workaround is to convert themselves into another popular language that already has a good compiler to do the heavy lifting (Nim, for example, compiles to C).

┌─────┐   ┌──────┐   ┌────────────────┐   ┌────────┐   ┌────────────┐
│Human├──>│Syntax├──>│Another Language├──>│Compiler├──>│Machine Code│
└─────┘   └──────┘   └────────────────┘   └────────┘   └────────────┘

(A famous example was Facebook transpiling PHP into C++.)

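If I asked you to create a programming language, the first thing you would write is a very slow interpreter in Python. Interpreters run code without compiling it first, at the cost of speed. JavaScript, Python (and Java) are slower than natively compiled languages largely because they start from an interpreter (modern JavaScript engines and the JVM add just-in-time compilation to win back speed). So an interpreter running an interpreter would be insanely slow.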
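
To make the "very slow interpreter in Python" concrete, here is a minimal sketch. The toy language (postfix arithmetic separated by spaces) and the name `interpret` are made up for illustration. A real first interpreter would also need a parser and variables, but the shape is the same: read the source, walk it, and do the work directly with no compile step.

```python
import operator

# Hypothetical toy language: postfix arithmetic such as "1 2 + 3 *".
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def interpret(source):
    """Run the program directly, token by token, without compiling it."""
    stack = []
    for token in source.split():
        if token in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(int(token))
    if len(stack) != 1:
        raise ValueError("malformed program")
    return stack[0]

print(interpret("1 2 + 3 *"))  # (1 + 2) * 3, prints 9
```

Every token is looked up and dispatched again each time the program runs, and all of that happens inside Python's own interpreter. That is the overhead the paragraph above is talking about.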

In programming languages everything is a trade-off between optimization and overhead. You can spend time optimizing code up front, and then it will run quickly. Or you can run slow code instantly. Program execution is a big topic.

┌───────────────────────┐                     ┌──────────────────┐
│Instantly Run Slow Code│ <────────|────────> │Wait for Fast Code│
└───────────────────────┘                     └──────────────────┘
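
A small way to feel this trade-off, sketched with Python's built-in `compile` and `eval`. The expression and the function names are arbitrary examples. One function hands the raw text to `eval` on every call, the other pays for compilation once and reuses the result.

```python
import timeit

SOURCE = "(x * 3 + 1) % 7"  # arbitrary example expression

def run_instantly(x):
    # No preparation: Python re-parses and re-compiles the text on every call.
    return eval(SOURCE, {"x": x})

# Pay the cost once, up front...
PREPARED = compile(SOURCE, "<expr>", "eval")

def run_prepared(x):
    # ...then every call reuses the already compiled code object.
    return eval(PREPARED, {"x": x})

if __name__ == "__main__":
    print("instant :", timeit.timeit(lambda: run_instantly(5), number=20_000))
    print("prepared:", timeit.timeit(lambda: run_prepared(5), number=20_000))
```

Both functions return the same answer. The timings depend on your machine, but the prepared version should come out ahead once the expression is evaluated many times, which is the right-hand side of the diagram.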

When interpretation is too slow, you use transpilation. For a fun programming language like Rockstar, which has impractical syntax, transpiling to JavaScript or Python sounds like a good idea: you can get a version of normal, readable code if you want or need it.
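
As a rough sketch of what transpiling means, here is a toy translator from postfix arithmetic into Python source text. The toy language and the name `transpile` are invented for this example and have nothing to do with how Rockstar's tooling actually works.

```python
def transpile(source):
    """Translate postfix arithmetic into an equivalent Python expression."""
    stack = []
    for token in source.split():
        if token in ("+", "-", "*"):
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left} {token} {right})")
        else:
            stack.append(str(int(token)))  # int() rejects anything that is not a number
    if len(stack) != 1:
        raise ValueError("malformed program")
    return stack[0]

python_code = transpile("1 2 + 3 *")
print(python_code)        # prints ((1 + 2) * 3)
print(eval(python_code))  # prints 9; the host language does the actual running
```

The transpiler never computes anything itself. It only produces text in another language, and that text is the "normal, readable code" you get to keep.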

But then Nim shows up, and it needs transpilation into JavaScript, C, and Objective-C [1] for practical reasons. There is also Web Assembly [2], and soon compiling and transpiling become very complicated.

Small languages have to match the compilation quality of big languages and stay compatible with them, otherwise nobody will use them. They are often maintained by a handful of people, yet face higher expectations than enormous languages backed by corporations. The easiest solution is to transpile to a bigger language and inherit its compiler and libraries.

Small languages have one thing going for them: "serious" small languages often have better design than big languages (e.g. Rust tackles memory safety) and are made by skilled programmers. This gives the underdogs just enough resources for a shortcut: modify other languages' compilers.

┌─────┐   ┌──────┐   ┌───────────────┐   ┌────────────┐
│Human├──>│Syntax├──>│Stolen Compiler├──>│Machine Code│
└─────┘   └──────┘   └───────────────┘   └────────────┘

This is a rather neat trick. It is faster and more efficient than transpilation, but still allows a small language to compile to all CPU architectures [3] (and Web Assembly). Saving time on writing a good compiler allows language creators to focus on syntax, language design, and features, and to configure the compiler specifically for their language.

As you might expect, most languages picked the toolchain behind C. The C compiler in question is clang, and its backend, LLVM, is a separate project that was built to be easy to integrate and modify. LLVM in itself is a sort of "master language" (LLVM IR) which everything else is translated into. LLVM then has very good optimizers and converters into machine code.

┌─────┐   ┌──────┐   ┌────┐   ┌───────────────────────────┐
│Human├──>│Syntax├──>│LLVM├──>│Anything LLVM compiles into│
└─────┘   └──────┘   └────┘   └───────────────────────────┘

Now every programming language can have a state-of-the-art compiler. The hardest part left is converting your syntax into LLVM's language.
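
To give a feel for what "converting your syntax into LLVM's language" looks like, here is a minimal sketch that turns toy postfix arithmetic into the text form of LLVM IR. The toy language, the function name `emit_llvm_ir`, and the emitted function name `@compute` are all made up for this example. Real frontends handle types, control flow, and much more, and usually build IR through LLVM's libraries rather than by gluing strings together.

```python
def emit_llvm_ir(source):
    """Translate postfix arithmetic into the text of an LLVM IR function."""
    opcodes = {"+": "add", "-": "sub", "*": "mul"}
    lines = []
    stack = []
    for token in source.split():
        if token in opcodes:
            right = stack.pop()
            left = stack.pop()
            name = f"%t{len(lines)}"  # fresh temporary for each instruction
            lines.append(f"  {name} = {opcodes[token]} i32 {left}, {right}")
            stack.append(name)
        else:
            stack.append(str(int(token)))
    if len(stack) != 1:
        raise ValueError("malformed program")
    body = "\n".join(lines + [f"  ret i32 {stack[0]}"])
    return f"define i32 @compute() {{\nentry:\n{body}\n}}\n"

print(emit_llvm_ir("1 2 + 3 *"))
```

For `"1 2 + 3 *"` this produces two instructions, `%t0 = add i32 1, 2` and `%t1 = mul i32 %t0, 3`, followed by `ret i32 %t1`. Saved to a `.ll` file, text in this form is the kind of input LLVM tools such as `llc` work on, and from there the optimizers and machine code generation come for free.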

And LLVM is not the only one doing the heavy lifting. Kotlin uses Java's JVM. Java spent years creating an efficient and optimized virtual machine (the JVM); then Kotlin just showed up with a better syntax, compiled it into the same bytecode Java uses, and let the JVM run it efficiently.
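
The same trick can be sketched in Python, with CPython's virtual machine standing in for the JVM. This is only an analogy and not how Kotlin's compiler works. The toy postfix language and the name `compile_to_bytecode` are invented for the example; the `ast`, `compile`, and `dis` pieces are standard library.

```python
import ast
import dis

def compile_to_bytecode(source):
    """Compile postfix arithmetic into a CPython code object."""
    ops = {"+": ast.Add, "-": ast.Sub, "*": ast.Mult}
    stack = []
    for token in source.split():
        if token in ops:
            right = stack.pop()
            left = stack.pop()
            stack.append(ast.BinOp(left=left, op=ops[token](), right=right))
        elif token.isidentifier():
            stack.append(ast.Name(id=token, ctx=ast.Load()))  # a variable
        else:
            stack.append(ast.Constant(value=int(token)))
    if len(stack) != 1:
        raise ValueError("malformed program")
    tree = ast.fix_missing_locations(ast.Expression(body=stack[0]))
    return compile(tree, "<toy>", "eval")

code = compile_to_bytecode("x 2 + 3 *")
dis.dis(code)                # ordinary CPython bytecode
print(eval(code, {"x": 1}))  # prints 9, run by the existing virtual machine
```

The toy language never had to learn how to execute anything. It only had to produce something the host's virtual machine already understands.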

Modern computer software is a tree of dependencies, requiring the work of thousands of people to maintain advanced systems. It is almost impossible to match the complexity of modern browsers, operating systems, or programming languages without standing on the shoulders of giants.

Programmers often criticize people who see only the colorful screen and forget about the thousands of maintainers that make everything work under the hood (sometimes for no pay at all). But programmers themselves worship their colorful text, ignoring how many internal parts are borrowed or shared.

Languages are distinct among continents, nations, people, and cultures. People attach them to thoughts, feelings, and functions, but change a society's language entirely, and it will continue like nothing happened. Language is just a tool for communication; it is more detached from us than we want to admit.

A master is his skill, not his tools. For a master any tool will do.

- very qualified language expert

Footnotes

Further reading watching:

Layers of abstraction:

┌─┬─────────────────────────────────────────┐
│1│CPU-level (actual logic gates on the CPU)│
│2│machine code                             │
│3│assembly code                            │
│4│[C/C++, JVM/bytecode]                    │
│5│[JavaScript, Python]                     │
└─┴─────────────────────────────────────────┘

Text diagrams were made using asciiflow.com. I think it was a nice replacement for SVGs or images.

  1. Objective-C was Apple's language for macOS until Swift was created.

  2. Web Assembly. Due to the inefficiency of JavaScript, games and complex rendering on the web can be very slow. Web Assembly is usually compiled from C, Rust, or Go and can run in the browser much more efficiently. Even the Llama LLM can run in the browser.

  3. Every CPU works differently and needs custom machine code (that's how it was for the early computers). But that would require writing every program again every time a new computer comes out. Nowadays many different CPUs use the same machine code. Commonly used machine code types are called "architectures." Currently x86 and Arm64 are the dominant ones.