Purple Papers: 'So Many Languages'

So many languages

We've previously talked about how programmers who speak English may have an advantage over those who don't, and about the fact that many programming languages are written in English.

To many people it may seem the only reason to have all of these languages is simply to bamboozle everyone else (but who'd Adam and Eve that?). As developers, at meetings we are introduced as "our technical people" – "I'm sure you can talk to them about Python, SQL and all that" – as if we're from another planet (take your pick: Mercury, Venus, Mars, Saturn, Neptune).

To dispel this “myth” we thought it would be worth spending a few minutes exploring what’s going on (or maybe the battle is already lost). Why are there so many programming languages and is there any correlation with natural language?


What are programming languages?

Programming languages allow us to get machines to carry out our orders – a bit like a recipe for baking a cake – and, like spoken languages, they have a structure, or syntax, that the machine understands. Unlike spoken languages, which arose naturally through the advantages communication gave our ancestors, they were deliberately designed by us to write programs that run on computers rather than brains.

The underlying machinery of the brain has an immense number of adaptable connections, trained over millions of years to generate patterns that allow the mind to add context to a loosely structured syntax. This allows us to know what's expected when "Dogs must be carried" or "Ties must be worn". The machinery of a computer can't do this. In a programming language the meaning of each statement is without context1 and must be unambiguous. Writing them requires a certain precision – perhaps one reason why programming attracts a "certain type of person".


Where did they come from?

In common with human languages a lot of variation has developed over time from a small number of roots, but of course there are some intrinsic differences.

As programming languages run on computers, at some level they need to operate at the lowest level of the machine – telling it how to move and manipulate information in the form of 1s and 0s (called bits). Writing instructions at this (assembly language) level is understandably slow and tedious, so people created building blocks, themselves programs, that spared us this task. Just as natural languages arose from pre-existing abilities2 that had already evolved in concert with the physical brain – and we are the ones using that result – we tend to use the languages we grew up with or learn those used by others. It's not often we build a new one. But, as ever, there are exceptions.

The programmatic building blocks we see today are the operating systems (MacOS, Linux, Unix, Windows, etc), and other supporting software (e.g. compilers and interpreters) that abstract us far away from the computer hardware and simplify the job at hand. Since they were designed by different people or organisations, and for differing hardware, they spawned their own separate higher-level programming languages (e.g. PL/1, Algol).

These languages are called ‘general purpose’ because they can be used to write programs that aren’t restricted to a single discipline (aka ‘domain’).  They use a series of instructions, or ‘statements’, similar to the cake recipe mentioned earlier, that are suitable for programmers to write and understand and can be boiled down to the underlying assembly language by further programs called compilers. 

These statements also allow us to introduce greater complexity. Conditional statements, such as "if you go to the baker's, pick up some bread", provide branching but – remember the 'context' limitation – our programming language won't know that it's expected to return home with the loaf. (In fact, always remember to include that part when sending a programmer to the shops.)
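As a sketch of how such a branch looks when made explicit – in Python, with names invented purely for illustration:

```python
def go_to_bakers(at_bakers):
    # Branching: both outcomes must be spelled out in full -
    # the language won't infer the 'bring it home' step from context.
    if at_bakers:
        return "one loaf"
    return None

print(go_to_bakers(True))  # prints: one loaf
```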

Looping statements such as "keep going to the baker's until you have 5 loaves" simplify the coding of repeated actions but rely on keeping track of how much bread we have, so we also need to store the 'state' of the system.
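A minimal sketch of that loop and its state in Python (again, names invented for illustration):

```python
def buy_bread(target=5):
    loaves = 0              # the stored 'state' of the system
    while loaves < target:  # loop until the state reaches the target
        loaves += 1         # one more trip to the baker's
    return loaves

print(buy_bread())  # prints: 5
```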

Programmers took advantage of these features but, as programs became larger and more complex, the branching structure led to unwieldy code and it became more difficult to ensure the 'state' wasn't affected by other parts of the program – if someone else is making sandwiches while we're at the shops, we'll just keep going for ever and never reach our bread target. This made it nigh on impossible to produce, test and maintain reliable programs. To contain these problems, language designers and academics invented alternative ways to design and organise the structure of code. These were viewed as paradigm shifts in programming and so are called programming 'paradigms'.

Since these recipe style programs are analogous to the imperative mood in our natural languages, this style is called the ‘imperative programming’ paradigm.  This, in turn, was adapted to resolve the complexity issues, first by the ‘procedural’ paradigm that grouped statements into procedures, and later by the ‘object-oriented’ paradigm that added encapsulation of the ‘state’ – the shopper, in our overstretched analogy, puts a lock on the bread bin.
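In Python, the 'lock on the bread bin' might be sketched as a class that encapsulates its state (all names are invented for illustration):

```python
class BreadBin:
    """Object-oriented style: the loaf count is held inside the object,
    so other parts of the program can't tamper with it directly."""

    def __init__(self):
        self._loaves = 0  # encapsulated state - the 'locked' bread bin

    def add_loaf(self):
        self._loaves += 1

    def take_loaf(self):
        if self._loaves == 0:
            raise ValueError("no bread left")
        self._loaves -= 1

    def count(self):
        return self._loaves
```

By convention the underscore marks `_loaves` as private; only the methods the object chooses to expose are allowed to change it.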

New languages were developed to suit these ‘paradigms’ but it didn’t stop there.  Others were working on languages that didn’t use instructions but instead ‘declared’ the required outcome and left our building blocks to work out how to get there.  Unsurprisingly, these come under the ‘declarative’ paradigm beneath which we group more sub-paradigms such as ‘functional’ and ‘logic’ programming – each with a plethora of languages from which to choose and many designed to incorporate multiple paradigms.  You guessed it, ‘multi-paradigm’ languages.
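The contrast between the two paradigms can be sketched in Python, which supports both styles (the data is invented for illustration):

```python
shopping = [2, 1, 2]  # loaves bought on each trip

# Imperative: say *how* - step through the list, updating state.
total = 0
for loaves in shopping:
    total += loaves

# Declarative/functional: say *what* - the sum of the list -
# and leave the working-out to the language's building blocks.
assert total == sum(shopping)  # both arrive at 5
```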

Simultaneously, programming languages were designed, like horses for courses, to be more effective (and simpler for us to program) within specific ‘domains’:

  • Scripting languages suited to obtaining information from, or managing, the operating system in a concise manner (Bash, csh, VBScript), to controlling web browsers (JavaScript), or even application-specific ones to customise video games (QuakeC, GML).
  • Query languages (e.g. SQL) to, yes, query databases or other information sources such as documents (e.g. CQL) or chess games (again CQL – we're running out of acronyms) and, as with natural languages, each of these may have evolved its own dialects (T-SQL, MySQL, PostgreSQL) targeted at specific database engines.
  • Markup languages to format and display data on the screen for the web (e.g. HTML) or in typesetting (LaTeX).
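To give a flavour of a query language, here is a little SQL run through Python's built-in sqlite3 module (the table and its contents are invented for illustration):

```python
import sqlite3

# An in-memory database with a made-up 'games' table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE games (player TEXT, result TEXT)")
con.executemany("INSERT INTO games VALUES (?, ?)",
                [("Ada", "win"), ("Charles", "draw")])

# Declarative: we state *what* we want, not how to search for it.
rows = con.execute(
    "SELECT player FROM games WHERE result = 'win'").fetchall()
print(rows)  # prints: [('Ada',)]
```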

The domain-specific list gallops on, becoming ever more specific, now aided by programs designed to design them (e.g. MPS).

Incredibly, if we ignore automated weaving machines, carillons, and Charles Babbage's difference engine, all of this has happened since 1949. With new domains opening up all the time, and programmers who seem to love creating new languages for the sheer hell of it3, will it ever end?


An evolutionary parallel?

While the details of the evolutionary roots of natural language are still under debate, it's clear that once the underlying ability had taken hold, languages proliferated mainly through geographic and social separation, in a manner similar to the evolution of species4, and at an exponential rate: estimates for the origin of vocal ability stand at 3.5 to 4.5 million years ago, early proto-language at 80,000 to 160,000 years, and Eurasiatic – the root of languages from English and Urdu to Japanese – at only 15,000 years. The Ethnologue estimates that in 2009 there were nearly 7,000 of them5.

However, research indicates that, due to global communications and the pressure to learn more dominant languages, one third of them will be in danger within the next few decades and that between 50% and 90% will be extinct by 2100. The loss of a natural language is a loss to our culture but may be an economic necessity6 for the community involved. Will programming languages follow suit?

There is evidence both ways. For example, modern programming frameworks (e.g. .Net Core, Dapr) are designed to allow programs written in a single language to be 'ported' to run on machines with different operating systems. Some languages have become widespread and learning them broadens the available job market – a similar pressure to that on natural languages. But although we may need fewer languages for new projects, there are millions of legacy programs requiring upgrades and maintenance, and every now and then, as with the "millennium bug", it is crucial for those skills to be available.

Ironically, counter to natural language, the loss of programming languages may be of minor cultural interest but have a worldwide economic impact.  We may be stuck with a lot of them.


1 This doesn’t exclude us from building programs that permit contextual understanding at a higher level or from building new architectures that emulate neural connectivity, both of which are areas of active research.

2 See ‘The Language Instinct’ (1994) by Steven Pinker for an extensive argument

3 Is this reasoning comparable to the creation of Klingon?

4 Although, unlike species, languages absorb vocabulary from others due to migration, trade and invasion.  Maybe this feature of languages is more like the horizontal transfer of genes in bacterial evolution.

5 See an excellent article by Stephen R Anderson of the Linguistic Society of America

6 Reading the comments of this article you’d be forgiven for thinking that the majority would settle for English alone.