Language Bureaucracy

Laziness, impatience and hubris are the three great virtues that every programmer should have, at least according to Larry Wall [1]. My experience so far has shown me that he was right: all programmers have these characteristics, and those who do not are usually not real programmers. Since they express these virtues through several programming languages, they tend to compare them. Usually this comparison ends in a phenomenon called a flame war: programmers take part in endless quarrels, exchanging arguments about language features, their standard (or not) libraries, and so on.

In my academic and professional life, I have participated in many conversations of that kind, spending hours talking about the coolness of C and C++, the inconsistencies of PHP and the ease of Python programming. I know; we have all fought our holy wars.

Almost two years ago, I co-authored a publication with my PhD advisor Dr. Spinellis and Dr. Louridas, in which we conducted an experiment that involved several programming languages [2]. For the experiment we created a corpus of simple and more complex algorithmic problems, each one implemented in every language. Our primary source of code at the time was the Rosetta Code repository. The corpus is available on GitHub [3].

Then it occurred to me that I could use this code to finally find out which language is the most bureaucratic. But how could one measure that, and what does the term bureaucratic mean in a programming language context?

Terms & Hypothesis

The answer is simple: we measure the LoC (Lines of Code), that is, the lines of executable code of a computer program. A language that needs more lines for the same task is, in this sense, more bureaucratic. Since all programs perform identical tasks, we can directly compare the LoC for each language, and the one with the fewest lines wins. At the very least, this is a straightforward way to do it.
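As a rough illustration of the measurement, here is a minimal Python sketch. The post does not say exactly how the lines were counted, so the rule used here (skip blank lines and whole-line comments) is an assumption, and the names count_loc and least_bureaucratic are hypothetical helpers, not part of the corpus tooling.

```python
from typing import Dict


def count_loc(source: str, comment_prefix: str = "#") -> int:
    """Count lines that are neither blank nor whole-line comments.

    Assumption: a line counts as code unless it is empty or starts
    with the given single-line comment prefix.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


def least_bureaucratic(loc_by_language: Dict[str, int]) -> str:
    """Return the language with the fewest lines for a single task."""
    return min(loc_by_language, key=loc_by_language.get)
```

For a language such as C, the same function could be called with comment_prefix="//", although block comments would need extra handling that this sketch leaves out.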

The contestants were nine (9) of the most popular [4] programming languages: Java, C, C++, PHP, C#, Python, Perl, JavaScript, and Ruby. Fortran and Haskell were excluded because many tasks were not implemented in these two languages (we are working on that).

We selected 72 tasks, ranging from string tokenisation algorithms to anagrams and an implementation of Horner’s rule for polynomial evaluation.

Counting the LoC

The following graph illustrates the total LoC for all tasks per language:

It seems that Python is the big winner, with only 892 LoC for all the tasks, and C is the big loser with 2626 LoC. If we divide the languages into two categories, statically typed and dynamic, the latter are winning, at least on the program size front. The dynamic languages count 5762 lines in total, while the static languages have a combined LoC of around 9237. That is about 1.6 times as many lines, even though the static group contains four languages and the dynamic group five.
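The comparison between the two groups can be re-derived from the totals quoted above. The short Python sketch below does the arithmetic; the split of the nine languages into a dynamic and a static group is my assumption, based on the usual classification of each language.

```python
# Totals quoted in the post.
dynamic_total = 5762
static_total = 9237

# Assumption: grouping of the nine languages by typing discipline.
dynamic_languages = ["PHP", "Python", "Perl", "JavaScript", "Ruby"]
static_languages = ["Java", "C", "C++", "C#"]

# Ratio of the combined totals.
total_ratio = static_total / dynamic_total

# Ratio of the per-language averages, which corrects for group size.
dynamic_average = dynamic_total / len(dynamic_languages)
static_average = static_total / len(static_languages)
average_ratio = static_average / dynamic_average

print(round(total_ratio, 2))    # 1.6
print(round(average_ratio, 2))  # 2.0
```

On combined totals the static group is about 1.6 times larger; per language it is about twice as large.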

Counting the Winners

In addition, let’s examine which languages took first place (had the minimum LoC) in each task. The following figure illustrates the number of wins for each language.

Python is again (as expected) the big winner here, with 22 wins. JavaScript, Perl and Ruby follow with 14, 14 and 13 wins respectively. Surprisingly, PHP has only three wins, and C++ and C# have one win each. The other languages never took first place in any task.
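Counting the wins amounts to picking the minimum LoC per task and tallying the results. The Python sketch below shows one way to do it. The LoC figures in toy_corpus are invented purely for illustration, and the tie-breaking rule (alphabetical order) is an assumption, since the post does not say how ties were handled.

```python
from collections import Counter
from typing import Dict


def count_wins(loc_per_task: Dict[str, Dict[str, int]]) -> Counter:
    """Tally how many tasks each language wins with the minimum LoC."""
    wins = Counter()
    for loc_by_language in loc_per_task.values():
        # Assumption: ties go to the alphabetically first language,
        # because min() returns the first minimal item it sees.
        winner = min(sorted(loc_by_language), key=loc_by_language.get)
        wins[winner] += 1
    return wins


# Hypothetical LoC figures, not taken from the real corpus.
toy_corpus = {
    "anagrams": {"Python": 8, "Ruby": 9, "C": 41},
    "horner": {"Python": 5, "Perl": 4, "Java": 14},
    "tokenize": {"JavaScript": 3, "PHP": 6, "C++": 19},
}
wins = count_wins(toy_corpus)
```

With this toy data, Python, Perl and JavaScript each win one task.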

Threats to Validity, Conclusions and Future Work

One may notice that the wins add up to 68 instead of 72. The remaining four tasks were won by Fortran, which was excluded from this experiment. Since the repository is still very active, and many of the tasks are being re-organised and re-implemented to better suit the ongoing research process, I never sanitised the data, so there may be errors. I do not think the results were affected though, since the dynamic languages won the match and dominated the statically typed languages.

While writing this blog entry I consulted two friends of mine, who offered two very interesting perspectives. Dimitris Mitropoulos suggested also taking into account the character count of each line, and George Oikonomou suggested applying various voting systems [5] to the language ranking for each task, thus finding the real winner.
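To give an idea of what the voting suggestion could look like, here is a minimal Python sketch of one positional voting system, the Borda count. The choice of the Borda count is mine, not necessarily the system George had in mind. Each task acts as a voter that ranks the languages by LoC; with n languages, first place earns n - 1 points and last place earns 0. The function name, the toy figures and the alphabetical tie-break are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict


def borda_scores(loc_per_task: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum Borda points per language, treating each task as one ballot."""
    scores = defaultdict(int)
    for loc_by_language in loc_per_task.values():
        # Rank by ascending LoC; assumption: ties are broken by name.
        ranking = sorted(
            loc_by_language,
            key=lambda language: (loc_by_language[language], language),
        )
        n = len(ranking)
        for position, language in enumerate(ranking):
            scores[language] += n - 1 - position
    return dict(scores)


# Hypothetical LoC figures, not taken from the real corpus.
toy_ballots = {
    "task_a": {"Python": 5, "Ruby": 6, "C": 20},
    "task_b": {"Python": 7, "Ruby": 4, "C": 15},
}
scores = borda_scores(toy_ballots)
```

Unlike a plain count of first places, this scheme also rewards a language for consistently finishing second or third.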

I considered both approaches, and I think they would produce interesting results, but first I want to sanitise the data set further and examine the quality attributes of the code more closely.

References

[1] Larry Wall, Programming Perl, 1st Edition, O’Reilly and Associates
[2] Diomidis Spinellis, Vassilios Karakoidas, and Panagiotis Louridas. Comparative language fuzz testing: Programming languages vs. fat fingers. In PLATEAU 2012: 4th Annual International Workshop on Evaluation and Usability of Programming Languages and Tools–Systems, Programming, Languages and Applications: Software for Humanity (SPLASH 2012). ACM, October 2012.
[3] https://github.com/bkarak/fuzzer-fat-fingers
[4] Ritchie S. King. The top 10 programming languages. IEEE Spectrum, 48(10):84, October 2011.
[5] http://en.wikipedia.org/wiki/Positional_voting_system

XRDS

Crossroads – The ACM Magazine for Students
