{ datagubbe }

A Lasting Legacy: Thoughts on COBOL

Autumn 2021

I'm getting increasingly fascinated by mainframes. I'm not sure why - maybe because they're the final outpost of actual old-timey computing still around in these hectic days of clouds, browsers and framework fatigue. Not the mainframes themselves, perhaps - they've progressed immensely over the years - but rather the way developers interact and interface with them. One of the most famous ways to do that is of course to program them using COBOL.

The history of COBOL has been documented in detail elsewhere, but the short version is that rear admiral Grace Hopper invented the compiler and then constructed FLOW-MATIC, an English-like high level programming language that predates both LISP and ALGOL (but just by a little). This proved to be such a productivity boost in the field of programming that a proliferation of similar languages soon appeared. Fearing future problems in hiring, maintenance and compatibility, a committee called CODASYL was formed and tasked with creating an industry standard for a FLOW-MATIC-like language. They did, and they named it COBOL, for COmmon Business-Oriented Language.

Today, COBOL is usually either the butt of cruel jokes or a mythical concept in programmer lore. The story usually goes that a COBOL guru is rushed in by a massive corporation to write a few lines of program code in exchange for tremendous amounts of money, saving the world from a bug that's been waiting to happen since some time in 1967. Unlike many other old languages like Assembler, LISP, C, BASIC and Pascal, COBOL seems to stand by itself in discussions about software development. To many developers, it's an afterthought - so much so that when Jonathan Blow cooks up doomsday scenarios, it's a fictional lack of C programmers that threatens civilization rather than most IT professionals' complete disinterest in COBOL - the language that runs both their bank accounts and their airline bookings to Very Important Conferences.

Why is that, exactly?

Strengths set in stone

As far as computer related standards go, COBOL is a likely candidate for being the most successful one. The language has changed very little over time and code written in 1991 or even 1971 is likely to compile and run in 2021 with little or no modification. It was designed from the ground up to be highly portable and code is written comfortably without knowledge of OS- or machine-specific API calls.

It's also a domain specific language, purposely made for processing large amounts of business transactions. It's very good at this, for several reasons. One is the design of the language itself: in-program data structures and variables are defined to adhere strictly to fixed-length fields in a set of records. Simple constructs for reading, writing, sorting and searching datasets are readily available. We might take this for granted today but when COBOL was designed, programs were just as likely to be written in Assembler or even machine code.
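
To make the fixed-length idea concrete for readers more at home in JavaScript (the language used for comparison later in this text), here is a minimal sketch of cutting a record into fields purely by column position. The record layout, the field names and the sample data are invented for illustration and don't come from any real system.

```javascript
// Hypothetical 23-character record layout: account (6), name (10), amount in cents (7).
const LAYOUT = [
  { name: "account", start: 0, length: 6 },
  { name: "name", start: 6, length: 10 },
  { name: "amountCents", start: 16, length: 7 },
];

// Cut each field out of the record by position alone, much like a COBOL record description.
function parseRecord(record) {
  const fields = {};
  for (const { name, start, length } of LAYOUT) {
    fields[name] = record.slice(start, start + length);
  }
  return fields;
}

// Build a sample record; padEnd supplies the filler spaces a fixed-length field would have.
const record = "001234" + "ADA".padEnd(10) + "0012550";
const parsed = parseRecord(record);

console.log(parsed.account);                   // "001234"
console.log(parsed.name.trimEnd());            // "ADA"
console.log(Number(parsed.amountCents) / 100); // 125.5
```

Nothing in the record itself says where one field ends and the next begins; that knowledge lives entirely in the layout, which is roughly the role the DATA DIVISION plays in a COBOL program.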

Another reason is that COBOL compilers have been produced by mainframe hardware manufacturers with intricate first hand knowledge of the execution architecture. Since both the language and the production code base have stayed largely unchanged for decades, efforts to optimize compilers for speed and reliability can be prioritized over adding new features.

While some COBOL dialects (and the COBOL 2002 standard) allow for the creation of curses-like TUIs, such user interfaces may just as well be created using another language, even on the mainframe itself. COBOL is the backend for churning data, unaffected by the silly whims of nervous designers. Sure, a COBOL application may have gotten some new parts to comply with the output format fads of the time - CSV, XML, JSON - but they're all short-lived hypes compared to the tried and tested mainframe datasets used for doing the real work.

Sugar is bad for you

Consider the following snippet of JavaScript code, made possible with the advent of ECMAScript 6 in 2015:
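
The snippet itself isn't reproduced here. A one-liner in the same spirit, offered as an illustrative sketch rather than the exact original, could look like this:

```javascript
// Illustrative only: an arrow function that is handed itself as an argument so it can recurse.
console.log((f => f(f, 5))((f, n) => n ? n * f(f, n - 1) : 1)); // prints 120
```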


While not exactly indecipherable (a seasoned JavaScript professional should have no trouble parsing it), it's certainly not easy to read, either. Executing this line of code will print the number 120, which, given the input, might hint that it's a factorial function.

Most JavaScript developers don't write code like this (I hope), but the makings are there, creating an itch to use them. Syntactic sugar like this can be very pleasant for developers, especially when they know a language well. Introducing new features and adding new sugar isn't necessarily bad - but apart from churning out code, a developer must spend time and energy to keep up to date with changes in their language of choice. It also means that someone who's learned JavaScript during the last few years is unlikely to recognize the way problems were solved and code was written twenty, fifteen or even just ten years ago. And that's just the language itself - add a few different browser versions and frameworks on top of that and the chaos is complete.

Not so with COBOL. As with most languages, there are many slightly different ways to implement a factorial algorithm in it, but all of them force the developer to write readable code. Even the most compact COBOL code will seem rather verbose to people used to newer high level languages. Here's a fairly short implementation:


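       IDENTIFICATION DIVISION.
       PROGRAM-ID. FACTORIAL-DEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01 NUM PIC 9 VALUE 5.
       01 RES PIC 9(36) VALUE 1.
       PROCEDURE DIVISION.
           PERFORM UNTIL NUM = 0
             MULTIPLY NUM BY RES
             SUBTRACT 1 FROM NUM
           END-PERFORM
           DISPLAY RES
           STOP RUN.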

There simply is no way around the fact that you must declare all your variables, they must be declared in the DATA DIVISION and they nearly always need a PIC (picture) describing their format and size. If you don't like the rather chatty "multiply by" and "subtract from" you can use the COMPUTE statement, but it still requires you to clearly formulate and format your code:

            COMPUTE RES = NUM * RES
            COMPUTE NUM = NUM - 1

Maintaining legacy code is always hard, but it's even harder when a language changes significantly every couple of years. Once a programmer knows COBOL, they know COBOL - and can focus on understanding a particular application and code base rather than language intricacies and frameworks. The JavaScript example above may be a bit exaggerated, but many programmers (myself included) can't really refrain from being "clever" from time to time, producing something that's more of a geeky flex instead of succinct, maintainable code. COBOL effectively puts a stop to that, enabling future maintainers to focus on the task at hand rather than decrypting obfuscated ego stroking left there by their predecessor.

Scientists, hackers and suits

Common explanations for COBOL's obscurity are either that it's not a product of academia or that its corporate origins made hackers shun it from day one. I believe there's some truth to both theories, but I think it's obscure mostly because A) it runs on mainframes and mainframes are fairly scarce and B) it frankly isn't particularly useful to anyone except the kind of people who need a mainframe or two to keep their business running smoothly.

Many popular languages originate from decidedly corporate settings: SQL and C are prime examples, as is JavaScript. The latter was invented at Netscape (a corporation), hijacked by Microsoft (another corporation) and is now controlled by a dull standards committee rather than a disorganized band of bearded freedom lovers. In fact, most popular languages today are produced by megacorps and/or governed by standards bodies and foundations, filled to the brim with managerial professionals financed by corporate stakeholders. PHP is one of the few widely used languages with genuinely hackerish roots and it gets just as much flak as COBOL, if not more.

Instead, I'd say the main difference between COBOL and languages preferred by hacker type persons isn't the suits and ties behind them, but rather that hacker type languages invite tinkering and experimenting, whereas COBOL invites formatting evenly spaced columns of purchase orders.

It's true that COBOL isn't held in high esteem in computer science circles. Edsger Dijkstra once claimed that "the use of COBOL cripples the mind," but then again he was also of the opinion that anyone who's been exposed to BASIC programming is "mentally mutilated beyond hope of regeneration." I'm not sure if I agree with those particular assessments - in fact, a lot of computer scientists today probably got started with BASIC on an 8-bit micro. But I digress. COBOL just isn't a very interesting language to computer science type persons because it's not made for solving computer science type problems. Despite this, COBOL courses have been offered at many universities and schools because, if nothing else, it's a useful skill when looking for gainful employment.

A few COBOL compilers have been released for home computers and PCs, but the typical home computer programmer was (and is) probably interested in making software outside the scope of COBOL's strengths, such as games and the software used to create them (e.g. text editors, pixel painters and music trackers). Typical office dwellers will probably rather use a spreadsheet to tackle COBOL-like workloads on a smaller scale, or their PC simply runs an interface program connected to a mainframe running COBOL software written by someone else. Today, anyone so inclined can download GnuCOBOL and start writing intricate business logic within minutes.

In short, COBOL is an obscure language because its use cases are obscure, not because it was designed by a committee or because it's old and cumbersome to write. Other old and cumbersome languages are also in use for similarly specialized tasks: plenty of people still code in Assembler and FORTRAN.

COBOL - the bad parts

I wouldn't want to implement a digraph in COBOL, but that's OK: it's not designed for that. There are however other, more basic aspects of it that might cause a certain aversion among programmers today. I don't think I could cook up a way to do reliable recursion in it. It's got a single global scope. Labels are used for flow control, though subroutines can be constructed and CALLed by the main program, oftentimes as separate programs (a concept familiar to REXX and ARexx aficionados). COBOL 2002 introduced user defined functions and object oriented programming, but it seems few COBOL shops utilize such new-fangled trickery. Reasons for this may be poor compiler support, fear of clashes with an existing code base and plain old programmer stubbornness.

Still, considering the fact that COBOL is a domain specific language and factoring in its age, parts of it are both surprisingly powerful and delightfully interesting. Some constructs feel eerily modern, like overflow handling:

            MULTIPLY NUM BY RES
              ON SIZE ERROR
                STOP RUN.
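
For comparison, here is a rough JavaScript sketch of what ON SIZE ERROR provides: arithmetic on a field with a fixed number of digits, where a result that doesn't fit is reported instead of silently stored. The helper name `multiplyChecked` and the three-digit field size are assumptions made for this example, not anything built into JavaScript.

```javascript
// Hypothetical helper mimicking MULTIPLY ... ON SIZE ERROR for an unsigned field of `digits` digits.
function multiplyChecked(target, factor, digits, onSizeError) {
  const limit = 10 ** digits; // a three-digit field holds 0..999
  const product = target * factor;
  if (product >= limit) {
    onSizeError(product);
    return target; // on a size error the target field keeps its old value
  }
  return product;
}

let res = 120;
res = multiplyChecked(res, 6, 3, () => console.log("size error")); // 720 fits in three digits
res = multiplyChecked(res, 7, 3, () => console.log("size error")); // 5040 does not fit
console.log(res); // 720
```

In JavaScript this check has to be written and remembered by hand. In COBOL it's a clause on the arithmetic statement itself.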

Despite such niceties, writing COBOL (including COBOL 2002) will be challenging to a programmer coming from other (high level) languages. It's not that COBOL is harder to learn than any other language - quite the opposite, I dare say - it's just that once you get past the initial hurdles of writing code in any language, you'll notice that stuff like lists, objects and first class functions are pretty nice inventions and that syntactic sugar, when used wisely, can increase productivity without necessarily making the code harder to read.

COBOL was designed to read like English, but in reality that's only true for very short programs: even for simple textbook examples the syntax soon grows unwieldy. The fact that all variables, not just the ones pertaining to dataset descriptions, must be declared in a separate section at the start of the program means there's going to be a lot of jumping back and forth between the top and bottom of the code when you realize you've forgotten to declare a temporary variable - something COBOL programs tend to use a lot of.

COBOL is also, because of its rigidity, showing its age even when performing the sort of work it was originally constructed to simplify.

Unstring my heart

Consider the ostensibly mundane task of moving the last word of an arbitrarily long string to its beginning, for example turning "up with it I will not put" into "put up with it I will not". Depending on your language of choice, you could for example find the position of the last space and concatenate the two portions together, or split the string into a list of words which can then be rearranged and joined back into a string.
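
Both approaches fit in a few lines of JavaScript. This is a minimal sketch, with function names chosen only for this example:

```javascript
// Approach 1: find the last space and swap the two portions.
function lastWordFirstByIndex(text) {
  const pos = text.lastIndexOf(" ");
  if (pos === -1) return text; // a single word stays as it is
  return text.slice(pos + 1) + " " + text.slice(0, pos);
}

// Approach 2: split into words, move the last one to the front, join again.
function lastWordFirstBySplit(text) {
  const words = text.split(" ");
  words.unshift(words.pop());
  return words.join(" ");
}

console.log(lastWordFirstByIndex("up with it I will not put")); // "put up with it I will not"
console.log(lastWordFirstBySplit("up with it I will not put")); // "put up with it I will not"
```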

This isn't impossible to do in COBOL, but to anyone who's used any popular modern language (for varying values of modern - both REXX and ANSI C can solve it more elegantly IMHO) it's going to feel rather clunky. COBOL has the UNSTRING statement which can split strings on arbitrary delimiters, but this alone won't get you all the way: UNSTRING typically requires you to know exactly how many parts a string can be split into and declare placeholder variables in advance.

COBOL also allows substrings: if the value of STR is "Hello there", DISPLAY STR(1:5) will echo "Hello". But to utilize this, you'd need to know the position of the relevant space. In COBOL, a string is always as long as you say it is when you declare it and it's rather tricky to tell where in a variable the actual string ends and filler spaces start.
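
The effect can be imitated in JavaScript by padding a string to a fixed width. The 20-character field size below is an assumption for the sake of the example, and the backwards scan is spelled out (rather than calling `trimEnd`) to mirror the kind of work that has to be done by hand:

```javascript
// Emulate a field declared with room for 20 characters: always 20 long, padded with spaces.
const STR = "Hello there".padEnd(20);

console.log(STR.length);      // 20, regardless of the content
console.log(STR.slice(0, 5)); // "Hello", the counterpart of STR(1:5)

// Finding where the content ends means scanning backwards past the filler spaces.
function contentLength(field) {
  let end = field.length;
  while (end > 0 && field[end - 1] === " ") end--;
  return end;
}

console.log(contentLength(STR)); // 11
```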

You can also declare arrays of sorts, called tables in COBOL, but they have roughly the same behavior as strings: if it's a table of strings, all indices up to the upper bound will be initialized to filler spaces and you must declare an upper bound. (In fact, a table of strings is more like a long string sectioned at given intervals.)
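
A sketch of that "long string sectioned at given intervals" view, again in JavaScript. The entry length, the entry count and the helper names are made up for this example:

```javascript
// Emulate a table of 4 entries of 8 characters each as one 32-character string.
const ENTRY_LENGTH = 8;
const ENTRY_COUNT = 4;
let table = " ".repeat(ENTRY_LENGTH * ENTRY_COUNT); // every entry starts out as filler spaces

// Store a value in entry `index` (1-based, as in COBOL), padded or truncated to the entry length.
function setEntry(tbl, index, value) {
  const start = (index - 1) * ENTRY_LENGTH;
  const cell = value.padEnd(ENTRY_LENGTH).slice(0, ENTRY_LENGTH);
  return tbl.slice(0, start) + cell + tbl.slice(start + ENTRY_LENGTH);
}

// Read entry `index` back, filler spaces included.
function getEntry(tbl, index) {
  const start = (index - 1) * ENTRY_LENGTH;
  return tbl.slice(start, start + ENTRY_LENGTH);
}

table = setEntry(table, 2, "put");
console.log(getEntry(table, 2).trimEnd()); // "put"
console.log(getEntry(table, 3).length);    // 8, an untouched entry is still eight spaces
console.log(table.length);                 // 32
```

The table never grows or shrinks. Whether an entry is "in use" is something the program has to track on its own.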

UNSTRING and its counterpart STRING can work in incremental steps and keep track of the last affected character position using a counter variable, called "pointer" in COBOL. Using INSPECT (used for counting character occurrences) and intrinsic functions (added in 1989) you can employ a few tricks to determine the actual end of a string's contents.

Knowing all of this, you could either employ an intricate series of UNSTRING and STRING operations using temporary variables (commonly prefixed with HOLD- in COBOL, such as HOLD-WORD) and pointers, or fill a table with the words, one at a time, which will require you to keep track of the number of occupied indices in the table by hand and to use PERFORM loops to manually count character positions.
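
To give a feel for the table-based variant, here is a JavaScript sketch deliberately written the way it has to be done by hand: a fixed upper bound, a manually maintained word count and an explicit position pointer. It is not a translation of any particular COBOL program. The bound of ten words and the 40-character field are assumptions, words beyond the bound are silently dropped, and consecutive spaces are not handled.

```javascript
const MAX_WORDS = 10; // assumed upper bound, standing in for the declared size of a table

function lastWordFirst(field) {
  const words = new Array(MAX_WORDS).fill("");
  let count = 0;   // number of occupied table entries, tracked by hand
  let pointer = 0; // 0-based here; COBOL pointers start at 1

  // Find the end of the real content by skipping trailing filler spaces.
  let end = field.length;
  while (end > 0 && field[end - 1] === " ") end--;

  // Extract one space-delimited word per iteration, advancing the pointer each time.
  while (pointer < end && count < MAX_WORDS) {
    let next = field.indexOf(" ", pointer);
    if (next === -1 || next > end) next = end;
    words[count] = field.slice(pointer, next);
    count++;
    pointer = next + 1;
  }

  // Rebuild: last word first, then the rest in their original order.
  let result = count > 0 ? words[count - 1] : "";
  for (let i = 0; i < count - 1; i++) {
    result += " " + words[i];
  }
  return result.padEnd(field.length); // the result is again a fixed-length field
}

const input = "up with it I will not put".padEnd(40);
console.log(lastWordFirst(input).trimEnd()); // "put up with it I will not"
```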

My point here is that tasks that are commonplace in transaction processing, such as working with strings, can quickly become a chore and you end up with code that neither reads very much like English nor resembles any commonly used programming language. Functionality that was no doubt a revelation in the 1960s must have started to feel crippling no later than in the 1980s, when even the most stubborn of mainframe programmers would've had a chance to compare it to the relative simplicity and elegance of REXX.

This doesn't mean that writing COBOL has to be a horrible experience. Modern development tools (including various kinds of code generation) help reduce boilerplate and excessive typing, and the limitations and quirks can be seen as either a source of frustration or as something to tackle with creativity.

The past, present and future of COBOL

From time to time, COBOL crops up in the news, usually touted as an arcane language on the verge of extinction, causing all kinds of mayhem at some governmental institution. Of course a poorly documented and maintained system written some time in the 1980s will cause trouble when it needs to be modified, but perhaps the issues with COBOL in the public sector are more related to the public sector than to COBOL: most banks run on COBOL and don't seem to have the same problems.

It's also sometimes mentioned that what was once the most widely used programming language in the world has now been relegated to an obscure niche. The fact is of course that owning or even working with computers was itself extremely niche during COBOL's dominant years, and the types of businesses using it back then were roughly the same as those using it today.

Many, if not all, COBOL systems are "legacy" in some sense. That's not particularly strange, after all: they're used by institutions and companies that often existed in some form well before the introduction of mainframes. The workloads they needed and still need computers for remain roughly the same now as when they bought their first mainframes way back when. COBOL has proven itself in this field; "It just works."

Fixing something that isn't broken is usually a bad idea and as programmers, I think we often confuse "boring" with "broken". Most of us would be fascinated with an old, purpose-built machine still performing an impressive amount of work, be it a combine harvester, a professional sewing machine (I know a tailor still using gear from the 1960s) or even an old computer - as long as it's doing something quaint, like managing a school's heating system. But when it comes to Serious Computers™ running Serious Software™, we all of a sudden want to replace things on account of their age alone - even if they're still running just fine, performing the one task they were once built and bought for.

It's fun and exciting (and usually more profitable - there are some very real economic incentives here as well) to build something new, but for some reason we often find it less interesting to dive head first into a complex old code base in order to figure out how the hell it works, especially if the sole purpose of that activity is to mend a few lines of code here and there, to keep up with changing business regulations and data formats.

Banks, however, care very little about what programmers find exciting. From time to time, this may be detrimental when trying to hire new programmers, but banks are usually neither stupid nor poor and they're not just sitting around twiddling their thumbs and waiting for their mission critical systems to fail. This can possibly lead to COBOL salaries at least periodically spiking high above average, but the mythical get rich quick stories are usually just stories. There are plenty of opportunities for highly sought after programmers with specific skillsets to negotiate a nice salary when hired, but that goes for all languages and stacks, not just COBOL.

One Swedish bank has started a COBOL training program, where thousands of applicants are screened and gradually narrowed down to a class of students who are hired and paid a highly livable salary just to study COBOL for six months. One can assume that most of them will be offered continued employment by the end of the semester. Such an undertaking is both expensive and time consuming - meaning at least this particular bank is in it for the long haul. Similar programs exist in other countries and lines of business as well.

In other words: COBOL isn't going away any time soon.

Disclaimer: I'm not an experienced COBOL programmer. Anyone who is, is encouraged to correct any possible misconceptions and errors by sending me electronic mail.