
Tables and Strings in COBOL

Big Data like it's 1985

Spring 2022

I recently came across a blog post dealing briefly with the concept of strings, tables and subscripting in COBOL. While the code in the blog post works just fine, I personally think it's overcomplicating a very simple use case (subscripting a string) and underselling a powerful COBOL feature (tables). Since I'm a deeply demented man with a lot of free time on my hands, I decided to expand a bit on the subject - if only to give myself a chance of brushing up on my own very rudimentary COBOL knowledge. Feel free to point out any errors.

Subscripting Strings

The original blog post is correct in that strings in COBOL can't strictly speaking be subscripted. It's also correct in that you can create a table with an item length of one, put your string in that and then subscript it. That is, if you have the string "Hello, World!" in a table accessible as mystring and want to access the first character ("H"), you'll be able to write mystring(1).
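For reference, here's a minimal sketch of what that table approach might look like. The names mystring-tbl and "CharTable" are my own placeholders, not taken from the original post, and I'm assuming a compiler that accepts free-format source (GnuCOBOL's cobc with the -free flag, for example).

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. "CharTable".

DATA DIVISION.
LOCAL-STORAGE SECTION.
*> A table of thirteen one-character items (illustrative names).
  01 mystring-tbl.
    02 mystring PIC X OCCURS 13 TIMES.

PROCEDURE DIVISION.
*> Moving to the group item fills the one-character slots in order.
  MOVE "Hello, World!" TO mystring-tbl.
*> Subscript the first slot; this should print "H".
  DISPLAY mystring(1).
  STOP RUN.
```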

But there's a much easier and more powerful way to do this in COBOL, called reference modification - or as many other languages call it, substrings. It's easier because you don't have to define a table and more powerful because, unlike the table solution, you can access an arbitrary length of the string in one go.

If mystring is instead an ordinary string, you can access its first character with mystring(1:1). The integer before the colon is the starting character position (counting from 1) and the integer after it is the desired substring length.

This can of course return an arbitrary string length from an arbitrary position, such as mystring(2:4) and even the remainder of a string from a given starting point, such as mystring(3:). In short, it's similar to substring handling in many other languages.
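As a small sketch of the three forms just mentioned, assuming mystring is declared as a 13-character field holding "Hello, World!" (the declaration and program name are my own, and free-format source is assumed):

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. "Substrings".

DATA DIVISION.
LOCAL-STORAGE SECTION.
  01 mystring PIC X(13) VALUE "Hello, World!".

PROCEDURE DIVISION.
*> One character, starting at position 1.
  DISPLAY mystring(1:1).
*> Four characters, starting at position 2.
  DISPLAY mystring(2:4).
*> Everything from position 3 to the end of the field.
  DISPLAY mystring(3:).
  STOP RUN.
```

This should print "H", "ello" and "llo, World!", one per line.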

Reference modification can also be used for value assignment, as in the following program:

IDENTIFICATION DIVISION.
PROGRAM-ID. "Strings".

DATA DIVISION.
LOCAL-STORAGE SECTION.
  01 string-a PIC X(10).
  01 string-b PIC X(20).

PROCEDURE DIVISION.
  MOVE ALL "foo" TO string-a.
  DISPLAY string-a.
  MOVE ALL "bar" TO string-b.
  DISPLAY string-b.
  MOVE string-a(2:5) TO string-b(8:5).
  DISPLAY string-b.
  STOP RUN.

This will yield the following output:

foofoofoof
barbarbarbarbarbarba
barbarboofoobarbarba

Doing something similar with tables alone would require a lot of STRINGing together of individual characters into temporary variables. Since reference modification was introduced in COBOL-85, I dare say it's going to be available on all but the most ancient of legacy systems.

Creating Tables

This doesn't mean that tables aren't useful, because they provide additional constructs and abstractions for working with data. Consider the following code:

DATA DIVISION.
LOCAL-STORAGE SECTION.
  01 str-tbl.
    02 str3 PIC XXX OCCURS 5 TIMES.

Here, we've defined the table str-tbl, which can hold five str3 items, each with a length of three characters (that's what XXX means; it can also be written as X(3)). Now, let's populate it with some items:

PROCEDURE DIVISION.
  MOVE "abc" TO str3(1).
  MOVE "def" TO str3(2).
  MOVE "ghi" TO str3(3).
  MOVE "jkl" TO str3(4).
  MOVE "mno" TO str3(5).

If we wanted to pick the fifth element from a populated str-tbl, we'd subscript it by referencing the item name: str3(5). We'd now get a three-character string back, since that's how we've defined it. So, DISPLAY str3(5) would print "mno".

This subscript can be combined with reference modification, which means that DISPLAY str3(5)(1:2) would print "mn".

We can still deal with the whole table as a string, meaning DISPLAY str-tbl will print "abcdefghijklmno" and DISPLAY str-tbl(1:1) will print "a".
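Put together, the fragments above could form a complete program along these lines. This is just a sketch; the program name is my own and free-format source is assumed.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. "StringTable".

DATA DIVISION.
LOCAL-STORAGE SECTION.
  01 str-tbl.
    02 str3 PIC XXX OCCURS 5 TIMES.

PROCEDURE DIVISION.
  MOVE "abc" TO str3(1).
  MOVE "def" TO str3(2).
  MOVE "ghi" TO str3(3).
  MOVE "jkl" TO str3(4).
  MOVE "mno" TO str3(5).

*> A single item, then a substring of that item.
  DISPLAY str3(5).
  DISPLAY str3(5)(1:2).
*> The whole table as one string, then a substring of it.
  DISPLAY str-tbl.
  DISPLAY str-tbl(1:1).
  STOP RUN.
```

It should print "mno", "mn", "abcdefghijklmno" and "a", in that order.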

Why call it "Tables"?

Things in COBOL often differ from other languages, because COBOL is, in many ways, not like other languages. That could perhaps suffice as an explanation of why tables are called tables, but I'd argue that the reason they're called tables is because they are, well, tables. They can be subdivided into multiple fields, and they can be sorted and searched in ways that are reminiscent of SQL.

Consider the following table definition:

01 mix-tbl.
    02 mix-item OCCURS 4 TIMES.
      03 mix-num PIC 99.
      03 mix-str PIC XXX.

Here, we've defined the table mix-tbl, in which we'll store four of the item (or record, as the COBOL lingo goes) mix-item. The record itself consists of both a numerically formatted mix-num field and the alphanumeric field mix-str. (Having the option of arbitrarily formatted fields in a table record means you could feed in the numbers "07250" and get a nicely formatted cost back, e.g. "$72,50". How's that for the awesome power of COBOL, eh?)
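To give a rough idea of that kind of formatting, here's a sketch using a standalone field rather than a table record. All names are made up for the example, and the picture uses a decimal point; getting a decimal comma as in "$72,50" should be a matter of declaring DECIMAL-POINT IS COMMA in SPECIAL-NAMES and writing the picture with a comma instead.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. "Formatting".

DATA DIVISION.
LOCAL-STORAGE SECTION.
*> Five raw digits, reinterpreted as three integer digits
*> followed by two implied decimals.
  01 raw-digits PIC X(5) VALUE "07250".
  01 raw-amount REDEFINES raw-digits PIC 9(3)V99.
*> Edited picture: floating currency sign, leading zeros suppressed.
  01 pretty-amount PIC $$$9.99.

PROCEDURE DIVISION.
*> The MOVE into an edited field is what applies the formatting.
  MOVE raw-amount TO pretty-amount.
  DISPLAY pretty-amount.
  STOP RUN.
```

The displayed value should read 72.50 with a dollar sign in front of it, possibly padded with a leading space.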

We can now populate this table in a number of ways, though I strongly advise always populating individual record fields. Here are a few different varieties:

MOVE "03Aaa" TO mix-tbl.
MOVE "11Bbb" TO mix-item(2).
MOVE 2 TO mix-num(3).
MOVE "Ccc" TO mix-str(3).
MOVE 2 TO mix-num(4).
MOVE "Ddd" TO mix-str(4).

We could of course also populate our table by reading from a file, but let's leave that for another time.

Bad Table Practices

It's important to note that COBOL will only format our numeric values for us if we perform atomic assignments to the individual record fields. If the first assignment above had read MOVE "3Aaa" TO mix-tbl, we'd have quite a problem on our hands, because COBOL would then happily put "3A" into our mix-num field.
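One way to see the problem for yourself is a class test on the field after each kind of MOVE. This is a sketch of my own (program name and messages are invented), assuming free-format source:

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. "BadMove".

DATA DIVISION.
LOCAL-STORAGE SECTION.
  01 mix-tbl.
    02 mix-item OCCURS 4 TIMES.
      03 mix-num PIC 99.
      03 mix-str PIC XXX.

PROCEDURE DIVISION.
*> Group move: characters are copied as-is, so mix-num(1)
*> ends up holding "3A".
  MOVE "3Aaa" TO mix-tbl.
  IF mix-num(1) IS NUMERIC
    DISPLAY "mix-num(1) looks fine"
  ELSE
    DISPLAY "mix-num(1) is not numeric"
  END-IF.

*> Field-level moves: the number is zero-padded to fit PIC 99.
  MOVE 3 TO mix-num(1).
  MOVE "Aaa" TO mix-str(1).
  DISPLAY mix-item(1).
  STOP RUN.
```

The first check should take the ELSE branch, and the final DISPLAY should show "03Aaa".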

With that out of the way, let's continue on!

Sorting Tables

If we wanted to look at mix-tbl in its entirety, we could now simply DISPLAY mix-tbl, which would give us the output "03Aaa11Bbb02Ccc02Ddd".

We could also access for example mix-item(3), giving "02Ccc", or mix-num(3) and mix-str(3) giving "02" and "Ccc", respectively. We can also easily sort the table using the SORT instruction.

SORT mix-item
  ASCENDING mix-num
  DESCENDING mix-str.

Note that just like in SQL, we can sort fields in different orders and according to an arbitrary chain of precedence. The table is now sorted in place; mix-num(3) will give us "03" and mix-str(3) will give us "Aaa".

Here's the entire program:

IDENTIFICATION DIVISION.
PROGRAM-ID. "Sorting Tables".

DATA DIVISION.
LOCAL-STORAGE SECTION.
  01 mix-tbl.
    02 mix-item OCCURS 4 TIMES.
      03 mix-num PIC 99.
      03 mix-str PIC XXX.

PROCEDURE DIVISION.
  MOVE "03Aaa" TO mix-tbl.
  MOVE "11Bbb" TO mix-item(2).
  MOVE 2 TO mix-num(3).
  MOVE "Ccc" TO mix-str(3).
  MOVE 2 TO mix-num(4).
  MOVE "Ddd" TO mix-str(4).

  DISPLAY mix-tbl.
  DISPLAY mix-item(2).
  DISPLAY mix-str(3).
  DISPLAY mix-num(3).

  SORT mix-item
    ASCENDING mix-num
    DESCENDING mix-str.

  DISPLAY mix-tbl.
  DISPLAY mix-item(2).
  DISPLAY mix-str(3).
  DISPLAY mix-num(3).
  DISPLAY mix-str(4).
  DISPLAY mix-num(4).

  STOP RUN.

It should produce the following output:

03Aaa11Bbb02Ccc02Ddd
11Bbb
Ccc
02
02Ddd02Ccc03Aaa11Bbb
02Ccc
Aaa
03
Bbb
11

Searching Tables

Tables can also be searched. In order to perform a search, our table must be indexed, which we'll tell it with the INDEXED BY instruction when defining it:

DATA DIVISION.
LOCAL-STORAGE SECTION.
  01 product-tbl.
    02 product-item OCCURS 5 TIMES INDEXED BY idx.
      03 product-name PIC X(8).
      03 product-price PIC $ZZ.
  77 search-query PIC X(8).

Once this table is populated, we can search it using the SEARCH construct, which follows a common pattern in COBOL. It's got two sub-clauses: WHEN, which states the condition we're looking for, and AT END, which in the case of SEARCH means we've reached the end of the table without finding an item matching our search criterion. A plain SEARCH starts looking at the current value of the index, so the index should be set to 1 before searching the whole table. (When reading files in COBOL, you perform your typical line reading in a NOT AT END clause, which I find both confusing and amusing.)

SEARCH product-item
  AT END
    DISPLAY "No matches for "search-query
  WHEN product-name(idx) = search-query
    DISPLAY product-name(idx)": "product-price(idx)
END-SEARCH.

In this case, we're searching for a product name and when it's found, we display its price. Here's all of the code:

IDENTIFICATION DIVISION.
PROGRAM-ID. "Searching".

DATA DIVISION.
LOCAL-STORAGE SECTION.
  01 product-tbl.
    02 product-item OCCURS 5 TIMES INDEXED BY idx.
      03 product-name PIC X(8).
      03 product-price PIC $ZZ.
  77 search-query PIC X(8).

PROCEDURE DIVISION.

*> Populate and print our table.
  PERFORM VARYING idx FROM 1 BY 1 UNTIL idx > 5
    STRING "Product" FUNCTION CHAR(65 + idx) DELIMITED BY SIZE
      INTO product-name(idx)
    COMPUTE product-price(idx) = idx * 10
    DISPLAY product-name(idx) " : " product-price(idx)
  END-PERFORM.

*> Search with mismatch.
  MOVE "NotFound" TO search-query.
  PERFORM Search-Table.

*> Search with match.
  MOVE "ProductC" TO search-query.
  PERFORM Search-Table.

  STOP RUN.

  Search-Table.
    SET idx TO 1.
    SEARCH product-item
      AT END
        DISPLAY "No matches for "search-query
      WHEN product-name(idx) = search-query
        DISPLAY product-name(idx)": "product-price(idx)
    END-SEARCH.

The above program should output the following:

ProductA : $10
ProductB : $20
ProductC : $30
ProductD : $40
ProductE : $50
No matches for NotFound
ProductC: $30

On a sorted table, we could also perform a binary search using SEARCH ALL, provided the table definition names its sort key with an ASCENDING KEY or DESCENDING KEY clause.
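A binary search on the product table might look something like the sketch below. I'm assuming the table is declared with an ASCENDING KEY clause on product-name and populated in that order; the program name and the hard-coded product data are my own additions.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. "BinarySearch".

DATA DIVISION.
LOCAL-STORAGE SECTION.
  01 product-tbl.
    02 product-item OCCURS 5 TIMES
        ASCENDING KEY IS product-name
        INDEXED BY idx.
      03 product-name PIC X(8).
      03 product-price PIC $ZZ.
  77 search-query PIC X(8).

PROCEDURE DIVISION.
*> SEARCH ALL trusts us to keep the table sorted by its key.
  MOVE "ProductA" TO product-name(1).
  MOVE 10 TO product-price(1).
  MOVE "ProductB" TO product-name(2).
  MOVE 20 TO product-price(2).
  MOVE "ProductC" TO product-name(3).
  MOVE 30 TO product-price(3).
  MOVE "ProductD" TO product-name(4).
  MOVE 40 TO product-price(4).
  MOVE "ProductE" TO product-name(5).
  MOVE 50 TO product-price(5).

  MOVE "ProductC" TO search-query.
*> No need to set idx first; SEARCH ALL manages the index itself.
  SEARCH ALL product-item
    AT END
      DISPLAY "No matches for " search-query
    WHEN product-name(idx) = search-query
      DISPLAY product-name(idx) ": " product-price(idx)
  END-SEARCH.
  STOP RUN.
```

This should print "ProductC: $30". Note that the WHEN condition in SEARCH ALL is more restricted than in a plain SEARCH: it has to be an equality test on the declared key.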

Another Dimension

Multi-dimensional tables can also be defined, by nesting one OCCURS clause inside another. We can add a second dimension to our mix-tbl:

IDENTIFICATION DIVISION.
PROGRAM-ID. "Multidimensional Tables".

DATA DIVISION.
LOCAL-STORAGE SECTION.
  01 mix-tbl.
    02 mix-item OCCURS 3 TIMES.
      03 mix-num PIC 99.
      03 mix-str PIC XXX.
      03 mix-sub OCCURS 3 TIMES.
        04 sub-num PIC 99.

PROCEDURE DIVISION.
  MOVE 30 TO mix-num(3).
  MOVE 31 TO sub-num(3,1).
  MOVE 32 TO sub-num(3,2).
  MOVE 33 TO sub-num(3,3).

  DISPLAY sub-num(3,2).

  STOP RUN.

This will now of course output "32".

A note on pointers

Another way of accessing arbitrary positions in COBOL strings is by using pointers. They're not exactly of the C variety, though they have some vague similarities to the pointer arithmetic used when working with string parsing in C. COBOL pointers are used together with the instructions STRING and UNSTRING, to keep track of character positions during parsing/tokenization.
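As a taste of what that might look like, here's a small sketch that pulls the first two comma-separated tokens out of a string with UNSTRING and its WITH POINTER phrase. The field names, the sample data and the program name are all invented for the example.

```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. "Pointers".

DATA DIVISION.
LOCAL-STORAGE SECTION.
  01 csv-line PIC X(20) VALUE "abc,def,ghi".
  01 token PIC X(5).
*> The pointer is an ordinary integer field holding a character
*> position; it must be set to at least 1 before use.
  01 char-pos PIC 99 VALUE 1.

PROCEDURE DIVISION.
  PERFORM 2 TIMES
*>  Each UNSTRING resumes where the previous one stopped and
*>  leaves char-pos just past the delimiter it found.
    UNSTRING csv-line DELIMITED BY ","
      INTO token
      WITH POINTER char-pos
    END-UNSTRING
    DISPLAY token " (next position: " char-pos ")"
  END-PERFORM.
  STOP RUN.
```

The first pass should yield "abc" and leave char-pos at 5, the second "def" with char-pos at 9.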

STOP RUN.

That's enough COBOL for one helping. Thanks for reading and Happy Hacking!