Tables and Strings in COBOL
Big Data like it's 1985
Spring 2022
I recently came across a blog post dealing briefly with the concept of strings, tables and subscripting in COBOL. While the code in the blog post works just fine, I personally think it's overcomplicating a very simple use case (subscripting a string) and underselling a powerful COBOL feature (tables). Since I'm a deeply demented man with a lot of free time on my hands, I decided to expand a bit on the subject - if only to give myself a chance of brushing up on my own very rudimentary COBOL knowledge. Feel free to point out any errors.
Table of Contents
- Subscripting Strings
- Creating Tables
- Why call it "Tables"?
- Bad Table Practices
- Sorting Tables
- Searching Tables
- Another Dimension
- A note on pointers
- STOP RUN.
Subscripting Strings
The original blog post is correct in that strings in COBOL can't, strictly speaking, be subscripted. It's also correct in that you can create a table with an item length of one, put your string in that and then subscript it. That is, if you have the string "Hello, World!" in a table accessible as mystring and want to access the first character ("H"), you'll be able to write mystring(1).
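For reference, the table approach described above might look roughly like this. It's only a sketch: the group name mystring-tbl and the program name are my own placeholders, not taken from the original blog post, and the layout is chosen so that it should compile in both fixed and free format.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CharTable.
       DATA DIVISION.
       LOCAL-STORAGE SECTION.
      *> A 13-character string stored as a table of 1-character items.
      *> The name mystring-tbl is a placeholder for this sketch.
       01 mystring-tbl.
          02 mystring PIC X OCCURS 13 TIMES.
       PROCEDURE DIVISION.
      *> A group MOVE fills the table, one character per item.
           MOVE "Hello, World!" TO mystring-tbl.
      *> Prints "H".
           DISPLAY mystring(1).
      *> Prints "!".
           DISPLAY mystring(13).
           STOP RUN.
```

Note that each subscript only ever gives back a single character, which is the limitation the next section gets around.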
But there's a much easier and more powerful way to do this in COBOL, called reference modification - or as many other languages call it, substrings. It's easier because you don't have to define a table and more powerful because, unlike the table solution, you can access an arbitrary length of the string in one go.
If mystring is instead an ordinary string, you can access its first character with mystring(1:1). The integer before the colon is the starting character position (counting from 1) and the integer after it is the desired substring length. This can of course return an arbitrary string length from an arbitrary position, such as mystring(2:4), and even the remainder of a string from a given starting point, such as mystring(3:). In short, it's similar to substring handling in many other languages.
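The three forms mentioned above can be tried out in a tiny program. This is a minimal sketch, assuming mystring is declared as a 13-character field holding "Hello, World!"; the program name is made up.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RefMod.
       DATA DIVISION.
       LOCAL-STORAGE SECTION.
       01 mystring PIC X(13) VALUE "Hello, World!".
       PROCEDURE DIVISION.
      *> One character starting at position 1: prints "H".
           DISPLAY mystring(1:1).
      *> Four characters starting at position 2: prints "ello".
           DISPLAY mystring(2:4).
      *> Everything from position 3 onwards: prints "llo, World!".
           DISPLAY mystring(3:).
           STOP RUN.
```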
Reference modification can also be used for value assignment, as in the following program:
       IDENTIFICATION DIVISION.
       PROGRAM-ID. "Strings".
       DATA DIVISION.
       LOCAL-STORAGE SECTION.
       01 string-a PIC X(10).
       01 string-b PIC X(20).
       PROCEDURE DIVISION.
           MOVE ALL "foo" TO string-a.
           DISPLAY string-a.
           MOVE ALL "bar" TO string-b.
           DISPLAY string-b.
           MOVE string-a(2:5) TO string-b(8:5).
           DISPLAY string-b.
           STOP RUN.
This will yield the following output:
       foofoofoof
       barbarbarbarbarbarba
       barbarboofoobarbarba
Doing something similar with tables alone would require a lot of STRING:ing together individual characters into temporary variables. Since reference modification was introduced in COBOL-85, I dare say it's going to be available on all but the most ancient of legacy systems.
Creating Tables
This doesn't mean that tables aren't useful, because they provide additional constructs and abstractions for working with data. Consider the following code:
       DATA DIVISION.
       LOCAL-STORAGE SECTION.
       01 str-tbl.
          02 str3 PIC XXX OCCURS 5 TIMES.
Here, we've defined the table str-tbl, which can hold five str3 items, each with a length of three characters (that's what XXX means; it can also be written as X(3)).
Now, let's populate it with some items:
       PROCEDURE DIVISION.
           MOVE "abc" TO str3(1).
           MOVE "def" TO str3(2).
           MOVE "ghi" TO str3(3).
           MOVE "jkl" TO str3(4).
           MOVE "mno" TO str3(5).
If we wanted to pick the fifth element from a populated str-tbl, we'd subscript it by referencing the item name: str3(5). We'd now get a three-character string back, since that's how we've defined it. So, DISPLAY str3(5) would print "mno".
This subscript can be combined with reference modification, which means that DISPLAY str3(5)(1:2) would print "mn".
We can still deal with the whole table as a string, meaning DISPLAY str-tbl will print "abcdefghijklmno" and DISPLAY str-tbl(1:1) will print "a".
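Stitched together, the fragments in this section make up a small runnable program. This is just my assembly of the pieces shown above; the program name is a placeholder.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. StrTable.
       DATA DIVISION.
       LOCAL-STORAGE SECTION.
       01 str-tbl.
          02 str3 PIC XXX OCCURS 5 TIMES.
       PROCEDURE DIVISION.
           MOVE "abc" TO str3(1).
           MOVE "def" TO str3(2).
           MOVE "ghi" TO str3(3).
           MOVE "jkl" TO str3(4).
           MOVE "mno" TO str3(5).
      *> Plain subscript: prints "mno".
           DISPLAY str3(5).
      *> Subscript plus reference modification: prints "mn".
           DISPLAY str3(5)(1:2).
      *> The whole table as one string: prints "abcdefghijklmno".
           DISPLAY str-tbl.
      *> Reference modification on the table itself: prints "a".
           DISPLAY str-tbl(1:1).
           STOP RUN.
```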
Why call it "Tables"?
Things in COBOL often differ from other languages, because COBOL is, in many ways, not like other languages. That could perhaps suffice as an explanation of why tables are called tables, but I'd argue that the reason they're called tables is because they are, well, tables. They can be subdivided into multiple fields, and they can be sorted and searched in ways that are reminiscent of SQL.
Consider the following table definition:
       01 mix-tbl.
          02 mix-item OCCURS 4 TIMES.
             03 mix-num PIC 99.
             03 mix-str PIC XXX.
Here, we've defined the table mix-tbl, in which we'll store four of the item (or record, as the COBOL lingo goes) mix-item. The record itself consists of both a numerically formatted mix-num field and the alphanumeric field mix-str. (Having the option of arbitrarily formatted fields in a table record means you could feed in the numbers "07250" and get a nicely formatted cost back, e.g. "$72,50". How's that for the awesome power of COBOL, eh?)
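To make the parenthesis above a bit more concrete, here's a rough sketch of a table record with a numeric-edited field. Everything in it (cost-tbl, cost-item, cost-raw, cost-nice and the two pictures) is invented for illustration, and it sticks to COBOL's default decimal point, so the result has a period rather than a comma.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. EditedCost.
       DATA DIVISION.
       LOCAL-STORAGE SECTION.
      *> All names below are hypothetical, made up for this sketch.
       01 cost-tbl.
          02 cost-item OCCURS 2 TIMES.
      *> Five digits with an implied decimal point after the third.
             03 cost-raw  PIC 9(3)V99.
      *> Numeric-edited: currency sign, zero suppression, period.
             03 cost-nice PIC $ZZ9.99.
       PROCEDURE DIVISION.
      *> Group MOVE: the five digits land in cost-raw as 072.50.
           MOVE "07250" TO cost-item(1).
      *> Elementary MOVE: COBOL applies the editing picture.
           MOVE cost-raw(1) TO cost-nice(1).
      *> Prints "$ 72.50".
           DISPLAY cost-nice(1).
           STOP RUN.
```

Getting the comma of "$72,50" would, as far as I know, additionally need DECIMAL-POINT IS COMMA in SPECIAL-NAMES and a matching picture. The mix-tbl defined above keeps things simpler, with a plain PIC 99 field and a PIC XXX field.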
We can now populate this table in a number of ways, though I strongly advise always populating individual record fields. Here are a few different varieties:
           MOVE "03Aaa" TO mix-tbl.
           MOVE "11Bbb" TO mix-item(2).
           MOVE 2 TO mix-num(3).
           MOVE "Ccc" TO mix-str(3).
           MOVE 2 TO mix-num(4).
           MOVE "Ddd" TO mix-str(4).
- On the first line, we move the value "03Aaa" to the table itself, thus populating the first record.
- On the second line, we assign the value "11Bbb" to the second record in the table.
- On the third and fourth lines, we populate the third record by assigning mix-num and mix-str individually. The same method is repeated on the last two lines.
We could of course also populate our table by reading from a file, but let's leave that for another time.
Bad Table Practices
It's important to note that COBOL will only format our numeric values for us if we perform atomic assignments to the individual record fields. If the first assignment above had read MOVE "3Aaa" TO mix-tbl, we'd have quite a problem on our hands, because COBOL would then happily put "3A" into our mix-num field.
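The difference can be seen with a class test. The following is a small sketch of my own, reusing the mix-tbl definition from above; the program name and the messages are made up.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BadMove.
       DATA DIVISION.
       LOCAL-STORAGE SECTION.
       01 mix-tbl.
          02 mix-item OCCURS 4 TIMES.
             03 mix-num PIC 99.
             03 mix-str PIC XXX.
       PROCEDURE DIVISION.
      *> Group MOVE: bytes are copied as-is, so mix-num(1) gets "3A".
           MOVE "3Aaa" TO mix-tbl.
           IF mix-num(1) IS NOT NUMERIC
               DISPLAY "mix-num(1) is not numeric after group MOVE"
           END-IF.
      *> Field-by-field MOVEs: COBOL pads 3 to "03" for PIC 99.
           MOVE 3 TO mix-num(1).
           MOVE "Aaa" TO mix-str(1).
           IF mix-num(1) IS NUMERIC
               DISPLAY "mix-item(1) is now " mix-item(1)
           END-IF.
           STOP RUN.
```

The first DISPLAY fires because "3A" isn't a valid PIC 99 value; the second prints "mix-item(1) is now 03Aaa" once the fields have been assigned one by one.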
With that out of the way, let's continue on!
Sorting Tables
If we wanted to look at mix-tbl in its entirety, we could now simply DISPLAY mix-tbl, which would give us the output "03Aaa11Bbb02Ccc02Ddd". We could also access for example mix-item(3), giving "02Ccc", or mix-num(3) and mix-str(3), giving "02" and "Ccc", respectively.
We can also easily sort the table using the SORT instruction.
           SORT mix-item ASCENDING mix-num DESCENDING mix-str.
Note that just like in SQL, we can sort fields in different orders and according to an arbitrary chain of precedence. The table is now sorted in place; mix-num(3) will give us "03" and mix-str(3) will give us "Aaa".
Here's the entire program:
       IDENTIFICATION DIVISION.
       PROGRAM-ID. "Sorting Tables".
       DATA DIVISION.
       LOCAL-STORAGE SECTION.
       01 mix-tbl.
          02 mix-item OCCURS 4 TIMES.
             03 mix-num PIC 99.
             03 mix-str PIC XXX.
       PROCEDURE DIVISION.
           MOVE "03Aaa" TO mix-tbl.
           MOVE "11Bbb" TO mix-item(2).
           MOVE 2 TO mix-num(3).
           MOVE "Ccc" TO mix-str(3).
           MOVE 2 TO mix-num(4).
           MOVE "Ddd" TO mix-str(4).
           DISPLAY mix-tbl.
           DISPLAY mix-item(2).
           DISPLAY mix-str(3).
           DISPLAY mix-num(3).
           SORT mix-item ASCENDING mix-num DESCENDING mix-str.
           DISPLAY mix-tbl.
           DISPLAY mix-item(2).
           DISPLAY mix-str(3).
           DISPLAY mix-num(3).
           DISPLAY mix-str(4).
           DISPLAY mix-num(4).
           STOP RUN.
It should produce the following output:
       03Aaa11Bbb02Ccc02Ddd
       11Bbb
       Ccc
       02
       02Ddd02Ccc03Aaa11Bbb
       02Ccc
       Aaa
       03
       Bbb
       11
Searching Tables
Tables can also be searched. In order to perform a search, our table must be indexed, which we declare with the INDEXED BY clause when defining it:
       DATA DIVISION.
       LOCAL-STORAGE SECTION.
       01 product-tbl.
          02 product-item OCCURS 5 TIMES INDEXED BY idx.
             03 product-name PIC X(8).
             03 product-price PIC $ZZ.
       77 search-query PIC X(8).
Once this table is populated, we can search it using the SEARCH construct, which follows a common pattern in COBOL. It's got two sub-clauses: WHEN, which states the condition we're looking for, and AT END, which in the case of SEARCH means we've reached the end of the table without finding a matching search criterion. (When reading files in COBOL, you perform your typical line reading in a NOT AT END clause, which I find both confusing and amusing.)
           SEARCH product-item
               AT END
                   DISPLAY "No matches for " search-query
               WHEN product-name(idx) = search-query
                   DISPLAY product-name(idx) ": " product-price(idx)
           END-SEARCH.
In this case, we're searching for a product name and when it's found, we display its price. Here's all of the code:
       IDENTIFICATION DIVISION.
       PROGRAM-ID. "Searching".
       DATA DIVISION.
       LOCAL-STORAGE SECTION.
       01 product-tbl.
          02 product-item OCCURS 5 TIMES INDEXED BY idx.
             03 product-name PIC X(8).
             03 product-price PIC $ZZ.
       77 search-query PIC X(8).
       PROCEDURE DIVISION.
      *> Populate and print our table.
           PERFORM VARYING idx FROM 1 BY 1 UNTIL idx = 6
               STRING "Product" FUNCTION CHAR(65 + idx)
                   INTO product-name(idx)
               COMPUTE product-price(idx) = idx * 10
               DISPLAY product-name(idx) " : " product-price(idx)
           END-PERFORM.
      *> Search with mismatch.
           MOVE "NotFound" TO search-query.
           PERFORM Search-Table.
      *> Search with match.
           MOVE "ProductC" TO search-query.
           PERFORM Search-Table.
           STOP RUN.
       Search-Table.
      *> A serial SEARCH starts at the current index value, so reset
      *> it first. Index names are positioned with SET, not MOVE.
           SET idx TO 1.
           SEARCH product-item
               AT END
                   DISPLAY "No matches for " search-query
               WHEN product-name(idx) = search-query
                   DISPLAY product-name(idx) ": " product-price(idx)
           END-SEARCH.
The above program should output the following:
       ProductA : $10
       ProductB : $20
       ProductC : $30
       ProductD : $40
       ProductE : $50
       No matches for NotFound
       ProductC: $30
On a sorted table, we could also perform a binary search using SEARCH ALL, provided the table definition names its sort key with an ASCENDING KEY or DESCENDING KEY clause.
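A binary search on the product table could look something like the following. Treat it as a sketch under a few assumptions of mine: the table is filled by hand in ascending name order, the price is a plain PIC 99 to keep things short, and the program name is invented.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BinSearch.
       DATA DIVISION.
       LOCAL-STORAGE SECTION.
       01 product-tbl.
      *> The KEY clause tells SEARCH ALL how the table is ordered.
          02 product-item OCCURS 5 TIMES
                ASCENDING KEY IS product-name
                INDEXED BY idx.
             03 product-name  PIC X(8).
             03 product-price PIC 99.
       PROCEDURE DIVISION.
      *> Entries must already be in ascending product-name order.
           MOVE "ProductA" TO product-name(1).
           MOVE 10 TO product-price(1).
           MOVE "ProductB" TO product-name(2).
           MOVE 20 TO product-price(2).
           MOVE "ProductC" TO product-name(3).
           MOVE 30 TO product-price(3).
           MOVE "ProductD" TO product-name(4).
           MOVE 40 TO product-price(4).
           MOVE "ProductE" TO product-name(5).
           MOVE 50 TO product-price(5).
      *> No need to set idx first; SEARCH ALL manages it itself.
           SEARCH ALL product-item
               AT END
                   DISPLAY "No match"
               WHEN product-name(idx) = "ProductC"
                   DISPLAY product-name(idx) ": " product-price(idx)
           END-SEARCH.
           STOP RUN.
```

This should print "ProductC: 30". Unlike the serial SEARCH, the WHEN condition here has to be an equality test on the declared key.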
Another Dimension
Multi-dimensional tables can also be defined. We can add a second dimension to our mix-tbl:
       IDENTIFICATION DIVISION.
       PROGRAM-ID. "Multidimensional Tables".
       DATA DIVISION.
       LOCAL-STORAGE SECTION.
       01 mix-tbl.
          02 mix-item OCCURS 3 TIMES.
             03 mix-num PIC 99.
             03 mix-str PIC XXX.
             03 mix-sub OCCURS 3 TIMES.
                04 sub-num PIC 99.
       PROCEDURE DIVISION.
           MOVE 30 TO mix-num(3).
           MOVE 31 TO mix-sub(3, 1).
           MOVE 32 TO mix-sub(3, 2).
           MOVE 33 TO mix-sub(3, 3).
           DISPLAY mix-sub(3, 2).
           STOP RUN.
This will now of course output "32".
A note on pointers
Another way of accessing arbitrary positions in COBOL strings is through pointers. They're not exactly of the C variety, though they have some vague similarities to the pointer arithmetic used when working with string parsing in C. COBOL pointers are used together with the instructions STRING and UNSTRING, to handle character positions during parsing/tokenization.
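As a rough illustration of the idea, here's a sketch that splits a comma-separated string with UNSTRING and a pointer. The data names (csv-text, csv-field, char-ptr), the sample text and the program name are all my own inventions.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PointerDemo.
       DATA DIVISION.
       LOCAL-STORAGE SECTION.
      *> Hypothetical sample data, exactly 16 characters long.
       01 csv-text  PIC X(16) VALUE "alpha,beta,gamma".
       01 csv-field PIC X(8).
      *> The pointer is an ordinary numeric item holding a position.
       01 char-ptr  PIC 99 VALUE 1.
       PROCEDURE DIVISION.
      *> Each UNSTRING resumes at char-ptr and leaves it just past
      *> the delimiter (or past the end of csv-text) afterwards.
           PERFORM UNTIL char-ptr > 16
               UNSTRING csv-text DELIMITED BY ","
                   INTO csv-field
                   WITH POINTER char-ptr
               END-UNSTRING
               DISPLAY csv-field
           END-PERFORM.
           STOP RUN.
```

This should print "alpha", "beta" and "gamma" on separate lines, each padded with spaces to the eight characters of csv-field. The pointer has to be big enough to hold one more than the length of the string, which is why a PIC 99 is used here.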
STOP RUN.
That's enough COBOL for one helping. Thanks for reading and Happy Hacking!