FLCBYT-API
FLUC Byte Interface

This interface provides stream-, block-, record- or element-oriented sequential read and write access to original files. Original files can be binary, text or XML data sets, GZIP, BZIP2, XZ, PGP, ZIP or FLAM4 files, or any other local or remote data stream format supported by FLAM. The interface includes all conversion and formatting capabilities of the Frankenstein Limes Universal Converter (FLUC). These conversion capabilities are also available as an in-memory variant (fcbconv/runv/finv or fcbconp/runp/finp) without file I/O.

The byte interface provides functions similar to the byte-wise file I/O functions in C. Only the open function differs: it takes a file definition string and a format string, which you specify using the syntax of our command line (FLCL). Instead of errno, fcberrno contains the FLAM return code; fcberrms() can be used instead of strerror(), and the complete FLAM error trace can be fetched with fcberrtr(). Additionally, this interface provides functions for getting help and syntax information for the file and format string parameters.

There is also a logging facility which causes log messages to be collected in memory. These log messages can be queried when desired.

Additionally, a set of conversion functions is available, which can be used when accessing elements (format.element()), for example to parse XML documents.

There are also functions for hash, MAC and HMAC calculations, base encoding and decoding, and a few other helper functions, for example to determine the number of processed bytes or units (records or blocks).

File definition and format strings

When reading: Through the file definition string, you specify how the read operation interprets and transforms your original data to produce an internal neutral FLAM5 element list from it. The format string describes how this element list will then be formatted for sequential byte or record access. You also have the option of getting the raw elements by specifying element formatting in the format string.

When writing: The format string defines how the provided byte stream, records or elements will be formatted into a FLAM5 element list. The file definition string describes how this element list will be written to a file. The converted data can be written to multiple targets in parallel by specifying multiple I/Os through the OUTPUT object. It is possible to specify multiple WRITE overlays or OUTPUT objects to write the same data in different formats to different files, reading the input only once.

Writing multiple members to the same open file

To write more than one member to a ZIP archive or FLAMFILE, use the functions fcbnewmbr() and fcbnewmbr2(). These functions reopen the open file to append a new member, which is much faster than closing the file and opening it again with the append switch for each further member.

This works for all kinds of files (not only archives) and can result in a concatenation or a bouquet of further files, depending on the input/output name mapping specified in the write string.

fcbopen() describes the first member; for every further member of this file, you must call fcbnewmbr(). With it you can provide a new format string and a new state string. If the new member is formatted like the previous member, the format string can be NULL, in which case the current formatting is reused. With fcbnewmbr2(), you can also get the interim statistics buffer.

Here is the typical program sequence in pseudo code:

   hdl=fcbopen("write.text(... archive.zip(member='[name]'))","format.text()","state(orgf='first.member.txt')");
   if (hdl==NULL) exit(1);
   fcbwrite(...,hdl);
   ...
   fcbwrite(...,hdl);
   hdl=fcbnewmbr("format.record()","state(orgf='second.member.txt')");
   if (hdl==NULL) exit(2);
   fcbwrite(...,hdl);
   ...
   fcbwrite(...,hdl);
   hdl=fcbnewmbr(NULL,"state(orgf='third.member.txt')");
   if (hdl==NULL) exit(3);
   ...
   ...
   ...
   fcbclose(hdl);

Formatting

When reading or writing, you can define how the data is formatted (binary, character, text, XML, ...), apply different conversions (Base64, OpenPGP, GZIP, CHRSET, ...) that are executed sequentially, and use different kinds of I/O methods (block, record, text, FLAM4, ...). This means that you can read and write compressed, encrypted and encoded files, and change the character set and other properties as part of the read or write operation.

Through the format string, you specify how the data is transformed into a sequential data stream, records or elements. For example, if the data contains text, you can define which delimiter to add after each line and which character set the output should use. Please use the interactive help function fcbhelp() for more information.

For stream-oriented data handling, you can choose between binary, character, text and XML formatting. For read operations, you can use the auto detection capabilities. For example:

   fcbopen("read.file='filename'","format.record()")

This converts any kind of file (encoded, encrypted and/or compressed) into records. If the content is XML, the XML data is pretty-printed into records. If it is text, the data is parsed based on the contained delimiters. If it is binary and record lengths are detected, the records are provided one by one. If it is binary and no record lengths are known or detected, the data is wrapped into records.

Record formatting

If you use record formatting, the read and write functions do not operate as in stream-oriented I/O. Instead, the behavior is more similar to the fread() and fwrite() operations in z/OS with 'type=record', except that you can define whether records are truncated (size>0) or a length error occurs (size==0) if the provided buffer is too short. With size=-1 (SIZE_MAX), there is also a zero-copy mode for optimized performance.

Record formatting can also be useful for ordinary text files, to read the text record by record without the delimiter and the terminating null character (see fcbputs() and fcbgets()).

Element formatting

FLAM5 elements are parsed data elements with a type, a length, a value and more. If the element data contains printable characters, then these characters are encoded in UTF-8 by default, but can also be converted as needed.

With element formatting, you can read and write a serialized form of FLAM5 elements. For example, it can be used to tokenize an XML document and read these tokens (elements) for further processing. The serialized element format is described by the struct FlmElmRec0.

Element conversion

When using element formatting, you can use several individual element data conversion modules for per-element data conversion. If no converter is used, the element data is simply copied into application memory to build the element structure (FlmElmRec0). A set of functions with 'v' at the end of the function name can be used to set a custom element data converter. A converter must be opened before use, which is done by calling fcbopenv() with a corresponding conversion string. The conversion string describes how the data is converted from the neutral format of a FLAM5 element to its representation in application memory (when reading), or how the application memory must be interpreted to form the corresponding neutral FLAM5 element data type (when writing). The functions fcbreadv() and fcbwritev() accept an additional conversion handle, obtained from fcbopenv(), as a parameter to replace the default/standard conversion. However, fcbreadv() and fcbwritev() require you to know in advance the format of the data that will be read, which is not always the case. Alternatively, you can use the regular fcbread() and fcbwrite() functions and convert the data with fcbconv() where needed. In fact, fcbconv() works on arbitrary data and can also be used independently.

There is no limit on the number of element converters that can be opened. The output length of most converters can be controlled in two ways: by passing appropriate length values in the conversion string, or by passing a buffer of appropriate length at call time.

The converter handle must be closed with fcbclosev() to release all associated resources.

Some usage scenarios for element converters:

  • Selective character conversion
  • Number conversion from/to BCD/binary integer
  • Removal of whitespace

Table support

Table support was introduced with FLAM version 5.1.16. For the byte interface, you can activate end-of-table support when reading from a file (file string). If you activate ENDOFT in the format string when writing and use fcbstn() or automatic detection together with an output file name containing the processing rule [table], you can split the data into different files.

When reading with the ENDOFT switch defined, you must handle FLMRTC_EOT as well as FLMRTC_EOF. If a read returns no data, the reason can be EOF or EOT. After EOT, the next read returns the data of the new table. After EOF, every further read again returns no data and EOF. With ENDOFT active when reading, you can use the new function fcbgtn() to get the name of the current table after fcbopen() or after EOT was recognized. This lets you interpret the data correctly if a file contains more than one table. Even the first read after open might return no data and EOT if the automatic detection needs more data to determine the correct format; only subsequent read calls will eventually retrieve the data. With block-oriented reading, depending on the block size, a table change (EOT) without data can also occur in the middle of the data, so it is important not to ignore it.

If you want to write more than one table to a file, you can rely on table format detection, but it is normally better to define the table format with the new function fcbstn() before you provide the data. If you set the table name, the data must match this table format (row specification). Table format detection is disabled when fcbstn() is used; you can reactivate automatic table detection when writing with a call to fcbgtn(). Setting the table name only works if no remainder is left in the buffer, so with a block-oriented write method the last row must be complete before you call fcbstn(). With the record-oriented write method, each record must fit the current table format, and a remainder in the buffer is not possible. We generally recommend the record-oriented approach for reading and writing tables.

Environment variables

For all default character conversions, it is useful to set the environment variable LANG. Other environment variables used by FLAM are described in the FLCL manual. Version 5.1.19 introduced the function fcbenv() to load the FLAM environment. It can be called before the first API call to establish the same environment that FLAM utilities, subsystems and so on use, which gives application developers the possibility to adjust the environment before the first real call is made. Up to version 5.1.18, each open function read the system variables on z/OS; this is now part of fcbenv(), giving API users complete control over the environment. To fetch a symbol from the environment, use fcbsym().

Special EBCDIC code page support

On systems using EBCDIC, special support for critical punctuation characters is implemented (see FLCL manual). This support converts these punctuation characters from a given EBCDIC code page to the local character set defined by the LANG variable (if LANG is not defined, the default is 1047). Below you can find the characters from the first 128 Unicode code points whose code points differ between the supported EBCDIC code pages, followed by the list of supported code pages.

   CRITICAL PUNCTUATION CHARACTERS:
      ! $ # @ [ \ ] ^ ` { | } ~
   SUPPORTED EBCDIC CODE PAGES FOR COMMAND ENTRY:
      "IBM-1140","IBM-1141","IBM-1142","IBM-1143",
      "IBM-1144","IBM-1145","IBM-1146","IBM-1147",
      "IBM-1148","IBM-1149","IBM-1153","IBM-1154",
      "IBM-1156","IBM-1122","IBM-1047","IBM-924",
      "IBM-500","IBM-273","IBM-037","IBM-875","IBM-424",
      "IBM-277","IBM-278","IBM-280","IBM-284","IBM-285",
      "IBM-297","IBM-871","IBM-870","IBM-1025","IBM-1112",
      "IBM-1157"

This conversion is required to interpret the command syntax correctly. Most strings passed to this interface are such CLP strings, so to work with this API you must build them, and literals are often used for this. On EBCDIC systems, you can define the CCSID (code page) in which the compiler provides literals. In C/C++, for example, the default if the CONVLIT() option is not specified is no conversion, which means that 1047 is normally used.

Your application might get variables from outside (e.g. file names) in the local character set (e.g. 1141), while your literals are in 1047, and you must build a CLP string from both literals and variables. To support such mixed code pages, FLAM supports escape sequences (&xxx;) and CCSID areas (&nnnn;...&nnnn;) since version 5.1.19 (see FLCL manual). The two examples below show how to use them in a CLP string for the info command.

   snprintf(acClp,sizeof(acClp),"get.file='&TLD;/mydata.bin'");
   snprintf(acClp,sizeof(acClp),"&1047;get.file='&0000;%s;&1047;'",fn);

An unsupported CCSID (e.g. 0) selects the local character set as code page (the default case). To start an area in the literal code page, you must add a CCSID escape sequence (&1047;). For a variable part, switch to the local character set (&0000;) or to the correct CCSID for this variable, and switch back to the literal CCSID (&1047;) when the literal continues. If you need one of these critical characters in your literals, you can use the corresponding escape sequence. All escape sequences for the critical punctuation characters are listed below.

   ! = &EXC;   - Exclamation mark
   $ = &DLR;   - Dollar sign
   # = &HSH;   - Hashtag (number sign)
   @ = &ATS;   - At sign
   [ = &SBO;   - Square bracket open
   \ = &BSL;   - Backslash
   ] = &SBC;   - Square bracket close
   ^ = &CRT;   - Caret (circumflex)
   ` = &GRV;   - Grave accent
   { = &CBO;   - Curly bracket open
   | = &VBR;   - Vertical bar
   } = &CBC;   - Curly bracket close
   ~ = &TLD;   - Tilde

These two features let you build CLP strings independent of the EBCDIC code page used for literals and of the local or system character set.

Info command

The interface also provides access to the info command. fcbinfo() can be used, for example, to get information about files and supported CCSIDs.

Hash, HMAC and MAC calculations with clear keys

The byte interface provides a set of functions to calculate hash, MAC and HMAC values. fcbhini(), fcbhadd() and fcbhfin() can be used to calculate a value over multiple consecutive data buffers. fcbhash() does the same for a single buffer and returns the result.

Base encoding and decoding

There is a set of functions to encode binary data to, and decode it from, Base64, Base32 or Base16. fcbbini(), fcbbrun() and fcbbfin() can be used to convert multiple consecutive buffers of data, whereas fcbbase() converts a single buffer independently of previous calls.

Statistics

There are some additional open and close functions available that take a few more parameters. With fcbopen2(), the internal state of FLUC can be accessed after an open for read and set before an open for write. This is required to set the file attributes of the source in archive headers like GZIP, PGP and FLAM. fcbclose2() returns the statistics information collected by FLUC.

Thread-safety

The byte interface is thread-safe on all supported platforms except USS on zSeries. On classical mainframe operating systems like z/OS, threads are not supported but the interface is in general reentrant and can be used in parallel by different processes. The compiler switch DEFINE(__HOST__) or DEFINE(__USS__) must be defined on z/OS systems.

Hints for z/OS

To use empty parameter lists (fcberrtr()) and fixed-size integer data types, the language level on z/OS must be at least C99. Additionally, long name support and DLL usage must be activated.

   LANG(EXTC99),LO,DLL

Sample programs

Several sample programs in C (SCFCBCPY, SCFCBELM and SCFCBDOM) are part of the installation package for mainframe systems, in the library SRCLIBC(SCFCBCPY/ELM/DOM), with the corresponding compile and link steps in JOBLIB(SBUILD). For other platforms (Windows, UNIX), the source of SCFCBCPY/ELM/DOM is located in the 'sample' directory, and the compile and link procedures can be found in the Makefile of the same directory.

  • SCFCBCPY: Makes the byte interface available as a utility (you can define all 4 fcbopen() strings)
  • SCFCBELM: Reads a file and writes each element as an element dump list to STDOUT
  • SCFCBDOM: Reads an XML file into a DOM tree and writes the tree to STDOUT