
Onboarding Data from Mainframe Data Files (COBOL Copybook)

Creating a Mainframe Data Files Source

To create a Mainframe Data Files source, see Configuring Additional Connectors. Ensure that the Source Type selected is Mainframe Data Files.

Mainframe Datafiles Configuration

Source Name: The name of the source. The name must be unique and must not contain spaces or special characters except underscore. For example, Customer_Details.

Source File Location: The storage system location where your files are stored. You can select one of the following options:

  • Databricks File System (DBFS)
  • Remote Server (using SFTP)
  • Cloud Storage

SFTP Host: The SFTP host from which the data is read.
SFTP Port: The port on which the SFTP service runs on the host.
User Name: The username used to log in to the SFTP host.
Cloud Type: The cloud storage where the data is stored. You can select one of the following options:

  • Azure Blob Storage (WASB)
  • Amazon S3
  • GCS
  • Azure DataLake Storage (ADLS) Gen 2

Container Name: The name of the container in the storage account in which the files are stored. A container organizes a set of blobs, similar to a directory in a file system. For more details, see Create a Container.
Storage Account Name: The unique name of the storage account. For more details, see Create a storage account.
Project ID: The ID of the project in the source. For more details, see Project ID.

Authentication Mechanism: The authentication mechanism with which security information is provided.

  • For Remote Server (using SFTP), select whether to authenticate using a private key or a password.
  • For Cloud Storage, select Access Key to authenticate using an access key, or select None to access data from public cloud storage folders.
Password: The password used to log in to the SFTP host.

NOTE This field appears only when Using Password is selected as the Authentication Mechanism.

Private Key: The private key used to log in to the SFTP host. It can be provided as text, uploaded as a file, or specified as a path on the edge node.

NOTE This field appears only when Using Private Key is selected as the Authentication Mechanism.

When using Private Key as the authentication mechanism (see the connection sketch after this table):

  • The client public key must be added under ~/.ssh/authorized_keys on the SFTP server. The corresponding private key on the job cluster is used to connect.
  • The private key must be in RSA format. If it is available in OpenSSH format, use the command ssh-keygen -p -f <file> -m pem to convert it to RSA format.
Enable support for Azure Government Cloud Regions: Select this check box to enable ingestion from a source available in the Government Cloud regions. This option is not editable if the data source tables are already created.
Storage Account Key: The storage account access key. This option is displayed if the Authentication Mechanism used is Account Key. For more details, see Manage storage account access keys.
Access ID: The access ID uniquely identifies an AWS account. You can use the access ID to send authenticated requests to Amazon S3. The access ID is a 20-character alphanumeric string, for example, AKIAIOSFODNN7EXAMPLE. For the access ID details, contact your AWS Cloud admin. For more details, see AWS Account Access Keys.

Secret Key: The secret key to send requests using the AWS account. The secret access key is a 40-character string, for example, wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY. For the secret key details, contact your AWS Cloud admin.

Source Base Path: The bucket where all the Mainframe copybook files to be accessed are stored, and the base of all directories that will be read from the file system. For more details, see Creating a bucket.

  • For the GCS cloud type, the source base path is gs://.
  • For Azure Blob Storage (WASB), the source base path is wasbs://<container_name>@<account_name>.blob.core.windows.net/
  • For Amazon S3, the source base path is s3a://.

Access Scheme: The scheme used to access ADLS Gen2. The available options are abfs:// and abfss://. This field appears only for the Azure DataLake Storage (ADLS) Gen 2 cloud type.
File System: The file system where all the data of an artifact will be stored.

Access Key: The storage access key used to connect to ADLS.

NOTE This field appears only when Access Key is selected as the authentication mechanism in the Azure DataLake Storage (ADLS) Gen 2 cloud.

Application ID: The ID that uniquely identifies the user application.

NOTE This field appears only when Service Principal is selected as the authentication mechanism in the Azure DataLake Storage (ADLS) Gen 2 cloud.

Directory ID: The ID that uniquely identifies the Azure AD instance.

NOTE This field appears only when Service Principal is selected as the authentication mechanism in the Azure DataLake Storage (ADLS) Gen 2 cloud.

Service Credential: The credential string value that the application uses to prove its identity.

NOTE This field appears only when Service Principal is selected as the authentication mechanism in the Azure DataLake Storage (ADLS) Gen 2 cloud.
Authentication Mechanism: The mechanism used to provide security credentials for accessing the Azure File Share. The options are:

  • Access Key: A storage account key granting full access to the file share.
  • SAS Token: A shared access signature token providing scoped, time-limited access to the file share.

Share Name: The name of the Azure File Share containing the files to be accessed, for example, myfileshare. This identifies the specific share within the storage account.
Storage Account Name: The name of the Azure Storage account hosting the file share, for example, mystorageaccount. This is used to construct the endpoint URL (for example, https://mystorageaccount.file.core.windows.net). For account details, contact your Azure admin.
Source Base Path: The directory path within the file share where ingestion begins, for example, data/input/.
Access Key (if selected): The storage account key for authentication, an 88-character base64-encoded string, for example, abc123xyz.... Required if Access Key is chosen as the authentication mechanism. For the key, contact your Azure admin.
SAS Token (if selected): The shared access signature token for authentication, for example, sp=r&st=2025-04-01T00:00:00Z&se=2025-04-02T00:00:00Z&spr=https&sig=abc123.... Required if SAS Token is chosen as the authentication mechanism.
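
The private-key SFTP access described above can be verified outside Infoworks before the source is configured. The following is a minimal Python sketch, not part of the product, using the paramiko library; the host, port, username, key path, and directory are hypothetical placeholders.

  # Minimal sketch: verify SFTP private-key access to the mainframe file server.
  # Host, port, username, key path, and directory are hypothetical placeholders.
  import paramiko

  HOST = "sftp.example.com"          # hypothetical SFTP host
  PORT = 22                          # hypothetical SFTP port
  USERNAME = "ingest_user"           # hypothetical user
  KEY_PATH = "/path/to/id_rsa_pem"   # RSA key in PEM format
                                     # (convert OpenSSH keys with: ssh-keygen -p -f <file> -m pem)

  # Load the RSA private key; the matching public key must already be listed
  # in ~/.ssh/authorized_keys on the SFTP server.
  pkey = paramiko.RSAKey.from_private_key_file(KEY_PATH)

  transport = paramiko.Transport((HOST, PORT))
  transport.connect(username=USERNAME, pkey=pkey)
  sftp = paramiko.SFTPClient.from_transport(transport)

  # List the directory that will later be used as the source relative path.
  print(sftp.listdir("/Cobol"))

  sftp.close()
  transport.close()

If this listing succeeds, the same key and credentials should work when entered in the Private Key and User Name fields above.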

Configuring File Mappings

NOTE You can configure the File Mappings by uploading the Copybook file and providing the Source Data file details.

To configure the File Mappings using the Copybook file, perform the following steps:

  1. Click Add Table.
  2. Enter the following fields in the Copybook Details section:
Comment Up to Char: By default, each line starts with a 6-character comment area. The exact number of characters can be tuned using this option.
Comment After Char: By default, all characters after the 72nd character of each line are ignored by the COBOL parser. The exact number of characters can be tuned using this option.
Copybook File: Browse for and upload the copybook file.
Record Type: This field indicates the type of records present in the Mainframe data files. The record types are represented by the following keywords:

Fixed Length (F): This is the default value.

Fixed Block (FB)

NOTE The Records Per Block field is displayed only for the Fixed Block (FB) record type. It indicates the number of records present in a single block.

Variable Length RDW (V)

NOTE The Record Header Encoding field is displayed only for the Variable Length RDW (V) record type. The following options are available:

  • Big Endian: In big-endian encoding, the first (most significant) byte of the binary representation of a multibyte data type is stored first.
  • Little Endian: In little-endian encoding, the last (least significant) byte of the binary representation of a multibyte data type is stored first.

Variable Block BDW+RDW (VB)

NOTE The Record Header Encoding and Block Header Encoding fields are displayed only for the Variable Block BDW+RDW (VB) record type. The same Big Endian and Little Endian options are available for both fields.

Little endian and big endian are the two ways of storing multibyte data types (int, float, and so on). A sketch showing how the comment columns and the Record Header Encoding are applied when reading copybooks and variable-length records appears after this table.

ASCII Text (D)

Variable Length (RDW) custom header (VCH)

NOTE The following fields will appear for Variable Length (RDW) custom header record type:

Header: It identifies the record uniquely in variable record length files.

Comment Up to Char: For more information, refer to the top of this table.

Comment After Char: For more information, refer to the top of this table.

Copybook File: Provide the Copybook file related to the header.

Variable Block BDW+RDW custom header (VBCH)

NOTE The following fields will appear for the Variable Block BDW+RDW custom header record type:

Header: For more information, refer to the Header field described above.

Comment Up to Char: For more information, refer to the top of this table.

Comment After Char: For more information, refer to the top of this table.

Copybook File: For more information, refer to the Copybook File field described above.

Flatten Fields: This checkbox indicates whether complex fields (for example, struct) in the target should be flattened.
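
To make the Comment Up to Char, Comment After Char, and Record Header Encoding options more concrete, here is a minimal Python sketch (not Infoworks code). It strips the classic COBOL fixed-card comment columns from a copybook and splits a Variable Length (V) byte stream into records, assuming the common RDW layout in which the first two bytes of each 4-byte header hold the record length; whether that length includes the header itself varies between producers, so the sketch exposes it as a flag.

  # Minimal sketch (not Infoworks code) illustrating the copybook comment
  # columns and the effect of Record Header Encoding on RDW parsing.
  import struct

  def strip_copybook_comments(text, comment_up_to=6, comment_after=72):
      """Drop the leading comment/sequence columns and anything past the
      comment-after column from each copybook line (fixed card format)."""
      return "\n".join(line[comment_up_to:comment_after] for line in text.splitlines())

  def read_rdw_records(data, big_endian=True, length_includes_rdw=True):
      """Split a Variable Length (V) byte stream into records using a 4-byte RDW.
      The first two bytes of each RDW are assumed to hold the record length."""
      fmt = ">H" if big_endian else "<H"   # Record Header Encoding: Big or Little Endian
      pos, records = 0, []
      while pos + 4 <= len(data):
          (length,) = struct.unpack(fmt, data[pos:pos + 2])
          payload_len = length - 4 if length_includes_rdw else length
          start = pos + 4
          records.append(data[start:start + payload_len])
          pos = start + payload_len
      return records

  # Example: two records, "AB" and "CDE", with big-endian RDWs whose stored
  # length includes the 4-byte header (6 = 4 + 2, 7 = 4 + 3).
  sample = b"\x00\x06\x00\x00AB" + b"\x00\x07\x00\x00CDE"
  print(read_rdw_records(sample, big_endian=True))   # [b'AB', b'CDE']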

Configure the Data File Details

Enter the following details in the Source Data Files section for configuring the data files.

Source Relative Path: The path to the directory where the data files are stored, relative to the source base path. For example, /Cobol.
Include Files: The regex pattern for files to include. Only the files matching this Java regex are crawled.
Exclude Files: The regex pattern for files to exclude. The files matching this Java regex are not crawled.

File Encoding: The encoding type of the data file. The following options are available:

  • EBCDIC: Extended Binary Coded Decimal Interchange Code, a data-encoding system that uses a unique eight-bit binary code for each numeric and alphabetic character, punctuation mark, accented letter, and non-alphabetic character.
  • TEXT: Specifies simple plain text.

A sketch showing how an EBCDIC file can be decoded with a given character set and header/trailer lengths appears after this table.

Character Set: The character encoding to be used.
File Header Length (Bytes): The length of the file header in bytes.
File Trailer Length (Bytes): The length of the file trailer in bytes.
Include sub directories: Select this option to onboard the data files present in the subdirectories of the source path.
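
As an illustration of how File Encoding, Character Set, and the header/trailer lengths interact, here is a minimal Python sketch (not Infoworks code). The file name, record length, header/trailer lengths, and the cp037 (EBCDIC US/Canada) character set are assumptions; use the values that match your own files.

  # Minimal sketch (not Infoworks code): decode a fixed-length EBCDIC data file.
  # File name, character set, record length, and header/trailer lengths are
  # hypothetical placeholders.
  CHARSET = "cp037"        # EBCDIC US/Canada codec shipped with Python
  RECORD_LENGTH = 80       # fixed record length implied by the copybook
  HEADER_BYTES = 0         # File Header Length (Bytes)
  TRAILER_BYTES = 0        # File Trailer Length (Bytes)

  with open("sample.dat", "rb") as f:
      raw = f.read()

  # Drop the file header and trailer before splitting into fixed-length records.
  body = raw[HEADER_BYTES:len(raw) - TRAILER_BYTES]

  records = [body[i:i + RECORD_LENGTH] for i in range(0, len(body), RECORD_LENGTH)]
  for rec in records[:5]:
      print(rec.decode(CHARSET))
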
  3. Enter the following details in the Target Table Configuration section:
Filter Record Type: Specifies the values of the columns to be extracted, separated by a double colon (::). For example, D::31.
Filter Record Column: Specifies the columns used to extract the segments, separated by a double colon (::). For example, RECORD_TYPE::RECORD_TYPE_SEQ.
Table Name: The table name displayed in the Infoworks UI.
Target Table Name: The name of the target table to be created in the data lake.
Target Schema Name: The schema name of the target table.
  4. Click Save and Crawl Schema to save the schema.

Edit the Schema and View Sample Data

To edit the schema and view the sample data, perform the following steps:

  1. Under the File Mappings tab, enter the following details:
Column Name: Edit the column name of the table.
Transform Function: Select the transform function.
Type: Select the data type from the drop-down list. The available data types are Decimal, Integer, Float, Double, String, Boolean, Date, Timestamp, Long, and Byte.
  2. Click Save Schema to save the schema.
  3. Click Sample Data to view the sample data.

Configuring a Mainframe Datafiles Table

To configure a Mainframe Data Files table, see Configuring a Table.
