Infoworks 6.1.3

Exporting Data to PostgreSQL

Postgres is a powerful, open-source, object-relational database system that uses and extends the SQL language, combined with many features that safely store and scale the most complicated data workloads.

The process to export data to a Postgres DB is described in the following sections:

Define the Postgres Target Data Connection > Create a Pipeline to Add Postgres Node > Export the Data

Define the Postgres Target Data Connection

The following are the steps to configure a Postgres data connection:

  1. Navigate to the required domain and click the Target Data Connections icon.
  2. Click Add Connection.
  3. In the Add Data Connection page, provide the following connection details.
  4. Click Save.
Type: The type of target data connection. Select Postgres.
Name: The name for the target data connection. The connection name must be unique across the specific domain.
Description: The description for the target data connection.
JDBC URL: The JDBC URL to connect to the Postgres database. The URL must be of the following format: jdbc:postgresql://<host>:<port>/<database_name> (see the connectivity sketch after this table).
User Name: The username of the Postgres target account.
Password: The password of the Postgres target account.
Additional Params: The optional JDBC parameters.
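
As a quick illustration of the JDBC URL format above, the following minimal Python sketch builds the same URL and verifies connectivity with the psycopg2 driver. It is not part of Infoworks; the host, database, and credentials are hypothetical placeholders.

    # Minimal connectivity check for the values that go into the JDBC URL.
    # Hypothetical host/database/credentials; requires the psycopg2 package.
    import psycopg2

    host, port, database = "pg.example.com", 5432, "analytics"   # placeholders
    jdbc_url = f"jdbc:postgresql://{host}:{port}/{database}"      # format expected by the data connection
    print("JDBC URL:", jdbc_url)

    # psycopg2 connects with native parameters rather than the JDBC URL itself.
    with psycopg2.connect(host=host, port=port, dbname=database,
                          user="iw_export_user", password="secret") as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT version();")
            print(cur.fetchone()[0])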

Create a Pipeline to Add Postgres Node

Following are the steps to add a new pipeline to the domain:

  1. Click the Domains menu and click the required domain from the list. You can also search for the required domain.
  2. Click the Pipelines icon.
  3. In the Pipelines page, click the New Pipeline button.
  4. Enter the following details.
  5. Click Save. The new pipeline will be added to the list of pipelines.
Name: Unique name for the pipeline.
Description: Description for the pipeline.
Execution Engine: Select the Execution Engine as Spark.
Machine Learning Library: Select SparkML.
Cluster Templates for Batch Mode: Select the default cluster template.

Design a Pipeline

Following are the steps to design a pipeline:

  1. In the Pipelines page, click the newly created pipeline. The Overview page is displayed with the pipeline version details and a blank design space.
  2. Click Open Editor or click the blank design space to open the editor page.
  3. From the left panel, drag the required source tables from the Datalake Tables tab and the Postgres Target node from the Transformations tab onto the pipeline editor, then connect them to design the pipeline. A conceptual sketch of this export flow follows these steps.
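
Conceptually, such a pipeline reads a data lake table and writes it to the Postgres target over JDBC. The following PySpark sketch only illustrates that flow under assumed table names, URL, and credentials; it is not the Infoworks implementation of the Postgres Target node.

    # Conceptual equivalent of exporting a data lake table to Postgres over JDBC.
    # Hypothetical table name, JDBC URL, and credentials; the PostgreSQL JDBC
    # driver must be available on the Spark classpath.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("postgres_export_sketch").getOrCreate()

    source_df = spark.read.table("datalake_db.customer_orders")  # source table from the Datalake Tables tab

    (source_df.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://pg.example.com:5432/analytics")
        .option("dbtable", "public.customer_orders")
        .option("user", "iw_export_user")
        .option("password", "secret")
        .mode("overwrite")   # roughly corresponds to the Overwrite build mode; use "append" for Append
        .save())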

Postgres Target Node Configuration

Following are the steps to configure the Postgres Target node in a pipeline:

  1. Double-click the Postgres Table node. The Properties page is displayed.
  2. Click Edit Properties and provide the following details.
  3. Click Save.
Build Mode: The options include Overwrite, Append, Merge, and Insert-Overwrite.
  Overwrite: Drops and recreates the Postgres target.
  Merge: Merges data into the existing table based on the natural key.
  Append: Appends data to the existing Postgres target.
  Insert-Overwrite: Overwrites all the rows that match the combination of natural keys (OR condition) and inserts new rows.
Data Connection: The data connection created in the "Define the Postgres Target Data Connection" step for the Postgres target.
Schema Name: The name of the existing schema in the Postgres database.
Create a schema if it does not exist: Enable this option to create a new schema with the name provided above. Ensure that the user has sufficient privileges to create a schema in Postgres.
Table Name: The table name for the Postgres target.
Natural Keys: The required natural keys for the Postgres target. This is the combination of columns that uniquely identifies a row.
Is Existing Table: When enabled, the existing table behaves as an external table to Infoworks. The table will not be created, dropped, or managed by Infoworks.
Partition Type: The type of partition for the table. By default, the Partition Type is set to None. You can also select the following partition types (a plain-SQL illustration of the partition and index options follows this table):
  List: The table is partitioned according to the listed values. Use list partitioning when the relevant column contains data for a particular list of values.
  Range: The table is partitioned according to a specific date or number range. Each partition has an upper and a lower bound, and data is stored in the partition whose range it falls into.
  Hash: The table is partitioned by applying a hashing algorithm to the partition key that you identify.
Partition Column: The column based on which the data will be partitioned. This field is displayed for the list, range, and hash partition types.
Is Default: The condition to use the default partition as the child partition. This field is displayed for the list partition type. A default partition is supported only for list partitioning.
Child Partition Table: The name of the child partition table. This field is displayed for the list, range, and hash partition types. You can add multiple child partition tables. While creating the child partition tables, make sure that each row of the source dataset maps to one of the child partitions. If a row does not have a matching child partition, the build fails.
Values: The values based on which the child partition will be created.
Min: The minimum value for the range. This field is displayed for the range partition type.
Max: The maximum value for the range. This field is displayed for the range partition type.
Partition Modulus: The modulus for the hash partition. This field is displayed for the hash partition type.
Remainder: The remainder associated with the child partition. This field is displayed for the hash partition type.
Index Type: The indexing option. By default, the Index Type is set to None. You can also select the following index types:
  Hash: Can be used where equality comparison is most important; you can only look up data that matches exactly.
  BTree: Used for data that can be sorted; handles both equality and range queries.
  GIST: Used when the data to be indexed is more complex.
  SP-GIST: Used when the data can be grouped into non-overlapping groupings.
  GIN: Used when indexing data that consists of multiple elements in a single column, such as arrays, JSON documents (jsonb), or text search documents (tsvector).
  BRIN: Used for large tables that have columns with some natural sort order.
  Note: GIST, SP-GIST, and GIN indexes are not supported with primitive datatypes. The following datatypes can be used with these indexes: GIST and SP-GIST support TSTZRANGE; GIN supports array.
Index Column: The index columns for the target table to be created.
Is Unique Index: Enable this option to define a unique constraint on the BTree index. This field is available only for BTree indexing. Note: Unique constraints on partitioned tables must include all the partition key columns.
Sync Table Schema: The option to synchronize the pipeline export table schema with the source table schema.
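
The partition and index fields above correspond closely to standard PostgreSQL DDL. The following sketch, run here through psycopg2 with hypothetical schema, table, and column names, shows what list, range, and hash child partitions and a BTree index look like in plain SQL; it is intended only to make the options concrete, not to show the exact statements Infoworks generates.

    # Plain-SQL illustration of the partition and index options configured on the target node.
    # Hypothetical schema/table/column names; requires psycopg2 and sufficient privileges.
    import psycopg2

    conn = psycopg2.connect(host="pg.example.com", dbname="analytics",
                            user="iw_export_user", password="secret")
    with conn, conn.cursor() as cur:
        # List partitioning with a default child partition (default is supported only for list).
        cur.execute("""
            CREATE TABLE public.orders_by_region (order_id bigint, region text, amount numeric)
                PARTITION BY LIST (region);
            CREATE TABLE public.orders_emea    PARTITION OF public.orders_by_region FOR VALUES IN ('EMEA');
            CREATE TABLE public.orders_default PARTITION OF public.orders_by_region DEFAULT;
        """)
        # Range partitioning: each child partition owns a lower and an upper bound.
        cur.execute("""
            CREATE TABLE public.orders_by_date (order_id bigint, order_date date, amount numeric)
                PARTITION BY RANGE (order_date);
            CREATE TABLE public.orders_2024 PARTITION OF public.orders_by_date
                FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
        """)
        # Hash partitioning: child partitions are defined by a modulus and a remainder.
        cur.execute("""
            CREATE TABLE public.orders_by_hash (order_id bigint, amount numeric)
                PARTITION BY HASH (order_id);
            CREATE TABLE public.orders_hash_0 PARTITION OF public.orders_by_hash
                FOR VALUES WITH (MODULUS 2, REMAINDER 0);
            CREATE TABLE public.orders_hash_1 PARTITION OF public.orders_by_hash
                FOR VALUES WITH (MODULUS 2, REMAINDER 1);
        """)
        # A BTree index on the partitioned table; a UNIQUE index would also have to include the partition key.
        cur.execute("CREATE INDEX orders_by_date_idx ON public.orders_by_date USING btree (order_date);")
    conn.close()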

NOTE: Natural key columns in a Postgres merge must uniquely identify rows in the target table. If you are using an existing table, ensure that a unique index is created on the natural key columns. For non-existing tables, Infoworks creates a unique index on the natural key columns during the first build. If you are changing an already built Postgres target from Overwrite mode to Merge mode, ensure that a unique index has already been created on the natural keys.
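
The requirement in the note is easiest to see as an upsert: a merge keyed on the natural columns needs a unique index so that conflicting rows can be detected and updated. The sketch below shows that general PostgreSQL pattern with a hypothetical table and natural keys; it is not necessarily the exact statement Infoworks issues during a Merge build.

    # Why merge needs a unique index on the natural keys: the upsert below relies on it
    # to detect conflicting rows. Hypothetical table and column names; requires psycopg2.
    import psycopg2

    conn = psycopg2.connect(host="pg.example.com", dbname="analytics",
                            user="iw_export_user", password="secret")
    with conn, conn.cursor() as cur:
        # The unique index on the natural keys is what makes ON CONFLICT work.
        cur.execute("""
            CREATE UNIQUE INDEX IF NOT EXISTS customer_orders_nk
                ON public.customer_orders (customer_id, order_id);
        """)
        cur.execute("""
            INSERT INTO public.customer_orders (customer_id, order_id, amount)
            VALUES (%s, %s, %s)
            ON CONFLICT (customer_id, order_id)
            DO UPDATE SET amount = EXCLUDED.amount;
        """, (42, 1001, 99.50))
    conn.close()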

Export the Data

Following are the steps to build a pipeline:

  1. Navigate to the required pipeline and click the Build icon.
  2. Select the Version and click Build Pipeline to build the pipeline version.
  3. You can click the Build Target Metadata button to enable import of multiple, sequential pipelines with export configuration.
  4. Click any of the existing builds under the Recent Builds section to view detailed information about the job. You can view the summary and download the job logs.
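
After a successful build, you can optionally verify the export directly in Postgres. A minimal check, assuming the hypothetical connection values and target table used in the earlier sketches:

    # Optional post-build verification: count the rows that landed in the Postgres target.
    # Hypothetical connection values and table name; requires psycopg2.
    import psycopg2

    conn = psycopg2.connect(host="pg.example.com", dbname="analytics",
                            user="iw_export_user", password="secret")
    with conn, conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM public.customer_orders;")
        print("rows exported:", cur.fetchone()[0])
    conn.close()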

Supported Datatype Mappings

Below is the list of supported source (Spark) datatypes and their mapping to Postgres datatypes. A small lookup sketch follows the table.

Spark Datatype -> Postgres Datatype
DecimalType -> decimal
NUMBER -> int
FloatType -> float
DoubleType -> float
LongType -> bigint
StringType -> text
DateType -> date
TimestampType -> timestamp
ArrayType -> array
BooleanType -> boolean
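
If you need to reason about this mapping outside of Infoworks, for example when pre-creating an existing target table, the table above can be expressed as a simple lookup. The dictionary below mirrors the documented mapping; the helper function is hypothetical.

    # Lookup that mirrors the Spark-to-Postgres datatype mapping documented above.
    SPARK_TO_POSTGRES = {
        "DecimalType": "decimal",
        "NUMBER": "int",
        "FloatType": "float",
        "DoubleType": "float",
        "LongType": "bigint",
        "StringType": "text",
        "DateType": "date",
        "TimestampType": "timestamp",
        "ArrayType": "array",
        "BooleanType": "boolean",
    }

    def postgres_type(spark_type: str) -> str:
        """Return the Postgres datatype for a supported Spark datatype (hypothetical helper)."""
        return SPARK_TO_POSTGRES[spark_type]

    print(postgres_type("TimestampType"))  # -> timestamp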