Infoworks 6.1.3
Prepare Data

Exporting Data to Azure Synapse Analytics

Infoworks supports Azure Synapse Analytics as a target for data transformation pipeline.

Azure Synapse Analytics is a cloud-based enterprise data warehouse that leverages Massively Parallel Processing (MPP) to quickly run complex queries across petabytes of data and can be used as a key component of a big data solution.

Prerequisites

  • Ensure the Azure Synapse Analytics target data connection is configured. For the steps to configure data connection, see Setting Azure Synapse Analytics Data Connection.
  • Ensure a database master key for the Azure Synapse Analytics is available.
  • To allow Infoworks to reach Azure Synapse Analytics, it is recommended to set Allow access to Azure services to ON on the firewall pane of the Azure Synapse Analytics server through Azure portal.

Setting Azure Synapse Analytics Target Properties

Following are the steps to use Azure Synapse Analytics target in pipeline:

  1. Double-click the Azure Synapse Analytics node. The properties page is displayed.
  2. Click Edit Properties, and set the following fields:
FieldDescription
Build Mode

The options include overwrite, append, merge or insert-overwrite.

Overwrite: Drops and recreates the Azure Synapse Analytics target.

Append: Appends data to the existing Azure Synapse Analytics target.

Merge: Merges data to the existing table based on the natural key.

Insert-Overwrite : Overwrites all the rows based on combination of natural keys (OR condition) and inserts new rows.

Data Connection NameData connection to be used by the Azure Synapse Analytics target.
Schema NameSchema name of the Azure Synapse Analytics target.
Create a schema if it does not existEnable this option to create a new schema with the name provided above. Ensure that the user has sufficient privileges to create a schema in Azure Synapse Analytics Target.
Table NameTable name of the Azure Synapse Analytics target.
Indexing Type

The indexing options include clustered columnstore index, clustered index and nonclustered index, and a non-index option - heap.

Note: Clustered Column Store and Non-Clustered indexes are not supported for tables containing column(s) of STRING data type, and Columns with STRING data type and TIMESTAMP datatype can not be used as a partitioning column.(https://docs.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-ver15)

Distribution TypeDistribution methods for designing distributed tables. The options include hash, replication and round-robin.
Natural KeysThe required natural keys for the Azure Synapse Analytics target.
Partition Column NameThe column name for partition.
Partition RangeThe range type. The options include left and right.
Boundary Values

The boundary points for partition (comma separated).

NOTE Boundary values must exactly match the partition column data type. For example, if the column data type is int, the boundary values can be 10,20,50,... If the column data type is date, the boundary values can be ‘11-12-2019’,’10-11-2020’,... If the column data type is string, the boundary values must be provided in single quotes (comma separated).

Sync Table SchemaThe option to synchronise pipeline export table schema with source table schema.
  1. Click Save.

Supported Datatype Mappings

Below is the list of supported source datatypes and their mapping to Azure Synapse Analytics datatypes:

Spark DatatypeSQL Datatype
DecimalTypedecimal
IntegerTypeint
FloatTypefloat
DoubleTypefloat
LongTypebigint
StringTypenvarchar(MAX)
DateTypedatetime
TimestampTypedatetime
BooleanTypebit
  Last updated by Monika Momaya