
Exporting Data to SQL Server

SQL Server is a relational database management system that supports a number of applications, including business intelligence, transaction processing, and analytics.

The process to export data to SQL Server is described in the following sections:

Define the SQL Server Target Data Connection > Create a Pipeline to Add SQL Server Node > Export the Data

Define the SQL Server Target Data Connection

The following are the steps to configure a SQL Server target data connection:

  1. Navigate to the required domain and click the Target Data Connections icon.
  2. Click Add Connection.
  3. In the Add Data Connection page, provide the following connection details.
  4. Click Save.
Type: The type of target data connection. Select SQL Server.
Name: The name for the target data connection. The connection name must be unique across the specific domain.
Description: The description for the target data connection.
JDBC URL: The JDBC URL to connect to the SQL Server database. The URL must be of the following format: jdbc:sqlserver://<host>:<port>;databaseName=<database_name>
User Name: The username of the SQL Server target account.
Password: The password of the SQL Server target account.
Additional Params: The optional JDBC parameters.
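
For example, a connection to a hypothetical server and database might use the following values. The host, port, and database name are placeholders; encrypt and trustServerCertificate are standard Microsoft JDBC driver properties, though the exact syntax expected in Additional Params may vary:

    JDBC URL: jdbc:sqlserver://sqlhost.example.com:1433;databaseName=sales_dw
    Additional Params: encrypt=true;trustServerCertificate=true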

Create a Pipeline to Add SQL Server Node

Following are the steps to add a new pipeline to the domain:

  1. Click the Domains menu and click the required domain from the list. You can also search for the required domain.
  2. Click the Pipelines icon.
  3. In the Pipelines page, click the New Pipeline button.
  4. Enter the following details.
  5. Click Save. The new pipeline will be added to the list of pipelines.
Name: Unique name for the pipeline.
Description: Description for the pipeline.
Execution Engine: Select the Execution Engine as Spark.
Machine Learning Library: Select SparkML.
Cluster Templates for Batch Mode: Select the default cluster template.

Design a Pipeline

Following are the steps to design a pipeline:

  1. In the Pipelines page, click the new pipeline created. The Overview page is displayed with the pipeline version details and a blank design space.
  2. Click Open Editor or click on the blank design space to open the editor page.
  3. From the left panel, drag and drop the required source tables from the Datalake Tables tab and the SQL Server Table node from the Transformations tab to the pipeline editor, and connect them to design a pipeline.

SQL Server Table Node Configuration

Following are the steps to configure the SQL Server Table node in a pipeline:

  1. Double-click the SQL Server Table node. The Properties page is displayed.
  2. Click Edit Properties.
  3. Click Save.
Build Mode: The mode used to write to the SQL Server target. The options include:

  • Overwrite: Drops and recreates the SQL Server target.
  • Append: Appends data to the existing SQL Server target.
  • Merge: Merges data into the existing table based on the natural keys.
  • Insert-Overwrite: Overwrites all the rows that match the combination of natural keys (OR condition) and inserts new rows.

Data Connection: The data connection created in the "Define the SQL Server Target Data Connection" step.

Schema Name: The name of the existing schema in the SQL Server database.

Table Name: The table name for the SQL Server target.

Natural Keys: The natural keys for the SQL Server target. This is the combination of columns that uniquely identifies a row.

Is Existing Table: When enabled, the existing table is treated as an external table to Infoworks. The table will not be created, dropped, or managed by Infoworks.

Partition Type: The option to partition the target table based on a range condition (see the T-SQL sketch after this table). The options include:

  • Left range: The boundary value belongs to its left partition; it is the last value in the left partition.
  • Right range: The boundary value belongs to its right partition; it is the first value in the right partition.

Partition Column: The column on which the data is partitioned. Note: Columns of the data types String, text, ntext, image, xml, timestamp, varchar(max), nvarchar(max), varbinary(max), alias data types, or CLR user-defined types cannot be specified.

Partition Values: The range values used to partition the target table based on the range condition.

Partition Scheme: A partition scheme maps the partitions created by a partition function to a set of filegroups that you define.

Partition Scheme File Groups: The filegroups to which the partitions must be mapped.

Index Type: An index is a key built from one or more columns in the database that speeds up fetching rows from the table or view; it helps the database quickly find the rows associated with the key values. The options include:

  • Clustered: A clustered index defines the order in which data is stored in the table, and the data can be sorted in only one way, so each table can have only one clustered index.
  • Non-Clustered: A non-clustered index stores the index separately from the data and contains pointers to the data's location. Because the index is stored apart from the data, a single table can have many non-clustered indexes.

Index Columns: The index columns for the target table to be created. Note: Columns of the data types String, ntext, text, varchar(max), nvarchar(max), varbinary(max), xml, or image cannot be specified.

Sync Table Schema: The option to synchronize the pipeline export table schema with the source table schema.

NOTE: Timestamp values with precision beyond 'YYYY-MM-DD hh:mm:ss.xxx' are not supported.
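
The partition and index settings above correspond to standard SQL Server DDL concepts. The following is a minimal T-SQL sketch of a RANGE RIGHT partition function, a partition scheme mapped to filegroups, and the two index types; all object, column, and filegroup names are hypothetical, and this is not necessarily the DDL that Infoworks itself generates:

    -- RANGE RIGHT: each boundary value is the first value of the
    -- partition to its right (RANGE LEFT would make it the last
    -- value of the partition to its left).
    CREATE PARTITION FUNCTION pf_order_date (datetime)
        AS RANGE RIGHT FOR VALUES ('2023-01-01', '2024-01-01');

    -- Map the three resulting partitions to three filegroups
    -- (the filegroups must already exist in the database).
    CREATE PARTITION SCHEME ps_order_date
        AS PARTITION pf_order_date TO (fg_2022, fg_2023, fg_2024);

    -- Clustered index: orders the table's data pages by the key,
    -- so a table can have only one.
    CREATE CLUSTERED INDEX ix_orders_date
        ON dbo.orders (order_date) ON ps_order_date (order_date);

    -- Non-clustered index: stored separately from the data, with
    -- pointers back to the rows; a table can have many.
    CREATE NONCLUSTERED INDEX ix_orders_customer
        ON dbo.orders (customer_id);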

Export the Data

Following are the steps to build a pipeline:

  1. Navigate to the required pipeline and click the Build icon.
  2. Select the Version and click Build Pipeline to build the pipeline version.
  3. Click the Build Target Metadata button to enable the import of multiple, sequential pipelines with export configurations.
  4. Click any of the existing builds under the Recent Builds section to view detailed information about the job, including its summary. You can also download the job logs.

Supported Datatype Mappings

Below is the list of supported source datatypes and their mapping to SQL datatypes:

Spark Datatype    SQL Datatype
DecimalType       decimal
NUMBER            int
FloatType         float
DoubleType        float
LongType          bigint
StringType        nvarchar(MAX)
DateType          datetime
TimestampType     datetime
ArrayType         array
BooleanType       bit
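
As an illustration of these mappings, a pipeline target with long, string, timestamp, decimal, and boolean columns would land in SQL Server roughly as follows (the table and column names are hypothetical):

    CREATE TABLE dbo.example_target (
        id         bigint,          -- LongType
        name       nvarchar(MAX),   -- StringType
        created_at datetime,        -- TimestampType
        amount     decimal(18, 2),  -- DecimalType
        is_active  bit              -- BooleanType
    );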

SQL Server Pipeline Export Privileges Required

Option 1 - Infoworks Managed

Database level privileges:

  • CREATE TABLE
  • DROP TABLE
  • TRUNCATE TABLE
  • EXEC SP_RENAME

Object level privileges:

  • SELECT
  • INSERT
  • UPDATE
  • TRUNCATE
  • DELETE
  • ALTER (if Sync Table Schema is enabled)

Schema level:

If the check box for "Create a schema if it does not exist" is checked, the corresponding permissions are also needed on the SCHEMA.
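
As a sketch, the Infoworks-managed privileges above might be granted to a hypothetical database user as follows. Note that SQL Server has no standalone grantable DROP TABLE or TRUNCATE TABLE permission; ALTER on the target schema is the usual way to cover dropping, truncating, renaming (sp_rename), and schema changes:

    -- iw_export_user and schema dw are hypothetical names.
    GRANT CREATE TABLE TO iw_export_user;         -- database-level permission
    GRANT ALTER ON SCHEMA::dw TO iw_export_user;  -- covers DROP, TRUNCATE, sp_rename, ALTER
    GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA::dw TO iw_export_user;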

Option 2 - Is Existing Table (User-Managed)

When the Is Existing Table checkbox is checked, the existing table is treated as an external table to Infoworks; the table will not be created, dropped, or managed by Infoworks. Use this option if you do not want to grant CREATE TABLE and DROP TABLE privileges.

Database level privileges:

  • TRUNCATE TABLE

Object level privileges:

  • SELECT
  • INSERT
  • UPDATE
  • TRUNCATE
  • DELETE
  • ALTER (if Sync Table Schema is enabled)

Schema level:

If the check box for "Create a schema if it does not exist" is checked, the corresponding permissions are also needed on the SCHEMA.
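
For this user-managed case, the grants reduce to DML plus ALTER on the existing target table, sketched below with hypothetical names; ALTER on the table is the minimum permission SQL Server requires for TRUNCATE TABLE:

    GRANT SELECT, INSERT, UPDATE, DELETE ON dbo.orders TO iw_export_user;
    GRANT ALTER ON dbo.orders TO iw_export_user;  -- for TRUNCATE and, if enabled, Sync Table Schema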
