SQL Server is a relational database management system that supports a number of applications, including business intelligence, transaction processing and analytics. |
The process to export data to a SQL Server is described in the following sections:
Define the SQL Server Target Data Connection > Create a Pipeline to Add SQL Server Node > Export the Data |
The following are the steps to configure a Postgres data connection:
Field | Description |
---|---|
Type | The type of target data connection. Select SQL Server. |
Name | The name for the target data connection. The connection name must be unique across the specific domain. |
Description | The description for the target data connection. |
JDBC URL | The JDBC URL to connect to the SQL Server database. The URL must be of the following format: jdbc:sqlserver://<host>:<port>;databaseName=<database_name> |
User Name | The username of the SQL Server target account. |
Password | The password of the SQL Server target account. |
Additional Params | The optional JDBC parameters. |
Create a Pipeline to Add SQL Server Node
Following are the steps to add a new pipeline to the domain:
Field | Description |
---|---|
Name | Unique name for the pipeline. |
Description | Description for the pipeline. |
Execution Engine | Select the Execution Engine as Spark. |
Machine Learning Library | Select SparkML__. |
Cluster Templates for Batch Mode | Select the default cluster template. |
Design a Pipeline
Following are the steps to design a pipeline:
Following are the steps to configure SQL Server Table node in pipeline:
Field | Description |
---|---|
Build Mode | The options include overwrite, append, merge or insert-overwrite. Overwrite: Drops and recreates the SQL Server target. Merge: Merges data to the existing table based on the natural key. Append: Appends data to the existing SQL Server target. Insert-Overwrite: Overwrites all the rows based on combination of natural keys (OR condition) and inserts new rows. |
Data Connection | The data connection created in the " Define the SQL Server Target Data Connection" step. |
Schema Name | The name of the existing schema in the SQL Server database. |
Table Name | The table name for the SQL Server target. |
Natural Keys | The required natural keys for the SQL Server target. This is the combination of columns to uniquely identify the row. |
Is Existing Table | When enabled, existing table behavior works as an external table to Infoworks. The table will not be created/dropped or managed by Infoworks. |
Partition Type | The option to partition the target table based on the range condition. The options include: Left range: Left Range means that the actual boundary value belongs to its left partition, it is the last value in the left partition. *_Right range: *_Right range means that the actual boundary value belongs to its right partition, it is the first value in the right partition. |
Partition Column | The column based on which the data will be partitioned. Note: Columns that are of data types: String, Text, ntext, image, xml, timestamp, varchar(max), nvarchar(max), varbinary(max), alias data types, or CLR user-defined cannot be specified |
Partition Values | The range value to partition the target table based on the range condition. |
Partition Scheme | A partition scheme maps the partitions created by a partition function to a set of filegroups that you define. |
Partition Scheme File Groups | The file groups to which the partitions must be mapped. |
Index Type | An Index is a key built from one or more columns in the database that speeds up fetching rows from the table or view. This key helps a database to find the row associated with key values quickly. The options include: Clustered: A clustered index defines the order in which data is stored in the table which can be sorted in only one way. So, there can be an only a single clustered index for every table. Non-Clustered: A Non-clustered index stores the data at one location and indices at another location. The index contains pointers to the location of that data. A single table can have many non-clustered indexes as an index in the non-clustered index is stored in different places. |
Index Columns | The index columns for the target table to be created. Note: Columns that are of data types String, ntext, text, varchar(max), nvarchar(max), varbinary(max), xml, or image cannot be specified. |
Sync Table Schema | The option to synchronise pipeline export table schema with source table schema. |
Following are the steps to build a pipeline:
Below is the list of supported source datatypes and their mapping to SQL datatypes:
Spark Datatype | SQL Datatype |
---|---|
DecimalType | decimal |
NUMBER | int |
FloatType | float |
DoubleType | float |
LongType | bigint |
StringType | nvarchar(MAX) |
DateType | datetime |
TimestampType | datetime |
ArrayType | array |
BooleanType | bit |