Column Specifications

The columns are specified for each sync, and determine how the data will be interpreted and transformed at the comparison stage. A full example of all general options per column is shown below:

yaml

comparisonColumns:
  - name: id
    type: Uint8
    unique: True
  - name: name
    type: Utf8
    unique: False
  - name: another_unique_id
    type: Utf8
    unique: False

Alternate column declaration

If you do not need to set any special settings for the columns, you can use the alternate syntax for column declaration to speed things up. This will cause the first field in the list to be set as unique:

yaml

comparisionColumns:
  uniqueField: Int32
  fieldA: Utf8
  fieldB: Int32

Comparison columns

The comparison columns are a representation of how the data should look when doing the comparison, meaning AFTER any source or destination transformers have been applied. You can retrieve any number of columns not listed in the columns-section, but they will not be used for comparison. You can use them for insertion and transformations, though.

Field types

There are a number of field types that can be used for the columns:

Integer types: pl.Int8, pl.Int16, pl.Int32, pl.Int64
Unsigned Integer types: pl.UInt8, pl.UInt16, pl.UInt32, pl.UInt64
Float types: pl.Float32, pl.Float64
Boolean: pl.Boolean
Utf8: pl.Utf8
Object: pl.Object

There are a number of more field types in the PolaRS library, but not all of them have been validated to work with Beetl yet. In particular, the Binary column is not going to work.

Unique Fields

The unique fields determine how the comparison is made between source and destination, and don't have to correspond exactly with all unique columns in both datasets.

In the above ID column, the unique field is set to True. This will mean that the comparison is done like this:

Match all Source IDs to Destination IDs to find those that are missing (Create)
Match all Destination IDs to Source IDs to find those that are to be deleted (Delete)
Match all rows that exist in both datasets and compare the data to find those that are different (Update)

You can add multiple ID columns for a dataset, but there should never be no unique columns.

TIP

If you transfer to/from SQL Server, make sure to transform the UUID (uniqueidentifier)s on the other end to uppercase since this is done by default in SQL Server. If you don't do this, you will likely have all records updated every time.

Multi-dimensional Data

If you use sources with multi-dimensional data (e.g. json, MongoDB, etc.), you can use the pl.Object type to store the entire data structure in a single column. You also have the option to use transformers to flatten the data structure into multiple columns

Objects

yaml

allRows:
  - name: hello
    attrs:
      height: 100
      length: 200
      width: 300
  - name: world
    attrs:
      height: 240
      length: 120
      width: 360

This data structure can be flattened into the following columns:

name	attrs.height	attrs.length	attrs.width
hello	100	200	300
world	240	120	360

Arrays

Arrays will generally be flattened to a pl.List type, and can then be modified and/or extracted with for example the strings.join or structs.jsonpath transformers

Lists cannot yet be compared against each other.

yaml

- name: hello
  attrs:
    - 100
    - 200
    - 300
  deep:
    - nested: list
      deep_attr:
        - 1
      
...

# This will turn the above field into a comma-separated string
- transformer: strings.join
  config:
    inField: attrs
    separator: ", "

# This will extract the first element of the array under attrs
- transformer: structs.jsonpath
  config:
    inField: attrs
    path: "0"

# Jsonpath can also be used for deeper data structures
- transformer: structs.jsonpath
  config:
    inField: deep
    path: "0.deep_attr.0"

Column Specifications ​

Alternate column declaration ​

Comparison columns ​

Field types ​

Unique Fields ​

Multi-dimensional Data ​

Objects ​

Arrays ​

Column Specifications

Alternate column declaration

Comparison columns

Field types

Unique Fields

Multi-dimensional Data

Objects

Arrays