Struct Transformers
The struct transformers apply on all objects of a column
Compose struct
Composes a struct from fields on each row
- transformer: structs.compose_struct
config:
# map: A mapping of struct field names to column names from which the values should be taken (Mandatory)
map: {
"city": "city",
"street": "street_address",
"zip": "zip_code"
}
# outField: The destination field.
outField: addressExample
If the configuration above was used on this DataFrame:
| city | street_address | zip_code |
|---|---|---|
| "new york" | "Kennedy st. 1" | "11224" |
| "stockholm" | "Drottninggatan 5" | "55667" |
The resulting DataFrame would look like this:
| city | street_address | zip_code | address |
|---|---|---|---|
| "new york" | "Kennedy st. 1" | "11224" | {"city": "new york", "street": "Kennedy st. 1", "zip": "11224"} |
| "stockholm" | "Drottninggatan 5" | "55667" | {"city": "stockholm", "street": "Drottninggatan 5", "zip": "55667"} |
You could then, as an example, follow up with the frames.project_columns transformer to drop all other columns but address and insert it into a document database.
Compose list of structs
Composes a list of structs from list fields on each row Much like structs.compose_struct but operates on lists and outputs a list of structs at the destination column.
If the list are of differing lengths the transformer will fill the shorter ones with None values.
- transformer: structs.compose_list_of_structs
config:
# map: A mapping of struct field names to column names from which the values should be taken (Mandatory)
map: {
"id": "group_ids",
"technical_name": "group_technical_names",
"display_name": "group_display_names"
}
# outField: The destination field.
outField: groupsExample:
If the configuration above was used on this DataFrame:
| group_ids | group_technical_names | group_display_names |
|---|---|---|
[1, 2] | ["admin", "user"] | ["Administrator"] |
The resulting DataFrame would look like this:
| group_ids | group_technical_names | group_display_names | groups |
|---|---|---|---|
[1, 2] | ["admin", "user"] | ["Administrator"] | [{"id": 1, "technical_name": "admin", "display_name": "Administrator"},{"id": 2, "technical_name": "user", "display_name": None}] |
You could then, as an example, follow up with the frames.project_columns transformer to drop all other columns but groups and insert it into a document database.
Json Path
Extracts a value from a json-, object- or listfield. Strings are interpreted as object keys, numbers are first tried as list indicies, otherwise as object keys
- transformer: structs.jsonpath
config:
inField: field_name
outField: new_field
jsonPath: "path.to.value.0"
# defaultVal: "default_value" (Optional, default empty)Staticfield
Adds a static field to the dataset
- transformer: structs.staticfield
config:
field: new_field
value: "static_value"
# only add the static field if it is missing from the dataset
# default=False
only_add_if_missing: False