The Layer Pipeline

The White Label Data pipeline is responsible for querying and transforming data before that data is mapped to a visualization.

The pipeline is specified in the <pipeline> section of your Layer file. It comprises a series of steps that are executed in order. Pipelines can also be built up across a series of layers: if one layer depends on another, the pipeline of the layer it depends on is executed first.

The most basic pipeline simply makes a query:

{
    "steps" : [
        {
            "action" : "bigquery_query",
            "query_name" : "my_query_name"
        }
    ]
}

The example above looks for a query named my_query_name in a <query> tag and executes it. The result is a Pandas DataFrame, created in memory and passed to the next step in the pipeline. If there are no further steps, the visualization rendering proceeds to mapping the data onto the figure.
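
By default, the resulting DataFrame is named after the query (see output_dataframe in the Pipeline Options table below). If you want an explicit name that later steps or mappings can refer to, you can add output_dataframe to the step. A minimal sketch, where the DataFrame name is just an illustrative placeholder:

{
    "steps" : [
        {
            "action" : "bigquery_query",
            "query_name" : "my_query_name",
            "output_dataframe" : "sales_by_region"
        }
    ]
}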

Sometimes it’s useful to transform data before it is mapped to a visualization. For example:

{
    "steps" : [
        {
            "action" : "bigquery_query",
            "query_name" : "my_query_name"
        },
        {
            "action" : "transform",
            "operation": "format",
            "format" : "<br>{{ column1 }}",
            "output_column_name": "tooltip"
        }
    ]
}

The above creates a new column named tooltip in the DataFrame, alongside the existing columns, containing an HTML-formatted tooltip built from column1. See all transform options here.
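
When a pipeline contains more than one query, each result can be given its own name with output_dataframe, and a transform can be pointed at a specific result with input_dataframe (both options are described in the Pipeline Options table below). The sketch below is illustrative only; the query names, DataFrame names, and the revenue column are placeholders:

{
    "steps" : [
        {
            "action" : "bigquery_query",
            "query_name" : "regions_query",
            "output_dataframe" : "regions"
        },
        {
            "action" : "bigquery_query",
            "query_name" : "sales_query",
            "output_dataframe" : "sales"
        },
        {
            "action" : "transform",
            "operation" : "format",
            "input_dataframe" : "sales",
            "format" : "<br>{{ revenue }}",
            "output_column_name" : "tooltip"
        }
    ]
}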

Pipeline Options

Option: action
Value(s): looker_query, snowflake_query, postgres_query, bigquery_query, elasticsearch, sql_transform, read_csv, custom_query, transform
Description: The type of pipeline step.

Option: query_name
Value(s): A string
Description: The name of the query specified in a <query> tag within the combined Layer.

Option: input_dataframe
Value(s): A string
Description: The name of the DataFrame to use as the input for a transformation. If no input_dataframe is specified, the last DataFrame produced by a previous step in the pipeline is used.

Option: csv_filename
Value(s): A string
Description: When the action is read_csv, this is the file name or path within the csv folder of your Git repository. The comma-separated file should contain one header row and is used to read data from a flat file into a DataFrame as part of a pipeline step.

Option: output_dataframe
Value(s): A string
Description: The name to use when creating a new DataFrame. This allows you to have multiple DataFrames in the pipeline context and to map columns from multiple queries to a single visualization. If no name is specified, the DataFrame is named after the query.

Option: operation
Value(s): A string
Description: When the action is transform, this specifies the name of the transform operation. See below.

Option: connection
Value(s): A string
Description: Optional. The name of the connection specified in appconfig.json. This is only needed if there are two connections of the same type. For example, if you have two BigQuery connections, you need to specify which one to use for a BigQuery query pipeline step.

Option: shared
Value(s): true or false
Description: Indicates whether the query is shared and available to multiple visualizations. See Shared Queries.
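
As an illustration of how several of these options combine, the following sketch reads a flat file from the csv folder and then runs a shared query against a named BigQuery connection. The file name, query name, DataFrame names, and connection name are placeholders, and the connection option is only shown on the assumption that appconfig.json defines more than one BigQuery connection:

{
    "steps" : [
        {
            "action" : "read_csv",
            "csv_filename" : "store_locations.csv",
            "output_dataframe" : "store_locations"
        },
        {
            "action" : "bigquery_query",
            "query_name" : "weekly_sales_query",
            "connection" : "bigquery_marketing",
            "shared" : true,
            "output_dataframe" : "weekly_sales"
        }
    ]
}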