---
title: "Operators"
page-category: "searchable"
---

This page documents operators configured via Conductor. For operators included with Airflow, see the Airflow [documentation on hooks and operators](https://airflow.apache.org/docs/apache-airflow/stable/operators-and-hooks-ref.html).

## Using Conductor Operators
Conductor operators prefill configuration and automatically route inputs, outputs, and compute jobs to Conductor-managed AWS resources.
They are provided to DAGs via the `conductor.core.Conductor` class.
Use them within your DAG files as follows:

```python
from project_config import project
from conductor.core import Conductor

c = Conductor(
    "my-dag",
    project=project,
    schedule_interval="@daily",
)

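# Operators are exposed on the Conductor instance, preconfigured for this project.
redshift_op = c.operators.RedshiftOperator(...)
# Outputs are read from the `outputs` attribute, e.g. to pass to a downstream operator.
redshift_op.outputs.s3_url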
```

## Configured Operators
The sections below detail usage of the Conductor configured operators.
- Args: required arguments.
- Optional Args: optional arguments, also known as keyword arguments or kwargs.
- Outputs: return values, accessed via the `outputs` attribute.
- Required Config: Conductor configuration that must be present in the selected `EnvironmentConfig` to use the operator.
- Optional Config: Conductor configuration that is not required to use the operator, but can modify or augment its behavior.

> **Note:** `task_id` is an argument to all Conductor operators. 
{:.note}

### BadgerExportOperator
Exports jsonlines data produced by a previous step to the [BadgerDB](https://dgraph.io/docs/badger/) format, along with a manifest of the model information and exported data files. See the [BLOKS: User Guide](https://wiki.xarth.tv/pages/viewpage.action?pageId=245992560) for how to serve this dataset using the BLOKS service.

##### Args:
- `input_s3_url: str` S3 url of the input data.
- `key_field: str` Column name in the input data to select as a key.
- `dataset_name: str` Name of the model used to generate the input data or the dataset.
- `version: str` Version of the model used to generate the input data or the dataset version.

##### Optional Args:
- `instance_type: str = "ml.m5.4xlarge"` Instance type to use for the export.
- `volume_size_in_gb: int = 128` Size of the instance volume for the export in GB.

##### Outputs:
- `data_s3_url: str` Output S3 path of the exported data.
- `manifest_s3_url: str` Output S3 path of the manifest.

##### Optional Config:
- `VpcConfig` Sets the VPC for the Processing job.
- `SageMakerConfig` Override Conductor's auto-generated execution role with a custom one.


### RedshiftOperator
Executes a SQL query against a Redshift cluster, with options to unload to the default S3 path.
See the [Redshift Unload Docs](https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html) for more information.

##### Args:
- `query: str` SQL query to run.

##### Optional Args:
- `unload: bool = False` Whether or not to unload the results of the query to the default S3 path.
- `unload_prefix: str = ""` Prefix appended to the default S3 path, under which the unloaded files are written.
For example, with a default path of `s3://my-project.staging/master/my-dag/unload-task`, passing the timestamp macro as `unload_prefix="{{ts}}"` unloads to `s3://my-project.staging/master/my-dag/unload-task/{{ts}}`.
- `parallel_unload: bool = True` If set, uses Redshift's parallel unload. This is much faster, but leads to empty files in S3 if an unload worker's shard contains no rows.
- `unload_format: str = "PARQUET"` Format to unload. 
- `overwrite: bool = False` If set, unload overwrites existing data.
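
The effect of `unload_prefix` on the output location can be sketched as follows. This is a minimal illustration that assumes the prefix is joined to the default path with a single `/`, as in the example above; `unload_s3_url` is a hypothetical helper, not part of Conductor.

```python
def unload_s3_url(base_path: str, unload_prefix: str = "") -> str:
    """Join the default S3 path and an optional prefix (hypothetical helper)."""
    base = base_path.rstrip("/")
    prefix = unload_prefix.strip("/")
    # With an empty prefix, files land directly under the default path.
    return f"{base}/{prefix}" if prefix else base


base = "s3://my-project.staging/master/my-dag/unload-task"

print(unload_s3_url(base))            # s3://my-project.staging/master/my-dag/unload-task
print(unload_s3_url(base, "{{ts}}"))  # s3://my-project.staging/master/my-dag/unload-task/{{ts}}
```

Using a templated prefix such as the timestamp macro keeps each run's files under a separate path, which avoids relying on `overwrite`.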

##### Outputs:
- `s3_url: Optional[str]` The output S3 prefix if unload was used.

##### Required Config:
- `RedshiftConfig` Additionally, `unload_role` must be set to use the unload functionality.

### SageMakerProcessingOperator
Executes a SageMaker Processing job with the provided entrypoint and arguments.

##### Args:
- `entrypoint: Callable` Some callable function or method imported from your project.

##### Optional Args:
- `config: Dict = None` Configuration to override or augment the Conductor defaults; see the [create_processing_job docs](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_processing_job) for more info.
Override is done via a shallow dictionary merge.
- `**kwargs` Any additional arguments are passed to the provided `entrypoint` when the container is called, i.e. `entrypoint(**kwargs)`.
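
Because `config` is applied with a shallow merge, a partial override of a nested key replaces the whole top-level value rather than merging into it. The sketch below illustrates this with plain dictionaries. It assumes the merge behaves like Python dict unpacking; `shallow_merge` is a hypothetical helper and the default values are invented for illustration, not Conductor's real defaults.

```python
from typing import Any, Dict, Optional


def shallow_merge(
    defaults: Dict[str, Any], config: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Top-level keys in `config` replace the same keys in `defaults` wholesale."""
    return {**defaults, **(config or {})}


# Hypothetical defaults, for illustration only.
defaults = {
    "ProcessingResources": {
        "ClusterConfig": {
            "InstanceCount": 1,
            "InstanceType": "ml.m5.xlarge",
            "VolumeSizeInGB": 30,
        }
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}

# A partial override of a nested key...
config = {"ProcessingResources": {"ClusterConfig": {"InstanceType": "ml.m5.4xlarge"}}}

merged = shallow_merge(defaults, config)

# ...replaces the entire "ProcessingResources" value, dropping the sibling keys.
print(merged["ProcessingResources"])
# Top-level keys that were not overridden keep their defaults.
print(merged["StoppingCondition"])
```

When overriding a nested setting, it is therefore safest to supply the complete top-level value, including the keys that should stay unchanged.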

##### Optional Config:
- `VpcConfig` Sets the VPC for the Processing job.
- `SageMakerConfig` Override Conductor's auto-generated execution role with a custom one.

### SageMakerTrainingOperator
Executes a SageMaker Training job with the provided model class.

##### Args:
- `model_cls: conductor.types.model.Model` Model class implementing the Conductor model interface.

##### Optional Args:
- `config: Dict = None` Configuration to override or augment the Conductor defaults; see the [create_training_job docs](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_training_job) for more info.
Override is done via a shallow dictionary merge.

##### Outputs:
- `s3_url: str` S3 output path for the trained model.

##### Optional Config:
- `VpcConfig` Sets the VPC for the Training job.
- `SageMakerConfig` Override Conductor's auto-generated execution role with a custom one.

### SageMakerModelOperator
Creates a SageMaker model.

##### Args:
- `model_name: str` Name of the model. The environment name (e.g. `staging`) and a timestamp are appended to avoid model name collisions.
- `model_cls: conductor.types.model.Model` Model class that was used to train this model.
- `model_s3_path: str` Path in S3 where the serialized model artifact is stored.

##### Optional Args:
- `config: Dict = None` Configuration to override or augment the Conductor defaults; see the [create_model docs](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model) for more info.
Override is done via a shallow dictionary merge.

##### Outputs:
- `model_name: str` Name of the model, with the environment name and timestamp appended as described above.

##### Optional Config:
- `VpcConfig` Sets the VPC for the Processing job.
- `SageMakerConfig` Override Conductor's auto-generated execution role with a custom one.

### SageMakerTransformOperator
Runs batch inference with a SageMaker model on a given set of data.

##### Args:
- `model_name: str` Name of the model to use.

##### Optional Args:
- `config: Dict = None` Configuration to override or augment the Conductor defaults; see the [create_transform_job docs](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_transform_job) for more info.
Override is done via a shallow dictionary merge.

##### Outputs:
- `output_s3_path: str` Output S3 path of the transformed data.


### SageMakerEndpointOperator
Creates a SageMaker Endpoint Config and deploys it as a SageMaker Endpoint.
If an endpoint of the same name already exists, updates the endpoint instead.

If an endpoint of the same name exists in a failed state, that endpoint must first be deleted manually before this operator will succeed.
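
The manual cleanup of a failed endpoint could be scripted along these lines. This is a sketch, not something Conductor provides: `delete_endpoint_if_failed` is a hypothetical helper that takes a boto3 SageMaker client and uses its `describe_endpoint` and `delete_endpoint` calls. It assumes the endpoint exists; boto3 raises an error from `describe_endpoint` otherwise.

```python
def delete_endpoint_if_failed(sagemaker_client, endpoint_name: str) -> bool:
    """Delete the endpoint if it is in the Failed state. Returns True if deleted."""
    description = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    if description["EndpointStatus"] != "Failed":
        # Healthy or in-progress endpoints are left alone; the operator updates them.
        return False
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    return True


# Usage (requires boto3 and AWS credentials; "my-endpoint" is a placeholder):
# import boto3
# delete_endpoint_if_failed(boto3.client("sagemaker"), "my-endpoint")
```

Passing the client in as an argument keeps the helper easy to exercise with a stand-in client before running it against a real account.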

##### Args:
- `model_name: str` Name of the model to use.
- `endpoint_name: str` Name of the endpoint.

##### Optional Args:
- `initial_instance_count: int = 1` Number of instances to start up the endpoint with.
- `instance_type: str = "ml.t2.medium"` Type of AWS machine to create the endpoint with.
- `config: Dict = None` Configuration to override or augment the Conductor defaults; see the [create_endpoint_config docs](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint_config) for more info.

##### Outputs:
- `endpoint_config_name: str` Name of the generated endpoint config.

### SageMakerMonitoringBaselineOperator

Generates SageMaker baseline statistics and constraints (JSON files) from the provided baseline dataset.
See [statistics.json](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-byoc-statistics.html)
and [constraints.json](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-byoc-constraints.html)
for file schemas.

##### Args:
- `baseline_dataset_uri: str` S3 path of the baseline dataset.
- `baseline_dataset_format: Dict` Expected format:
    ```python
    {
        'csv': {
            # Whether the csv dataset to baseline and monitor has a header.
            'header': True | False,

            # The position of the output columns.
            'output_columns_position': 'START' | 'END'
        }
    }
    ```
    or
    ```python
    {
        'json': {
            # Whether the file should be read as a json object per line.
            'lines': True | False
        }
    }
    ```
    Example: 
    ```python
    baseline_dataset_format = {"csv": {"header": True, "output_columns_position": "START"}}
    ```
    Can be generated programmatically with [DatasetFormat](https://sagemaker.readthedocs.io/en/stable/api/inference/model_monitor.html#sagemaker.model_monitor.dataset_format.DatasetFormat) from the SageMaker Python SDK.
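
If the SageMaker SDK is not available where the DAG is defined, the same dictionaries can be built by hand. The helpers below are a minimal sketch of the two formats described above; `csv_dataset_format` and `json_dataset_format` are hypothetical names, not part of Conductor or the SageMaker SDK.

```python
from typing import Any, Dict


def csv_dataset_format(
    header: bool = True, output_columns_position: str = "START"
) -> Dict[str, Any]:
    """Build the csv variant of `baseline_dataset_format`."""
    if output_columns_position not in ("START", "END"):
        raise ValueError("output_columns_position must be 'START' or 'END'")
    return {
        "csv": {
            "header": header,
            "output_columns_position": output_columns_position,
        }
    }


def json_dataset_format(lines: bool = True) -> Dict[str, Any]:
    """Build the json variant of `baseline_dataset_format`."""
    return {"json": {"lines": lines}}


baseline_dataset_format = csv_dataset_format(header=True, output_columns_position="START")
print(baseline_dataset_format)
```

Validating `output_columns_position` up front surfaces a typo when the DAG is parsed rather than when the baseline job runs.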
        
##### Optional Args:
- `instance_count: int = 1` Number of instances used for calculating baseline.
- `instance_type: str = 'ml.m5.large'` Type of EC2 instance for calculating baseline.
- `volume_size_in_gb: int = 30` Size in GB of the EBS volume to use for storing data during computing.
- `config: Dict = None` Configuration to override or augment the Conductor defaults. Expected format:
    ```python
    {
        "ModelMonitorConfig": {
            "role": str,
            "instance_count": int,
            "instance_type": str,
            "volume_size_in_gb": int,
        },
        "BaselineConfig": {
            "baseline_dataset": str,
            "dataset_format": dict,
            "output_s3_uri": str,
        }
    }
    ```
    See the [DefaultModelMonitor constructor](https://sagemaker.readthedocs.io/en/stable/api/inference/model_monitor.html#sagemaker.model_monitor.model_monitoring.DefaultModelMonitor)
    for parameter descriptions and more parameters under *ModelMonitorConfig*,
    and the [suggest_baseline function](https://sagemaker.readthedocs.io/en/stable/api/inference/model_monitor.html#sagemaker.model_monitor.model_monitoring.DefaultModelMonitor.suggest_baseline)
    for parameter descriptions and more parameters under *BaselineConfig*.

##### Outputs:
- `statistics_s3_url: str` Output S3 path for statistics.json.
- `constraints_s3_url: str` Output S3 path for constraints.json.

### SageMakerMonitoringScheduleOperator
Creates a SageMaker monitoring schedule for a SageMaker endpoint.
If a monitoring schedule of the same name already exists, updates the monitoring schedule instead.

##### Args:
- `endpoint_name: str` Name of the endpoint to monitor.
- `statistics_json_url: str` S3 path for statistics.json file.
- `constraints_json_url: str` S3 path for constraints.json file.

##### Optional Args:
- `schedule_expression: str = 'cron(0 * ? * * *)'` Cron expression that sets the frequency of the monitoring schedule. Defaults to running hourly.
  > **Warning:** SageMaker Model Monitor supports only [a subset of cron expressions](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-schedule-expression.html).
{:.danger}
- `report_s3_url: str` S3 path for violation reports folder.
- `schedule_name: str` Defaults to `'{endpoint_name}-{task_id}-default-monitor'`.
  Because `schedule_name` stays the same across repeated runs of this operator, reruns update the existing monitoring schedule instead of creating a new one.

- `instance_count: int = 1` Number of compute instances.
- `instance_type: str = 'ml.m5.large'` Type of compute instance.
- `volume_size_in_gb: int = 30` The size of the storage volume, in gigabytes.
- `config: Dict = None` Configuration to override or augment the Conductor defaults; see the [create_monitoring_schedule docs](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_monitoring_schedule) for more info.

##### Outputs:
- `schedule_name: str` Name of the created/updated monitoring schedule.
- `report_s3_url: str` S3 path for violation reports folder.
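
The default `schedule_name` described under Optional Args depends only on the endpoint name and the task ID, which is why reruns target the same schedule. A small sketch of that naming rule, using a hypothetical helper and placeholder names:

```python
def default_schedule_name(endpoint_name: str, task_id: str) -> str:
    """Format the documented default: '{endpoint_name}-{task_id}-default-monitor'."""
    return f"{endpoint_name}-{task_id}-default-monitor"


# "my-endpoint" and "monitor" are placeholder values for illustration.
first_run = default_schedule_name("my-endpoint", "monitor")
rerun = default_schedule_name("my-endpoint", "monitor")

print(first_run)           # my-endpoint-monitor-default-monitor
print(first_run == rerun)  # True
```

To keep two schedules on the same endpoint, give each task a distinct `task_id` or pass an explicit `schedule_name`.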
