
Guide

Dagster has two main components:

  • Asset: a persisted piece of data, much like a DB table.
  • Job: a collection of assets that are materialized together.

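You might see the terms Op and Graph in documentation or tutorials. They are the lower-level building blocks underneath assets, and an asset-based project like this one rarely needs them directly.

To materialize an asset means to run its function and store the result.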

Creating a Workspace

  • In our project structure, one workspace represents one workflow/pipeline.
  • To create a workspace, create a definitions.py file, then register it in workspace.yaml.

img.png img_1.png

  • Run main.py, then go to the Deployment tab. You'll see the demo workspace.

img_2.png

Asset

  • To create an asset, define a function decorated with @asset, then pass it to Definitions().

img_3.png

  • Press reload to update Dagster.

img_4.png

  • Go to the Assets tab, then click default. You'll see the assets available in the workspace.

img_5.png img_6.png

  • To make one asset depend on another, add the upstream asset's name as a parameter of the downstream asset's function.

img_7.png img_8.png

  • Clicking Materialize all executes every asset, in the order determined by the DAG.
  • Clicking yearly_report shows its output path on the right. By default, the output is pickled.

img_9.png

IO Manager

  • Instead of pickle files, you can change how an asset is saved by setting io_manager_def.
  • Note that PydanticIOManager, used in the example, is not a built-in Dagster feature.

img_10.png

  • Materialize yearly_report. You'll notice that the path now points to a JSON file.

img_11.png img_12.png

Partition

  • A partition represents a subset of an asset's data.
  • Dagster has some built-ins such as MonthlyPartitionsDefinition, WeeklyPartitionsDefinition, etc.

img_15.png

  • To partition an asset, pass a partitions definition to the partitions_def argument of @asset.

img_16.png

  • When you materialize the asset, an extra dialog asks which partition to materialize.

img_17.png

  • You can access the partition info via context.partition_time_window. Since we materialized 2023, the window's start is in 2023. Use the window's start and end in your code to select only the data that belongs to the partition.

img_18.png

Job

  • To create a job, call define_asset_job, then add the job to the Definitions in definitions.py.

img_13.png

  • A job is a collection of assets. Running a job materializes all assets associated with it.

img_14.png

Scheduled Job

  • To run a partitioned job on a schedule, use build_schedule_from_partitioned_job, which derives the schedule from the job's partitions definition.

img_20.png img_19.png