Advanced Tutorial
This advanced tutorial builds directly on the basic tutorial and continues from the code produced in its final step. If you have not yet completed the basic tutorial, we recommend doing so first, as all examples here assume familiarity with the foundations introduced there.
The focus of this tutorial is on advanced usage, in particular templating your configuration via the YTE template engine, and extending your specifications with more powerful customization options.
Setup
When running the tutorial via GitHub Codespaces YTE should already be installed. If not we can do this by running the following command:
conda install -c conda-forge yte
You can make sure it is installed by running the following command:
yte -h
Fetching new data
We will extend our datasets with more information about each oscar winning movie. In particular we will add one full table per movie containing information about the companies involved in producing this movie. This means we will end up with quite a few tables. To fetch all these we will run the following curl command that downloads and extracts the compressed data:
curl -L https://raw.githubusercontent.com/datavzrd/datavzrd.github.io/main/resources/companies.zip | unzip -d data -
Our data directory should now include a directory named companies
with one csv file per movie named based on its imdbID.
Step 1: Adding the tables to the report
Since our movies dataset contain more than 180 movies, adding one table per movie will result in a lot of manual typing and makes the configuration file large and difficult to maintain.
To avoid this, we will use the YTE template engine to generate the configuration file dynamically.
YTE allows us to use Python expressions within a YAML file while still maintaining a valid YAML syntax.
As our configuration now becomes a template, we should rename it to clearly indicate its purpose.
We can do that using our IDE with right-clicking on the file and renaming it to config.yte.yaml
or running the following command:
mv config.yaml config.yte.yaml
Next up we can add the new tables to the report. We will do that by adding the following section to or config file config.yte.yaml
:
__definitions__:
- import re, pathlib
- tt_numbers = [re.match(r"(tt\d+)_", f.name).group(1) for f in pathlib.Path("data/companies").glob("tt*_companies.csv")]
name: Oscars and movies
default-view: oscars
datasets:
?for movie in tt_numbers:
?f"{movie}":
path: ?f"data/companies/{movie}_companies.csv"
views:
?for movie in tt_numbers:
?f"{movie}":
dataset: ?movie
desc: ?f"All companies involved in the making of {movie}"
To break these changes down let us start with __definitions__:
. This special YTE keyword allows us to define variables that can be used within the template.
In this case we are importing the re
and pathlib
modules and defining a list of movie IDs we parse from the file names located in the data/companies
directory.
Within our datasets
definition we can now use the tt_numbers
variable to generate the table names and paths dynamically for each movie.
Similarly we iterate all movies in the views
definition to add one additional view for each movie.
Note
YTE is capable of a lot more than what is shown in this example. To learn more about it visit the YTE documentation.
To render the template we simply have to call YTE via the command line:
yte < config.yte.yaml > config.yaml
Now look into the generated config.yaml file. We can see that the table names and paths have been generated dynamically for each movie. The configuration is now ready to be used with Datavzrd to generate the report:
datavzrd config.yaml --output example-report --overwrite-output
Have a look at the generated report to see how very few lines added that many datasets and views to the report.
Step 2: Adding a heatmap with a custom color palette and legend
Let us now bring some color into our report by adding a heatmap with a custom color palette. This can be done by explicitly specifying a domain and a range of colors for a column.
Using the legend
keyword, we can also add a legend for the column to the description of our view.
Add the following heatmap
definition to the company_type
column:
views:
?for movie in tt_numbers:
?f"{movie}":
dataset: ?movie
desc: ?f"All companies involved in the making of {movie}"
render-table:
columns:
company_type:
plot:
heatmap:
scale: ordinal
domain: ["distribution", "sales", "production", "specialEffects", "miscellaneous"]
range: ["blue", "green", "red", "yellow", "cyan"]
legend:
title: "Company Type"
Re-render the template using YTE as well as the report using the previous commands and inspect the changes.
Step 3: Linking the detailed views to the main table
Since each movie has its own detailed view about the involved companies it makes sense to link them to the main table. This can be done by adding a links
keyword to dataset definition of our movies
table:
datasets:
movies:
path: data/movies.csv
links:
oscar:
column: Title
table-row: oscars/movie
companies:
column: imdbID
view: "{value}"
After re-runnning YTE and Datvazrd open the movies view in the report and use the linkout in the most right column to jump to the corresponding company view for a movie of your choice.
Step 5: Using a spell
Spells are reusable configuration snippets in Datavzrd. They make report creation easier by letting users define common setups in a modular way. Spells can be loaded from local files or remote URLs directing to the datavzrd-spells repository. Users can also create their own spells and share them with the community by contributing to the repository.
For our report we want to use the boolean spell for the column is_major_studio
.
It is possible to pass different parameters to spells using the with
keyword.
These parameters vary depending on the spell.
For the boolean spell, the parameters are true_value
and false_value
.
These specify the values that the spell should use to represent them as colored boxes with a plus and minus sign.
The url
keyword can either take a remote url to any YTE template holding the spell, a local file path to a YTE template or lastly a pattern that consists out of a version number, spell category and spell name.
For this example we will use the latter:
views:
?for movie in tt_numbers:
?f"{movie}":
dataset: ?movie
desc: ?f"All companies involved in the making of {movie}"
hidden: true
render-table:
columns:
is_major_studio:
spell:
url: v1.4.1/logic/boolean
with:
true_value: "True"
false_value: "False"
company_type:
plot:
heatmap:
scale: ordinal
domain: ["distribution", "sales", "production", "specialEffects", "miscellaneous"]
range: ["blue", "green", "red", "yellow", "cyan"]
legend:
title: "Company Type"
After adding the spell to your YTE template, rerender the template and the reports afterwards and verify that the boolean spell is working as expected.
This concludes the advanced tutorial. For more information on how to use YTE templates, please refer to its documentation. To see what other spells are available, visit the spells documentation. Finally you can also explore publications using Datavzrd reports for more inspiration on what is possible.