Tuning and training machine learning models with Argo Workflows

by Nicolas Guinoiseau | 10.1.2024


Intro

This post is the second in a series on Argo Workflows and a direct follow-up to the blog post Getting started with Training Models using Argo Workflows, in which we introduced Argo Workflows and how we use it for our machine learning experiments. I highly recommend reading it, as we will build on much of its content here! In this post we'll focus on how easy it is to reuse WorkflowTemplates.

Why should you train and tune your machine learning models regularly?

  1. Improve model accuracy: Machine learning models often require updates to improve their accuracy over time. As new data becomes available, the model may need to be retrained to incorporate the new information and adjust its predictions.
  2. Avoid overfitting: Regularly tuning a machine learning model can help prevent overfitting, which occurs when a model is too complex and its predictions start to follow the noise in the data rather than the underlying patterns. Tuning can help simplify the model and improve its ability to generalize.
  3. Keep up with changing data: Regularly tuning and training machine learning models helps ensure that the model stays up to date with changes in the data and continues to provide accurate predictions.
  4. Adapt to changing business requirements: As business requirements change, the machine learning model may need to be updated to reflect them. Regular tuning and training help ensure that the model continues to meet the needs of the business.

Regularly tuning and training machine learning models is critical to ensure their continued accuracy and relevance. By keeping models up-to-date with changing data and business requirements, your organisation can make better decisions and achieve better results.

A little recap

A quick recap of the previous blog post, in which we went through:

  1. What a WorkflowTemplate is
  2. How to use WorkflowTemplates in a Workflow
  3. How we made a Workflow capable of evaluating several hyperparameter sets (see the sketch below)
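As a reminder, referencing a WorkflowTemplate from a Workflow boils down to a templateRef. The sketch below is illustrative rather than a copy of the previous post's manifest; the resource, template, and parameter names (evaluate, hyperparameter-sets, hyperparameters) are assumptions.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: evaluate-hyperparameters-
spec:
  entrypoint: main
  arguments:
    parameters:
      # JSON list of hyperparameter sets to evaluate (illustrative values)
      - name: hyperparameter-sets
        value: '[{"max_depth": 4}, {"max_depth": 6}]'
  templates:
    - name: main
      steps:
        - - name: evaluate
            # Reference a template defined in an existing WorkflowTemplate
            templateRef:
              name: evaluate        # WorkflowTemplate resource name (assumed)
              template: evaluate    # template inside that WorkflowTemplate (assumed)
            arguments:
              parameters:
                - name: hyperparameters
                  value: "{{item}}"
            # Fan out: one evaluation step per hyperparameter set
            withParam: "{{workflow.parameters.hyperparameter-sets}}"
```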

A close-to-real-life example of tuning and training

You strive to find the hyperparameter set that trains the best machine learning model for your case, for instance the set that maximizes the ROC-AUC. This is the process we referred to earlier as model tuning, and there are several possible techniques for finding good hyperparameter sets. For simplicity we will apply the "brute force" grid search technique, which consists of trying several hyperparameter sets and picking the best-performing one.

A UML diagram of how to evaluate hyperparameter sets for a machine learning model

As you can see, it is very similar to the Workflow described in the previous blog post. The last steps of the Workflow are different because, while the previous Workflow was meant for observing results, we now want to decide which hyperparameter set to use to train a model on all the available training data. The great news is that even though this new Workflow has more steps than the previous one, we only have a few manifests to add to our project. The steps from "get-data" to "evaluate" are already declared, so we only need to create two new WorkflowTemplates: "select-best-performing-hyperparameter-set" and "train-model".

The WorkflowTemplate manifest

Only the last two WorkflowTemplates had to be written to build this new Workflow!
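The full manifest is not reproduced here, but a minimal sketch of the new WorkflowTemplate could look like the following. The resource name tune-and-train and the internal template and parameter names are assumptions for illustration; only get-data, evaluate, select-best-performing-hyperparameter-set, and train-model are named in the text, and the output wiring shown is just one possible way to connect the steps.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: tune-and-train              # assumed name
spec:
  entrypoint: main
  arguments:
    parameters:
      - name: hyperparameter-sets   # JSON list of sets to evaluate
  templates:
    - name: main
      dag:
        tasks:
          # The steps from "get-data" to "evaluate" reuse the WorkflowTemplates
          # already declared for the previous Workflow (intermediate steps omitted).
          - name: get-data
            templateRef:
              name: get-data
              template: main
          - name: evaluate
            dependencies: [get-data]
            templateRef:
              name: evaluate
              template: main
            # Fan out over the hyperparameter sets, one evaluation per set
            withParam: "{{workflow.parameters.hyperparameter-sets}}"
            arguments:
              parameters:
                - name: hyperparameters
                  value: "{{item}}"
          # The two new WorkflowTemplates:
          - name: select-best-performing-hyperparameter-set
            dependencies: [evaluate]
            templateRef:
              name: select-best-performing-hyperparameter-set
              template: main
            arguments:
              parameters:
                # Aggregated outputs of the fanned-out evaluate tasks
                - name: evaluation-results
                  value: "{{tasks.evaluate.outputs.parameters}}"
          - name: train-model
            dependencies: [select-best-performing-hyperparameter-set]
            templateRef:
              name: train-model
              template: main
            arguments:
              parameters:
                - name: hyperparameters
                  value: "{{tasks.select-best-performing-hyperparameter-set.outputs.parameters.best-set}}"
```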

Your keen eye should have noticed that this time we are not describing a Workflow, but a WorkflowTemplate. Both syntaxes are almost the same, and having our Workflow as a WorkflowTemplate allows us to use it in various contexts. For example, we can refer to it in an Argo Events Sensor (we shall talk about Argo Events in a future blog post!) as well as in a CronWorkflow.

CronWorkflow

Just as Kubernetes offers CronJobs, Argo Workflows lets you run Workflows on a schedule with the CronWorkflow kind. Since we declared our workflow as a WorkflowTemplate, we can simply refer to it, as shown in the code block below.
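A minimal sketch of such a CronWorkflow, assuming the WorkflowTemplate is named tune-and-train as above; the schedule and parameter values are placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: weekly-tune-and-train
spec:
  schedule: "0 3 * * 1"             # every Monday at 03:00 (placeholder)
  workflowSpec:
    # Reuse the whole WorkflowTemplate instead of redeclaring its templates
    workflowTemplateRef:
      name: tune-and-train
    arguments:
      parameters:
        - name: hyperparameter-sets
          value: '[{"max_depth": 4, "learning_rate": 0.1}, {"max_depth": 6, "learning_rate": 0.05}]'
```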

Multi-use

Now we have a workflow capable of tuning and training a machine learning model! 🚀

Model training often proves to be very energy-consuming. At the same time, machine learning models can become obsolete fairly fast, so we want to train new models as new training data becomes available. How can we reconcile the two?


In most cases, it is fair to assume that a small addition of training data has little chance of significantly changing the performance of a given hyperparameter set. Considering this, we suggest storing the best-performing hyperparameter sets and reusing them for periodic retraining, rather than evaluating thousands or tens of thousands of models during every tuning session.
Whichever option best suits your case, from the Workflow's point of view only the number of evaluated hyperparameter sets changes: the same Workflow can be used either way! We simply pass different input arguments, as in the example below.
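For example, the difference between a full tuning run and a lightweight retraining run can come down to the value of a single parameter. The names and hyperparameter values below are illustrative only:

```yaml
# Full tuning run: evaluate the whole grid of hyperparameter sets
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: full-tuning-
spec:
  workflowTemplateRef:
    name: tune-and-train
  arguments:
    parameters:
      - name: hyperparameter-sets
        value: '[{"max_depth": 2}, {"max_depth": 4}, {"max_depth": 6}, {"max_depth": 8}]'
---
# Regular retraining run: only re-evaluate the stored best-performing sets
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: retraining-
spec:
  workflowTemplateRef:
    name: tune-and-train
  arguments:
    parameters:
      - name: hyperparameter-sets
        value: '[{"max_depth": 6}]'
```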

I hope you had a pleasant read and, more importantly, that you learned something useful!

Nicolas Guinoiseau

Data Scientist

nicolas@distrikt.fi

Related posts

Getting started with Training Models using Argo Workflows (3.1.2024)

Argo Events: Event-Driven Workflow Automation (24.1.2024)
