Skip to content
microsoft_azure_logo-1
Microsoft Azure

Machine Learning - deployment anytime, anywhere

Utilise your on-prem and cloud footprint with Microsoft Azure Machine Learning and Azure Arc 

Introduction

Machine learning (ML) has many applications for organisations of all sizes - and the possibilities are constantly expanding. It can help uncover insights into business data, enhance the user experience for customers, and reduce business risk. And while ML can pose unique challenges for organisations, technologies from Microsoft can help you overcome those challenges and harness the benefits of ML.

Azure Machine Learning can help your data team rapidly build and train models and operationalise them at scale with MLOps. It also lets your data team innovate on an open and flexible platform, all while building reliable ML solutions that protect data and models.

Azure Arc is a set of technologies that extends Azure management and services to any infrastructure. It enables you to build and deploy cloud-native apps and ML models anywhere, at scale. It also provides central visibility, operations, and compliance, allowing organisations like yours to run Azure services like Azure Machine Learning anywhere.

Azure Machine Learning integrates with Azure Arc to give you Azure Arc-enabled ML. With Azure Arc-enabled ML capabilities, organisations like yours can use Azure Machine Learning anywhere across one or more clouds, on-premises, or the edge.

PII and data sovereignty: Drug discovery

Multi-cloud: Medical imaging 

Regulatory compliance: Consumer banking

ML on the edge: Renewable energy

Microsoft case study: Wolters Kluwer Health

 

The assurances and the challenges of ML

ML can bring unprecedented benefits to organisations that embrace it. It can help identify patterns or structures within both your structured and unstructured data, helping you to identify new opportunities. It can deepen your data mining capabilities to find new relationships - and to improve its abilities over time. It can also help reduce business risk by identifying new patterns and tactics to head off fraud.

The challenges to realising all of the benefits of ML are two-fold: the operations that run ML and the data that powers it. ML operations can pose challenges to enterprise IT organisations because traditional IT management tools are not well-suited for managing ML. At the same time, the data that fuels ML brings unique challenges in terms of quantity and potential sensitivity that can require entirely new processes to meet them. Azure Arc-enabled ML integrates services and technologies from Microsoft that can supply your organisation with the specialised capabilities necessary to realise the benefits of ML.

 

The unique tools within ML

ML models have unique characteristics that require specific tools, technologies, and practices to operationalise at scale. Not only must organisations manage the creation of software - the ML models - but the data used to create those models must also be engineered as well. These pipelines must link up with workflows already in place for solution development.

Once trained and tested, ML models must also be put into production so that they have access to all their dependencies. Once deployed, the performance of ML models is expected to drift over time as the data used on those models changes. Organisations must continuously monitor their ML models while in production and retrain, repackage, and redeploy them as necessary. Moreover, oftentimes these operations will need to be carried out close to the data for reasons of compliance, latency, or data gravity. For all of these reasons, traditional software deployment practices such as DevOps are insufficient to operationalise ML at scale.

Operationalising a single ML model can be challenging; deploying multiple models in production at an enterprise scale can be daunting. ML models have multiple software dependencies, and they often need specialised infrastructure for optimal performance. And as models drift or other business metrics for model performance change, ML models must periodically be retrained on new data.

 

Set up Azure Arc ML 

 

Azure Arc-enabled ML combines the capabilities of Azure Arc and Azure Machine Learning. Azure Arc brings Azure management and services to any infrastructure, providing you with a single pane of glass to manage your entire Kubernetes infrastructure on-premises and across clouds. Azure Machine Learning is an enterprise-grade service that enables your data scientists and developers to innovate rapidly using familiar tools and your organisation to operationalise ML at scale with MLOps. Azure Machine Learning integrates with Azure Arc to give you Azure Arc-enabled ML. Azure Arc-enabled ML capabilities enable your organisation to run enterprise-grade ML workloads on Kubernetes anywhere with the same easy-to-use Azure Machine Learning experience found in the cloud with centralised visibility, operations, and compliance.

 

 

Capabilities

 

Azure Arc-enabled ML enables Azure Machine Learning to provide management for MLOps, rapid innovation for ML, and efficient ML operations on-premises, in the cloud, and at the edge.

 

Unified management

 

Azure Arc provides single-pane management through the Azure portal. All ML models, no matter where they’re built, can be stored and tracked in a central location in Azure Machine Learning for sharing, reproducibility, and audit compliance.

ML is predicated on reproducibility and reusability. This extends from ML models to the whole ML lifecycle. Regardless of its application, ML follows a consistent lifecycle, which is supported at each stage by Azure Machine Learning:

  1. Build - Data scientists create their initial models and then clean and prepare datasets to be usable by those models.
  2. Train - With the data prepared, data scientists use it to train models.
  3. Validate - Once the models are trained, ML engineers create containers for their models to run in with all their dependencies, and the data team evaluates how model performance compares to their business goals.
  4. Deploy - The data team deploys the models to scale out inference clusters in the cloud or on-premises to be used on new data.
  5. Monitor - As the models are used in production, the data team monitors them for degraded performance and retrains them as necessary to keep them relevant to business operations.

ML lifecycle

Figure 1. The entire ML lifecycle is managed at all levels by Azure Arc-enabled ML

 

With whole-of-lifecycle management and tracking, data science can become tenable for organisations of all sizes. With an efficient means of versioning and monitoring the entire ML lifecycle, organisations can head off irreproducible experiments and training runs. This can decrease the time for the whole ML lifecycle and the chance of errors. Moreover, with automated MLOps, Azure Arc-enabled ML enables organisations to unlock the power of infrastructure based on Kubernetes to run enterprise-grade MLOps.

 

Rapid innovation

 

Azure Machine Learning supports open-source tools and frameworks for ML model training and inferencing out of the box. Your data scientists can use familiar frameworks like PyTorch, TensorFlow, or scikit-learn, and they can choose the development tools that best meet their needs, including popular integrated development environments (IDEs), Microsoft Visual Studio Code, Jupyter Notebooks, and command-line interfaces (CLIs), or programming languages like Python and R.

With Azure Machine Learning, your organisation can access state-of-the-art responsible ML capabilities to understand, control, and help protect your data, models, and processes. This insight can help you explain model behaviour during training and inferencing and build for fairness by detecting and mitigating model bias. Transparency such as this can help your organisation meet regulatory compliance with its ML operations, as can the capability of Azure Machine Learning to preserve data privacy throughout the ML lifecycle with differential privacy techniques and using confidential computing to help secure ML assets. And Azure Machine Learning automatically maintains audit trails, tracks lineage, and uses model datasheets to enable accountability.

Azure Machine Learning also enables you to auto-scale compute resources for ML. Typical ML-model development starts with development or experimentation on a small amount of data, at which stage a single on-premises instance or a cloud-based virtual machine (VM) is sufficient. However, as you scale up your model training on larger datasets or perform distributed training, Azure Machine Learning creates clusters that auto-scale with each training run to provide sufficient compute resources.

 

 

IT and ML personas and experiences

 

The IT operations and data-science teams are integral to the broader ML team. Azure Arc-enabled ML enables the ML team to use of the advantages of Kubernetes - such as containerisation and separation of roles - without the ML infrastructure team or data-science professionals needing to learn Kubernetes. Separating roles makes collaboration more straightforward, as it can be based on different skill sets.

 

IT operators

 

IT operators are responsible for setting up the computing resources to be used by the data science team. To get a Kubernetes cluster up and running for Azure Arc-enabled ML for data-science professionals, IT operators need to complete the following tasks (see Figure 3):

  1. Ensure that the Kubernetes cluster is up and running.
  2. Connect to Azure with Azure Arc.
  3. Deploy the Azure Machine Learning extension to the Kubernetes cluster to support the necessary scenario:
    a. Enable the cluster for model training only.
    b. Enable the cluster for model deployment only (administrators can set up an endpoint with a public/private IP address and Transport Layer Security [TLS] termination proxy).
    c. Enable the cluster for both training and model deployment.
  4. Create compute targets in Azure Machine Learning workspaces to which data-science professionals submit workloads, and attach the Kubernetes cluster to the Azure Machine Learning workspace.

Azure Arc-enable ML

Figure 2. Tasks to deploy and start a Kubernetes cluster for Azure Arc-enabled ML

 

Administrators can control which engineers and data scientists have access - and they can grant access to entire clusters or selected parts of clusters. ML models and their dependencies can be packaged in discrete Kubernetes containers for deployment without data scientists needing to know Kubernetes.

 

Data-science professionals

 

With Azure Arc-enabled ML, data-science professionals can submit ML workloads using compute targets created by IT operators:

  • Data scientists can select the tools they like to use. Azure Arc-enabled ML provides different tools, such as a CLI, the Azure Machine Learning Studio user interface (UI), or the Python SDK (programmatic). The data scientists can then direct those tools to use the compute targets set up by IT operators.
  • Technical note: To make this process even easier for data scientists and practitioners, they can use Git to clone examples from the Azure Machine Learning repository and then change the compute targets in the examples to their desired compute targets to quickly get up and running: github.com/Azure/azureml-examples.
  • Data scientists determine which features are supported in Azure Arc-enabled ML to use. (For a list comparing the Azure Machine Learning built-in features and features supported in Azure Arc-enabled ML, see Appendix B.)

 

Typical Azure Arc-enabled ML use patterns

 

Table 1 summarises the typical use patterns for Azure Arc-enabled ML, including where the training data resides in each use pattern, the motivation driving each use pattern, and how the use pattern is realised using Azure Arc-enabled ML.

Table 1. Summary of the typical use patterns for Azure Arc-enabled ML

Use pattern Location of data Motivation Implementation in Azure Arc-enabled ML
Train models in the cloud, deploy models on premises
In the cloud
Make use of cloud computing, either because of elastic computing needs or special hardware in the cloud, such as a graphics processing unit (GPU)

Model must be deployed on-premises because of security, compliance, or latency requirements
Fully automated MLOps, including training and model deployment steps transitioning seamlessly from cloud to on-premises deployment

Repeatable, with all assets tracked properly

Model to be retrained when necessary, with on-premises model deployments updated automatically when retrained
Train models on-premises, deploy models in the cloud
On-premises
Data must remain on-premises due to data-residency requirements

Deploy model in the cloud for global services access or for compute elasticity for scale and throughout
Read the above
ML with network isolation (in cloud or on-premises
In the cloud or on-premises
Secure sensitive data or proprietary IP, such as ML models
In the cloud:Azure Kubernetes Service (AKS) private cluster with a private link to Azure Machine Learning workspaces; models deployed with private IP address

On-premises: Outbound proxy connection server and Azure ExpressRoute/virtual private cloud connection and private links to Azure Machine Learning workspaces; models deployed with private IP adresses

Note that with the network-isolation use pattern, multiple workspaces share the same Kubernetes cluster. Each business or project team has its own workspace but uses the same Kubernetes cluster to save compute costs and minimise IT management and maintenance overhead.

Real-world examples

Regulatory compliance: Consumer banking

Challenge

Banks frequently need to run financial risk-modelling predictions for things like consumer and small-business loans, and they rely heavily on ML models to make lending decisions. While banks’ risk models might be trained in data centres or the cloud, they often need to run ML on-premises to meet their regulatory requirements. Banks must also be able to transparently show regulators and other stakeholders how lending decisions were made through the ML models.

Solution

The hybrid-cloud capabilities of Azure Machine Learning help address these needs. For example, bank risk models can be trained in one or multiple clouds or at a bank data centre and then deployed on-premises to a bank branch. Moreover, the bank can use centralized MLOps to manage the lifecycle of its risk models regardless of where they are trained and deployed. Not only can this help a bank meet its regulatory requirements regarding customer data, but it can also enable the deployment of ML models even to bank branches with limited internet connectivity. (For a breakdown of some of the details of using Azure Arc-enabled ML this way, see the Train models in the cloud, deploy models on-premises and Train and deploy models on-premises use patterns in Table 1.)

PII and data sovereignty: Drug discovery

Challenge

Pharmaceutical companies must both protect sensitive patient data in drug trials and also conform to data sovereignty issues as to where trial data can be analysed. Beyond patient data, drug-discovery firms also want to secure proprietary data concerning candidate drugs and delivery procedures. There can be numerous circumstances in which pharmaceutical companies must isolate their data, models, and networks. At the same time, data scientists with these firms need to be able to use the same tools for data exploration and model building, regardless of where datasets reside.

Solution

Because Azure Arc-enabled ML enables models - including existing models trained outside of Azure Arc-enabled ML - to be deployed to almost any environment (be it on-premises, on the network edge, or in one or multiple clouds), pharmaceutical firms can train, tune, and deploy their models anywhere, even in isolated, locked-down networks. Doing so can help secure patients' personally identifiable information (PII) and corporate IP during drug research, discovery, and trials. What is more, administrators can still provide comprehensive MLOps to these isolated deployments to maintain efficiency along with security. (For a breakdown of some of the details of using Azure Arc-enabled ML this way, see the ML in network isolation (in the cloud or on-premises) and Train and deploy models on-premises use patterns in Table 1.)

Multi-cloud: Medical imaging

Challenge

Healthcare organisations commonly need multi-cloud support, given their inter-organisational collaboration structures. Patient data can be scattered between private practices, hospitals, and even different hospital systems - and consolidation within the healthcare industry can result in multiple legacy cloud infrastructures within a single organization. At the same time, healthcare organisations need to secure patient PII and share only the necessary details for patient care.

Solution

Using Azure Arc-enabled ML, healthcare organisations can efficiently perform multi-cloud ML. Models can be trained where patient data is, which can be vastly more efficient in cases using large data files, such as medical imaging. Models can then be deployed to other locations. Organisations can infer medical-images ML models using spare compute on-premises or at edge locations and burst to cloud capacity as the need arises. Moreover, ML administrators can still track all these models to automate the process of deploying retrained models in the future through centralized and automated MLOps. (For a breakdown of some of the details of using Azure Arc-enabled ML this way, see the Train models in the cloud, deploy models on-premises and Train and deploy models in one or more clouds use patterns in Table 1.)

ML on the edge: Renewable-energy infrastructure

Challenge

Wind farms are a great renewable energy source, but their power output fluctuates greatly depending on wind speed and other environmental factors. Wind farms generate petabytes worth of data continuously on site, and that can be unfeasible to move to the cloud. In addition, ML models can help forecast energy capacity, make power commitments, and facilitate predictive maintenance, but wind farms can have limited or intermittent internet connectivity.

Solution

With Azure Arc–enabled ML, models can be trained on premises without data movement. Wind energy firms don’t have to worry about the economics of shipping data. Models trained at a data centre or in one or multiple clouds can be deployed to the wind farm and used even without internet connectivity. And automation and MLOps with Azure Arc–enabled ML help to ensure that as data at the wind farm changes, the models retrain themselves and maintain accuracy. (For a breakdown of some of the details of using Azure Arc–enabled ML this way, see the Train models in the cloud, deploy models on-premises and Train and deploy models on-premises use patterns in Table 1.)

 

Case Study: Wolters Kluwer Health

 

The Health Division of Wolters Kluwer provides trusted clinical technology and evidence-based solutions that engage clinicians, patients, researchers, students, and the next generation of healthcare providers. The Wolters Kluwer Health data-science team always strives to adopt technological advancements in building ML models rapidly and efficiently to serve multiple business units within Wolters Kluwer Health. Wolters Kluwer Health enacted modern management for its ML pipeline by re-architecting its ML practice using the primary components of Azure Arc-enabled ML: Azure Machine Learning, Azure Kubernetes Service, and Azure Arc.

Challenge

Isolated training workspaces

The Wolters Kluwer Health data science team previously used standalone Azure VMs to handle training datasets and ML models. Because manually managing and changing VM tiers based on the varied compute and memory demands of different training runs was cumbersome and impractical, the Wolters Kluwer Health development pipeline used standardised high-compute, large-memory tier VMs, which was costly. In addition, because training jobs were often left to run overnight and there was no practical way to track job completion and automatically shut down idle VMs, Wolters Kluwer Health was forced to pay for unused VM capacity consistently.

Beyond VM expenses, Wolters Kluwer Health had no efficient means of versioning and monitoring its whole model training process and its input/output (I/O) artefacts, which resulted in irreproducible experiments and training runs. The lack of versioned datasets, environments, and models increased development time and multiplied the chance of errors in models. Additionally, whenever the data-science team received updated datasets to retrain models, the entire ML lifecycle had to be rerun manually. This significantly slowed model evaluation and tuning. The Wolters Kluwer Health ML practice was not scalable.

Solution

Azure Machine Learning on Kubernetes

Wolters Kluwer Health got a handle on its ML pipeline by shifting to Azure Machine Learning. The data science team can now run quick experiments and ad-hoc tasks from within an Azure Machine Learning workspace, eliminating the inefficiencies of performing these tasks on standalone Azure VMs. Wolters Kluwer Health can use workflows in Azure Machine Learning to handle model training and tracking and automate model retraining. And now, rather than having to set up entire environments again to retrain ML models, Wolters Kluwer Health can simply plug in the new data to efficiently handle retraining models.

Wolters Kluwer standardised its infrastructure on Azure Kubernetes Service (AKS), and the data-science team can use a dynamically adaptable cluster model for its model training workflows. And because AKS instance deployment is handled by Wolters Kluwer IT, the data-science team was able to reduce its self-management overhead. Finally, Wolters Kluwer Health increased ML compute efficiency by using Azure Arc to enable inter-subscription sharing of AKS clusters across Azure Machine Learning workspaces.

Wolters Kluwer Health can now track ML assets, with full versioning of all dependent I/O artefacts, along with monitoring of and reproducibility for all training runs. AKS helps Wolters Kluwer Health share the centrally managed compute resources for multiple purposes across the organization, including its ML workflows. And Azure Arc helps Wolters Kluwer Health use the same Kubernetes cluster across multiple ML workspaces provisioned in multiple subscriptions. All of these benefits help the Wolters Kluwer Health data team achieve faster, better results for the many teams across Wolters Kluwer Health that they support.

 

Enterprise-grade ML for all-sized organisations

 

ML poses challenges both because of the unique characteristics of ML models and the surrounding data that powers it. For combinations of legal, logistical, and technical reasons, data cannot always be moved where it would be convenient for training or running ML models. And traditional development and administrative tools are ill-suited to managed ML models, particularly as they proliferate and move between clouds and on-premises locations. Even IT’s go-to DevOps solution, Kubernetes, has to be used in new and different ways for ML.

By combining the flexibility and efficiency of Azure Arc with the robust MLOps and operations capabilities of Azure Machine Learning, Azure Arc-enabled ML can help organisations overcome the unique challenges and complexities of ML. Azure Arc-enabled ML simplifies using Kubernetes in ML development and production environments. It also automates many aspects of the MLOps pipeline so that administrators do not have to spend their time wrestling with sprawling ML model deployments and all their dependencies or with rickety conglomerations of open-source tools. Azure Arc-enabled ML also brings the hybrid capabilities of Azure Arc to ML, meaning that models can be trained in one or more clouds or on-premises and then deployed either on-premises or to the cloud. This flexibility means that Azure Arc-enabled ML can simplify ML for almost every organisation in almost every set of circumstances.

To see Azure Arc-enabled ML in action and learn more about what it can do for your organisation, contact Microsoft to set up a proof of concept.

 

 

Appendix A: Azure Arc-enabled ML architecture

 

Azure Arc-enabled ML architecture

 

 

Appendix B: Built-inn Azure Machine Learning capabilities and Azure-arc enabled ML

Feature name Features built into Azure Machine Learning Features supported in Azure Arc-enabled ML
Train with AzureML Python SDK v1.x
Yes
Yes
Train with AzureML CLI v2
Yes
Yes
Distributed training - MPI, tf.distributed, torch.distributed
Yes
Yes
Train with AzureML pipeline SDK v1.x
Yes
Yes
Train with designer pipeline
Yes
Yes
Deploy with SDK/CLI v1.x
Yes
No
Deploy with CLI v2
Yes
Yes
AKS and Azure Machine Learning workspace cross-subscription support
No
Yes (via Azure Arc)
Multiple workspaces share the same AKS cluster
No
Yes
Node pool targeting for workload
No
Yes
Training with NFA data access
No
Yes
Training with HDFS data access
No
In development
MLflow model integration
Yes
In development
MLOps (GitHub action)
Yes
Yes
Curated image support
Yes
Yes
Safe production rollout with green/blue deployment
Yes
Yes
ML with network isolation
Yes
Yes
Use managed identity to access Azure resources
Yes
Yes
Model-deployment bring-your-own-compute(BYOC) support
Yes
Yes

 

Appendix C: Azure Arc and Machine Learning

 

Azure Arc is a set of technologies that extends Azure management and services to any infrastructure. Its capabilities bring three main benefits to Azure Arc-enabled ML. Discover how machine learning can optimise your business through DSP machine learning services.

You might also be interested in...

Why Microsoft Azure is the future for building critical infrastructures

Managing and improving critical infrastructure, including utilities, transportation services, urban innovation and digital services is one of the most important functions of many organisations.

Microsoft Azure for hybrid and multi-cloud environments

This page seeks to demonstrate best practices, including warning signs and the essential actions that any business must take to enable hybrid environments.

DSP-Logo-2019-White-600px

Want to see what we can do for your business? 

Get in touch with our specialists today and start your modernisation journey with industry experts.