Machine Learning - deployment anytime, anywhere

Introduction

Machine learning (ML) has many applications for organisations of all sizes - and the possibilities are constantly expanding. It can help uncover insights into business data, enhance the user experience for customers, and reduce business risk. And while ML can pose unique challenges for organisations, technologies from Microsoft can help you overcome those challenges and harness the benefits of ML.

Azure Machine Learning can help your data team rapidly build and train models and operationalise them at scale with MLOps. It also lets your data team innovate on an open and flexible platform, all while building reliable ML solutions that protect data and models.

Azure Arc is a set of technologies that extends Azure management and services to any infrastructure. It enables you to build and deploy cloud-native apps and ML models anywhere, at scale. It also provides central visibility, operations, and compliance, allowing organisations like yours to run Azure services like Azure Machine Learning anywhere.

Azure Machine Learning integrates with Azure Arc to give you Azure Arc-enabled ML. With Azure Arc-enabled ML capabilities, organisations like yours can use Azure Machine Learning anywhere across one or more clouds, on-premises, or the edge.

The assurances and the challenges of ML

ML can bring unprecedented benefits to organisations that embrace it. It can help identify patterns or structures within both your structured and unstructured data, helping you to identify new opportunities. It can deepen your data mining capabilities to find new relationships - and to improve its abilities over time. It can also help reduce business risk by identifying new patterns and tactics to head off fraud.

The challenges to realising all of the benefits of ML are two-fold: the operations that run ML and the data that powers it. ML operations can pose challenges to enterprise IT organisations because traditional IT management tools are not well-suited for managing ML. At the same time, the data that fuels ML brings unique challenges in terms of quantity and potential sensitivity that can require entirely new processes to meet them. Azure Arc-enabled ML integrates services and technologies from Microsoft that can supply your organisation with the specialised capabilities necessary to realise the benefits of ML.

The unique tools within ML

ML models have unique characteristics that require specific tools, technologies, and practices to operationalise at scale. Not only must organisations manage the creation of software - the ML models - but the data used to create those models must also be engineered as well. These pipelines must link up with workflows already in place for solution development.

Once trained and tested, ML models must also be put into production so that they have access to all their dependencies. Once deployed, the performance of ML models is expected to drift over time as the data used on those models changes. Organisations must continuously monitor their ML models while in production and retrain, repackage, and redeploy them as necessary. Moreover, oftentimes these operations will need to be carried out close to the data for reasons of compliance, latency, or data gravity. For all of these reasons, traditional software deployment practices such as DevOps are insufficient to operationalise ML at scale.

Operationalising a single ML model can be challenging; deploying multiple models in production at an enterprise scale can be daunting. ML models have multiple software dependencies, and they often need specialised infrastructure for optimal performance. And as models drift or other business metrics for model performance change, ML models must periodically be retrained on new data.

Set up Azure Arc ML

Azure Arc-enabled ML combines the capabilities of Azure Arc and Azure Machine Learning. Azure Arc brings Azure management and services to any infrastructure, providing you with a single pane of glass to manage your entire Kubernetes infrastructure on-premises and across clouds. Azure Machine Learning is an enterprise-grade service that enables your data scientists and developers to innovate rapidly using familiar tools and your organisation to operationalise ML at scale with MLOps. Azure Machine Learning integrates with Azure Arc to give you Azure Arc-enabled ML. Azure Arc-enabled ML capabilities enable your organisation to run enterprise-grade ML workloads on Kubernetes anywhere with the same easy-to-use Azure Machine Learning experience found in the cloud with centralised visibility, operations, and compliance.

Capabilities

Azure Arc-enabled ML enables Azure Machine Learning to provide management for MLOps, rapid innovation for ML, and efficient ML operations on-premises, in the cloud, and at the edge.

Unified management

Azure Arc provides single-pane management through the Azure portal. All ML models, no matter where they’re built, can be stored and tracked in a central location in Azure Machine Learning for sharing, reproducibility, and audit compliance.

ML is predicated on reproducibility and reusability. This extends from ML models to the whole ML lifecycle. Regardless of its application, ML follows a consistent lifecycle, which is supported at each stage by Azure Machine Learning:

Build - Data scientists create their initial models and then clean and prepare datasets to be usable by those models.
Train - With the data prepared, data scientists use it to train models.
Validate - Once the models are trained, ML engineers create containers for their models to run in with all their dependencies, and the data team evaluates how model performance compares to their business goals.
Deploy - The data team deploys the models to scale out inference clusters in the cloud or on-premises to be used on new data.
Monitor - As the models are used in production, the data team monitors them for degraded performance and retrains them as necessary to keep them relevant to business operations.

ML lifecycle

Figure 1. The entire ML lifecycle is managed at all levels by Azure Arc-enabled ML

With whole-of-lifecycle management and tracking, data science can become tenable for organisations of all sizes. With an efficient means of versioning and monitoring the entire ML lifecycle, organisations can head off irreproducible experiments and training runs. This can decrease the time for the whole ML lifecycle and the chance of errors. Moreover, with automated MLOps, Azure Arc-enabled ML enables organisations to unlock the power of infrastructure based on Kubernetes to run enterprise-grade MLOps.

Rapid innovation

Azure Machine Learning supports open-source tools and frameworks for ML model training and inferencing out of the box. Your data scientists can use familiar frameworks like PyTorch, TensorFlow, or scikit-learn, and they can choose the development tools that best meet their needs, including popular integrated development environments (IDEs), Microsoft Visual Studio Code, Jupyter Notebooks, and command-line interfaces (CLIs), or programming languages like Python and R.

With Azure Machine Learning, your organisation can access state-of-the-art responsible ML capabilities to understand, control, and help protect your data, models, and processes. This insight can help you explain model behaviour during training and inferencing and build for fairness by detecting and mitigating model bias. Transparency such as this can help your organisation meet regulatory compliance with its ML operations, as can the capability of Azure Machine Learning to preserve data privacy throughout the ML lifecycle with differential privacy techniques and using confidential computing to help secure ML assets. And Azure Machine Learning automatically maintains audit trails, tracks lineage, and uses model datasheets to enable accountability.

Azure Machine Learning also enables you to auto-scale compute resources for ML. Typical ML-model development starts with development or experimentation on a small amount of data, at which stage a single on-premises instance or a cloud-based virtual machine (VM) is sufficient. However, as you scale up your model training on larger datasets or perform distributed training, Azure Machine Learning creates clusters that auto-scale with each training run to provide sufficient compute resources.

IT and ML personas and experiences

The IT operations and data-science teams are integral to the broader ML team. Azure Arc-enabled ML enables the ML team to use of the advantages of Kubernetes - such as containerisation and separation of roles - without the ML infrastructure team or data-science professionals needing to learn Kubernetes. Separating roles makes collaboration more straightforward, as it can be based on different skill sets.

IT operators

IT operators are responsible for setting up the computing resources to be used by the data science team. To get a Kubernetes cluster up and running for Azure Arc-enabled ML for data-science professionals, IT operators need to complete the following tasks (see Figure 3):

Ensure that the Kubernetes cluster is up and running.
Connect to Azure with Azure Arc.
Deploy the Azure Machine Learning extension to the Kubernetes cluster to support the necessary scenario:
a. Enable the cluster for model training only.
b. Enable the cluster for model deployment only (administrators can set up an endpoint with a public/private IP address and Transport Layer Security [TLS] termination proxy).
c. Enable the cluster for both training and model deployment.
Create compute targets in Azure Machine Learning workspaces to which data-science professionals submit workloads, and attach the Kubernetes cluster to the Azure Machine Learning workspace.

Figure 2. Tasks to deploy and start a Kubernetes cluster for Azure Arc-enabled ML

Administrators can control which engineers and data scientists have access - and they can grant access to entire clusters or selected parts of clusters. ML models and their dependencies can be packaged in discrete Kubernetes containers for deployment without data scientists needing to know Kubernetes.

Data-science professionals

With Azure Arc-enabled ML, data-science professionals can submit ML workloads using compute targets created by IT operators:

Data scientists can select the tools they like to use. Azure Arc-enabled ML provides different tools, such as a CLI, the Azure Machine Learning Studio user interface (UI), or the Python SDK (programmatic). The data scientists can then direct those tools to use the compute targets set up by IT operators.
Technical note: To make this process even easier for data scientists and practitioners, they can use Git to clone examples from the Azure Machine Learning repository and then change the compute targets in the examples to their desired compute targets to quickly get up and running: github.com/Azure/azureml-examples.
Data scientists determine which features are supported in Azure Arc-enabled ML to use. (For a list comparing the Azure Machine Learning built-in features and features supported in Azure Arc-enabled ML, see Appendix B.)

Use pattern	Location of data	Motivation	Implementation in Azure Arc-enabled ML
Train models in the cloud, deploy models on premises	In the cloud	Make use of cloud computing, either because of elastic computing needs or special hardware in the cloud, such as a graphics processing unit (GPU) Model must be deployed on-premises because of security, compliance, or latency requirements	Fully automated MLOps, including training and model deployment steps transitioning seamlessly from cloud to on-premises deployment Repeatable, with all assets tracked properly Model to be retrained when necessary, with on-premises model deployments updated automatically when retrained
Train models on-premises, deploy models in the cloud	On-premises	Data must remain on-premises due to data-residency requirements Deploy model in the cloud for global services access or for compute elasticity for scale and throughout	Read the above
ML with network isolation (in cloud or on-premises	In the cloud or on-premises	Secure sensitive data or proprietary IP, such as ML models	In the cloud:Azure Kubernetes Service (AKS) private cluster with a private link to Azure Machine Learning workspaces; models deployed with private IP address On-premises: Outbound proxy connection server and Azure ExpressRoute/virtual private cloud connection and private links to Azure Machine Learning workspaces; models deployed with private IP adresses

Challenge

Banks frequently need to run financial risk-modelling predictions for things like consumer and small-business loans, and they rely heavily on ML models to make lending decisions. While banks’ risk models might be trained in data centres or the cloud, they often need to run ML on-premises to meet their regulatory requirements. Banks must also be able to transparently show regulators and other stakeholders how lending decisions were made through the ML models.

Solution

The hybrid-cloud capabilities of Azure Machine Learning help address these needs. For example, bank risk models can be trained in one or multiple clouds or at a bank data centre and then deployed on-premises to a bank branch. Moreover, the bank can use centralized MLOps to manage the lifecycle of its risk models regardless of where they are trained and deployed. Not only can this help a bank meet its regulatory requirements regarding customer data, but it can also enable the deployment of ML models even to bank branches with limited internet connectivity. (For a breakdown of some of the details of using Azure Arc-enabled ML this way, see the Train models in the cloud, deploy models on-premises and Train and deploy models on-premises use patterns in Table 1.)

Challenge

Pharmaceutical companies must both protect sensitive patient data in drug trials and also conform to data sovereignty issues as to where trial data can be analysed. Beyond patient data, drug-discovery firms also want to secure proprietary data concerning candidate drugs and delivery procedures. There can be numerous circumstances in which pharmaceutical companies must isolate their data, models, and networks. At the same time, data scientists with these firms need to be able to use the same tools for data exploration and model building, regardless of where datasets reside.

Solution

Because Azure Arc-enabled ML enables models - including existing models trained outside of Azure Arc-enabled ML - to be deployed to almost any environment (be it on-premises, on the network edge, or in one or multiple clouds), pharmaceutical firms can train, tune, and deploy their models anywhere, even in isolated, locked-down networks. Doing so can help secure patients' personally identifiable information (PII) and corporate IP during drug research, discovery, and trials. What is more, administrators can still provide comprehensive MLOps to these isolated deployments to maintain efficiency along with security. (For a breakdown of some of the details of using Azure Arc-enabled ML this way, see the ML in network isolation (in the cloud or on-premises) and Train and deploy models on-premises use patterns in Table 1.)

Challenge

Healthcare organisations commonly need multi-cloud support, given their inter-organisational collaboration structures. Patient data can be scattered between private practices, hospitals, and even different hospital systems - and consolidation within the healthcare industry can result in multiple legacy cloud infrastructures within a single organization. At the same time, healthcare organisations need to secure patient PII and share only the necessary details for patient care.

Solution

Using Azure Arc-enabled ML, healthcare organisations can efficiently perform multi-cloud ML. Models can be trained where patient data is, which can be vastly more efficient in cases using large data files, such as medical imaging. Models can then be deployed to other locations. Organisations can infer medical-images ML models using spare compute on-premises or at edge locations and burst to cloud capacity as the need arises. Moreover, ML administrators can still track all these models to automate the process of deploying retrained models in the future through centralized and automated MLOps. (For a breakdown of some of the details of using Azure Arc-enabled ML this way, see the Train models in the cloud, deploy models on-premises and Train and deploy models in one or more clouds use patterns in Table 1.)

Challenge

Wind farms are a great renewable energy source, but their power output fluctuates greatly depending on wind speed and other environmental factors. Wind farms generate petabytes worth of data continuously on site, and that can be unfeasible to move to the cloud. In addition, ML models can help forecast energy capacity, make power commitments, and facilitate predictive maintenance, but wind farms can have limited or intermittent internet connectivity.

Solution

With Azure Arc–enabled ML, models can be trained on premises without data movement. Wind energy firms don’t have to worry about the economics of shipping data. Models trained at a data centre or in one or multiple clouds can be deployed to the wind farm and used even without internet connectivity. And automation and MLOps with Azure Arc–enabled ML help to ensure that as data at the wind farm changes, the models retrain themselves and maintain accuracy. (For a breakdown of some of the details of using Azure Arc–enabled ML this way, see the Train models in the cloud, deploy models on-premises and Train and deploy models on-premises use patterns in Table 1.)

Case Study: Wolters Kluwer Health

The Health Division of Wolters Kluwer provides trusted clinical technology and evidence-based solutions that engage clinicians, patients, researchers, students, and the next generation of healthcare providers. The Wolters Kluwer Health data-science team always strives to adopt technological advancements in building ML models rapidly and efficiently to serve multiple business units within Wolters Kluwer Health. Wolters Kluwer Health enacted modern management for its ML pipeline by re-architecting its ML practice using the primary components of Azure Arc-enabled ML: Azure Machine Learning, Azure Kubernetes Service, and Azure Arc.

Challenge

Isolated training workspaces

The Wolters Kluwer Health data science team previously used standalone Azure VMs to handle training datasets and ML models. Because manually managing and changing VM tiers based on the varied compute and memory demands of different training runs was cumbersome and impractical, the Wolters Kluwer Health development pipeline used standardised high-compute, large-memory tier VMs, which was costly. In addition, because training jobs were often left to run overnight and there was no practical way to track job completion and automatically shut down idle VMs, Wolters Kluwer Health was forced to pay for unused VM capacity consistently.

Beyond VM expenses, Wolters Kluwer Health had no efficient means of versioning and monitoring its whole model training process and its input/output (I/O) artefacts, which resulted in irreproducible experiments and training runs. The lack of versioned datasets, environments, and models increased development time and multiplied the chance of errors in models. Additionally, whenever the data-science team received updated datasets to retrain models, the entire ML lifecycle had to be rerun manually. This significantly slowed model evaluation and tuning. The Wolters Kluwer Health ML practice was not scalable.

Solution

Azure Machine Learning on Kubernetes

Wolters Kluwer Health got a handle on its ML pipeline by shifting to Azure Machine Learning. The data science team can now run quick experiments and ad-hoc tasks from within an Azure Machine Learning workspace, eliminating the inefficiencies of performing these tasks on standalone Azure VMs. Wolters Kluwer Health can use workflows in Azure Machine Learning to handle model training and tracking and automate model retraining. And now, rather than having to set up entire environments again to retrain ML models, Wolters Kluwer Health can simply plug in the new data to efficiently handle retraining models.

Wolters Kluwer standardised its infrastructure on Azure Kubernetes Service (AKS), and the data-science team can use a dynamically adaptable cluster model for its model training workflows. And because AKS instance deployment is handled by Wolters Kluwer IT, the data-science team was able to reduce its self-management overhead. Finally, Wolters Kluwer Health increased ML compute efficiency by using Azure Arc to enable inter-subscription sharing of AKS clusters across Azure Machine Learning workspaces.

Wolters Kluwer Health can now track ML assets, with full versioning of all dependent I/O artefacts, along with monitoring of and reproducibility for all training runs. AKS helps Wolters Kluwer Health share the centrally managed compute resources for multiple purposes across the organization, including its ML workflows. And Azure Arc helps Wolters Kluwer Health use the same Kubernetes cluster across multiple ML workspaces provisioned in multiple subscriptions. All of these benefits help the Wolters Kluwer Health data team achieve faster, better results for the many teams across Wolters Kluwer Health that they support.

Enterprise-grade ML for all-sized organisations

ML poses challenges both because of the unique characteristics of ML models and the surrounding data that powers it. For combinations of legal, logistical, and technical reasons, data cannot always be moved where it would be convenient for training or running ML models. And traditional development and administrative tools are ill-suited to managed ML models, particularly as they proliferate and move between clouds and on-premises locations. Even IT’s go-to DevOps solution, Kubernetes, has to be used in new and different ways for ML.

By combining the flexibility and efficiency of Azure Arc with the robust MLOps and operations capabilities of Azure Machine Learning, Azure Arc-enabled ML can help organisations overcome the unique challenges and complexities of ML. Azure Arc-enabled ML simplifies using Kubernetes in ML development and production environments. It also automates many aspects of the MLOps pipeline so that administrators do not have to spend their time wrestling with sprawling ML model deployments and all their dependencies or with rickety conglomerations of open-source tools. Azure Arc-enabled ML also brings the hybrid capabilities of Azure Arc to ML, meaning that models can be trained in one or more clouds or on-premises and then deployed either on-premises or to the cloud. This flexibility means that Azure Arc-enabled ML can simplify ML for almost every organisation in almost every set of circumstances.

To see Azure Arc-enabled ML in action and learn more about what it can do for your organisation, contact Microsoft to set up a proof of concept.

Feature name	Features built into Azure Machine Learning	Features supported in Azure Arc-enabled ML
Train with AzureML Python SDK v1.x	Yes	Yes
Train with AzureML CLI v2	Yes	Yes
Distributed training - MPI, tf.distributed, torch.distributed	Yes	Yes
Train with AzureML pipeline SDK v1.x	Yes	Yes
Train with designer pipeline	Yes	Yes
Deploy with SDK/CLI v1.x	Yes	No
Deploy with CLI v2	Yes	Yes
AKS and Azure Machine Learning workspace cross-subscription support	No	Yes (via Azure Arc)
Multiple workspaces share the same AKS cluster	No	Yes
Node pool targeting for workload	No	Yes
Training with NFA data access	No	Yes
Training with HDFS data access	No	In development
MLflow model integration	Yes	In development
MLOps (GitHub action)	Yes	Yes
Curated image support	Yes	Yes
Safe production rollout with green/blue deployment	Yes	Yes
ML with network isolation	Yes	Yes
Use managed identity to access Azure resources	Yes	Yes
Model-deployment bring-your-own-compute(BYOC) support	Yes	Yes

Getting to know us

Data Services

Development Services

Application Services

Cloud Services

Database Managed Services

Consulting Services

Technology Solutions

Recent case study:

Oracle EBS Cloud Deployment

DSP wins 5 awards at Oracle CloudWorld 2024.

Resources

Machine Learning - deployment anytime, anywhere

Utilise your on-prem and cloud footprint with Microsoft Azure Machine Learning and Azure Arc

Contents

Introduction

PII and data sovereignty: Drug discovery

Multi-cloud: Medical imaging

Regulatory compliance: Consumer banking

ML on the edge: Renewable energy

Microsoft case study: Wolters Kluwer Health

The assurances and the challenges of ML

The unique tools within ML

Set up Azure Arc ML

Capabilities

Unified management

Rapid innovation

IT and ML personas and experiences

IT operators

Data-science professionals

Typical Azure Arc-enabled ML use patterns

Real-world examples

Regulatory compliance: Consumer banking

Challenge

Solution

PII and data sovereignty: Drug discovery

Challenge

Solution

Multi-cloud: Medical imaging

Challenge

Solution

ML on the edge: Renewable-energy infrastructure

Challenge

Solution

Case Study: Wolters Kluwer Health

Challenge

Isolated training workspaces

Solution

Azure Machine Learning on Kubernetes

Enterprise-grade ML for all-sized organisations

Appendix A: Azure Arc-enabled ML architecture

Appendix B: Built-inn Azure Machine Learning capabilities and Azure-arc enabled ML

Appendix C: Azure Arc and Machine Learning

You might also be interested in...

Why Microsoft Azure is the future for building critical infrastructures

Microsoft Azure for hybrid and multi-cloud environments

Want to see what we can do for your business?