Use a private Python package in Azure Machine Learning Service

Azure Machine Learning Service (AMLS) provides several features for operationalizing a machine learning pipeline. One important feature is the AMLS environment, which is configured with the workspace. In this post, I talk about the AMLS environment and how we can ship our private Python packages (.whl) into our AMLS workspace.

What is an AMLS environment?

An AMLS environment is a collection of Python packages backed by pre-cached Docker images. An environment is associated with an AMLS workspace and can be used in any experiment in that workspace. A few types of environments are available in AMLS.

  1. We can use an existing curated environment (for example, AzureML-Minimal or AzureML-Tutorial), where a few selected Python packages are already pre-installed.
  2. We can create a custom environment from a requirements.txt file. The package names and versions are managed in the requirements.txt file.
  3. We can use a prebuilt Docker image when creating the environment.

More information about environments is available in the documentation listed under Reference Links.
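The three options above can be sketched with the azureml-core SDK roughly as follows. This is a minimal sketch, not a definitive implementation: the function names, the default environment names ("custom-env", "docker-env") and the `image` parameter are assumptions made for illustration, and running any of these functions requires azureml-core to be installed (and, for the first one, a real workspace object).

```python
# Sketch only: calling these functions requires the azureml-core package.
# The imports sit inside the functions so that this file can be loaded
# without the SDK installed.

def curated_environment(ws, name="AzureML-Minimal"):
    """1. Fetch an existing curated environment through the workspace."""
    from azureml.core import Environment
    return Environment.get(workspace=ws, name=name)


def requirements_environment(name="custom-env", file_path="requirements.txt"):
    """2. Build a custom environment from a pip requirements file."""
    from azureml.core import Environment
    return Environment.from_pip_requirements(name=name, file_path=file_path)


def docker_environment(image, name="docker-env"):
    """3. Build an environment on top of a prebuilt Docker image."""
    from azureml.core import Environment
    env = Environment(name=name)
    env.docker.base_image = image
    # Tell AMLS that the image already contains the Python packages it needs.
    env.python.user_managed_dependencies = True
    return env
```

The `image` argument is whatever image reference your registry provides; no specific image is assumed here.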

What is a private python package, and how is it useful?

In general, we install Python packages from the public PyPI repository using the pip command, for example [pip install pandas]. We generally refer to these as global (public) Python packages.

Sometimes we want to reuse the same utilities or methods across different Python files in a project. In those scenarios, it is really helpful to import those methods into the working file instead of copying them. To achieve that, we create a private package that contains all the utility/helper files.

For example, let’s say we have training and scoring scripts. Both scripts need the same pre-processing of the data, such as encoding, removing duplicate rows, and imputing missing values. In that case, we can package those pre-processing functions into a private package and refer to them from both the scoring and the training scripts.
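As a rough illustration of that scenario, the helpers below are the kind of functions that could live in a private package and be imported by both scripts. The function names, the "age" column and the list-of-dicts data layout are assumptions made for this sketch; a real project would more likely operate on pandas DataFrames.

```python
# Hypothetical preprocessing helpers that training and scoring scripts could share.
# Rows are plain dicts here to keep the sketch dependency-free.

def remove_duplicate_rows(rows):
    """Return rows with exact duplicates dropped, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_missing(rows, column, fill_value):
    """Return copies of rows where a missing or None `column` gets fill_value."""
    return [
        {**row, column: fill_value} if row.get(column) is None else dict(row)
        for row in rows
    ]


def preprocess(rows):
    """Single entry point that both scripts would call."""
    return impute_missing(remove_duplicate_rows(rows), "age", 0)


raw = [{"id": 1, "age": 30}, {"id": 1, "age": 30}, {"id": 2, "age": None}]
clean = preprocess(raw)
print(clean)  # [{'id': 1, 'age': 30}, {'id': 2, 'age': 0}]
```

Keeping one `preprocess` entry point means the training and scoring scripts cannot drift apart in how they clean the data.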

More information is available in this document.

How does a private package work in AMLS?

Let’s walk through a small example that demonstrates the steps required to use a private package in an AMLS workspace.

Prerequisites:

  1. An AMLS workspace has been created. [name used in the demo: amlsworkspace]
  2. An AMLS compute cluster has been created. [cluster name used in the demo: amlscompute]
  3. VS Code is installed on the system.

Steps:

  1. Create a private package.
  2. Create an AMLS Environment, and associate the private package with the environment.
  3. Submit the experiment to the AMLS compute.

The overall folder structure is shown below. Create this folder structure in your repository.

│   TriggerExperiment.py
│   config.json
│
├───locallib
│   │   README.md
│   │   setup.py
│   │
│   └───sharedlocallib
│           sharedfile.py
│           __init__.py
│
└───locallibtest
        test.py

Step 1: Create the Private Package:

The files and folders under the locallib folder are used to create the private package. Once the package is created and installed, the function in sharedfile.py can be imported as

from sharedlocallib.sharedfile import print_statement

setup.py: It packages every subdirectory that contains an __init__.py file.

import sys

import setuptools


def main(version):
    # The README is used as the long description of the package.
    with open("README.md", "r", encoding="utf-8") as fh:
        long_description = fh.read()

    setuptools.setup(
        name="amlsdemolocalwheel",  # This name appears in the wheel file name.
        version=version,
        author="Samarendra Panda",
        author_email="author@example.com",
        description="A small example package",
        long_description=long_description,
        long_description_content_type="text/markdown",
        url="https://github.com/pypa/sampleproject",
        project_urls={
            "Bug Tracker": "https://github.com/pypa/sampleproject/issues",
        },
        classifiers=[
            "Programming Language :: Python :: 3",
            "License :: OSI Approved :: MIT License",
            "Operating System :: OS Independent",
        ],
        # package_dir={"": "src"},
        # Picks up every subdirectory that contains an __init__.py file.
        packages=setuptools.find_packages(),
        python_requires=">=3.6",
    )


if __name__ == "__main__":
    # Fallback version (an assumed default) used when --version is not passed.
    version = "0.0.1"
    if "--version" in sys.argv:
        # Read the value that follows --version, then remove both tokens
        # so that setuptools only sees its own commands (sdist, bdist_wheel).
        idx = sys.argv.index("--version")
        version = sys.argv[idx + 1]
        del sys.argv[idx:idx + 2]

    main(version)
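The command-line handling in setup.py exists because setuptools reads the same sys.argv, so the custom --version flag and its value have to be taken out before setup() runs. The same logic can be sketched as a small standalone function; the name `pop_version` and its default value are assumptions of this sketch.

```python
def pop_version(argv, default="0.0.1"):
    """Return (version, remaining_args) with the --version pair removed."""
    args = list(argv)  # work on a copy so the caller's list is untouched
    if "--version" not in args:
        return default, args
    idx = args.index("--version")
    version = args[idx + 1]
    del args[idx:idx + 2]
    return version, args


version, rest = pop_version(
    ["setup.py", "--version", "0.0.1", "sdist", "bdist_wheel"]
)
print(version)  # 0.0.1
print(rest)     # ['setup.py', 'sdist', 'bdist_wheel']
```

Returning the remaining arguments instead of mutating sys.argv in place makes the behavior easy to check without running a build.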

sharedfile.py: This file only contains a function that prints a message.

def print_statement(message):
    print(message)

README.md and __init__.py: These files are only placeholders and do not need any content. README.md must still exist because setup.py reads it, and __init__.py marks sharedlocallib as a package.
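To see why the empty __init__.py matters, the sketch below mimics the discovery rule with the standard library only. It is a simplified stand-in for setuptools.find_packages(), not its actual implementation, and the "notes" folder is a hypothetical directory added just to show what gets skipped.

```python
import os
import tempfile


def find_package_dirs(root):
    """List dotted names of directories under root that contain __init__.py.

    A directory without __init__.py is skipped together with everything below it.
    """
    found = []

    def walk(path, prefix):
        for entry in sorted(os.listdir(path)):
            full = os.path.join(path, entry)
            has_init = os.path.isfile(os.path.join(full, "__init__.py"))
            if os.path.isdir(full) and has_init:
                name = prefix + entry
                found.append(name)
                walk(full, name + ".")

    walk(root, "")
    return found


with tempfile.TemporaryDirectory() as root:
    # Mirror the locallib layout: one package folder and one plain folder.
    os.makedirs(os.path.join(root, "sharedlocallib"))
    open(os.path.join(root, "sharedlocallib", "__init__.py"), "w").close()
    os.makedirs(os.path.join(root, "notes"))  # hypothetical, no __init__.py
    packages = find_package_dirs(root)

print(packages)  # ['sharedlocallib']
```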

Build the private package:

Now that the required files are ready, we can build the package. Open the terminal and run the commands below.

PS C:\Users\......\repos\AMLS> cd .\locallib\
PS C:\Users\.......\repos\AMLS\locallib> python setup.py --version "0.0.1" sdist bdist_wheel

Once the package is built, a few new folders are created under locallib, including dist, which contains the .whl file.
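The wheel produced by this build is named amlsdemolocalwheel-0.0.1-py3-none-any.whl (the file installed in the next step). The name follows the standard wheel naming convention: distribution name, version, Python tag, ABI tag and platform tag, separated by dashes. The helper below is a minimal sketch that splits such a name; `parse_wheel_name` is a hypothetical function written for this post, not part of pip or setuptools.

```python
def parse_wheel_name(filename):
    """Split a wheel file name into its standard dash-separated parts."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # Five parts normally; six when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel file name: {filename}")
    return {
        "distribution": parts[0],
        "version": parts[1],
        "python_tag": parts[-3],
        "abi_tag": parts[-2],
        "platform_tag": parts[-1],
    }


info = parse_wheel_name("amlsdemolocalwheel-0.0.1-py3-none-any.whl")
print(info["distribution"], info["version"])  # amlsdemolocalwheel 0.0.1
```

The py3-none-any suffix indicates a pure-Python wheel: any Python 3 interpreter, no ABI requirement, any platform.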

Install the private package in the local environment

Now that we have the .whl file, we can use the pip command to install it in the local environment.

PS C:\Users\.......\repos\AMLS\locallib> pip install .\dist\amlsdemolocalwheel-0.0.1-py3-none-any.whl

Once it is installed, we can use the print_statement function from the package to print any message.

>>> from sharedlocallib.sharedfile import print_statement
>>> print_statement("Hi, welcome there!")
Hi, welcome there!
>>>
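Note that the name used with pip (amlsdemolocalwheel) differs from the name used in the import statement (sharedlocallib). A quick way to check both from a script, sketched here with the standard library (Python 3.8 or later); the helper names are assumptions of this sketch.

```python
import importlib.metadata
import importlib.util


def is_importable(module_name):
    """Return True when a top-level module or package can be found for import."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(distribution_name):
    """Return the installed version of a distribution, or None if not installed."""
    try:
        return importlib.metadata.version(distribution_name)
    except importlib.metadata.PackageNotFoundError:
        return None


# The results depend on whether the wheel has been installed
# into the current environment.
print("import name found:", is_importable("sharedlocallib"))
print("distribution version:", installed_version("amlsdemolocalwheel"))
```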

Step 2: Create an AMLS Environment and associate the private package with the environment.

The next step is to create the AMLS environment. Before proceeding, create a config.json in the root to connect to your AMLS workspace.

config.json

{
"subscription_id": "179ef35bXXXXXX2f389ea3f880",
"resource_group": "AMLSDemo",
"workspace_name": "amlsworkspace"
}
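Since a missing or misspelled key in config.json only surfaces when the workspace connection is attempted, it can help to validate the file first. The following is a minimal sketch using only the standard library; `load_workspace_config` is a hypothetical helper, and the sample values written to the temporary file are placeholders.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path="config.json"):
    """Load config.json and fail early if a required key is missing or empty."""
    with open(path, "r", encoding="utf-8") as fh:
        config = json.load(fh)
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError("config.json is missing: " + ", ".join(missing))
    return config


# Demonstration against a temporary file with placeholder values.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "config.json")
    with open(sample_path, "w", encoding="utf-8") as fh:
        json.dump(
            {
                "subscription_id": "<your-subscription-id>",
                "resource_group": "AMLSDemo",
                "workspace_name": "amlsworkspace",
            },
            fh,
        )
    config = load_workspace_config(sample_path)

print(config["workspace_name"])  # amlsworkspace
```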

locallibtest\test.py: This file runs on the AMLS compute. It imports print_statement from sharedfile in the sharedlocallib package and prints a message.

from sharedlocallib.sharedfile import print_statement
print_statement("Hi, welcome there!")

TriggerExperiment.py: This file creates the AMLS environment, then creates and submits the experiment.

import os

from azureml.core import Environment, Experiment, ScriptRunConfig, Workspace
from azureml.core.authentication import InteractiveLoginAuthentication
from azureml.core.compute import ComputeTarget
from azureml.core.conda_dependencies import CondaDependencies

# Config values declaration
tenant_id = "<<Provide your tenant ID/ directory ID here >>"
cluster_name = "amlscompute"
wheel_path = os.path.join(os.curdir, "locallib", "dist", "amlsdemolocalwheel-0.0.1-py3-none-any.whl")

# Authenticate and connect to the AMLS workspace described in config.json.
interactive_auth = InteractiveLoginAuthentication(tenant_id=tenant_id)
ws = Workspace.from_config(auth=interactive_auth)

# Upload the .whl to the default blob storage of the AMLS workspace.
whl_url = Environment.add_private_pip_wheel(workspace=ws, file_path=wheel_path, exist_ok=True)

# Create the environment.
myenv = Environment(name="myenv")
conda_dep = CondaDependencies()

# Associate the private package with the environment.
conda_dep.add_pip_package(whl_url)
myenv.python.conda_dependencies = conda_dep

# Create the experiment.
experiment = Experiment(workspace=ws, name="demo")

# Get the cluster object.
cluster = ComputeTarget(workspace=ws, name=cluster_name)

# Create the script run configuration, which ties the script to the compute target and the environment.
src = ScriptRunConfig(source_directory="locallibtest", script="test.py", compute_target=cluster, environment=myenv)

# Submit the experiment.
run = experiment.submit(src)

Step 3: Submit the experiment in AMLS compute.

Once the above script is executed, the experiment is created in AMLS. In the experiment run, we can see that the private package is installed properly and that test.py prints the expected message.
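To follow the run from the local terminal instead of the portal, the run object returned by experiment.submit can be polled. The helper below is a minimal sketch that relies on the wait_for_completion and get_status methods of the SDK's Run class; `wait_and_report` is a hypothetical name, and `FakeRun` is a stand-in included only so the sketch can be exercised without a workspace.

```python
def wait_and_report(run):
    """Block until the run finishes, streaming its logs, then return its status."""
    # show_output=True streams the run's log output to the local console.
    run.wait_for_completion(show_output=True)
    status = run.get_status()
    print(f"Run finished with status: {status}")
    return status


class FakeRun:
    """Stand-in used only to exercise the helper without an AMLS workspace."""

    def wait_for_completion(self, show_output=False):
        if show_output:
            print("Hi, welcome there!")

    def get_status(self):
        return "Completed"


status = wait_and_report(FakeRun())
```

With a real workspace, the call would be `wait_and_report(run)` using the `run` returned by `experiment.submit(src)`.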

Hope this helps to create your ML pipeline!

Reference Links

Use private Python packages with Azure Machine Learning

Create & use software environments in Azure Machine Learning

I work at Microsoft as a Data & AI consultant and love building solutions on the Azure Data & AI platform. https://www.linkedin.com/in/samarendra-panda-1a19b573