ML Runtimes Known Issues and Limitations

You might run into some known issues while using ML Runtimes.

DSE-25143 Assembling plots in PBJ R Runtimes does not work

When trying to plot additional content on already existing plots, PBJ R Runtimes throw an error. Plots can only be created using the plot function.

DSE-32839 Extra configuration needed when using Spark in a PBJ Workbench-based R Runtime

When using Spark in R workloads that are running PBJ Workbench Runtimes, the environmental variable R_LIBS_USER must be passed to the Spark executors with the value set to "/home/cdsw/.local/lib/R/<R_VERSION>/library"

E.g. when using sparklyr with a PBJ Workbench R 4.3 Runtime, the correct way to set up a sparklyr connection is:
library(sparklyr)
config <- spark_config()
config$spark.executorEnv.R_LIBS_USER="/home/cdsw/.local/lib/R/4.3/library"
sc <- spark_connect(config = config)        

DSE-27222 Workbench editor for Python Runtimes and PBJ Runtimes cannot parse multiline strings properly

Workbench editor for Python 3.7 Runtime cannot parse multiline strings properly. Trying to evaluate multiline strings in some cases result in a "SyntaxError: EOF while scanning triple-quoted string literal" error message. This has been fixed in Python 3.8 and higher.

Workaround: Transform multiline strings to single line strings in code that is entered into the workbench directly.

Packages can fail to load in a session

When installing R or Python packages in a Session, the kernel might not be able to load the package in the same session, if a previous version of the package or its newly installed dependencies have been loaded in the same Session. Such issues are observable more often in PBJ R Runtimes, which automatically load basic R packages like vctrs, lifecycle, rlang, cli at session startup.

Workaround: Start a new session, import and use the newly installed package there.

Python Runtimes in CML fail to import the setuptools Python library and can fail installing some Python packages

Python Runtimes in CML fail to import the setuptools Python library and therefore can fail installing some Python packages when the library setuptools is present on the Runtime or is installed into the CML project with version 60.0.0 or higher.

Python 3.10 Runtimes from the 2023.05 Runtime release ship with a newer version of setuptools, so customers can run into this issue when they are using that Runtime. Also they can run into this issue when they are using Custom Runtimes that has a newer setuptools library version or when they install a new setuptools version into their project (regardless of what Runtime they use).

Workaround: Set the environmental variable SETUPTOOLS_USE_DISTUTILS=stdlib either on a project level under Project Settings -> Advanced or on a workspace level under Site Administration -> Runtime -> Environment variables.

VIM text editor no longer supported for Coudera provided ML Runtimes

The VIM text editor is no longer supported for Coudera provided ML Runtimes.

DSE-2804 PBJ CUDA Runtimes have wrong metadata version

PBJ NVidia GPU Runtimes Released in version 2023.05.1 had wrong runtime metadata version. Future releases from July, 2023 of CML Public Cloud might not be compatible with these Runtimes.

Workaround: Follow the instructions in TSB-678.

ML Runtimes installation might pull wrong metadata

For ML Runtimes 2023.05.1 only. Check your installation to determine if your ML Runtimes 2023.05 installation pulled the wrong metadata:
  1. Open Runtime Catalog.
  2. Check if newer runtime variants have been downloaded, e.g. ‘JupyterLab - Conda - Tech Preview’.
  3. Open the version list for this Runtime.
  4. Check if any of the versions’ details page shows ‘container.repository.cloudera.com...’ under ‘Runtime Image’ section.
  5. If the versions' Details page shows 'container.repository.cloudera.com..., complete the Workaround steps.

Workaround: This workaround applies for CML Public Cloud 2022.06.1 (CDSW-2.0.32) or CDSW 1.10.1 or higher versions.

Administrator should perform the following steps:

  1. Go to the Runtime Catalog page.
  2. On the right-hand side, open the version list of each Runtime variant.
  3. Open the Details page for the 2023.05.1-b4 versions.
  4. Use the three-dot menu and use the Set to Disabled function for those versions which have container.repository.cloudera.com ... as Runtime Image.
  5. Recheck within 24 hours that the runtimes with docker.repository.cloudera.com have been added.

Regular user should perform the following steps:

  1. If the project does not have ML Runtimes 2023.05 after the previous actions has been taken by the Administrator, no action is required.
  2. If the project does have ML Runtime 2023.05, after the Administrator disables the wrong one, it should display as ‘Disabled’.
  3. When the runtimes with the right metadata has been added automatically to the installation, use the Add Latest button to add the proper runtime(s).
  4. Remove the disabled runtimes from the project.
Active workloads are not affected, as with the wrong versions no workload could be started.

DSE-9818 JupyterLab Conda Tech Preview Runtime

Sessions
When starting a Notebook or a Console for a specific environment, the installed packages will be available and the interpreter used to evaluate the contents of the Notebook or Console will be the one installed in the environment. However, the Conda environment is not "activated" in these sessions, therefore commands like !which python will return with the base Python 3.10 interpreter on the Runtime. The recommended ways to modify a Conda environments or install packages are the following:
  • conda commands must be used with the -n or --name argument to specify the environment, for example conda -n myenv install pandas
  • When installing packages with pip, use the %pip magic to install packages in the active kernel’s environment, for example %pip install pandas
Applications and Jobs
To start an Application or Job, first create a launcher Python script containing the following line: !source activate <conda_env_name> && python <job / application script.py>
When starting the Application or Job, select the launcher script as the "Script".
Models
Models are currently not supported for the Conda Runtime.
Spark
Spark is not supported in JupyterLab Notebooks and Consoles.
Spark workloads are supported in activated Conda environments in JupyterLab Terminals, or in Jobs or Applications.
The CDSW libraries for Python and R are not available for the Conda Runtimes.

DSE-27222 Workbench editor for Python Runtimes and PBJ Runtimes cannot parse multiline strings properly

Workbench editor for Python Runtimes (3.7, 3.8, 3.9) and PBJ Runtimes (3.8, 3.9) cannot parse multiline strings properly. Trying to evaluate multiline strings results in a "SyntaxError: EOF while scanning triple-quoted string literal" error message.

Workaround: Transform multiline strings to single line strings in code that is entered into the workbench directly.

Adding a new ML Runtimes when using a custom root certificate might generate error messages

When trying to add new ML Runtimes, a number of error messages might appear in various places when using a custom root certificate. For example, you might see: "Could not fetch the image metadata" or "certificate signed by unknown authority". This is caused by the runtime-puller pods not having access to the custom root certificate that is in use.

Workaround:

  1. Create a directory at any location on the master node:

    For example:

    mkdir -p /certs/

  2. Copy the full server certificate chain into this folder. It is usually easier to create a single file with all of your certificates (server, intermediate(s), root):
    # copy all certificates into a single file: 
    cat server-cert.pem intermediate.pem root.pem > /certs/cert-chain.crt
  3. (Optional) If you are using a custom docker registry that has its own certificate, you need to copy this certificate chain into this same file:
    cat docker-registry-cert.pem >> /certs/cert-chain.crt
  4. Copy the global CA certificates into this new file:
    # cat /etc/ssl/certs/ca-bundle.crt >> /certs/cert-chain.crt
  5. Edit your deployment of runtime manager and add the new mount.

    Do not delete any existing objects.

    #kubectl edit deployment runtime-manager

  6. Under VolumeMounts, add the following lines. Note that the text is white-space sensitive - use spaces and not tabs.
    - mountPath: /etc/ssl/certs/ca-certificates.crt 
       name: mycert 
       subPath: cert-chain.crt #this should match the new file name created in step 4
                    

    Under Volumes add the following text in the same edit:

    - hostPath: 
       path: /certs/  #this needs to match the folder created in step 1
       type: "" 
    name: mycert
  7. Save your changes:

    wq!

    Once saved, you will receive the message "deployment.apps/runtime-manager edited" and the pod will be restarted with your new changes.

  8. To persist these changes across cluster restarts, use the following Knowledge Base article to create a kubernetes patch file for the runtime-manager deployment: https://community.cloudera.com/t5/Customer/Patching-CDSW-Kubernetes-deployments/ta-p/90241

Cloudera Bug: DSE-20530

DSE-24038 Current Cuda images (e.g.: dock-cuda_version=11.4.1) images do not contain nvvm libraries

The CUDA version (11.4.1) being used to build Nvidia GPU runtimes is not supported by newer Torch versions (1.13+).

Workaround: Use Torch versions up to 1.12.1.

DSE-17126 Starting a worker from Runtime session will create a worker with engine image

This issue is resolved in ML Runtimes 2021.09 when used with the latest ML Workspace versions.

For workers to function properly with ML Runtimes, please use ML Runtimes 2021.09 or later with CML Workspace version of 2.0.22 or later.

Spark Runtime Add-on required for Spark 2 integration with Scala Runtimes

Scala Runtimes on CML require Spark Runtime Addon to enable Spark2 integration. Spark3 is not supported with the Scala Runtime.

DSE-17981 - Disable Scale runtimes in models, experiments and applications runtime selection

Scala Runtimes should not appear as an option for Models, Experiments, and Applications in the user interface. Currently Scala Runtimes only support Session and Jobs.

DSE-17228 Workbench completion broken in R Runtime session

Code completion does not work in the R Runtimes Workbench versions of the ML Runtimes current release.

Workaround: Downgrade to ML Runtimes 2021.02.

This issue is resolved in ML Runtimes 2021.09.

DSE 14447 Some bugs present for R in the legacy engine persist in ML Runtimes

Some bugs that were present for R in the legacy engine persist in ML runtimes.