DirectML on Arm is here at last, almost

Microsoft’s much-delayed Windows Copilot Runtime is a step closer with the release of a developer preview of the Arm build of its DirectML AI tool. It’s still not production-ready, but it’s now possible to start experimenting with local AI applications using the Qualcomm Hexagon neural processing units (NPUs) in Copilot+ PCs.

Bringing AI to the edge requires accelerator hardware, and the Arm-based Copilot+ PCs promised developer access to their built-in 45 TOPS AI accelerator, with a mix of bundled AI models and tools to help you run your own. Although the Windows App SDK APIs for Microsoft’s Phi Silica model have slipped to its 1.7 release and there’s still no sign of the promised PyTorch tools, this first tranche of tools should help get you coding.

Here come the NPU drivers

At the heart of the DirectML Arm release is a new driver for the Copilot+ PCs’ NPU. This is linked to a new build of the Open Neural Network Exchange (ONNX) runtime, bringing support for several open source models from Hugging Face. Microsoft has created a repository of DirectML-ready models that have been optimized by Qualcomm, ready for use with the new runtime.

But don’t get too excited. This first release has severe bugs that make it almost unusable, and the limited number of supported models restricts the applications you can build.

Delivering DirectML support to the Windows Copilot Runtime and the new Arm Copilot+ PCs is an important milestone. DirectML is part of the DirectX family of APIs that give access to GPUs and NPUs, allowing your C++ and .NET code to take advantage of hardware accelerators. As a result, it’s a foundational component of the Copilot Runtime, offering access to ONNX-formatted models and simplifying application packaging and distribution.

The new Qualcomm drivers need a manual install and have to be downloaded from the Qualcomm developer portal. This requires an account, and the sign-up process can take time. Once you’re approved and signed in, download the Windows Arm driver as a zip file. Be sure to download the correct version; there are also Linux and macOS releases. Unzip the downloaded file and run the NPU driver installer. You’ll need to reboot once it’s been installed.

Microsoft is providing new ONNX and DirectML builds for Arm64. Use DirectML 1.15.2 (or higher) as a prerequisite in your code, along with version 1.18 of the ONNX Runtime’s DirectML build.
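A minimum-version requirement like this is easy to get wrong if you compare version strings as text ("1.15.10" sorts before "1.15.2"). Here is a minimal sketch of a numeric check in Python. It assumes plain dotted numeric version strings, and the helper names are my own, not part of any Microsoft tooling. How you obtain the installed version string is left out.

```python
def parse_version(text, width=3):
    """Turn '1.15.2' into (1, 15, 2), padding short versions with zeros."""
    parts = [int(part) for part in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # Minimum versions taken from the text above; the "installed" values
    # here are made-up examples.
    print(meets_minimum("1.15.2", "1.15.2"))
    print(meets_minimum("1.14.9", "1.15.2"))
```

Pre-release suffixes such as "-dev" are not handled and would raise a ValueError, so treat this as a starting point rather than a complete version parser.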

Building C++ Windows Copilot Runtime applications

With this in place, you can start to build your first Windows Copilot Runtime application using the new driver’s DirectML support. One important point to note is that the first group of supported models are mainly computer vision models. That’s perhaps not surprising, as the original AI PC accelerators were focused on image and video processing. It’ll be interesting to watch how the NPU drivers and supported models evolve, as well as abstractions such as DirectML.

It’s worth looking at Microsoft’s existing DirectML samples, as they don’t take much work to support the latest driver builds. The most interesting, an image upscaling application using the ESRGAN model, shows how the NPU can be used for photo upscaling or de-noising, much like the GPU-based AI image processing tools used by photo-processing applications such as Lightroom.

Unfortunately, I was unable to get the sample to work. Things seemed promising once I’d cloned the entire DirectML repository. Then issues started to appear.

First, the sample required a stand-alone version of cmake to generate the files needed for a build, not the version bundled with Visual Studio 2022. Once that was installed, cmake was available to Visual Studio and allowed the sample to self-configure. Don’t forget to explicitly choose an Arm64 build of the application so that it can use the Copilot+ PC NPU.

Although Visual Studio contains its own Git implementation, you will need a stand-alone Git instance in your PATH for cmake to get all the required DirectML headers from GitHub. Here I installed Git for Windows, which I’ve found has a usable mix of command line and GUI tools that work well with Visual Studio.
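Since the documentation doesn’t list these prerequisites, a quick check that both tools are on your PATH can save a failed configure step. This is a minimal Python sketch using the standard library’s shutil.which. The function name and the injectable lookup parameter are my own choices for illustration.

```python
import shutil


def missing_tools(tools=("cmake", "git"), which=shutil.which):
    """Return the names of the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("cmake and git are both available on PATH")
```

Run it from the same shell you use to launch the build, as PATH can differ between a plain terminal and a Visual Studio developer prompt.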

Many of the underlying tools are still pre-alpha, so you should expect bugs and missing features. These omissions aren’t a big problem, but it would help if the documentation included a full list of prerequisites.

I was now able to compile the code. This delivered a build of the ESRGAN sample app, with a ready-to-use ONNX model. To run the sample on the NPU, you need to select it explicitly from the command line, which makes it hard to use the Visual Studio debuggers.

Will it run?

Although the application compiles and runs and correctly detects the NPU, it currently hangs when loading the ONNX model. This appears to be an issue with the NPU drivers, as the app works using DirectML on the CPU. It’ll be interesting to see what future builds or other models bring.

After waiting so long for the first Windows Copilot Runtime tools to arrive, this was a disappointing result, especially as the hang not only locks up the NPU app but also affects tools like Task Manager. The only way out is to reboot your PC. I’m sure that updated drivers will be delivered soon, but for now, this feels like something of a dead end.
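Until fixed drivers arrive, one hedge is to treat the accelerator as a preference rather than a requirement. The Python sketch below picks ONNX Runtime execution providers in preference order and falls back to the CPU. The provider names and get_available_providers() are part of ONNX Runtime’s Python API; the helper function and the model path in the comment are hypothetical. Note that this only chooses between DirectML and plain CPU execution. It does not show how to target the NPU device itself, and it won’t protect you from a driver that hangs after being selected.

```python
def choose_providers(available,
                     preferred=("DmlExecutionProvider", "CPUExecutionProvider")):
    """Return the preferred providers that are available, in preference order."""
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError("None of the preferred execution providers are available")
    return chosen


if __name__ == "__main__":
    try:
        # Assumption: a DirectML-enabled onnxruntime package is installed.
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed; nothing to check")
    else:
        providers = choose_providers(ort.get_available_providers())
        print("Would create the session with:", providers)
        # Hypothetical model path, shown for illustration only:
        # session = ort.InferenceSession("model.onnx", providers=providers)
```

Keeping the selection logic separate from session creation means the same code can be tested on a machine with no accelerator at all.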

From desktop to WebNN

The drivers are intended to support WebNN as well as your own code. WebNN is a developing standard for using ONNX in the browser, allowing small models to run locally rather than use remote servers. Microsoft has been building experimental support for WebNN into its Edge browser, putting it behind a flag in both Dev and Canary Insider builds.

As part of the announcement of DirectML support for Qualcomm’s NPU, Microsoft included instructions for using DirectML with WebNN. Again, this is a somewhat complex process for now, as it involves extracting the DirectML DLL from the NuGet package and manually installing it in the appropriate Edge directory.

One thing to note with the WebNN setup instructions: The command line for launching Edge Dev or Canary has a formatting issue in the original blog post. What appear as em dashes at the start of each of the three parameters should, in fact, be two separate standard hyphens. Edge won’t load the ONNX drivers correctly if you simply copy and paste the command line from the blog.
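If you’d rather not retype the command, the dash problem is easy to repair mechanically. This is a small Python sketch that replaces an em dash or en dash at the start of a parameter with two hyphens. The flag names in the example are placeholders, not the real Edge parameters, so substitute the ones from Microsoft’s instructions.

```python
import re

# An em dash (U+2014) or en dash (U+2013) at the start of a token: either at
# the beginning of the string or right after whitespace, and followed by a
# non-space character.
_LEADING_DASH = re.compile(r"(?:(?<=\s)|^)[\u2014\u2013](?=\S)")


def fix_flag_dashes(command):
    """Replace typographic dashes that begin a parameter with '--'."""
    return _LEADING_DASH.sub("--", command)


if __name__ == "__main__":
    # Placeholder flags for illustration only.
    pasted = "msedge.exe \u2014flag-one \u2014flag-two=1"
    print(fix_flag_dashes(pasted))
```

Dashes that appear inside a value are left alone, since only a dash at the start of a parameter is rewritten.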

Sadly, like the sample DirectML applications, WebNN models did not load and run, no matter which browser version I used.

Microsoft announced the Windows Copilot Runtime back in May 2024, and we were expecting the first code in July. It’s a pity this first set of drivers is so flawed. Hopefully, a fixed version will drop soon and we’ll finally be able to run AI code on those 45 TOPS NPUs.

Until then, we can at least write and test code via DirectML’s CPU support, ready to switch to NPUs when we have code and models that work. That implies a long wait for a production release of the Qualcomm NPU drivers, as well as support for essential tools such as Olive to help us tune our own models for Copilot+ hardware. For now, let’s hope Microsoft and Qualcomm ship working drivers soon, especially now that AMD has announced its first x64 Copilot+ PCs.
