Admiration Tech News

Jumpstart AI Workflows With Kubernetes AI Toolchain Operator

Posted on August 4, 2024 by Maq Verma

Generative AI is booming as industries and innovators look for ways to transform digital experiences. From AI chatbots to language translation tools, people interact every day with a common set of AI/ML models known as language models. As a result, language models have grown rapidly in size, performance and range of use cases in recent years.

As an application developer, you may want to integrate a high-performance model by simply making a REST call and pointing your app to an inferencing endpoint for models like Falcon, Mistral or Phi. Just like that, you’ve unlocked the doors to the AI kingdom and all kinds of intelligent applications.
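For instance, that integration can be as small as one JSON POST. The sketch below assumes a hypothetical in-cluster endpoint URL and payload shape; each model server defines its own request API, so treat the field names as placeholders.

```python
import json
import urllib.request

# Hypothetical in-cluster inferencing endpoint; the URL and payload fields
# are placeholders, not a specific model server's API.
ENDPOINT = "http://models.internal.example/falcon-7b/generate"

def build_request(prompt: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Package a prompt as a JSON POST to the inferencing endpoint."""
    body = json.dumps({"prompt": prompt, "max_tokens": 128}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Actually sending it requires a reachable endpoint:
#   with urllib.request.urlopen(build_request("Hello")) as resp:
#       print(json.load(resp))
```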

Open source language models are a cost-effective way to experiment with AI, and Kubernetes has emerged as the open source platform best suited for scaling and automating AI/ML applications without compromising the security of user and company data.

“Let’s make Kubernetes the engine of AI transformation,” said Jorge Palma, Microsoft principal product manager lead. He gave the keynote at KubeCon Europe 2024, where AI use cases were discussed everywhere. Palma talked about the number of developers he’s met who are deploying models locally in their own infrastructure, putting them in containers and using Kubernetes clusters to host them.

“Container images are a great format for models. They’re easy to distribute,” Palma told KubeCon. “Then you can deploy them to Kubernetes and leverage all the nice primitives and abstractions that it gives you — for example, managing that heterogeneous infrastructure, and at scale.”

Containers also help you avoid the annoying “but it runs fine on my machine” issue. They’re portable, so your models run consistently across environments. They simplify version control to better maintain iterations of your model as you fine-tune for performance improvements. Containers provide resource isolation, so you can run different AI projects without mixing up components. And, of course, running containers in Kubernetes clusters makes it easy to scale out — a crucial factor when working with large models.

“If you aren’t going to use Kubernetes, what are you going to do?” asked Ishaan Sehgal, a Microsoft software engineer. He is a contributor to the Kubernetes AI Toolchain Operator (KAITO) project and has helped develop its major components to simplify AI deployment on a given cluster. KAITO is a Kubernetes operator and open source project developed at Microsoft that runs in your cluster and automates the deployment of large AI models.

As Sehgal pointed out, Kubernetes gives you the scale and resiliency you need when running AI workloads. Otherwise, if a virtual machine (VM) fails or your inferencing endpoint goes down, you must attach another node and set everything up again. “The resiliency aspect, the data management — Kubernetes is great for running AI workloads, for those reasons,” he said.

Kubernetes Makes It Easier, KAITO Takes It Further

Kubernetes makes it easier to scale out AI models, but it’s not exactly easy. In my article “Bring your AI/ML workloads to Kubernetes and leverage KAITO,” I highlight some of the hurdles that developers face with this process. For example, just getting started is complicated. Without prior experience, you might need several weeks to correctly set up your environment. Downloading and storing the large model weights, upwards of 200 GB in size, is just the beginning. There are storage and loading time requirements for model files. Then you need to efficiently containerize your models and host them — choosing the right GPU size for your model while keeping costs in mind. And then there’s troubleshooting pesky quota limits on compute hardware.
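GPU sizing in particular trips people up. As a rough back-of-the-envelope heuristic (my own assumption, not KAITO’s recommendation logic), you can estimate serving memory from the parameter count:

```python
def estimated_vram_gb(params_billions: float, bytes_per_param: int = 2,
                      overhead: float = 1.2) -> float:
    """Back-of-the-envelope serving-memory estimate: fp16 weights
    (2 bytes per parameter) plus ~20% headroom for activations and
    KV cache. The overhead factor is an assumption, not a measurement."""
    return params_billions * bytes_per_param * overhead

# A 7B-parameter model at fp16 works out to roughly 17 GB of GPU memory,
# which is why such models are typically paired with 24 GB-class GPUs.
```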

Using KAITO, a workflow that previously could span weeks now takes only minutes. This tool streamlines the tedious details of deploying, scaling, and managing AI workloads on Kubernetes, so you can focus on other aspects of the ML life cycle. You can choose from a range of popular open source models or onboard your custom option, and KAITO tunes the deployment parameters and automatically provisions GPU nodes for you. Today, KAITO supports five model families and over 10 containerized models, ranging from small to large language models.

For an ML engineer like Sehgal, KAITO overcomes the hassle of managing different tools, add-ons and versions. You get a simple, declarative interface “that encapsulates all the requirements you need for running your inferencing model. Everything gets set up,” he explained.

How KAITO Works

Using KAITO is a two-step process. First, install KAITO on your cluster, and then select a preset that encapsulates all the requirements needed for inference with your model. Within the associated workspace custom resource definition (CRD), a minimum GPU size is recommended so you don’t have to search for the ideal hardware. You can always customize the CRD to your needs. After deploying the workspace, KAITO uses the node provisioner controller to automate the rest.
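Those two steps boil down to a small amount of declarative configuration. As a sketch, here is a minimal inference workspace expressed as the Python dict you would hand to a Kubernetes client; the field names follow the project’s published v1alpha1 examples, so verify them against the KAITO release you install.

```python
# Minimal KAITO inference Workspace custom resource, as a Python dict.
# Field names are based on the project's v1alpha1 examples and may
# differ in newer releases.
workspace = {
    "apiVersion": "kaito.sh/v1alpha1",
    "kind": "Workspace",
    "metadata": {"name": "workspace-falcon-7b"},
    "resource": {
        # Preset-recommended GPU size; you can customize this field.
        "instanceType": "Standard_NC12s_v3",
        "labelSelector": {"matchLabels": {"apps": "falcon-7b"}},
    },
    "inference": {"preset": {"name": "falcon-7b"}},
}
```

Serialized to YAML, the same object is what `kubectl apply -f` would take; KAITO then provisions the GPU node and stands up the inference service on your behalf.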

“KAITO is basically going to provision GPU nodes and add them to your cluster on your behalf,” explained Microsoft senior cloud evangelist Paul Yu. “As a user, I just have to deploy my workspace into the AKS cluster, and that creates the additional CR.”

As shown in the following KAITO architecture, the workspace invokes a provisioner to create and configure the right-sized infrastructure for you, and it even distributes your workload across smaller GPU nodes to reduce costs. The project uses open source Karpenter APIs for the VM configuration based on your requested size, installing the right drivers and device plug-ins for Kubernetes.

[Figure: Kubernetes AI Toolchain Operator architecture]

For applications with compliance requirements, KAITO provides granular control over data security and privacy. You can ensure that models are ring-fenced within your organization’s network and that your data never leaves the Kubernetes cluster.

Check out this tutorial on how to bring your own AI models to intelligent apps on Azure Kubernetes Service, where Sehgal and Yu integrate KAITO in a common e-commerce application in a matter of minutes.

Working With Managed Kubernetes

Currently, you can use KAITO to provision GPUs on Azure, but the project is evolving quickly. The roadmap includes support for other managed Kubernetes providers. When you use a managed Kubernetes service, you can interact with other services from that cloud platform more easily to add capabilities to your workflow or applications.

Earlier this year, at Microsoft Build 2024, BMW talked about its use of generative AI and Azure OpenAI Service in the company’s connected car app, My BMW, which runs on AKS.

Brendan Burns, co-founder of the Kubernetes open source project, introduced the demo. “What we’re seeing is, as people are using Azure OpenAI Service, they’re building the rest of the application on top of AKS,” he told the audience. “But, of course, just using OpenAI Service isn’t the only thing you might want to use. There are a lot of reasons why you might want to use open source large language models, including situations like data and security compliance, and fine-tuning what you want to do. Maybe there’s just a model out there that’s better suited to your task. But doing this inside of AKS can be tricky, so we integrated the Kubernetes AI Toolchain Operator as an open source project.”

Make Your Language Models Smarter with KAITO

Open source language models are trained on extensive amounts of text from a variety of sources, so the output for domain-specific prompts may not always meet your needs. Take the pet store application, for example. If a customer asks the integrated pre-trained model for dog food recommendations, it might give different pricing options across a few popular dog breeds. This is informative but not necessarily useful as the customer shops. After fine-tuning the model on historical data from the pet store, the recommendations can instead be tailored to well-reviewed, affordable options available in the store.

As a step in your ML lifecycle, fine-tuning helps customize open source models to your own data and use cases. The latest release of KAITO, v0.3.0, supports fine-tuning, inference with adapters, and a broader range of models. You can simply define your tuning method and data source in a KAITO workspace CR and see your intelligent app become more context-aware while maintaining data security and compliance requirements in your cluster.
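As a sketch, a tuning workspace adds a `tuning` block alongside the resource spec. The field names below mirror the project’s fine-tuning examples, and the dataset URL and output image are placeholders, so check the KAITO docs for your release before relying on them.

```python
# Sketch of a KAITO fine-tuning Workspace as a Python dict. The tuning
# fields follow the project's fine-tuning examples; the dataset URL and
# output registry image are placeholders.
tuning_workspace = {
    "apiVersion": "kaito.sh/v1alpha1",
    "kind": "Workspace",
    "metadata": {"name": "workspace-tuning-phi-3"},
    "resource": {
        "instanceType": "Standard_NC6s_v3",
        "labelSelector": {"matchLabels": {"app": "tuning-phi-3"}},
    },
    "tuning": {
        "preset": {"name": "phi-3-mini-128k-instruct"},
        "method": "qlora",  # tuning method declared in the workspace CR
        "input": {"urls": ["https://example.com/petstore-orders.parquet"]},
        "output": {"image": "registry.example.com/petstore-adapter:v1"},
    },
}
```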

To stay up to date on the project roadmap and test these new features, check out KAITO on GitHub.

Proudly powered by Admiration Tech News | Copyright ©2023 Admiration Tech News | All Rights Reserved