Skip to content

Admiration Tech News

  • Home
  • Cyber Attacks
  • Data Breaches
  • Vulnerability
  • Exploits
  • Crack Tutorials
  • Programming
  • Tools

Tag: Model Architecture

How To Run an Agent on Federated Language Model Architecture

Posted on August 4, 2024 - August 4, 2024 by Maq Verma

In the first part of this series, I introduced the idea of federated language models, where we take advantage of a capable cloud-based large language model (LLM) and a small language model (SLM) running at the edge.

To recap, an agent sends the user query (1) along with the available tools (2) to a cloud-based LLM to map the prompt into a set of functions and arguments (3). It then executes the functions to generate appropriate context from a database (4a). If there are no tools involved, it leverages the simple RAG mechanism to perform a semantic search in a vector database (4b). The context gathered from either of the sources is then sent (5) to an edge-based SLM to generate a factually correct response. The response (6) generated by the SLM is sent as the final answer (7) to the user query.

This tutorial focuses on the practical implementation of a federated LM architecture based on the below components:

  • OpenAI GPT-4 Omni as an LLM.
  • Microsoft Phi-3 as a SLM.
  • Ollama as the inference engine for Phi-3.
  • Nvidia Jetson AGX Orin as the edge device to run Ollama.
  • MySQL database and a Flask API server running locally.
  • Chroma as the local vector store for semantic search.

Refer to the tutorial on setting up Ollama on Jetson Orin and implementing the RAG agent for additional context and details.

Start by cloning the Git repository https://github.com/janakiramm/federated-llm.git, which has the scripts, data, and Jupyter Notebooks. This tutorial assumes that you have access to OpenAI and an Nvidia Jetson Orin device. You can also run Ollama on your local workstation and change the IP address in the code.

Step 1: Run Ollama on Jetson Orin

SSH into Jetson Orin and run the commands mentioned in the file, setup_ollama.sh.

Verify that you are able to connect to and access the model by running the below command on your workstation, where you run Jupyter Notebook.

123456789101112131415curl http://localhost:11434/v1/chat/completions \    -H “Content-Type: application/json” \    -d ‘{        “model”: “phi3:mini”,        “messages”: [            {                “role”: “system”,                “content”: “You are a helpful assistant.”            },            {                “role”: “user”,                “content”: “What is the capital of France?”            }        ]    }’


Replace localhost with the IP address of your Jetson Orin device. If everything goes well, you should be able to get a response from the model.

Congratulations, your edge inference server is now ready!

Step 2: Run MySQL DB and Flask API Server

The next step is to run the API server, which exposes a set of functions that will be mapped to the prompt. To make this simple, I built a Docker Compose file that exposes the REST API endpoint for the MySQL database backend.

Switch to the api folder and run the below command:

1start_api_server.sh


Check if two containers are running on your workstation with the docker ps command.

If you run the command curl "http://localhost:5000/api/sales/trends?start_date=2023-05-01&end_date=2023-05-30", you should see the response from the API.

Step 3: Index the PDF and Ingest the Embeddings in Chroma DB

With the API in place, it’s time to generate the embeddings for the datasheet PDF and store them in the vector database.

For this, run the Index_Datasheet.ipynb Jupyter Notebook, which is available in the notebooks folder.

A simple search retrieves the phrases that semantically match the query.

Step 4: Run the Federated LM Agent

The Jupyter Notebook, Federated-LM.ipynb, has the complete code to implement the logic. Let’s understand the key sections of the code.

We will import the API client that exposes the tools to the LLM.

We will import the API client that exposes the tools to the LLM.

First, we initialize two LLMs: GPT-4o (Cloud) and Phi3:mini (Edge)

After creating a Python list with the signatures of the tools, we will let GPT-4o map the prompt to appropriate functions and their arguments.

For example, passing the prompt What was the top selling product in Q2 based on revenue? to GPT-4o results in the model responding with the function get_top_selling_products and the corresponding arguments. Notice that a capable model is able to translate Q2 into date range, starting from April 1st to June 30th. This is exactly the power we want to exploit from the cloud-based LLM.

Once we enumerate the tool(s) suggested by GPT-4o, we execute, collect, and aggregate the output to form the context.

If the prompt doesn’t translate to tools, we attempt to use the retriever based on the semantic search from the vector database.

To avoid sending sensitive context to the cloud-based LLM, we leverage the model (edge_llm) at the edge for generation.

Finally, we implement the agent that orchestrates the calls between the cloud-based LLM and the edge-based LLM. It checks if the tools list is empty and then moves to the retriever to generate the context. If both are empty, the agent responds with the phrase “I don’t know.”

Below is the response from the agent based on tools, retriever, and unknown context.

To summarize, we implemented a federated LLM approach where an agent sends the user query along with available tools to a cloud-based LLM, which maps the prompt into functions and arguments. The agent executes these functions to generate context from a database. If no tools are involved, a simple RAG mechanism is used for semantic search in a vector database. The context is then sent to an edge-based SLM to generate a factually correct response, which is provided as the final answer to the user query.

Posted in VulnerabilityTagged Cyber Attacks, Data Security, Model ArchitectureLeave a comment

Recent Posts

  • New Malicious PyPI Packages used by Lazarus(By Shusei Tomonaga)
  • Recent Cases of Watering Hole Attacks, Part 1(By Shusei Tomonaga)
  • Recent Cases of Watering Hole Attacks Part 2(By Shusei Tomonaga)
  • Tempted to Classifying APT Actors: Practical Challenges of Attribution in the Case of Lazarus’s Subgroup(By Hayato Sasaki)
  • SPAWNCHIMERA Malware: The Chimera Spawning from Ivanti Connect Secure Vulnerability(By Yuma Masubuchi)
  • DslogdRAT Malware Installed in Ivanti Connect Secure(By Yuma Masubuchi)
  • DslogdRAT Malware Targets Ivanti Connect Secure via CVE-2025-0282 Zero-Day Exploit
  • Lazarus Group’s “Operation SyncHole” Targets South Korean Industries
  • North Korean APT ‘Contagious Interview’ Launches Fake Crypto Companies to Spread Malware Trio
  • SocGholish and RansomHub: Sophisticated Attack Campaign Targeting Corporate Networks
  • Critical Flaw Exposes Linux Security Blind Spot: io_uring Bypasses Detection
  • Discord Used as C2 for Stealthy Python-Based RAT
  • Earth Kurma APT Targets Southeast Asia with Stealthy Cyberespionage
  • Triada Trojan Evolves: Pre-Installed Android Malware Now Embedded in Device Firmware
  • Fake GIF and Reverse Proxy Used in Sophisticated Card Skimming Attack on Magento
  • Fog Ransomware Group Exposed: Inside the Tools, Tactics, and Victims of a Stealthy Threat
  • Weaponized Uyghur Language Software: Citizen Lab Uncovers Targeted Malware Campaign
  • 4Chan Resumes Operation After Hack, Cites Funding Issues
  • ResolverRAT Targets Healthcare and Pharmaceutical Sectors Through Sophisticated Phishing Attacks
  • CVE-2024-8190: Investigating CISA KEV Ivanti Cloud Service Appliance Command Injection Vulnerability
  • Dissecting the Cicada
  • LockBit Analysis
  • Attacking PowerShell CLIXML Deserialization
  • Threat Hunting Report: GoldPickaxe
  • Exploiting Microsoft Kernel Applocker Driver (CVE-2024-38041)
  • Acquiring Malicious Browser Extension Samples on a Shoestring Budget
  • Type Juggling and Dangers of Loose Comparisons
  • Exploring Deserialization Attacks and Their Effects
  • Hunting for Unauthenticated n-days in Asus Routers
  • Element Android CVE-2024-26131, CVE-2024-26132 – Never Take Intents From Strangers
  • A Journey From sudo iptables To Local Privilege Escalation
  • AlcaWASM Challenge Writeup – Pwning an In-Browser Lua Interpreter
  • Fortinet Confirms Third-Party Data Breach Amid Hacker’s 440 GB Theft Claim
  • Adversary Emulation is a Complicated Profession – Intelligent Cyber Adversary Emulation with the Bounty Hunter
  • Cloudflare blocks largest recorded DDoS attack peaking at 3.8Tbps
  • RPKI Security Under Fire: 53 Vulnerabilities Exposed in New Research
  • CVE-2024-5102: Avast Antivirus Flaw Could Allow Hackers to Delete Files and Run Code as SYSTEM
  • Build Your Own Google: Create a Custom Search Engine with Trusted Sources
  • Rogue AI: What the Security Community is Missing
  • Ransomware Roundup – Underground
  • Emansrepo Stealer: Multi-Vector Attack Chains
  • Threat Actors Exploit GeoServer Vulnerability CVE-2024-36401
  • In-depth analysis of Pegasus spyware and how to detect it on your iOS device
  • GoldPickaxe exposed: How Group-IB analyzed the face-stealing iOS Trojan and how to do it yourself
  • Beware CraxsRAT: Android Remote Access malware strikes in Malaysia
  • Boolka Unveiled: From web attacks to modular malware
  • Ajina attacks Central Asia: Story of an Uzbek Android Pandemic
  • SMTP/s — Port 25,465,587 For Pentesters
  • POC – CVE-2024–4956 – Nexus Repository Manager 3 Unauthenticated Path Traversal
  • Unauthenticated RCE Flaw in Rejetto HTTP File Server – CVE-2024-23692
  • CVE-2024–23897 — Jenkins File Read Vulnerability — POC
  • Why Django’s [DEBUG=True] is a Goldmine for Hackers
  • Extracting DDosia targets from process memory
  • Dynamic Binary Instrumentation for Malware Analysis
  • Meduza Stealer or The Return of The Infamous Aurora Stealer
  • Unleashing the Viper : A Technical Analysis of WhiteSnake Stealer
  • MetaStealer – Redline’s Doppelgänger
  • Pure Logs Stealer Fails to Impress
  • MetaStealer Part 2, Google Cookie Refresher Madness and Stealer Drama
  • From Russia With Code: Disarming Atomic Stealer

Recent Comments

  1. Maq Verma on Turla APT used two new backdoors to infiltrate a European ministry of foreign affairs
  2. binance Registrera on Turla APT used two new backdoors to infiltrate a European ministry of foreign affairs
  3. Hal on FBI: BlackSuit ransomware made over $500 million in ransom demands
  4. canadian pharmaceuticals on Linux: Mount Remote Directories With SSHFS
  5. situs togel resmi on Extracting DDosia targets from process memory

Archives

  • April 2025 (19)
  • November 2024 (20)
  • October 2024 (13)
  • September 2024 (2)
  • August 2024 (119)
  • July 2024 (15)

Categories

  • Crack Tutorials
  • Cyber Attacks
  • Data Breaches
  • Exploits
  • Programming
  • Tools
  • Vulnerability

Site Visitors

  • Users online: 0 
  • Visitors today : 3
  • Page views today : 3
  • Total visitors : 2,215
  • Total page view: 2,824

$22 Million AWS Bitmagnet BlackCat Bytecode CrowdStrike Cyber Attacks cyber security Data Breach Data Security DDOS Decentralized Encryption fake github Indexer Injection Activity kernel Linux Maestro malware Microsoft Model Architecture Netflix Open Source Phishing Phishing Scam Programming Ransomware Reverse Engineering Safe Delete Safe Erase Scam Security tool Software Crack Software Design software protection SOLID SOLID Principles Sophos Intercept X Advanced Spyware Tools Torrent TryCloudflare vulnerability Workflow Engine

Proudly powered by Admiration Tech News | Copyright ©2023 Admiration Tech News | All Rights Reserved