Below are the pages tagged with the taxonomy term “Hugging Face”.
May 19, 2025
Optimizing LLMs for Edge Devices: A GCP & Hugging Face Tutorial
This tutorial walks through a workflow for optimizing Large Language Models (LLMs) for edge devices using Google Cloud Platform (GCP) and Hugging Face. It covers setting up a GCP environment, fine-tuning a small LLM, and then applying three optimization techniques: knowledge distillation, quantization (dynamic and static), and pruning. The guide also shows how to export the final model to formats such as ONNX and TFLite for deployment in browsers (with Transformers.js or ONNX Runtime Web) and on mobile devices.
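To give a rough sense of what the quantization step does to a model's weights, here is a minimal pure-Python sketch of symmetric per-tensor int8 quantization. It is an illustration only, not the tutorial's code: the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical, and in practice this arithmetic is handled by a library rather than written by hand.

```python
def quantize_int8(values):
    """Map floats to int8 codes using one scale for the whole tensor.

    Hypothetical helper for illustration; not part of any library API.
    """
    max_abs = max(abs(v) for v in values)
    # One int8 step represents `scale` units of the original float range.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest step and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from int8 codes."""
    return [c * scale for c in codes]


# Made-up sample weights, chosen only to show the round trip.
weights = [0.5, -1.27, 0.0, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
```

Each restored value differs from the original by at most half a quantization step. Broadly, dynamic quantization derives activation scales like this at inference time, while static quantization fixes them ahead of time from a calibration dataset.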