Setup Environment
Hugging Face is a community for machine learning and artificial intelligence practitioners, with rich open-source tools and resources for developers. The Hugging Face Hub is the core platform for discovering and exploring models, datasets, Spaces, and other repositories. Hugging Face also maintains a number of tools and libraries to support developers:
- huggingface_hub - Python SDK for Hugging Face Hub interactions.
- llm-vscode - VS Code extension to streamline working with LLM backends.
- Spaces Dev Mode - (for Pro subscribers) connect VS Code to the Docker container in Spaces.
- Gradio - Python library for building ML & web application demos rapidly.
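As a quick illustration, the huggingface_hub SDK (and the Hub website itself) sits on top of the Hub's public REST API. Here is a minimal, dependency-free sketch of querying that API with only the standard library; the endpoint and query parameters follow the public Hub API docs, but verify them against the current documentation:

```python
import json
import urllib.parse
import urllib.request

HUB_API = "https://huggingface.co/api/models"

def build_search_url(search: str, limit: int = 5) -> str:
    """Build a Hub API URL that searches public models by keyword."""
    query = urllib.parse.urlencode({"search": search, "limit": limit})
    return f"{HUB_API}?{query}"

# Usage (makes a network call; the huggingface_hub SDK wraps this same API):
# with urllib.request.urlopen(build_search_url("bert")) as resp:
#     for model in json.load(resp):
#         print(model["id"])
```

In practice you would use `huggingface_hub` directly, but seeing the underlying request makes it clearer what the SDK is doing for you.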
To jumpstart development, I set up a dev container that has all required tools and dependencies pre-configured. Just launch the container in the cloud (with GitHub Codespaces) or on your local device (with Docker Desktop) and you're ready to code. Let's validate this.
1. Dev Environment
This repository is configured with a devcontainer.json file that pre-installs all Python dependencies listed in the requirements.txt file.
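For reference, a minimal devcontainer.json for a setup like this might look as follows. This is an illustrative sketch, not the repo's actual file - the name, image tag, and command are assumptions:

```json
{
  "name": "hf-learning-env",
  "image": "mcr.microsoft.com/devcontainers/python:3.11",
  "postCreateCommand": "pip install -r requirements.txt"
}
```

The `postCreateCommand` runs once after the container is built, which is what makes the Python dependencies available the first time you open a terminal.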
1.1 | Launch Dev Container:
To get started, fork the repo to your profile, then pick one of these two options to launch your development environment:
- Use GitHub Codespaces - launch the dev container in the cloud by selecting Code > Codespaces > Create New Codespace from your fork. This launches a Codespaces session in a browser tab.
- Use Docker Desktop - clone the forked repo to your local device and open it in VS Code. Launch Docker Desktop and select Reopen in Container in VS Code (from the prompt or the command palette).
1.2 | Validate Setup:
You should have a Visual Studio Code IDE connected to a runtime that has all required tools and dependencies pre-installed. Open the terminal in VS Code and run a few quick checks (for example, the Python and pip versions) to confirm.
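The repo's exact check commands aren't reproduced here; one quick, library-agnostic check is to verify that the expected packages are importable. A minimal sketch - the package list is illustrative, so match it to requirements.txt:

```python
import importlib.util

def check_packages(packages):
    """Map each package name to True if it is importable, else False."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}

# Illustrative list - adjust to match requirements.txt
for pkg, ok in check_packages(["huggingface_hub", "gradio", "datasets"]).items():
    print(f"{pkg}: {'OK' if ok else 'missing'}")
```

Using `find_spec` avoids actually importing each package, so the check stays fast even for heavy libraries.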
1.3 | Set Env Variables:
Set environment variables during initial setup by copying the .env.sample file to .env:
```bash
cp .env.sample .env
```
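Projects typically load the resulting .env file with a helper such as python-dotenv. If you want to see what that involves, here is a dependency-free sketch of the same idea (simplified: no quoting or export handling):

```python
import os
from pathlib import Path

def load_env(path=".env"):
    """Load KEY=VALUE lines from a .env file into os.environ.

    Blank lines and '#' comments are skipped, and variables that are
    already set are not overwritten. (Simplified stand-in for python-dotenv.)
    """
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())
```

The `setdefault` call is the key design choice: values already exported in the shell (for example, secrets injected by Codespaces) take precedence over the file.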
2. Learning Resources
Once the development environment is ready, we can start using it to explore various learning resources, including:
- Open-Source AI Cookbook - recipes for familiar AI tasks and workflows
- Tokenizers - fast, SOTA tokenization for NLP
- Datasets - library to access and share datasets
- Gradio - build applications in a few lines of code
- Inference API - experiment with 200K+ models on the serverless tier
- Evaluate - assess and report model performance
- Distilabel - synthesize data for AI and add feedback
- Tasks - understand the task taxonomy for inference
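To make the Inference API entry above concrete, here is a standard-library sketch of a serverless call. The endpoint shape follows the public docs, but the model id (gpt2), payload, and HF_TOKEN variable are illustrative assumptions - check the current documentation and use your own token:

```python
import json
import os
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/{model_id}"

def build_request(model_id, payload, token):
    """Construct a POST request for the serverless Inference API."""
    return urllib.request.Request(
        API_URL.format(model_id=model_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

# Usage (network call; requires a valid token, e.g. from your .env file):
# req = build_request("gpt2", {"inputs": "Hello"}, os.environ["HF_TOKEN"])
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

In day-to-day use you would reach for the huggingface_hub client instead, but the raw request shows exactly what travels over the wire: a bearer token header and a JSON payload.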