Setup Environment
Hugging Face is a community for machine learning and artificial intelligence practitioners, with rich open-source tools and resources for developers. The Hugging Face Hub is the core platform for discovering and exploring models, datasets, Spaces, and other repositories. Hugging Face also maintains a number of tools and libraries to support developers:
- huggingface_hub - Python SDK for Hugging Face Hub interactions.
- llm-vscode - VS Code extension to streamline working with LLM backends.
- Spaces Dev Mode - (for Pro subscribers) connect VS Code to the Docker container in Spaces.
- Gradio - Python library for building ML & web application demos rapidly.
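As a quick illustration, the huggingface_hub SDK (and the Hub website itself) sits on top of the Hub's public REST API. Here is a minimal, dependency-free sketch of querying that API with only the standard library; the endpoint and query parameters follow the public Hub API docs, but verify them against the current documentation:

```python
import json
import urllib.parse
import urllib.request

HUB_API = "https://huggingface.co/api/models"

def build_search_url(search: str, limit: int = 5) -> str:
    """Build a Hub API URL that searches public models by keyword."""
    query = urllib.parse.urlencode({"search": search, "limit": limit})
    return f"{HUB_API}?{query}"

# Usage (makes a network call; the huggingface_hub SDK wraps this same API):
# with urllib.request.urlopen(build_search_url("bert")) as resp:
#     for model in json.load(resp):
#         print(model["id"])
```

In practice you would use `huggingface_hub` directly, but seeing the underlying request makes it clearer what the SDK is doing for you.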
To jumpstart development, I set up a dev container that has all required tools and dependencies pre-configured. Just launch the container in the cloud (with GitHub Codespaces) or on your local device (with Docker Desktop) and you're ready to code. Let's validate this.
1. Dev Environment
This repository is configured with a devcontainer.json file that pre-installs all Python dependencies listed in the requirements.txt file.
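For reference, a minimal devcontainer.json for a setup like this might look as follows. This is an illustrative sketch, not the repo's actual file - the name, image tag, and command are assumptions:

```json
{
  "name": "hf-learning-env",
  "image": "mcr.microsoft.com/devcontainers/python:3.11",
  "postCreateCommand": "pip install -r requirements.txt"
}
```

The `postCreateCommand` runs once after the container is built, which is what makes the Python dependencies available the first time you open a terminal.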
1.1 | Launch Dev Container:
To get started, fork the repo to your profile, then pick one of these two options to launch your development environment:
- Use GitHub Codespaces - launch the dev container in the cloud by selecting Code > Codespaces > Create New Codespace from your fork. This launches a Codespaces session in a browser tab.
- Use Docker Desktop - clone the forked repo to your local device and open it in VS Code. Launch Docker Desktop and select Reopen in Container in VS Code (from the prompt or the command palette).
1.2 | Validate Setup:
You should have a Visual Studio Code IDE connected to a runtime that has all required tools and dependencies pre-installed. Open the terminal in VS Code and run a few quick checks (for example, the Python and pip versions) to confirm.
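The repo's exact check commands aren't reproduced here; one quick, library-agnostic check is to verify that the expected packages are importable. A minimal sketch - the package list is illustrative, so match it to requirements.txt:

```python
import importlib.util

def check_packages(packages):
    """Map each package name to True if it is importable, else False."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}

# Illustrative list - adjust to match requirements.txt
for pkg, ok in check_packages(["huggingface_hub", "gradio", "datasets"]).items():
    print(f"{pkg}: {'OK' if ok else 'missing'}")
```

Using `find_spec` avoids actually importing each package, so the check stays fast even for heavy libraries.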
1.3 | Set Env Variables:
Set environment variables during initial setup by copying the .env.sample file to .env:
```bash
cp .env.sample .env
```
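Projects typically load the resulting .env file with a helper such as python-dotenv. If you want to see what that involves, here is a dependency-free sketch of the same idea (simplified: no quoting or export handling):

```python
import os
from pathlib import Path

def load_env(path=".env"):
    """Load KEY=VALUE lines from a .env file into os.environ.

    Blank lines and '#' comments are skipped, and variables that are
    already set are not overwritten. (Simplified stand-in for python-dotenv.)
    """
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())
```

The `setdefault` call is the key design choice: values already exported in the shell (for example, secrets injected by Codespaces) take precedence over the file.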
2. Learning Resources
Once the development environment is ready, we can start using it to explore various learning resources, including:
- Open-Source AI Cookbook - recipes for familiar AI tasks and workflows
- Tokenizers - fast, SOTA tokenization for NLP
- Datasets - library to access and share datasets
- Gradio - build applications in a few lines of code
- Inference API - experiment with 200K+ models on the serverless tier
- Evaluate - assess and report model performance
- Distilabel - synthesize data for AI and add feedback
- Tasks - understand the task taxonomy for inference
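To make the Inference API entry above concrete, here is a standard-library sketch of a serverless call. The endpoint shape follows the public docs, but the model id (gpt2), payload, and HF_TOKEN variable are illustrative assumptions - check the current documentation and use your own token:

```python
import json
import os
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/{model_id}"

def build_request(model_id, payload, token):
    """Construct a POST request for the serverless Inference API."""
    return urllib.request.Request(
        API_URL.format(model_id=model_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

# Usage (network call; requires a valid token, e.g. from your .env file):
# req = build_request("gpt2", {"inputs": "Hello"}, os.environ["HF_TOKEN"])
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

In day-to-day use you would reach for the huggingface_hub client instead, but the raw request shows exactly what travels over the wire: a bearer token header and a JSON payload.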