Part 1: Setting Up the Foundation
Project Structure and Dependencies
Let’s start by creating our project structure and understanding why each component is essential.
# Create project directory
mkdir alexai-assistant
cd alexai-assistant
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install core dependencies
pip install "pydantic-ai[openai]" pydantic-settings ollama chromadb \
    whisper-cpp-python pyttsx3 pytesseract pillow fastapi uvicorn \
    asyncio-mqtt sqlalchemy alembic
Why Each Dependency?
- pydantic-ai: Type-safe AI framework that validates inputs/outputs
- pydantic-settings: Loads configuration values from environment variables and .env files
- ollama: Official Python client for talking to local Ollama models
- chromadb: Vector database for RAG and memory
- whisper-cpp-python: Fast, local speech-to-text
- pyttsx3: Text-to-speech synthesis
- pytesseract: OCR for image text extraction
- fastapi: Web framework for API and MCP server
- asyncio-mqtt: Async MQTT client that carries the A2A (agent-to-agent) communication protocol
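Before writing any application code, it is worth confirming that the install above succeeded and that Ollama is running locally. The snippet below is a throwaway sanity check, not part of the assistant itself; the check_setup.py name is illustrative, and the Ollama endpoint is an assumption that matches the default used in config.py later.
# check_setup.py: optional sanity check, not part of AlexAI itself
from importlib import metadata
from urllib.request import urlopen
from urllib.error import URLError

# Distribution names mirror the pip install command above
DISTRIBUTIONS = [
    "pydantic-ai", "pydantic-settings", "ollama", "chromadb",
    "whisper-cpp-python", "pyttsx3", "pytesseract", "pillow",
    "fastapi", "uvicorn", "asyncio-mqtt", "sqlalchemy", "alembic",
]

for dist in DISTRIBUTIONS:
    try:
        print(f"{dist:<20} {metadata.version(dist)}")
    except metadata.PackageNotFoundError:
        print(f"{dist:<20} MISSING: re-run the pip install step")

# Ollama serves a plain HTTP endpoint on its default port; if this fails,
# start it with `ollama serve` before continuing.
try:
    with urlopen("http://localhost:11434", timeout=2) as resp:
        print("Ollama reachable, HTTP status:", resp.status)
except (URLError, OSError):
    print("Ollama not reachable on localhost:11434")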
Core Configuration
# config.py
# Note: BaseSettings moved out of pydantic core in v2; it now lives in the
# pydantic-settings package installed above.
from pydantic_settings import BaseSettings, SettingsConfigDict
from typing import Optional, List
import os

class Config(BaseSettings):
    # Read overrides from a local .env file and the process environment
    model_config = SettingsConfigDict(env_file=".env")

    # Ollama Configuration
    OLLAMA_BASE_URL: str = "http://localhost:11434"
    PRIMARY_MODEL: str = "llama3.1:8b"
    EMBEDDING_MODEL: str = "nomic-embed-text"
    VISION_MODEL: str = "llava:7b"

    # Vector Database
    CHROMA_PERSIST_DIR: str = "./data/chroma_db"

    # Memory Configuration
    MEMORY_DB_URL: str = "sqlite:///./data/memory.db"
    MAX_MEMORY_TOKENS: int = 4000

    # Voice Configuration
    STT_MODEL: str = "base"  # Whisper model size
    TTS_VOICE_RATE: int = 200

    # MCP Configuration
    MCP_SERVER_PORT: int = 8000

    # A2A Protocol
    MQTT_BROKER: str = "localhost"
    MQTT_PORT: int = 1883
    AGENT_ID: str = "alexai-001"

config = Config()
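With that in place, any other module imports the shared config instance instead of re-reading environment variables. A minimal usage sketch (the main.py name is just illustrative):
# main.py: hypothetical consumer of the shared settings object
from config import config

print(config.PRIMARY_MODEL)       # "llama3.1:8b" unless overridden in .env or the environment
print(config.CHROMA_PERSIST_DIR)  # where ChromaDB will persist its collections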
Why this configuration structure?
- Centralized settings: Easy to modify behavior without code changes
- Environment variables: Secure credential management
- Type hints: Catch configuration errors early (demonstrated below)
- Defaults: Works out of the box for development
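To illustrate the last two points: because every field is typed, a malformed value coming from the environment fails loudly at startup instead of surfacing later inside the agent. A throwaway demonstration (the invalid value is deliberate):
# validate_config_demo.py: throwaway demo, not part of the assistant
import os

os.environ["MQTT_PORT"] = "not-a-port"    # deliberately invalid: MQTT_PORT is typed as int

try:
    from config import config            # module-level Config() runs during this import
except Exception as exc:                  # pydantic raises a ValidationError here
    print(f"Configuration error caught at startup:\n{exc}")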