HomeAPI PlatformScout
Personal AI agent for Mac

Scout

Frontier Agent On Your MacBook

FREE LOCAL-FIRST AGENTS

Utilize Your Hardware for Free AI

Scout is a local-first agent that runs air-gapped on your MacBook — chats, memories, and personal data stay on your machine instead of someone else's cloud. It gives you a chat workspace for research, writing, filesystem utilization, and much more, with different apps for specific use cases.

Scout runs on the same engine as the Courier API Platform — tool calling, flex models, analytics, OpenAI-compatible endpoints — but ships with a polished chat interface, OS workspace, and a personal-use license. The full API surface is there if you need it.

What Scout Does

An Agent That Lives On Your Mac

Chat, research, automate files, test APIs, and extend with MCP apps — all from a macOS-style workspace powered by Gemma 4 running locally on your Mac, or by Courier Cloud for the heavy models when you don't have the memory.

Agentic Chat Interface

Streaming conversations with tool calls, model picker, and modes for research, writing, and deep workflows.

Pathfinder Semantic Search

Indexes your home folder with embeddings and descriptions so Scout can find documents, code, and notes instantly.

Shadow Shell

Our novel sandboxing technology that protects your filesystem from agent mistakes. Everything the OS agent changes is staged in a Shadow Shell for your approval.

OS Agent

Scout can operate on your OS, navigating your filesystem, reading and writing to your desktop as a powerful agent — all safely thanks to Shadow.

MCPost API Testing

Import Postman or Insomnia collections and let Scout run, modify, and chain API requests agent-first.

MCP App Ecosystem

Connect MCP servers as first-class Scout apps alongside Assistant, OS, and Settings modes.

Settings Agent

A dedicated agent that can help you configure anything within Courier OS — models, integrations, preferences, and workspace setup.

Recommended Scout Stack

  • Gemma 4 26B A4B — main Scout agent for chat, tools, and workflows
  • Gemma 4 E2B — lightweight sub-agent for routing and fast tasks
  • Qwen3 Embedding 0.6B — Pathfinder semantic search index

Local or Cloud, One Click

Scout always runs locally on your Mac, but the models it calls can run either way. Use Gemma locally when your Mac has the memory, or route inference through Courier Cloud's free tier — same agent experience either way.

Scout modes include Assistant for everyday chat, OS for filesystem automation, Settings for configuration help, and MCP apps you connect yourself.

Find the Right Mac for Your Use Case
Start from a preset or build your own flex stack, then see which Apple Silicon Mac fits your models.
Model Count

Select multiple models for different tasks (e.g., coding, vision, and general chat). As your user base grows, you will see increased latency and degradation in user-experience if multiple models are not utilized.

1-3 Models:Focused Setup
4-10 Models:Versatile Setup
10+ Models:Full Ecosystem
Throughput - Quantization & VRAM

Performance is determined by model quantization and available VRAM (Video Memory). Reasoning diminishes as quantization drops, possibly leading to hallucinations and other unintended side-effects.

  • 4-bit: Maximum speed, lower VRAM
  • 8-bit: Balanced speed and logic
  • 16-bit: Maximum reasoning capability
Model Size - Parameters

Parameters are the internal variables the AI learns during training. A 30GB model has more "knowledge" than an 8GB model.

Lite (1GB - 14GB):Fast, Efficient
Balanced (15GB - 50GB):Versatile, Strong
Frontier (70GB+):Advanced Reasoning
Context Window - Memory

The context window is the amount of text (tokens) the AI can "remember" during a conversation or process in a single request.

32k tokens:~50 pages of text
128k tokens:Full book length
1M+ tokens:Entire codebases

Dynamic Memory Management

Courier offers 2 model serving options to maximize memory efficiency, Flex and Static

Flex Models

Flex models load into memory upon request and unload after 5 minutes of inactivity.

• Enables running multiple large models on limited hardware

• Dynamic memory allocation

• Only the largest flex model counts towards VRAM requirements

Static Models

Static models stay loaded in memory at all times, providing instant response.

• Instant availability, no load time

• Continuous memory occupancy

• Each static model adds directly to total VRAM requirements

Feeling overwhelmed or unsure what to choose?

Let us help you figure it out.

Start With a Use Case

Pre-configured flex stacks — memory is calculated from the largest flex model loaded at once.

Scout
Chat agent, lightweight sub-agent, and embeddings for Pathfinder
All models use flex APIs
Coding Agent
Planner + implementer stack for agentic coding workflows
All models use flex APIs
Production Server
General-purpose production API with Gemma 4
All models use flex APIs

What do you need AI for?

Select the primary functions for your self-hosted AI setup

Agent
Tool-calling agents, chat, and multi-modal workflows
Image Generation
Generate images from text prompts
Embeddings/RAG
Semantic search, retrieval, and memory

Select Your Models

Choose the AI models to include in your platform (Filtered by your use cases)

No models selected. Add models to your platform to continue.

Hardware Recommendation

Infrastructure Requirements
Based on your model selection
Total VRAM Required7 GB
Recommended HardwareMac Mini (16GB)

Need multi-device clustering or a custom setup? Book a free consultation

Ready to ask Scout?

Deploy Gemma, open Courier OS, and start chatting with an agent that understands your files, tools, and APIs.