# open-jet Docs
open-jet is a terminal-first agent for Jetson and Linux edge devices. It runs quantized models locally via llama.cpp, supports guarded tool calls, and manages context within RAM/token limits.
## Install

Prerequisites:

- `llama-server` from `llama.cpp` available on `PATH`
- A local `.gguf` model file, or `ollama` for the pull workflow
- Python 3.10+

```sh
python -m venv .venv
source .venv/bin/activate
python -m pip install -e .
```

## Quickstart
- Run `open-jet`.
- Complete setup: hardware profile, model source, context size, and GPU layers.
- Chat normally, or load file context using `@file`, `@[path with spaces]`, or `/load <path>`.
- Approve or deny state-changing tool calls with `y`/`n`.

```sh
open-jet
open-jet --setup
```

## CLI Commands
| Command | Description |
|---|---|
| `open-jet` | Start the TUI. On first run, the setup wizard writes `config.yaml`. |
| `open-jet --setup` | Force the setup wizard, then restart the runtime with the updated config. |
## Slash Commands
| Slash command | Aliases | Description |
|---|---|---|
| `/help` | `commands`, `?` | Show command help |
| `/exit` | `quit` | Quit the app |
| `/clear` | `reset` | Clear chat and restart `llama-server` (flush KV cache) |
| `/clear-chat` | `clear_messages` | Clear chat only; keep current server/KV state |
| `/status` | `stats` | Show runtime memory/context status |
| `/condense` | - | Manually condense older context |
| `/load <path>` | `add` | Load a file into context |
| `/resume` | - | Load previous session state into chat |
| `/setup` | - | Open the setup wizard and restart the runtime |
| `/util [show\|hide\|toggle\|status]` | `usage` | Show/hide the utilization line |
## Agent Command Surface
The agent can call these runtime tools directly. This is the full callable command surface exposed to the model.
| Tool | Arguments | Behavior | Approval required |
|---|---|---|---|
| `shell` | `{ command: string, timeout_seconds?: int }` | Run a shell command in a subprocess; default timeout 60s; a timeout returns exit code 124 | Yes |
| `read_file` | `{ path: string }` | Read file text (up to a 2MB safety cap) | No |
| `write_file` | `{ path: string, content: string }` | Create/overwrite a file; creates parent dirs if needed | Yes |
| `load_file` | `{ path: string, max_tokens?: int }` | Load a text/code file into prompt context with RAM and token budget clamping | No |
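The approval-gated dispatch described above can be sketched as follows. This is an illustrative model of the behavior in the table, not open-jet's actual source: the names `run_tool` and `APPROVAL_REQUIRED` are hypothetical.

```python
import os
import subprocess

# Illustrative only: tools that need a y/n approval before running,
# mirroring the "Approval required" column above.
APPROVAL_REQUIRED = {"shell", "write_file"}

def run_tool(name: str, args: dict, approve) -> str:
    """Dispatch one tool call; `approve` is a callable returning True/False."""
    if name in APPROVAL_REQUIRED and not approve(name, args):
        return "denied by user"
    if name == "shell":
        timeout = args.get("timeout_seconds", 60)  # default 60s per the table
        try:
            proc = subprocess.run(
                args["command"], shell=True, capture_output=True,
                text=True, timeout=timeout,
            )
            return proc.stdout or f"exit code {proc.returncode}"
        except subprocess.TimeoutExpired:
            return "exit code 124"  # timeout convention from the table
    if name == "read_file":
        with open(args["path"], "r", encoding="utf-8") as f:
            return f.read(2 * 1024 * 1024)  # 2MB safety cap
    if name == "write_file":
        # Create parent dirs if needed, per the table.
        os.makedirs(os.path.dirname(args["path"]) or ".", exist_ok=True)
        with open(args["path"], "w", encoding="utf-8") as f:
            f.write(args["content"])
        return "ok"
    raise ValueError(f"unknown tool: {name}")
```

The key design point is that the approval check runs before any side effect, so a denied call never touches the subprocess or filesystem.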
## File Mentions and Tab Autocomplete
Input behavior for file loading and command discovery follows the runtime usage flow documented in the project README.
| Pattern | Behavior |
|---|---|
| `@file` | Loads the referenced file's content into context automatically. |
| `@[path with spaces]` | Loads files whose paths contain spaces. |
| Tab after `@` | Autocompletes file paths from the current workspace. |
| `/` then Up/Down + Tab | Shows slash command suggestions, cycles options, and autocompletes the selected command. |
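A minimal sketch of how the two mention forms above could be extracted from input text. The regex and function name are assumptions for illustration; open-jet's real tokenizer may differ.

```python
import re

# Hypothetical parser: `@[...]` captures bracketed paths with spaces,
# otherwise `@` grabs up to the next whitespace.
MENTION_RE = re.compile(r"@\[([^\]]+)\]|@(\S+)")

def extract_mentions(text: str) -> list[str]:
    """Return file paths referenced via @file or @[path with spaces]."""
    return [m.group(1) or m.group(2) for m in MENTION_RE.finditer(text)]
```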
## Interaction Controls
| Area | Keys | Result |
|---|---|---|
| Tool approvals | `y` / `n`, Left/Right, Enter | Approve or deny a proposed state-changing tool call |
| Setup wizard | Up/Down, Tab/Enter, Shift+Tab | Navigate setup steps; save the final selection |
| Prompt completion | Type `/` or `@`, then Up/Down + Tab | Cycle suggestions and apply the selected completion |
| Generation stop | Esc | Stop the current generation |
| Quit | Ctrl+C or `/exit` | Exit the app and persist session state |
## Runtime Functionality
| Capability | Details |
|---|---|
| File mentions | `@file` and `@[path with spaces]` auto-load file content into context |
| `@` path autocomplete | Tab autocompletes `@` file paths from the current workspace |
| Slash autocomplete | Typing `/` shows command suggestions; Up/Down cycles and Tab applies |
| Token budget display | Live tokens: context + draft, prompt budget, remaining |
| Memory-aware condense | Auto-condenses context when token budget or RAM pressure is hit |
| Utilization line | CPU, memory, TPS, power/battery status; toggled via `/util` |
| Session logs | `events.jsonl` + `metrics.jsonl` in `session_logs/` |
| Session resume | Persisted `session_state.json` and `/resume` restore |
| Model source workflows | Local GGUF or Ollama model selection + pull path resolution |
| Safety defaults | Mutation tools gated behind explicit approval with a preview |
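The memory-aware condense trigger can be sketched as a simple pressure check. The function name, default reserve, and threshold handling here are assumptions for illustration, not open-jet's actual logic:

```python
def should_condense(used_tokens, context_window, reserved_tokens=512,
                    available_mb=None, min_available_mb=None):
    """Illustrative trigger: condense when the prompt budget is nearly
    exhausted, or when free RAM falls below a configured floor.
    (Values are assumptions, not open-jet's real thresholds.)"""
    token_pressure = used_tokens >= context_window - reserved_tokens
    ram_pressure = (min_available_mb is not None and
                    available_mb is not None and
                    available_mb < min_available_mb)
    return token_pressure or ram_pressure
```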
## Model Guidance
Setup recommends model tags from curated parameter bands based on detected or manually selected hardware.
| Recommendation band | Suggested Ollama tags |
|---|---|
| <= 2B params | `qwen2.5:1.5b`, `deepseek-r1:1.5b`, `gemma2:2b` |
| <= 4B params | `qwen2.5:3b`, `qwen2.5:3b-instruct`, `gemma2:2b` |
| <= 8B params | `qwen2.5:7b`, `mistral:7b`, `deepseek-r1:7b` |
| <= 14B params | `qwen2.5:14b`, `deepseek-r1:14b`, `gemma2:9b` |
| <= 32B params | `qwen2.5:32b`, `qwen2.5-coder:32b`, `gemma2:27b` |
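One plausible mapping from detected RAM to the bands above, roughly following the hardware profiles listed below. The RAM thresholds here are assumptions for illustration, not open-jet's actual heuristics:

```python
# Hypothetical (RAM-in-GB ceiling, band) pairs; first match wins.
BANDS = [
    (4, "<= 2B params"),
    (8, "<= 4B params"),
    (16, "<= 8B params"),
    (32, "<= 14B params"),
    (64, "<= 32B params"),
]

def recommend_band(ram_gb: float) -> str:
    """Pick the smallest band whose RAM ceiling fits the device."""
    for max_ram, band in BANDS:
        if ram_gb <= max_ram:
            return band
    return "<= 32B params"  # fall back to the largest band
```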
## Hardware Profiles
Manual setup includes these Jetson profile overrides:
- Jetson Nano (4GB RAM)
- Jetson Xavier NX (8GB RAM)
- Jetson Orin Nano (8GB RAM)
- Jetson Orin NX (16GB RAM)
- Jetson AGX Orin (32GB RAM)
- Jetson AGX Orin (64GB RAM)
## Configuration
Runtime config is stored in config.yaml. Key controls include model selection, context window, memory guard thresholds, logging, and session state behavior.
```yaml
context_window_tokens: 4096
device: cuda
gpu_layers: 20
model_source: local
model: /absolute/path/to/model.gguf
ollama_model: gemma2:2b
memory_guard:
  context_reserved_tokens: null
  min_prompt_tokens: 256
  min_available_mb: null
  max_used_percent: null
  check_interval_chunks: 16
  condense_target_tokens: 900
  keep_last_messages: 6
logging:
  enabled: true
  directory: session_logs
  label: open-jet
  metrics_interval_seconds: 5
state:
  enabled: true
  auto_resume: false
  path: session_state.json
```

## Logging and Session State
- Structured logs are written to `session_logs/` by default.
- `*.events.jsonl`: user/assistant messages, tool requests/results, approvals, errors.
- `*.metrics.jsonl`: timestamped CPU, load, memory, and process samples.
- Session state persists in `session_state.json` and can be restored with `/resume`.
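Since the logs are JSON Lines (one JSON object per line), they are easy to post-process. A small reader sketch, assuming only the line-per-object format described above:

```python
import json

def read_events(path):
    """Parse a *.events.jsonl session log into a list of dicts.
    Assumes one JSON object per line; blank lines are skipped."""
    events = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                events.append(json.loads(line))
    return events
```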
## Edge Constraints and Safety
- Context loading accepts only text/code files.
- Large or over-budget file loads are truncated with explicit markers.
- Runtime automatically condenses context when token or memory pressure exceeds configured thresholds.
- `load_file` token budgets are clamped against the remaining prompt budget at runtime.
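The clamp itself reduces to taking the minimum against whatever prompt budget remains. A minimal sketch, with a hypothetical function name:

```python
def clamp_load_tokens(requested: int, remaining_prompt_budget: int) -> int:
    """Never let a load_file request exceed the remaining prompt budget;
    a non-positive budget yields 0 (nothing is loaded)."""
    return min(requested, max(remaining_prompt_budget, 0))
```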