open-jet

agentic tui for jetson & edge devices

Run LLMs on-device. Agentic tools. Fully offline.

A terminal-native agent that runs quantized models on NVIDIA Jetson and Linux devices via llama.cpp — shell execution, file operations, memory-aware context, and structured logging, all without a network connection.

Why not cloud? Keep data local for privacy, cut round-trip latency, and stay reliable even when the network is down.

Loading Qwen2.5-7B.Q4_K_M.gguf...
Ready.

> Check the system load and summarize it.

[agent] Tool request: shell("uptime") → approved
The system has a load average of 1.45. It looks like it has been running normally without excessive CPU pressure.

>

tokens: 284/6144 | prompt <= 4915 | remaining: 4631

util: cpu 14.2% | mem 61.3% (9.8/16.0 GB) | tps 12.4 | pwr 8.1W [min 4.2W max 15.0W]
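The status line above is plain budget arithmetic: tokens already used count against the prompt budget, and the rest of the window is reserved for generation. A minimal sketch (the function name and formatting are illustrative, not the project's internals):

```python
def context_status(used: int, prompt_budget: int, ctx_total: int) -> str:
    """Format a context-budget status line like the demo above.

    `prompt_budget` is the share of the context window reserved for the
    prompt; whatever is left of it is what you can still type.
    """
    remaining = prompt_budget - used
    return f"tokens: {used}/{ctx_total} | prompt <= {prompt_budget} | remaining: {remaining}"

print(context_status(284, 4915, 6144))
# tokens: 284/6144 | prompt <= 4915 | remaining: 4631
```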

process

How it works

Run the setup wizard

Auto-detects your device, configures GPU layers, and recommends a model sized for your RAM.
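The wizard's model recommendation can be approximated by a RAM rule of thumb. A sketch under stated assumptions: the thresholds, overhead estimate, and model names below are illustrative, not the wizard's actual table:

```python
def recommend_model(ram_gb: float) -> str:
    """Pick a Q4_K_M GGUF that fits after reserving room for OS + KV cache.

    The 2 GB reserve and the size thresholds are illustrative assumptions.
    """
    budget = ram_gb - 2.0  # leave ~2 GB for the OS and KV cache
    if budget >= 5.0:
        return "Qwen2.5-7B.Q4_K_M.gguf"    # ~4.7 GB on disk
    if budget >= 2.5:
        return "Qwen2.5-3B.Q4_K_M.gguf"    # ~2.0 GB on disk
    return "Qwen2.5-1.5B.Q4_K_M.gguf"      # ~1.1 GB on disk

print(recommend_model(16.0))  # a 16 GB device fits the 7B quant
```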

Load a local model

Point to a local GGUF file or pull from Ollama. Inference stays on-device via llama-server.
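Once llama-server is running, it exposes an OpenAI-compatible chat endpoint on localhost (by default `POST http://127.0.0.1:8080/v1/chat/completions`). A minimal request body, shown without the network call; the sampling values are arbitrary examples:

```python
import json

# Request body for llama-server's OpenAI-compatible chat endpoint.
payload = {
    "messages": [
        {"role": "user", "content": "Check the system load and summarize it."}
    ],
    "temperature": 0.2,   # example value, not a recommended default
    "max_tokens": 256,
}
print(json.dumps(payload, indent=2))
```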

Chat with tools

Run shell commands and file operations. Anything that mutates state requires operator approval.

features

Built for constrained hardware

Fully offline inference

Zero network dependency after initial model setup. Runs via llama.cpp.

4 agentic tools

shell, read_file, write_file, and load_file — LLM-driven, gated by you.

Approval gates

State changes require explicit y/n approval with a compact execution preview.
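An approval gate is just a blocking prompt between the model's tool request and its execution. A minimal sketch, with illustrative names (the real agent's preview and prompt format may differ):

```python
import subprocess

def run_with_approval(cmd: str, ask=input):
    """Show a compact preview and execute only on an explicit 'y'."""
    answer = ask(f"[agent] Tool request: shell({cmd!r}) -> run? [y/n] ")
    if answer.strip().lower() != "y":
        return None  # anything other than 'y' is treated as a refusal
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

# Non-interactive demo: auto-deny, so nothing executes.
print(run_with_approval("uptime", ask=lambda _: "n"))  # None
```

Injecting `ask` keeps the gate testable without a terminal; the agent would pass its TUI prompt instead.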

Memory-aware context

Auto-condenses conversation when system RAM or token limits approach.
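The trigger for condensing can be a simple watermark check on both budgets. A sketch; the 85% watermark is an illustrative assumption, not the project's threshold:

```python
def should_condense(tokens_used: int, token_limit: int,
                    ram_used_gb: float, ram_total_gb: float,
                    watermark: float = 0.85) -> bool:
    """Trigger summarization when either the token budget or system RAM
    crosses the watermark."""
    return (tokens_used / token_limit >= watermark
            or ram_used_gb / ram_total_gb >= watermark)

print(should_condense(5300, 6144, 9.8, 16.0))  # token ratio ~0.86 -> True
print(should_condense(284, 6144, 9.8, 16.0))   # both well below -> False
```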

@file mentions & slash commands

@path loads files. Context-aware commands like /status, /condense, /load.
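Input routing for this kind of syntax is straightforward: leading `/` means a command, `@`-prefixed tokens are file loads, anything else is chat. A sketch of the idea, not the project's actual parser:

```python
def classify_input(line: str):
    """Route a prompt line: slash command, @file mentions, or plain chat."""
    line = line.strip()
    if line.startswith("/"):
        return ("command", line.split()[0])
    mentions = [tok[1:] for tok in line.split() if tok.startswith("@")]
    if mentions:
        return ("load", mentions)
    return ("chat", line)

print(classify_input("/status"))              # ('command', '/status')
print(classify_input("summarize @notes.md"))  # ('load', ['notes.md'])
```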

Structured session logging

JSONL event logs (prompts, tool calls) and metrics logs (CPU, memory, load).
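JSONL means one JSON object per line, so session logs can be filtered with a few lines of stdlib Python. The field names below are illustrative, not the project's schema:

```python
import io
import json

# Stand-in for a session log file: one JSON object per line.
log = io.StringIO(
    '{"event": "prompt", "tokens": 284}\n'
    '{"event": "tool_call", "tool": "shell", "arg": "uptime"}\n'
    '{"event": "metric", "cpu": 14.2, "mem": 61.3}\n'
)

records = [json.loads(line) for line in log]
# Pull one event type out of the mixed stream.
tool_calls = [r for r in records if r["event"] == "tool_call"]
print(tool_calls)
```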

use cases

Where it fits

NVIDIA Jetson devices

First-class support for Nano, Xavier NX, Orin Nano/NX, and AGX Orin.

Air-gapped & edge deployments

Fully local operation. No cloud transport, no API keys, no latency. All inference and logs stay on-device.

Developer workstations

Lightweight terminal agent for local code assistance and shell automation.

compatibility

Supported hardware & stack

Runtime: Python 3.10+ · Textual TUI
Inference: llama.cpp (llama-server)
Model format: GGUF (quantized)
Model sources: Local GGUF files · Ollama
Jetson targets: Nano · Xavier NX · Orin Nano/NX · AGX Orin
Host OS: Linux (CUDA or CPU-only)

deployment planner

What can your device run?

Device
16GB unified · 102.4GB/s · 25W TDP
Model
Qwen3 8B · Q4_K_M · 5GB on disk
Context length
KV cache: 147,456 bytes/tok × 4K tokens = 0.60GB (FP16, per model config.json)

Decode speed

11.7 tok/s: 102.4GB/s × 60% eff ÷ 5GB × 95% ctx. Varies with thermal state and power mode.

Memory

8.9GB headroom: 5GB weights + 0.6GB KV + 1.5GB OS = 7.1GB of 16GB.

Capability

Strong agentic baseline — code and tool use. Based on public benchmarks for Qwen3 8B; validate on your workload.
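The planner's figures follow from three back-of-envelope formulas, reproduced here with the numbers above (the 60% bandwidth efficiency, 95% context derate, and 1.5GB OS reserve are the planner's stated factors):

```python
# Inputs from the planner above.
bandwidth_gbs = 102.4     # memory bandwidth, GB/s
weights_gb   = 5.0        # Q4_K_M file size on disk
kv_bytes_tok = 147_456    # KV cache bytes per token (FP16)
ctx_tokens   = 4096       # 4K context
ram_gb       = 16.0
os_reserve   = 1.5        # rough OS/runtime overhead, GB

# KV cache: bytes per token x context length.
kv_gb = kv_bytes_tok * ctx_tokens / 1e9
# Decode speed: bandwidth-bound estimate, derated for efficiency and context.
tok_s = bandwidth_gbs * 0.60 / weights_gb * 0.95
# Memory headroom: what is left after weights, KV cache, and the OS.
headroom_gb = ram_gb - (weights_gb + kv_gb + os_reserve)

print(f"KV cache: {kv_gb:.2f} GB")     # 0.60 GB
print(f"Decode:   {tok_s:.1f} tok/s")  # 11.7 tok/s
print(f"Headroom: {headroom_gb:.1f} GB")
```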

get started

pip install, run the wizard, start chatting.

$ pip install open-jet