llm-here — one shared tool for "which LLM is reachable from this host"¶
Status: Design note. Not built. Consolidation proposal for when the triplicated LLM-provider-detection code across three sibling projects becomes a drag on maintenance.
The problem: three implementations of the same answer¶
As of research/grid phase 5, three separate codebases implement "detect available LLM tooling on this host, decide which to use, run a prompt through it":
| Lives in | Language | File | Scope |
|---|---|---|---|
| caloron-noether | Python | stages/phases/_llm.py | 4 CLIs + 3 APIs, concatenated into stage impls |
| agentspec | Python | src/agentspec/resolver/resolver.py | 7 runtimes + Vertex routing |
| noether-grid | Rust | crates/noether-engine/src/llm/cli_provider.rs | 4 CLIs (phase 5), ported from caloron's lessons |
All three answer variations of:
- "Is
claude/gemini/cursor-agent/opencodeon PATH?" - "Do I have
ANTHROPIC_API_KEY/OPENAI_API_KEY/ …?" - "If both, which do I prefer?"
- "How do I invoke the chosen one with a prompt and a timeout?"
- "How do I avoid stalling inside a sandboxed
$HOME?"
They've already drifted. caloron discovered the 25 s-under-Nix-30 s timeout cap before noether-grid did; noether-grid picked it up in phase 5d as a backport. agentspec knows about Vertex routing; the other two don't. The pattern will repeat.
The proposed tool¶
A single-purpose binary llm-here that both Python and Rust call as a subprocess. Reads / writes JSON on stdout/stdin — language-agnostic.
Commands¶
# 1. Detect: what's reachable on this host?
llm-here detect
# stdout (JSON):
# {
# "providers": [
# { "id": "claude-cli", "kind": "cli", "binary": "/usr/local/bin/claude",
# "provider": "anthropic", "model_default": "claude-desktop" },
# { "id": "openai-api", "kind": "api", "env": "OPENAI_API_KEY",
# "provider": "openai", "model_default": "gpt-4o" },
# ...
# ]
# }
# 2. Dispatch: run a prompt through a named provider.
llm-here run --provider claude-cli --timeout 25 <<EOF
Tell me a joke about Rust.
EOF
# stdout (JSON):
# { "ok": true, "text": "...", "provider_used": "claude-cli",
# "duration_ms": 1834, "error": null }
# 3. Auto-chain: try providers in fallback order, return first success.
llm-here run --auto <<EOF
...
EOF
# → same JSON shape, "provider_used" tells you which one won.
What it knows¶
Single source of truth for:
- Binary names per CLI (
claude,gemini,cursor-agent,opencode). - Argv templates per CLI (flag order, system-prompt flag, dangerous-permissions flag).
- Endpoint URLs per API provider (Anthropic, OpenAI, Gemini, Mistral, Vertex).
- Env-var names for API keys (
ANTHROPIC_API_KEY, etc.). - Sandbox detection: honours
NOETHER_LLM_SKIP_CLI/CALORON_LLM_SKIP_CLI/AGENTSPEC_LLM_SKIP_CLI(matches any of the three so callers can keep their existing env conventions). - Timeout policy: defaults to 25 s; callers override with
--timeout. - Fallback order: the caloron-settled chain (Claude CLI > Gemini CLI > Cursor CLI > OpenCode > Anthropic API > OpenAI API > Gemini API > mock).
What it deliberately doesn't do¶
- No state. Each invocation is independent. No caching of previous responses, no conversation history, no session tokens.
- No cost accounting. Callers (broker, agentspec runner, caloron phases) own their own cost ledger;
llm-herejust runs prompts. - No streaming. Single prompt → single completion. Streaming belongs in whatever consumes
llm-here's output, not inllm-here. - No agent-loop semantics. No tool-use, no multi-turn history. Anyone who wants that either layers it on top or stays on their own direct integration.
How each project would use it¶
caloron-noether¶
stages/phases/_llm.py shrinks from 243 lines to ~30:
import json, subprocess
def call_llm(prompt: str, timeout: int = 120) -> str | None:
try:
r = subprocess.run(
["llm-here", "run", "--auto", "--timeout", str(timeout)],
input=prompt, capture_output=True, text=True, timeout=timeout + 5,
)
except (subprocess.TimeoutExpired, FileNotFoundError):
return None
if r.returncode != 0:
return None
try:
out = json.loads(r.stdout)
except json.JSONDecodeError:
return None
return out.get("text") if out.get("ok") else None
Lessons caloron learned stay in llm-here. Stages concatenate 30 lines instead of 243. Tests still pass because the contract (prompt in, text out, or None) is unchanged.
agentspec¶
Resolver's PROVIDER_MAP + RUNTIME_BINARIES delegate capability detection to llm-here detect:
def _detect_runtimes() -> dict[str, bool]:
out = subprocess.check_output(["llm-here", "detect"], text=True)
return {p["id"]: True for p in json.loads(out)["providers"]}
Vertex AI routing (agentspec's unique piece) stays in the resolver — it's a runtime-selection concern, not a detection concern. llm-here answers the "what's reachable" question; agentspec still owns "what should this manifest run on".
noether-grid worker¶
probe_subscription_clis() becomes a single call:
fn probe_subscription_clis() -> Vec<LlmCapability> {
let out = Command::new("llm-here").arg("detect").output().ok()?;
let providers: Providers = serde_json::from_slice(&out.stdout).ok()?;
providers.into_iter()
.filter(|p| p.kind == "cli")
.map(|p| LlmCapability { /* map fields */ })
.collect()
}
And CliProvider::complete dispatches by llm-here run --provider <id> instead of open-coding the subprocess. The engine's LlmProvider trait stays; only the concrete impls move out.
Rough implementation cost¶
Ballpark:
- New crate:
noether-llm-here(Rust, static binary, ~500 LOC covering all 4 CLIs + 4 API providers + the JSON surface). - Replace caloron's
_llm.pybody with the shim above (~20 min + existing test fixtures keep passing). - Replace agentspec's detection with
llm-here detect(~30 min, careful about Vertex routing staying put). - Replace noether-engine's
cli_provider.rsimpls with subprocess calls (~40 min; providers.rs auto-detect chain gets simpler, not harder). - Total: one focused day, deletes ~1,200 LOC across the three repos, eliminates the known drift sources.
When to actually do this¶
Not yet. Triggers that would change that:
- A fifth project (probably the next noether-adjacent thing) needs the same detection logic. That's the
shame on methreshold. - A feature crosses sides — e.g. Vertex routing would be useful in caloron's stages — and re-implementing it a second time costs more than extraction.
- A bug bites twice — the same drift-induced bug has to be fixed once in each repo. Unification would have prevented the second fix.
None of these are true yet. Phase 5d backported caloron's lessons into Rust and that's bought us parity for now. llm-here is the right structural answer when one of these triggers fires; until then the triplication is cheaper than the consolidation work.
Packaging when we do build it¶
- Ships from the
noetherworkspace as a fourth binary alongsidenoether,noether-scheduler,noether-grid-broker,noether-grid-worker. Crates.io publish is free. - Binary is small (~5 MB) — both caloron and agentspec can vendor it per-machine or pull from the noether GitHub release.
- Versioning: the JSON-wire format is what downstream callers depend on. Semver the JSON schema, not the binary. Additive changes (new provider in the list) are minor bumps; removing a field is a major.
- Feature parity test: the extraction commit includes a cross-repo test that sends one prompt through each of
llm-here, caloron's old_llm.py, and agentspec's resolver, asserts identical stdout contents for a mock provider. Catches the "we broke something during consolidation" class of regression.
Alternatives considered and rejected¶
- Rust crate linked from caloron + agentspec via PyO3 bindings. Ties all three projects to the Rust build toolchain. Caloron and agentspec benefit from being pure Python for ops reasons. A subprocess boundary is looser coupling.
- Git submodule of a shared Python lib. Only helps Python callers. noether-grid is Rust; the worker would still need its own implementation.
- Move everything into
noether-engine::llmand have the Python projects spawnnoether run .... Wrong layer —noether-engineis a composition engine, not a generic "call an LLM" tool. Would bloat noether-engine's dependency graph with caller-specific concerns.
Meta¶
This doc exists because I almost rebuilt _llm.py in Rust from scratch before noticing caloron already had it. It's a reminder to look in the other projects first, and a placeholder the next agent (or future me) can point at when the drift becomes expensive.