Chat
no model · memory only

Lab settings

Local model (WebLLM · WebGPU)
Memory works without a model. Load one to get a conversational assistant (requires desktop Chrome or Edge). Tip: start with the smallest ⚡ model. The first load downloads it (~0.9 GB) and caches it in your browser, so later loads skip the download. Larger models give better answers and are also downloaded once, then cached.
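The desktop Chrome/Edge requirement comes down to WebGPU support. A minimal sketch of a capability check, assuming the standard `navigator.gpu.requestAdapter()` entry point. The function name `canRunLocalModel` and the two small interfaces are hypothetical, introduced only so the snippet compiles without extra type packages; this is not the lab's actual check.

```typescript
// Minimal structural types so the sketch compiles without @webgpu/types.
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Hypothetical helper: true only when WebGPU is exposed AND an adapter is available.
async function canRunLocalModel(nav: NavigatorLike): Promise<boolean> {
  if (!nav.gpu) return false; // browser does not expose WebGPU at all
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    return false; // treat any failure as "no usable GPU"
  }
}
```

In a browser this would be called as `canRunLocalModel(navigator as NavigatorLike)`. When it resolves to `false`, the memory-only mode described above remains usable.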
Generation
0.40
512
8
Memory mode
Agent: the model calls neuron-db tools (recall / recall_value / remember / note…) in a loop, the same way the MCP server does; watch the calls in the memory pane. Works best with 3B+ or reasoning models.
Direct: a single deterministic recall per turn (most reliable on tiny models).
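The difference between the two modes can be sketched as follows. This is an illustration under stated assumptions, not neuron-db's implementation: the `Model` type, the in-memory `Map` store, the two-tool subset, and the `maxSteps` cap are all hypothetical stand-ins.

```typescript
type Memory = Map<string, string>;

// A tool call requested by the model, or a final answer that ends the loop.
type ModelStep =
  | { kind: "tool"; name: "recall" | "remember"; key: string; value?: string }
  | { kind: "answer"; text: string };

// Hypothetical model interface: given the transcript so far, return the next step.
type Model = (transcript: string[]) => ModelStep;

// Executes one tool call against the store and returns its textual result.
function runTool(memory: Memory, step: Extract<ModelStep, { kind: "tool" }>): string {
  if (step.name === "remember") {
    memory.set(step.key, step.value ?? "");
    return `stored ${step.key}`;
  }
  return memory.get(step.key) ?? "(nothing found)";
}

// Agent mode: the model picks tools in a loop until it answers or hits the cap.
// The default cap of 4 is an assumption for this sketch.
function agentTurn(model: Model, memory: Memory, user: string, maxSteps = 4): string {
  const transcript = [`user: ${user}`];
  for (let i = 0; i < maxSteps; i++) {
    const step = model(transcript);
    if (step.kind === "answer") return step.text;
    transcript.push(`tool ${step.name}(${step.key}) -> ${runTool(memory, step)}`);
  }
  return "(step limit reached)";
}

// Direct mode: exactly one lookup per turn, with no decision left to the model.
function directTurn(memory: Memory, user: string): string {
  return memory.get(user) ?? "(nothing found)";
}
```

Direct mode never depends on the model emitting a well-formed tool call, which is one plausible reason it is more reliable on tiny models. Agent mode trades that reliability for flexibility, and the step cap keeps a confused model from looping indefinitely.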
Harness behaviour
Appearance
This chat's memory
Reveal the system prompt