Two birds, inseparable companions, cling to the same tree.
One of them eats the sweet fruit; the other looks on without eating.
Second-stage runtime for Manas: Moneo—The Meek Mnemonic Majordomo!
A smart wearable audio recorder that captures a continuous, unbroken session to a single WAV file using the device's PSRAM as a streaming ring buffer, then transcribes the recording via a remote LLM API and saves the transcript alongside the audio. Both files can optionally be uploaded to a remote storage server for downstream retrieval and agentic integration.
- Continuous Recording: PSRAM ring buffer (~16s of audio) flushes incrementally to a single WAV file per session — no 1-minute segment splits, no data fragmentation.
- Datetime Filenames: NTP time is fetched on boot; recordings are named in Android media format (`YYYYMMDD_HHMMSS.wav`). Falls back to uptime-based names if NTP is unavailable.
- LLM Transcription: After a session ends, the WAV file is POSTed to a configurable OpenAI-compatible Whisper endpoint. The returned transcript is written as a Markdown file at the same path (`.wav` → `.md`).
- NVS Configuration: All runtime parameters (WiFi credentials, API endpoints, pin assignments) live in ESP32 Non-Volatile Storage via the `Preferences` library, organised by namespace. Seeding NVS is as simple as dropping a `config.json` on the SD card; updating existing config works the same way.
- Multi-WiFi Support: Multiple WiFi networks stored in NVS (home, office, hotspot, etc.); the device tries each in order.
- Optional Cloud Sync: Both the WAV and Markdown files can be uploaded to a remote file server after transcription.
- Touch Control: Capacitive touch start/stop, same as Marci.
- Visual Feedback: LED on during recording, off at idle.
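The naming rules in the feature list can be sketched on the host as follows. This is a minimal illustration, not Moneo's actual code: the helper names, the use of UTC, and the `uptime_<seconds>` fallback pattern are all assumptions, since the source only says that names are "uptime-based" when NTP is unavailable.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>
#include <string>

// Hypothetical helper: returns the WAV path for a new session.
// With a synced clock the name is YYYYMMDD_HHMMSS (UTC is assumed here);
// otherwise it falls back to an assumed "uptime_<seconds>" pattern.
std::string sessionWavPath(bool ntpSynced, std::time_t now, uint32_t uptimeMs) {
    if (!ntpSynced) {
        return "/recordings/uptime_" + std::to_string(uptimeMs / 1000) + ".wav";
    }
    char stamp[32];
    std::strftime(stamp, sizeof stamp, "%Y%m%d_%H%M%S", std::gmtime(&now));
    return std::string("/recordings/") + stamp + ".wav";
}

// Hypothetical helper: maps "<name>.wav" to "<name>.md" at the same path.
std::string transcriptPathFor(const std::string& wavPath) {
    const std::string ext = ".wav";
    assert(wavPath.size() >= ext.size() &&
           wavPath.compare(wavPath.size() - ext.size(), ext.size(), ext) == 0);
    return wavPath.substr(0, wavPath.size() - ext.size()) + ".md";
}
```

Keeping the transcript path a pure function of the WAV path means no extra bookkeeping is needed to pair the two files later.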
Same base as Marci — PSRAM is now actively required:
- Board: Seeed Studio XIAO ESP32S3 Sense
- Microphone: Onboard PDM microphone
- Storage: MicroSD card (FAT32 formatted)
- PSRAM: 8 MB OPI PSRAM (required — used for the audio ring buffer)
- Touch Sensor: Capacitive surface on configurable GPIO pin
- WiFi: 802.11 b/g/n for NTP sync, LLM transcription, and optional upload
[PDM Mic] → I2S → [PSRAM ring buffer] → (flush) → [SD: session.wav]
↓ (on session end)
[LLM API] → [SD: session.md]
↓ (optional)
[Remote file server]
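The ring-buffer stage of the pipeline above can be illustrated with a small host-side sketch. It is not Moneo's implementation: the class name is hypothetical, the storage is a `std::vector` rather than PSRAM, and the sizing helper assumes 16 kHz, 16-bit mono audio, which the source does not state (it only gives roughly 16 s of capacity).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Bytes needed to hold `seconds` of PCM audio at the given format.
constexpr size_t ringBytesFor(size_t sampleRate, size_t bytesPerSample, size_t seconds) {
    return sampleRate * bytesPerSample * seconds;
}

// Hypothetical byte ring: the mic side calls write(), the SD side calls read().
// Not thread-safe; real firmware would need locking or a lock-free scheme
// if producer and consumer run in different tasks.
class ByteRing {
public:
    explicit ByteRing(size_t capacity) : buf_(capacity) { assert(capacity > 0); }

    // Copies in as many bytes as fit; returns the number accepted.
    size_t write(const uint8_t* src, size_t len) {
        len = std::min(len, buf_.size() - used_);
        const size_t first = std::min(len, buf_.size() - head_);
        std::memcpy(buf_.data() + head_, src, first);
        std::memcpy(buf_.data(), src + first, len - first);  // wrapped part
        head_ = (head_ + len) % buf_.size();
        used_ += len;
        return len;
    }

    // Copies out up to maxLen buffered bytes in FIFO order; returns the count.
    size_t read(uint8_t* dst, size_t maxLen) {
        const size_t len = std::min(maxLen, used_);
        const size_t first = std::min(len, buf_.size() - tail_);
        std::memcpy(dst, buf_.data() + tail_, first);
        std::memcpy(dst + first, buf_.data(), len - first);  // wrapped part
        tail_ = (tail_ + len) % buf_.size();
        used_ -= len;
        return len;
    }

    size_t used() const { return used_; }

private:
    std::vector<uint8_t> buf_;
    size_t head_ = 0, tail_ = 0, used_ = 0;
};
```

Under the assumed format, `ringBytesFor(16000, 2, 16)` comes to 512,000 bytes, which fits comfortably in 8 MB of PSRAM. When `write()` accepts fewer bytes than offered, the flush side has fallen behind and audio would be dropped.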
- Board: `Seeed Studio XIAO ESP32S3`
- PSRAM: `Tools → PSRAM → OPI PSRAM` (required)
- Libraries to install via Library Manager:
  - `ArduinoJson` (for `config.json` and LLM response parsing)
- Built-in (no install needed): `WiFi`, `SD`, `ESP_I2S`, `HTTPClient`, `WiFiClientSecure`, `Preferences`, `time.h`
- Serial Monitor: 115200 baud
On first flash the device has no NVS entries and falls back to compile-time defaults in Config.h. To provision runtime config — or to update it later — place a config.json file in the root of the MicroSD card before booting. During startup, if the file is detected, every value in it is written into the corresponding NVS namespace and key; the file is then renamed to config.bak to avoid repeating the process during the next boot.
Top-level keys are NVS namespace names. Each namespace holds its own structure:
Note
The following schema is indicative only. The exact structure may change because of implementation constraints or conveniences; if it does, please document the change.
{
"wifi": {
"HomeNetwork": "home-password",
"OfficeWiFi": "office-password",
"PhoneHotspot": "hotspot-password"
},
"llm": {
"host": "api.openai.com",
"port": 443,
"path": "/v1/audio/transcriptions",
"key": "sk-...",
"model": "whisper-1"
},
"upload": {
"host": "files.example.com",
"port": 443,
"path": "/recordings"
},
"device": {
"touch_pin": 2,
"touch_thresh": 40,
"auto_transcribe": true,
"auto_upload": false
}
}

- Compile and upload `moneo.ino`
- On boot: device checks for PSRAM, initializes SD, loads `config.json` into NVS if present, initializes I2S, attempts NTP sync
- Touch the pin to start recording (LED turns on)
- Talk — audio streams continuously to a single WAV file on the SD card
- Touch again to stop (LED turns off); transcription and optional upload begin
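Because the session length is unknown until the second touch, a single-file recorder usually writes a placeholder WAV header at the start and patches the size fields when recording stops. The sketch below builds the standard 44-byte PCM header; whether Moneo does it exactly this way is an assumption, and `makeWavHeader` is a hypothetical name.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stores the low `bytes` bytes of v at dst in little-endian order,
// which is the byte order used by all numeric WAV header fields.
static void putLE(uint8_t* dst, uint32_t v, size_t bytes) {
    for (size_t i = 0; i < bytes; ++i) dst[i] = static_cast<uint8_t>(v >> (8 * i));
}

// Builds a canonical 44-byte RIFF/WAVE header for uncompressed PCM.
std::array<uint8_t, 44> makeWavHeader(uint32_t sampleRate, uint16_t channels,
                                      uint16_t bitsPerSample, uint32_t dataBytes) {
    std::array<uint8_t, 44> h{};
    const uint16_t blockAlign = static_cast<uint16_t>(channels * (bitsPerSample / 8));
    std::memcpy(&h[0], "RIFF", 4);
    putLE(&h[4], 36 + dataBytes, 4);           // RIFF chunk size
    std::memcpy(&h[8], "WAVEfmt ", 8);
    putLE(&h[16], 16, 4);                      // fmt chunk size for PCM
    putLE(&h[20], 1, 2);                       // audio format 1 = PCM
    putLE(&h[22], channels, 2);
    putLE(&h[24], sampleRate, 4);
    putLE(&h[28], sampleRate * blockAlign, 4); // byte rate
    putLE(&h[32], blockAlign, 2);
    putLE(&h[34], bitsPerSample, 2);
    std::memcpy(&h[36], "data", 4);
    putLE(&h[40], dataBytes, 4);               // data chunk size
    return h;
}
```

In this scheme the recorder would write `makeWavHeader(rate, 1, 16, 0)` when the session starts, append audio as the buffer flushes, then seek back to offset 0 on stop and rewrite the header with the final byte count.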
The SD card root holds the optional config file; recordings live in a flat /recordings/ directory:
/
├── config.json ← optional; consumed on boot and renamed to config.bak
└── recordings/
├── 20260528_091523.wav ← continuous audio for the session
├── 20260528_091523.md ← LLM transcript (written after session ends)
├── 20260528_143210.wav
└── 20260528_143210.md
Moneo targets OpenAI-compatible Whisper transcription endpoints:
- Method: `POST`
- Path: `/v1/audio/transcriptions` (configurable via NVS)
- Body: `multipart/form-data` with fields `file` (WAV binary) and `model`
- Response: `{ "text": "transcript here" }`
Compatible self-hosted backends:
- `whisper.cpp` with HTTP server mode
- `faster-whisper` + HTTP wrapper
- `LocalAI`, `Ollama` (Whisper-compatible endpoints)
For HTTPS endpoints (e.g., api.openai.com), WiFiClientSecure is required.
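A session WAV can be far larger than the free heap, so one plausible approach is to send the multipart body in three parts: a text preamble, the file streamed from SD in chunks, and a closing boundary. The sketch below only builds the framing and computes `Content-Length`; the struct, function name, and boundary string are hypothetical, and it is not taken from Moneo's source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical framing for a multipart/form-data upload whose file part
// is streamed separately rather than held in memory.
struct MultipartFrame {
    std::string contentType;  // value for the Content-Type request header
    std::string preamble;     // everything sent before the file bytes
    std::string epilogue;     // closing boundary sent after the file bytes

    // Total request body size, needed for the Content-Length header.
    size_t contentLength(size_t fileBytes) const {
        return preamble.size() + fileBytes + epilogue.size();
    }
};

MultipartFrame frameTranscriptionUpload(const std::string& model,
                                        const std::string& fileName,
                                        const std::string& boundary) {
    assert(!boundary.empty());
    MultipartFrame f;
    f.contentType = "multipart/form-data; boundary=" + boundary;
    f.preamble =
        "--" + boundary + "\r\n"
        "Content-Disposition: form-data; name=\"model\"\r\n\r\n" + model + "\r\n"
        "--" + boundary + "\r\n"
        "Content-Disposition: form-data; name=\"file\"; filename=\"" + fileName + "\"\r\n"
        "Content-Type: audio/wav\r\n\r\n";
    f.epilogue = "\r\n--" + boundary + "--\r\n";
    return f;
}
```

The API key from the `llm` namespace would be sent separately as an `Authorization: Bearer` header. The boundary must not occur inside the payload, so a long random string is the usual choice.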
Each top-level key in config.json maps directly to an NVS namespace. The namespaces and their keys are:

`wifi` namespace:

| `config.json` structure | NVS representation | Description |
|---|---|---|
| `{ "<SSID>": "<pass>", … }` | indexed entries | Provisioned SSID / password pairs |
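One likely reason for indexed entries is that NVS key names are limited to 15 characters, while an SSID can be up to 32 bytes, so the SSID cannot serve as the key itself. The sketch below shows one possible flattening; the key names `ssid<i>`, `pass<i>`, and `count` are assumptions for illustration and may differ from the `NVS_KEY_*` constants in `Config.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

constexpr size_t kNvsMaxKeyLen = 15;  // ESP-IDF NVS limit on key name length

// Builds an indexed key such as "ssid0" and checks it fits the NVS limit.
std::string indexedKey(const std::string& prefix, size_t index) {
    std::string key = prefix + std::to_string(index);
    assert(key.size() <= kNvsMaxKeyLen);
    return key;
}

// Flattens ordered (SSID, password) pairs into key/value entries.
// The std::map stands in for the device's "wifi" NVS namespace.
std::map<std::string, std::string> flattenWifi(
        const std::vector<std::pair<std::string, std::string>>& networks) {
    std::map<std::string, std::string> out;
    out["count"] = std::to_string(networks.size());
    for (size_t i = 0; i < networks.size(); ++i) {
        out[indexedKey("ssid", i)] = networks[i].first;
        out[indexedKey("pass", i)] = networks[i].second;
    }
    return out;
}
```

With this layout the connection loop can read `count`, then try `ssid0`/`pass0`, `ssid1`/`pass1`, and so on, which preserves the provisioning order.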

`llm` namespace:

| Key | Type | Default (`Config.h`) | Description |
|---|---|---|---|
| `host` | String | `DEFAULT_LLM_HOST` | LLM API hostname |
| `port` | Int | `DEFAULT_LLM_PORT` | LLM API port (443 for HTTPS) |
| `path` | String | `DEFAULT_LLM_PATH` | API endpoint path |
| `key` | String | (none) | Bearer token / API key (never put in `Config.h`) |
| `model` | String | `DEFAULT_LLM_MODEL` | Model name (e.g. `whisper-1`) |

`upload` namespace:

| Key | Type | Default (`Config.h`) | Description |
|---|---|---|---|
| `host` | String | (none) | File server hostname (leave unset to disable upload) |
| `port` | Int | `DEFAULT_UPLOAD_PORT` | File server port |
| `path` | String | `DEFAULT_UPLOAD_PATH` | Base path on the file server |

`device` namespace:

| Key | Type | Default (`Config.h`) | Description |
|---|---|---|---|
| `touch_pin` | Int | `DEFAULT_TOUCH_PIN` | Touch-sensitive GPIO number |
| `touch_thresh` | Int | `DEFAULT_TOUCH_THRESHOLD` | Touch detection threshold |
| `auto_transcribe` | Bool | `true` | Run transcription after each session |
| `auto_upload` | Bool | `false` | Upload files after transcription |

All NVS key names are defined in Config.h under the NVS_KEY_* constants. Compile-time defaults in Config.h are the last-resort fallback when no NVS value exists for a key.
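The fallback rule can be modelled on the host with a plain map standing in for an NVS namespace. This is an illustrative sketch only: `NvsNamespace`, `getOr`, and `uploadEnabled` are hypothetical names, and on the device the same effect would normally come from passing a default to the `Preferences` getters.

```cpp
#include <map>
#include <string>

// Host-side stand-in for one NVS namespace (key -> stored value).
using NvsNamespace = std::map<std::string, std::string>;

// Returns the stored value if the key exists, else the compile-time default.
std::string getOr(const NvsNamespace& ns, const std::string& key,
                  const std::string& fallback) {
    const auto it = ns.find(key);
    return it != ns.end() ? it->second : fallback;
}

// Upload has no default host, so an unset or empty host disables it.
bool uploadEnabled(const NvsNamespace& upload) {
    return !getOr(upload, "host", "").empty();
}
```

For example, `getOr(llm, "model", "whisper-1")` yields the provisioned model when one exists and the assumed default otherwise, while `uploadEnabled` stays false until a host is provisioned.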
Built with ❤️