MindExpander × OmniVoice × Inner Station

A voice-native digital entity framework.

Not just a voice clone. Not a tokenizer. Inner Station is a three-lane OmniVoice architecture for turning direction, inner thought, and final spoken identity into one living MindExpander voice system.

MindExpander
Output Voice
Instructor: Director / guide / task framing lane.
Dream Whisper: Inner monologue and reasoning texture.
OmniVoice: The active synthesis and training framework.
HF Dataset: Private training media, public-safe contract.
Architecture

Three lanes. One voice entity.

OmniVoice, not VoxCPM
DATASET_ID 0

Instructor lane

The clear director voice: task framing, guidance, radio-host control, and scene-setting.

DATASET_ID 1

Thinking whisper

The inner station: dream reasoning, soft cognition, liminal monologue, and reflective texture.

DATASET_ID 2

MindExpander output

The final public voice identity: the lane that carries the actual MindExpander sound and presence.
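
As a rough illustration, the three lane IDs above could be pinned down in code so that training scripts never pass bare integers around. This is only a sketch: the enum name `Lane`, its member names, and the helper `lane_from_dataset_id` are hypothetical and not taken from the project's code. Only the 0 / 1 / 2 numbering comes from this page.

```python
from enum import IntEnum


class Lane(IntEnum):
    """Hypothetical names for the three DATASET_ID values listed above."""
    INSTRUCTOR = 0            # director voice: task framing, guidance
    THINKING_WHISPER = 1      # inner monologue and reasoning texture
    MINDEXPANDER_OUTPUT = 2   # final public voice identity


def lane_from_dataset_id(dataset_id: int) -> Lane:
    """Map a raw dataset_id to a Lane; raises ValueError for unknown IDs."""
    return Lane(dataset_id)


if __name__ == "__main__":
    for lane in Lane:
        print(int(lane), lane.name)
```

Using an `IntEnum` keeps the values JSON-compatible (they still serialize as plain integers) while rejecting any ID outside the three lanes.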

Training contract

Simple rows. Weird machine.

What the repo represents

This public page documents the project and architecture. The heavy/raw media stays private. The canonical model identity is OmniVoice_MindExpander_FULL_TRAIN, and the active dataset is an OmniVoice-generated three-lane voice/persona package.

  • Clone-friendly JSONL paths
  • Durations for filtering
  • Lane IDs for role-aware training
  • Final manifests built after all audio exists
{
  "audio": "mindexpander_output_full/audio/mindbot_000000_mindexpander_output.wav",
  "text": "spoken transcript for the MindExpander output voice",
  "duration": 3.5,
  "dataset_id": 2
}
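
A minimal sketch of how rows like the one above might be checked and filtered before training. Several things here are assumptions rather than project facts: "clone-friendly" is read as "relative path", the duration bounds `min_s` and `max_s` are placeholder values, and the function names `validate_row` and `load_manifest` are hypothetical.

```python
import json

# Field names come from the example row; the type checks are an assumption.
REQUIRED_FIELDS = {
    "audio": str,
    "text": str,
    "duration": (int, float),
    "dataset_id": int,
}
VALID_LANES = {0, 1, 2}


def validate_row(row: dict) -> list:
    """Return a list of human-readable problems; empty means the row is fine."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in row:
            problems.append(f"missing field: {field}")
        elif isinstance(row[field], bool) or not isinstance(row[field], expected):
            problems.append(f"wrong type for field: {field}")
    if problems:
        return problems
    if row["audio"].startswith("/"):
        # Assumption: clone-friendly means relative to the dataset root.
        problems.append("audio path should be relative")
    if row["duration"] <= 0:
        problems.append("duration must be positive")
    if row["dataset_id"] not in VALID_LANES:
        problems.append("dataset_id must be 0, 1, or 2")
    return problems


def load_manifest(lines, min_s=1.0, max_s=20.0):
    """Parse JSONL lines, reject malformed rows, keep rows within the duration window."""
    kept = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        problems = validate_row(row)
        if problems:
            raise ValueError(f"line {number}: {'; '.join(problems)}")
        if min_s <= row["duration"] <= max_s:
            kept.append(row)
    return kept
```

Passing an open file object works directly, for example `load_manifest(open("manifest.jsonl", encoding="utf-8"))`, where `manifest.jsonl` is a placeholder filename.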
Status

Cooking on Modal. Backed by receipts.

3000 Instructor WAV target
3000 Thinking whisper WAV target
3000 MindExpander output WAV target
A100 Current resume lane GPU
What gets fine-tuned?

The framework is the brand. The dataset is the fuel.

01 / BRAND

Inner Station

The public framework: name, architecture, lane contract, endpoint shape, demos, and docs. This is what people can understand and fork.

02 / TRAINING DATA

Protected WAV + JSONL

The actual fine-tune signal: audio paths, transcripts, durations, and dataset_id lane labels. This stays backed up and private unless release is approved.

03 / MODEL

OmniVoice checkpoint

First train the MindExpander output voice, then test role-aware multi-lane training for instructor, whisper, and output modes.

Roadmap

From dataset to living station.

Phase 01

Generate the full OmniVoice three-lane dataset and verify audio quality.

Phase 02

Upload the private Hugging Face dataset with clone-portable manifests and receipts.

Phase 03

Run the special unified three-lane training: all lanes together, balanced, with explicit Inner Station role tags so the model learns instructor, whisper, and output modes instead of blending them.
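
One way the balanced, role-tagged mixing described in Phase 03 could look, as a sketch only. The tag strings in `ROLE_TAGS`, the choice to prepend the tag to the transcript, and the function name `balanced_tagged_batches` are all hypothetical. How OmniVoice actually consumes lane information is not stated on this page.

```python
import random
from collections import defaultdict

# Hypothetical role tags; the real tag format is not specified here.
ROLE_TAGS = {0: "<instructor>", 1: "<whisper>", 2: "<output>"}


def balanced_tagged_batches(rows, batch_size, seed=0):
    """Yield batches that draw equally from every lane present in rows.

    Each lane is shuffled, truncated to the size of the smallest lane,
    then interleaved so that no lane dominates a batch.
    """
    rng = random.Random(seed)
    by_lane = defaultdict(list)
    for row in rows:
        by_lane[row["dataset_id"]].append(row)
    lanes = sorted(by_lane)
    if not lanes:
        return
    for lane in lanes:
        rng.shuffle(by_lane[lane])
    per_lane = min(len(by_lane[lane]) for lane in lanes)

    interleaved = []
    for i in range(per_lane):
        for lane in lanes:
            row = by_lane[lane][i]
            # Copy the row and prepend the lane's role tag to the transcript.
            tagged = dict(row, text=f"{ROLE_TAGS[lane]} {row['text']}")
            interleaved.append(tagged)

    for start in range(0, len(interleaved), batch_size):
        yield interleaved[start:start + batch_size]
```

With a `batch_size` that is a multiple of the number of lanes, every full batch contains the same count of instructor, whisper, and output rows. Truncating to the smallest lane discards surplus rows; oversampling the smaller lanes would be the alternative if no audio should go unused.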

Phase 04

Wire the voice into realtime agents, radio shows, command centers, and old-timey cyber-oracle broadcasts.