# HyperFrames + Pocket TTS video upgrade

Status: `added_as_safe_video_workflow_note`
Skill installed locally: `hyperframes-pocket-tts-video`
Official Hermes skill installed locally: `hyperframes` from `official/creative/hyperframes`
Public actions: `closed_until_human_approval`

## What just got added

We added a new Hermes/Jimsky skill for the Sonic-Forage video lane:

> **HyperFrames + Pocket TTS Video Workflow**

This gives Afterparty Forge a cleaner way to make and explain videos:

- **HyperFrames** turns videos into HTML/CSS/JS compositions that agents can edit like web pages.
- **Pocket TTS** gives those videos swappable local voices.
- **FFmpeg** muxes the final video/audio and exports platform variants.
- **Hermes/Jimsky** can write the composition, generate the voices, build drafts, verify files, and prepare manual-upload packs.

The important change: videos become **remixable source code**, not one-off exports.

## Launch explanation

Short version for a video or Space:

> We just added a video-native layer. HyperFrames lets the agent write videos like web pages: HTML, CSS, timing, animation, and media assets in git. Pocket TTS gives the same pipeline swappable local voices, so the narrator can become Alba, Javert, Cosette, or a different character without rebuilding the visuals. That means Afterparty Forge videos are no longer one-off exports; they become remixable source code.

## What Hermes can do with the new skill

Hermes can now help with:

1. Scaffolding HyperFrames-style video projects when the Node toolchain is ready.
2. Writing timed HTML scenes for launch videos, Shorts, explainers, and proof-hub walkthroughs.
3. Turning Sonic-Forage docs, launch copy, or proof cards into a narrated video structure.
4. Generating Pocket TTS narration locally.
5. Changing the narrator voice per scene.
6. Creating multi-speaker scripts: host voice, entity voice, DJ voice, announcer voice, outro voice.
7. Replacing a video voice track without rebuilding the whole visual layer.
8. Exporting Telegram, Discord, X, Shorts, and YouTube-ready variants.
9. Preparing captions/transcripts and manual-upload checklists.
10. Keeping posting, streaming, uploads, dataset-public release, and money actions closed until explicit approval.
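Item 7 (replacing the voice track without rebuilding the visuals) can be sketched as an FFmpeg remux. This is a minimal sketch, not the skill's actual implementation: the function name and the file names are hypothetical, and it assumes the draft is an MP4 whose video stream can be copied as-is.

```python
import subprocess


def build_voice_swap_command(video_in, voice_in, video_out):
    """Build an FFmpeg command that keeps the visuals and swaps the narration."""
    return [
        "ffmpeg", "-y",
        "-i", video_in,    # input 0: already-rendered visuals
        "-i", voice_in,    # input 1: new narration clip
        "-map", "0:v:0",   # take the video stream from input 0
        "-map", "1:a:0",   # take the audio stream from input 1
        "-c:v", "copy",    # copy video without re-encoding
        "-c:a", "aac",     # encode narration as AAC for the MP4 container
        "-shortest",       # stop at the shorter of the two streams
        video_out,
    ]


def swap_voice(video_in, voice_in, video_out):
    """Run the remux locally; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(build_voice_swap_command(video_in, voice_in, video_out), check=True)


# Hypothetical file names, printed rather than executed.
print(" ".join(build_voice_swap_command("draft.mp4", "voice_javert.mp3", "draft_javert.mp4")))
```

Because the video stream is copied rather than re-encoded, swapping the narrator should be fast compared with a full render.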

## Current local verification

HyperFrames / render stack:

- `npx hyperframes --version` returned `0.4.45`.
- Local Node is `v20.19.2`.
- HyperFrames docs recommend Node `>=22`, so full HyperFrames rendering should be treated as **not fully verified** until Node is upgraded or a Node 22 project runtime is used.
- FFmpeg is installed and available.
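The Node readiness check above can be automated before any render attempt. A minimal sketch, assuming `node --version` prints a string like `v20.19.2`; the helper names are hypothetical.

```python
import re
import subprocess

REQUIRED_NODE_MAJOR = 22  # the HyperFrames docs recommendation noted above


def node_major(version_text):
    """Parse the major version from text such as 'v20.19.2'."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    if match is None:
        raise ValueError(f"unrecognised Node version: {version_text!r}")
    return int(match.group(1))


def render_ready(version_text):
    """True when the Node major version meets the recommended minimum."""
    return node_major(version_text) >= REQUIRED_NODE_MAJOR


def local_node_version():
    """Ask the local Node binary for its version string."""
    result = subprocess.run(
        ["node", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout


print(render_ready("v20.19.2"))  # False
print(render_ready("v22.0.0"))   # True
```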

Pocket TTS stack:

- Pocket TTS project exists locally.
- OpenAI-compatible Pocket TTS server was started on `127.0.0.1:49112`.
- Health endpoint returned healthy.
- Available server voices verified:
  - `alba`
  - `azelma`
  - `cosette`
  - `eponine`
  - `fantine`
  - `javert`
  - `jean`
  - `marius`

Generated voice proof:

- MP3: `docs/assets/audio/hyperframes-pocket-tts/hyperframes_skill_alba.mp3`
- OGG/Telegram voice version: `docs/assets/audio/hyperframes-pocket-tts/hyperframes_skill_alba.ogg`
- Voice used: `alba`
- Approx duration: 18 seconds

## Voice swapping pattern

Scene manifest example:

```json
[
  {
    "scene": "cold_open",
    "voice": "alba",
    "text": "We missed the party, so we built the afterparty."
  },
  {
    "scene": "proof_stack",
    "voice": "javert",
    "text": "The proof hub is live, the repo is public, and the gates are closed."
  },
  {
    "scene": "builder_invite",
    "voice": "cosette",
    "text": "Come test, fork, break, remix, and shape it with us."
  }
]
```

Each scene gets its own generated voice clip. The clips are placed sequentially on the timeline and muxed into the video.
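The sequential placement can be sketched as a running cursor over the manifest. This is an illustration only: `layout_timeline` is a hypothetical helper, and the durations below are made-up placeholders; real values would be measured from the generated clips.

```python
def layout_timeline(scenes, durations, gap=0.0):
    """Assign each scene a start time so the clips play back to back.

    scenes: list of manifest entries (dicts with a "scene" key)
    durations: mapping of scene name to clip length in seconds
    gap: optional silence between clips, in seconds
    """
    timeline = []
    cursor = 0.0
    for scene in scenes:
        duration = durations[scene["scene"]]
        timeline.append({**scene, "start": round(cursor, 3), "duration": duration})
        cursor += duration + gap
    return timeline


scenes = [
    {"scene": "cold_open", "voice": "alba"},
    {"scene": "proof_stack", "voice": "javert"},
    {"scene": "builder_invite", "voice": "cosette"},
]
# Placeholder clip lengths in seconds, not measured values.
durations = {"cold_open": 4.0, "proof_stack": 5.5, "builder_invite": 4.5}

for entry in layout_timeline(scenes, durations):
    print(entry["scene"], entry["voice"], entry["start"], entry["duration"])
```

The resulting `start` and `duration` values map directly onto the timing attributes of each clip in the composition.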

## Pocket TTS request pattern

Verified minimal request:

```python
import json
import urllib.request

# Request body for the OpenAI-compatible speech endpoint.
payload = json.dumps({
    "input": "We missed the party, so we built the afterparty.",
    "voice": "alba"
}).encode()

req = urllib.request.Request(
    "http://127.0.0.1:49112/v1/audio/speech",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Write the returned audio bytes and close the file handle when done.
with urllib.request.urlopen(req, timeout=60) as resp, open("voice.mp3", "wb") as out:
    out.write(resp.read())
```
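To apply the same request per scene, the manifest entries can be turned into requests with a voice check up front. A minimal sketch: the helper names and the clip naming scheme are hypothetical, and it assumes the server returns MP3 bytes as in the verified request above.

```python
import json
import urllib.request

SPEECH_URL = "http://127.0.0.1:49112/v1/audio/speech"
# Voices verified on the local server (see the verification list above).
VERIFIED_VOICES = {
    "alba", "azelma", "cosette", "eponine",
    "fantine", "javert", "jean", "marius",
}


def build_speech_request(scene):
    """Build the POST request for one manifest entry, rejecting unknown voices."""
    if scene["voice"] not in VERIFIED_VOICES:
        raise ValueError(f"voice not verified locally: {scene['voice']!r}")
    payload = json.dumps({"input": scene["text"], "voice": scene["voice"]}).encode()
    return urllib.request.Request(
        SPEECH_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def clip_filename(scene):
    """Hypothetical naming scheme: <scene>_<voice>.mp3."""
    return f"{scene['scene']}_{scene['voice']}.mp3"


def generate_clip(scene, timeout=60):
    """Fetch the narration for one scene and write it next to the script."""
    path = clip_filename(scene)
    with urllib.request.urlopen(build_speech_request(scene), timeout=timeout) as resp:
        with open(path, "wb") as out:
            out.write(resp.read())
    return path
```

Calling `generate_clip` for each manifest entry would produce one file per scene; changing a scene's `voice` and regenerating only that clip leaves the others untouched.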

## Official Hermes HyperFrames skill alignment

The official optional Hermes skill from the docs is now installed as `hyperframes`.

Install source:

```bash
hermes skills install official/creative/hyperframes
```

Official workflow rules now adopted for this video lane:

- **HTML is the source of truth for video.**
- Define visual identity before writing composition HTML; avoid generic defaults like `#333`, `#3b82f6`, or `Roboto` unless they are explicitly in the design system.
- Build the **hero frame** first, then add GSAP animation.
- Use `npx hyperframes lint`, `npx hyperframes validate`, and `npx hyperframes inspect` before render.
- Use `npx hyperframes render --quality draft --output draft.mp4` for drafts; reserve high-quality renders for final local verification.
- Multi-scene pieces should use transitions instead of hard jump cuts.
- Full render verification still waits for Node `>=22`; the current machine is Node `v20.19.2`.
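The check-then-render order in the rules above can be sketched as a small runner that stops at the first failing step. The commands are the ones listed above; the wrapper functions are hypothetical, and nothing here has been verified against a Node 22 runtime.

```python
import subprocess

# Checks to run before any render, in the order given above.
PREFLIGHT = [
    ["npx", "hyperframes", "lint"],
    ["npx", "hyperframes", "validate"],
    ["npx", "hyperframes", "inspect"],
]
DRAFT_RENDER = [
    "npx", "hyperframes", "render",
    "--quality", "draft",
    "--output", "draft.mp4",
]


def draft_pipeline():
    """Return the preflight checks followed by the draft render."""
    return PREFLIGHT + [DRAFT_RENDER]


def run_pipeline(commands, cwd="."):
    """Run each command in order; check=True aborts on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=cwd, check=True)


for cmd in draft_pipeline():
    print(" ".join(cmd))
```

Calling `run_pipeline(draft_pipeline(), cwd="path/to/project")` would execute the steps inside a project directory; the path here is a placeholder.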

## HyperFrames composition pattern

Minimal shape:

```html
<div id="root" data-composition-id="afterparty" data-start="0" data-width="1920" data-height="1080">
  <h1 class="clip" data-start="0" data-duration="4" data-track-index="1">
    We missed the party. So we built the afterparty.
  </h1>

  <audio class="clip" data-start="0" data-duration="4" data-track-index="2" data-volume="1" src="assets/voice_alba.wav"></audio>
</div>
```
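A composition of this shape could be generated from timed scene entries rather than written by hand. A rough sketch: `render_composition` and the entry keys (`text`, `audio`, `start`, `duration`) are hypothetical, and it assumes that several non-overlapping clips may share a track index, which has not been verified against the HyperFrames docs.

```python
from html import escape


def render_composition(timeline, composition_id="afterparty", width=1920, height=1080):
    """Emit one text clip and one audio clip per timed scene entry."""
    parts = [
        f'<div id="root" data-composition-id="{escape(composition_id)}" '
        f'data-start="0" data-width="{width}" data-height="{height}">'
    ]
    for entry in timeline:
        timing = f'data-start="{entry["start"]}" data-duration="{entry["duration"]}"'
        # Track 1: on-screen text for the scene.
        parts.append(f'  <h1 class="clip" {timing} data-track-index="1">')
        parts.append(f'    {escape(entry["text"])}')
        parts.append("  </h1>")
        # Track 2: narration clip for the same time window.
        parts.append(
            f'  <audio class="clip" {timing} data-track-index="2" '
            f'data-volume="1" src="{escape(entry["audio"])}"></audio>'
        )
    parts.append("</div>")
    return "\n".join(parts)


timeline = [
    {"text": "We missed the party. So we built the afterparty.",
     "audio": "assets/voice_alba.wav", "start": 0, "duration": 4},
]
print(render_composition(timeline))
```

Text and paths are passed through `html.escape` so that quotes or angle brackets in launch copy do not break the markup.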

## Safety gates

This upgrade does **not** mean videos auto-publish.

Still closed without explicit human approval:

- Posting to X/Twitter, Discord, YouTube, Reddit, LinkedIn, etc.
- Scheduling or starting an X Space.
- Starting livestreams.
- Uploading or publishing videos.
- Making private Hugging Face datasets public.
- Spending money or starting paid GPU/API jobs.
- Creating payment links or invoices.
- Claiming endorsements, affiliations, revenue, users, or traction.

## Next practical use

Best next video increment:

1. Pick one video target: X teaser, 60-second proof hub walkthrough, or YouTube intro.
2. Choose 2–3 Pocket TTS voices.
3. Write a scene manifest.
4. Generate voice clips.
5. Create the visuals with HyperFrames if a Node 22 runtime is ready; otherwise fall back to PIL/FFmpeg.
6. Export a draft MP4.
7. Verify locally.
8. Ask for manual upload/post approval separately.