docent.studio

Markdown
for video.
Built for LLMs.

A file format for video. You write JSON. An engine renders it. A grammar of cognitive moves makes any film composable. A contract keeps it from being slop.

the format

One JSON file. Any film you can think of.

Most video tools start with the canvas — a timeline, layers, keyframes. Docent starts with the moves any piece of thought can make. You declare them. The engine renders. The same grammar handles a code review, a brand quarterly, a poetry close reading, a sci-fi short, a quarterly earnings walk, a documentary.

The format is the surface an LLM can author against without the output drifting into slop. Every scene declares its schema. Every scene declares its depth rules. A film that doesn't say anything doesn't ship.
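
As a rough illustration of "a film that doesn't say anything doesn't ship", the sketch below checks a spec before rendering. The `Film` and `Scene` shapes, their field names, and `checkFilm` are hypothetical, invented for this example. Docent's actual schema and depth rules are not shown on this page and may look quite different.

```typescript
// Hypothetical shapes: these field names are illustrative, not Docent's real schema.
interface Scene {
  move: string;      // which grammar move the scene makes
  narration: string; // what the scene actually says
}

interface Film {
  id: string;
  scenes: Scene[];
}

// Returns a list of problems. An empty list means the film may ship.
function checkFilm(film: Film): string[] {
  const problems: string[] = [];
  if (film.scenes.length === 0) {
    problems.push("film has no scenes");
  }
  film.scenes.forEach((scene, i) => {
    if (scene.move.trim() === "") {
      problems.push(`scene ${i}: no move declared`);
    }
    if (scene.narration.trim() === "") {
      problems.push(`scene ${i}: says nothing`);
    }
  });
  return problems;
}

// Usage: one scene declares a move but says nothing, so one problem is reported.
const draft: Film = {
  id: "demo",
  scenes: [
    { move: "comparison", narration: "Two options, one winner." },
    { move: "narrative", narration: "   " },
  ],
};
console.log(checkFilm(draft));
```

The point of the sketch is the ordering: the check runs on the declared spec, so an empty film is rejected before any rendering work happens.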

the grammar

29 moves. The vocabulary of video.

Connection. Time. Flow. Comparison. Categorization. Experience. Narrative. Seven clusters of cognition — enough to compose any film. Adding a thirtieth move is a major version bump. That restraint is the format.

connection (4 moves): how the parts relate.
time (2 moves): order along an axis.
flow (3 moves): systems in motion.
comparison (7 moves): options against each other.
categorization (1 move): boundaries in the open.
experience (2 moves): a human moving through.
narrative (10 moves): the rhetorical move.
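
The per-cluster counts can be tallied to confirm the size of the vocabulary. The cluster names and numbers below are copied from the list above; the variable names are just for this sketch.

```typescript
// Cluster names and move counts, copied from the list above.
const clusters: Record<string, number> = {
  connection: 4,
  time: 2,
  flow: 3,
  comparison: 7,
  categorization: 1,
  experience: 2,
  narrative: 10,
};

// Sum the move counts across all clusters.
const totalMoves: number = Object.values(clusters).reduce((sum, n) => sum + n, 0);
const clusterCount: number = Object.keys(clusters).length;

console.log(`${clusterCount} clusters, ${totalMoves} moves`);
// prints "7 clusters, 29 moves"
```

Seven clusters and 29 moves in total, which is why the next addition would be the thirtieth.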

the cascade

JSON in. Narrated MP4 out.

The spec is the source. The render is the artifact. Same path whether the film is a PR review, a documentary, or a brand opener.

validate: schema + per-scene
preprocess: directives + modifiers
resolve: style preset extends
tts: kokoro / openai / elevenlabs
render: remotion + audio overlay
films/openclaw-ar.json -> out/openclaw-ar.mp4
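
The five stages and the path convention can be sketched as a fixed, ordered pipeline. The stage names and the two example paths come from this page. The `outputPath` and `runCascade` helpers are hypothetical, and the `films/<id>.json` to `out/<id>.mp4` mapping is inferred from the single example here, not from documented behaviour. The stages only log; no real validation, TTS, or rendering happens in this sketch.

```typescript
// Stage names in the order listed above.
const stages = ["validate", "preprocess", "resolve", "tts", "render"] as const;

// Assumed convention: films/<id>.json renders to out/<id>.mp4.
function outputPath(specPath: string): string {
  const match = /^films\/(.+)\.json$/.exec(specPath);
  if (match === null) {
    throw new Error(`not a film spec path: ${specPath}`);
  }
  return `out/${match[1]}.mp4`;
}

// Walks the stages in order, logging each one, and returns the artifact path.
function runCascade(specPath: string, log: (line: string) => void): string {
  const target = outputPath(specPath);
  for (const stage of stages) {
    log(`${stage}: ${specPath}`);
  }
  return target;
}

console.log(runCascade("films/openclaw-ar.json", console.log));
// final line printed: "out/openclaw-ar.mp4"
```

Keeping the stage list as data rather than as separate entry points reflects the claim above: every film, whatever its subject, takes the same path from spec to artifact.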

try it

Write your first film. In an hour.

Three packages: the framework, the default implementation, the binary. Write a spec at films/<id>.json. Run docent build <id>. Watch. Ship.

$ bun add @bjelser/cli @bjelser/core @bjelser/kit