
Eleven v3 audio tags: directing expressive AI voices

Learn how to use ElevenLabs v3 audio tags — emotion, delivery, accent, and persona cues in square brackets — to direct expressive AI voice performances on OmniArt.

OmniArt Team

Most text-to-speech tools read a script the same way every time: flat, measured, and slightly robotic. Eleven v3 is different. It understands the emotional texture of your script and, with audio tags, you can give it explicit direction — the same way a voice director would cue a performer before a take.

Audio tags are short words or phrases in square brackets embedded directly in your script. They tell the model how to deliver the next line: whisper it, shout it, colour it with a British accent, or break it mid-sentence with a sigh. This guide covers the full tag vocabulary available on OmniArt, how to write multi-character scripts that use them, and how to decide when Eleven v3 is the right model for the job.

What are audio tags?

Audio tags are inline direction cues placed inside square brackets — [whispers], [excited], [British accent] — at the point in the script where you want the delivery to change. Eleven v3 parses them as instructions rather than words to speak, and adjusts tone, pace, and affect accordingly.

The key distinction from older TTS is that v3 interprets context. It doesn't just apply a blanket filter: it weighs the tag against the surrounding sentence, so [sighs] before "I suppose you're right" produces a different result than [sighs] before "Fine, let's go." That context-sensitivity is what makes tagged scripts feel directed rather than processed.

Tip

Place the tag immediately before the phrase it should affect. A tag at the start of a paragraph governs delivery until the next tag or a natural tonal reset.
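
To make the placement rule concrete, here is a minimal Python sketch that splits a script into text runs, each paired with the tags written directly before it. The regular expression and the function name `split_tagged_script` are illustrative assumptions, not part of OmniArt or ElevenLabs tooling, and the sketch says nothing about how Eleven v3 parses tags internally.

```python
import re

# Assumption: a tag is any bracketed run of text with no nested brackets.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def split_tagged_script(script: str) -> list[tuple[list[str], str]]:
    """Pair each run of text with the tags written immediately before it."""
    segments = []
    pending_tags = []
    position = 0
    for match in TAG_PATTERN.finditer(script):
        text = script[position:match.start()].strip()
        if text:
            # Text found: close the current segment and start collecting new tags.
            segments.append((pending_tags, text))
            pending_tags = []
        pending_tags.append(match.group(1))
        position = match.end()
    tail = script[position:].strip()
    if tail or pending_tags:
        segments.append((pending_tags, tail))
    return segments


print(split_tagged_script("[sighs] [slowly] Aye, captain."))
```

Reading a script this way before generating is a quick check that each tag sits directly in front of the phrase it is meant to direct.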

The audio tag vocabulary

The tables below organise the major tag categories with examples. These are the cues Eleven v3 responds to reliably on OmniArt.

Emotion tags

| Tag | Effect |
| --- | --- |
| [excited] | Raised energy, faster pace, brighter tone |
| [sad] | Slower, lower, more subdued delivery |
| [angry] | Clipped, forceful, raised volume |
| [nervous] | Slightly uneven pace, quieter overall |
| [happy] | Warm, upbeat, open resonance |
| [tired] | Slower, flatter, lower effort |
| [afraid] | Tense, restrained, reduced breath |
| [disgusted] | Flat affect with slight disdain |
| [surprised] | Higher pitch onset, shorter phrase |

Delivery tags

| Tag | Effect |
| --- | --- |
| [whispers] | Breathy, low-volume, intimate |
| [shouting] | High volume, projected, wide resonance |
| [pause] | Natural beat or break inserted here |
| [slowly] | Stretched tempo without pitch change |
| [fast] | Compressed tempo, higher energy |
| [sighs] | Audible exhale woven into the start of the phrase |
| [laughs] | Adds a short natural laugh before or during the line |
| [crying] | Broken, wet quality to the delivery |

Character and persona tags

| Tag | Effect |
| --- | --- |
| [pirate voice] | Theatrical, growled, exaggerated cadence |
| [robot voice] | Clipped, monotone, synthetic quality |
| [narrator] | Authoritative, measured, documentary register |
| [announcer] | Projected, formal, broadcast quality |
| [childlike] | Higher pitch, shorter phrasing, playful |

Accent tags

| Tag | Effect |
| --- | --- |
| [British accent] | Received Pronunciation quality |
| [Southern US accent] | Warm, drawn-out vowels |
| [Australian accent] | Rising-terminal intonation |
| [Irish accent] | Melodic, distinctive vowel rounding |
| [New York accent] | Clipped consonants, nasal midrange |

Note

Accent tags layer over the base voice preset. Results vary by preset — some voices respond more strongly than others. Generate a short test line before committing to a full script.

Tag cheat sheet

| Purpose | Example tags |
| --- | --- |
| Emotion (positive) | [excited], [happy], [surprised] |
| Emotion (negative) | [sad], [angry], [tired], [afraid], [nervous] |
| Volume / projection | [whispers], [shouting] |
| Tempo | [slowly], [fast] |
| Natural sounds | [sighs], [laughs], [crying], [pause] |
| Character register | [pirate voice], [robot voice], [narrator], [announcer], [childlike] |
| Accent | [British accent], [Southern US accent], [Australian accent], [Irish accent], [New York accent] |
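
The vocabulary in this guide can also be kept as a small lookup for catching typos before you spend credits. The Python sketch below is one way to do that. The names `TAG_VOCABULARY` and `find_unlisted_tags` are hypothetical, the case-insensitive matching is an assumption, and a tag missing from this list is not necessarily rejected by Eleven v3; it is only absent from the examples in this guide.

```python
import re

# Tags listed in this guide, lower-cased for comparison (case-insensitivity is an assumption).
TAG_VOCABULARY = {
    "emotion": {"excited", "sad", "angry", "nervous", "happy", "tired",
                "afraid", "disgusted", "surprised"},
    "delivery": {"whispers", "shouting", "pause", "slowly", "fast",
                 "sighs", "laughs", "crying"},
    "persona": {"pirate voice", "robot voice", "narrator", "announcer", "childlike"},
    "accent": {"british accent", "southern us accent", "australian accent",
               "irish accent", "new york accent"},
}


def find_unlisted_tags(script: str) -> list[str]:
    """Return bracketed tags in the script that do not appear in this guide's lists."""
    known = set().union(*TAG_VOCABULARY.values())
    found = re.findall(r"\[([^\[\]]+)\]", script)
    return [tag for tag in found if tag.strip().lower() not in known]


print(find_unlisted_tags("[narrator] Hi. [British accent] Hello. [mumbles] Never mind."))
```

Anything the function returns is worth a second look: either a misspelling such as [whisper] for [whispers], or an experimental tag that deserves a short test generation first.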

Writing a tagged script: two examples

Example 1 — emotional narration

This is a short opening for an audiobook chapter. The tags shift mood as the scene changes.

[narrator] The city had been quiet for three days.

[slowly] Not the quiet of peace — [pause] the quiet of waiting.

[tired] Maya poured her fourth cup of coffee and stared at the map pinned to the wall.

[whispers] They had to be out there somewhere.

[sighs] She just needed one more lead.

The narrator tag sets a measured register from the start. [slowly] with a [pause] creates dramatic space. [tired] sags the delivery before [whispers] pulls it low and intimate. [sighs] adds a physical breath that makes the final line feel earned.

Example 2 — two-character dialogue

Eleven v3 can handle multi-speaker reads from a single prompt. Use character labels and delivery tags to distinguish each voice.

CAPTAIN (VOICE A): [excited] We found it. [pause] The actual coordinates — right where the old chart said they'd be.

FIRST MATE (VOICE B): [nervous] Sir, that chart is four hundred years old. Half of it is sea monsters drawn by someone who'd never left port.

CAPTAIN (VOICE A): [laughs] Exactly! [fast] Which means no one else thought it was worth following. Get the crew up.

FIRST MATE (VOICE B): [sighs] [slowly] Aye, captain.

Tip

For multi-character scripts, pick two voice presets with clearly different base registers — one deeper, one lighter — so the character distinction comes through even without visual speaker labels in the audio output.
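
If dialogue lives in a spreadsheet or game database, a script in the layout of Example 2 can be assembled programmatically. The following Python sketch assumes that layout (speaker label, colon, leading tags, then the line) and uses a hypothetical `Line` record and `render_script` helper. It only handles tags at the start of a line; inline tags such as a mid-sentence [pause] can be written directly into the text.

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    """One spoken line: who says it, what they say, and any leading delivery tags."""
    speaker: str
    text: str
    tags: list[str] = field(default_factory=list)


def render_script(lines: list[Line]) -> str:
    """Render lines as 'SPEAKER: [tag] [tag] text', separated by blank lines."""
    rendered = []
    for line in lines:
        tag_prefix = " ".join(f"[{tag}]" for tag in line.tags)
        body = f"{tag_prefix} {line.text}" if tag_prefix else line.text
        rendered.append(f"{line.speaker}: {body}")
    return "\n\n".join(rendered)


script = render_script([
    Line("CAPTAIN (VOICE A)", "We found it. [pause] Right where the old chart said.", ["excited"]),
    Line("FIRST MATE (VOICE B)", "Aye, captain.", ["sighs", "slowly"]),
])
print(script)
```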

How to use audio tags on OmniArt

  1. Go to Audio mode and select the Speech tab.
  2. Choose Eleven v3 from the model menu. It is available on STARTER tier and above.
  3. Select a voice preset. OmniArt offers 353 curated voices across speech models. Browse by gender and style — deeper, more authoritative presets work well for narration; brighter, mid-range presets respond well to strong emotion tags.
  4. Paste your tagged script into the prompt field. Eleven v3 accepts up to 5,000 characters per generation.
  5. Set the language to match your script.
  6. Generate and audition. If a tag is over-applied or under-applied, adjust its position, add another tag to reset delivery, or try a different voice preset.
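
Scripts longer than the 5,000-character limit in step 4 have to be split across several generations. The sketch below shows one way to do that in Python, breaking only at blank lines so no paragraph is cut in half. It assumes that tags and whitespace count toward the limit, which this guide does not confirm, and `chunk_script` is an illustrative name rather than an OmniArt feature.

```python
MAX_CHARS = 5000  # per-generation limit stated for Eleven v3


def chunk_script(script: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Group paragraphs into chunks that each fit within max_chars."""
    chunks = []
    current = ""
    for paragraph in script.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if len(paragraph) > max_chars:
            raise ValueError("A single paragraph exceeds the limit; split it by hand.")
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this paragraph would overflow: close the chunk and start a new one.
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


print(len(chunk_script("First paragraph.\n\nSecond paragraph.")))
```

Each chunk is presumably generated on its own, so a tag set near the end of one chunk should not be expected to carry into the next. Repeating the governing tag at the top of each chunk is the safer choice.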

Billing runs at 1 credit per started block of 50 characters, so a partial block is charged as a full one. A 500-character script costs 10 credits, a 501-character script costs 11, and a 5,000-character script costs 100.
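
The billing rule is a ceiling division, which the short Python sketch below reproduces. It assumes every character in the prompt field counts, including the tags themselves and whitespace; this guide does not say whether tags are billed, so treat the result as an estimate. The function name `v3_credits` is illustrative.

```python
import math


def v3_credits(script: str, block_size: int = 50) -> int:
    """Estimate Eleven v3 credits: 1 credit per started block of block_size characters."""
    return math.ceil(len(script) / block_size)


print(v3_credits("x" * 500))    # 500 / 50 = 10 blocks
print(v3_credits("x" * 501))    # one extra character starts an 11th block
print(v3_credits("x" * 5000))   # a full-length script: 100 blocks
```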

Warning

OmniArt does not offer voice cloning, speed sliders, or pitch controls for Eleven v3. All delivery variation comes from the script text and audio tags.

When to use Eleven v3 vs other speech models

Three ElevenLabs models are available on OmniArt, and MiniMax Speech 2.8 covers budget and FREE-tier work. Here is when to reach for each.

| Scenario | Best model | Reason |
| --- | --- | --- |
| Emotionally varied performance (a character who laughs, cries, shouts) | Eleven v3 | Audio tags and context-awareness give the most expressive range |
| Stable multilingual narration (50+ languages) | Eleven Multilingual v2 | Consistent, even delivery across languages; 10,000 chars per generation |
| Long scripts with fast turnaround | Eleven Turbo v2.5 | Low latency; 40,000 chars per generation at 1 credit per 100 chars |
| Budget-conscious or FREE-tier generation | MiniMax Speech 2.8 HD / Turbo | Available on the FREE tier; HD for finished quality, Turbo for drafts |

A useful mental model: use v3 when the script calls for a performance and the delivery itself carries meaning. Use Multilingual v2 when the goal is clear narration that is easy to follow across many languages. Use Turbo v2.5 when you have a long, relatively neutral script and need results quickly.
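
That mental model can be written down as a small decision helper. The Python sketch below encodes the three ElevenLabs choices and uses the per-generation limits quoted in this guide to estimate how many generations a script needs. The function name, its parameters, and the order of the checks are assumptions made for illustration; MiniMax is left out because this guide gives no character limit for it.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechModel:
    name: str
    max_chars: int  # per-generation limit quoted in this guide


V3 = SpeechModel("Eleven v3", 5_000)
MULTILINGUAL_V2 = SpeechModel("Eleven Multilingual v2", 10_000)
TURBO_V2_5 = SpeechModel("Eleven Turbo v2.5", 40_000)


def suggest_model(script_length: int,
                  needs_performance: bool = False,
                  multilingual: bool = False) -> tuple[str, int]:
    """Return (model name, generations needed) following the guide's rule of thumb."""
    if needs_performance:
        model = V3                # delivery itself carries meaning
    elif multilingual:
        model = MULTILINGUAL_V2   # clear narration across languages
    else:
        model = TURBO_V2_5        # long, relatively neutral script
    generations = max(1, math.ceil(script_length / model.max_chars))
    return model.name, generations


print(suggest_model(12_000, needs_performance=True))
print(suggest_model(12_000))
```

A 12,000-character performance piece comes out as three Eleven v3 generations, while the same length as neutral narration fits in a single Turbo v2.5 generation.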

See the dedicated model pages for full specs: Eleven v3, Eleven Multilingual v2, Eleven Turbo v2.5.

Common tagging mistakes to avoid

Over-tagging: adding a tag to every sentence flattens the variation. Emotion tags land harder when they arrive after a stretch of unmarked, natural delivery. Use them for peaks and transitions, not as a constant layer.

Contradictory tags: [shouting] immediately followed by [whispers] with no sentence between them can confuse the model. Leave a sentence of neutral delivery between strong contrasts.

Accent tags without a test: accent rendering depends on the base voice preset. Run a 50-character test line before applying an accent tag across a long script.

Tags mid-word: tags need to sit between complete words or punctuation, not inside a word. Incre[excited]dible will not parse correctly — write [excited] Incredible instead.
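
Some of these mistakes can be caught mechanically before generating. The Python sketch below flags tags placed inside a word, [shouting] and [whispers] written back to back, and scripts where every sentence carries a tag. The patterns, the three-sentence threshold, and the name `lint_script` are all assumptions chosen for illustration. The contrast check in particular is a narrow reading of the rule: it only catches the two tags with nothing but whitespace between them.

```python
import re

TAG = re.compile(r"\[[^\[\]]+\]")
MID_WORD = re.compile(r"\w+\[[^\[\]]+\]\w+")
CONTRAST = re.compile(r"\[(shouting|whispers)\]\s*\[(shouting|whispers)\]", re.IGNORECASE)


def lint_script(script: str) -> list[str]:
    """Return human-readable warnings for a few common tagging mistakes."""
    warnings = []
    for match in MID_WORD.finditer(script):
        warnings.append(f"Tag inside a word: {match.group(0)!r}")
    for match in CONTRAST.finditer(script):
        if match.group(1).lower() != match.group(2).lower():
            warnings.append(f"Strong contrast with no text between: {match.group(0)!r}")
    # Rough sentence split on ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]
    if len(sentences) >= 3 and all(TAG.search(s) for s in sentences):
        warnings.append("Every sentence carries a tag; consider leaving some unmarked.")
    return warnings


for warning in lint_script("Incre[excited]dible news. [shouting] [whispers] Get out."):
    print(warning)
```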

Use cases that benefit most

Audiobooks with multiple characters: the combination of voice presets and delivery tags lets you distinguish narrator from character and give each character a consistent emotional signature. See how to build a complete audio production in the MiniMax Speech voiceover guide for a comparable workflow.

Game dialogue and interactive fiction: short, punchy lines with strong tags — [afraid] Stay back!, [laughs] You call that a plan? — create believable NPCs without custom voice actors.

YouTube narration with emotional range: a documentary or explainer that moves between dramatic reveals, humorous asides, and quiet reflection benefits from delivery shifts. Tag the transitions and the pacing writes itself.

Dialogue-driven media and trailers: two or three character reads from a single generation, each distinguished by voice preset and tags, compress a dialogue scene into one workflow step.

Getting started on OmniArt

The fastest way to develop an ear for what v3 can do is to take a script you know well — a monologue, a short story opening, a few lines of game dialogue — and tag it twice: once with light tagging, once with aggressive delivery shifts. Generate both and compare. The difference between a lightly directed and a fully directed script is usually obvious within the first sentence.
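
One way to set up that comparison is to write the aggressively tagged version first and derive the lighter one from it. The Python sketch below drops every tag except those on a keep list. The helper name `lighten` and the choice of which tags to keep are hypothetical, and the case-insensitive matching is an assumption.

```python
import re

# Match a tag plus any whitespace after it, so removed tags leave no double spaces.
TAG_WITH_SPACE = re.compile(r"\[([^\[\]]+)\]\s*")


def lighten(script: str, keep: set[str]) -> str:
    """Remove every tag whose name is not in keep (compared case-insensitively)."""
    keep_lower = {tag.lower() for tag in keep}

    def replace(match: re.Match) -> str:
        return match.group(0) if match.group(1).strip().lower() in keep_lower else ""

    return TAG_WITH_SPACE.sub(replace, script)


heavy = "[narrator] The city was quiet. [sighs] She waited."
print(lighten(heavy, {"narrator"}))
```

Generating both the original and the lightened output with the same voice preset keeps the tags as the only variable in the comparison.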

Open Eleven v3 on OmniArt and paste your first tagged script. Start with the emotional narration example above, swap the voice preset, and see what changes. Once the tag vocabulary feels natural, the model becomes as responsive as a real recording session — without the studio.

For a broader look at every audio model available on OmniArt, including music and sound effects, see the complete audio workspace guide.
