Audio Systems & Optimization

Last updated: April 3, 2026


Voice System

Non-dialogue vocalizations for humans, mounts, and animals (grunts, pain sounds, commands, etc.) are not hardcoded into game logic. Instead, they’re managed through an external configuration file called voice_definitions.xml. This makes it straightforward to add or adjust voices without touching code. All declarations and definitions are written in XML and link directly to FMOD event paths.

File path:

..\Modules\Native\ModuleData\voice_definitions.xml

Voice Type Declarations

Before a voice can be used, it must be declared under the <voice_type_declarations> section. This registers the voice type so it can be referenced later.

<voice_type_declarations>
  <voice_type name="Grunt" />
</voice_type_declarations>

Here, the voice type Grunt is declared and can now be used in one or more definitions.

Voice Definitions

Once a type is declared, a voice definition links it to an FMOD event path and defines metadata such as pitch variation and usage restrictions.

<voice_definition
  name="male_01"
  sound_and_collision_info_class="human"
  only_for_npcs="true"
  min_pitch_multiplier="0.9"
  max_pitch_multiplier="1.1">

  <voice
    type="Grunt"
    path="event:/voice/combat/male/01/grunt"
    face_anim="grunt" />

</voice_definition>

Definition Attributes

  name: Unique identifier for the voice definition
  sound_and_collision_info_class: Class of entity this voice belongs to (e.g. human, horse)
  only_for_npcs: If true, this voice cannot be assigned to the player
  min_pitch_multiplier / max_pitch_multiplier: Defines the pitch variation range
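
As an illustration, the pitch range might be applied per playback roughly like this. A uniform random draw is an assumption; the engine's actual distribution is not documented here, and the function name is hypothetical.

```python
import random

def pick_pitch_multiplier(min_pitch: float, max_pitch: float) -> float:
    """Pick a per-playback pitch multiplier within the configured range.

    Assumption: a uniform draw between the two multipliers; the real
    engine may weight or quantize this differently.
    """
    return random.uniform(min_pitch, max_pitch)

# Values from the male_01 definition above.
pitch = pick_pitch_multiplier(0.9, 1.1)
assert 0.9 <= pitch <= 1.1
```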

Voice Node Details

Inside a <voice_definition>, one or more <voice> entries can be added:

  type: Must match a declared <voice_type>
  path: FMOD event path triggered for this voice
  face_anim: Facial animation tag associated with this voice (e.g. grunt)

Workflow Summary

  1. Add a new <voice_type> under <voice_type_declarations>
  2. Create a <voice_definition> with a name, class, restrictions, and pitch range
  3. Add one or more <voice> elements linking the definition to FMOD event paths
  4. Assign face_anim for animation syncing

Physics Materials

The physics system manages all material-to-material interactions in the game and determines which sound events are triggered when collisions occur. Instead of handling each case independently, all physical interactions — wood striking metal, stone colliding with a projectile, metal on metal — are processed in a single unified system. This keeps logic consistent, reduces duplication, and makes it easier to maintain and expand audio coverage as the game grows.

Material Interaction Handling

Each material type is defined in the physics system along with its interaction rules when colliding with other materials. For example:

  Wood vs. Metal: A metallic clash with resonant overtones
  Stone vs. Missile: A sharp, brittle impact sound
  Wood vs. Stone: A dull, muted collision

When two materials collide, the system evaluates their properties and selects the correct FMOD event to play. This removes the need to hard-code every possible pairwise interaction and ensures audio scales automatically as new materials are introduced.
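
A minimal sketch of such a pairwise lookup, using order-independent material pairs. The material names and event paths below are invented for illustration; the real table is engine-internal.

```python
from typing import Optional

# Hypothetical material-pair lookup; keys are unordered pairs so that
# (wood, metal) and (metal, wood) resolve to the same event.
IMPACT_EVENTS = {
    frozenset({"wood", "metal"}): "event:/impact/metal_clash",
    frozenset({"stone", "missile"}): "event:/impact/stone_crack",
    frozenset({"wood", "stone"}): "event:/impact/dull_thud",
}

def impact_event(material_a: str, material_b: str) -> Optional[str]:
    """Return the FMOD event for a material pairing, or None if unmapped."""
    return IMPACT_EVENTS.get(frozenset({material_a, material_b}))

assert impact_event("metal", "wood") == "event:/impact/metal_clash"
```

Because lookups are keyed by the pair rather than hard-coded per case, adding a new material only means adding new rows, which is the scaling property described above.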

Special Case: Water Interactions

Water requires unique treatment compared to solid materials. Unlike other pairings, “object + water” doesn’t need a dedicated event for every possible object type. Regardless of what enters the water, the player primarily perceives it as a water sound — not as “body hitting water” or “stone hitting water.”

To simplify implementation, the system uses a single generic water event parameterized by the physical properties of the object entering the water:

  Volume: Determines how large the displaced splash should sound
  Mass: Scales the weight and depth of the impact
  Force (velocity/acceleration): Controls the intensity and sharpness of the splash

By combining these variables, the system can produce a wide range of splash sounds from the same event — from a small pebble skipping across the surface to a heavy boulder crashing into deep water.
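
A rough sketch of how those three physical quantities could be normalized into parameters for a single water event. The reference ranges and the linear clamping below are invented for illustration; the actual scaling curves are not documented here.

```python
def splash_parameters(volume_m3: float, mass_kg: float, speed_ms: float) -> dict:
    """Map an object's physical properties to 0..1 event parameters.

    Assumption: simple linear normalization against invented reference
    values (~1 m^3 of displaced volume, ~500 kg, momentum as a proxy
    for impact force).
    """
    def clamp01(x: float) -> float:
        return max(0.0, min(1.0, x))

    return {
        "volume": clamp01(volume_m3 / 1.0),
        "mass": clamp01(mass_kg / 500.0),
        "force": clamp01(mass_kg * speed_ms / 5000.0),
    }

pebble = splash_parameters(0.0001, 0.2, 5.0)
boulder = splash_parameters(0.8, 450.0, 12.0)
assert pebble["mass"] < boulder["mass"]
```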

The main benefit of this approach is that no dedicated event is needed per object type: one parameterized water event covers everything that can enter the water, and it scales automatically as new objects are introduced.


Priority System

This system runs before FMOD’s built-in priority system, acting as a pre-filter to reduce the CPU cost of evaluating large numbers of events. It applies its own rules to decide which sounds are eligible, and only those are passed on to FMOD for playback. FMOD’s internal priority system still applies once events are submitted, so you can continue using it to refine prioritization among active events.

Core Mechanism

The system maintains an array whose maximum size corresponds to the maximum number of events it’s allowed to manage at once. Each candidate event is evaluated against a priority scheme based on:

  Length: Duration of the event, used to determine whether short transient sounds or long events should be favored
  Distance: Spatial distance between emitter and listener. If this exceeds the event's defined max distance in FMOD, the sound is culled immediately and never passed to FMOD
  PriorityMultiplier: A user-defined multiplier that adjusts the relative importance of an event compared to others

These parameters are stored in the SEDF (Sound Event Data File) for every event, ensuring consistent evaluation.
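
The selection step can be sketched as follows. The scoring formula, the array size, and all names here are assumptions for illustration, not the engine's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    event: str
    length: float               # seconds, from the SEDF
    distance: float             # metres between emitter and listener
    max_distance: float         # event's max audible distance in FMOD
    priority_multiplier: float  # user-defined relative importance

MAX_MANAGED_EVENTS = 4  # illustrative cap; the real array size is engine-defined

def select_events(candidates: list) -> list:
    """Pre-filter candidates before they are handed to FMOD.

    Events beyond their max distance are culled outright; the rest are
    ranked by an invented score favouring close, short, high-priority
    sounds, and only the top slots survive.
    """
    audible = [c for c in candidates if c.distance <= c.max_distance]

    def score(c: Candidate) -> float:
        proximity = 1.0 - c.distance / c.max_distance
        return c.priority_multiplier * proximity / max(c.length, 0.05)

    audible.sort(key=score, reverse=True)
    return audible[:MAX_MANAGED_EVENTS]
```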

Distance Filtering

Events that fall outside their maximum audible distance are killed before FMOD ever processes them. This prevents FMOD from spending resources evaluating voices that can’t be heard.

Exceptions and Looping Events

Looping sounds need special handling: unlike one-shots, a looped event that is simply killed when it leaves audible range would lose its playback position and restart from the beginning the next time it becomes audible.

To handle this, looped sounds can be virtualized instead of killed. Virtualization lets the event stop consuming CPU while still tracking its playback position, so when it returns to audible range it continues seamlessly from where it left off.
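
A toy model of virtualization, assuming the virtual playhead is advanced by elapsed time while no audio is produced. The class and its behavior are illustrative only.

```python
class LoopingEvent:
    """Sketch of a virtualized loop: the playhead keeps advancing even
    while the event is inaudible, so re-entering range resumes seamlessly."""

    def __init__(self, loop_length: float):
        self.loop_length = loop_length  # seconds per loop cycle
        self.position = 0.0             # current playhead, in seconds
        self.virtual = False

    def tick(self, dt: float, distance: float, max_distance: float):
        # Advance the playhead regardless of audibility.
        self.position = (self.position + dt) % self.loop_length
        self.virtual = distance > max_distance
        # While virtual, nothing is rendered; otherwise report the playhead.
        return None if self.virtual else self.position
```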

User Property Overrides

User Properties can override the priority behavior described above on a per-event basis, providing additional flexibility where the default SEDF values aren't appropriate (see User Properties).


Battle Ambient Sound System

The Battle Ambient Sound System — abbreviated as BASS — is designed to optimize the playback of high-density sound effects, such as multiple sword clashes occurring in the same area. Instead of letting each individual sound play independently (which can quickly overwhelm both the mix and the CPU), BASS aggregates these events and replaces them with a single emitter that represents the group.

How It Works

Within a defined region, BASS monitors the number of sound events triggered over a given period. When the count exceeds a specified threshold, the system suppresses the individual sounds in that region and spawns a single group emitter event in their place.

This emitter event includes a parameter reflecting the density of sounds in the area. As the parameter increases, the mix shifts from sparse individual clashes to dense, chaotic battle noise. Players still perceive an intense soundscape, but without the overhead of dozens of overlapping one-shots.
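
A sketch of the thresholding logic, counting events in a sliding time window per region. The threshold, window length, and return values are invented; the real region size and counts are engine-defined.

```python
from collections import deque

class BassRegion:
    """Sketch of BASS aggregation: count events in a sliding window and,
    past a threshold, suppress one-shots in favour of a group emitter."""

    def __init__(self, threshold: int = 10, window: float = 1.0):
        self.threshold = threshold    # events per window that trip grouping
        self.window = window          # seconds
        self.timestamps = deque()

    def on_event(self, now: float):
        self.timestamps.append(now)
        # Drop events that have fallen out of the window.
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        density = len(self.timestamps)
        if density >= self.threshold:
            # Density also drives the emitter's mix parameter.
            return ("group_emitter", density)
        return ("play_one_shot", density)
```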

Interaction with the Priority System

BASS operates before our proprietary engine’s priority system. Normally, the priority system decides which sounds to cull when many voices compete for playback. In this workflow, BASS preemptively reduces polyphony at the source level — only the grouped emitter remains active, meaning the priority system has fewer voices to manage. This prevents unnecessary CPU/DSP usage and ensures consistent sound quality during crowded combat.

BASS Group IDs

Use the battle_ambient_group_id user property (see User Properties) to assign an event to a BASS group.

  0: Infantry move
  1: Cavalry move
  2: Weapons fight
  3: Shield fight
  4: Missile volley
  5: Human fight
  6: Human victory
  7: Horse vocal

Examples:

  - Adding a new weapon event → set battle_ambient_group_id to 2
  - Adding a footstep event for a new mount type → set battle_ambient_group_id to 1

By assigning the correct group ID, BASS fires one unified event per group instead of hundreds of individual ones, significantly reducing active event count and saving system resources.
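
For tooling or sanity checks, the group table above can be mirrored as a simple lookup. The helper below is hypothetical and not part of the engine; it just fails fast on an ID outside the documented range.

```python
# Mirrors the documented BASS group table.
BASS_GROUPS = {
    0: "Infantry move", 1: "Cavalry move", 2: "Weapons fight",
    3: "Shield fight", 4: "Missile volley", 5: "Human fight",
    6: "Human victory", 7: "Horse vocal",
}

def validate_group_id(battle_ambient_group_id: int) -> str:
    """Return the group name for an ID, raising on unknown values."""
    if battle_ambient_group_id not in BASS_GROUPS:
        raise ValueError(f"unknown BASS group id {battle_ambient_group_id}")
    return BASS_GROUPS[battle_ambient_group_id]

assert validate_group_id(2) == "Weapons fight"
```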



Performance and Optimization

In a CPU- and memory-intensive title like Mount & Blade II: Bannerlord, FMOD must be treated as part of the overall performance budget. Poorly managed audio banks, codecs, or event design can consume resources the game engine needs for everything else. Optimization is not a one-off task — it’s an ongoing discipline throughout the entire modding process.

Codec and Asset Strategy

Every decoded sample costs CPU, every oversized file costs memory and disk I/O, and every resample costs even more CPU. Codec strategy must be set deliberately, not left to chance.

FMOD lets you set a default codec for the entire audio build in the bank build settings (ensuring consistency across the project) and override codecs per asset when specific files demand different treatment.

FMOD Studio’s practical PC formats are:

  Vorbis: Best compression, but higher decode cost. Best overall choice when CPU is available
  ADPCM: Larger than Vorbis but almost free to decode. Good fallback when Vorbis is too CPU-heavy
  PCM: No compression, no decode cost. Can quickly bloat memory; avoid unless strictly necessary

Load Types

Each asset has a Load Type setting that determines how it lives in memory:

  Compressed: Loaded in compressed form and decompressed at playback. Less memory, slightly more CPU at play time
  Decompressed: Decompressed into PCM when banks load, not at playback. Not recommended unless you have a specific need (e.g. very low-spec devices)
  Streaming: Loaded in chunks from disk as needed. Good for long music and ambiences that aren't time-sensitive

Streaming and Bank Management

On PC, long assets like music or ambiences can balloon RAM usage if kept fully in memory. Streaming solves this by pulling audio in chunks from disk, but every stream adds CPU overhead and disk I/O pressure. Stream too many files at once and you risk stutters and thread contention, especially on HDDs or low-end systems. Stream only what needs it.

Banks are containers for events, assets, and metadata — they determine what FMOD loads into RAM at runtime. Poorly structured banks leave unnecessary assets in memory, extend load times, and create streaming bottlenecks. Each event should live in a single bank; the same event should never appear in multiple banks unless there’s an unavoidable design requirement.

Event, DSP, and Channel Management

Every event generates one or more voices, each consuming CPU and memory. Stereo, quad, or 5.1 events multiply the number of channels being mixed. Combine this with DSP on each path and costs grow fast. If you don’t actively manage instances, channels, and DSP chains, your mix can choke the CPU long before your assets fill memory.

Controlling Event Instances and Channels

Although we already handle this with our custom priority system, you can set max instance limits on heavily triggered events if needed. For each event, define a sensible polyphony cap and choose a stealing rule:

  Oldest: Cuts off the instance that has been playing the longest
  Newest: Blocks the most recently triggered instance from starting
  Quietest: Stops the currently lowest-volume instance
  Virtualize: Suspends an instance into a no-CPU “virtual” state while keeping its timeline active. Useful for VO and music
  None: Prevents new instances from starting if the max is already reached
  Furthest: Stops the instance furthest from the listener
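
The "Oldest" rule, for example, can be modeled as a simple queue operation. This is a sketch of the behavior, not FMOD's implementation; the function name and list representation are assumptions.

```python
from typing import Optional

def trigger_with_oldest_stealing(active: list, new_event: str,
                                 max_instances: int) -> Optional[str]:
    """Apply the 'Oldest' stealing rule when an event is triggered.

    `active` is a list of playing instances ordered oldest-first. When
    the polyphony cap is reached, the longest-playing instance is
    stopped to make room for the new one.
    """
    stolen = None
    if len(active) >= max_instances:
        stolen = active.pop(0)  # the oldest instance is cut off
    active.append(new_event)
    return stolen

voices = ["sword_01", "sword_02", "sword_03"]
assert trigger_with_oldest_stealing(voices, "sword_04", 3) == "sword_01"
assert voices == ["sword_02", "sword_03", "sword_04"]
```
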

Keep channel counts realistic: every extra channel means extra mixing, panning, and potential DSP. Don’t waste channels on one-shots that will collapse to mono in most contexts anyway.

Managing DSP Load

DSP processing is the heaviest part of runtime audio performance. Effects like convolution reverb, FFT filters, and multiband dynamics consume significant CPU — often more than their audible benefit justifies. Apply DSP only when it provides clear value, such as shaping the overall mix in a way that can’t be pre-baked. For common one-shot events, keep them clean or embed processing directly into the audio file.

Channel count multiplies DSP cost: a stereo convolution reverb is twice the work of mono, and a quad instance multiplies it further. Use mono effects wherever possible.

Bus and Routing

Bus hierarchies should stay functional and minimal. Group major categories under dedicated sub-buses only where it serves a clear purpose. Use sends and sidechains intentionally — every extra routing hop adds to channel mixing and DSP load. Consolidate DSP chains at the bus level instead of stacking them across many individual events. Snapshots can then modify the properties of a single DSP effect across multiple states, significantly cutting down on total DSP instances required.

Using Baked Effects

If a sound always needs coloration (EQ, distortion, static reverb), bake it into the source file. Reserve runtime DSP for genuinely interactive processing: parameter-driven filters, ducking, pitch modulation, or environmental reverb zones that change dynamically. This prevents the slow creep of redundant DSP costs.

Importing Assets

Keep assets lean before they reach FMOD.

While FMOD encodes assets into its own formats at build time, the source format you import still affects workflow. Using .wav for everything bloats the project since .wav files are uncompressed. Switching to .flac for source assets can drastically reduce disk footprint while preserving lossless quality — this doesn’t affect the final encoded format (PCM, Vorbis, ADPCM, etc.), but makes asset management lighter and faster.

Testing for Edge Cases

Always test edge conditions: thousands of agents in combat, long campaign sessions, or rapid scene transitions. These scenarios reveal memory leaks, mismanaged banks, or runaway event counts before they become real-world problems reported by players.