Note: This guide was published on April 22nd 2026 and was last updated on April 23rd 2026.

Proximity voice chat in multiplayer games has become a genre-defining feature. Hearing a friend’s voice fade as they disappear around a corner, or listening to a teammate call out from across the map, makes a multiplayer game feel alive in a way that flat voice chat can’t.

In this guide, you will learn how to combine the Discord Social SDK’s voice chat with Unity’s 3D audio system to build proximity voice chat for a multiplayer game. Discord handles everything about the voice call itself; your game just needs to put the audio in the right place. While this guide focuses on Unity, the same concepts apply to any engine: you intercept the decoded audio from Discord and route it to your engine’s 3D audio system.

This guide assumes you have already integrated the Discord Social SDK into a multiplayer Unity project. If you need to get set up with the Social SDK, follow the Unity getting started guide first.

Why Discord Voice With the Social SDK

Building voice chat from scratch is an incredibly difficult problem in games. Noise suppression, echo cancellation, and codec optimization are deep technical challenges. Discord has spent years battle-testing all of this infrastructure at scale. When you use the Discord Social SDK for voice, your game’s voice chat becomes powered by Discord.

What your game needs to do is take the audio for each player and position it in 3D space. That’s where Unity comes in. By intercepting the audio that Discord would normally play through its default output, you can route it to per-player AudioSource components in your Unity scene instead. Unity handles all the spatial math: volume falloff based on distance, stereo panning based on direction, and any additional audio effects you want to layer on. The result is that players get Discord-quality voice that sounds like it’s coming from other players in the game world. No separate voice app needed. No complex audio networking code. Just Discord and Unity doing what each does best.

How It Works

Before diving into any code, it helps to understand the full architecture at a conceptual level. The proximity voice chat (spatial audio) pipeline has five stages.

1. Players Join a Lobby

Everything starts with a lobby in the Discord Social SDK. When players connect to a multiplayer session, they also join a shared Discord lobby managed by the Social SDK. The lobby tracks who is in the session and provides the foundation for the voice call.
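As a refresher from the Managing Lobbies guide, joining a shared lobby is a single call. This is a sketch, not the full flow: `client` is the Social SDK client from the getting started guide, and `lobbySecret` is a hypothetical shared secret your matchmaking distributes to session members.

```csharp
// Sketch: join (or create) the session's shared lobby.
// lobbySecret is a hypothetical value distributed by your own matchmaking.
client.CreateOrJoinLobby(lobbySecret, (ClientResult result, ulong lobbyId) =>
{
    if (result.Successful())
    {
        // Store the lobby ID — you'll need it to start the voice call later.
        currentLobbyId = lobbyId;
    }
});
```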

2. Starting a Voice Call with Audio Callbacks

The Social SDK lets you start a voice call for a lobby, but here is the key difference from a normal voice call: instead of calling Client::StartCall and letting Discord handle playback through the user’s default audio device, you call Client::StartCallWithAudioCallbacks. This function intercepts the audio pipeline and gives you a callback that fires every time Discord has decoded audio ready for a player.

3. Intercepting the Audio

When Discord receives and decodes voice audio from a remote player, it calls the callback in Client::StartCallWithAudioCallbacks. The callback in this function hands you the raw PCM audio data along with the user ID of the speaker. It also gives you an outShouldMute flag. Setting this to true tells Discord not to play the audio through its normal output and allows you to control where it gets played.

4. Routing Audio to a GameObject

Instead of letting Discord play the audio, you route the raw audio stream (PCM data) to a per-player AudioSource that lives on each player’s GameObject in your Unity scene. Each remote player has their own AudioSource positioned at their character’s location.

5. Unity Handles the Spatial Audio

With the AudioSource configured for full 3D spatial blending (spatialBlend = 1f), Unity automatically handles everything else. As players move around the scene, voices get louder when they are close, quieter when they are far, and pan left or right based on direction relative to the listener. Unity handles it all for you.

Prerequisites

Before starting with the implementation in this guide, you should have:
  • The Discord Social SDK integrated into a Unity project, with a working lobby that players can create and join. If you haven’t done this yet, follow the Unity getting started guide and the Managing Lobbies guide first.
  • A multiplayer game where remote players can join and move around in 2D or 3D space. The specific networking library you use doesn’t matter as long as player spawning and movement is handled.
To use voice features, your OAuth configuration must request the communication scopes returned by Client::GetDefaultCommunicationScopes. See the OAuth Scopes Core Concepts Guide for more details.

Players Join the Lobby

When a remote player joins the lobby, two things need to happen: your networking layer spawns their GameObject in the scene, and you register them in a dictionary that maps their Discord user ID to their VoiceAudioSource component (defined later in this guide). This dictionary is what lets you send each player’s audio to the correct GameObject. Declare the dictionary at the top of your class handling Social SDK callbacks:
private Dictionary<ulong, VoiceAudioSource> voiceSources = new Dictionary<ulong, VoiceAudioSource>();
Subscribe to Client::SetLobbyMemberAddedCallback to spawn the remote player and register them:
client.SetLobbyMemberAddedCallback((lobbyId, userId) =>
{
	// This is where you will spawn a remote player with your multiplayer framework
    GameObject playerObject = Instantiate(remotePlayerPrefab);
    VoiceAudioSource voiceSource = playerObject.GetComponentInChildren<VoiceAudioSource>();
    if (voiceSource != null)
    {
        voiceSources[userId] = voiceSource;
    }
});
Only register remote players, not the local player. The local player’s voice is captured by Discord and transmitted to others.
For the purpose of this guide, all of the player and voice setup is tied to a player joining a Social SDK lobby. In a production game you will likely tie your player’s lifecycle and voice to your own multiplayer session management system instead.
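The join path deserves a matching cleanup path. Below is a sketch of unregistering a player when they leave the lobby, assuming the same voiceSources dictionary and the SDK's member-removed callback, which mirrors the member-added one:

```csharp
client.SetLobbyMemberRemovedCallback((lobbyId, userId) =>
{
    // Stop routing this player's audio and despawn their object
    if (voiceSources.TryGetValue(userId, out VoiceAudioSource voiceSource))
    {
        voiceSources.Remove(userId);
        Destroy(voiceSource.gameObject);
    }
});
```

Without this, the dictionary keeps a reference to a destroyed component, and the audio callback will try to feed samples to a player who is no longer in the scene.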

Setting Up the Voice Call

When a player joins a lobby, start a voice call using Client::StartCallWithAudioCallbacks. Provide two callbacks: one for received audio (OnVoiceAudioReceived defined below), which you will use for spatial positioning, and one for outgoing audio, which you can leave empty since Discord handles microphone capture.
activeCall = client.StartCallWithAudioCallbacks(currentLobbyId, OnVoiceAudioReceived,
    (data, samplesPerChannel, sampleRate, channels) => { });

if (activeCall != null)
{
    activeCall.SetVADThreshold(false, -80f);
}
The call returns an activeCall object you can use to configure voice settings. Voice Activity Detection (VAD) is how Discord determines whether a player is speaking or silent; audio below the threshold is suppressed rather than transmitted. Calling Call::SetVADThreshold with a value of -80f sets a low detection threshold, which lets players whisper and still be heard. Lowering it further to -100f lets essentially all audio through, but you may hear keyboard clicks and other background noise. Raising it, or omitting the call entirely, uses a standard threshold tuned for regular speaking volume.

Intercepting Audio Per Player

With the voice call active, Discord will fire your OnVoiceAudioReceived callback every time it has decoded audio from a remote player. This is where you intercept Discord’s default playback and redirect audio to the correct AudioSource per player.
private void OnVoiceAudioReceived(ulong userId, System.IntPtr data, ulong samplesPerChannel, int sampleRate, ulong channels, ref bool outShouldMute)
{
    outShouldMute = true;

    if (voiceSources.TryGetValue(userId, out VoiceAudioSource voiceSource))
    {
        voiceSource.FeedSamples(data, samplesPerChannel, channels);
    }
}
The callback provides everything you need. The userId identifies the speaker, which you use to route the audio to the correct player object in your scene. data, samplesPerChannel, sampleRate, and channels together describe the raw PCM audio you’ll send to an AudioSource. Setting outShouldMute = true is important: it tells Discord to skip playing this audio through the player’s default audio device. Instead, you look up that player’s VoiceAudioSource component in the dictionary keyed by user ID and feed the raw samples directly to it.
outShouldMute lets you choose whether Discord should play the audio through its normal output or not. Setting it to true stops it from playing out of the default device and gives you full control to route the audio yourself, which is necessary for proximity voice chat. If you set it to false, Discord will play the audio through your players’ default audio device. In a full game it would make sense to set this to false while players are in a lobby so they can talk to each other and then true once they’re playing the game.
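That lobby-versus-in-game behavior could be sketched as a variant of the callback above, assuming a hypothetical isInMatch flag maintained by your own game-state code:

```csharp
private bool isInMatch; // hypothetical flag your game sets when a match starts

private void OnVoiceAudioReceived(ulong userId, System.IntPtr data, ulong samplesPerChannel,
    int sampleRate, ulong channels, ref bool outShouldMute)
{
    // In the lobby, let Discord play voice through the default device;
    // once in a match, take over playback for spatial positioning.
    outShouldMute = isInMatch;

    if (isInMatch && voiceSources.TryGetValue(userId, out VoiceAudioSource voiceSource))
    {
        voiceSource.FeedSamples(data, samplesPerChannel, channels);
    }
}
```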

The VoiceAudioSource Component

The VoiceAudioSource component is where Discord’s audio pipeline and Unity’s spatial audio system meet. It lives on each remote player’s GameObject alongside a Unity AudioSource, receives raw PCM audio from the Discord callback, buffers it in a thread-safe ring buffer, and feeds it to Unity’s audio engine through a streaming AudioClip.
using System;
using System.Runtime.InteropServices;
using UnityEngine;

/// <summary>
/// Receives raw PCM audio from the Discord Social SDK and plays it through a
/// spatial AudioSource on the same GameObject.
///
/// Call FeedSamples() from the Discord UserAudioReceivedCallback.
/// Unity's audio thread drains the ring buffer via the streaming AudioClip callback.
///
/// Add this component to a remote player GameObject with an AudioSource.
/// </summary>
[RequireComponent(typeof(AudioSource))]
public class VoiceAudioSource : MonoBehaviour
{
    private const int SampleRate = 48000;
    private const int RingBufferSamples = SampleRate * 2; // 2-second ring buffer
    private const float PcmNormalizationFactor = 1 / 32768f; // scaling factor for int16 to float conversion
    private const int FrameSamples = 960; // 20ms at 48kHz
    private const int MaxChannels = 2;
    private float[] _ringBuffer;
    private readonly short[] _shortBuffer = new short[FrameSamples * MaxChannels];
    private int _writePosition;
    private int _readPosition;
    private readonly object _lock = new object();

    private AudioSource _audioSource;

    void Awake()
    {
        _ringBuffer = new float[RingBufferSamples];

        _audioSource = GetComponent<AudioSource>();
        // Streaming mono clip — OnPCMRead is called by Unity's audio thread to pull samples
        _audioSource.clip = AudioClip.Create("VoiceClip", SampleRate, 1, SampleRate, true, OnPCMRead);
        _audioSource.loop = true;
        _audioSource.spatialBlend = 1f; // full 3D positioning
        _audioSource.Play();
    }

    // Feed raw int16 PCM samples received from the Discord audio callback.
    public void FeedSamples(IntPtr data, ulong samplesPerChannel, ulong channels)
    {
        if (data == IntPtr.Zero || samplesPerChannel == 0) return;

        int channelCount = (int)channels;
        int totalSamples = (int)samplesPerChannel * channelCount;

        // Guard against frames larger than the scratch buffer
        if (channelCount > MaxChannels || totalSamples > _shortBuffer.Length) return;

        Marshal.Copy(data, _shortBuffer, 0, totalSamples);

        lock (_lock)
        {
            for (int i = 0; i < (int)samplesPerChannel; i++)
            {
                // Mix down to mono for spatial playback
                float mono = 0f;
                for (int c = 0; c < channelCount; c++)
                {
                    mono += _shortBuffer[i * channelCount + c] * PcmNormalizationFactor;
                }
                mono /= channelCount;

                _ringBuffer[_writePosition] = mono;
                _writePosition = (_writePosition + 1) % RingBufferSamples;
            }
        }
    }

    // Called by Unity's audio thread to fill the next block of samples
    private void OnPCMRead(float[] data)
    {
        lock (_lock)
        {
            int available = (_writePosition - _readPosition + RingBufferSamples) % RingBufferSamples;

            for (int i = 0; i < data.Length; i++)
            {
                if (available > 0)
                {
                    data[i] = _ringBuffer[_readPosition];
                    _readPosition = (_readPosition + 1) % RingBufferSamples;
                    available--;
                }
                else
                {
                    data[i] = 0f; // silence when buffer is empty
                }
            }
        }
    }
}
Here is what this component does:
  • Awake creates a streaming AudioClip that Unity’s audio thread pulls samples from continuously. The AudioSource is configured with spatialBlend = 1f for full 3D positioning, meaning Unity will apply distance-based volume attenuation and stereo panning based on where this GameObject is relative to the AudioListener in the scene.
  • FeedSamples is called from OnVoiceAudioReceived which is hooked up to Client::StartCallWithAudioCallbacks. It takes the raw audio data from Discord, converts it to floating point, mixes multi-channel audio down to mono, and writes the samples into a ring buffer. The lock ensures thread safety between the Social SDK and Unity’s audio thread.
  • OnPCMRead is called by Unity’s audio thread whenever it needs more samples to play. It drains available samples from the ring buffer, or outputs silence if the buffer is empty (for example, when the player is not speaking).
The ring buffer is what makes this work. It bridges two completely different threading models: Discord’s audio callback thread, which pushes data in, and Unity’s audio thread, which pulls data out. The two-second buffer provides plenty of headroom to absorb timing differences between the two systems.
The spatialBlend = 1f setting on the AudioSource is what makes the audio spatial. Unity handles all of the 3D math automatically based on the GameObject’s position relative to the AudioListener (typically on the camera or local player). You can further customize the spatial behavior by adjusting the AudioSource’s 3D Sound Settings in the Inspector, including min/max distance, rolloff curve, and spread.

3D audio settings in the Unity AudioSource inspector
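The same settings can also be applied in code, for example right after the spatialBlend line in VoiceAudioSource.Awake. The values below are illustrative starting points for tuning, not recommendations from the SDK:

```csharp
// Illustrative 3D tuning — add after _audioSource.spatialBlend = 1f; in Awake()
_audioSource.minDistance = 2f;    // full volume within 2 units of the speaker
_audioSource.maxDistance = 25f;   // with linear rolloff, silent beyond 25 units
_audioSource.rolloffMode = AudioRolloffMode.Linear; // predictable falloff for gameplay
_audioSource.spread = 60f;        // soften hard left/right panning at close range
```

Linear rolloff is often easier to reason about for gameplay than Unity's default logarithmic curve, because the voice reaches silence exactly at maxDistance.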

Putting It All Together

Here is the full architecture from start to finish:
  1. A player joins a lobby and Client::StartCallWithAudioCallbacks starts the Discord voice call with your custom audio callback.
  2. Decoded audio arrives for a remote player and the OnVoiceAudioReceived callback fires.
  3. Audio data is routed to that player’s VoiceAudioSource component via FeedSamples().
  4. VoiceAudioSource sends the audio into Unity’s audio system through its ring buffer and streaming AudioClip.
  5. As players move around the scene, Unity automatically adjusts volume and stereo panning based on distance and direction.
Discord handles voice networking. Unity handles spatial audio. The VoiceAudioSource component bridges the two. Now you have a working proximity voice solution combining the power of the Discord Social SDK and Unity’s 3D audio system. From here you can integrate this into an existing game, or create your own!

Next Steps

Ready to go deeper? These guides cover other ways to build games with Discord: