Last modified: Apr 16, 2026

Combine Video and Audio with Python FFmpeg

Merging video and audio is a common task. You might have a silent video clip and a separate audio track. Combining them creates a complete media file.

Python, with the FFmpeg tool, makes this process simple. This guide will show you how. We will cover installation, basic commands, and error handling.

What is FFmpeg?

FFmpeg is a powerful command-line tool. It handles multimedia data. You can use it to convert, stream, and edit audio and video.

It supports almost every format. This makes it incredibly versatile. We will use it from within a Python script.

Setting Up Your Environment

You need two things: FFmpeg installed on your system, and Python's subprocess module. The subprocess module is part of the Python standard library, so there is nothing extra to install for it.

To install FFmpeg, visit the official website. Follow the instructions for your operating system. On many systems, you can use a package manager.

For example, on Ubuntu or Debian, you can run this command in your terminal.


sudo apt update && sudo apt install ffmpeg

On macOS, you can use Homebrew.


brew install ffmpeg

After installation, verify it works. Open a terminal and type:


ffmpeg -version

You should see version information. This confirms FFmpeg is ready.
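Because the Python code later in this guide calls the ffmpeg executable by name, it can help to run the same check from Python. The following is a minimal sketch using the standard library's shutil.which. The function name find_ffmpeg is an illustrative choice, not part of any library.

```python
import shutil


def find_ffmpeg(executable="ffmpeg"):
    """Return the full path to the FFmpeg executable, or None if it is not on PATH."""
    return shutil.which(executable)


path = find_ffmpeg()
if path is None:
    print("FFmpeg was not found on PATH. Install it before running the merge scripts.")
else:
    print(f"FFmpeg found at: {path}")
```

If this prints the "not found" message even though ffmpeg -version works in your terminal, the Python process is probably running with a different PATH than your shell.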

The Basic FFmpeg Merge Command

The core command to merge files is straightforward. You specify an input video and an input audio file. Then you tell FFmpeg how to handle each stream on its way to the output file: here, the video is copied as-is and the audio is encoded to AAC.

Here is the basic structure.


ffmpeg -i video.mp4 -i audio.mp3 -c:v copy -c:a aac output.mp4

Let's break it down. The -i flag specifies an input file. We use it twice for video and audio.

The -c:v copy tells FFmpeg to copy the video stream without re-encoding. This is fast and preserves quality.

The -c:a aac sets the audio codec to AAC, a common format for MP4 files. Finally, output.mp4 is the target file.

Using Python to Run FFmpeg

You can run shell commands from Python. The subprocess module is perfect for this. It lets you run FFmpeg as if you were in the terminal.

Here is a simple Python function to combine video and audio.


import subprocess

def combine_video_audio(video_path, audio_path, output_path):
    """
    Combines a video file and an audio file into a single MP4 file.
    
    Args:
        video_path (str): Path to the input video file (e.g., 'my_video.mp4').
        audio_path (str): Path to the input audio file (e.g., 'my_audio.mp3').
        output_path (str): Path for the output video file (e.g., 'final_video.mp4').
    """
    # Construct the FFmpeg command
    command = [
        'ffmpeg',
        '-i', video_path,
        '-i', audio_path,
        '-c:v', 'copy',       # Copy the video stream
        '-c:a', 'aac',        # Encode audio to AAC
        '-shortest',          # Finish when the shortest input ends
        output_path
    ]
    
    # Run the command
    try:
        subprocess.run(command, check=True, capture_output=True, text=True)
        print(f"Success! Output saved to: {output_path}")
    except subprocess.CalledProcessError as e:
        print("An error occurred while processing:")
        print(f"STDOUT: {e.stdout}")
        print(f"STDERR: {e.stderr}")

# Example usage
combine_video_audio('silent_video.mp4', 'background_music.mp3', 'final_video.mp4')

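The key function here is subprocess.run(). It executes our command list directly, without going through a shell. The check=True argument makes Python raise subprocess.CalledProcessError if FFmpeg exits with a non-zero status.

The capture_output=True argument captures FFmpeg's output, and text=True returns it as strings instead of bytes. FFmpeg writes its log messages to stderr, so e.stderr is usually the most helpful part when debugging. We also added the -shortest flag. It ensures the output ends when the shortest input (video or audio) ends.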
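One practical detail: if the output file already exists, FFmpeg normally asks before overwriting it. When output is captured, that prompt is not visible, so the script may appear to hang. The sketch below separates building the command from running it and adds an optional -y flag, which tells FFmpeg to overwrite without asking. The helper name build_merge_command and its overwrite parameter are assumptions made for this example.

```python
import shlex


def build_merge_command(video_path, audio_path, output_path, overwrite=False):
    """Return the FFmpeg argument list for merging one video and one audio file."""
    command = ["ffmpeg"]
    if overwrite:
        # -y tells FFmpeg to overwrite an existing output file without asking.
        command.append("-y")
    command += [
        "-i", video_path,
        "-i", audio_path,
        "-c:v", "copy",   # Copy the video stream
        "-c:a", "aac",    # Encode audio to AAC
        "-shortest",      # Finish when the shortest input ends
        output_path,
    ]
    return command


command = build_merge_command("silent_video.mp4", "background_music.mp3", "final_video.mp4")
# shlex.join quotes arguments the way a POSIX shell would, which is handy for logging.
print(shlex.join(command))
```

Keeping the command in its own function also makes it easy to print or log exactly what will be run before handing the list to subprocess.run().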

Handling Different Audio and Video Lengths

Your audio and video might not be the same length. This is common. FFmpeg provides flags to manage this.

We already used -shortest, which trims the output to the length of the shorter input. A related flag is -map. It does not change the length, but it gives you precise control over which streams go to the output.

Here is a more explicit command using -map.


ffmpeg -i video.mp4 -i audio.mp3 -c:v copy -c:a aac -map 0:v:0 -map 1:a:0 -shortest output.mp4

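-map 0:v:0 means: from the first input (0), take the first video stream (v:0). -map 1:a:0 means: from the second input (1), take the first audio stream (a:0). Without -map, FFmpeg chooses streams automatically, so if the video file already has its own audio track, that track may end up in the output instead of your new audio. Explicit mapping avoids this.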
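To find out which streams a file contains before choosing -map values, you can ask ffprobe, the inspection tool that ships with FFmpeg. The following is a rough sketch: the function names parse_streams and list_streams are made up for this example, and the sample JSON is hand-written to resemble ffprobe's output rather than copied from a real run.

```python
import json
import subprocess


def parse_streams(ffprobe_json):
    """Turn ffprobe's JSON output into a list of (index, codec_type) tuples."""
    data = json.loads(ffprobe_json)
    return [(s["index"], s["codec_type"]) for s in data.get("streams", [])]


def list_streams(path):
    """Run ffprobe on a file and return its streams as (index, codec_type) tuples."""
    command = [
        "ffprobe",
        "-v", "error",                               # Only print errors
        "-show_entries", "stream=index,codec_type",  # Limit output to these fields
        "-of", "json",                               # Emit machine-readable JSON
        path,
    ]
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return parse_streams(result.stdout)


# Hand-written sample in the shape ffprobe produces (not real output):
sample = '{"streams": [{"index": 0, "codec_type": "video"}, {"index": 1, "codec_type": "audio"}]}'
print(parse_streams(sample))
```

A file that reports both a video and an audio stream, like the sample here, is exactly the case where explicit -map arguments matter.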

Common Errors and Solutions

You might encounter errors. Let's look at two common ones.

Error 1: "Invalid data found when processing input"

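This usually means FFmpeg could not parse the file. It may be corrupt, truncated, or not actually in the format its extension suggests. A wrong file path typically produces a different message, "No such file or directory". In either case, double-check your file paths in Python and make sure the files exist.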
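A quick check before calling FFmpeg can separate path problems from file-content problems. Here is a minimal sketch using pathlib; missing_inputs is a hypothetical helper name, and the file names are the ones from the earlier example.

```python
from pathlib import Path


def missing_inputs(*paths):
    """Return the input paths that do not point to an existing file."""
    return [str(p) for p in paths if not Path(p).is_file()]


missing = missing_inputs("silent_video.mp4", "background_music.mp3")
if missing:
    print(f"Missing input files: {missing}")
else:
    print("All input files exist.")
```

If this reports no missing files and FFmpeg still fails, the problem is more likely in the file contents than in the paths.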

Error 2: "Codec not supported"

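Each container format supports only certain codecs, and the exact wording of this error varies between FFmpeg versions. For MP4, AAC audio is a safe choice. If you get this error, pick an audio codec that the output container supports, for example libmp3lame to write MP3 audio into an AVI file.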


ffmpeg -i video.avi -i audio.wav -c:v copy -c:a libmp3lame output.avi
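If a script writes to more than one container type, the codec choice can be made from the output file's extension. The sketch below is illustrative only. The mapping covers just the two containers discussed in this guide, and the helper name pick_audio_codec and its fallback to AAC are assumptions, not FFmpeg behavior.

```python
from pathlib import Path

# Illustrative mapping only; it covers just the two containers discussed in this guide.
AUDIO_CODEC_BY_EXTENSION = {
    ".mp4": "aac",
    ".avi": "libmp3lame",
}


def pick_audio_codec(output_path, default="aac"):
    """Choose an audio encoder name based on the output file's extension."""
    extension = Path(output_path).suffix.lower()
    return AUDIO_CODEC_BY_EXTENSION.get(extension, default)


print(pick_audio_codec("output.mp4"))   # aac
print(pick_audio_codec("OUTPUT.AVI"))   # libmp3lame
```

The returned name can be used as the value after '-c:a' in the command list.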

Advanced: Processing Audio Before Merging

Sometimes you need to edit the audio first. You might want to normalize volume or trim silence. You can chain FFmpeg commands or use a Python audio library.

For more on audio manipulation, see our Python Audio Processing Guide for Beginners. It covers basic editing concepts.

After processing, you can merge the cleaned audio with your video using the same combine_video_audio function.
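As one example of a pre-processing step, FFmpeg's loudnorm audio filter can normalize loudness before the merge. This is a minimal sketch that uses the filter with its default settings. The function names and the output file name are assumptions for illustration.

```python
import subprocess


def build_normalize_command(audio_in, audio_out):
    """Return an FFmpeg command that applies the loudnorm filter to an audio file."""
    return [
        "ffmpeg",
        "-i", audio_in,
        "-af", "loudnorm",  # Loudness normalization with the filter's default settings
        audio_out,
    ]


def normalize_audio(audio_in, audio_out):
    """Run the normalization command and raise CalledProcessError on failure."""
    subprocess.run(build_normalize_command(audio_in, audio_out), check=True)


print(build_normalize_command("voiceover.mp3", "voiceover_normalized.wav"))
```

The normalized file can then be passed as audio_path to combine_video_audio.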

To explore the tools available, check out our guide on Python Audio Libraries: Play, Record, Process.

Practical Example: Adding a Voiceover

Let's walk through a real example. You have a screen recording (screen_recording.mp4) with system sounds. You recorded a separate voiceover (voiceover.mp3). You want the voiceover to replace the original audio.

The command uses -map so that only the new audio is included. The original audio stream is left out of the output.


import subprocess

def replace_audio(video_path, audio_path, output_path):
    """
    Replaces the audio stream in a video file with a new audio file.
    """
    command = [
        'ffmpeg',
        '-i', video_path,          # Input video
        '-i', audio_path,          # Input new audio
        '-c:v', 'copy',            # Copy video
        '-c:a', 'aac',             # Encode new audio
        '-map', '0:v:0',           # Take video from first input
        '-map', '1:a:0',           # Take audio from second input
        '-shortest',
        output_path
    ]
    
    subprocess.run(command, check=True)
    print(f"Audio replaced. Output: {output_path}")

# Use the function
replace_audio('screen_recording.mp4', 'voiceover.mp3', 'final_tutorial.mp4')

This script maps only the new audio to the output. The original audio from the video file is ignored.

Conclusion

Combining video and audio with Python and FFmpeg automates a common multimedia task. The subprocess module bridges Python and the FFmpeg command line.

Start with the basic merge function. Use the -shortest flag to handle different lengths. Use -map for precise control. Always check your file paths and codecs if errors occur.

This skill is useful for content creators, developers, and hobbyists. You can build it into larger applications for batch processing or video editing pipelines.
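As a starting point for batch processing, the sketch below pairs each video with an audio file of the same base name. Everything about it is an assumption for illustration: the helper name pair_by_stem, the folder names videos and audio, and the default extensions.

```python
from pathlib import Path


def pair_by_stem(video_dir, audio_dir, video_ext=".mp4", audio_ext=".mp3"):
    """Yield (video, audio) path pairs whose file names match apart from the extension."""
    video_dir = Path(video_dir)
    audio_dir = Path(audio_dir)
    if not video_dir.is_dir():
        return  # Nothing to pair if the video folder does not exist
    for video in sorted(video_dir.glob(f"*{video_ext}")):
        audio = audio_dir / (video.stem + audio_ext)
        if audio.is_file():
            yield video, audio


# Hypothetical folder names; replace them with your own.
for video, audio in pair_by_stem("videos", "audio"):
    print(f"Would merge {video} with {audio}")
```

Each pair can be passed to combine_video_audio, with str(video) and str(audio) as the first two arguments.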