Audio Processing: Converting Stereo to Mono, PCM 16-bit, and Resampling to 16kHz

Converting audio files to a specific format is a common task. Whether you’re preparing audio for machine learning models (many speech models expect 16kHz mono input), optimizing for streaming, or ensuring compatibility with various devices, it helps to understand what each conversion step does to the samples. In this blog post, we’ll walk through the steps to convert stereo audio to mono, convert it to PCM 16-bit little-endian format, and resample it to 16kHz. We’ll also provide a practical implementation using Node.js and the wav and sox-audio libraries.

Step-by-Step Guide

1. Convert Stereo to Mono

If the input audio is in stereo, the first step is to convert it to mono. This can be achieved by averaging the left and right channels:

\[x_{\text{mono}}(t) = \frac{\text{Left}(t) + \text{Right}(t)}{2}\]
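To make the averaging concrete, here is a minimal sketch of the downmix in JavaScript. It assumes interleaved floating-point samples in the order left, right, left, right; the function name stereoToMono is hypothetical and not part of any library.

```javascript
// Downmix interleaved stereo samples [L0, R0, L1, R1, ...] to mono.
// Assumption: samples are floats in [-1, 1]; the name is illustrative only.
function stereoToMono(interleaved) {
  const frames = Math.floor(interleaved.length / 2);
  const mono = new Float32Array(frames);
  for (let i = 0; i < frames; i++) {
    // Average the left and right sample of each frame
    mono[i] = (interleaved[2 * i] + interleaved[2 * i + 1]) / 2;
  }
  return mono;
}

// Two frames: (0.5, -0.5) and (1.0, 0.5)
const mono = stereoToMono(new Float32Array([0.5, -0.5, 1.0, 0.5]));
console.log(Array.from(mono)); // values 0 and 0.75
```

Averaging (rather than summing) keeps the result inside the original range, so the downmix cannot clip.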

2. Convert to PCM 16-bit Little-Endian

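Next, we convert the mono audio to PCM 16-bit little-endian format. Assuming the samples are normalized floating-point values in the range [-1, 1], this involves scaling them to the signed 16-bit range, rounding to the nearest integer, and storing each value with its least significant byte first. Samples outside [-1, 1] should be clamped before scaling so they do not overflow: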

\[\text{PCM}_{\text{mono}}(t) = \text{round}(x_{\text{mono}}(t) \times 32767)\]
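The scaling step can be sketched with a Node.js Buffer, which writes little-endian 16-bit integers directly. This is an illustration under the assumption that the input is an array of floats nominally in [-1, 1]; the helper name floatToPcm16LE is made up for this example.

```javascript
// Convert float samples to a Buffer of signed 16-bit little-endian PCM.
// Assumption: inputs are nominally in [-1, 1]; anything outside is clamped.
function floatToPcm16LE(samples) {
  const out = Buffer.alloc(samples.length * 2); // 2 bytes per sample
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    out.writeInt16LE(Math.round(clamped * 32767), i * 2);
  }
  return out;
}

const pcm = floatToPcm16LE([0, 1, -1, 1.5]);
console.log(pcm.toString('hex')); // 0000ff7f0180ff7f
```

In the hex output, 32767 (0x7FFF) appears as ff 7f because the low byte is stored first, and the out-of-range 1.5 is clamped to the same value.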

3. Resample to 16kHz

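Finally, we resample the audio to a 16kHz sampling rate so that it is compatible with systems that require that rate. Here $F_s$ is the source sampling rate, $F_d$ is the target rate (16000), and $n$ is the index of the output sample. Because $\frac{n \cdot F_s}{F_d}$ is usually not an integer, the value at that position has to be interpolated from neighboring samples, and when downsampling the signal should be low-pass filtered first to avoid aliasing. Tools such as Sox handle both: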

\[y[n] = \text{PCM}_{\text{mono}}\left(\frac{n \cdot F_s}{F_d}\right)\]
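The index mapping in this formula can be illustrated with a simple linear-interpolation resampler. This is a teaching sketch only: it applies no anti-aliasing filter, so for real audio a dedicated resampler such as the one in Sox is the better choice. The function name resampleLinear is hypothetical.

```javascript
// Resample by evaluating the input at position n * sourceRate / targetRate,
// interpolating linearly between the two nearest samples.
// Assumption: no low-pass filtering is done, so downsampling may alias.
function resampleLinear(input, sourceRate, targetRate) {
  const outLength = Math.floor(input.length * targetRate / sourceRate);
  const out = new Float32Array(outLength);
  for (let n = 0; n < outLength; n++) {
    const pos = n * sourceRate / targetRate; // fractional source index
    const i = Math.floor(pos);
    const frac = pos - i;
    const next = Math.min(i + 1, input.length - 1); // clamp at the last sample
    out[n] = input[i] * (1 - frac) + input[next] * frac;
  }
  return out;
}

// 48 kHz -> 16 kHz: every third input sample is kept
const out = resampleLinear(new Float32Array([0, 1, 2, 3, 4, 5]), 48000, 16000);
console.log(Array.from(out)); // values 0 and 3
```

When the rate ratio is an integer, as in this example, the mapped positions land exactly on input samples; for a ratio such as 44100/16000 most positions fall between samples and the interpolation matters.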

Formula

\[y[n] = \text{round}\left(\left(\frac{\text{Left}\left(\frac{n \cdot F_s}{F_d}\right) + \text{Right}\left(\frac{n \cdot F_s}{F_d}\right)}{2}\right) \times 32767\right)\]

Explanation

Stereo to Mono: $\frac{\text{Left}(t) + \text{Right}(t)}{2}$
Resampling: $x\left(\frac{n \cdot F_s}{F_d}\right)$
PCM Conversion: $\text{round}(x(t) \times 32767)$
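
As a quick sanity check of the combined formula, here is a worked example with made-up values: a source rate of $F_s = 48000$, a target rate of $F_d = 16000$, and a source frame where $\text{Left} = 0.5$ and $\text{Right} = 0.25$. Output sample $n = 100$ maps to source position

\[\frac{n \cdot F_s}{F_d} = \frac{100 \cdot 48000}{16000} = 300\]

and, if that frame holds the values above, the output sample is

\[y[100] = \text{round}\left(\frac{0.5 + 0.25}{2} \times 32767\right) = \text{round}(12287.625) = 12288\]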

Implementation with Node.js, wav, and sox-audio

Now, let’s see how we can implement this process using Node.js and the wav and sox-audio libraries.

Prerequisites

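Make sure you have Node.js installed on your machine. The sox-audio package is a wrapper around the SoX command-line tool, so the sox executable must also be installed and available on your PATH. Then, install the necessary packages: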

npm install wav sox-audio
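
Since sox-audio shells out to the SoX command-line tool, it can be useful to check that the executable is reachable before processing anything. The following is a small sketch using only Node's built-in child_process module; the helper name isSoxAvailable is an assumption of this example, not part of sox-audio.

```javascript
const { spawnSync } = require('child_process');

// Returns true when the given executable can be launched and exits cleanly.
// Assumption: the SoX binary is named `sox` and accepts `--version`.
function isSoxAvailable(binary = 'sox') {
  const result = spawnSync(binary, ['--version'], { encoding: 'utf8' });
  // result.error is set (for example ENOENT) when the binary cannot be started
  return !result.error && result.status === 0;
}

if (!isSoxAvailable()) {
  console.warn('SoX was not found on the PATH; install it before running the examples.');
}
```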

Code Implementation

Here’s a function to process the audio file:

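const fs = require('fs');
const os = require('os');
const path = require('path');
const wav = require('wav');
const SoxCommand = require('sox-audio');

function processAudio(inputPath, outputPath, callback) {
  // Unique temp files so that concurrent calls do not overwrite each other
  const stamp = `${process.pid}_${Date.now()}_${Math.random().toString(36).slice(2)}`;
  const tempWavPath = path.join(os.tmpdir(), `temp_input_${stamp}.wav`);
  const tempMonoPath = path.join(os.tmpdir(), `temp_mono_${stamp}.wav`);

  // Delete the temp files (ignoring errors) before reporting the result
  const finish = (err, result) => {
    fs.unlink(tempWavPath, () => {});
    fs.unlink(tempMonoPath, () => {});
    callback(err, result);
  };

  // Convert input audio to 16-bit WAV using Sox, since the downmix step reads 16-bit samples
  const convertToWav = SoxCommand()
    .input(inputPath)
    .output(tempWavPath)
    .outputFileType('wav')
    .outputBits(16)
    .on('end', () => {
      console.log('Conversion to WAV finished.');
      processWav(tempWavPath, tempMonoPath, outputPath, finish);
    })
    .on('error', (err) => {
      console.error('Error converting to WAV:', err);
      finish(err);
    });

  convertToWav.run();
}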

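function processWav(inputPath, tempMonoPath, outputPath, callback) {
  // Read the input WAV file
  const reader = new wav.Reader();
  reader.on('error', callback);

  reader.on('format', (format) => {
    // The downmix below reads 16-bit samples, so reject anything else
    if (format.bitDepth !== 16) {
      callback(new Error(`Expected 16-bit PCM, got ${format.bitDepth}-bit`));
      return;
    }

    // Keep the source sample rate here; Sox does the resampling afterwards
    const writer = new wav.FileWriter(tempMonoPath, {
      channels: 1,
      sampleRate: format.sampleRate,
      bitDepth: 16
    });
    writer.on('error', callback);

    // Check if the audio is stereo
    if (format.channels === 2) {
      let leftover = Buffer.alloc(0); // bytes of a frame split across chunks
      reader.on('data', (chunk) => {
        const data = Buffer.concat([leftover, chunk]);
        const usable = data.length - (data.length % 4); // 4 bytes per stereo frame
        const monoData = Buffer.alloc(usable / 2);
        for (let i = 0; i < usable; i += 4) {
          const left = data.readInt16LE(i);
          const right = data.readInt16LE(i + 2);
          monoData.writeInt16LE(Math.round((left + right) / 2), i / 2);
        }
        leftover = data.subarray(usable);
        writer.write(monoData);
      });
      reader.on('end', () => writer.end());
    } else {
      reader.pipe(writer);
    }

    // Wait until FileWriter has rewritten the WAV header with the final length
    writer.on('done', () => {
      // Resample to 16kHz, 16-bit signed PCM using Sox
      const command = SoxCommand()
        .input(tempMonoPath)
        .output(outputPath)
        .outputSampleRate(16000)
        .outputEncoding('signed-integer')
        .outputBits(16)
        .on('end', () => {
          console.log('Audio processing finished.');
          callback(null, outputPath);
        })
        .on('error', (err) => {
          console.error('Error processing audio:', err);
          callback(err);
        });

      command.run();
    });
  });

  fs.createReadStream(inputPath).pipe(reader);
}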
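
After a run, it is worth confirming that the output really is 16kHz, mono, 16-bit PCM. One way to do that, sketched below, is to read the fmt chunk from the WAV header directly. The helper readWavFormat is hypothetical and assumes a standard RIFF/WAVE layout; it does not handle every WAV variant.

```javascript
const fs = require('fs');

// Walk the RIFF chunks of a WAV buffer and return the fields of the 'fmt ' chunk.
// Assumption: a plain RIFF/WAVE file; RF64 and other variants are not handled.
function readWavFormat(buffer) {
  if (buffer.toString('ascii', 0, 4) !== 'RIFF' ||
      buffer.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE file');
  }
  let offset = 12; // first chunk starts after the 12-byte RIFF header
  while (offset + 8 <= buffer.length) {
    const id = buffer.toString('ascii', offset, offset + 4);
    const size = buffer.readUInt32LE(offset + 4);
    if (id === 'fmt ') {
      return {
        audioFormat: buffer.readUInt16LE(offset + 8), // 1 means integer PCM
        channels: buffer.readUInt16LE(offset + 10),
        sampleRate: buffer.readUInt32LE(offset + 12),
        bitDepth: buffer.readUInt16LE(offset + 22)
      };
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even size
  }
  throw new Error('No fmt chunk found');
}

// Only inspect the file if a previous run has produced it
if (fs.existsSync('output.wav')) {
  console.log(readWavFormat(fs.readFileSync('output.wav')));
}
```

For the pipeline in this post, the expected result is audioFormat 1, channels 1, sampleRate 16000, and bitDepth 16.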

Usage

To use the processAudio function, call it with the input and output file paths. Note that reading MP3 input requires a SoX build with MP3 support:

processAudio('input.mp3', 'output.wav', (err, outputPath) => {
  if (err) {
    console.error('Failed to process audio:', err);
  } else {
    console.log('Processed audio saved to:', outputPath);
  }
});

Conclusion

In this blog post, we’ve covered the steps to convert stereo audio to mono, convert it to PCM 16-bit little-endian format, and resample it to 16kHz. We’ve also provided a practical implementation using Node.js and the wav and sox-audio libraries. This pipeline is useful in many applications, including audio preprocessing for machine learning, optimizing audio for streaming, and ensuring compatibility with different devices. Happy coding!
