The Sound & The Story - Mastering Audio & Video in Multimedia

 — #Computer Graphics #CS Basics

Hello again, multimedia explorers! After unraveling the secrets of text, images, and graphics in our previous post, it's time to tune our ears and focus our eyes on two of the most dynamic and engaging elements in the multimedia toolkit: Audio and Video.

From the subtle background music that sets the mood in a game to the high-definition video that brings a documentary to life, sound and moving pictures are powerful storytellers. They can evoke emotions, explain complex ideas, and create truly immersive experiences. So, let's turn up the volume and roll the cameras as we delve into the fascinating realms of audio and video!

Part 1: The Rhythms of Reality - All About Audio

Audio is more than just noise; it's a carefully crafted element that can make or break a multimedia project.

1. The Science of Sound: Nature of Sound Waves

At its most basic, sound is a wave – a vibration that travels through a medium, like air.

  • How it works: When an object vibrates (e.g., a guitar string, your vocal cords), it causes nearby air molecules to vibrate. These vibrations propagate outwards as longitudinal waves, creating areas of compression (higher pressure) and rarefaction (lower pressure).
  • Key Characteristics of Sound Waves:
    • Amplitude: The intensity or "loudness" of the sound. Higher amplitude means a louder sound.
    • Frequency: The number of vibrations per second, measured in Hertz (Hz). Frequency determines the pitch of the sound (how high or low it is). Humans typically hear frequencies from 20 Hz to 20,000 Hz (20 kHz).
    • Wavelength: The distance between two consecutive points of compression or rarefaction.
    • Timbre (Tone Quality/Color): What makes a trumpet sound different from a violin, even when playing the same note at the same loudness. It's determined by the complex mix of the fundamental frequency and its overtones (harmonics).
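
To make these characteristics concrete, here is a minimal sketch (assuming NumPy is installed) that generates a pure tone from an amplitude and a frequency, then layers overtones to approximate timbre:

```python
import numpy as np

SAMPLE_RATE = 44_100  # samples per second (explained in the digital audio section below)

def sine_tone(frequency_hz, amplitude, duration_s):
    """A pure tone: amplitude controls loudness, frequency controls pitch."""
    t = np.linspace(0, duration_s, int(SAMPLE_RATE * duration_s), endpoint=False)
    return amplitude * np.sin(2 * np.pi * frequency_hz * t)

# A 440 Hz "A" note, plus two quieter overtones (harmonics) to hint at timbre.
fundamental = sine_tone(440.0, 0.8, 1.0)
overtones = sine_tone(880.0, 0.3, 1.0) + sine_tone(1320.0, 0.15, 1.0)
complex_tone = fundamental + overtones  # a different overtone mix = a different timbre
```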

2. Harmony and Cacophony: Musical Sound and Noise

  • Musical Sound: Characterized by regular, periodic vibrations. It has a discernible pitch and often a pleasing quality. Think of notes played by instruments or sung by a vocalist.
  • Noise: Characterized by irregular, random vibrations. It lacks a distinct pitch and can often be perceived as unpleasant or distracting, though it can also be used creatively (e.g., sound effects like a car crash or wind).
    • Can you think of examples where "noise" is intentionally used in multimedia to create a specific effect or atmosphere?

3. The Building Blocks of Melody: Tone and Note

  • Tone: A pure sound with a single frequency (a sine wave). Most sounds we hear are complex combinations of tones.
  • Note: In music, a note represents a specific pitch and duration. It's the fundamental unit of musical notation.

4. How We Hear: Psycho-acoustics and Decibels

Psycho-acoustics is the study of how humans perceive sound. Our perception isn't always a direct reflection of the physical properties of sound waves.

  • For example, our sensitivity to different frequencies varies. We are most sensitive to frequencies in the roughly 2-5 kHz range, which covers much of human speech.
  • Decibels (dB): A logarithmic unit used to express the ratio of two values of a physical quantity, often power or intensity. In acoustics, it's used to measure sound pressure level or loudness.
    • The decibel scale is logarithmic because human hearing perceives loudness logarithmically. A 10 dB increase is roughly perceived as a doubling of loudness.
    • 0 dB SPL is defined as the approximate threshold of human hearing.
    • Whisper: ~30 dB
    • Normal conversation: ~60 dB
    • Rock concert: ~110-120 dB (can cause hearing damage with prolonged exposure)
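
Because the scale is logarithmic, decibel calculations are just ratio arithmetic. A minimal sketch (the 20 µPa reference pressure is the standard one used for dB SPL):

```python
import math

P_REF = 20e-6  # reference sound pressure in pascals (0 dB SPL, threshold of hearing)

def pressure_to_db_spl(pressure_pa):
    """Sound pressure level in decibels: 20 * log10(p / p_ref)."""
    return 20 * math.log10(pressure_pa / P_REF)

def power_ratio(db_change):
    """How much the sound power changes for a given dB change: 10^(dB/10)."""
    return 10 ** (db_change / 10)

print(pressure_to_db_spl(0.02))  # ~60 dB: normal conversation
print(power_ratio(10))           # 10 dB more = 10x the power (~2x perceived loudness)
```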

5. Capturing Sound: Microphone

A microphone is a transducer – it converts sound waves (acoustic energy) into electrical signals (electrical energy).

  • How it works (common types):
    • Dynamic Microphones: Use electromagnetic induction. A diaphragm attached to a coil of wire moves within a magnetic field when sound waves hit it, generating an electrical current. They are rugged and can handle high sound pressure levels (good for drums or loud vocals).
    • Condenser Microphones: Use a capacitor (condenser). A thin diaphragm and a backplate form a capacitor. Sound waves cause the diaphragm to vibrate, changing the capacitance, which in turn varies the voltage. They are generally more sensitive and provide a more detailed sound, often requiring external power (phantom power).
  • Polar Patterns: Describe a microphone's sensitivity to sound arriving from different directions (e.g., omnidirectional, cardioid, bidirectional).

6. Boosting the Signal: Amplifier (Amp)

The electrical signals produced by a microphone are very weak. An amplifier increases the power (amplitude) of these signals so they can drive speakers or be recorded effectively.

7. Making Sound Heard: Speakers

Speakers are also transducers, doing the opposite of microphones: they convert electrical signals back into sound waves.

  • How it works (common dynamic speakers): An electrical audio signal is fed to a coil of wire (voice coil) attached to a cone-shaped diaphragm. The voice coil is placed in a magnetic field. The varying current in the coil creates a varying magnetic force, causing the coil and diaphragm to vibrate, producing sound waves.

8. Sound Goes Digital: Digital Audio Specifications

To use audio in multimedia, analog sound (continuous waves) must be converted into digital format (discrete numbers). This process is called Analog-to-Digital Conversion (ADC).

  • Key Specifications:
    • Sampling Rate: The number of times per second that the analog signal is measured (sampled). Measured in Hz or kHz. Higher sampling rates capture more detail, especially at higher frequencies.
      • CD quality: 44.1 kHz (44,100 samples per second)
      • Professional audio/video: 48 kHz, 96 kHz, or even higher.
    • Bit Depth (Sample Resolution): The number of bits used to represent each sample. Higher bit depth provides a greater dynamic range (the difference between the loudest and quietest sounds) and more accurate representation of the amplitude.
      • CD quality: 16-bit (65,536 possible amplitude levels)
      • Professional audio: 24-bit or 32-bit.
    • Channels: The number of individual audio streams.
      • Mono: 1 channel
      • Stereo: 2 channels (left and right)
      • Surround Sound: Multiple channels (e.g., 5.1, 7.1) for a more immersive experience.
    • Why do you think CDs use a 44.1 kHz sampling rate, a seemingly odd number? (Hint: It relates to early video technology!)
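
Sampling and quantization are easy to see in code. Here is a minimal sketch, using only NumPy and Python's built-in wave module, that writes one second of CD-quality (44.1 kHz, 16-bit, mono) audio:

```python
import wave
import numpy as np

SAMPLE_RATE = 44_100  # CD-quality sampling rate (Hz)
BIT_DEPTH = 16        # CD-quality bit depth
CHANNELS = 1          # mono

# Sample the "analog" signal: one second of a 440 Hz sine wave.
t = np.linspace(0, 1.0, SAMPLE_RATE, endpoint=False)
signal = 0.8 * np.sin(2 * np.pi * 440.0 * t)

# Quantize: map the range [-1.0, 1.0] onto 16-bit integers (65,536 levels).
quantized = (signal * 32767).astype(np.int16)

with wave.open("tone.wav", "wb") as wav:
    wav.setnchannels(CHANNELS)
    wav.setsampwidth(BIT_DEPTH // 8)  # bytes per sample
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(quantized.tobytes())
```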

9. Creating Sounds from Scratch: Synthesizers

A synthesizer is an electronic musical instrument that generates audio signals through various methods of synthesis (e.g., subtractive, additive, FM, wavetable). They can create a vast range of sounds, from emulating traditional instruments to producing entirely new electronic textures.

10. Computers and Music Talking: Musical Instrument Digital Interface (MIDI)

MIDI is not audio. It's a communication protocol and digital interface that allows electronic musical instruments, computers, and other related devices to connect and communicate with each other.

  • What MIDI data contains: Information about musical notes (pitch, velocity/loudness, duration), control signals (like pitch bend, modulation), timing information, and more.
  • Advantages:
    • Small file sizes (stores instructions, not actual sound waves).
    • Easy to edit and manipulate musical performances (change notes, tempo, instruments).
    • Can control synthesizers, samplers, and even lighting systems.
  • To hear MIDI, you need a MIDI-compatible sound generator (synthesizer or sound card with MIDI capabilities) to interpret the MIDI messages and produce sound.
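
To see how small "instructions, not sound waves" really is, here is a sketch using the third-party mido library (one common choice; `pip install mido` assumed) that writes a one-note MIDI file:

```python
import mido  # third-party MIDI library: pip install mido

mid = mido.MidiFile()
track = mido.MidiTrack()
mid.tracks.append(track)

# A MIDI "performance" is just instructions: which note, how hard, how long.
track.append(mido.Message('program_change', program=0, time=0))         # instrument: acoustic piano
track.append(mido.Message('note_on', note=60, velocity=64, time=0))     # middle C, medium velocity
track.append(mido.Message('note_off', note=60, velocity=64, time=480))  # release one beat later

mid.save('one_note.mid')  # a file of a few dozen bytes -- no audio, only instructions
```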

11. The Heart of PC Audio: Sound Card (Audio Interface)

A sound card (or audio interface) is an expansion card or integrated circuit that handles the input and output of audio signals to and from a computer.

  • Key Functions:
    • Analog-to-Digital Conversion (ADC) for recording.
    • Digital-to-Analog Conversion (DAC) for playback.
    • MIDI interface (often).
    • Connections for microphones, line-level sources, headphones, and speakers.
    • May include onboard processing (effects, mixing).

12. Polishing the Sound: Audio Processing Steps and Software

Raw audio often needs to be edited and processed to achieve the desired quality and effect.

  • Common Processing Steps:
    1. Editing: Cutting, copying, pasting, trimming audio clips.
    2. Mixing: Combining multiple audio tracks, adjusting levels, panning (stereo positioning).
    3. Normalization: Adjusting the overall volume to a standard level (sketched in code after the software list below).
    4. Equalization (EQ): Adjusting the balance of different frequency components to enhance or correct the sound.
    5. Dynamics Processing:
      • Compression: Reducing the dynamic range (making loud parts quieter and/or quiet parts louder).
      • Limiting: Preventing signals from exceeding a certain level (a more extreme form of compression).
      • Gating: Eliminating sounds below a certain threshold (useful for removing background noise).
    6. Effects: Adding reverb, delay, chorus, flanger, distortion, etc., to create ambiance or special sonic characteristics.
    7. Noise Reduction: Removing unwanted hiss, hum, or background noise.
    8. Mastering: The final stage of polishing the mixed audio for distribution, ensuring optimal loudness and tonal balance across different playback systems.
  • Popular Software (Digital Audio Workstations - DAWs):
    • Audacity (free, open-source)
    • Adobe Audition
    • Pro Tools (industry standard for professional music production)
    • Logic Pro X (Mac only)
    • Ableton Live
    • FL Studio
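
As promised above, two of these steps are simple enough to sketch directly. A minimal example (assuming the audio is already loaded as a float NumPy array in the range [-1, 1]):

```python
import numpy as np

def peak_normalize(samples, target_peak=0.95):
    """Normalization: scale the whole clip so its loudest sample hits target_peak."""
    peak = np.max(np.abs(samples))
    if peak == 0:
        return samples  # pure silence: nothing to scale
    return samples * (target_peak / peak)

def noise_gate(samples, threshold=0.02):
    """Gating (crude version): zero out samples quieter than the threshold."""
    gated = samples.copy()
    gated[np.abs(gated) < threshold] = 0.0
    return gated
```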

13. Storing Your Sounds: Audio File Formats

  • Uncompressed:
    • .wav (Waveform Audio File Format): Standard for Windows PCs. Can store uncompressed (PCM) audio. High quality, large file size.
    • .aiff (Audio Interchange File Format): Standard for Mac. Similar to WAV.
  • Lossless Compressed: Reduces file size without discarding any audio data.
    • .flac (Free Lossless Audio Codec): Open-source, excellent quality.
    • .alac (Apple Lossless Audio Codec): Apple's lossless format.
  • Lossy Compressed: Reduces file size significantly by discarding some audio data (ideally, data that is least perceptible to human ears).
    • .mp3 (MPEG-1 Audio Layer III): Very popular, good balance of quality and file size for general use.
    • .aac (Advanced Audio Coding): Generally offers better quality than MP3 at similar bitrates. Used by Apple's iTunes and YouTube.
    • .ogg (Ogg Vorbis): Open-source, patent-free.
    • .wma (Windows Media Audio): Microsoft's format.
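
The trade-offs between these formats come straight from arithmetic. A quick sketch comparing an uncompressed CD-quality WAV against a lossy MP3 (the 192 kbps bitrate is just an illustrative choice):

```python
def wav_size_mb(duration_s, sample_rate=44_100, bit_depth=16, channels=2):
    """Uncompressed PCM size: rate x bytes per sample x channels x seconds."""
    return sample_rate * (bit_depth // 8) * channels * duration_s / 1_000_000

def mp3_size_mb(duration_s, bitrate_kbps=192):
    """Lossy size depends only on the chosen bitrate."""
    return bitrate_kbps * 1000 / 8 * duration_s / 1_000_000

print(wav_size_mb(180))  # a 3-minute song: ~31.8 MB as CD-quality WAV
print(mp3_size_mb(180))  # the same song: ~4.3 MB as a 192 kbps MP3
```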

Part 2: The Magic of Moving Pictures - All About Video

Video combines a sequence of still images (frames) with synchronized audio to create the illusion of motion and tell compelling stories.

1. The Illusion of Motion: Video Frames and Frame Rate

  • Video Frames: Individual still images that make up a video.
  • Frame Rate: The number of frames displayed per second (fps). Higher frame rates generally result in smoother, more realistic motion.
    • Cinema: Typically 24 fps
    • PAL TV (Europe, Asia, Australia): 25 fps
    • NTSC TV (North America, Japan): 29.97 fps (often rounded to 30 fps)
    • Modern digital video: Can be 30 fps, 50 fps, 60 fps, or even higher for slow-motion or gaming.
    • What do you think happens if the frame rate is too low?
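
Frame rate also multiplies directly into data rate, which is why the compression standards covered below matter so much. A quick sketch of the raw, uncompressed numbers:

```python
def raw_video_rate_mbps(width, height, fps, bits_per_pixel=24):
    """Uncompressed data rate: pixels per frame x bits per pixel x frames per second."""
    return width * height * bits_per_pixel * fps / 1_000_000

print(raw_video_rate_mbps(1920, 1080, 30))  # 1080p at 30 fps: ~1,493 Mbps uncompressed
print(raw_video_rate_mbps(3840, 2160, 60))  # 4K at 60 fps:   ~11,944 Mbps uncompressed
```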

2. Capturing the Action (Old School): Analog Video Camera

Before digital, analog cameras recorded video onto magnetic tape.

  • How it worked: Light from the scene was focused by the lens onto an image sensor (like a CCD or a vidicon tube). The sensor converted the light patterns into an electrical signal representing the brightness and color information for each scan line of the image. This analog signal was then recorded.

3. Sending the Picture: Analog Video Signal Formats

These define how the different components of a video signal (brightness, color) are combined and transmitted.

  • Composite Video: Combines all video information (luminance/brightness, chrominance/color, and synchronization signals) into a single signal. Simple, but can result in color bleeding and lower quality. Often uses an RCA connector (yellow).
  • S-Video (Separate Video): Transmits luminance (Y) and chrominance (C) on separate wires, providing better quality than composite. Uses a 4-pin mini-DIN connector.
  • Component Video: Transmits different components of the video signal on separate wires for the highest analog quality. Common configurations include YPbPr (for analog) and YCbCr (for digital, though often transmitted over analog component connections). Often uses three RCA connectors (red, green, blue, but these don't directly correspond to RGB colors).

4. Pictures on the Airwaves: Television Broadcasting Standards

Different regions of the world adopted different analog television standards, which were incompatible with each other.

  • NTSC (National Television System Committee): Used in North America, Japan, and parts of South America. 525 scan lines, ~30 fps (29.97).
  • PAL (Phase Alternating Line): Used in most of Europe, Asia, Australia, and parts of Africa and South America. 625 scan lines, 25 fps. Generally considered to have better color consistency than NTSC.
  • SECAM (Séquentiel Couleur à Mémoire - Sequential Color with Memory): Used in France, parts of Eastern Europe, and Africa. Similar to PAL in lines and frame rate but uses a different method for color encoding.

5. The Digital Revolution: Digital Video

Digital video represents video information as digital data (0s and 1s).

  • Advantages over Analog:
    • Higher quality, less susceptible to noise and degradation.
    • Easier to copy, edit, and transmit without loss of quality (perfect copies).
    • Allows for non-linear editing (NLE) – easily access and rearrange any part of the video.
    • Supports higher resolutions (HD, 4K, 8K).
    • Enables interactivity and integration with other digital media.

6. Setting the Rules: Digital Video Standards & Codecs

  • Standards (often from MPEG - Moving Picture Experts Group): Define file formats and compression methods.
    • MPEG-1: Used for Video CDs (VCDs), early digital video.
    • MPEG-2: Used for DVDs, digital television broadcasts (DVB, ATSC), and early Blu-ray.
    • MPEG-4 Part 2 (e.g., DivX, Xvid): Popular for internet video.
    • MPEG-4 Part 10 / AVC (Advanced Video Coding) / H.264: Extremely popular, used for Blu-ray, streaming services (Netflix, YouTube), HDTV broadcasts, and many digital cameras. Offers good quality at relatively low bitrates.
    • HEVC (High Efficiency Video Coding) / H.265: Successor to H.264. Offers roughly double the compression efficiency (similar quality at half the bitrate). Used for 4K/8K video, streaming.
    • AV1 (AOMedia Video 1): A newer, open, royalty-free codec designed for internet video, aiming for even better compression than HEVC.
  • Codec (Coder-Decoder): Software or hardware that compresses (codes) video for storage/transmission and decompresses (decodes) it for playback. The standards above often define the compression algorithms used by codecs.
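
In practice, you usually apply a codec through a tool such as FFmpeg. A minimal sketch driving the ffmpeg command line from Python (this assumes ffmpeg is installed and on your PATH; the filenames are placeholders):

```python
import subprocess

# Transcode to H.264 video and AAC audio in an MP4 container. CRF controls the
# quality/size trade-off (lower = better quality, larger file; 23 is a common default).
subprocess.run([
    "ffmpeg",
    "-i", "input.mov",  # placeholder source file
    "-c:v", "libx264",  # H.264 video codec
    "-crf", "23",
    "-c:a", "aac",      # AAC audio codec
    "output.mp4",
], check=True)
```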

7. Video on Your Computer: PC Video

  • Display/Graphics Card (GPU): Essential for processing and displaying video. Modern GPUs have dedicated hardware for video decoding and encoding, offloading this task from the CPU.
  • Video Playback Software: Applications like VLC Media Player, Windows Media Player, and QuickTime Player use codecs to play various video file formats.

8. Crafting the Story: Video Processing Steps and Software

Creating a finished video involves several stages:

  1. Pre-production: Planning, scripting, storyboarding, casting, location scouting.
  2. Production (Shooting): Capturing the raw video footage and audio.
  3. Post-production:
    • Ingesting/Capturing Footage: Transferring video from cameras to a computer.
    • Editing (Non-Linear Editing - NLE): Assembling clips, trimming, creating sequences, adding transitions.
    • Color Correction & Grading: Adjusting colors for consistency and to achieve a specific artistic look.
    • Audio Editing & Mixing: Cleaning up audio, adding music, sound effects, voiceovers.
    • Visual Effects (VFX): Adding computer-generated imagery, compositing.
    • Titles & Graphics: Creating and adding text overlays, lower thirds, motion graphics.
    • Rendering/Exporting: Converting the edited project into a final video file format for distribution.
  • Popular Software (Non-Linear Editors - NLEs):
    • Adobe Premiere Pro
    • DaVinci Resolve (excellent free version with powerful color grading)
    • Final Cut Pro (Mac only)
    • Avid Media Composer (often used in professional film/TV)
    • HitFilm Express (free, good for VFX)

9. Saving Your Movie: Video File Formats

Video file formats are essentially containers that can hold video data (compressed by a codec), audio data, and metadata (like subtitles).

  • .mp4 (MPEG-4 Part 14): Very common, widely supported. Often uses H.264 or HEVC for video and AAC for audio.
  • .mov (QuickTime File Format): Developed by Apple. Often used in professional video workflows. Can contain various codecs.
  • .avi (Audio Video Interleave): Older Microsoft format. Can contain various codecs, but sometimes has limitations with modern ones.
  • .wmv (Windows Media Video): Microsoft's format.
  • .mkv (Matroska Multimedia Container): Open standard, flexible container that can hold virtually any type of video, audio, and subtitle tracks. Popular for HD/4K content.
  • .flv (Flash Video) / .f4v: Older formats used for web video with Adobe Flash (now largely obsolete).
  • .webm: Open format designed for web video, often using VP9 or AV1 video codecs and Opus or Vorbis audio codecs.

And that's a wrap on our tour of the audio and video realms in multimedia! From the physics of sound waves to the complexities of digital video codecs, these two components bring unparalleled dynamism and emotional depth to our digital experiences.