𝔖 Scriptorium
Intelligent Image and Video Compression: Communicating Pictures

✍ Scribed by David Bull, Fan Zhang


Publisher: Academic Press
Year: 2021
Language: English
Pages: 610
Edition: 2
Category: Library


✦ Synopsis


Intelligent Image and Video Compression: Communicating Pictures, Second Edition explains the requirements, analysis, design and application of a modern video coding system. It draws on the authors’ extensive academic and professional experience in this field to deliver a text that is algorithmically rigorous yet accessible, relevant to modern standards and practical. It builds on a thorough grounding in mathematical foundations and visual perception to demonstrate how modern image and video compression methods can be designed to meet the rate-quality performance levels demanded by today's applications and users, in the context of prevailing network constraints.

✦ Table of Contents


Front Cover
Intelligent Image and Video Compression
Copyright
Contents
List of figures
List of tables
List of algorithms
About the authors
Preface
1 Introduction
1.1 Communicating pictures: the need for compression
1.1.1 What is compression?
1.1.2 Why do we need compression?
Picture formats and bit rate requirements
Available bandwidth
1.2 Applications and drivers
1.2.1 Generic drivers
1.2.2 Application drivers and markets
Consumer video
Business, manufacturing, and automation
Security and surveillance
Healthcare
1.3 Requirements and trade-offs in a compression system
1.3.1 The benefits of a digital solution
1.3.2 Requirements
1.3.3 Trade-offs
1.4 The basics of compression
1.4.1 Still image encoding
1.4.2 Encoding video
1.4.3 Measuring visual quality
1.5 The need for standards
1.5.1 Some basic facts about standards
1.5.2 A brief history of video encoding standards
1.6 The creative continuum: an interdisciplinary approach
1.7 Summary
References
2 The human visual system
2.1 Principles and theories of human vision
Theories of vision
2.2 Acquisition: the human eye
2.2.1 Retinal tissue layers
The sclera
The ciliary body
The retina
The choroid
2.2.2 Optical processing
The cornea
The lens
The iris
The pupil
2.2.3 Retinal photoreceptors and their distribution
Rod cells
Cone cells
Macula
Fovea
Optic disc and nerve
2.2.4 Visual processing in the retina
2.3 The visual cortex
2.3.1 Opponent processes
2.3.2 Biased competition
2.3.3 Adaptation processes
2.3.4 V1 – the primary visual cortex
2.3.5 V2 – the prestriate cortex
2.3.6 Dorsal and ventral streams
2.3.7 Extrastriate areas
2.4 Visual fields and acuity
2.4.1 Field of view
2.4.2 Acuity
2.4.3 Light, luminance, and brightness
Radiant intensity and radiance
Luminance
Brightness
Luma
2.4.4 Light level adaptation
2.5 Color processing
2.5.1 Opponent theories of color
2.5.2 CIE 1931 chromaticity chart
2.6 Spatial processing
2.6.1 Just noticeable difference, contrast, and Weber's law
2.6.2 Frequency-dependent contrast sensitivity
2.6.3 Multiscale edges
2.6.4 Perception of textures
2.6.5 Shape and object recognition
2.6.6 The importance of phase information
2.7 Perception of scale and depth
2.7.1 Size or scale
2.7.2 Depth cues
2.7.3 Depth cues and 3D entertainment
2.8 Temporal and spatio-temporal response
2.8.1 Temporal CSF
2.8.2 Spatio-temporal CSF
2.8.3 Flicker and peripheral vision
2.9 Attention and eye movements
2.9.1 Saliency and attention
2.9.2 Eye movements
2.10 Visual masking
2.10.1 Texture masking
2.10.2 Edge masking
2.10.3 Temporal masking
2.11 A perceptual basis for image and video compression
References
3 Signal processing and information theory fundamentals
3.1 Signal and picture sampling
3.1.1 The sampling theorem
In one dimension
Extension to 2D
Extension to 3D
3.1.2 Multidimensional sampling lattices
3.2 Statistics of images
3.2.1 Histograms and distributions
Spatial and subband distributions
3.2.2 Mean values
3.2.3 Correlation in natural images
Spatial autocorrelation in natural images
Temporal autocorrelation in natural image sequences
3.3 Filtering and transforms
3.3.1 Discrete-time linear systems
Shift invariance
Linearity
3.3.2 Convolution
3.3.3 Linear filters
Extension to 2D
Separability
3.3.4 Filter frequency response
3.3.5 Examples of practical filters
LeGall wavelet analysis filters
Subpixel interpolation filters
3.3.6 Nonlinear filters
Rank order and median filters
Morphological filters
3.3.7 Linear transforms and the DFT
The discrete Fourier transform
The 2D DFT
The DFT and compression
3.4 Quantization
3.4.1 Basic theory of quantization
Uniform quantization
3.4.2 Adaptation to signal statistics
Deadzone quantizer
Lloyd–Max quantizer
3.4.3 HVS weighting
3.4.4 Vector quantization
3.5 Linear prediction
3.5.1 Basic feedforward linear predictive coding
Predictor dynamic range
Linear predictive coding with quantization
3.5.2 Linear prediction with the predictor in the feedback loop
3.5.3 Wiener–Hopf equations and the Wiener filter
3.6 Information and entropy
3.6.1 Self information
Independent events
3.6.2 Entropy
Entropy and first order entropy
3.6.3 Symbols and statistics
3.7 Machine learning
3.7.1 An overview of AI and machine learning
3.7.2 Neural networks and error backpropagation
The model of a neuron
Learning and the delta rule
Multilayer networks
Error backpropagation
3.7.3 Deep neural networks
Convolutional neural networks (CNNs)
Generative adversarial networks (GANs)
Variational autoencoders
The need for data
3.8 Summary
References
4 Digital picture formats and representations
4.1 Pixels, blocks, and pictures
4.1.1 Pixels, samples, or pels
Monochrome image
Color image
4.1.2 Moving pictures
4.1.3 Coding units and macroblocks
Macroblocks
Coding tree units
4.1.4 Picture types and groups of pictures
Frame types
Groups of pictures (GOPs)
4.2 Formats and aspect ratios
4.2.1 Aspect ratios
Field of view ratio
4.2.2 Displaying different formats
Pan and scan and Active Format Description
4.3 Picture scanning
Interlaced vs. progressive scanning
Problems caused by interlacing
4.3.1 Standards conversion
3:2 pull-down
4.4 Gamma correction
4.5 Color spaces and color transformations
4.5.1 Color descriptions and the HVS
Trichromacy theory
Color spaces
Color space transformations
Chromaticity diagrams
Color spaces for analog TV
Color spaces for digital formats
4.5.2 Subsampled color spaces
Chroma subsampling
4.5.3 Color sensing
Bayer filtering
Bayer demosaicing
4.6 Measuring and comparing picture quality
4.6.1 Compression ratio and bit rate
4.6.2 Objective distortion and quality metrics
Mean squared error (MSE)
Peak signal to noise ratio (PSNR)
PSNR for color images and for video
Mean absolute difference (MAD) and sum of absolute differences (SAD)
Sum of absolute transformed differences (SATD)
4.6.3 Subjective assessment
4.7 Rates and distortions
4.7.1 Rate-distortion characteristics
4.7.2 Rate-distortion optimization
4.7.3 Comparing video coding performance
4.8 Summary
References
5 Transforms for image and video coding
5.1 The principles of decorrelating transforms
5.1.1 The basic building blocks
5.1.2 Principal components and axis rotation
5.2 Unitary transforms
5.2.1 Basis functions and linear combinations
5.2.2 Orthogonality and normalization
5.2.3 Extension to 2D
5.3 Basic transforms
5.3.1 The Haar transform
5.3.2 The Walsh–Hadamard transform
5.3.3 So why not use the discrete Fourier transform?
5.3.4 Desirable properties of an image transform
5.4 Optimal transforms
5.4.1 Discarding coefficients
5.4.2 The Karhunen–Loeve transform (KLT)
5.4.3 The KLT in practice
5.5 Discrete cosine transform (DCT)
5.5.1 Derivation of the DCT
DCT derivation
5.5.2 DCT basis functions
5.5.3 Extension to 2D: Separability
5.5.4 Variants on sinusoidal transforms
5.6 Quantization of DCT coefficients
5.6.1 The basics of quantization
5.6.2 Perceptually optimized quantization matrices
5.7 Performance comparisons
5.7.1 DCT vs. DFT revisited
5.7.2 Comparison of transforms
5.7.3 Rate-distortion performance of the DCT
5.8 DCT implementation
5.8.1 Choice of transform block size
DCT complexity
5.8.2 DCT complexity reduction
McGovern algorithm
5.8.3 Field vs. frame encoding for interlaced sequences
5.8.4 Integer transforms
5.8.5 DCT DEMO
5.9 JPEG
5.10 Summary
References
6 Filter-banks and wavelet compression
6.1 Introduction to multiscale processing
6.1.1 The short-time Fourier transform and the Gabor transform
6.1.2 What is a wavelet?
The continuous wavelet transform (CWT)
The discrete wavelet transform (DWT)
6.1.3 Wavelet and filter-bank properties
6.2 Perfect reconstruction filter-banks
6.2.1 Filter and decomposition requirements
6.2.2 The 1D filter-bank structure
Intuitive development of the two-channel filter-bank
6.3 Multirate filtering
6.3.1 Upsampling
6.3.2 Downsampling
6.3.3 System transfer function
6.3.4 Perfect reconstruction
6.3.5 Spectral effects of the two-channel decomposition
6.4 Useful filters and filter-banks
6.4.1 Quadrature mirror filters
Aliasing elimination
Amplitude distortion
Practical QMFs
6.4.2 Wavelet filters
LeGall 5/3 filters
Daubechies 9/7 filters
6.4.3 Multistage (multiscale) decompositions
An alternative view of multistage decomposition
6.4.4 Separability and extension to 2D
6.4.5 Finite-length sequences, edge artifacts, and boundary extension
6.4.6 Wavelet compression performance
6.5 Coefficient quantization and bit allocation
6.5.1 Bit allocation and zonal coding
6.5.2 Hierarchical coding
6.6 JPEG2000
6.6.1 Overview
6.6.2 Architecture – bit planes and scalable coding
6.6.3 Coding performance
6.6.4 Region of interest coding
6.6.5 Benefits and status
6.7 Summary
References
7 Lossless compression methods
7.1 Motivation for lossless image compression
7.1.1 Applications
7.1.2 Approaches
7.1.3 Dictionary methods
7.2 Symbol encoding
7.2.1 A generic model for lossless compression
7.2.2 Entropy, efficiency, and redundancy
7.2.3 Prefix codes and unique decodability
7.3 Huffman coding
7.3.1 The basic algorithm
7.4 Symbol formation and encoding
7.4.1 Dealing with sparse matrices
7.4.2 Symbol encoding in JPEG
7.5 Golomb coding
7.5.1 Unary codes
7.5.2 Golomb and Golomb–Rice codes
7.5.3 Exponential Golomb codes
7.6 Arithmetic coding
7.6.1 The basic arithmetic encoding algorithm
7.7 Performance comparisons
7.8 Summary
References
8 Coding moving pictures: motion prediction
8.1 Temporal correlation and exploiting temporal redundancy
8.1.1 Why motion estimation?
8.1.2 Projected motion and apparent motion
8.1.3 Understanding temporal correlation
8.1.4 How to form the prediction
8.1.5 Approaches to motion estimation
8.2 Motion models and motion estimation
8.2.1 Problem formulation
8.2.2 Affine and high order models
Node-based warping
8.2.3 Translation-only models
8.2.4 Pixel-recursive methods
8.2.5 Frequency domain motion estimation using phase correlation
Principles
Applications and performance
8.3 Block matching motion estimation (BMME)
8.3.1 Translational block matching
Motion vector orientation
Region of support – size of the search window
8.3.2 Matching criteria
The block distortion measure (BDM)
Absolute difference vs. squared difference measures
8.3.3 Full search algorithm
8.3.4 Properties of block motion fields and error surfaces
The block motion field
The effect of block size
The effect of search range
The motion residual error surface
Motion vector probabilities
8.3.5 Motion failure
8.3.6 Restricted and unrestricted vectors
8.4 Reduced-complexity motion estimation
8.4.1 Pixel grids and search grids
8.4.2 Complexity of full search
8.4.3 Reducing search complexity
8.4.4 2D logarithmic (TDL) search
8.5 Skip and merge modes
8.6 Motion vector coding
8.6.1 Motion vector prediction
8.6.2 Entropy coding of motion vectors
8.7 Summary
References
9 The block-based hybrid video codec
9.1 The block-based hybrid model for video compression
9.1.1 Picture types and prediction modes
Prediction modes
Picture types and coding structures
9.1.2 Properties of the DFD signal
9.1.3 Operation of the video encoding loop
9.2 Intraframe prediction
9.2.1 Intra-prediction for small luminance blocks
9.2.2 Intra-prediction for larger blocks
9.3 Subpixel motion estimation
9.3.1 Subpixel matching
9.3.2 Interpolation methods
9.3.3 Performance
9.3.4 Interpolation-free methods
9.4 Multiple-reference frame motion estimation
9.4.1 Justification
9.4.2 Properties, complexity, and performance of MRF-ME
Properties
Performance and complexity
9.4.3 Reduced-complexity MRF-ME
9.4.4 The use of multiple reference frames in current standards
9.5 Variable block sizes for motion estimation
9.5.1 Influence of block size
9.5.2 Variable block sizes in practice
9.6 Variable-sized transforms
9.6.1 Integer transforms
9.6.2 DC coefficient transforms
9.7 In-loop deblocking operations
9.8 Summary
References
10 Measuring and managing picture quality
10.1 General considerations and influences
10.1.1 What do we want to assess?
10.1.2 Noise, distortion, and quality
10.1.3 Influences on perceived quality
Human visual perception
Viewing environment
Content type
Artifact types
10.2 Subjective testing
10.2.1 Justification
10.2.2 Test sequences and conditions
Test material
Activity or information levels
Test conditions
10.2.3 Choosing subjects
10.2.4 Testing environment
10.2.5 Testing methodology and recording of results
General principles of subjective testing
Double-stimulus methods
Single-stimulus methods
Triple-stimulus methods
Pair comparison methods
10.2.6 Statistical analysis and significance testing
Calculation of mean scores
Confidence interval
Screening of observers
10.3 Test datasets and how to use them
10.3.1 Databases
VQEG FRTV
LIVE
BVI-HD
Netflix public database
UHD subjective databases
Subjective databases based on crowdsourcing
Others
10.3.2 The relationship between mean opinion score and an objective metric
10.3.3 Evaluating metrics using public (or private) databases
Linear correlation
Rank order correlation
Outlier ratio
Prediction error
Significance test
10.4 Objective quality metrics
10.4.1 Why do we need quality metrics?
10.4.2 A characterization of PSNR
10.4.3 A perceptual basis for metric development
10.4.4 Perception-based image and video quality metrics
SSIM
MS-SSIM
VIF
VQM
MOVIE
VSNR
MAD and STMAD
PVM
VMAF
VDP and VDP-2
Reduced-complexity metrics and in-loop assessment
Comparing results
10.4.5 The future of metrics
10.5 Rate-distortion optimization
10.5.1 Classical rate-distortion theory
Distortion measures
The memoryless Gaussian source
10.5.2 Practical rate-distortion optimization
From source statistics to a parameterisable codec
RDO complexity
Lagrangian optimization
10.5.3 The influence of additional coding modes and parameters
Lagrangian multipliers revisited
RDO in H.264/AVC and H.265/HEVC reference encoders
10.5.4 From rate-distortion optimization to rate-quality optimization
10.6 Rate control
10.6.1 Buffering and HRD
10.6.2 Rate control in practice
Buffer model
Complexity estimation
Rate-quantization model
ΔQP limiter
QP initialization
GOP bit allocation
Coding unit bit allocation
10.6.3 Regions of interest and rate control
10.7 Summary
References
11 Communicating pictures: delivery across networks
11.1 The operating environment
11.1.1 Characteristics of modern networks
The IP network
The wireless network edge
11.1.2 Transmission types
Downloads and streaming
Interactive communication
Unicast transmission
Multicast or broadcast transmission
11.1.3 Operating constraints
11.1.4 Error characteristics
Types of errors
Test data
Types of encoding
11.1.5 The challenges and a solution framework
11.2 The effects of loss
11.2.1 Synchronization failure
The effect of a single bit error
11.2.2 Header loss
11.2.3 Spatial error propagation
11.2.4 Temporal error propagation
11.3 Mitigating the effect of bitstream errors
11.3.1 Video is not the same as data!
11.3.2 Error-resilient solutions
11.4 Transport layer solutions
11.4.1 Automatic repeat request (ARQ)
Advantages
Problems
Delay-constrained retransmission
11.4.2 FEC channel coding
Erasure codes
Cross-packet FEC
Unequal error protection and data partitioning
Rateless codes
11.4.3 Hybrid ARQ (HARQ)
11.4.4 Packetization strategies
11.5 Application layer solutions
11.5.1 Network abstraction
11.5.2 The influence of frame type
I-frames, P-frames, and B-frames
Intra-refresh
Reference picture selection
Periodic reference frames
11.5.3 Synchronization codewords
11.5.4 Reversible VLC
11.5.5 Slice structuring
Flexible macroblock ordering (FMO)
Redundant slices
11.5.6 Error tracking
11.5.7 Redundant motion vectors
11.6 Cross-layer solutions
11.6.1 Link adaptation
11.7 Inherently robust coding strategies
11.7.1 Error-resilient entropy coding (EREC)
Principle of operation
11.8 Error concealment
11.8.1 Detecting missing information
11.8.2 Spatial error concealment (SEC)
11.8.3 Temporal error concealment (TEC)
Temporal copying (TEC_TC)
Motion-compensated temporal replacement (TEC_MCTR)
11.8.4 Hybrid methods with mode selection
11.9 Congestion management
11.9.1 HTTP adaptive streaming (HAS)
11.9.2 Scalable video encoding
11.9.3 Multiple description coding (MDC)
11.10 Summary
References
12 Video coding standards and formats
12.1 The need for and role of standards
12.1.1 The focus of video standardization
12.1.2 The standardization process
12.1.3 Intellectual property and licensing
12.2 H.120
12.2.1 Brief history
12.2.2 Primary features
12.3 H.261
12.3.1 Brief history
12.3.2 Picture types and primary features
Macroblock, GOB, and frame format
Coder control
12.4 MPEG-2/DVB
12.4.1 Brief history
12.4.2 Picture types and primary features
12.4.3 MPEG-2 profiles and levels
12.5 H.263
12.5.1 Brief history
12.5.2 Picture types and primary features
Macroblock, GOB, and frame format
Primary features of H.263
H.263 extensions (H.263+ and H.263++)
12.6 MPEG-4
12.6.1 Brief history
12.6.2 Picture types and primary features
Coding framework
MPEG-4 part 2 advanced simple profile (ASP)
12.7 H.264/AVC
12.7.1 Brief history
12.7.2 Primary features
12.7.3 Network abstraction and bitstream syntax
12.7.4 Pictures and partitions
Picture types
Slices and slice groups
Blocks
12.7.5 The video coding layer
Intra-coding
Inter-coding
Deblocking operations
Variable-length coding
Coder control
12.7.6 Profiles and levels
12.7.7 Performance
12.7.8 Scalable extensions
12.7.9 Multiview extensions
12.8 H.265/HEVC
12.8.1 Brief background
12.8.2 Primary features
12.8.3 Network abstraction and high-level syntax
NAL units
Slice structures
Parameter sets
Reference picture sets and reference picture lists
12.8.4 Pictures and partitions
Coding tree units (CTUs) and coding tree blocks (CTBs)
Quadtree CTU structure
Prediction units (PUs)
Transform units (TUs)
Random access points (RAPs) and clean random access (CRA) pictures
12.8.5 The video coding layer (VCL)
Intra-coding
Inter-coding
Transforms in HEVC
Quantization in HEVC
Coefficient scanning
Context and significance
Entropy coding
In-loop filters
Coding tools for screen content, extended range, and 3D videos
12.8.6 Profiles and levels
Main profile
Main 10 profile
Main still picture profile
Levels
12.8.7 Extensions
12.8.8 Performance
12.9 H.266/VVC
12.9.1 Brief background
12.9.2 Primary features
12.9.3 High-level syntax
NAL units
Parameter sets
12.9.4 Picture partitioning
12.9.5 Intra-prediction
Extended intra-coding modes
Mode-dependent intra-smoothing
Multiple reference lines
12.9.6 Inter-prediction
Affine motion inter-prediction
Adaptive motion vector resolution
Combined inter- and intra-prediction mode
12.9.7 Transformation and quantization
Larger maximum transform block size
Multiple transform selection
Quantization in VVC
12.9.8 Entropy coding
12.9.9 In-loop filters
12.9.10 Coding tools for 360-degree video
Horizontal wrap around motion compensation
12.9.11 Profiles, tiers, and levels
12.9.12 Performance gains for VVC over recent standards
12.10 The Alliance for Open Media (AOM)
12.10.1 VP9 and VP10
12.10.2 AV1
12.11 Other standardized and proprietary codecs
12.11.1 VC-1
12.11.2 Dirac or VC-2
12.11.3 RealVideo
12.12 Codec comparisons
12.13 Summary
References
13 Communicating pictures – the future
13.1 The motivation: more immersive experiences
13.2 New formats and extended video parameter spaces
13.2.1 Influences
13.2.2 Spatial resolution
Why spatial detail is important
UHDTV and ITU-R BT.2020
Spatial resolution and visual quality
Compression performance
13.2.3 Temporal resolution
Why rendition of motion is important
Frame rates and shutter angles – static and dynamic resolutions
Frame rates and visual quality
Compression methods and performance
13.2.4 Dynamic range
Why dynamic range is important
High dynamic range formats
Coding tools for HDR content
13.2.5 360-Degree video
What is 360-degree video?
Compression of 360-degree video
13.2.6 Parameter interactions and the creative continuum
13.3 Intelligent video compression
13.3.1 Challenges for compression: understanding content and context
13.3.2 Parametric video compression
13.3.3 Context-based video compression
13.4 Deep video compression
13.4.1 The need for data – training datasets for compression
Coverage, generalization, and bias
Data synthesis and augmentation
Training datasets for compression
13.4.2 Deep optimization of compression tools
Transforms and quantization
Intra-prediction
Motion prediction
Entropy coding
Postprocessing and loop filtering
Deep resampling and the ViSTRA architecture
Perceptual loss functions
The complexity issues of deep video compression
13.4.3 End-to-end architectures for deep image compression
13.5 Summary
References
A Glossary of terms
B Tutorial problems
Chapter 1: Introduction
Chapter 2: The human visual system
Chapter 3: Signal processing and information theory fundamentals
Chapter 4: Digital picture formats and representations
Chapter 5: Transforms for image and video coding
Chapter 6: Filter-banks and wavelet compression
Chapter 7: Lossless compression methods
Chapter 8: Coding moving pictures: motion prediction
Chapter 9: The block-based hybrid video codec
Chapter 10: Measuring and managing picture quality
Chapter 11: Communicating pictures: delivery across networks
Chapter 12: Video coding standards and formats
Chapter 13: Communicating pictures – the future
Index
Back Cover


✦ Similar Volumes


Communicating Pictures: A Course in Image and Video Coding
✍ David R. Bull 📂 Library 📅 2014 🏛 Elsevier Science, Academic Press 🌐 English

Communicating Pictures starts with a unique historical perspective of the role of images in communications and then builds on this to explain the applications and requirements of a modern video coding system. It draws on the author's extensive academic and professional experience of signal…

Wavelet Image and Video Compression
✍ Pankaj N. Topiwala (auth.), Pankaj N. Topiwala (eds.) 📂 Library 📅 2002 🏛 Springer US 🌐 English

An exciting new development has taken place in the digital era that has captured the imagination and talent of researchers around the globe - wavelet image compression. This technology has deep roots in theories of vision, and promises performance improvements over all other compression…

Wavelet Image and Video Compression
✍ Pankaj N. Topiwala 📂 Library 📅 1998 🏛 Springer 🌐 English

An exciting new development has taken place in the digital era that has captured the imagination and talent of researchers around the globe - wavelet image compression. This technology has deep roots in theories of vision, and promises performance improvements over all other compression methods…