JPEG compression does not work very well on non-realistic images, such as cartoons or line drawings.
JPEG compression does not handle black-and-white (1-bit-per-pixel) images, nor does it handle motion picture compression.
The algorithm achieves much of its compression by exploiting known limitations of the human eye, notably the fact that small color variations are not perceived as well as small variations in brightness.
1. Transform the image into a suitable color space. This is a no-op for grayscale images. For color images, RGB information is transformed into a luminance/chrominance color space (e.g., YCbCr or YUV). The luminance component is grayscale and the other two axes carry the color information.
2. (Optional) Downsample each component by averaging together groups of pixels. The luminance component is left at full resolution, while the chroma components are often reduced 2:1 horizontally and either 2:1 or 1:1 (no change) vertically. In JPEG, these alternatives are usually called 2h2v and 2h1v sampling, but you may also see the terms "411" and "422" sampling. This step immediately reduces the data volume by one-half or one-third. In numerical terms it is highly lossy, but for most images it has almost no impact on perceived quality, because of the eye's poorer resolution for chroma information. Note that downsampling is not applicable to grayscale data; this is one reason color images are more compressible than grayscale.
3. Group the pixel values for each component into 8×8 blocks. Transform each 8×8 block through a discrete cosine transform (DCT). The DCT is a relative of the Fourier transform and likewise gives a frequency map, with 8×8 components. Thus you now have numbers representing the average value in each block and successively higher-frequency changes within the block. The motivation for doing this is that you can now throw away high-frequency information without affecting low-frequency information. (The DCT transform itself is reversible except for round-off error.)
4. In each block, divide each of the 64 frequency components by a separate "quantization coefficient" and round the results to integers. This is the fundamental information-losing step. The larger the quantization coefficients, the more data is discarded. Note that even the minimum possible quantization coefficient, 1, loses some info, because the exact DCT outputs are typically not integers. Higher frequencies are always quantized less accurately (given larger coefficients) than lower, since they are less visible to the eye. Also, the luminance data is typically quantized more accurately than the chroma data, by using separate 64-element quantization tables.
5. Encode the reduced coefficients using either Huffman or arithmetic coding.
6. Tack on appropriate headers, etc., and output the result. In a normal "interchange" JPEG file, all of the compression parameters are included in the headers so that the decompressor can reverse the process. These parameters include the quantization tables and the Huffman coding tables.
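Steps 3 and 4 above can be sketched in a few lines. This is a minimal illustration, not production code: the DCT is a naive orthonormal type-II transform, the quantization table is the example luminance table from Annex K of the JPEG standard, and the gradient block is an arbitrary test input.

```python
import numpy as np

def dct2(block):
    """Naive orthonormal 8x8 type-II DCT, computed as M @ block @ M.T."""
    n = 8
    k = np.arange(n)
    # m[u, x] = cos((2x + 1) * u * pi / 16), scaled to make M orthonormal
    m = np.cos((2 * k[None, :] + 1) * k[:, None] * np.pi / (2 * n))
    m *= np.sqrt(2.0 / n)
    m[0] = np.sqrt(1.0 / n)          # the DC row uses the smaller scale factor
    return m @ block @ m.T

# Example luminance quantization table from Annex K of the JPEG standard.
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

# An arbitrary smooth test block: every row is the gradient 64, 80, ..., 176.
block = np.tile(np.arange(64, 192, 16), (8, 1)).astype(float)

coeffs = dct2(block - 128)                    # step 3: level shift, then DCT
quantized = np.round(coeffs / Q).astype(int)  # step 4: quantize and round

# For a smooth block like this, almost every quantized coefficient is zero;
# only a few low-frequency terms in the top row survive.
```

A decoder reverses step 4 by multiplying `quantized * Q`, which is where the information loss shows up: the recovered coefficients are only approximations of the originals.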
This oversimplified case reveals two of the most difficult problems in motion compensation: 1) determining whether an image is stationary; and 2) determining which portion of the image moves, and how to extract it.
However, when there is complex motion or new imagery, these error coding schemes may perform poorly, and the error signal may be harder to encode than the original signal.
For complex scenes and/or low bit rates, a number of visual artifacts may appear as a result of signal distortion from a compression system.
The primary visual artifacts affecting current image compression systems are blocking effects and intermittent distortions near object boundaries, commonly called mosquito noise.
Other artifacts include ripple, contouring, and loss of resolution.
Blocking effects are produced because adjacent blocks in an image are processed independently and the resulting independent distortion from block to block causes a lack of continuity between neighboring blocks.
In addition, block-type contouring, a special case of the blocking effect, often results when the intensity of an image is slowly changing.
Block DCT applications are not effective at representing sharp edges.
Accordingly, there is considerable distortion at sharp edges: the reconstructed edges are not as sharp as normal and the adjacent regions are not as uniform as they should be.
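The edge problem can be made concrete in one dimension. In this sketch (an illustration of the general principle, not taken from any particular codec; the 8-point DCT and the two test signals are assumptions chosen for the demonstration), a smooth ramp concentrates nearly all of its energy in the lowest frequencies, while a step edge spreads substantial energy into the upper half of the spectrum; coarse quantization of those high-frequency terms is what produces ringing near edges.

```python
import numpy as np

def dct1(x):
    """Naive orthonormal 8-point type-II DCT."""
    n = len(x)
    k = np.arange(n)
    m = np.cos((2 * k[None, :] + 1) * k[:, None] * np.pi / (2 * n))
    m *= np.sqrt(2.0 / n)
    m[0] = np.sqrt(1.0 / n)
    return m @ x

def highfreq_fraction(x):
    """Fraction of signal energy carried by the upper four DCT coefficients."""
    X = dct1(x)
    return np.sum(X[4:] ** 2) / np.sum(X ** 2)

ramp = np.linspace(-128.0, 128.0, 8)          # smooth gradient across the block
step = np.array([-128.0] * 4 + [128.0] * 4)   # sharp edge in mid-block

# The ramp keeps well under 1% of its energy in the high frequencies,
# while the step edge puts several percent there -- energy that coarse
# quantization distorts, producing ringing around the edge.
```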
Mosquito noise is especially evident in images containing text or computer graphics.
Thus, most reproduced images may be adversely affected by blocking effects and edge distortion.
Whether the set-top box is functioning as an encoder or a decoder, both analog TVs and DTVs are adversely affected by the image artifacts introduced by block DCT coding.