Communications Engineering Room 911, Chungmu-Building 98 Kunja-Dong, Kwangjin-Ku Seoul, South Korea
E-mail: [email protected]
Hyung-Myung Kim
Korea Advanced Institute of Science and Technology (KAIST)
Department of Electrical Engineering 373-1 Kusong-Dong, Yusong-Gu Taejon 305-701 Korea
pictures in the transcoding system, which transforms a bitstream compressed at one bit rate, such as an HD bitstream, into a stream at another bit rate, for example, an SD bitstream. The transcoding is performed in the spatial domain. In many applications such as the transcoder, resolution conversion is very important for changing the image size while the scaled image maintains high quality. The scaling process consists of two steps: fitting the original data with a continuous function, and resampling the function on a new sampling grid. We focus on the modification of the scaler kernel according to the relation between the formats of the original and the resized image. In the modification, the various formats defined in the MPEG standards are considered. We show experimental results that demonstrate the effectiveness of the proposed interpolation method. The algorithm exhibits significant improvement in the minimization of information loss when compared with the conventional interpolation algorithms. ©2004 Society of Photo-Optical Instrumentation Engineers.
[DOI: 10.1117/1.1758732]
Subject terms: scaler; cubic convolution interpolation; transcoder.
Paper 030478 received Sep. 29, 2003; revised manuscript received Dec. 19, 2003; accepted for publication Feb. 12, 2004.
1 Introduction
Transcoding is an important technique for video communications over heterogeneous networks whose bandwidths differ. In a video-on-demand server, when video is pre-encoded and stored, the characteristics of the channel are not considered. This results in a great lack of flexibility in the transmission of these pre-encoded video streams through heterogeneous networks. An efficient way to overcome these problems is to use a transcoder that provides a match between a pre-encoded MPEG bitstream and the transmission channel. For example, the transcoder receives as input a pre-coded MPEG-2 bitstream with a high bit rate and produces another bitstream with a lower bit rate, which meets the new bandwidth constraints.
There are two approaches to transcoding: spatial domain and frequency (DCT) domain transcoding. In spatial domain processing, the stored bitstream is first decompressed, the down-scaling operation is performed in the pixel domain, and then the resulting data are recompressed at a target bit rate.1–3 Shanableh and Ghanbari1 described a transcoder architecture that can speed up the processing time, in which spatial and temporal resolutions are changed. In a DCT domain transcoder, the schemes work directly in the DCT domain.4–7 In Ref. 4, Yim and Isnardi described an efficient method for DCT domain image resizing, where the scaling ratio is 2:1. In Ref. 6, the author proposed a method for approximating linear operations on images in the compressed domain using multiplication-free schemes, where the system includes a down-sampling module whose ratio is 2:1. As we can see from this research, resolution conversion is extremely important in the transcoding system so that the transcoded image maintains high quality. While the transcoders proposed in the literature resize the spatial resolution by an integer ratio, for example, 1/2 or 1/4, we consider a transcoder system that changes the resolution by an arbitrary ratio.
The resolution conversion module is called a scaler.
Standard approaches to scaling fit the original discrete data with a continuous model and resample this function on a new sampling grid.8 According to sampling theory, the original signal can be reconstructed perfectly from its samples by convolution with the sinc function. However, since the sinc kernel decays too slowly at infinity, realizing the function physically is difficult. Thus, different approximations, such as the bilinear,9 bicubic,10 cubic spline,11 and high-order (>3) B-spline8,12 operators, have been proposed. To obtain better quality of the scaled image, Park and Schowengerdt13 adapted the parameters of the cubic convolution interpolation10 to the frequency content of the specific image to be processed, and Ramponi14 proposed a space-variant technique by introducing the concept of the warped distance.
We design the transcoding system using resolution conversion, where the cubic convolution scaler10 is applied to resize the image by an arbitrary ratio. The generic video coding algorithm defined in the MPEG-2 standard is known to be flexible enough to support images of various sizes and formats. To deal with the various image formats, in this work a modified cubic convolution scaler is proposed by considering the relation between the sampling positions of the signals. The modification corrects the phase of the interpolation kernel. Two ideas are proposed in this work. First, while the conventional transcoder systems4,6 change the image size to half or quarter size, we resize the image by an arbitrary ratio, for example, from 1920×1080i to 704×480i. Resizing by an arbitrary ratio is more realistic than the conventional schemes. Second, to support the various transcoding cases that are described later, modification schemes for the resampling positions are proposed for each case: from a progressive picture to a progressive one, from an interlaced one to an interlaced one, from an interlaced one to a progressive one, and from a progressive one to an interlaced one.
This work is organized as follows. Section 2 describes the transcoding system and the various image formats used in the system. The scaling schemes required in the transcoder are explained in Sec. 3, where the background information about the interpolation technique is described. In Sec. 4, we propose a modified cubic convolution interpolation. Computer simulation results of the proposed algorithm are presented in Sec. 5. A brief conclusion is given in Sec. 6.
2 Transcoder System
For digital storage media and the transmission of an MPEG-2 bitstream over a band-limited channel, bit-rate reduction is an inevitable process for storage efficiency and transmission quality, respectively.2,3,15,16 Such a bit-rate reduction can be obtained by the transcoder whose block diagram is depicted in Fig. 1. The transcoder reduces the bit rate by adjusting coding parameters, such as the quantizer step size and the spatiotemporal resolution, as shown in Fig. 1. Note that the scaler module is an up/down-sampler. An input video bitstream is decoded by a variable length decoder (VLD), and then the result is inverse quantized by an inverse quantizer (IQ). The dequantized data are transformed into the pixel domain by the inverse discrete cosine transform (IDCT), and then the data are compensated by a motion compensator (MC). The reconstructed picture is scaled into an image of a different size by the scaler. The resized images are re-encoded into another bitstream with a different bit rate, where ME/MC, VLC, and Q represent motion estimation/motion compensation, variable length coder, and quantizer, respectively. The motion vector/mode data obtained from the decoder stage are reused in the encoder part. The reuse of the incoming motion vectors has been widely accepted in many transcoder architectures.3 In this work, the motion vectors transferred from the decoder part are scaled according to the ratio of the reduction in spatial resolution. The resized motion vector is then refined over a narrow search range, for example, ±2 pixels.
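The motion vector reuse described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the integer-pixel rounding, and the example vector are assumptions introduced here; only the scaling by the resolution ratio and the ±2 pixel refinement window come from the text.

```python
from fractions import Fraction

def scale_motion_vector(mv, src_size, dst_size):
    """Scale a motion vector (dx, dy) by the spatial resolution ratio.

    Rounding to the nearest integer pixel is an assumption of this sketch.
    """
    (dx, dy), (w, h), (w2, h2) = mv, src_size, dst_size
    return (round(dx * Fraction(w2, w)), round(dy * Fraction(h2, h)))

def refinement_candidates(mv, search_range=2):
    """List every candidate within +/- search_range pixels of the scaled vector."""
    cx, cy = mv
    return [(cx + ox, cy + oy)
            for oy in range(-search_range, search_range + 1)
            for ox in range(-search_range, search_range + 1)]

# Hypothetical decoder-side vector for a 1920x1080 -> 704x480 conversion.
scaled = scale_motion_vector((30, -18), (1920, 1080), (704, 480))
candidates = refinement_candidates(scaled)
```

An encoder would evaluate its matching cost only at the positions in `candidates` rather than running a full motion search.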
We have considered a system that can deal with various video streams including MPEG-2 MP@HL and SDTV at MP@ML. It supports conversion among the formats defined in the MPEG standard.17 The formats include 1920×1080i, 1280×720p, 704×480p, 704×480i, and 640×480p. The available formats are shown in Fig. 2, where i and p represent the interlaced and progressive formats, respectively. Note that the resolution of the images is changed by a noninteger factor in the horizontal and vertical directions; thus, the scaler should be designed to accomplish a noninteger rate change, whereas conventional interlaced-to-progressive conversion (IPC) schemes resize the image only in the vertical direction by a ratio of two (=2),18–20 or in the horizontal and vertical directions by an integer ratio.21
Fig. 1 The transcoder system with up/down scaler.
Fig. 2 The various image formats used in the MPEG-2 standard.
The key technology we focus on is the scaler, which we have designed to support all available formats. Figures 3, 4, and 5 show the processes that convert from 1920×1080i to 720×480i, from 1920×1080i to 1280×720p, and from 1280×720p to 720×480i, respectively. When the original image is interlaced, the splitter is used to produce an even field and an odd field. When the reconstructed image is interlaced, the merger is applied to make a frame from the two field pictures. In Fig. 4, the scaling system changes the resolution from 1920×1080i to 1280×720p, where the image size is reduced in the horizontal direction, but the vertical resolution is enlarged. Thus, the up/down scaler should be applied separately to the horizontal and vertical directions. In Fig. 5, two p frames are changed into an i frame, while an i frame is resized into two p frames in Fig. 4.
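The splitter and merger mentioned above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: a frame is a list of rows with an even number of rows, the even field holds frame rows 0, 2, 4, ..., and the function names are hypothetical.

```python
def split_fields(frame):
    """Splitter: even field = rows 0, 2, 4, ...; odd field = rows 1, 3, 5, ..."""
    return frame[0::2], frame[1::2]

def merge_fields(even, odd):
    """Merger: interleave two fields of equal height back into one frame."""
    frame = []
    for even_row, odd_row in zip(even, odd):
        frame.append(even_row)
        frame.append(odd_row)
    return frame

# Toy 6x4 frame whose pixel value encodes (row, column).
frame = [[r * 10 + c for c in range(4)] for r in range(6)]
even, odd = split_fields(frame)
restored = merge_fields(even, odd)
```

With this convention, line j of the even field lies on frame line 2j and line j of the odd field lies on frame line 2j + 1.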
3 Resolution Scaler
In this section, we investigate the design of the resolution scaler. The scaling procedure fits the original discrete data with a continuous function, and then resamples this function at new sampling positions. For equally spaced sampled data $f(x_k)$, many interpolation functions can be written in the form

$$\tilde{f}(x) = \sum_{k=-\infty}^{\infty} c_k\,\beta(x - x_k), \tag{1}$$

where $\tilde{f}(x)$ is the corresponding interpolation function and $\beta(x)$ is the interpolation kernel. Here, $x$ and $x_k$ represent continuous and discrete values, respectively. Among the interpolation functions that can be characterized in this manner are cubic splines and linear interpolating functions. In Eq. (1), the $x_k$ are the interpolation nodes, and the $c_k$ are parameters that depend on the sampled data $f(x_k)$. The interpolation kernel $\beta(x)$ converts the discrete data $f(x_k)$ into a continuous function $\tilde{f}(x)$ by an operation similar to convolution.
From the classical Shannon sampling theorem, if $f(x)$ is bandlimited to $(-\pi, +\pi)$, then

$$\tilde{f}(x) = \sum_{k=-\infty}^{\infty} f(x_k)\,\mathrm{sinc}(x - x_k), \tag{2}$$

where

$$\mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}, \tag{3}$$

i.e., in Eq. (1), $c_k$ and $\beta(x)$ are replaced with $f(x_k)$ and $\mathrm{sinc}(x)$, respectively. For numerical computations, the ideal interpolation formula in Eq. (2) is not practical due to the slow rate of decay of the interpolation kernel $\mathrm{sinc}(x)$. An attractive alternative is reconstruction by polynomial spline interpolation.
The most practical approach is to estimate the value of each unknown pixel using a small set of its nearest neighbors. For example, in methods based on spline functions, the known data samples affect the value to be determined according to an inverse function of the distance.
Let $f(x_k)$ be the available data, and $\tilde{f}(x)$ be the value to be interpolated. We suppose that its nearest available neighbors are located at coordinates $x_k$ and $x_{k+1}$, and that the spacing of the sampling grid is one for these data. We define the distances between $x$, $x_k$, and $x_{k+1}$ as follows:

$$s = x - x_k, \tag{4}$$

$$1 - s = x_{k+1} - x, \tag{5}$$

where $0 \le s \le 1$ and $x_k \le x \le x_{k+1}$. A simple algorithm, one of many that can be used to obtain the interpolated pixel $\tilde{f}(x)$, is

$$\tilde{f}(x) = (1 - s)\,f(x_k) + s\,f(x_{k+1}). \tag{6}$$
Fig. 3 The resizing process from 1920×1080i to 720×480i.
Fig. 4 The resizing process from 1920×1080i to 1280×720p.
Another method, more complicated but significantly more effective, is the bicubic one.11 The problem with this approach is that the slope discontinuity at the ends of the waveform leads to amplitude ripples in the reconstructed function. This problem can be eliminated by generating a cubic convolution function,10,22,23 which forces the slope at the ends of the interpolation to be a fixed value. The cubic convolution interpolation function can be expressed in the following general form:

$$\beta(x) = \begin{cases} (\alpha + 2)|x|^3 - (\alpha + 3)|x|^2 + 1, & 0 \le |x| \le 1, \\ \alpha|x|^3 - 5\alpha|x|^2 + 8\alpha|x| - 4\alpha, & 1 \le |x| \le 2. \end{cases} \tag{7}$$

Rifman22 and Bernstein23 set $\alpha = -1$, which causes $\beta(x)$ to have the same slope, $-1$, at $x = 1$ as the $\mathrm{sinc}(x)$ function. Keys10 proposed setting $\alpha = -1/2$, which provides an interpolation function that approximates the original unsampled image to as high a degree as possible in the sense of a power series expansion. Since $\beta(x)$ in Eq. (7) is zero except in the interval $(-2, 2)$, substituting Eqs. (4) and (7) into Eq. (1) gives

$$\begin{aligned} \tilde{f}(x) ={}& f(x_{k-1})\,(\alpha s^3 - 2\alpha s^2 + \alpha s) + f(x_k)\,[(\alpha + 2)s^3 - (3 + \alpha)s^2 + 1] \\ &+ f(x_{k+1})\,[-(\alpha + 2)s^3 + (2\alpha + 3)s^2 - \alpha s] + f(x_{k+2})\,(-\alpha s^3 + \alpha s^2), \end{aligned} \tag{8}$$

where $x_k \le x \le x_{k+1}$, $s = x - x_k$, and $0 \le s \le 1$. The interpolation functions coincide with the sampled data at the interpolation nodes, i.e., $\tilde{f}(x_k) = f(x_k)$. Moreover, they can be used for interpolating by any factor (an integer, rational, or irrational number). By applying $\alpha = -1/2$, as set by Keys,10 Eq. (8) becomes

$$\begin{aligned} \tilde{f}(x) ={}& f(x_{k-1})\,(-s^3 + 2s^2 - s)/2 + f(x_k)\,(3s^3 - 5s^2 + 2)/2 \\ &+ f(x_{k+1})\,(-3s^3 + 4s^2 + s)/2 + f(x_{k+2})\,(s^3 - s^2)/2. \end{aligned} \tag{9}$$

The factor $\alpha$ in Eq. (7) can be used as a tuning parameter to obtain the best visual interpolation.13 Park and Schowengerdt13 adapted the parameter $\alpha$ of the interpolation algorithm to the frequency content of the specific image to be processed.
In the proposed system, we focus on the cubic convolution scaler of Eq. (9), since it outperforms a bilinear interpolator and is simpler than a B-spline.10 In the next section, a modified version of Eq. (9) is proposed to resize images of the various formats.
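As a quick check on Eqs. (7) and (9), the sketch below evaluates the kernel at the four tap distances and compares the result with the closed-form weights. The function names are hypothetical, and treating the kernel as zero for |x| ≥ 2 follows the statement that it vanishes outside (−2, 2).

```python
def cubic_kernel(x, alpha=-0.5):
    """Cubic convolution kernel of Eq. (7); zero outside (-2, 2)."""
    ax = abs(x)
    if ax <= 1:
        return (alpha + 2) * ax**3 - (alpha + 3) * ax**2 + 1
    if ax < 2:
        return alpha * ax**3 - 5 * alpha * ax**2 + 8 * alpha * ax - 4 * alpha
    return 0.0

def cubic_weights(s):
    """Tap weights of Eq. (9) for f(x_{k-1}), f(x_k), f(x_{k+1}), f(x_{k+2})."""
    return [(-s**3 + 2 * s**2 - s) / 2,
            (3 * s**3 - 5 * s**2 + 2) / 2,
            (-3 * s**3 + 4 * s**2 + s) / 2,
            (s**3 - s**2) / 2]

s = 0.25  # example phase, 0 <= s <= 1
weights = cubic_weights(s)
# The four samples sit at distances s+1, s, 1-s, and 2-s from x.
from_kernel = [cubic_kernel(s + 1), cubic_kernel(s),
               cubic_kernel(1 - s), cubic_kernel(2 - s)]
```

The weights sum to one for every phase, and at s = 0 they reduce to (0, 1, 0, 0), which is the node-interpolation property $\tilde{f}(x_k) = f(x_k)$ noted after Eq. (8).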
4 Modified Cubic Convolution Scaler
Figure 6 shows the scaling framework, in which $f(x_k)$ denotes the original data. The procedure consists of two phases: an interpolation to make a continuous function $\tilde{f}(x)$, and resampling of this function on a new grid $y_n$. The scaling converts the discrete data $f(x_k)$ into the scaled data $\tilde{f}(y_n)$, which have a different resolution. When $\delta > 1$, the scaled data $\tilde{f}(y_n)$ are an enlarged version of the original data $f(x_k)$, where

$$\delta = \frac{N}{M}, \tag{10}$$

and $M$ and $N$ are the sizes of the original and the scaled signal, respectively.
The interpolated continuous function $\tilde{f}(x)$ is calculated by Eq. (9). The scaling is a procedure that resamples the continuous function $\tilde{f}(x)$ with scaling factor $\delta$ as

$$\tilde{f}(y_n) = \tilde{f}(x)\big|_{x = y_n = n \cdot \frac{1}{\delta}}, \quad 0 \le n < N. \tag{11}$$

When $x_k \le y_n \le x_{k+1}$ and $s = y_n - x_k$, Eq. (11) becomes

$$\tilde{f}(y_n) = \tilde{f}(x)\big|_{s = y_n - x_k}, \quad 0 \le n < N, \quad 0 \le k < M. \tag{12}$$
Fig. 5 The resizing process from 1280×720p to 720×480i.
Fig. 6 The block diagram of the scaling system.
Applying the scheme to multidimensional data, such as the gray levels of an image, is straightforward. The scaling procedure described in Fig. 6 can be applied along each axis separately (first along the rows and then along the columns). In the following sections, the scaling schemes, which are modified according to the format of the processed images, are explained.
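A one-dimensional version of the resampling in Eqs. (10) to (12) can be sketched as below. This is an illustration under assumptions: the function names are hypothetical, and samples outside the signal are obtained by clamping the index to the border, a boundary rule the text does not specify.

```python
import math

def cubic_weights(s):
    """Tap weights of Eq. (9) for f(x_{k-1}), f(x_k), f(x_{k+1}), f(x_{k+2})."""
    return [(-s**3 + 2 * s**2 - s) / 2,
            (3 * s**3 - 5 * s**2 + 2) / 2,
            (-3 * s**3 + 4 * s**2 + s) / 2,
            (s**3 - s**2) / 2]

def scale_1d(samples, out_len):
    """Resample M samples to N = out_len points at y_n = n / delta, delta = N / M."""
    m = len(samples)
    out = []
    for n in range(out_len):
        y = n * m / out_len              # Eq. (11): y_n = n * (1 / delta)
        k = math.floor(y)                # nearest node at or below y_n
        s = y - k                        # Eq. (12): phase within [x_k, x_{k+1})
        acc = 0.0
        for tap, w in zip(range(k - 1, k + 3), cubic_weights(s)):
            idx = min(max(tap, 0), m - 1)    # assumption: clamp at the borders
            acc += w * samples[idx]
        out.append(acc)
    return out

ramp = [float(v) for v in range(10)]
doubled = scale_1d(ramp, 20)     # delta = 2 (enlargement)
same = scale_1d(ramp, 10)        # delta = 1 leaves the samples unchanged
```

Because the ratio `m / out_len` need not be an integer, the same loop handles arbitrary scaling factors.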
4.1 Scaling from a Progressive Image
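When the scaler is used to resize a progressive image into a progressive one with a different resolution, the procedure is as depicted in Fig. 7, where $W$ and $H$ represent the width and height of the original image, and $\tilde{W}$ and $\tilde{H}$ those of the resized image. In the horizontal direction, the resampling is performed with scaling factor $\delta_h = \tilde{W}/W$ as

$$\tilde{f}_h(y_n) = \tilde{f}_h(x)\big|_{x = y_n = n \cdot \frac{1}{\delta_h} = n \cdot \frac{W}{\tilde{W}}}, \quad n = 0, 1, 2, \ldots, \tilde{W} - 1, \tag{13}$$

$$\begin{aligned} \tilde{f}_h(x) ={}& f_h(x_{k-1})\,(-s^3 + 2s^2 - s)/2 + f_h(x_k)\,(3s^3 - 5s^2 + 2)/2 \\ &+ f_h(x_{k+1})\,(-3s^3 + 4s^2 + s)/2 + f_h(x_{k+2})\,(s^3 - s^2)/2, \end{aligned} \tag{14}$$

where $f_h(x_k)$ and $\tilde{f}_h(y_n)$ are the given discrete data and the scaled data in the horizontal direction of the image, respectively, and $y_n$ is the resampling position.
In the vertical direction, on the other hand, the resizing is executed with scaling factor $\delta_v = \tilde{H}/H$ as

$$\tilde{f}_v(y_n) = \tilde{f}_v(x)\big|_{x = y_n = n \cdot \frac{1}{\delta_v} = n \cdot \frac{H}{\tilde{H}}}, \quad n = 0, 1, 2, \ldots, \tilde{H} - 1, \tag{15}$$

$$\begin{aligned} \tilde{f}_v(x) ={}& f_v(x_{k-1})\,(-s^3 + 2s^2 - s)/2 + f_v(x_k)\,(3s^3 - 5s^2 + 2)/2 \\ &+ f_v(x_{k+1})\,(-3s^3 + 4s^2 + s)/2 + f_v(x_{k+2})\,(s^3 - s^2)/2, \end{aligned} \tag{16}$$

where $f_v(x_k)$, $\tilde{f}_v(y_n)$, and $y_n$ are the vertical-direction data of the original image, those of the resized image, and the resampling position, respectively.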
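The progressive-to-progressive case of Eqs. (13) to (16) amounts to running the same one-dimensional scaler along the rows and then along the columns. The sketch below illustrates this with hypothetical function names, an image stored as a list of rows, and border clamping as an assumed boundary rule.

```python
import math

def cubic_weights(s):
    """Tap weights shared by Eqs. (14) and (16)."""
    return [(-s**3 + 2 * s**2 - s) / 2, (3 * s**3 - 5 * s**2 + 2) / 2,
            (-3 * s**3 + 4 * s**2 + s) / 2, (s**3 - s**2) / 2]

def scale_1d(samples, out_len):
    """Resample one row or column at y_n = n * len(samples) / out_len."""
    m = len(samples)
    out = []
    for n in range(out_len):
        y = n * m / out_len
        k = math.floor(y)
        taps = [samples[min(max(t, 0), m - 1)] for t in range(k - 1, k + 3)]
        out.append(sum(w * v for w, v in zip(cubic_weights(y - k), taps)))
    return out

def scale_progressive(image, out_w, out_h):
    """Separable resize: rows by Eqs. (13)-(14), then columns by Eqs. (15)-(16)."""
    rows = [scale_1d(row, out_w) for row in image]
    cols = [scale_1d(list(col), out_h) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

# Toy 8x6 constant image resized by noninteger ratios (5/8 wide, 9/6 tall).
image = [[7.0] * 8 for _ in range(6)]
resized = scale_progressive(image, out_w=5, out_h=9)
```

A constant image stays constant because the four tap weights sum to one at every phase.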
4.2 Scaling from an Interlaced Image
When the scaler is applied to an interlaced image, the operations of Eqs. (13) and (15) are executed after the frame image $f(x_k)$ is split into two fields, the even field $f^e(x_k)$ and the odd field $f^o(x_k)$. In Fig. 8, $f_h^e(x_k)$, $f_v^e(x_k)$, $f_h^o(x_k)$, and $f_v^o(x_k)$ are the horizontal and vertical data of the even and odd fields, respectively. In the horizontal direction, the resampling is executed on the field data as

$$\tilde{f}_h^e(y_n) = \tilde{f}_h^e(x)\big|_{x = y_n = n \cdot \frac{1}{\delta_h}}, \quad n = 0, 1, 2, \ldots, \tilde{W} - 1, \tag{17}$$

$$\tilde{f}_h^o(y_n) = \tilde{f}_h^o(x)\big|_{x = y_n = n \cdot \frac{1}{\delta_h}}, \quad n = 0, 1, 2, \ldots, \tilde{W} - 1. \tag{18}$$
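After the fields are resized, the resulting fields have to be merged as in Fig. 8. In the vertical direction, if we define the operations as

$$\tilde{f}_v^e(y_n) = \tilde{f}_v^e(x)\big|_{x = y_n = n \cdot \frac{1}{\delta_v}}, \quad n = 0, 1, 2, \ldots, \frac{\tilde{H}}{2} - 1, \tag{19}$$

$$\tilde{f}_v^o(y_n) = \tilde{f}_v^o(x)\big|_{x = y_n = n \cdot \frac{1}{\delta_v}}, \quad n = 0, 1, 2, \ldots, \frac{\tilde{H}}{2} - 1, \tag{20}$$

then the merged vertical data become

$$\tilde{f}_v(m) = \begin{cases} \tilde{f}_v^e\!\left(\dfrac{n}{\delta_v}\right), \ n = \left\lfloor \dfrac{m}{2} \right\rfloor, & \text{if } m \text{ is even}, \\[2ex] \tilde{f}_v^o\!\left(\dfrac{n}{\delta_v}\right), \ n = \left\lfloor \dfrac{m}{2} \right\rfloor, & \text{if } m \text{ is odd}. \end{cases} \tag{21}$$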
In Eq. (21), the sampling positions $\{n/\delta_v,\ n = 0, 1, 2, \ldots, \tilde{H}/2 - 1\}$ for $\tilde{f}_v^e(n/\delta_v)$ correspond to $\{2n/\delta_v,\ n = 0, 1, 2, \ldots, \tilde{H}/2 - 1\}$ in the original frame data $f_v(x_k)$. Likewise, the positions $\{n/\delta_v,\ n = 0, 1, 2, \ldots, \tilde{H}/2 - 1\}$ for $\tilde{f}_v^o(n/\delta_v)$ are mapped to $\{1 + 2n/\delta_v,\ n = 0, 1, 2, \ldots, \tilde{H}/2 - 1\}$ in the original frame $f_v(x_k)$. This means that the resampling positions in the frame $f_v(x_k)$ are

$$\left\{ \frac{0}{\delta_v},\ 1,\ \frac{2}{\delta_v},\ 1 + \frac{2}{\delta_v},\ \frac{4}{\delta_v},\ 1 + \frac{4}{\delta_v},\ \ldots,\ 1 + \frac{2}{\delta_v}\left(\frac{\tilde{H}}{2} - 1\right) \right\},$$

i.e., the positions are not equally spaced.
Thus, when the vertical data are scaled, the resampling position should be modified to compensate for the irregularity:

$$\tilde{f}_v^e(y_n) = \tilde{f}_v^e(x)\big|_{x = y_n = n \cdot \frac{1}{\delta_v}}, \quad n = 0, 1, 2, \ldots, \frac{\tilde{H}}{2} - 1, \tag{22}$$

$$\tilde{f}_v^o(y_n) = \tilde{f}_v^o(x)\big|_{x = y_n = \left(n \cdot \frac{1}{\delta_v} + d\right)}, \quad n = 0, 1, 2, \ldots, \frac{\tilde{H}}{2} - 1, \tag{23}$$

where $(n \cdot 1/\delta_v + d)$ is the shifted sampling grid.
Fig. 7 The scaling procedure used for resizing from a progressive image to a progressive one.
From the viewpoint of the vertical axis of the original frame picture, the sampling positions of Eqs. (22) and (23) become $\{2n \cdot 1/\delta_v\}$ and $\{1 + 2n \cdot 1/\delta_v + 2d\}$, $n = 0, 1, 2, \ldots, \tilde{H}/2 - 1$. These positions have to compose $\{r/\delta_v,\ r = 0, 1, 2, \ldots, \tilde{H} - 1\}$, i.e., the equally spaced points $\{0, 1/\delta_v, 2/\delta_v, 3/\delta_v, 4/\delta_v, \ldots, (\tilde{H} - 1)/\delta_v\}$ in the vertical direction of the frame picture. The odd-field positions must therefore satisfy $1 + 2n/\delta_v + 2d = (2n + 1)/\delta_v$, so $d$ should be

$$d = \frac{1}{2}\left(\frac{1}{\delta_v} - 1\right). \tag{24}$$

Thus, in the vertical direction of the odd field, the sampling grid has to be shifted by the $d$ of Eq. (24).
Fig. 8 The scaling procedure used for resizing from an interlaced image to an interlaced one.
Fig. 9 The scaling procedure used for resizing from an interlaced image to two progressive images.
Note that the merged horizontal data are

$$\tilde{f}_h(y_n) = \begin{cases} \tilde{f}_h^e(y_n), & \text{if } \tilde{f}_h(y_n) \text{ is in an even line of the merged frame}, \\ \tilde{f}_h^o(y_n), & \text{if } \tilde{f}_h(y_n) \text{ is in an odd line of the merged frame}. \end{cases} \tag{25}$$
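The effect of the shift d in Eq. (24) can be checked numerically. The sketch below maps the field-domain sampling positions of Eqs. (22) and (23) back to the frame axis and compares them with the uniform grid $r/\delta_v$. The function name and the 1080i to 480i example are assumptions chosen for illustration; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def merged_frame_positions(delta_v, out_height, d):
    """Frame-axis resampling positions after merging the even and odd fields."""
    positions = []
    for n in range(out_height // 2):
        even_pos = n / delta_v              # Eq. (22), even-field coordinates
        odd_pos = n / delta_v + d           # Eq. (23), odd-field coordinates
        positions.append(2 * even_pos)      # even-field line j lies on frame line 2j
        positions.append(1 + 2 * odd_pos)   # odd-field line j lies on frame line 2j + 1
    return positions

delta_v = Fraction(480, 1080)               # hypothetical 1080i -> 480i conversion
d = Fraction(1, 2) * (1 / delta_v - 1)      # Eq. (24)
uniform = [r / delta_v for r in range(480)]
corrected = merged_frame_positions(delta_v, 480, d)
uncorrected = merged_frame_positions(delta_v, 480, Fraction(0))
```

With d from Eq. (24), `corrected` coincides with the uniform grid, whereas d = 0 reproduces the irregular positions listed before Eq. (22).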
4.3 Scaling from an Interlaced Image to Two Progressive Images
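When the scaler is used for resizing an interlaced frame into two progressive frames, the procedure is as depicted in Fig. 9, where the merging operation is not used. The scaling is applied to the even and odd fields as follows:

$$\tilde{f}_h^e(y_n) = \tilde{f}_h^e(x)\big|_{x = y_n = n \cdot \frac{1}{\delta_h}}, \quad n = 0, 1, 2, \ldots, \tilde{W} - 1, \tag{26}$$

$$\tilde{f}_v^e(y_n) = \tilde{f}_v^e(x)\big|_{x = y_n = n \cdot \frac{1}{2\delta_v}}, \quad n = 0, 1, 2, \ldots, \tilde{H} - 1, \tag{27}$$

$$\tilde{f}_h^o(y_n) = \tilde{f}_h^o(x)\big|_{x = y_n = n \cdot \frac{1}{\delta_h}}, \quad n = 0, 1, 2, \ldots, \tilde{W} - 1, \tag{28}$$

$$\tilde{f}_v^o(y_n) = \tilde{f}_v^o(x)\big|_{x = y_n = \left(n \cdot \frac{1}{2\delta_v} + d\right)}, \quad n = 0, 1, 2, \ldots, \tilde{H} - 1. \tag{29}$$

The sampling positions of Eqs. (27) and (29) in each field correspond to the positions $\{n \cdot 1/\delta_v,\ n = 0, 1, 2, \ldots, \tilde{H} - 1\}$ and $\{1 + n \cdot 1/\delta_v + 2d,\ n = 0, 1, 2, \ldots, \tilde{H} - 1\}$ from the viewpoint of the vertical axis of the original frame $f_v(x_k)$. Since the merging operation is not used in this case, both sets of positions have to be $\{0, 1/\delta_v, 2/\delta_v, 3/\delta_v, 4/\delta_v, 5/\delta_v, \ldots, (\tilde{H} - 1)/\delta_v\}$. Thus, $d$ is set to

$$d = -\frac{1}{2}. \tag{30}$$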
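The choice d = −1/2 in Eq. (30) can be verified in the same way: both output frames should sample the same frame-axis positions. This is a sketch with a hypothetical function name and a 1080i to 720p example chosen for illustration.

```python
from fractions import Fraction

def field_to_frame_positions(delta_v, out_height, d):
    """Frame-axis positions sampled by the even and odd fields (no merging)."""
    step = 1 / (2 * delta_v)                                   # field-domain step
    even = [2 * (n * step) for n in range(out_height)]         # Eq. (27)
    odd = [1 + 2 * (n * step + d) for n in range(out_height)]  # Eq. (29)
    return even, odd

delta_v = Fraction(720, 1080)                # hypothetical 1080i -> 720p conversion
even, odd = field_to_frame_positions(delta_v, 720, Fraction(-1, 2))   # Eq. (30)
uniform = [n / delta_v for n in range(720)]
```

Here each 540-line field is enlarged to 720 lines, so the field-domain step is 3/4 of a field line, and the half-line shift aligns the odd field with the even one on the frame axis.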
4.4 Scaling from Two Progressive Images to an Interlaced One
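Fig. 10 The scaling procedure used for resizing two progressive images to an interlaced one.
Table 1 The computational complexity required to obtain a sample $\tilde{f}(y_n)$ with Eqs. (9), (11), (12), (22), and (23). P is the number of products, and A is the number of additions.

| Step | Conventional scheme with Eqs. (9), (11), (12) | Proposed scheme with Eqs. (9), (12), (22), (23) |
| --- | --- | --- |
| Calculation of $y_n$ in Eq. (11) or (23) | P = 1 | P = 1, A = 1 |
| Calculation of $s$ in Eq. (12) | A = 1 | A = 1 |
| Making the filter kernel in Eq. (9) | P = 3, A = 2 | P = 3, A = 2 |
| FIR filtering in Eq. (9) | P = 4, A = 3 | P = 4, A = 3 |
| Total | P = 8, A = 6 | P = 8, A = 7 |

The scaling system that resizes two progressive frames into an interlaced frame does not require the splitter, as depicted in Fig. 10. The operations used in this case are summarized in the following equations:

$$\tilde{f}_h^e(y_n) = \tilde{f}_h^e(x)\big|_{x = y_n = n \cdot \frac{1}{\delta_h}}, \quad n = 0, 1, 2, \ldots, \tilde{W} - 1, \tag{31}$$

$$\tilde{f}_v^e(y_n) = \tilde{f}_v^e(x)\big|_{x = y_n = 2n \cdot \frac{1}{\delta_v}}, \quad n = 0, 1, 2, \ldots, \frac{\tilde{H}}{2} - 1, \tag{32}$$

$$\tilde{f}_h^o(y_n) = \tilde{f}_h^o(x)\big|_{x = y_n = n \cdot \frac{1}{\delta_h}}, \quad n = 0, 1, 2, \ldots, \tilde{W} - 1, \tag{33}$$

$$\tilde{f}_v^o(y_n) = \tilde{f}_v^o(x)\big|_{x = y_n = \left(2n \cdot \frac{1}{\delta_v} + d\right)}, \quad n = 0, 1, 2, \ldots, \frac{\tilde{H}}{2} - 1. \tag{34}$$

The sampling positions $\{2n \cdot 1/\delta_v\}$ and $\{2n \cdot 1/\delta_v + d\}$, $n = 0, 1, 2, \ldots, \tilde{H}/2 - 1$, of Eqs. (32) and (34) compose the sampling points $\{0, 1/\delta_v, 2/\delta_v, 3/\delta_v, 4/\delta_v, 5/\delta_v, \ldots, (\tilde{H} - 1)/\delta_v\}$ from the viewpoint of the vertical axis of the original progressive frames. Thus, $d$ should be

$$d = \frac{1}{\delta_v}. \tag{35}$$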
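For the two-progressive-to-interlaced case, the shift of Eq. (35) can be checked by interleaving the positions of Eqs. (32) and (34). The function name and the 720p to 480i example are assumptions for illustration only.

```python
from fractions import Fraction

def p2i_positions(delta_v, out_height):
    """Interleave the even-field and odd-field sampling positions."""
    d = 1 / delta_v                              # Eq. (35)
    merged = []
    for n in range(out_height // 2):
        merged.append(2 * n / delta_v)           # Eq. (32): even field
        merged.append(2 * n / delta_v + d)       # Eq. (34): odd field
    return merged

delta_v = Fraction(480, 720)                     # hypothetical 720p -> 480i conversion
merged = p2i_positions(delta_v, 480)
uniform = [r / delta_v for r in range(480)]
```

The interleaved positions advance in equal steps of $1/\delta_v$, which is the regular grid the text requires.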
4.5 Computational Complexity
Fig. 11 The four consecutive images are scaled by the conventional cubic convolution scaler, where the ratio is 5.7. Four consecutive images are merged into one image. The resulting consecutive images are discordant vertically.
Fig. 12 The four consecutive images are scaled by the proposed cubic convolution scaler, where the ratio is 5.7. Four consecutive images are merged into one image. The resized consecutive images are regular vertically.
Table 2 Performance comparison of the cubic convolution scalers in terms of information loss, measured as PSNR (dB) and mean square error (MSE). As the measured PSNR increases, the information loss decreases.

| Test image | Scheme | 704×480i ⇒ (704/2)×(480/2)i ⇒ 704×480i | 704×480i ⇒ 490×350i ⇒ 704×480i |
| --- | --- | --- | --- |
| Mobile and calendar | Conventional scheme | PSNR = 19.77 dB, MSE = 684.19 | PSNR = 22.16 dB, MSE = 394.69 |
| Mobile and calendar | Proposed scheme | PSNR = 20.65 dB, MSE = 559.40 | PSNR = 23.61 dB, MSE = 282.89 |
| Flower garden | Conventional scheme | PSNR = 21.02 dB, MSE = 513.55 | PSNR = 23.82 dB, MSE = 269.73 |
| Flower garden | Proposed scheme | PSNR = 21.69 dB, MSE = 440.04 | PSNR = 24.67 dB, MSE = 221.72 |
| Susie | Conventional scheme | PSNR = 34.19 dB, MSE = 24.76 | PSNR = 37.24 dB, MSE = 12.28 |
| Susie | Proposed scheme | PSNR = 35.30 dB, MSE = 19.19 | PSNR = 38.35 dB, MSE = 9.50 |

The computational burden of the conventional and proposed scalers lies in determining the sampling position $y_n$, given by Eq. (11) or (23); calculating the phase $s$ of Eq. (12); constructing the filter kernel described in Eq. (9); and FIR filtering with Eq. (9). To obtain a sample value $\tilde{f}(y_n)$, the calculation of $y_n$ requires one product in the conventional scheme with Eq. (11), whereas the proposed scheme requires one product and one addition to obtain the $y_n$ of Eq. (23). Apart from obtaining $y_n$, the complexities of the other steps, namely calculating $s$, making the filter kernel, and filtering, are the same in the conventional and proposed schemes. The complexities are summarized in Table 1. If the size of the scaled image is $\tilde{W} \times \tilde{H}$, then the total complexity of scaling an image becomes

$$\tilde{W} \times \tilde{H} \times \{P + A\}. \tag{36}$$

This means that the proposed scheme requires $\tilde{W} \times \tilde{H}$ more addition operations than the conventional approach. Since the product operation ($P$) consumes much more CPU time than the addition operation ($A$), the increase in complexity is insignificant.
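The operation counts of Table 1 and Eq. (36) can be tallied for a concrete output size. The sketch below uses a 704×480 output as an example; the function name is hypothetical, and the per-sample counts are taken from the totals in Table 1.

```python
def scaling_cost(width, height, products, additions):
    """Total operation count of Eq. (36): W~ x H~ x {P + A}."""
    samples = width * height
    return {"products": samples * products, "additions": samples * additions}

# Per-sample totals from Table 1.
conventional = scaling_cost(704, 480, products=8, additions=6)
proposed = scaling_cost(704, 480, products=8, additions=7)
extra_additions = proposed["additions"] - conventional["additions"]
```

The number of products is identical in both schemes, and the difference is one addition per output sample.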
Fig. 13 The PSNR comparison of the images resulting from the transcoding systems using the con- ventional and the proposed scaler, respectively. The test image is mobile and calendar, and the up/down scaler is a cubic convolution scaler.
5 Simulation Results
Computer simulations using real images were performed to evaluate the performance of the proposed algorithm. The test images are mobile and calendar, flower garden, and Susie. The first simulation is a quantitative test. For an objective evaluation of the proposed scheme, we measured the information loss resulting from consecutive scaling procedures. To evaluate the information loss, various criteria including peak signal-to-noise ratio (PSNR), signal-to-noise ratio (SNR), and mean squared error (MSE) can be considered. Among these, PSNR and MSE have been used in several previous papers.14,24,25 Thus, we choose the PSNR and MSE as criteria for the evaluation of information loss. The PSNR is also the criterion used in the other computer simulations of this work, Figs. 13 through 16.
Fig. 14 The PSNR comparison of the images resulting from the transcoding systems using the con- ventional and the proposed scalers, respectively. The test image is flower garden, and the up/down scaler is a cubic convolution scaler.
The test images are low-pass filtered and decimated to reduced (0.5 times) images by using the conventional and proposed schemes, respectively. Then, interpolation by a factor $\delta = 2$ in the row and column directions is used to return to the original size with the same techniques. The PSNRs and MSEs are evaluated with respect to the original image in Table 2, where the mobile and calendar, flower garden, and Susie sequences are used as test images. As shown in this table, the PSNRs of the images interpolated by the proposed scheme are higher than those of the conventionally enlarged images. We also show another test, where the test image is reduced to a 490×350i image, and the scaling process is then used to return to the original size. The proposed scheme outperforms the conventional one in all cases tested, and the PSNR trend for mobile and calendar repeats in the simulations for flower garden and Susie. This is because the interpolation kernel of the proposed scheme is modified according to the relationship between the formats of the scaled and original images, while the conventional scaler does not take this relation into account. The results imply that the proposed algorithm exhibits significant improvement in the minimization of information loss when compared with the conventional interpolation. These results indicate that the proposed scaler is very effective in resizing the image resolution.
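The two criteria can be computed as in the following sketch. The function names are hypothetical, and the peak value of 255 (8-bit samples) is an assumption not stated in the text, although the PSNR and MSE pairs reported in Table 2 appear consistent with it to within rounding.

```python
import math

def mse(original, restored):
    """Mean squared error between two equally sized images (lists of rows)."""
    total = sum((a - b) ** 2
                for row_a, row_b in zip(original, restored)
                for a, b in zip(row_a, row_b))
    return total / (len(original) * len(original[0]))

def psnr(mse_value, peak=255.0):
    """PSNR in dB; peak=255 assumes 8-bit samples."""
    return 10.0 * math.log10(peak ** 2 / mse_value)

# Toy example: every pixel of a 2x2 image is off by one gray level.
toy_mse = mse([[10, 20], [30, 40]], [[11, 21], [31, 41]])
# Cross-check against one Table 2 entry (Susie, proposed scheme, MSE = 9.50).
susie_psnr = psnr(9.50)
```

A lower MSE gives a higher PSNR, which is why both columns of Table 2 rank the schemes identically.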
As a qualitative evaluation of the proposed method, the results of the conventional scheme and the proposed method for an enlargement test are compared in Figs. 11 and 12, where the scaling ratio is 5.7 and the resulting images are progressive pictures. The figures show the fence and the nose region of the horse in the mobile-and-calendar image. Four consecutive pictures are merged into one image to compare the resulting images. In these figures, we added some horizontal lines to emphasize the difference. While the horizontal lines of the enlarged image resulting from the conventional scaler are discordant vertically, the proposed scheme produces a resized image whose data are regular vertically.
To check the usefulness of the proposed scheme in the context of transcoding, the performance of the system depicted in Fig. 1 is evaluated. The video codec used in this simulation is MPEG-2 MP@ML. The original sequence is coded at $R_o$ (= 15M or 10M) bits per second (bps), where the size of the picture is 704×480i. The encoded bitstream is decoded, and then the image sequences are scaled to 480×336i pictures. The resized images are then transcoded at $R_T$ (= 10M bps or 5M bps). This simulation is one of the processes that change an i frame into another i frame, as described in Fig. 3. To evaluate the image quality in terms of PSNR, the scaling process is used to return to the original image size, 704×480i. Figures 13 and 14 present the simulation results for mobile and calendar and flower garden, respectively. They show that the proposed scheme produces 0.7 and 0.8 dB better PSNR on average, respectively, compared to the conventional approach.
To evaluate the performance of the transcoders whose scenarios are described in Figs. 4 and 5, we combine those scenarios into one: the interlaced pictures are transcoded into progressive ones as in Fig. 4, and then the progressive pictures are changed into interlaced ones as in Fig. 5. The original image sequence (mobile and calendar) is coded at the bit rate $R_0$ = 15M bps, where the size of the picture is 704×480i. The encoded bitstream is transcoded into a 480×360p bitstream, where $R_1$ = 10M bps. Then, the transcoded images (480×360p) are retranscoded into 704×480i pictures whose bit rate is $R_2$ = 5M bps. Figure 15 shows the performance of the composite transcoding system, which consists of both schemes in Figs. 4 and 5. The result indicates that the transcoder system using the proposed scheme produces 1.2 dB better PSNR on average compared to the conventional scheme.
Fig. 15 The PSNR comparison of the images resulting from the composite transcoding systems (Figs. 4 and 5) using the conventional and the proposed scalers, respectively. The test image is mobile and calendar, and the up/down scaler is a cubic convolution scaler.
To show that the proposed phase modification scheme can be applied to a B-spline filter, the performance of the transcoder using a B-spline filter is presented in Fig. 16. The B-spline filter is used as the spatial resolution scaler, while the other simulation conditions are kept the same as in Fig. 13. The simulation results indicate that the proposed approach generates a bitstream whose quality is higher than that of the conventional scheme. This means that the proposed scheme is useful in a transcoding system that changes the spatial resolution as well as the bit rate of the image sequence.
Fig. 16 The PSNR comparison of the images resulting from the transcoding systems using the con- ventional and proposed scalers, respectively. The test image is mobile and calendar, and the up/down scaler is a B spline.
the resizing process. The test results show that the proposed scheme outperforms the conventional method. As we can see from the results, the proposed algorithms enable a reliable scaling technique with an arbitrary scale factor.
Acknowledgment
This work was supported by a Korea Research Foundation Grant (KRF-2002-003-D00224).
References
1. T. Shanableh and M. Ghanbari, "Heterogeneous video transcoding to lower spatio-temporal resolutions and different encoding formats," IEEE Trans. Multimedia 2, 101–110 (June 2000).
2. B. Shen, I. K. Sethi, and B. Vasudev, "Adaptive motion vector resampling for compress video downscaling," IEEE Trans. Circuits Syst. Video Technol. 9, 929–936 (Sep. 1999).
3. J. Youn, M. T. Sun, and C. W. Lin, "Motion vector refinement for high performance transcoding," IEEE Trans. Multimedia 1, 30–40 (Mar. 1995).
4. C. Yim and M. A. Isnardi, "An efficient method for DCT-domain image resizing with mixed field/frame-mode macroblocks," IEEE Trans. Circuits Syst. Video Technol. 9, 696–700 (Aug. 1999).
5. J. Song and B. L. Yeo, "A fast algorithm for DCT-domain inverse motion compensation based on shared information in a macroblock," IEEE Trans. Circuits Syst. Video Technol. 10, 767–775 (Aug. 2000).
6. N. Merhav, "Multiplication-free approximation algorithms for compressed-domain linear operations on images," IEEE Trans. Image Process. 8, 247–254 (Feb. 1999).
7. S. Liu and A. C. Bovik, "Local bandwidth constrained fast inverse motion compensation for DCT-domain video transcoding," IEEE Trans. Circuits Syst. Video Technol. 12, 309–319 (May 2002).
8. M. Unser, A. Aldroubi, and M. Eden, "Enlargement or reduction of digital images with minimum loss of information," IEEE Trans. Image Process. 4, 247–258 (Mar. 1995).
9. W. K. Pratt, Digital Image Processing, John Wiley and Sons, New York (1991).
10. R. G. Keys, "Cubic convolution interpolation for digital image processing," IEEE Trans. Acoust., Speech, Signal Process. 29, 1153–1160 (Dec. 1981).
11. H. S. Hou and H. C. Andrews, "Cubic splines for image interpolation and digital filtering," IEEE Trans. Acoust., Speech, Signal Process. 26, 508–517 (1978).
12. M. Unser, A. Aldroubi, and M. Eden, "Fast B-spline transforms for continuous image representation and interpolation," IEEE Trans. Pattern Anal. Mach. Intell. 13, 277–285 (Mar. 1991).
13. S. K. Park and R. A. Schowengerdt, "Image reconstruction by parametric cubic convolution," Comput. Vis. Graph. Image Process. 23, 258–272 (Sep. 1983).
14. G. Ramponi, "Warped distance for space-variant linear image interpolation," IEEE Trans. Image Process. 8, 629–639 (May 1999).
15. H. Sun, W. Kwok, and J. W. Zdepski, "Architectures for MPEG compressed bitstream scaling," IEEE Trans. Circuits Syst. Video Technol. 6, 191–199 (Apr. 1999).
16. G. D. L. Reyes, A. R. Reibman, S. F. Chang, and J. C. I. Chuang, "Error resilient transcoding for video over wireless channels," IEEE
laced to progressive conversion," IEEE Trans. Consum. Electron. 38(3), 162–167 (Aug. 1992).
20. M. H. Lee, J. H. Kim, J. S. Lee, K. K. Ryu, and D. I. Song, "A new algorithm for interlaced to progressive scan conversion based on directional corrections and its IC design," IEEE Trans. Consum. Electron. 40(2), 119–129 (May 1994).
21. R. Li, N. K. Chung, K. T. Mo, D. M. Fisher, and V. Wong, "A flexible display module for DVD and set-up box applications," IEEE Trans. Consum. Electron. 43(3), 496–503 (Aug. 1997).
22. S. S. Rifman, "Digital rectification of ERTS multispectral imagery," Proc. Symp. Significant Results Obtained from ERTS-1 (NASA SP-327) I (Sec. B), 1131–1142 (1973).
23. R. Bernstein, "Digital image processing of Earth observation sensor data," IBM J. Res. Dev. 20, 40–57 (1976).
24. M. Karczewicz and M. Gabbouj, "Robust B-spline image modeling with application to image processing," IEEE Trans. Image Process. 7, 912–917 (June 1998).
25. H. J. Kim and C. C. Li, "Lossless and lossy image compression using biorthogonal wavelet transforms with multiplierless operations," IEEE Trans. Circuits Syst. II: Analog Digit. Signal Process. 45, 1113–1118 (Aug. 1998).
Jong-Ki Han received the BS, MS, and PhD degrees in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea, in 1992, 1994, and 1999, respectively. From 1999 to 2001 he was a member of the technical staff at Corporate Research and Development Center, Samsung Electronics Company, Suwon, South Korea. He is currently an assistant professor at the Department of Information and Communications Engineering, Sejong University, Seoul, Korea. His research interests include image and audio signal compression, transcoding, and very large scale integration (VLSI) signal processing.
Hyung-Myung Kim received the BS degree in electronics engineering from Seoul National University, Korea, in 1974, and the MS and PhD degrees in electrical engineering from the University of Pittsburgh, Pennsylvania, in 1982 and 1985, respectively. He is now a professor in the Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea. His research interests include digital signal/image processing, digital transmission of voice, data and image, and multidimensional system theory. He was the Treasurer of the IEEE Taejon Section in 1992. He has been an editorial board member of Multidimensional Systems and Signal Processing since 1990.