Modified cubic convolution scaler for multiformat conversion in a transcoder


Communications Engineering Room 911, Chungmu-Building 98 Kunja-Dong, Kwangjin-Ku Seoul, South Korea

Hyung-Myung Kim

Korea Advanced Institute of Science and Technology (KAIST)

Department of Electrical Engineering 373-1 Kusong-Dong, Yusong-Gu Taejon 305-701 Korea

pictures in the transcoding system, which transforms the bitstream compressed at a bit rate, such as the HD bitstream, into another bit rate stream, for example, the SD bitstream. The transcoding is performed in spatial domain. In many applications such as the transcoder, the resolution conversion is very important for changing the image size while the scaled image maintains high quality. The scaling process consists of two steps: fitting the original data with a continuous function, and resampling the function on a new sampling grid. We focus on the modification of the scaler kernel according to the relation between formats of the original and the resized image. In the modification, various formats defined in MPEG standards are considered. We show experimental results that demonstrate the effectiveness of the proposed interpolation method. The algorithm exhibits significant improvement in the minimization of information loss when compared with the conventional interpolation algorithms. ©2004 Society of Photo-Optical Instrumentation Engineers.

[DOI: 10.1117/1.1758732]

Subject terms: scaler; cubic convolution interpolation; transcoder.

Paper 030478 received Sep. 29, 2003; revised manuscript received Dec. 19, 2003; accepted for publication Feb. 12, 2004.

1 Introduction

Transcoding is an important technique for video communications over heterogeneous networks whose bandwidths differ. In a video-on-demand server, video is pre-encoded and stored without regard to the characteristics of the channel. This results in a great lack of flexibility when these pre-encoded video streams are transmitted through heterogeneous networks. An efficient way to overcome this problem is to use a transcoder that matches a pre-encoded MPEG bitstream to the transmission channel. For example, the transcoder receives as input a pre-coded MPEG-2 bitstream with a high bit rate and produces another bitstream with a lower bit rate that meets the new bandwidth constraints.

There are two approaches to transcoding: spatial domain and frequency (DCT) domain transcoding. In spatial domain processing, the stored bitstream is first decompressed, the downscaling operation is performed in the pixel domain, and the resulting data are then recompressed at a target bit rate.¹⁻³ Shanableh and Ghanbari¹ described a transcoder architecture that can speed up the processing time, in which spatial and temporal resolutions are changed. In a DCT domain transcoder, the schemes work directly in the DCT domain.⁴⁻⁷ In Ref. 4, Yim and Isnardi described an efficient method for DCT domain image resizing, where the scaling ratio is 2:1. In Ref. 6, the author proposed a method for approximating linear operations on images in the compressed domain using multiplication-free schemes, where the system includes a downsampling module whose ratio is 2:1. As this research shows, resolution conversion is extremely important in a transcoding system if the transcoded image is to maintain high quality. While the transcoders proposed in the literature resize the spatial resolution by an integer ratio, for example, 1/2 or 1/4, we consider a transcoder system that changes the resolution by an arbitrary ratio.

The resolution conversion module is called a scaler.

Standard approaches to scaling fit the original discrete data with a continuous model and resample this function on a new sampling grid.⁸ According to sampling theory, the original signal can be reconstructed perfectly from its samples by convolution with the sinc function. However, because the sinc kernel decays too slowly at infinity, realizing the function physically is difficult. Thus, different approximations, such as the bilinear,⁹ bicubic,¹⁰ cubic spline,¹¹ and high-order (>3) B-spline⁸,¹² operators, have been proposed. To obtain better quality in the scaled image, Park and Schowengerdt¹³ adapted the parameters of cubic convolution interpolation¹⁰ to the frequency content of the specific image to be processed, and Ramponi¹⁴ proposed a space-variant technique by introducing the concept of the warped distance.

We design the transcoding system around resolution conversion, where the cubic convolution scaler¹⁰ is applied to resize the image by an arbitrary ratio. The generic video coding algorithm defined in the MPEG-2 standard is known to be flexible enough to support various image sizes and formats. To deal with these formats, in this work a modified cubic convolution scaler is proposed by considering the relation between sampling positions of


signal. The modification is performed to correct the phase of the interpolation kernel. The two ideas proposed in this work are as follows. First, while conventional transcoder systems⁴,⁶ change the image size to half or quarter size, we resize the image by an arbitrary ratio, for example, from 1920×1080i to 704×480i. Resizing by an arbitrary ratio is more realistic than the conventional schemes. Second, to support the various transcoding cases described later, modified resampling positions are proposed for each case: from a progressive picture to a progressive one, from an interlaced one to an interlaced one, from an interlaced one to a progressive one, and from a progressive one to an interlaced one.

This work is organized as follows. Section 2 describes the transcoding system and the various image formats used in the system. The scaling schemes required in the transcoder are explained in Sec. 3, where the background information about the interpolation technique is described. In Sec. 4, we propose a modified cubic convolution interpolation. Computer simulation results of the proposed algorithm are presented in Sec. 5. A brief conclusion is given in Sec. 6.

2 Transcoder System

For digital storage media and the transmission of an MPEG-2 bitstream over a band-limited channel, bit-rate reduction is an inevitable process for storage efficiency and transmission quality, respectively.²,³,¹⁵,¹⁶ Such a bit-rate reduction can be obtained by the transcoder whose block diagram is depicted in Fig. 1. The transcoder reduces the bit rate by adjusting coding parameters, such as the quantizer step size and the spatiotemporal resolution. Note that the scaler module is an up/down-sampler. An input video bitstream is decoded by a variable length decoder (VLD), and the result is inverse quantized by an inverse quantizer (IQ). The dequantized data are transformed into the pixel domain by the inverse discrete cosine transform (IDCT), and the data are then compensated by a motion compensator (MC). The reconstructed picture is scaled to a different size by the scaler. The resized images are re-encoded into another bitstream with a different bit rate, where ME/MC, VLC, and Q represent motion estimation/motion compensation, variable length coder, and quantizer, respectively. The motion vector/mode data obtained from the decoder stage are reused in the encoder. The reuse of incoming motion vectors has been widely accepted in many transcoder architectures.³ In this work, the motion vectors transferred from the decoder are scaled according to the ratio of the reduction in spatial resolution. The resized motion vector is then refined over a narrow search range, for example, ±2 pixels.

We have considered a system that can deal with various video streams, including MPEG-2 MP@HL and SDTV at MP@ML. It supports conversion among the formats defined in the MPEG standard.¹⁷ The formats include 1920×1080i, 1280×720p, 704×480p, 704×480i, and 640×480p. The available formats are shown in Fig. 2, where i and p represent the interlaced and progressive formats, respectively. We note that the resolution of the images is changed by a noninteger factor in the horizontal and vertical directions, so the scaler should be designed to accomplish a noninteger rate change. In contrast, conventional interlaced-to-progressive conversions (IPCs) resize the image only in the vertical direction by a factor of 2,¹⁸⁻²⁰ or in the horizontal and vertical directions by an integer ratio.²¹

The key technology we focus on is the scaler: we have designed a resolution scaler modified to support all available formats. Figures 3, 4, and 5 show the processes that change from 1920×1080i to 720×480i, from 1920×1080i to 1280×720p, and from 1280×720p to 720×480i, respectively. We note that when the

Fig. 1 The transcoder system with up/down scaler.

Fig. 2 The various image formats used in an MPEG-2 standard.


original image is interlaced, a splitter is used to produce an even field and an odd one. In the case where the reconstructed image is interlaced, a merger is applied to make a frame from two field pictures. In Fig. 4, the scaling system changes the resolution from 1920×1080i to 1280×720p, where the image size is reduced in the horizontal direction but enlarged in the vertical direction. Thus, the up/down scaler should be applied separately to the horizontal and vertical directions. In Fig. 5, two p frames are changed into an i frame, while an i frame is resized into two p frames in Fig. 4.
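The splitter and merger described above can be sketched as follows. This is a minimal illustration, assuming a frame is stored as a list of lines with an even number of lines, and that the even field holds lines 0, 2, 4, ... of the frame; the function names `split_fields` and `merge_fields` are hypothetical and do not appear in the original system.

```python
def split_fields(frame):
    """Splitter: even field = frame lines 0, 2, 4, ...; odd field = lines 1, 3, 5, ..."""
    return frame[0::2], frame[1::2]


def merge_fields(even_field, odd_field):
    """Merger: interleave two fields of equal height back into one frame."""
    frame = []
    for even_line, odd_line in zip(even_field, odd_field):
        frame.append(even_line)
        frame.append(odd_line)
    return frame


# A toy frame with 4 lines of 2 pixels each (values are arbitrary).
frame = [[10, 11], [20, 21], [30, 31], [40, 41]]
even, odd = split_fields(frame)
restored = merge_fields(even, odd)
```

Splitting followed by merging returns the original frame, so any change in the output comes only from the scaling applied to the fields in between.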

3 Resolution Scaler

In this section, we investigate the design of the resolution scaler. The scaling procedure fits the original discrete data with a continuous function, and then resamples this function at new sampling positions. For equally spaced sampled data $f(x_k)$, many interpolation functions can be written in the form

$$\tilde{f}(x) = \sum_{k=-\infty}^{\infty} c_k\,\beta(x - x_k), \tag{1}$$

where $\tilde{f}(x)$ is the corresponding interpolation function and $\beta(x)$ is the interpolation kernel; $x$ and $x_k$ represent continuous and discrete values, respectively. Among the interpolation functions that can be characterized in this manner are cubic splines and linear interpolating functions. In Eq. (1), the $x_k$ are the interpolation nodes, and the $c_k$ are parameters that depend on the sampled data $f(x_k)$. The interpolation kernel $\beta(x)$ converts the discrete data $f(x_k)$ into a continuous function $\tilde{f}(x)$ by an operation similar to convolution.

From the classical Shannon sampling theorem, if $f(x)$ is bandlimited to $(-\pi, +\pi)$, then

$$\tilde{f}(x) = \sum_{k=-\infty}^{\infty} f(x_k)\,\mathrm{sinc}(x - x_k), \tag{2}$$

where

$$\mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}, \tag{3}$$

i.e., in Eq. (1), $c_k$ and $\beta(x)$ are replaced with $f(x_k)$ and $\mathrm{sinc}(x)$, respectively. For numerical computations, the ideal interpolation formula in Eq. (2) is not practical because of the slow decay of the interpolation kernel $\mathrm{sinc}(x)$. An attractive alternative is reconstruction by polynomial spline interpolation.

The most practical approach is to estimate the value of each unknown pixel using a small set of its nearest neighbors. For example, in methods based on spline functions, the known data samples affect the value to be determined according to an inverse function of the distance.

Let $f(x_k)$ be the available data, and $\tilde{f}(x)$ be the value to be interpolated. We suppose that its nearest available neighbors are located at coordinates $x_k$ and $x_{k+1}$, and that the spacing of the sampling grid is one for these data. We define the distances between $x$, $x_k$, and $x_{k+1}$ as follows:

$$s = x - x_k, \tag{4}$$

$$1 - s = x_{k+1} - x, \tag{5}$$

where $0 \le s \le 1$ and $x_k \le x \le x_{k+1}$. A simple algorithm, one of many that can be used to obtain the interpolated pixel $\tilde{f}(x)$, is

$$\tilde{f}(x) = (1 - s)\,f(x_k) + s\,f(x_{k+1}). \tag{6}$$
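As a minimal sketch of Eq. (6), assuming unit sample spacing with $x_k = k$ and a position $x$ inside the sampled range, linear interpolation can be written as below. The function name `linear_interpolate` is illustrative only.

```python
import math


def linear_interpolate(samples, x):
    """Evaluate Eq. (6) at a continuous position x on a unit-spaced grid.

    Assumes 0 <= x <= len(samples) - 1 and at least two samples.
    """
    # Left neighbour x_k; clamped so that x_{k+1} exists at the right border.
    k = min(int(math.floor(x)), len(samples) - 2)
    s = x - k  # Eq. (4): distance from the left neighbour
    return (1.0 - s) * samples[k] + s * samples[k + 1]


value = linear_interpolate([0.0, 10.0, 20.0], 0.25)
```

At `x = 0.25` the result lies a quarter of the way between the first two samples, as the weights $(1 - s)$ and $s$ require.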

Fig. 3 The resizing process from 1920×1080i to 720×480i.

Fig. 4 The resizing process from 1920×1080i to 1280×720p.


Another method, more complicated but significantly more effective, is the bicubic one.¹¹ The problem with this approach is that the slope discontinuity at the ends of the waveform leads to amplitude ripples in a reconstructed function. This problem can be eliminated by generating a cubic convolution function,¹⁰,²²,²³ which forces the slope at the ends of the interpolation to be a fixed value. The cubic convolution interpolation function can be expressed in the following general form:

$$\beta(x) = \begin{cases} (\alpha + 2)|x|^3 - (\alpha + 3)|x|^2 + 1, & 0 \le |x| \le 1, \\ \alpha|x|^3 - 5\alpha|x|^2 + 8\alpha|x| - 4\alpha, & 1 \le |x| \le 2. \end{cases} \tag{7}$$

Rifman²² and Bernstein²³ set $\alpha = -1$, which causes $\beta(x)$ to have the same slope, $-1$, at $x = 1$ as the $\mathrm{sinc}(x)$ function. Keys¹⁰ proposed setting $\alpha = -1/2$, which provides an interpolation function that approximates the original unsampled image to as high a degree as possible in the sense of a power series expansion. Since $\beta(x)$ in Eq. (7) is zero except in the interval $(-2, 2)$, substituting Eqs. (4) and (7) into Eq. (1) gives

$$\begin{aligned} \tilde{f}(x) ={}& f(x_{k-1})\left(\alpha s^3 - 2\alpha s^2 + \alpha s\right) + f(x_k)\left[(\alpha + 2)s^3 - (3 + \alpha)s^2 + 1\right] \\ &+ f(x_{k+1})\left[-(\alpha + 2)s^3 + (2\alpha + 3)s^2 - \alpha s\right] + f(x_{k+2})\left(-\alpha s^3 + \alpha s^2\right), \end{aligned} \tag{8}$$

where $x_k \le x \le x_{k+1}$, $s = x - x_k$, and $0 \le s \le 1$. The interpolation functions coincide with the sampled data at the interpolation nodes, i.e., $\tilde{f}(x_k) = f(x_k)$. Moreover, they can be used for interpolating by any factor (an integer, rational, or irrational number). With $\alpha = -1/2$, as set by Keys,¹⁰ Eq. (8) becomes

$$\begin{aligned} \tilde{f}(x) ={}& f(x_{k-1})\left(-s^3 + 2s^2 - s\right)/2 + f(x_k)\left(3s^3 - 5s^2 + 2\right)/2 \\ &+ f(x_{k+1})\left(-3s^3 + 4s^2 + s\right)/2 + f(x_{k+2})\left(s^3 - s^2\right)/2. \end{aligned} \tag{9}$$

The factor $\alpha$ in Eq. (7) can be used as a tuning parameter to obtain the best visual interpolation.¹³ Park and Schowengerdt¹³ adapted the parameter $\alpha$ of the interpolation algorithm to the frequency content of the specific image to be processed.

In the proposed system, we focus on the cubic convolution scaler of Eq. (9), since it outperforms a bilinear interpolator and is simpler than a B spline.¹⁰ In the next section, a modified version of Eq. (9) is proposed to resize images of the various formats.
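The relation between the kernel of Eq. (7) and the four tap weights of Eq. (9) can be checked numerically. The sketch below is an illustration only: `cubic_kernel`, `keys_weights`, and `kernel_weights` are hypothetical names, and the kernel is taken to be zero outside $|x| \le 2$ as stated in the text.

```python
def cubic_kernel(x, alpha=-0.5):
    """Cubic convolution kernel of Eq. (7); zero outside |x| <= 2."""
    ax = abs(x)
    if ax <= 1.0:
        return (alpha + 2.0) * ax**3 - (alpha + 3.0) * ax**2 + 1.0
    if ax <= 2.0:
        return alpha * ax**3 - 5.0 * alpha * ax**2 + 8.0 * alpha * ax - 4.0 * alpha
    return 0.0


def keys_weights(s):
    """Weights of f(x_{k-1}), f(x_k), f(x_{k+1}), f(x_{k+2}) in Eq. (9)."""
    return (
        (-s**3 + 2.0 * s**2 - s) / 2.0,
        (3.0 * s**3 - 5.0 * s**2 + 2.0) / 2.0,
        (-3.0 * s**3 + 4.0 * s**2 + s) / 2.0,
        (s**3 - s**2) / 2.0,
    )


def kernel_weights(s, alpha=-0.5):
    """Same weights from Eq. (1): sample the kernel at x - x_{k+j}, j = -1..2."""
    return tuple(cubic_kernel(s - j, alpha) for j in (-1, 0, 1, 2))


# Compare both forms at a few phases s in [0, 1].
max_gap = max(
    abs(a - b)
    for s in (0.0, 0.25, 0.5, 0.9, 1.0)
    for a, b in zip(keys_weights(s), kernel_weights(s))
)
```

The two forms agree to rounding error, and the four weights sum to one for every phase, so a constant signal is reproduced exactly.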

4 Modified Cubic Convolution Scaler

Figure 6 shows the framework of the scaling system, in which $f(x_k)$ represents the original data. The procedure consists of two phases: interpolation to make a continuous function $\tilde{f}(x)$, and resampling of this function on a new grid $y_n$. The scaling converts the discrete data $f(x_k)$ into the scaled data $\tilde{f}(y_n)$, which have a different resolution. When $\delta > 1$, the scaled data $\tilde{f}(y_n)$ are an enlarged version of the original data $f(x_k)$, where

$$\delta = \frac{N}{M}, \tag{10}$$

and $M$ and $N$ are the sizes of the original and the scaled signal, respectively.

The interpolated continuous function $\tilde{f}(x)$ is calculated by Eq. (9). Scaling is a procedure that resamples the continuous function $\tilde{f}(x)$ with scaling factor $\delta$ as

$$\tilde{f}(y_n) = \tilde{f}(x)\Big|_{x = y_n = n\cdot\frac{1}{\delta}}, \quad 0 \le n < N. \tag{11}$$

When $x_k \le y_n \le x_{k+1}$ and $s = y_n - x_k$, Eq. (11) becomes

$$\tilde{f}(y_n) = \tilde{f}(x)\Big|_{s = y_n - x_k}, \quad 0 \le n < N,\ 0 \le k < M. \tag{12}$$
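The resampling positions of Eq. (11) and the phases of Eq. (12) can be tabulated with a few lines of code. This is a sketch under the assumption that the input nodes sit at $x_k = k$; the function name `resampling_phases` is hypothetical.

```python
import math


def resampling_phases(m_in, n_out):
    """Return (k, s) for each output sample n, following Eqs. (10)-(12).

    m_in is the original size M and n_out is the scaled size N.
    """
    delta = n_out / m_in                 # Eq. (10): scaling factor
    phases = []
    for n in range(n_out):
        y_n = n / delta                  # Eq. (11): resampling position
        k = int(math.floor(y_n))         # left node x_k with x_k <= y_n
        phases.append((k, y_n - k))      # Eq. (12): phase s = y_n - x_k
    return phases


# Enlarging 4 samples to 8 (delta = 2) alternates between s = 0 and s = 0.5.
phases = resampling_phases(4, 8)
```

For a noninteger ratio the phase $s$ takes many different values, which is why the kernel weights of Eq. (9) must be recomputed for each output sample.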

Applying the scheme to multidimensional data, such as the gray levels of an image, is straightforward. The scaling procedure described in Fig. 6 can be applied along each axis separately (first along the rows and then along the

Fig. 5 The resizing process from 1280×720p to 720×480i.

Fig. 6 The block diagram of the scaling system.


columns). In the following sections, the scaling schemes, which are modified according to the format of the processed images, are explained.

4.1 Scaling from a Progressive Image

When the scaler is used to resize a progressive image into a progressive one with a different resolution, the procedure is as depicted in Fig. 7, where $W$ and $H$ represent the width and height of the original image, and $\tilde{W}$ and $\tilde{H}$ those of the resized image. In the horizontal direction, the resampling is performed with scaling factor $\delta_h = \tilde{W}/W$ as

$$\tilde{f}_h(y_n) = \tilde{f}_h(x)\Big|_{x = y_n = n\cdot\frac{1}{\delta_h} = n\cdot\frac{W}{\tilde{W}}}, \quad n = 0, 1, 2, \ldots, \tilde{W} - 1, \tag{13}$$

$$\begin{aligned} \tilde{f}_h(x) ={}& f_h(x_{k-1})\left(-s^3 + 2s^2 - s\right)/2 + f_h(x_k)\left(3s^3 - 5s^2 + 2\right)/2 \\ &+ f_h(x_{k+1})\left(-3s^3 + 4s^2 + s\right)/2 + f_h(x_{k+2})\left(s^3 - s^2\right)/2, \end{aligned} \tag{14}$$

where $f_h(x_k)$ and $\tilde{f}_h(y_n)$ are the given discrete data and the scaled data, respectively, in the horizontal direction of the image, and $y_n$ is the resampling position.

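On the other hand, in the vertical direction, the resizing is executed with scaling factor $\delta_v = \tilde{H}/H$ as

$$\tilde{f}_v(y_n) = \tilde{f}_v(x)\Big|_{x = y_n = n\cdot\frac{1}{\delta_v} = n\cdot\frac{H}{\tilde{H}}}, \quad n = 0, 1, 2, \ldots, \tilde{H} - 1, \tag{15}$$

$$\begin{aligned} \tilde{f}_v(x) ={}& f_v(x_{k-1})\left(-s^3 + 2s^2 - s\right)/2 + f_v(x_k)\left(3s^3 - 5s^2 + 2\right)/2 \\ &+ f_v(x_{k+1})\left(-3s^3 + 4s^2 + s\right)/2 + f_v(x_{k+2})\left(s^3 - s^2\right)/2, \end{aligned} \tag{16}$$

where $f_v(x_k)$, $\tilde{f}_v(y_n)$, and $y_n$ are the vertical-direction data of the original image, those of the resized image, and the resampling position, respectively.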
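The progressive-to-progressive case of Eqs. (13) through (16) can be sketched as a separable scaler that processes rows first and columns second. This is a minimal illustration, not the authors' implementation: the image is assumed to be a list of rows, samples beyond the image border are taken from the nearest valid sample (a border rule the text does not specify), and the names `keys_interp`, `scale_1d`, and `scale_progressive` are hypothetical.

```python
import math


def keys_interp(samples, x):
    """Eq. (9) at continuous position x; out-of-range taps are clamped (assumption)."""
    k = int(math.floor(x))
    s = x - k
    weights = ((-s**3 + 2 * s**2 - s) / 2, (3 * s**3 - 5 * s**2 + 2) / 2,
               (-3 * s**3 + 4 * s**2 + s) / 2, (s**3 - s**2) / 2)
    last = len(samples) - 1
    taps = [samples[min(max(k + j, 0), last)] for j in (-1, 0, 1, 2)]
    return sum(w * t for w, t in zip(weights, taps))


def scale_1d(samples, n_out):
    """Resample one line at y_n = n / delta, delta = n_out / len(samples)."""
    delta = n_out / len(samples)
    return [keys_interp(samples, n / delta) for n in range(n_out)]


def scale_progressive(image, out_w, out_h):
    """Scale rows (Eqs. 13-14), then columns (Eqs. 15-16)."""
    rows = [scale_1d(row, out_w) for row in image]
    cols = [scale_1d(list(col), out_h) for col in zip(*rows)]
    return [list(line) for line in zip(*cols)]


# Toy 3x4 image with a horizontal ramp, scaled to 2 lines of 6 pixels.
image = [[float(c) for c in range(4)] for _ in range(3)]
scaled = scale_progressive(image, 6, 2)
```

Because the two passes are independent, the horizontal and vertical scaling factors may differ, and one direction may shrink the image while the other enlarges it, as in Fig. 4.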

4.2 Scaling from an Interlaced Image

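When the scaler is applied to an interlaced image, the operations of Eqs. (13) and (15) are executed after the frame image $f(x_k)$ is split into two fields, the even field $f^e(x_k)$ and the odd field $f^o(x_k)$. In Fig. 8, $f_h^e(x_k)$, $f_v^e(x_k)$, $f_h^o(x_k)$, and $f_v^o(x_k)$ are the horizontal and vertical data of the even and odd fields, respectively. In the horizontal direction, the resampling is executed on the field data as

$$\tilde{f}_h^e(y_n) = \tilde{f}_h^e(x)\Big|_{x = y_n = n\cdot\frac{1}{\delta_h}}, \quad n = 0, 1, 2, \ldots, \tilde{W} - 1, \tag{17}$$

$$\tilde{f}_h^o(y_n) = \tilde{f}_h^o(x)\Big|_{x = y_n = n\cdot\frac{1}{\delta_h}}, \quad n = 0, 1, 2, \ldots, \tilde{W} - 1. \tag{18}$$

After the fields are resized, the resulting fields have to be merged as in Fig. 8. In the vertical direction, if we define the operations as

$$\tilde{f}_v^e(y_n) = \tilde{f}_v^e(x)\Big|_{x = y_n = n\cdot\frac{1}{\delta_v}}, \quad n = 0, 1, 2, \ldots, \frac{\tilde{H}}{2} - 1, \tag{19}$$

$$\tilde{f}_v^o(y_n) = \tilde{f}_v^o(x)\Big|_{x = y_n = n\cdot\frac{1}{\delta_v}}, \quad n = 0, 1, 2, \ldots, \frac{\tilde{H}}{2} - 1, \tag{20}$$

then the merged vertical data become

$$\tilde{f}_v(m) = \begin{cases} \tilde{f}_v^e\!\left(\dfrac{n}{\delta_v}\right),\ n = \left\lfloor \dfrac{m}{2} \right\rfloor, & \text{if } m \text{ is even}, \\[2ex] \tilde{f}_v^o\!\left(\dfrac{n}{\delta_v}\right),\ n = \left\lfloor \dfrac{m}{2} \right\rfloor, & \text{if } m \text{ is odd}. \end{cases} \tag{21}$$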

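In Eq. (21), the sampling positions $\{n/\delta_v,\ n = 0, 1, 2, \ldots, \tilde{H}/2 - 1\}$ for $\tilde{f}_v^e(n/\delta_v)$ correspond to $\{2n/\delta_v,\ n = 0, 1, 2, \ldots, \tilde{H}/2 - 1\}$ in the original frame data $f_v(x_k)$. The positions $\{n/\delta_v,\ n = 0, 1, 2, \ldots, \tilde{H}/2 - 1\}$ for $\tilde{f}_v^o(n/\delta_v)$ are mapped to $\{1 + 2n/\delta_v,\ n = 0, 1, 2, \ldots, \tilde{H}/2 - 1\}$ in the original frame $f_v(x_k)$. This means that the resampling positions in the frame $f_v(x_k)$ are

$$\left\{ \frac{0}{\delta_v},\ 1,\ \frac{2}{\delta_v},\ 1 + \frac{2}{\delta_v},\ \frac{4}{\delta_v},\ 1 + \frac{4}{\delta_v},\ \ldots,\ 1 + \frac{2}{\delta_v}\left(\frac{\tilde{H}}{2} - 1\right) \right\},$$

i.e., the positions are not equally spaced.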
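The irregular spacing can be made concrete with a small numerical example. The sketch below assumes hypothetical values $\tilde{H} = 6$ and $\delta_v = 0.5$ and lists, in frame coordinates, the positions sampled when both fields use the same grid $n/\delta_v$; the function name is illustrative.

```python
def conventional_frame_positions(h_out, delta_v):
    """Frame-axis positions sampled by Eqs. (19)-(21) without phase correction."""
    positions = []
    for n in range(h_out // 2):
        positions.append(2 * n / delta_v)        # even-field line n -> frame position
        positions.append(1 + 2 * n / delta_v)    # odd-field line n -> frame position
    return positions


positions = conventional_frame_positions(6, 0.5)
# Spacing between consecutive output lines, measured on the original frame axis.
gaps = [b - a for a, b in zip(positions, positions[1:])]
```

With these values the spacing alternates between 1 and 3 original lines instead of being the uniform $1/\delta_v = 2$ that a regular output grid would need.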

Thus, when the vertical data are scaled, the resampling positions should be modified to compensate for the irregularity:

$$\tilde{f}_v^e(y_n) = \tilde{f}_v^e(x)\Big|_{x = y_n = n\cdot\frac{1}{\delta_v}}, \quad n = 0, 1, 2, \ldots, \frac{\tilde{H}}{2} - 1, \tag{22}$$

Fig. 7 The scaling procedure used for resizing from a progressive image to a progressive one.

$$\tilde{f}_v^o(y_n) = \tilde{f}_v^o(x)\Big|_{x = y_n = \left(n\cdot\frac{1}{\delta_v} + d\right)}, \quad n = 0, 1, 2, \ldots, \frac{\tilde{H}}{2} - 1, \tag{23}$$

where $(n\cdot 1/\delta_v + d)$ is the shifted sampling grid.

From the viewpoint of the vertical axis of the original frame picture, the sampling positions of Eqs. (22) and (23) become $\{(2n\cdot 1/\delta_v)$ and $(1 + 2n\cdot 1/\delta_v + 2\cdot d),\ n = 0, 1, 2, \ldots, \tilde{H}/2 - 1\}$. Since the sampling positions $\{(2n\cdot 1/\delta_v)$ and $(1 + 2n\cdot 1/\delta_v + 2\cdot d)\}$ have to compose $\{r/\delta_v,\ r = 0, 1, 2, \ldots, \tilde{H} - 1\}$, i.e., the equally spaced points $\{0, 1/\delta_v, 2/\delta_v, 3/\delta_v, 4/\delta_v, \ldots, (\tilde{H} - 1)/\delta_v\}$ in the vertical direction of the frame picture, $d$ should be

$$d = \frac{1}{2}\left(\frac{1}{\delta_v} - 1\right). \tag{24}$$
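The effect of the shift in Eq. (24) can be verified numerically: with the corrected odd-field grid of Eq. (23), the merged lines fall on the uniform positions $r/\delta_v$. The sketch below uses hypothetical function names and an arbitrary test ratio.

```python
def odd_field_shift(delta_v):
    """Eq. (24): shift of the odd-field sampling grid, in field coordinates."""
    return 0.5 * (1.0 / delta_v - 1.0)


def proposed_frame_positions(h_out, delta_v):
    """Frame-axis positions sampled by Eqs. (22) and (23) after merging."""
    d = odd_field_shift(delta_v)
    positions = []
    for n in range(h_out // 2):
        positions.append(2 * (n / delta_v))            # even field, Eq. (22)
        positions.append(1 + 2 * (n / delta_v + d))    # odd field, Eq. (23)
    return positions


delta_v = 0.75                       # e.g. a 480-line frame reduced to 360 lines
positions = proposed_frame_positions(8, delta_v)
uniform = [r / delta_v for r in range(8)]
```

A field position $y$ maps to frame position $2y$ for the even field and $1 + 2y$ for the odd field, which is why the shift $d$ enters the frame coordinate as $2d$.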

Thus, in the vertical direction of the odd field, the sampling grid has to be shifted by the $d$ of Eq. (24). Note that the merged horizontal data are

Fig. 8 The scaling procedure used for resizing from an interlaced image to an interlaced one.

Fig. 9 The scaling procedure used for resizing from an interlaced image to two progressive images.

$$\tilde{f}_h(y_n) = \begin{cases} \tilde{f}_h^e(y_n), & \text{if } \tilde{f}_h(y_n) \text{ is in an even line of the merged frame}, \\ \tilde{f}_h^o(y_n), & \text{if } \tilde{f}_h(y_n) \text{ is in an odd line of the merged frame}. \end{cases} \tag{25}$$

4.3 Scaling from an Interlaced Image to Two Progressive Images

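When the scaler is used for resizing an interlaced frame into two progressive frames, the procedure is as depicted in Fig. 9, where the merging operation is not used. The scaling is applied to the even and odd fields as follows:

$$\tilde{f}_h^e(y_n) = \tilde{f}_h^e(x)\Big|_{x = y_n = n\cdot\frac{1}{\delta_h}}, \quad n = 0, 1, 2, \ldots, \tilde{W} - 1, \tag{26}$$

$$\tilde{f}_v^e(y_n) = \tilde{f}_v^e(x)\Big|_{x = y_n = n\cdot\frac{1}{2\delta_v}}, \quad n = 0, 1, 2, \ldots, \tilde{H} - 1, \tag{27}$$

$$\tilde{f}_h^o(y_n) = \tilde{f}_h^o(x)\Big|_{x = y_n = n\cdot\frac{1}{\delta_h}}, \quad n = 0, 1, 2, \ldots, \tilde{W} - 1, \tag{28}$$

$$\tilde{f}_v^o(y_n) = \tilde{f}_v^o(x)\Big|_{x = y_n = \left(n\cdot\frac{1}{2\delta_v} + d\right)}, \quad n = 0, 1, 2, \ldots, \tilde{H} - 1. \tag{29}$$

The sampling positions of Eqs. (27) and (29) in each field correspond to the positions $\{(n\cdot 1/\delta_v),\ n = 0, 1, 2, \ldots, \tilde{H} - 1\}$ and $\{(1 + n\cdot 1/\delta_v + 2\cdot d),\ n = 0, 1, 2, \ldots, \tilde{H} - 1\}$ from the viewpoint of the vertical axis of the original frame $f_v(x_k)$. Since the merging operation is not used in this case, both sets of positions, $\{(n\cdot 1/\delta_v)\}$ and $\{(1 + n\cdot 1/\delta_v + 2\cdot d)\}$, have to be $\{0, 1/\delta_v, 2/\delta_v, 3/\delta_v, 4/\delta_v, 5/\delta_v, \ldots, (\tilde{H} - 1)/\delta_v\}$. Thus, $d$ is set to

$$d = -\frac{1}{2}. \tag{30}$$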
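A short check of Eq. (30), assuming the same field-to-frame mapping as before (field position $y$ lies at frame position $2y$ in the even field and $1 + 2y$ in the odd field): with $d = -1/2$, both fields are sampled at the same frame positions $n/\delta_v$. The function name and the test ratio are hypothetical.

```python
def interlaced_to_progressive_positions(h_out, delta_v, d=-0.5):
    """Frame-axis positions of Eqs. (27) and (29) for the two output frames."""
    step = 1.0 / (2.0 * delta_v)                       # field-domain step
    even_field = [n * step for n in range(h_out)]      # Eq. (27)
    odd_field = [n * step + d for n in range(h_out)]   # Eq. (29)
    even_in_frame = [2.0 * y for y in even_field]
    odd_in_frame = [1.0 + 2.0 * y for y in odd_field]
    return even_in_frame, odd_in_frame


delta_v = 720 / 1080                  # vertical ratio of the 1080i to 720p case
even_pos, odd_pos = interlaced_to_progressive_positions(6, delta_v)
target = [n / delta_v for n in range(6)]
```

Note that the first odd-field position is $-1/2$ in field coordinates, so an implementation needs some border rule there; the text does not state which one is used.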

4.4 Scaling from Two Progressive Images to an Interlaced One

The scaling system resizing two progressive frames into an interlaced frame does not require the splitter, as depicted in Fig. 10. The operations used in this case are summarized in the following equations.

Fig. 10 The scaling procedure used for resizing two progressive images to an interlaced one.

Table 1 The computational complexity required to obtain a sample $\tilde{f}(y_n)$ with Eqs. (9), (11), (12), (22), and (23). P is the number of products, and A is the number of additions.

| Step | Conventional scheme with Eqs. (9), (11), (12) | Proposed scheme with Eqs. (9), (12), (22), (23) |
|---|---|---|
| Calculation of $y_n$ in Eqs. (11) or (23) | P = 1 | P = 1, A = 1 |
| Calculation of $s$ in Eq. (12) | A = 1 | A = 1 |
| Making the filter kernel in Eq. (9) | P = 3, A = 2 | P = 3, A = 2 |
| FIR filtering in Eq. (9) | P = 4, A = 3 | P = 4, A = 3 |
| Total | P = 8, A = 6 | P = 8, A = 7 |

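$$\tilde{f}_h^e(y_n) = \tilde{f}_h^e(x)\Big|_{x = y_n = n\cdot\frac{1}{\delta_h}}, \quad n = 0, 1, 2, \ldots, \tilde{W} - 1, \tag{31}$$

$$\tilde{f}_v^e(y_n) = \tilde{f}_v^e(x)\Big|_{x = y_n = 2n\cdot\frac{1}{\delta_v}}, \quad n = 0, 1, 2, \ldots, \frac{\tilde{H}}{2} - 1, \tag{32}$$

$$\tilde{f}_h^o(y_n) = \tilde{f}_h^o(x)\Big|_{x = y_n = n\cdot\frac{1}{\delta_h}}, \quad n = 0, 1, 2, \ldots, \tilde{W} - 1, \tag{33}$$

$$\tilde{f}_v^o(y_n) = \tilde{f}_v^o(x)\Big|_{x = y_n = \left(2n\cdot\frac{1}{\delta_v} + d\right)}, \quad n = 0, 1, 2, \ldots, \frac{\tilde{H}}{2} - 1. \tag{34}$$

The sampling positions $\{(2n\cdot 1/\delta_v)$ and $(2n\cdot 1/\delta_v + d),\ n = 0, 1, 2, \ldots, \tilde{H}/2 - 1\}$ of Eqs. (32) and (34) compose the sampling points $\{0, 1/\delta_v, 2/\delta_v, 3/\delta_v, 4/\delta_v, 5/\delta_v, \ldots, (\tilde{H} - 1)/\delta_v\}$ from the viewpoint of the vertical axis of the original progressive frames. Thus, $d$ should be

$$d = \frac{1}{\delta_v}. \tag{35}$$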
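Eq. (35) can be checked in the same way: interleaving the even-field positions of Eq. (32) with the shifted odd-field positions of Eq. (34) should give the uniform grid $r/\delta_v$. The sketch below uses a hypothetical function name and an arbitrary test ratio.

```python
def progressive_to_interlaced_positions(h_out, delta_v):
    """Interleaved vertical sampling positions of Eqs. (32) and (34)."""
    d = 1.0 / delta_v                                        # Eq. (35)
    merged = []
    for n in range(h_out // 2):
        merged.append(2 * n / delta_v)                       # even field, Eq. (32)
        merged.append(2 * n / delta_v + d)                   # odd field, Eq. (34)
    return merged


delta_v = 480 / 720                   # vertical ratio of the 720p to 480i case
merged = progressive_to_interlaced_positions(8, delta_v)
uniform = [r / delta_v for r in range(8)]
```

Here both fields are sampled directly from progressive frames, so the positions are already expressed on the frame axis and the shift is a full output line spacing.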

4.5 Computational Complexity

The computational burdens of the conventional and proposed scalers lie in the determination of the sampling position $y_n$, given by Eq. (11) or (23); in the calculation of the phase $s$ of Eq. (12); in the construction of the filter kernel described in Eq. (9); and in the FIR filtering with Eq. (9). To obtain a sample value $\tilde{f}(y_n)$, the calculation of $y_n$ requires one product in the conventional scheme with Eq. (11). In the proposed scheme, on the other hand, one product and one addition are required to obtain the $y_n$ of Eq. (23). Except for obtaining $y_n$, the complexities of the other steps (calculating $s$, making the filter kernel, and filtering) are the same in the conventional and proposed schemes. The complexities are summarized in Table 1. Assuming that the size of the scaled image is $\tilde{W} \times \tilde{H}$, the total complexity of scaling an image becomes

$$\tilde{W} \times \tilde{H} \times \{P + A\}. \tag{36}$$

This means that the proposed scheme requires $\tilde{W} \times \tilde{H}$ more additions than the conventional approach. Since a product ($P$) consumes much more CPU time than an addition ($A$), the increase in complexity is insignificant.

Fig. 11 The four consecutive images are scaled by the conventional cubic convolution scaler, where the ratio is 5.7. Four consecutive images are merged into one image. The resulting consecutive images are discordant vertically.

Fig. 12 The four consecutive images are scaled by the proposed cubic convolution scaler, where the ratio is 5.7. Four consecutive images are merged into one image. The resized consecutive images are regular vertically.

Table 2 Performance comparison of the cubic convolution scalers in terms of information loss, which is measured as PSNR (dB) and mean square error (MSE). As the measured PSNR increases, the information loss decreases.

| Test image | Scheme | 704×480i ⇒ (704/2)×(480/2)i ⇒ 704×480i | 704×480i ⇒ 490×350i ⇒ 704×480i |
|---|---|---|---|
| Mobile and calendar | Conventional scheme | PSNR = 19.77 dB, MSE = 684.19 | PSNR = 22.16 dB, MSE = 394.69 |
| Mobile and calendar | Proposed scheme | MSE = 559.40 | MSE = 282.89 |
| Flower garden | Conventional scheme | MSE = 513.55 | MSE = 269.73 |
| Flower garden | Proposed scheme | MSE = 440.04 | MSE = 221.72 |
| Susie | Conventional scheme | PSNR = 34.19 dB, MSE = 24.76 | PSNR = 37.24 dB, MSE = 12.28 |
| Susie | Proposed scheme | PSNR = 35.30 dB, MSE = 19.19 | PSNR = 38.35 dB, MSE = 9.50 |
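The totals of Table 1 and Eq. (36) can be tallied for a concrete output size. The sketch below is only bookkeeping under the per-sample counts of Table 1; the output size 704×480 is an example taken from the formats discussed in the text, and the names `OPS_PER_SAMPLE` and `total_ops` are hypothetical.

```python
# Per-sample operation counts from the "Total" row of Table 1.
OPS_PER_SAMPLE = {
    "conventional": {"P": 8, "A": 6},
    "proposed": {"P": 8, "A": 7},
}


def total_ops(width, height, scheme):
    """Eq. (36): per-sample counts multiplied by the number of output samples."""
    per_sample = OPS_PER_SAMPLE[scheme]
    return {name: width * height * count for name, count in per_sample.items()}


conventional = total_ops(704, 480, "conventional")
proposed = total_ops(704, 480, "proposed")
extra_additions = proposed["A"] - conventional["A"]   # one addition per output sample
```

The number of products is identical for the two schemes; only the addition count differs, by exactly one per output sample.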

Fig. 13 The PSNR comparison of the images resulting from the transcoding systems using the conventional and the proposed scaler, respectively. The test image is mobile and calendar, and the up/down scaler is a cubic convolution scaler.


5 Simulation Results

Computer simulations using real images were performed to evaluate the performance of the proposed algorithm. The test images are mobile and calendar, flower garden, and Susie. The first simulation is a quantitative test. To obtain an objective evaluation of the proposed scheme, we measured the information loss that results from consecutive scaling procedures. To evaluate the information loss, various criteria, including peak signal-to-noise ratio (PSNR), signal-to-noise ratio (SNR), and mean squared error (MSE), can be considered. Among these, PSNR and MSE have been used in several earlier papers.¹⁴,²⁴,²⁵ Thus, we choose PSNR and MSE as the criteria for evaluating information loss. PSNR is also the criterion used in the other computer simulations of this work, Figs. 13 through 16.

Fig. 14 The PSNR comparison of the images resulting from the transcoding systems using the conventional and the proposed scalers, respectively. The test image is flower garden, and the up/down scaler is a cubic convolution scaler.


The test images are low-pass filtered and decimated to half size (0.5 times) by using the conventional and proposed schemes, respectively. Then, interpolation by a factor $\delta = 2$ in the row and column directions is used to return to the original size with the same techniques. The PSNRs and MSEs are evaluated with respect to the original image in Table 2, where the mobile and calendar, flower garden, and Susie sequences are used as test images. As shown in this table, the PSNRs of the images interpolated by the proposed scheme are higher than those of the conventionally enlarged images; for Susie, the gain is 1.11 dB. We also show another test, where the test image is reduced to a 490×350i image and the scaling process is then used to return to the original size. The proposed scheme outperforms the conventional one in all of these cases, and the trend observed for mobile and calendar repeats in the simulations for flower garden and Susie. This is because the interpolation kernel of the proposed scheme is modified according to the relationship between the formats of the scaled and original images, while the conventional scaler does not take this relationship into account. The results imply that the proposed algorithm significantly reduces information loss compared with conventional interpolation, and that the proposed scaler is effective in resizing image resolution.
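The two criteria can be computed as sketched below. The text does not state the PSNR definition it uses; this sketch assumes the common form $10\log_{10}(\text{peak}^2/\text{MSE})$ with an 8-bit peak value of 255, and the function names `mse` and `psnr_from_mse` are illustrative.

```python
import math


def mse(original, restored):
    """Mean squared error between two equally long pixel sequences."""
    squared = [(a - b) ** 2 for a, b in zip(original, restored)]
    return sum(squared) / len(squared)


def psnr_from_mse(mse_value, peak=255.0):
    """PSNR in dB, assuming 10*log10(peak^2 / MSE) with 8-bit samples."""
    return 10.0 * math.log10(peak * peak / mse_value)


# Cross-check against one entry of Table 2 (Susie, conventional scheme).
psnr_susie = psnr_from_mse(24.76)
```

Under this assumption, the tabulated MSE of 24.76 for Susie corresponds to about 34.19 dB, which agrees with the PSNR listed in Table 2 for the same entry.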

As a qualitative evaluation of the proposed method, the results of the conventional scheme and the proposed method for an enlargement test are compared in Figs. 11 and 12, where the scaling ratio is 5.7 and the resulting images are progressive pictures. The figures show the fence and the nose region of the horse in the mobile-and-calendar image. Four consecutive pictures are merged into one image to compare the results. We added some horizontal lines to these figures to emphasize the difference: while the horizontal lines of the enlarged image produced by the conventional scaler are discordant vertically, the proposed scheme produces a resized image whose data are regular vertically.

To check the usefulness of the proposed scheme in the context of transcoding, the performance of the system depicted in Fig. 1 is evaluated. The video codec used in this simulation is MPEG-2 MP@ML. The original sequence is coded at $R_o$ (= 15 or 10 Mbps), where the size of the picture is 704×480i. The encoded bitstream is decoded, and the image sequences are then scaled to 480×336i pictures. The resized images are transcoded at $R_T$ (= 10 or 5 Mbps). This simulation is one of the processes that change an i frame into another i frame, as described in Fig. 3. To evaluate the image quality in terms of PSNR, the scaling process is used to return to the original image size, 704×480i. Figures 13 and 14 show the simulation results for mobile and calendar and for flower garden, respectively. They show that the proposed scheme produces 0.7 and 0.8 dB better PSNR on average, respectively, than the conventional approach.

To evaluate the performance of the transcoders whose scenarios are described in Figs. 4 and 5, we combine those scenarios into one: the interlaced pictures are transcoded into progressive ones as in Fig. 4, and the progressive pictures are then changed into interlaced ones as in Fig. 5. The original image sequence (mobile and calendar) is coded at the bit rate $R_0 = 15$ Mbps, where the size of the picture is 704×480i. The encoded bitstream is transcoded into a 480×360p bitstream, where $R_1 = 10$ Mbps. The transcoded images (480×360p) are then retranscoded into 704×480i pictures whose bit rate is $R_2 = 5$ Mbps. Figure 15 shows the performance of the composite transcoding system, which consists of both schemes in Figs. 4 and 5. The result indicates that the transcoder system using the proposed scheme produces 1.2 dB better PSNR on average than the conventional scheme.

Fig. 15 The PSNR comparison of the images resulting from the composite transcoding systems (Figs. 4 and 5) using the conventional and the proposed scalers, respectively. The test image is mobile and calendar, and the up/down scaler is a cubic convolution scaler.

To show that the proposed phase modification scheme can also be applied to a B-spline filter, the performance of the transcoder using a B-spline filter is shown in Fig. 16. The B-spline filter is used as the spatial resolution scaler, while the other simulation conditions are the same as in Fig. 13. The simulation results indicate that the proposed approach generates a bitstream whose quality is higher than that of the conventional scheme. This means that the proposed scheme is useful in a transcoding system that changes the spatial resolution as well as the bit rate of the image sequence.

Fig. 16 The PSNR comparison of the images resulting from the transcoding systems using the conventional and proposed scalers, respectively. The test image is mobile and calendar, and the up/down scaler is a B spline.


the resizing process. The test results show that the proposed scheme outperforms the conventional method. As the results show, the proposed algorithms enable a reliable scaling technique with an arbitrary scale factor.

Acknowledgment

This work was supported by a Korea Research Foundation Grant (KRF-2002-003-D00224).

References

1. T. Shanableh and M. Ghanbari, ‘‘Heterogeneous video transcoding to lower spatio-temporal resolutions and different encoding formats,’’

IEEE Trans. Multimedia 2, 101–110共June 2000兲.

2. B. Shen, I. K. Sethi, and B. Vasudev, ‘‘Adaptive motion vector resampling for compress video downscaling,’’ IEEE Trans. Circuits Syst.

Video Technol. 9, 929–936共Sep. 1999兲.

3. J. Youn, M. T. Sun, and C. W. Lin, ‘‘Motion vector refinement for high performance transcoding,’’ IEEE Trans. Multimedia 1, 30– 40 共Mar. 1995兲.

4. C. Yim and M. A. Isnardi, “An efficient method for DCT-domain image resizing with mixed field/frame-mode macroblocks,” IEEE Trans. Circuits Syst. Video Technol. 9, 696–700 (Aug. 1999).

5. J. Song and B. L. Yeo, “A fast algorithm for DCT-domain inverse motion compensation based on shared information in a macroblock,” IEEE Trans. Circuits Syst. Video Technol. 10, 767–775 (Aug. 2000).

6. N. Merhav, “Multiplication-free approximation algorithms for compressed-domain linear operations on images,” IEEE Trans. Image Process. 8, 247–254 (Feb. 1999).

7. S. Liu and A. C. Bovik, “Local bandwidth constrained fast inverse motion compensation for DCT-domain video transcoding,” IEEE Trans. Circuits Syst. Video Technol. 12, 309–319 (May 2002).

8. M. Unser, A. Aldroubi, and M. Eden, “Enlargement or reduction of digital images with minimum loss of information,” IEEE Trans. Image Process. 4, 247–258 (Mar. 1995).

9. W. K. Pratt, Digital Image Processing, John Wiley and Sons, New York (1991).

10. R. G. Keys, “Cubic convolution interpolation for digital image processing,” IEEE Trans. Acoust., Speech, Signal Process. 29, 1153–1160 (Dec. 1981).

11. H. S. Hou and H. C. Andrews, “Cubic splines for image interpolation and digital filtering,” IEEE Trans. Acoust., Speech, Signal Process. 26, 508–517 (1978).

12. M. Unser, A. Aldroubi, and M. Eden, “Fast B-spline transforms for continuous image representation and interpolation,” IEEE Trans. Pattern Anal. Mach. Intell. 13, 277–285 (Mar. 1991).

13. S. K. Park and R. A. Schowengerdt, “Image reconstruction by parametric cubic convolution,” Comput. Vis. Graph. Image Process. 23, 258–272 (Sep. 1983).

14. G. Ramponi, “Warped distance for space-variant linear image interpolation,” IEEE Trans. Image Process. 8, 629–639 (May 1999).

15. H. Sun, W. Kwok, and J. W. Zdepski, “Architectures for MPEG compressed bitstream scaling,” IEEE Trans. Circuits Syst. Video Technol. 6, 191–199 (Apr. 1996).

16. G. D. L. Reyes, A. R. Reibman, S. F. Chang, and J. C. I. Chuang, “Error resilient transcoding for video over wireless channels,” IEEE

laced to progressive conversion,” IEEE Trans. Consum. Electron. 38(3), 162–167 (Aug. 1992).

20. M. H. Lee, J. H. Kim, J. S. Lee, K. K. Ryu, and D. I. Song, “A new algorithm for interlaced to progressive scan conversion based on directional corrections and its IC design,” IEEE Trans. Consum. Electron. 40(2), 119–129 (May 1994).

21. R. Li, N. K. Chung, K. T. Mo, D. M. Fisher, and V. Wong, “A flexible display module for DVD and set-top box applications,” IEEE Trans. Consum. Electron. 43(3), 496–503 (Aug. 1997).

22. S. S. Rifman, “Digital rectification of ERTS multispectral imagery,” Proc. Symp. Significant Results Obtained from ERTS-1 (NASA SP-327) I (Sec. B), 1131–1142 (1973).

23. R. Bernstein, “Digital image processing of Earth observation sensor data,” IBM J. Res. Dev. 20, 40–57 (1976).

24. M. Karczewicz and M. Gabbouj, “Robust B-spline image modeling with application to image processing,” IEEE Trans. Image Process. 7, 912–917 (June 1998).

25. H. J. Kim and C. C. Li, “Lossless and lossy image compression using biorthogonal wavelet transforms with multiplierless operations,” IEEE Trans. Circuits Syst. II: Analog Digit. Signal Process. 45, 1113–1118 (Aug. 1998).

Jong-Ki Han received the BS, MS, and PhD degrees in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea, in 1992, 1994, and 1999, respectively. From 1999 to 2001 he was a member of the technical staff at Corporate Research and Development Center, Samsung Electronics Company, Suwon, South Korea. He is currently an assistant professor at the Department of Information and Communications Engineering, Sejong University, Seoul, Korea. His research interests include image and audio signal compression, transcoding, and very large scale integration (VLSI) signal processing.

Hyung-Myung Kim received the BS degree in electronics engineering from Seoul National University, Korea, in 1974, and the MS and PhD degrees in electrical engineering from the University of Pittsburgh, Pennsylvania, in 1982 and 1985, respectively. He is now a professor in the Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea. His research interests include digital signal/image processing, digital transmission of voice, data and image, and multidimensional system theory. He was the Treasurer of the IEEE Taejon Section in 1992. He has been an editorial board member of Multidimensional Systems and Signal Processing since 1990.
