Design of A Multimedia Bitstream ASIP for Multiple CABAC Standards


IEIE Transactions on Smart Processing and Computing, vol. 6, no. 4, August 2017

https://doi.org/10.5573/IEIESPC.2017.6.4.292


Seung-Hyun Choi and Seong-Won Lee

Department of Computer Engineering, Kwangwoon University / 20 Kwangwoon-ro, Nowon-gu, Seoul 01897, Korea {ryvius00, swlee}@kw.ac.kr

* Corresponding Author: Seong-Won Lee

Received June 30, 2017; Accepted July 14, 2017; Published August 30, 2017 * Short Paper

Abstract: The complexity of image compression algorithms has increased in order to improve compression efficiency. One way to cope with high computational complexity is parallel processing. However, entropy coding, which is a lossless compression step, does not lend itself to parallel processing because of the dependency between consecutive symbols. This paper proposes a new application-specific instruction set processor (ASIP) platform that adds new context-adaptive binary arithmetic coding (CABAC) instructions to an existing platform in order to quickly process a variety of entropy coding schemes. The newly added instructions work without conflicting with the existing instructions of the platform, providing the flexibility to handle many coding standards at high processing speed. CABAC software was implemented for High Efficiency Video Coding (HEVC), and the performance of the proposed ASIP platform was verified with a field programmable gate array simulation.

Keywords: ASIP, Entropy coding, CABAC, HEVC, Bitstream

1. Introduction

Owing to the proliferation of various multimedia devices, image standards for multimedia devices are adopting various image compression algorithms to increase image compression efficiency. The result is smaller data size and higher image quality. However, the improved compression efficiency increases the number of complex computations; thus, high processing speeds are required of the platform that processes the image standard. To cope with this increase in computations, parallel processing platforms were introduced [1, 2].

However, recent entropy coding schemes such as context-adaptive variable length coding (CAVLC) [3] and context-adaptive binary arithmetic coding (CABAC) [4] cannot be processed in parallel because the current symbol depends on the immediately preceding symbol.

Therefore, various studies have been conducted to improve the entropy coding process on the same platform where other computation processing is performed.

Among the various research methods, the application-specific instruction set processor (ASIP) implementation method is used to implement entropy coding by adding hardware to execute a specific computation or operation and the new instructions related to it. The ASIP implementation method shows fast processing speed close to the custom hardware implementation method while minimizing the hardware cost added. It also has the flexibility to handle various multimedia standards, because those standards can be implemented as programs with the newly added instructions. However, it is necessary to decompose the operations at the atomic level (atomic instructions) and to carefully design and verify the new instructions in order to realize the minimum hardware cost without affecting the instructions of the existing processor.

A bitstream processor (BSP) [5, 6] is an ASIP that can quickly process variable-length coding (VLC). Its instructions for high-speed bitstream processing, based on table lookups, enable fast processing and adaptive coding of various image standards, which are supported by changing the assembly code. However, CABAC, the arithmetic coding used in recent image standards, is a mixed form of arithmetic coding (AC) and VLC that cannot be decoded by table lookups; thus, it cannot be processed using the existing BSP.


application specific instructions to handle various specifications, such as H.264, High Efficiency Video Coding (HEVC), and CABAC, by adding an arithmetic coding engine to the existing BSP. The newly added AC instructions are compatible with various instructions in the existing BSP. The AC instructions also provide flexibility to cope with various standards on the BSP platform and to improve the processing speed, which are advantages of the ASIP implementation method. We implemented the software for CABAC on the HEVC standard using the proposed ASIP instruction, and verified the operation of the instructions through a field programmable gate array simulation.

2. Related Work

Arithmetic coding [7] is an entropy coding method that can represent multiple symbols as a single real number. Given the symbols and the probability distribution of each symbol, it provides a compression ratio closer to the optimum than other VLC schemes. For image standards such as H.264 [8] and HEVC [9, 10], the CABAC algorithm is used because it determines the initial probability depending on the content to be compressed, and it updates the probability interval based on each new symbol.
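The idea of representing a whole symbol sequence as a single number can be illustrated with a minimal sketch. It assumes a fixed, hypothetical binary model (P(0) = 3/4) and exact rational arithmetic; the function names are illustrative, and practical coders such as CABAC use finite-precision integer arithmetic and adaptive probabilities instead.

```python
from fractions import Fraction

# Hypothetical fixed model for a binary source: P(0) = 3/4, P(1) = 1/4.
P0 = Fraction(3, 4)


def encode(bits):
    """Narrow [low, low + width) once per symbol; return a number inside it."""
    low, width = Fraction(0), Fraction(1)
    for b in bits:
        if b == 0:
            width *= P0              # keep the lower sub-interval
        else:
            low += width * P0        # move to the upper sub-interval
            width *= 1 - P0
    return low + width / 2           # any value in the final interval works


def decode(value, count):
    """Replay the same interval splits to recover `count` symbols."""
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(count):
        split = low + width * P0
        if value < split:
            out.append(0)
            width *= P0
        else:
            out.append(1)
            low = split
            width *= 1 - P0
    return out


message = [0, 0, 1, 0, 1, 0, 0, 0]
code = encode(message)               # one rational number for eight symbols
```

Because each decoding step needs the interval left by the previous step, the symbols must be recovered strictly in order, which is the serial dependency discussed in the Introduction.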

Fig. 1 shows the CABAC decoding process. Decoding recovers a syntax element by applying inverse binarization to the bin values output by the binary arithmetic decoding of the input bitstream. The binary arithmetic decoding process has two modes: regular mode and bypass mode. In regular mode, the probability is determined by the context modeler, and the bin is output by the regular decoding engine. The context model is then updated for the next probability interval. The bypass mode uses a predetermined probability interval without using the context modeler. If the syntax element has a binary value, the inverse binarization process is skipped.
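As an illustration of the inverse binarization step, the following sketch maps a sequence of decoded bins back to a syntax element value for two common binarization schemes, truncated unary and k-th order Exp-Golomb. The function names are hypothetical, and which scheme applies to which syntax element is defined by the coding standard, not by this sketch.

```python
def debinarize_truncated_unary(bins, c_max):
    """Count leading 1-bins; stop at the first 0 or once c_max is reached."""
    it = iter(bins)
    value = 0
    while value < c_max and next(it) == 1:
        value += 1
    return value


def debinarize_exp_golomb(bins, k=0):
    """k-th order Exp-Golomb: a prefix of 1-bins, a 0-bin, then a suffix."""
    it = iter(bins)
    value = 0
    while next(it) == 1:         # each prefix 1 adds 2**k and widens the suffix
        value += 1 << k
        k += 1
    suffix = 0
    for _ in range(k):           # suffix bins, most significant first
        suffix = (suffix << 1) | next(it)
    return value + suffix
```

For example, the bins 1, 1, 0, 0, 0 decode to the value 3 under 0-th order Exp-Golomb, while a one-bin flag needs no such mapping at all.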

A general-purpose processor has no instructions or operations suited to processing a continuously arriving CABAC bitstream, so it must execute many instructions at each step of the process. On the other hand, if all CABAC processes are implemented in hardware, the processing speed is high, but extensibility to other entropy coding techniques is limited, and the hardware cost grows with the number of supported image standards [11].

As another implementation method, if the existing ASIP for VLC is used as is, it is possible to perform fast processing by using functions implemented in the past, such as inverse binarization. Yet, in the conventional ASIP for VLC, the VLC instructions are designed to process a large number of bits. They resolve a VLC codeword in one step by using the bitstream as an address for the related operation. In CABAC, however, the context model and probability value used for the next bit change with the processing result of each bit of the bitstream, so the operation advances by only 1 bit at a time. Therefore, VLC instructions designed for processing a large number of bits at once are not suitable for CABAC, which operates one bit at a time.
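The table-lookup style of VLC decoding described here can be sketched as follows. The prefix code, the table layout, and the function names are hypothetical and chosen only for illustration; they are not the tables or instructions of the BSP.

```python
MAX_LEN = 3
CODES = {"A": "0", "B": "10", "C": "110", "D": "111"}  # hypothetical prefix code


def build_table(codes, max_len):
    """Map every max_len-bit address to (symbol, codeword length)."""
    table = [None] * (1 << max_len)
    for symbol, code in codes.items():
        pad = max_len - len(code)
        base = int(code, 2) << pad
        for fill in range(1 << pad):     # all addresses sharing this prefix
            table[base + fill] = (symbol, len(code))
    return table


def decode_vlc(bitstring, table, max_len):
    """Peek max_len bits, look them up, then consume only the codeword."""
    padded = bitstring + "0" * max_len   # so the last peek never runs short
    pos, out = 0, []
    while pos < len(bitstring):
        address = int(padded[pos:pos + max_len], 2)
        symbol, length = table[address]
        out.append(symbol)
        pos += length
    return out


TABLE = build_table(CODES, MAX_LEN)
```

One lookup here resolves up to three bits at once, whereas a CABAC decoder must finish one bin before the range, offset, and context needed for the next bin are known.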

In this paper, we extend the existing ASIP for VLC [5, 6] to construct an ASIP that supports CABAC. An AC engine for fast CABAC processing is attached in hardware to the existing ASIP for VLC, and fast CABAC processing is enabled by adding instructions that drive the AC engine efficiently. The CABAC instructions are designed to carry out several operational steps at once. The same functions could be realized on hardware that includes an AC engine by using only Reduced Instruction Set Computer (RISC) style instructions, each performing a minimal unit of the necessary function, without dedicated CABAC instructions. However, dozens of instructions would then be required for one CABAC process. Furthermore, because CABAC by nature processes bitstreams, efficient bitstream processing is possible by linking bitstream-related operations to the CABAC instructions through the existing bitstream processing hardware. This way, necessary bitstreams are processed by executing additional instructions or multiple instructions.

Fig. 1. CABAC decoding process.

3. The Proposed Scheme

In this paper, we propose a method that supports various compression formats, such as CAVLC and CABAC, by adding CABAC instructions that are not supported by the existing ASIP platform for VLC.

In our proposed method, we add an AC engine as new hardware, because the existing processor's computation unit cannot simultaneously generate and extract the necessary data, move the bitstream, and update the random variable. In the CABAC decoding process shown in Fig. 1, the conventional ASIP functions for VLC process the parts related to the context model, updating, and binarization, while the CABAC AC engine handles the regular and bypass decoding engines.

Fig. 2 shows the entire ASIP, including the binary arithmetic coding (BAC) engine. The BAC engine was added to the existing ASIP for VLC. The BAC engine operates as part of the syntax processor through the microprocessor core instructions. Run/level values decoded by the syntax processor are transferred to the RLC engine. The run/level engine generates the residual data by combining the delivered run/level values and the location information using a zig-zag table stored in memory. Thereafter, the generated data are transferred to the system connected to the ASIP.
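The run/level expansion performed by the run/level engine can be sketched as below. The 4x4 block size, the zig-zag table, and the convention that `run` counts the zeros preceding each level are assumptions made for illustration; the engine itself reads its table from memory.

```python
# Common 4x4 zig-zag scan, given as raster indices (assumed for illustration).
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def expand_run_level(pairs, scan=ZIGZAG_4X4, size=4):
    """Place each level after `run` zeros along the scan; return block rows."""
    block = [0] * (size * size)
    pos = 0
    for run, level in pairs:
        pos += run                   # skip `run` zero coefficients
        block[scan[pos]] = level     # the table gives the position in the block
        pos += 1
    return [block[r * size:(r + 1) * size] for r in range(size)]


residual = expand_run_level([(0, 7), (1, -2), (2, 1)])
```

Changing the scan order for another standard then only means replacing the table, not the expansion logic.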

Fig. 3 shows the operation of the BAC engine. The BAC engine operates according to BAC-specific instructions in the microprocessor core. Various tables that are required for CABAC are stored in memory, using memory-related instructions, through the assembly code. Therefore, if a change to the table is needed, then the data in the memory area are changed. These tables will be loaded as the BAC engine demands. The bitstream register serves to supply the bitstream for the BAC engine. It then simultaneously removes the bit used in the BAC engine and readjusts the next bitstream.

Two new instructions that use the BAC engine were designed: decoding regular (DCR), which operates in regular mode, and decoding bypass (DCB), which operates in bypass mode. The DCR instruction triggers the regular-mode operation and passes the context index value, which carries both the most probable symbol (MPS) value and the state index. The DCB instruction operates the bypass mode using a fixed probability value.

Fig. 2. ASIP architecture including the BAC engine.

Fig. 4 shows the operation in regular mode. For the DCR instruction in regular mode, the destination register (dr) is the register in which the output bit is stored, and the source register (sr) holds the context index value to be transferred. The context index value must therefore be computed and stored in the register before the DCR instruction is executed. Once the context index value of the current state is transferred through the DCR instruction, the BAC engine looks up the state transition table and the range table entries corresponding to the context index value. Thereafter, it receives the bitstream from the register for the AC process. When processing is complete, the context index value for the next state and the decoded bit are output, and the bitstream location is adjusted if bits of the bitstream were consumed. The updated context index value is written back to the sr, while the output bit is stored in the dr.
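A behavioral sketch of one regular-mode step, in the style of the H.264/HEVC arithmetic decoder, is given below. Everything specific in it is an assumption: the four-state tables are toy values rather than the standard's tables, the context index value is packed as (state << 1) | MPS, and the class and method names are hypothetical. The method returns both the decoded bit and the updated context index value, without modeling which register receives which.

```python
# Toy tables with 4 probability states; NOT the tables of any standard.
RANGE_LPS = [[128, 167, 197, 227],
             [90, 110, 130, 150],
             [50, 60, 70, 80],
             [20, 24, 28, 32]]
NEXT_MPS = [1, 2, 3, 3]
NEXT_LPS = [0, 0, 1, 2]


class BacEngine:
    def __init__(self, bits):
        self.bits, self.pos = list(bits), 0
        self.rng = 510                       # initial range
        self.offset = 0
        for _ in range(9):                   # initial offset: first 9 bits
            self.offset = (self.offset << 1) | self._read_bit()

    def _read_bit(self):
        bit = self.bits[self.pos] if self.pos < len(self.bits) else 0
        self.pos += 1
        return bit

    def dcr(self, ctx):
        """Return (decoded bit, updated context index value)."""
        state, mps = ctx >> 1, ctx & 1
        lps_range = RANGE_LPS[state][(self.rng >> 6) & 3]   # range table
        self.rng -= lps_range
        if self.offset >= self.rng:          # least probable symbol decoded
            bit = 1 - mps
            self.offset -= self.rng
            self.rng = lps_range
            if state == 0:
                mps = 1 - mps                # MPS flips at the weakest state
            state = NEXT_LPS[state]          # state transition table
        else:                                # most probable symbol decoded
            bit = mps
            state = NEXT_MPS[state]
        while self.rng < 256:                # renormalize, consuming bits
            self.rng <<= 1
            self.offset = (self.offset << 1) | self._read_bit()
        return bit, (state << 1) | mps


engine = BacEngine([1, 0, 0, 1, 0, 1, 1, 0, 0])
first = engine.dcr(0)
```

Written with ordinary instructions, each call involves two table lookups, a comparison, several updates, and a renormalization loop, which is the work a single DCR instruction is meant to absorb.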

Fig. 5 shows the operation of bypass mode. For the DCB instruction in bypass mode, the dr is a register in which the output bit is stored. When the bypass instruction is passed, the BAC engine runs with a fixed probability value. Therefore, the table stored in memory is not used. The bitstream is used in the register and transferred to the bitstream register after the location is adjusted. Decoding bits are stored in the dr.
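The bypass step is simpler and can be sketched as a pure function. It follows the bypass decoding rule of H.264/HEVC-style arithmetic decoders; the function names and the way the range and offset are passed around are assumptions for illustration only.

```python
def dcb(rng, offset, next_bit):
    """Shift in one bit and compare against the unchanged range.

    Returns (decoded bit, new offset); no context table is consulted.
    """
    offset = (offset << 1) | next_bit
    if offset >= rng:
        return 1, offset - rng
    return 0, offset


def decode_bypass_bins(rng, offset, bits):
    """Decode one bypass bin per input bit, threading the offset through."""
    out = []
    for b in bits:
        bin_val, offset = dcb(rng, offset, b)
        out.append(bin_val)
    return out, offset


bins, final_offset = decode_bypass_bins(510, 300, [1, 0, 0, 0])
```

Since the range never changes and no context is read or written, each bypass bin consumes exactly one bit of the bitstream.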

The bits output through the BAC engine are either stored in the register, or combined with the previously decoded bits. They are decoded into syntax elements through syntax processing as needed. These values are arranged to complete the residual data used for the various parameters and blocks in order to create the data for use after BSP processing.

Fig. 4. Regular mode process.


4. Performance Evaluation

To verify the performance of the proposed ASIP, the BAC engine and the new instructions (DCR and DCB) were implemented for testing on the existing VLC ASIP platform. Verilog language was used to ensure ease of simulation and cycle accuracy. The BAC engine was implemented as a behavioral model. The HEVC CABAC decoding process was simulated to verify the BAC engine and instruction operation. The assembly code for HEVC used in the platform operation was written, and the input bitstream was created, using HM 16.5 [12] as the HEVC reference software. The generated bitstream and assembly code were entered, and the parameter values and residual data decoded in the ASIP platform were compared with the values decoded in HM 16.5 to confirm correct operation.

It is difficult to directly compare the proposed platform with other microprocessor platforms that have different instruction groups. Therefore, to compare the processing of image pixels, the cycles per pixel (CPP) value was calculated for an indirect comparison.
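The CPP metric itself is a simple ratio, sketched below. The function names are illustrative, and the cycle count, run time, and clock frequency in the example are hypothetical values, not measurements from this paper; only the 704x576 image size comes from the test image.

```python
def cycles_from_time(seconds, clock_hz):
    """Convert a measured run time on a CPU into elapsed clock cycles."""
    return seconds * clock_hz


def cycles_per_pixel(cycles, width, height, frames=1):
    """Cycles spent per decoded pixel."""
    return cycles / (width * height * frames)


# Hypothetical example: 4,055,040 cycles for one 704x576 frame.
example_cpp = cycles_per_pixel(4_055_040, 704, 576)

# Hypothetical example: 1 ms of decoding on a 3.4 GHz core, same frame size.
example_cpu_cpp = cycles_per_pixel(cycles_from_time(0.001, 3.4e9), 704, 576)
```

Normalizing by pixels and expressing the cost in cycles rather than seconds removes the effect of different clock frequencies, which is what allows platforms with different instruction sets to be compared indirectly.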

Intel IA32- [13] and ARMv7-based [14] processors were used as representative microprocessor platforms for testing. An Intel i7-6700 processor was used for the Intel IA32 instructions. The i7 processor has four cores and supports hyper-threading (eight threads). The ARM Cortex-A53 MP4 processor was used for the ARMv7 instructions. The ARM Cortex-A53 MP4 processor has four cores. The proposed platform analyzes the simulation cycle based on the clock used for very-large-scale integration (VLSI) implementation.

To compare the same functions of the HM 16.5 software used in the test, computation time was measured over the function interval that corresponds to what is implemented on the proposed platform. To minimize the influence of other elements of the platform, the cycles per pixel value was averaged over 10 runs. The Intel and ARM processors were measured on Linux platforms to ensure optimum performance and accurate timing. HM 16.5 was compiled with GCC version 4.9 using optimization option O3 and tested with the release-mode executable. To simplify data comparison ...

Fig. 6. Test image (704x576).


simultaneously. In addition, the difference is about 4.67-7.69 times under ARM, which has the same RISC structure as the proposed platform. This is because the proposed platform operates with instructions specific to bitstream processing.

5. Conclusion

This paper proposes a method to support various entropy coding techniques, such as CAVLC and CABAC. To support various standards, a BAC engine and new instructions were added to the existing ASIP platform for VLC, and simulation was performed using Verilog. To confirm correct operation, the CABAC decoding results were compared with those of HM 16.5, the HEVC reference software.

Experimental results show that the proposed platform can process CABAC bitstreams 1.87 to 13.69 times faster than the other platforms. This demonstrates the efficient bitstream processing capability of the proposed CABAC instructions. Therefore, the platform can be used for fast processing of various image standards that support CABAC. We will also study additional instructions for more effective processing by analyzing experimental results for various patterns.

Acknowledgement

This research was partly supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Ministry of Science, ICT and Future Planning (MSIP) (R7119-16-1009), by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2017R1D1A1B03036361), and by the Research Grant of Kwangwoon University in 2017.

References

[1] C. Chi, M. Alvarez-Mesa, B. Juurlink, G. Clare, F. Henry, S. Pateux, and T. Schierl, “Parallel Scalability and Efficiency of HEVC Parallelization Approaches,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1827-1838, Dec. 2012.

Circuits and Systems for Video Technology, Vol. 13, Issue 7, pp. 620–636, Jul. 2008.

[5] S. Choi, et al., “Design of an application specific instruction set processor for a universal bitstream codec,” IEICE Electronics Express, Vol. 11(2014), No. 24, pp. 20141047, Dec. 2014.

[6] S. Choi, et al., “ASiPEC: An Application Specific Instruction-Set Processor for High Performance Entropy Coding,” Ubiquitous Computing Application and Wireless Sensor, pp. 67-75, Mar. 2015.

[7] I. H. Witten, et al., “Arithmetic coding for data compression,” Communications of the ACM, Vol. 30, Issue 6, pp. 520–540, June 1987.

[8] T. Wiegand, et al., “Overview of the H.264/AVC Video Coding Standard,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 560–576, Aug. 2003.

[9] High Efficiency Video Coding, Rec. ITU-T H.265 and ISO/IEC 23008-2, Jan. 2013.

[10] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the High Efficiency Video Coding (HEVC) standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1648-1667, Dec. 2012.

[11] D. Zhou, J. Zhou, W. Fei, and S. Goto, “Ultra-High-Throughput VLSI Architecture of H.265/HEVC CABAC Encoder for UHDTV Applications,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 25, No. 3, pp. 497–507, Mar. 2015.

[12] HM-16.5.

[13] Intel64 and IA-32 Architecture Optimization Reference Manual.

[14] ARM Architecture Reference Manual, ARMv7-A and ARMv7-R edition.


Seung-Hyun Choi received the B.S. and M.S. degrees in Computer Engineering from Kwangwoon University, Korea, in 2006 and 2008, respectively. His research interests include the VLSI/SOC architectures, multithreaded architectures, media processor architectures, power aware computing and multimedia signal processing.

Seong-Won Lee received the B.S. and M.S. degrees in Control & Instrumentation Engineering from Seoul National University, Korea, in 1988 and 1990, respectively. He obtained the Ph.D. degree in Electrical Engineering from the University of Southern California, USA, in 2003. From 1990 to 2004, he worked on VLSI/SOC design at Samsung Electronics Co., Ltd., Korea. Since March 2005, he has been a professor in the Department of Computer Engineering at Kwangwoon University, Seoul, Korea. His research interests include VLSI/SOC architectures, multithreaded architectures, media processor architectures, power-aware computing, and multimedia signal processing.

Fig. 7. Cycles per pixel results.
