Using Field Programmable Gate Array Hardware for the Performance Improvement of Ultrasonic Wave Propagation Imaging System

(1)

Using Field Programmable Gate Array Hardware for the Performance Improvement of Ultrasonic Wave Propagation Imaging System

Jaffry Syed Shan*, Syed Haider Abbas**, Donghoon Kang*** and Jungryul Lee**^,****^✝

Abstract Recently, wave propagation imaging based on laser scanning-generated elastic waves has been intensively used for nondestructive inspection. However, the proficiency of the conventional software based system reduces when the scan area is large since the processing time increases significantly due to unavoidable processor multitasking, where computing resources are shared with multiple processes. Hence, the field programmable gate array (FPGA) was introduced for a wave propagation imaging method in order to obtain extreme processing time reduction. An FPGA board was used for the design, implementing post-processing ultrasonic wave propagation imaging (UWPI). The results were compared with the conventional system and considerable improvement was observed, with at least 78% (scanning of 100 × 100 mm² with 0.5 mm interval) to 87.5% (scanning of 200 × 200 mm² with 0.5 mm interval) less processing time, strengthening the claim for the research. This new concept to implement FPGA technology into the UPI system will act as a break-through technology for full-scale automatic inspection.

Keywords: Field Programmable Gate Array, Laser Ultrasound, Ultrasonic Propagation Imaging

[Received: July 7, 2015, Revised: September 10, 2015, Accepted: October 12, 2015] *Hamdard University, Pakistan,

**Department of Aerospace Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Korea,

***Advanced Materials Research Team, Korea Railroad Research Institute, Uiwang-si 16105, Gyeonggi-do, Korea,

****X-NDT Inc., Jeonju, Korea,, ✝Corresponding Author: [email protected]

1. Introduction

More effective inspection methods are emerging with the rapidly progressing field of structural health monitoring (SHM) and non- destructive evaluation (NDE) for applications related to, but not limited to, aerospace, mechanical or civil structures. The ultrasonic propagation imaging method is among those regarded as highly efficient and effective methods for fatigue tests or damage visualization and/or localization [1-9]. This technique depends on ultrasonic signal acquisition through a high speed data acquisition system, from piezoelectric sensors attached to any inspecting surface or structure. Later, these signals from sensors (raw data) became available for processing and are now utilized to generate a video, showing the propagation of waves through the specimen.

Ultimately, this wave propagation video shows the structure of the specimen and parameters such as the type, depth, and/or location for damages or fatigues associated with structures [10-14].

This method of wave propagation is gaining more and more reliability among engineers and researchers around the world, and has proved to be effective and proficient in finding damages or cracks; it is hence proving to have more lifesaving capability in life critical applications.

However, with the high risk involved with many

of these structures, such as aerospace structures

or nuclear power plants etc., the time required

for complete inspection is also related to the

proficiency with which it is performed. The

systems being used today utilize software based

processing methods developed within a processor

centric system. This proves to be a major

(2)

Fig. 1 UPI system performance comparison between 64-bit and 32-bit operating system [10]

bottleneck for such systems when larger area dimensions need to be inspected, since the processing time and efforts that are required by the computer processor also increases drastically.

This is shown in Fig. 1, where a highly efficient ultrasonic propagation imaging (UPI) system was used to inspect the specimen while selecting different scanning dimensions. To further improve the performance, the operating system was upgraded to 64-bit (rather than the 32-bit used for initial tests) to demonstrate that, even though minor improvement can be made with such steps, for larger dimensions, the absolute processing time was still high. In addition, the data format optimization is beyond the scope of this research because the various UPI-based damage visualization algorithms have their own optimization format.

To break through the major bottleneck in the signal/image processing time for the UPI system generating large data in real world application, a novel approach was proposed and implemented in the field of SHM and NDE, where hardware improvement rather than software is the key of performance enhancement for processing algorithm(s). The goal was to maximize the use of system resources for their respective tasks;

hence, the field-programmable gate array (FPGA) technology was proposed for this purpose since the FPGAs are the most flexible, cost-effective, and powerful among all other hardware based embedded system technologies such as complex

such as DSP, medical imaging, and networking.

FPGAs are fundamentally simply the com- bination of logic gates, combinational blocks, clock trees, and interconnections. They therefore provide the ability to restructure internal gates for the design of almost any digital logic, provided the number of gates does not exceed the number of logic gates. Note that modern FPGAs contain more than one hundred million logic gates. Hence, FPGA re-configuration in terms of design, number of logic gates utilization and/or gate floor planning provides the designer with design flexibility [15].

Because of such capability, FPGA allows a system to be developed that requires real-time operation, as well as parallel processing. The performance of FPGAs supersedes even that of digital signal processors (DSP) for most high computing applications that demand high speed signal processing [15]. An example of the use of high performance signal processing is found in medical imaging [15-18], where large sized imaging data is processed with FPGAs and massively parallel implementation makes it possible to use complex 2D and/or 3D image processing methods such as image morphing and segmentation [17]. Another influential example is medical tomography, which is a similar type of computational tomography where performance improvement is achieved by transforming from C-language implementation to FPGA imple- mentation [19].

Fig. 2 shows the difference between a DSP

and FPGA based simple calculation and

illustrates how an FPGA computes faster than

(3)

Fig. 2 Difference between a DSP processor operation and FPGA based operation

even the DSP while implementing a simple tap filter that utilizes parallelism, compared with a sequential operation of a DSP.

Section 2 of this paper discusses the details of the UWPI algorithm. Section 3 discusses the implementation of the UWPI algorithm with the FPGA, along with details of implementation at the hardware level. It also describes the communication of the FPGA board with the computer and interfacing protocol. The final results are presented in section 4.

2. Ultrasonic Wave Propagation Imaging Algorithm

In a wave propagation imaging system, a three-dimensional (3D) data structure is used to generate the results, such as a video, to visually detect/evaluate the damage. In the post- processing technique, the results generation (video) process starts only when all the data (all the acquired ultrasonic signals) are completely saved after the scanning process is finished.

The UWPI algorithm is an example of such process that maps the acquired ultrasonic signal from the data acquisition (DAQ) system into a spatial-temporal data structure based on the three-dimensional (3D) data mapping algorithm.

The result of the 3D data mapping algorithm is

Fig. 3 3D data processing for conventional UWPI

UWPI.

Fig. 3 shows the block diagram for the software based conventional UWPI process. The process starts with an ultrasonic signal acquisition using the DAQ system and sampled at the signal length of  samples with the sampling time interval of  for every laser impinging raster scanning. The 3D data mapping function receives the signal and each sample in the ultrasonic signal is stored in  spreadsheets.

The data points were then stored in the spreadsheet corresponding to its local laser impinging point. This process continues to map the samples of the ultrasonic signal into the created  spreadsheets according to their respective laser impinging points as soon as the data is available in the buffer of the DAQ system. After all the signals are transferred for the entire area of interest, the 3D data structure, as shown in Fig. 3, is formed and ready for video generation [8]. A similar procedure is applied for the post processing system, with the data acquisition from DAQ card being replaced by the data transfer from the computer itself, using the already stored data.

3. FPGA Implementation of UWPI Algorithm

This section describes the details of the

UWPI algorithm implementation in FPGA and

how the FPGA based DSP board communicates

with the computer for video generation.

(4)

Fig. 5 FPGA based DSP board

Fig. 6 UWPI block diagram Fig. 4 Ultrasonic propagation imaging system with

a Q-switched laser, a laser mirror scanner, and an ultrasonic sensor

3.1. Overview of FPGA Based Wave Imaging Method in Ultrasonic Wave Propagation Imaging System

As shown in Fig. 4, the UPI system has a laser scanning system based on a Q-switched laser which excites energy in the specimen with laser pulses. The scanning was controlled via a laser mirror scanner (LMS). Both the Q-switched laser and LMS were controlled by an external computer. Fig. 4 shows the scanning system based on Q-switched laser and LMS.

The DSP board includes FPGA Virtex-6 chip having high speed logic fabric with 30- million-gates which is the heart of the system for implementation of the image/signal processing algorithm. A secondary FPGA device is used for the peripheral component interconnect express (PCIe) communication. This FPGA communicates directly with the PCIe switched mezzanine card (XMC) card integrated with the board. It provides a transparent interface between the Virtex-6 device and the PCIe XMC card using 4 lanes (PCIe x4) communication [22].

The DDR3-SDRAM memory [20] is used as a data storage medium. A 4 Gbit DDR3-SDRAM (512 mega byte) high performance memory with 533 MHz frequency is used for this purpose.

The installed capacity is 2 gigabyte (GB) (four 512 megabyte (MB) memories), but for the test experiment one DDR3 of 512 MB is utilized.

For this purpose, an SDRAM controller is needed and hence the Xilinx SDRAM controller intellectual property (IP) core is utilized and programmed within the FPGA. Fig. 5 shows the actual hardware used for the system development.

3.2. Implementation and Working 3.2.1. Hardware and Firmware

With the conventional UWPI implementation,

the processing load is transformed to the

processor and the algorithm implementation is

performed with the software, which inherently

causes slower operations relative to any dedicated

system. In the FPGA based system, the imple-

mentation of the algorithm is at the hardware

level itself. This provides the benefit of data

processing at the binary/gate level without any

extra transformation, as is required for a

(5)

Fig. 7 RTL for the UWPI module

software based system, and also means that the data-processing is implemented over a dedicated area of hardware. Fig. 6 shows how the UWPI algorithm is implemented at the hardware level with FPGAs.

Three main sub-modules exist: the address generator, the traffic controller, and the read data path controller. The purpose of these sub-modules is as follows in respective order:

generate addresses for reading and writing processes, control the traffic relative to memory controller, and collect and accumulate data fetched from the DDR3-SDRAM memory.

The process starts with an ultrasonic signal (data) being stored in the buffer (called direct memory access (DMA) first in first out (FIFO) input buffer), and the corresponding address is then generated by the address generator (which also acts as an initiator block-initiating the UWPI process as it starts the generating address). Note that the DDR3 memory addresses are divided into ROWs, BANKS, and COLUMNs [20], but the memory controller IP provides a flat address space for the end user [21]. Hence, the counter mechanism was used for addressing the appropriate memory locations for the respective signals and samples.

The controller IP communication is divided into two main paths: (i) the address and command path and (ii) the data path. The traffic controller block is responsible for the address path (both for read and write cycle) and the write data path. The address generation block (see Fig. 6) generates the addresses and commands (writing or reading) and buffers them in a FIFO inside the traffic controller block. The traffic controller then fetches these commands and addresses, along with the respective data from DMA-in FIFO (for write data cycle), and passes them to the memory controller.

For the read data cycle, the address path design is the same, while the read data cycle is managed and organized by the read data path

controller. This block collects the data and saves it in the DMA out FIFO, which is then available for the PC to fetch through DMA transfer via the PCIe interface and to generate the video. Fig. 7 shows the register transfer level (RTL) diagram generated with the Xilinx synthesizer (XST). It presents the register level implementation of the design, describes the address generation, read data path controlling and traffic controlling operations performed on incoming data. It also specifies the controlling signals used at each operator to manage the transfer of data.

3.2.2. Software

The FPGA based DSP board needed to communicate with the computer through some communication protocol. Hence, an XMC card was used with the PCIe protocol. A visual C++

(VC++) based interface was developed using a visual studio compiler to control the communi- cation.

Two main communication channels exist: the

command channel and data channel. While the

command channel only performs tasks such as

sending commands to the FPGA internal

(6)

Table 1 Verification data for simulation

Address Data

DEC HEX DEC HEX

Address 96-103

-256 FF00

Address 104-111

-240 FF10

-272 FEF0 -256 FF00

-320 FEC0 -352 FEa0

-384 FE80 -384 FE80

-400 FE70 -226 FEB0

-352 FEA0 -304 FED0

-304 FED0 -304 FED0

-240 FF10 -256 FF00

hardware and firmware section. However, the reading procedure from the board is somewhat complicated. Since the DMA transfer is extremely fast compared to the file-saving mechanism in the software, multi-threading was utilized to vitalize the DMA benefits while reading data from the DSP board. The idea involved using the virtual parallelization of the CPU with threading and dividing the load that is being consumed using the serialized processing and file saving while reading from the FPGA.

Fig. 8 shows how the multi-threading method in visual C++ was used to distribute the task of saving multiple files simultaneously. This way, the processor resources were shared to overcome the processor based operation bottleneck between the fast FPGA speeds and the slower CPU processing.

Fig. 8 Flow chart for multi-threading operation for file reading from DSP board

diagram shown in Fig. 9 presents the addresses generated using address generator and data stored in each address. The expected data in each address is shown in Table 1 for the verification of simulated result. Since, we are loading the data from the text file to the DDR3 memory therefore the sequence of the data is mentioned in Table 1 to match with the data in write data path of memory controller. It can be seen that for address 96-103 in Fig. 9(a), the data (in hex format) complies with the address shown in Table 1. The 16 bit data is concatenated to form 64 bit word for memory controller which takes two 64 bit words and store in 8 memory locations for address 96-103 and address 104-111 as shown in Fig. 9(b).

The final results were generated and

compared with the conventional CPU based

system (C#-LabVIEW based system). Fig. 10

shows the snapshots from videos generated from

the conventional system, as reference, and from

the FPGA based system. Ultrasonic waves

emerging from the sensor location propagating

(7)

(a)

(b)

Fig. 9 Simulation result for write data path using memory controller and DDR3 simulation model

Fig. 10 Imaging results for 200×200 mm² scanning, (a) freeze frames from FPGA-based system, (b) freeze frames from C#-LabVIEW-based system

Fig. 11 Graph comparing performance of FPGA- based and CPU-based systems

over the surface of specimen are captured at 40.4us, 100us and 199us. Results are identical for both systems which shows that UWPI algorithm is successfully implemented in FPGA.

The results with the FPGA based system proved to be considerably faster because of the nanoseconds (ns) operation with FPGA compared to the milliseconds operation of CPU. It was also noted that the FPGA-based system was more stable and predictable than the CPU-based system, as shown in the comparison graph in Fig. 11. It gives the comparison of the time- consumption at various selected scan areas between the proposed FPGA-based system and the conventional C#-LabVIEW based system and shows the stability and predictability for the FPGA based system compared to the software- based (C#-LabVIEW) system.

5. Conclusion

An FPGA-based UPI system was proposed

and implemented for a basic ultrasonic wave

propagation imaging algorithm; more specifically,

UWPI was performed with an FPGA board. The

results showed that the proposed system was

(8)

interval of 0.5 mm (40,401 points) to 87.55%

reduction for a 200 × 200 mm

²

scanning area with an interval of 0.5 mm (160,801 points) as presented in Fig. 11. The predictability and relative stability of the FPGA-based system is evident, where the time increment for the FPGA based system increases smoothly as the scanning area increases, compared to the unstable time increment for the CPU based system. More strikingly even for the 200 × 200 mm

²

scanning area, the average time consumption is around 22 seconds with the FPGA, compared to 176 seconds (avg.) for the same scanning data processed with the C#-LabVIEW.

This demonstrates that the proposed system is an innovative step in the field of damage visualization techniques, specifically for wave propagation techniques, and has the ability to further involve many advanced and sophisticated algorithms which can effectively perform better non-destructive testing. This new idea to implement FPGA technology into the UPI system will act as a break-through technology for full-field inspection of full-scale structures.

Acknowledgement

This research was supported by Leading Foreign Research Institute Recruitment Program (2011-0030065) and Basic Science Research Program (2011-0010489) through the National Research Foundation of Korea, funded by the Ministry of Education, Science and Technology, and by a grant from R&D Program of the Korea Railroad Research Institute, Korea

Vol. 67, pp. 1-7 (2003)

[3] T. E. Michaels and J. E. Michaels,

"Application of acoustic wave field imaging to non-contact ultrasonic inspection of bonded components," Review of Quantitative Nondestructive Evaluation, Vol. 25, pp.

1484-1491 (2006)

[4] C. C. Chia, J. R. Lee, C. Y. Park and H.

M. Jeong, "Laser ultrasonic anomalous wave propagation imaging method with adjacent wave subtraction application to actual damages in composite wings," Laser and Optics Technology," Vol. 44, No. 2, pp. 428-440 (2012)

[5] J. R. Lee, H. J. Shin, C. C. Chia, D. Dhital, D. J. Yoon and Y. H. Huh,

"Long distance laser ultrasonic propagation imaging system for damage visualization,"

Optics and Laser in Engineering, Vol. 49, No. 12, pp. 1361-1371 (2011)

[6] J. R. Lee, J. Takatsubo and N. Toyama,

"Disbond monitoring at wing stringer tip based on built-in ultrasonic transducers and a pulsed laser," Smart Materials and Structures, Vol. 16, pp. 1025-1035 (2007) [7] M. N. Salim, T. Hayashi, M. Murase and

S. Kamiya, "Visualization and modal analysis of guided waves from a defect in a pipe," Japanese Journal of Applied Physics, Vol. 48, pp. 1-5 (2009)

[8] C. C. Chia, J. R. Lee and H. J. Shin,

"Hot target inspection using a welded fiber

acoustic wave piezo electric sensor and a

laser-ultrasonic mirror scanner," Measure-

ment Science and Technology, Vol. 20,

(9)

pp. 1-8 (2009)

[9] J. R. Lee, H. M. Jeong, C. C. Chia, D. J.

Yoon and S. S. Lee, "Application of ultrasonic wave propagation imaging method to automatic damage visualization of nuclear power plant pipeline," Nuclear Engineering and Design, Vol. 240, No.

10, pp. 3513-3520 (2010)

[10] J. R. Lee, "Development of high speed laser ultrasonic propagation imaging sys- tem," Report Number: ADDR-401- 120000, Agency for Defense Development in Korea, System Engineering Book, Vol. 1 [11] J. R. Lee, C. C. Chia, C. Y. Park and H.

M. Jeong, "Laser ultrasonic anomalous wave propagation imaging method with adjacent wave subtraction: algorithm,"

Optics and Laser Technology, Vol. 44, No. 5, pp. 1507-1515 (2012)

[12] J. R. Lee and N. Sunuwar, "Advances in damage visualization algorithm of ultrasonic propagation imaging system," Journal of the Korean Society for Nondestructive Testing, Vol. 33, No. 2, pp. 232-240 (2013)

[13] H. M. Jeong, J. R. Lee and C. Y. Park,

"Advances in hardware of ultrasonic propagation imaging system," Journal of the Korean Society for Nondestructive testing, Vol. 32, No. 2, pp. 214-219 (2012) [14] C. C. Chia, J. R. Lee and J. H. Kim,

"Development of ultrasonic wave propaga-

tion imaging system," Journal of The Korean Society for Nondestructive Testing, Vol. 29, No. 4, pp. 293-302 (2009)

[15] "Smart Grid Ready Instrumentation,"

National Instrument White Paper,

<http://www.ni.com/white-paper/14529/en/>

(29 May. 2013)

[16] M. Birk, S. Koehler, M. Balzer, M.

Huebner, N. V. Ruiter and J. Becker,

"FPGA-based embedded signal processing for 3D ultrasound computer tomography,"

Nuclear Science, IEEE Transactions on, Vol. 58, No. 4, pp. 1647-1651 (2011) [17] S. Klupsch, M. Ernst, S. A. Huss, M.

Rumpf and R. Strzodka, "Real time image processing based on reconfigurable hard- ware acceleration," Proceedings of IEEE Workshop Heterogeneous Reconfigurable Systems on Chip, p. 1 (2002)