Modeling with Thin Film Thickness using Machine Learning


Journal of the Semiconductor & Display Technology, Vol. 18, No. 2, June 2019.


Dong Hwan Kim*, Jeong Eun Choi*, Tae Min Ha*, and Sang Jeen Hong*†

*†Department of Electronics Engineering, Myongji University

ABSTRACT

Virtual metrology (VM), one of the advanced process control (APC) techniques, predicts the characteristics of manufactured films using machine learning, saving time and resources. As the critical dimension (CD) shrinks, photoresist can no longer serve as a mask material for high aspect ratio etching, and hard masks have been introduced to solve this problem. Among many types of hard mask materials, the amorphous carbon layer (ACL) is widely investigated because of its higher etch selectivity than conventional photoresist, high optical transmittance, easy deposition process, and removability by oxygen plasma. In this study, VM using different machine learning algorithms is applied to predict the thickness of ACL, and the trained models are evaluated to determine which shows the best prediction performance. ACL specimens are deposited by plasma enhanced chemical vapor deposition (PECVD) with four process parameters (pressure, RF power, C₃H₆ gas flow, and N₂ gas flow). The gradient boosting regression (GBR) algorithm, the random forest regression (RFR) algorithm, and a neural network (NN) are selected for modeling. The model using the gradient boosting algorithm shows the best performance, with the highest R-squared value. A model for predicting the thickness of the ACL film within the above-mentioned conditions has been successfully constructed.

Key Words : Modeling, Machine learning, Amorphous carbon layer, Film thickness, Box-Behnken

1. Introduction


In the production environment of semiconductor factories, monitoring systems have been developed, and historical data on equipment process information and quality characteristics are now collected comprehensively [1].

Hence, advanced process control (APC) has been gaining relevance in semiconductor manufacturing, thanks to the increase in data volume and computational capabilities [2]. To increase productivity and improve device performance, the semiconductor industry continues to reduce device size. Accordingly, new fabrication technologies have been developed and researched. This development requires a great deal of time, cost, and resources, as well as strict process conditions [3]. Virtual metrology helps save time and resources by reducing the number of actual fabrication runs. Virtual metrology (VM), one of the APC techniques, is a method to predict the characteristics of manufactured films using machine learning. In practice, many studies have predicted such characteristics using etch and deposition process data and data acquired through a variety of sensors [1, 4-6]. As the semiconductor manufacturing process becomes more complex and the acceptable range of the critical dimension (CD) becomes narrower, VM is key to achieving high product quality: it provides information based on the metrology result for adjusting tool settings, because a small change in a tool setting during the process may cause a significant loss of final yield [7].

Also, with the miniaturization of semiconductor devices and the increasing pattern density of very large scale integrated (VLSI) circuits, a single photoresist mask is no longer applicable for fine line patterning and contacts [8]. A memory device stores charge in a capacitor, whose size has been shrinking. To guarantee device performance, the capacitor has to be narrow and tall, and high aspect ratio dielectric etching has been adopted to satisfy this requirement. Conventional photoresist can no longer meet the etch conditions because it is soft and has poor etch selectivity, so hard masks were introduced to solve these problems. Among many types of hard mask materials, the amorphous carbon layer (ACL) is widely investigated due to its high etch selectivity over photoresist, high optical transmittance, easy deposition, and removability by oxygen plasma after etching, like that of remaining photoresist [8]. To realize VM, machine learning is used for modeling. Machine learning is advantageous in that it does not involve any assumptions and the predictive model can be constructed easily and quickly [5].

†E-mail: [email protected]

In this study, virtual metrology using different machine learning algorithms (gradient boosting, random forest, and neural network) is applied to predict the deposition rate of ACL deposited by plasma enhanced chemical vapor deposition (PECVD), and the trained models are compared to determine which is most suitable for the ACL thickness data.

2. Background

2.1 Machine Learning Algorithm

Various algorithms are available for modeling, each with its own advantages and disadvantages. Among them, modeling using neural networks has been widely researched [1, 4-5, 9-10]. A neural network has the advantage that it adapts well to training data, but it does not handle missing values and takes a long time to train [11].

In contrast, ensemble tree methods can handle missing values and yield a finer-grained, more generalized prediction model [11]. Thus, modeling is carried out using both a neural network and ensemble trees (the random forest algorithm and the gradient boosting algorithm).

2.1.1 Random forest regression(RFR) algorithm

As shown in Fig. 1, the RFR algorithm uses a bagging technique with unpruned decision trees. Bagging generates multiple bootstrap data sets from the given training data, fits a model to each bootstrap data set, and combines the models to obtain the final prediction model. A bootstrap data set is a sample of the same size as the raw data, drawn from it by random sampling with replacement. Thus, the RFR prediction for a new observation is made by averaging the outputs of the ensemble of decision trees.

Fig. 1. Structure of random forest algorithm
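The bagging step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in this study: the base learner is a one-split regression stump on a single input rather than an unpruned tree, the random feature selection of a full random forest is omitted, and the function names and toy data are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression stump on 1-D inputs by minimizing squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not right:  # the largest x cannot split the data
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x identical: predict the mean everywhere
        mean = sum(ys) / len(ys)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_bagged(xs, ys, n_trees=25, seed=0):
    """Train one stump per bootstrap sample (same size, drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_bagged(stumps, x):
    """Average the outputs of all base learners."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)


# Toy data (hypothetical): a step from 1.0 to 5.0 between x = 4 and x = 5.
xs = list(range(10))
ys = [1.0] * 5 + [5.0] * 5
ensemble = fit_bagged(xs, ys)
low, high = predict_bagged(ensemble, 0), predict_bagged(ensemble, 9)
```

Because each stump sees a different bootstrap sample, individual predictions vary, and averaging them reduces that variance.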

2.1.2 Gradient boosting regression(GBR) algorithm

Understanding the GBR algorithm first requires the concept of boosting. In machine learning, boosting refers to a way of creating a more accurate, stronger learner by combining relatively inaccurate weak learners. As shown in Fig. 2, the gradient boosting algorithm is a boosting technique in which a series of weak prediction models is generated step by step based on the gradient of the loss function and then combined as an ensemble to obtain strong predictive power.

A first tree model with low accuracy is created, and the second tree model compensates for its revealed weakness (the prediction error). Each subsequent tree model continues to compensate for the remaining weaknesses, and eventually a strong learner is built.

Fig. 2. Structure of gradient boosting algorithm
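The stage-by-stage correction can be illustrated with a small pure-Python sketch. It assumes a squared-error loss, for which the negative gradient is simply the residual between the measured value and the current prediction, and it uses one-split stumps as weak learners. The function names, toy data, and settings are hypothetical and are not those of this study.

```python
def fit_stump(xs, targets):
    """Return (threshold, left_value, right_value) minimizing squared error.

    Assumes xs holds at least two distinct values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest x cannot split the data
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]


def stump_value(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv


def fit_gbr(xs, ys, n_stages, learning_rate):
    base = sum(ys) / len(ys)  # stage 0: a constant prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_stages):
        # Negative gradient of the squared-error loss = current residuals.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Each new weak learner corrects part of the remaining error.
        preds = [p + learning_rate * stump_value(stump, x) for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict_gbr(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_value(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((y - predict_gbr(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data (hypothetical): y = x^2 on eight points.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [float(x * x) for x in xs]
mse_0 = mse(fit_gbr(xs, ys, 0, 0.1), xs, ys)
mse_10 = mse(fit_gbr(xs, ys, 10, 0.1), xs, ys)
mse_200 = mse(fit_gbr(xs, ys, 200, 0.1), xs, ys)
```

On this toy data the training error shrinks as stages are added, which is the behavior the paragraph above describes. A small learning rate slows each correction and usually calls for more stages.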

2.1.3 Neural Network(NN)

An NN consists of an input layer, hidden layers, and an output layer, each containing many neurons. The left side of Fig. 3 shows how each neuron works. Each input is multiplied by a weight before entering the neuron. The weighted inputs are summed, and the sum passes through an activation function that decides whether to activate the output value. Typical activation functions are the sigmoid and ReLU functions. After the activation function, the output becomes an input to the neurons of the next layer. As shown on the right side of Fig. 3, this mechanism occurs in all neurons, and the layers are fully connected. The performance of an NN may vary depending on the number of hidden layers and the number of neurons, so it is important to choose the optimum number of layers and neurons.

Fig. 3. Structure of neural network
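The neuron computation described above can be written out directly. The sketch below is illustrative only: the bias term is an assumption (it is standard practice but not mentioned in the text), and the weights and inputs are made-up values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs followed by an activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def dense_layer(inputs, weight_rows, biases, activation):
    """Fully connected layer: every neuron receives every input."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]


# Hypothetical values: two inputs feeding a hidden layer of two ReLU neurons.
hidden = dense_layer([1.0, 2.0], [[1.0, 1.0], [-1.0, -1.0]], [0.0, 0.0], relu)
# The hidden outputs become the inputs of the next (here, output) neuron.
output = neuron(hidden, [0.5, 0.5], 0.0, sigmoid)
```

Stacking several such layers gives the fully connected structure of Fig. 3; the number of layers and neurons per layer are the design choices mentioned above.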


2.2 Evaluation method

Although there are several methods for evaluating a model, relying on only one of them is not reliable. So, in this study, the trained models are compared using three different evaluation methods to determine which model is most suitable for the ACL data in the given parameter ranges. To evaluate each model, the root mean square error (RMSE), the mean absolute percentage error (MAPE), and R-squared, which are measures of the prediction accuracy of a forecasting method, are used.

2.2.1 RMSE

RMSE is a measure of the differences between the values predicted by a model and the values observed.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

Here $n$ is the number of test data used for model validation, $y_i$ is the value measured through experiments, and $\hat{y}_i$ is the value predicted by the model.

2.2.2 MAPE

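MAPE, like RMSE, measures the difference between the real value ($y_i$) and the predicted value ($\hat{y}_i$) over the $n$ test data, but it uses the absolute value of the relative error instead of the squared error.

$$\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$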

2.2.3 R-squared

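R-squared represents the proportion of the variance of a dependent variable that is explained by the independent variable or variables in a regression model.

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}, \qquad SS_{res} = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad SS_{tot} = \sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2$$

Here $SS_{tot}$ is the total variation of the measured values about their mean $\bar{y}$, $SS_{res}$ is the residual (unexplained) variation, and $SS_{tot} - SS_{res}$ is the explained variation.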
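The three evaluation measures of this section can be computed with a few short functions. The sketch below follows the usual definitions (MAPE expressed in percent, R-squared as one minus the ratio of residual to total variation); the sample values are made up for illustration and are not data from this study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Undefined if any y is 0."""
    n = len(y_true)
    return 100.0 / n * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """1 - (residual variation) / (total variation about the mean)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted values.
measured = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 3.0, 7.0, 7.0]
scores = (rmse(measured, predicted), mape(measured, predicted), r_squared(measured, predicted))
```

Predicting the mean of the measured values for every point gives an R-squared of exactly 0 under this definition, so any model that does worse than the mean has a negative R-squared.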

3. Experiments

3.1 Deposition process and data acquisition

Many parameters influence the film thickness. Changing every process parameter relevant to film thickness one at a time requires a lot of time and is inefficient, and it may also produce meaningless data. So, to plan efficient process parameter values before fabrication and to acquire valuable ACL film data, the Box-Behnken method of design of experiments (DOE) is utilized. The Box-Behnken method plans experiments so that the data can be analyzed statistically and the maximum information is obtained with the minimum number of experiments [12]. Through the Box-Behnken method, at least 27 experiments are suggested using four factors (pressure, RF power, C₃H₆ gas flow, and N₂ gas flow). These four factors are chosen because the plasma is very sensitive to them, and the state of the plasma in PECVD changes the process characteristics. The process conditions of the 27 recipes derived using the Box-Behnken method are shown in Table 1.

Table 1. Process recipe

| Type | Value | Unit |
|---|---|---|
| Electrode gap | 1.5 | cm |
| Temperature | 300 | ℃ |
| Time | 300 | s |
| Power | 230, 250, 270 | W |
| Pressure | 800, 1000, 1200 | mTorr |
| C₃H₆ gas flow | 60, 80, 100 | sccm |
| N₂ gas flow | 30, 40, 50 | sccm |
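A four-factor Box-Behnken design can be generated as follows. This is a sketch under stated assumptions: the text gives only the total of 27 runs, and since a standard four-factor Box-Behnken design has 24 edge runs, three center-point replicates are assumed here. The assignment of the two gas-flow rows to C₃H₆ and N₂ follows the order in which the factors are listed in the text, and all function and key names are hypothetical.

```python
from itertools import combinations, product

# Low, center, and high level of each factor, taken from Table 1.
LEVELS = {
    "power_W": (230, 250, 270),
    "pressure_mTorr": (800, 1000, 1200),
    "c3h6_sccm": (60, 80, 100),  # assumed to be the first gas-flow row
    "n2_sccm": (30, 40, 50),     # assumed to be the second gas-flow row
}


def box_behnken(n_factors, n_center):
    """Coded design: each pair of factors at (+/-1, +/-1), all others at 0."""
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * n_factors
            run[i], run[j] = a, b
            runs.append(tuple(run))
    runs.extend([(0,) * n_factors] * n_center)  # replicated center points
    return runs


def decode(run):
    """Map coded levels (-1, 0, +1) to the physical settings of Table 1."""
    return {name: levels[code + 1] for (name, levels), code in zip(LEVELS.items(), run)}


design = box_behnken(n_factors=4, n_center=3)  # n_center=3 is an assumption
recipes = [decode(run) for run in design]
```

With four factors there are 6 factor pairs and 4 sign combinations per pair, giving 24 runs, far fewer than the 81 runs of a full three-level factorial.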

Using the established ACL process recipes, film data are acquired by PECVD, which has the merit of operating at a lower temperature than other deposition equipment because it uses plasma.

Fig. 4. Structure of PECVD

As shown in Fig. 4, the two gases are injected through a gas line into the side of the chamber, not through the showerhead. Therefore, there is a limitation that the injected gases may not spread uniformly. The thickness of the deposited films was measured with an optical measuring instrument called a reflectometer.


3.2 Normalization

To train the model effectively with the acquired data, preprocessing of the data is indispensable. If the model is trained on data without normalization, the standard deviation is too large for effective learning, and training the neural network becomes very slow because the contours of the cost function are elongated [13].

Normalization was performed using the following equation, and Table 2 lists the normalized data for all runs.

Table 2. Normalization of 27 deposition runs

To train the models, 21 randomly selected runs are used as training data, and the remaining 6 runs are used as test data to verify the reliability of the models.
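The preprocessing and the random 21/6 split can be sketched as follows. The normalization equation itself is not reproduced in this text, so the sketch assumes min-max scaling to the range [0, 1], which is one common choice and may differ from the equation actually used. The thickness values, the seed, and the function names are hypothetical.

```python
import random


def min_max_normalize(values):
    """Scale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def split_runs(n_runs, n_train, seed=0):
    """Randomly assign run indices to a training set and a test set."""
    idx = list(range(n_runs))
    random.Random(seed).shuffle(idx)
    return sorted(idx[:n_train]), sorted(idx[n_train:])


# Made-up thickness values, for illustration only.
thickness = [120.0, 150.0, 180.0, 135.0]
normalized = min_max_normalize(thickness)

train_idx, test_idx = split_runs(n_runs=27, n_train=21)
```

Fixing the seed makes the split reproducible. With only 6 test runs, the evaluation scores can depend noticeably on which runs happen to fall in the test set.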

4. Result & Discussion

Using the normalized thickness data, modeling is carried out with the different algorithms. Each algorithm has various parameters that determine the performance of the model. In the case of RFR, Fig. 5(a), the number of trees built through bagging is set to 200. A decision tree consists of internal nodes, which have child nodes, and leaf nodes. The minimum number of samples required to split an internal node is set to 2, and the tree is grown until all leaves are pure, with no limit on the maximum depth.

In the case of GBR, Fig. 5(b), the number of weak learners trained sequentially is set to 500, and the learning rate is set to 0.01. The maximum depth of each tree is limited to 10.

In the case of NN, Fig. 5(c), the model uses 4 hidden layers and a learning rate of 0.01. The weights are updated toward the lowest cost value over 50,000 iterations.

Fig. 5. Modeling results comparing test data and predicted data using different algorithms: (a) RFR, (b) GBR, and (c) NN
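The hyperparameters stated in this section can be collected in one place. The text does not name the software used, so expressing them as scikit-learn keyword arguments is an assumption made for illustration. The number of neurons per hidden layer is not given in the text, and the value of 16 below is a placeholder.

```python
# Settings stated in the text, written as scikit-learn keyword arguments (assumed library).
RFR_PARAMS = {
    "n_estimators": 200,     # number of trees
    "min_samples_split": 2,  # minimum samples to split an internal node
    "max_depth": None,       # grow until all leaves are pure
}
GBR_PARAMS = {
    "n_estimators": 500,     # number of sequential weak learners
    "learning_rate": 0.01,
    "max_depth": 10,
}
NN_PARAMS = {
    "hidden_layer_sizes": (16, 16, 16, 16),  # 4 hidden layers; 16 neurons each is a placeholder
    "learning_rate_init": 0.01,
    "max_iter": 50000,
}


def build_models():
    """Instantiate the three regressors with the settings above."""
    # Imported here so the settings can be inspected without scikit-learn installed.
    from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
    from sklearn.neural_network import MLPRegressor

    return {
        "RFR": RandomForestRegressor(**RFR_PARAMS),
        "GBR": GradientBoostingRegressor(**GBR_PARAMS),
        "NN": MLPRegressor(**NN_PARAMS),
    }
```

Each model returned by `build_models()` would then be fitted on the 21 training runs and scored on the 6 test runs.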

Table 3. Analysis of each modeling

| Model | RFR | GBR | NN |
|---|---|---|---|
| RMSE | 0.1196 | 0.1094 | 0.0388 |
| MAPE | 21.0530 | 18.4199 | 5.9155 |
| R-squared | -3.1679 | 0.8510 | 0.7783 |

Table 3 shows the results of evaluating the models built with the three algorithms, using the evaluation methods described in Section 2. In Fig. 5, the predictions of all trained models are considerably similar to the test data, except for the model using NN.

However, Table 3 shows that the model using NN has the lowest RMSE value. The reason for the contradiction between the figure and the table is that NN is optimized only to reduce the cost value. Therefore, it has a low R-squared value, since it is overfitted to the training data in order to reach a low cost. In the case of RFR, the R-squared value is negative, which means that the model predicts worse than simply using the mean value (ȳ). The model using RFR thus shows poor predictive performance. Consequently, the model using the GBR algorithm shows the best performance of the three, with the highest R-squared value.
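The meaning of a negative R-squared can be made explicit with a short calculation. It assumes the usual definition of R-squared in terms of the residual variation SS_res and the total variation SS_tot about the mean; the only number used is the RFR value reported in Table 3.

```latex
R^2 = 1 - \frac{SS_{res}}{SS_{tot}} < 0
\iff SS_{res} > SS_{tot}
\qquad\text{and, for RFR,}\qquad
\frac{SS_{res}}{SS_{tot}} = 1 - R^2 = 1 - (-3.1679) = 4.1679
```

Under this definition, the squared prediction error of the RFR model on the test data is about four times the squared error that would result from predicting the mean of the test values for every run.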

5. Conclusion

In this paper, models that predict the ACL thickness through VM are built with several algorithms. The results indicate that, for this data, updating the model through re-sampled data predicts the film thickness more effectively than learning only from the initially sampled data, as the NN does. Thus, the model using GBR is the best model for predicting the normalized ACL thickness.

Acknowledgment

This work was supported by the Korean Institute of Advancement of Technology (Grant ID: P0008458). The authors are grateful to the staff of the Semiconductor Progress Diagnosis Research Center (SPDRC) at Myongji University for their extensive technical support and for maintaining the fabrication facility.

References

1. Chen, W.C., Lee, A.H.I., Deng, W.J., and Liu, K.Y., "The implementation of neural network for semiconductor PECVD process," Expert Systems with Applications, Vol. 32, No. 4, pp. 1148-1153, 2007.

2. Terzi, M., Masiero, C., Beghi, A., Maggipinto, M., and Susto, G.A., "Deep learning for virtual metrology: Modeling with optical emission spectroscopy data," 2017 Proc. IEEE 3rd Int. Forum Res. Technol. Soc. Ind., pp. 1-6.

3. Kim, H.-C., and Seol, Y.T., "Development of Virtual Integrated Prototyping Simulation Environment for Plasma Chamber Analysis and Design(VIP-SEPCAD)," Journal of the Korean Society of Semiconductor Equipment Technology, Vol. 2, No. 4, pp. 9-12, 2003.

4. Hong, S., May, G.S., and Park, D.-C., "Neural network modeling of reactive ion etching using optical emission spectroscopy data," IEEE Trans. Semi. Manufac., Vol. 16, No. 4, pp. 598-608, 2003.

5. Kim, B., Park, K., and Lee, D., "Use of neural network to model the deposition rate of PECVD-silicon nitride films," Plasma Sources Science and Technology, Vol. 14, No. 1, pp. 83-88, 2005.

6. Purwins, H., Barak, B., Nagi, A., Engel, R., Höckele, U., Kyek, A., Cheria, S., Lenz, B., Pfeifer, G., and Weinziel, K., "Regression methods for virtual metrology of layer thickness in chemical vapor deposition," IEEE/ASME Trans. Mechatronics, Vol. 19, No. 1, pp. 1-8, 2013.

7. Kang, P., Lee, H., Cho, S., Kim, D., Park, J., Park, C., and Doh, S., "A virtual metrology system for semiconductor manufacturing," Expert Systems with Applications, Vol. 36, No. 10, pp. 12554-12561, 2009.

8. Kim, J.K., Cho, S.I., Kim, N.G., Jhon, M.S., Min, K.S., Kim, C.K., and Yeom, G.Y., "Study on the etching characteristics of amorphous carbon layer in oxygen plasma with carbonyl sulfide," Journal of Vacuum Science & Technology A: Vacuum, Surfaces, and Films, Vol. 31, No. 2, p. 021301, 2013.

9. Hung, M.-H., Lin, T.-H., Cheng, F.-T., and Lin, R.-C., "A novel virtual metrology scheme for predicting CVD thickness in semiconductor manufacturing," IEEE/ASME Transactions on Mechatronics, Vol. 12, No. 3, pp. 308-316, 2007.

10. Rietman, E., and Lory, E.R., "Use of neural networks in modeling semiconductor manufacturing processes: An example for plasma etch modeling," IEEE Trans. Semi. Manufac., Vol. 6, No. 4, pp. 343-347, 1993.

11. Yip, W.K., Law, K.G., and Lee, W.J., "Forecasting Final/Class Yield Based on Fabrication Process E-Test and Sort Data," 2007 IEEE Int. Conf. Automation Science and Engineering, Scottsdale, AZ, pp. 478-483.

12. Cho, I.-H., Lee, N.-H., Chang, S.-W., An, S.-W., Yonn, Y.-H., and Zoh, K.-D., "Analysis of Characteristics and Optimization of Photo-degradation condition of Reactive Orange 16 Using a Box-Behnken Method," Journal of Korean Society of Environmental Engineers, Vol. 28, No. 9, pp. 917-925, 2006.