Oracle Database Health Framework 기술노트

(1)

r

Version 변경일자 변경자(작성자) 주요내용

1 2018.01.10 김도현 최초작성

2 2018.03.28 김도현 문서수정

Author 김도현 Creation Date 2018-01-10

Last Updated Version 1.0

Copyright(C) 2004 Goodus Inc.

All Rights Reserved

Oracle Database Health Framework

기술노트

(2)

- 2 –

Contents

1. Clusterware Monitoring ... 3

1.1. Autonomous Health Framework ... 3

1.2. Autonomous Health Framework Components ... 3

1.2.1. Cluster Health Advisor (CHA) ... 3

1.2.2. CHA 활용방법 ... 4

1.2.3. CHA Model 관리 ... 6

1.2.4. Promblems and Diagnosis View ... 7

1.2.5. Cluster Health Monitor (CHM) ... 7

1.2.6. CHM 추가된 파라미터 ... 7

1.2.7. Trace File Analyzer Collector ... 8

1.2.8. Tfactl 활용법 ... 9

2. Trace File Analyzer (TFA)... 10

2.1. Supported Environments ... 10

2.2. Oracle Trace File Analyzer Key Directories ... 10

2.3. Oracle Trace File Analyzer Command Interfaces ... 10

2.4. Securing access to Oracle Trace File Analyzer ... 10

2.5. Collecting Diagnostics Automatically ... 14

2.6. Collecting Diagnostics and Analyzing Logs On-Demand ... 14

2.6.1. Viewing System and Cluster Summary ...15

2.6.2. Investigating Logs for Errors ...18

2.6.3. Analyzing Logs Using the Included Tools ...21

2.6.4. One Command Service Request Data Collections...26

2.6.5. SRDC Collections ...27

2.6.6. Incident Packaging Service(IPS) Packages ...27

2.7. Managing and Configuring Oracle Trace File Analyzer ... 28

2.7.1. TFA 상태 및 구성 ...28

2.7.2. Oracle Trace File Analyzer Daemon ...29

2.7.3. Repository 관리 ...30

2.8. Database 와 Grid Infrastructure Diagnostic Data 관리 ... 33

2.8.1. 자동 Diagnostic Repository Log 와 Trace Files 관리 ...33

2.8.2. disk usage snapshot 관리 ...34

2.8.3. TFA Log 자동으로 Purge ...34

3. Cluster Health Monitor ... 35

3.1. CHM Data 수집 ... 35

3.2. CHM 모니터링 ... 35

3.2.1. oclumon debug ...35

3.2.2. oclumon dumpnodeview ...36

3.2.3. oclumon manage ...42

3.2.4. oclumon version ...42

(3)

- 3 –

1. Clusterware Monitoring

1.1. Autonomous Health Framework

è Oracle autonomous health framework 은 수집된 diagnostic data 를 분석하고 cluster 또는 RAC database 의 상태에 영향을 미치기 전에 사전에 문제를 식별하는 구성요소들의 모음.

1.2. Autonomous Health Framework Components

è Oracle ORAchk and EXAchk

- software 나 hardware 구성요소의 Oracle 구성에 대한 상태확인이며, 추가적인 요금이나, 라이센스는 발생하지 않음.

è Oracle Health Monitor (CHM)

- CHM 은 Grid Infrastructure 의 구성요소이며, Oracle Clusterware 및 OS resource 를 지속적으로 모니터링 및 저장함.

è Oracle Trace File Analyzer Collector

- Oracle Clusterware , Oracle Grid Infrastructure 및 Oracle RAC 시스템의 진단 데이터를 단순화하고, 클러스터 되지 않은 single instance 를 단순화 하는 진단 유틸리티임.

è Oracle Cluster Health Advisor

- 12.2 도입된 CHA 는 클러스터 노드와 Oracle RAC 데이터베이스의 성능 및 가용성 문제를 지속적으로 모니터링하 여 문제가 심각해지기 전에 문제에 대한 조기경고를 제공.

è Memory Guard

- Memroy Guard 는 클러스터 노드를 모니터링하여, Memory 부족으로 인한 노드 장애를 방재하는 RAC 환경 기능.

è Hang Manager

- Oracle RAC 환경에서 Hang Manager 기능은 자동으로 Hang 을 해결해주고, resource 를 이용할 수 있게 유지해줌.

- 11.1 부터 제공되었지만, 12.2 에서 새로운 기능으로 개선.

è Oracle Database Quality of Service (QOS) Management - applications 간에 resource 를 공유

1.2.1. Cluster Health Advisor (CHA)

è Oracle Cluster Health Advisor 는 12.2 에서 도입되었으며 성능 및 가용성 문제에 대한 지속적인 모니터링을 하고, 문제 발생 전에 조기 경보를 제공해줌.

è CHA 는 분석 결과, 진단 등은 GIMR 에 저장이 되며, 과거 문제를 검토할 수 있는 것도 GIMR 보존 기간에 따라 달라 지며, default retention 기간은 72 시간임.

è Oracle Cluster Health Advisor 은 각각의 노드에서 cluster resource (OCHAD) 로 실행되며, 각 cluster daemon(ochad)는 cluster 노드의 os 와 선택적으로 노드의 각 RAC Database Instance 를 모니터링함.

è Ochad 데몬은 각 database instance 에 연결이 필요하지 않으며, 각 모니터링 되는 database instance 에 대해 Oracle Health Advisor 1 분에 여러 번 상태를 분석함.

(4)

- 4 –

[그림 1] Oracle Cluster Health Advisor Architecture

1.2.2. CHA 활용방법

$ chactl status

monitoring nodes dotang3, dotang2, dotang1 monitoring databases dotang

è 실행 중인 모니터링 상태를 확인

$ chactl config

Databases monitored: dotang

$ chactl config database -db DOTANG Monitor: Enabled

Model: DEFAULT_DB

è 각 대상의 현재 모델과 모니터링 중인 대상 나열

$ crsctl stat res ora.chad -t

--- Name Target State Server State details --- Local Resources

--- ora.chad

ONLINE ONLINE dotang1 STABLE ONLINE ONLINE dotang2 STABLE ONLINE ONLINE dotang3 STABLE

è Cluster Health Advisor resource (ora.chad)

OS Data DB Data

CHM

OCHAD

GIMR

Node Health

Prognostics Engine

Database Health Prognostics Engine

(5)

- 5 –

$ chactl query model -name DEFAULT_CLUSTER -verbose Model: DEFAULT_CLUSTER

Target Type: CLUSTERWARE Version: 12.2.0.1_0 OS Calibrated on:

Calibration Target Name:

Calibration Date:

Calibration Time Ranges:

Calibration KPIs:

Used in Target: dotang

è DEFAULT_CLUSTER 의 상태 정도 출력

$ chactl query model -name DEFAULT_DB -verbose Model: DEFAULT_DB

Target Type: DATABASE Version: 12.2.0.1_0 OS Calibrated on:

Calibration Target Name:

Calibration Date:

Calibration Time Ranges:

Calibration KPIs:

Used in Target: dotang

è Model 의 세부 정보 확인

$ chactl query repository

specified max retention time(hrs): 72 available retention time(hrs) : 410 available number of entities : 17 allocated number of entities : 3 total repository size(gb) : 15.00 allocated repository size(gb) : 5.85

è repository 크기 및 retention 기간 확인

$ chactl set maxretention -time 84

max retention successfully set to 84 hours è repository retention 기간 변경

$ chactl query repository

specified max retention time(hrs): 84 available retention time(hrs) : 410 available number of entities : 14 allocated number of entities : 3 total repository size(gb) : 15.00 allocated repository size(gb) : 5.85

è Default retention 은 72 시간이며, maxretention 변경가능 è repository 확인

$ chactl -help

è chactl 명령의 상세 옵션 및 예시를 보여줌

(6)

- 6 – 1.2.3. CHA Model 관리

è Chactl calibrate 를 사용하여, 더 높은 감도와 정확도를 갖는 모델을 만들 수 있음.

$ Thu Feb 1 14:43:30 KST 2018

$ chactl calibrate database -db DOTANG -model TEST -timeranges 'start=2018-02-01 14:00:00,end=2018-02-01 15:00:00'

CLSCH-3729 : The number of data samples 549 is below the required number of data samples 720.

è DOTANG database 에 2018-02-01 14:00:00 ~ 2018-02-01 15:00:00 자료 수집시 최소 샘플 데이터 720 필요

$ Thu Feb 1 15:24:07 KST 2018

$ chactl calibrate database -db DOTANG -model test -timeranges 'start=2018-02-01 14:00:00,end=2018-02-01 15:00:00'

è DOTANG database 에 2018-02-01 14:00:00 ~ 2018-02-01 15:00:00 사이에 자료 수집

$ chactl query calibration -db DOTANG Database name : dotang

Start time : 2018-01-22 08:40:25 End time : 2018-01-22 12:00:00 Total Samples : 4792

Percentage of filtered data : 100%

1) Disk read (ASM) (Mbyte/sec)

MEAN MEDIAN STDDEV MIN MAX 0.01 0.00 0.15 0.00 6.09

<25 <50 <75 <100 >=100 100.00% 0.00% 0.00% 0.00% 0.00%

2) Disk write (ASM) (Mbyte/sec)

<50 <100 <150 <200 >=200 100.00% 0.00% 0.00% 0.00% 0.00%

3) Disk throughput (ASM) (IO/sec)

<5000 <10000 <15000 <20000 >=20000 100.00% 0.00% 0.00% 0.00% 0.00%

4) CPU utilization (total) (%)

<20 <40 <60 <80 >=80 94.89% 3.38% 1.44% 0.29% 0.00%

5) Database time (per user call) (usec/call) MEAN MEDIAN STDDEV MIN MAX 357.92 0.00 2392.91 0.00 56504.40

(7)

- 7 –

<10000000 <20000000 <30000000 <40000000 <50000000 <60000000 <70000000 >=70000000 100.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%

$ chactl query model

Models: DEFAULT_CLUSTER, DEFAULT_DB, WEEKDAY, test è Model 확인

1.2.4. Promblems and Diagnosis View

$ chactl query diagnosis -cluster

2018-02-01 13:10:05.0 Host dotang2 Host Memory Consumption [detected]

Problem: Host Memory Consumption

Description: CHA detected that more memory than expected is consumed on this server. The memory is not all ocated by sessions of this database.

Cause: The Cluster Health Advisor (CHA) detected an increase in memory consumption by other databases or b y applications not connected to a database on this node.

Action: Identify the top memory consumers by using the Cluster Health Monitor (CHM).

è GIMR 의 최신 분류를 위한 진단 정보, 수정조치 등을 저장할 수 있고, oracle clusterware 이벤트 통지 를 사용하여 EMCC 에 경고 메시지를 전송할 수 있음.

$ chactl query model -name test Model: test

Target Type: DATABASE Version: 12.2.0.1_0

OS Calibrated on: Linux amd64 Calibration Target Name: dotang Calibration Date: 2018-02-01 15:22:12

Calibration Time Ranges: start=2018-02-01 14:00:00,end=2018-02-01 15:00:00 Calibration KPIs: not specified

è Model 세부사항 확인

$ chactl monitor database -db DOTANG -model test

CLSCH-3637 : Database dotang is already being monitored.

è Monitoring 시작

1.2.5. Cluster Health Monitor (CHM)

è Chm 은 memory, swap 공간 사용, IO 사용 및 네트워크 관련 데이터와 OS 통계를 수집하며, 1 초에 한 번씩 정보를 수집함

è 기존부터 존재 했던 기능으로 12c 로 오면서 몇 가지의 파라미터가 추가.

1.2.6. CHM 추가된 파라미터

$ oclumon dumpnodeview -format csv

dumpnodeview: Node name not given. Querying for the local host ---

Node: dotang1 Clock: '2018-02-01 16.18.40+0900' SerialNo:2299

(8)

- 8 –

---

SYSTEM:

"#pcpus","#cores","#vcpus","cpuht","chipname","cpuusage[%]","cpusys[%]","cpuuser[%]","cpunice[%]","cpuiowa it[%]","cpusteal[%]","cpuq","physmemfree[KB]","physmemtotal[KB]","mcache[KB]","swapfree[KB]","swaptotal[KB ]","hugepagetotal","hugepagefree","hugepagesize","ior[KB/S]","iow[KB/S]","ios[#/S]","swpin[KB/S]","swpout[

KB/S]","pgin[#/S]","pgout[#/S]","netr[KB/S]","netw[KB/S]","#procs","#procsoncpu","#procs_blocked","#rtproc s","#rtprocsoncpu","#fds","#sysfdlimit","#disks","#nics","loadavg1","loadavg5","loadavg15","#nicErrors"

2,2,4,N,"Intel(R) Xeon(R) CPU E5-2699 v3 @

2.30GHz",5.29,2.10,3.18,0.00,0.20,0.00,0,759892,8175444,4562668,3858320,5119996,0,0,2048,68,168,33,0,0,64, 139,25.079,32.477,372,1,0,18,N/A,36096,6815744,16,4,0.36,0.58,0.49,0

$

oclumon dumpnodeview -format csv -system -cpu -topconsumer -dir '/home/grid’

è csv 파일로 저장 할 수 있는 옵션이 지원되면서, excel 문서로 확인이 가능

$ oclumon dumpnodeview -procag

dumpnodeview: Node name not given. Querying for the local host ---

Node: dotang1 Clock: '2018-02-01 16.19.55+0900' SerialNo:2314 ---

PROCESS AGGREGATE:

cpuusage[%] privatemem[KB] maxshmem[KB] #threads #fd #processes category sid 10.76 1760292 1099416 27 349 27 DBBG DOTANG11 21.19 3977024 354064 95 2688 91 DBBG DOTANG1 0.00 150120 63464 3 158 3 DBFG DOTANG1 21.29 704468 37212 33 2019 31 ASMBG +ASM1 0.94 342140 34244 13 528 13 ASMFG +ASM1 43.82 1976440 118880 404 3083 26 CLUST 1.98 226708 10192 291 1188 188 OTHER

è DBBG (DB backgrounds), MDBG (GIMR backgrounds), ASMBG (ASM backgrounds), DBFG (DB foregrounds) , MDBFG (GIMR foregrounds), ASMFG (ASM foregrounds), CLUST (cluster), OTHER (other processes)

$ oclumon dumpnodeview -help

è oclumon dumpnodeview 의 상세 옵션 및 예시를 보여줌

è 시작지점과 끝 지점을 지정해서 필요한 시간에만 출력 가능 지정하지 않을 시 많은 양의 데이터가 출력

1.2.7. Trace File Analyzer Collector

è Oracle Trace File Analyzer (TFA) Collector 는 Oracle Clusterware, Grid Infrastructure, RAC system 에 대한

(9)

- 9 –

진단 정보를 수집하는 유틸리티임.

è 12.2 출시되면서 Oracle Trace Analyzer 은 JRE 1.8 기반으로 구축되었으며, bash shell 은 더 이상 Trace File Analyzer 에 필요하지 않고, TFA 가 Java 에서 실행되므로 windows 플랫폼에서도 지원이 가능.

1.2.8. Tfactl 활용법

$ tfactl ips show incidents

Multiple ADR basepaths were found, please select one ...

( ) option[0] /grid/app/grid_base ( ) option[1] /oracle/app

( ) option[2] /oracle11/app

ADR Home = /grid/app/grid_base/diag/rdbms/_mgmtdb/-MGMTDB:

*************************************************************************

INCIDENT_ID PROBLEM_KEY CREATE_TIME --- --- --- 20113 ORA 700 [kskvmstatact: excessive swapping observed] 2018-01-09 17:31:50.218000 +09:00 20114 ORA 700 [kskvmstatact: excessive swapping observed] 2018-01-09 18:32:02.002000 +09:00 20115 ORA 700 [kskvmstatact: excessive swapping observed] 2018-01-09 19:32:20.529000 +09:00 24113 ORA 700 [kskvmstatact: excessive swapping observed] 2018-01-09 20:28:43.270000 +09:00

è incident 정보 출력

$ tfactl ips show problems

ADR Home = /grid/app/grid_base/diag/rdbms/_mgmtdb/-MGMTDB:

*************************************************************************

PROBLEM_ID PROBLEM_KEY LAST_INCIDENT LASTINC_TIME

--- --- --- 1 ORA 700 [kskvmstatact: excessive swapping observed] 24113 2018-01-09

20:28:43.270000 +09:00 è problems 정보 출력

$ tfactl managelogs -purge -older 2d -dryrun Output from host : dotang1

---

2018-02-02 10:06:33: INFO Estimating files older than 2 days

2018-02-02 10:06:33: INFO Estimating purge for diagnostic destination "diag/crs/dotang1/crs" for files ~ 4992 files deleted , 996.73 MB freed ]

Output from host : dotang2 ---

2018-02-02 10:06:37: INFO Estimating purge for diagnostic destination "diag/asm/+asm/+ASM2" for files ~ 544 files deleted , 17.29 MB freed ]

2018-02-02 10:06:37: MESSAGE Estimation for Grid Infrastructure [ Files to delete : ~ 5455 files | Space to be freed : ~ 1.35 GB ]

è 최근 2 일보다 오래된 파일의 크기 산정

(10)

- 10 –

2. Trace File Analyzer (TFA)

è Oracle Trace File Analyzer 는 진단 데이터를 수집하고 분석하는데 도움을 주며, Log 에 서비스에 잠재적으로 영 향을줄 수 있는 중대한 문제점을 모니터링 함.

2.1. Supported Environments

è Trace File Analyzer 을 지원하는 OS

- Linux OEL, RedHat, SuSE, Itanium, zLinux - Oracle Solaris SPARC, x86-64

- AIX

- HPUX Itanium, PA-RISC - Microsoft Windows 64-bit è Java 1.8 version 을 사용

è Oracle Clusterware 설치는 버전 11.2.0.4 , 12.1.0.2 부터 Oracle Trace File Analyzer 함께 제공 è MOS 1513912.1 에서 최신 TFA 를 다운 할 수 있음.

2.2. Oracle Trace File Analyzer Key Directories

Directory Description

Tfa/bin Oracle Clusterware 가 설치 된 경우 tfactl 명령어는 GRID_HOME/bin

에 설치

Tfa/repository Tfa 진단 정보를 저장

Tfa/node/tfa_home/database 시스템에 대한 데이터를 저장하는 Berkeley 데이터베이스

Tfa/node/tfa_home/diag Tfa 문제해결을 위한 도구

Tfa/node/tfa_home/diagnostics_to_collect 파일을 넣어 다음 컬렉션에 포함시킨 다음 나중에 삭제

Tfa/node/tfa_home/log Tfa 동작에 대한 Log

Tfa/node/tfa_home/resources Resource files 이 위치 Tfa/node/tfa_home/output 환경에 대한 Metadata 가 위치

2.3. Oracle Trace File Analyzer Command Interfaces

Interface Command How to use

Command-line $ tfactl command 명령 옵션을 지정

Shell interface $ tfactl Context 를 설정, 변경하고 Shell 에서 명령 수행 Menu Interface $ tfactl menu 메뉴 옵션을 선택한 다음 실행 명령

2.4. Securing access to Oracle Trace File Analyzer

è Tfactl 명령 실행은 권한이 부여 된 사용자로 제한 [root@dotang1 ~]# tfactl access lsusers

.---.

(11)

- 11 –

| TFA Users in dotang1 | +---+---+---+

| User Name | User Type | Status | +---+---+---+

| grid | USER | Allowed |

| oracle | USER | Allowed | '---+---+---'

.---.

è Tfa 접근 유저 확인

[root@dotang1 ~]# tfactl access add -user oracle11 Sucessfully added 'oracle11' to TFA Access list.

.---.

| oracle | USER | Allowed |

| oracle11 | USER | Allowed | '---+---+---'

è Tfa 사용 유저 추가

è Local 옵션을 사용하지 않는 경우 클러스터 전체에 적용

[root@dotang1 ~]# tfactl access remove -user oracle11 -local

Sucessfully removed 'oracle11' from TFA Access list.

.---.

è tfa 접근 하지 못하게 유저삭제

[root@dotang1 ~]# tfactl access removeall;

Sucessfully removed all users and groups from TFA Access list.

è tfa 등록된 모든 유저 삭제 [root@dotang1 ~]# tfactl access lsusers

(12)

- 12 –

No Users in TFA Access Manager list in dotang1.

[root@dotang1 ~]# tfactl access reset

Sucessfully restored to default TFA Access list.

è 모든 유저 접근상태를 Reset

[root@dotang1 ~]# tfactl access lsusers .---.

.---.

[root@dotang1 ~]# tfactl access enable

Enabling Access for Non-root Users on dotang1...

Enabling Access for Non-root Users on dotang2...

è 유저의 접근 enable

[root@dotang1 ~]# tfactl access lsusers;

.---.

(13)

- 13 –

[root@dotang1 ~]# tfactl access disable

è 유저의 접근 disable

Disabling Access for Non-root Users on dotang1...

[root@dotang1 ~]# tfactl access lsusers;

TFA for all Non-Root Users is currently disabled. Please enable it using 'tfactl access enable'.

.---.

| grid | USER | Disabled |

| oracle | USER | Disabled |

| oracle11 | USER | Disabled | '---+---+---'

.---.

| grid | USER | Disabled |

| oracle | USER | Disabled | '---+---+---'

(14)

- 14 – 2.5. Collecting Diagnostics Automatically

è TFA 는 문제를 발견하면 아래와 같이 수행

① 필요한 진단을 실행하고, 문제 발생시 모든 관련 로그 데이터를 수집

② Oracle Trace File Analyzer 이 진단에 필요한 것만 수집하도록 문제 발생 전후의 로그 파일을 정리

③ 클러스터 내의 모든 노드에서 진단 자료를 수집하고 패키지화하여 통합

④ TFA 저장소에 진단 내용을 저장

⑤ MOS 로 업로드할 준비가 된 문제점 및 세부사항을 이메일로 통지

2.6. Collecting Diagnostics and Analyzing Logs On-Demand

è Tfactl 명령을 사용하여 선택한 시간부터 모든 관련 로그 데이터를 수집하고, 진단에 필요한 것만 수집하 여 시간대의 로그 파일을 정리, 패키지화 할 수 있음.

(15)

- 15 – 2.6.1. Viewing System and Cluster Summary

[root@dotang1 ~]# tfactl summary -help

è tfactl summary 의 상세 옵션을 표시하며, crs, asm, database 등의 상세 정보를 확인할 수 있음

[root@dotang1 ~]# tfactl summary -overview

Executing Summary in Parallel on Following Nodes:

Node : dotang1 Node : dotang2

LOGFILE LOCATION :

/grid/app/grid_base/tfa/repository/suptools/dotang1/summary/root/20180314151413/log/summary_command_201803 14151413_dotang1_29568.log

Component Specific Summary collection : - Collecting CRS details ... Done.

- Collecting ASM details ... Done.

- Collecting ACFS details ... Done.

- Collecting DATABASE details ... Don

중 간 생 략

tfactl_summary>list

Components : Select Component - select [component_number|component_name]

1 => overview 2 => crs_overview 3 => asm_overview 4 => acfs_overview 5 => database_overview 6 => patch_overview 7 => listener_overview 8 => network_overview 9 => os_overview 10 => tfa_overview 11 => summary_overview

è 전체 summary 로 확인 가능하면 특정 옵션만 사용할 수도 있음.

[root@dotang1 ~]# tfactl summary -network

Executing Summary in Parallel on Following Nodes:

Node : dotang1 Node : dotang2

LOGFILE LOCATION :

/grid/app/grid_base/tfa/repository/suptools/dotang1/summary/root/20180314152328/log/summary_command_201803 14152328_dotang1_10551.log

Component Specific Summary collection : - Collecting NETWORK details ... Done.

Remote Summary Data Collection : In-Progress - Please wait ...

(16)

- 16 –

- Data Collection From Node - dotang2 .. Done.

Prepare Clusterwide Summary Overview ... Done cluster_status_summary

STATUS COMPONENT DETAILS +---+---+---+

OK NETWORK .---.

| CLUSTER_NETWORK_STATUS : | '---' +---+---+---+

### Entering in to SUMMARY Command-Line Interface ###

tfactl_summary>list

Components : Select Component - select [component_number|component_name]

1 => overview 2 => crs_overview 3 => asm_overview 4 => acfs_overview 5 => database_overview 6 => patch_overview 7 => listener_overview 8 => network_overview 9 => os_overview 10 => tfa_overview 11 => summary_overview

tfactl_summary>8

DETAILS STATUS_TYPE +---+---+

GOOD CLUSTER_NETWORK_STATUS +---+---+

tfactl_summary_networkoverview>list

Status Type: Select Status Type - select [status_type_number|status_type_name]

1 => network_clusterwide_status 2 => network_dotang1

3 => network_dotang2

tfactl_summary_networkoverview>1 STATUS_TYPE DETAILS +---+---+

CLUSTER_NETWORK_STATUS GOOD +---+---+

1 => network_clusterwide_status

(17)

- 17 –

tfactl_summary_networkoverview>2

=====> network_ocrcheck_details

| 1040 KB | 4 | 408528 KB | 409568 KB | 1765162140 | '---+---+---+---+---' +---+---+

=====> network_cluvfy_details

Observer CTSS_STATE +---+---+

=====> network_interface_details

TYPE DETAILS +---+---+

CLUSTER_INTERFACE .---.

| ens192 | public | 172.40.40.0 | global | | ens256 | cluster_interconnect,asm | 10.10.10.0 | global | '---+---+---+---' INTERFACE_LIST .---.

| INTERFACE_NAME | IP | +---+---+

| ens192 | 172.40.40.0 | | ens256 | 10.10.10.0 | | ens256 | 169.254.0.0 | | virbr0 | 192.168.122.0 | '---+---' +---+---+

1 => network_clusterwide_status 2 => network_dotang1

tfactl_summary_networkoverview>help

(18)

- 18 –

c|clear => Clear Console

q|quit => Quit Summary Command-Line Interface ~ => Summary Command-Line Interface Home h|help => Help

2.6.2. Investigating Logs for Errors

è TFA 를 사용하여 클러스터의 모든 로그를 분석하여 최근 오류 식별 [grid@dotang1 ~]$ tfactl analyze -last 10d

INFO: analyzing all (Alert and Unix System Logs) logs for the last 14400 minutes... Please wait...

INFO: analyzing host: dotang1

Report title: Analysis of Alert,System Logs Report date range: last ~10 day(s)

Report (default) time zone: KST - Korea Standard Time Analysis started at: 14-Mar-2018 04:27:16 PM KST Elapsed analysis time: 2 second(s).

Configuration file: /grid/app/12.2.0.1/grid/tfa/dotang1/tfa_home/ext/tnt/conf/tnt.prop Configuration group: all

Total message count: 42,605, from 09-Jan-2018 04:56:00 PM KST to 14-Mar-2018 04:25:01 PM KST

Messages matching last ~10 day(s): 24,402, from 04-Mar-2018 04:27:29 PM KST to 14-Mar-2018 04:25:01 PM KST

last ~10 day(s) error count: 5, from 05-Mar-2018 06:54:13 PM KST to 05-Mar-2018 07:40:20 PM KST

last ~10 day(s) ignored error count: 0 last ~10 day(s) unique error count: 5

Message types for last ~10 day(s)

Occurrences percent server name type --- --- --- --- 24,390 100.0% dotang1 generic 7 0.0% dotang1 WARNING 5 0.0% dotang1 ERROR --- ---

24,402 100.0%

Unique error messages for last ~10 day(s)

Occurrences percent server name error --- --- --- ---

1 20.0% dotang1 [OCSSD(15737)]CRS-1601: CSSD Reconfiguration complete. Active nodes are dotang1 dotang2 .

1 20.0% dotang1 [OCSSD(3870)]CRS-1601: CSSD Reconfiguration complete. Active nodes are dotang1 dotang2 dotang3 .

1 20.0% dotang1 [OCSSD(15737)]CRS-1601: CSSD Reconfiguration complete. Active nodes are dotang1 .

(19)

- 19 –

--- --- 5 100.0%

Messages matching last ~10 day(s): 26,755, from 05-Mar-2018 01:13:15 AM KST to 14-Mar-2018 04:25:01 PM KST

last ~10 day(s) error count: 7, from 05-Mar-2018 06:54:13 PM KST to 06-Mar-2018 04:50:30 PM KST

last ~10 day(s) ignored error count: 0 last ~10 day(s) unique error count: 7

Message types for last ~10 day(s)

Occurrences percent server name type --- --- --- --- 26,744 100.0% dotang2 generic 7 0.0% dotang2 ERROR 4 0.0% dotang2 WARNING --- ---

26,755 100.0%

Unique error messages for last ~10 day(s)

Occurrences percent server name error --- --- --- ---

1 14.3% dotang2 WARNING: Heavy swapping observed on system in last 5 mins.

Heavy swapping can lead to timeouts, poor performance, and instance eviction.

Errors in file

/oracle/app/diag/rdbms/dotang/DOTANG2/trace/DOTANG2_dbrm_32547.trc (incident=74355):

ORA-00700: soft internal error, arguments: [kskvmstatact:

excessive swapping observed], [], [], [], [], [], [], [], [], [], [], []

Incident details in:

/oracle/app/diag/rdbms/dotang/DOTANG2/incident/incdir_74355/DOTANG2_dbrm_32547_i74355.trc

1 14.3% dotang2 WARNING: Heavy swapping observed on system in last 5 mins.

Heavy swapping can lead to timeouts, poor performance, and instance eviction.

Errors in file

(20)

- 20 –

/oracle/app/diag/rdbms/tang/TANG/trace/TANG_dbrm_28098.trc (incident=2681):

ORA-00700: soft internal error, arguments: [kskvmstatact:

excessive swapping observed], [], [], [], [], [], [], [], [], [], [], []

Incident details in:

/oracle/app/diag/rdbms/tang/TANG/incident/incdir_2681/TANG_dbrm_28098_i2681.trc

--- --- 7 100.0%

è 10 일간 수집된 모든 에러 분석

[grid@dotang1 ~]$ tfactl analyze -search "ORA-700" -last 30d

Parameter: ORA-700

Messages matching last ~30 day(s): 38,722, from 12-Feb-2018 04:32:52 PM KST to 14-Mar-2018 04:31:40 PM KST

Matching regex: ORA-700 Case sensitive: false Match count: 0

Parameter: ORA-700

Total message count: 180,887, from 09-Jan-2018 05:05:14 PM KST to 14-Mar-2018 04:30:01

(21)

- 21 –

PM KST

Messages matching last ~30 day(s): 166,605, from 12-Feb-2018 04:33:06 PM KST to 14-Mar-2018 04:30:01 PM KST

Matching regex: ORA-700 Case sensitive: false Match count: 0

è 30 일(43200 분)동안 발생한 ora-700 에 대한 자료만 분석

2.6.3. Analyzing Logs Using the Included Tools

## Linux and UNIX 에 포함된 Tool

Tool Description

orachk or exachk TFA 는 Exachk 와 Orachk 를 설치함 MOS 1070954.1(Exachk) 1268927.2(Orachk)

oswatcher os 의 성능측정을 수집

MOS 301137.1

procwatcher database 성능 진단 및 session hang level 의 정보를 자동화하고, 캡처함 MOS 459694.1

oratop 실시간 데이터베이스 모니터링

MOS 1500864.1

alertsummary 모든 노드의 하나 이상의 database 또는 asm 에 alert.log 에 대한 이벤트 요약

ls TFA 가 모든 노드에서 주어진 파일 이름 패턴을 알고 있는 모든 파일을 나열

pstack 모든 노드에서 지정된 프로세스의 프로세스 스택을 생성

grep 지정된 데이터베이스가 있는 alert 이나 trace 파일에서 문자열 검색

summary 구성에 대한 high-level 에 요약을 제공

vi vi 편집기에서 alert, trace 파일을 볼 수 있음

tail alert.trace 파일을 확인

param 지정된 패턴과 일치하는 모든 데이터베이스 및 OS 의 Parameter 값 확인

dbglevel 한 명령으로 여러 CRS trace level 설정 및 해제

history tfactl shell 의 history 를 보여줌

changes 주어진 시간 동안 시스템 설정의 변경사항을 보고

calog cluster event 로그에서 주요 event 를 보고

events log 에 보여지는 warning 및 error 를 보고

managelogs 디스크 공간을 표시하고 ADR 로 log 및 trace 파일을 삭제

ps 프로세스 찾기

triage oswatcher or exawatcher data 를 요약

## Tool 확인

[root@dotang1 ~]# tfactl toolstatus

.---.

| TOOLS STATUS - HOST : dotang1 | +---+---+---+---+

(22)

- 22 –

| | oratop | 14.1.2 | DEPLOYED | +---+---+---+---+

| | prw | 12.1.13.11.4 | NOT RUNNING | +---+---+---+---+

| | ls | 12.2.1.1.0 | DEPLOYED |

| | ps | 12.2.1.1.0 | DEPLOYED |

| | vi | 12.2.1.1.0 | DEPLOYED | '---+---+---+---' Note :-

DEPLOYED : Installed and Available - To be configured or run interactively.

NOT RUNNING : Configured and Available - Currently turned off interactively.

RUNNING : Configured and Available.

è tfa 로 사용 가능한 tool 상태 정보 è 초기 설치시 구성상태

[root@dotang1 ~]# tfactl run orachk

tfa orachk : /grid/app/12.2.0.1/grid/tfa/dotang1/tfa_home/ext/orachk/orachk has version 20171212 suptools orachk : /grid/app/12.2.0.1/grid/suptools/orachk/orachk has version 0

TFA using orachk : /grid/app/12.2.0.1/grid/tfa/dotang1/tfa_home/ext/orachk/orachk

중 간 생 략

Searching for running databases . . . List of running databases registered in OCR

1. DOTANG 2. GOODUS 3. TANG

4. All of above 5. None of above

Select databases from list for checking best practices. For multiple databases, select 4 fo 1

(23)

- 23 –

Checking Status of Oracle Software Stack - Clusterware, ASM, RDBMS

. . . . . . . . . . . . . . . . . . . . --- Oracle Stack Status

--- Host Name CRS Installed RDBMS Installed CRS UP ASM UP RDBMS UP DB Instance Name --- dotang1 Yes Yes Yes Yes Yes DOTANG1 dotang2 Yes Yes Yes Yes Yes DOTANG2 dotang3 Yes Yes Yes Yes Yes DOTANG3

중 간 생 략

--- CLUSTERWIDE CHECKS

--- FAIL => cellinit.ora does not match across database servers --- Detailed report (html) -

/grid/app/grid_base/tfa/repository/suptools/dotang1/orachk/root/orachk_dotang1_DOTANG_031518_165746/orachl

UPLOAD [if required] -

/grid/app/grid_base/tfa/repository/suptools/dotang1/orachk/root/orachk_dotang1_DOTANG_031518_165746.zip è tfa 를 사용해서 orachk 사용 /tfa/repository 아래로 저장

tfactl> oratop -help

è oratop 의 상세 옵션 확인

tfactl> oratop -database DOTANG

Connecting ...

Processing ...

Oracle 12c - Primary DOTANG 17:38:38 up: 10d, 3 ins, 0 sn, 0 us, 6.8G mt, 0% frab ID %CPU LOAD %DCU AAS ASC ASI ASW ASP AST UST MBPS IOPS IORL LOGR PHYR PHYW %FR T 3 7 0 0 0 0 0 0 0 0 0 0 4 322u 0 0 0 6 71 1 9 0 0 0 0 0 0 1 0 0 0 4 434u 0 0 0 14 74 2 4 0 0 0 0 0 0 0 0 0 0 4 301u 0 0 0 18 76

EVENT (C) TOTAL WAITS TIME(s) AVS DB CPU 77619 RMA: IPC0 completion sync 2687294 52335 r db file sequential read 2898661 20636 O control file sequential read 9960092 18505 O db file scattered read 1778120 18007 O

ID SID SPID USERNAME PROGRAM SRV SERVICE PGA SQLID/BLOCKER OPN E/T STA STE T 3 259 5223 B/G RMS0 DED SYS$BAC 3.8M 10d ACT WAI m

è 위와 같은 OS Top 과 비슷한 모니터링 화면이 보임.

è 옵션에 따라 모니터링 내용이 달라짐

(24)

- 24 –

[oracle:dotang1:DOTANG1:/home/oracle]tfactl diagcollect -srdc ora700 Enter the time of the ORA-00700 [YYYY-MM-DD HH24:MI:SS,<RETURN>=ALL] : Enter the Database Name [<RETURN>=ALL] : DOTANG

1. Jan/11/2018 13:57:31 : [dotang] ORA-00700: soft internal error, arguments: [kskvmstatact]

Please choose the event : 1-3 [1]

Selected value is : 1 ( Jan/11/2018 13:57:31 )

Scripts to be run by this srdc: ipspack rdahcve1210 rdahcve1120 rdahcve1110 Components included in this srdc: OS CRS DATABASE NOCHMOS

Collecting data for local node(s)

Scanning files from Jan/11/2018 07:57:31 to Jan/11/2018 19:57:31 Collection Id : 20180315181338dotang1

Detailed Logging at :

/grid/app/grid_base/tfa/repository/srdc_ora700_collection_Thu_Mar_15_18_15_05_KST_2018_node_local/diagcoll ectg

2018/03/15 18:15:07 KST : NOTE : Any file or directory name containing the string .com will be renamed to replace .com with dotcom

2018/03/15 18:15:07 KST : Collection Name : tfa_srdc_ora700_Thu_Mar_15_18_15_05_KST_2018.zip 2018/03/15 18:15:07 KST : Collecting additional diagnostic information...

2018/03/15 18:15:07 KST : Scanning of files for Collection in progress...

2018/03/15 18:15:12 KST : Getting list of files satisfying time range [01/11/2018 07:57:31 KST, 01/11/2018 19:57:31 KST]

2018/03/15 18:15:43 KST : Collecting ADR incident files...

2018/03/15 18:16:15 KST : Completed collection of additional diagnostic information...

2018/03/15 18:16:19 KST : Completed Local Collection .---.

| Collection Summary | +---+---+---+---+

/grid/app/grid_base/tfa/repository/srdc_ora700_collection_Thu_Mar_15_18_15_05_KST_2018_node_local

/grid/app/grid_base/tfa/repository/srdc_ora700_collection_Thu_Mar_15_18_15_05_KST_2018_node_local/dotang1.

tfa_srdc_ora700_Thu_Mar_1p

è SR 시 필요한 자료를 한번에 수집

[oracle:dotang1:DOTANG1:/home/oracle]tfactl diagcollect -srdc dbperf Enter the Database Name [Required for this SRDC] : DOTANG

Do you have a performance issue now [Y|y|N|n] [Y]: y

As you have indicated that the performance issue is currently happening, will be collecting snapshots for the following periods:

Start time when the performance was bad: Mar/15/2018 17:23:21 Stop time when the performance was bad: Mar/15/2018 18:23:21

For comparison, it is useful to gather data from another period with similar load where problems are not seen. Typically this is li

(25)

- 25 –

Enter start time when the performance was good [YYYY-MM-DD HH24:MI:SS] : Baseline From time is mandatory for this SRDC

Enter start time when the performance was good [YYYY-MM-DD HH24:MI:SS] : Mar/10/2018 12:00:00 Start time when the performance was good Mar/10/2018 12:00:00

Enter stop time when the performance was good [YYYY-MM-DD HH24:MI:SS] : Mar/15/2018 12:00:00 Stop time when the performance was good Mar/15/2018 12:00:00

If any particular SQL causes the Database to be slow enter the SQL_ID ?(Refer to Doc 1627387.1 for more information on how to deter

Found 1 snapshot(s) for Bad Performance time range in DOTANG Found 120 snapshot(s) for baseline range in DOTANG

"Automatic Workload Repository (AWR) is a licensed feature.Refer to My Oracle Support Document ID 1490798.1 for more information"

Scripts to be run by this srdc: orachk_dbperf ipspack dbslow srdc_db_lfsdiag.sql Components included in this srdc: OS CRS DATABASE CHMOS

Collecting data for all nodes

Collection Id : 20180315182428dotang1

중 간 생 략

.---.

| dotang2 | Completed | 64MB | 333s |

| dotang1 | Completed | 52MB | 320s | '---+---+---+---'

Logs are being collected to:

/grid/app/grid_base/tfa/repository/srdc_dbperf_collection_Fri_Mar_16_09_15_53_KST_2018_node_all

/grid/app/grid_base/tfa/repository/srdc_dbperf_collection_Fri_Mar_16_09_15_53_KST_2018_node_all/dotang1.tf a_srdc_dbperf_Fri_Mar_16_09_15_53_KST_2018.zip

/grid/app/grid_base/tfa/repository/srdc_dbperf_collection_Fri_Mar_16_09_15_53_KST_2018_node_all/dotang2.tf a_srdc_dbperf_Fri_Mar_16_09_15_53_KST_2018.zip

è AWR 라이선스에 대한 이슈가 발생할 수 있으니 사용 시 주의

tfactl diagcollect -from “yyyy-mm-dd” -to “yyyy-mm-dd” 기간 설정으로도 가능하고, tfactl diagcollect - crs -os -node1 node2 -last 5h 특정 components 로도 수집 가능.

[root@dotang1 ~]# tfactl diagcollect -help è diagcollect 옵션을 확인

[root@dotang1 ~]# tfactl diagcollect -asm -node dotang1 -from "2018-03-15" -to "2018-03-16"

Collecting data for dotang1 node(s)

Scanning files from mar/15/2018 00:00:00 to mar/16/2018 23:59:59

중 간 생 략

.---.

(26)

- 26 –

| dotang1 | Completed | 76MB | 42s | '---+---+---+---'

Logs are being collected to:

/grid/app/grid_base/tfa/repository/collection_Fri_Mar_16_10_40_25_KST_2018_node_dotang1

/grid/app/grid_base/tfa/repository/collection_Fri_Mar_16_10_40_25_KST_2018_node_dotang1/dotang1.tfa_Fri_Ma r_16_10_40_25_KST_2018.zip

è 2018-03-15 ~ 2018-03-16 사이에 dotang1 노드에 대한 asm 진단 정보수집

2.6.4. One Command Service Request Data Collections

Type of Problem Available SRDCs Collection Scope

ORA Errors ORA-00020 ORA-04031

ORA-00060 ORA-07445 ORA-00600 ORA-27300 ORA-00700 ORA-27301 ORA-01555 ORA-27302 ORA-01628 ORA-29548 ORA-04030 ORA-30036

Local-only

Database performance Problems dbperf Cluster-wide

Database Resource Problems dbunixresources Local-only Other internal database errors internalerror Local-only Database patching problems dbpatchinstall

dbpatchconflict

Local-only

Database Export dbexp

dbexpdp dbexpdpapi dbexpdpperf dbexpdptts

Local-only

Database Import dbimp

dbimpdp dbimpdpperf

Local-only

RMAN dbrman600

dbrman8137_8120 dbrmanbackup dbrmancrossplatform dbrmanmaint

dbrmanperf dbrmanrr

Local-only

System change number dbscn Local-only

(27)

- 27 –

GolddenGate dbggclassicmode

dbggintegratedmode

Local-only

Database install / Upgrade Problems

dbinstall dbupgrade dbpreupgrade

Local-only

Database Storage Problems dbasm Local-only

2.6.5. SRDC Collections

Command What gets collected

tfactl diagcollect -srdc ORA-04031 IPS package Patch listing AWR report

Memory information RDA HCVE output tfactl diagcollect -srdc dbperf ADDM report

AWR for good period ans Problem period AWR Compare Period report

ASH report for good and problem period OSWatcher

IPS Package Oracle Orachk

2.6.6. Incident Packaging Service(IPS) Packages

Incident Packaging Service 는 나중에 진단 할 수 있도록 Oracle Database 의 ADR 에 저장된 문제점의 세부 사항을 패키지로 재공

TFA 를 통해서도 패키지 내용을 수집하기 위해서 IPS 를 실행할 수 있음.

## tfactl ips command Parameters

Command Description

tfactl ips Runs the IPS

tfactl ips show incidents shows all IPS incidents tfactl ips show problems shows all IPS problems tfactl ips show package shows all IPS Packages

tfactl diagcollect -ips -h shows all available diagcollect IPS options tfactl diagcollect -ips -adrbasepath

adr_base -adrhomepath adr_home

-silent 모드로 수집을 수행

tfactl diagcollect -ips -incident incident_id

특정 incident id 에 대해서 ADR details 수집

(28)

- 28 –

tfactl diagcollect -ips -problem

problem_id

특정 problem id 에 대해서 ADR details 수집

[oracle:dotang1:DOTANG1:/home/oracle]tfactl ips show incidents Multiple ADR basepaths were found, please select one ...

( ) option[0] /grid/app/grid_base ( ) option[1] /oracle/app

( ) option[2] /oracle11/app

Pls select an ADR basepath [0..2] ?1 /oracle/app was selected

ADR Home = /oracle/app/diag/clients/user_oracle/host_2067746558_107:

*************************************************************************

0 rows fetched

ADR Home = /oracle/app/diag/asmtool/user_oracle/host_2067746558_107:

*************************************************************************

0 rows fetched

ADR Home = /oracle/app/diag/tnslsnr/dotang1/listener_goodus:

*************************************************************************

0 rows fetched

ADR Home = /oracle/app/diag/rdbms/dotang/DOTANG1:

*************************************************************************

INCIDENT_ID PROBLEM_KEY CREATE_TIME --- --- ---

121 ORA 700 [kskvmstatact: excessive swapping observed] 2018-01-09 20:44:59.564000 +09:00

ADR Home = /oracle/app/diag/rdbms/goodus/GOODUS:

*************************************************************************

0 rows fetched

2.7. Managing and Configuring Oracle Trace File Analyzer

2.7.1. TFA 상태 및 구성

## print 명령을 사용하여 상태 또는 구성상태 확인

[oracle:dotang1:DOTANG1:/home/oracle]tfactl print config

.---.

| dotang1 | +---+---+

| Configuration Parameter | Value | +---+---+

| TFA Version | 18.1.1.0.0 |

| Java Version | 1.8 |

| Public IP Network | true |

| Automatic Diagnostic Collection | true |

| Alert Log Scan | true |

| Disk Usage Monitor | true |

(29)

- 29 –

| Managelogs Auto Purge | false |

| Trimming of files during diagcollection | true |

| Inventory Trace level | 1 |

| Collection Trace level | 1 |

| Scan Trace level | 1 |

| Other Trace level | 1 |

| Repository current size (MB) | 2770 |

| Repository maximum size (MB) | 10240 |

| Max Size of TFA Log (MB) | 50 |

| Max Number of TFA Logs | 10 |

| Max Size of Core File (MB) | 20 |

| Max Collection Size of Core Files (MB) | 200 |

| Minimum Free Space to enable Alert Log Scan (MB) | 500 |

| Time interval between consecutive Disk Usage Snapshot(minutes) | 60 |

| Time interval between consecutive Managelogs Auto Purge(minutes) | 60 |

| TFA IPS Pool Size | 5 | '---+---'

è tfa 구성상태를 출력

## 구성 설명

Configuration Listing Default Value Description

Trimming of files during diagcollection

True True / False

Repository maximum size 10GB or Filesystem 의 여부공간의 50% 보다 작은 크기

저장소가 될 수 있는 최대 크기

Trace Level 1 1,2,3,4 의 Level 로 설정 가능

최소 값은 1 이며, MOS 요청시에만 변경하는 것을 추천

Automatic Purging True 저장소 여유 공간이 1GB 미만이거나

저장소를 Close 하기전에 수행

2.7.2. Oracle Trace File Analyzer Daemon

## init controlfile 은 /etc/init.d/init.tfa 는 플랫폼에 따라 다름 tfactl start : TFA Daemon 을 시작

tfactl stop : TFA Daemon 정지

è TFA Daemon 실패하면, OS 에서 자동으로 Daemon 을 재시작 함.

## TFA Daemon 을 자동 재시작을 enable or disable 하는 방법 tfactl disable : TFA Daemon 의 자동 재시작을 Disable tfactl enable : TFA Daemon 의 자동 재시작을 enable

(30)

- 30 – 2.7.3. Repository 관리

## TFA 자동 관리 Purge

è TFA_HOME 경로의 여유공간이 100MB 보다 작으면 indexing 도 중지 è ORACLE_BASE 가 100MB 보다 작으면 indexing 도 중지

è Repository 여유공간이 1GB 보다 작아지면 정지

è Repository 의 현재 크기가 Repository 최대 크기보다 커지면 정지 (reposizeMB)

è TFA Daemon 은 여유 공간이 1GB 미만 또는 Repository 를 닫기전에 모니터링하고 자동으로 제거함.

è TFA 는 minagetopurge 이전의 수집된 것만 자동으로 제거하며, default 값은 12 [root@dotang1 ~]# tfactl set minagetopurge=48

Successfully set minFileAgeToPurge=48

.---.

| dotang1 | +---+---+

| Configuration Parameter | Value | +---+---+

| TFA Version | 18.1.1.0.0 |

| Java Version | 1.8 |

| Public IP Network | true |

| Automatic Diagnostic Collection | true |

| Alert Log Scan | true |

| Managelogs Auto Purge | false |

| Trimming of files during diagcollection | true |

| Inventory Trace level | 1 |

| Collection Trace level | 1 |

| Scan Trace level | 1 |

| Other Trace level | 1 |

| Repository current size (MB) | 2770 |

| Repository maximum size (MB) | 10240 |

| Max Size of TFA Log (MB) | 50 |

| Max Number of TFA Logs | 10 |

| Max Size of Core File (MB) | 20 |

| Max Collection Size of Core Files (MB) | 200 |

| Minimum Free Space to enable Alert Log Scan (MB) | 500 |

| Time interval between consecutive Disk Usage Snapshot(minutes) | 60 |

| TFA IPS Pool Size | 5 | '---+---'

è 자동 관리 삭제를 48 시간으로 변경

[root@dotang1 ~]# tfactl set autopurge=off Successfully set autoPurge=OFF

생 략

| Automatic Purging | false | è 자동 관리를 off

(31)

- 31 –

[root@dotang1 ~]# tfactl set repositorydir=/grid

Repository Max size (10240 MB) should be greater than Current Repository Size (21170 MB) No changes made to repository size.

.---.

| dotang1 | +---+---+

| Repository Parameter | Value | +---+---+

| Location | /grid/app/grid_base/tfa/repository |

| Maximum Size (MB) | 10240 |

| Current Size (MB) | 2770 |

| Free Space (MB) | 7470 |

| Status | OPEN | '---+---'

è respoitory 위치 경로 변경

[root@dotang1 ~]# tfactl set reposizeMB=20480

Repository size will consume more than 50% of available space in filesystem.

Do you wish to continue with the new size ? [Y/y/N/n] [N] y Successfully changed repository size

.---.

| Old Maximum Size (MB) | 10240 |

| New Maximum Size (MB) | 20480 |

| Status | OPEN | '---+---' è size 를 조정 filesystem 사용 가능한 공간의 50%이상을 소비

## Repository 수동 관리

[root@dotang1 ~]# tfactl print repository

.---.

| dotang1 | +---+---+

| Free Size (MB) | 7441 |

| Status | OPEN | '---+---'

.---.

| dotang2 | +---+---+

(32)

- 32 –

| Free Size (MB) | 9832 |

| Status | OPEN | '---+---'

è TFA Repository 상태 확인

[root@dotang1 ~]# tfactl print collections

.---

| Collection Id | Nodelist | Collection Time

| Collect

ion Details |

+---+---+---

| 1515487041511dotang1 | [dotang1] | Start Time: Tue Jan 09 17:21:49 KST 2018 | Tag:

|

| | |

| /grid/a

pp/grid_base/tfa/repository/collection_201 |

| | |

| 8_01_09

T17_21_49_node_dotang1 |

| Auto Collection | Initiating node: dotang1 | End Time: Tue Jan 09 17:37:21 KST 2018 | Zip: do

tang1.2018_01_09T17_31_49.zip |

| Events: [.*ORA-00700.*] | |

| Compone

nts: [os, rdbms] |

| Request User: root | |

| Zip Siz

e: 1102 |

| | |

| Time Ta

ken: 25 s |

[root@dotang1 ~]# tfactl purge -older 5h List of files in the repository older than 5h:

/grid/app/grid_base/tfa/repository/collection_2018_03_14T12_21_50_node_dotang1 /grid/app/grid_base/tfa/repository/collection_2018_01_09T17_21_49_node_dotang1 Do you want to delete the above files. [Y|y|N|n] [Y]: y

Deleting /grid/app/grid_base/tfa/repository/collection_2018_03_14T12_21_50_node_dotang1 ...Deleted.

Deleting /grid/app/grid_base/tfa/repository/collection_2018_01_09T17_21_49_node_dotang1 ...Deleted.

è 5 시간 보다 오래된 데이터들은 삭제

[root@dotang1 ~]# tfactl set trimfiles=on Successfully set trimfiles=ON

| Trimming of files during diagcollection | true | è Default 는 On 으로 설정되어 있음

è maxcorefilesize 는 default 20MB 이며, 설정시 지정된 값보다 큰 core 파일은 skip 함 è maxcorecollectionsize 에 지정된 값에 도달한 후 skip 함 (default=200m)

(33)

- 33 –

2.8. Database 와 Grid Infrastructure Diagnostic Data 관리

2.8.1. 자동 Diagnostic Repository Log 와 Trace Files 관리

## tfactl managelogs

[oracle:dotang1:DOTANG1:/home/oracle]tfactl managelogs -help è tfactl managelogs 의 상세 옵션 및 예시 확인

[root@dotang1 ~]# tfactl managelogs -purge -older 30d -dryrun Output from host : dotang1

---

2018-03-19 15:27:33: INFO Space is calculated in bytes [without round off]

중 간 생 략

2018-03-19 15:27:38: MESSAGE Estimation for Database Home : /oracle/app/product/12.2.0.1 [ Files to delete : ~ 152 files | Space to be freed : ~ 12.06 MB ]

2018-03-19 15:27:38: MESSAGE Estimation for Database Home : /oracle/app/product/12.2.0.1 [ Files to delete : ~ 0 files | Space to be freed : ~ 0 bytes ]

è 제거 할 파일 수와 예상 공간을 확인 하기 위해서 -dryrun 옵션을 사용

[root@dotang1 ~]# tfactl managelogs -purge -older 30d

2018-03-19 15:29:40: MESSAGE Grid Infrastructure : /grid/app/12.2.0.1/grid [ Files deleted : 521 files | Space Freed : 375.74 MB ]

.---.

| File System Variation : /grid/app/12.2.0.1/grid | +---+---+---+---+---+---+---+

| Before | /dev/sdb1 | 51473888 | 19796876 | 29039240 | 41% | /grid |

| After | /dev/sdb1 | 51473888 | 19412008 | 29424108 | 40% | /grid | è 파일을 제거하고 disk 공간을 정리

è tfactl managelogs -purge -older 30d -gi (database) 따로 설정 가능

[root@dotang1 ~]# tfactl managelogs -show usage Output from host : dotang1

.---.

| Grid Infrastructure Usage | +---+---+

| Location | Size | +---+---+

| /grid/app/grid_base/diag/crs/dotang1/crs/alert | 704.00 KB |

| /grid/app/grid_base/diag/crs/dotang1/crs/incident | 4.00 KB |

중 간 생 략

| /grid/app/grid_base/diag/asmtool/user_grid/host_2067746558_107/cdump | 4.00 KB |

| /grid/app/grid_base/diag/asmtool/user_grid/host_2067746558_107/log | 12.00 KB | +---+---+

| Total | 3.53 GB | '---+---'

(34)

- 34 –

.---.

| Database Homes Usage | +---+---+

| Location | Size | +---+---+

| /oracle/app/diag/tnslsnr/dotang1/listener_goodus/alert | 256.00 KB |

| /oracle/app/diag/tnslsnr/dotang1/listener_goodus/incident | 4.00 KB |

중 간 생 략

| /oracle/app/diag/rdbms/goodus/GOODUS/log | 24.00 KB | +---+---+

| Total | 207.22 MB | '---+---'

è 각각의 diagnostic 사용 공간 확인

2.8.2. disk usage snapshot 관리

[root@dotang1 usage_snapshot]# tfactl set diskUsageMonInterval=50 Successfully set diskUsageMonInterval=50

| Time interval between consecutive Disk Usage Snapshot(minutes) | 50 | è TFA 디스크 사용 스냅샷을 관리하며, default 시간 간격은 60

è tfa/repository/suptools/node/managelogs/usage_snapshot 에 저장

[root@dotang1 usage_snapshot]# tfactl set diskUsageMon=on Successfully set diskUsageMon=ON

| Disk Usage Monitor | true | è disk 사용량 모니터는 on | off 가능 하면 default 는 on

2.8.3. TFA Log 자동으로 Purge

[root@dotang1 usage_snapshot]# tfactl set manageLogsAutoPurge=on Successfully set manageLogsAutoPurge=ON

| Logs older than the time period will be auto purged(days[d]|hours[h]) | 30d |

[root@dotang1 usage_snapshot]# tfactl set manageLogsAutoPurgePolicyAge=20d Successfully set manageLogsAutoPurgePolicyAge=20d

| Logs older than the time period will be auto purged(days[d]|hours[h]) | 20d |

[root@dotang1 usage_snapshot]# tfactl set manageLogsAutoPurgeInterval=50 Successfully set manageLogsAutoPurgeInterval=50

(35)

- 35 –

3. Cluster Health Monitor

Cluster Health Monitor (CHM) 는 시스템 모니터(osysmond) 및 클러스터 로그(ologgerd) 서비스를 사용하여 데이터 를 수집함.

è 시스템 모니터 서비스는 (osysmond) 는 각 클러스터 노드에서 실행되는 실시간 모니터링 및 os 정보를 수 집하는 서비스 이며, 고 가용성 서비스 (HAS) 리소스로 관리, 수집 된 metric 을 클러스터 로그 프로그램 서비스 인 ologgerd 로 전달하고, Grid Infrastructure 저장소 데이터베이스에 데이터를 저장함.

è 클러스터 로그 프로그램(ologgerd)은 Oracle Grid Infrastructure 관리 저장소 데이터베이스에서 osysmond 가 수집한 데이터를 보관하며, cluster 에는 32node 당 하나의 ologgerd 가 있음

ologgerd 는 고정 된 횟수만큼 재시도 후 다시 시작할 수 없거나, 실행되고 있는 노드가 다운됬을 때 다른 노드로 서비스를 재배치함

3.1. CHM Data 수집

[root@dotang3 ~]# oclumon manage -get master Master = dotang3

[root@dotang2 ~]# diagcollection.pl --collect

The following CRS diagnostic archives will be created in the local directory.

crsData_dotang2_20180320_0913.tar.gz -> logs,traces and cores from CRS home. Note: core files will be packaged only with the --core option.

중 간 생 략

Collecting OS logs Collecting sysconfig data

è 노드에서 grid_home/bin/diagcollection.pl 스크립트를 실행하여, 모든 노드에서 CHM 데이터를 수집 è root 권한이 있는 사용자로 실행해야하며, 모든 노드에서 스크립트를 실행하는 것이 좋음

3.2. CHM 모니터링

oclumon 을 사용하여, chm 모니터링을 사용할 수 있으며, debug, version, dumpnoview 등의 옵션을 사용.

3.2.1. oclumon debug

oclumon debug 명령을 사용하여 CHM log level 을 설정 [root@dotang3 ~]# oclumon debug log osysmond CRFMOND:3 osysmond Module: CRFMOND Log Level: 3

è osysmond level 을 설정

[root@dotang3 ~]# oclumon debug version OCLUMON version :0.02

OSYSMOND version :12.01 OLOGGERD version :2.01 NODEVIEW version :12.01

(36)

- 36 –

Clusterware version - label date:

12.2.0.1.0 – 161111 è chm version 정보 확인

[root@dotang3 ~]# oclumon debug -help è oclumon debug 의 상세 옵션 확인

## oclumon debug Parameter

Parameter 설 명

log daemon module:log_level daemon 의 모듈과 log level 을 변경 지원 가능한 daemon

osysmond ologgerd client all

지원 가능한 daemon 모듈

osysmond : CRFMOND, CRFM, allcomp

ologgerd : CRFLOGD, CRFLDREP, CRFM, allcomp client : OCLUMON, CRFM, allcomp

all : allcomp

지원 가능한 log_level : 0,1,2,3

version daemon 의 version 정보 출력

3.2.2. oclumon dumpnodeview

oclumon dumpnodeview 를 사용하여 시스템으로부터 로그 정보를 확인

è SYSTEM : CPU Count , CPU 사용량 , Memory 사용량 같은 시스템 metrics 를 나열 è TOP CONSUMERS : 가장 많이 사용하는 프로세스를 나열

è CPUS : 각각 CPU 의 통계를 나열

è PROCESSES : PID, name, thread 수 , 메모리 사용량 과 같은 프로세스 metrics 를 나열 è DEVICES : Disk read,write 속도, I/O 대기시간과 같은 device metrics 를 나열 è NICS : 네트워크 수신 및 전송 속도, 유효한 대역폭 등의 NIC metrics 를 나열 è FILESYSTEM : 이용가능한 공간, 사용량 , 총 Disk 등의 Filesystem metrics 를 나열 è PROTOCOL ERRORS : protocol 에러를 나열

[root@dotang1 ~]# oclumon dumpnodeview -n dotang2 -last "00:10:00" -i 15 ---

Node: dotang2 Clock: '2018-03-21 13.14.05+0900' SerialNo:86107 ---

SYSTEM:

#pcpus: 2 #cores: 2 #vcpus: 4 cpuht: N chipname: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz cpuusage: 3.44 cpusystem: 1.18 cpuuser: 2.26 cpunice: 0.00 cpuiowait: 0.05 cpusteal: 0.00 cpuq: 0 physmemfree: 342348 physmemtotal: 8175444 mcache: 4909160 swapfree: 4526972 swaptotal: 5119996 hugepagetotal: 0 hugepagefree: