Look in:

Web oracle-core-dba.blogspot.com
Showing posts with label Links. Show all posts
Showing posts with label Links. Show all posts

Sunday, April 22, 2007

Moving, Rebuilding and or Resizing Indexes

As this procedure deals only with indexes, I consider it a very low risk process. The main source of risk if any is if your target tablespace for the index is either low on available space or is very fragmented. Just because the index was removed from a tablespace does not mean it can be recreated in the same tablespace. Unless there is sufficient free space and or contiguous extents in the target tablespace the index may not recreate.

Why Rebuild ?
The reasons for moving and or resizing indexes are relatively self explanatory, however, it is worth explaining why you would want to regularly rebuild some indexes.
When records are added to a table, the table and index grows by the number of records added. When records are deleted from a table the records are physically deleted from the table but are logically deleted from the index.
Table Index
Task count count
------------------------------------
Add 100 Records 100 100
Delete 100 Records 0 100
Add 100 Records 100 200
delete 100 Records 0 200
As you can see from the above example, the index will continue to grow, where as the table may remain static.

How to identify indexes with "excessive" logically deleted rows.
The first place to look is at any dynamic table I.E. Interface and temporary tables.
Once the target tables have been identified, obtain a list of index names by using the following example SQL*PLUS script.
SELECT index_name
FROM dba_indexes
WHERE owner = 'SCOTT'
and table_name = 'EMP'
/
For each of the indexes identified, execute the validate index command followed by the SQL*PLUS script shown below to obtain the results of the validate index command. Note: The index_stats table holds only one row.
SQL> validate index gl.gl_interface_n1
/

Index analyzed.

SQL>
Use the following SQL*PLUS script to obtain the results.
column A heading "Index Name" format a30
column B heading "Rows" format 999,999,990
column C heading "Deleted|Rows" format 999,999,990
column D heading "% Del" format 990.0

SELECT name A,
lf_rows B,
del_lf_rows C,
(del_lf_rows * 100 ) / lf_rows D
FROM index_stats
/
I usually rebuild indexes that have more than 20 % logically deleted rows.

Rebuilding an index.
The following PL*SQL script will build an SQL*PLUS script to drop and then recreate the index.
Note: The storage clause and the index owner's password in the output file will have to be modified before you execute the resulting script. Modifying the storage clause will allow you to resize and or move the index.


clear columns

set serveroutput on;
set echo off
set heading off
set feedback off
set verify off

WHENEVER SQLERROR CONTINUE

/**************************************/
/** OBTAIN COMMAND LINE PARAMETERS **/
/** OWNER = Index owner **/
/** INDEX_NAME = Index name **/
/**************************************/

DEFINE owner = '&1'
DEFINE index_name = '&2'

DECLARE
/**************************************/
/** CURSOR C1 Get the index columns **/
/**************************************/
CURSOR C1 is
SELECT column_name
FROM dba_ind_columns
WHERE index_owner = upper('&&owner')
and index_name = upper('&&index_name')
ORDER by column_position;

/**************************************/
/** Decalre variables **/
/**************************************/
d_column_name VARCHAR(30);
d_table_name VARCHAR(30);
d_uniqe VARCHAR(9);
d_max_columns NUMBER(4) := 0;
d_column_counter NUMBER(4) := 0;

BEGIN

dbms_output.put_line('/****************************************/' );
dbms_output.put_line('/* Drop and create index script */' );
dbms_output.put_line('/* */' );
dbms_output.put_line('/* Dont forget to edit this file for: */' );
dbms_output.put_line('/* */' );
dbms_output.put_line('/* 1. Connect password */' );
dbms_output.put_line('/* 2. Storage params */' );
dbms_output.put_line('/* */' );
dbms_output.put_line('/****************************************/' );
dbms_output.put_line('');
dbms_output.put_line('');

dbms_output.put_line('connect &&owner/password' );
dbms_output.put_line('');
dbms_output.put_line('');

dbms_output.put_line('DROP INDEX &&index_name;' );
dbms_output.put_line('');
dbms_output.put_line('');

/**************************************/
/** Determine the number of columns **/
/** in the index **/
/**************************************/

SELECT max(column_position)
INTO d_max_columns
FROM dba_ind_columns
WHERE index_owner = upper('&&owner')
and index_name = upper('&&index_name');

SELECT table_name, uniqueness
INTO d_table_name, d_uniqe
FROM dba_indexes
WHERE owner = upper('&&owner')
and index_name = upper('&&index_name');

/**************************************/
/** Determine if the index is uniqe **/
/**************************************/
SELECT uniqueness
INTO d_uniqe
FROM dba_indexes
WHERE owner = upper('&&owner')
and index_name = upper('&&index_name');

dbms_output.put_line('CREATE '|| d_uniqe ||' INDEX &&index_name' );
dbms_output.put_line('ON '||d_table_name );
dbms_output.put_line('(');

OPEN C1;
LOOP
FETCH C1 INTO d_column_name;
d_column_counter := d_column_counter + 1;
EXIT WHEN C1%NOTFOUND;
IF d_column_counter < d_max_columns THEN
dbms_output.put_line( d_column_name||',' );
ELSE
dbms_output.put_line( d_column_name||' )' );
END IF;
END LOOP;

dbms_output.put_line('STORAGE ( initial X M next X M');
dbms_output.put_line(' minextents 1 maxextents 50');
dbms_output.put_line(' pctincrease 0 );');

END;
/


NOTE 1: If the index is being used by another user, you will not be able to either validate or drop the index.
NOTE 2: Beware of the primary key constraint.
General details about the index can also be found by:
Analyze index compute statistics;
Select * from user_indexes
where index_name= ‘’;
To obtain further detail about an index:
Analyze index validate structure;
The command:
Validate index ;
Performs the same function.
This places detailed information about the index in the table INDEX_STATS. This table can only contain one row, describing only the one index. This SQL also verifies the integrity of each data block in the index and checks for block corruption.
For example, to get the size of an index:
validate index ;
select name "INDEX NAME", blocks * 8192 "BYTES ALLOCATED",
btree_space "BYTES USED",
(btree_space / (blocks * 8192))*100 "PERCENT USED"
from index_stats;
This assumes a block size of 8K (i.e. 8192 bytes). It shows the number of bytes allocated to the index and the number of bytes actually used.
Note that it does not confirm that each row in the table has an index entry or that each index entry points to a row in the table. To check this:
Analyze table validate structure cascade;

Systemwide Tuning using StatsPack Reports

Systemwide Tuning using StatsPack Reports


PURPOSE
This article is a reference to understand the output generated by the STATSPACK utility. Since performance tuning is a very broad area this document only provide tuning advice in very specific areas. Several documents are available in Metalink to resolve contention in specific resources. The following resources are available to find specific documentation related to a performance topic:
• Metalink Database Performance Technical Library
• Oracle9i Database Performance Tuning Guide and Reference
• OTN Performance Technical Library

CONTENT

• Introduction
• Summary Information
• Instance cache information
• Load profile Information
• Instance Efficiency Ratios
• Top 5 Events section
• Cluster Statistics
• Foreground Wait Events
• Background Wait Events
• Notes Regarding Waitevents
• SQL Information
• Statistics
• IO Activity
• Buffer cache Activity Information
• Instance Recovery Statistics
• PGA Memory Statistics
• Enqueue Activity
• Undo (Rollback) Information
• Latch Information
• Dictionary Cache Statistics
• Library Cache Statistics
• SGA Memory Summary
• SGA Memory Detail
• Init.ora Parameters Summary

Introduction
StatsPack was created in response to a need for more relevant and more extensive statistical reporting beyond what was available via UTLBSTAT/UTLESTAT reports. Further, this information can be stored permanently in the database instance itself so that historical data is always available for comparison and diagnosis.
Statspack has been available since version 816, but can be installed on 806 and above. Snapshots created using older versions of statspack can usually be read using newer versions of Statspack although the newer features will not be available.
See the following notes for information on installing, configuring snapshots, and generating reports:
- Installing and Configuring StatsPack Package
- Gathering a StatsPack snapshot
- Creating a StatsPack performance report
- FAQ- StatsPack Complete Reference
Timed_statistics must be set to true prior to the creation of a snapshot. If it is not, the data within statspack will not be relevant. You can tell if timed_statistics was not set by looking at the total times columns in the report. If these are zero then timed_statistics was not set.
Snapshots during which the instance was recycled will not contain accurate information and should not be included in a statspack report.
In general, we suggest that snapshots intervals be 15 minutes in length. This allows fine-grained reporting when hangs are suspected/detected. The snapshots can also be combined into hourly reports for general performance tuning.
When a value is too large for the statspack field it will be represented by a series of pound signs such as #######. Should this occur and you need to see the value in the field you will need to decrease the number of snapshots in the report until the field can be read. Should there only be one snapshot in the report, then you will need to decrease the snapshot interval.
Profiles created using statspack information are quite helpful in determining long-term trends such as load increases, usage trends, resource consumption, latch activity, etc. It is especially important that a DBA know these things and be able to demonstrate changes in them that necessitate hardware improvements and load balancing policies. This document will describe the main sections of an statspack report, which will help to understand what information is available to diagnose and resolve performance tuning problems. Some of the sections of the statspack report may contain different information based on the Statspack release that was used to generate the report. This document will also indicate these changes for the different sections.

Summary Information
The summary information begins with the identification of the database on which the statspack report was run along with the time interval of the statspack report. Here is the 8i instance information:
STATSPACK report for

DB Name DB Id Instance Inst Num Release OPS Host
------------ ----------- ------------ -------- ----------- --- ------------
PHS2 975244035 phs2 2 8.1.7.2.0 YES leo2

Snap Id Snap Time Sessions
------- ------------------ --------
Begin Snap: 100 03-Jan-02 08:00:01 #######
End Snap: 104 03-Jan-02 09:00:01 #######
Elapsed: 60.00 (mins)

The database name, id, instance name, instance number if OPS is being utilized, Oracle binary release information, host name and snapshot information are provided.
Note that here the number of sessions during the snapshot was too large for the sessions field and so the overflow symbol is displayed.
Here is an example of an 806 instance using statspack:
STATSPACK report for

DB Name DB Id Instance Inst Num Release OPS Host
---------- ----------- ---------- -------- ---------- ---- ----------
GLOVP 1409723819 glovp 1 8.0.6.1.0 NO shiver

Snap Length
Start Id End Id Start Time End Time (Minutes)
-------- -------- -------------------- -------------------- -----------
454 455 07-Jan-03 05:28:20 07-Jan-03 06:07:53 39.55
Here is the 9i instance information. Note that the OPS column is now entitled 'Cluster' to accommodate the newer Real Applications Cluster (RAC) terminology and that the Cursors/Session and Comment columns have been added.
STATSPACK report for

DB Name DB Id Instance Inst Num Release Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
ETSPRD7 1415901831 etsprd7a 1 9.2.0.2.0 YES tsonode1

Snap Id Snap Time Sessions Curs/Sess Comment
------- ------------------ -------- --------- -------------------
Begin Snap: 20 03-Jan-03 00:00:05 ####### .0
End Snap: 21 03-Jan-03 01:00:05 ####### .0
Elapsed: 60.00 (mins)


Instance Workload Information
Every statspack report start with a section that describes the instance's workload profile and instance metrics that may help to determine the instance efficiency.
- Instance cache information:
In the 8i report the buffer cache size can be determined by multiplying the db_block_buffers by the db_block_size.
Cache Sizes
~~~~~~~~~~~
db_block_buffers: 6400 log_buffer: 104857600
db_block_size: 32768 shared_pool_size: 150000000


In 9i this has been done for you. Std Block size indicates the primary block size of the instance.
Cache Sizes (end)
~~~~~~~~~~~~~~~~~
Buffer Cache: 704M Std Block Size: 8K
Shared Pool Size: 256M Log Buffer: 1,024K
Note that the buffer cache size is that of the standard buffer cache. If you have multiple buffer caches, you will need to calculate the others separately.
- Load profile Information:
The load profile information is next. It is identical in both 8i and 9i.
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 351,530.67 7,007.37
Logical reads: 5,449.81 108.64
Block changes: 1,042.0 8 20.77
Physical reads: 37.71 0.75
Physical writes: 134.68 2.68
User calls: 1,254.72 25.01
Parses: 4.92 0.10
Hard parses: 0.02 0.00
Sorts: 15.73 0.31
Logons: -0.01 0.00
Executes: 473.73 9.44
Transactions: 50.17

% Blocks changed per Read: 19.12 Recursive Call %: 4.71
Rollback per transaction %: 2.24 Rows per Sort: 20.91
Where:
. Redo size: This is the amount of redo generated during this report.
. Logical Reads: This is calculated as Consistent Gets + DB Block Gets = Logical Reads
. Block changes: The number of blocks modified during the sample interval
. Physical Reads: The number of requests for a block that caused a physical I/O.
. Physical Writes: The number of physical writes issued.
. User Calls: The number of queries generated
. Parses: Total of all parses: both hard and soft
. Hard Parses: Those parses requiring a completely new parse of the SQL statement. These consume both latches and shared pool area.
. Soft Parses: Not listed but derived by subtracting the hard parses from parses. A soft parse reuses a previous hard parse and hence consumes far fewer resources.
. Sorts, Logons, Executes and Transactions are all self explanatory
- Instance Efficiency Ratios:
Hit ratios are calculations that may provide information regarding different structures and operations in the Oracle instance. Database tuning never must be driven by hit ratios. They only provide additional information to understand how the instance is operating. For example, in a DSS systems a low cache hit ratio may be acceptable due the amount of recycling needed due the large volume of data accesed. So if you increase the size of the buffer cache based on this number, the corrective action may not take affect and you may be wasting memory resources.
See - THE COE PERFORMANCE METHOD , for further reference on how to approach a performance tuning problem.
This section is identical in 8i and 9i.
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.99 Redo NoWait %: 100.00
Buffer Hit %: -45.57 In-memory Sort %: 97.55
Library Hit %: 99.89 Soft Parse %: 99.72
Execute to Parse %: -1.75 Latch Hit %: 99.11
Parse CPU to Parse Elapsd %: 52.66 % Non-Parse CPU: 99.99

Shared Pool Statistics Begin End
------ ------
Memory Usage %: 42.07 43.53
% SQL with executions>1: 73.79 75.08
% Memory for SQL w/exec>1: 76.93 77.64

It is possible for both the 'buffer hit ratio' and the 'execute to parse' ratios to be negative. In the case of the buffer hit ration, the buffer cache is too small and the data in is being aged out before it can be used so it must be retrieved again. This is a form of thrashing which degrades performance immensely.
The execute to parse ratio can be negative when the number of parses is larger than the number of executions. The Execute to Parse ratio is determined by the following formula:

100 * (1 - Parses/Executions) = Execute to Parse
Here this becomes:
100 * (1 - 42,757 / 42,023 ) = 100 * (1 - 1.0175) = 100* -0.0175 = -1.75

This can be caused by the snapshot boundary occurring during a period of high parsing so that the executions have not occurred before the end of the snapshot. Check the next snapshot to see if there are enough executes to account for the parses in this report.
Another cause for a negative execute to parse ratio is if the shared pool is too small and queries are aging out of the shared pool and need to be reparsed. This is another form of thrashing which also degrades performance tremendously.
- Top 5 Events section:
This section shows the Top 5 timed events that must be considered to focus the tuning efforts. Before Oracle 9.2 this section was called "Top 5 Wait Events". It was renamed in Oracle 9.2 to "Top 5 Timed Events" to include the "CPU Time" based on the 'CPU used by this session'. This information will allow you to determine SQL tuning problems.
For further see the Statspack readme file called $ORACLE_HOME/rdbms/admin/spdoc.txt. These events are particularly useful in determining which sections to view next. For instance if there are fairly high waits on latch free or one of the other latches you might want to examine the latch sections first. On the other hand, if the db file read waits events seem abnormally high, you might want to look at the file io section first.
Top 5 Wait Events
~~~~~~~~~~~~~~~~~ Wait % Total
Event Waits Time (cs) Wt Time
-------------------------------------------- ------------ ------------ -------
db file sequential read 12,131,221 173,910 58.04
db file scattered read 93,310 86,884 29.00
log file sync 18,629 9,033 3.01
log file parallel write 18,559 8,449 2.82
buffer busy waits 304,461 7,958 2.66


Notice that in Oracle 9.2 references are made "Elapsed Time" rather than to "Wait Time". Also the "CPU Time" is included as part of the Top events section.
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
log file sync 3,223,927 32,481 64.05
CPU time 7,121 14.04
global cache open x 517,153 3,130 6.17
log file parallel write 985,732 2,613 5.15
KJC: Wait for msg sends to complete 568,061 1,794 3.54
-------------------------------------------------------------
Note that db file scattered and sequential read are generally the top wait events when the instance is tuned well and not OPS/RAC. Wait Events
Cluster Statistics
In Oracle 9i with the introduction of real Application Clusters, several sections were added to the statspack report to show information related to cluster database environment. The following sections are now available in statspack to monitor RAC environments and are only displayed when a cluster is detected.
Oracle 9.0 and 9.1 Cluster Statistics :
Global Lock Statistics
----------------------
Ave global lock get time (ms): 0.3
Ave global lock convert time (ms): 0.0
Ratio of global lock gets vs global lock releases: 1.0

Global cache statistics
-----------------------
Global cache hit %: 0.3
Ave global cache get time (ms): 1.7
Ave global cache convert time (ms): 3.1

Cache fusion statistics
-----------------------
Ave time to process CR block request (ms): 0.2
Ave receive time for CR block (ms): 1.6
Ave build time for CR block (ms): 0.1
Ave flush time for CR block (ms): 0.0
Ave send time for CR block (ms): 0.1

Ave time to process current block request (ms): 0.2
Ave receive time for current block (ms): 2.5
Ave pin time for current block (ms): 0.0
Ave flush time for current block (ms): 0.0
Ave send time for current block (ms): 0.1

GCS and GES statistics
----------------------
Ave GCS message process time (ms): 0.1
Ave GES message process time (ms): 0.1
% of direct sent messages: 59.5
% of indirect sent messages: 40.3
% of flow controlled messages: 0.1
% of GCS messages received by LMD: 96.4
% of GES messages received by LMD: 3.6
% of blocked converts: 10.3
Ave number of logical side channel messages: 33.8
Ave number of logical recovery claim messages:
Oracle 9.2 Cluster Statistics :
Global Cache Service - Workload Characteristics
-----------------------------------------------
Ave global cache get time (ms): 4.6
Ave global cache convert time (ms): 20.2

Ave build time for CR block (ms): 0.0
Ave flush time for CR block (ms): 0.6
Ave send time for CR block (ms): 0.1
Ave time to process CR block request (ms): 0.7
Ave receive time for CR block (ms): 0.9

Ave pin time for current block (ms): 2.9
Ave flush time for current block (ms): 0.1
Ave send time for current block (ms): 0.1
Ave time to process current block request (ms): 3.1
Ave receive time for current block (ms): 7.2

Global cache hit ratio: 0.8
Ratio of current block defers: 0.0
% of messages sent for buffer gets: 0.5
% of remote buffer gets: 0.4
Ratio of I/O for coherence: 12.3
Ratio of local vs remote work: 1.2
Ratio of fusion vs physical writes: 0.0

Global Enqueue Service Statistics
---------------------------------
Ave global lock get time (ms): 0.2
Ave global lock convert time (ms): 2.3
Ratio of global lock gets vs global lock releases: 1.0

GCS and GES Messaging statistics
--------------------------------
Ave message sent queue time (ms): 0.1
Ave message sent queue time on ksxp (ms): 12.3
Ave message received queue time (ms): 0.0
Ave GCS message process time (ms): 0.1
Ave GES message process time (ms): 0.0
% of direct sent messages: 81.2
% of indirect sent messages: 13.1
% of flow controlled messages: 5.7
In all the Oracle9i release, a separate section shows the actual value for all the cluster statistics:
GES Statistics for DB: FUSION Instance: ecfsc2 Snaps: 161 -162

Statistic Total per Second per Trans
--------------------------------- ---------------- ------------ ------------
dynamically allocated gcs resourc 0 0.0 0.0
dynamically allocated gcs shadows 0 0.0 0.0
flow control messages received 0 0.0 0.0
flow control messages sent 10 0.0 0.0
gcs ast xid 30 0.0 0.0
gcs blocked converts 531,572 147.7 0.2
gcs blocked cr converts 55,739 15.5 0.0
gcs compatible basts 45 0.0 0.0
gcs compatible cr basts (global) 6,183 1.7 0.0
....
For further reference on tuning RAC clustered instances please refer to the documentation manual called Oracle9i Real Application Clusters Deployment and Performance
Wait Events Information
The following section will describe in detail most of the sections provided in a statspack report.
- Foreground Wait Events:
Foreground wait events are those associated with a session or client process waiting for a resource. The 8i version looks like this:
Wait Events for DB: PHS2 Instance: phs2 Snaps: 100 -104
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> ordered by wait time desc, waits desc (idle events last)
Avg
Total Wait wait Waits
Event Waits Timeouts Time (cs) (ms) /txn
---------------------------- ------------ ---------- ----------- ------ ------
PX Deq: Execution Msg 15,287 6,927 1,457,570 953 694.9
enqueue 30,367 28,591 737,906 243 ######
direct path read 45,484 0 352,127 77 ######
PX Deq: Table Q Normal 7,185 811 241,532 336 326.6
PX Deq: Execute Reply 13,925 712 194,202 139 633.0
....


The 9.2 version is much the same but has different time intervals in the header.
Wait Events for DB: FUSION Instance: ecfsc2 Snaps: 161 -162
-> s - second
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc (idle events last)
Avg
Total Wait wait Waits
Event Waits Timeouts Time (s) (ms) /txn
---------------------------- ------------ ---------- ---------- ------ --------
log file sync 3,223,927 1 32,481 10 1.0
global cache open x 517,153 777 3,130 6 0.2
log file parallel write 985,732 0 2,613 3 0.3
KJC: Wait for msg sends to c 568,061 34,529 1,794 3 0.2
- Background Wait Events:
Background wait events are those not associated with a client process. They indicate waits encountered by system and non-system processes. The output is the same for all the Oracle releases.
Background Wait Events for DB: PHS2 Instance: phs2 Snaps: 100 -104
-> ordered by wait time desc, waits desc (idle events last)
Avg
Total Wait wait Waits
Event Waits Timeouts Time (cs) (ms) /txn
---------------------------- ------------ ---------- ----------- ------ ------
latch free 88,578 32,522 18,341 2 ######
enqueue 319 230 5,932 186 14.5
row cache lock 4,941 0 2,307 5 224.6
control file parallel write 1,172 0 332 3 53.3
db file parallel write 176 0 67 4 8.0
log file parallel write 315 0 65 2 14.3
db file scattered read 137 0 62 5 6.2
LGWR wait for redo copy 66 10 47 7 3.0

Examples of background system processes are LGWR and DBWR. An example of a non-system background process would be a parallel query slave.
Note that it is possible for a wait event to appear in both the foreground and background wait events statistics. Examples of this are the enqueue and latch free events.
The idle wait events appear at the bottom of both sections and can generally safely be ignored. Typically these type of events keep record of the time while the clien is connected to the database but not requests are being made to the server.
- Notes Regarding Waitevents:
- The idle wait events associated with pipes are often a major source of concern for some DBAs. Pipe gets and waits are entirely application dependent. To tune these events you must tune the application generating them. High pipe gets and waits can affect the library cache latch performance. Rule out all other possible causes of library cache contention prior to focusing on pipe waits as it is very expensive for the client to tune their application.A list of most wait events used by the RDBMS kernel can be found in Appendix A of the Oracle Reference manual for the version being used.
Some wait events to watch:
- global cache cr request: (OPS) This wait event shows the amount of time that an instance has waited for a requested data block for a consistent read and the transferred block has not yet arrived at the requesting instance. See Note 157766.1 'Sessions Wait Forever for 'global cache cr request' Wait Event in OPS or RAC'. In some cases the 'global cache cr request' wait event may be perfectly normal if large buffer caches are used and the same data is being accessed concurrently on multiple instances. In a perfectly tuned, non-OPS/RAC database, I/O wait events would be the top wait events but since we are avoiding I/O's with RAC and OPS the 'global cache cr request' wait event often takes the place of I/O wait events.
- Buffer busy waits, write complete waits, db file parallel writes and enqueue waits: If all of these are in the top wait events the client may be experiencing disk saturation. See Note 155971.1 Resolving Intense and "Random" Buffer Busy Wait Performance Problems for troubleshooting tips.
- log file switch, log file sync or log switch/archive: If the waits on these events appears excessive check for checkpoint tuning issues. See Note 147468.1 Checkpoint Tuning and Troubleshooting Guide.
- write complete waits, free buffer waits or buffer busy waits: If any of these wait events is high, the buffer cache may need tuning. See Note 62172.1 'Understanding and Tuning Buffer Cache and DBWR in Oracle7, Oracle8, and Oracle8i'
- latch free: If high, the latch free wait event indicates that there was contention on one or more of the primary latches used by the instance. Look at the latch sections to diagnose and resolve this problem.
SQL Information
The SQL that is stored in the shared pool SQL area (Library cache) is reported to the user via three different formats in 8i. Each has their own usefulness.


. SQL ordered by Buffer Gets
. SQL ordered by Physical Reads
. SQL ordered by Executions
9i has an additional section:
. SQL ordered by Parse Calls
- SQL ordered by Gets:
SQL ordered by Gets for DB: PHS2 Instance: phs2 Snaps: 100 -104
-> End Buffer Gets Threshold: 10000
-> Note that resources reported for PL/SQL includes the resources used by
all SQL statements called within the PL/SQL code. As individual SQL
statements are also reported, it is possible and valid for the summed
total % to exceed 100

Buffer Gets Executions Gets per Exec % Total Hash Value
--------------- ------------ -------------- ------- ------------
198,924 37,944 5.2 41.7 2913840444
select length from fet$ where file#=:1 and block#=:2 and ts#=:3

111,384 7 15,912.0 23.4 1714733582
select f.file#, f.block#, f.ts#, f.length from fet$ f, ts$ t whe
re t.ts#=f.ts# and t.dflextpct!=0 and t.bitmapped=0

105,365 16 6,585.3 22.1 4111567099
CREATE TABLE "PHASE".:Q3236003("PID","CAMPAIGN","SCPOS1","SCPOS2
","SCPOS3","SCPOS4","SCPOS5","SCPOS6","SCPOS7","SCPOS8","SCPOS9"
,"SCPOS10","SCPOS11","SCPOS12","SCPOS13","SCPOS14","SCPOS15","SC
POS16","SCPOS17","MCELL","MAILID","RSPPROD","STATTAG","RSPREF","
RSPCRED","MAILDATE","RSPTDATE","BDATE","STATE","ZIP","INCOME","R
....


This section reports the contents of the SQL area ordered by the number of buffer gets and can be used to identify CPU Heavy SQL.
- Many DBAs feel that if the data is already contained within the buffer cache the query should be efficient. This could not be further from the truth. Retrieving more data than needed, even from the buffer cache, requires CPU cycles and interprocess IO. Generally speaking, the cost of physical IO is not 10,000 times more expensive. It actually is in the neighborhood of 67 times and actually almost zero if the data is stored in the UNIX buffer cache.
- The statements of interest are those with a large number of gets per execution especially if the number of executions is high.
- High buffer gets generally correlates with heavy CPU usage.
- SQL ordered by Physical Reads:
SQL ordered by Reads for DB: PHS2 Instance: phs2 Snaps: 100 -104
-> End Disk Reads Threshold: 1000

Physical Reads Executions Reads per Exec % Total Hash Value
--------------- ------------ -------------- ------- ------------
98,401 16 6,150.1 14.2 3004232054
SELECT C0 C0 FROM (SELECT C0 C0 FROM (SELECT /*+ NO_EXPAND ROWID
(A1) */ A1."PID" C0 FROM "PHASE"."P0201F00_PLAT_MCOP_TB" PX_GRAN
ULE(0, BLOCK_RANGE, DYNAMIC) A1) UNION ALL SELECT C0 C0 FROM (S
ELECT /*+ NO_EXPAND ROWID(A2) */ A2."PID" C0 FROM "PHASE"."P0201
F00_UCS_MCOP_TB" PX_GRANULE(1, BLOCK_RANGE, DYNAMIC) A2) UNION

50,836 32 1,588.6 7.3 943504307
SELECT /*+ Q3263000 NO_EXPAND ROWID(A1) */ A1."PID" C0 FROM "PHA
SE"."P9999F00_NEW_RESP_HIST_TB" PX_GRANULE(0, BLOCK_RANGE, DYNAM
IC) A1 WHERE A1."CAMPAIGN"='200109M' AND A1."RSPPROD"='B'

50,836 32 1,588.6 7.3 3571039650
SELECT /*+ Q3261000 NO_EXPAND ROWID(A1) */ A1."PID" C0 FROM "PHA
SE"."P9999F00_NEW_RESP_HIST_TB" PX_GRANULE(0, BLOCK_RANGE, DYNAM
IC) A1 WHERE A1."CAMPAIGN"='200109M' AND A1."RSPPROD"='P'
....

This section reports the contents of the SQL area ordered by the number of reads from the data files and can be used to identify SQL causing IO bottlenecks which consume the following resources.
- CPU time needed to fetch unnecessary data.
- File IO resources to fetch unnecessary data.
- Buffer resources to hold unnecessary data.
- Additional CPU time to process the query once the data is retrieved into the buffer.
- SQL ordered by Executions:
SQL ordered by Executions for DB: PHS2 Instance: phs2 Snaps: 100 -104
-> End Executions Threshold: 100

Executions Rows Processed Rows per Exec Hash Value
------------ ---------------- ---------------- ------------
37,944 16,700 0.4 2913840444
select length from fet$ where file#=:1 and block#=:2 and ts#=:3

304 1,219 4.0 904892542
select file#,block#,length from fet$ where length>=:1 and
ts#=:2 and file#=:3

295 0 0.0 313510536
select job from sys.job$ where next_date < sysdate and (field1
= :1 or (field1 = 0 and 'Y' = :2)) order by next_date, job

273 273 1.0 3313905788
insert into col$(obj#,name,intcol#,segcol#,type#,length,precisio
n#,scale,null$,offset,fixedstorage,segcollength,deflength,defaul
t$,col#,property,charsetid,charsetform,spare1,spare2)values(:1,:
2,:3,:4,:5,:6,decode(:7,0,null,:7),decode(:5,2,decode(:8,-127/*M
AXSB1MINAL*/,null,:8),178,:8,179,:8,180,:8,181,:8,182,:8,183,:8,
....
This section reports the contents of the SQL area ordered by the number of query executions. It is primarily useful in identifying the most frequently used SQL within the database so that they can be monitored for efficiency. Generally speaking, a small performance increase on a frequently used query provides greater gains than a moderate performance increase on an infrequently used query
- SQL ordered by Parse Calls (9i Only):
SQL ordered by Parse Calls for DB: S901 Instance: S901 Snaps: 2 -3
-> End Parse Calls Threshold: 1000
% Total
Parse Calls Executions Parses Hash Value
------------ ------------ -------- ----------
295 295 0.48 1705880752
select file# from file$ where ts#=:1

60 60 0.10 3759542639
BEGIN DBMS_APPLICATION_INFO.SET_MODULE(:1,NULL); END;

33 2,222 0.05 3615375148
COMMIT

1 200,000 0.00 119792462
INSERT into free.freelist_test values (:b2||'J'||:b1,'AAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAA')

....
This section shows the number of times a statement was parsed as compared to the number of times it was executed. One to one parse/executions may indicate that:
- Bind variables are not being used.
- On RDBMS version 8172 and higher the init.ora parameter session_cached_cursors was not set in the init.ora (100 is usually the suggested starting value). See enhancement bug 1589185 for an explanation of the change that shifts some of the load from the library cache to the user session cache.
- The shared pool may be too small and the parse is not being retained long enough for multiple executions.
- cursor_sharing is set to exact (this should NOT be changed without considerable testing on the part of the client).
Statistics
The statistics section shows the overall database statistics. These are the statistics that the summary information is derived from. A list of the statistics maintained by the RDBMS kernel can be found in Appendix C of the Oracle Reference manual for the version being utilized. The format is identical from 8i to 9i.
Instance Activity Stats for DB: PHS2 Instance: phs2 Snaps: 100 -104

Statistic Total per Second per Trans
--------------------------------- ---------------- ------------ ------------
CPU used by this session 84,161 23.4 3,825.5
CPU used when call started 196,346 54.5 8,924.8
CR blocks created 709 0.2 32.2
DBWR buffers scanned 0 0.0 0.0
DBWR checkpoint buffers written 245 0.1 11.1
DBWR checkpoints 33 0.0 1.5
DBWR cross instance writes 93 0.0 4.2
DBWR free buffers found 0 0.0 0.0
....
Of particular interest are the following statistics.

- CPU USED BY THIS SESSION, PARSE TIME CPU or RECURSIVE CPU USAGE: These numbers are useful to diagnose CPU saturation on the system (usually a query tuning issue). The formula to calculate the CPU usage breakdown is:
Service (CPU) Time = other CPU + parse time CPU
Other CPU = "CPU used by this session" - parse time CPU
Some releases do not correctly store this data and can show huge numbers. The rule to decide if you can use these metrics is:

Trustworthy if :
(db version>= 8.1.7.2 and 9.0.1)
OR ((db version >= 9.0.1.1) = 8.0.6.0 AND not using job_queue_processes AND CPU_PER_CALL = default)

- DBWR BUFFERS SCANNED: the number of buffers looked at when scanning the lru portion of the buffer cache for dirty buffers to make clean. Divide by "dbwr lru scans" to find the average number of buffers scanned. This count includes both dirty and clean buffers. The average buffers scanned may be different from the average scan depth due to write batches filling up before a scan is complete. Note that this includes scans for reasons other than make free buffer requests.
- DBWR CHECKPOINTS: the number of checkpoints messages that were sent to DBWR and not necessarily the total number of actual checkpoints that took place. During a checkpoint there is a slight decrease in performance since data blocks are being written to disk and that causes I/O. If the number of checkpoints is reduced, the performance of normal database operations improve but recovery after instance failure is slower.
- DBWR TIMEOUTS: the number of timeouts when DBWR had been idle since the last timeout. These are the times that DBWR looked for buffers to idle write.
- DIRTY BUFFERS INSPECTED: the number of times a foreground encountered a dirty buffer which had aged out through the lru queue, when foreground is looking for a buffer to reuse. This should be zero if DBWR is keeping up with foregrounds.
- FREE BUFFER INSPECTED: the number of buffers skipped over from the end of the LRU queue in order to find a free buffer. The difference between this and "dirty buffers inspected" is the number of buffers that could not be used because they were busy or needed to be written after rapid aging out. They may have a user, a waiter, or being read/written.
- RECURSIVE CALLS: Recursive calls occur because of cache misses and segment extension. In general if recursive calls is greater than 30 per process, the data dictionary cache should be optimized and segments should be rebuilt with storage clauses that have few large extents. Segments include tables, indexes, rollback segment, and temporary segments.
NOTE: PL/SQL can generate extra recursive calls which may be unavoidable.
- REDO BUFFER ALLOCATION RETRIES: total number of retries necessary to allocate space in the redo buffer. Retries are needed because either the redo writer has gotten behind, or because an event (such as log switch) is occurring
- REDO LOG SPACE REQUESTS: indicates how many times a user process waited for space in the redo log buffer. Try increasing the init.ora parameter LOG_BUFFER so that zero Redo Log Space Requests are made.
- REDO WASTAGE: Number of bytes "wasted" because redo blocks needed to be written before they are completely full. Early writing may be needed to commit transactions, to be able to write a database buffer, or to switch logs
- SUMMED DIRTY QUEUE LENGTH: the sum of the lruw queue length after every write request completes. (divide by write requests to get average queue length after write completion)
- TABLE FETCH BY ROWID: the number of rows that were accessed by a rowid. This includes rows that were accessed using an index and rows that were accessed using the statement where rowid = 'xxxxxxxx.xxxx.xxxx'.
- TABLE FETCH BY CONTINUED ROW: indicates the number of rows that are chained to another block. In some cases (i.e. tables with long columns) this is unavoidable, but the ANALYZE table command should be used to further investigate the chaining, and where possible, should be eliminated by rebuilding the table.
- Table Scans (long tables) is the total number of full table scans performed on tables with more than 5 database blocks. If the number of full table scans is high the application should be tuned to effectively use Oracle indexes. Indexes, if they exist, should be used on long tables if less than 10-20% (depending on parameter settings and CPU count) of the rows from the table are returned. If this is not the case, check the db_file_multiblock_read_count parameter setting. It may be too high. You may also need to tweak optimizer_index_caching and optimizer_index_cost_adj.
- Table Scans (short tables) is the number of full table scans performed on tables with less than 5 database blocks. It is optimal to perform full table scans on short tables rather than using indexes.
IO Activity
IO ActivityInput/Output (IO) statistics for the instance are listed in the following sections/formats:
- Tablespace IO Stats for DB: Ordered by total IO per tablespace.
- File IO Stats for DB: Ordered alphabetically by tablespace, filename.
In Oracle 8.1.7 many other columns were included as follow:
- Avg. Read / Second
- Avg. Blocks / Read
- Avg. Writes / Second
- Buffer Waits
- Avg. Buffer Waits / Milisecond

- Tablespace IO Stats
Tablespace IO Stats for DB: PHS2 Instance: phs2 Snaps: 100 -104
->ordered by IOs (Reads + Writes) desc

Tablespace
------------------------------
Av Av Av Av Buffer Av Buf
Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
PHASE_WORK_TS
138,361 38 0.0 3.9 6,859 2 0 0.0
OFFER_HISTORY_TS
24,714 7 0.0 4.0 0 0 0 0.0
ATTR1_TS
7,823 2 0.0 4.0 0 0 0 0.0
TEMP
886 0 0.0 20.1 1,147 0 0 0.0
SYSTEM
184 0 3.9 2.8 56 0 18 3.3

- File IO Stats
File IO Stats for DB: PHS2 Instance: phs2 Snaps: 100 -104
->ordered by Tablespace, File

Tablespace Filename
------------------------ ----------------------------------------------------
Av Av Av Av Buffer Av Buf
Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
ATTR1_TS /oradata/phs2/hsz16/attr1_01.dbf
398 0 0.0 3.9 0 0 0
/oradata/phs2/hsz17/attr1_02.dbf
400 0 0.0 4.0 0 0 0
/oradata/phs2/hsz18/attr1_03.dbf
398 0 0.0 4.0 0 0 0
/oradata/phs2/hsz19/attr1_04.dbf
480 0 0.0 4.0 0 0 0
....

Note that Oracle considers average read times of greater than 20 ms unacceptable. If a datafile consistently has average read times of 20 ms or greater then:
- The queries against the contents of the owning tablespace should be examined and tuned so that less data is retrieved.
- If the tablespace contains indexes, another option is to compress the indexes so that they require less space and hence, less IO.
- The contents of that datafile should be redistributed across several disks/logical volumes to more easily accommodate the load.
- If the disk layout seems optimal, check the disk controller layout. It may be that the datafiles need to be distributed across more disk sets.
Buffer cache Activity Information
The buffer statistics are comprised of two sections:
- Buffer Pool Statistics:
This section can have multiple entries if multiple buffer pools are allocated. This section is in both 8i and 9i and is identical in both.
Buffer Pool Statistics for DB: PHS2 Instance: phs2 Snaps: 100 -104
-> Pools D: default pool, K: keep pool, R: recycle pool

Free Write Buffer
Buffer Consistent Physical Physical Buffer Complete Busy
P Gets Gets Reads Writes Waits Waits Waits
- ----------- ------------- ----------- ---------- ------- -------- ----------
D 4,167 362,492 3,091 413 0 0 60



A baseline of the database's buffer pool statistics should be available to compare with the current statspack buffer pool statistics. A change in that pattern unaccounted for by a change in workload should be a cause for concern.
- Buffer Wait Statistics:
This section shows a breakdown of each type of object waited for. This section follows the Instance Recovery Stats for DB in 9i and is identical to that in 8i.
Buffer wait Statistics for DB: PHS2 Instance: phs2 Snaps: 100 -104
-> ordered by wait time desc, waits desc

Tot Wait Avg
Class Waits Time (cs) Time (cs)
------------------ ----------- ---------- ---------
undo header 42 21 1
data block 18 6 0

The above shows no real contention. Typically, when there is buffer contention, it is due to data block contention with large average wait times, like the example below:
Buffer wait Statistics for DB: GLOVP Instance: glovp Snaps: 454 - 455

Tot Wait Avg
Class Waits Time (cs) Time (cs)
------------------ ----------- ---------- ---------
data block 9,698 17,097 2
undo block 210 1,225 6
segment header 259 367 1
undo header 259 366 1
Instance Recovery Statistics
This section was added in 9i and is useful for monitoring the recovery and redo information.
Instance Recovery Stats for DB: S901 Instance: S901 Snaps: 2 -3
-> B: Begin snapshot, E: End snapshot

Targt Estd Log File Log Ckpt Log Ckpt
MTTR MTTR Recovery Actual Target Size Timeout Interval
(s) (s) Estd IOs Redo Blks Redo Blks Redo Blks Redo Blks Redo Blks
- ----- ----- ---------- ---------- ---------- ---------- ---------- ----------
B 15 8 8024 21033 20691 92160 20691 ##########
E 15 11 8024 77248 92160 92160 285818 ##########
PGA Memory Statistics

This section was added in 9i and which helps when using the new model to allocate PGA in Oracle9i using PGA_AGGREGATE_TARGET.
PGA Memory Stats for DB: S901 Instance: S901 Snaps: 2 -3
-> WorkArea (W/A) memory is used for: sort, bitmap merge, and hash join ops

Statistic Begin (M) End (M) % Diff
----------------------------------- ---------------- ---------------- ----------
maximum PGA allocated 10.405 10.405 .00
total PGA allocated 7.201 7.285 1.17
total PGA inuse 6.681 6.684 .04

This section is particularly useful when monitoring session memory usage on Windows servers.
Enqueue Activity
An enqueue is simply a locking mechanism. This section is very useful and must be used when the wait event "enqueue" is listed in the "Top 5 timed events".
In 8i the section looks like this.
Enqueue activity for DB: PHS2 Instance: phs2 Snaps: 100 -104
-> ordered by waits desc, gets desc

Enqueue Gets Waits
---------- ------------ ----------
PS 2,632 716
ST 192 185
TM 973 184
TC 66 57
US 80 53
TS 68 46
TT 349 36
PI 56 32
HW 10 5
CF 275 3
DV 4 3
TX 499 1
In 9i the section looks like this.
Enqueue activity for DB: S901 Instance: S901 Snaps: 2 -3
-> Enqueue stats gathered prior to 9i should not be compared with 9i data
-> ordered by waits desc, requests desc

Avg Wt Wait
Eq Requests Succ Gets Failed Gets Waits Time (ms) Time (s)
-- ------------ ------------ ----------- ----------- ----------- ------------
HW 656 656 0 139 2.04 0
The action to take depends on the lock type that is causing the most problems. The most common lock waits are generally for:
- TX - Transaction Lock: Generally due to application concurrency mechanisms, or table setup issues.
- TM - DML enqueue: Generally due to application issues, particularly if foreign key constraints have not been indexed.
- ST - Space management enqueue: Usually caused by too much space management occurring. For example: create table as select on large tables on busy instances, small extent sizes, lots of sorting, etc.
Undo (Rollback) Information
Undo (Rollback) information is provided in two sections. They are identical in both 8i and 9i and are self explanatory.
- Rollback Segment Stats
- Rollback Segment Storage
In 9i the following two sections are added to provide similar information on the System Managed Undo (SMU) tablespace. Both are self explanatory.
- Undo Segment Summary for DB
- Undo Segment Stats for DB
The examples below show typical performance problem related to Undo (rollback) segments:
- Rollback Segment Stats for DB
Rollback Segment Stats for DB: PHS2 Instance: phs2 Snaps: 100 -104
->A high value for "Pct Waits" suggests more rollback segments may be required

Trans Table Pct Undo Bytes
RBS No Gets Waits Written Wraps Shrinks Extends
------ ------------ ------- --------------- -------- -------- --------
0 9.0 0.00 0 0 0 0
4 6,838.0 0.18 554,206 0 0 0
5 2,174.0 0.55 292,474 0 0 0
6 4,309.0 0.23 471,992 0 0 0
....
In this case, the PCT Waits on three of the rollback segments indicates that there is some minor contention on the rollbacks and that either another rollback or more space should be added.
- Rollback Segment Storage for DB
Rollback Segment Storage for DB: PHS2 Instance: phs2 Snaps: 100 -104
->Optimal Size should be larger than Avg Active

RBS No Segment Size Avg Active Optimal Size Maximum Size
------ --------------- --------------- --------------- ---------------
0 753,664 0 753,664
4 2,520,743,936 0 2,520,743,936
5 2,109,702,144 0 2,109,702,144
6 528,449,536 0 528,449,536
In this case, the client does not have optimal set.
Rollback Segment Storage for DB: RW1PRD Instance: rw1prd Snaps: 10489 - 1
->The value of Optimal should be larger than Avg Active

RBS No Segment Size Avg Active Optimal Size Maximum Size
------ --------------- ----------- --------------- ---------------
0 5,087,232 0 5,087,232
1 52,420,608 ########### 52,428,800 335,536,128
2 52,420,608 10,551,688 52,428,800 283,107,328
3 52,420,608 10,621,742 52,428,800 283,107,328
4 52,420,608 10,736,056 52,428,800 283,107,328
5 52,420,608 17,861,266 52,428,800 325,050,368
6 52,420,608 19,579,373 52,428,800 335,536,128
7 52,420,608 11,571,513 52,428,800 283,107,328
8 52,420,608 44,140,215 52,428,800 335,536,128
9 52,420,608 65,045,643 52,428,800 325,050,368
In this instance optimal is set and we can see an overflow for average active for RBS 1 and that RBS 9 was also larger than optimal. If this is a consistent problem it may be that the optimal value should be raised.
- Undo Segment Summary for DB
Undo Segment Summary for DB: S901 Instance: S901 Snaps: 2 -3
-> Undo segment block stats:
-> uS - unexpired Stolen, uR - unexpired Released, uU - unexpired reUsed
-> eS - expired Stolen, eR - expired Released, eU - expired reUsed

Undo Undo Num Max Qry Max Tx Snapshot Out of uS/uR/uU/
TS# Blocks Trans Len (s) Concurcy Too Old Space eS/eR/eU
---- -------------- ---------- -------- ---------- -------- ------ -------------
1 20,284 1,964 8 12 0 0 0/0/0/0/0/0
The description of the view V$UNDOSTAT in the Oracle9i Database Reference guide provides some insight as to the columns definitions. Should the client encounter SMU problems, monitoring this view every few minutes would provide more useful information.
- Undo Segment Stats for DB
Undo Segment Stats for DB: S901 Instance: S901 Snaps: 2 -3
-> ordered by Time desc

Undo Num Max Qry Max Tx Snap Out of uS/uR/uU/
End Time Blocks Trans Len (s) Concy Too Old Space eS/eR/eU
------------ ------------ -------- ------- -------- ------- ------ -------------
12-Mar 16:11 18,723 1,756 8 12 0 0 0/0/0/0/0/0
12-Mar 16:01 1,561 208 3 12 0 0 0/0/0/0/0/0
This section provides a more detailed look at the statistics in the previous section by listing the information as it appears in each snapshot.
It should be noted that 9i introduces an optional init.ora parameter called UNDO_RETENTION which allows the DBA to specify how long the system will attempt to retain undo information for a committed transaction without being overwritten or recaptured. This parameter, based in units of wall-clock seconds, is defined universally for all undo segments.
Use of UNDO_RETENTION can potentially increase the size of the undo segment for a given period of time, so the retention period should not be arbitrarily set too high. The UNDO tablespace still must be sized appropriately. The following calculation can be used to determine how much space a given undo segment will consume given a set value of UNDO_RETENTION.
Undo Segment Space Required = (undo_retention_time * undo_blocks_per_seconds)
As an example, an UNDO_RETENTION of 5 minutes (default) with 50 undo blocks/second (8k blocksize) will generate:
Undo Segment Space Required = (300 seconds * 50 blocks/ seconds * 8K/block) = 120 M
The retention information (transaction commit time) is stored in every transaction table block and each extent map block. When the retention period has expired, SMON will be signaled to perform undo reclaims, done by scanning each transaction table for undo timestamps and deleting the information from the undo segment extent map. Only during extreme space constraint issues will retention period not be obeyed.
Latch Information
Latch information is provided in the following three sections.
. Latch Activity
. Latch Sleep breakdown
. Latch Miss Sources
This information should be checked whenever the "latch free" wait event or other latch wait events experience long waits.
- Latch Activity
Latch Activity for DB: PHS2 Instance: phs2 Snaps: 100 -104
->"Get Requests", "Pct Get Miss" and "Avg Slps/Miss" are statistics for
willing-to-wait latch get requests
->"NoWait Requests", "Pct NoWait Miss" are for no-wait latch get requests
->"Pct Misses" for both should be very close to 0.0

Pct Avg Pct
Get Get Slps NoWait NoWait
Latch Name Requests Miss /Miss Requests Miss
----------------------------- -------------- ------ ------ ------------ ------
KCL freelist latch 9,382 0.0 0
KCL lock element parent latch 15,500 0.0 0.0 0
KCL name table latch 3,340 0.0 0
Token Manager 12,474 0.0 0.0 0
active checkpoint queue latch 2,504 0.0 0
batching SCNs 114,141 0.0 0.0 0
begin backup scn array 6,697 0.0 0
cache buffer handles 1 0.0 0
cache buffers chains 1,056,119 0.1 0.2 6,303 0.0
cache buffers lru chain 104,996 0.0 4,078 0.0


This section is identical in both 8i and 9i.
This section is particularly useful for determining latch contention on an instance. Latch contention generally indicates resource contention and supports indications of it in other sections.
Latch contention is indicated by a Pct Miss of greater than 1.0% or a relatively high value in Avg Sleeps/Miss.
While each latch can indicate contention on some resource, the more common latches to watch are:
- cache buffer chains: Contention on this latch confirms a hot block issue. See Note 62172.1 'Understanding and Tuning Buffer Cache and DBWR in Oracle7, Oracle8, and Oracle8i' for a discussion of this phenomenon.
- shared pool: Contention on this latch in conjunction with reloads in the SQL Area of the library cache section indicates that the shared pool is too small. Contention on this latch indicates that one of the following is happening:
. The library cache, and hence, the shared pool is too small.
. Literal SQL is being used. See Note 62143.1 'Understanding and Tuning the Shared Pool for an excellent discussion of this topic.
. On versions 8.1.7.2 and higher, session_cached_cursors might need to be set. See enhancement bug 1589185 for details.
See Note 62143.1 Understanding and Tuning the Shared Pool in Oracle7, Oracle8, and Oracle8i for a good discussion on literal SQL and its impact on the shared pool and library cache.
- Latch Sleep breakdown
Latch Sleep breakdown for DB: PHS2 Instance: phs2 Snaps: 100 -104
-> ordered by misses desc

Get Spin &
Latch Name Requests Misses Sleeps Sleeps 1->4
-------------------------- -------------- ----------- ----------- ------------
row cache objects 1,908,536 70,584 16,976 54656/14893/
1022/13/0
dlm resource hash list 624,455 15,931 71,868 118/959/1483
5/19/0
parallel query alloc buffe 37,000 4,850 362 4502/335/12/
1/0
shared pool 176,560 3,238 773 2649/431/134
/24/0
library cache 871,408 1,572 935 925/433/151/
63/0
cache buffers chains 1,056,119 872 209 670/195/7/0/
0
....

This section provides additional supporting information to the previous section. It is identical in 8i and 9i.
Latch Miss Sources
Latch Miss Sources for DB: PHS2 Instance: phs2 Snaps: 100 -104
-> only latches with sleeps are shown
-> ordered by name, sleeps desc

NoWait Waiter
Latch Name Where Misses Sleeps Sleeps
------------------------ -------------------------- ------- ---------- -------
batching SCNs kcsl01 0 1 1
cache buffers chains kcbgtcr: kslbegin 0 114 39
cache buffers chains kcbgcur: kslbegin 0 62 62
cache buffers chains kcbrls: kslbegin 0 29 104
cache buffers chains kcbchg: kslbegin: bufs not 0 1 1
dlm group lock table lat kjgalk: move a lock from p 0 1 0
dlm lock table freelist kjlalc: lock allocation 0 10 6
dlm lock table freelist kjgdlk: move lock to paren 0 1 2
dlm lock table freelist kjlfr: remove lock from pa 0 1 3
dlm resource hash list kjucvl: open cr lock reque 0 36,732 562
dlm resource hash list kjxcvr: handle convert req 0 29,189 39,519
dlm resource hash list kjskchcv: convert on shado 0 3,907 25
dlm resource hash list kjrrmas1: lookup master no 0 1,603 18
dlm resource hash list kjcvscn: remove from scan 0 383 0
dlm resource hash list kjrlck: lock resource 0 26 1,965

This section provides a detailed breakdown of which latches are missing and sleeping. It is particularly useful in identifying library cache bugs as it provides latch child information not available in the previous two sections.
Search on the latch child name experiencing high misses or sleeps and you can often find the bug responsible.
It is identical in 8i and 9i.
Dictionary Cache Statistics
This is an interesting section to monitor but about which you can do very little as the only way to change the size of the dictionary cache is to change the shared pool size as the dictionary cache is a percentage of the shared pool. It is identical in 8i and 9i.
Dictionary Cache Stats for DB: PHS2 Instance: phs2 Snaps: 100 -104
->"Pct Misses" should be very low (< 2% in most cases)
->"Cache Usage" is the number of cache entries being used
->"Pct SGA" is the ratio of usage to allocated size for that cache

Get Pct Scan Pct Mod Final Pct
Cache Requests Miss Requests Miss Req Usage SGA
---------------------- ------------ ------ -------- ----- -------- ------ ----
dc_constraints 0 0 0 0 0
dc_database_links 0 0 0 0 0
dc_files 0 0 0 161 98
dc_free_extents 226,432 16.8 304 0.0 288 ###### 99
...
Library Cache Statistics
This section of the report shows information about the different sub-areas activity in the library cache.
The 8i version looks like this.
Library Cache Activity for DB: PHS2 Instance: phs2 Snaps: 100 -104
->"Pct Misses" should be very low

Get Pct Pin Pct Invali-
Namespace Requests Miss Requests Miss Reloads dations
--------------- ------------ ------ -------------- ------ ---------- --------
BODY 48 0.0 48 0.0 0 0
CLUSTER 7 0.0 8 0.0 0 0
INDEX 0 0 0 0
OBJECT 0 0 0 0
PIPE 0 0 0 0
SQL AREA 42,640 0.2 193,249 0.1 23 17
TABLE/PROCEDURE 287 3.8 1,701 2.6 6 0
TRIGGER 0 0 0 0
The 9i version looks like this.
Library Cache Activity for DB: S901 Instance: S901 Snaps: 2 -3
->"Pct Misses" should be very low

Get Pct Pin Pct Invali-
Namespace Requests Miss Requests Miss Reloads dations
--------------- ------------ ------ -------------- ------ ---------- --------
BODY 29 0.0 29 0.0 0 0
SQL AREA 579 5.7 2,203,964 0.0 0 0
TABLE/PROCEDURE 292 0.0 496 0.0 0 0
TRIGGER 12 0.0 12 0.0 0 0
Values in Pct Misses or Reloads in the SQL Area, Tables/Procedures or Trigger rows indicate that the shared pool may be too small. To confirm this, consistent values (not sporadic) in Pct Misses or Reloads in the Index row indicate that the buffer cache is too small. (No longer available in 9i.)
Values in Invalidations in the SQL Area indicate that a table definition changed while a query was being run against it or a PL/SQL package being used was recompiled.
SGA Memory Summary
This section provides a breakdown of how the SGA memory is used at the time of the report. It is useful to be able to track this over time. This section is identical in 8i and 9i.
SGA regions Size in Bytes
------------------------------ ----------------
Database Buffers 209,715,200
Fixed Size 103,396
Redo Buffers 104,873,984
Variable Size 423,956,480
----------------
sum 738,649,060
SGA Memory Detail
This section shows a detailed breakdown of memory usage by the SGA at the beginning and ending of the reporting period. It allows the DBA to track memory usage throughout the business cycle. It is identical in 8i and 9i.
SGA breakdown difference for DB: PHS2 Instance: phs2 Snaps: 100 -104

Pool Name Begin value End value Difference
----------- ------------------------ -------------- -------------- -----------
java pool free memory 20,000,768 20,000,768 0
large pool PX msg pool 230,386,744 230,386,744 0
large pool free memory 299,976 299,976 0
shared pool Checkpoint queue 189,280 189,280 0
shared pool KGFF heap 252,128 252,128 0
shared pool KGK heap 31,000 31,000 0
shared pool KQLS heap 2,221,552 2,246,640 25,088
shared pool PL/SQL DIANA 436,240 436,240 0
shared pool PL/SQL MPCODE 138,688 138,688 0
Init.ora Parameters Summary
The final section shows the current init.ora parameter settings. It displays those that are more commonly used including some hidden. It is identical in 8i and 9i.
init.ora Parameters for DB: PHS2 Instance: phs2 Snaps: 100 -104
End value
Parameter Name Begin value (if different)
----------------------------- --------------------------------- --------------
_PX_use_large_pool TRUE
always_anti_join HASH
audit_trail TRUE
background_dump_dest /u01/app/oracle/admin/phs2/bdump
bitmap_merge_area_size 10485760
compatible 8.1.7
control_files /oradata/phs2/hsz16/control_01.db
core_dump_dest /u01/app/oracle/admin/phs2/cdump
cursor_space_for_time TRUE

Tuesday, February 20, 2007

Usefull links

**http://dbataj.blogspot.com**
http://metalink.oracle.com
http://vasujeedigunta.blogspot.com/
http://www.idevelopment.info/cgi/ORACLE_dba_scripts.cgi#Tuning
http://www.managedtime.com/freesqlbook.php3
http://www.idevelopment.info/data/Oracle/DBA_tips/Export_Import/EXP_2.shtml
http://www.dbapool.com
http://tkyte.blogspot.com/
http://oracle- online-help. blogspot. com
http://www.oracle-base.com/articles/misc/OracleShellScripting.php
http://www.oracle-base.com/dba/DBACategories.php
http://orafaq.com/scripts/index.htm#UNIX
http://orafaq. com/node/ 3
http://www.ixora. com.au/q+ a/0104/11095147. htm
http://www.dbapool. com/articles/ 091306.html
http://www.intuitive.com/wicked/wicked-cool-shell-script-library2.shtml
http://www.databasejournal.com/scripts/archives.php
http://www.dbazine.com/oracle/or-articles/
http://dba.fyicenter.com/article/99988578.html
http://askanantha.googlepages.com/

http://www.ss64.com/index.html
http://www.jlcomp.demon.co.uk/faq/ind_faq.html#Recovery
http://www.ss64.com/ora/
http://techpubs.sgi.com/library/tpl/cgi-bin/browse.cgi?db=man&coll=linux&pth=/man1

Sunday, August 20, 2006

Common Performance Tuning Issues

TROUBLESHOOTING GUIDE
Common Performance Tuning Issues

Table of Contents

1. Introduction
2. Shared Pool and Library Cache Performance Tuning
3. Buffer Cache Performance Tuning
4. Latch Contention
5. Redo Log Buffer Performance Tuning
6. Rollback Segment Performance Tuning
7. Temporary Tablespace Performance Tuning
8. Checkpoint Performance Tuning
9. Query Performance Tuning
10. Import Performance Tuning
11. STATPACK Utility
12. Utlbstat/Utlestat Utility

1. Introduction

This document covers some of the most common tuning scenarios. More specific
information on each tuning area is available through the links provided.

2. Shared Pool and Library Cache Performance Tuning

Oracle keeps SQL statements, packages, object information and many other items
in an area in the SGA known as the shared pool. This sharable area of memory
is managed as a sophisticated cache and heap manager rolled into one. It has 3
fundamental problems to overcome:

- The unit of memory allocation is not a constant. Memory
allocations from the pool can be anything from a few bytes to
many kilobytes

- Not all memory can be 'freed' when a user finishes with, as the
aim of the shared pool is to maximize sharability of
information.

- There is no disk area to be able to page out to so this is not
like a traditional cache where there is a file-backing store.
Only "recreatable" information can be discarded from the cache
and it has to be re-created when it is next needed.

Here are some tips in tuning the shared pool:

· Flushing the Shared Pool will coalesce small chunks of memory. When
the shared pool is highly fragmented, this may temporarily restore
performance. To flush the shared pool issue: alter system flush shared_pool;
Please note that executing this statement will cause a spike in performance
while the objects are reloaded and should be done when the database is not
being heavily used.

· Make sure that OLTP application uses "bind variables".
This is not that important for DSS.

· Make sure that the library cache pinhitratio is > 95%

. Increasing the size of the shared pool is not always the answer for poor
hitratios.




3. Buffer Cache Performance Tuning

The database buffer cache holds copies of data blocks read from disk.
Since the cache is usually limited due to memory constraints, all the data on
the disk cannot fit in the cache. When the cache is full, subsequent cache
misses cause Oracle to write data already in the cache to disk. A subsequent
access to the data written to disk results in a cache miss.

Here are some tips in tuning the buffer cache:

· Enable Buffer Cache Advisory in order to size your Buffer Cache correctly.


Avoid the following

· 'cache buffers lru chain' latch contention

· Large "Average Write Queue" length

· Lots of time spent waiting for "write complete waits"

· Lots of time spent waiting for "free buffer waits"




4. Latch Contention

Latches are low level serialization mechanisms used to protect
shared data structures in the SGA. A latch is a type of a lock
that can be very quickly acquired and freed. The implementation
of latches is operating system dependent, particularly in regard
to whether a process will wait for a latch and for how long.

Here are some of the important latches to tune:

- Redo Copy/Allocation Latch
- Shared Pool Latch
- Library Cache Latch




5. Redo Log Buffer Performance Tuning

LGWR writes redo entries from the redo log buffer to a redo log file.
Once LGWR copies the entries to the redo log file the user process can
over write these entries. The statistic "redo log space requests" reflects
the number of times a user process waits for space in the redo log buffer.

Here are some tips in sizing the redo logs:

· The value of "redo log space requests" statistic in v$sysstat should be near 0.

· Size your redo appropriately. The recommendation is to have the redo
log switch every 15-30 minutes.

Using the options UNRECOVERABLE in Oracle7 and NOLOGGING in Oracle8 you can avoid
redolog entries generation of certain operation to improve the performance.
Operations like: index creation, create table as select,SQL*Loader operation, etc.
can be easily rebuild without having redolog entries available.



6. Rollback Segment Performance Tuning

The Oracle database provides read consistency on rows fetched for
operations such as SELECT, INSERT, UPDATE, and DELETE against any
database object. Rollback segments are used to store undo transactions
in case the actions need to be "rolled back" or the system needs to
generate a read-consistent image from an earlier time.

Here are some tips in sizing the rollback segments:

· It is recommended to have at least 1 rollback segment for every 4
transactions.

· One large rollback segment is recommended for long running queries.




7. Temporary Tablespace Performance Tuning

In RDBMS release 7.3, Oracle introduced the concept of a temporary
tablespace. This tablespace would be used to hold temporary objects,
like sort segments. Sort segments take their storage parameters from
the DEFAULT STORAGE (NEXT) clause of the tablespace in which they reside.

Here are some tips in tuning the temporary tablespace:

· If there is a lot of contention for the Sort Extent Pool latch, even in
the stable state, then you should increase the extent size by changing
the NEXT value of the DEFAULT STORAGE clause of the temporary tablespace.

· If there is a lot of contention for the Sort Extent Pool latch and if the
wait is the result of too many concurrent sorts, you should increase the
SORT_AREA_SIZE parameter so that more sorts stay in memory.

. It is recommended to have the extent size equal to sort_area_size.
Here is an example why. Say your extent size = 500K and sort_area_size = 1Mg.
Now if there is a sort to the disk, it aquires 2 extents of 500K each and
this could cause performance degradation.




8. Checkpoint Performance Tuning

A Checkpoint is a database event, which synchronizes the data blocks in memory
with the datafiles on disk. A checkpoint has two purposes:

(1) to establish data consistency, and
(2) Enable faster database recovery.

When a checkpoint fails messages must be verified into into the alert.log file.

Here are some tips to tune the checkpoint process:

· The CKPT process can improve performance significantly and decrease the
amount of time users have to wait for a checkpoint operation to complete.

· If the value of LOG_CHECKPOINT_INTERVAL is larger than the size of the redo
log, then the checkpoint will only occur when Oracle performs a log switch
from one group to another, which is preferred. There has been a change in
this behaviour in Oracle 8i.

· The LOG_CHECKPOINTS_TO_ALERT when set to TRUE allows you to log checkpoint
start and stop times in the alert log. This is very helpful in determining
if checkpoints are occurring at the optimal frequency

. Ideally checkpoints should occur only at log swiches.




9. Query Performance Tuning

If queries are running slow consider the following:

· How fast do you want the query to run and is it a reasonable request?
· What is the OPTIMIZER_MODE set to?
· Are all indexes involved in the query valid?
· Is there any other long running query on the database?

In case of CBO:

· Are there statistics on the tables and indexes?
· Were the statistics computed or estimated?

Here are the 2 main diagnostic tools used for query performance tuning


- TKPROF
- AUTOTRACE




10. Import Performance Tuning

There is very little consolidated information on how to speed up import when it
is unbearably slow. Obviously import will take whatever time it needs to
complete, but there are some things that can be done to shorten the time it
will take.




11. STATPACK Utility

The STATPACK utility is the next generation of the Utlbstat/Utlesta report which
helps the database administrator to gather statistical information to detect
performance problems. Statspack improves on the existing UTLBSTAT/UTLESTAT performance
scripts collecting more data, including high resource SQL, pre-calculating some ratios,
such as cache hit ratios, per transaction and per second statistics, keeping a
permanent repository which makes historical data comparisons easier and separating
data collection from the report generation.




12. Utlbstat/Utlestat Utility

Bstat/Estat is a set of sql scripts located under your $ORACLE_HOME/rdbms/admin
directory that are useful for capturing a snapshot of system wide database
performance statistics. UTLESTAT creates a second snapshot of these views and
reports on the differences between the two snapshots to a file called
'report.txt'.

Bstat.sql creates a set of tables and views in your sys account, which contain
a beginning snapshot of database performance statistics.

Estat.sql creates a set of tables in your sys account, which contain an ending
snapshot of the database performance statistics and them to a file called
'report.txt'.

Here are some tips:

· Make sure that you have TIMED_STATSTICS set to TRUE (this adds only a very
small overhead to database operations).

. Make sure that the database is up and running for a while before running
utlbstat.

· Run the utbstat.sql and the utlestat.sql from svrmgrl and not sql*plus.

. Make sure that the database is not shutdown while the utlbstat/estat scripts
are running, otherwise the statstics generated are not accurate.

· Run utlbstat/estat at least for 1-3hrs during the period you are trying to
tune the database.

Performance Tuning Approaches on Oracle and UNIX

Performance Tuning Approaches on Oracle and UNIX



As a system administrator, you will often be confronted with users who say that the response time on the system is unacceptable. What is unacceptable performance? There are two sides to this:

* Quantifiable performance
* Unmeasurable performance like user dissatisfaction.



This paper will help identify quantifiable performance problems and methods to solve them. When performance-tuning databases, it is a good to split the problem into parts to facilitate the process. Performance tuning Oracle databases can be divided into three subcategories:

1. Application tuning
2. RDBMS tuning
3. UNIX system tuning



Each of the above categories has a different maximum possible effect on the performance of your ORACLE database.





1. Application Tuning

Application tuning is by far the most effective aspect for database tuning. You will need to tune your SQL statements since they intervolve both your application and database. To analyze your SQL statements you will need to:



1. Add the following lines to your init.ora file:

SQL_TRACE = true

TIMED_STATISTICS = true

2. Restart your database.
3. Run your application.
4. Run tkprof against the tracefile created by your application:

tkprof EXPLAIN=username/passwd

5. Look at formatted output of trace command and make sure that your

SQL statement is using indexes correctly. Refer to DBA guide for a list

of rules that the oracle optimizer uses when choosing a path for a SQL statement.



<<<<<< OUTPUT OF TKPROF FILE >>>>>>>



count = number of times OPI procedure was executed

cpu = cpu time executing in hundredths of seconds

elap = elapsed time executing in hundredths of secs

phys = number of physical reads of buffers (from disk)

cr = number of buffers gotten for consistent read

cur = number of buffers gotten in current mode (usually for update)

rows = number of rows processed by the OPI call

===========================================================

select * from emp where empno=7369





count cpu elap phys cr cur rows

Parse: 1 0 0 0 0 0

Execute: 1 0 0 0 0 2 0

Fetch: 1 0 0 219 227 0 2



Execution plan:

TABLE ACCESS (FULL) OF 'EMP'

===========================================================

select empno from emp where empno=7934





count cpu elap phys cr cur rows

Parse: 2 0 0 0 0 0

Execute: 2 0 0 0 0 2 0

Fetch: 2 0 0 0 2 0 2



Execution plan:

INDEX (RANGE SCAN) OF 'ALEX_INDX' (NON-UNIQUE)

===========================================================



When your query is returning less than 10% of the rows of a table and the table is a reasonably large table, you will want to index your query. Above is an example of the same query run twice. The first time the optimizer chose to do a full table scan on the table EMP. The second time an index was created called ALEX_INDX. The optimizer chose to use the index. On a table

with 1000 rows this should have resulted in a faster query.



Oracle uses a rule-based optimizer for its SQL statements. When a SQL statement is parsed, the optimizer decides which query path will be chosen. The optimizer can often times make mistakes or simply illustrate that your SQL statement was written incorrectly. You can refer to Chapter 2 of the Performance Tuning Guide for more information on SQL statement tuning. Page 19-17 of the V6 DBA guide has a listing of the query paths that the optimizer will use ranked by speed.



Other considerations for application tuning may include the actual design of your application. A common problem found in menu5/forms30 applications is the way in which one calls the other. Make sure that menu5 is calling forms30 directly and not through a UNIX system call. Calling forms30 through a UNIX system call has the effect of doubling the number of connections to the database and also doubling the load on the UNIX machine.





2. RDBMS Tuning



The default init.ora database configuration file is inadequate for large

RDBMS's. Tools for tuning the database are:

1. SQLDBA * Monitor

2. Bstat/Estat

3. V$ database tables.



By looking at SQLDBA * Monitor one can get a good idea of what is happening to the database at that moment. The first place to look is the IO display. This will display the CACHE HIT RATIO. The cache-hit ratio is the ratio of hits to misses in the SGA for data. The higher the ratio, the more data is being found in the SGA. This means that oracle will not have to do as many disk reads to retrieve information, thus saving time and CPU processing.



While a hit ratio of 1.00 would be ideal, it is more realistic to aim for achieving a hit ratio of about .80.DB_BUFFER values by setting the init.ora parameter

DB_BLOCK_LRU_EXTENDED_STATISTICS equal to the additional number of DB_BUFFERS you wish to add. You will then be able to see the additional number of cache hits that would occur if you add X number of database buffers. For more information on this you can refer to pg 3-20 of the database performance tuning guide.



After looking at the cache-hit ratio and adjusting the number of database buffers accordingly you can analyze FILE IO. This screen

in SQLDBA * Monitor will show IO to each oracle datafile. It allows you

to identify two problem areas:

1. Table scans (indicated by a low number of reads/s & high number of blks/R)

2. Heavily used tablespaces



Table scans indicate that your SQL statements might not be tuned correctly. If your query is returning less than 15% of the rows of a table you should be using indexes to query the tables. Secondly you will also be aware of which tablespaces are used heavily and perhaps consider moving them to a separate disk to balance the IO.



The Rollback screen can be very useful on an update or insert-heavy system. Whenever you are changing or inserting new data, you write the changes to a rollback segment until they are committed. If you have 30 transactions and 1 rollback segment, you will have contention for that rollback segment. This will incur a performance penalty. To make sure that this is not happening, verify that every rollback segment has a maximum of 4 active transactions using it during heavy updating of the database.



The table V$ROWCACHE also contains useful information for tuning. The following select statement can determine whether or not any of the init.ora parameters beginning with DC_XXXX should be raised. These parameters control the size of various data dictionary caches.



SELECT parameter,gets,getmisses,count,usage FROM sys.v$rowcache;



PARAMETER GETS GETMISSES COUNT USAGE

------------- ------- ---------- ---------- -----

dc_usernames 134 5 50 5

dc_columns 11772 288 300 300



The parameter dc_usernames has a very low number of cache misses (GETMISSES). We have allocated 50 username entries (COUNT) and are only using 5 (USAGE). There is no need to raise the value of dc_usernames.



We have allocated 300 DC_COLUMNS and we are using all 300. In addition, the number of GETMISSES or cache misses are high for the parameter DC_COLUMNS. In this case, it is recommended to increase the value of DC_COLUMNS to reduce the number of GETMISSES or cache misses. A cache miss can be a very expensive operation for oracle since it means that the data requested was not in cache and thus a recursive call had to be made. For more information on this you can refer to pg 3-13 of the v6 performance tuning guide.





3. UNIX System Analysis



It is useful to subdivide UNIX system analysis into three subcategories:

1. Memory
2. CPU
3. IO



1. MEMORY

One of the most common problems when running large numbers of concurrent users on UNIX machines is lack of memory. In this case, a quick review of UNIX memory management is useful to see what effect lack of RAM can have on performance. A UNIX machine has virtual memory: the total addressable memory range. Virtual memory is composed of RAM, DISK and SWAP space.

Generally, you will want to have the available SWAP space equal to 2 to 3 times the RAM.



How does UNIX use SWAP space? It uses two memory management policies: swapping and paging. Swapping occurs when UNIX transfers an entire process from RAM to a SWAP device. This frees up a large amount of RAM. Paging occurs when UNIX only transfers a "PAGE" of memory to the SWAP device. Only a tiny portion of a process might actually be "paged out" to a SWAP device. While swapping frees up memory - it is slower than paging. Paging generally is more efficient but does not allow for large amounts of memory to be freed simultaneously. Most UNIX systems today use a combination of paging and swapping to manage memory. Generally, you will see the following behavior:

* System lightly used: no paging or swapping occurs.
* System under a medium load: paging occurs as RAM memory runs low
* System under a very heavy load: paging stops and swapping begins.



When analyzing your UNIX machine, make sure that the machine is not swapping at all and at worst paging lightly. This indicates a system with a healthy amount of memory available. To analyze paging and swapping, use the following commands. Commands used in Berkeley UNIX based systems will be marked as BSD. Commands used in ATT system V will be marked as ATT.



1. vmstat 5 5 (BSD)

procs memory page disk faults cpu

r b w avm fre re at pi po fr de sr d0 s1 d2 s3 in sy cs us sy id

0 0 0 0 1088 0 2 2 0 1 0 0 0 0 0 0 26 72 24 0 1 98



Note: There are NO pageouts (po) occurring on this system. There are also 1088 * 4k pages of free RAM available (4 Meg). It is OK and normal to have page out (po) activity. You should get worried when the number of page ins (pi) starts rising. This indicates that you system is starting to page.



2. pstat -s (BSD)

12112k allocated + 3252k reserved = 15364k used, 37280k available



Note: pstat will also give you the amount of RAM and SWAP currently available on your machine.



3. sar -wpg 5 5 (ATT)

09:54:29 swpin/s pswin/s swpot/s pswot/s pswch/s

atch/s pgin/s ppgin/s pflt/s vflt/s slock/s

pgout/s ppgout/s pgfree/s pgscan/s %s5ipf



09:54:34 0.00 0.0 0.00 0.0 12

0.00 0.22 0.22 0.65 3.90 0.87

0.00 0.00 0.00 0.00 0.00



Note: There is absolutely no swapping or paging going on. (swpin,swpot,ppgin,ppgout).



4. sar -r 5 5 (ATT)

10:10:22 freemem freeswp

10:10:27 790 5862



This will give you a good indication of how much free swap and RAM you have on your machine. There are 790 pages of memory available and 5862 disk blocks of SWAP available.





2. CPU

Once you have monitored your systems available memory you will want to make sure the the CPU(s) are not being overloaded. Here is some general information about how processes get allocated CPU time. UNIX is a multi-processing operating system. That means that a UNIX machine has to manage and process multiple user processes simultaneously. UNIX does this in the same way that people wait in line to buy groceries.



When a process is ready to be processed by a CPU it will be placed on the waiting line or RUN-QUEUE. This is a queue of processes waiting to be run. Obviously there are limits within which one wants to keep the RUN-QUEUE size. Another factor of interest is the percentage of time the the CPU spends in user mode, system mode, or idle mode. Some commands that determine whether or not there is a CPU resource problem occurring:



1. vmstat 5 5 (BSD)

procs memory page disk faults cpu

r b w avm fre re at pi po fr de sr d0 s1 d2 s3 in sy cs us sy id

0 0 0 0 1088 0 2 2 0 1 0 0 0 0 0 0 26 72 24 0 1 98



Note: The CPU is spending most of its time in IDLE mode (id). That means that the CPU is not being heavily used at all! There are no processes that are waiting to be run (r), blocked (b), or waiting for IO (w) in the RUN QUEUE.



2. sar -qu 5 5 (ATT)

10:58:02 runq-sz %runocc swpq-sz %swpocc

%usr %sys %wio %idle



10:58:07 2.8 100

0 2 4 94



Note: The CPU is spending most (94%) of its time in idle mode. This CPU is not being heavily used at all. Two solutions to this are:

1. Obtain a faster processor
2. Use more CPU's.



Avoid overloading your CPU. Response time on your machine will suffer if it is overloaded. Try to keep the run queue 100% occupied and have less that 6 processes waiting to be run for one CPU. This changes as you add more CPU's or a faster CPU. You may also want to avoid the CPU spending most of its time (more than 50%) in system mode. This may indicate that you are spending too much time in kernel mode servicing interrupts, swapping processes etc.





3. I/O

The last step in analyzing your UNIX machine is taking a look at IO. After having looked at SQLDBA monitor to see which datafiles are being used heavily you may also want to take a look at what UNIX says about file IO. These commands are for analyzing file IO on file systems.



1. iostat -d 5 5(BSD)

sd1 sd3

bps tps msps bps tps msps

1 0 0.0 4 0 0.0



iostat will display the number of kilobytes transferred per second, the number of transfers per second, and the milliseconds per average seek. In the example above, both SD1 and SD3 are not used heavily at all. BPS rates over 30 indicate heavy usage of a particular disk. If only one disk shows heavy usage, consider moving some of your datafiles off it or striping your data across several disks.



2. sar -d 5 5 (ATT)

09:17:20 device %busy avque r+w/s blks/s avwait avserv

09:17:26 iop0/pdisk010 472.45 1.18 9 107 39.26 512.66

iop0/pdisk000 18.43 2.66 8 132 31.36 24.10

iop0/pdisk020 317.08 1.11 11 165 31.95 294.02

iop0/pdisk021 590.88 1.34 27 518 96.26 219.96

iop0/pdisk040 34.94 1.64 18 113 43.70 19.58

iop0/pdisk041 45.33 1.17 20 79 3.73 22.89



Note: The "-d" option reports activity for each block device, The following is an explanation on the output. %busy, avque - portion of time device was busy servicing a transfer request, average number of requests outstanding during that time; r+w/s, blks/s - number of data transfers from or to device, number of bytes transferred in 512-byte units; avwait, avserv - average time (in ms) that transfer requests wait idly on queue, and average time to be serviced. There is a relationship between the number of blocks transferred per second and the average wait time. You can use this to identify which disks are heavily used and which are

underutilized.



3. sar -b 5 5 (ATT)

15:52:57 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s

15:53:12 0 2 90 1 2 38 0 0



Note: The "-b" option indicates the overall health of the IO subsystem. The %rcache should be greater than 90% and %wcache should be greater than 60%. If this is not the case, your system may be bound by disk IO. The sum of bread, bwrit, pread, and pwrit gives a good indicator of how well your file subsystem is doing. The sum should not be greater than 40 for 2 drives and

60 for 4-8 drives. If you exceed these values, your system may be IO bound. For more information on this refer to pg 2-22 of the Oracle V7 technical reference guide.



When analyzing disk IO, make sure that you have balanced the load on your system. Here is a "wish" list of steps for designing a disk layout for Oracle:

1. Make sure that your logfiles and archived logfiles are NOT on the same disk as your datafiles. This is a basic safety precaution against disk failure.
2. Put your files on raw devices.
3. Allocate one disk for the User Data Tablespace.
4. Place Rollback, Index, and System Tablespaces on separate disks.
5. Consider using Raid level 5 disk striping to stripe your datafiles across separate disks. You can also use the above-mentioned UNIX commands to monitor the IO on your system and identify problem areas.



Other tools for system administratation are platform-dependent. For example, HP and SEQUENT provide a facility called MONITOR that gives much of the above information in a graphical format. ATT SVR 4 also provides a product called GSAR (graphical sar). Look in your operating system (OS) documentation for additional monitoring commands. As a further

reference, consider "SYSTEM PERFORMANCE TUNING" published by O'Reilly and Associates.





Conclusion: There are many ways to approach Oracle performance issues. A structured approach like the one discussed above will allow the system administrator to systematically analyze her system and identify any problem areas. Once this has been accomplished, measures can be taken to correct problems. Performance is subjective, so find out what is expected.