You can read a sample of this e-Book before making your purchase:
I. Detecting network bottlenecks
II. Troubleshooting latency
III. Analyzing the effect of workload on resources
IV. Making the best use of resources
I will start with an in-depth look at the I/O data path from the application through the operating system. SAN storage performance analysis must also be done in conjunction with all the other components in the stack:
I. OS
II. Database
III. Application
IV. Network
So it is important for a SAN administrator to learn about all of these components.
I will monitor these:
I. Performance of the entire array
II. Individual controllers
III. Virtual disks
IV. Storage systems
V. Ports
VI. Processors
VII. LDEVs
VIII. HBAs (QLogic and Emulex)
I will recommend dynamic sparing technology, which copies data to a hot spare when a drive shows signs of failing, avoiding a full RAID rebuild after the drive actually fails.
I will tune sd_max_throttle and maxphys. sd_max_throttle limits the number of commands the sd driver will queue to a single device, and maxphys determines the maximum number of bytes that can be transferred per SCSI transaction.
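On Solaris both can be set in /etc/system; a minimal sketch with illustrative values (the appropriate sd_max_throttle is usually dictated by the array vendor's guidance):
set maxphys=1048576
set sd:sd_max_throttle=32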
Time(workload) = Time(CPU) + Time(I/O) - Time(Overlap)
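For example, a job that uses 10 seconds of CPU and 8 seconds of I/O, with 3 seconds where the two overlap, completes in 10 + 8 - 3 = 15 seconds of elapsed time.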
Response time = Queue time + Device service time
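So a disk with a 5 ms service time whose requests wait an average of 15 ms in the queue delivers a 20 ms response time; shrinking the queue often helps more than a marginally faster disk.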
20% to 30%
I. I/O read/write ratios
II. I/O rates and data transfer rates
III. Cache read hit ratios
I will use host-based logical volume manager striping and build the volume across multiple LUNs.
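A minimal sketch using Solaris Volume Manager (the metadevice and disk names are placeholders); this builds a four-way stripe with a 64 KB interlace across LUNs seen on four different controllers, then puts a file system on it:
metainit d10 1 4 c1t0d0s0 c2t0d0s0 c3t0d0s0 c4t0d0s0 -i 64k
newfs /dev/md/rdsk/d10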
I will map the LUN across multiple host paths.
I will tune the HBA LUN queue depth for optimal I/O performance.
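As one sketch, assuming the Emulex lpfc driver on Solaris (the parameter name varies by driver and version, so check the HBA vendor's documentation; QLogic has its own equivalent settings), the per-LUN queue depth is set in /kernel/drv/lpfc.conf:
lun-queue-depth=32;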
I. Identify bottlenecks
II. Eliminate bottlenecks
III. Monitoring
IV. It should always run on the system as a background task.
V. It should not interfere with any normal system operations.
VI. It should report on three main areas: CPU utilization, memory utilization, and physical disk performance.
More than one application or thread writing to a LUN or disk at the same time
Data in the write cache being written to disk
I/O size, read/write ratio, random vs. sequential mix, and data sets that have “linked” contention
Examples of “linked” contention:
- RDBMS distributed tables
- Multiple components of the same DB
- Snapshots/Clones
- Switch to faster disk drives.
- Migrate from RAID5 to RAID1.
- Spread the load across more RAID Groups.
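The RAID migration helps because RAID5 pays a write penalty of four back-end I/Os per random write (read old data, read old parity, write new data, write new parity) versus two for RAID1, so write-heavy workloads gain the most from it.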
Bandwidth throttling is a reactive measure to regulate network traffic and minimize bandwidth congestion. I have implemented throttling to control the number of requests a server responds to within a specified period of time.
Yes. I have a hardware-based RAID system on a SAN with LUNs presented to hosts running Oracle Database, and users were complaining about performance.
For NFS servers with a large set of files, the default value of ncsize may not be sufficient and might need to be increased. The ncsize parameter should be set in /etc/system. I will tune it to approximately double the number of active files.
set ncsize=1048576
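To verify that the larger cache is effective, the DNLC hit rate can be checked with vmstat; the cache-hits percentage in its output should stay high (above 90% is a common target):
vmstat -s | grep 'name lookups'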
I will tune TCP_KEEPALIVE_INTERVAL. The keepalive packet ensures that a connection stays in an active and established state.
I will use ndd:
ndd -set /dev/tcp tcp_keepalive_interval 600000
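Note that ndd settings do not survive a reboot; the usual practice is to repeat the command in a site startup script (for example, a script under /etc/rc2.d) so the value is reapplied at boot.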
I will change the following parameter when a high rate of incoming connection requests results in connection failures:
I. Check the current value: ndd -get /dev/tcp tcp_conn_req_max_q
II. Set the new value: ndd -set /dev/tcp tcp_conn_req_max_q 8000
III. Default value: 128 on Solaris 8.
IV. Recommended value: 8000
File system cache grows dynamically and steals memory pages from important applications.
Solution: Priority paging
The priority paging algorithm allows the system to place a boundary around the file cache, so that file system I/O does not cause paging of applications.
To enable priority paging, set the following in /etc/system:
set priority_paging=1
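One caveat worth noting: priority paging was designed for Solaris 7 and earlier releases; Solaris 8 replaced it with a cyclic page cache, so this setting should not be needed there.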
sar functions:
-a reports usage of file access system calls.
-b reports buffer cache usage and hit rate.
-c reports system calls.
-d reports block device activity.
-g reports paging activity (V.4 only).
-k reports kernel memory allocation activity (V.4 only).
-m reports message and semaphore activity.
-p reports paging activity.
-q reports average queue length waiting for CPU.
-r reports unused memory pages and disk blocks.
-u reports CPU utilization.
-v reports status of system tables.
-w reports swapping and paging activity.
-x reports RFS operation (V.4 only).
-y reports terminal activity.
-A reports all data.
-C reports RFS buffer caching overhead.
-Db reports buffer cache usage for RFS and local activity.
-Dc reports system calls separately for RFS and local activity.
-Du reports CPU utilization by RFS and local activity.
-S reports RFS server and request queue status.
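A typical invocation gives a sampling interval and a count; for example, to report block device activity every 5 seconds, 10 times:
sar -d 5 10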
If the database block size is smaller than the file system buffer size, then DBWn has to perform partial block writes. The database block size should therefore be equal to, or a whole multiple of, the file system buffer size.
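For example, on Solaris, where the file system buffer size is 8K (see the table below), an Oracle database should use an 8K or 16K db_block_size rather than 2K or 4K.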
I have observed that for the best performance, SGA (database block buffers + redo log buffer) + shared_pool_size + memory for user connections should be no more than half the physical RAM for all database instances combined.
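As a worked example under that rule of thumb: on a host with 16 GB of physical RAM running two instances, the combined SGAs, shared pools, and per-connection memory for both instances should stay within roughly 8 GB, leaving the rest for the OS, the file system cache, and process memory.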
Here are the file system buffer sizes for UNIX:
Buffer Size    Operating Systems
4K             AIX, Linux
8K             Solaris, HP-UX, Tru64 UNIX
16K            Reliant UNIX