Veritas Cluster Debugging Tips

Initial Notes * Veritas Cluster Overview
Cluster Not Up - HELP * Find Current Status * Single System NOT Faulted * Check License
Clear Faults * Clearing Faults that will NOT clear * Reviewing Logfiles * Calling Support
Restart Services * DB Maintenance * Admin Network Maintenance
Shutting Down Machines * Manual Startup for Emergency

Initial Notes

Veritas Cluster Server (VCS) is high-availability software: when one server fails, its processes are switched over to another server. All database processes are run through the cluster, so it needs to run smoothly. Note that the Oracle processes should only be running on the active server. On monitoring tools, the procs light for whichever box is secondary should therefore be yellow, because Oracle is not running there, even though the cluster itself is running on both systems.
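As a quick sanity check of the point above, the sketch below reads ps -ef output and reports whether an Oracle instance appears to be running on the node. It assumes that a running instance shows an ora_pmon_<SID> background process; the function name oracle_state is made up for this example. Run it on both boxes: only the active node should report "running".

```shell
#!/bin/sh
# Report whether an Oracle instance appears to be running on this node.
# Assumption: a running instance has an ora_pmon_<SID> background process.
oracle_state() {
    # Reads `ps -ef` output on stdin. The [o] bracket keeps the pattern from
    # matching this grep's own command line in the process list.
    if grep -q '[o]ra_pmon_'; then
        echo "running"
    else
        echo "not running"
    fi
}

# Usage on each node:  ps -ef | oracle_state
```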

Cluster Not Up -- HELP

To find out the current status:

If hastatus fails on both machines (it reports that the cluster is not up, or returns nothing), try to start the cluster.
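A minimal sketch of that check for one node, assuming hastatus -summary prints nothing or exits non-zero when the cluster engine is down (verify this on your VCS version). It starts the engine with hastart, the same command used in the manual start/stop section. The helper names run and start_if_down and the DRY_RUN switch are hypothetical; with DRY_RUN=1 the command is printed instead of executed.

```shell
#!/bin/sh
# Start the VCS engine on this node if hastatus shows the cluster is down.
# Assumption: hastatus -summary exits non-zero or prints nothing in that case.
# Set DRY_RUN=1 to print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

start_if_down() {
    if out=$(hastatus -summary 2>/dev/null) && [ -n "$out" ]; then
        echo "cluster already up"
    else
        run hastart
    fi
}

# Usage, on each node in turn:  start_if_down
```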

Starting a Single System That Is NOT Faulted

hagrp -online group -sys desired-system

If the group did NOT come online, did you check the licenses?

Bringing up machines when a fault will NOT clear:

gedb002# hastatus -summary

-- SYSTEM STATE
-- System               State                Frozen

A  gedb001              RUNNING              0
A  gedb002              RUNNING              0

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State         

B  oragrp          gedb001              Y          N               OFFLINE       
B  oragrp          gedb002              Y          N               OFFLINE       

gedb002#  hares -display | grep  ONLINE
nic-qfe3  State           gedb001   ONLINE
nic-qfe3  State           gedb002   ONLINE

gedb002# vxdg list
NAME         STATE           ID
rootdg       enabled  957265489.1025.gedb002

gedb001# vxdg list
NAME         STATE           ID
rootdg       enabled  957266358.1025.gedb001
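Reading the output above: both systems are RUNNING, but oragrp is OFFLINE on both, only the NIC resource is ONLINE, and vxdg list shows nothing but rootdg on either node. The awk sketch below flags the first two conditions from hastatus -summary output. It assumes the column layout shown above (system lines start with A, group lines start with B, and the group state is the sixth field); the function name diagnose is hypothetical.

```shell
#!/bin/sh
# Flag problems in `hastatus -summary` output read on stdin.
# Assumes the layout shown above: "A" lines are systems (state in field 3),
# "B" lines are groups (group in field 2, state in field 6).
diagnose() {
    awk '
        $1 == "A" && $3 != "RUNNING" { print "system " $2 " is " $3 }
        $1 == "B" { seen[$2] = 1; if ($6 == "ONLINE") up[$2] = 1 }
        END {
            for (g in seen)
                if (!(g in up)) print "group " g " is not ONLINE on any system"
        }
    '
}

# Usage:  hastatus -summary | diagnose
```

Fed the gedb002 output above, it reports that oragrp is not ONLINE on any system.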

Recovery Commands:

Reviewing Log Files:

If you are still having trouble, look at the logs in /var/VRTSvcs/log. Start with the most recent ones (ls -ltr lists them oldest first, so the newest are at the bottom). Here is a short description of the logs in /var/VRTSvcs/log:

engine.log_A: primary log, usually what you will be reading for debugging

Oracle_A: oracle process log (related to cluster only)

Sqlnet_A: sqlnet process log (related to cluster only)

IP_A: related to shared IP

Volume_A: related to Volume Manager

Mount_A: related to mounting the actual filesystems

DiskGroup_A: related to Volume Manager/Cluster Server

NIC_A: related to actual network device

By looking at the most recent logs, you can tell what failed last. You can also tell what did NOT run, which may be just as much of a clue. If none of this helps, open a call with Veritas tech support.
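The log review can be scripted along these lines. This is a sketch: LOGDIR defaults to /var/VRTSvcs/log, ls -t lists newest first (the reverse of ls -ltr), and the error|fault keyword match is an assumption about what is worth reading in engine.log_A, not an official VCS message format. The function names are hypothetical.

```shell
#!/bin/sh
# Show the newest VCS logs and the error/fault lines in the engine log.
LOGDIR=${LOGDIR:-/var/VRTSvcs/log}

recent_logs() {
    # $1 = log directory, $2 = number of files (default 5), newest first
    ls -t "$1" | head -n "${2:-5}"
}

scan_engine_log() {
    # Last 20 lines of engine.log_A mentioning an error or fault
    # (keyword match is an assumption, not a VCS message format).
    grep -i -E 'error|fault' "$1/engine.log_A" | tail -n 20
}

# Usage:
#   recent_logs "$LOGDIR" 5
#   scan_engine_log "$LOGDIR"
```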

Calling Tech Support:

If you have tried the previously described debugging methods, call Veritas tech support: 800-634-4747. Your company needs to have a Veritas support contract.

Restarting Services:

If a system is shut down gracefully while it is running Oracle or other high-availability services, VCS will NOT transfer those services to the other system. It only transfers services when a system crashes or has an error.

Doing Maintenance on DBs:

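BEFORE working on the DBs:

Run hastop -all -force. This stops the cluster engine on all nodes but leaves Oracle and the other resources running, so VCS will not try to fail over or restart the database while the DBAs work on it.

AFTER working on the DBs: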

Once Oracle is up, run:

Shutting down db machines:

If you shut down the machine that is running the clustered services (Oracle), Veritas Cluster will NOT start them on the other machine. It only fails over if the machine crashes. If you shut down the machine, you need to switch the services manually. To switch processes:
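One way to move the services by hand is hagrp -switch, which takes a service group offline on its current system and brings it online on the named one. The sketch assumes the group is oragrp and the target is gedb001 (names taken from the status output earlier); switch_group, run and DRY_RUN are hypothetical helpers.

```shell
#!/bin/sh
# Switch a service group to another node before a planned shutdown.
# Set DRY_RUN=1 to print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

switch_group() {
    # $1 = service group (e.g. oragrp), $2 = system to move it to
    run hagrp -switch "$1" -to "$2"
}

# Usage, on the node you are about to shut down:
#   switch_group oragrp gedb001
#   hastatus -summary    # confirm oragrp is ONLINE on gedb001 first
```

Alternatively, hastop -local -evacuate stops VCS on the local node and migrates its service groups to another system before the engine exits.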

Doing Maintenance on Admin Network:

If the admin network that Veritas Cluster uses is brought down, VCS WILL fault both machines AND bring down Oracle (cleanly). You will need to do the following to recover:

Manual start/stop WITHOUT veritas cluster:

THIS IS ONLY USED WHEN THERE ARE DB FAILURES

If possible, use the DB Maintenance procedure instead. Use this procedure only if the system fails while coming up AND you KNOW that the failure is due to a DB configuration error. If you start the filesystems and Oracle manually, shut them down manually when you are done and restart the cluster using hastart.

To start up:

Make sure that ONLY the rootdg disk group is imported on BOTH nodes [i.e. oradg or xxoradg is NOT present in vxdg list]. This is EXTREMELY important: if the Oracle disk group is imported on both nodes at once, data corruption occurs.
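A minimal check for this, to run on EACH node. It reads vxdg list output (a header line, then one disk group per line, as in the example earlier) and exits non-zero if anything other than rootdg is imported. The function name only_rootdg is hypothetical.

```shell
#!/bin/sh
# Exit non-zero if any disk group other than rootdg is imported.
# Reads `vxdg list` output on stdin; the first line is the NAME/STATE/ID header.
only_rootdg() {
    awk '
        NR > 1 && NF > 0 && $1 != "rootdg" { print "imported: " $1; bad = 1 }
        END { exit bad + 0 }
    '
}

# Usage, on EACH node:
#   vxdg list | only_rootdg && echo "OK: only rootdg is imported"
```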

Once you have confirmed that the Oracle disk group is not imported on either node, do the following on ONE machine only:

vxdg import oradg [this may be xxoradg, where xx is the client's two-character code]

vxvol -g oradg startall

mount -F vxfs /dev/vx/dsk/oradg/volume /mountpoint [find the volume names and mount points in /etc/VRTSvcs/conf/config/main.cf; repeat for each volume]

Let the DBAs do their work.

To shut down:

umount /mountpoint [for each mountpoint]

vxvol -g oradg stopall [stop the volumes BEFORE deporting the disk group]

vxdg deport oradg

Clear any faults, then start the cluster as described above.
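The start and stop steps above can be collected into one sketch so the order is hard to get wrong: import, start volumes, mount on the way up; unmount, stop volumes, deport on the way down. DG and MOUNTS are placeholders (the volume:mountpoint pair oravol:/oracle is invented for illustration); take the real disk group (oradg or xxoradg), volume names and mount points from /etc/VRTSvcs/conf/config/main.cf. The function names are hypothetical, and with DRY_RUN=1 the sketch only prints the commands.

```shell
#!/bin/sh
# Manual start/stop order for the Oracle disk group (run on ONE node only,
# after confirming only rootdg is imported on both nodes).
DG=${DG:-oradg}
MOUNTS=${MOUNTS:-"oravol:/oracle"}   # hypothetical "volume:mountpoint" pairs

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

manual_start() {
    run vxdg import "$DG"
    run vxvol -g "$DG" startall
    for pair in $MOUNTS; do
        run mount -F vxfs "/dev/vx/dsk/$DG/${pair%%:*}" "${pair#*:}"
    done
}

manual_stop() {
    # Unmount nested mount points first; this sketch uses list order.
    for pair in $MOUNTS; do
        run umount "${pair#*:}"
    done
    run vxvol -g "$DG" stopall   # stop the volumes before deporting
    run vxdg deport "$DG"
}
```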

A wonderful reference book for Veritas Clusters is:

Shared Data Clusters: Scaleable, Manageable, and Highly Available Systems (VERITAS Series)



Please email additional sites, redirected/broken link info, suggestions & questions to:

webmaster@unixtools.com
