In sweet memories of my ever loving brother "kutty thambi " ARUN KUMAR

Friday, August 20, 2010

Recover Corrupt/Missing OCR and Voting Disk with No Backup


It happens. Not very often, but it can happen. You are faced with a corrupt or missing Oracle Cluster Registry (OCR) and have no backup to recover from.

In this case, the clusterware alert log /u01/crs/oracle/product/10.2.0/crs/log/rac1/alertrac1.log reported:

[client(20186)]CRS-1006:The OCR location /dev/raw/raw1 is inaccessible.
Details in /u01/crs/oracle/product/10.2.0/crs/log/rac1/client/ocrcheck_20186.log.

Contents of ocrcheck_20186.log:
2010-08-20 12:19:19.796: [ default][4143896256]a_init:7!: Backend init unsuccessful : [22]
2010-08-20 12:19:19.804: [OCRCHECK][4143896256]Failed to initialize OCR context:
[PROC-22: The OCR backend has an invalid format]
2010-08-20 12:19:19.804: [OCRCHECK][4143896256]Failed to initialize ocrchek2
2010-08-20 12:19:19.804: [OCRCHECK][4143896256]Exiting [status=failed].

[root@rac1 bin]# ./ocrcheck
PROT-601: Failed to initialize ocrcheck
[root@rac1 bin]# ./crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

[root@rac1 bin]#
[root@rac1 bin]# ./crsctl query css votedisk
OCR initialization failed with invalid format: PROC-22: The OCR backend has an invalid format
[root@rac1 bin]#

[root@rac2 bin]# ./crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
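The symptoms above can be checked mechanically. The following is a minimal sketch, not an Oracle-supplied tool: the function name ocr_looks_corrupt is hypothetical, and it only matches the error codes that appear in the transcripts in this post.

```shell
#!/bin/sh
# Sketch: classify clusterware tool or log output as "OCR unusable" or not.
# Only the error codes seen in this post are matched; other failures
# would need their own patterns.

ocr_looks_corrupt() {
    # Reads output on stdin; returns 0 if any of the OCR failure codes appears.
    grep -Eq 'PROC-22|PROT-601|CRS-1006'
}

# Example usage on a node (run from the CRS bin directory):
#   ./ocrcheck 2>&1 | ocr_looks_corrupt && echo "OCR appears corrupt or inaccessible"
```

A match only says that one of these codes was printed. It does not tell you whether a usable backup exists, so check for backups before going any further.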

Before proceeding, verify that all databases are shut down and that no clients are connected.
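One way to confirm that nothing is still running is to look for PMON background processes on each node. This is a rough sketch under the assumption that the processes follow the usual naming (ora_pmon_<SID> for databases, asm_pmon_<SID> for ASM); the function name running_instances is made up for this example.

```shell
#!/bin/sh
# Sketch: list Oracle instances that still have a PMON background process.
# Assumes process names of the form ora_pmon_<SID> and asm_pmon_<SID>.

running_instances() {
    # Reads one command line per input line (for example from `ps -e -o args`)
    # and prints the SID of every PMON process found.
    sed -n -e 's/^ora_pmon_//p' -e 's/^asm_pmon_//p'
}

# Example usage on each node:
#   ps -e -o args | running_instances
# No output means no database or ASM instance is running on that node.
```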

1. Execute rootdelete.sh on all nodes.

The rootdelete.sh script is located at $ORA_CRS_HOME/install/rootdelete.sh on every node in the cluster.

[root@rac1 install]# pwd
/u01/crs/oracle/product/10.2.0/crs/install
[root@rac1 install]# ./rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
OCR initialization failed with invalid format: PROC-22: The OCR backend has an invalid format
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in '/etc/oracle/scls_scr'
[root@rac1 install]#

[root@rac2 install]# pwd
/u01/crs/oracle/product/10.2.0/crs/install
[root@rac2 install]# ./rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
OCR initialization failed with invalid format:
PROC-22: The OCR backend has an invalid format
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in '/etc/oracle/scls_scr'
[root@rac2 install]#

2. Run rootdeinstall.sh on the primary node.


[root@rac1 install]# ./rootdeinstall.sh

Removing contents from OCR device
2560+0 records in
2560+0 records out

3. Run root.sh on the primary node (the same node as above), then on each remaining node.

[root@rac1 install]# cd ..
[root@rac1 crs]# ./root.sh
WARNING: directory '/u01/crs/oracle/product/10.2.0' is not owned by root
WARNING: directory '/u01/crs/oracle/product' is not owned by root
WARNING: directory '/u01/crs/oracle' is not owned by root
WARNING: directory '/u01/crs' is not owned by root
WARNING: directory '/u01' is not owned by root
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/u01/crs/oracle/product/10.2.0' is not owned by root
WARNING: directory '/u01/crs/oracle/product' is not owned by root
WARNING: directory '/u01/crs/oracle' is not owned by root
WARNING: directory '/u01/crs' is not owned by root
WARNING: directory '/u01' is not owned by root
assigning default hostname rac1 for node 1.
assigning default hostname rac2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: rac1 rac1-priv rac1
node 2: rac2 rac2-priv rac2
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /dev/raw/raw2
Format of 1 voting devices complete.
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
rac1
CSS is inactive on these nodes.
rac2
Local node checking complete.
Run root.sh on remaining nodes to start CRS daemons.
[root@rac1 crs]#
[root@rac2 crs]# pwd
/u01/crs/oracle/product/10.2.0/crs
[root@rac2 crs]# ./root.sh
WARNING: directory '/u01/crs/oracle/product/10.2.0' is not owned by root
WARNING: directory '/u01/crs/oracle/product' is not owned by root
WARNING: directory '/u01/crs/oracle' is not owned by root
WARNING: directory '/u01/crs' is not owned by root
WARNING: directory '/u01' is not owned by root
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/u01/crs/oracle/product/10.2.0' is not owned by root
WARNING: directory '/u01/crs/oracle/product' is not owned by root
WARNING: directory '/u01/crs/oracle' is not owned by root
WARNING: directory '/u01/crs' is not owned by root
WARNING: directory '/u01' is not owned by root
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
assigning default hostname rac1 for node 1.
assigning default hostname rac2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: rac1 rac1-priv rac1
node 2: rac2 rac2-priv rac2
clscfg: Arguments check out successfully.

NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster
configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
rac1
rac2
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Waiting for the Oracle CRSD and EVMD to start
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
The given interface(s), "eth0" is not public. Public interfaces should be used to configure virtual IPs.


Oracle 10.2.0.1 users should note that running root.sh on the last node will fail. Most notably, the silent-mode VIPCA configuration fails because of BUG 4437727 in 10.2.0.1.

To work around this error, run vipca manually from the last node:

[root@rac2 bin]# pwd
/u01/crs/oracle/product/10.2.0/crs/bin
[root@rac2 bin]# ./vipca &

When the "VIP Configuration Assistant" appears, this is how I answered the screen prompts:

Welcome: Click Next
Network interfaces: Select only the public interface - eth0
Virtual IPs for cluster nodes:
Node Name: rac1
IP Alias Name: rac1-vip.localdomain
IP Address: 192.168.1.111
Subnet Mask: 255.255.255.0

Node Name: rac2
IP Alias Name: rac2-vip.localdomain
IP Address: 192.168.1.112
Subnet Mask: 255.255.255.0

Summary: Click Finish
Configuration Assistant Progress Dialog: Click OK after configuration is complete.
Configuration Results: Click Exit
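The order of operations in steps 1 to 3 matters, so it can help to write it down before touching anything. The sketch below only prints the plan and runs nothing. recovery_plan is a hypothetical helper, and it assumes the first node you pass is the primary node.

```shell
#!/bin/sh
# Sketch: print the order of operations for steps 1-3 without executing them.
# The first argument is treated as the primary node; the remaining
# arguments are the other cluster nodes.

recovery_plan() {
    primary=$1
    # Step 1: rootdelete.sh on every node.
    for node in "$@"; do
        echo "$node: \$ORA_CRS_HOME/install/rootdelete.sh"
    done
    # Step 2: rootdeinstall.sh on the primary node only.
    echo "$primary: \$ORA_CRS_HOME/install/rootdeinstall.sh"
    # Step 3: root.sh on the primary node first, then on the remaining nodes.
    for node in "$@"; do
        echo "$node: \$ORA_CRS_HOME/root.sh"
    done
}
```

Calling `recovery_plan rac1 rac2` prints five lines, one per command, each prefixed with the node it should be run on as root.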


4. Configure server-side ONS using racgons.

The general syntax is:

CRS_home/bin/racgons add_config hostname1:port hostname2:port

[root@rac1 crs]# cd bin
[root@rac1 bin]# pwd
/u01/crs/oracle/product/10.2.0/crs/bin
[root@rac1 bin]# ./racgons add_config rac1:6200 rac2:6200
[root@rac1 bin]# ./onsctl ping
Number of onsconfiguration retrieved, numcfg = 2
onscfg[0]
{node = rac1, port = 6200}
Adding remote host rac1:6200
GETHOSTBYNAME(rac1): 2
onscfg[1]
{node = rac2, port = 6200}
Adding remote host rac2:6200
GETHOSTBYNAME(rac2): 2
ons is running ...
[root@rac1 bin]#
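With more than a couple of nodes, the host:port list for racgons is easy to mistype. A small sketch for building it, assuming every node uses the same ONS port as in the example above (ons_config_args is a hypothetical name):

```shell
#!/bin/sh
# Sketch: build the "host:port host:port ..." argument list for
# `racgons add_config`, assuming one common port for all nodes.

ons_config_args() {
    port=$1
    shift
    args=
    for node in "$@"; do
        # Append "node:port", separated by a single space.
        args="${args:+$args }$node:$port"
    done
    echo "$args"
}

# Example usage (from the CRS bin directory):
#   ./racgons add_config $(ons_config_args 6200 rac1 rac2)
```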

5. Configure network interfaces for Clusterware.

Log in as the owner of the Oracle Clusterware software, which is typically the oracle user account, and configure all network interfaces. First identify the current interfaces and subnets with oifcfg iflist, then register each one with oifcfg setif and confirm the result with oifcfg getif.

[root@rac1 bin]# pwd
/u01/crs/oracle/product/10.2.0/crs/bin
[root@rac1 bin]# ./oifcfg iflist
eth0 192.168.1.0
eth1 192.168.2.0
[root@rac1 bin]# ./oifcfg setif -global eth0/192.168.1.0:public
[root@rac1 bin]# ./oifcfg setif -global eth1/192.168.2.0:cluster_interconnect
[root@rac1 bin]# ./oifcfg getif
eth0 192.168.1.0 global public
eth1 192.168.2.0 global cluster_interconnect
[root@rac1 bin]#
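The setif commands follow directly from the iflist output, so they can be generated rather than typed. This is only a sketch: setif_commands is a hypothetical helper, and it assumes that every interface other than the one you name as public is a private interconnect, which holds for the two-interface layout here but not necessarily elsewhere.

```shell
#!/bin/sh
# Sketch: turn `oifcfg iflist` output ("<interface> <subnet>" per line)
# into the matching `oifcfg setif` commands.

setif_commands() {
    # $1 = name of the public interface. Every other listed interface is
    # ASSUMED to be the cluster interconnect.
    awk -v pub="$1" 'NF >= 2 {
        role = ($1 == pub) ? "public" : "cluster_interconnect"
        printf "oifcfg setif -global %s/%s:%s\n", $1, $2, role
    }'
}

# Example usage (review the output before running any of it):
#   ./oifcfg iflist | setif_commands eth0
```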

6. Add TNS listener using NETCA.

As the Oracle Clusterware software owner (typically oracle), add a cluster TNS listener configuration to the OCR using netca. netca may report errors if listener.ora already contains the entries. If so, move listener.ora to /tmp from $ORACLE_HOME/network/admin, or from the $TNS_ADMIN directory if the TNS_ADMIN environment variable is defined, and then run netca. Add all the listeners that were configured during the original Oracle Clusterware software installation.

[oracle@rac2 admin]$ mv listener.ora /tmp/listener.ora.original

[oracle@rac1 admin]$ mv listener.ora /tmp/listener.ora.original
[oracle@rac1 admin]$ netca &

When the Network Configuration Assistant appears, answer the screen prompts as follows (screen name: response):

Select the Type of Oracle Net Services Configuration: Select Cluster configuration
Select the nodes to configure: Select all
Type of Configuration: Select Listener configuration.
Listener Configuration (next 6 screens): The following screens are like any other normal listener configuration. You can simply accept the default parameters for the next six screens:
  What do you want to do: Add
  Listener name: LISTENER
  Selected protocols: TCP
  Port number: 1521
  Configure another listener: No
  Listener configuration complete! [ Next ]
  You will be returned to the Welcome (Type of Configuration) screen.
Type of Configuration: Select Naming Methods configuration.
Naming Methods Configuration: The following screens are:
  Selected Naming Methods: Local Naming
  Naming Methods configuration complete! [ Next ]
  You will be returned to the Welcome (Type of Configuration) screen.
Type of Configuration: Click Finish to exit NETCA.
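The rule for where listener.ora lives (TNS_ADMIN if defined, otherwise under ORACLE_HOME) can be sketched in a few lines of shell. The function name listener_ora_dir is hypothetical; the rule itself is the one described in this step.

```shell
#!/bin/sh
# Sketch: work out which directory holds listener.ora.
# TNS_ADMIN wins when it is set and non-empty; otherwise fall back to
# the default location under ORACLE_HOME.

listener_ora_dir() {
    echo "${TNS_ADMIN:-$ORACLE_HOME/network/admin}"
}

# Example usage (as the software owner, on each node, before running netca):
#   f="$(listener_ora_dir)/listener.ora"
#   [ -f "$f" ] && mv "$f" /tmp/listener.ora.original
```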

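7. Add all resources back to OCR using srvctl

Before any resources are added back, crs_stat shows only the node applications (listener, GSD, ONS and VIP):
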
[root@rac1 bin]# ./crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....C1.lsnr application ONLINE ONLINE rac1
ora.rac1.gsd application ONLINE ONLINE rac1
ora.rac1.ons application ONLINE ONLINE rac1
ora.rac1.vip application ONLINE ONLINE rac1
ora....C2.lsnr application ONLINE ONLINE rac2
ora.rac2.gsd application ONLINE ONLINE rac2
ora.rac2.ons application ONLINE ONLINE rac2
ora.rac2.vip application ONLINE ONLINE rac2

As a final step, log in as the Oracle Clusterware software owner (typically oracle) and add all resources back to the OCR using the srvctl command.

Please ensure that these commands are not run as the root user account.

Add ASM INSTANCE(S) to OCR:

srvctl add asm -n <node_name> -i <asm_instance_name> -o <oracle_home>

[oracle@rac1 bin]$ pwd
/u01/crs/oracle/product/10.2.0/crs/bin
[oracle@rac1 bin]$ ./srvctl add asm -i +ASM1 -n rac1 -o /u01/app/oracle/product/10.2.0/db_1
[oracle@rac1 bin]$ ./srvctl add asm -i +ASM2 -n rac2 -o /u01/app/oracle/product/10.2.0/db_1
[oracle@rac1 bin]$





Add DATABASE to OCR:

srvctl add database -d <db_name> -o <oracle_home>

[oracle@rac1 bin]$ ./srvctl add database -d cdbs -o /u01/app/oracle/product/10.2.0/db_1

Add INSTANCE(S) to OCR:

srvctl add instance -d <db_name> -i <instance_name> -n <node_name>

[oracle@rac1 bin]$ ./srvctl add instance -d cdbs -i cdbs1 -n rac1
[oracle@rac1 bin]$ ./srvctl add instance -d cdbs -i cdbs2 -n rac2

Add SERVICE(S) to OCR:

srvctl add service -d <db_name> -s <service_name> -r <preferred_instances> -P <TAF_policy>

where <TAF_policy> is set to NONE, BASIC, or PRECONNECT.

[oracle@rac1 bin]$ ./srvctl add service -d cdbs -s cdbs_srvc -r cdbs1,cdbs2 -P BASIC
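On a cluster with many nodes, the ASM, database and instance commands above follow a regular pattern and can be generated. The sketch below only prints the commands. srvctl_add_commands is a hypothetical helper, and it assumes instances are named <db><n> and ASM instances +ASM<n>, numbered in node order as on this cluster; if your naming differs, the output will be wrong.

```shell
#!/bin/sh
# Sketch: print the srvctl add commands for ASM, the database and its
# instances. Nothing is executed.
# ASSUMPTION: instance names are <db><n> and +ASM<n>, in node order.

srvctl_add_commands() {
    # $1 = database name, $2 = Oracle home, remaining args = node names.
    db=$1
    home=$2
    shift 2
    n=0
    for node in "$@"; do
        n=$((n + 1))
        echo "srvctl add asm -n $node -i +ASM$n -o $home"
    done
    echo "srvctl add database -d $db -o $home"
    n=0
    for node in "$@"; do
        n=$((n + 1))
        echo "srvctl add instance -d $db -i $db$n -n $node"
    done
}

# Example usage:
#   srvctl_add_commands cdbs /u01/app/oracle/product/10.2.0/db_1 rac1 rac2
```

Services are left out on purpose, since each one needs its own name, preferred instance list and TAF policy.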

After completing the steps above, the OCR should have been successfully recreated. Bring up all of the resources that were added to the OCR and run cluvfy to verify the cluster configuration.

[oracle@rac1 bin]$ ./crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....s1.inst application OFFLINE OFFLINE
ora....s2.inst application OFFLINE OFFLINE
ora....bs1.srv application OFFLINE OFFLINE
ora....bs2.srv application OFFLINE OFFLINE
ora....srvc.cs application OFFLINE OFFLINE
ora.cdbs.db application OFFLINE OFFLINE
ora....SM1.asm application OFFLINE OFFLINE
ora....C1.lsnr application ONLINE ONLINE rac1
ora.rac1.gsd application ONLINE ONLINE rac1
ora.rac1.ons application ONLINE ONLINE rac1
ora.rac1.vip application ONLINE ONLINE rac1
ora....SM2.asm application OFFLINE OFFLINE
ora....C2.lsnr application ONLINE ONLINE rac2
ora.rac2.gsd application ONLINE ONLINE rac2
ora.rac2.ons application ONLINE ONLINE rac2
ora.rac2.vip application ONLINE ONLINE rac2

[oracle@rac1 bin]$ srvctl start asm -n rac1
[oracle@rac1 bin]$ srvctl start asm -n rac2
[oracle@rac1 bin]$ srvctl start database -d cdbs
[oracle@rac1 bin]$ srvctl start service -d cdbs
[oracle@rac1 bin]$ ./crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....s1.inst application ONLINE ONLINE rac1
ora....s2.inst application ONLINE ONLINE rac2
ora....bs1.srv application ONLINE ONLINE rac1
ora....bs2.srv application ONLINE ONLINE rac2
ora....srvc.cs application ONLINE ONLINE rac1
ora.cdbs.db application ONLINE ONLINE rac2
ora....SM1.asm application ONLINE ONLINE rac1
ora....C1.lsnr application ONLINE ONLINE rac1
ora.rac1.gsd application ONLINE ONLINE rac1
ora.rac1.ons application ONLINE ONLINE rac1
ora.rac1.vip application ONLINE ONLINE rac1
ora....SM2.asm application ONLINE ONLINE rac2
ora....C2.lsnr application ONLINE ONLINE rac2
ora.rac2.gsd application ONLINE ONLINE rac2
ora.rac2.ons application ONLINE ONLINE rac2
ora.rac2.vip application ONLINE ONLINE rac2
[oracle@rac1 bin]$
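Rather than reading the crs_stat -t table by eye, the State column can be filtered. A minimal sketch, assuming the five-column layout shown above (Name, Type, Target, State, Host); not_online is a made-up name:

```shell
#!/bin/sh
# Sketch: list resources from `crs_stat -t` output whose State is not ONLINE.
# Assumes the column layout: Name Type Target State [Host].

not_online() {
    # The header row starts with "Name"; the dashed separator has a single
    # field, so NF >= 4 skips it along with blank lines.
    awk '$1 != "Name" && NF >= 4 && $4 != "ONLINE" { print $1 }'
}

# Example usage:
#   ./crs_stat -t | not_online
# No output means every listed resource reports ONLINE.
```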

[oracle@rac1 bin]$ cluvfy stage -post crsinst -n rac1,rac2

Performing post-checks for cluster services setup

Checking node reachability...
Node reachability check passed from node "rac1".


Checking user equivalence...
User equivalence check passed for user "oracle".

Checking Cluster manager integrity...


Checking CSS daemon...
Daemon status check passed for "CSS daemon".

Cluster manager integrity check passed.

Checking cluster integrity...


Cluster integrity check passed


Checking OCR integrity...

Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations.

Uniqueness check for OCR device passed.

Checking the version of OCR...
OCR of correct Version "2" exists.

Checking data integrity of OCR...
Data integrity check for OCR passed.

OCR integrity check passed.

Checking CRS integrity...

Checking daemon liveness...
Liveness check passed for "CRS daemon".

Checking daemon liveness...
Liveness check passed for "CSS daemon".

Checking daemon liveness...
Liveness check passed for "EVM daemon".

Checking CRS health...
CRS health check passed.

CRS integrity check passed.

Checking node application existence...


Checking existence of VIP node application (required)
Check passed.

Checking existence of ONS node application (optional)
Check passed.

Checking existence of GSD node application (optional)
Check passed.


Post-check for cluster services setup was successful.
[oracle@rac1 bin]$
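If you want to script the final check, the summary line at the end of the cluvfy output is the simplest thing to test for. This sketch matches the exact wording seen in the transcript above and nothing else, so treat it as a convenience and not a substitute for reading the full report. cluvfy_passed is a hypothetical name.

```shell
#!/bin/sh
# Sketch: succeed only if the cluvfy post-check summary line from this
# post's transcript is present in the output read from stdin.

cluvfy_passed() {
    grep -q 'Post-check for cluster services setup was successful'
}

# Example usage:
#   cluvfy stage -post crsinst -n rac1,rac2 | cluvfy_passed && echo "post-check OK"
```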




Source and reference:
http://www.idevelopment.info/data/Oracle/DBA_tips/Oracle10gRAC/CLUSTER_70.shtml#Recover%20Corrupt/Missing%20OCR

6 comments:

Anonymous said...

Very nice posting...really saved my day..."ur da man" ....

Anonymous said...

Hi,

very nice post.

How do I find out which node is the primary node?

thx.

Rajeshkumar Govindarajan said...

refer this link:-
http://www.oracleracexpert.com/2010/08/how-to-find-master-node-in-oracle-rac.html

chetan said...

Hi, I am dealing with a live environment in which my database has data, and I faced this problem. Will my data be lost by reinstalling?

chetan said...

Hi, I am dealing with a live environment and faced the same error. If I reinstall, will I lose my data?

Anonymous said...

Hi, can you please tell me how to recover from a missing OCR/voting disk with no backups in the case of 11gR2?

 