一.背景说明
Oracle RMAN 备份的恢复分2个步骤:RESTRE 和 RECOVER。 在这个过程中,Recover 是依赖与归档文件的。 假设一种情况:周一对数据库做了全备,然后保留归档。周四发现数据库有异常,准备恢复,发现周二的时候少了一个归档。 按照正常的情况,我们只能将数据库恢复到周二缺失归档的之前的点。 那么我这里就是一个研究,如何跳过这个缺失的归档,让数据库继续进行Recover。 根据测试结果,Recover 是可以继续,但是测试的结果意义不是很大,因为还是有数据丢失。 所以这里更多的是对这种方法的抛砖引玉。 二.测试案例 2.1 使用RMAN 全备数据库此步骤直接备份即可。 2.2 创建测试表dave1并切换归档SQL> select sequence# from v$log wherethread#=1; SEQUENCE#---------- 152 151 SQL> create table dave1 as select * fromdba_users;Table created. SQL> alter system switch logfile;System altered. SQL> select sequence# from v$log wherethread#=1; SEQUENCE#---------- 152 153 2.3 创建测试表dave2并切换归档 SQL> create table dave2 as select * fromdba_users;Table created. SQL> alter system switch logfile;System altered. SQL> select sequence#,status from v$logwhere thread#=1; SEQUENCE# STATUS---------- ---------------- 154 CURRENT 153 ACTIVE SQL> select sequence# fromv$archived_log where thread#=1; SEQUENCE#---------- 148 149 150 151 152 153 6 rows selected. 2.4 删除153的归档 [oracle@dave arch]$ lltotal 42200-rw-r-----. 1 oracle oinstall 42715136Jul 5 22:56 1_125_816661296.dbf-rw-r-----. 1 oracle oinstall 248320 Jul 6 23:14 1_152_816661296.dbf-rw-r-----. 1 oracle oinstall 127488 Jul 6 23:15 1_153_816661296.dbf-rw-r-----. 1 oracle oinstall 113664 Jul 6 23:19 1_154_816661296.dbf[oracle@dave arch]$ rm-rf 1_153_816661296.dbf[oracle@dave arch]$ lltotal 42072-rw-r-----. 1 oracle oinstall 42715136Jul 5 22:56 1_125_816661296.dbf-rw-r-----. 1 oracle oinstall 248320 Jul 6 23:14 1_152_816661296.dbf-rw-r-----. 1 oracle oinstall 113664 Jul 6 23:19 1_154_816661296.dbf[oracle@dave arch]$ 2.5 然后进行restore 和recover SQL> shutdown immediateDatabase closed.Database dismounted.ORACLE instance shut down.SQL> SQL> startup mount RMAN> restore database; Starting restore at 06-JUL-13released channel: ORA_DISK_1allocated channel: ORA_DISK_1channel ORA_DISK_1: SID=20 device type=DISK channel ORA_DISK_1: starting datafilebackup set restorechannel ORA_DISK_1: specifying datafile(s)to restore from backup setchannel ORA_DISK_1: restoring datafile00001 to /u01/app/oracle/oradata/dave/system.256.816661027channel ORA_DISK_1: restoring datafile00003 to /u01/app/oracle/oradata/dave/undotbs1.258.816661037channel ORA_DISK_1: restoring datafile00005 to /u01/app/oracle/oradata/dave/undotbs2.265.816661787channel ORA_DISK_1: reading from backuppiece /u01/backup/dave_lev0_06oe3kdv_1_1_20130706channel ORA_DISK_1: piecehandle=/u01/backup/dave_lev0_06oe3kdv_1_1_20130706 tag=DAVE_LEV0channel ORA_DISK_1: restored backup piece 1channel ORA_DISK_1: restore complete,elapsed time: 00:00:25channel ORA_DISK_1: starting datafilebackup set restorechannel ORA_DISK_1: specifying datafile(s)to restore from backup setchannel ORA_DISK_1: restoring datafile00002 to /u01/app/oracle/oradata/dave/sysaux.257.816661033channel ORA_DISK_1: restoring datafile00004 to /u01/app/oracle/oradata/dave/users.259.816661039channel ORA_DISK_1: restoring datafile00006 to /u01/app/oracle/oradata/dave/dave01.dbfchannel ORA_DISK_1: restoring datafile00007 to /u01/app/oracle/oradata/dave/dave02.dbfchannel ORA_DISK_1: reading from backuppiece /u01/backup/dave_lev0_05oe3kdv_1_1_20130706channel ORA_DISK_1: piece handle=/u01/backup/dave_lev0_05oe3kdv_1_1_20130706tag=DAVE_LEV0channel ORA_DISK_1: restored backup piece 1channel ORA_DISK_1: restore complete,elapsed time: 00:00:25Finished restore at 06-JUL-13 RMAN> RMAN> recoverdatabase; Starting recover at 06-JUL-13using channel ORA_DISK_1 starting media recovery archived log for thread 1 with sequence 152is already on disk as file /u01/arch/1_152_816661296.dbfarchived log for thread 1 with sequence 154is already on disk as file /u01/arch/1_154_816661296.dbfRMAN-00571: ===========================================================RMAN-00569: =============== ERROR MESSAGESTACK FOLLOWS ===============RMAN-00571:===========================================================RMAN-03002: failure of recover command at07/06/2013 23:23:48RMAN-06053: unable to perform mediarecovery because of missing logRMAN-06025: no backup ofarchived log for thread 1 with sequence 153 and starting SCN of 3836001 foundto restore RMAN> 这个153 是我们刚才手工删掉的归档。如果这个不搞定,后面没办法恢复。 2.6 BBED 推荐SCN 2.6.1 修改原理说明 -- System Checkpoint SCN:SQL> select checkpoint_change# fromv$database; CHECKPOINT_CHANGE#------------------ 3836654 --- Datafile CheckpointSCN:SQL> select name,checkpoint_change# fromv$datafile; NAME CHECKPOINT_CHANGE#-------------------------------------------------------------------------/u01/app/oracle/oradata/dave/system.256.816661027 3836654/u01/app/oracle/oradata/dave/sysaux.257.816661033 3836654/u01/app/oracle/oradata/dave/undotbs1.258.816661037 3836654/u01/app/oracle/oradata/dave/users.259.816661039 3836654/u01/app/oracle/oradata/dave/undotbs2.265.816661787 3836654/u01/app/oracle/oradata/dave/dave01.dbf 3836654/u01/app/oracle/oradata/dave/dave02.dbf 3836654 7 rows selected. ---START SCN:SQL> select name,checkpoint_change# fromv$datafile_header; NAME CHECKPOINT_CHANGE#-------------------------------------------------------------------------/u01/app/oracle/oradata/dave/system.256.816661027 3835435/u01/app/oracle/oradata/dave/sysaux.257.816661033 3835434/u01/app/oracle/oradata/dave/undotbs1.258.816661037 3835435/u01/app/oracle/oradata/dave/users.259.816661039 3835434/u01/app/oracle/oradata/dave/undotbs2.265.816661787 3835435/u01/app/oracle/oradata/dave/dave01.dbf 3835434/u01/app/oracle/oradata/dave/dave02.dbf 3835434 7 rows selected. +++++SCN号与数据库启动:在数据库启动过程中,当SystemCheckpoint SCN、Datafile Checkpoint SCN和Start SCN号都相同时,数据库可以正常启动,不需要做mediarecovery.三者当中有一个不同时,则需要做media recovery。 如果在启动的过程中,EndSCN号为NULL,则需要做instance recovery。ORACLE在启动过程中首先检查是否需要media recovery,然后再检查是否需要instance recovery。 在进行recovery的时候,我们根据归档,推进START SCN,但是归档缺失,导致无法推荐,数据库也无法启动。 我们这里缺失的是153的归档,我们只需要手工的修改datafile header,让数据库认为这个归档已经恢复了,即可。 这是一种欺骗行为,虽然可以继续,但还是会出现问题。 可以使用如下方法确定具体缺失的归档SCN,然后使用BBED 跳过这些SCN 即可。SQL> selectsequence#,first_change#,next_change# from v$archived_log; SEQUENCE# FIRST_CHANGE# NEXT_CHANGE#---------- ------------- ------------ 148 3834837 3835155 149 3835155 3835184 150 3835184 3835498 151 3835498 3835507 152 3835507 3836001 153 3836001 3836079 154 3836079 3836303 7 rows selected. 这个正好与我们之前RMAN 错误一致:RMAN-03002: failure of recover command at07/06/2013 23:23:48RMAN-06053: unable to perform mediarecovery because of missing logRMAN-06025: no backup ofarchived log for thread 1 with sequence 153 and starting SCN of 3836001 foundto restore 2.6.2 使用BBED 推进所有DATAFILE header SCN 这里,我们只修改kscnbas的值:kscnbas (at offset 484) - SCN of lastchange to the datafile. BBED> info File# Name Size(blks) ----- ---- ---------- 1 /u01/app/oracle/oradata/dave/system.256.816661027 129280 2 /u01/app/oracle/oradata/dave/sysaux.257.816661033 97280 3 /u01/app/oracle/oradata/dave/undotbs1.258.816661037 9600 4 /u01/app/oracle/oradata/dave/users.259.816661039 640 5 /u01/app/oracle/oradata/dave/undotbs2.265.816661787 12800 6 /u01/app/oracle/oradata/dave/dave01.dbf 393216 7 /u01/app/oracle/oradata/dave/dave02.dbf 6400 +++我们需要将所有datafile 的SCN从3836001 推到3836079:SQL> selectto_char('3836079','xxxxxxxxx') from dual;TO_CHAR('3----------3a88af 因此我们的kscnbas 的新值是:0x003a88af。 但是注意,对于little-endian的format,他存储是先存储低位的,因此实际block 存储的是:af883a00. 我们需要使用BBED 将所有datafileheader 的@484 的值修改成:af883a00。BBED> d /v dba 1,1 offset 484 File:/u01/app/oracle/oradata/dave/system.256.816661027 (1) Block: 1 Offsets: 484 to 499 Dba:0x00400001------------------------------------------------------- 2b863a00 00000000 bfd1e130 01000000 l+.:........0.... <16 bytes per line> BBED> modify /x af88 dba 1,1 offset 484 File: /u01/app/oracle/oradata/dave/system.256.816661027(1) Block: 1 Offsets: 484 to 499 Dba:0x00400001------------------------------------------------------------------------ af883a00 00000000 bfd1e130 01000000 <32 bytes per line> BBED> sum dba 1,1 applyCheck value for File 1, Block 1:current = 0xe9ba, required = 0xe9ba +++按照同样的步骤,把剩下的6个datafile都修改。 --BBED 推荐成功:SQL> selectfile#,checkpoint_change#,status from v$datafile_header; FILE# CHECKPOINT_CHANGE# STATUS---------- ------------------ ------- 1 3836079 ONLINE 2 3836079 ONLINE 3 3836079 ONLINE 4 3836079 ONLINE 5 3836079 ONLINE 6 3836079 ONLINE 7 3836079 ONLINE 7 rows selected. 这里的datafile 的SCN 都跳过了我们缺失的归档,我们可以继续进行recover了。 2.7 重新进行RecoverRMAN> recover database; Starting recover at 07-JUL-13using channel ORA_DISK_1 starting media recoverymedia recovery failedRMAN-00571:===========================================================RMAN-00569: =============== ERROR MESSAGESTACK FOLLOWS ===============RMAN-00571: ===========================================================RMAN-03002: failure of recover command at07/07/2013 01:04:43ORA-00283: recovery session canceled due toerrorsRMAN-11003: failure during parse/executionof SQL statement: alter database recover if needed startORA-00283: recovery session canceled due toerrorsORA-00600: internal errorcode, arguments: [3020], [3], [8077], [12590989], [], [], [], [], [], [], [],[]ORA-10567:Redo is inconsistent with data block (file# 3, block# 8077, file offset is 66166784 bytes)ORA-10564: tablespaceUNDOTBS1ORA-01110: data file 3:'/u01/app/oracle/oradata/dave/undotbs1.258.816661037'ORA-10560: block type 'KTU UNDO BLOCK' 根据官网的说明,我们这是UNDO 表空间恢复无法继续了,详见:Resolving ORA-600[3020] Raised During Recovery (文档 ID 361172.1) 尝试跳过坏块测试:RMAN> recover database allow 50 corruption; Starting recover at 07-JUL-13using channel ORA_DISK_1 starting media recoverymedia recovery complete, elapsed time:00:00:01 Finished recover at 07-JUL-13 RMAN> 恢复是没有问题,但是打开是有问题的:SQL> alter database open;alter database open*ERROR at line 1:ORA-01092: ORACLE instance terminated.Disconnection forcedORA-01578: ORACLE data block corrupted(file # 3, block # 128)ORA-01110: data file 3:'/u01/app/oracle/oradata/dave/undotbs1.258.816661037'Process ID: 32549Session ID: 16 Serial number: 5 2.8 重建UNDO 表空间 这里里面的 3 就是我们的undo 表空间,我们把重新创建一个UNDO 在拉起数据库: 2.8.1 用spfile 创建pfile,然后修改参数#*.undo_tablespace='UNDOTBS1'*.undo_management='MANUAL'*.rollback_segments='SYSTEM' 2.8.2 用修改之后的pfile,重启DBSQL> startuppfile='/u01/app/oracle/product/11.2.0/dbhome_1/dbs/initdave.ora'ORACLE instance started. Total System Global Area 718188544 bytesFixed Size 2231832 bytesVariable Size 436208104 bytesDatabase Buffers 276824064 bytesRedo Buffers 2924544 bytesDatabase mounted.Database opened.SQL> 2.8.3 删除原来的表空间,创建新的UNDO 表空间SQL> select tablespace_name fromdba_tablespaces; TABLESPACE_NAME------------------------------SYSTEMSYSAUXUNDOTBS1TEMPUSERSUNDOTBS2DAVE 7 rows selected. SQL> droptablespace undotbs1;SQL> create undo tablespace undotbs1datafile '/u01/app/oracle/oradata/dave/undotbs1.dbf' size 50M; Tablespace created. 2.8.4 关闭数据库,修改pfile参数,然后用新的pfile创建spfile,在正常启动数据库。*.undo_tablespace='UNDOTBS1'#*.undo_management='MANUAL'#*.rollback_segments='SYSTEM' SQL> shutdown immediateDatabase closed.Database dismounted.ORACLE instance shut down. SQL> startupORACLE instance started. Total System Global Area 718188544 bytesFixed Size 2231832 bytesVariable Size 436208104 bytesDatabase Buffers 276824064 bytesRedo Buffers 2924544 bytesDatabase mounted.Database opened.SQL> 库终于拉起来了。 2.9 验证SQL> select name,checkpoint_change# fromv$datafile; NAME CHECKPOINT_CHANGE#-------------------------------------------------------------------------/u01/app/oracle/oradata/dave/system.256.816661027 3857521/u01/app/oracle/oradata/dave/sysaux.257.816661033 3857521/u01/app/oracle/oradata/dave/undotbs1.dbf 3857521/u01/app/oracle/oradata/dave/users.259.816661039 3857521/u01/app/oracle/oradata/dave/undotbs2.265.816661787 3857521/u01/app/oracle/oradata/dave/dave01.dbf 3857521/u01/app/oracle/oradata/dave/dave02.dbf 3857521 7 rows selected. SQL> select name,checkpoint_change# fromv$datafile_header; NAME CHECKPOINT_CHANGE#-------------------------------------------------------------------------/u01/app/oracle/oradata/dave/system.256.816661027 3857521/u01/app/oracle/oradata/dave/sysaux.257.816661033 3857521/u01/app/oracle/oradata/dave/undotbs1.dbf 3857521/u01/app/oracle/oradata/dave/users.259.816661039 3857521/u01/app/oracle/oradata/dave/undotbs2.265.816661787 3857521/u01/app/oracle/oradata/dave/dave01.dbf 3857521/u01/app/oracle/oradata/dave/dave02.dbf 3857521 7 rows selected. SQL> select checkpoint_change# fromv$database; CHECKPOINT_CHANGE#------------------ 3857521 SQL> select count(1) from dave1;select count(1) from dave1 *ERROR at line 1:ORA-00942: table or view does not exist SQL> select count(1) from dave2;select count(1) from dave2 *ERROR at line 1:ORA-00942: table or view does not exist 库是正常拉起来了,不过之前创建的表都没有成功恢复。转载自:http://blog.csdn.net/tianlesoftware