DB2 HA fallover problem
DB2 HA runs on two AIX servers, A and B. The HACMP takeover test is OK.
But when we issue "halt -q" on server A, B takes over all of A's resources, yet it is very slow when it comes to "db2start": starting 16 nodes takes 30 minutes.
Some networking problems are also reported:
1. The service IP of B is moved to another interface, and the service IP of A is moved to that same interface.
2. The messages "Interface 192.168.7.3 has failed on node PDBA" and "Interface 192.168.7.3 is now available on node PDBA" are logged. 192.168.7.3 is the boot IP of B.
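The two interface messages above suggest an interface that failed and then came back. A minimal Python sketch for spotting such flaps in a saved log follows. It assumes the log lines contain the message text exactly as quoted in this post; the function names are made up for illustration, and real cluster logs may carry timestamps or prefixes that this pattern simply skips over.

```python
import re
from collections import Counter

# Pattern built from the two messages quoted in the post. Anything before or
# after the message on the same line (timestamps, quotes) is ignored.
EVENT_RE = re.compile(
    r"Interface (?P<ip>\d+\.\d+\.\d+\.\d+) "
    r"(?P<state>has failed|is now available) on node (?P<node>\w+)"
)


def count_interface_events(lines):
    """Count 'failed' and 'available' events per (node, ip, state)."""
    counts = Counter()
    for line in lines:
        for m in EVENT_RE.finditer(line):
            state = "failed" if m.group("state") == "has failed" else "available"
            counts[(m.group("node"), m.group("ip"), state)] += 1
    return counts


def flapping_interfaces(lines):
    """Return sorted (node, ip) pairs that both failed and became available."""
    counts = count_interface_events(lines)
    failed = {(node, ip) for (node, ip, state) in counts if state == "failed"}
    back = {(node, ip) for (node, ip, state) in counts if state == "available"}
    return sorted(failed & back)


if __name__ == "__main__":
    sample = [
        "Interface 192.168.7.3 has failed on node PDBA",
        "Interface 192.168.7.3 is now available on node PDBA",
    ]
    print(flapping_interfaces(sample))
```

An interface that appears in the result went down and recovered during the capture window, which is worth checking against the time db2start was running.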
I have to leave tomorrow, so I suggested contacting 800 support. Waiting for further progress.
4 comments:
Possible APAR
--------------------------------------
APAR status
Closed as program error.
Error description
During hacmp failover, db2start command is seeing delays.
Local fix
Problem summary
1. Application startup is slow during hacmp takeover
2. Client lock reclaiming may be slow as client lock
threads sleeps for long time.
Problem conclusion
Changed the lm_delay function to wakeup and return after
predefined sleep.
Temporary fix
Comments
APAR information
APAR number IY92336
Reported component name AIX 5L POWER V5
Reported component ID 5765E6200
Reported release 520
Status CLOSED PER
PE NoPE
HIPER NoHIPER
Submitted date 2006-11-30
Closed date 2007-01-03
Last modified date 2007-01-03
APAR is sysrouted FROM one or more of the following:
IY92334
APAR is sysrouted TO one or more of the following:
Publications Referenced
Fix information
Fixed component name AIX 5L POWER V5
Fixed component ID 5765E6200
Applicable component levels
R520 PSY UP
-----------------------------------------------
APAR status
Closed as program error.
Error description
During hacmp failover, db2start command is seeing delays.
Local fix
Problem summary
1. Application startup is slow during hacmp takeover
2. Client lock reclaiming may be slow as client lock
threads sleeps for long time.
Problem conclusion
Changed the lm_delay function to wakeup and return after
predefined sleep.
Temporary fix
Comments
APAR information
APAR number IY92334
Reported component name AIX 5.3
Reported component ID 5765G0300
Reported release 530
Status CLOSED PER
PE NoPE
HIPER NoHIPER
Submitted date 2006-11-30
Closed date 2007-01-03
Last modified date 2007-01-03
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
IY92336
Publications Referenced
Fix information
Fixed component name AIX 5.3
Fixed component ID 5765G0300
Applicable component levels
R530 PSY UP
Still following up; I will go on site tomorrow to take a look.
Strongly suspect IY92334, IY89475...
IY89475: RPC.STATD DROPS CORE ON HACMP FAILOVER
APAR status
Closed as duplicate of another APAR.
Error description
Steps to reproduce
1. start HACMP on both node1sp and node2sp node.
2. clstop on node2sp.
3. clstart on node2sp.
4. COREDUMP was created with rpc.statd on node2sp.
The stacks of "_svc_run_mt" and "svc_run.svc_run" show:
(dbx) t
__svc_rm_from_xlist(??, ??, ??) at 0xd0777c2c
__xprt_unregister_private(??, ??) at 0xd0777498
_svc_vc_destroy_private(??, ??) at 0xd0784ba4
_svc_destroy_private(??) at 0xd07781e0
_svc_done_private(??) at 0xd077ebb0
_svc_run_mt() at 0xd077f1d4
svc_run.svc_run() at 0xd077fc04
main(0x5, 0x2ff22da0) at 0x100011c4
(dbx) x
$r0:0xfb81ffe0 $stkp:0x2ff229d0 $toc:0xf03bc888 $r3:0xfba1ffe8
$r4:0x30009f48 $r5:0x00000008 $r6:0x00000002 $r7:0x100c51ff
$r8:0x000c51ff $r9:0x00000000 $r10:0x301b2410 $r11:0x00000000
$r12:0xd0777c04 $r13:0x00000001 $r14:0x301b21e8 $r15:0x301b20d8
$r16:0x301b22e8 $r17:0xf0443b98 $r18:0xf0402930 $r19:0xf03b5924
$r20:0xf0402ae8 $r21:0xf0443b90 $r22:0xf03bfda4 $r23:0xf03bfda0
$r24:0xf0400928 $r25:0xf0402b28 $r26:0x00000035 $r27:0x00000001
$r28:0xf0402af0 $r29:0x00019870 $r30:0x301b2238 $r31:0xf03bfa00
$iar:0xd0777c2c $msr:0x0000d0b2 $cr:0x8248822b $link:0xd0777c04
$ctr:0xd07d7430 $xer:0x20000000 $mq:0xffffffff
Condition status = 0:l 1:e 2:g 3:l 4:l 5:e 6:e 7:leo
[unset $noflregs to view floating point registers]
[unset $novregs to view vector registers]
in __svc_rm_from_xlist at 0xd0777c2c ($t1)
0xd0777c2c (__svc_rm_from_xlist+0x64) 80030004 lwz r0,0x4(r3)
Local fix
Problem summary
Problem conclusion
Temporary fix
Comments
This APAR is a duplicate of IY91868
APAR information
APAR number IY89475
Reported component name AIX 5.3
Reported component ID 5765G0300
Reported release 530
Status CLOSED DUB
PE NoPE
HIPER NoHIPER
Submitted date 2006-09-13
Closed date 2007-01-03
Last modified date 2007-01-03
APAR is sysrouted FROM one or more of the following:
APAR is sysrouted TO one or more of the following:
IY89476 IY89477
Obtained the IY92334 ifix and applied it. Problem resolved.
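To confirm later whether the official fix for the APAR is on a node, AIX provides `instfix -ik IY92334`. The Python sketch below classifies the text that command prints. The message strings it matches are an assumption based on typical instfix output, and the function name is made up for illustration. An interim fix applied with emgr is tracked separately (`emgr -l` lists those), so instfix may not report an ifix-only node as installed.

```python
def classify_instfix_output(text, apar):
    """Map captured `instfix -ik <apar>` output to a coarse status.

    Returns 'installed', 'partial' or 'unknown'. The matched phrases are
    assumed from typical instfix output, not taken from the post.
    """
    lowered = text.lower()
    apar_l = apar.lower()
    # Check the negative phrase first: it contains the positive one.
    if f"not all filesets for {apar_l} were found" in lowered:
        return "partial"
    if f"all filesets for {apar_l} were found" in lowered:
        return "installed"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical captured output, for illustration only.
    captured = "    All filesets for IY92334 were found."
    print(classify_instfix_output(captured, "IY92334"))
```

Running the check on both nodes matters here, since either node can end up hosting all 16 DB2 nodes after a takeover.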