Наблюдали с Русланом Бикбаевым на обычном, не-EXADATA RAC с ADG 12.1.0.2 проблему Bug 22364629 : EXADATA: ORA-29771: PROCESS USER BLOCKS LGWR FOR MORE THAN 70 SECONDS — казалось бы, мало ли в Oracle 12c багов?)
Тут интересно, что блокером для LGWR выступают пользовательские процессы (+PX-slaves), выполняющие известный* рекурсивный запрос frjd8zfy2jfdq, описанный в Bug 20413540 Excessive executions of SQL frjd8zfy2jfdq, являющемся base bug для Bug 22364629:
The query with SQLID frjd8zfy2jfdq is executed by Dynamic Statistics to figure out the past execution statistics and guide the time allowance for future execution of Dynamic Statistics
*) о том же запросе Рекурсивно-адаптивный SQL_ID «frjd8zfy2jfdq» версии 12.1.0.2
Детали проблемы и место SQLID frjd8zfy2jfdq расписаны в трейсе системного процесса LMHB:
*** 2016-06-15 18:54:43.781 ============================== LGWR (ospid: 10005) has not moved for 20 sec (1466006083.1466006063) : heartbeat check status 2 (acceptable) (threshold 70 sec) : heartbeat poke time 0x57617a37 req 0x4 ack 0x3 freq 10 : heartbeat state 0x1.ffff (inwait) pso-flag 0x0 : waiting for event 'library cache lock' for 19 secs with wait_id 502. ===[ Wait Chain ]=== -- блокировка LGWR (ospid: 10005) waits for event 'library cache lock'. USER (ospid: 10375) waits for event 'PX Deq: Parse Reply'. PPA7 (ospid: 10377) waits for event 'library cache lock'. LGWR (ospid: 10005) waits for event 'library cache lock'. A cycle is detected. -- может выглядеть даже так, циклически ... *** 2016-06-15 18:55:43.819 ============================== LGWR (ospid: 10005) has not moved for 80 sec (1466006143.1466006063) : heartbeat check status 6 (no-heartbeat) (threshold 70 sec) ================================================== === LGWR (ospid: 10005) Heartbeat Report ================================================== LGWR (ospid: 10005) has no heartbeats for 80 sec. (threshold 70) : heartbeat state 0x1.ffff (inwait) pso-flag 0x0 : waiting for event 'library cache lock' for 79 secs with wait_id 502. ===[ Wait Chain ]=== LGWR (ospid: 10005) waits for event 'library cache lock'. -- или так USER (ospid: 10395) waits for event 'PX Deq: Parse Reply'. -- **** PPA4 (ospid: 12137) waits for event 'PX Deq: Execution Msg'. ============================== Dumping PROCESS LGWR (ospid: 10005) States ============================== ===[ System Load State ]=== CPU Total 32 Raw 32 Core 16 Socket 2 Load normal: Cur 1315 Highmark 40960 (5.13 160.00) ===[ Latch State ]=== Not in Latch Get ===[ Session State Object ]=== ---------------------------------------- ... service name: SYS$BACKGROUND Current Wait Stack: 0: waiting for 'library cache lock' -- LGWR Wait Stack handle address=0x288ffd6cd8, lock address=0x2b7feec308, 100*mode+namespace=0x1004a0003 wait_id=502 seq_num=575 snap_id=1 wait times: snap=1 min 19 sec, exc=1 min 19 sec, total=1 min 19 sec wait times: max=infinite, heur=1 min 19 sec wait counts: calls=28 os=28 in_wait=1 iflags=0x15a2 There is at least one session blocking this session. Dumping 2 direct blocker(s): -- блокерами USER и PPA4 процессы inst: 1, sid: 2, ser: 58708 inst: 1, sid: 287, ser: 18448 ... ============================== Dumping PROCESS USER (ospid: 10395) States -- USER процесс ============================== ===[ System Load State ]=== CPU Total 32 Raw 32 Core 16 Socket 2 Load normal: Cur 1315 Highmark 40960 (5.13 160.00) ===[ Latch State ]=== Not in Latch Get ===[ Session State Object ]=== ---------------------------------------- SO: 0x2c111dc750, type: 4, owner: 0x2c18fedb10, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x2c18fedb10, name=session, file=ksu.h LINE:13957, pg=0 conuid=0 (session) sid: 287 ser: 18448 trans: (nil), creator: 0x2c18fedb10 flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/- flags2: (0x40009) -/-/INC DID: 0001-0042-00000002, short-term DID: 0001-0042-00000001 txn branch: (nil) edition#: 711465 user#/name: 0/SYS oct: 3, prv: 0, sql: 0x2b47ea2cd0, psql: (nil) -- Library Cache Handle for Cursor stats: 0x27dfb6d350, PX stats: 0xcf26290 ... ---------------------------------------- SO: 0x27cfb18538, type: 96, owner: 0x2c111dc750, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x2c18fedb10, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:9220, pg=0 conuid=0 LibraryObjectLock: Address=0x27cfb18538 Handle=0x2b47ea2cd0 Mode=N CanBeBrokenCount=1 Incarnation=1 ExecutionCount=0 User=0x2c111dc750 Session=0x2c111dc750 ReferenceCount=1 Flags=CNB/[0001] SavepointNum=57617a30 LibraryHandle: Address=0x2b47ea2cd0 Hash=d3fafbf4 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD -- Handle Address ObjectName: Name=select open_mode from v$database -- SQL Text** ... NamespaceDump: Parent Cursor: sql_id=a5xcbum9zpyzn parent=0x2b47ea1d50 maxchild=1 plk=y ppn=n -- Parent Cursor SQL ID ... *** 2016-06-15 18:55:48.123 ===[ Heartbeat State Object ]=== ===[ Trace Buffer ]=== ============================== Dumping PROCESS PPA4 (ospid: 12137) States -- PPA4 процесс ============================== ===[ System Load State ]=== CPU Total 32 Raw 32 Core 16 Socket 2 Load normal: Cur 1315 Highmark 40960 (5.13 160.00) ===[ Latch State ]=== Not in Latch Get ===[ Session State Object ]=== ---------------------------------------- SO: 0x2c0943e958, type: 4, owner: 0x2c10ffd398, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x2c10ffd398, name=session, file=ksu.h LINE:13957, pg=0 conuid=0 (session) sid: 2418 ser: 33114 trans: (nil), creator: 0x2c10ffd398 flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/- flags2: (0x40009) -/-/INC DID: 0001-0051-00000002, short-term DID: 0000-0000-00000000 txn branch: (nil) edition#: 711465 user#/name: 0/SYS oct: 3, prv: 0, sql: 0x27e7b1e830, psql: (nil) -- Library Cache Handle for Cursor stats: 0x27dfb1ffb8, PX stats: 0xcf26290 ksuxds FALSE at location: 0 service name: SYS$USERS client details: ... Current Wait Stack: 0: waiting for 'PX Deq: Execution Msg' ... ---------------------------------------- SO: 0x27c3a382b8, type: 96, owner: 0x2c0943e958, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x2c10ffd398, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:9220, pg=0 conuid=0 LibraryObjectLock: Address=0x27c3a382b8 Handle=0x27e7b1e830 Mode=N CanBeBrokenCount=1 Incarnation=1 ExecutionCount=1 User=0x2c0943e958 Session=0x2c0943e958 ReferenceCount=1 Flags=CNB/[0001] SavepointNum=57617a6e LibraryHandle: Address=0x27e7b1e830 Hash=bc28b9b6 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD -- Handle Address ObjectName: Name=SELECT executions, end_of_fetch_count, elapsed_time/px_servers... -- SQL Text FullHashValue=28b7c484b3e39566ebc5a8fbbc28b9b6 Namespace=SQL AREA(00) Type=CURSOR(00) ContainerId=0 ContainerUid=0 Identifier=3156785590 OwnerIdn=0 Statistics: InvalidationCount=0 ExecutionCount=2 LoadCount=3 ActiveLocks=4 TotalLockCount=4 TotalPinCount=1 Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=2 Version=0 BucketInUse=3 HandleInUse=3 HandleReferenceCount=0 Concurrency: DependencyMutex=0x27e7b1e8e0(0, 2, 0, 0) Mutex=0x27e7b1e978(0, 42, 0, 0) Flags=RON/PIN/TIM/PN0/DBN/[10012841] Flags2=[0000] WaitersLists: Lock=0x27e7b1e8c0[0x27e7b1e8c0,0x27e7b1e8c0] Pin=0x27e7b1e8a0[0x27e7b1e8a0,0x27e7b1e8a0] LoadLock=0x27e7b1e918[0x27e7b1e918,0x27e7b1e918] Timestamp: Current=06-15-2016 18:54:24 HandleReference: Address=0x27e7b1ed20 Handle=(nil) Flags=[00] LibraryObject: Address=0x27e7b1d7e8 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000] ChildTable: size='16' Child: id='0' Table=0x27e7b1e668 Reference=0x27e7b1e120 Handle=0x27e7b1d5c0 Child: id='1' Table=0x27e7b1e668 Reference=0x27e7b1e4d0 Handle=0x27e7b19808 NamespaceDump: Parent Cursor: sql_id=frjd8zfy2jfdq parent=0x27e7b1d8b0 maxchild=2 plk=y ppn=n -- sql_id=frjd8zfy2jfdq*** CursorDiagnosticsNodes: ChildNode: ChildNumber=1 ID=40 reason=Bind mismatch(17) size=4x4 bind_position=0 original_oacflg=19 original_oacflg=19 new_oacflg=33 ChildNode: ChildNumber=0 ID=40 reason=Bind mismatch(17) size=4x4 bind_position=0 original_oacflg=33 original_oacflg=33 new_oacflg=19 ---------------------------------------- ... *** 2016-06-15 18:55:50.317 kjgcr_DoAction: KJGCR_ACTION? - id 0 Incident 744270 created, dump file: .../incident/incdir_744270/orcl1_lmhb_9997_i744270.trc ORA-29771: process USER (OSID 10395) blocks LGWR (OSID 10005) for more than 70 seconds ============================== ERROR: Killing USER (ospid: 10395) which blocks LGWR (ospid: 10005) in a wait -- finita la comedia ERROR: Please check the waiter and blocker states in the incident file. ==============================
**) как правило, попадаются вполне пользовательские запросы к системным обзорам типа
select open_mode from v$database
— этот, правда, читает controlfile)
***) sql_id=frjd8zfy2jfdq рекурсивно выполняется PX-slave процессом в процессе parsing-а (****)
Игорь, здравствуйте.
Недавно сталкивались с похожей проблемой на 11.2.0.4. LGWR периодически зависал на «LGWR wait for redo copy» секунд на 30-40, при этом блокировали его неактивный пользовательские процессы. Цепочка ожиданий примерно следующая: ‘log file sync’ => ‘LGWR wait for redo copy’ => . В конце концов выяснилось, что причиной является Bug 22243719, а корнем проблемы — PGA snapshot collection.
комментарий от Pavel Trishin — 14.07.2016 @ 08:48
Спасибо, Павел, будет полезно для тех, кто ещё не обновился до 12c)
В нашем случае Сергей Щукин успешно выяснил через SR причину — Bug 17018214 — ORA-600 [krdrsb_end_qscn_2] ORA-4021 in Active Dataguard Standby Database with fix for bug 16717701 — и решение нашей проблемы:
В документе по ссылке — описание EVENT 16717701 и относительно новых параметров:
комментарий от Игорь Усольцев — 22.07.2016 @ 17:17