Oracle mechanics

29.06.2016

12c: проблемы SQLID frjd8zfy2jfdq в ADG

Filed under: Active Data Guard,bugs,Oracle — Игорь Усольцев @ 00:27
Tags:

Наблюдали с Русланом Бикбаевым на обычном, не-EXADATA RAC с ADG 12.1.0.2 проблему Bug 22364629 : EXADATA: ORA-29771: PROCESS USER BLOCKS LGWR FOR MORE THAN 70 SECONDS — казалось бы, мало ли в Oracle 12c багов?)

Тут интересно, что блокером для LGWR выступают пользовательские процессы (+PX-slaves), выполняющие известный* рекурсивный запрос frjd8zfy2jfdq, описанный в Bug 20413540 Excessive executions of SQL frjd8zfy2jfdq, являющемся base bug для Bug 22364629:

The query with SQLID frjd8zfy2jfdq is executed by Dynamic Statistics to figure out the past execution statistics and guide the time allowance for future execution of Dynamic Statistics

*) о том же запросе Рекурсивно-адаптивный SQL_ID «frjd8zfy2jfdq» версии 12.1.0.2

Детали проблемы и место SQLID frjd8zfy2jfdq расписаны в трейсе системного процесса LMHB:

*** 2016-06-15 18:54:43.781
==============================
LGWR (ospid: 10005) has not moved for 20 sec (1466006083.1466006063)
  : heartbeat check status 2 (acceptable) (threshold 70 sec)
  : heartbeat poke time 0x57617a37 req 0x4 ack 0x3 freq 10
  : heartbeat state 0x1.ffff (inwait) pso-flag 0x0
  : waiting for event 'library cache lock' for 19 secs with wait_id 502.
  ===[ Wait Chain ]===                                         -- блокировка
  LGWR (ospid: 10005) waits for event 'library cache lock'.
  USER (ospid: 10375) waits for event 'PX Deq: Parse Reply'.
  PPA7 (ospid: 10377) waits for event 'library cache lock'.
  LGWR (ospid: 10005) waits for event 'library cache lock'.
  A cycle is detected.                                         -- может выглядеть даже так, циклически
...
*** 2016-06-15 18:55:43.819
==============================
LGWR (ospid: 10005) has not moved for 80 sec (1466006143.1466006063)
  : heartbeat check status 6 (no-heartbeat) (threshold 70 sec)
==================================================
=== LGWR (ospid: 10005) Heartbeat Report
==================================================
LGWR (ospid: 10005) has no heartbeats for 80 sec. (threshold 70)
  : heartbeat state 0x1.ffff (inwait) pso-flag 0x0
  : waiting for event 'library cache lock' for 79 secs with wait_id 502.
===[ Wait Chain ]===
LGWR (ospid: 10005) waits for event 'library cache lock'.      -- или так
USER (ospid: 10395) waits for event 'PX Deq: Parse Reply'.     -- ****
PPA4 (ospid: 12137) waits for event 'PX Deq: Execution Msg'.
==============================
Dumping PROCESS LGWR (ospid: 10005) States
==============================
===[ System Load State ]===
  CPU Total 32 Raw 32 Core 16 Socket 2
  Load normal: Cur 1315 Highmark 40960 (5.13 160.00)
===[ Latch State ]===
  Not in Latch Get
===[ Session State Object ]===
  ----------------------------------------
...
  service name: SYS$BACKGROUND
  Current Wait Stack:
   0: waiting for 'library cache lock'                         -- LGWR Wait Stack
      handle address=0x288ffd6cd8, lock address=0x2b7feec308, 100*mode+namespace=0x1004a0003
      wait_id=502 seq_num=575 snap_id=1
      wait times: snap=1 min 19 sec, exc=1 min 19 sec, total=1 min 19 sec
      wait times: max=infinite, heur=1 min 19 sec
      wait counts: calls=28 os=28
      in_wait=1 iflags=0x15a2
  There is at least one session blocking this session.
    Dumping 2 direct blocker(s):                               -- блокерами USER и PPA4 процессы
      inst: 1, sid: 2, ser: 58708
      inst: 1, sid: 287, ser: 18448
...
==============================
Dumping PROCESS USER (ospid: 10395) States                     -- USER процесс
==============================
===[ System Load State ]===
  CPU Total 32 Raw 32 Core 16 Socket 2
  Load normal: Cur 1315 Highmark 40960 (5.13 160.00)
===[ Latch State ]===
  Not in Latch Get
===[ Session State Object ]===
  ----------------------------------------
  SO: 0x2c111dc750, type: 4, owner: 0x2c18fedb10, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0x2c18fedb10, name=session, file=ksu.h LINE:13957, pg=0 conuid=0
  (session) sid: 287 ser: 18448 trans: (nil), creator: 0x2c18fedb10
            flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
            flags2: (0x40009) -/-/INC
            DID: 0001-0042-00000002, short-term DID: 0001-0042-00000001
            txn branch: (nil)
            edition#: 711465            user#/name: 0/SYS
            oct: 3, prv: 0, sql: 0x2b47ea2cd0, psql: (nil)                                             -- Library Cache Handle for Cursor
            stats: 0x27dfb6d350, PX stats: 0xcf26290
...
    ----------------------------------------
    SO: 0x27cfb18538, type: 96, owner: 0x2c111dc750, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0x2c18fedb10, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:9220, pg=0 conuid=0

    LibraryObjectLock:  Address=0x27cfb18538 Handle=0x2b47ea2cd0 Mode=N
      CanBeBrokenCount=1 Incarnation=1 ExecutionCount=0

      User=0x2c111dc750 Session=0x2c111dc750 ReferenceCount=1
      Flags=CNB/[0001] SavepointNum=57617a30
    LibraryHandle:  Address=0x2b47ea2cd0 Hash=d3fafbf4 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD -- Handle Address
      ObjectName:  Name=select open_mode from v$database                                               -- SQL Text**
...
      NamespaceDump:
        Parent Cursor:  sql_id=a5xcbum9zpyzn parent=0x2b47ea1d50 maxchild=1 plk=y ppn=n                -- Parent Cursor SQL ID
...
*** 2016-06-15 18:55:48.123
===[ Heartbeat State Object ]===
===[ Trace Buffer ]===
==============================
Dumping PROCESS PPA4 (ospid: 12137) States                     -- PPA4 процесс
==============================
===[ System Load State ]===
  CPU Total 32 Raw 32 Core 16 Socket 2
  Load normal: Cur 1315 Highmark 40960 (5.13 160.00)
===[ Latch State ]===
  Not in Latch Get
===[ Session State Object ]===
  ----------------------------------------
  SO: 0x2c0943e958, type: 4, owner: 0x2c10ffd398, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0x2c10ffd398, name=session, file=ksu.h LINE:13957, pg=0 conuid=0
  (session) sid: 2418 ser: 33114 trans: (nil), creator: 0x2c10ffd398
            flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
            flags2: (0x40009) -/-/INC
            DID: 0001-0051-00000002, short-term DID: 0000-0000-00000000
            txn branch: (nil)
            edition#: 711465            user#/name: 0/SYS
            oct: 3, prv: 0, sql: 0x27e7b1e830, psql: (nil)                                             -- Library Cache Handle for Cursor
            stats: 0x27dfb1ffb8, PX stats: 0xcf26290
  ksuxds FALSE at location: 0
  service name: SYS$USERS
  client details:
...
  Current Wait Stack:
   0: waiting for 'PX Deq: Execution Msg'
...
    ----------------------------------------
    SO: 0x27c3a382b8, type: 96, owner: 0x2c0943e958, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
     proc=0x2c10ffd398, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:9220, pg=0 conuid=0

    LibraryObjectLock:  Address=0x27c3a382b8 Handle=0x27e7b1e830 Mode=N
      CanBeBrokenCount=1 Incarnation=1 ExecutionCount=1

      User=0x2c0943e958 Session=0x2c0943e958 ReferenceCount=1
      Flags=CNB/[0001] SavepointNum=57617a6e
    LibraryHandle:  Address=0x27e7b1e830 Hash=bc28b9b6 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD -- Handle Address
      ObjectName:  Name=SELECT executions, end_of_fetch_count,              elapsed_time/px_servers... -- SQL Text
        FullHashValue=28b7c484b3e39566ebc5a8fbbc28b9b6 Namespace=SQL AREA(00) Type=CURSOR(00) ContainerId=0 ContainerUid=0 Identifier=3156785590 OwnerIdn=0
      Statistics:  InvalidationCount=0 ExecutionCount=2 LoadCount=3 ActiveLocks=4 TotalLockCount=4 TotalPinCount=1
      Counters:  BrokenCount=1 RevocablePointer=1 KeepDependency=2 Version=0 BucketInUse=3 HandleInUse=3 HandleReferenceCount=0
      Concurrency:  DependencyMutex=0x27e7b1e8e0(0, 2, 0, 0) Mutex=0x27e7b1e978(0, 42, 0, 0)
      Flags=RON/PIN/TIM/PN0/DBN/[10012841] Flags2=[0000]
      WaitersLists:
        Lock=0x27e7b1e8c0[0x27e7b1e8c0,0x27e7b1e8c0]
        Pin=0x27e7b1e8a0[0x27e7b1e8a0,0x27e7b1e8a0]
        LoadLock=0x27e7b1e918[0x27e7b1e918,0x27e7b1e918]
      Timestamp:  Current=06-15-2016 18:54:24
      HandleReference:  Address=0x27e7b1ed20 Handle=(nil) Flags=[00]
      LibraryObject:  Address=0x27e7b1d7e8 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000]
        ChildTable:  size='16'
          Child:  id='0' Table=0x27e7b1e668 Reference=0x27e7b1e120 Handle=0x27e7b1d5c0
          Child:  id='1' Table=0x27e7b1e668 Reference=0x27e7b1e4d0 Handle=0x27e7b19808
      NamespaceDump:
        Parent Cursor:  sql_id=frjd8zfy2jfdq parent=0x27e7b1d8b0 maxchild=2 plk=y ppn=n                -- sql_id=frjd8zfy2jfdq***
          CursorDiagnosticsNodes:
            ChildNode:  ChildNumber=1 ID=40 reason=Bind mismatch(17) size=4x4 bind_position=0 original_oacflg=19 original_oacflg=19 new_oacflg=33
            ChildNode:  ChildNumber=0 ID=40 reason=Bind mismatch(17) size=4x4 bind_position=0 original_oacflg=33 original_oacflg=33 new_oacflg=19
    ----------------------------------------
...
*** 2016-06-15 18:55:50.317
kjgcr_DoAction: KJGCR_ACTION? - id 0
Incident 744270 created, dump file: .../incident/incdir_744270/orcl1_lmhb_9997_i744270.trc
ORA-29771: process USER (OSID 10395) blocks LGWR (OSID 10005) for more than 70 seconds

==============================
ERROR: Killing USER (ospid: 10395) which blocks LGWR (ospid: 10005) in a wait                          -- finita la comedia
ERROR: Please check the waiter and blocker states in the incident file.
==============================

**) как правило, попадаются вполне пользовательские запросы к системным обзорам типа

select open_mode from v$database

— этот, правда, читает controlfile)

***) sql_id=frjd8zfy2jfdq рекурсивно выполняется PX-slave процессом в процессе parsing-а (****)

2 комментария

  1. Игорь, здравствуйте.
    Недавно сталкивались с похожей проблемой на 11.2.0.4. LGWR периодически зависал на «LGWR wait for redo copy» секунд на 30-40, при этом блокировали его неактивный пользовательские процессы. Цепочка ожиданий примерно следующая: ‘log file sync’ => ‘LGWR wait for redo copy’ => . В конце концов выяснилось, что причиной является Bug 22243719, а корнем проблемы — PGA snapshot collection.

    комментарий от Pavel Trishin — 14.07.2016 @ 08:48

    • Спасибо, Павел, будет полезно для тех, кто ещё не обновился до 12c)

      В нашем случае Сергей Щукин успешно выяснил через SR причину — Bug 17018214 — ORA-600 [krdrsb_end_qscn_2] ORA-4021 in Active Dataguard Standby Database with fix for bug 16717701 — и решение нашей проблемы:

      SQL > alter system set "_adg_parselock_timeout" = 550 scope=both sid='*';

      В документе по ссылке — описание EVENT 16717701 и относительно новых параметров:

      SQL> @param_ _adg_parselock_timeout
       
      NAME                          VALUE  IS_DEF  DSC
      ----------------------------- ------ ------- ---------------------------------------------------------------
      _adg_parselock_timeout        0      TRUE    timeout for parselock get on ADG in centiseconds
      _adg_parselock_timeout_sleep  100    TRUE    sleep duration after a parselock timeout on ADG in milliseconds

      комментарий от Игорь Усольцев — 22.07.2016 @ 17:17


RSS feed for comments on this post.

Создайте бесплатный сайт или блог на WordPress.com.