Time taken by savepoint to perform backup in SAP HANA


The M_SAVEPOINTS view stores current and historical savepoint statistics. Its DURATION column gives the total time taken by a savepoint.

You can extract the following information from the numbers in this view:

  • CRITICAL_PHASE_DURATION shows the period of time during which the updaters were blocked in a savepoint. Normally, this should be in the milliseconds range, except for a global savepoint for data backup, which may take longer due to global synchronization across all nodes. If the critical phase duration is too long, there is probably some problem (e.g., I/O load is too high).
  • DURATION shows the total time taken by a savepoint. This should be significantly less than the configured savepoint frequency REQUESTED_FREQUENCY (in the range 0-10%, depending on load). Higher ratios indicate I/O overload.
  • TIME_SINCE_PREVIOUS should be close to REQUESTED_FREQUENCY. If it is significantly higher, this indicates that the savepoint is encountering a block, such as a very long column merge operation.
  • The ratio of FLUSHED_PAGES* vs. FLUSHED_ROWSTORE_PAGES*, or of FLUSHED_SIZE* vs. FLUSHED_ROWSTORE_SIZE*, shows the respective load of the column store vs. the row store. The row store is flushed only during a savepoint, whereas the column store also flushes data between savepoints to balance the load.
  • A high ratio of FLUSHED_*PAGES_IN_CRITICAL_PHASE vs. FLUSHED_*PAGES, or of FLUSHED_*SIZE_IN_CRITICAL_PHASE vs. FLUSHED_*SIZE, indicates potential I/O overload. Normally, zero or only a few pages should be written in the critical phase, except in special situations such as a global savepoint for data backup (and even then, the number of pages written in the critical phase should be on the order of 1% or less of the asynchronously flushed pages). A large amount of data written in the critical phase indicates overload of the I/O subsystem and will most probably lead to increased blocking times of update transactions due to a longer critical phase.
  • A large RTT_SIZE (more than a few entries) indicates a problem in distributed transaction handling. The RTT (rollback transaction table) holds rollback entries for distributed transactions that are currently being rolled back. Normally, these entries are removed very quickly once the respective rollback has finished. If a slave node fails, the entries for that node are held persistently until it restarts. Normally, this number should drop to zero or close to zero shortly after the failed slave node (or the whole system) restarts.
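
The rules of thumb above can be turned into a small check over one savepoint record. The following Python sketch is only an illustration and makes several assumptions: the record is a plain dictionary already fetched from M_SAVEPOINTS, all time values are in the same unit, and the keys FLUSHED_PAGES and FLUSHED_PAGES_IN_CRITICAL_PHASE are hypothetical stand-ins for the wildcard column families mentioned in the list. The 10% and 1% limits come from the text; the interval factor of 1.5 and the RTT limit of 5 entries are arbitrary illustrative thresholds, not SAP recommendations.

```python
def check_savepoint(row, max_duration_ratio=0.10, max_interval_ratio=1.5,
                    max_critical_page_ratio=0.01, max_rtt_entries=5):
    """Return a list of warning strings for one savepoint record (a dict).

    Assumes all time values in `row` share the same unit. The thresholds
    for the interval and RTT checks are illustrative defaults only.
    """
    warnings = []
    freq = row["REQUESTED_FREQUENCY"]

    # DURATION should stay within roughly 0-10% of the requested frequency.
    if row["DURATION"] > max_duration_ratio * freq:
        warnings.append("DURATION is high relative to REQUESTED_FREQUENCY "
                        "(possible I/O overload)")

    # TIME_SINCE_PREVIOUS should be close to the requested frequency.
    if row["TIME_SINCE_PREVIOUS"] > max_interval_ratio * freq:
        warnings.append("TIME_SINCE_PREVIOUS is well above REQUESTED_FREQUENCY "
                        "(savepoint may be blocked)")

    # Hypothetical keys standing in for the FLUSHED_*PAGES column families.
    flushed = row.get("FLUSHED_PAGES", 0)
    critical = row.get("FLUSHED_PAGES_IN_CRITICAL_PHASE", 0)
    if flushed > 0 and critical / flushed > max_critical_page_ratio:
        warnings.append("Many pages flushed in the critical phase "
                        "(possible I/O overload)")

    # More than a few RTT entries hints at distributed transaction problems.
    if row.get("RTT_SIZE", 0) > max_rtt_entries:
        warnings.append("RTT_SIZE is large "
                        "(check distributed transaction handling)")

    return warnings


# Example record with made-up numbers, for illustration only.
sample = {
    "DURATION": 90,
    "REQUESTED_FREQUENCY": 300,
    "TIME_SINCE_PREVIOUS": 900,
    "FLUSHED_PAGES": 1000,
    "FLUSHED_PAGES_IN_CRITICAL_PHASE": 200,
    "RTT_SIZE": 40,
}
for message in check_savepoint(sample):
    print(message)
```

An empty result means none of these heuristics fired; it does not prove that the system is healthy. The thresholds are parameters so that they can be tuned to the actual load.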
Published on 05-Feb-2018 22:37:56