SAP HANA overview
SAP HANA is an in-memory, column-oriented database management system developed by SAP. Its key differentiators are its tight integration with the SAP application stack and its column store, which is intended to let customers fold both OLTP and OLAP operations into a single database. In practical terms, this means an SAP HANA database stores both transactional and analytical data. The combined data can be analyzed on the fly, helping businesses make real-time analytical decisions.
SAP HANA as a database product continues to evolve. Key among recent changes has been the move from a Single Database Container (SDC) to Multi-Tenant Database Containers (MDC), an architectural change that has become the only architecture available in HANA version 2.0.
Implications of the HANA column store
In a row-based relational database, data is stored by row, usually in large files. Data is then accessed from the database by reading and returning full rows of data.
Row store (each record is stored contiguously, across the row):
     Country  Product  Sales
  1  US       Alpha    3000
  2  US       Beta     1250
  3  JP       Alpha     400
  4  UK       Alpha     700
Column store (each column is stored contiguously, down the column):
  Country  Product  Sales
  US       Alpha    3000
  US       Beta     1250
  JP       Alpha     400
  UK       Alpha     700
Select
In a columnar database, a query does not need to read entire rows, only the columns it references. This columnar structure improves read efficiency and, therefore, query performance, particularly for large data sets.
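The difference can be sketched with the sample data from the table above. This is a minimal illustration in Python, not HANA's storage engine: the function names and the "values read" counter are hypothetical and exist only to show how much data each layout touches when a query needs just the Sales column.

```python
# Sample data from the table above, held in a row-oriented layout.
rows = [
    ("US", "Alpha", 3000),
    ("US", "Beta", 1250),
    ("JP", "Alpha", 400),
    ("UK", "Alpha", 700),
]

# The same data in a column-oriented layout: one contiguous list per column.
columns = {
    "Country": [r[0] for r in rows],
    "Product": [r[1] for r in rows],
    "Sales": [r[2] for r in rows],
}


def total_sales_row_store(rows):
    """Sum Sales by reading full rows; every field of every row is touched."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole row is fetched
        total += row[2]
    return total, values_read


def total_sales_column_store(columns):
    """Sum Sales by reading only the Sales column."""
    sales = columns["Sales"]
    return sum(sales), len(sales)


row_total, row_values_read = total_sales_row_store(rows)
col_total, col_values_read = total_sales_column_store(columns)
print(row_total, row_values_read)  # 5350 12
print(col_total, col_values_read)  # 5350 4
```

Both layouts return the same total, but the column layout touches a third of the values in this three-column example. The gap widens as tables gain more columns.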
Columnar data storage allows highly efficient compression because most columns contain only a few distinct values relative to the number of rows. The data, therefore, has a smaller memory footprint and takes fewer processing cycles to scan.
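One common way to exploit a low number of distinct values is dictionary encoding, where each distinct value is stored once and the column itself holds only small integer IDs. The following is a minimal sketch of that general technique using the Country column from the sample table. The function names are hypothetical, and the sketch does not claim to reproduce HANA's internal format.

```python
def dictionary_encode(values):
    """Replace each value with the index of that value in a sorted dictionary."""
    dictionary = sorted(set(values))
    position = {value: i for i, value in enumerate(dictionary)}
    value_ids = [position[value] for value in values]
    return dictionary, value_ids


def dictionary_decode(dictionary, value_ids):
    """Rebuild the original column from the dictionary and the ID vector."""
    return [dictionary[i] for i in value_ids]


def bits_per_value(dictionary):
    """Bits needed to address every dictionary entry (at least 1)."""
    return max(1, (len(dictionary) - 1).bit_length())


country = ["US", "US", "JP", "UK"]
dictionary, value_ids = dictionary_encode(country)
print(dictionary)                  # ['JP', 'UK', 'US']
print(value_ids)                   # [2, 2, 0, 1]
print(bits_per_value(dictionary))  # 2
```

With three distinct countries, each entry in the column needs only 2 bits regardless of how many rows the table holds, which is where the memory savings come from.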
Column storage also enables better parallel processing. In a column store, data is already vertically partitioned, so operations on different columns can easily be processed in parallel. If multiple columns need to be searched or aggregated, each of these operations can be assigned to a different processor core.
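The idea of assigning one task per column can be sketched as follows. This is an illustration only: it uses Python's standard `ThreadPoolExecutor` to show the independent per-column tasks, and the function names are hypothetical. Python threads do not give true multi-core execution for CPU-bound work, so the sketch shows the task structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Sample table in a column-oriented layout.
columns = {
    "Country": ["US", "US", "JP", "UK"],
    "Product": ["Alpha", "Beta", "Alpha", "Alpha"],
    "Sales": [3000, 1250, 400, 700],
}


def count_distinct(values):
    """A per-column aggregation that needs no data from other columns."""
    return len(set(values))


def aggregate_columns(columns):
    """Submit one independent task per column and collect the results."""
    with ThreadPoolExecutor(max_workers=len(columns)) as pool:
        futures = {
            name: pool.submit(count_distinct, values)
            for name, values in columns.items()
        }
        return {name: future.result() for name, future in futures.items()}


result = aggregate_columns(columns)
print(result)  # {'Country': 3, 'Product': 2, 'Sales': 4}
```

Because each task reads exactly one column, no coordination between tasks is needed until the results are collected.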
Insert / Update
Storing data in columns instead of rows is challenging for workloads with many data-modifying operations. To address this, column stores use a differential buffer, to which new entries are written first. In SAP HANA, this buffer is referred to as the delta store. In contrast to the main store, the delta store is optimized for inserts. At a later point in time, depending on thresholds such as the frequency of changes and the number of new entries, the data in the delta store is merged into the main store. This process is referred to as delta merge.
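The interplay between the two stores can be sketched with a toy single-column table. This is a simplified model under stated assumptions, not HANA's implementation: the `ColumnTable` class, the row-count `merge_threshold`, and the choice to keep the main store sorted are all hypothetical stand-ins for the real merge criteria and storage format.

```python
class ColumnTable:
    """Toy single-column table with a main store and a delta store."""

    def __init__(self, merge_threshold=3):
        self.main = []    # read-optimized store, kept sorted in this sketch
        self.delta = []   # write-optimized store, append-only
        self.merge_threshold = merge_threshold  # hypothetical merge trigger
        self.merge_count = 0

    def insert(self, value):
        """Writes go to the delta store; a merge runs once it is large enough."""
        self.delta.append(value)
        if len(self.delta) >= self.merge_threshold:
            self.delta_merge()

    def delta_merge(self):
        """Rebuild the main store from main + delta, then empty the delta."""
        self.main = sorted(self.main + self.delta)
        self.delta = []
        self.merge_count += 1

    def scan(self):
        """Reads must consult both stores to see all committed rows."""
        return self.main + self.delta


table = ColumnTable(merge_threshold=3)
for sales in [3000, 1250, 400, 700]:
    table.insert(sales)

print(table.main)         # [400, 1250, 3000]
print(table.delta)        # [700]
print(table.merge_count)  # 1
```

The third insert triggers a merge that rebuilds the main store, while the fourth insert stays in the delta store until the threshold is reached again.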
The delta merge process has important implications for filesystem-based table modifications. A delta merge on any table reorganizes all data in that table and, as a consequence, rewrites the entire table to different filesystem blocks. The space savings Delphix can offer by tracking block changes between snapshots become far less effective and less predictable when entire tables are reorganized by a delta merge.
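The effect on block-level change tracking can be illustrated with a toy model. This sketch is an assumption-laden simplification: it serializes a column of integers into fixed-size blocks, compares block hashes position by position, and uses a sort as a stand-in for the reorganization a delta merge performs. The block size, function names, and data are hypothetical and do not describe how HANA or Delphix actually lay out or track blocks.

```python
import hashlib
import struct

BLOCK_SIZE = 64  # bytes per block; hypothetical, chosen to keep the example small


def to_blocks(values):
    """Serialize integers as 8-byte little-endian values and cut into blocks."""
    payload = b"".join(struct.pack("<q", v) for v in values)
    return [payload[i:i + BLOCK_SIZE] for i in range(0, len(payload), BLOCK_SIZE)]


def changed_blocks(before, after):
    """Count blocks in `after` that differ from the same position in `before`."""
    before_hashes = [hashlib.sha256(block).digest() for block in before]
    changed = 0
    for i, block in enumerate(after):
        digest = hashlib.sha256(block).digest()
        if i >= len(before_hashes) or before_hashes[i] != digest:
            changed += 1
    return changed


main = list(range(0, 800, 2))  # 400 values * 8 bytes = 50 full blocks
new_rows = [1, 3, 5]

appended = main + new_rows        # new rows added at the end, nothing moved
merged = sorted(main + new_rows)  # table reorganized, existing data shifted

print(len(to_blocks(main)))                                  # 50
print(changed_blocks(to_blocks(main), to_blocks(appended)))  # 1
print(changed_blocks(to_blocks(main), to_blocks(merged)))    # 51
```

In this model, appending three rows changes a single block, while reorganizing the table to absorb the same three rows changes every block, so a snapshot taken afterwards shares nothing with the previous one for that table.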