• 검색 결과가 없습니다.

B-Tree deduplication in PostgreSQL 13: design and background

N/A
N/A
Protected

Academic year: 2022

Share "B-Tree deduplication in PostgreSQL 13: design and background"

Copied!
11
0
0

로드 중.... (전체 텍스트 보기)

전체 글

(1)

B-Tree deduplication in PostgreSQL 13:

design and background

ISS2020 — October 14 , 2020

(2)

Peter 


Geoghegan

@petervgeoghegan

(3)

B-Tree deduplication

New feature in Postgres 13 (which was just released):

Deduplication

Work by Anastasia Lubennikova and myself (Peter Geoghegan)

Targets standard B-Tree access method (default index type in Postgres)

Simple idea: only store key once — not once per index entry - Before: ‘KeyA’, (1,22), ‘KeyA’, (1,23), ‘KeyA’, (1,24)

- After: ‘KeyA’, (1,22)(1,23)(1,24)

(4)

Why deduplicate?

Design changes very little about existing B-Tree indexing Increases performance, decreases storage costs

Occurs only when needed, to avoid a page split

Makes B-Tree indexes like bitmap indexes — though only when that makes sense

- Made possible by existing “logical MVCC” design

- No changes to locking, despite making indexes with duplicates much smaller

(5)

Index size reductions

Size of all indexes taken together

0%

12.5%

25%

37.5%

50%

62.5%

75%

87.5%

100%

TPC-H MGD

Postgres 12 (original size) Patch

(6)

Index size reductions

Size of single low cardinality index on text columns

0%

12.5%

25%

37.5%

50%

62.5%

75%

87.5%

100%

"UK Land Registry" test

Postgres 12 (original size) Patch

(7)

Design goals for deduplication

Standard B-Trees support high concurrency access

Deduplication is enabled by default now, so it must not hurt performance for workloads/indexes with few

duplicates

“Best of both worlds” performance

Posting lists (Heap TID lists) are not compressed - Limits low-level contention

- Very little downside compared to no deduplication

(8)

The challenge of write amplification

Write amplification is a traditional problem when using Postgres — at least with some workloads

Indexes require new entries for MVCC versions All indexes on a table receive new tuples — not just the indexes on columns that an UPDATE

changed.

This only happens when at least one indexed

column is touched by an update, though

(9)

A solution: B-Tree “deletion deduplication”

Traditional complaint about Postgres is that

sometimes indexes require new entries for MVCC

versions, even when UPDATEs do not change the keys

New mechanism extends dedup to delete garbage tuples to avoid a page split

Reliably prevents “logically unnecessary” page

splits. These are splits caused by version churn

alone.

(10)

Conclusions

There are significant differences among all major systems that implement MVCC

Versioning of logical rows in PostgreSQL (as oppose to versioning of tables or of individual pages) enables innovation around indexing

B-Tree deduplication

Transactional full-text search (using GIN)

Geospatial indexing (using GiST)

(11)

https://speakerdeck.com/peterg/postgresql-indexing-2020

Thanks!

참조

관련 문서

This paper presents a new software development environment that supports an integrated methodology for covering all phases of software development and gives integrated

Alcohol, aliphatic, alicyclic and aromatic hydrocarbon solvents, andrology, arsenic, cadmium, diet, environmental factors, ergonomic, fungicides herbicides and

The obverse symbolizes a new start depicting Republic of Korea taking a leading position for creating a new world order and Gwang-Hwa-Mun restored

However, defining the citing article impact by the citation counts it receives, or a variant of it, does not necessarily indicate a high academic significance; for example,

This paper presents an overview of structural systems that are frequently adopted in tall building design; typical beams and columns, concrete filled steel tube columns and

objective of the design is to obtain greater field uniformity in the enlarged test volume and lessen the limitation on usable frequency while decreasing the overall dimension

In Fig. 8, the results for the square columns SC150, show negligible differences on the critical times for both end support conditions. The experimental tests on the pin-

플록 구조의 측면에서 볼 때 폴리머를 일차 응집제로 사용하면 플록 강도와 크기가 향상되었지만, 다른 처리 옵션과 비교해 볼 때 플록 침전 속도에는 개선이 없었으며 유기 물질