PostgreSQL数据库排序的小结

一 OS环境

 [root@localhost ~]# cat /etc/redhat-release 
CentOS release 6.5 (Final)
[root@localhost ~]# uname -rm
2.6.32-431.el6.x86_64 x86_64
[root@localhost ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         48216      47094       1122          0        286      44795
-/+ buffers/cache:       2012      46204
Swap:        24175       1712      22463
[root@localhost ~]#

二 数据库环境

ai=> select version();
                                                 version                                                  
----------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18), 64-bit
(1 row)

ai=> \dt+ t_btree 
                    List of relations
 Schema |  Name   | Type  | Owner |  Size  | Description 
--------+---------+-------+-------+--------+-------------
 public | t_btree | table | ai    | 346 MB | 
(1 row)

ai=> select count(*) from t_btree ;
  count   
----------
 10000000
(1 row)

ai=>

三 external merge

ai=> show shared_buffers ;
 shared_buffers 
----------------
 128MB
(1 row)

ai=> show work_mem ;
 work_mem 
----------
 4MB
(1 row)

ai=> explain (analyze,verbose,buffers,costs,timing) select * from t_btree order by id;
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1580363.83..1605363.83 rows=10000000 width=4) (actual time=8397.774..10610.457 rows=10000000 loops=1)
   Output: id
   Sort Key: t_btree.id
   Sort Method: external merge  Disk: 136936kB
   Buffers: shared hit=15871 read=28377, temp read=50518 written=50518
   ->  Seq Scan on public.t_btree  (cost=0.00..144248.00 rows=10000000 width=4) (actual time=0.028..861.348 rows=10000000 loops=1)
         Output: id
         Buffers: shared hit=15871 read=28377
 Planning time: 2.056 ms
 Execution time: 11135.532 ms
(10 rows)

ai=>

这里的排序方法采用的external mege,意味着数据量太大了,在内存里排序排不下(work_mem=4MB对于当前场景的排序,不够用),只好交换到磁盘上排序,磁盘排序完成之后,再把排序结果交换到内存中。效率最低。

shared_buffers相当于Oracle数据库的database buffer cache,单纯加大该参数,即使所有的数据都能从shared buffers命中,对该排序依然无效,如下:

ai=> explain (analyze,verbose,buffers,costs,timing) select id from t_btree order by id;
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1580363.83..1605363.83 rows=10000000 width=4) (actual time=8032.685..10126.444 rows=10000000 loops=1)
   Output: id
   Sort Key: t_btree.id
   Sort Method: external merge  Disk: 136936kB
   Buffers: shared hit=44248, temp read=50518 written=50518
   ->  Seq Scan on public.t_btree  (cost=0.00..144248.00 rows=10000000 width=4) (actual time=0.021..792.516 rows=10000000 loops=1)
         Output: id
         Buffers: shared hit=44248
 Planning time: 0.062 ms
 Execution time: 10655.508 ms
(10 rows)

ai=> select 44248*8;
 ?column? 
----------
   353984
(1 row)

ai=> select 44248*8/1024.0;
       ?column?       
----------------------
 345.6875000000000000
(1 row)

ai=> show shared_buffers ;
 shared_buffers 
----------------
 4GB
(1 row)

ai=> show work_mem ;
 work_mem 
----------
 4MB
(1 row)

ai=>

shared_buffers=4GB,work_mem=4MB时。执行计划中,Buffers: shared hit=44248,内存命中44248个内存块儿,每个内存页8KB,换算之后,约为346MB,足够缓存该表的全部数据,即所有数据都从内存中读取。但是,排序方法依然是external merge。

四 quicksort排序

ai=> show shared_buffers ;
 shared_buffers 
----------------
 512MB
(1 row)

ai=> show work_mem ;
 work_mem 
----------
 4MB
(1 row)

ai=> set work_mem ='800MB';
SET
ai=> show work_mem ;
 work_mem 
----------
 800MB
(1 row)

ai=> explain (analyze,verbose,buffers,costs,timing) select * from t_btree order by id;
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1306922.83..1331922.83 rows=10000000 width=4) (actual time=15001.049..17213.924 rows=10000000 loops=1)
   Output: id
   Sort Key: t_btree.id
   Sort Method: quicksort  Memory: 741817kB
   Buffers: shared hit=15903 read=28345
   ->  Seq Scan on public.t_btree  (cost=0.00..144248.00 rows=10000000 width=4) (actual time=0.060..920.913 rows=10000000 loops=1)
         Output: id
         Buffers: shared hit=15903 read=28345
 Planning time: 0.054 ms
 Execution time: 17756.761 ms
(10 rows)

ai=> explain (analyze,verbose,buffers,costs,timing) select * from t_btree order by id;
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1306922.83..1331922.83 rows=10000000 width=4) (actual time=6075.205..7945.550 rows=10000000 loops=1)
   Output: id
   Sort Key: t_btree.id
   Sort Method: quicksort  Memory: 741817kB
   Buffers: shared hit=16063 read=28185
   ->  Seq Scan on public.t_btree  (cost=0.00..144248.00 rows=10000000 width=4) (actual time=0.060..879.729 rows=10000000 loops=1)
         Output: id
         Buffers: shared hit=16063 read=28185
 Planning time: 0.070 ms
 Execution time: 8466.822 ms
(10 rows)

ai=> explain (analyze,verbose,buffers,costs,timing) select * from t_btree order by id;
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1306922.83..1331922.83 rows=10000000 width=4) (actual time=5400.587..7208.614 rows=10000000 loops=1)
   Output: id
   Sort Key: t_btree.id
   Sort Method: quicksort  Memory: 741817kB
   Buffers: shared hit=16095 read=28153
   ->  Seq Scan on public.t_btree  (cost=0.00..144248.00 rows=10000000 width=4) (actual time=0.059..847.957 rows=10000000 loops=1)
         Output: id
         Buffers: shared hit=16095 read=28153
 Planning time: 0.068 ms
 Execution time: 7724.861 ms
(10 rows)

ai=>

说明:

1 默认情况下work_mem=4MB,不满足在内存中完成该排序操作,通过向上调整work_mem=800MB ,使其能够在内存中完成该排序,这里的800MB是逐渐上调得到的一个值。为了学习研究用,实际生产环境,调整该参数值时,要注意该值work_mem*max_connections不能超过总物理内存大小;

2 调整完work_mem=800MB后,多次观察该SQL的执行计划,可以发现随着Buffers: shared hit=16095 read=28153的变化,其Execution time逐渐降低。

This will definitely be faster than external merge, since all of the data is brought into memory and then the sorting is done in memory itself.

五  top-N heapsort

ai=> show work_mem ;
 work_mem 
----------
 4MB
(1 row)

ai=> show shared_buffers ;
 shared_buffers 
----------------
 128MB
(1 row)

ai=> explain (analyze,verbose,buffers,costs,timing) select id from t_btree order by id limit 10;
                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=360344.40..360344.43 rows=10 width=4) (actual time=1784.116..1784.117 rows=10 loops=1)
   Output: id
   Buffers: shared hit=3 read=44248
   ->  Sort  (cost=360344.40..385344.40 rows=10000000 width=4) (actual time=1784.113..1784.113 rows=10 loops=1)
         Output: id
         Sort Key: t_btree.id
         Sort Method: top-N heapsort  Memory: 25kB
         Buffers: shared hit=3 read=44248
         ->  Seq Scan on public.t_btree  (cost=0.00..144248.00 rows=10000000 width=4) (actual time=0.023..856.626 rows=10000000 loops=1)
               Output: id
               Buffers: shared read=44248
 Planning time: 0.396 ms
 Execution time: 1784.149 ms
(13 rows)

ai=>

说明:

1 这里SQL加上了limit 10,则执行计划选择了top-N heapsort,同时,内存只用了25kB。

2 PostgreSQL数据库维护了一个heap的内存结构,该heap有一个上限大小。排序过程大致如下:PostgreSQL先根据limit大小,顺序的把数据放入heap中,等到heap被填满之后,再去读取下一个数据,判断其值是否小于heap中已有的最大值?如果大于最大值,则直接丢弃这个数据,返回heap中的结果集,排序完成。如果小于最大值,则把这个值放入heap中的最大值处,并把heap中之前的那个最大值移除heap,继续读取下一个值,重复该过程,直到读取的新值不再大于heap中的最大值为止。

3 由于SQL中有order by id,才会涉及到排序操作,如果没有这个order by从句的话,那么无需涉及排序,直接返回表中读取的前10条记录;

4 这里无需对整个表进行排序,只需获取最小的10条数据,并对其进行排序即可。

六 index 排序

ai=> create index idx_t_btree_id on t_btree(id);
CREATE INDEX
ai=> show work_mem ;
 work_mem 
----------
 4MB
(1 row)

ai=> show shared_buffers ;
 shared_buffers 
----------------
 128MB
(1 row)

ai=> explain (analyze,verbose,buffers,costs,timing) select * from t_btree order by id limit 10;
                                                                      QUERY PLAN                                                                       
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..0.87 rows=10 width=4) (actual time=0.030..0.045 rows=10 loops=1)
   Output: id
   Buffers: shared hit=13
   ->  Index Only Scan using idx_t_btree_id on public.t_btree  (cost=0.43..436680.67 rows=10000033 width=4) (actual time=0.028..0.039 rows=10 loops=1)
         Output: id
         Heap Fetches: 10
         Buffers: shared hit=13
 Planning time: 0.120 ms
 Execution time: 0.066 ms
(9 rows)

ai=> explain (analyze,verbose,buffers,costs,timing) select * from t_btree order by id desc limit 10;
                                                                           QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..0.87 rows=10 width=4) (actual time=0.024..0.038 rows=10 loops=1)
   Output: id
   Buffers: shared hit=12 read=1
   ->  Index Only Scan Backward using idx_t_btree_id on public.t_btree  (cost=0.43..436680.67 rows=10000033 width=4) (actual time=0.022..0.035 rows=10 loops=1)
         Output: id
         Heap Fetches: 10
         Buffers: shared hit=12 read=1
 Planning time: 0.213 ms
 Execution time: 0.061 ms
(9 rows)

ai=>

说明:

实际情况下,通过index来访问数据,并不是严格意义的排序,只不过是index本身已经是排好序的数据集,数据库只需直接从index读取数据即可,并不需要执行额外的排序操作。这里,id字段上有一个B-tree index,无论是按照id的升序、降序排序,都可以直接从index读取数据。

七 小结

1 此前,并没有认真的关注过external merge排序操作,原来是因为内存空间不够,work_mem内存区不足以用于排序的操作,故而数据库不得不采用将数据交换到磁盘排序,然后把排序结果交换回内存;

2 quick sort是一种成熟的排序方法,大学的数据结构课程学过,早已还给老师了,找时间回顾一下该知识点;

3 heap sort方法对于自己是一种新的排序,通过小结,也大致清楚了其排序的实现方式。

八 参考

https://madusudanan.com/blog/all-you-need-to-know-about-sorting-in-postgres/

发表评论

邮箱地址不会被公开。 必填项已用*标注