一 OS环境

 [root@localhost ~]# cat /etc/redhat-release 
CentOS release 6.5 (Final)
[root@localhost ~]# uname -rm
2.6.32-431.el6.x86_64 x86_64
[root@localhost ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         48216      47094       1122          0        286      44795
-/+ buffers/cache:       2012      46204
Swap:        24175       1712      22463
[root@localhost ~]#

二数据库环境

ai=> select version();
                                                 version                                                  
----------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-18), 64-bit
(1 row)

ai=> \dt+ t_btree 
                    List of relations
 Schema |  Name   | Type  | Owner |  Size  | Description 
--------+---------+-------+-------+--------+-------------
 public | t_btree | table | ai    | 346 MB | 
(1 row)

ai=> select count(*) from t_btree ;
  count   
----------
 10000000
(1 row)

ai=>

三 external merge

ai=> show shared_buffers ;
 shared_buffers 
----------------
 128MB
(1 row)

ai=> show work_mem ;
 work_mem 
----------
 4MB
(1 row)

ai=> explain (analyze,verbose,buffers,costs,timing) select * from t_btree order by id;
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1580363.83..1605363.83 rows=10000000 width=4) (actual time=8397.774..10610.457 rows=10000000 loops=1)
   Output: id
   Sort Key: t_btree.id
   Sort Method: external merge  Disk: 136936kB
   Buffers: shared hit=15871 read=28377, temp read=50518 written=50518
   ->  Seq Scan on public.t_btree  (cost=0.00..144248.00 rows=10000000 width=4) (actual time=0.028..861.348 rows=10000000 loops=1)
         Output: id
         Buffers: shared hit=15871 read=28377
 Planning time: 2.056 ms
 Execution time: 11135.532 ms
(10 rows)

ai=>

这里的排序方法采用的external mege，意味着数据量太大了，在内存里排序排不下(work_mem=4MB对于当前场景的排序，不够用)，只好交换到磁盘上排序，磁盘排序完成之后，再把排序结果交换到内存中。效率最低。

shared_buffers相当于Oracle数据库的database buffer cache，单纯加大该参数，即使所有的数据都能从shared buffers命中，对该排序依然无效，如下：

ai=> explain (analyze,verbose,buffers,costs,timing) select id from t_btree order by id;
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1580363.83..1605363.83 rows=10000000 width=4) (actual time=8032.685..10126.444 rows=10000000 loops=1)
   Output: id
   Sort Key: t_btree.id
   Sort Method: external merge  Disk: 136936kB
   Buffers: shared hit=44248, temp read=50518 written=50518
   ->  Seq Scan on public.t_btree  (cost=0.00..144248.00 rows=10000000 width=4) (actual time=0.021..792.516 rows=10000000 loops=1)
         Output: id
         Buffers: shared hit=44248
 Planning time: 0.062 ms
 Execution time: 10655.508 ms
(10 rows)

ai=> select 44248*8;
 ?column? 
----------
   353984
(1 row)

ai=> select 44248*8/1024.0;
       ?column?       
----------------------
 345.6875000000000000
(1 row)

ai=> show shared_buffers ;
 shared_buffers 
----------------
 4GB
(1 row)

ai=> show work_mem ;
 work_mem 
----------
 4MB
(1 row)

ai=>

shared_buffers=4GB，work_mem=4MB时。执行计划中，Buffers: shared hit=44248，内存命中44248个内存块儿，每个内存页8KB，换算之后，约为346MB，足够缓存该表的全部数据，即所有数据都从内存中读取。但是，排序方法依然是external merge。

四 quicksort排序

ai=> show shared_buffers ;
 shared_buffers 
----------------
 512MB
(1 row)

ai=> show work_mem ;
 work_mem 
----------
 4MB
(1 row)

ai=> set work_mem ='800MB';
SET
ai=> show work_mem ;
 work_mem 
----------
 800MB
(1 row)

ai=> explain (analyze,verbose,buffers,costs,timing) select * from t_btree order by id;
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1306922.83..1331922.83 rows=10000000 width=4) (actual time=15001.049..17213.924 rows=10000000 loops=1)
   Output: id
   Sort Key: t_btree.id
   Sort Method: quicksort  Memory: 741817kB
   Buffers: shared hit=15903 read=28345
   ->  Seq Scan on public.t_btree  (cost=0.00..144248.00 rows=10000000 width=4) (actual time=0.060..920.913 rows=10000000 loops=1)
         Output: id
         Buffers: shared hit=15903 read=28345
 Planning time: 0.054 ms
 Execution time: 17756.761 ms
(10 rows)

ai=> explain (analyze,verbose,buffers,costs,timing) select * from t_btree order by id;
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1306922.83..1331922.83 rows=10000000 width=4) (actual time=6075.205..7945.550 rows=10000000 loops=1)
   Output: id
   Sort Key: t_btree.id
   Sort Method: quicksort  Memory: 741817kB
   Buffers: shared hit=16063 read=28185
   ->  Seq Scan on public.t_btree  (cost=0.00..144248.00 rows=10000000 width=4) (actual time=0.060..879.729 rows=10000000 loops=1)
         Output: id
         Buffers: shared hit=16063 read=28185
 Planning time: 0.070 ms
 Execution time: 8466.822 ms
(10 rows)

ai=> explain (analyze,verbose,buffers,costs,timing) select * from t_btree order by id;
                                                            QUERY PLAN                                                             
-----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1306922.83..1331922.83 rows=10000000 width=4) (actual time=5400.587..7208.614 rows=10000000 loops=1)
   Output: id
   Sort Key: t_btree.id
   Sort Method: quicksort  Memory: 741817kB
   Buffers: shared hit=16095 read=28153
   ->  Seq Scan on public.t_btree  (cost=0.00..144248.00 rows=10000000 width=4) (actual time=0.059..847.957 rows=10000000 loops=1)
         Output: id
         Buffers: shared hit=16095 read=28153
 Planning time: 0.068 ms
 Execution time: 7724.861 ms
(10 rows)

ai=>

说明：

1 默认情况下work_mem=4MB，不满足在内存中完成该排序操作，通过向上调整work_mem=800MB ，使其能够在内存中完成该排序，这里的800MB是逐渐上调得到的一个值。为了学习研究用，实际生产环境，调整该参数值时，要注意该值work_mem*max_connections不能超过总物理内存大小；

2 调整完work_mem=800MB后，多次观察该SQL的执行计划，可以发现随着Buffers: shared hit=16095 read=28153的变化，其Execution time逐渐降低。

This will definitely be faster than external merge, since all of the data is brought into memory and then the sorting is done in memory itself.

五 top-N heapsort

ai=> show work_mem ;
 work_mem 
----------
 4MB
(1 row)

ai=> show shared_buffers ;
 shared_buffers 
----------------
 128MB
(1 row)

ai=> explain (analyze,verbose,buffers,costs,timing) select id from t_btree order by id limit 10;
                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=360344.40..360344.43 rows=10 width=4) (actual time=1784.116..1784.117 rows=10 loops=1)
   Output: id
   Buffers: shared hit=3 read=44248
   ->  Sort  (cost=360344.40..385344.40 rows=10000000 width=4) (actual time=1784.113..1784.113 rows=10 loops=1)
         Output: id
         Sort Key: t_btree.id
         Sort Method: top-N heapsort  Memory: 25kB
         Buffers: shared hit=3 read=44248
         ->  Seq Scan on public.t_btree  (cost=0.00..144248.00 rows=10000000 width=4) (actual time=0.023..856.626 rows=10000000 loops=1)
               Output: id
               Buffers: shared read=44248
 Planning time: 0.396 ms
 Execution time: 1784.149 ms
(13 rows)

ai=>

说明:

1 这里SQL加上了limit 10，则执行计划选择了top-N heapsort，同时，内存只用了25kB。

2 PostgreSQL数据库维护了一个heap的内存结构，该heap有一个上限大小。排序过程大致如下：PostgreSQL先根据limit大小，顺序的把数据放入heap中，等到heap被填满之后，再去读取下一个数据，判断其值是否小于heap中已有的最大值？如果大于最大值，则直接丢弃这个数据，返回heap中的结果集，排序完成。如果小于最大值，则把这个值放入heap中的最大值处，并把heap中之前的那个最大值移除heap，继续读取下一个值，重复该过程，直到读取的新值不再大于heap中的最大值为止。

3 由于SQL中有order by id，才会涉及到排序操作，如果没有这个order by从句的话，那么无需涉及排序，直接返回表中读取的前10条记录；

4 这里无需对整个表进行排序，只需获取最小的10条数据，并对其进行排序即可。

六 index 排序

ai=> create index idx_t_btree_id on t_btree(id);
CREATE INDEX
ai=> show work_mem ;
 work_mem 
----------
 4MB
(1 row)

ai=> show shared_buffers ;
 shared_buffers 
----------------
 128MB
(1 row)

ai=> explain (analyze,verbose,buffers,costs,timing) select * from t_btree order by id limit 10;
                                                                      QUERY PLAN                                                                       
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..0.87 rows=10 width=4) (actual time=0.030..0.045 rows=10 loops=1)
   Output: id
   Buffers: shared hit=13
   ->  Index Only Scan using idx_t_btree_id on public.t_btree  (cost=0.43..436680.67 rows=10000033 width=4) (actual time=0.028..0.039 rows=10 loops=1)
         Output: id
         Heap Fetches: 10
         Buffers: shared hit=13
 Planning time: 0.120 ms
 Execution time: 0.066 ms
(9 rows)

ai=> explain (analyze,verbose,buffers,costs,timing) select * from t_btree order by id desc limit 10;
                                                                           QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..0.87 rows=10 width=4) (actual time=0.024..0.038 rows=10 loops=1)
   Output: id
   Buffers: shared hit=12 read=1
   ->  Index Only Scan Backward using idx_t_btree_id on public.t_btree  (cost=0.43..436680.67 rows=10000033 width=4) (actual time=0.022..0.035 rows=10 loops=1)
         Output: id
         Heap Fetches: 10
         Buffers: shared hit=12 read=1
 Planning time: 0.213 ms
 Execution time: 0.061 ms
(9 rows)

ai=>

说明：

实际情况下，通过index来访问数据，并不是严格意义的排序，只不过是index本身已经是排好序的数据集，数据库只需直接从index读取数据即可，并不需要执行额外的排序操作。这里，id字段上有一个B-tree index，无论是按照id的升序、降序排序，都可以直接从index读取数据。

七小结

1 此前，并没有认真的关注过external merge排序操作，原来是因为内存空间不够，work_mem内存区不足以用于排序的操作，故而数据库不得不采用将数据交换到磁盘排序，然后把排序结果交换回内存；

2 quick sort是一种成熟的排序方法，大学的数据结构课程学过，早已还给老师了，找时间回顾一下该知识点；

3 heap sort方法对于自己是一种新的排序，通过小结，也大致清楚了其排序的实现方式。

八参考

https://madusudanan.com/blog/all-you-need-to-know-about-sorting-in-postgres/

PostgreSQL数据库排序的小结

PostgreSQL数据库排序的小结

一 OS环境

二数据库环境

三 external merge

四 quicksort排序

五 top-N heapsort

六 index 排序

七小结

八参考

发表评论
取消回复

发表评论

PostgreSQL数据库排序的小结

一 OS环境

二 数据库环境

三 external merge

四 quicksort排序

五 top-N heapsort

六 index 排序

七 小结

八 参考

发表评论 取消回复

发表评论

二数据库环境

七小结

八参考

发表评论
取消回复