The purpose of this alogrithm is to have the last merge phase optimize(maximize)
the
input files available: i.e., Maximum Input files = Total-Files -
1.
In Table4-2 of optimal merge, the last phase (phase 10)
does not have the optimal condition (it used only two input files).
But this last phase issued the maximum I/O operations!
So, Knuth figured out a mathematical way to sacrifice the first
few
phases, which involes less data, and make the last phase
be optimal.
The concept utilizes the Fibonacci series:
A Fibonacci series of order 3 is
1 1 1 3 5 9 17 31 ...
(1+1+1=3, 1+1+3=5, 1+3+5=9, 3+5+9=17,
...)
That is every number is the sum of the previous 3 numbers.
So, A Fibonacci number of order 4 is
1 1 1 1 4 7 13 25 49
... (......., .4+7+13+25=49, ...)
Now if we have 4 files in total, then the optimal (maximum)
input files would be 3(one for output).
So, what we need is a Fibonacci series of order 3.
In the following, I will create table 4-4 using reverse engineering.
First we assume that the final merge involved the max. three inputs.
But in this consideration, it will happened only when the total-input-files
number is in Fibonacci series
of order 3; i.e., 1, 3, 5, 9, 17 or 31 partitions, then we can
have the perfect result.
Since, in the text, we assume that total partitions is 20.
So, we need extra 11=(31-20) dummy partitions (Zero record partitions).
(31 is the smallest number no less then 20)
And the distribution of partitions should be 13, 11,7 as derived
in table4-4.
(It is important to know how to derive this table:)
The last phase is always 1 0
0 for order 3
1 0 0 0 for order 4
and last-1 is always
1 1 1
1 1 1 1
and last-2 is always
2 2 1
2 2 2 1
last -3
4 3 2
4 4 3 2
last -4
7 6 4
8 7 6 4
The following is the trick to derive the above distribution
table:
To get the number of each row: (1) starting with the rightmost number
of a row.
This number is the left most number of the previouse phase (such
as 4 in last-3)
The rest of the number is computed as follows:
6= 4+ 2( the 2 is right above 4 of last-3)
7= 4+3 (the 3 is right above 6 of last-3)
8=4+4 (the 4 is right above 7 of last-3)
If you draw lines on the above table, you can see the realtionship
better..
Now let us calculate the distribution of dummy partitions.
The best way to distribute dummy partition over the three inputs are
"as evenly as possible".
So, 4 , 4, 3 is the choice.
This leads to the result of the 20 real partition distribution as 13
11 7 Needed
(Derived mathematically in reverse engineering to optimize the last
phase)
4 4 3 Dummy (as evenly distrubuted as possible)
-----------------
9 7 4 the calculated real partution distribution
(13-4=9, 11-4=7 and 7-3=4)
This is Figure 4-9.
(The sequence of Ra, Rb, Rc.... is not important. It is the total partitions
in each files that is our concern)
Now let me derive Table4-5 as follows:
I will represent the 20 real partitions as twenty M's
(Because each partition has fixed M records in our example)
D is used to represent Dummy partition.
F-1 F-2 F-3 F-4
----------------------------------------
D D D ..................................Row-1
D D D ..................................Row-2
D D D ..................................Row-3
D D M .................................Row-4
M M M .................................Row-5
M M M .................................Row-6
M M M .................................Row-7
M M
M M
M M
M M
M
M
===================================
Now I will merge F-1, F-2 and F-3, if there are inputs
(D and M):(That is from Row-1 to Row-7),
and ouput the merged data to F-4 as follows:
Phase-1:
F-1 F-2 F-3 F-4
----------------------------------------
D D D ...........
Row-1
D D D ............
Row-2
D D D ............
Row-3
D D M ............Row-4
M M M ............Row-5
M M M ............Row-6
M M M ............Row-7
-----------------------------------------
M M
D (D+D+D=D, just like 0+0+0=0) From Row-1... to be Row -8
M M
D (D+D+D=D) From Row-2............................. . to be
Row -9
M M
D (D+D+D=D) .................................................
to be Row-10
M M
M (D+D+M) ........................................................
...Row -11
M
3M(M+M+M)...........................................................
..Row-12
M
3M (M+M+M)..........................................................
..Row-13
3M (M+M+M)..........................................................
..Row-14
=======================================
Up to now we used up all the input data from F-3.
So, we switch inputs files to F-1 F-2 and F-4, and used F-3 as output
file!
And merge the F-1, F-2 , F-4, so long as there are data in these three
files, to F-3:
(Row-8 to Row-11.)
This setup the second phase.
Phase-2:
F-1 F-2 F-3
F-4
----------------------------------------
D D
D .......... Row-1
D D
D ............ Row-2
D D
D ............ Row-3
D D
M ............Row-4
M M
M ............Row-5
M M
M ............Row-6
M M
M ............Row-7
-----------------------------------------
M M
D (D+D+D=D, just like 0+0+0-0) From Row-1.... .Row -8
M M
D (D+D+D=D) From Row-2............................... Row -9
M M
D (D+D+D=D)....................................................
Row-10
M M
M (D+D+M)........................................................
Row -11
--------------------------------------------------------------------------------------------------------
M
2M 3M (M+M+M)......................................................Row-12
M
2M 3M (M+M+M)......................................................Row-13
2M 3M (M+M+M)......................................................Row-14
3M........................................................................
to be Row-15
================================================================
We used up all the data in F-2 now, so we switch inputs to F-1, F-3,
F-4, and use F-2 as
output file to merge Row-12, Row-13 to F-2:
Phase-3:
F-1 F-2 F-3
F-4
----------------------------------------
D D
D .......... .Row-1
D D
D ............Row-2
D D
D ............Row-3
D D
M ............Row-4
M M M
............Row-5
M M M
............Row-6
M M M
............Row-7
-----------------------------------------
M M
D (D+D+D=D, just like 0+0+0-0)
Row -8
M M
D (D+D+D=D)
Row -9
M M
D (D+D+D=D)
Row-10
M M
M (D+D+M)......................................... Row -11
--------------------------------------------------------------------------------------------------------
M
2M 3M(M+M+M).........................................Row-12
M
2M 3M(M+M+M).........................................Row-13
--------------------------------------------------------------------------------------------------------
6M
2M 3M(M+M+M)..........................................Row-14
6M
3M ................................................... .................
Row-15
===============================================================
Again we used up all the data in F-1, so we switch inputs to F-2,
F-2 and F-4
and use F-1 as output file to merge Row-14
Phase-4:
F-1 F-2 F-3
F-4
----------------------------------------
D D
D .......... .Row-1
D D
D ............Row-2
D D
D ............Row-3
D D
M ............Row-4
M M M
............Row-5
M M M
............Row-6
M M M
............Row-7
-----------------------------------------
M M
D (D+D+D=D, just like 0+0+0-0)
Row -8
M M
D (D+D+D=D)
Row -9
M M
D (D+D+D=D)
Row-10
M M
M (D+D+M)........................................ .Row -11
--------------------------------------------------------------------------------------------------------
M
2M 3M(M+M+M).........................................Row-12
M
2M 3M(M+M+M).........................................Row-13
--------------------------------------------------------------------------------------------------------
6M
2M 3M(M+M+M)..........................................Row-14
--------------------------------------------------------------------------------------------------------
11M 6M 3M ...................................................
................. Row-15
==================================================================
Finally, we used up all the data in F-4, so we switch inputs back
to F-1,
F-2 and F-3 and merge them into F-4.
Phase-5:
F-1 F-2 F-3
F-4
----------------------------------------
D D
D .......... . Row-1
D D
D ............ Row-2
D D
D ............ Row-3
D D
M ............Row-4
M M M
............Row-5
M M M
............ Row-6
M M M
............ Row-7
-----------------------------------------
M M
D (D+D+D=D, just like 0+0+0-0)
Row -8
M M
D (D+D+D=D)
Row -9
M M
D (D+D+D=D)
Row-10
M M
M (D+D+M)......................................... Row -11
--------------------------------------------------------------------------------------------------------
M
2M 3M(M+M+M).........................................Row-12
M
2M 3M(M+M+M).........................................Row-13
--------------------------------------------------------------------------------------------------------
6M 2M
3M(M+M+M)..........................................Row-14
--------------------------------------------------------------------------------------------------------
11M 6M 3M ...................................................
................. Row-15
-------------------------------------------------------------------------------------------------------
20M (11M+6M+3M)
This is Table4-5.
________________________________________________________________________________