Next:9.11
To Probe FurtherUp:9.
Sorting MethodsPrevious:9.9
Radix Sorting
9.10 Merge Sort

Divide and conquer algorithm

O(nlog n) worst case performance

Useful in external sorting applications.
Algorithm
listtype mergesort (listtype L; int
n)
{
if (n = 1)
return (L)
else {
split L into two halves L_{1}and
L_{2};
return (merge (mergesort L_{1},,
mergesort L_{2},
}
}
Let T(n) be the running time of mergesort
on a list of size n. Then.
T(n) 

C_{1} (n = 1) 







+
(n > 1) 

It can be shown that T(n) is O(nlog
n).
Mergesort as an External Sorting Procedure

Assume that the data to be sorted is stored in a file

The essential idea is to organize a file into progressively larger
runs
where:
A run is a sequence of records r_{1},...,
r_{k},
such that r_{i}r_{i
+ 1} for i = 1,...,
k
 1.

We say a file, r_{1},
r_{2},...,
r_{m},
of records is organized into
runs or length
k if: i
1 such that ki m,
the sequence
r_{k(i
 1) + 1}, r_{k(i  1) + 2},..., r_{ki}
is a run of length k and furthermore if m is not divisible
by k and m = pk +
q
where q < k, the sequence (r_{m
 q + 1},..., r_{m}) which is called the tail
is a run of length q.
Example
The following is organized into runs of length 3 with the tail having
length 2.
7 15 29 
8 11 13 
16 22 31 
5 12 
Basic step
Consider two files f1 and f2 organized into runs of length k. Assume
that

1.

If n_{i} is the number of runs (including tails) of f_{1}(i
= 1, 2), then  n_{1}
 n_{2}
1

2.

At most one of f_{1} and f_{2} has a tail

3.

The one with a tail (if any) has at least as many runs as the other.
In mergesort, we read one run from each of f_{1}
and f_{2}, merge the runs into a run of
length 2k and append it to one of two files g_{1}
and g_{2}. By alternating between g_{1}
and g_{2}, these files would be organized into runs
of length 2k, satisfying (1), (2), and (3).
Algorithm

First divide all n records into two files f_{1} and
f_{2}
as evenly as possible. f_{1} and f_{2} can
be regarded as organized into runs of length 1.

Merge the runs of length 1 and distribute them into files g_{1}
and g_{2}which will now be organized into runs of length
2.

Now make f_{1} and f_{2} empty and merge
g_{1}
and g_{2} into f_{1} and
f_{2}
creating runs of length 4.
Repeat ^{...}
Example
f_{1} 
28 
3 
93 
54 
65 
30 
90 
10 
69 
8 
22 
f_{2} 
31 
5 
96 
85 
9 
39 
13 
8 
77 
10 

runs of length 1
merge into
g_{1} and g_{2}
g_{1} 
28 
31 
93 
96 
9 
65 
13 
90 
69 
77 
22 
g_{2} 
3 
5 
54 
85 
30 
39 
8 
10 
8 
10 

runs of length 2
merge into
f_{1} and f_{2}
f_{1} 
3 
5 
28 
31 
9 
30 
39 
65 
8 
10 
69 
77 
f_{2} 
54 
85 
93 
96 
8 
10 
13 
90 
22 



runs of length 4
merge into
g_{1} and g_{2}
g_{1} 
3 
5 
28 
31 
54 
85 
93 
96 
8 
10 
22 
69 
77 
g_{2} 
8 
9 
10 
13 
30 
39 
65 
90 





runs of length 8
merge
into
f_{1} and f_{2}
f_{1} 
3 
5 
8 
9 
10 
13 
28 
30 
31 

39 
54 
65 
85 
90 
93 
96 












f_{2} 
8 
10 
22 
69 
77 




runs of length 16
merge into
g_{1} and g_{2}
g_{1} will be left with the sorted sequence
Multiway Mergesort

Here we merge m files at a time. Let the files f_{1},...,
f_{m}
be organized as runs of length k.
We can read m runs, one from each file, and merge them into one
run of length
mk. This run is placed on one of m files g_{1},...,
g_{m},
each getting a run in turn.

The merging process will involve each time selecting the minimum among
m
currently smallest unselected records from each file.
By using a priority queue that supports insert and delete in logarithmic
time (e.g., partially ordered tree), the process can be accomplished in
O(log
m)time.

Number of passes  log_{m}n
Effort in each pass = O(nlog_{2}m)
overall complexity O(nlog_{2}m.log_{m}n)

Advantages


We save by a factor of log_{2}m, the number of times we
read each record


We can process data O(m) times faster with m disk
units.
Next:9.11
To Probe FurtherUp:9.
Sorting MethodsPrevious:9.9
Radix Sorting
eEL,CSA_Dept,IISc,Bangalore