tuyển tập toán cao cấp: Tài liệu Image and Videl Comoression P10 ppt

© 2000 by CRC Press LLC

A convergence proof of the procedure is presented by Jain and Jain (1981), under the assumption
that the dissimilarity monotonically increases as the search point moves away from the point
corresponding to the minimum dissimilarity.

FIGURE 11.3

(a) A 2-D logarithmic search procedure. Points at (j, k+2), (j+2, k+2), (j+2, k+4), and (j+1,
k+4) are found to give the minimum dissimilarity in steps 1, 2, 3, and 4, respectively. (b) A 2-D logarithmic
search procedure. Points at (j, k-2), (j+2, k-2), and (j+2, k-1) are found to give the minimum dissimilarity in
steps 1, 2, 3, and 4, respectively.

© 2000 by CRC Press LLC

11.3.3 C

OARSE

-F

INE

T

HREE

-S

TEP

S

EARCH

Another important work on the block matching technique was completed at almost the same time
by Koga et al. (1981). A coarse-ﬁne three-step procedure was developed for fast searching.
The three-step search is very similar to the 2-D logarithm search. There are, however, three
main differences between the two procedures. First, each step in the three-step search compares a
set of nine points that form a 3

¥

3 2-D grid structure. Second, the distances between the points in
the 3

¥

3 2-D grid structure in the three-step search decrease monotonically in steps 2 and 3. Third,
a total of only three steps are carried out. Obviously, these three items are different from the 2-D
logarithmic search described in Section 11.3.2. An illustrative example of the three-step search is
shown in Figure 11.4.

11.3.4 C

ONJUGATE

D

IRECTION

S

EARCH

The conjugate direction search is another fast search algorithm that was developed by Srinivasan
and Rao (1984). In principle, the procedure consists of two parts. In the ﬁrst part, it ﬁnds the
minimum dissimilarity along the horizontal direction with the vertical coordinate ﬁxed at an initial
position. In the second part, it ﬁnds the minimum D value along the vertical direction with the
horizontal coordinate ﬁxed in the position determined in the ﬁrst part. Starting with the vertical
direction followed by the horizontal direction is, of course, functionally equivalent. It was reported
that this search procedure works quite efﬁciently (Srinivasan and Rao, 1984).
Figure 11.5 illustrates the principle of the conjugate direction search. In this example, each
step involves a comparison between three testing points. If a point assumes the minimum D value
compared with both of its two immediate neighbors (in one direction), then it is considered to be
the best match along this direction, and the search along another direction is started. Speciﬁcally,
the procedure starts to compare the D values for three points (j, k–1), (j, k), and (j, k+1). If the D
value of point (j, k–1) appears to be the minimum among the three, then points (j, k-2), (j, k–1),

FIGURE 11.4

Three-step search procedure. Points (j+4, k-4), (j+4, k-6), and (j+5, k-7) give the minimum
dissimilarity in steps 1, 2, and 3, respectively.

© 2000 by CRC Press LLC

and (j, k) are examined. The procedure continues, ﬁnding point (j, k–3) as the best match along
the horizontal direction since its D value is smaller than that of points (j, k–4) and (j, k–2). The
procedure is then conducted along the vertical direction. In this example the best matching is ﬁnally
found at point (j+2, k–3).

11.3.5 S

UBSAMPLING

IN

THE

C

ORRELATION

W

INDOW

In the evaluation of the matching criterion, either MAD or MSE, all pixels within a correlation
window at the

t

n

-1

frame and an original block at the

t

n

frame are involved in the computation. Note
that the correlation window and the original block are the same size (refer to Figure 11.1). In order
to further reduce the computational effort, a subsampling inside the window and the block is
performed (Bierling, 1988). Aliasing effects can be avoided by using low-pass ﬁltering. For instance,
only every second pixel, both horizontally and vertically inside the window and the block, is taken
into account for the evaluation of the matching criterion. Obviously, by using this subsampling
technique, the computational burden is reduced by a factor of 4. Since 3/4 of the pixels within the
window and the block are not involved in the matching computation, however, the use of such a
subsampling procedure may affect the accuracy of the estimated motion vectors, especially in the
case of small-size blocks. Therefore, the subsampling technique is recommended only for those
cases with a large enough block size so that the matching accuracy will not be seriously affected.
Figure 11.6 shows an example of 2

¥

2 subsampling applied to both an original block of 16

¥

16
at the

t

n

frame and a correlation window of the same size at the

t

n

-1

frame.

11.3.6 M

ULTIRESOLUTION

B

LOCK

M

ATCHING

It is well known that a multiresolution structure, also known as a pyramid structure, is a very
powerful computational conﬁguration for various image processing tasks. To save computation in
block matching, it is natural to resort to the pyramid structure. In fact, the multiresolution technique
has been regarded as one of the most efﬁcient methods in block matching (Tzovaras et al., 1994).
In a named top-down multiresolution technique, a typical Gaussian pyramid is formed ﬁrst.

FIGURE 11.5

Conjugate direction search.

© 2000 by CRC Press LLC

Before diving into further description, let us pause here to give those readers who have not been
exposed to the Gaussian pyramid a short introduction to the concept. For those who know the
concept, this paragraph can be skipped. Brieﬂy speaking, a Gaussian pyramid can be understood
as a set of images with different resolutions related to an original image in a certain way. The
original image has the highest resolution and is considered as the lowest level, sometimes called
the bottom level, in the set. From the bottom level to the top level, the resolution decreases
monotonically. Speciﬁcally, between two consecutive levels, the upper level is half as large as the
lower level in both horizontal and vertical directions. The upper level is generated by applying a
low-pass ﬁlter (which has a group of weights) to the lower level, followed by a 2

¥

2 subsampling.
That is, each pixel in the upper level is a weighted average of some pixels in the lower level. In
general, this iterative procedure of generating a level in the set is equivalent to convolving a speciﬁc
weight function with the original image at the bottom level followed by an appropriate subsampling.
Under certain conditions, these weight functions can closely approximate the Gaussian probability
density function, which is why the pyramid is named after Gauss. (For a detailed discussion, readers
are referred to Burt and Adelson [1983, 1984].) A Gaussian pyramid structure is depicted in
Figure 11.7. Note that the Gaussian pyramid depicted in Figure 11.7 resembles a so-called quad-
tree structure in which each node has four children nodes. In the simplest quad-tree pyramid, each
pixel in an upper level is assigned an average value of its corresponding four pixels in the next
lower level.
Now let’s return to our discussion on the top-down multiresolution technique. After a Gaussian
pyramid has been constructed, motion search ranges are allocated among the different pyramid
levels. Block matching is initiated at the lowest resolution level to obtain an initial estimation of
motion vectors. These computed motion vectors are then propagated to the next higher resolution
level, where they are corrected and then propagated to the next level. This procedure continues
until the highest resolution level is reached. As a result, a large amount of computation can be
saved. Tzovaras et al. (1994) showed that a two-level Gaussian pyramid outperforms a three-level
pyramid. Compared with full search block matching, the top-down multiresolution block search
saves up to 67% of computations without seriously affecting the quality of the reconstructed images.
In conclusion, it has been demonstrated that multiresolution is indeed an efﬁcient computational
structure in block matching. This once again conﬁrms the high computational efﬁciency of the
multiresolution structure.

FIGURE 11.6

An example of 2

¥

2 subsampling in the original block and correlation window for a fast
search.

© 2000 by CRC Press LLC

11.3.7 T

HRESHOLDING

M

ULTIRESOLUTION

B

LOCK

M

ATCHING

With the multiresolution technique discussed above, the computed motion vectors at any interme-
diate pyramid level are projected to the next higher resolution level. In reality, some computed
motion vectors at the lower resolution levels may be inaccurate and have to be further reﬁned,
while others may be relatively accurate and able to provide satisfactory motion compensation for
the corresponding block. From a computation-saving point of view, for the latter class it may not
be worth propagating the motion vectors to the next higher resolution level for further processing.
Motivated by the above observation, a new multiresolution block matching method with a
thresholding technique was developed by Shi and Xia (1997). The thresholding technique prevents
those blocks, whose estimated motion vectors provide satisfactory motion compensation, from
further processing, thus saving a lot of computation. In what follows, this technique is presented
in detail so as to provide readers with an insight to both multiresolution block matching and
thresholding multiresolution block matching techniques.

Algorithm

— Let

f

n

(

x

,

y

) be the frame of an image sequence at current moment

n

. First, two
Gaussian pyramids are formed, pyramids

n

and

n

– 1, from image frames

f

n

(

x

,

y

) and

f

n

–1

(

x

,

y

),
respectively. Let the levels of the pyramids be denoted by

l

,

l

= 0

,

1, …,

L

, where 0 is the lowest
resolution level (top level),

L

is the full resolution level (bottom level), and

L

+1 is the total number
of layers in the pyramids. If (

i

,

j

) are the coordinates of the upper-left corner of a block at level

l

of pyramid

n

, the block is referred to as block (i, j)

1
n

. The horizontal and vertical dimensions of a
block at level

l

are denoted by b

1
x

and b

1
y

, respectively. Like the variable block size method (refer
to Method 1 in Tzovaras et al. [1994]), the size of the block in this work varies with the pyramid
levels. That is, if the size of a block at level

l

is b

1
x

, then the size of the block at level

l

+ 1 becomes
2b

1
x

¥

2b

1
y

. The variable block size method is used because it gives more efﬁcient motion estimation
than the ﬁxed block size method. Here, the matching criterion used for motion estimation is the
MAD because it does not require multiplication and performs similar to the MSE. The MAD
between block (i, j)

1

b

1
n

of the current frame and block (i + v

x

, j + v

y

)

1

b

1
n–1

of the previous frame at
level

l

can be calculated as

FIGURE 11.7

Gaussian pyramid structure.

© 2000 by CRC Press LLC

(11.5)
where V

1

= (v

1
x

,v

1
y

) is one of the candidates of the motion vector of block (i, j)

1
n

,

v

l
x

,

v

l
y

are the two
components of the motion vector along the x and y directions, respectively.
A block diagram of the algorithm is shown in Figure 11.8. The threshold in terms of MAD
needs to be determined in advance according to the accuracy requirement of the motion estimation.
Determining the threshold is discussed below in Part B of this subsection. Gaussian pyramids are
formed for two consecutive frames of an image sequence from which motion estimation is desired.
Block matching is then performed at the top level with the full-search scheme. The estimated
motion vectors are checked to see if they provide satisfactory motion compensation. If the accuracy
requirement is met, then the motion vectors will be directly transformed to the bottom level of the
pyramid. Otherwise, the motion vectors will be propagated to the next higher resolution level for
further reﬁnement. This thresholding process is discussed below in Part C of this subsection. The
algorithm continues in this fashion until either the threshold has been satisﬁed or the bottom level
has been reached. The skipping of some intermediate-level calculations provides for computational
saving. Experimental work with quite different motion complexities demonstrates that the proposed
algorithm reduces the processing time from 14 to 20%, while maintaining almost the same quality
in the reconstructed image compared with the fastest existing multiresolution block matching
algorithm (Tzovaras et al., 1994).

FIGURE 11.8

Block diagram for a three-level threshold multiresolution block matching.
MAD v v
bb
fi kj m f i k vj m v
ij
x
l
y
l
x
l
y
l
n
l
n
l
x
l
y
l
m
b
k
b
n
l
y
l
x
l
,
,,,
()
-
=
-
=
-
()
=
¥
++
()
-++++
()
ÂÂ
1
1
0
1
0
1

© 2000 by CRC Press LLC

Threshold Determination

— The MAD accuracy criterion is used in this work for the sake of
saving computations. The threshold value has a direct impact on the performance of the proposed
algorithm. A small threshold value can improve the reconstructed image quality at the expense of
increased computational effort. On the other hand, a large threshold value can reduce the compu-
tational complexity, but the quality of the reconstructed image may be degraded. One possible way
to determine a threshold value, which was used in many experiments by Shi and Xia (1997), is as
follows.
The peak signal-to-noise ratio (PSNR) is commonly used as a measure of the quality of the
reconstructed image. As introduced in Chapter 1, it is deﬁned as
(11.6)
From the given required PSNR, one can ﬁnd the necessary MSE value. A square root of this
MSE value can be chosen as a threshold value, which is applied to the ﬁrst two images from the
sequence. If the resulting PSNR and required processing time are satisfactory, it is then used for
the rest of the sequence. Otherwise, the threshold can be slightly adjusted accordingly and applied
to the second and third images to check the PSNR and processing time. It was reported in numerous
experiments that this adjusted threshold value was accurate enough, and that there was no need for
further adjustment. As shown in Table 11.1, the threshold values used for the “Miss America,”
“Train,” and “Football” sequences (three sequences having quite different motion complexities) are
2, 3, and 4, respectively. They are all determined in this fashion and give satisfactory performance,
as shown in the three rows marked “New Method (TH=2),” “New Method (TH=3)” and “New
Method (TH=4),” respectively, in Table 11.2. That is, the PSNR experiences only about 0.1 dB loss
and the processing time decreases drastically. In the experiments, the threshold value of 3, i.e., the
average value of 2, 3, and 4, was also tried. Refer to the three rows marked “New Method (TH=3)”
in Table 11.2. It is noted that this average threshold value 3 has already given satisfactory perfor-
mance for all three sequences. Speciﬁcally, for the “Miss America” sequence, since the criterion
increases from 2 to 3, the PSNR loss increases from 0.12 to 0.48 dB, and the reduction in processing
time increases from 20 to 38%. For the “Football” sequence, since the criterion decreases from
4 to 3, the PSNR loss decreases from 0.08 to 0.05 dB, and the reduction in processing time decreases

TABLE 11.1
Parameters Used in the Experiments
Parameters at Level Low Resolution Level Full Resolution Level
Miss America
Search range 3 ¥ 31 ¥ 1
Block size 4 ¥ 48 ¥ 8
Thresholding value 2 None (not applicable)
Train
Search range 4 ¥ 41 ¥ 1
Block size 4 ¥ 48 ¥ 8
Thresholding value 3 None (not applicable)
Football
Search range 4 ¥ 41 ¥ 1
Block size 4 ¥ 48 ¥ 8
Thresholding value 4 None (not applicable)
PSNR
MSE
= 10
255
10
2
log
© 2000 by CRC Press LLC
from 14 to 9%. Obviously, for the “Train” sequence, the criterion, as well as the performance,
remains the same. One can therefore conclude that the threshold determination may not require
much computation at all.
Thresholding — Motion vectors estimated at each pyramid level will be checked to see if they
provide satisfactory motion compensation. Assume V
l
(i, j) = (v
l
x
, v
l
y
) is the estimated motion vector
for block (i, j)
1
n
at level l of pyramid n. For thresholding, V
l
(i, j) should be directly projected to
the bottom level L. The corresponding motion vector for the same block at the bottom level of
pyramid n will be V
L
(2
(L–l)
i,2
(L–l)
j), and is given as
(11.7)
The MAD between the block at the bottom pyramid level of the current frame and its counterpart
in the previous frame can be determined according to Equation 11.5, where the motion vector is
V
L
= V
L
(2
(L–l)
i,2
(L–l)
j). This computed MAD value can be compared with the predeﬁned threshold.
If this MAD value is less than the threshold, the computed motion vector V
L
(2
(L–l)
i,2
(L–l)
j) will
be assigned to block (2
(L–l)
i,2
(L–l)
j)
L
n
at level L in the current frame and motion estimation for this
block will be stopped. If not, the estimated motion vector V
l
(i, j) at level l will be propagated to
level l + 1 for further reﬁnement. Figure 11.9 gives an illustration of the above thresholding process.
Experiments — To verify the effectiveness of the proposed algorithm, extensive experiments have
been conducted. The performance of the new algorithm is evaluated and compared with that of
Method 1, one of the most efﬁcient multiresolution block matching methods (Tzovaras et al., 1994)
in terms of PSNR, error image entropy, motion vector entropy, the number of blocks stopped at
the top level vs. the total number of blocks, and processing time. The number of blocks stopped
at the top level is the number of blocks withheld from further processing, while the total number
of blocks is the number of blocks existing at the top level. It is noted that the total number of
blocks is the same for each level in the pyramid. The processing time is the sum of the total number
of additions involved in the evaluation of the MAD and the thresholding operation.
In the experiments, two-level pyramids are used since they give better performance for motion
estimation purposes (Tzovaras et al., 1994). The algorithms are tested on three video sequences
with different motion complexities, i.e., the “Miss America,” “Train,” and “Football.” The “Miss
America” sequence has a speaker imposed on a static background and contains less motion. The
“Train” sequence has more detail and contains a fast-moving object (train). The 20th frame of the
sequence is shown in Figure 11.10. The “Football” sequence contains the most complicated motion
FIGURE 11.9 The thresholding process.
Vij Vij
L
Ll Ll Ll
l
22 2
-
()
-
()
-
()
()
=
()
,,
© 2000 by CRC Press LLC
compared with the other two sequences. The 20th frame is shown in Figure 11.11. Table 11.1 is
the list of implementing parameters used in the experiments. Tables 11.2 and 11.3 give the perfor-
mance of the proposed algorithm compared with Method 1. In all three cases, the motion estimation
has a half-pixel accuracy, the meaning of which will be explained in the next section. All perfor-
mance measures listed there are averaged for the ﬁrst 25 frames of the testing sequences.
Each frame of the “Miss America” sequence is of 360 ¥ 288 pixels. For convenience, only the
central portion, 320 ¥ 256 pixels, is processed. Using the operational parameters listed in Table 11.1
(with a criterion value of 2), 38% of the total blocks at the top level satisfy the predeﬁned criterion
and are not propagated to the bottom level. The processing time needed by the proposed algorithm
is 20% less than Method 1, while the PSNR, the error image entropy, and the vector entropy are
almost the same. Compared with Method 1, an extra amount of computation (around 0.16 ¥ 10
6
FIGURE 11.10 The 20th frame of the “Train” sequence.
FIGURE 11.11 The 20th frame in the “Football” sequence.
© 2000 by CRC Press LLC
additions) is conducted on the thresholding operation, but a large computational savings (around
2.16 ¥ 10
6
additions) is achieved by withholding from further processing those blocks whose MAD
values at the full resolution level are less than the predeﬁned accuracy criterion.
The frames of the “Train” sequence are 720 ¥ 288 pixels, and only the central portion, 640 ¥
256 pixels, is processed. Using the operational parameters listed in Table 11.1 (with a criterion
value of 3), about 52% of the total blocks are stopped at the top level. The processing time is
reduced about 17% by the new algorithm, compared with Method 1. The PSNR, the error image
entropy, and the vector entropy are almost the same.
The frames of the “Football” sequence are 720 ¥ 480 pixels, and only the central portion,
640 ¥ 384 pixels, is processed. Using the operational parameters listed in Table 11.1 (with a criterion
value of 4), about 38% of the total blocks are stopped at the top level. The processing time is about
14% less than that required by Method 1, while the PSNR, the error image entropy, and the vector
entropy are almost the same.
As discussed, the experiments with a single accuracy criterion of 3 also produce similarly good
performance for the three different image sequences.
In summary, it is clear that with the three different testing sequences, the thresholding multi-
resolution block matching algorithm works faster than the fastest existing top-down multiresolution
block matching algorithm while achieving almost the same quality of the reconstructed image.
11.4 MATCHING ACCURACY
Apparently, the two components of the displacement vectors obtained using the technique described
above are an integer multiple of pixels. This is referred to as one-pixel accuracy. If a higher accuracy
is desired, i.e., the components of the displacement vectors may be a non-integer multiple of pixels,
then spatial interpolation is required. Not only will more computation be involved, but also more
bits will be required to represent motion vectors. The gain is a more accurate motion estimation,
hence less prediction error. In practice, half-pixel or quarter-pixel accuracy are two widely utilized
accuracies other than one-pixel accuracy.
TABLE 11.2
Experimental Results (I)
PSNR
(dB)
Error Image
Entropy
(bits per pixel)
Vector Entropy
(bits/vector)
Block Stopped at
Top Level/Total Block
Processing Times
(No. of
Additions, 10
6
)
Miss America Sequence
Method 1 (Tzovaras
et al., 1994)
New method (TH=2)
New method (TH=3)
38.91
38.79
38.43
3.311
3.319
3.340
6.02
5.65
5.45
0/1280
487/1280
679/1280
10.02
8.02
6.17
Train Sequence
Method 1 (Tzovaras
et al., 1994)
New method (TH=3)
27.37
27.27
4.692
4.788
6.04
5.65
0/2560
1333/2560
22.58
18.68
Football Sequence
Method 1 (Tzovaras
et al., 1994)
New method (TH=4)
New method (TH=3)
24.26
24.18
24.21
5.379
5.483
5.483
7.68
7.58
7.57
0/3840
1464/3840
1128/3840
30.06
25.90
27.10

tuyển tập toán cao cấp

Thứ Ba, 25 tháng 2, 2014

Tài liệu Image and Videl Comoression P10 ppt

Xem chi tiết: Tài liệu Image and Videl Comoression P10 ppt

Không có nhận xét nào:

Đăng nhận xét