**Post: #1**

An Advanced Motion Detection Algorithm with Video Quality Analysis for Video Surveillance Systems

An Advanced Motion Detection Algorithm.pdf (Size: 2.41 MB / Downloads: 99)

Abstract

Motion detection is the first essential process in the extraction of information regarding moving objects, and it provides a stable foundation for functional areas such as tracking, classification, and recognition. In this paper, we propose a novel and accurate approach to motion detection for automatic video surveillance systems. Our method achieves complete detection of moving objects through three proposed modules: a background modeling (BM) module, an alarm trigger (AT) module, and an object extraction (OE) module. In the proposed BM module, a unique two-phase background matching procedure is performed, using rapid matching followed by accurate matching, in order to produce optimum background pixels for the background model. Next, the proposed AT module eliminates unnecessary examination of the entire background region, allowing the subsequent OE module to process only the blocks containing moving objects. Finally, the OE module forms the binary object detection mask in order to achieve highly complete detection of moving objects. The detection results produced by our proposed (PRO) method were analyzed both qualitatively, through visual inspection, and quantitatively, for accuracy, along with comparisons to the results produced by other state-of-the-art methods. The analyses show that our PRO method has a substantially higher degree of efficacy, outperforming other methods by up to 53.43% in F1 accuracy.

Index Terms—Background model, entropy, morphology, motion detection, video surveillance.

I. Introduction

IN THE LAST DECADE, video surveillance systems have become an extremely active research area due to increasing levels of terrorist activity and general social problems. This has motivated the development of strong and precise automatic processing systems, an essential tool for safety and security in both the public and private sectors. The need for advanced video surveillance systems has inspired progress in many important areas of science and technology, including traffic monitoring [1], [2], transport networks, traffic flow analysis, understanding of human activity [3], [4], home nursing, monitoring of endangered species, and observation of people and vehicles within a busy environment [5]–[12], along with many others.

Manuscript received October 22, 2009; revised February 8, 2010; accepted June 16, 2010. Date of publication October 18, 2010; date of current version February 24, 2011. This work was supported by the National Science Council, under Grant NSC 98-2218-E-027-008. This paper was recommended by Associate Editor I. Ahmad. The author is with the Department of Electronic Engineering, National Taipei University of Technology, Taipei 106, Taiwan (e-mail: schuang[at]ntut.edu.tw). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCSVT.2010.2087812

The design of an advanced automatic video surveillance system requires the application of many important functions including, but not limited to, motion detection [13]–[25], classification [26], tracking [27], [28], behavior [29], activity analysis, and identification [30], [31]. Motion detection is one of the greatest problem areas in video surveillance, as it is not only responsible for the extraction of moving objects but also critical to many computer vision applications, including object-based video encoding, human motion analysis, and human-machine interactions [32]. Therefore, our focus here is the further development of the motion detection phase for an advanced video surveillance system.

The three major classes of methods for motion detection are background subtraction, temporal differencing, and optical flow [13]. Background subtraction [14]–[23], [33] is the most popular motion detection method; it differentiates moving objects from a maintained and updated background model, and such models can be further grouped into parametric and non-parametric types [33]. Given an appropriate implicit assumption and choice of parameters, a parametric model may fit the real data very well [22]. In contrast, a non-parametric model is heavily data dependent and uses no parameters [22], [33]. Apart from background subtraction, the two other motion detection methods—optical flow and temporal differencing—are discussed in [25]. While the optical flow method recovers the projected motion on the image plane and can handle complex backgrounds successfully, it often requires very high computational complexity, which creates difficulties in its implementation [34]. The temporal differencing method, while effectively adapting to environmental changes, often results in incomplete detection of the shapes of moving objects, due to its sensitivity to the chosen threshold under noise and to the local consistency properties of the change mask [35].
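Temporal differencing as described above amounts to thresholding the absolute difference between consecutive frames. A minimal NumPy illustration follows; the function name and the default threshold value are assumptions of this sketch, not taken from the paper:

```python
import numpy as np

def temporal_difference_mask(frame_t, frame_prev, tau=25):
    """Binary change mask: 1 where |I_t - I_{t-1}| exceeds the threshold tau.

    The sensitivity of tau illustrates the trade-off noted in the text:
    too low a value admits noise, too high a value misses slowly moving
    object parts, so the detected shapes come out incomplete.
    """
    # Promote to a signed type so the subtraction cannot wrap around.
    diff = np.abs(frame_t.astype(np.int16) - frame_prev.astype(np.int16))
    return (diff > tau).astype(np.uint8)
```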

The currently implemented method for background subtraction accomplishes its objective by subtracting each pixel of the incoming video frame from the background model, thus generating an absolute difference. It then applies a threshold to obtain the binary object detection mask [20]. Threshold selection is a critical operation and can be conducted by a variety of previously researched methods [36]–[40]. Although the currently implemented background subtraction method is convenient to implement, its noise tolerance in the video frame relies on the chosen threshold. Functionalities such as object classification, tracking, behavior analysis, and identification are then performed on the regions where moving objects have been detected.

The computational costs of traditional foreground analysis methods are usually relatively high for video surveillance systems based on the traditional optical flow implementation [34]. For a more accurate motion detection design, foreground analysis is still needed on top of the popular background subtraction method in order to analyze the motion information [34].

With respect to background maintenance, pixel-level processes and region-level processes should both be clearly designed into the background subtraction approach [41]. This is because pixel-level processes can handle adaptation to a changing background at each pixel independently, without observing groups of pixels, while region-level processes can refine the raw pixel-level classification with regard to inter-pixel relationships [41].

This paper presents a novel background subtraction method which generates a background model using selected suitable background candidates. Then, through the use of an alarm trigger (AT) module, it detects the pixels of moving objects within the regions determined to significantly feature objects. The organization of the proposed (PRO) method is as follows.

1) A two-phase background matching procedure is used to select suitable background candidates for generation of an updated background model.
2) A block-based entropy evaluation with morphological operations is conducted through a triggered block-based alarm module.
3) The motion detection output is produced through an automatic threshold selection algorithm.
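The three-stage organization above can be pictured as a pipeline in which the AT stage prunes the work done by the OE stage. The skeleton below is purely illustrative: the module bodies are placeholder stand-ins (a temporal median, a block-mean trigger, and per-block thresholding), not the paper's actual BM, AT, and OE algorithms, and every name and parameter is an assumption:

```python
import numpy as np

def background_modeling(frames):
    # Placeholder BM: a per-pixel temporal median stands in for the
    # paper's two-phase background matching procedure.
    return np.median(np.stack(frames), axis=0).astype(np.uint8)

def alarm_trigger(frame, background, block=8, tau=25):
    # Placeholder AT: flag only the blocks whose mean absolute difference
    # from the background exceeds tau, so static regions are skipped.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    h, w = diff.shape
    triggered = []
    for y in range(0, h, block):
        for x in range(0, w, block):
            if diff[y:y + block, x:x + block].mean() > tau:
                triggered.append((y, x))
    return triggered

def object_extraction(frame, background, triggered, block=8, tau=25):
    # Placeholder OE: threshold only inside triggered blocks to form
    # the binary object detection mask.
    mask = np.zeros(frame.shape, dtype=np.uint8)
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    for (y, x) in triggered:
        mask[y:y + block, x:x + block] = diff[y:y + block, x:x + block] > tau
    return mask
```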

When compared to other state-of-the-art methods included in the performance study, our method proved to be of higher efficacy. This was indicated by both qualitative and quantitative results obtained through analysis using a wide range of natural video sequences. The remainder of our paper is organized as follows. Section II presents a condensed overview of the various background subtraction approaches used for comparison. Section III contains our proposed motion detection method. Section IV presents the experimental results achieved by our PRO method compared to those of other methods. Section V contains our concluding remarks.

II. Related Work

The major purpose of background subtraction is to generate a reliable background model and thus significantly improve the detection of moving objects [35], [42]. Some state-of-the-art background subtraction methods include simple background subtraction (SBS), running average (RA) [14], Σ-Δ estimation (SDE) [16], multiple Σ-Δ estimation (MSDE) [19], simple statistical difference (SSD) [20], RA in the discrete cosine transform (DCT) domain [21], and the temporal median filter (TMF) [23]. These methods are briefly reviewed in the following sections.

A. Simple Background Subtraction

Both the reference image B(x, y) and the incoming video frame It(x, y) are obtained from the video sequence. A binary motion detection mask D(x, y) is calculated as follows:

    D(x, y) = { 1, if |It(x, y) − B(x, y)| > τ
              { 0, if |It(x, y) − B(x, y)| ≤ τ        (1)

where τ is the predefined threshold which designates pixels as either background or moving objects in a video frame. If the absolute difference between the reference image and the incoming video frame does not exceed τ, the pixel of the detection mask is labeled "0," meaning it contains background; otherwise, active pixels are labeled "1," designating them as containing moving objects. A significant problem experienced by the SBS method in most real video sequences is that it fails to respond precisely when noise occurs in the incoming video frame It(x, y) or when static objects occur in the reference image B(x, y) [20]. Note that the reference image B(x, y) represents the fixed background model, which is selected from the test frames [20].
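Equation (1) maps directly to a few lines of array code. A minimal NumPy sketch, assuming 8-bit grayscale frames; the function name and the default value of tau are assumptions of this illustration:

```python
import numpy as np

def sbs_mask(frame, reference, tau=30):
    """Implement (1): D = 1 where |I_t - B| > tau, else 0.

    'reference' is the fixed background image B, never updated, which is
    exactly why SBS struggles with noise and with static scene changes.
    """
    diff = np.abs(frame.astype(np.int16) - reference.astype(np.int16))
    return (diff > tau).astype(np.uint8)
```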

B. Running Average

This problem can be countered by using the RA [14] to generate an adaptive background model that adapts to temporal changes in the video sequence. RA differs from SBS in that it frequently updates each background image frame Bt(x, y) of the adaptive background model in order to ensure the reliability of motion detection.

The previous background frame Bt−1(x, y) and the new incoming video frame It(x, y) are integrated into the current background image. The adaptive background model is attained using a simple adaptive filter as follows:

    Bt(x, y) = (1 − β)Bt−1(x, y) + βIt(x, y)        (2)

where β is an empirically adjustable parameter. While a large coefficient β leads to a faster background updating speed, it also causes the creation of artificial trails behind moving objects in the background model. In other words, if objects remain stationary long enough, they become part of the background model.

The binary motion detection mask D(x, y) is based on the SBS method and is defined as follows:

    D(x, y) = { 1, if |It(x, y) − Bt(x, y)| > τ
              { 0, if |It(x, y) − Bt(x, y)| ≤ τ        (3)

where It(x, y) is the current incoming video frame, Bt(x, y) is the current background model, and τ is an experimentally predefined threshold used to generate the binary motion detection mask.
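The update (2) and thresholding (3) can be sketched as below, keeping the background in floating point so the blend in (2) is not truncated to integers; the beta and tau defaults are illustrative choices, not values from the paper:

```python
import numpy as np

def update_background(prev_bg, frame, beta=0.05):
    """Implement (2): B_t = (1 - beta) * B_{t-1} + beta * I_t."""
    return (1.0 - beta) * prev_bg + beta * frame.astype(np.float64)

def ra_mask(frame, bg, tau=30):
    """Implement (3): threshold |I_t - B_t| exactly as in SBS."""
    return (np.abs(frame.astype(np.float64) - bg) > tau).astype(np.uint8)
```

A small beta keeps the model stable under noise, while a large beta absorbs stationary objects into the background quickly, producing the artificial trails described in the text.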


C. Σ-Δ Estimation

In accordance with a pixel-based decision framework, the temporal statistics of the pixels of the original video sequence are calculated by a background subtraction method called the SDE method [16]. The background estimate makes use of the sgn function in order to estimate the background intensity. The sgn function is expressed as follows:

    sgn(a) = { 1,  if a > 0
             { 0,  if a = 0
             { −1, if a < 0        (4)

where a is the input real value. The background estimation formula is then expressed as follows:

    Bt(x, y) = Bt−1(x, y) + sgn(It(x, y) − Bt−1(x, y))        (5)

where Bt(x, y) is the current background model, Bt−1(x, y) is the previous background model, and It(x, y) is the current incoming video frame. The intensity of the background model increases or decreases by a value of one through the evaluation of the sgn function at every frame. The image of absolute difference Δt(x, y) is then calculated as the estimated difference between It(x, y) and Bt(x, y) as follows:

    Δt(x, y) = |It(x, y) − Bt(x, y)|.        (6)

In a similar fashion, the time-variance Vt(x, y) is calculated by utilizing the sgn function, which measures motion activity in order to determine whether each pixel should be designated as "background" or "moving object":

    Vt(x, y) = Vt−1(x, y) + sgn(N × Δt(x, y) − Vt−1(x, y))        (7)

where Vt(x, y) is the current time-variance, Vt−1(x, y) is the previous time-variance, and N is a predefined parameter which ranges from 1 to 4.

Based on the generated current time-variance Vt(x, y), the binary motion detection mask Dt(x, y) is computed as follows:

    Dt(x, y) = { 1, if Δt(x, y) > Vt(x, y)
               { 0, if Δt(x, y) ≤ Vt(x, y).        (8)
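Equations (5)-(8) amount to nudging two integer images by at most one gray level per frame. The following compact sketch packages one SDE iteration as a function; the function name and the default N = 4 are choices of this illustration (the text only constrains N to the range 1 to 4):

```python
import numpy as np

def sigma_delta_step(frame, bg, var, N=4):
    """One Sigma-Delta iteration over int32 images.

    bg follows (5), delta follows (6), var follows (7), and the mask
    follows (8); every update moves by +/-1 at most, via the sgn function.
    """
    frame = frame.astype(np.int32)
    bg = bg + np.sign(frame - bg)            # (5)
    delta = np.abs(frame - bg)               # (6)
    var = var + np.sign(N * delta - var)     # (7)
    mask = (delta > var).astype(np.uint8)    # (8)
    return bg, var, mask
```

Because each pixel moves by at most one level per frame, the background adapts with a fixed time constant, which is the limitation that motivates MSDE in the next section.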

D. Multiple Σ-Δ Estimation

The SDE method is characterized by an updating period with a constant time in which the background model is generated. This in turn causes a constraint when it is used for certain complex scenes, such as scenes with many moving objects or those with moving objects exhibiting variable motion [19]. Thus, for this situation, the MSDE method was proposed in order to build an adaptive background model. The background model formula is expressed as follows:

    b^i_t(x, y) = b^i_{t−1}(x, y) + sgn(b^{i−1}_t(x, y) − b^i_{t−1}(x, y))        (9)

where b^i_t(x, y) is the current ith reference background, b^i_{t−1}(x, y) is the previous ith reference background, and b^{i−1}_t(x, y) is the current (i−1)th reference background. Additionally, the reference difference Δ^i_t(x, y) and reference time-variance v^i_t(x, y) are also computed as follows:

    v^i_t(x, y) = v^i_{t−1}(x, y) + sgn(N × Δ^i_t(x, y) − v^i_{t−1}(x, y))        (10)

where Δ^i_t(x, y) = |It(x, y) − b^i_t(x, y)|.

The confidence adaptive background model Bt(x, y) can be calculated after b^i_t(x, y) and v^i_t(x, y) are determined, yielding the formula as follows:

    Bt(x, y) = [ Σ_{i∈[1,R]} α_i b^i_t(x, y) / v^i_t(x, y) ] / [ Σ_{i∈[1,R]} α_i / v^i_t(x, y) ]        (11)

where each α_i is a predefined confidence value, i is the reference number, R is the total number of references, and Bt(x, y) is the confidence adaptive background model. According to [19], R is experimentally set to 3, and the confidence values α_1, α_2, and α_3 are set to 1, 8, and 16, respectively. Notice that the binary moving-object mask D(x, y) is generated by the same approach as in SDE, based on the confidence adaptive background model Bt(x, y).

For certain complex scenes, the MSDE method can detect multiple moving objects with higher accuracy than the SDE method. This is because the MSDE method generates the binary moving-object mask D(x, y) based on the multimodal background model Bt(x, y), a procedure which requires greater computational complexity.
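The combination step (11) is a per-pixel weighted average of the reference backgrounds, each weighted by α_i / v^i_t, so references with low time-variance (stable estimates) count more. A sketch assuming R = 3 and α = (1, 8, 16) as stated in the text; the zero-variance guard is an addition of this illustration, not part of the formula:

```python
import numpy as np

def msde_background(refs, variances, alphas=(1, 8, 16)):
    """Implement (11): B_t = sum_i(a_i * b_i / v_i) / sum_i(a_i / v_i)."""
    num = np.zeros(refs[0].shape, dtype=np.float64)
    den = np.zeros(refs[0].shape, dtype=np.float64)
    for b, v, a in zip(refs, variances, alphas):
        # Guard against v = 0 to avoid division by zero (assumption).
        w = a / np.maximum(v.astype(np.float64), 1e-6)
        num += w * b
        den += w
    return num / den
```

The resulting Bt is then thresholded against the current frame exactly as in the SDE case to produce the moving-object mask.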