Background Subtraction Using Low Rank
and Group Sparsity Constraints
Xinyi Cui1, Junzhou Huang2, Shaoting Zhang1, and Dimitris N. Metaxas1
1 CS Dept., Rutgers University
Piscataway, NJ 08854, USA
2 CSE Dept., Univ. of Texas at Arlington
Arlington, TX, 76019, USA
{xycui,shaoting,dnm}@cs.rutgers.edu,
jzhuang@uta.edu
Abstract. Background subtraction has been widely investigated in recent
years. Most previous work has focused on stationary cameras. Recently,
moving cameras have also been studied since videos from mobile
devices have increased significantly. In this paper, we propose a unified
and robust framework to effectively handle diverse types of videos,
e.g., videos from stationary or moving cameras. Our model is inspired
by two observations: 1) background motion caused by orthographic cameras
lies in a low rank subspace, and 2) pixels belonging to one trajectory
tend to group together. Based on these two observations, we introduce
a new model using both low rank and group sparsity constraints. It is
able to robustly decompose a motion trajectory matrix into foreground
and background ones. After obtaining foreground and background trajectories,
the information gathered on them is used to build a statistical
model to further label frames at the pixel level. Extensive experiments
demonstrate very competitive performance on both synthetic data and
real videos.
1 Introduction
Background subtraction is an important preprocessing step in video surveillance
systems. It aims to find independent moving objects in a scene. Many algorithms
have been proposed for background subtraction under stationary cameras. In recent
years, videos from moving cameras have also been studied [1] because the number of
moving cameras (such as smart phones and digital video cameras) has increased significantly.
However, handling diverse types of videos robustly is still a challenging
problem (see the related work in Sec. 2). In this paper, we propose a unified
framework for background subtraction, which can robustly deal with videos from
stationary or moving cameras with various numbers of rigid/non-rigid objects.
The proposed method for background subtraction is based on two sparsity
constraints applied on foreground and background levels, i.e., low rank [2] and
group sparsity constraints [3]. It is inspired by recently proposed sparsity theories
[4, 5]. There are two “sparsity” observations behind our method. First, when
the scene in a video does not have any foreground moving objects, video motion
has a low rank constraint for orthographic cameras [6]. Thus the motion of
background points forms a low rank matrix. Second, foreground moving objects
usually occupy a small portion of the scene. In addition, when a foreground
object is projected to pixels on multiple frames, these pixels are not randomly
distributed. They tend to group together as a continuous trajectory. Thus these
foreground trajectories usually satisfy the group sparsity constraint.
These two observations provide important information to differentiate independent
objects from the scene. Based on them, the video background subtraction
problem is formulated as a matrix decomposition problem. First, the video
motion is represented as a matrix on trajectory level (i.e. each row in the motion
matrix is a trajectory of a point). Then it is decomposed into a background
matrix and a foreground matrix, where the background matrix is low rank,
and the foreground matrix is group sparse. This low rank constraint is able to
automatically model background from both stationary and moving cameras, and
the group sparsity constraint improves the robustness to noise.
The trajectories recognized by the above model can be further used to label
a frame into foreground and background at the pixel level. Motion segments on
a video sequence are generated using fairly standard techniques. Then the color
and motion information gathered from the trajectories is employed to classify
the motion segments as foreground or background. Our approach is validated on
various types of data, i.e., synthetic data, real-world video sequences recorded
by stationary cameras or moving cameras and/or nonrigid foreground objects.
Extensive experiments also show that our method compares favorably to the
recent state-of-the-art methods.
The main contribution of the proposed approach is a new model using low
rank and group sparsity constraints to differentiate foreground and background
motions. This approach has three merits: 1) The low rank constraint is able
to handle both static and moving cameras; 2) The group sparsity constraint
leverages the information of neighboring pixels, which makes the algorithm
robust to random noise; 3) It is relatively insensitive to parameter settings.
2 Related Work
Background Subtraction. A considerable amount of work has studied the
problem of background subtraction in videos. Here we review a few related
works; please refer to [7] for comprehensive surveys. The mainstream in
the research area of background subtraction focuses on stationary cameras.
The earliest background subtraction methods use frame difference to detect
foreground [8]. Many subsequent approaches have been proposed to model the
uncertainty in background appearance, such as, but not limited to,W4 [9], single
gaussian model [10], mixture of gaussian [11], non-parametric kernel density [12]
and joint spatial-color model [13]. One important variation in stationary camera
based research is the background dynamism. When the camera is stationary, the
background scene may change over time due to many factors (e.g., illumination
changes, waves in water bodies, shadows, etc.). Several algorithms have been
proposed to handle dynamic background [14–16].

Fig. 1. The framework. Our method takes a raw video sequence as input, and produces
a binary labeling as the output. Two major steps are trajectory level separation and
pixel level labeling. (Panels: a) input video; b) tracked trajectories; c) trajectory
matrix; d) decomposed trajectories; e) decomposed trajectories on original video;
f) optical flow; g) motion segments; h) output.)
Research on moving cameras has recently attracted increasing attention.
Motion segmentation approaches [17, 18] segment point trajectories based on
subspace analysis. These algorithms provide interesting analysis on sparse trajectories,
though they do not output a binary mask as many background subtraction
methods do. Another popular way to handle camera motion is to impose strong
priors on the scene, e.g., approximating the background by a 2D plane or assuming
that camera center does not translate [19, 20], assuming a dominant plane [21],
etc. [22] proposes a method that uses belief propagation and Bayesian filtering
to handle moving cameras. Different from their work, the long-term trajectories
we use encode more information for background subtraction. Recently, [1] proposed
building a background model using RANSAC to estimate the
background trajectory basis. This approach assumes that the background motion
spans a three dimensional subspace. Then sets of three trajectories are randomly
selected to construct the background motion space until a consensus set is
discovered, by measuring the projection error on the subspace spanned by the
trajectory set. However, RANSAC-based methods are generally sensitive to
parameter selection, which makes them less robust when handling different videos.
Group sparsity [23, 24] and low rank constraints [2] have also been applied to
the background subtraction problem. However, these methods only focus on stationary
cameras, and their constraints are at the spatial pixel level. Different from their
work, our method is based on constraints in the temporal domain and on analyzing
trajectory properties.
3 Methodology
Our background subtraction algorithm takes a raw video sequence as input,
and generates a binary labeling at the pixel level. Fig. 1 shows our framework.
It has two major steps: trajectory level separation and pixel level labeling. In
the first step, a dense set of points is tracked over all frames. We use an
off-the-shelf dense point tracker [25] to produce the trajectories. With the dense
point trajectories, a low rank and group sparsity based model is proposed to
decompose trajectories into foreground and background. In the second step,
motion segments are generated using optical flow [26] and graph cuts [27].
Then the color and motion information gathered from the recognized trajectories
builds statistics to label motion segments as foreground or background.
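This excerpt does not detail the statistical model used for the labeling; the sketch below is a deliberately simplified stand-in (a nearest-mean rule on hypothetical per-segment color/motion feature vectors), not the paper's actual classifier.

```python
import numpy as np

def label_segments(seg_feats, fg_feats, bg_feats):
    """Toy stand-in for the pixel-level step. seg_feats holds one feature
    vector per motion segment; fg_feats/bg_feats hold features gathered
    on the recognized foreground/background trajectories."""
    fg_mean = fg_feats.mean(axis=0)  # foreground statistics
    bg_mean = bg_feats.mean(axis=0)  # background statistics
    # Label a segment foreground (1) if its features are closer to the
    # foreground statistics than to the background ones.
    d_fg = np.linalg.norm(seg_feats - fg_mean, axis=1)
    d_bg = np.linalg.norm(seg_feats - bg_mean, axis=1)
    return (d_fg < d_bg).astype(int)
```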
3.1 Low Rank and Group Sparsity Based Model
Notations: Given a video sequence, k points are tracked over l frames. Each
trajectory is represented as p_i = [x_{1i}, y_{1i}, x_{2i}, y_{2i}, ..., x_{li}, y_{li}] ∈ R^{1×2l},
where x and y denote the 2D coordinates in each frame. The collection of k
trajectories is represented as a k × 2l matrix φ = [p_1^T, p_2^T, ..., p_k^T]^T, φ ∈ R^{k×2l}.
In a video with moving foreground objects, a subset of the k trajectories comes
from the foreground, and the rest belongs to the background. Our goal is to
decompose the k tracked trajectories into two parts: m background trajectories and
n foreground trajectories. If we already knew exactly which trajectories belong
to the background, then the foreground ones could easily be obtained by subtracting
them from the k trajectories, and vice versa. In other words, φ can be decomposed
as:
φ = B + F, (1)
where B ∈ Rk×2l and F ∈ Rk×2l denote matrices of background and foreground
trajectories, respectively. In the ideal case, the decomposed foreground matrix
F consists of n rows of foreground trajectories and m rows of flat zeros, while
B has m rows of background trajectories and n rows of zeros.
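A toy numerical example (hypothetical values) of this ideal decomposition, with k = 3 trajectories, l = 2 frames, m = 2 background rows and n = 1 foreground row:

```python
import numpy as np

# Rows 0-1 are background trajectories; row 2 is foreground.
B = np.array([[1., 2., 1., 2.],    # background trajectory
              [2., 4., 2., 4.],    # background trajectory (B has rank 1)
              [0., 0., 0., 0.]])   # flat zeros on the foreground row
F = np.array([[0., 0., 0., 0.],
              [0., 0., 0., 0.],
              [5., 5., 9., 9.]])   # foreground trajectory
phi = B + F                        # Eq. 1: observed trajectory matrix
```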
Eq. 1 is a severely under-constrained problem. It is difficult to find B and F
without any prior information. In our method, we incorporate two effective priors
to robustly solve this problem, i.e., the low rank constraint for the background
trajectories and the group sparsity constraint for the foreground trajectories.
Low Rank Constraint for the Background. In a 3D structured scene
without any moving foreground object, video motion solely depends on the scene
and the motion of the camera. Our background modeling is inspired from the
fact that B can be factored as a k×3 structure matrix of 3D points and a 3×2l
orthogonal matrix [6]. Thus the background matrix is a low rank matrix with
rank value at most 3. This leads us to build a low rank constraint model for the
background matrix B:
rank(B) ≤ 3. (2)
A different constraint has been used in previous RANSAC-based work [1]. That
work assumes that the background matrix has rank exactly
three: rank(B) = 3. This is a very strict constraint for the problem. We refer to
the above two types of constraints as the General Rank model (GR) and the
Fixed Rank model (FR). Our GR model is more general and handles more
situations. A rank-3 matrix models 3D scenes under moving cameras; a rank-2
matrix models a 2D scene or a 3D scene under stationary cameras; a rank-1 matrix
is a degenerate case where the scene only has one point. The usage of the GR model
allows us to develop a unified framework to handle both stationary cameras and
moving cameras. The experiment section (Sec. 4.2) provides more analysis on
the effectiveness of the GR model when handling diverse types of videos.
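In practice, the rank of an estimated B can be checked numerically from its singular values; a minimal sketch (the tolerance value is an assumption):

```python
import numpy as np

def numerical_rank(B, tol=1e-6):
    # Count singular values above a tolerance relative to the largest one.
    # Under the GR model this count should be at most 3: 3 for a 3D scene
    # under a moving camera, 2 for a stationary camera, 1 in the
    # degenerate single-point case.
    s = np.linalg.svd(B, compute_uv=False)
    return int(np.sum(s > tol * s[0]))
```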
Group Sparsity Constraint for the Foreground. Foreground moving objects,
in general, occupy a small portion of the scene. This observation motivates
us to use another important prior, i.e., the number of foreground trajectories
should be smaller than a certain ratio of all trajectories: n ≤ αk, where α
controls the sparsity of foreground trajectories.
Another important observation is that each row in φ represents one trajectory.
Thus the entries in φ are not randomly distributed. They are spatially clustered
within each row. If one entry of the ith row φi belongs to the foreground,
the whole φi is also in the foreground. This observation makes the foreground
trajectory matrix F satisfy the group sparsity constraint:
‖F‖_{2,0} ≤ αk, (3)
where ‖·‖_{2,0} is a mixed L2/L0 norm. The L2 norm constraint is
applied to each group separately (i.e., each row of F). It ensures that all elements
in the same row are either zero or nonzero at the same time. The L0 norm
constraint is applied to count the nonzero groups/rows of F. It guarantees that
only a sparse number of rows are nonzero. Thus this group sparsity constraint not
only ensures that the foreground objects are spatially sparse, but also guarantees
that each trajectory is treated as one unit. Group sparsity is a powerful tool in
computer vision problems [28, 23]. The standard sparsity constraint, the L0 norm
(we refer to this as the Std. sparse method), has been intensively studied in recent
years. However, it does not work as well for this problem as the group
sparsity constraint. The Std. sparse method treats each element of F independently. It does
not consider any neighborhood information. Thus it is possible that points from
the same trajectory are classified into two classes. In the experiment section
(Sec. 4.1), we discuss the advantage of group sparsity constraint over sparsity
constraint through synthetic data analysis, and also show that this constraint
improves the robustness of our model.
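To make the difference concrete, here is a minimal sketch of the two counts; the threshold eps for "numerical zero" is an assumption:

```python
import numpy as np

def l20_norm(F, eps=1e-8):
    # ||F||_{2,0}: number of nonzero rows, i.e., whole trajectories
    # selected as foreground (each row of F treated as one group).
    return int(np.sum(np.linalg.norm(F, axis=1) > eps))

def l0_norm(F, eps=1e-8):
    # Elementwise L0 used by the Std. sparse method: counts entries
    # independently, so a trajectory can end up only partially selected.
    return int(np.sum(np.abs(F) > eps))
```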
Based on the low rank and group sparsity constraints, we formulate our
objective function as:

(B̂, F̂) = argmin_{B,F} ‖φ − B − F‖²_F, s.t. rank(B) ≤ 3, ‖F‖_{2,0} ≤ αk, (4)

where ‖·‖_F is the Frobenius norm. This model leads to a good separation of
foreground and background trajectories. Fig. 2 illustrates our model.
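This excerpt does not spell out the optimizer for Eq. 4. One natural baseline consistent with the two constraints is alternating projection: a truncated SVD enforces rank(B) ≤ 3, and row-wise hard thresholding enforces ‖F‖_{2,0} ≤ αk. The sketch below is an illustrative baseline, not necessarily the authors' algorithm.

```python
import numpy as np

def decompose(phi, max_rank=3, alpha=0.1, n_iters=50):
    """Alternating-projection baseline for Eq. 4 (illustrative only)."""
    k = phi.shape[0]
    budget = max(1, int(alpha * k))   # allowed number of foreground rows
    F = np.zeros_like(phi)
    for _ in range(n_iters):
        # B-step: best rank-<=max_rank approximation of (phi - F),
        # obtained by truncating the SVD.
        U, s, Vt = np.linalg.svd(phi - F, full_matrices=False)
        B = (U[:, :max_rank] * s[:max_rank]) @ Vt[:max_rank]
        # F-step: keep the residual rows with the largest L2 norms,
        # so whole trajectories are selected as foreground groups.
        R = phi - B
        keep = np.argsort(np.linalg.norm(R, axis=1))[-budget:]
        F = np.zeros_like(phi)
        F[keep] = R[keep]
    return B, F
```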
Eq. 4 has only one parameter, α, which controls the sparsity of the foreground
trajectories. In general, parameter tuning is a key issue for a good model. It
is preferable that the parameters are easy to tune and not sensitive to different
Fig. 2. Illustration of the model: the trajectory matrix φ is decomposed into a
low rank matrix B = UΣV^T plus a group sparse matrix F.