Abstract
Crowd behaviour analysis and management have become significant research problems in recent years because of the substantial growth in the world population and the associated security requirements. Many problems in this area, such as crowd flow modelling and crowd behaviour detection, remain open and demand attention from the research community. Crowd flow modelling is an integral part of an intelligent surveillance system and has become a vital concern in the development of such systems. Real-time analysis of crowd behaviour needs accurate models that represent crowded scenarios. An intelligent surveillance system built on a good crowd flow model helps identify risks in a wide range of emergencies and supports human safety. Mathematical models of crowd flow developed from real-time video sequences enable further analysis and decision making. This paper presents a novel method that identifies eight crowd flow behaviours commonly seen in crowd video sequences. The proposed method localises the crowd flow using the Gunnar Farneback optical flow method. Analysis of the Jacobian and Hessian matrices, along with their eigenvalues, locates the stability points that identify the flow patterns. The work is carried out on 80 videos taken from the UCF crowd and CUHK video datasets. Comparison with existing works from the literature shows that our method yields better results.
Introduction
The increased population density in cities and the socio-economic structure make security an essential concern of any government. Large gatherings occur in many human activities, and effective management of their mobility improves the quality of the human experience. Security systems installed in public spaces facilitate crowd monitoring and further analysis of security concerns. However, most video surveillance systems keep a human in the loop for decision-making and hence have limitations that can lead to disasters. A system capable of automatically estimating crowd size, behaviour, and deviation from expected behaviour would assist better crowd management. Intelligent surveillance systems are attracting growing research attention and are expected to supersede traditional surveillance systems soon. An intelligent surveillance system uses strategies and algorithms from research areas such as computer vision, machine learning, pattern recognition, and artificial intelligence [35]. There has been significant interest in employing computer vision methods to monitor crowds. It is possible to model the crowd from video and apply efficient algorithms for crowd flow detection, localisation, and movement analysis.
An intelligent surveillance system contains several steps, including crowd flow modelling, motion pattern identification, action recognition, object identification and object tracking. Each of these steps is an individual research problem, and crowd flow modelling is vital in creating an intelligent surveillance system. Crowd flow modelling is a mathematical representation of the crowd flow patterns observed in crowd videos. Crowd flow patterns give a detailed description of the entire crowd’s flow in the video scene; this description models the crowd’s usual behaviours and helps the system to identify abnormal flows quickly. Group dynamics is also an essential factor that affects crowd flow motion. The study of group dynamics spans fields ranging from psychology to computer science, and the effective representation of group dynamics is a challenging issue in crowd flow modelling. People show some interesting properties in their movement while moving as part of a crowd. These movements are not random, and they can be characterised. These dynamic properties can be characterised using stability analysis, and the proposed work is based on this concept.
In this work, a mathematical model to represent crowd flow behaviour is proposed. The model classifies a given input crowd video into one of eight categories: bottleneck, two variations of lane motion, dispersion, diversion, circular and two types of spiral movement. All these movements are calculated based on the Hessian matrix’s determinant and the Jacobian matrix’s determinant extracted from the Region of Interest (ROI). We use both the Hessian matrix and the Jacobian matrix to cross-validate the obtained results. A new method to identify the ROI using the optical flow method is also proposed in this work.
The rest of the paper is organised as follows. Section 2 reviews the related literature, and Section 3 describes the steps involved in the proposed method. Section 4 discusses the experiments and results, and Section 5 presents the conclusion and future work.
Literature survey
Many works in this area have been reported in the literature over the last few years. Crowd flow modelling can be classified into different categories based on various parameters [59] that affect the crowd flow motion. Most existing methods deal with low and medium-density crowds and claim that they will also work for high-density crowds. The parameters differ according to the scenario, such as the view level, the structure of the flow and the nature of the background. Crowd flow modelling can be categorised into two classes based on how the modelling is performed: micro-level flow modelling and macro-level flow modelling [25]. Micro-level analysis is also known as the object-based method, in which each pedestrian or object is considered a separate entity. Macro-level analysis is also known as the holistic method, in which the entire flow is viewed as a single global entity whose global motion features are analysed and tracked. To identify the crowd flow present in the video, object-based methods consider local features, whereas holistic approaches recognise global behaviours. Microscopic models are further divided into four categories based on the strategy they follow: force-based models, cellular automata models, agent-based models and hybrid models.
Force-based methods calculate individual motion using force coefficients obtained from different force equations. Force-based models use a continuous position representation of individuals to represent the crowd flow. However, they fail to incorporate into the model the continuum flow laws and theorems available in physics and mathematics. The Boids model, proposed by Reynolds [38] in 1987, uses a bird-flock representation to model crowd flow behaviour. This method was able to shape three types of actions: separation, alignment and cohesion. The model paved the way for later research in force-based models. The social force model [16], proposed by Dirk Helbing and Peter Molnar in 1995, uses equations of motion to model pedestrian behaviour. It considers different features that influence pedestrian behaviour. Some of the crucial factors are the desired path of a pedestrian to the destination, the actual path to the target, the desired velocity, the actual velocity, obstacles and the attractive force between pedestrians. This model is considered a milestone in crowd modelling research. A modified version of the social force model that can handle panic situations was proposed by Helbing in 2000 [17]. This method considered nine socio-psychological features in addition to all social force aspects. The model was useful to some extent in low-density crowds. Nevertheless, it failed to address panic situations in high-density scenarios. Another modification of the social force model was proposed by Zheng [55] in 2002, which considered individual personality traits such as patience and impatience to obtain a more accurate result. Heigeas et al. proposed a physically-based particle model of crowd behaviour [14], in which the interaction between two pedestrians is defined by linear functions.
A modified particle swarm optimisation was proposed by Cheng [5], in which the crowd is viewed as a particle system made up of individual particles. These models may fail to capture the homogeneous behaviour of the flow system [34]. Many variations of the social force model are still used to model and simulate crowd behaviour. Qu et al. [37] proposed a modified version of the social force model and applied it to analyse pedestrian dynamics in a corridor scenario with either unidirectional or bidirectional flow. In their model, the video is divided into a series of Voronoi cells, and the behaviours are calculated using the social force model in each cell.
The cellular automata model is a grid-based approach consisting of cells. Usually, each cell accommodates a single pedestrian, and at each step the pedestrian moves to one of the neighbouring cells. Cells are typically equal in size, and all pedestrians are assumed to have the same speed. However, in real-world scenarios, a walker’s size and speed vary from person to person. Cellular automata models rely on predefined rules that guide the entire system, and individual behaviours can be predicted using these rules. A cell’s state is either empty or filled. If a person travels from one cell to another, the first cell’s state becomes empty and the second cell’s state becomes filled. Many methods using this concept have been proposed so far; they differ mainly in the rules used to update the states. Kirchner and Schadschneider introduced a bionics-inspired cellular automata model [26] that uses transition probabilities to determine the state changes. This method considers only four adjacent cells instead of eight neighbours. A cellular automata model for bidirectional flow was proposed by Yeu [54] in 2010, which considers eight neighbours to model pedestrian flow in two directions. Fang et al. [28] proposed an extended version of the cellular automata model, which uses a potential cost field to represent pedestrian flow patterns. They also explored the influence of different factors that affect crowd flow and evacuation. Ji et al. [23] proposed a cellular automata model for the evacuation of high-density crowds in panic situations. They used triangular grids, which help the system adapt better to the spatial information. A fuzzy version of the cellular automata model was proposed by Gerakakis et al. [10] in 2019.
In the agent-based approach, each pedestrian is considered a separate entity called an agent. Each agent is autonomous, interactive, intelligent and individual. The system calculates collective behaviour from the combination of these agent behaviours, and each agent must be able to learn and adapt to the environment [2]. Most agent-based methods rely on a concept called Belief-Desire-Intention, in which each agent must have a belief, a desire and an intention. Belief is the information processed by the agent, desire is the aim of the agent, and intention is the future task the agent is going to perform [2, 41]. The main advantage of the agent-based model is its ability to consider the heterogeneity of individual behaviour. However, maintaining this heterogeneity requires high computational power, because the behaviour of every agent or pedestrian in the crowd must be calculated. The agent-based model can be applied where high accuracy is required. Zhou et al. [56] use dynamic pedestrian agents to learn the collective behaviour patterns of pedestrians from crowded scenes. Luo et al. [31] proposed a novel addition to the agent-based model to avoid collisions between pedestrians. They investigated two proactive steering behaviours of the crowd, namely gap seeking and following. Zia et al. [58] incorporated the idea of a grid system into the original agent-based model to improve the model’s efficiency. These models concentrate more on individual behaviours than on group dynamics [27].
Hybrid models combine the models mentioned above and can perform better than the individual models. Examples are the agent-based cellular automata model, the force-based agent model and the force-based cellular automata model. The agent-based cellular automata model combines the benefits of the agent-based model and the cellular automata model. Each pedestrian is considered an agent moving over a grid of cells [18, 30]. The grid is used to locate the positions of the pedestrians as agents, which helps the system to assess the area of the crowd more precisely. Force-based agent models combine the advantages of force-based models and agent-based models. These methods make decisions based on the agents’ intelligence produced by the agent-based model, and the corresponding actions are calculated using the force-based model. This approach can address both the homogeneous and the heterogeneous behaviour of the crowd [19, 36]. Very few studies have been conducted on force-based cellular automata models. Combining the two is challenging because force-based methods and cellular automata methods follow different crowd representations. One pioneering work based on this concept was proposed by Song et al. [51], which can handle emergencies.
Macroscopic models consider the crowd as a single global entity instead of as pedestrians with individual behaviour. Macroscopic models can be classified into four categories: fluid dynamics models, regression-based models, route choice models and queuing models. Viewed macroscopically, crowd flow closely resembles fluid flow, and fluid dynamics crowd flow models are built on this observation. The idea of a ‘thinking fluid’ was proposed by Hughes et al. [22], which is considered the first crowd behaviour analysis based on fluid dynamics. Helbing et al. presented a fluid dynamics based crowd evacuation model, which treats the crowd motion as a fluid motion [15]. Habib et al. [47] used a crowd dynamics model for coherency detection in a crowd under surveillance. Farooq et al. [9] estimated the global motions present in high-density crowd scenarios using fluid dynamics. All these models follow the same approach, which is similar to that of gas kinetic models.
The regression-based model calculates the mathematical relation between the flow variables using regression methods. Milazzo et al. [33] proposed the first crowd flow method based on regression analysis. One of the latest works using this approach was presented by Duieves et al. [7] in 2017. This method calculates the regression between two parameters, namely the pedestrians’ interaction and their walking behaviour. In the route choice model, the crowd selects the path that best meets its needs. The crowd sets its path based on influential parameters such as comfort, travel time and travel distance, and it always chooses the path that maximises the utility [21].
Any system with limited space is usually known as a queueing system. Each environment is defined as a node, and the paths between environments are defined as edges. A Markov chain model can be used in queuing models to describe a pedestrian’s movement from one node to another. Queuing theory is used in crowd scenarios primarily to resolve congestion, for example in evacuation scenarios. Lovas et al. [29] proposed a pedestrian model based on queuing theory, which models the crowded environment as a network of walkway sections. Razieh et al. [11] used queuing theory to model crowd flow during the pilgrimage at Masjid al-Haram.
Crowd flow is also divided into structured flow and unstructured flow based on the crowd movements [12, 40]. If all pedestrians in the crowd flow in the same direction, the motion is called structured; otherwise, it is classified as unstructured. It is easier to identify crowd behaviours in structured motion because it contains only one movement type, whereas unstructured motion may include different types of behaviour in the same crowd.
Crowd flow is further divided into two categories based on the background present in the video: flow with a static background and flow with a dynamic background. Surveillance videos usually have static backgrounds, both indoors and outdoors. However, in some outdoor videos, such as those taken at the seashore, both the foreground and the background may be dynamic [46]. A good crowd flow method must be able to handle both static and dynamic scenarios.
Some models in the literature do not fit into any of the categories mentioned above. Solmaz et al. [44] proposed crowd behaviour analysis using Jacobian matrix analysis. They identified five different flow patterns, namely bottleneck, fountainhead, lanes, arches and blocking. Chen et al. [4] proposed a method to detect unusual events with the help of optical flow and clustering algorithms. In that paper, the optical flow of the crowd video is estimated using the Lucas-Kanade optical flow algorithm, and the crowd clusters are detected using adjacency matrix-based clustering. These crowd clusters are then modelled using the force field method. Hang Su et al. [45] use the concept of spatio-temporal viscous fluid fields to analyse human behaviour. They proposed a spatio-temporal matrix to analyse the video using the local difference between relative pixels, and they used eigenvalue analysis to classify the crowd behaviour patterns. Ali et al. [1] proposed a system that analyses flow instability using Lagrangian particle dynamics. A flow map is constructed to set up a deformation tensor. The maximum eigenvalue of the tensor is used to construct a Finite-Time Lyapunov Exponent (FTLE) field, which reveals the Lagrangian Coherent Structures (LCS) present in the underlying flow. Cong et al. [6] proposed a crowd flow model using sparse reconstruction cost. Features are extracted using a Multi-scale Histogram of Optical Flow, and a dictionary of crowd behaviours is created using sparse reconstruction cost. Jing Shao [42] considers different group descriptor parameters such as collectiveness, stability, uniformity and conflict to model the crowd flow. Wu et al. [53] proposed a collective density clustering approach, which can detect both macroscopic and microscopic motions. Shang Wu et al. [52] were the first to use curl and divergence for crowd behaviour analysis. Wang et al. [48] proposed a texture-based approach for real-time detection of abnormal crowd behaviour.
Deep neural networks drive recent research in this area [49]. Deep learning frameworks are used mainly to detect abnormalities rather than to model crowd flow [13, 24]. Hasan et al. [13] proposed a fully-connected encoder that takes hand-crafted features as input. Zhou et al. [57] used spatio-temporal information to classify normal and abnormal events. Wang et al. [50] proposed modified generative neural networks for crowd anomaly detection.
The proposed system follows a macroscopic approach, in which the system considers the entire crowd as a single entity.
Proposed system
The architecture of the proposed crowd flow modelling system is given in Fig. 1. The steps involved are crowd flow detection, crowd flow localisation, crowd movement analysis using the Hessian matrix and the Jacobian matrix, and crowd flow pattern identification. The proposed intelligent surveillance system takes surveillance videos as input and identifies the flow region in the footage using the Gunnar Farneback optical flow method [8]. The optical flow method gives the video’s flow fields, which represent the crowd motions present in the surveillance video. The optical flow also yields the motion coordinates. Filters such as median filters and morphological filters are applied to remove noise from the video. The SURF [3] feature extraction method is used to identify interest points in the video, and these interest points are used to localise the crowd flow. Localisation gives the region where motion is present, and the Hessian matrix and the Jacobian matrix are calculated from the extracted area, together with the corresponding eigenvectors and eigenvalues. Based on these eigenvalues and eigenvectors, a stability analysis of the video is performed, and the given video is classified into one of the predefined behaviours. Each step of the proposed system is explained in detail in the following sections.

Fig. 1. Architecture of the intelligent surveillance system.
Let (x, y, t) denote the input video, where x and y span the frame width and height and t spans the length of the video.
Crowd flow detection is the first step in the proposed crowd behaviour pattern analysis. To analyse the crowd flow, we first need to detect the flow present in the video. The well-known optical flow method is used here to capture the crowd flow. Dense optical flow is calculated instead of the usual sparse optical flow to obtain more accurate patterns. Dense optical flow calculates the optical flow vectors for all the points in the frame. It is calculated here using the optical flow algorithm proposed by Gunnar Farneback in 2003 [8].
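The input video (x, y, t) is given to the optical flow module. The differences between two consecutive frames are Δx, Δy and Δt. The first step in the Farneback optical flow algorithm is the approximation of each neighbourhood using quadratic polynomials. The pixel values in a region of the image are represented by
$$f(\mathbf{x}) \approx \mathbf{x}^T A \mathbf{x} + \mathbf{b}^T \mathbf{x} + c$$
where A is a symmetric matrix, b is a vector and c is a scalar value. The polynomial expansion at the current frame, at position $\mathbf{x} = (x, y)^T$, can therefore be expressed as
$$f_1(\mathbf{x}) = \mathbf{x}^T A_1 \mathbf{x} + \mathbf{b}_1^T \mathbf{x} + c_1$$
Suppose d is the displacement between the two frames. Then the new value $f_2$ is the polynomial expansion at time $t_2$, and it is calculated using the following equation.
$$f_2(\mathbf{x}) = f_1(\mathbf{x} - \mathbf{d}) = \mathbf{x}^T A_1 \mathbf{x} + (\mathbf{b}_1 - 2A_1\mathbf{d})^T \mathbf{x} + \mathbf{d}^T A_1 \mathbf{d} - \mathbf{b}_1^T \mathbf{d} + c_1 = \mathbf{x}^T A_2 \mathbf{x} + \mathbf{b}_2^T \mathbf{x} + c_2$$
Equating the coefficients of the two polynomials of consecutive frames gives
$$A_2 = A_1, \qquad \mathbf{b}_2 = \mathbf{b}_1 - 2A_1\mathbf{d}, \qquad c_2 = \mathbf{d}^T A_1 \mathbf{d} - \mathbf{b}_1^T \mathbf{d} + c_1$$
Assuming $A_1$ is non-singular, the displacement follows as
$$\mathbf{d} = -\tfrac{1}{2} A_1^{-1} (\mathbf{b}_2 - \mathbf{b}_1)$$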
The global displacement vectors are calculated by equating the coefficients of f1(x) and f2(x) at each point in the frame. These displacements are drawn as colour-coded flow vectors; an input frame and the corresponding motion-detected frame are shown in Fig. 2. The optical flow calculation step produces the motion vectors and flow fields, which are used in the remaining steps. Its output has coloured pixels at the locations where flow is present.
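The displacement step of the derivation can be sketched in a few lines. The snippet below is a minimal illustration, not the Farneback implementation itself: it assumes a single 2 × 2 symmetric matrix A and linear terms b1 and b2 are already known for one neighbourhood, and it solves d = −½ A⁻¹(b2 − b1) in closed form. The helper names and the numeric values are hypothetical.

```python
def solve_displacement(A, b1, b2):
    """Recover d from d = -1/2 * A^{-1} (b2 - b1) for a 2x2 matrix A."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("A is singular; the displacement is not determined")
    # Right-hand side: -(b2 - b1) / 2
    r0 = -0.5 * (b2[0] - b1[0])
    r1 = -0.5 * (b2[1] - b1[1])
    # Multiply by the closed-form inverse of the 2x2 matrix A
    dx = (a22 * r0 - a12 * r1) / det
    dy = (-a21 * r0 + a11 * r1) / det
    return dx, dy


def shifted_linear_term(A, b1, d):
    """Linear coefficient of f1(x - d), i.e. b2 = b1 - 2 A d."""
    (a11, a12), (a21, a22) = A
    return (b1[0] - 2 * (a11 * d[0] + a12 * d[1]),
            b1[1] - 2 * (a21 * d[0] + a22 * d[1]))


# Hypothetical neighbourhood: build b2 from a known shift, then recover it.
A = ((2.0, 0.5), (0.5, 1.0))
b1 = (1.0, -3.0)
true_d = (0.75, -1.25)
b2 = shifted_linear_term(A, b1, true_d)
recovered = solve_displacement(A, b1, b2)
```

In practice the coefficients are estimated per pixel and averaged over a neighbourhood before solving, so a real implementation is considerably more involved than this pointwise solve.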

Fig. 2. Crowd flow detection.
The second step in the proposed system is crowd flow localisation. Once the crowd flow has been detected in the previous step, our interest moves from the entire frame to the area where motion is present. We use the SURF feature extraction mechanism for crowd flow localisation. The input to this step is the set of flow-detected frames, represented as (x, y, t), so key points are obtained only where flow is present. These key points are used to calculate the ROI.
Speeded Up Robust Features (SURF) [3] is used in this pattern identification system to extract key points from the flow image. In our study, SURF was more efficient than other feature detectors such as SIFT, the blob detector and MSER. The SURF algorithm has two parts: interest point detection and interest point description. SURF detects interest points using a Hessian matrix approximation computed on integral images. The integral image at a particular location $X = (x, y)^T$ can be calculated using
$$I_\Sigma(X) = \sum_{i=0}^{i \le x} \sum_{j=0}^{j \le y} I(i, j)$$
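As a small illustration of why integral images make box filters cheap, the following sketch builds an integral image for a toy array and then sums an arbitrary rectangle with at most four look-ups. It is a plain-Python sketch for exposition only; the function names are hypothetical and not part of any SURF library.

```python
def integral_image(img):
    """Return S where S[y][x] is the sum of img[j][i] for all j <= y and i <= x."""
    h, w = len(img), len(img[0])
    S = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            S[y][x] = row_sum + (S[y - 1][x] if y > 0 else 0)
    return S


def box_sum(S, x0, y0, x1, y1):
    """Sum of the original image over the inclusive rectangle [x0..x1] x [y0..y1]."""
    total = S[y1][x1]
    if x0 > 0:
        total -= S[y1][x0 - 1]
    if y0 > 0:
        total -= S[y0 - 1][x1]
    if x0 > 0 and y0 > 0:
        total += S[y0 - 1][x0 - 1]
    return total


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
S = integral_image(img)
lower_right = box_sum(S, 1, 1, 2, 2)  # 5 + 6 + 8 + 9
```

The cost of `box_sum` does not depend on the rectangle size, which is what allows box filters of any scale to be evaluated in constant time.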
SURF detects interest points at all scales using a scale-space representation, which is built from octaves and box filters. The algorithm then localises the identified key points using non-maximum suppression in a 3 × 3 neighbourhood. SURF uses a 64-dimensional descriptor to describe and represent the key points. The descriptor contains the orientation assignment details and the responses based on the Haar wavelet transform. This information helps the SURF algorithm to match key points in two different images. This step produces interest points, or key points, only at the flow coordinates, and these key points help us to localise the crowd flow efficiently.
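The non-maximum suppression step can be illustrated on a single response map. The sketch below keeps a pixel only if its response is strictly greater than all eight neighbours in the 3 × 3 window; it is a simplified, single-scale illustration with hypothetical names and data, and it does not compare against adjacent scales as a full scale-space detector would.

```python
def non_max_suppression_3x3(response, threshold=0.0):
    """Return (x, y) positions whose response strictly exceeds all 8 neighbours."""
    h, w = len(response), len(response[0])
    peaks = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            centre = response[y][x]
            if centre <= threshold:
                continue  # ignore weak responses
            neighbours = [response[y + dy][x + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if not (dx == 0 and dy == 0)]
            if all(centre > n for n in neighbours):
                peaks.append((x, y))
    return peaks


# Hypothetical detector response map with two clear local maxima.
response = [
    [0, 0, 0, 0, 0],
    [0, 5, 1, 0, 0],
    [0, 1, 1, 2, 0],
    [0, 0, 2, 7, 0],
    [0, 0, 0, 0, 0],
]
peaks = non_max_suppression_3x3(response)
```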
Crowd localisation is performed here using the ROI concept. An ROI is defined as the area of an image, or sub-image, extracted from the original image for a particular purpose. In this problem, the ROI is the sub-image where crowd flow occurs, because our attention is now restricted to analysing crowd flow patterns.
The ROI is calculated using the SURF key points extracted in the previous step. SURF extracts more key points in active areas, so the ROI always lies close to the dynamic area of the video. The centre of the ROI is calculated as the global mean of the SURF key points. Every SURF key point has x and y values that give its position. The average of all the x values and all the y values gives the global mean of the key points, and this point is taken as the centre of the ROI. A rectangle centred on this global mean and enclosing most of the key points gives the ROI. This ROI helps us to localise the crowd flow, and it is shown in Fig. 3.
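One possible way to turn this description into code is sketched below. The centre is the mean of the key point coordinates, as described; the rule for "enclosing most of the key points" is not specified in the text, so the sketch assumes a half-width and half-height of k standard deviations (k = 2 by default). Both the function name and the parameter k are assumptions made for illustration.

```python
import math


def roi_from_keypoints(points, frame_w, frame_h, k=2.0):
    """Return (x0, y0, x1, y1) centred on the mean key point.

    The half-width and half-height are k standard deviations of the key
    point coordinates; k is an assumed parameter, not taken from the paper.
    """
    if not points:
        raise ValueError("no key points: no flow region to localise")
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    sx = math.sqrt(sum((p[0] - cx) ** 2 for p in points) / n)
    sy = math.sqrt(sum((p[1] - cy) ** 2 for p in points) / n)
    # Clip the rectangle to the frame boundaries
    x0 = max(0, int(math.floor(cx - k * sx)))
    y0 = max(0, int(math.floor(cy - k * sy)))
    x1 = min(frame_w - 1, int(math.ceil(cx + k * sx)))
    y1 = min(frame_h - 1, int(math.ceil(cy + k * sy)))
    return x0, y0, x1, y1


# Hypothetical key point positions inside a 100 x 80 frame.
keypoints = [(40, 30), (60, 30), (40, 50), (60, 50), (50, 40)]
roi = roi_from_keypoints(keypoints, frame_w=100, frame_h=80)
```

A percentile-based rectangle would be an equally reasonable choice and is less sensitive to a few outlying key points.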

Fig. 3. Crowd flow localization.
After localising the crowd flow, we need to analyse the crowd videos to obtain the crowd behaviour patterns. The Hessian matrix and the Jacobian matrix are used here for crowd behaviour analysis. Both matrices are calculated only for the ROI, not for the entire frame.
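The Hessian matrix is a square matrix of the second-order partial derivatives of an image. Let I be the ROI extracted from the video; then the Hessian matrix is calculated using
$$H_{roi} = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix}$$
where
$$I_{xx} = \frac{\partial^2 I}{\partial x^2}, \qquad I_{yy} = \frac{\partial^2 I}{\partial y^2}, \qquad I_{xy} = \frac{\partial^2 I}{\partial x \, \partial y}$$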
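For a discrete image the second-order derivatives have to be approximated. The sketch below uses central finite differences at a single interior pixel, which is one common choice and an assumption here (the text does not state which derivative filters are used). The function name and the synthetic intensity surface are hypothetical.

```python
def hessian_at(I, x, y):
    """Central-difference Hessian [[Ixx, Ixy], [Ixy, Iyy]] of a 2-D grid at (x, y).

    I is indexed as I[y][x]; (x, y) must be at least one pixel from the border.
    """
    Ixx = I[y][x + 1] - 2 * I[y][x] + I[y][x - 1]
    Iyy = I[y + 1][x] - 2 * I[y][x] + I[y - 1][x]
    Ixy = (I[y + 1][x + 1] - I[y + 1][x - 1]
           - I[y - 1][x + 1] + I[y - 1][x - 1]) / 4.0
    return [[Ixx, Ixy], [Ixy, Iyy]]


# Synthetic intensity surface I(x, y) = x^2 + 3xy + 2y^2 on a 5 x 5 grid.
grid = [[x * x + 3 * x * y + 2 * y * y for x in range(5)] for y in range(5)]
H = hessian_at(grid, 2, 2)
```

For this quadratic surface the finite differences reproduce the analytic second derivatives (2, 3 and 4) exactly; on real image data some smoothing is usually applied first because second derivatives amplify noise.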
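Once the Hessian is calculated, our system calculates the Jacobian of the ROI [43]. The Jacobian is a square matrix of all first-order partial derivatives of the optical flow field in the ROI. The Jacobian matrix of the ROI is calculated using
$$J_{roi} = \begin{bmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\ \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{bmatrix}$$
where u and v are the optical flow components in the horizontal and vertical directions. The crowd flow patterns are identified based on the eigenvalues of the Hessian matrix and the Jacobian matrix.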
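The Jacobian of a flow field can likewise be approximated with central differences. The following sketch evaluates it at one interior point of a synthetic flow that converges on the grid centre; the discretisation, the function name and the test field are assumptions made for illustration.

```python
def jacobian_at(u, v, x, y):
    """Central-difference Jacobian [[du/dx, du/dy], [dv/dx, dv/dy]] at (x, y).

    u and v are 2-D grids indexed as u[y][x]; (x, y) must be an interior point.
    """
    du_dx = (u[y][x + 1] - u[y][x - 1]) / 2.0
    du_dy = (u[y + 1][x] - u[y - 1][x]) / 2.0
    dv_dx = (v[y][x + 1] - v[y][x - 1]) / 2.0
    dv_dy = (v[y + 1][x] - v[y - 1][x]) / 2.0
    return [[du_dx, du_dy], [dv_dx, dv_dy]]


# Synthetic flow converging on the grid centre (2, 2): u = -(x - 2), v = -(y - 2).
size = 5
u = [[-(x - 2) for x in range(size)] for y in range(size)]
v = [[-(y - 2) for x in range(size)] for y in range(size)]
J = jacobian_at(u, v, 2, 2)
```

Here J is the negative identity matrix, so both eigenvalues are −1, which matches the intuition of flow heading towards a common point.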
Eigenvalues [20] are a set of values derived from linear systems of equations. They are also known as characteristic roots because they represent the actual nature or character of the system. In image processing and computer vision problems, eigenvalues are mainly used to identify the behaviour of an image. The eigenvalues of the Hessian matrix are usually used to describe the behaviour of the matrix. Here the Hessian matrix represents the crowd flow, and its eigenvalues define the structure of the crowd flow.
Let $H_{roi}$ be the Hessian matrix calculated from the ROI in the previous step. Then its eigenvalues are calculated using
$$\det(H_{roi} - \lambda I) = 0$$
where I is the identity matrix and λ denotes the eigenvalues. Two eigenvalues are obtained for a 2 × 2 matrix, and we denote them λ1 and λ2. The eigenvalues of the Hessian matrix describe the structure of the matrix, which here means the structure of the crowd flow. If both eigenvalues are positive, the matrix is concave up. If both eigenvalues are negative, the matrix is concave down. If the values are 0, nothing can be predicted from the matrix. If the values are mixed, that is, one value is positive and the other is negative, the point is a saddle point.
The eigenvalues of the Jacobian matrix need to be calculated to identify the motion patterns in the surveillance video. Let $J_{roi}$ be the Jacobian matrix calculated in the previous step; its eigenvalues are calculated using the equation
$$\det(J_{roi} - \lambda I) = 0$$
where I is the identity matrix and λ denotes the eigenvalues. As with the Hessian matrix, two eigenvalues are calculated from the Jacobian matrix. The Jacobian matrix’s eigenvalues are also used to describe the structure of the crowd flow obtained from the surveillance video. Based on the eigenvalues, we can classify the input video into different behavioural patterns.
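For a 2 × 2 matrix the characteristic equation reduces to the quadratic λ² − tr(M)·λ + det(M) = 0, so the eigenvalues can be written in closed form. The sketch below is a minimal stdlib-only illustration with a hypothetical function name; it uses `cmath` so that complex eigenvalues, which arise for rotating flows, are returned rather than raising an error.

```python
import cmath


def eigenvalues_2x2(M):
    """Roots of det(M - lambda*I) = lambda^2 - trace*lambda + det = 0."""
    (a, b), (c, d) = M
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2


# A diagonal matrix has its diagonal entries as eigenvalues.
l1, l2 = eigenvalues_2x2([[2.0, 0.0], [0.0, 3.0]])
# A pure rotation field has purely imaginary eigenvalues.
r1, r2 = eigenvalues_2x2([[0.0, -1.0], [1.0, 0.0]])
```

The results are always of type `complex`; for a symmetric matrix such as the Hessian the imaginary parts are zero and `.real` can be taken safely.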
We can apply linear stability analysis to continuous-time non-linear dynamical systems. Consider the dynamics of a non-linear differential equation
$$\frac{dx}{dt} = F(x)$$
around its equilibrium point $x_{eq}$. By definition, $x_{eq}$ satisfies
$$0 = F(x_{eq})$$
To analyse the system’s stability around this equilibrium point, we apply the same coordinate switch that is used for discrete-time models. Specifically, we apply the following replacement
$$x(t) \Rightarrow x_{eq} + \Delta x(t)$$
which gives
$$\frac{d(x_{eq} + \Delta x)}{dt} = \frac{d\Delta x}{dt} = F(x_{eq} + \Delta x)$$
Since the non-linear function F on the right-hand side can be approximated using the Jacobian matrix, the equation above is approximated as
$$\frac{d\Delta x}{dt} \approx F(x_{eq}) + J\,\Delta x$$
where J is the Jacobian matrix of F at $x = x_{eq}$. Combining this with $F(x_{eq}) = 0$, we obtain
$$\frac{d\Delta x}{dt} \approx J\,\Delta x$$
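The linearised equation dΔx/dt ≈ JΔx can be explored numerically to see how the sign of the eigenvalues governs stability. The sketch below integrates the perturbation with a simple forward-Euler scheme for two hypothetical diagonal Jacobians; the step size, number of steps and matrices are illustrative assumptions.

```python
def simulate_perturbation(J, dx0, dt=0.01, steps=1000):
    """Forward-Euler integration of d(dx)/dt = J * dx; returns the final perturbation."""
    x, y = dx0
    for _ in range(steps):
        dxdt = J[0][0] * x + J[0][1] * y
        dydt = J[1][0] * x + J[1][1] * y
        x, y = x + dt * dxdt, y + dt * dydt
    return x, y


def norm(p):
    """Euclidean length of a 2-D vector."""
    return (p[0] ** 2 + p[1] ** 2) ** 0.5


stable_J = [[-1.0, 0.0], [0.0, -2.0]]   # both eigenvalues negative
unstable_J = [[1.0, 0.0], [0.0, 0.5]]   # both eigenvalues positive
start = (0.1, 0.1)
end_stable = simulate_perturbation(stable_J, start)
end_unstable = simulate_perturbation(unstable_J, start)
```

With negative eigenvalues the perturbation shrinks towards the equilibrium, and with positive eigenvalues it grows away from it, which is the behaviour the classification rules rely on.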
Suppose the function z = f(x, y) has continuous second partial derivatives. Let (x0, y0) be a critical point of f, and let λ1 and λ2 be the eigenvalues of the Hessian H_f(x0, y0). If λ1 < 0 and λ2 < 0, then f has a local maximum at (x0, y0). If λ1 < 0 and λ2 > 0, or λ1 > 0 and λ2 < 0, then f has a saddle point at (x0, y0). If λ1 > 0 and λ2 > 0, then f has a local minimum at (x0, y0). If λ1 = 0 and λ2 = 0, then no conclusion can be drawn.
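The second-derivative test stated above translates directly into a small decision function. The sketch below is illustrative only: the function name is hypothetical, and the tolerance `tol` is an added assumption so that eigenvalues that are numerically close to zero are treated as zero. It also returns "inconclusive" when only one eigenvalue is zero, a case the text does not spell out.

```python
def classify_critical_point(lam1, lam2, tol=1e-9):
    """Second-derivative test from the signs of the two Hessian eigenvalues."""
    if lam1 < -tol and lam2 < -tol:
        return "local maximum"
    if lam1 > tol and lam2 > tol:
        return "local minimum"
    if (lam1 < -tol and lam2 > tol) or (lam1 > tol and lam2 < -tol):
        return "saddle point"
    # At least one eigenvalue is (numerically) zero: the test gives no answer.
    return "inconclusive"


examples = {
    (2.0, 4.0): classify_critical_point(2.0, 4.0),
    (-1.0, -3.0): classify_critical_point(-1.0, -3.0),
    (-1.0, 2.0): classify_critical_point(-1.0, 2.0),
    (0.0, 0.0): classify_critical_point(0.0, 0.0),
}
```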
Crowd behaviour pattern identification
Crowd behaviour pattern identification is the final step in the proposed system. Based on the eigenvalues of the Hessian matrix and the Jacobian matrix, we classify the crowd behaviour into one of the behaviour patterns shown in Fig. 4. The eight categories used in our problem are Bottleneck, Dispersion, Lane to a common point, Lane to the opposite directions, Diversion, Circular, Converging spiral and Diverging spiral. Five of them were proposed in [44], and the remaining three are newly identified in this work.

Fig. 4. Different types of motion patterns.
We need to interpret the eigenvalues of both the Hessian and the Jacobian to predict the motion patterns. λH1 and λH2 are the eigenvalues calculated from the Hessian matrix in the previous step. The Hessian matrix provides an initial estimate of the flow model, and the Jacobian matrix makes the final classification.
The Hessian matrix is the square matrix of the second-order derivatives of the system. We consider the entire crowd flow as a system, and the Hessian matrix represents this flow system. Its eigenvalues are calculated to make the first-level classification of the given input video. Our pattern identification system has eight predefined classes, as given in Fig. 4, and the decision is based on the nature of the Hessian eigenvalues. A 2 × 2 Hessian matrix has two eigenvalues, each of which may be positive, negative or zero, so the pair may be both positive, both negative or mixed. If both eigenvalues are positive, the system is an elliptic paraboloid with a local minimum, and the system is stable. In this case the given video may follow a bottleneck, lane to the same side or converging spiral pattern. If both eigenvalues of the Hessian matrix are negative, the system is unstable, and the corresponding flow may be dispersion, lane to the opposite side or diverging spiral. If the eigenvalues are mixed, that is, one is positive and the other is negative, the system is in a saddle condition, and the input video may follow either of the two remaining patterns.
The flow patterns are then determined from the eigenvalues of the Jacobian matrix. λJ1 and λJ2 are the eigenvalues calculated from the Jacobian matrix in the previous step. Based on the value and sign of these eigenvalues, we classify the input surveillance video into eight different crowd motion patterns, as given in Fig. 4.
When both eigenvalues λJ1 and λJ2 are real and negative, the system described by the Jacobian matrix is stable. In a stable system, the flow moves towards a centre point from different locations in the scene. This resembles crowd flow from different directions towards a common centre of attraction or a junction. If the flow towards the common point is heavy and uncontrolled, it is difficult to handle the scenario without congestion. This situation is called a bottleneck. So when λJ1 < 0 and λJ2 < 0, we classify the given video into the bottleneck class.
When both eigenvalues of the calculated Jacobian matrix are real and positive, the system is unstable. In an unstable system, the force moves from one central point outwards in all directions. For example, when a public event or meeting takes place, the crowd usually gathers near the venue and begins to disperse in all directions once the meeting is over. We call this type of motion dispersion. So when λJ1 > 0 and λJ2 > 0, we classify the given video into the dispersion class.
If one of the eigenvalues is zero, the force acts along a line, because only one eigenvalue contributes to the flow direction. The resulting motion is close to linear, though not exactly a straight line. We call it lane motion. There are two types of lane motion, distinguished by the second eigenvalue. If the second eigenvalue is negative, the force from the two sides is directed towards the same point. Otherwise, the force acts in opposite directions along the lane.
If the two eigenvalues have different signs, that is, one is positive and the other is negative, the system is unstable. However, the structure of this unstable system is entirely different from that of the unstable system discussed in the dispersion scenario. Here, the flow comes from different sides, reaches the centre point and is diverted from the centre to the desired direction. This type of behaviour is known as diversion. Diversion does not create a bottleneck at junctions. Instead, there is a continuous flow through them.
Eigenvalues are usually of the form α ± iβ, where α is the real part and β is the imaginary part. When α = 0, the eigenvalues contain only the imaginary part, and the system follows a circular flow of forces. This is equivalent to the circular motion of a crowd around an object, and the motion may be clockwise or anti-clockwise depending on the sign of the imaginary part of the eigenvalue.
If the eigenvalues contain both real and imaginary parts, the flow resembles a spiral. The spiral flow is classified into two categories based on the real part of the eigenvalues. If the real part is negative, the spiral is converging, and all the flow reaches the centre position after entering the scene. If the real part is positive, the spiral is diverging, and the flow starts from the centre and slowly spreads to the outer layers. All the conditions discussed above are summarized in Table 1.
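The eigenvalue conditions discussed above can be collected into a single decision function. The sketch below is illustrative only: the function name `classify_flow`, the tolerance `tol` used to decide when a value counts as zero, and the "Undetermined" fallback for two zero eigenvalues are assumptions introduced here and are not part of the proposed system.

```python
import numpy as np

def classify_flow(jacobian, tol=1e-6):
    """Map the eigenvalues of a 2 x 2 Jacobian to one of the eight patterns.

    Hypothetical helper: name, tolerance and fallback label are illustrative.
    """
    lam = np.linalg.eigvals(np.asarray(jacobian, dtype=float))
    re, im = lam.real, lam.imag

    # Complex-conjugate pair alpha +/- i*beta: circular or spiral flow.
    if np.any(np.abs(im) > tol):
        alpha = re[0]
        if abs(alpha) <= tol:
            return "Circular"
        return "Converging spiral" if alpha < 0 else "Diverging spiral"

    # Real eigenvalues with at least one (numerically) zero: lane motion.
    is_zero = np.abs(re) <= tol
    if is_zero.all():
        return "Undetermined"  # case not covered by the rules in the text
    if is_zero.any():
        other = re[~is_zero][0]
        return ("Lane to a common point" if other < 0
                else "Lane to the opposite directions")

    # Two non-zero real eigenvalues.
    if np.all(re < 0):
        return "Bottleneck"
    if np.all(re > 0):
        return "Dispersion"
    return "Diversion"  # one positive, one negative

print(classify_flow([[-1.0, 0.0], [0.0, -2.0]]))
print(classify_flow([[-1.0, -1.0], [1.0, -1.0]]))
```

Testing for complex eigenvalues first keeps the real-valued rules simple, since the sign comparisons only make sense once the imaginary part is known to be negligible.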
Classification rules
Based on the characteristics of eigenvalues of Hessian matrix and Jacobian matrix, we can classify the given input video into eight different categories named Bottleneck, Dispersion, Lane to a common point, Lane to the opposite directions, Diversion, Circular, Converging spiral and Diverging Spiral as we have discussed above.
Experiments are conducted with videos from the UCF crowd video dataset [1] and the CUHK Crowd Dataset [42]. Ten videos of each class are identified and tested with the proposed algorithm to evaluate the system. Optical flow parameters and SURF features are extracted from the input video, and the ROI is calculated from those extracted features. Then, the eigenvalues of the Hessian and Jacobian matrices are calculated, and the input video is classified into one of the eight predefined motion patterns according to the decision rules given in Table 1. Some results are shown in Fig. 6.
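As an illustration of the step in which Jacobian eigenvalues are obtained from the flow field, the following sketch estimates a Jacobian by finite differences on a synthetic dense flow. It is a minimal example under stated assumptions: the function name `flow_jacobian_at`, the use of `np.gradient`, and the synthetic converging field are introduced here for illustration, and the construction actually used in the proposed system is the one described in the earlier section. The horizontal and vertical flow components are assumed to be supplied as two 2-D arrays (a dense Farneback flow of shape H × W × 2 could be split in this way).

```python
import numpy as np

def flow_jacobian_at(u, v, row, col):
    """Finite-difference Jacobian of the flow (u, v) at pixel (row, col).

    Hypothetical helper. u and v are 2-D arrays holding the horizontal and
    vertical flow components; axis 0 is y (rows) and axis 1 is x (columns).
    """
    # np.gradient returns the derivative along axis 0 first, then axis 1.
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    return np.array([[du_dx[row, col], du_dy[row, col]],
                     [dv_dx[row, col], dv_dy[row, col]]])

# Synthetic flow in which every vector points towards the centre of the grid.
ys, xs = np.mgrid[-10:11, -10:11].astype(float)
u, v = -xs, -ys

J = flow_jacobian_at(u, v, 10, 10)  # (10, 10) is the centre pixel
eigenvalues = np.linalg.eigvals(J)
print(J)
print(eigenvalues)
```

For this synthetic field both eigenvalues are real and negative, which corresponds to the bottleneck condition in Table 1.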

Behaviour pattern.

Classification result.
To evaluate the performance of the proposed work, we compared our results against manually generated ground truth. A confusion matrix is created from the results and is given in Table 2. The true positive, true negative, false positive and false negative values are calculated for each class. These quantities are normally used to assess a binary classifier, so we extend them to the multi-class setting by computing them separately for each class.
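One common way to extend the binary counts to a multi-class problem is to treat each class in turn as the positive class and all others as negative. The sketch below assumes this one-vs-rest convention and assumes that rows of the confusion matrix hold the ground truth and columns hold the predictions; neither convention is stated explicitly in the text. The 3 × 3 matrix is made-up toy data, not the contents of Table 2, and the function name `one_vs_rest_counts` is hypothetical.

```python
import numpy as np

def one_vs_rest_counts(cm):
    """Per-class TP, FP, FN and TN from a square confusion matrix.

    Assumed layout: cm[i, j] = number of samples of true class i
    that were predicted as class j.
    """
    cm = np.asarray(cm)
    tp = np.diag(cm)              # correctly predicted samples of each class
    fp = cm.sum(axis=0) - tp      # predicted as the class, but belonging to another
    fn = cm.sum(axis=1) - tp      # belonging to the class, but predicted as another
    tn = cm.sum() - tp - fp - fn  # everything else
    return tp, fp, fn, tn

# Toy confusion matrix for three classes (illustrative values only).
toy_cm = [[5, 1, 0],
          [2, 3, 1],
          [0, 0, 4]]
tp, fp, fn, tn = one_vs_rest_counts(toy_cm)
print(tp, fp, fn, tn)
```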
Confusion matrix
The precision, recall, F score and accuracy values are then calculated for each class and given in Table 3. To evaluate the performance of the entire system, these values are averaged over all classes; the averages are given in Table 4. We obtained an average precision of 0.56, an average recall of 0.56, an average F score of 0.56 and an average accuracy of 0.89. The performance of each class is shown graphically in Fig. 7. The entire system is implemented in Python using OpenCV 3.2 and runs on an Intel Core i5 processor with a Quadro GPU.
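The per-class measures and their averages can be computed from a confusion matrix as sketched below. This is an illustration under assumptions: per-class counts follow a one-vs-rest convention, rows are taken as ground truth and columns as predictions, and the averages are unweighted means over classes. The function name `per_class_metrics` is hypothetical, and the matrix is made-up toy data rather than the values behind Tables 2 to 4.

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall, F score and accuracy for every class.

    Assumed layout: cm[i, j] = samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = cm.sum() - tp - fp - fn

    # Guard against empty classes: report 0 instead of dividing by zero.
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = np.where(tp + fp > 0, tp / (tp + fp), 0.0)
        recall = np.where(tp + fn > 0, tp / (tp + fn), 0.0)
        f_score = np.where(precision + recall > 0,
                           2 * precision * recall / (precision + recall), 0.0)
    accuracy = (tp + tn) / cm.sum()
    return {"precision": precision, "recall": recall,
            "f_score": f_score, "accuracy": accuracy}

# Toy confusion matrix for three classes (illustrative values only).
toy_cm = [[5, 1, 0],
          [2, 3, 1],
          [0, 0, 4]]
metrics = per_class_metrics(toy_cm)
averages = {name: values.mean() for name, values in metrics.items()}
print(metrics)
print(averages)
```

Under this convention the per-class accuracy counts true negatives, which dominate when there are many classes, so the average accuracy can be considerably higher than the average precision and recall.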
Performance measures
Average performance

Performance analysis.
We have used a dense optical flow method to detect the crowd flow motion. With a sparse optical flow method, we could not detect the entire crowd flow present in the video; we obtained only a few points, which cannot represent the entire crowd. We have also used the SURF feature detection mechanism to extract the crowd features from the video. We tested our approach with other feature extraction mechanisms, namely SIFT and MSER. SIFT and MSER are computationally more expensive than SURF and take more time to execute, so SURF is preferred here to extract the features.
Our results are compared with existing crowd flow models in the literature. The crowd flow model of Berkan Solmaz [44], based on stability analysis, is one of the pioneering works in this field. They were able to model only five behaviours, whereas we have proposed and implemented eight different crowd behaviour patterns. We compared the average accuracy of their model over its behaviours with the average accuracy of our model: their model reaches 65%, and our method achieves 89%. We have also compared our approach with similar works in the literature, and the comparison is shown in Table 5. From the table, we conclude that our method outperforms the similar works considered. We attribute this to the use of both Hessian and Jacobian analysis. The two-step analysis helps the system obtain a more accurate result, and our ROI method helps produce the output faster than existing approaches.
Comparison of accuracy
We have presented an architecture for detecting crowd behaviour patterns from videos. Based on the eigenvalue analysis of the Jacobian and Hessian matrices, each input video is assigned to one of eight categories. Addressing eight categories across a wide variety of videos is the main advantage of our work. The system achieves considerably better performance than the works reported in the literature. We have also proposed a new method for ROI calculation. Instead of analysing the crowd behaviour over the entire frame, we analyse only the calculated ROI, which reduces the number of pixels to be processed. Our architecture can detect only eight behaviours, which is far from covering the complexity of real crowd dynamics. Including more real-world scenarios and crowd flow models will improve the system's effectiveness. In the future, the proposed model can be combined with deep learning approaches to predict abnormal behaviours in surveillance videos.
