Abstract
The growth in the computerization of processes and services has changed human relations and, as a consequence, has created new forms of attack and fraud against users of digital equipment. Because many people use computers, smartphones, and e-mail to perform day-to-day tasks, a wide range of data traffic is exposed to attack. A breach of strategic information can undermine a company's competitiveness. Security and information management are therefore fundamental for companies to keep proper control of their business knowledge. Cyber attacks cause breaches of confidential information on a growing worldwide scale and are one of the significant challenges of the contemporary world. This article proposes a computational system based on intelligent hybrid models that, through fuzzy rules, allows the construction of expert systems for cyber attacks of diverse natures. The tests were carried out with real databases of attacks on governmental computer devices.
The model proposed in this paper uses fuzzy evolving data clustering concepts. The extreme learning machine performs the training, and logical neurons of the unineuron type create fuzzy rules capable of transforming the knowledge acquired by the model into a knowledge base for training company employees, building other computer systems, and raising awareness of elements that may harm the integrity of the data of individuals and companies. The novelty of the intelligent technique presented in the paper is that the nature of the cyber attacks defines the structure of the model, because the fuzzification and regularization techniques are based entirely on the complexity of the cyber invasions. Binary pattern classification tests against traditional models from the literature show that the proposed approach maintains the accuracy of cyber attack detection while also constructing a set of rules that serve as knowledge for companies that wish to protect their information from attacks.
Introduction
The evolution of technology creates situations that must be controlled and monitored by employees and companies. As new computing resources become available to society, new forms of fraudulent data collection become recurrent. The spread of computing resources makes personal computers, mobile devices [59], tablets, and other devices more susceptible to attack. A breach of strategic information can undermine a company's competitiveness, so security and information management are fundamental for companies to keep proper control of their business knowledge. Information security is a recent concern, and many people neither know how to protect their information nor care about doing so. Such oversights can create significant problems for individuals or corporations whose personal or strategic data may be stolen and used for malicious purposes [106].
Knowledge management is a transversal activity across the disciplines that bear on the life of companies and people: above all, strategic management, organization theory, information systems, and technology management, as well as more traditional areas such as economics, sociology, psychology, and marketing. It can therefore intervene in diverse contexts of modern society. Besides media, knowledge requires management, storage processes, care in the custody of its information, and channels for its dissemination. Knowledge management is necessary because knowledge exists in the company, in the minds of people, and in the processes performed. This type of management models corporate processes through the knowledge they generate; it is a way to structure organizational activities in the internal and external environment, that is, a form of corporate management [1].
Because cyber attacks are a new field of science and knowledge about preventing them is not always disseminated, many companies and individuals are susceptible to this kind of malicious approach. Good practices therefore need to be disseminated so that everyone has the knowledge, resources, and devices to prevent data from being shared inappropriately [107].
Expert systems are widely used resources in artificial intelligence. They are built with intelligent techniques from a set of data capable of explaining the real characteristics of a problem, and they transform those characteristics into relational knowledge through IF/THEN rules [77]. These rules make a complex problem interpretable through a more straightforward language that people who are not experts in the field can understand.
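As an illustration of how such an IF/THEN rule can be evaluated, the sketch below assumes two hypothetical input variables (`failed_logins` and `requests_per_second`), triangular membership functions with made-up breakpoints, and the minimum as the AND connective. It is not one of the rules generated by the proposed model.

```python
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 outside (a, c), rising linearly to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def rule_attack(failed_logins: float, requests_per_second: float) -> float:
    """Firing degree of the hypothetical rule
    IF failed_logins is HIGH AND requests_per_second is HIGH THEN connection is ATTACK.
    """
    high_logins = triangular(failed_logins, 5.0, 20.0, 35.0)  # made-up breakpoints
    high_rate = triangular(requests_per_second, 100.0, 500.0, 900.0)
    return min(high_logins, high_rate)  # the minimum t-norm models AND


print(rule_attack(20.0, 300.0))  # 0.5: logins fully HIGH, request rate only half HIGH
```

The rule fires to degree 0.5, which a person can read directly: the login evidence is strong, but the traffic evidence is only moderate.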
This type of approach requires that the data be verified and validated by people with knowledge of the analyzed problem. Systems that incorporate artificial intelligence techniques extract similarities and patterns from a dataset representative of the problem, so that the generated rules can serve as the basis for building systems in programming languages (such as Java) or for writing documents that help users protect themselves from this type of problem [106]. The knowledge can thus be used to disseminate corporate strategies that assist in the management, control, and storage of information that is critical to the company's business.
Cyber attacks grow as data becomes critical to business decision-making. However, many employees involved in the process cannot keep pace with the evolution of these threats. Therefore, a technique that can identify and extract knowledge about the nature of cyber attacks becomes essential for spreading technical knowledge in a way most people can understand. Computing terminology is complex, and a technique capable of transforming knowledge into an interpretable language can facilitate the training of people involved in the company's processes.
Techniques that use the concepts of artificial neural networks and fuzzy systems can abstract information from a database and build fuzzy rules that represent the domain of the analyzed problem [77]. Fuzzy neural network models combine the interpretability of fuzzy systems with the training efficiency of artificial neural networks [79]. These models use fuzzy neurons instead of conventional artificial neurons, and the connections between their layers allow the results to generalize and make the problem easier to understand. Fuzzy neural networks have been used to build expert systems in many areas of science, such as health [66, 92], engineering [70, 82], economics and finance [67, 84], time series forecasting [73, 28], natural phenomena [18, 71], general contexts [68, 26], regression problems [23, 65], software effort estimation [96], and even cyber attacks [54, 12].
This paper proposes an evolving model, capable of updating its parameters with each new piece of information, to build expert systems for the prevention of cyber attacks. Like [100], the proposed model has three layers: the first two form a fuzzy inference system capable of generating fuzzy rules from the problem data, and the third is an aggregation neural network that transforms and combines the fuzzy rules into responses about malicious attacks on the information of people or companies. Unlike [100] and [94], the first layer uses evolving data clustering, so clusters can change with each new data sample arriving from devices prepared to detect non-standard behavior. In the second layer, fuzzy logical neurons use the concept of uninorms to aggregate the neurons produced in the first layer. These fuzzy neurons, called unineurons, allow the generated rules to move smoothly between AND and OR behavior. Because rule generation can grow uncontrollably and produce redundant, unnecessary information, a regularization technique based on Bayesian theory, which automatically determines the relevance of each element to the problem, is applied to the hidden layer to avoid overfitting. Finally, the rules are aggregated by an artificial neuron whose synaptic weights are computed by the Extreme Learning Machine technique, which obtains the weights linking the hidden layer to the output neuron analytically and in a single step. The novelty of the hybrid technique is that it uses the complexity of the cyber invasions to define the neurons through an evolving technique and selects the most relevant fuzzy rules using Bayesian techniques.
To evaluate the model's classification performance, we used cyber intrusion databases commonly employed in problems of this nature. They were provided in data mining and KDD (Knowledge Discovery in Databases) competitions and contain simulations of various kinds of attacks on army computers. The proposed model is compared with classical pattern recognition techniques to assess whether it can maintain the ability to identify intrusion patterns while generating fuzzy rules that help people and companies disseminate attack patterns.
The paper is organized as follows. Section 2 presents the references that guide this research. Section 3 presents the concepts that make up the proposed model. Section 4 presents the new model proposed in this paper. Section 5 presents the tests, the databases, and the fuzzy rules generated by the approach. Finally, Section 6 presents the conclusions about the model and about the approach to disseminating knowledge for the prevention of cyber attacks.
Literature review
Cyberspace
The term cyberspace was popularized in 1984 by the American writer William Gibson in his book Neuromancer. Different authors interpret the term differently. According to [39], there are two variants of cyberspace: Barlovian cyberspace and virtual reality (VR). Barlovian cyberspace is an international computer network in which the user interacts without being fully immersed in the environment, whereas virtual reality refers to the user's deep interaction, using most of his or her senses, with a simulated or real environment [39].
Computer theorists use the term cyberspace to refer to the notional social arena we enter when we use computers to communicate. It can be considered a metaphor for the non-physical territory created by computational means, especially the internet, where individuals and corporations, alone or in groups, members of companies, public agencies, or governments can communicate, conduct research, and exchange data in general, using Information and Communication Technologies (ICT) to support its operation [45]. Actions in cyberspace are classified as offensive, exploitative, or protective, and offensive actions can even affect national security. More generally, cyberspace may refer to the potential "lifeway", or general type of culture, being created through Advanced Information Technology (AIT) [45].
Cyberspace relates to individuals by creating networks that connect an increasing number of points, especially now that computers and electronic devices are more accessible to the population, which makes sources of information increasingly accessible. It includes not only individuals but also institutions that interconnect with people, machines, and documents.
Cyberspace is not limited to those who connect via the internet. It is an environment in which humans interact with technologies such as cell phones, pagers, and walkie-talkies, among many others [74]. Figure 1 presents concepts related to cyberspace.

Cyberspace concepts. Available at http://www.brasilcn.com/article/article_3840.html
A cyber attack is a malicious action, commonly known as hacking, that involves the transmission of viruses (malicious files) that infect, damage, and steal information from computers and other online databases of companies and individuals [55].
Nations and large and medium-sized companies, as well as all other segments of society, have only incipiently begun preparing to avoid or minimize cyber attacks on the networks and information systems they use [11]. Attacks can happen through the physical medium, where the devices that hold the information, such as modems, cables, and physical storage media, are easily accessible [52]; through the human medium, where social engineering is used to obtain information from people who are unprepared for or unaware of virtual attacks; and through the logical medium, where techniques such as Distributed Denial of Service (DDoS) are employed, in which the attacker overloads the target system.
Techniques that exploit vulnerabilities in access ports, the spread of viruses and malware, and password crackers are examples of ways to attack the integrity of cyberspace. Some approaches use scripts that try to crack important access passwords [52]. Another vital aspect is cybercrime, mainly because of the damaging effects that can result from the misuse of information and communication systems by malicious people. Despite the efforts of some sectors of the Public Administration and of private companies' computing departments, there are still gaps in physical and logical structures, and some countries, such as Brazil, have weak legislation for classifying attacks on computer networks [107]. Figure 2 below shows the origin and destination of cyber attacks.

Cyber attack in real time. Available in 2
Malware is a term derived from the English "malicious software". Malware is software intended to invade a computer system illegally, with information theft as its main objective [62].
In addition to computer viruses, which are developed to perform harmful actions on a computer, legitimate applications with programming flaws can also be considered malware. According to [62], the human factor can contribute as much to the success as to the failure of malware attacks: an individual's degree of awareness of this type of attack can directly influence how effective the defense provided by anti-virus software is. Figure 3 shows examples of malware.

Malware. Available at http://www.net-security.org/malwarenews.php?id=2636
A computer virus is a program or piece of code designed to damage a computer by corrupting system files, consuming resources, destroying data, or otherwise being a nuisance. It may contain lines of code capable of disrupting the computer's processing [98].
Viruses differ from other forms of malware in that they can self-replicate, that is, copy themselves to other files and computers without the user's consent [98].
SQL injection
Structured Query Language (SQL) is the standard language for interacting with relational databases; the main data manipulation tasks on database structures are performed with it [48].
SQL injection is a type of cyber attack that exploits flaws in systems, generally poorly implemented communication between web pages and the database through SQL commands, programming errors, or weak security criteria; for this reason it is considered an easy attack to carry out. In this invasion process, the attacker inserts a custom, unauthorized SQL statement into a query through the program's data entry points, such as forms or an application's URL. The SQL commands are typed into fields intended for user information, and because of the flaw in the application they are executed, causing changes in the database, loss of information, and inappropriate sharing of the company's knowledge [46].
Through SQL injection attacks, a cracker can obtain any sensitive data held in a server's database; depending on the database version, it may also be possible to insert malicious commands, obtain full permissions on the host machine, and execute commands on the database structure [46]. Figure 4 shows the processes and steps involved in SQL injection.
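The mechanism described above can be reproduced with Python's built-in `sqlite3` module. The sketch below uses a hypothetical in-memory `users` table: the first function builds the query by string concatenation and is vulnerable, while the second passes the input through `?` placeholders, a standard defense in which the driver treats the input strictly as data.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "s3cret"), ("bob", "hunter2")],
    )
    return conn


def login_vulnerable(conn: sqlite3.Connection, name: str, password: str) -> list:
    # UNSAFE: user input is pasted into the SQL text, so it can change the query itself.
    query = f"SELECT name FROM users WHERE name = '{name}' AND password = '{password}'"
    return conn.execute(query).fetchall()


def login_safe(conn: sqlite3.Connection, name: str, password: str) -> list:
    # SAFE: placeholders bind the input as values; it is never parsed as SQL.
    query = "SELECT name FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "' OR '1'='1"
    print(login_vulnerable(conn, "anyone", payload))  # every row in the table
    print(login_safe(conn, "anyone", payload))        # []
```

With the payload `' OR '1'='1`, the WHERE clause of the vulnerable query becomes `name = 'anyone' AND password = '' OR '1'='1'`, which is true for every row, so all users are returned without a valid password.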

SQL Injection. Available at https://www.veracode.com/security/sql-injection
Data mining is a technology that combines traditional data analysis methods with sophisticated algorithms to process large volumes of data [13]. These algorithms take a set of facts as input and return a behavior pattern that can be expressed as an association rule, a mapping function, or a profile model [13]. Data mining is one step in the process of knowledge discovery in databases (KDD), and all steps are important in that process. Each step is described below [13]:
- Data collection: gathering data related to the business object to be analyzed;
- Pre-processing: treating the data to reduce repetitions or discrepant values, selecting attributes, and normalizing;
- Data mining: applying data mining tasks;
- Post-processing: validating and formatting the analyses.
Data mining tasks, in turn, are divided as follows [13]:
- Predictive tasks: the goal is to predict the value of a given attribute, called the target attribute, from the values of other attributes. They comprise classification, which uses discrete target attributes (e.g., binary values), and regression, which uses continuous target attributes (e.g., price, length, weight).
- Descriptive tasks: the goal is to derive patterns, correlations, and groups in the data. They comprise association analysis, clustering, and deviation and/or anomaly detection. They are exploratory and therefore require post-processing techniques to validate their results.
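The four KDD steps listed above can be sketched on a toy dataset. Everything here is an assumption for illustration: the records (duration, bytes sent, label) are made up, pre-processing is limited to duplicate removal and min-max normalization, and a nearest-centroid classifier stands in for the data mining task; it is not the model proposed in this paper.

```python
from statistics import mean

# Data collection: hypothetical connection records ((duration, bytes_sent), label),
# with label 0 = normal and 1 = attack. Values are illustrative only.
records = [
    ((1.0, 200.0), 0), ((2.0, 180.0), 0), ((1.0, 200.0), 0),  # third record duplicates the first
    ((9.0, 5000.0), 1), ((8.0, 4800.0), 1), ((1.5, 220.0), 0),
]

# Pre-processing: drop exact duplicates (keeping order) and min-max normalize each feature.
unique = list(dict.fromkeys(records))
features = [x for x, _ in unique]
lo = [min(col) for col in zip(*features)]
hi = [max(col) for col in zip(*features)]


def normalize(x):
    return tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(x, lo, hi))


data = [(normalize(x), y) for x, y in unique]

# Data mining: nearest-centroid classification (a deliberately simple stand-in).
centroids = {
    label: tuple(mean(col) for col in zip(*[x for x, y in data if y == label]))
    for label in {y for _, y in data}
}


def predict(x):
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


# Post-processing: validate the result (here, accuracy on the same toy data).
accuracy = sum(predict(x) == y for x, y in data) / len(data)
print(f"{len(data)} unique records, accuracy = {accuracy:.2f}")
```

In a real study, the post-processing step would validate on data not used for training; reusing the toy data only keeps the sketch short.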
Figure 5 summarizes the process of extracting knowledge from a database [40].

Process of applying data mining techniques to data.
The paper on network anomaly detection based on neural network evolution by Konstantinos Demertzis and Lazaros Iliadis [32] describes an intelligent machine learning system in which one part searches for known threats and another tries to detect probable threats from abnormal activity in the system. The detection principle is simple: the system learns a state that is treated as normal, and every signal outside the boundary of that state is treated as an anomaly. Because the detection algorithm learns continuously while the system is active in the network, it becomes increasingly precise. The methodology uses spiking artificial neural networks (SANN) [32], combining an Evolving Connectionist System (ECOS) classification approach with a multi-layer feed-forward artificial neural network (ANN) to classify the exact type of invasion or network abnormality with minimal computational resources. SANN is a set of modular systems based on node connections. The system continuously organizes itself online, adapting to the input data, and can operate in either a supervised or an unsupervised way. SANN has also been applied to several other complex real-world problems, where it has proved quite capable. The developed bio-hybrid model is called BIOPSSQLI (A Bio-Inspired Hybrid Artificial Intelligence Framework for Cyber Security); it works on the spikes that occur in the system, and its neurons learn in a single pass (one-pass learning). Traffic data are imported by class and converted with Population Encoding, which maps each sample value to spike times. Data are classified into two classes: Class 0 for normal results and Class 1 for abnormal results. When the result is 0, the eSNN classification process is repeated with the appropriate data vectors; if the result is still 0, the process terminates. When the result is Class 1, a two-layer neural network with 33 hidden neurons, using all the features of the KDD database, recognizes the type of attack. The results are presented to the network administrator as an alert. The BIOPSSQLI graphical model is shown in Fig. 6 [32].
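Population encoding, as mentioned above, turns each real-valued feature into a set of spike times using overlapping Gaussian receptive fields. The sketch below assumes the formulation commonly used with evolving spiking neural networks (field centres spread evenly over the feature range, a shared width controlled by `beta`, and firing time inversely related to excitation). The function name and the defaults `m = 6`, `beta = 1.5`, and `t_max = 1.0` are hypothetical and not taken from BIOPSSQLI.

```python
import math


def population_encode(value, i_min, i_max, m=6, beta=1.5, t_max=1.0):
    """Encode one real value as m firing times using Gaussian receptive fields.

    Assumed formulation: the centre of field i (1-based) is
    i_min + (2i - 3) / 2 * (i_max - i_min) / (m - 2), and every field shares the
    width (1 / beta) * (i_max - i_min) / (m - 2). A strongly excited field fires
    early (time near 0); a weakly excited one fires late (time near t_max).
    """
    if m < 3:
        raise ValueError("need at least 3 receptive fields")
    step = (i_max - i_min) / (m - 2)
    width = step / beta
    times = []
    for i in range(1, m + 1):
        centre = i_min + (2 * i - 3) / 2 * step
        excitation = math.exp(-((value - centre) ** 2) / (2 * width ** 2))
        times.append(t_max * (1.0 - excitation))
    return times


if __name__ == "__main__":
    spikes = population_encode(0.5, 0.0, 1.0)
    print([round(t, 3) for t in spikes])
```

For the value 0.5 on the range [0, 1], the two fields centred at 0.375 and 0.625 are equally close, so they fire first and at the same time; the outer fields fire later.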

BIOPSSQLI [32].
Another work, inspired by Greek mythology, addresses the security of information systems. Digital Ladon is an advanced information systems security mechanism that uses Artificial Intelligence to protect and control systems and to give early warning of deviations from or failures of digital security measures. It is an effective network supervision system that reinforces the lower layers (transport, network, and data link) and intelligently extends the upper layers (session, presentation, and application) with automated control capabilities. This strengthens the security and reaction mechanisms of the overall system without special computational resource requirements. The Ladon algorithm uses advanced techniques for detecting cyber attacks, reacting very quickly at low computational cost [33].
The use and generation of content, entertainment, and services through mobile devices create an extraordinary demand for protecting these services. More and more people use cell phones to share essential data, including payments. This attracts powerful gangs of cybercriminals, who use sophisticated, highly intelligent malware to amplify their attacks. Such malicious software is designed to run silently and remain undetected for a long time. The work in [34] proposes the computational intelligence anti-malware framework (CIantiMF), an innovative, ultra-fast, low-requirement framework that runs under the Android operating system (OS). Its design is based on advanced computational intelligence approaches such as the extreme learning machine. CIantiMF adds two advanced extensions to the ART Java virtual machine. The first is an intelligent anti-malware extension that recognizes whether the Java classes of an Android application are benign or malicious using an optimized multilayer perceptron. The second is an online Tor traffic identification extension that can locate malware, identify Tor traffic, and block botnets using the online sequential extreme learning machine algorithm [34].
More recently, intelligent systems have also been designed to protect data traffic in power distribution. A smart grid is a power transmission and distribution network improved through digital control, monitoring, and telecommunications capabilities. It provides a two-way, real-time flow of energy and information to all stakeholders in the chain, from the generation plant to commercial, industrial, and residential end users. Information and communication infrastructures play an essential role in linking and optimizing the available grid layers. Grid operation relies on control systems called Supervisory Control and Data Acquisition (SCADA), which monitor and control the physical infrastructure. Because SCADA is a sophisticated computer system of great relevance to power distribution, its devices can be targets of cyber attacks. At the heart of SCADA systems are specialized computers known as Programmable Logic Controllers (PLCs). Destructive cyber attacks against SCADA systems are carried out on these devices, destroying equipment and disrupting the normal operating speed while deceiving the equipment operators. To address this problem, the work in [35] proposes a computational intelligence system for identifying cyber attacks on smart energy grids (SICASEG). It is a forensics tool that can capture, log, and analyze events in the smart power grid to find the source of an attack, prevent future attacks, and possibly support lawsuits.
Other noteworthy work to aid cybersecurity was proposed in [37, 36]. These works use artificial intelligence approaches that recognize attack patterns and connect to other systems that can act to prevent further damage to computer systems. Figure 7 presents a flowchart of the actions performed by intelligent models embedded in data traffic systems to identify cyber intrusions.

Computational intelligence system for malware detection [32].
Fuzzy neural networks (FNN) are neural networks composed of fuzzy neurons. Their main characteristic is the synergistic collaboration between fuzzy systems and neural network theory, producing models that combine the uncertainty handling and interpretability of fuzzy systems with the learning ability of neural networks [77].
Thus, a neuro-fuzzy network (NFN) can be defined as a fuzzy system trained by a neural network algorithm. In this sense, combining neural networks with fuzzy logic is intended to offset the deficiencies of each, yielding a more efficient, robust, and understandable system [50]. Figure 8 presents examples of various combinations of artificial neural networks and fuzzy systems. These intelligent models have multi-layer architectures in which each layer has a different function. The FNNs in [15, 85, 72] have three layers, while those in [89] and [104, 16] have four. The functions of these layers draw on concepts from both fuzzy systems and artificial neural networks. In most models, the first layer partitions the input data, transforming them into fuzzy logical neurons. Versions of fuzzy c-means [14], ANFIS [53], and cloud-based clustering [4] are commonly applied.

Fuzzy Neural Networks examples [32].
Recent fuzzy neural network models based on the extreme learning machine were proposed in [49, 20]. Other models use backpropagation [17], genetic optimization [78], a Hebbian approach [99], stable learning [64], and multi-objective algorithms [63]. The model proposed in [26] uses data density concepts for fuzzification and a fast extreme learning machine procedure.
Recent fuzzy neural network models have also been applied to cyber attack problems [12, 11]. Unlike the approach proposed in this paper, however, they used resampling techniques and a non-evolving approach, which produced excellent results but required long processing times.
The way fuzzy models treat data determines how close the interpretability of a hybrid model's results can be to the real world. Fully data-driven models are the focus of recent research and have achieved satisfactory results with data clouds. This data-centered clustering concept is called Empirical Data Analytics (EDA) [7]. It groups data without traditional statistical or probabilistic approaches, relying entirely on empirical observation of the model's input data, without any prior assumptions or parameters [42]. SODA is a data partitioning algorithm that identifies the peaks/modes of the data distribution and uses them as focal points to associate other points with data clouds that resemble a Voronoi tessellation. Data clouds can be understood as a particular type of cluster, but with important differences: they are non-parametric, and their shape is not predefined or predetermined by the type of distance metric used. Data clouds directly represent the properties of the local set of observed data samples [42]. The approach employs a magnitude component based on a traditional distance metric and a directional/angular component based on cosine similarity. The main EDA operators, which are also suitable for streaming data processing, are described in [7]; they include the cumulative proximity, local density, and global density. See [42] for more details.
SODA thus relies on two central components: a magnitude component based on a traditional distance metric and a directional/angular component based on cosine similarity.
SODA uses the widely adopted Euclidean distance as the magnitude component, which can therefore be expressed as [42]:
$$d_M(x_i, x_j) = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{m} (x_{i,k} - x_{j,k})^2}$$
where m is the number of features.
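The angular component, based on cosine similarity, is expressed in SODA as [42]:
$$d_A(x_i, x_j) = \sqrt{1 - \cos(\Theta_{x_i, x_j})}$$
where $\cos(\Theta_{x_i, x_j}) = \dfrac{\langle x_i, x_j \rangle}{\lVert x_i \rVert \, \lVert x_j \rVert}$ is the cosine of the angle between $x_i$ and $x_j$.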
Using the magnitude and angular components together, the data can be projected onto a 2D plane, called the direction-aware (DA) plane [42].
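The two components and the 2D projection can be sketched as follows, assuming non-zero feature vectors given as tuples. Projecting every sample against a single reference sample is a simplification chosen for illustration; SODA itself works with the mutual distances between samples [42].

```python
import math


def magnitude(xi, xj):
    """Magnitude component: Euclidean distance between two samples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))


def angular(xi, xj):
    """Angular component: sqrt(1 - cos(theta)), cos(theta) being the cosine similarity.

    Assumes both vectors are non-zero.
    """
    dot = sum(a * b for a, b in zip(xi, xj))
    norms = math.sqrt(sum(a * a for a in xi)) * math.sqrt(sum(b * b for b in xj))
    cos_theta = max(-1.0, min(1.0, dot / norms))  # clip floating-point overshoot
    return math.sqrt(1.0 - cos_theta)


def da_projection(samples, ref):
    """Map each sample to (magnitude, angular) coordinates relative to a reference sample."""
    return [(magnitude(x, ref), angular(x, ref)) for x in samples]


if __name__ == "__main__":
    print(da_projection([(1.0, 0.0), (0.0, 1.0), (2.0, 0.0)], (1.0, 0.0)))
```

The example shows why both axes matter: `(2.0, 0.0)` is 1 unit away from the reference but points in the same direction (angular component 0), whereas `(0.0, 1.0)` is orthogonal to it (angular component 1).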
The Empirical Data Analytics (EDA) operators [7] include the cumulative proximity, local density, and global density. Understanding the concept of density is essential to the SODA approach; it has been presented extensively in [7] and [5]. The local density D is defined as the inverse of the normalized cumulative proximity and directly indicates the main pattern of the observed data [5]. For the training samples $x_i$, $i = 1, 2, \ldots, N$, with $N > 1$, it is defined as follows [42]:
$$D_N(x_i) = \frac{\sum_{j=1}^{N} \pi_N(x_j)}{2N\,\pi_N(x_i)}, \qquad \pi_N(x_i) = \sum_{j=1}^{N} d^2(x_i, x_j)$$
where $\pi_N(x_i)$ is the cumulative proximity of $x_i$ to all N samples.
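Global density is defined for the unique data samples together with their numbers of repeats in the dataset/stream: the global density of a particular unique data sample $u_i$ is its local density weighted by its number of repeats $f_i$,
$$D^G(u_i) = f_i \, D_N(u_i)$$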
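The batch definitions of cumulative proximity, local density, and global density can be sketched directly, assuming the Euclidean distance and samples given as hashable tuples (so that repeats can be counted). The sketch also assumes that not all samples are identical; otherwise every cumulative proximity is zero and the density is undefined.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def cumulative_proximity(samples):
    """pi_N(x_i): sum of squared distances from x_i to all N samples."""
    return [sum(sq_dist(xi, xj) for xj in samples) for xi in samples]


def local_density(samples):
    """D_N(x_i) = sum_j pi_N(x_j) / (2 N pi_N(x_i)), defined for N > 1."""
    n = len(samples)
    pi = cumulative_proximity(samples)
    total = sum(pi)
    return [total / (2 * n * p) for p in pi]


def global_density(samples):
    """D^G(u_i) = f_i * D_N(u_i) for each unique sample u_i repeated f_i times."""
    density = dict(zip(samples, local_density(samples)))  # repeats share one value
    counts = Counter(samples)
    return {u: counts[u] * density[u] for u in counts}


if __name__ == "__main__":
    samples = [(0.0,), (0.0,), (1.0,), (3.0,)]
    print(local_density(samples))
    print(global_density(samples))
```

For these four one-dimensional samples, the local densities are 0.6, 0.6, 1.0, and 3/11: the sample at 1.0 lies at the mean and has the highest local density, while the repeated sample at 0.0 gets the highest global density (1.2) because it occurs twice.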
Because the main EDA operators (cumulative proximity, local density D, and global density D^G) can be updated recursively, the SODA algorithm is suitable for online processing of streaming data, and the density-based groups evolve as new data arrive.
The algorithm is performed in the following steps [42]:
Figure 9 shows an example of SODA partitioning and the centers (black points) of the data clouds defined by the algorithm.

SODA [42].
This section will present the main concepts involved in the architecture of the proposed algorithm for the identification of cyber attacks.
Model architecture
The fuzzy neural network described in this section has three layers. The first layer performs fuzzification using the concept of data density: the centers of the data clouds are used to create the fuzzy Gaussian neurons. The weights and biases of these neurons are randomly defined in the range from zero to one. The second layer consists of logical neurons of the unineuron type [60]. These neurons have weights and activation functions determined at random and use t-norms and s-norms to aggregate the neurons of the first layer. The weights that connect the second layer to the output layer are defined with the extreme learning machine [51], which acts on a neuron with a linear activation function.
Unineurons are used in the second layer to construct fuzzy neural networks that solve pattern recognition problems while bringing interpretability to the model. Figure 10 illustrates the feedforward topology of the fuzzy neural networks considered in this paper.

FNN architecture.
The second layer is composed of L fuzzy logical neurons. Each neuron performs a weighted aggregation of some of the first-layer outputs. This aggregation is performed using the weights
Finally, the output layer is composed of one neuron with a linear activation function.
The first layer is composed of neurons whose activation functions are membership functions of fuzzy sets defined for the input variables. For each input variable $x_{ij}$, L clouds $A_{lj}$, $l = 1, \ldots, L$, are defined, whose membership functions are the activation functions of the corresponding neurons. Thus, the outputs of the first layer are the membership degrees associated with the input values, i.e.,
$$a_{jl} = \mu_{A_{lj}}(x_{ij}), \qquad j = 1, \ldots, n, \quad l = 1, \ldots, L$$
where n is the number of input variables.
For the evolving SODA algorithm, 75% of the training samples are processed offline in a first stage, and the remaining 25% are used for the recursive (evolving) updating of the parameters that define the density-based groups.
Second layer - Fuzzy logical neurons and fuzzy rules
The logical neurons used in the second layer of the model are unineurons [60]. They use the concept of uninorms [103] to perform simplified operations according to the activation functions of the fuzzy neurons. This formulation allows a unineuron to behave either as an and-neuron or as an or-neuron. [61] explains important concepts about the unineuron. Neuron processing occurs at two levels. At the first, local level (L1), the input signals are individually combined with the weights. At the second, global level (L2), a global aggregation operation is performed on the results of all first-level combinations. Traditional logical neurons use t-norms and s-norms to perform these operations. 1- each pair $(a_i, w_i)$ is transformed into a single value $b_i = p(a_i, w_i)$;
The function p is responsible for transforming the inputs and corresponding weights into individual transformed values. A formulation for the p function can be described as [60]:

Evolving Systems Concepts [90].
These rules allow the creation of a knowledge base for building expert systems [15].
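A small Python sketch of the two processing levels of a unineuron follows. The concrete operators are assumptions for illustration: $p(a, w) = w a + (1 - w) g$, where $g \in (0, 1)$ is the identity element of the uninorm, and a uninorm built from the product t-norm on $[0, g]^2$, the probabilistic-sum s-norm on $[g, 1]^2$ and max on the mixed region. Function names are hypothetical.

```python
from functools import reduce

def uninorm(a, b, g):
    """Uninorm with identity element g in (0, 1) (assumed construction).

    Product t-norm on [0, g]^2, probabilistic sum on [g, 1]^2,
    and max on the mixed region."""
    if a <= g and b <= g:            # both "low": and-like behaviour
        return g * (a / g) * (b / g)
    if a >= g and b >= g:            # both "high": or-like behaviour
        sa, sb = (a - g) / (1 - g), (b - g) / (1 - g)
        return g + (1 - g) * (sa + sb - sa * sb)
    return max(a, b)                 # mixed arguments

def p(a, w, g):
    # local level L1: a weight of 0 maps the input to g, the neutral element,
    # so that input has no influence on the global aggregation
    return w * a + (1 - w) * g

def unineuron(a, w, g):
    # global level L2: aggregate all transformed values b_i = p(a_i, w_i)
    b = [p(ai, wi, g) for ai, wi in zip(a, w)]
    return reduce(lambda x, y: uninorm(x, y, g), b)

print(round(unineuron([0.2, 0.4], [1.0, 1.0], g=0.5), 4))  # 0.16, below both inputs (and-like)
print(round(unineuron([0.8, 0.6], [1.0, 1.0], g=0.5), 4))  # 0.84, above both inputs (or-like)
```

With these choices the neuron behaves like an and-neuron when all transformed inputs lie below g and like an or-neuron when all lie above it, which is the and/or duality described above.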
The aggregation network is a simple network with a bias and weights, which in this case are defined analytically by the extreme learning machine proposed by Huang [51]. The output of the model is:

$y = \mathrm{sign}\left(\sum_{j=0}^{l} z_j v_j\right)$

where $z_0 = 1$, $v_0$ is the bias, $z_j$ and $v_j$, $j = 1, \ldots, l$, are the output of each fuzzy neuron of the second layer and its corresponding weight, respectively, and sign is the signum function.
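The output equation $y = \mathrm{sign}(\sum_{j=0}^{l} z_j v_j)$, with $z_0 = 1$, and an analytic ELM-style estimate of the weights can be sketched in Python as below. It assumes the second-layer outputs are stacked in a matrix `Z` (one row per sample) and that the weights are the least-squares solution obtained via the Moore-Penrose pseudo-inverse; the toy data and helper names are hypothetical.

```python
import numpy as np

def add_bias_column(Z):
    # prepend z_0 = 1 so that v_0 acts as the bias
    return np.hstack([np.ones((Z.shape[0], 1)), Z])

def fit_output_weights(Z, y):
    # least-squares solution v = pinv([1, Z]) y
    return np.linalg.pinv(add_bias_column(Z)) @ y

def predict(Z, v):
    # y = sign(sum_{j=0}^{l} z_j v_j); np.sign returns 0 for an exact zero
    return np.sign(add_bias_column(Z) @ v)

Z = np.array([[0.1], [0.2], [0.8], [0.9]])   # outputs of a single second-layer neuron
y = np.array([-1.0, -1.0, 1.0, 1.0])
v = fit_output_weights(Z, y)                  # v is approximately [-1.4, 2.8]
labels = predict(Z, v)                        # predicted labels: -1, -1, 1, 1
```

Only this last layer is fitted; the first- and second-layer parameters stay fixed, which is what keeps training a single linear solve.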
Intelligent evolving systems are based on online machine learning methods for intelligent hybrid models. They are characterized by their ability to extract knowledge from data and to adapt their structure and parameters to changes in the environment [57, 25]. In general, they are formed by an evolving set of locally valid subsystems that represent different situations or operating points [3]. [58] further developed evolving connectionist systems (ECOS), which not only learn adaptively and incrementally from data that measure evolving processes, but also extract rules and knowledge from the trained systems. Evolving fuzzy systems (eFS) [6] are adaptive systems that modify both their structure and their parameters as the data flow is processed; that is, the structure of an evolving fuzzy system can be reduced or expanded to fit each new input sample. Evolving fuzzy systems can be seen as a combination of fuzzy models, an evolving mechanism for representing and compacting input data, and recursive machine learning methods.
Fig. 11 illustrates the processes involved in evolving systems.
The membership functions in the first layer of the FNN are Gaussian, built from the centers obtained by the evolving SODA granularization of the input space and from a randomly defined sigma (in the interval between zero and one). The number L of first-layer neurons is determined by the input data and by the number of partitions ρ (the grid size), a user-defined parameter. This approach partitions the input space by forming data clouds, whose centers define the Gaussian activation functions of the fuzzy neurons. As a result, the fuzzification adapts to the dataset submitted to the model, giving a more independent, data-centered approach. The second layer aggregates the L first-layer neurons through the unineurons.
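A minimal Python sketch of the first layer, assuming the common Gaussian form $\mu(x) = \exp(-(x - c)^2 / (2\sigma^2))$ with one center per cloud and per input variable. The centers below are hypothetical placeholders for those the evolving SODA partition would produce, and the widths are drawn at random; the small positive lower bound on the widths is an added assumption to avoid near-zero widths.

```python
import numpy as np

def gaussian_first_layer(X, centers, sigma):
    """Membership degrees a[n, l, j] of input x_nj in the Gaussian set of cloud l.

    X: (n, d) inputs, centers: (L, d) cloud centers, sigma: (L, d) widths."""
    diff = X[:, None, :] - centers[None, :, :]
    return np.exp(-(diff ** 2) / (2.0 * sigma[None, :, :] ** 2))

rng = np.random.default_rng(1)
# hypothetical cloud centers, e.g. from a density-based partition of the inputs
centers = np.array([[0.2, 0.3], [0.7, 0.8], [0.5, 0.1]])
# random widths in (0, 1); the 0.05 lower bound is an assumption
sigma = rng.uniform(0.05, 1.0, size=centers.shape)
X = rng.uniform(0.0, 1.0, size=(5, 2))
a = gaussian_first_layer(X, centers, sigma)
print(a.shape)  # (5, 3, 2): 5 samples, L = 3 neurons, 2 input variables
```

Each slice `a[:, l, :]` holds the memberships that the l-th second-layer neuron would aggregate.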
After the construction of the L unineurons, the bolasso algorithm [8] is executed, using LARS, to select the most significant neurons (called $L_s$). The final network architecture is thus defined through a feature selection technique based on regularization and resampling. The learning algorithm assumes that the output of the hidden layer composed of the candidate neurons can be written as [94, 24]:
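The bolasso idea (run the Lasso on bootstrap replicates and keep only the neurons selected in every replicate) can be sketched in Python as follows. As a simplification, the Lasso is solved here by cyclic coordinate descent for a single regularization value instead of by LARS, which computes the whole Lasso path; the value of `alpha`, the number of replicates `n_boot` and the synthetic data are hypothetical choices for illustration.

```python
import numpy as np

def lasso_cd(Z, y, alpha, n_iter=200):
    """Lasso by cyclic coordinate descent:
    minimise (1 / 2n) ||y - Z b||^2 + alpha ||b||_1 (inputs assumed centered)."""
    n, p = Z.shape
    b = np.zeros(p)
    col_sq = (Z ** 2).sum(axis=0) / n
    r = y.astype(float).copy()            # residual y - Z b
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += Z[:, j] * b[j]            # remove neuron j from the fit
            rho = Z[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]  # soft threshold
            r -= Z[:, j] * b[j]
    return b

def bolasso(Z, y, alpha, n_boot=32, seed=0):
    """Indices of the neurons selected by the Lasso in every bootstrap replicate."""
    rng = np.random.default_rng(seed)
    n, p = Z.shape
    keep = np.ones(p, dtype=bool)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample with replacement
        Zb = Z[idx] - Z[idx].mean(axis=0)
        yb = y[idx] - y[idx].mean()
        keep &= np.abs(lasso_cd(Zb, yb, alpha)) > 1e-10   # intersect supports
    return np.flatnonzero(keep)

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 6))              # stand-in for the outputs of L = 6 unineurons
y = 2.0 * Z[:, 0] - 1.5 * Z[:, 3] + 0.1 * rng.normal(size=200)
selected = bolasso(Z, y, alpha=0.1)        # expected to keep neurons 0 and 3
```

Taking the intersection across replicates is what makes the selection conservative: a neuron survives only if it is relevant in every resampled version of the data.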
When the number of neurons is high, techniques can be used to improve the network architecture. These techniques use mathematical criteria to determine the neurons most relevant to solving the problem. Some feature selection techniques are probabilistic and based on Bayes' theorem. A useful non-separable penalty arises from a dual-space representation of sparse Bayesian learning (SBL), which is based on automatic relevance determination (ARD); it addresses this problem by regularizing the solution space with a parameterized, data-dependent prior distribution that effectively eliminates unnecessary or superfluous features [75]. [102] gives the canonical form of this problem:

$\gamma^{*} = \arg\min_{\gamma \geq 0} \; y^{T} \Sigma_{y}^{-1} y + \log \left| \Sigma_{y} \right|$

where $\Sigma_{y} \triangleq \lambda I + \Phi \Gamma \Phi^{T}$ and $\Gamma \triangleq \mathrm{diag}[\gamma]$. Once some $\gamma^{*}$ has been computed, the coefficients are estimated by the posterior mean $x_{SBL} = \Gamma^{*} \Phi^{T} (\Sigma_{y}^{*})^{-1} y$, where $\Gamma^{*}$ and $\Sigma_{y}^{*}$ are evaluated at $\gamma^{*}$.
Note that if some $\gamma^{*}_{i} = 0$, as often happens during learning, then $x_{SBL,i} = 0$ and the corresponding column is effectively pruned from the model. The resulting $x_{SBL}$ is therefore sparse, with nonzero elements corresponding to the relevant basis vectors [102]. This probability-based approach to pruning second-layer neurons is therefore efficient, because it relies on a data-centric, non-parametric technique, so that the relevance of each neuron is determined by the problem data.
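One standard way to compute $\gamma^{*}$ is the expectation-maximization (EM) update of SBL, $\gamma_i \leftarrow \mu_i^2 + \Sigma_{ii}$, sketched below in Python with a fixed noise term λ. This is an illustrative choice of solver, not necessarily the one used in [102] or in the proposed model; the columns of `Phi` play the role of the candidate second-layer neurons, and the relative pruning threshold and synthetic data are assumptions. In a finite number of iterations the hyperparameters of irrelevant columns shrink toward zero without reaching it exactly, hence the threshold.

```python
import numpy as np

def sbl_em(Phi, y, lam=1e-2, n_iter=300):
    """EM iterations for sparse Bayesian learning with a fixed noise term lam.

    Returns the hyperparameters gamma and the posterior mean x_SBL."""
    n, m = Phi.shape
    gamma = np.ones(m)
    for _ in range(n_iter):
        Sigma_y = lam * np.eye(n) + (Phi * gamma) @ Phi.T       # lam*I + Phi Gamma Phi^T
        x_sbl = gamma * (Phi.T @ np.linalg.solve(Sigma_y, y))    # Gamma Phi^T Sigma_y^-1 y
        # diagonal of the posterior covariance: gamma_i - gamma_i^2 phi_i^T Sigma_y^-1 phi_i
        q = np.einsum("ij,ij->j", Phi, np.linalg.solve(Sigma_y, Phi))
        gamma = x_sbl ** 2 + gamma - gamma ** 2 * q              # EM update
    Sigma_y = lam * np.eye(n) + (Phi * gamma) @ Phi.T
    x_sbl = gamma * (Phi.T @ np.linalg.solve(Sigma_y, y))        # posterior mean at final gamma
    return gamma, x_sbl

rng = np.random.default_rng(0)
Phi = rng.normal(size=(40, 8))             # columns: candidate second-layer neurons
x_true = np.zeros(8)
x_true[[1, 4]] = [2.0, -1.5]
y = Phi @ x_true + 0.01 * rng.normal(size=40)
gamma, x_sbl = sbl_em(Phi, y)
kept = np.flatnonzero(gamma > 1e-2 * gamma.max())   # relative pruning threshold (assumption)
```

The columns whose γ collapses are the ones pruned, matching the behavior described in the paragraph above.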
Once the network topology has been determined, the output-layer weight vector is estimated with the Moore-Penrose pseudo-inverse [41]: $v = Z^{+} y$, where $Z$ is the matrix of second-layer outputs (including the column $z_0 = 1$), $Z^{+}$ is its pseudo-inverse and $y$ is the vector of target outputs.
The proposed model is summarized in Algorithm 4. It has a single parameter: the grid size, ρ.
EFNN training
The main advantages of the proposed model over the other models are that the nature of the problem data defines the construction of the fuzzy rules, and that the rules most representative of the problem are selected using Bayesian techniques.
Databases used in the tests
The data originating from MIT's Lincoln Lab, the most popular free dataset used in IDS assessment [97], were used to test the approach proposed in this paper. They contain recordings of the total traffic of a network installed at Lincoln Labs that simulates the military network of the US Air Force. The academic community commonly uses this database because it was the basis of the 1999 KDD Cup challenge.
Each network event analyzed corresponds to a connection between a source IP address and a destination IP address, during which a sequence of TCP packets is exchanged using a specific protocol within a strictly defined time window. The records have 41 features, organized into four basic categories: Content Features, Traffic Features, Time-Based Traffic Features and Host-Based Traffic Features. In addition, the attacks are divided into four categories: DoS, R2L, U2R, and probe.
The following files and configurations were used for the network traffic tests. TrafRedFull.data (TRFD): in the first classification case, all 41 features were used, and the data were labeled as normal or abnormal. The TrafRedFull.data dataset has 145,738 records, of which 70% (102,016 records) were allocated to training and the remaining 30% (43,722 records) were used in the validation test of the model.
In the second classification case, normalFull.data (NFD) was used, which contains the standard characteristics relevant to the problem (11 features). The data were also labeled as normal or abnormal. The normalFull.data dataset has the same number of records as TrafRedFull.data, so the same training and test split was used.
FullVirusDataset.data (FVD) has a total of 5,498 records, consisting of 2,598 packed viruses from the Malfease Project³ dataset, 2,231 non-packed benign executables collected from a Windows XP Home installation plus several common user applications, and 669 packed benign executables. The dataset was randomly divided into two parts: a training set (70%) containing 3,849 patterns and a test set (30%) containing 1,649 patterns. These datasets [80] are available online⁴.
VirusDataset.data (VD), containing 2,598 malware and 669 benign executables, is divided into two parts: a training set (70%) containing 1,834 malware-related and 453 benign-executable patterns, and a test set (30%) containing 762 malware-related and 218 benign-executable patterns. To translate each executable into a feature vector, Perdisci et al. [80] used static binary analysis to extract information such as the names of the code and data sections, the number of writable and executable sections, and the code and data entropy.
The SQLDataset.data (SQLD), used to evaluate SQL injection attack patterns, contains 13,884 SQL statements selected from various sources by computational means; 12,881 of them are malicious (SQL injections) and 1,003 are legitimate commands. With the help of the Python sqlparse module⁵, a non-validating SQL parser, features such as the syntax path and the use of certain SQL symbols in the construction of SQL injection commands were extracted. Correlation patterns between SQL statements and SQL injection attacks were also obtained. These patterns represent the main characteristics collected [32].
Configurations and models used in the intrusion detection and attack test
In this section, the settings of the classification tests for the proposed model are presented. Real and synthetic datasets were chosen to verify whether the accuracy of the proposed model surpasses that of traditional pattern classification techniques. All tests were performed with randomized samples, avoiding biases that could interfere with the evaluation of the results. The proposed model, called EFNN, was compared with fuzzy neural network classifiers using fuzzy c-means [14] (FFNN) [61] and genfis1 [53] (GFNN) [100]. In the evolving SODA algorithm, 75% of the training samples are used for offline training and the remaining 25% for evolving training. In all models, the weights and biases of the first and second layers were defined randomly. The number of first-layer neurons of each model is defined by the number of centers (FFNN), the number of membership functions (GFNN) and the grid size (EFNN). For uniformity, the first-layer parameters of the models, which determine the number of neurons L, were chosen in the range [3, 10], and the best values were selected by cross-validation. In the two comparison models (GFNN and FFNN), unineurons [60] were adopted as the logical neurons, and the third layer consists of a single neuron with a linear activation function. For the proposed model, both the type of the L neurons and the activation function of the output neuron are defined according to the dataset, through a stochastically established combination.
To verify the model's ability to classify binary patterns, its results were compared with those of traditional machine learning models available in the WEKA tool [47]: Naive Bayes [56], Multilayer Perceptron [86] and C4.5 [83]. In the artificial neural network models that use ELM for training, the same number of neurons as in the fuzzy neural network tests was used. For the WEKA models, 10-fold cross-validation was used to obtain the results, and the default configurations of the tool were kept. A total of 30 experiments were performed with all models on all test datasets. In all tests and for all models, the samples were shuffled in each run so as to reflect the actual capacity of the models. The results tables present percentage values for the classification tests, together with the standard deviation over the 30 replicates. All accuracy values were obtained by comparing the expected outputs with the responses of the models. The AUC and the test time are also reported. The expected outputs in the tests were set to -1 and 1; therefore, the labels of all datasets were converted to -1 and 1. Accuracy is the primary test result. In the neural network models, the activation functions are of the hyperbolic tangent type. The performance of these algorithms is evaluated through the following equations:
In all cases, TP = true positive, TN = true negative, FN = false negative and FP = false positive.
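As a hedged illustration of how these quantities can be computed from ±1 outputs, the Python sketch below counts TP, TN, FP and FN and derives accuracy as (TP + TN)/(TP + TN + FP + FN). For AUC it assumes hard (sign) outputs: the ROC curve then has a single operating point, and its area is (1 + TPR − FPR)/2 = (TPR + TNR)/2. This is an assumption about how AUC is obtained for a sign-output classifier, not a statement of the authors' exact formula; function names are hypothetical.

```python
import numpy as np

def confusion(y_true, y_pred):
    """Counts for labels in {-1, 1}, with 1 as the positive (attack) class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == -1) & (y_pred == -1)))
    fp = int(np.sum((y_true == -1) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == -1)))
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def auc_hard(tp, tn, fp, fn):
    # single ROC operating point: AUC = (1 + TPR - FPR) / 2 = (TPR + TNR) / 2
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return 0.5 * (tpr + tnr)

tp, tn, fp, fn = confusion([1, 1, 1, -1, -1], [1, 1, -1, -1, 1])
print(accuracy(tp, tn, fp, fn))  # 0.6
```

For this toy example, TPR = 2/3 and TNR = 1/2, so the hard-output AUC is 7/12, slightly below the accuracy because the classes are imbalanced.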
Table 2 presents the accuracy results of the models over the 30 replicates on each of the evaluated datasets.
Accuracies of the Fuzzy Neural Network in the tests performed
AUC of the Fuzzy Neural Network in the tests performed
Table 3 shows the AUC results of the models.
Time execution of the Fuzzy Neural Network in the tests performed
Table 4 presents the execution time (in seconds) of the algorithms in the proposed tests.
The fuzzy rules generated by the system express a logical, interpretable relation about possible intrusion contexts. The technical terms of artificial intelligence can hinder learning and understanding for the people involved in the processes. The set of generated fuzzy rules provides a language closer to everyday contexts. For example, a formed cluster may capture the relationship between the malice level and the trust level of a packet or request. When new data are identified as belonging to this cluster, more coherent decisions can be made to prevent computerized systems from suffering cyber attacks. The following rule example shows how the rules can assist in training and in disseminating technical knowledge:
"If the length is group 1 AND/OR the entropy is low AND/OR the level of malice is group 3, AND the confidence level is group 2, AND the level difference is group 1, then there is an SQL injection invasion." Here, according to expert knowledge, group 1 can be identified as a moderate characteristic, group 2 as a high characteristic and group 3 as a simple characteristic.
Discussion - conclusions
Tables 2 and 3 show that the proposed algorithm identified the possible attacks from several threat sources more accurately than the compared models. This indicates that the evolving approach is important for day-to-day operations in which systems need to learn about new situations and new attacks.
In this context, the proposed model performs similarly to, or better than, the binary pattern classification algorithms commonly used in the literature.
Among the hybrid approaches, the proposed model stands out, with superior performance and a much shorter execution time than the others. This is because updating equally spaced clusters and membership functions, as the other hybrid models do, can make them unfeasible for large information flows. The proposed model performed its tasks in acceptable times, although it remained slower than the traditional classifiers. It also has the advantage of interpretable results, which neural network models lack, since they can be seen as black boxes.
The generated fuzzy rules can serve as business rules for building other computer systems, or for employee training and knowledge dissemination, so that preventive and corrective actions can be taken, according to the discovered patterns, in companies or in people's daily routines.
The technique proposed in this paper has the following advantages and disadvantages compared with the intelligent models commonly used to address cyber attacks:
Advantages: Compared with traditional neural network models that use backpropagation to update the network parameters, the proposed model defines its internal parameters randomly and computes only the third-layer weights, which makes it simpler. As a hybrid model, it also has an advantage over artificial neural networks in interpretability, since it transforms the data into linguistic variables.
Compared with other state-of-the-art hybrid models for identifying attacks on data in cyberspace, the fuzzification techniques used in the model are based entirely on the nature of the data. This allows the formation of neurons that are more representative of the problem, and thus a more compact network.
Disadvantages: Neural network models can be much faster in the training phase when processing cyber attack data. The proposed model requires more demanding training, since it extracts the characteristics of the data submitted to it. Another disadvantage is that, depending on the parameters used to create the data clouds, the model's accuracy may be lower than that of FNN models with grid-based fuzzification. This is because, in the grid technique, the membership functions cover all possible combinations of the input partitions.
Therefore, we conclude that the evolving fuzzy neural network proposed in this paper meets the requirements of a binary cyber attack classifier and also provides a body of knowledge that may be valuable to individuals or companies. The data density makes it possible to find patterns of grouped characteristics, so that such situations can be handled by IT managers, ordinary users or entire corporations.
The technique proposed in this paper may also work on classification problems related to other binary problems with numerical features, such as in health and industry.
As future work, other testing procedures, comparisons with other models, and other ways of evaluating results may be addressed.
