Abstract
Traditional methods for mining communication data suffer from high memory usage and slow mining speed, so an intelligent mining acceleration algorithm based on particle swarm optimization is proposed. After the communication data are analyzed, their features are selected with an acceleration strategy, and multiple feature subsets are obtained by repeatedly applying the strategy to the remaining attributes. The particle swarm optimization (PSO) algorithm then selects the optimal feature subset, with the average classification error as the fitness function, to complete the intelligent mining of communication data. In the experiments, the memory usage of the proposed algorithm stays between 62 and 71 GB, the lowest of the compared methods, and its average running time is shorter than that of the traditional algorithms. The results show that the algorithm has lower memory consumption and faster mining speed.
Introduction
Intelligent mining of communication data refers to the use of search methods to find hidden information in large volumes of communication data in a cloud computing environment. It relies on artificial intelligence and related technologies to find rules in the data and to present them in visual form [12]. With the continuous development of Internet technology, communication data in the cloud computing environment faces three challenges: first, how to meet the requirements of data analysis as both the required query speed and the amount of data increase; second, the different sources and formats of communication data lead to inconsistent data formats, which makes querying the data more cumbersome; third, the communication data itself makes data storage and query very difficult [1].
Many experts and scholars have studied the accelerated algorithm for intelligent mining of communication data, and achieved some good results. Reference [3] proposed an accelerated algorithm for intelligent mining of communication network based on feature selection. Firstly, it improved the data source classification method of batch processing. For this reason, feature selection mechanism was introduced in the process of data mining modeling, which reduced the load of communication network, and an intelligent acceleration algorithm was designed to improve the efficiency of data mining. However, this method has a large memory footprint problem. Reference [8] presented an accelerated algorithm for intelligent mining of communication network based on FPGA. In order to meet the requirements of data multi-source and high throughput in stream data mining, a scalable, configurable and energy-efficient DTW stream data processing architecture based on FPGA was proposed, which could simultaneously process multi-source data with high energy efficiency and high performance. But this method has the problem of slow mining speed. Reference [10] proposed an accelerated algorithm for intelligent mining of communication network. In view of the shortcomings of K-means in traditional mining algorithm, an improved method based on data sampling and distribution density was proposed to obtain the central point of the algorithm. The clustering effect was improved by constructing functions in clustering, and the data mining model in cloud computing was enhanced to improve the acceleration effect of data mining. However, this method has the problem of slow mining speed, and the practical application effect is not ideal. Reference [4] proposed an accelerated method for intelligent mining of communication network based on improved Big FIM algorithm. 
In order to solve the problem of poor data mining effect in association rule mining and high-frequency data item mining, a data mining acceleration method based on the MapReduce framework was proposed. Compared with Big FIM, BFIM increased the support for larger-scale data and improved the speed of data mining. However, this method has a large memory footprint. Reference [2] proposed the conceptual design of the IAICA algorithm, based on incremental data classification modeling of big data. By design, the process of reloading the existing data content to rebuild the classifier each time a new data instance arrived was eliminated. Comparative experiments show that the IAICA algorithm improves query accuracy and query speed. However, this method also uses too much memory.
Faced with the challenges of massive communication data in the cloud computing environment, earlier communication data mining algorithms struggle to meet requirements: throughout the mining process they suffer from large memory consumption and long average execution time. To address this, a new acceleration algorithm for intelligent mining of communication data based on particle swarm optimization is designed. First, an acceleration strategy is used to analyze the communication data; then the remaining attributes are used to obtain multiple feature subsets of the communication data; finally, with the average classification error as the fitness function, the particle swarm algorithm optimizes the feature subset to complete the intelligent mining of communication data. To verify the method, a comparison experiment with traditional methods is designed, and the advantages of the proposed method are demonstrated.
The overall design of intelligent mining acceleration algorithm for communication data in cloud computing environment
Feature selection of communication data in cloud computing environment
In feature selection of communication data, whether a support vector machine is used for classification or for regression, the classifier is obtained by training on the attributes of the communication data, and generally more than one attribute has a classifier coefficient close to 0. In the process of feature selection of communication data in the cloud computing environment, these attributes will eventually be removed. In this paper, the feature contribution values of the attributes are accumulated in order from low to high, and a threshold is set as a proportion of the accumulated total. The number of features removed in each round therefore does not need to be fixed in advance: only the contribution value is considered, and the attributes with the smallest contribution values are removed according to their proportion of the total contribution value of the communication data attributes.
In order to extract useless communication data features effectively, after getting a subset of communication data features, it is removed, and the remaining communication data features are used to continue extracting feature subsets until the feature subset cannot meet the constraints set before [14].
In the practical application of support vector machine feature selection to communication data, the problem of communication failure in the cloud computing environment should first be solved by an accelerated linear algorithm, and the results of feature selection of communication data should then be analyzed. Because training is slow in the high-dimensional state with a large number of data attributes, an acceleration algorithm is used to reduce the training time of feature selection of communication data in the cloud computing environment.
In order to avoid the loss of important communication data in the acceleration process, three methods are used. The first is to train the subset of communication data features in the high-dimensional state with a large number of data attributes; as long as the cross-validation accuracy produced by the acceleration algorithm is not greatly reduced, the training of the feature subset can continue to be accelerated. The second is to train on the communication data using the acceleration algorithm that takes the least time and to use its results as a reference. The third is to repeat the training on the remaining attributes with the acceleration algorithm, in order to mine valid information of the communication data that might otherwise be lost. Once the results of the acceleration method meet the actual requirements, the remaining communication data attributes are trained. In this way, much of the useful information hidden in seemingly useless attributes can be mined. The acceleration algorithm combines these three methods, and the final total time is far less than that of an algorithm that deletes only one attribute at a time.
(1) Acquiring the contribution value of characteristic kernel function of communication data
According to the above analysis, when calculating the characteristic contribution value of communication data, the calculation time takes up a large part of the total training time. In order to reduce the calculation amount of the characteristic contribution value of communication data, the following formula is used to give the contribution value of communication data in the case of another non-linear kernel function:
In the formula,
The difference between the non-linear kernel function and the linear kernel function of communication data features is as follows. In a well-trained classifier with a linear kernel function, the sum of the contribution values corresponding to each individual attribute is equal to the contribution value corresponding to all communication data attributes together [6]. With a non-linear kernel function, the sum of the contribution values corresponding to each individual attribute is not equal to the contribution value corresponding to all communication data attributes. The following two attributes are used for analysis:
Supposing that there are only two attributes
In the formula,
According to formula (1) and formula (2), the concrete expressions are given:
Whether formula (3) is valid or not can be judged by calculating the conversion of formula (3) to kernel function.
When the characteristic kernels of communication data are linear kernels, then formula (6) is valid:
Formula (6) does not hold when the characteristic kernel function of communication data in cloud computing environment is a non-linear kernel function.
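The contrast between linear and non-linear kernels can be illustrated numerically. The following is a minimal sketch, not the paper's contribution formula: it only checks whether a kernel evaluated on two attributes together equals the sum of the kernel evaluated on each attribute separately. The Gaussian (RBF) kernel, the value of gamma and the sample vectors are assumptions chosen for illustration.

```python
import math


def linear_kernel(x, y):
    """Linear kernel: the inner product of x and y."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel, used here as one example of a non-linear kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def per_attribute_sum(kernel, x, y):
    """Sum of the kernel evaluated on each attribute separately."""
    return sum(kernel([a], [b]) for a, b in zip(x, y))


# Hypothetical two-attribute samples.
x = [1.0, 2.0]
y = [0.5, -1.0]

linear_full = linear_kernel(x, y)
linear_parts = per_attribute_sum(linear_kernel, x, y)
rbf_full = rbf_kernel(x, y)
rbf_parts = per_attribute_sum(rbf_kernel, x, y)

# The linear kernel decomposes attribute by attribute; the RBF kernel does not.
print("linear additive:", math.isclose(linear_full, linear_parts))
print("rbf additive:", math.isclose(rbf_full, rbf_parts))
```

For the linear kernel the two quantities coincide because the inner product is a sum over attributes, while the RBF kernel is a product of per-attribute factors, so the per-attribute sum differs from the joint value.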
(2) Acceleration algorithm (process)
(a) Firstly, all data attributes in the communication data set should be standardized and trained to get a classifier.
(b) According to the classifier, the contribution value of each feature attribute of communication data is obtained, negative contribution values are set to 0, the values are arranged in order from low to high, and the cumulative value is calculated.
(c) The attributes in the current subset of communication data features whose cumulative contribution value is less than or equal to the deletion criterion Percentage (50%, 40%, 30%, 20%, 10%, 5%, 4%, 3%, 2%, 1%) are removed, and the corresponding cross-validation accuracy is obtained [11].
(d) Repeat (c) until the cross-validation accuracy is obtained from the last data attribute feature of communication data.
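Steps (b) and (c) can be sketched as follows. This is an illustrative reading of the procedure, not the authors' implementation: it assumes that an attribute is removed when its cumulative share of the total contribution is at or below the chosen percentage, it omits classifier training and cross-validation, and the function names and contribution values are hypothetical.

```python
def cumulative_shares(contributions):
    """Return (attribute index, cumulative share) pairs, lowest contribution first.

    Negative contributions are treated as 0, as in step (b).
    """
    clipped = [max(c, 0.0) for c in contributions]
    total = sum(clipped)
    order = sorted(range(len(clipped)), key=lambda i: clipped[i])
    shares, running = [], 0.0
    for i in order:
        running += clipped[i]
        shares.append((i, running / total if total > 0 else 0.0))
    return shares


def attributes_to_keep(contributions, percentage):
    """Keep the attributes whose cumulative share exceeds `percentage` (a fraction).

    Assumed deletion rule: attributes at or below the threshold are removed.
    """
    shares = cumulative_shares(contributions)
    return sorted(i for i, share in shares if share > percentage)


# Hypothetical contribution values for six attributes.
contributions = [0.40, -0.05, 0.02, 0.30, 0.08, 0.20]
for percentage in (0.50, 0.20, 0.05):
    print(percentage, attributes_to_keep(contributions, percentage))
```

A larger percentage removes many low-contribution attributes in one pass, which is where the speed-up over deleting one attribute at a time comes from; in a full pipeline each pass would be followed by retraining and a cross-validation check.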
Because the communication data features are deleted according to their proportion of the contribution value, the method adopted in this paper accelerates faster. If the classifier yields a negative contribution value for some feature attributes, the cumulative proportion of the sorted feature attributes can exceed 100% [15]. For convenience of operation, a feature attribute whose contribution value is negative is treated as having a contribution of 0 and is deleted.
During acceleration, multiple feature subsets of communication data are obtained, and the number of attributes and the classification effect of each subset determine how it can be applied in practice. Multiple feature attributes are removed in each round of acceleration. However, the number of feature attributes that remain may still be too large for a good classification effect. In that case, the selected communication data feature subset is reduced further to find an effective feature subset with fewer attributes [13].
Accelerated algorithm for intelligent mining of communication data based on particle swarm optimization
This paper proposes an acceleration method for intelligent mining of communication data, which reduces the number of training samples in the subsequent mining stage by selecting reasonable candidate support vectors in advance, so as to improve the training speed of training samples [7].
Samples whose mutual center distance is less than the center distance of different communication data classes are selected as candidate support vectors.
(1) Linear separability
If the set of vectors (
(2) Nonlinear separability
Given two communication data sample vectors x and y, they are mapped into a high-dimensional feature space by a non-linear mapping function, and the Euclidean distance between the two sample vectors of communication data in the feature space is calculated [9]:
Where
According to formula (9) or formula (10), the distance between positive class and negative class centers can be calculated by calculating positive class communication data sample center
According to the following formulas, the distances between positive and negative communication data samples to these two kinds of centers m are calculated respectively. The communication data samples whose distances between them are less than D are regarded as candidate support vectors:
The pre-selected set of communication data training samples is traversed, and identical elements in the set are removed, that is, redundant data are eliminated [5].
After these operations, the number of training samples of communication data is effectively reduced and the training speed of communication data is improved.
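The pre-selection of candidate support vectors can be sketched as below. This is a minimal illustration under stated assumptions rather than the paper's exact rule: a sample is kept when its Euclidean distance to the center of the opposite class is less than the distance D between the two class centers, the linearly separable case is computed in input space, and all function names and sample values are hypothetical. The helper `kernel_distance` uses the standard identity d(x, y)^2 = K(x, x) - 2K(x, y) + K(y, y) for the distance in a kernel-induced feature space.

```python
import math


def center(samples):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def euclidean(x, y):
    """Euclidean distance in input space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_distance(x, y, kernel):
    """Feature-space distance computed from kernel values only."""
    squared = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(squared, 0.0))


def deduplicate(rows):
    """Remove repeated samples while preserving their order."""
    return [list(t) for t in dict.fromkeys(tuple(r) for r in rows)]


def select_candidates(positive, negative):
    """Keep samples closer than D to the opposite class center (assumed rule)."""
    m_pos, m_neg = center(positive), center(negative)
    d_centers = euclidean(m_pos, m_neg)
    kept_pos = [x for x in positive if euclidean(x, m_neg) < d_centers]
    kept_neg = [x for x in negative if euclidean(x, m_pos) < d_centers]
    return deduplicate(kept_pos), deduplicate(kept_neg)


# Hypothetical one-dimensional layout embedded in two dimensions.
positive = [[0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [-3.0, 0.0]]
negative = [[4.0, 0.0], [3.0, 0.0], [8.0, 0.0]]
cand_pos, cand_neg = select_candidates(positive, negative)
print(cand_pos, cand_neg)
```

Samples lying on the far side of their own class center are discarded because they are unlikely to become support vectors, and the duplicate removal corresponds to the elimination of redundant data described above.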
Fuzzy membership degree describes the degree to which a communication data sample belongs to a certain category. The membership function is usually tied to the distance between a sample and its class center: the larger the distance, the smaller the membership value, which effectively reduces the influence of noise and outliers in the communication data samples. However, the optimal classification surface of the fuzzy support vector machine is determined by the support vectors, and support vectors generally lie far from the class center, so they receive small membership degrees. A new membership function is therefore designed in which the larger the distance between a support vector and the class center of the communication data samples, the larger the value. That is, a support vector farther from the class center receives a larger membership degree, which strengthens the effect of the support vectors on the classification hyperplane. Analysis of formula (11) or formula (12) can obtain positive class communication data sample center
Where, δ denotes the smaller positive number selected, which effectively avoids the occurrence of
By training the fuzzy support vector machine (FSVM) through the above process, the decision function obtained by formula (13) is obtained. In the classification stage of fuzzy support vector machine, fuzzy support vector machine classifies an unknown communication data sample with the complexity of
The particle swarm optimization (PSO) algorithm has memory, has few parameters to adjust, and is easy to implement. Its real-number coding is determined directly by the solution of the problem, and the number of variables in the solution is taken directly as the dimension of the particle. Therefore, particle swarm optimization is adopted to optimize the feature subset. A collaborative PSO algorithm is used to slow down the convergence speed at the beginning of the iteration. Because it adopts a local learning method, it escapes local minimum points more easily than the basic PSO algorithm and achieves higher convergence accuracy.
Based on this, particle swarm optimization (PSO) is used to optimize the support vectors and improve the mining speed without reducing the classification accuracy of communication data samples. In essence, the fuzzy membership vectors corresponding to the support vector set obtained after training the fuzzy support vector machine are regarded as individuals in the particle swarm, and the average classification error on the test set of communication data samples serves as the fitness function. To reduce the number of support vectors and improve the speed of data mining, the best support vectors are selected to form a set.
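A basic PSO loop of the kind referred to here can be sketched as follows. This is the standard global-best PSO with an inertia weight, not the collaborative variant and not the authors' code; the parameter values (w, c1, c2, swarm size, bounds) and the quadratic stand-in fitness are assumptions. In the setting described above, the fitness passed in would instead be the average classification error obtained with the support vectors encoded by a particle.

```python
import random


def pso_minimize(fitness, dim, n_particles=20, n_iters=50,
                 w=0.7, c1=1.5, c2=1.5, lo=0.0, hi=1.0, seed=0):
    """Minimize `fitness` over [lo, hi]^dim with global-best PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]  # best fitness found so far, per iteration
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
        history.append(gbest_fit)
    return gbest, gbest_fit, history


def average_classification_error(y_true, y_pred):
    """Fraction of misclassified samples."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def toy_fitness(position):
    """Stand-in fitness (hypothetical): squared distance to a fixed target."""
    target = [0.2, 0.8, 0.5]
    return sum((p - t) ** 2 for p, t in zip(position, target))


best, best_fit, history = pso_minimize(toy_fitness, dim=3)
print(best_fit)
```

Each particle remembers its own best position and is also drawn toward the best position found by the swarm, which is the memory property mentioned in the text; the recorded history never increases because the global best is only replaced by a strictly better solution.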
The principle of particle swarm optimization (PSO) is to search the global optimal solution by constantly changing the position of individuals. In the D-dimensional search space, the group created by n individuals is
Where,
Particle swarm optimization is used to optimize the classifier of support vector machine. Each individual in the group represents a search solution. The number l of support vectors trained by fuzzy support vector machine is regarded as the spatial dimension of particles. Each particle is regarded as a subset of support vector machine set. The corresponding membership degree of these support vectors is calculated by formula (13). If the interval of membership degree of these communication data samples is within [
Where, M denotes the number of samples in the test set of communication data samples,
Experimental results and analysis
To verify the overall effectiveness of the proposed intelligent mining acceleration algorithm for communication data based on particle swarm optimization, experiments were carried out. The hardware platform was an IBM PC with a clock frequency of 2.8 GHz, the Windows 8 operating system and 24 GB of memory. To make the test results more convincing, 1000 sets of data were randomly selected from the communication database in the cloud computing environment. Two comparison algorithms were used: the intelligent acceleration algorithm of communication network mining based on feature selection, and the intelligent acceleration algorithm of communication network mining based on FPGA. The memory usage and the running time of the algorithms are compared.
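The two quantities compared in the experiments, average running time and memory usage, can be collected with a small harness like the one below. This is a generic sketch and not the measurement procedure used in the paper: `measure` and `toy_mining` are hypothetical names, `tracemalloc` only tracks memory allocated by the Python interpreter, and tracing adds overhead to the measured time.

```python
import time
import tracemalloc


def measure(func, *args, repeats=3):
    """Return (average seconds, peak traced bytes) over `repeats` runs of func."""
    total, peak = 0.0, 0
    for _ in range(repeats):
        tracemalloc.start()
        start = time.perf_counter()
        func(*args)
        total += time.perf_counter() - start
        _, run_peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        peak = max(peak, run_peak)
    return total / repeats, peak


def toy_mining(n):
    """Placeholder workload standing in for a mining algorithm."""
    return sorted(x * x % 97 for x in range(n))


avg_seconds, peak_bytes = measure(toy_mining, 10_000)
print(f"average time: {avg_seconds:.4f} s, peak memory: {peak_bytes} bytes")
```

Running each algorithm through the same harness on the same data volumes keeps the comparison consistent across methods.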
(1) Comparing the memory usage of the proposed algorithm with those of the intelligent acceleration algorithm based on feature selection and the intelligent acceleration algorithm based on FPGA, the experimental results are shown in Fig. 1.

Fig. 1. Comparison of memory occupancy by different methods.
Figure 1 shows the memory footprint at different volumes of communication data. The smaller the memory footprint, the better the performance of data mining algorithm, and the more suitable for large-scale communication data mining. As can be seen from Fig. 1, the intelligent acceleration algorithm based on feature selection used between 86 and 100 GB of memory during the experiment, occupying a large memory capacity. The intelligent acceleration algorithm of communication network mining based on FPGA keeps the memory usage stable at about 80 GB during the experiment, which occupies a large memory capacity. The memory usage of the algorithm proposed in this paper during the experiment is between 62 and 71 GB, which is the smallest compared with the previous two methods. By comparison, this algorithm has a great advantage in communication data mining.
(2) The average running time of the proposed algorithm is compared with those of the intelligent acceleration algorithm based on feature selection and the intelligent acceleration algorithm based on FPGA. The experimental results are shown in Fig. 2.

Fig. 2. Comparison of average running time by different algorithms.
Figure 2 compares the average running time of the three algorithms under different data volumes. The average running time of all three algorithms increases as the amount of data increases. When the number of data sets is 1000, the average running time of the proposed algorithm is 21.5 s, that of the intelligent acceleration algorithm based on feature selection is 26 s, and that of the intelligent acceleration algorithm based on FPGA is 34 s. A shorter average running time of communication data mining indicates that the algorithm has an advantage in computing ability. By comparison, the proposed method is superior to the intelligent acceleration algorithm of communication network mining based on feature selection and the intelligent acceleration algorithm of communication network mining based on FPGA, and has higher mining efficiency.
Conclusion
To overcome the long average running time and large memory consumption of intelligent communication data mining in the traditional cloud computing environment, this paper proposes an accelerated algorithm for intelligent mining of communication data based on particle swarm optimization. The feature subset of communication data is obtained by using the feature selection method of support vector machine. The particle swarm optimization algorithm is used to optimize the feature subset of communication data, with the average classification error as the fitness function, to complete the intelligent mining of communication data. The experimental results show that the proposed algorithm has lower memory occupation and shorter average execution time, and that it is effective for intelligent mining of communication data in the cloud computing environment. However, there are still deficiencies in this research. In the future, we will take improving the quality of communication data mining in the cloud computing environment as the research direction and conduct in-depth research.
Acknowledgements
This work was supported by the Jiangsu Provincial Department of Housing and Urban-Rural Construction Foundation under grant no. 2018ZD269.
