Abstract
Distant supervision is a widely applied approach in field of relation extraction, which could automatically generate large amounts of labeled training corpus with minimal manual effort. However, the labeled training corpus may have many false positive instances, which would hurt the performance of relation extraction. Moreover, in traditional distant supervised approaches, extraction models adopt human-design features with complicated natural language processing (NLP) preprocessing. It may cause poor performance either. To address these two shortcomings, in this work, we propose a novel Long Short Term Memory (LSTM) network integrated with multi-instance learning. Our approach is supposed to learn and extract features automatically from the data itself and treats distant supervision as a multi-instance learning problem to settle the problem of false positive instances. Experimental results demonstrate that our proposed approach is effective and achieve better performance than traditional methods.
Introduction
Relation extraction, which is defined as a task of extracting semantic relations between a pair of entities expressed in text, has attracted increasing attention in big data era. It can play a significant role in various scenarios, e.g. information extraction [32, 10], question answering [20], knowledge base construction [9], ontology learning [41], etc. Most approaches to relation extraction use supervised learning of relation-specific instances, which would achieve high precision and recall. However, traditional fully supervised approaches are unlikely to scale to the large amount of relations found on Web and are limited by the availability of training data [34].
Due to the shortcoming of supervised approaches described above, recently, a more promising approach named distant supervision (DS) has gained much attention, which creating its own training data by heuristically aligning facts in existing knowledge bases to free text [21, 34, 36, 37]. The core of distant supervision for relation extraction is an assumption that if two entities have a relationship in a known knowledge base, then all sentences that contain this pair of entities will express the same relationship in some way [21]. Figure 1 shows a simple example for distant supervision in relation extraction domain. In the example, Barack Obama and United States are related entities in Freebase1
Automatic labeling by distant supervision. The first sentence is a correct labeling instance and the second sentence is an incorrect labeling instance.
The strategy of distant supervision is an effective approach of automatically labeling training data and a sound solution to solve the problem of availability of big data, which is suitable for text mining and web mining [8, 36, 37]. However, it has two major shortcomings constraining the effectiveness of relation extraction. Firstly, the original distant supervision assumption is too strong and may cause wrong label problem, i.e. false positive instances [36]. A sentence containing a pair of entities does not express the relation which is in a knowledge base. It is possible because the entities may just appear in the same sentence and share the same topic [6]. For instance, consider the relation EmployedBy between Barack Obama and United States in Fig. 1. The first sentence indeed expresses the relation EmployedBy between two entities. In distant supervision approach, EmployedBy is assigned to the sentence which becomes a useful training instances. On the other hand, the second sentence does not express the relation between two entities. However, distant supervision approach heuristically labels the second sentence as expressing the relation EmployedBy and selects the second sentence as a positive training example either. These false positive instances would hurt the performance of relation extraction model which is trained based on such noisy data [34].
Moreover, previous research has typically applied traditional machine learning models to exquisitely designed features with labeled training data obtained through distant supervision [21, 34, 36]. These features are derived from existing natural language processing (NLP) tools, such as openNLP2
In this paper, we propose a novel Long Short Term Memory (LSTM) network integrated with multi-instance learning to address two shortcomings described above. We treat distant supervision for relation extraction as a multi-instance learning problem to settle the first deficiency similar to previous studies [6, 21, 23, 34, 36]. In multi-instance learning problem, the training dataset are divided into many bags which consists of many instances and each bag has a label [39]. When a bag is labeled as negative, it means that all the instances in the bag are negative. But when a bag is labeled as positive, we only know that as least one instance in the bag is positive. In distant supervision for relation extraction, a bag is considered as a pair of entities, label of a bag stands for a relation which two entities holds and instances in the bag are all sentences containing both two entities. Inspired by Zeng et al. [6], we utilize an objective function at the entity pair level and integrating the function into our novel LSTM networks. In this way, during training process, the uncertainty of label of instances could be taken into account, which mitigates the wrong label problem.
To address the error propagation or accumulation problem, we adopt LSTM architecture inspired by Xu et al. [44] to automatically extract features without artificial design or complicated NLP processing. We utilize the shortest dependency path (SDP) of sentences [16] as input of LSTM-based recurrent neural networks. Although SDP has been shown to be effective for relation extraction, it abandon parts of sentence and cannot capture complete information of the sentence containing two entities. Sentence embedding is a real-valued vector and proved to be able to capture syntactic and semantic information of whole sentence [13, 27, 28]. Thus in our extraction model we combine sentence embedding with the output of LSTM units at hidden layer to furnish supplementary information for relation extraction. To effectively train and update parameters of our model, we also customize a new dropout strategy to alleviate overfitting. Therefore, our approach is expected to exhibit superior performance compared with traditional methods.
The contributions of our work can be summarized as follows.
We propose a novel LSTM network combined with sentence embedding to performing distant supervision for relation extraction to avoid artificial designed features. Our model is suppose to automatically learn and extract features without complicated NLP preprocessing. To the best of our knowledge, we are the first work to use LSTM-based recurrent neural networks for distant supervision in relation extraction domain. In our designed network, we introduce sentence embedding to supply complete syntactic and semantic information in whole sentence, which remedies the limitation of using SDP as our input of LSTM-based networks. To figure out the problem of false positive instances in training data, we incorporate multi-instance learning into our LSTM-based network for distant supervised relation extraction.
In the rest of this paper, we review related work in Section 2. The formalizing description of distant supervision for relation extraction is introduced in Section 3. In Section 4, we describe our LSTM-based model in detail. Section 5 presents quantitative experimental results. Finally, we have our conclusion in Section 6.
Relation extraction is a widely studied topic in NLP community. Various supervised-learning methods to relation extraction have been proposed [2, 4, 5, 12, 16, 17, 18]. In these methods, relation extraction is deemed to be a multi-class classification problem which can achieve high precision and recall [4, 12, 26]. However, supervision learning methods may suffer from a lack of labeled training data and are unlikely to scale to thousands of relations found in text on the Web. To address this deficiency of supervised-learning methods, Mintz et al. [21] introduced distant supervision for relation extraction, which adopted Freebase to heuristically label a textual corpus. Riedel et al. [36], Hoffmann et al. [34] and Surdeanu et al. [23] relaxed the original distant supervision assumption, i.e. if two entities participate in a relation, at least one sentence that mentions these two entities might express that relation, and presented a series of models casting distant supervision. The relaxed assumption converts distant supervision into a multi-instance learning problem, which is coined by Dietterich et al. [39].
Recent work has focused on filtering noisy data in heuristically labeled training data generated by distant supervision. Takamatsu et al. [37] proposed a generative model of the labeling process to improve the quality of labels before training relation extraction model as a preprocessing step. Xu et al. [42] analyzed random samples from New York Times corpus, indicating that a large amount of entity pairs expressing a relation Freebase defined correspond to false negative instances. They adopt pseudo-relevance feedback to add missing entries in the knowledge base before training MultiR [34].
These studies have been proved to be effective for distant supervision in relation extraction. Whereas the performance of their models relies on the quality of artificial designed features heavily. Most existing approaches have concentrated on extracting features to classify semantic relations between two entities, which mainly fall into three classes: feature-based [8, 26], kernel-based [19, 29, 30] and neural network-based [5, 18, 22, 44]. In feature-based methods, different feature sets are extracted and selected, which would be fed into a classifier. Generally, three types of features are utilized, i.e. lexical features (e.g. POS, etc.), syntactic features (e.g. parsing tree, etc.) and semantic features (e.g. entity type, entity mention, etc.) [26]. Feature-based methods may suffer from the difficulty of selecting an appropriate feature set and transforming structured representations into feature vectors. Kernel-based approaches specify a measure of similarity between two instances without explicit feature representation. Several kernels have been proposed, such as convolution tree kernel [19], subsequence kernel [29] and dependency tree kernel [30]. The potential deficiency of kernel-based methods is that all data information is completely measured by the designed kernel function, and devising an effective kernel becomes crucial.
Deep neural networks have attracted growing attention in the literature, which can learn underlying features automatically. Socher et al. [38] proposed a recurrent neural network (RNN) with parsing tree of sentences to classify relations between entities. Hashimoto et al. [17] improved the performance by weighting importance of phrases in RNN. Ebrihimi and Dou [14] adopted dependency path between two indicated entities to build a RNN model. Xu et al. [44] fed the shortest dependency path instead of whole path into their LSTM-based RNN model and achieve a better result. Other neural networks are utilized to extract relation either. Zeng et al. [5] explored convolution neural networks (CNN) using sequential information about sentences and Dos Santos et al. [2] proposed a ranking loss function to train their CNN model.
Inspired by Zeng et al. [6] and Xu et al. [44], we propose a use of LSTM-based networks with multi-instance learning to learn features automatically for distant supervised relation extraction. Zhang and Zhou [25] integrated multi-instance learning with traditional back propagation (BP) and radial basis function (RBF) networks successfully by minimizing a sum-of-squares error function. Like Zeng’s work [6], we define an objection function based on cross-entropy principle.
Distant supervision for relation extraction
In this section, we concentrate on distant supervision for extraction of relations between two entities. A relation instance is defined as an expression
An entity mention as a contiguous sequence of text tokens denoting an entity name. In this paper, we assume all the entity mentions in a corpus are extracted by a different process, such as named entity recognition.
A relation mention is a sequence of text including two or more entity mentions, which indicates that a certain relation instance
Distant supervision for relation extraction utilizes a knowledge base to create labeled data for relation extraction by heuristically matching entity pairs [37]. A knowledge base is a set of relation instances of predefined relation names. For each sentence in a corpus, we extract all of its entity pairs. Then for each entity pair, we try to retrieve all the relation instances about the entity pair from the knowledge base. If we found such a relation instance, i.e. a relation mention, a set of the entity pair and the sentence is stored as a positive example. If not, a set of the entity pair and the sentence is stored as a negative example. Through this procedure, training data are automatically constructed.
Methodology
In this section, we describe our LSTM-based network incorporating with multi-instance learning in detail to fulfill distant supervision for relation extraction, which is formulated as a multi-instance problem. Subsection 4.1 delineates the overall architecture of our model. Subsection 4.2 introduces the merit and demerit of using the shortest dependency path as input of our model. Subsection 4.3 discusses the rationale of adopting word embeddings and sentence embedding to provide supplementary semantic information. LSTM units and softmax layer are explained in Subsections 4.4 and 4.5. Subsection 4.6 customizes a dropout strategy for our model to alleviate overfitting. We finally integrate multi-instance learning with our LSTM-based model in Subsection 4.7. For convenient expression, we will abbreviate our proposed model to SE-LSTM (LSTM with Sentence Embedding).
Overall architecture
Figure 2 gives the overall architecture of our SE-LSTM network. A sentence containing two entities is given which is parsed to a dependency tree by Stanford parser and the shortest dependency path (SDP) is extracted as input of SE-LSTM. In the input layer, word tokens in each SDP are mapped to low-dimensional distributed representations, which are real-valued vectors called word embeddings [31, 40] and could capture the underlying syntactic and semantic information of the inputs. Then each SDP would be separated into left and right sub-path by the common ancestor node of two entities, and two recurrent neural networks are utilized to extract implicit feature and information of the two sub-path, respectively. In each recurrent neural networks, long short term memory (LSTM) units are adopted for effective information propagation. Next, a max pooling layer is applied to pick up the most deterministic feature of each sub-path and fixed the dimension of outputs of LSTM units. After that, the max pooling layers for two sub-path are thereafter concatenated and then are combined with the embedding of the whole sentence into a vector. Subsequently, the concatenated vector would be fed into a full-connected hidden layer which exports a confidence score vector of relation labels. In the end, a softmax output layer is used to predict relation names.
The overall architecture of SE-LSTM.
The dependence parse tree of the sentence ‘Barack Obama is the 44th and current President of the United States.’ The bold lines describe the shortest dependency path between entities Barack Obama and United States.
Dependency parsing tree is proved to be suitable for relation extraction because it concentrates on the action and agents in a sentence [16]. Moreover, the shortest dependency paths are informative to determine the two entities’ relation, which focus on the most relevant information while diminishing relevant noise [29, 44]. For example, Fig. 3 demonstrates the shortest dependency path of the aforementioned example sentence. As the entities’ relation distinguishes its directionality, i.e. for a certain relation
The shortest dependency path is split into two sub-path by the common ancestor node of two entities, i.e. ‘is’.
Raw word tokens in each sub-path are mapped into low-dimensional vectors as other literature [2, 5, 31, 40] when using a neural network model. Every input word token is transformed into a real-valued vector by looking up pre-trained word embeddings which are trained on large-scale corpuses. Moreover, we introduce sentence embedding to make up for the information loss caused by SDP, which are illustrated in Subsection 4.3.2.
Word embeddings
Word embeddings are distributed representations of words. Through a lookup operation, every word in word sequence is mapped into a dense, low-dimensional and real-valued vectors, which is proposed in recent years [15, 40]. Each dimension of word embeddings expresses a latent feature of words. Such distributed representations of words have been verified to be able to capture the syntactic and semantic information and similarity property significantly [40, 15]. Using pre-trained word embeddings has become common practice for enhancing many NLP tasks [1, 7].
Word embeddings are encoded by column vectors in an embedding matrix
where
Although SDP can extract the most significant feature of the sentence containing two entities, adopting SDP only is insufficient for relation extraction. As mention in Section 1, approximately half of the sentences are longer than 40 words in the benchmark DS dataset [6]. However, SDP reduces the length of the sentence too rapidly and is unable to capture the whole important semantic information of the sentence for relation extraction. What’s more, word embeddings are severely limited because they cannot capture long-distance dependencies and semantic compositionality. Therefore, we introduce sentence embedding [13, 27, 28] to provide supplementary semantic information of the given sentence containing two entities.
The architecture utilized for extracting sentence embedding.
Sentence embedding is obtained through a max-pooled convolution neural network proposed by Kim [45]. Figure 5 describes the model to offer sentence embeddings which can extract sentence level semantic and syntactic features automatically. Let
where
Recurrent neural network (RNN) models are promising deep learning models which can represent phrases of arbitrary length in a vector space of a fixed dimension [14, 17, 38]. It is indicated that RNNs are suitable for modeling sequential data such as text and speech, because they keep a hidden state vector
LSTM architecture was proposed and extended by [33]. The main mechanism of LSTM is to adopt memory blocks to decide the degree of information that LSTM unit should keep and memorize. In the literature, many LSTM variants have been proposed. Inspired by Xu’s work [44], we adopt a variant version of LSTM introduced by [43]. More specifically, a LSTM unit consists of one recurrently memory cell and three multiplicative blocks, i.e. an input gate
Inner structure of a long short term memory unit.
Figure 6 depicts one LSTM unit. Concretely, the LSTM unit at
where
Then a max-pooling operator [5, 31, 44] runs over all the LSTM units to extract the global prominent feature in each dimension for two sub-path.
The output layer determines the relation label of input relation mention, and consists of a full-connected layer which is called penultimate layer and a softmax layer. For each relation mention, the exports of the two max-pooling layers and sentence embedding
which are then fed into the penultimate layer.
To compute confidence of each relation label, the vector
where
It is proved that deep neural networks which contain multiple hidden layers could achieve satisfactory performance for many specific tasks. However, because training data is inadequate, deep neural networks with a great amount of parameters always suffer the problem of overfitting [44]. Therefore, a suitable regularization approach is needed to alleviate this inevitable problem. Dropout, proposed by Hinton [11], has been indicated to be successful in deep learning domain. Through dropping out of neural units randomly during training phase, dropout can generate more robust network units and achieve better performance. Three dropout strategies are attempted for our SE-LSTM network.
Dropout word embeddings and sentence embedding. Dropout the penultimate layer. Dropout inner cells in LSTM units.
On the basis of comparison results of experiments in Section 5.5, the first two dropout strategies remarkably boost the performance of our model. Whereas, omitting inner cells of LSTM units is demonstrated to be harmful to SE-LSTM network.
In the first strategy, i.e. dropout word embeddings and sentence embedding, following equations are given.
where
In penultimate layer, given the input
where
It is proved that there may be large amounts of false positive instances in distant supervision for relation extraction because the original assumption is too strong [21, 34, 36]. In order to alleviate the problem of wrong label, we relax the assumption and convert distant supervision for relation extraction to a multi-instance problem like many scholars. In this paper, we integrate multi-instance learning and our proposed SE-LSTM networks model inspired by Zhang and Zhou [25] and Zeng et al. [5]. In the distant supervision for relation extraction problem, supposing that
Multi-instance learning introduces the concept of bag which contains multiple instances. We assume there are
where
We adopt stochastic gradient descent over shuffled mini-batches with Adadelta [24] update rule to maximize
[h] Multi-instance Learning 1. Initialize the parameter set
As described above, in the training phase, our SE-LSTM model would update parameters based on bags instead of each relation mention, which reflects the essential of distant supervised relation extraction because false positive instances are inevitable.
In purpose to verify that our proposed SE-LSTM model is effective to automatically learning features and can reduce wrong label instances using multi-instance learning, extensive experiments have been carried out in this section. Section 5.1 introduces the dataset we adopted. Section 5.2 describes experimental hyper-parameter settings and evaluation metrics used. In Section 5.3, we compare SE-LSTM’s performance with several traditional methods. We also analyze the effect of sentence embedding and multi-instance learning in Section 5.4.
Experimental dataset
Following [21, 34, 36], we evaluate our method on the same dataset4
Parameter settings
This subsection describes hyper-parameter tuning for our approach. To ensure a fair comparison, we utilize a three-fold validation to tune all the hyper-parameters of our model just as Surdeanu did [23]. Table 1 demonstrates all hyper-parameters used in the experiments. We set word embeddings to be 50-dimensional and the penultimate hidden layer is 50-dimensional. While in training phase, the batch size is fixed to 50. Following Zeiler [24], two main parameters Adadelta relying on, i.e.
Hyper-parameter in the experiments
Hyper-parameter in the experiments
The word embeddings used in our experiments are 50-dimensional and initialized by unsupervised pre-training. We use the Skip-gram neural network architecture to pre-train word embeddings which is available in the word2vec tool6
Remove paragraphs that are not in English. Substitute non-western characters for a special character. Remove sentences that are less than 20 characters long including white spaces. Lowercase all words and substitute each numerical digit for a single zero.
It is notable that we utilize the pre-trained word embeddings to obtain sentence embedding either.
As same as [21], we adopt two evaluation metrics to assessment our SE-LSTM model in distant supervised relation extraction, which are the held-out evaluation and the manual evaluation specifically. In the held-out evaluation method, the extracted relation instances are only compared with Freebase facts and the precision/recall curves of the experiments are shown to indicate whether our SE-LSTM model is effective. In the manual evaluation, we check the extracted relation instances manually which are not appearing in Freebase faces [6].
Experimental results and discussions
Baselines
We compare our approach against three traditional methods to evaluate the performance of our proposed SE-LSTM model.
Mintz: This is an original model for distant supervised relation extraction proposed by [21]. MultiR: This is a distant supervision for relation extraction model based on multi-instance learning, which was developed by [34]. MultiR transforms relation extraction into a multi-instance problem, but adopts a Perceptron algorithm and uses a deterministic ‘at-least-one’ assumption instead of a relation classifier. MIML: This is a multi-instance multi-label model for distant supervised relation extraction, which is proposed by [23]. MIML models the latent assignment of labels to instances and dependencies between labels assigned to the same entity pair.
Following [21, 34, 36], we use the held-out evaluation to demonstrate whether our SE-LSTM model is effective. The held-out evaluation provides an approximate measure of precision without manual evaluating. In our experiments, half of the Freebase facts are used for testing. Relation instances discovered from the test articles are automatically compared with those in Freebase. As for training cost, our approach processes around 100 sentences per second on a single NVIDA Tesla K20 GPU.
In Fig. 7, we compare the precision and recall curve for the three baseline distant supervision model, i.e., Mintz, MultiR and MIML and our proposed SE-LSTM model with multi-instance learning (MIL). The curve is constructed by ranking the predicted relation instances using their log-linear score. From Fig. 7, we can see that our SE-LSTM with MIL is consistently outperforming the three baseline methods with achieving higher precision over the entire range of recall. It is noticed that three baseline models are utilizing artificial designed features. However, our approach learns features automatically from original word tokens in sentences. The results of experiments are indicating that SE-LSTM with MIL is effective for distant supervised relation extraction, and features obtained by SE-LSTM can degrade the influence of error propagation which occurs in traditional feature engineering. Moreover, integrating SE-LSTM and multi-instance learning is a sound and valid solution to reduce false positive instances in distant supervision for relation extraction.
Comparison of precision and recall curve for the Riedel dataset of SE-LSTM with multi-instance and three baseline methods.
Precision comparison of SE-LSTM with multi-instance and three baseline methods in manual evaluation.
For the manual evaluation we adopt all Freebase facts as training instances. As relation candidate instances we choose those entity pairs which at least one participating entity is not in Freebase. It is meaning that there is no overlap between the held-out and manual candidates. Then we apply our models to the test set, and evaluate the top-
The results of experiments show that our SE-LSTM with MIL obtains competitive or even better precision than other three baseline models. More specifically, as
In addition, it is noted that the precision in manual evaluation case is much higher than in held-out evaluation case. This is because in the held-out evaluation, false negative instances in Freebase are in the majority of incorrect instances [37]. However, many of the false negative instances are in fact true relation instances which are misclassified due to the absence in Freebase. This is the same reason that there is a sharp decline in the held-out precision-recall curves [6].
Effect of sentence embedding and multi-instance learning
In this paper, we introduce sentence embedding to provide supplementary semantic information of the given sentence containing two entities and we incorporate multi-instance learning into our SE-LSTM model. This subsection analyzes how sentence embedding and multi-instance learning affect our model. To demonstrate the effect of the above two techniques, we first adopt LSTM alone through held-out evaluation as a baseline. Then we integrate sentence embedding and multi-instance learning with LSTM, respectively. The results of comparison experiments are presented in Fig. 9.
We see from Fig. 9 that when sentence embedding is incorporating into LSTM model, SE-LSTM yield a remarkable performance compared with LSTM alone. This phenomenon is indicated that proposed SE-LSTM is beneficial and can effectively capture structural and semantic information for sentence-level relation extraction. Moreover, we notice that after integrating multi-instance learning (MIL) technique, both LSTM with MIL and SE-LSTM with MIL achieve greater performance than their counterparts, LSTM alone and SE-LSTM alone, respectively. This suggests that adding multi-instance learning into our SE-LSTM can indeed degrade the problem of false positive instances in distant supervision for relation extraction effectively. Overall, SE-LSTM with MIL achieves the best performance due to the combination of both sentence embedding and multi-instance learning.
Top 500 precision in manual evaluation of three dropout strategies at different dropout probability
Top 500 precision in manual evaluation of three dropout strategies at different dropout probability
Effect of sentence embedding and multi-instance learning.
It has been proved that deep neural networks such as LSTMs have complex structure and large amounts of parameters, which probably lead to a better functional fitting capability, whereas they are prone to be over-fitting [44]. Therefore techniques which can alleviate such problem would be necessary. The dropout strategy proposed by Hinton et al. [11] is adopted in our approach, which can degrade the over-fitting problem effectively. In order to observe which dropout strategy is proper for our SE-LSTM and find out which dropout rate can bring the best performance, we devise a series of experiments of distant supervised relation extraction in the manual evaluation (
As described in Section 4.6, three dropout strategies are attempted for our SE-LSTM network, i.e. dropout word embeddings and sentence embedding, the penultimate layer and inner cells in LSTM units, respectively. From Table 2, it can be observed that the dropout of omitting inner cells of LSTM units is harmful to the performance of our SE-LSTM network. This is because LSTM has the characteristic of sharing weight, which leads to alleviating the over-fitting problem itself. Thus, the application of dropout is backfired. However, it is indicated that the other two dropout strategies can improve the performance of SE-LSTM. From Table 2, we can see that when dropout rate of word embeddings and sentence embedding is 0.5, SE-LSTM can achieve the best precision. And in terms of the penultimate layer the best precision is obtained when dropout rate is 0.8.
Conclusion
In this paper, we propose a novel neural network model for distant supervised relation extraction, SE-LSTM, which could be able to learn and extract features automatically almost from data itself without complicated NLP preprocessing. Meanwhile, we incorporate multi-instance learning into our SE-LSTM to tackle the problem of false positive instances in distant supervision for relation extraction. Experimental results indicate that our proposed approach outperforms traditional methods. Moreover, effect of sentence embedding and different dropout strategies are analyzed and discussed in a series of additional experiments.
The proposed approach just converts distant supervised relation extraction into a multiple instances learning problem. However, distant supervised relation extraction can be posed as a multi-instance multi-label problem [23]. Our future work is to modify our SE-LSTM model to solve the multi-instance multi-label problem.
Footnotes
Acknowledgments
We would like to thank members of Military Data & Knowledge Engineering Lab and the anonymous reviewers for their comprehensive feedback on all the research described here. This work has been supported by The National Natural Science Foundation for Young Scientists of China (Grant No. 71501186).
