Mapping of the vector space model with cognitive skills of the user using fuzzy approach

Abstract

Information retrieval is the process of storing, representing and searching information so that a user’s real-world information need can be met from a repository. Building such a system is not an easy task, and its complexity depends on the quality of the information being searched. The information retrieved depends on the query formed by the user. Human reasoning is highly fuzzy: every person has different cognitive skills, opinions, thinking, perception, situation, intention, intuition and domain knowledge. These varied attributes result in a fuzzy query for any information need, so different types of user apprehend queries differently. To provide efficient and relevant results according to the information need, the foremost requirement is to understand the user’s query. In this paper, a hybrid of the Vector Space Model with fuzzy logic inference has been implemented. The purpose of the system is to map the Vector Space Model to the cognitive skills of the user using a fuzzy approach. The similarity between the documents has been computed using fuzzy logic in order to evaluate the query results based on the graininess of the user.

Keywords

Information retrieval, cognitive, vector space, fuzzy logic, Terrier, TREC, hybrid

1. Introduction

Information Retrieval (IR) is the process of extracting relevant documents according to the information need of the user. A large amount of data is created every day by users, intentionally or unintentionally. This data is stored in unstructured format, and extracting information from such a large pool of data is a tedious and complex task. Various information retrieval algorithms have been proposed in the literature to assist in retrieving relevant information from the repository [17]. The most commonly known are:

Boolean Retrieval [25] – It deals with exact matching of queries. The matching is based on Boolean operators such as AND and OR. In a Boolean IR system a term is either significant or insignificant: significant terms occur at least once in the document, whereas insignificant terms do not occur in it at all. It is one of the oldest models for IR systems and is known for its simplicity. However, the model fails when expressions have to be formulated on a graded scale, which is what the extended Boolean model addresses. This led to the need for a model that can deal with user queries that are fuzzy in nature.

Vector Space Model [25] – The VSM calculates TF-IDF weights for each document and converts them into a vector in order to search for relevant documents. The model improves retrieval performance by ranking documents according to their similarity to the user query, and it also allows partial matching of the query with the documents. With these added functionalities, however, the model is no longer as simple as Boolean retrieval.

Probabilistic Model [25] – This model was proposed by Robertson and Spärck Jones. It calculates the probability that a document $d$ is relevant to a query $q$. It is also known as the best match model.

A document contains numerous terms, each with a different level of significance, but there is no direct way to declare one term more significant than another. Retrieval of documents depends mainly on the way the document is represented and on the query that the user gives. The representation of a document is centered on its most basic constituents, the terms, and the user’s query is in turn based on these terms. The user query can be expressed through various methods, one of which is the use of logical operators (AND, OR) [4]. These operators allow the user to combine terms in order to give a more effective query. It is assumed that the terms in the user’s query are completely relevant to the needs of the user, but this assumption is not always true: there is some imprecision in the user’s query. A user has some thought process running at the back of the mind about what exactly is to be searched. According to Ingwersen, users are usually categorized into three groups, namely verificative, conscious topical and ill defined. Each type of user formulates the query in a different way [5].

Formulating a query based on this thought plays a major role in retrieving documents. The search is more refined if the query clearly expresses the need. Imprecision can be caused by the user’s vague knowledge of the subject to be searched and by the assumption that the document representation is partial. This imprecision (partial significance of the terms) cannot be handled by a Boolean IR system. In order to retrieve relevant documents from the whole set of documents, the system must be capable of handling uncertainty. The VSM provides a score that reflects the relevance of the document with respect to the given query. This score can be used to develop a membership function in the range 0 to 1, giving a gradual transition from membership to non-membership, which would be abrupt in a Boolean retrieval system. Thus, fuzzy set theory is used to develop a fuzzy inference system that can better correlate with the thinking pattern of the user according to the categories given by Ingwersen. The keywords that represent the document are the focus of IR research. Characterization by non-topical attributes can also be used for a document, even though it may sometimes cause imprecision. A system is needed that handles the uncertainty caused by vague user queries and imprecise values. A fuzzy IR system works on fuzzy set theory, which considers both the vagueness in query predicates and the uncertainty in database records. In this paper, the logic-based model has been combined with the Vector Space Model using fuzzy logic. This hybrid model improves the performance and flexibility of the system and also simplifies it.

The paper is organized as follows: Section 2 describes user graininess and the background of fuzzy IR and the vector space model. Section 3 presents the hybrid model used in the research. Section 4 provides the experimental design based on the hybrid of fuzzy IR with the vector space model, including the dataset, tools and model implementation. Section 5 presents the results obtained from the implementation of the hybrid model and covers the major discussions of the study, and Section 6 presents the conclusion of this work.

2. Background

IR is basically a system that provides a platform for users to submit their queries in order to retrieve information from an unstructured repository and fulfil their information needs. The user formulates a query according to their own knowledge, and this query may not be structured in the way the repository needs in order to retrieve relevant documents.

2.1 Graininess of the user

An IR system is designed for retrieving information. Such a system addresses the uncertainty of the data and of query formulation but does not address the diversity of users. Every user has a different set of cognitive skills (thinking, opinion, thought process, perception, decision making, etc.), which affects the query the user gives. Thus, it is important to understand the state of mind of the user. Cognitive informatics provides a better understanding of the complexity of users [22]. It helps researchers to understand users’ searching patterns and query formulation [6].

Figure 1.

Fuzzy information retrieval system.

Figure 2.

Vector space model stages.

Graininess is the concept of breaking modules into smaller chunks, grains or terms such that each grain plays an important role [7]. User graininess categorizes users into three types, namely verificative, conscious topical and ill defined. Table 1 gives information about the types of user. The graininess of cognitive information retrieval can be used to understand the user’s need in the context of executive functions (flexibility, theory of mind, anticipation, decision making, problem solving, working memory, etc.), motor skills and perception.

Also, different retrieval methods can be used for different users.

Table 1

Types of user with their information need/intentions and cognitive skills

Type of user	Information need with intention	Cognitive skills
Verificative user	• Deterministic search • Exact query formulation • Concept clarity	Executive function
Conscious topical	• Undeterministic search • Query formulation-know • Partial clarity of concept	Motor skills
Ill defined user	• Random search • Query formulation is poor • Concept is not clear	Perception


2.2 Fuzzy information retrieval

The fuzzy set model for IR basically has two elements, queries and documents [1]. These elements are represented by sets of index terms. The documents are stored together in repositories. Queries are run against these repositories to bring out the matching documents. This matching is approximate, which results in vagueness. Further, each document has a degree of membership, which the system can display [14].

Whenever a query is given, a term correlation matrix is created, which is then normalised using various algorithms [Ref]. The documents are categorized on a relevance scale based on the similarity between the terms of the query and the document. As the similarity increases, the document weight increases too. Fuzzy IR informs the user about the degree to which a document is relevant to the query entered [13, 23]. Figure 1 shows a fuzzy IR system.

2.3 Vector space model

Vector Space Model (VSM) is an algebraic model of information retrieval that represents text documents as vectors. It was given by Salton et al. in 1975. It ranks documents according to the score of the query terms in each document. The major goal of the VSM is to rank highest the documents that contain the query terms most frequently. In this model, every document $d$ is considered as a vector of term weights. In the vector space model algorithm, a query entered in the search engine goes through various stages: indexing the documents, term weighing, and finally finding the similarity coefficients using the cosine measure. These stages are shown in Fig. 2.

The following steps are involved in the VSM and play a crucial role in indexing the repository and in query processing:

Indexing the documents (ID) – The foremost step while indexing the documents is the removal of non-significant words from the document vector. Indexing then finds the term frequency, which can be either high or low.

Term weighing (TW) – In this stage the frequency obtained in the prior stage is used to find the weight of each term. Three main factors affect the term weight: the term frequency factor, the collection frequency factor and the length normalisation factor. The factors are multiplied to give TW. Whenever a query is processed, recall and precision can be evaluated based on these factors.

The TW for document vector $d$ can be expressed as shown in Eq. (1):

$\displaystyle\overrightarrow{V(d)}=(\textit{TW}_{d,1},\textit{TW}_{d,2},\ldots)$ (1)

Each dimension of the vector corresponds to a term, which is assigned a weight called the TF-IDF weight [1]. TF is the term frequency, which is high for terms that occur frequently in a document, and IDF is the inverse document frequency, which is high for terms that occur in few documents of the collection [15, 16]. TF can be mathematically expressed as in Eq. (2):

$\displaystyle\text{TF}=\begin{cases}\log\left(f_{t,d}\right)+1&\text{if }f_{t,d}>0\\ 0&\text{otherwise}\end{cases}$ (2)

IDF and the resulting term weight are given in Eqs (3) and (4):

$\displaystyle\text{IDF}=\log(N/N_{t})$ (3)

$\displaystyle T_{\textit{WF}}=\text{TF}\times\text{IDF}=\text{TF}\times\left(\log_{2}\left(\frac{N}{n}\right)+1\right)$ (4)

where $T_{\textit{WF}}=$ Term Weight Frequency; TF $=$ term frequency from Eq. (2); IDF $=$ Inverse Document Frequency; $N=$ total number of documents; $n=$ number of documents containing term $T$ (written $N_{t}$ in Eq. (3)).

Similarity coefficients (SC) – The similarity coefficient is computed from the document vector and the query vector. To measure the SC, the angle between the two vectors is calculated. The similarity of each document $d$ with the query $q$ is calculated using cosine similarity.

The above steps produce a result file, which is the outcome of applying TF-IDF to the topic file. The result file is then used in the final evaluation of the retrieved documents. The evaluation can be measured through various formulas, of which the two major ones are recall and precision. Recall is the fraction or percentage of relevant documents that were retrieved, and precision is the fraction or percentage of retrieved documents that are relevant. Figure 3 shows the Venn representation of recall and precision.
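The three stages described above (indexing, term weighing, similarity) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Terrier implementation used in the experiments: the toy corpus, the whitespace tokenisation and all function names are hypothetical, TF follows Eq. (2) with the natural logarithm assumed because the base is not stated, and the IDF factor is the one that appears in Eq. (4).

```python
import math
from collections import Counter

def tf(count):
    # Eq. (2): log-scaled term frequency, 0 when the term is absent.
    # The natural logarithm is an assumption; the paper does not state the base.
    return math.log(count) + 1 if count > 0 else 0.0

def idf(n_docs, doc_freq):
    # IDF factor as written in Eq. (4): log2(N / n) + 1.
    return math.log2(n_docs / doc_freq) + 1

def tfidf_vector(tokens, vocab, doc_freq, n_docs):
    # One weight per vocabulary term: TF x IDF.
    counts = Counter(tokens)
    return [tf(counts[t]) * idf(n_docs, doc_freq[t]) for t in vocab]

def cosine(u, v):
    # Cosine of the angle between two weight vectors (0.0 for a zero vector).
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical three-document collection, already tokenised.
docs = {
    "d1": "fuzzy logic retrieval".split(),
    "d2": "vector space retrieval model".split(),
    "d3": "fuzzy vector space model".split(),
}
vocab = sorted({t for tokens in docs.values() for t in tokens})
doc_freq = {t: sum(t in tokens for tokens in docs.values()) for t in vocab}
n_docs = len(docs)

doc_vecs = {name: tfidf_vector(tokens, vocab, doc_freq, n_docs)
            for name, tokens in docs.items()}
query_vec = tfidf_vector("fuzzy retrieval".split(), vocab, doc_freq, n_docs)

# Rank documents by decreasing cosine similarity to the query.
ranking = sorted(docs, key=lambda name: cosine(query_vec, doc_vecs[name]),
                 reverse=True)
print(ranking)
```

In this toy collection only d1 contains both query terms, so it is ranked first; the other two documents each share a single term with the query.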

Figure 3.

Venn diagram of recall and precision.

Mathematically, recall and precision can be expressed as shown in Eqs (5) and (6):

$\displaystyle\text{Recall}=\frac{\text{Relevant Retrieved}}{\text{Relevant}}$ (5)

$\displaystyle\text{Precision}=\frac{\text{Relevant Retrieved}}{\text{Retrieved}}$ (6)
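The two measures can be illustrated with a small set-based sketch that follows the verbal definitions given above: recall is the number of relevant retrieved documents divided by the number of relevant documents, and precision is the number of relevant retrieved documents divided by the number of retrieved documents. The document identifiers and function names are hypothetical.

```python
def recall(retrieved, relevant):
    # Fraction of the relevant documents that were retrieved.
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)

def precision(retrieved, relevant):
    # Fraction of the retrieved documents that are relevant.
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)

# Hypothetical result list and relevance judgements for one query.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7"]

print(recall(retrieved, relevant))     # 2 of the 3 relevant documents were found
print(precision(retrieved, relevant))  # 2 of the 4 retrieved documents are relevant
```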

3. Proposed model: Hybrid model converging VSM with user needs

VSM gives the TF-IDF values, which depend on how frequently a specific term occurs in a document. Fuzzy logic is used to find the ranking of documents that were previously ranked by the user and stored as metadata. Such soft computing techniques are used mainly in decision-making applications with inexact and undefined knowledge, and applications of the fuzzy approach are emerging consistently in research [2]. Combining these techniques yields a logic-based model that improves the flexibility and performance of the vector model [3]. Fuzzy logic expresses relevance as a degree of membership that ranges from 0 to 1. Documents with relevance greater than 0.5 are considered highly relevant, documents with relevance equal to 0.5 are considered somewhat relevant, and documents with relevance less than 0.5 are considered less relevant for any search term. This reduces the dimensionality of the vector space, and the more relevant results are collected in accordance with the page contents using the vector space model. In Eq. (4), if $T_{\textit{WF}}$ is high, the document is highly relevant to the query, and if $T_{\textit{WF}}$ is low, the document’s relevance to that particular query or term is low. This relevance helps the user to determine the level of uncertainty associated with the document and its relevance to the query submitted to the system [3]. The Fuzzy Vector Space Model with graininess aims to provide a fast and accurate solution for retrieving information according to the user requirement. The objective of this paper is to compute similarity in terms of user graininess and to incorporate multiple types of user with their needs and intentions. Hence, graininess-based cognitive IR may provide a method for developing a user-knowledge-based IR system [6, 24].

In the proposed hybrid model, the system accesses the database according to the query the user has phrased. The model is a combination of VSM and fuzzy logic. VSM preprocesses the dataset by stemming, removing stop words and indexing the terms, and then applies the TF-IDF algorithm to the topic files. After these steps, the result file is evaluated against the qrel file, which gives the recall and precision. Fuzzy logic categorizes the user queries as well as their results based on cognitive skills [22]. The output generated by the fuzzy system indicates the type of user that submits the queries. It is an efficient way to categorize users depending on the score of the TF_IDF model. Different rules are generated based on the membership functions of the fuzzy system. These rules decide the final output of the system [13]. The model proposed in this study (convergence of fuzzy inference with VSM) is shown in Fig. 4.
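The 0.5 threshold described above can be made concrete with a short sketch. It assumes, purely for illustration, that raw scores are mapped to a membership degree in [0, 1] by min-max normalisation; the paper does not state how the mapping is done, and the scores and function names below are hypothetical.

```python
def to_membership(scores):
    # Assumed mapping: min-max normalisation of raw scores into [0, 1].
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: treat every document as fully in the set.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def relevance_band(mu):
    # Bands as described in the text: above, at, or below 0.5.
    if mu > 0.5:
        return "highly relevant"
    if mu == 0.5:
        return "somewhat relevant"
    return "less relevant"

# Hypothetical raw scores for three documents.
scores = [2.0, 6.0, 10.0]
degrees = to_membership(scores)
bands = [relevance_band(mu) for mu in degrees]
print(list(zip(scores, degrees, bands)))
```

With these three scores the degrees are 0, 0.5 and 1, so each band is exercised once.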

Figure 4.

Flow chart depicting the hybrid model.

To implement and validate the model, an experiment was conducted, which is described in detail in the next section.

4. Experimental evaluation

The hybrid model makes use of the Text REtrieval Conference (TREC) dataset, which has been evaluated and analysed using Terrier and Matlab.

4.1 Dataset

The dataset used in the experiment was collected from Disk 1 of the Text REtrieval Conference (TREC), sponsored by the Defense Advanced Research Projects Agency (DARPA) and the National Institute of Standards and Technology (NIST). Disk 1 consists of data from five sources, namely Associated Press (AP), Department of Energy abstracts (DOE), Federal Register (FR), Wall Street Journal (WSJ) and Ziff-Davis (ZIFF). Each source contains documents and their respective DTD (Document Type Definition).

The topic files for evaluating the results were collected from [9, 10] and the evaluation file was collected from [11]. Topic files are basically the query files, and qrel files are the relevance judgment files against which the evaluation is done.

4.2 Tools used

Terrier. Terrier is a highly flexible, well organized and efficient open source search engine that can be deployed on large-scale document collections. It is a platform for rapid research and experimentation in text retrieval. It is written entirely in Java and is easy to use. It is an effective, efficient, flexible, multi-lingual, extensible and interactive tool for researchers in the field of information retrieval [11].

Indexing. Terrier processes the transformed data through indexing. Indexing is a four-step process that includes the following:

• Collection.
• Documents.
• Term pipelines.
• Indexer.

Table 2
TREC Disk 1 corpus compiled details

Source	# Doc	Size	# Unique terms	# Tokens	# Pointers
TREC Disk 1	506816	498 MB	414467	105115911	56719496

Firstly, the collection of documents is fed into the system. Automatic indexing then runs on these documents to create the indexes on which further search is performed. During indexing, preprocessing steps such as stop-word removal and stemming are applied to the documents. Then the user identifies an information need and generates a query statement accordingly. The system processes the query against the created index and returns a number of candidate results with the score calculated for each document. The user selects from the results returned by the system to access the documents. Figure 5 shows a block diagram for creating the index and searching relevant items.

Topic files. In the dataset format supported by Terrier, a statement of information need is called a “topic” to differentiate it from a “query”, which is the data structure presented to the retrieval system. A topic statement in Terrier consists of a number of fields.

Qrel files. In order to test the relevance of the results for the topic files, relevance judgement files are made. They are often mistaken for the query files, but qrel files are batch files created from the opinions of experts.
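To make the role of the qrel file concrete, the sketch below reads relevance judgements into a per-topic set of relevant documents. The paper does not show the file layout; the sketch assumes the standard four-column TREC layout (topic, iteration, document number, relevance judgement). The sample lines and document identifiers are hypothetical, not taken from the TREC Disk 1 judgements.

```python
from collections import defaultdict

def parse_qrels(lines):
    # Map each topic to the set of documents judged relevant for it.
    # Assumed layout per line: topic  iteration  docno  judgement
    relevant = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic, _iteration, docno, judgement = parts
        if int(judgement) > 0:
            relevant[topic].add(docno)
    return relevant

# Hypothetical judgement lines for two topics.
sample = [
    "51 0 DOC-A 1",
    "51 0 DOC-B 0",
    "52 0 DOC-C 1",
    "",
]
qrels = parse_qrels(sample)
print(dict(qrels))
```

A result file can then be scored by comparing the documents retrieved for each topic with the corresponding set in `qrels`.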

Matlab. Matlab provides a toolbox for fuzzy logic implementation. It computes the inference engine rules, verifies the changes and analyses the parameters. It also provides a flexible environment that helps maintain the integrity of the system even when the parameters of the inference rules and membership functions change.

Figure 5.
Creating index and searching relevant items.

4.3 Implementation of TF_IDF Model with Fuzzy Inference Engine


Information retrieval is about retrieving results for the queries generated by users. Variants of the TF-IDF ranking scheme are widely used by search engines as a major means to score and rank documents for a given user query. The idea behind a fuzzy system is to deal with vague values rather than crisp values. MATLAB was used to create an inference engine that deals with words. The authors evaluated the TF and IDF for each query and then generated the rules using the fuzzy system. The generated rules indicate the relevance of the query, and based on this the user is classified as verificative, conscious topical or ill defined. A fuzzy system was built that takes inputs P1, P2, ..., Pn, where Pn is the evaluation score of the TF_IDF model for a query. The fuzzy rule-based classifier generates flexible and useful structures [7]. Figure 6 shows a graphical model of fuzzy TF-IDF [20, 21].

Table 3
TF_IDF score count with average

S.no	TF-IDF score	Sum	Rank count	Rank score average
1	36	36.915	0	36.915
2	35	35.390	1	35.39
3	33	33.500	1	33.50
4	32	32.900	1	32.90
5	31	62.680	2	31.34
6	30	61.370	3	30.46
7	29	148.006	5	29.60
8	28	370.653	13	28.51
9	27	962.550	35	27.50
10	26	1190.610	45	26.45
11	25	1603.007	63	25.44
12	24	1812.023	74	24.48
13	23	2252.450	96	23.46
14	22	3504.550	156	22.46
15	21	5972.190	278	21.48
16	20	7668.625	375	20.44
17	19	11787.095	606	19.45
18	18	6468.950	892	18.46
19	17	23716.136	1358	17.45
20	16	35069.619	2129	16.47
21	15	48951.853	3166	15.46
22	14	69577.680	4813	14.45
23	13	106464.570	7915	13.45
24	12	174098.150	13985	12.44
25	11	253603.870	22104	11.47
26	10	323828.300	30941	10.46
27	9	431935.640	45651	9.46
28	8	520807.294	61404	8.48
29	7	511456.150	68268	7.49
30	6	543870.300	82833	6.56
31	5	124979.320	22642	5.56
32	4	45467.430	10043	4.52
33	3	17317.520	4856	3.55
34	2	6996.290	2808	2.49
35	1	402.092	264	1.52
36	0	93.107	184	0.50

Table 4

Rule set of the Fuzzy_TF_IDF system for different types of user

Rule number	Rule definition	Output
R1	IF TF_IDF $>$ 36	HIGH with verificative user
R2	IF TF_IDF $<$ 35 & $>$ 30	MEDIUMHIGH with verificative user
R3	IF TF_IDF $<$ 29 & $>$ 25	MEDIUM with conscious topical user
R4	IF TF_IDF $<$ 24 & $>$ 15	MEDIUMLOW with ill defined user
R5	IF TF_IDF $<$ 14 & $>$ 7	LOW with ill defined user
R6	IF TF_IDF $<$ 7	HIGHLOW with ill defined user
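The rule set in Table 4 can be read as a lookup from a TF_IDF score to an output level and a user type. The sketch below is a crisp (non-fuzzy) reading of the table, written only to make the thresholds explicit; it is not the MATLAB inference engine used in the study, and the names `RULES` and `classify` are hypothetical. Scores that fall between the printed intervals (for example 29.5) match no rule in this reading and return `None`.

```python
# Each entry: (rule id, condition on the score, output level, user type),
# copied from the thresholds printed in Table 4.
RULES = [
    ("R1", lambda s: s > 36, "HIGH", "verificative"),
    ("R2", lambda s: 30 < s < 35, "MEDIUMHIGH", "verificative"),
    ("R3", lambda s: 25 < s < 29, "MEDIUM", "conscious topical"),
    ("R4", lambda s: 15 < s < 24, "MEDIUMLOW", "ill defined"),
    ("R5", lambda s: 7 < s < 14, "LOW", "ill defined"),
    ("R6", lambda s: s < 7, "HIGHLOW", "ill defined"),
]

def classify(score):
    # Return (rule id, level, user type) for the first matching rule,
    # or None when the score lies in a gap between the printed intervals.
    for rule_id, condition, level, user_type in RULES:
        if condition(score):
            return rule_id, level, user_type
    return None

for score in (37.0, 27.0, 10.0, 29.5):
    print(score, classify(score))
```

A fuzzy system would replace the hard interval tests with overlapping membership functions, so that a score near a boundary belongs partly to two neighbouring levels.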

Figure 6.

Fuzzy_TF_IDF.

5. Results and discussions

To evaluate the effectiveness of TF_IDF with the fuzzy model, data from NIST TREC Disk 1, described in Table 2, was used.

Baseline system. Terrier 3.5 [11], an open source tool, was used with its query expansion module as the baseline search engine. Terrier is an open source, high performance, full featured text search engine written entirely in Java. The queries were fed into the system and TF_IDF was used to generate the scores. The generated scores were divided into five different levels to differentiate the types of users. The maximum score obtained was 31.84 and the minimum score obtained was 1.24. The fuzzy TF_IDF system maps the relationship between the user and the TF_IDF score.

Table 3 shows the TF_IDF score count with average of the scores obtained in that particular query. Further, Fig. 7 depicts the graph for sum, rank count and rank score average.

Figure 7.

Sum, rank count, and rank score average.

Fuzzy TF_IDF Rules:

Different rules were generated by classifying the types of users. Scores of TF_IDF were given as input to the fuzzy system and the corresponding results were generated. Here, the presence or lack of knowledge for determining the weights assigned to users can affect the overall ranking.

Refer to Table 4 for the rule set of the fuzzy TF_IDF system.

Figure 8.

Membership function.

Figure 8 illustrates the membership functions used in Fuzzy TF_IDF.
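As an illustration of what a membership function computes, the sketch below evaluates triangular membership functions for a TF_IDF score. The shapes and breakpoints actually used are shown only in Fig. 8; the triangular shape, the three set names and every breakpoint below are assumptions chosen for illustration.

```python
def triangular(x, a, b, c):
    # Degree of membership in a triangle that rises from a to a peak at b
    # and falls back to zero at c.
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy sets over the TF_IDF score axis: (a, b, c) breakpoints.
SETS = {
    "LOW": (0.0, 7.0, 15.0),
    "MEDIUM": (10.0, 27.0, 32.0),
    "HIGH": (28.0, 36.0, 44.0),
}

def fuzzify(score):
    # Degree to which the score belongs to each fuzzy set.
    return {name: triangular(score, *abc) for name, abc in SETS.items()}

print(fuzzify(12.0))  # partly LOW and partly MEDIUM under these assumptions
```

Because neighbouring sets overlap, a single score can have non-zero membership in two levels at once, which is the gradual transition that a crisp threshold cannot express.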

As discussed, there are three types of user in cognitive information retrieval, each with their own focus and reason for searching. Some users are clear about what they want, while others are confused about their own thoughts. Thus, fuzzy modeling deals with this vagueness when the user’s perspective is not clear.

The result after evaluating the data is given in Table 5.

Table 5

Evaluation of qrel file

Number of queries	200
Retrieved	200000
Relevant	56377
Relevant retrieved	13551
Average Precision	0.0830
R Precision	0.1548
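As a quick check on the counts in Table 5, the set-based recall and precision over all 200 queries can be worked out directly, taking recall as relevant retrieved over relevant and precision as relevant retrieved over retrieved. This calculation is added for illustration only. These pooled figures are not the same quantities as the Average Precision and R Precision reported in the table, which are normally computed per query over the ranked lists and then averaged.

$\displaystyle\text{Recall}=\frac{13551}{56377}\approx 0.240$

$\displaystyle\text{Precision}=\frac{13551}{200000}\approx 0.068$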

Table 6 gives the precision @N along with the scores. The precision obtained for the standard VSM model is compared with that of the proposed hybrid model. The hybrid model scores higher at every cutoff, and the size of the improvement is given in the Gain column.

Table 6

Comparison of precision @N of TF_IDF and hybrid model

S.no	Precision @N	Normal TF_IDF	Hybrid model	Gain
1	1	0.2867	0.3750	0.0883
2	2	0.2733	0.3425	0.0692
3	3	0.2756	0.3217	0.0515
4	4	0.275	0.3120	0.037
5	5	0.276	0.2935	0.0175
6	10	0.2547	0.2553	0.0006
7	15	0.2356	0.2787	0.0431
8	20	0.2217	0.2708	0.0491
9	30	0.2044	0.2608	0.0564
10	50	0.1837	0.2423	0.0586
11	100	0.1434	0.2021	0.0587
12	200	0.0994	0.1593	0.0599
13	500	0.0538	0.1042	0.0504
14	1000	0.0317	0.0678	0.0361

The table can be well understood with the graph as shown in Fig. 9.

Figure 9.

Comparison chart of retrieved relevant terms with score.
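Precision @N, the measure compared in Table 6, can be sketched as follows. The function divides by the cutoff N, which is the usual convention; the ranked list and the set of relevant documents below are hypothetical and are not taken from the experiment.

```python
def precision_at_n(ranked_docnos, relevant, n):
    # Fraction of the top-n ranked documents that are judged relevant.
    if n <= 0:
        raise ValueError("n must be a positive integer")
    top = ranked_docnos[:n]
    return sum(1 for docno in top if docno in relevant) / n

# Hypothetical ranking for one query and its relevant documents.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}

for n in (1, 2, 5):
    print(n, precision_at_n(ranked, relevant, n))
```

Averaging this value over all queries at each cutoff gives one row of a table such as Table 6.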

6. Conclusion

In this paper a hybrid information retrieval model has been proposed by converging the vector space model with the user’s cognitive skills in a fuzzy inference system. The VSM produces an automated ranked list as output, based on the term-weighted score. Human perception and thinking create significant variation in the retrieved information. Cognitive information retrieval models were used to understand how the human brain thinks and perceives information, and to categorize users accordingly. In this paper the users were categorised into three groups, namely verificative, conscious topical and ill defined, and their needs were taken as the basis for a rule set that can give more precise retrieval for a given query. As the categories of user do not have fixed boundaries, a fuzzy logic based inference system was designed. Six rules were framed relating the user type to the TF_IDF score. The output of the inference system delivered a new ranking, which outperformed the normal TF_IDF score in the performance evaluation using precision and recall. In the future, the work can be extended using query expansion, where a suitable query expansion approach is selected based on the type of user, and by combining more information retrieval models with other machine learning algorithms.

References

1. Chawla. Application of hybrid of fuzzy set, trust and genetic algorithm in query log mining for effective information retrieval. International Journal of Intelligent Systems and Applications in Engineering. 2018; 6(1): 47-52.

2. Khezri, Hosseini, Mazinani. A fuzzy rule-based expert system for the prognosis of the risk of development of the breast cancer. 2014.

3. Chiew, Lau. Monotonicity preserving SIRMs-connected fuzzy inference system for predicting HPC compressive strength. Intelligent Decision Technologies. 2018; 1-10.

4. Kanagavalli, Raja. A fuzzy logic based method for efficient retrieval of vague and uncertain spatial expressions in text exploiting the granulation of the spatial event queries. in: International Journal of Computer Applications (0975-8887), National Conference on Future Computing CoRR. 2013.

5. Wilson, Devillers, Hoeber. Fuzzy logic ranking for personalized geographic information retrieval. in: Proceedings of the Third International Conference on Intelligent Human Computer Interaction (IHCI 2011). Prague, Czech Republic, August, 2011. Springer, Berlin, Heidelberg. 2013; 111-123.

6. Jakhar, Rajnish. Measurement of complexity and comprehension of a program through a cognitive approach. International Journal of Engineering-Transactions B: Applications. 2015; 28(11): 1579.

7. Abualigah, Khader, Hanandeh. A hybrid strategy for krill herd algorithm with harmony search algorithm to improve the data clustering. Intelligent Decision Technologies. 2018; 1-12.

8. Mahdizadeh, Eftekhari. Proposing a novel cost sensitive imbalanced classification method based on hybrid of new fuzzy cost assigning approaches, fuzzy clustering and evolutionary algorithms. International Journal of Engineering-Transactions B: Applications. 2015; 28(8): 1160.

9. http://trec.nist.gov/data/topics_eng/index.html.

10. http://trec.nist.gov/data/qrels_eng/index.html.

11. www.terrier.org.

12. Jayashree, Devi, Papandrianos, Papageorgiou. Application of fuzzy cognitive map for geospatial dengue outbreak risk prediction of tropical regions of Southern India. Intelligent Decision Technologies. 2018; 1-20.

13. Chawla. Effective personalization of web search based on fuzzy information retrieval. International Journal of Computer Science and Information Technologies. 2015; 6(3): 2831-2837.

14. Beigbeder, Mercier. An information retrieval model using the fuzzy proximity degree of term occurences. in: Proceedings of the 2005 ACM Symposium on Applied Computing. ACM. 2005 March; 1018-1022.

15. Rubens. The application of fuzzy logic to the construction of the ranking function of information retrieval systems. arXiv preprint cs/0610039. 2006.

16. Gupta, Saini, Saxena. A new fuzzy logic based ranking function for efficient information retrieval system. Expert Systems with Applications. 2015; 42(3): 1223-1234.

17. Beigbeder, Mercier. An information retrieval model using the fuzzy proximity degree of term occurences. 2005.

18. Schwarzer, Schubotz, Meuschke, Breitinger, Markl, Gipp. Evaluating link-based recommendations for Wikipedia. in: Digital Libraries (JCDL), 2016 IEEE/ACM Joint Conference on. IEEE. 2016 June; 191-200.

19. Kasemsap. Mastering web mining and information retrieval in the digital age. Web usage mining techniques and applications across industries. 2017; 1-28.

20. Hannah, Geetha, Mukherjee. Automatic extractive text summarization based on fuzzy logic: A sentence oriented approach. in: International Conference on Swarm, Evolutionary, and Memetic Computing. Springer, Berlin, Heidelberg. 2011 December; 530-538.

21. Doran, Yetilmezsoy, Murtazaoglu. Application of fuzzy logic approach in predicting the lateral confinement coefficient for RC columns wrapped with CFRP. Engineering Structures. 2015; 88: 74-91.

22. Sanchiz, Chevalier, Amadieu. How do older and young adults start searching for information? Impact of age, domain knowledge and problem complexity on the different steps of information searching. Computers in Human Behavior. 2017; 72: 67-78.

23. Dewan, Biswas. A unified approach for fuzzy multiobjective stochastic programming with Cauchy and extreme value distributed fuzzy random variables. Intelligent Decision Technologies. 2018; 1-11.

24. Roy, Kumar, Sharma. A novel fuzzy document based information retrieval model for forecasting. Fuzzy Information and Engineering. 2017; 9(2): 137-159.

25. Roelleke. Information retrieval models: Foundations and relationships. Synthesis Lectures on Information Concepts, Retrieval, and Services. 2013; 5(3): 1-163.