Abstract
Ontologies are used to rank webpages through a semantic recommendation system. Ontology matching uses the semantics of the words in a query to identify semantic ontologies on the web, separating relevant ontologies from irrelevant ones. The filtered results obtained by merging the ontologies are passed to the semantic recommendation modules, which compute similarity, perform pattern analysis, and produce a probabilistic ranking of webpages. Semantics discovers ontologies from the domain. Using and merging ontologies together with their relations makes the recommendations highly interpretable. After the ontologies are merged, the results are forwarded to the recommendation phase. The semantic recommendations are computed for categories such as books, online web pagination, projects, repositories, subject experts, citations, and online journals. The semantic recommendation system lets each user create a login area and builds a log file of that user's previous queries, then displays recommended results on the basis of those queries. Thus, the Ontopark Ranking Framework for Semantic Web Mining is an optimizing technique for detecting online user behavior.
Introduction
The World Wide Web is what allows people all over the world to use the Internet to find and disseminate information. The past decade has seen a dramatic increase in the size and scope of the Internet, making it dynamic and ever-changing. Google’s index comprised 26 million pages in 1998 and reached one billion pages by 2000; on July 25, 2008, Google announced that it had discovered one trillion distinct URLs. Users are presently drowning in data as a result of the Internet’s fast expansion (Gautam and Tiwari, 2016).
Finding what you are looking for on the Internet can be a challenge because of the overwhelming amount of data available. The growth of data on the Internet has outpaced the human ability to process large volumes of information, and the result is ever-increasing information overload. It is becoming increasingly difficult to uncover patterns in the Web that can be exploited as rich data sources for data mining. This problem may be addressed through information retrieval and information filtering. As T. A. Al-asadi (2017) states, while these tools can gather some critical data from the Internet, they fall short of retrieving all of it. With the Semantic Web, only the relevant data is extracted from the Web, which minimizes the number of results returned (Vidya and Banumathy, 2015). Most knowledge engineers spend a significant amount of time shadowing and learning from specialists in their fields, which slows the rate at which new information can be absorbed. Web-based intelligent information systems find it especially challenging to learn new information because time-saving, centralised domain knowledge engineering is absent. Ontologies are critical for the Semantic Web’s knowledge-based applications, and a semantically aware World Wide Web is built on top of them (Mughal, 2018). Ontologies play a role in a variety of fields, including data mining, AI, and knowledge management. They provide effective retrieval by allowing inferences based on domain information obtained during the knowledge base’s creation. Knowledge management systems in corporations and other organizations will require ontologies in order to share information (C. Li, 2021; Anurag Kumar and Ravi Kumar Singh, 2016; Simranjeet Kaur, 2015).
Web mining background
Extracting data from the web, commonly described as “web mining” (WM), is an application of data mining that is common practice nowadays. WM is used to retrieve data at the user’s request. No matter how hard we try, there will always be an issue with the sources from which we get our knowledge. The need for such umbrella operations in the evolution of web mining standards was examined by Bhamare and Pawar and Ahmad Tasnim Siddiqui (2013) (MohdShoaib, 2014).
Web mining is the process of extracting new information from the internet. WM has been recognized as the best way of dealing with information overload, and it is used to locate and obtain the desired results. The term “data mining” refers to the process of searching a database for the most relevant and useful information. A query is entered into an online form and then processed, and the results are shown, as illustrated in Figure 1.
Figure 1. Query processing for web mining.
Web mining can be separated into three categories: content mining, which applies data mining to page content; structure mining, which detects linkages between sites; and usage mining. Data warehouse experimentation by Malhotra (2014) shows that data stored in a warehouse can be used later to obtain whatever information is required.
Data warehouses, which are commonplace today, store a wide range of information, including industry-specific data. Some industries collect many kinds of data, while others collect only a single kind. This information is valuable and practical: data acquired from the warehouse supports decision-making, and some systems use historical data to predict what will happen next.
Web mining can be divided into categories. This sort of mining is permitted and legal. Web pages are reached by following a precise path on the internet, and these accesses are recorded promptly in logs. Additional relevant logs can be produced by CGI scripts as well. Data mining, internet and intranet-based applications, and access to a wide range of information all depend on this portion. According to Jain and Purohit (2011), the following components should be included: (A) web content analysis, (B) web structure analysis, and (C) web usage analysis.
Web content analysis
Web content mining (WCM) can be used to identify the most important material in a web page, in the same way as in any other document. WCM can be either on-site or off-site. The first two types of data are those that can be found in hard copy or electronic format (R. H. Salman, 2020; Anurag kumar, 2017). Knowledge Discovery in Text (KDT), discussed by Aggarwal, Gates and Yu (1999), covers HTML and unstructured paragraphs. Multimedia and structured information first require a conceptual description (E. T. John, 2016). A database is a tool for organizing and storing information gathered from various sources, including the internet. As development tools have progressed, systems can now combine several methodologies, such as agent-based and database-based methods, to maximize the distribution of information. (2008) and Faustina Johnson (2012).
Agent based approach
In this method, an autonomous software entity is referred to as an “agent”. The agent is able to perform tasks on behalf of others. In addition to being capable of carrying out their mission, agents exhibit important characteristics such as mobility, adaptability, learning capacity, awareness of user preferences, and search methodologies. Other uses include identifying a document’s subject representation and finding online sites similar to those noted by Rana (2012). Agent-based content mining is preferred for three kinds of task (Herrouz, 2013; Malarvizhi and Saraswathi, 2013).
Intelligent search agents: Harvest and ShopBot are two examples of intelligent search agents, which were created to acquire knowledge-based information and interpret it using the data gathered. Harvest is a programme that collects data on a given domain from a variety of documents. The ShopBot system, on the other hand, gathers product information from a variety of websites.
Information filtering and categorisation: many systems, such as HyPursuit, operate filters that categorise data. Clusters of documents may be created using this technique.
Personalised web agents: for example, a user may use WebWatcher and Firefly, in conjunction with other sorts of information, to get the results they want from a website based on their own preferences and the actions they take while searching.
Database approach
This approach organises semi-structured data in a way that allows for the best possible data management and querying. Multilevel data mining uses web mining to extract the necessary data from hypertext documents into a hierarchical database structure. A two-level data hierarchy is used: at the lowest level, semi-structured hypertext document data is used to create metadata, which is then stored at the highest level. This data is organised using an object-oriented database structure (Kosala and Blockeel, 2000).
Web structure analysis
Web structure analysis is a method for finding data from internet documents that are linked to one another. The Web is modelled as a graph made up of nodes and edges, which represent web pages and links respectively (Pahl, 2002). All pages may be computed and indexed using this graph. Hyperlinks and document structure are the two primary concerns of this method (Verma, 2011).
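The node-and-edge view can be sketched in a few lines of Python. This is only an illustration: the page names and the `degree_counts` helper are hypothetical and are not part of any cited system.

```python
from collections import defaultdict


def degree_counts(edges):
    """Count in-links and out-links for every page in a list of (source, target) links."""
    in_links = defaultdict(int)
    out_links = defaultdict(int)
    for source, target in edges:
        out_links[source] += 1  # the source page gains one outgoing link
        in_links[target] += 1   # the target page gains one incoming link
    return dict(in_links), dict(out_links)


# Hypothetical three-page site: each tuple is one hyperlink (an edge of the graph).
edges = [("home", "about"), ("home", "products"), ("products", "home")]
in_links, out_links = degree_counts(edges)
```

Counts of this kind are the raw material for link-based measures such as PageRank.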
Hyperlink
There are two types of hyperlinks: intra-document and inter-document. The intra-document hyperlink connects pages within a single document, while the inter-document hyperlink connects pages across many documents. Using this strategy, the hyperlink tool may be examined and improved as well as the quality of the search method itself. It also uses and analyses links in the same way that an algorithm would (Vijayarani, 2015).
Page rank
PageRank is a metric used to estimate where a piece of content will appear in search results. It assigns a numerical weight to each of the connected pages based on the hyperlinks pointing to it, which are treated like citations. The link structure of a website therefore has an impact on how well a page ranks in search engine results. Because a page’s importance depends on the links it receives, the pages it links to are in turn deemed more important. A backlink that appears on a high-importance page is given greater weight than one that appears on a low-importance page, which makes this a more advanced method of scoring a web page than simply counting links (Pradhan, 2020).
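To make the idea concrete, the following is a minimal sketch of the commonly published PageRank iteration; it is not necessarily the exact variant discussed by the cited authors. The damping factor of 0.85, the iteration count, the handling of pages without out-links, and the three-page toy graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank for a graph given as {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: A links to B and C, B links to C, C links back to A.
toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_graph)
```

In this toy graph, page C receives links from both A and B and so ends up with the highest score, while B, which receives only half of A's share, ends up with the lowest.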
Weighted page rank
Weighted PageRank is a newer variation of Google’s PageRank algorithm. Instead of dividing a page’s rank value evenly among the pages it links to, this strategy assigns larger rank values to more important pages. Each out-link page receives a value proportional to its importance. The weight of a link (a, b) is expressed by two terms, win(a, b) and wout(a, b), which are computed from the number of in-links and the number of out-links of page b, respectively (Miguel Gomes da Costa Jnr., 2005).
The inbound links from pages b and k are represented by ok in the equations above, whereas the reference pages of page a are represented by P(a). The most effective approach for calculating page rank is depicted in the following mathematical equation (Monica Sehgal, 2014).
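For reference, the commonly published form of Weighted PageRank can be sketched as follows. This is an assumed reconstruction based on that standard formulation, not notation confirmed by the cited works: d is a damping factor, B(b) is the set of pages that link to b, P(a) is the set of pages that a links to (its reference pages), and I_k and O_k are the numbers of in-links and out-links of page k.

```latex
W^{in}(a,b) = \frac{I_b}{\sum_{k \in P(a)} I_k},
\qquad
W^{out}(a,b) = \frac{O_b}{\sum_{k \in P(a)} O_k}

WPR(b) = (1 - d) + d \sum_{a \in B(b)} WPR(a)\, W^{in}(a,b)\, W^{out}(a,b)
```

Under these definitions, a page b that has many in-links and out-links relative to the other pages referenced by a receives a larger share of a's rank.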
Document structure
Web content is arranged in a tree-like structure, and each page contains a large number of tags. The core idea is an automated, bidirectional obtaining format.
Structure-based clustering is applied in order to obtain the specific document structure data.
Using a pattern framework, the linking web pages are analysed. It’s possible to access these sites in many ways, depending on the user’s level of interest.
Web usage mining
Web usage mining is an essential strategy for discovering how users interact with a site and for predicting future behaviour from those patterns. It can also help identify the types of websites that are most popular among visitors. A wide range of businesses use this method to save users valuable time by offering them the best possible web links (Chakurkar, 2014).
Server logs, browser logs, and company databases may all be mined for data using web usage mining techniques (Pranit Bari, 2013). The web server log typically records each request made for a web page. Pre-processing is followed by pattern discovery and pattern analysis.
Pre-processing of data
Before patterns can be worked out, pre-processing transforms the raw log data into a suitable form. Data cleaning, user and session detection, and path determination are common approaches (Kamika Chaudhary, 2013).
• Data cleaning: data cleaning removes unnecessary log entries.
• Detection: the following methods are used to identify the user and the session.
User identification (UI): UI works out which web pages are accessed by which user. It may be carried out in a variety of ways, such as by using an IP address, cookies, or direct authentication. A time-based technique can also be used to break activity into several user sessions if the same user keeps using the browser.
Session identification: this determines which web pages a user views in a given period and distinguishes the browsing activity of different users. A user may have a single session or several. Once the user has been identified, that user's activity is broken up into smaller chunks; this process is called sessionization. Sessions are commonly split on the basis of the following three factors:
Session duration: the total time elapsed between the first page request and the last.
Site navigation: the link followed from one page to the next.
Page stay time: the interval between two consecutive timestamps.
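The time-based part of sessionization can be sketched as follows. This is a minimal illustration under stated assumptions: users are identified by the pair (IP address, user agent), and the two thresholds (30 minutes of total session duration, 10 minutes of page stay time) are illustrative values, not figures given in the text. The log format and the `sessionize` function are hypothetical.

```python
from collections import defaultdict

SESSION_LIMIT = 30 * 60    # assumed maximum session duration, in seconds
PAGE_STAY_LIMIT = 10 * 60  # assumed maximum gap between two requests, in seconds


def sessionize(log_entries):
    """Split (ip, user_agent, timestamp_seconds, url) entries into per-user sessions."""
    hits_by_user = defaultdict(list)
    for ip, agent, timestamp, url in log_entries:
        hits_by_user[(ip, agent)].append((timestamp, url))  # user identification

    sessions = []
    for user, hits in hits_by_user.items():
        hits.sort()  # order each user's requests by time
        current = []
        for timestamp, url in hits:
            too_long_on_page = current and timestamp - current[-1][0] > PAGE_STAY_LIMIT
            session_too_long = current and timestamp - current[0][0] > SESSION_LIMIT
            if too_long_on_page or session_too_long:
                sessions.append((user, [u for _, u in current]))
                current = []
            current.append((timestamp, url))
        if current:
            sessions.append((user, [u for _, u in current]))
    return sessions


# Hypothetical log: one user returns after a long pause, a second user visits once.
log = [
    ("10.0.0.1", "Firefox", 0, "/home"),
    ("10.0.0.1", "Firefox", 120, "/products"),
    ("10.0.0.1", "Firefox", 4000, "/home"),
    ("10.0.0.2", "Chrome", 60, "/about"),
]
sessions = sessionize(log)
```

A navigation-based split, which checks whether a requested page is reachable by a link from a page already in the session, would need the site's link structure and is left out of this sketch.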
Pattern discovery
It is difficult to get a complete picture of how people browse the web, because requests served from a cache do not reach the server log; inferring cache hits is a common solution to this problem. Once user transactions have been identified, a variety of data processing methods may be used to detect patterns.
Statistics: statistical analysis tells you more about the users who visit your website. It is based on the user’s journey across the website and includes measures such as average page views, visit frequency, and average time spent on each page. Association rules are used to identify the pages most often visited together during a server session.
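As a rough illustration of how pages visited together can be found, the sketch below computes support and confidence for page pairs over a handful of sessions. The session data, the `pair_rules` function, and the minimum support of 0.5 are assumptions made for the example; real association rule miners such as Apriori handle larger itemsets.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent page pairs."""
    total = len(sessions)
    page_counts = Counter()
    pair_counts = Counter()
    for pages in sessions:
        unique_pages = sorted(set(pages))  # count each page once per session
        page_counts.update(unique_pages)
        pair_counts.update(combinations(unique_pages, 2))

    rules = []
    for (first, second), count in pair_counts.items():
        support = count / total
        if support >= min_support:
            # Confidence of "X -> Y" is the share of sessions with X that also contain Y.
            rules.append((first, second, support, count / page_counts[first]))
            rules.append((second, first, support, count / page_counts[second]))
    return rules


# Hypothetical sessions, each a list of pages requested by one user.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/about"],
    ["/products", "/cart"],
]
rules = pair_rules(sessions)
```

Here every session that contains `/cart` also contains `/products`, so the rule from `/cart` to `/products` has a confidence of 1.0.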
Clustering: this is the process of grouping items according to the characteristics they share. There are usually two options. Usage clustering locates groups of people who browse in similar ways. Page clustering groups pages according to their content, so that, for example, pages from e-commerce sites in the same industry, which carry similar material, fall into the same cluster.
Classification: this is a highly effective method of dividing a volume of information into established groups. In this way, the developer is able to create precise content that serves as the foundation for the various classes and groups on future websites. A variety of classifiers are commonly applied to make sense of the data.
Web mining relies heavily on pattern analysis to help users discover the data they want from the patterns found in the discovery step. It is carried out using a variety of means, including OLAP, visualization methods, and knowledge query mechanisms.
Online analytical processing (OLAP) lets analysts examine large volumes of data quickly and from several dimensions. Knowledge query mechanisms usually rely on command-language components resembling a database query language. Visualization uses graphical tools, for instance applying colour to different values, to make the patterns easier to explain.
Semantic web mining
One of the major goals of Semantic Web Mining is to combine the two rapidly emerging fields of Web Mining and the Semantic Web. The Semantic Web is a step toward Web intelligence (WI). It is enabled by languages that capture the semantic content of webpages and make it machine-readable. If ontologies are used to mark up web resources, it becomes simpler to develop and analyse semantic markup using Semantic Web technologies. As a result of this technological transition, users are now required to define data in ontologies using more meaningful XML documents and novel semantic markup languages. Manually creating an ontology is still a labour-intensive and challenging procedure (Raji, 2016).
Information on the Semantic Web, an intelligent retrieval system, is easily processed by machines. Currently, the information published online can only be interpreted by humans. The future of the internet will likely be significantly affected by the Semantic Web. The Semantic Web is seen as an extension of the current Web in which metadata is added to sites. Metadata is data about data that a computer can process. The explicit encoding of meta-information and domain theories, such as ontologies, will improve the Web’s level of service. Creating the Semantic Web is, however, a challenging task at the moment.
Web mining is a relatively new method that has swiftly gained popularity in the field of web intelligence. Web mining may be seen as a data mining technique for gathering, collecting, generalising, and evaluating information. Data mining methods may, of course, be applied to web mining. Web mining, as opposed to data mining, relies on Web-related data sources such as semi-structured HTML or XML documents, logs, services, and user profiles, whereas data mining depends on more conventional databases; web mining is hence the more challenging of the two to execute. In order to automatically find and extract information from web-related data sources, web mining needs the development and deployment of many goal-oriented methodologies and techniques.
Semantic Web Mining is a relatively new area of research that combines the Semantic Web with Web Mining. The topic is multidisciplinary, with researchers from the domains of business, information retrieval, and computer science all contributing. The goal of semantic web mining is to use Semantic Web technologies to find and extract relevant, useful, and interesting patterns from a substantial quantity of Web data. By exploiting the semantic architecture of the Web, it helps to create a more usable web. The benefits obtained in e-activities, health care, privacy and security, knowledge management, and information retrieval have given it a lot of industrial impetus.
Technology for the semantic web
The Semantic Web becomes increasingly evolved as a result of technological advancement. Metadata, ontology, logic, and agents are the four new Semantic Web technologies in development (Ristoski and Paulheim, 2016). The following are some examples:
Metadata
Metadata is information about data, and it captures part of the meaning of the data. Web material is currently structured for human readers rather than for computers. Web pages are mostly written in HTML. Although websites do a good job of presenting information, computers have difficulty discriminating between keywords that have more than one meaning. To solve this problem, the Semantic Web replaces HTML with richer languages that provide information about their content. This HTML code is a good example:
The problematic HTML code is replaced with the following meta-information about the content. Instead of predefined tags, user-defined tags are used.
User-defined tags make this representation easier to understand.
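As a hypothetical illustration of the contrast described above, the sketch below sets a presentation-oriented HTML fragment beside a version with user-defined tags and shows how a program can read the second one directly. The clinic, the names, and the tag names are invented for this example and are not taken from the original listings.

```python
import xml.etree.ElementTree as ET

# Presentation-oriented markup: a program cannot tell which text is a name or a doctor.
html_snippet = "<h1>Riverside Clinic</h1><p>Consultations by <b>Dr. Lee</b>, Mon-Fri</p>"

# The same content with user-defined tags that describe what each value means.
xml_snippet = """
<clinic>
  <name>Riverside Clinic</name>
  <staff>
    <doctor>Dr. Lee</doctor>
  </staff>
  <openingDays>Mon-Fri</openingDays>
</clinic>
"""

root = ET.fromstring(xml_snippet.strip())
record = {
    "name": root.findtext("name"),
    "doctors": [doctor.text for doctor in root.iter("doctor")],
    "opening_days": root.findtext("openingDays"),
}
```

Because each tag names the role of its content, the program retrieves the doctor's name without guessing from layout or keywords.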
Ontology
An ontology is an explicit and formal specification of the concepts in a domain. An ontology serves as a base for combining data from many sources, boosting collaboration in online communities, improving search capabilities, and reasoning on the basis of existing knowledge. Content-based access, interoperability, and communication across the Web are made possible by ontologies. As a result, ontologies are critical to the creation of the Semantic Web.
Ontologies provide a mechanism for dealing with the diversity of Web resource representations. An ontology’s domain model can be regarded as a unifying framework that gives information a common representation and semantics. Ontologies serve as a common language for describing a subject matter on the Internet, and such a common understanding is vital for overcoming terminology discrepancies. Ontologies are useful for organising and navigating websites, and they also improve search results on the Internet. An ontology mainly consists of concepts, relations, instances, and axioms.
Concepts: a concept is a group or class of items within the domain. For instance, “student” is a concept in the domain of education.
Relations: the interactions between concepts, or the attributes of a concept, are called relations. For instance, a student takes exams, which are a common method of gauging a student’s progress.
Instances: the “objects” that a concept represents are known as instances. Malala, for example, is an instance of a student.
Axioms: axioms enforce constraints on classes or instances and describe the properties of relations. For instance, re-examination is required for students who score less than 40%.
These four components are used to store the semantic information of keywords in an ontology for future reference. Ontology merging may be required when ontologies have to be derived from several knowledge bases.
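The four components can be mirrored in a small data structure. The sketch below is only an illustration built from the education examples given above; the `Ontology` class and its field layout are assumptions, and a real system would normally use an ontology language such as RDF or OWL instead.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    concepts: set = field(default_factory=set)     # classes of items in the domain
    relations: set = field(default_factory=set)    # (concept, relation name, concept)
    instances: dict = field(default_factory=dict)  # instance name -> concept
    axioms: list = field(default_factory=list)     # constraints, stored as functions


education = Ontology()
education.concepts.update({"Student", "Exam"})
education.relations.add(("Student", "takes", "Exam"))
education.instances["Malala"] = "Student"


def needs_reexamination(score_percent):
    """Axiom from the text: students scoring less than 40% must be re-examined."""
    return score_percent < 40


education.axioms.append(needs_reexamination)
```

Keeping the axiom as a function makes the constraint checkable, for example `education.axioms[0](35)` evaluates the rule for a score of 35%.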
Logic
Logic is the discipline that studies the principles of reasoning. For expressing knowledge, logic provides formal languages with well-understood formal semantics. Automated reasoning systems can draw conclusions from the given information, making implicit knowledge explicit. Predicate logic expresses information as follows:
Logic is, however, a broader concept. Intelligent agents may use it as a tool for making decisions and selecting courses of action. For example, a shop's agent may decide to offer a customer a discount on the basis of a rule about loyal customers, where customer loyalty is determined from data gathered from business databases.
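How a reasoner turns implicit knowledge into explicit knowledge can be sketched with a tiny forward-chaining loop. The predicates (`loyalCustomer`, `customer`, `discountEligible`), the customer names, and the rule format are hypothetical and serve only to illustrate the discount example; they are not the rules of the original text.

```python
def forward_chain(facts, rules):
    """Apply rules of the form premise(X) -> conclusion(X) until nothing new is derived.

    facts: set of (predicate, subject) pairs
    rules: list of (premise_predicate, conclusion_predicate) pairs
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, subject in list(derived):
                if predicate == premise and (conclusion, subject) not in derived:
                    derived.add((conclusion, subject))  # a newly inferred fact
                    changed = True
    return derived


# Hypothetical facts, as they might be read from a business database.
facts = {("loyalCustomer", "alice"), ("customer", "bob")}
# Hypothetical rules: loyal customers are customers, and loyal customers get a discount.
rules = [("loyalCustomer", "customer"), ("loyalCustomer", "discountEligible")]
knowledge = forward_chain(facts, rules)
```

The fact that alice is eligible for a discount is never stated directly; it follows from the rule, which is also what allows the system to explain its conclusion.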
A benefit of logic is that it can explain the conclusions that it draws. To be useful on the Web, however, logic must be machine-processable, and it cannot be employed on its own: it must be used in conjunction with other Web-based data.
Agents
Agents are software programmes that are self-sufficient and proactive. The Semantic Web’s personal agents will take instructions from users, search the Web for information, converse with other agents, compare user requirements and preferences, pick options, and respond to the user. Metadata, ontology, and logic will all be utilized by Semantic Web agents.
Identifying and extracting information from the Web will be done via metadata.
Web searches, information interpretation, and interoperability with other agents will be made easier with the help of ontology.
Logic will be employed to analyse the data and draw conclusions.
A downside of using agents is that sophisticated applications require agent communication languages. There must also be a formal representation of the beliefs and desires of agents.
Literature review
The Semantic Web is the main focus right now. The Semantic Web is being constructed one layer at a time as it progresses.
According to Fouad et al. (2012), the following steps should be taken into consideration when developing the Semantic Web:
• Machine-readable statements should have a consistent syntax.
• A shared vocabulary should be created.
• A logical language must be agreed upon by all parties.
• Proofs should be exchangeable in that language.
Yasodha and Dhenekaran (2014) offered the research community a viewpoint on the methodologies to use for extracting meaningful patterns from the web.
Acuña and Zoe (2016) used workflows and a semantic map for the interpretation of legacy Python operations and for the mapping between a website ontology and its structure, together with WISE instrumentation for structure extraction. With these methods, one can step through the scientific procedures while asking conceptual questions.
Asikri et al. (2016) set out to examine the problem of ontology development using the Semantic Web Mining technique. The Semantic Web and Web Mining are usually regarded as the two fields of study that make up this area. Web mining is the application of data mining techniques to Web content, structure, and usage. The Semantic Web, in turn, is often referred to as the second-generation World Wide Web. It supplies data in the form the machine needs while also helping the user during the process. Both local and global patterns can be discovered in this way. Ontologies are labour-intensive and time-consuming to create manually, so academics are motivated to develop automated methods. The authors wrote the review to identify the points where the two fields intersect and to discuss the approaches to integration that yield the best outcomes.
Now that there is so much information online, web users can clearly see its limits: it is not easy to gather information and expertise from so many different sources. The Semantic Web and data mining are two young, fast-expanding fields that must be combined if the data contained in websites is to be acquired and made meaningful to humans. Merged, the two fields can overcome this challenge and transform the internet into a semantically rich information space.
Ontology engineering has been shown to support ontology learning from web data as a way of resolving interoperability issues between systems; the technique gathers knowledge from the web context and represents it in a way that is understandable by arbitrary parties, whether machines or humans. The authors based their study on comparisons between the two fields in order to acquire the relevant data.
According to Yasodha and Dhenekaran (2014), it takes a lot of work to sift through the vast quantity of knowledge that is accessible online to find specific information. A lot of time is lost clicking on unnecessary links that search engines provide. Manually combining data from many websites is likewise a challenging task. The Semantic Web approach is therefore commonly used to simplify data retrieval and make specific data easily accessible to users. The metadata in a Semantic Web ontology can be used to locate the data being searched for.
The authors proposed an ontology-based framework for Semantic Web page mining. It is a semi-automatic task that may be used in a variety of applications, including education, health care, and tourism. Ontology engineering is carried out using the ontology language RDF (Resource Description Framework), and the authors’ technique is implemented in Java. The NetBeans IDE was used to produce the screenshots. The success of this system is evaluated using a variety of metrics, including accuracy, average accuracy, and relevance score.
Ramesh et al. (2017) demonstrated that extracted information can be used to support a newly developed schema and ontology (see Figure 1). They presented a study of ontology and web usage mining to investigate knowledge mechanisms that can be adapted to extract valuable information, in response to the growing need for web log analyzers across a wide range of internet sites. Building a proper framework for preserving knowledge can be a major undertaking. Personalization is frequently used to overcome this problem by tailoring the information available on the Internet to the specific needs of the user. Previous research has produced web usage mining methods that are critical to the development of web page recommendation systems. In addition, patterns and a recommendation approach for a context-dependent recommendation system had been developed using web usage mining without the need for semantic knowledge. The authors focused mainly on mining techniques. The CloSpan approach, applied over the semantic space, made it possible to mine sequential patterns and to provide frequent sequential patterns. Semantic knowledge was then added to the patterns, allowing them to be included offline in the web page recommendation model.
According to Cheng et al. (2018), the optimal SOA realization approach can be deployed over the internet. A comparison with traditional websites raised concerns about data-island issues in the Web service sector. Like ordinary websites, online services each keep their own library. Because it is difficult to discover internet services quickly and accurately, further study of online services was required.
The authors introduced rapid online service discovery based on semantic mining and indexing. First, they concentrated on formalizing models for Web services that make use of the most important and necessary data from those services. An online service matching engine was then introduced that achieves accurate matching without depending on a semantic ontology.
The authors also introduced a strategy for detecting Web services that is supported by an index library, which speeds up the processing of search requests. Semantic mining was used to tackle the problem of search precision in the typical index framework. Finally, the efficiency of the Web service discovery framework was evaluated, and the findings showed a high accuracy rate, indicating that the framework can meet the challenge of combining short search times with high search precision.
Nadim et al. (2018) found that the process of identifying knowledge is made more difficult by the proliferation of Web of Things (WoT) services. This problem can be addressed in several ways, such as semantic Web-based clustering, which reduces the number of services that have to be examined. In addition, older methods were suited to a fixed amount of data and did not take into account the many types of services available.
In addition, a variety of services were centrally managed. The authors’ proposed architecture covers the distributed types of WoT services. Clustering, indexing, and ranking are the three filtering services included in the design, and each of them is supported by semantic annotation. To create this architecture, WoT gateways are used to provide mobility for IoT gateways and to enhance the clustering method dynamically.
Effendi et al. (2018) noted that Petri nets are currently used to model a variety of systems because of their efficiency as a modelling tool. Uncertainty, concurrency, and synchronization can all be represented with this tool. Because Petri nets are so frequently used, a number of researchers have devoted attention to them. Reusability, sharing, and standardization should therefore all be supported in Petri net design. Semantic Web technologies, of which ontologies are an example, are frequently employed to implement this notion. The authors developed a formal way of representing business tasks using Petri nets and OWL DL.
Ontopark framework
This method uses RDF ontologies to rank pages in the semantic web. The approach proposed is an extension of the classic information retrieval vector space paradigm. Using an ontology and Semantic Web technologies, meaningful information may be retrieved from the Internet. RDF knowledge bases can help you find solutions to difficult questions in a matter of minutes. During the ranking procedure, the keywords from the refined question are mapped to the RDF entities in the knowledge base that was created specifically for that query. The suggested design eliminates the need to rank pages with a PageRank of zero. Semantic annotations added to RDF files enable semantics-based retrieval of information.
Methodology
The three stages of the framework’s operation are preprocessing, ontology construction, and ranking.
Figure 2 depicts the ONTOPARK framework.
Figure 2. ONTOPARK framework design.
Preprocessing
At this stage, the framework takes the user’s query and retrieves the first 30 Web pages that Google has determined are most relevant to it. A site’s PageRank can be viewed by installing Google’s toolbar or by using a PageRank checking tool such as https://www.prchecker.info/. If a Web page’s PageRank value is zero, the page is assumed to contain no useful information and is ignored, even if it does contain pertinent information. Preprocessing methods such as stopword removal, stemming, and POS-tagging (Parts-of-Speech tagging) are then applied to the content of all Web pages with a PageRank higher than 0.
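The filtering and stopword-removal steps described above can be sketched as follows. This is a minimal illustration, not the authors’ implementation: the page records, the example URLs, and the small stopword list are hypothetical, and a real system would obtain PageRank values from an external tool.

```python
# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "the", "is", "of", "in", "and", "to", "what"}


def filter_pages(pages):
    """Keep only pages whose PageRank is greater than zero."""
    return [page for page in pages if page["pagerank"] > 0]


def remove_stopwords(text):
    """Lowercase the text, split on whitespace, strip punctuation, drop stopwords."""
    tokens = (token.strip(".,;:!?\"'()") for token in text.lower().split())
    return [token for token in tokens if token and token not in STOPWORDS]


# Hypothetical page records standing in for the retrieved Web pages.
pages = [
    {"url": "http://example.org/a", "pagerank": 5,
     "text": "Data mining is the analysis of data."},
    {"url": "http://example.org/b", "pagerank": 0,
     "text": "Unranked page."},
]

kept = filter_pages(pages)
tokens = remove_stopwords(kept[0]["text"])
```

Only the first page survives the filter, and its text is reduced to content-bearing tokens that can then be passed to stemming and POS-tagging.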
Some of the suffix-stripping rules are:
• If the word ends in ‘ed’, remove ‘ed’.
• If the word ends in ‘ing’, remove ‘ing’.
• If the word ends in ‘ly’, remove ‘ly’.
The Porter stemmer is used for stemming. The Porter stemming algorithm removes the morphological and inflectional endings from English words. The Porter stemmer has an official website, and an ANSI C implementation of the algorithm is also available on the Web. Stemming is applied to both the search phrases and the Web page content.
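The three suffix rules quoted above can be illustrated with a few lines of code. This sketch is deliberately much simpler than the Porter stemmer, which applies many more rules and conditions; the minimum-length guard is an assumption added here so that short words such as “red” are left untouched.

```python
# Suffixes taken from the three example rules; checked in this order.
SUFFIXES = ("ed", "ing", "ly")


def strip_suffix(word):
    """Remove the first matching suffix, leaving at least three characters.

    The length guard is an assumption of this sketch, not part of the
    rules quoted in the text.
    """
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


stems = [strip_suffix(w) for w in ["ranked", "mining", "quickly", "red"]]
```

Applying the same function to both the query terms and the page content ensures that, for example, “mining” in a query and “mining” on a page reduce to the same stem.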
Word     Tag
cook     verb (noun)
meat     noun
in       preposition (noun, adverb)
a        determiner (noun)
big      adjective (noun)
vessel   noun
This knowledge can limit a word’s range of alternative meanings. For example, WordNet 2.0 lists three noun senses, seven adjective senses, and one adverb sense for the word “bitter”. There is no room for misinterpretation when bitter is used as a verb (Roy et al., 2019).
For POS-tagging, the ONTOPARK system uses the Stanford POS tagger.
A Java-based implementation of the Stanford POS tagger is available on the Web. The refined query is used as the input for the next stage.
Ontology construction
The ONTOPARK architecture depends heavily on ontology building. Once the query and the content of Web pages with non-zero PageRank have been preprocessed, an RDF knowledge base is created for each query. RDF ontologies are developed and used to annotate the preprocessed Web pages in RDF files.
The RDF knowledge base that is pertinent to that query is made up of all of these annotated RDF files. The ranking mechanism will then employ this RDF knowledge base.
Resource description framework (RDF)
According to Manuja and Garg (2011), the Resource Description Framework (RDF), developed by the World Wide Web Consortium, was originally designed as a metadata data model. As a foundation for processing metadata, RDF provides interoperability between applications that exchange machine-understandable information. Information about documents and other machine-accessible resources can be stored as metadata. There are three kinds of entities in RDF documents:
• A resource is anything that can be identified, whether or not it is part of the WWW itself, such as a Web page, a collection of Web sites, or even a physical object. All resources are identified by their URIs in RDF.
• Properties are the specific features, characteristics, or relations that describe a resource.
• Statements are (Resource, Property, Value) triples; each statement consists of one such triple.
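To make the (Resource, Property, Value) structure concrete, the following sketch stores a few statements as plain tuples and prints them in the N-Triples line format. It is illustrative only: the URIs and property names are hypothetical, no RDF library is used, and the serializer assumes simple literal values that need no escaping.

```python
from collections import namedtuple

# One RDF statement: a resource, one of its properties, and the value.
Triple = namedtuple("Triple", ["resource", "prop", "value"])

# Hypothetical annotations for two Web pages.
triples = [
    Triple("http://example.org/page1", "http://example.org/terms/topic", "data mining"),
    Triple("http://example.org/page1", "http://example.org/terms/relatedTo", "knowledge discovery"),
    Triple("http://example.org/page2", "http://example.org/terms/topic", "web mining"),
]


def describe(resource, statements):
    """Return the set of (property, value) pairs stated about one resource."""
    return {(t.prop, t.value) for t in statements if t.resource == resource}


def to_ntriples_line(t):
    """Serialize a statement with a plain literal value as one N-Triples line."""
    return f'<{t.resource}> <{t.prop}> "{t.value}" .'


for t in triples:
    print(to_ntriples_line(t))
```

Each printed line corresponds to one edge of an RDF graph, running from the resource node to the value node and labelled with the property.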
As illustrated in Figures 3 and 4, an RDF graph can be employed to visually represent RDF triples, highlighting the relationships between subjects, predicates, and objects.
RDF graph example.
ONTOPARK’s algorithm.

Ranking
We assign ranks using a model based on the vector space approach to information retrieval. In the vector space technique, query term weights are calculated by counting how often each word occurs in the pages of the Web database. The proposed approach instead computes a term weight from each occurrence of a query word in an RDF file, together with each occurrence of any semantically related term in the RDF knowledge base. We adapt the TF-IDF technique, where TF stands for Term Frequency and IDF for Inverse Document Frequency, to compute the term weight. Using this term weight, each RDF file in the knowledge base is assigned a relevance score based on how closely it fits the query. The relevance score establishes the ranking of the results.
Let the knowledge base K consist of the RDF files r_1, r_2, ..., r_m. The framework accepts a query Q = {x_1, x_2, ..., x_n} with n terms in it and returns a list of the top n documents. The weight W(x, r) of a term x in an RDF file r is computed as
$$W(x, r) = \mathrm{tf}(x, r) \times \mathrm{idf}(x, K)$$
(where tf(x, r) denotes the term frequency and idf(x, K) denotes the inverse document frequency).
The term frequency tf(x, r) in an RDF file r includes terms that are semantically related to x. It is computed as
$$\mathrm{tf}(x, r) = \mathrm{freq}(x, r) + \mathrm{freq}(s(x), r)$$
(where freq(x, r) denotes the number of occurrences of the term x in r and freq(s(x), r) denotes the frequency of the semantically related terms of x that appear in r).
The document frequency df(x, K) is the number of RDF files in K that contain x. A term that appears in most RDF files is unlikely to be significant or discriminating. The inverse document frequency, denoted idf(x, K), is therefore
$$\mathrm{idf}(x, K) = \log \frac{|K| + 1}{\mathrm{df}(x, K)}$$
(where |K| = m is the size of the knowledge base K and df(x, K) is the number of RDF files in knowledge base K annotated with x).
The documents are ranked according to a relevance score, which is the relevance of an RDF file r to the query Q. The relevance score Score(Q, r) is computed as
$$\mathrm{Score}(Q, r) = \sum_{x \in Q} W(x, r)$$
(where ΣW(x, r) denotes the sum of the weights of all the query terms).
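The weighting and scoring steps described above can be sketched as follows. This is a minimal illustration rather than the authors’ implementation: it assumes that the term weight is the usual TF-IDF product of term frequency and inverse document frequency, that the logarithm is base 10 (consistent with the value 0.0158 in the worked illustration), and that each RDF file can be represented as a list of its annotated terms. The names `kb` and `related` and the sample data are hypothetical.

```python
import math


def tf(term, doc, related):
    """freq(x, r) + freq(s(x), r): the term plus its semantically related terms."""
    count = doc.count(term)
    count += sum(doc.count(s) for s in related.get(term, ()))
    return count


def idf(term, kb):
    """log10((|K| + 1) / df(x, K)); returns 0.0 if no file contains the term
    (that fallback is an assumption of this sketch)."""
    df = sum(1 for doc in kb.values() if term in doc)
    return math.log10((len(kb) + 1) / df) if df else 0.0


def score(query, doc, kb, related):
    """Score(Q, r): sum over query terms of tf * idf (assumed weight form)."""
    return sum(tf(x, doc, related) * idf(x, kb) for x in query)


def rank(query, kb, related):
    """Return file names ordered by decreasing relevance score."""
    return sorted(kb, key=lambda name: score(query, kb[name], kb, related),
                  reverse=True)


# Hypothetical knowledge base: file name -> annotated terms.
kb = {
    "r1": ["mining", "data", "mining", "knowledge"],
    "r2": ["data"],
    "r3": ["web"],
}
related = {"mining": ["knowledge"]}  # hypothetical semantic relation
query = ["data", "mining"]
ranking = rank(query, kb, related)
```

In this toy knowledge base, r1 ranks first because it contains both query terms and a term semantically related to “mining”, while r3 contains neither and scores zero.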
Illustration of Page Rank Calculation
Consider the following query: “What is Data Mining?” For the preprocessed query “data mining”, Google returned (January 2014) a list of Web pages and their PageRank, as shown in Table 1. The Inverse Document Frequency of the query term is determined using the formula given in the previous section: IDF = log((|K| + 1) / df(x, K)) = log(28/27) = log(1.0370) = 0.0158. (Here |K| = 27 because only 27 Web pages have non-zero PageRank, and df(x, K) = 27 because all 27 RDF files contain the query term x.)
Table 1. List of web pages retrieved for the query ‘data mining’.
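The arithmetic in this illustration can be checked directly. The short sketch below assumes a base-10 logarithm, which is the base that reproduces the reported value; the variable names are illustrative.

```python
import math

k_size = 27          # Web pages with non-zero PageRank
doc_frequency = 27   # RDF files that contain the query term

idf_value = math.log10((k_size + 1) / doc_frequency)
print(round(idf_value, 4))  # 0.0158
```

A natural logarithm would instead give approximately 0.0364, so the figure of 0.0158 indicates that base 10 is intended.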
Add the frequency of each term and any semantically related terms that are annotated to each RDF file to determine Term Frequency (TF) using the formula (3.3). The following equations are used to determine the relevance of URLs 1 and 2:
The same process is applied to all remaining URLs. URLs with a PageRank of 0 are not processed, so the TF of each word, the weight, and the resulting relevance score are all zero for them. Finally, the URLs are ordered by decreasing relevance score. Pages with the same relevance score are ordered by the term frequency of the most important query term, where the most important query term is the one for which the sum of term frequencies is lowest.
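One possible reading of this ordering rule is sketched below. It is an interpretation, not the authors’ code: it assumes that ties are broken in favour of the page with the higher frequency of the most important query term, and all URLs, frequencies, and scores are hypothetical.

```python
def most_important_term(query, tf_table):
    """Query term whose term frequency, summed over all pages, is lowest."""
    return min(query, key=lambda x: sum(row[x] for row in tf_table.values()))


def order_results(query, tf_table, scores):
    """Sort by decreasing score; break ties by the key term's frequency."""
    key_term = most_important_term(query, tf_table)
    return sorted(scores,
                  key=lambda url: (-scores[url], -tf_table[url][key_term]))


# Hypothetical term frequencies and relevance scores for three URLs.
tf_table = {
    "u1": {"data": 5, "mining": 1},
    "u2": {"data": 3, "mining": 3},
    "u3": {"data": 9, "mining": 2},
}
scores = {"u1": 0.09, "u2": 0.09, "u3": 0.17}
ordered = order_results(["data", "mining"], tf_table, scores)
```

Here “mining” has the lowest summed frequency and is therefore treated as the most important term, so u2 is placed ahead of u1 even though their relevance scores are equal.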
Page rank calculation for the query ‘data mining’.
ONTOPARK algorithm
This section presents the ONTOPARK algorithm, which gives the pseudocode of the ONTOPARK framework. The algorithm returns the most relevant and highest-ranked results for the user’s query.
Algorithm: ONTOPARK
Input: Query
Output: Ranked Web pages relevant to the query
The recommended method improves accuracy by removing unnecessary Web sites with PageRank values of 0 before computing the rank.
Building an RDF knowledge base assists in the retrieval of semantic data.
It eliminates the retrieval of ambiguous results for keywords with multiple semantics.
The framework generates a PageRank value that is unique to it.
The framework is query-dependent and is concerned with the semantics of keywords.
Therefore, ONTOPARK’s precision is satisfactory.
There is no possibility of link-spamming as the proposed framework is not link-based.
Evaluation measures
ONTOPARK is evaluated using three metrics: Precision, Relative Recall, and F-measure. Precision assesses accuracy: it is the proportion of retrieved Web pages that are relevant. Relative Recall compares the number of relevant Web pages found by one system with the total number of relevant Web pages found by ONTOPARK and Google together. The AP/AR (Average Precision/Average Relative Recall) scores are calculated by averaging the precision and relative recall values over all single-word and multi-word queries. The MAP/MAR (Mean Average Precision/Mean Average Relative Recall) is computed for single-word and multi-word queries.
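These measures can be sketched in a few lines. The definitions below follow the common usage of the terms (relative recall as the relevant pages found by one system divided by the relevant pages found by all compared systems, and F-measure as the harmonic mean of precision and recall); they are assumptions about the evaluation, and the function names are illustrative.

```python
def precision(relevant_retrieved, total_retrieved):
    """Fraction of retrieved pages that are relevant."""
    return relevant_retrieved / total_retrieved if total_retrieved else 0.0


def relative_recall(relevant_by_system, relevant_by_all_systems):
    """Relevant pages found by one system over those found by all systems."""
    if not relevant_by_all_systems:
        return 0.0
    return relevant_by_system / relevant_by_all_systems


def f_measure(p, r):
    """Harmonic mean of precision and (relative) recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def mean(values):
    """Average of per-query values, as used for the AP/AR and MAP/MAR scores."""
    return sum(values) / len(values) if values else 0.0


# Hypothetical counts for a single query.
p = precision(3, 4)
r = relative_recall(3, 6)
f = f_measure(p, r)
```

Averaging `p` and `r` over a set of queries with `mean` gives the average precision and average relative recall for that set.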
Evaluation measures of ONTOPARK.
Using Google to find information on the Internet has become commonplace. However, Google’s PageRank algorithm has a few issues that cause it to return many irrelevant results. To provide users with more precise results, Google’s search results are re-ranked using an ontology-based framework called ONTOPARK for semantic Web structure mining. Performance is also evaluated using measures such as the F-measure, and the results of the study are compared with those of the search engine. The proposed framework achieves high precision. For multi-word queries, ONTOPARK’s F-measure is better than that of Google.
Conclusion
An ontology is a formal, explicit specification of a shared conceptualization. It is explicit in that the concepts, attributes, relations, functions, axioms, and constraints that make up the system are all explicitly stated; these terms can be used to describe what is known about the subject. It is formal in the sense that it can be read and understood by computers. It is a conceptualization in the sense that it is a simplified representation of the subject matter; as a result, translating ontology concepts from English to French has no effect on the ontology’s conceptual structure. It is shared in that a panel of specialists has agreed on the material before it is disseminated. An ontology establishes a standard terminology for scientists working in the same field. It defines the basic concepts of the domain, and the relationships between them, in a way that computers can understand. Because of these advantages, ontologies are commonly employed in fields such as information retrieval and artificial intelligence. Ontology has the following advantages:
• Allows for the exchange of information: people or software agents can share a common notion of information structure.
• Enables the reuse of domain knowledge.
• Makes assumptions about the domain clear.
• Distinguishes between “domain” and “operational” knowledge.
• Supports the analysis of domain knowledge.
• Serves as a symbol of one’s values and aspirations.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
