FUZZY DOCUMENT REPRESENTATION FOR SEARCH DIVERSIFICATION
Abstract
Fuzzy document representation involves transforming unstructured data into numerical vectors. Such a representation is well suited to text classification and document clustering. The proposed Fuzzy Conceptualization Model (FCM) performs conceptualization and provides a better data representation model based on the semantic relatedness and similarity between terms in a word corpus. Word embeddings are used to group semantically related words into concept clusters. The concept clusters are inferred and vectorized for the given corpus to hold the data in a multidimensional space. FCM determines the fuzzy membership value of a base term by calculating the affinity score between its word embedding and the other word embeddings. A weighting scheme is used to distinguish between exact and approximate matches. The greatest bound for the distribution of the base set over the documents gives the best-matched documents for a search query. The exact and approximate matches are differentiated by considering the normalized term frequency of a term in the specified concept cluster along with its actual presence. The resultant matrix gives a lower-dimensional and discriminative representation of the data.
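The scoring idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the affinity score or the weights, so the sketch assumes cosine similarity (clipped to [0, 1]) as the affinity and membership value, and uses two hypothetical weights, `exact_weight` and `approx_weight`, to separate exact from approximate matches. All function and parameter names are illustrative.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def membership(base_term, cluster_terms, embeddings):
    """Fuzzy membership of each cluster term with respect to the base term.

    Assumption: affinity is cosine similarity, clipped below at 0 so the
    value stays in [0, 1].
    """
    base = embeddings[base_term]
    return {t: max(0.0, cosine(base, embeddings[t])) for t in cluster_terms}


def concept_score(doc_tokens, base_term, cluster_terms, embeddings,
                  exact_weight=1.0, approx_weight=0.5):
    """Score a document against one concept cluster.

    Each cluster term contributes weight * membership * normalized term
    frequency; the base term itself counts as an exact match, every other
    cluster term as an approximate match (hypothetical weights).
    """
    total = len(doc_tokens)
    if total == 0:
        return 0.0
    counts = Counter(doc_tokens)
    score = 0.0
    for term, mu in membership(base_term, cluster_terms, embeddings).items():
        tf = counts[term] / total  # normalized term frequency in the document
        weight = exact_weight if term == base_term else approx_weight
        score += weight * mu * tf
    return score


# Toy 2-D embeddings, invented purely for illustration.
toy_embeddings = {"car": (1.0, 0.0), "automobile": (0.8, 0.6)}
toy_cluster = ["car", "automobile"]

exact_doc = ["car", "drives", "fast", "car"]
approx_doc = ["automobile", "drives", "fast", "today"]

exact_score = concept_score(exact_doc, "car", toy_cluster, toy_embeddings)
approx_score = concept_score(approx_doc, "car", toy_cluster, toy_embeddings)
```

Under these assumptions, a document that contains the base term itself scores higher than one that contains only a semantically related term, which is the exact-versus-approximate distinction the abstract describes. Stacking such scores over all documents and concept clusters would give a document-by-concept matrix with far fewer columns than a term-based representation.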
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0), you are free to:
- Share — copy and redistribute the material in any medium or format
- The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NonCommercial — You may not use the material for commercial purposes.
- NoDerivatives — If you remix, transform, or build upon the material, you may not distribute the modified material.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
Notices:
You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.
Rights of Authors
Authors retain the following rights:
1. copyright and other proprietary rights relating to the article, such as patent rights,
2. the right to use the substance of the article in future works, including lectures and books,
3. the right to reproduce the article for their own purposes, provided the copies are not offered for sale,
4. the right to self-archive the article.