S D Samantaray, Geetika Jodhani
Abstract: In this era of digitization, most people get their news from the internet, and it is often difficult to tell whether a story is credible. Information overload and a general lack of understanding of how the internet works have also contributed to an increase in fake news and hoax stories. Traditionally, news came from trusted sources: journalists and media outlets that are required to follow strict codes of practice. The internet, however, has enabled a completely new way to publish, share and consume information and news, with very little regulation or editorial oversight. Our aim is to develop an automatic fake news detection system that analyzes the credibility of online news, so that readers become aware of news that is factually incorrect and optimized for sharing. Since a news article is essentially a piece of text, the proposed work is divided into two subtasks: Text Analysis and Performance Evaluation. Text analysis transforms text into numerical features, which are then used to measure the similarity between the queried article and other articles. For article similarity we use a hybrid of three text similarity approaches, namely N-gram (Character Based Similarity), TF*IDF (Term Based Similarity) and Cosine Similarity (Corpus Based Similarity). The system was tested on 100 news articles, and the analysis indicates that an input article can be regarded as truthful when more than three articles are found to be similar to it with a matching value of 0.70.
Keywords: Fake news, N-grams, TF*IDF, cosine similarity, character-based similarity, corpus-based similarity, term-based similarity, matching value
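
As an illustration of the similarity check summarized in the abstract, the following is a minimal Python sketch, not the implementation evaluated in this work. The abstract does not specify how the three measures are combined, so the sketch assumes one plausible arrangement: character trigrams serve as terms, the terms are weighted by TF*IDF with a smoothed IDF, and the weighted vectors are compared by cosine similarity. The function names (`char_ngrams`, `tfidf_vectors`, `cosine`, `is_credible`), the trigram size, and the reading of the 0.70 matching value as "at least 0.70" are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased, whitespace-normalized text."""
    s = " ".join(text.lower().split())
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def tfidf_vectors(docs, n=3):
    """Build one sparse TF*IDF vector (a dict) per document over character n-grams."""
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter(g for c in counts for g in c)
    total = len(docs)
    vectors = []
    for c in counts:
        length = sum(c.values()) or 1
        # Assumption: smoothed IDF, so n-grams shared by all documents keep a nonzero weight.
        vectors.append({
            g: (k / length) * (math.log((1 + total) / (1 + df[g])) + 1)
            for g, k in c.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def is_credible(query, references, threshold=0.70, min_matches=3, n=3):
    """Return True if more than `min_matches` references reach the matching value."""
    vectors = tfidf_vectors([query] + references, n)
    q, refs = vectors[0], vectors[1:]
    matches = sum(1 for r in refs if cosine(q, r) >= threshold)
    return matches > min_matches
```

Under these assumptions, `is_credible(article, related_articles)` returns `True` only when at least four of the compared articles reach the 0.70 matching value, which mirrors the "more than three articles" rule stated in the abstract.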