Türkçe Haber Metinleri için Oylama Temelli Çoklu Sınıflandırma Yaklaşımı

Translated title of the contribution: Voting-Based Multiple Classification Approach for Turkish News Texts

Basak Buluz, Yavuz Komecoglu, Merve Ayyuce Kizrak

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Nowadays, numerous sources on the internet produce news every day. This growing body of information makes it difficult for users to find the information and news they are looking for, so classifying it is important for fast and efficient search and access. In this study, the Kemik dataset of Turkish news content, prepared by the Natural Language Processing Group of Yildiz Technical University, is used, and a hierarchical approach based on a voting structure is built from machine learning methods. First, the TF-IDF method is applied to word 1- to 3-grams and character 2- to 6-grams, yielding a 2000-dimensional feature vector. Then 300-dimensional feature vectors are obtained from pre-trained FastText embeddings, and the two feature vectors are combined into a 2300-dimensional feature vector. To determine which of these representations gives the higher classification accuracy, a Support Vector Machine is applied to each, and TF-IDF, which yields the most robust accuracy, is chosen as the main feature extraction method. Next, Support Vector Machines, K-Nearest Neighbors, Random Forest, Logistic Regression, and XGBoost are used to classify the news texts. For each sample, the labels predicted by all classifiers are put to a vote, and the label with the most votes is taken as the final prediction. By classifying news topics, the study aims to help users reach the right information quickly. Finally, the feature vector size is reduced using Principal Component Analysis, which increases processing speed without reducing performance. In both approaches, the performance achieved by voting is higher than that of each individual classifier.
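The TF-IDF step described above (word 1- to 3-grams plus character 2- to 6-grams, capped at 2000 dimensions) can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' code: the helper names are hypothetical, the vocabulary is assumed to keep the most frequent terms across the corpus (the abstract does not say how the 2000 dimensions were selected), and the smoothed IDF form is an assumption borrowed from scikit-learn's default.

```python
import math
from collections import Counter


def word_ngrams(text, n_min=1, n_max=3):
    """Word n-grams for n in [n_min, n_max] after lowercasing and whitespace splitting.

    Note: str.lower() is not Turkish-locale aware ('I' becomes 'i', not dotless 'ı').
    """
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def char_ngrams(text, n_min=2, n_max=6):
    """Character n-grams for n in [n_min, n_max] over the lowercased text."""
    text = text.lower()
    return [
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    ]


def extract_terms(text):
    # Prefix word and character features so they occupy separate dimensions.
    return ["w:" + g for g in word_ngrams(text)] + ["c:" + g for g in char_ngrams(text)]


def fit_vocabulary(docs, max_features=2000):
    """Keep the max_features most frequent terms (assumed selection rule) and their IDF."""
    totals, doc_freq = Counter(), Counter()
    for doc in docs:
        terms = extract_terms(doc)
        totals.update(terms)
        doc_freq.update(set(terms))
    # Descending corpus count, then alphabetical order for deterministic ties.
    vocab = sorted(totals, key=lambda t: (-totals[t], t))[:max_features]
    n_docs = len(docs)
    # Smoothed IDF: ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    return vocab, idf


def tfidf_vector(doc, vocab, idf):
    """Raw term count times IDF, then L2-normalised."""
    counts = Counter(extract_terms(doc))
    vec = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


docs = ["ekonomi faiz kararı", "spor maç sonucu", "ekonomi borsa rekor"]
vocab, idf = fit_vocabulary(docs, max_features=2000)
X = [tfidf_vector(d, vocab, idf) for d in docs]
print(len(vocab), len(X[0]))
```

On a real corpus the vocabulary would fill all 2000 slots; on this toy input it is smaller. The combined representation in the abstract is plain concatenation, so appending a 300-dimensional FastText document vector (obtained separately, not shown here) to a 2000-dimensional TF-IDF vector gives 2000 + 300 = 2300 dimensions.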
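The voting step, in which each classifier's predicted label counts as one vote and the most common label wins, amounts to hard majority voting. A minimal sketch follows. The function name is hypothetical, the labels are toy values rather than the dataset's categories, and because the abstract does not state a tie-breaking rule, ties here are assumed to go to the label predicted by the earliest classifier in the list.

```python
from collections import Counter


def hard_vote(predictions_per_classifier):
    """Majority (hard) vote over per-classifier label lists aligned on the same samples.

    Tie-breaking (an assumption): the first label, in classifier order,
    that reaches the highest vote count wins.
    """
    n_samples = len(predictions_per_classifier[0])
    if any(len(p) != n_samples for p in predictions_per_classifier):
        raise ValueError("all classifiers must predict the same number of samples")
    final = []
    for i in range(n_samples):
        votes = [p[i] for p in predictions_per_classifier]
        counts = Counter(votes)
        best = max(counts.values())
        final.append(next(label for label in votes if counts[label] == best))
    return final


# Toy predictions from the five classifier types named in the abstract.
preds = {
    "svm":    ["economy", "sports", "health"],
    "knn":    ["economy", "magazine", "health"],
    "rf":     ["politics", "sports", "health"],
    "logreg": ["economy", "sports", "technology"],
    "xgb":    ["economy", "magazine", "health"],
}
print(hard_vote(list(preds.values())))  # ['economy', 'sports', 'health']
```

With an odd number of classifiers, a two-way tie cannot occur, but ties among three or more labels still can, which is why an explicit rule is included.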

Original language: Turkish
Title of host publication: Proceedings - 2019 Innovations in Intelligent Systems and Applications Conference, ASYU 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728128689
Publication status: Published - Oct 2019
Externally published: Yes
Event: 2019 Innovations in Intelligent Systems and Applications Conference, ASYU 2019 - Izmir, Turkey
Duration: 31 Oct 2019 to 2 Nov 2019

Publication series

Name: Proceedings - 2019 Innovations in Intelligent Systems and Applications Conference, ASYU 2019

Conference

Conference: 2019 Innovations in Intelligent Systems and Applications Conference, ASYU 2019
Country/Territory: Turkey
City: Izmir
Period: 31/10/19 to 2/11/19

