TY - GEN
T1 - Türkçe Haber Metinleri için Oylama Temelli Çoklu Sınıflandırma Yaklaşımı
AU - Buluz, Basak
AU - Komecoglu, Yavuz
AU - Kizrak, Merve Ayyuce
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/10
Y1 - 2019/10
AB - Numerous sources on the internet now produce news on a daily basis. This growing knowledge base makes it difficult for users to find the information and news they are looking for, so classifying the information is important for fast and efficient search and access. In this study, the Kemik dataset of Turkish news content, prepared by the Yildiz Technical University Natural Language Processing Group, is used. A hierarchical approach based on a voting structure is adopted, built on machine learning methods. First, the Tf-Idf method is applied to word 1-3-grams and character 2-6-grams, which yields a 2000-dimensional feature vector. FastText is then used to obtain 300-dimensional feature vectors, and the two feature vectors are combined to produce 2300-dimensional feature vectors. To determine which of these vectors gives the higher classification accuracy, the Support Vector Machines method is applied, and Tf-Idf, which gives robust accuracy, is selected as the main feature extraction method. Next, Support Vector Machines, K-Nearest Neighbors, Random Forest, Logistic Regression, and XGBoost are used to classify the news texts. The labels predicted by all classifiers are put to a vote for each sample, and the label with the highest share of votes is taken as the final prediction. The aim of the study is to make it easier to reach the right information quickly by classifying news topics. Finally, the feature vector size is reduced using Principal Component Analysis, which makes it possible to gain processing speed without reducing performance. In both approaches, the performance achieved by voting is higher than the individual performance of the classifiers.
KW - dimension reduction
KW - majority voting
KW - natural language processing
KW - support vector machines
KW - text classification
UR - http://www.scopus.com/inward/record.url?scp=85078343716&partnerID=8YFLogxK
U2 - 10.1109/ASYU48272.2019.8946333
DO - 10.1109/ASYU48272.2019.8946333
M3 - Conference contribution
AN - SCOPUS:85078343716
T3 - Proceedings - 2019 Innovations in Intelligent Systems and Applications Conference, ASYU 2019
BT - Proceedings - 2019 Innovations in Intelligent Systems and Applications Conference, ASYU 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 Innovations in Intelligent Systems and Applications Conference, ASYU 2019
Y2 - 31 October 2019 through 2 November 2019
ER -