TY - GEN
T1 - Veri Madenciliği Teknikleriyle Türkçe Web Sayfalarının Kategorize Edilmesi
AU - Hüsem, Seçil Şekerci
AU - Gülcü, Ayla
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/10/31
Y1 - 2017/10/31
AB - Manual effort alone can no longer keep up with the growing volume of data. Automated methods are therefore needed to group similar documents together or to assign documents to predefined categories according to certain rules, which makes automated classification techniques increasingly important. In this study, a database of 22,000 samples was built to address the shortage of Turkish data, and text classification methods from the literature were tested on it. Multinomial Naive Bayes (M-NB) and Support Vector Machines (SVM), two algorithms frequently used for text classification, were compared with n-gram word-vector selection and information gain ratio applied. In addition, the study examines how the number of categories, the content of the training data, and the completeness of that data affect classification success.
KW - Data mining
KW - Naive Bayes
KW - Support Vector Machines
KW - Text classification
UR - https://www.scopus.com/pages/publications/85040626842
U2 - 10.1109/UBMK.2017.8093385
DO - 10.1109/UBMK.2017.8093385
M3 - Conference contribution
AN - SCOPUS:85040626842
T3 - 2nd International Conference on Computer Science and Engineering, UBMK 2017
SP - 255
EP - 260
BT - 2nd International Conference on Computer Science and Engineering, UBMK 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2nd International Conference on Computer Science and Engineering, UBMK 2017
Y2 - 5 October 2017 through 8 October 2017
ER -