Dataset and source code

Book classification: (895 books x 10 workers, 101 classes)
Business classification: (388 companies x 10 workers, 273 classes)
Steps and GLAD code

File format

labels.csv
- Labels annotated by crowd workers
- Each row includes task ID (integer), worker’s name (string) and class ID (integer) given by the worker to the task
class_hierarchy.csv
- Hierarchical relationships among target classes
- Each row includes class ID (integer and string) and its parent’s class ID (string) if it exists
true_label.csv
- Correct labels derived from existing databases (NDLC for book classification and Teikoku DataBank for business classification)
- Some tasks may have multiple correct labels
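
Example: reading the files

The file descriptions above can be turned into a small loading sketch. The Python below is illustrative only and is not part of the released source code. It assumes comma-separated rows with no header line, columns in the order listed above, and an empty parent field for root classes; none of these details are confirmed here, so check them against the actual files. The function names (read_labels, read_parents, ancestors, majority_vote) are hypothetical, and the majority vote is a simple baseline, not the method from the paper. The toy strings stand in for the real files.

```python
import csv
import io
from collections import Counter, defaultdict

# Toy rows standing in for labels.csv: task ID, worker name, class ID.
# ASSUMPTION: comma-delimited, no header row, columns in this order.
LABELS_CSV = "1,alice,10\n1,bob,10\n1,carol,12\n2,alice,11\n2,bob,11\n"

# Toy rows standing in for class_hierarchy.csv: class ID, parent class ID.
# ASSUMPTION: the parent field is empty (or missing) for a root class.
HIERARCHY_CSV = "1,\n10,1\n11,1\n12,10\n"


def read_labels(fileobj):
    """Group worker labels by task: {task_id: [(worker, class_id), ...]}."""
    by_task = defaultdict(list)
    for task_id, worker, class_id in csv.reader(fileobj):
        by_task[int(task_id)].append((worker, int(class_id)))
    return dict(by_task)


def read_parents(fileobj):
    """Map each class ID (kept as a string) to its parent ID, or None."""
    parents = {}
    for row in csv.reader(fileobj):
        parent = row[1] if len(row) > 1 and row[1] else None
        parents[row[0]] = parent
    return parents


def ancestors(class_id, parents):
    """List the ancestors of a class, nearest first (assumes no cycles)."""
    chain = []
    node = parents.get(class_id)
    while node is not None:
        chain.append(node)
        node = parents.get(node)
    return chain


def majority_vote(by_task):
    """Baseline aggregation: the most frequent class ID per task."""
    return {
        task: Counter(c for _, c in votes).most_common(1)[0][0]
        for task, votes in by_task.items()
    }


labels = read_labels(io.StringIO(LABELS_CSV))
parents = read_parents(io.StringIO(HIERARCHY_CSV))
votes = majority_vote(labels)
```

To use the real files, replace io.StringIO(...) with open("labels.csv", newline="") and open("class_hierarchy.csv", newline=""). Because some tasks have multiple correct labels, a prediction should be compared against the set of correct labels for a task rather than a single value.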

Reference

If you use our dataset in your papers, please cite the following paper:
N. Otani, Y. Baba, H. Kashima, “Quality Control for Crowdsourced Hierarchical Classification”, in Proceedings of the 2015 IEEE International Conference on Data Mining (ICDM), 2015, pp. 937-942.

Contact

Naoki Otani, Kyoto University.

Machine Learning and Data Mining Research Laboratory

Graduate School of Informatics, Kyoto University

Dataset for “Quality Control for Crowdsourced Hierarchical Classification”
