Dataset and source code
- Book classification: (895 books x 10 workers, 101 classes)
- Business classification: (388 companies x 10 workers, 273 classes)
- Source code for Steps GLAD
File format
- labels.csv
  - Labels annotated by crowd workers
  - Each row includes the task ID (integer), the worker’s name (string), and the class ID (integer) given by the worker to the task
- class_hierarchy.csv
  - Hierarchical relationships among target classes
  - Each row includes a class ID (integer and string) and its parent’s class ID (string) if it exists
- true_label.csv
  - Correct labels derived from existing databases (NDLC for book classification and Teikoku DataBank for business classification)
  - Some tasks may have multiple correct labels
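Example: reading labels.csv
The following is a minimal sketch of how labels.csv could be loaded, not part of the released source code. It assumes that the columns appear in the order described above (task ID, worker's name, class ID), that fields are comma-separated, and that any row whose first field is not a number (such as a header line) can be skipped. The function name `read_labels` and the sample rows are hypothetical and used only for illustration.

```python
import csv
import io
from collections import defaultdict


def read_labels(handle):
    """Group crowd labels by task: {task_id: [(worker, class_id), ...]}.

    Assumes the column order task ID, worker name, class ID.
    """
    labels_by_task = defaultdict(list)
    for row in csv.reader(handle):
        # Skip blank lines and any row whose first field is not a number
        # (for example, a header line, if the file has one).
        if not row or not row[0].strip().isdigit():
            continue
        task_id, worker, class_id = row[0], row[1], row[2]
        labels_by_task[int(task_id)].append((worker.strip(), int(class_id)))
    return dict(labels_by_task)


# Made-up rows in the assumed layout; these are not taken from the dataset.
sample = io.StringIO("1,worker_a,12\n1,worker_b,12\n2,worker_a,7\n")
labels = read_labels(sample)
```

With the real file, the same function could be called as `read_labels(open("labels.csv", newline="", encoding="utf-8"))`; the encoding is also an assumption and may need to be adjusted.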
Reference
If you use our dataset in your papers, please cite the following paper:
N. Otani, Y. Baba, H. Kashima, “Quality Control for Crowdsourced Hierarchical Classification”, in Proceedings of the 2015 IEEE International Conference on Data Mining (ICDM), 2015, pp. 937-942.
Contact
Naoki Otani, Kyoto University.