Dataset for “Quality Control for Crowdsourced Hierarchical Classification”

Dataset and source code

File format

  • labels.csv
    • Labels annotated by crowd workers
    • Each row includes task ID (integer), worker’s name (string) and class ID (integer) given by the worker to the task
  • class_hierarchy.csv
    • Hierarchical relationships among target classes
    • Each row includes class ID (integer and string) and its parent’s class ID (string) if it exists
  • true_label.csv
    • Correct labels derived from existing databases (NDLC for book classification and Teikoku DataBank for business classification)
    • Some tasks may have multiple correct labels
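
As a minimal sketch of reading the label format described above, the Python snippet below groups crowd labels by task. It assumes the columns of labels.csv appear in the order listed (task ID, worker's name, class ID), that fields are comma-separated, and that any row whose IDs are not integers (such as a header row) can be skipped. None of this is confirmed by the dataset itself. The function name read_labels and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_labels(lines):
    """Group crowd labels by task: {task_id: [(worker_name, class_id), ...]}.

    Assumed column order: task ID, worker's name, class ID.
    """
    labels_by_task = defaultdict(list)
    for row in csv.reader(lines):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        task_id, worker, class_id = row[0], row[1], row[2]
        try:
            labels_by_task[int(task_id)].append((worker.strip(), int(class_id)))
        except ValueError:
            continue  # assumed header row or non-integer IDs
    return dict(labels_by_task)


# Hypothetical sample rows, not taken from the dataset.
sample = io.StringIO("1,worker_a,10\n1,worker_b,12\n2,worker_a,10\n")
labels = read_labels(sample)
```

To read the real file, pass an open file handle instead of the sample, for example open("labels.csv", newline=""). Because true_label.csv may list several correct labels for one task, a similar grouping by task ID is likely needed there as well.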


If you use our dataset in your papers, please cite the following paper:
N. Otani, Y. Baba, H. Kashima, “Quality Control for Crowdsourced Hierarchical Classification”, in Proceedings of the 2015 IEEE International Conference on Data Mining (ICDM), 2015, pp. 937-942.


Naoki Otani, Kyoto University.