PAKDD 2015 Tutorial | Machine Learning and Data Mining Research Laboratory

Crowdsourcing for Big Data Analytics

Presenters: Hisashi Kashima (Kyoto University), Satoshi Oyama (Hokkaido University), Yukino Baba (Kyoto University)

Abstract. Automated data analysis technologies developed in data mining are certainly at the core of big data analytics. However, it is also well known that automatically analyzing all of the heterogeneous, complex, and unstructured data in the real world is not realistic, and therefore a significant amount of manual data processing by humans is unavoidable.

Crowdsourcing is a relatively new approach that outsources human intelligence tasks to a large number of unspecified people via the Internet, and it is attracting considerable attention as a promising way to relieve the human bottleneck in big data analysis.

In this tutorial, we will start by introducing the basic concept of crowdsourcing and how it is used to execute data mining processes in big data analysis. We then focus on two major uses of crowdsourcing, data collection/annotation and modeling, and on the technical issues that accompany them, including quality control of crowdsourced results. Various data-driven approaches to these problems will be introduced. Finally, we will address safety and ethical issues such as security, privacy, and fairness, which are unavoidable in crowdsourced data analytics, and introduce technical efforts to alleviate them.

Slides:

Part I: Crowdsourcing for data analytics (Hisashi Kashima)

Part II: Crowdsourcing for datafication (Satoshi Oyama)

Parts III & IV: Crowdsourcing for analysis & Future directions (Yukino Baba)