
Large-Scale Human-Machine Systems for Data Science

Lay summary

One application area in which building this bridge is particularly urgent is data science. While data science analyses promise significant insights, they pose new challenges for researchers: they must work out which questions should be answered in detail, how potential data sources can be found and combined, and how the results can be summarized and communicated. In addition, data scientists are currently still rare and therefore hard to find.

The goal of this research project is to investigate how humans and AI can jointly solve data science tasks. In particular, we aim to develop new methods of human-machine cooperation that can identify and solve data science problems, so that laypeople, too, can gain insights through data science. To this end, we combine findings from statistics, data science, collective intelligence research, and computer-supported cooperative work. The findings of this study help us better understand the collaboration between humans and machines, a goal of ever-growing importance for our lives and our work in the age of AI.

Abstract

Artificial Intelligence (AI) is increasingly taking over work domains that used to be the realm of people (Brynjolfsson and McAfee, 2014). In recent history, the industrial revolutions fueled similar fears of unemployment, but eventually led to a new distribution of labor. Indeed, humans and machines have different, and therefore complementary, characteristics, which suggests that they are not perfect substitutes for each other (Chui et al., 2016). While some systems have occasionally proven capable of beating humans in specific well-defined tasks (High, 2012; Wang et al., 2016), humans’ innate ability, creativity, and versatility in dealing with incomplete or ill-specified tasks give them an advantage in many work settings. Especially in groups, they are able to accomplish astonishing tasks that seem difficult to automate (Malone et al., 2010).

Machines, in contrast to humans, are able to work systematically, homogeneously, and repeatedly without becoming bored or tired. Humans exhibit varying work quality and cognitive diversity (i.e., they have vastly varying skill sets), and they need to be motivated (Bernstein et al., 2012). Assuming that well-structured tasks will eventually be automated and that humans have some innate capabilities that are difficult for machines to replicate completely, the main question to be answered is: How can we combine groups of human and machine intelligences to effectively perform ill-structured tasks that are not achievable by either party alone?

One field that is emerging as being in dire need of combined, cooperative human-machine intelligence is data-driven knowledge discovery (Gil, 2017), or Data Science (Cao, 2017b,a). The explosion of available data on the Web and from sensors, as well as the advances in machine learning, have made knowledge discovery in society, science, and the economy faster than ever before: data-driven findings are shaping the way organizations make decisions (Business, 2012), new empirical scientific disciplines have emerged, citizens are constantly exposed to data journalism (Gray et al., 2012), and participants of civic coding events analyze and visualize data to better understand and improve their own environment. At the same time, however, this rise in data and methods introduces several challenges for data scientists, who must specify which questions are worth pursuing and how to find, integrate, interpret, summarize, and analyze a vast number of sources of diverse nature at different stages of their multidisciplinary work. Moreover, Data Science professionals are in very short supply. To address this need, scientists have proposed statistical expert or intelligent discovery systems (Bernstein et al., 2005; Serban et al., 2013). In practice, though, the success of these systems is limited, as well-trained data scientists are still an indispensable part of the process.

Other AI-oriented efforts have developed methods to automatically search and digest scientific publications (Gil et al., 2014) and to dynamically update analyses when new related data appear (Gil et al., 2017). However, a massive amount of manual work is still required to identify gaps in the literature and to design new Data Science problems to be investigated. Furthermore, most data analyses are still pursued by individuals or small groups (ignoring AI-based support), as much of the tooling for collaborative Data Science is either inspired by software engineering tools, and thus not always suited to the exploratory data analysis process, or derived from collaborative writing tools, which lack the capability to structure analyses and capture rationale.

To address this shortcoming, the goal of this proposal is to investigate how human-machine cooperatives (from now on HuMaCs) (Malone and Bernstein, 2015) can review, define, and solve Data Science problems and results, in order to facilitate systematic progress in data-driven discoveries. Specifically, we will design, build, deploy, and evaluate HuMaCs that allow both novice and expert users as well as machines to collaborate on complex Data Science tasks. In pursuing this goal, the project will combine insights from statistics, Data Science, collective intelligence, computer-supported collaborative work, and crowd computing to pioneer the exploration of human-machine collaboration for the data analysis task, a topic of utmost importance for our lives in the age of the smart machine.

Last updated: 04.03.2022

SNSF
Project funding (Div. I-III)
Original data source 184994

Information Technology
Mathematics, Natural- and Engineering Sciences; Engineering Sciences

1 People

Prof. Abraham Bernstein
