
The Perils and Prospects of Big Data in Finance

Lay summary

The objective of our project is to highlight the dangers of excessive data mining. We examine the cautionary tale that "if you torture the data long enough, it will confess." We use advanced statistical techniques from multiple hypothesis testing. Our aim is to estimate the false positives that are likely to arise when classical tests are used in place of multiple tests.

Our study concerns wealth management, in particular pension funds. The cost of investing in unprofitable strategies is high for investors who are saving for retirement. Our study will provide guidelines for minimizing the risk of these "false" investments.

Abstract

The goal of this research project is to highlight the dangers of excessive data mining in financial research. The scale of this problem has increased since the advent and widespread use of machine learning techniques in investment research. Equally important, incentives for research seem to be skewed in favor of finding false positives. Harvey (2017) reports that p-hacking, by which statistical significance can be artificially enhanced, is widespread.

Most studies in this area look only at published research. Our goal in the first project is to evaluate all information contained in the most commonly used finance datasets and to construct a large set of over two million trading strategies: some that have been studied and published, some that have been studied but not published, and some that have yet to be studied. We will use advances in statistics, specifically techniques from multiple hypothesis testing, to attempt to put a bound on p-hacking and to gauge whether truly abnormal trading strategies exist.

The second project is motivated by the advent of machine learning techniques in finance. Nowadays, researchers routinely use web crawlers to extract news and measure its impact on stock returns. There has also been a spate of papers using elastic net regression techniques to learn more about the cross-sectional patterns in stock returns. Chinco, Clark-Joseph, and Ye (2017) find impressive predictability in high-frequency returns using LASSO regression. Despite safeguards such as cross-validation, very few controlled experiments in finance document the efficacy of these techniques. Our goal is to construct such a simulation and to estimate the statistical significance of results usually reported using real data.

The third project justifies the word “prospects” in the title of this research proposal. Here we turn from the challenges of big data to its opportunities. We will use big data to analyze the strength of connections in the money management industry. In particular, we will study whether connections (more broadly defined than just social media connections) between money managers and pension plan sponsors influence the probability of hiring. We will also study whether these connections have a positive or negative impact on the outcome of greater interest, viz., the returns generated by money managers.

Research methods in this proposal come from a variety of fields, including (i) multiple hypothesis testing from statistics and econometrics, (ii) machine learning techniques, and (iii) network analysis. One of the sub-projects does not require much data, another uses data regularly used by finance researchers, and the third requires the purchase of new data. All projects are heavy on computational requirements.
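
To make the multiple-testing concern in the first project concrete, the following is a minimal sketch and not the project's actual method. It simulates trading strategies whose true mean return is zero, tests each one separately at the 5% level, and compares the number of apparent "discoveries" with what two standard multiple-testing corrections allow (Bonferroni and the Benjamini-Hochberg step-up procedure). The abstract does not say which multiple-testing procedures the project uses, so these two are assumptions chosen for illustration. The function names, the 5% monthly volatility, the sample sizes, and the normal approximation to the t distribution are likewise assumptions.

```python
import math
import random


def t_stat(returns):
    """t-statistic for the null hypothesis that the mean return is zero."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(var / n)


def two_sided_p(t):
    """Two-sided p-value using a normal approximation (an assumption that is
    reasonable only when each strategy has a few hundred observations)."""
    return math.erfc(abs(t) / math.sqrt(2))


def benjamini_hochberg(p_values, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up: remember the largest rank whose p-value clears its threshold.
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank
    return set(order[:cutoff])


def simulate(n_strategies=2000, n_months=240, alpha=0.05, seed=0):
    """Simulate strategies with zero true mean return and count rejections.

    Every rejection here is a false positive by construction.
    Returns (classical, bonferroni, bh) rejection counts.
    """
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_strategies):
        # Hypothetical monthly returns: mean 0, standard deviation 5%.
        returns = [rng.gauss(0.0, 0.05) for _ in range(n_months)]
        p_values.append(two_sided_p(t_stat(returns)))

    classical = sum(p <= alpha for p in p_values)
    bonferroni = sum(p <= alpha / n_strategies for p in p_values)
    bh = len(benjamini_hochberg(p_values, alpha))
    return classical, bonferroni, bh


if __name__ == "__main__":
    classical, bonferroni, bh = simulate()
    print("classical 5% tests:", classical)
    print("Bonferroni:", bonferroni)
    print("Benjamini-Hochberg:", bh)
```

Because no simulated strategy has any real edge, the classical count is expected to be close to 5% of the strategies tested, while the corrected counts can never exceed it. With millions of candidate strategies rather than the 2,000 assumed here, the same 5% rate would translate into a very large number of spurious findings.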

Last updated: 02.03.2022

SNSF
Project funding (Div. I-III)
Original data source 182198

People

Amit Goyal

