Web scraping gives fresh insight and impetus to investment strategies

Uncovering data on social media that could reveal the next big stock is becoming increasingly important. How do investors track that data down?

When amateur traders on social network website Reddit sent shares in US video game retailer Gamestop rocketing in January 2021, both professional and retail traders started scouring the r/wallstreetbets forum to try to get a whiff of what the next big meme stock would be. No longer were financial statements alone sufficient to gauge the likely direction of a company’s share price: investors had to start factoring in online sentiment as a potential indicator as well.

The need for such alternative data has fuelled increased interest in web-scraping tools that monitor stock conviction on social media platforms and give investors clues on share price performance that would not be apparent with just traditional methods of equity analysis.

“One of our main data sets is the data we have tracking r/wallstreetbets discussions, which received a ton of interest at the start of this year while Gamestop was making headlines,” says James Kardatzke, CEO of alternative data platform Quiver Quantitative. “Alternative data has become really popular with hedge funds across the board. Basically, every hedge fund and professional investor out there is looking for ways that they can get as much information as possible about the companies they are investing in, and a lot of them have taken to getting data from non-traditional sources.”
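
As a rough illustration of what tracking forum discussions can involve, the sketch below counts how many posts mention each ticker on a watchlist. It is a minimal example under stated assumptions, not a description of how Quiver Quantitative or any other vendor works: the watchlist, the sample posts and the function name `count_ticker_mentions` are all hypothetical, and the step of actually collecting posts from a website is left out.

```python
import re
from collections import Counter

# Hypothetical watchlist; a real tool would load a full ticker universe.
WATCHLIST = {"GME", "AMC", "BB"}

# Matches "$GME" or a bare upper-case token such as "GME".
TICKER_PATTERN = re.compile(r"\$?\b([A-Z]{1,5})\b")


def count_ticker_mentions(posts, watchlist=WATCHLIST):
    """Count how many posts mention each watched ticker (once per post)."""
    counts = Counter()
    for text in posts:
        # A set ensures a post that repeats a ticker is only counted once.
        mentioned = {m for m in TICKER_PATTERN.findall(text) if m in watchlist}
        counts.update(mentioned)
    return counts


# Made-up posts for illustration only.
sample_posts = [
    "GME to the moon, holding $GME forever",
    "Thinking about AMC and GME this week",
    "bb looks cheap",  # lower-case, so not counted in this sketch
]

print(count_ticker_mentions(sample_posts).most_common())
```

Counting each ticker once per post is a design choice that stops a single enthusiastic poster from inflating the tally. Matching only upper-case tokens is a simplification that will miss some mentions and pick up ordinary words that happen to be tickers.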

Market watchers expect the use of web-scraping techniques for investment insights to keep growing. Capital markets consultancy Opimas predicted in 2019 that asset managers were on track to spend almost $2bn on web-scraping activities by the end of 2021. It also predicted that, by 2022, the number of web pages accessed daily for investment-related scraping would rise to 25bn, up from 10bn in 2018, with hedge funds leading the way.

Many small and mid-size funds are relying on third-party vendors to provide that data, while some larger funds have been building out their own in-house teams. “The pioneers in this space are typically the hedge funds,” says Vinesh Jha, CEO of alternative data provider ExtractAlpha. “A few traditional asset managers have some more advanced data science teams that are doing this sort of thing, such as BlackRock, but mostly it’s the hedge funds, and it’s a combination of building their own capabilities in-house for some of the larger firms that have the resources and also working with vendors like ourselves. A lot of funds have a belief they should build things themselves but in practice they find it’s perhaps too much, so they end up contracting out certain things such as the data collection to vendors.”

Unlocking data insights

Collecting the data is relatively easy. Making sense of alternative data to unlock insights that can help investors make smarter decisions is the tricky part. “Some of the data sets are by their nature very complex, so they might require a full team of data analysts to really make use of the data and gain a competitive edge,” says Kardatzke. “For some of the data sets, there is quite a bit of intensive work that needs to go into turning it into something that you can generate insights from. Of course, it’s an entirely different question about whether those insights can actually generate investment returns that consistently outperform the market.”

Web scraping is not without its challenges. Not only can website structures change, potentially disrupting the scraping process, but the legal backdrop also remains ill-defined. “We’re still in a period where our courts of law are trying to figure out how to legislate some of these alternative data sets,” says Kardatzke. “So over the next decade there’s probably going to be a lot of new developments on the legal front and more clarity on what’s allowed and what’s not.”

That means that funds engaged in web-scraping activities need to remain vigilant about potential shifts in the legal environment. “Financial institutions employing web-scraping techniques should consult with knowledgeable legal advisors on a regular basis and use other techniques to monitor developments in this area,” says Doug Rappaport, a partner at law firm Akin Gump. “These institutions should also conduct thorough due diligence on any vendors they may consider contracting with that may employ web-scraping techniques. This is an evolving area of the law and the legal and regulatory landscape may change as thinking and perspectives evolve in this area.”

Tracking visitors with partner companies

Given that potential legal uncertainty around web scraping, some alternative data providers have been using different methods to sniff out online sentiment trends. ExtractAlpha, for instance, has a data set that tracks the number of visitors to stock ticker pages on various retail-focused financial websites. It does that not by scraping data but through a partnership with a company that serves ads on those websites, which through cookie tracking can figure out the number of page impressions for a particular stock.

“That is a very correlated measure of these types of activities – you see those spikes such as Gamestop and other r/wallstreetbets-type stocks show up in the data – so you can come at this from different angles,” says Jha.

One of the products ExtractAlpha offers investors is the Digital Revenue Signal, which aggregates various kinds of online activity, such as web search data, web traffic, and social media engagement, via non-scraped sources to give investors an early heads-up if a company is likely to perform better than expected.

“More engagement tends to mean people are interested in that company and potentially interested in buying its products or subscribing to its services, so it’s a consumer sentiment-type of metric and, as such, it is predictive of a company’s revenues,” says Jha. “What’s interesting about that is that it captures revenues in a way that the Street doesn’t seem to recognize; sell-side analysts do not seem to use this information very much so it can actually predict not just revenues but also whether a company beats revenue expectations, which is ultimately what matters for investors.”
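
To make the idea of aggregating several online activity measures into one signal more concrete, here is a minimal sketch. It is not ExtractAlpha's methodology, which the article does not describe. It assumes each measure is compared with its own recent history using a z-score and that the measures are equally weighted; the metric names, the figures and the functions `zscore_latest` and `composite_signal` are hypothetical.

```python
from statistics import mean, pstdev


def zscore_latest(history):
    """Z-score of the most recent value relative to the earlier values.

    Expects at least two values: one or more past readings plus the latest.
    """
    *past, latest = history
    spread = pstdev(past)
    if spread == 0:
        # No variation in the past readings, so treat the metric as neutral.
        return 0.0
    return (latest - mean(past)) / spread


def composite_signal(metrics):
    """Equal-weighted average of each metric's latest z-score."""
    return mean(zscore_latest(series) for series in metrics.values())


# Made-up weekly figures for one hypothetical company.
metrics = {
    "search_interest": [50, 52, 48, 50, 60],
    "web_visits": [1000, 1100, 900, 1000, 1200],
    "social_mentions": [20, 20, 20, 20, 20],
}

print(round(composite_signal(metrics), 2))
```

A positive reading here only means that engagement is running above its own recent average. Whether that translates into a revenue beat is an empirical question that this sketch does not address.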