Preventing Fraud at Robinhood using Graph Intelligence
Robinhood was founded on a simple idea: that our financial markets should be accessible to all. With customers at the heart of our decisions, Robinhood is lowering barriers and providing greater access to financial information and investing. Together, we are building products and services that help create a financial system everyone can participate in.
…
Authored by: Katie Liu, Senior Data Scientist, and Claire Liu, Staff Software Engineer
In today’s digital finance landscape, the challenge of preventing fraud is a critical and complex task. As we continue our mission to democratize finance for all, our steadfast dedication to fraud prevention is essential in safeguarding the financial interests of our customers. With more than 23M net funded accounts, we prioritize a user-friendly experience with features like retirement accounts, 24 Hour Markets, 5% APY with Gold and more.
We have previously shared practical guidance and recommendations on account security, covering topics such as “How to identify and report scams” and “Security best practices”. These resources offer advice to enhance your online safety.
As a safety-first company, building safe and secure systems by detecting high-risk fraudulent behavior is a top priority. As an industry, we face the same three major fraud categories:
- Identity fraud: accounts opened using stolen or synthetic identities.
- Account takeovers: unauthorized account access gained through hacking, scams, or phishing attacks, resulting in unauthorized fund transfers.
- First-party fraud: deliberately reckless trades on volatile assets, placed with no intent or ability to pay if the trades are unsuccessful.
The teams at Robinhood implemented a meticulously designed fraud detection system to identify fraudsters while minimizing impact on legitimate users. One foundational contributing factor is graph intelligence, which supports models and tools that play key roles in our fraud defense systems. As continuous reduction of losses is a key objective for Robinhood’s Customer Trust & Safety team, let’s explore how we accomplish this.
Graph data helps prevent fraud.
In the digital fintech space, fraud is typically executed at scale and involves multiple actors, as these are coordinated efforts that often involve hundreds (or even thousands) of accounts at various firms. When fraudsters identify an opportunity, their goal is to maximize profits fast. On the flip side, coordinated attacks typically share common elements, and a graph data structure makes those shared assets easier to identify.
To visualize connections between accounts, we use a graph-suitable data model. Our data is rich in relationships, resembling edges when we represent users and their attributes as nodes: it’s a natural fit for graph modeling.
How we build a fraud detection graph.
Part 1: Data modeling
We start with a Robinhood user and their attributes and relationships, connecting them with other accounts. A node may represent a customer or one of their assets, while edges represent the connections between them. A heterogeneous graph, which holds different types of entities and relationships, lets us find commonalities between accounts.
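As a rough illustration of this data model, the sketch below builds a tiny heterogeneous graph in plain Python. The node types (`user`, `device`, `bank_account`), edge labels, and IDs are hypothetical and chosen only for illustration; the actual schema is not described in this post.

```python
from collections import defaultdict


class HeteroGraph:
    """Tiny in-memory heterogeneous graph: typed nodes, labeled undirected edges."""

    def __init__(self):
        self.node_type = {}           # node id -> node type
        self.adj = defaultdict(set)   # node id -> {(edge label, neighbor id)}

    def add_node(self, node_id, node_type):
        self.node_type[node_id] = node_type

    def add_edge(self, a, b, label):
        # Store the edge in both directions so it can be walked from either end.
        self.adj[a].add((label, b))
        self.adj[b].add((label, a))

    def users_sharing(self, user_id):
        """Map each attribute node of `user_id` to the other users linked to it."""
        shared = {}
        for _, attr in self.adj[user_id]:
            others = {
                n for _, n in self.adj[attr]
                if n != user_id and self.node_type[n] == "user"
            }
            if others:
                shared[attr] = others
        return shared


# Hypothetical example data: two users share a device, one has its own bank account.
g = HeteroGraph()
g.add_node("user:1", "user")
g.add_node("user:2", "user")
g.add_node("device:A", "device")
g.add_node("bank:X", "bank_account")
g.add_edge("user:1", "device:A", "USES_DEVICE")
g.add_edge("user:2", "device:A", "USES_DEVICE")
g.add_edge("user:1", "bank:X", "LINKED_BANK")

shared = g.users_sharing("user:1")
```

Here `shared` maps `device:A` to the set containing `user:2`, which is the kind of commonality a graph model surfaces directly.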
Part 2: Types of graph intelligence for combating fraud
To gain intelligence for combating fraud from the graph, we rely on two types of graph algorithms.
-> Type 1: Vertex-centric intelligence
Vertex-centric graph intelligence helps us quantify the likelihood that the user is a bad actor. These features involve going from a node (or a set of nodes) on a defined path to collect statistics about connections. Computation typically starts from a user-specified point and expands to a subsection of the graph.
With Seed analysis, we’re able to quickly identify first attackers, to understand fraud vectors and discover similar bad actors. Starting with known fraudulent actors as seed nodes, we identify risky neighbors in the graph. Combined with other fraud signals, this has been an effective discovery and mitigation tool for emerging attacks.
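Seed analysis can be sketched as a bounded breadth-first search from known bad actors. The version below is a minimal illustration, assuming an adjacency-list graph and a hypothetical `max_hops` cutoff; it returns each reachable neighbor with its hop distance from the nearest seed, which could then be combined with other fraud signals.

```python
from collections import deque


def expand_from_seeds(adj, seeds, max_hops=2):
    """Return {node: hop distance} for nodes within `max_hops` of any seed."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    # Drop the seeds themselves; only newly discovered neighbors are returned.
    return {n: d for n, d in dist.items() if d > 0}


# Hypothetical graph: u1 and u2 share device d1; u2 and u3 share bank account b1.
adj = {
    "u1": ["d1"],
    "d1": ["u1", "u2"],
    "u2": ["d1", "b1"],
    "b1": ["u2", "u3"],
    "u3": ["b1"],
    "u4": [],
}

near = expand_from_seeds(adj, seeds=["u1"], max_hops=2)
far = expand_from_seeds(adj, seeds=["u1"], max_hops=4)
```

With `max_hops=2` only `d1` and `u2` are reached; raising the limit to 4 also pulls in `b1` and `u3`, while the unconnected `u4` is never flagged.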
-> Type 2: Graph-centric intelligence
Graph-centric algorithms are useful in assessing the riskiness of a node based on whether it belongs to a group or cluster that exhibits risky behaviors. The entire graph structure can be used to derive intelligence too, with computation usually involving the full graph.
Connected components is a classic community-based graph algorithm that allows us to group and categorize nodes within the graph. This helps us create communities in which every node can reach every other node through a path of edges, which could indicate widespread fraudulent activity.
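For intuition, connected components can be computed with a union-find (disjoint-set) structure. This is a minimal single-machine sketch with hypothetical node IDs; a production-scale graph would more likely use a distributed graph engine, which this post does not name.

```python
def connected_components(edges):
    """Group nodes into components using union-find with path halving."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two components

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Hypothetical edges: u1 and u2 share device d1; u3 is linked only to bank account b9.
edges = [("u1", "d1"), ("u2", "d1"), ("u3", "b9")]
components = connected_components(edges)
```

The result contains two communities: one with `u1`, `u2`, and `d1`, and one with `u3` and `b9`. Component size or the share of known-bad members in a component could then serve as a user-level feature.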
A temporal motif is a subgraph defined by a series of time-stamped edges. Years of studies [1] have shown that identifying such subgraphs is useful for catching fraudulent actors in the financial industry.
Graph embedding captures as much information as possible while flattening it to lower dimensions, preserving graph-like intelligence. If nodes are well connected in the graph, they remain close to each other in the embedding space under the chosen distance measure. Taking each end-of-day graph as a date-labeled snapshot, we create graph embeddings to feed to downstream ML algorithms, such as an XGBoost User Riskiness Classification model that scores daily. GNN applications in fraud are also useful, as illustrated in recent studies.
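To make the idea of a temporal motif concrete, the sketch below searches for one simple pattern: a two-edge chain a -> b -> c where the second edge follows the first within a time window. The edge meaning (for example, a transfer between accounts), the IDs, and the `window` value are all hypothetical; the motifs actually used in production are not described in this post.

```python
from collections import defaultdict


def two_hop_motifs(edges, window):
    """Find chains a -> b -> c with t1 < t2 <= t1 + window.

    `edges` is a list of (source, destination, timestamp) tuples.
    """
    out_edges = defaultdict(list)
    for src, dst, ts in edges:
        out_edges[src].append((ts, dst))

    matches = []
    for a, b, t1 in edges:
        for t2, c in out_edges[b]:
            # The second edge must come strictly later, within the window,
            # and must not simply return to the starting node.
            if t1 < t2 <= t1 + window and c != a:
                matches.append((a, b, c, t1, t2))
    return matches


# Hypothetical time-stamped edges (timestamps in seconds).
edges = [
    ("a", "b", 100),
    ("b", "c", 130),   # follows a -> b within 60s: forms a motif
    ("b", "d", 900),   # too late to count
]
motifs = two_hop_motifs(edges, window=60)
```

Only the chain a -> b -> c is returned here, because the edge to `d` falls outside the 60-second window.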
Part 3: Data processing and serving
We process two kinds of workloads. The first is graph-centric features, for which we analyze the entire graph and identify clusters of entities. While this is a resource-intensive process, advanced graph algorithms like those described above highlight potentially fraudulent behaviors within the graph. The second is vertex-centric features, which traverse from a specific node along a defined path to collect statistics about connections.
We’ve adopted a hybrid approach to address such diverse requirements. For graph-centric features, we pre-compute offline using commonly used graph algorithms like Connected Components and PageRank. These computations run periodically, and the computed data is ingested into our online feature stores, where it’s ready to be served. Vertex-centric features can, like graph-centric features, also be pre-computed offline and ingested. But, due to the periodic nature of batch ingestion, such features might not reflect the most recent changes. To address this, we employ near real-time streaming ingestion for vertex-centric features, reducing lag from hours to seconds. This approach involves computing features on read operations rather than on write operations.
For the vertex-centric workload, we needed a storage solution. We benchmarked several popular options (Neo4j, Neptune, DynamoDB) and opted for DynamoDB due to several factors:
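PageRank, one of the algorithms named above for offline pre-computation, can be illustrated with a short power-iteration loop. This is a minimal sketch on a toy directed graph, assuming every node appears as a key in `adj`; the damping factor and iteration count are conventional defaults, not values taken from this post.

```python
def pagerank(adj, damping=0.85, iterations=50):
    """Power-iteration PageRank over {node: [outgoing neighbors]}."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = adj[v]
            if out:
                share = damping * rank[v] / len(out)
                for w in out:
                    new[w] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank


# Hypothetical toy graph: three nodes all point at a shared "hub" node.
adj = {"hub": [], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
scores = pagerank(adj)
top_node = max(scores, key=scores.get)
```

In a batch setting, scores like these would be recomputed periodically and written to the online feature store, keyed by node ID.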
- Horizontal Scalability, with seamless scaling based on demand and a pay-per-request model to keep costs low.
- Flexible Schema, eliminating the need to learn complex graph query languages.
- Performance, which is exceptionally good for relatively sparse graphs and a small number of hops.
- Stack consolidation, since it uses the same solution as our graph-centric workloads, reducing the engineering maintenance burden.
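One common way to hold a graph in a key-value store is an adjacency list keyed by a partition key (the node) and a sort key (edge type plus neighbor). The sketch below uses an in-memory stand-in rather than a real DynamoDB client, and the key layout, edge type name, and feature are assumptions made for illustration, not the schema described in this post. It shows a vertex-centric feature computed at read time with a two-hop lookup.

```python
from collections import defaultdict


class AdjacencyTable:
    """In-memory stand-in for a key-value table keyed by (partition key, sort key)."""

    def __init__(self):
        self.items = defaultdict(dict)  # pk -> {sk: attributes}

    def put_edge(self, src, edge_type, dst, **attrs):
        # Write both directions so either endpoint can be queried by its own pk.
        self.items[src][f"{edge_type}#{dst}"] = attrs
        self.items[dst][f"{edge_type}#{src}"] = attrs

    def query(self, pk, sk_prefix=""):
        """Return neighbor IDs under `pk` whose sort key starts with `sk_prefix`."""
        rows = self.items.get(pk, {})
        return [sk.split("#", 1)[1] for sk in sorted(rows) if sk.startswith(sk_prefix)]


def count_users_sharing_devices(table, user_id):
    """Two-hop feature computed on read: distinct other users on the same devices."""
    others = set()
    for device in table.query(user_id, "USES_DEVICE#"):
        others.update(table.query(device, "USES_DEVICE#"))
    others.discard(user_id)
    return len(others)


# Hypothetical data: user:1 and user:2 share device:A; user:3 is alone on device:B.
table = AdjacencyTable()
table.put_edge("user:1", "USES_DEVICE", "device:A")
table.put_edge("user:2", "USES_DEVICE", "device:A")
table.put_edge("user:3", "USES_DEVICE", "device:B")

shared_count = count_users_sharing_devices(table, "user:1")
```

Because each hop is a single keyed lookup, this layout suits sparse graphs and short traversals, and a newly streamed edge is visible to the next read without any recomputation.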
Have we been successful?
Absolutely! We experimented with the aforementioned algorithms and gained confidence that graph-based features will empower the team to catch more fraud, thanks to the following solutions:
- Hundreds of user-level graph features, with both real-time and batch offline computing.
- Multiple fraud models, in which graph features rank among the top features across stages of the user journey to prevent high-risk activities, from Account Application to Trading to Money Movement.
- Multiple production rules that use graph features directly to assess the riskiness of shared attributes at multiple checkpoints.
To protect our community and their assets, serious measures are needed to ensure safe financial activities. Thanks to first principles thinking that taps into unique technology, we’ve found success. The results? Reduced fraud – and an even more secure space for digital finance.
We’d like to acknowledge our fantastic team, especially Sara Rush and Lei He, and the supportive cross-organizational leadership team of the Customer Trust and Safety organization, whose efforts and expertise have been vital in the success of our fraud prevention initiatives. Your dedication and creativity are the fundamental elements that consistently drive forward the mission of Robinhood.
…
We are always looking for more individuals who share our commitment to building a diverse team and creating an inclusive environment as we continue in our journey in democratizing finance for all. Stay connected with us — join our talent community and check out our open positions!
…
© 2024 Robinhood Markets, Inc.