PyLumeAI

Illuminate Data with Python & AI

Fraud Analytics ETL Pipeline – PaySim

End-to-end PySpark + Databricks pipeline with dashboards to uncover fraud patterns in financial transactions.

📖 Overview

The PaySim dataset simulates mobile money transactions based on real financial data. This project demonstrates how to turn raw transactions into actionable fraud insights using PySpark and Databricks, with curated dashboards for decision-makers.

Bronze → Silver → Gold ETL · Fraud Features & CLV · Dashboards & Trends

🎯 Problem

  • Detect fraudulent transactions
  • Identify high-risk customers & accounts
  • Visualize fraud trends for proactive monitoring

🛠️ Solution

  1. Bronze: Ingest PaySim CSV to Delta
  2. Silver: Clean/standardize + fraud flags
  3. Gold: Fraud rate, CLV, hotspots
  4. Dashboards: Fraud by Type, Trends, Risk Profiling, Hotspots, CLV vs Exposure
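The layered flow above can be sketched in a few lines. This is a minimal plain-Python stand-in for the logic, not the project's PySpark code: the column names follow the public PaySim CSV (`step`, `type`, `amount`, `nameOrig`, `nameDest`, `isFraud`), while the sample rows, the cleaning rules, and the function names `bronze`, `silver`, and `gold_fraud_rate` are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Tiny inline sample with PaySim-style columns (hypothetical rows, trimmed for brevity).
RAW_CSV = """step,type,amount,nameOrig,nameDest,isFraud
1,TRANSFER,181.0,C1,C9,1
1,CASH_OUT,181.0,C2,C9,1
2,PAYMENT,9839.64,C3,M1,0
2,transfer ,500.0,C4,C8,0
3,PAYMENT,,C5,M2,0
"""


def bronze(raw_text):
    """Bronze: ingest rows as-is; every value is still a string."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def silver(rows):
    """Silver: drop rows without an amount, normalize types, add a boolean fraud flag."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # assumed cleaning rule: skip rows with a missing amount
        cleaned.append({
            "step": int(row["step"]),
            "type": row["type"].strip().upper(),
            "amount": float(row["amount"]),
            "nameOrig": row["nameOrig"],
            "nameDest": row["nameDest"],
            "is_fraud": row["isFraud"] == "1",
        })
    return cleaned


def gold_fraud_rate(rows):
    """Gold: fraud rate (fraud rows / all rows) per transaction type."""
    totals, frauds = defaultdict(int), defaultdict(int)
    for row in rows:
        totals[row["type"]] += 1
        frauds[row["type"]] += row["is_fraud"]
    return {t: frauds[t] / totals[t] for t in totals}


fraud_rate = gold_fraud_rate(silver(bronze(RAW_CSV)))
```

On Databricks each function would typically read from and write to its own Delta table, so that every layer can be rebuilt from the one before it.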

🔍 Key Insights

Fraud by Transaction Type — risk concentrated in Transfers.

Fraud Trend — daily counts with 7-day moving averages.
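One way to derive such a trend line is sketched below in plain Python. It assumes PaySim's `step` column is an hour index starting at 1, so a day is `(step - 1) // 24`, and it uses a trailing window that averages over whatever days are available at the start of the series. The function names and the sample rows are hypothetical.

```python
from collections import Counter


def daily_fraud_counts(rows, n_days):
    """Count fraud rows per day, assuming `step` is an hour index starting at 1."""
    counts = Counter((row["step"] - 1) // 24 for row in rows if row["isFraud"] == 1)
    return [counts.get(day, 0) for day in range(n_days)]


def trailing_moving_average(values, window=7):
    """Trailing mean over up to `window` days; early days use the days available."""
    averages = []
    running = 0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the day that left the window
        averages.append(running / min(i + 1, window))
    return averages


# Hypothetical rows: one fraud on day 0, two on day 1, one on day 8.
rows = [
    {"step": 1, "isFraud": 1},
    {"step": 25, "isFraud": 1},
    {"step": 30, "isFraud": 1},
    {"step": 40, "isFraud": 0},
    {"step": 193, "isFraud": 1},
]
counts = daily_fraud_counts(rows, n_days=9)
smoothed = trailing_moving_average(counts)
```

In PySpark the same trailing window is usually written as an `avg` over a window ordered by day with `rowsBetween(-6, 0)`.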

Customer Risk Profiling — CLV segments and risk levels.
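This page does not say how CLV or the risk levels are defined, so the sketch below rests entirely on assumptions: CLV is approximated by the total amount an originating account has transacted, segments come from two hypothetical cut-offs, and the risk level is derived from the account's share of fraudulent transactions. It shows the shape of the profiling step, not the project's actual rules.

```python
from collections import defaultdict


def customer_profiles(rows):
    """Aggregate per originating account: CLV proxy, transaction count, fraud count."""
    stats = defaultdict(lambda: {"clv": 0.0, "tx": 0, "fraud": 0})
    for row in rows:
        s = stats[row["nameOrig"]]
        s["clv"] += row["amount"]  # assumed CLV proxy: total transacted amount
        s["tx"] += 1
        s["fraud"] += row["isFraud"]
    return dict(stats)


def clv_segment(clv, low_cut=200.0, high_cut=1000.0):
    """Bucket a CLV proxy into Low / Mid / High using hypothetical cut-offs."""
    if clv < low_cut:
        return "Low"
    return "Mid" if clv < high_cut else "High"


def risk_level(fraud, tx):
    """Map the share of fraudulent transactions to a level (hypothetical thresholds)."""
    rate = fraud / tx
    if rate == 0:
        return "Low"
    return "Medium" if rate < 0.5 else "High"


# Hypothetical sample rows.
rows = [
    {"nameOrig": "C1", "amount": 100.0, "isFraud": 0},
    {"nameOrig": "C1", "amount": 200.0, "isFraud": 0},
    {"nameOrig": "C2", "amount": 5000.0, "isFraud": 1},
    {"nameOrig": "C3", "amount": 50.0, "isFraud": 0},
    {"nameOrig": "C3", "amount": 50.0, "isFraud": 1},
    {"nameOrig": "C3", "amount": 50.0, "isFraud": 0},
]
profile = {
    name: (clv_segment(s["clv"]), risk_level(s["fraud"], s["tx"]))
    for name, s in customer_profiles(rows).items()
}
```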

Fraud Hotspots — origination/destination accounts most impacted.
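A hotspot ranking of this kind can be approximated by counting fraud rows per account on each side of the transaction. The sketch below is plain Python and assumes the PaySim column names `nameOrig`, `nameDest`, and `isFraud`; the function name `fraud_hotspots` and the sample rows are hypothetical.

```python
from collections import Counter


def fraud_hotspots(rows, top_n=3):
    """Return the accounts with the most fraud rows, as (origins, destinations)."""
    orig, dest = Counter(), Counter()
    for row in rows:
        if row["isFraud"] == 1:
            orig[row["nameOrig"]] += 1
            dest[row["nameDest"]] += 1
    return orig.most_common(top_n), dest.most_common(top_n)


# Hypothetical sample rows.
rows = [
    {"nameOrig": "C1", "nameDest": "C9", "isFraud": 1},
    {"nameOrig": "C2", "nameDest": "C9", "isFraud": 1},
    {"nameOrig": "C1", "nameDest": "C8", "isFraud": 1},
    {"nameOrig": "C3", "nameDest": "M1", "isFraud": 0},
]
orig_top, dest_top = fraud_hotspots(rows)
```

Ranking by summed fraud amount instead of row count is a one-line change (add `row["amount"]` rather than 1) and may suit a dashboard focused on financial exposure.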

CLV vs Fraud Exposure — bubble chart of value vs risk.

🔗 Code

GitHub Repository   

🚀 Work With PyLumeAI

This project shows how we deliver end-to-end fraud analytics — from raw data to dashboards.

📧 Contact Me