A full end-to-end analysis of a 100k+ row e-commerce dataset.
This project analyzes the publicly available Brazilian Olist e-commerce dataset to surface actionable insights about revenue generation, customer value distribution, and retention behavior. Using SQL in Google BigQuery to query and aggregate data across multiple relational tables, and Tableau to communicate findings visually.
The dataset simulates real-world marketplace operations across multiple relational tables. The three tables central to this analysis are: orders (purchase status and timestamps), order_items (product-level pricing and freight), and customers (unique buyer identifiers). Tables are joined via shared keys — order_id and customer_id.
Interactive dashboard — filter, hover, and explore. Open in Tableau Public ↗
SQL-first. BigQuery. Then Tableau.
All data processing and metric calculation was performed using SQL in Google BigQuery. Analysis was restricted to orders with a status of 'delivered' to ensure only completed transactions were counted — removing cancelled and pending orders so revenue reflects cash actually collected.
- Multi-table joins across orders, order_items, products, and category_translation tables
- CTEs (Common Table Expressions) to stage intermediate calculations cleanly
- NTILE(100) window function for percentile-based customer revenue segmentation
- Conditional aggregation for one-time vs. repeat buyer classification
- customer_unique_id used instead of customer_id to correctly identify repeat buyers across orders
- Findings exported and visualized in a Tableau dashboard
Four findings. Each with a direct business implication.
Finding 1 — Revenue & Average Order Value
The business generated approximately $15.4M in total revenue from delivered orders, with an average order value (AOV) of $159.83. The AOV positions the platform at a mid-range price point — high enough to suggest considered purchases, with meaningful room to grow through upselling and bundling. A 10% AOV lift across existing order volume would add over $1.5M in revenue.
Finding 2 — Revenue Concentration
Using NTILE(100) to rank customers into percentile buckets, the top 10% of customers contribute 38% of total revenue. Breaking this into non-overlapping segments: the top 1% alone drove ~$1.60M (10.4% of total), the top 2–5% added another $2.51M (16.3%), and the top 6–10% contributed $1.74M (11.3%). The bottom 90% account for the remaining 61.9%.
Finding 3 — Top Product Categories
Revenue is driven by lifestyle and consumer-oriented categories. The top five each surpass $1M, indicating a healthy diversified product mix: Health & Beauty ($1.41M), Watches & Gifts ($1.26M), Bed, Bath & Table ($1.23M), Sports & Leisure ($1.12M), and Computers & Accessories ($1.03M). Category names were translated from Portuguese via a joined mapping table.
Finding 4 — Customer Retention Risk
The most significant finding: of approximately 93,000 unique customers, 97% made only a single purchase, with just 3% returning for a second order. The business is almost entirely dependent on new customer acquisition to sustain revenue — a costly and fragile growth model. Improving retention by even a few percentage points would compound meaningfully over time.
The actual code behind each finding.
All queries were written and executed in Google BigQuery. Below are the four key queries powering this analysis.
The full written analysis with all SQL code is available as a PDF below.
What the data says to do next.
- Improve retention first. With 97% one-time buyers, even a modest lift in repeat purchase rate has outsized revenue impact. Loyalty programs, post-purchase email sequences, and personalized recommendations are the highest-leverage levers.
- Increase average order value. At $159.83, bundling, 'frequently bought together' modules, and a free-shipping threshold around $200 are proven AOV drivers. A 10% lift adds $1.5M+ across existing order volume.
- Double down on top categories. Health & Beauty, Watches & Gifts, and Bed, Bath & Table collectively drive nearly $4M. These warrant priority inventory investment and targeted marketing spend.
- Build a VIP program for the top 10%. These customers drive 38% of revenue. Early access, exclusive discounts, and dedicated support would increase both retention and lifetime value for the segment that matters most.