Home About Work Contact
Work / E-Commerce Revenue & Customer Analysis
EDA · Part 1 of 2

E-Commerce Revenue &
Customer Analysis

DatasetBrazilian Olist E-Commerce (Kaggle)
ToolsSQL · Google BigQuery · Tableau
Project06 / 07
SQL Google BigQuery Tableau 100k+ Rows CTEs & Window Functions E-Commerce Analytics SQL Google BigQuery Tableau 100k+ Rows CTEs & Window Functions E-Commerce Analytics
Overview

A full end-to-end analysis of a 100k+ row e-commerce dataset.

This project analyzes the publicly available Brazilian Olist e-commerce dataset to surface actionable insights about revenue generation, customer value distribution, and retention behavior. Using SQL in Google BigQuery to query and aggregate data across multiple relational tables, and Tableau to communicate findings visually.

The dataset simulates real-world marketplace operations across multiple relational tables. The three tables central to this analysis are: orders (purchase status and timestamps), order_items (product-level pricing and freight), and customers (unique buyer identifiers). Tables are joined via shared keys — order_id and customer_id.

$15.4MTotal Revenue
$159.83Avg. Order Value
3%Repeat Customer Rate
Interactive Dashboard

Interactive dashboard — filter, hover, and explore. Open in Tableau Public ↗

Methodology

SQL-first. BigQuery. Then Tableau.

All data processing and metric calculation was performed using SQL in Google BigQuery. Analysis was restricted to orders with a status of 'delivered' to ensure only completed transactions were counted — removing cancelled and pending orders so revenue reflects cash actually collected.

  • Multi-table joins across orders, order_items, products, and category_translation tables
  • CTEs (Common Table Expressions) to stage intermediate calculations cleanly
  • NTILE(100) window function for percentile-based customer revenue segmentation
  • Conditional aggregation for one-time vs. repeat buyer classification
  • customer_unique_id used instead of customer_id to correctly identify repeat buyers across orders
  • Findings exported and visualized in a Tableau dashboard
Findings

Four findings. Each with a direct business implication.

Finding 1 — Revenue & Average Order Value

The business generated approximately $15.4M in total revenue from delivered orders, with an average order value (AOV) of $159.83. The AOV positions the platform at a mid-range price point — high enough to suggest considered purchases, with meaningful room to grow through upselling and bundling. A 10% AOV lift across existing order volume would add over $1.5M in revenue.

Finding 2 — Revenue Concentration

Using NTILE(100) to rank customers into percentile buckets, the top 10% of customers contribute 38% of total revenue. Breaking this into non-overlapping segments: the top 1% alone drove ~$1.60M (10.4% of total), the top 2–5% added another $2.51M (16.3%), and the top 6–10% contributed $1.74M (11.3%). The bottom 90% account for the remaining 61.9%.

Finding 3 — Top Product Categories

Revenue is driven by lifestyle and consumer-oriented categories. The top five each surpass $1M, indicating a healthy diversified product mix: Health & Beauty ($1.41M), Watches & Gifts ($1.26M), Bed, Bath & Table ($1.23M), Sports & Leisure ($1.12M), and Computers & Accessories ($1.03M). Category names were translated from Portuguese via a joined mapping table.

Finding 4 — Customer Retention Risk

The most significant finding: of approximately 93,000 unique customers, 97% made only a single purchase, with just 3% returning for a second order. The business is almost entirely dependent on new customer acquisition to sustain revenue — a costly and fragile growth model. Improving retention by even a few percentage points would compound meaningfully over time.

38%Top 10% Revenue Share
97%One-Time Buyers
5Categories Over $1M
SQL Queries

The actual code behind each finding.

All queries were written and executed in Google BigQuery. Below are the four key queries powering this analysis.

Total Revenue & AOV SUM of price + freight_value joined across orders and order_items, filtered to delivered status only. Freight included to capture true transaction value.
Customer Segmentation — NTILE(100) CTE stages customer-level revenue, then NTILE(100) window function ranks customers into percentile buckets. Two separate queries: cumulative top 10% and non-overlapping segment breakdown.
Category Revenue — 4-Table Join Joins orders → order_items → products → category_translation to map Portuguese category names to English and aggregate revenue by category. LIMIT 10 for top performers.
Repeat vs. One-Time Buyers CTE counts orders per customer_unique_id (not customer_id — critical distinction in the Olist schema). CASE statement classifies buyers. Percentage calculated against total CTE count.

The full written analysis with all SQL code is available as a PDF below.

Giovanni_Valdes_Ecommerce_Analysis.pdf PDF · Full analysis with SQL queries, findings, and Tableau dashboard
Download
Recommendations

What the data says to do next.

  • Improve retention first. With 97% one-time buyers, even a modest lift in repeat purchase rate has outsized revenue impact. Loyalty programs, post-purchase email sequences, and personalized recommendations are the highest-leverage levers.
  • Increase average order value. At $159.83, bundling, 'frequently bought together' modules, and a free-shipping threshold around $200 are proven AOV drivers. A 10% lift adds $1.5M+ across existing order volume.
  • Double down on top categories. Health & Beauty, Watches & Gifts, and Bed, Bath & Table collectively drive nearly $4M. These warrant priority inventory investment and targeted marketing spend.
  • Build a VIP program for the top 10%. These customers drive 38% of revenue. Early access, exclusive discounts, and dedicated support would increase both retention and lifetime value for the segment that matters most.

Open to new
opportunities.

I'm actively looking for data analyst roles — remote or Honolulu-based.

Contact Me
View All Projects