DetailPage: Amazon Marketplace Data Platform

Architecture

Sources

Keepa — price & sales rank
AsinDataApi — search results
Amazon keyword files
Amazon SP-API

Ingest & process

Always-on (24/7) processor plus dozens of ETL jobs
SQS + Lambda fan-out
Rate-throttled vendor APIs

Orchestrate

Apache Airflow on MWAA
ECS Fargate ETL jobs

Warehouse

Amazon Redshift, double-digit TB
Aurora PostgreSQL + pgvector

Serve

3× FastAPI services
Redis cache

Every component is provisioned as code with the AWS CDK: Redshift, MWAA, ECS Fargate, Lambda, SQS, EFS, Aurora, ElastiCache, Cognito, and the supporting network.

The product

As Principal Data Engineer for DetailPage, I built and ran the company’s entire data backend: the internal platform behind DetailPage’s Amazon marketplace analytics product. It includes the APIs that drive the company’s SEO and AEO optimizations; substantial ML pipelines that predict sales from scraped product metrics, customer actuals, and Amazon’s self-reported “soft ranges”; and custom category-universe creation for large clients. When no Amazon category fits a client’s market, we build that universe for them, which gives them reporting and competitor analysis scoped to their actual market.

DetailPage helps brands optimize how their products rank on the Amazon marketplace and, increasingly, how they surface in AI shopping engines, and that depends on knowing the marketplace in depth. I built the platform that gathers and serves that knowledge. Every month it ingests data on hundreds of millions of Amazon products from four external sources: Keepa for pricing, sales rank, and history; AsinDataApi for search-results data; Amazon’s own keyword files; and the Amazon Selling Partner API. An always-on processor and a distributed SQS + Lambda extraction pipeline (with token-aware rate throttling and priority tiers) feed a double-digit-terabyte Amazon Redshift warehouse, with the ETL jobs orchestrated by Apache Airflow on MWAA. Three FastAPI services serve the data to the product, with PostgreSQL and pgvector powering keyword-embedding search and an OpenAI-backed analytics feature.
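To make "token-aware rate throttling and priority tiers" concrete, here is a minimal sketch of the general technique, not the platform's actual code. It assumes a vendor quota that behaves like a token bucket (a balance that refills at a fixed rate up to a cap) and requests that each carry a token cost and a priority tier. The names `TokenBucket`, `PriorityThrottle`, `submit`, and `drain` are hypothetical, as are all the numbers in the usage example.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical vendor quota: refills at a fixed rate up to a cap."""
    capacity: float
    refill_per_sec: float
    tokens: float
    updated_at: float = 0.0

    def refill(self, now: float) -> None:
        # Credit the tokens earned since the last update, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now

    def try_spend(self, cost: float, now: float) -> bool:
        # Spend only if the refreshed balance covers the full cost.
        self.refill(now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PriorityThrottle:
    """Releases queued requests in priority order while tokens last."""

    def __init__(self, bucket: TokenBucket) -> None:
        self.bucket = bucket
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order within a tier

    def submit(self, priority: int, cost: float, payload: str) -> None:
        # Lower number = higher priority (tier 0 is released first).
        heapq.heappush(self._heap, (priority, next(self._seq), cost, payload))

    def drain(self, now: float) -> list:
        released = []
        while self._heap:
            _, _, cost, payload = self._heap[0]
            if not self.bucket.try_spend(cost, now):
                break  # head of queue is unaffordable; wait for the next refill
            heapq.heappop(self._heap)
            released.append(payload)
        return released


# Usage with made-up numbers: 3 tokens available, refilling at 1 token/sec.
throttle = PriorityThrottle(TokenBucket(capacity=10, refill_per_sec=1.0, tokens=3.0))
throttle.submit(priority=2, cost=1, payload="backfill")
throttle.submit(priority=0, cost=2, payload="client-refresh")
throttle.submit(priority=1, cost=2, payload="daily")

first = throttle.drain(now=0.0)   # only the tier-0 request fits the balance
second = throttle.drain(now=2.0)  # two seconds of refill cover the rest
```

In this sketch the queue blocks strictly on its head: a cheap low-priority request waits behind an expensive higher-priority one rather than jumping ahead. That keeps the tiers predictable at the cost of some idle tokens; a real pipeline might relax it.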

Results

I designed and built the whole platform solo: every layer, from the AWS infrastructure to the ingestion pipelines, the Redshift schema and its stored procedures, the APIs, and the data-science pieces. Careful right-sizing, serverless components, and auto-scaling kept the AWS bill under $3,000 a month while the platform covered tens of thousands of product categories, millions of keywords, and dozens of client brands. In the later stages I onboarded and trained a small team of mid-level engineers to take over the platform.
