Winner · Data Pipeline · Systems

Data infrastructure · HackDuke Best Use of Solana

DataCrawl

Won HackDuke Code for Good 2026 Best Use of Solana with a prompt-to-dataset agent pipeline for validated financial data acquisition.

DataCrawl orchestration interface

Overview

DataCrawl automates financial dataset acquisition from plain-English requests. Instead of starting with brittle one-off scripts, it treats acquisition as an orchestrated pipeline from prompt to validated file output.

My contribution

Gemini orchestrator design, LangGraph/FastAPI pipeline wiring, and validation subagent coordination.

Problem

Useful financial data is often spread across sources with inconsistent structures, so manual collection does not scale, and shallow crawlers break down as soon as their output has to pass validation.

Approach

  • Built a Gemini orchestrator coordinating 5+ subagents for crawling, normalization, and validation.
  • Connected LangGraph and FastAPI so the pipeline could move from prompt to schema-accurate output files.
  • Designed the flow around repeatable execution rather than one-off scraping sessions.
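The crawl, normalize, and validate flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: the stage functions, the `PipelineState` fields, the `SCHEMA` columns, and the sample rows are all hypothetical, and a simple loop over stages stands in for the LangGraph graph and the Gemini orchestrator.

```python
from dataclasses import dataclass, field

# Hypothetical target schema: column name -> required Python type.
SCHEMA = {"ticker": str, "date": str, "close": float}


@dataclass
class PipelineState:
    """State passed from stage to stage (illustrative field names)."""
    prompt: str
    raw_rows: list = field(default_factory=list)
    rows: list = field(default_factory=list)
    errors: list = field(default_factory=list)


def crawl(state: PipelineState) -> PipelineState:
    # Stand-in for a crawling subagent: returns canned, made-up rows
    # instead of fetching anything from a real source.
    state.raw_rows = [
        {"Ticker": "abc", "Date": "2026-01-02", "Close": "101.50"},
        {"Ticker": "xyz", "Date": "2026-01-02", "Close": "n/a"},
    ]
    return state


def normalize(state: PipelineState) -> PipelineState:
    # Lower-case the column names, upper-case tickers, and coerce prices.
    for raw in state.raw_rows:
        row = {key.lower(): value for key, value in raw.items()}
        row["ticker"] = str(row.get("ticker", "")).upper()
        try:
            row["close"] = float(row["close"])
        except (KeyError, ValueError):
            pass  # leave the bad value in place for the validator to flag
        state.rows.append(row)
    return state


def validate(state: PipelineState) -> PipelineState:
    # Keep only rows whose fields match SCHEMA; record the rest as errors.
    valid = []
    for index, row in enumerate(state.rows):
        bad = [name for name, typ in SCHEMA.items()
               if not isinstance(row.get(name), typ)]
        if bad:
            state.errors.append(f"row {index}: bad or missing fields {bad}")
        else:
            valid.append(row)
    state.rows = valid
    return state


# Fixed stage order, so every run executes the same steps.
STAGES = [crawl, normalize, validate]


def run_pipeline(prompt: str) -> PipelineState:
    state = PipelineState(prompt=prompt)
    for stage in STAGES:
        state = stage(state)
    return state
```

Calling `run_pipeline("daily closing prices")` returns a state whose `rows` contain only the records that match the schema, while `errors` lists each rejected record with the fields that failed. Keeping the stages in a fixed list is one simple way to get repeatable execution; in a graph framework the same stages would become nodes connected by edges.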

Result

Won HackDuke Code for Good 2026 Best Use of Solana with full pipeline execution from a plain-English request to validated output files.

Stack

Python · TypeScript · React · FastAPI · LangGraph · Firebase · Gemini
Team repo (hackathon)