Automated Data Collection System

Designing a configurable, analyst-owned data pipeline for PortfolioIQ — so investment teams always have fresh company signals without the manual research overhead.

A story of building trust into automation for people whose job depends on getting the numbers right.

Client: PortfolioIQ
Year: 2024
Role: Sr. Product Designer
Type: Feature Design

Introduction

PortfolioIQ is an investment intelligence platform that helps analysts and partners at investment firms track, manage, and act on portfolio company data. The platform covers deal flow, company profiles, reporting, and team collaboration.

This case study covers one specific feature within that platform — an Automated Data Collection system. Analysts were spending hours every week manually pulling signals from public sources and copying them into spreadsheets. There was no standardised process, no shared record, and no way to catch what you didn't already know to look for.

My challenge was to design a transparent, configurable collection pipeline that non-technical analysts could own themselves — and trust enough to use in board presentations.

[Image: Dashboard Overview]

Why We Built This

PortfolioIQ already had strong company profile pages. The problem was keeping them current. Updating a profile meant an analyst manually searching for funding announcements, leadership changes, product news, and market signals, then copying them in. For a portfolio of 40+ companies, that was a full-time job hiding inside a part-time task.

The platform team had two options: integrate with a third-party data provider, or build a configurable collection layer that analysts could own themselves. The third-party route was fast but expensive, coverage was patchy for early-stage companies, and it locked PortfolioIQ into someone else's data model.

Building in-house meant more work upfront — but the team could tune it precisely for how their users operated, and make it a differentiator rather than a commodity integration. We chose to build.

Research

Before designing anything, I needed to understand where the real friction was — not just what the team assumed it was. I spent time with three analysts, shadowing their weekly research routines and mapping every tool they touched. The goal was to find the pattern underneath the complaints.

01

3–4 hours per company, per week on manual gathering

Searching, verifying, and copying data into PortfolioIQ — before any actual analysis began. Across a 40+ company portfolio, this was the team's single largest hidden cost.

02

No two analysts followed the same process

Different sources, different update cadences, no shared standard. There was no institutional method — just individual habits that couldn't be audited or improved.

03

Most data never made it into PortfolioIQ at all

It lived in personal Notion pages and spreadsheets, invisible to the rest of the team. The platform's company profiles were consistently out of date.

04

Missed signals were discovered after the fact

Funding rounds, key hires, pivots — sometimes surfacing after a board meeting where they would have mattered. The reactive process created real professional risk.

05

Confidence in the data was universally low

When asked "how confident are you in the company data in PortfolioIQ right now?" — all three analysts said low to medium. This wasn't a data access problem. It was a process problem.

[Image: Research Notes & User Flow Mapping]

"The signals existed publicly. There was just no reliable, repeatable path to get them into the system."

Competitor Insights

Before committing to a direction, I mapped how competing and adjacent tools handled data collection — looking for what they did well and where the gaps were for an investment-focused context.

Data Providers: Crunchbase, PitchBook, Harmonic
Broad, well-structured datasets, but coverage drops sharply for early-stage companies. Data is static from the platform's perspective; you query it, you don't configure it.

CRM-Adjacent: Affinity, Attio
Strong relationship intelligence for deal flow and contacts. Built for sourcing, not ongoing portfolio monitoring. Enrichment is fixed; users can't define what signals they care about.

Portfolio Monitoring: Visible, Carta
Focused on financial reporting and founder-submitted updates. Good for structured data, but entirely dependent on the company proactively sharing; there is no passive collection.

The Gap
None of them gave analysts direct control over what was being collected and from where. A self-serve, source-configurable layer built around how investment analysts actually research, rather than how data providers package their APIs, didn't exist.

Strategy

With research and the competitive landscape clear, I mapped the collection pipeline end-to-end — from source to analyst action — and identified where trust broke down. Three design principles shaped every decision that followed.

01

Transparency over automation

Analysts using this data in board decks and investor updates need to trust it. Black-box automation would have been rejected. Every collection job had to be auditable — source, timestamp, confidence, and status all visible at a glance.

02

Self-serve from day one

If source configuration required an engineering ticket, adoption would die. The UI had to let a non-technical analyst add a news feed, adjust update frequency, or pause a source — without any support.

03

Review as a feature, not friction

Rather than auto-committing everything, a lightweight review queue for flagged and low-confidence items became the trust layer — keeping the record clean while giving analysts final say over what reaches their company profiles.

[Image: Strategy Sketch / Pipeline Architecture]

Customised for PortfolioIQ Users

This wasn't a generic data pipeline UI. Every decision was shaped around how investment analysts specifically work — the way they think about companies, how they scan information, and what they need to feel confident presenting data to partners and LPs.

Language: Named, not numbered
Sources are labelled by company and type ("TechCorp · Funding News"), not technical endpoint names. Analysts think in companies, not configurations.

Classification: Signal types, not raw data
Ingested items are classified before they reach the analyst: Funding, Leadership Change, Product Update, Market News. This mirrors how analysts already categorise what they care about.

Review UX: Built for scanning, not deliberating
The review queue is designed for speed. Approve or dismiss in one click per row. The target was a full queue reviewed in under 5 minutes, and we hit it in testing.

Scope: Portfolio-wide view
Alongside the company-level feed, analysts can switch to a portfolio-wide view to see what's changed across all companies that week, a view unique to PortfolioIQ's context.

How It Works

The system is a five-step pipeline — from analyst-defined source to verified company record. Each step was designed with a specific failure mode in mind, based on what we'd seen break in the manual process.

01

Configure Sources

Analysts define which sources to monitor per company — news feeds, funding databases, hiring signals, social profiles, custom URLs. Rules filter by keyword, source type, and update frequency. No engineering required.
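
To make the configuration step concrete, here is a minimal sketch of what one analyst-defined source might look like as data. The schema, field names, and example URL are all hypothetical and are not taken from PortfolioIQ's actual implementation; only the ideas (per-company sources, keyword rules, update frequency, pausing, and the "TechCorp · Funding News" label style) come from the case study.

```python
from dataclasses import dataclass, field


@dataclass
class SourceConfig:
    """One analyst-defined source for one company (hypothetical schema)."""
    company: str
    signal_label: str            # human-readable type, e.g. "Funding News"
    source_type: str             # e.g. "news_feed", "custom_url"
    url: str
    keywords: list[str] = field(default_factory=list)
    frequency_hours: int = 24    # how often the collection job should run
    paused: bool = False

    @property
    def label(self) -> str:
        # Analyst-facing name: company first, then source type.
        return f"{self.company} · {self.signal_label}"

    def matches(self, text: str) -> bool:
        # With no keyword rules, everything from the source is accepted.
        if not self.keywords:
            return True
        lowered = text.lower()
        return any(keyword.lower() in lowered for keyword in self.keywords)


source = SourceConfig(
    company="TechCorp",
    signal_label="Funding News",
    source_type="news_feed",
    url="https://example.com/techcorp/feed",  # placeholder URL
    keywords=["series a", "raises", "funding"],
    frequency_hours=12,
)

print(source.label)
print(source.matches("TechCorp raises $10M"))
print(source.matches("TechCorp opens new office"))
```

Keeping the label as a derived property means the analyst-facing name always follows the company and type, so it cannot drift into a technical endpoint name.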

02

Automated Ingestion

Collection jobs run on a defined schedule. Every run is logged with its status, timestamp, number of items collected, and any errors. Nothing runs invisibly — every job has a record.
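
The "every job has a record" rule can be sketched as a small wrapper that writes a log entry whether a run succeeds or fails. This is an illustrative sketch only: `JobRun`, `run_job`, and the in-memory list used as a log are assumed names, and the real scheduler and storage are not described in the case study.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class JobRun:
    """Log entry for a single collection run (hypothetical shape)."""
    source_label: str
    started_at: datetime
    status: str                  # "success" or "failed"
    items_collected: int
    errors: list[str] = field(default_factory=list)


def run_job(source_label: str,
            fetch: Callable[[], list[str]],
            log: list[JobRun]) -> JobRun:
    """Run one fetch and always append a record, even on failure."""
    started = datetime.now(timezone.utc)
    try:
        items = fetch()
        run = JobRun(source_label, started, "success", len(items))
    except Exception as exc:
        # A failed run is still logged so it can surface in the UI.
        run = JobRun(source_label, started, "failed", 0, [str(exc)])
    log.append(run)
    return run


def broken_fetch() -> list[str]:
    raise TimeoutError("feed did not respond")


run_log: list[JobRun] = []
ok_run = run_job("TechCorp · Funding News", lambda: ["item 1", "item 2"], run_log)
failed_run = run_job("TechCorp · Hiring", broken_fetch, run_log)

for entry in run_log:
    print(entry.source_label, entry.status, entry.items_collected, entry.errors)
```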

03

Processing & Classification

Raw data is parsed, deduplicated, and classified by signal type. Each item receives a confidence score based on source reliability and content clarity before it reaches any analyst.
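
As a rough illustration of this step, the sketch below deduplicates items, assigns a signal type, and computes a confidence score. It is a deliberately simple stand-in: the production system used AI for classification, whereas this example uses keyword matching, and the scoring formula (source reliability multiplied by a crude "clarity" value) is an assumption made for the example, not the real scoring model. The keyword lists are invented.

```python
import hashlib

# Hypothetical keyword lists, one per signal type named in the case study.
SIGNAL_KEYWORDS = {
    "Funding": ["raises", "funding", "series"],
    "Leadership Change": ["appoints", "ceo", "steps down"],
    "Product Update": ["launches", "release", "feature"],
}
DEFAULT_SIGNAL = "Market News"


def fingerprint(text: str) -> str:
    # Normalise case and whitespace so near-identical copies hash the same.
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def dedupe(items: list[str]) -> list[str]:
    seen: set[str] = set()
    unique: list[str] = []
    for item in items:
        key = fingerprint(item)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


def classify(text: str) -> tuple[str, int]:
    """Return the best-matching signal type and its keyword hit count."""
    lowered = text.lower()
    best, best_hits = DEFAULT_SIGNAL, 0
    for signal, words in SIGNAL_KEYWORDS.items():
        hits = sum(1 for word in words if word in lowered)
        if hits > best_hits:
            best, best_hits = signal, hits
    return best, best_hits


def confidence(source_reliability: float, hits: int) -> float:
    # Assumed formula: clarity is 0.0, 0.5 or 1.0 depending on keyword hits.
    clarity = min(hits, 2) / 2
    return round(source_reliability * clarity, 2)


raw = [
    "TechCorp raises Series B funding",
    "techcorp  raises series b funding",   # duplicate with different spacing
    "TechCorp appoints new CFO",
]
unique = dedupe(raw)
for text in unique:
    signal, hits = classify(text)
    print(text, "|", signal, "|", confidence(0.9, hits))
```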

04

Review Queue

Flagged items and anything below the confidence threshold surface in the review queue, where analysts approve, dismiss, or edit them. Everything else is committed automatically. In testing, a typical queue took 3–4 minutes to clear.
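
The routing rule for this step can be sketched in a few lines. The threshold value of 0.7 is hypothetical (the case study does not state the real one), and treating an item exactly at the threshold as auto-committed is an assumption of this example.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.7  # hypothetical value, not from the case study


@dataclass
class Item:
    title: str
    confidence: float
    flagged: bool = False


def route(items: list[Item],
          threshold: float = CONFIDENCE_THRESHOLD) -> tuple[list[Item], list[Item]]:
    """Split items into (review_queue, auto_committed)."""
    review: list[Item] = []
    committed: list[Item] = []
    for item in items:
        # Flagged items always go to review, regardless of score.
        if item.flagged or item.confidence < threshold:
            review.append(item)
        else:
            committed.append(item)
    return review, committed


items = [
    Item("TechCorp raises Series B", 0.92),
    Item("TechCorp rumoured to pivot", 0.41),
    Item("TechCorp appoints new CFO", 0.88, flagged=True),
]
review_queue, auto_committed = route(items)
print([i.title for i in review_queue])
print([i.title for i in auto_committed])
```

Only two of the three items need an analyst decision here, which is the property that keeps the queue short enough to clear quickly.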

05

Centralised Record

Verified data lives on the company profile — shared across the team, filterable by signal type and date, and exportable for board reports and LP updates.
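
A small sketch of how a verified record could be filtered by signal type and date, then exported for a report. The `Record` fields, the sample data, and the CSV column layout are assumptions for illustration; the case study does not specify the export format.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Record:
    """One verified item on a company profile (hypothetical shape)."""
    company: str
    signal_type: str
    title: str
    published: date


def filter_records(records: list[Record],
                   signal_type: Optional[str] = None,
                   since: Optional[date] = None) -> list[Record]:
    # Each filter is optional; None means "do not filter on this field".
    return [
        r for r in records
        if (signal_type is None or r.signal_type == signal_type)
        and (since is None or r.published >= since)
    ]


def to_csv(records: list[Record]) -> str:
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["company", "signal_type", "title", "published"])
    for r in records:
        writer.writerow([r.company, r.signal_type, r.title, r.published.isoformat()])
    return buffer.getvalue()


records = [
    Record("TechCorp", "Funding", "Raises Series B", date(2024, 3, 4)),
    Record("TechCorp", "Leadership Change", "Appoints new CFO", date(2024, 3, 11)),
    Record("DataWorks", "Funding", "Closes seed round", date(2024, 1, 15)),
]
recent_funding = filter_records(records, signal_type="Funding", since=date(2024, 2, 1))
print(to_csv(recent_funding))
```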

Key Design Decisions

A few decisions shaped the system in ways that weren't obvious upfront — each one came out of a conflict between what seemed logical in the abstract and what actually worked for the analysts using it.

01

Rule-based config over AI-first suggestions

We prototyped an AI-powered source suggester. Analysts found it useful for discovery but didn't trust it as the primary config mechanism — they wanted to know exactly what the system was watching. We kept AI for signal classification, where mistakes are recoverable, and made source config explicit and manual where it counted.

02

Review queue as the trust layer

Auto-committing all ingested data felt too risky for a team presenting this in board meetings. The review queue gave analysts a moment of accountability without adding real overhead. The key was making it fast — a one-click action per row, not a modal review for each item.

03

Inline source health, not a separate dashboard

Early designs put source status in a separate "System Health" page. Analysts ignored it entirely. Moving status indicators inline — visible next to each configured source on the company page — meant problems were caught immediately, not discovered later when data had already gone stale.

[Image: Company Profile · Live Data Feed]

Outcome

With sources configured once, analysts stopped re-researching companies from scratch each cycle. Reporting prep time dropped significantly, and for the first time the whole team had a shared view of portfolio activity — rather than siloed spreadsheets that were out of date before they were shared.

The system also surfaced signals the team had previously missed — funding announcements, key hires, product launches — because it was watching continuously rather than on-demand. The shift was less about efficiency and more about confidence: analysts trusted the data they were bringing into conversations.

Retrospective

Source configuration started too technical — the first version looked like a developer tool and didn't resonate with analysts at all. We went through two rounds of simplification before the mental model clicked. Earlier, more frequent testing with actual analysts — not just product and engineering team members — would have caught this in week two instead of week six.

The broader lesson: investment analysts are sophisticated users, but not technical ones. The gap between "this makes sense to an engineer" and "this makes sense to someone who tracks companies for a living" is wider than it looks from the inside.
