7 MINUTE READ
Credit Agreement Analysis with CreditSeer
Enabling 50% faster, safer decisions by designing source-grounded extraction, AI guardrails, and failure-aware components

CONTEXT
CreditSeer is an AI-powered credit analysis tool built to help loan officers at regional banks and credit unions review syndicated loan agreements. As part of Georgia Tech's Financial Services Innovation Lab, I led the product from early concept through to a working MVP that was tested with credit analysts across 15+ real credit agreements.
TYPE
FinTech
B2B2C
Responsible AI
ROLE
Product Designer (AI Systems)
DURATION
6 months
TEAM
Product Manager
2 ML Engineers
1 Frontend Developer
PROBLEM
Credit analysts spend 2+ hours on credit agreement review...
Critical details are scattered
Reviewing credit agreements manually is slow because analysts must cross-reference key details scattered across hundreds of pages to understand a single credit facility.
...yet most LLMs aren't trustworthy enough for high-stakes decisions
Hallucinated values create regulatory risk
AI promises to solve the speed issue, but a single extraction error can invalidate a financial recommendation and lead to regulatory failure.
Insight generation without transparency
Most LLMs prioritize "insight generation" (summarizing and interpreting), which lacks the transparency required for regulated finance. What loan officers actually need is defensible, auditable decision support, not an AI that tells them what to think.
SOLUTION
CreditSeer is an AI-assisted system that parses complex credit agreements into a structured, explainable dashboard, enabling faster review without sacrificing trust.
Reorganizing Complex Agreements Around Analyst Workflows

CreditSeer focuses on extracting predefined, high-value financial and legal terms that analysts actually review (e.g., commitments, pricing, covenants, default triggers) and rearranges them into analyst-aligned views.
Why this matters: This reassembly reduces the cognitive load and time lost to constant cross-referencing
Workflow Aligned
Faster review
Low cognitive load
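As a rough illustration of the reorganization described above, an analyst-aligned view is essentially a mapping from the views analysts work in to the fields they expect to see together. The view labels and field names below are hypothetical, not the production schema.

```python
# Hypothetical grouping of extracted terms into analyst-aligned views.
# View labels and field names are illustrative only.
ANALYST_VIEWS = {
    "facility_overview": ["total_commitment", "facility_type", "maturity_date"],
    "pricing": ["base_rate", "applicable_margin", "commitment_fee", "utilization_fee"],
    "covenants": ["leverage_ratio_max", "interest_coverage_min", "reporting_frequency"],
    "default_triggers": ["cross_default_threshold", "change_of_control_clause"],
}

def build_view(view_name: str, extracted: dict) -> dict:
    """Assemble one analyst-facing view from a flat dict of extracted terms."""
    return {field: extracted.get(field) for field in ANALYST_VIEWS[view_name]}
```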
Making Every Extracted Value Explainable
Every extracted value is directly linked to its source clause in the agreement, allowing analysts to verify accuracy instantly without manual cross-referencing.
Why this matters: This transforms the tool from a black box into a transparent, auditable decision-making assistant
Instant verification
Audit-ready
Trust through traceability
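A minimal sketch of the record shape that makes this traceability possible; the field names here are assumptions for illustration, not the exact data model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExtractedValue:
    """One extracted term, always carried with the clause it came from."""
    field: str                # e.g. "applicable_margin"
    value: Optional[str]      # normalized display value, or None if not found
    source_quote: str         # verbatim clause text the value was parsed from
    page: int                 # location metadata for one-click verification
    section: str
    status: str               # e.g. "ok" or "ambiguous"
```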
Degrading Gracefully When the Model Is Uncertain
The dashboard clearly surfaces ambiguous states and allows analysts to correct values, with each intervention feeding back into pipeline refinement, schema updates, and future annotation strategies.
Why this matters: This builds more trust than false confidence and creates a feedback loop for model improvement
Safe failure
Human-in-control
Pipeline Refinement
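Under stated assumptions about how corrections are stored, that feedback loop can be as simple as logging each analyst intervention with an attribution to the pipeline stage that failed. Everything below is a sketch, not the production logging code.

```python
import json
import time

def log_correction(field: str, model_value: str, analyst_value: str,
                   stage: str, path: str = "corrections.jsonl") -> None:
    """Append an analyst correction so it can feed schema updates and
    future annotation batches. `stage` records whether the error is
    attributed to block discovery or value parsing."""
    record = {
        "ts": time.time(),
        "field": field,
        "model_value": model_value,
        "analyst_value": analyst_value,
        "stage": stage,  # "block_discovery" | "value_parsing"
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```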
IMPLEMENTATION HIGHLIGHTS
Leading Cross-Functional Collaboration
What I did: Helped ML engineers define annotation schemas and informed feature decisions through analysis of 15+ agreements, so that model behavior aligned with how analysts actually review and verify terms.
Impact: This prevented the team from wasting time on unreliably extractable fields and helped set realistic expectations with stakeholders about what the MVP could deliver.
Designing an End-to-End Extraction Pipeline
What I did: Architected and implemented the document chunking logic and two-stage extraction pipeline (block discovery + value parsing), working directly in code to test what actually worked vs. what looked good on paper.
Impact: By building the pipeline myself, I could rapidly test extraction patterns, identify failure modes, and iterate on prompt structures within hours instead of waiting days for engineering cycles. It revealed critical insights about context-window, chunk-boundary, and token-limit issues that would have been invisible from mockups alone.
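To make the chunking piece concrete, here is a simplified sketch of heading-aware chunking with a token budget; the heading regex, budget, and token estimate are illustrative assumptions.

```python
import re

MAX_TOKENS = 1200               # illustrative budget, not the production value
APPROX_TOKENS_PER_WORD = 1.3    # rough estimate to avoid a tokenizer dependency

def chunk_by_section(text: str) -> list[str]:
    """Split an agreement on section-style headings, then pack sections
    into chunks that stay under a rough token budget."""
    sections = re.split(r"(?=^SECTION\s+\d+\.\d+)", text, flags=re.MULTILINE)
    chunks, current, current_tokens = [], [], 0
    for section in sections:
        tokens = int(len(section.split()) * APPROX_TOKENS_PER_WORD)
        if current and current_tokens + tokens > MAX_TOKENS:
            chunks.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(section)
        current_tokens += tokens
    if current:
        chunks.append("\n".join(current))
    return chunks
```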
Building a UI System for Uncertain AI Outputs
What I did: Built a functional React-based MVP using Cursor.ai, Figma, and Vercel rather than creating high-fidelity static mockups, and created a data-driven design system with components built to handle all AI output states, not just the happy path.
Impact: This approach cut design-to-feedback cycles from weeks to days. Instead of designing for hypothetical "perfect" AI outputs, I could see actual failure modes (blank fields, overlong values, low-confidence extractions) and design appropriate handling patterns immediately.
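The React components themselves aren't shown here, but the data contract they render against can be sketched in a few lines; the state names and length threshold are assumptions, not the shipped design tokens.

```python
from enum import Enum
from typing import Optional

class DisplayState(str, Enum):
    OK = "ok"
    MISSING = "missing"        # model returned nothing for the field
    OVERLONG = "overlong"      # clause-length text instead of a value
    AMBIGUOUS = "ambiguous"    # model flagged uncertainty

def display_state(value: Optional[str], max_len: int = 60) -> DisplayState:
    """Map a raw extraction result to the state a dashboard card should render."""
    if value is None or not value.strip():
        return DisplayState.MISSING
    if value.strip().lower() == "ambiguous":
        return DisplayState.AMBIGUOUS
    if len(value) > max_len:
        return DisplayState.OVERLONG
    return DisplayState.OK
```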
DEEP DIVE INTO THE EXTRACTION PIPELINE
When early model extraction outputs broke review workflows, I developed a pipeline to understand what GenAI could reliably extract and shape a UX grounded in what analysts could actually trust.
In earlier attempts we extracted structured values directly from large, article-level chunks (often 3,000-5,000 tokens).
The prompt essentially asked: "Extract the base rate, applicable margin, commitment fee, and utilization fee."
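In code, that early single-stage pass looked roughly like the sketch below; `call_llm` is a stand-in for the model client and the prompt wording is paraphrased.

```python
# Simplified sketch of the early single-stage approach (illustrative only).
NAIVE_PROMPT = """From the following credit agreement text, extract the base rate,
applicable margin, commitment fee, and utilization fee.

{chunk}"""

def extract_naive(chunk: str, call_llm) -> str:
    # Free-form text comes back: no schema, no source locations.
    return call_llm(NAIVE_PROMPT.format(chunk=chunk))
```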
Issues:
Some fields were blank
A few returned clause-length text instead of a single value
Some values were confidently wrong with no clean trace to source
The dashboard became visually inconsistent and hard to scan
This wasn’t just a model issue — it became a product reliability issue.
• Design impact: cards looked empty or visually messy
• Workflow impact: analysts couldn’t quickly verify or compare
• System impact: errors were hard to debug (chunk? block? field?)
I split extraction into two sequential, focused steps, block discovery and value parsing, each with its own objectives and constraints, to limit context, reduce ambiguity, and produce schema-aligned outputs.
Stage 1 - Block Discovery
The prompt asks: "Find the exact block that defines the 'Applicable Margin' and return it word-for-word"
Key design decisions:
Used defined patterns for each field to guide the model
Required verbatim extraction, no interpretation
Mandated location metadata (page, section, line numbers)
Why this works:
Model can't hallucinate when it's required to quote directly
Every block has a documented source location
If extraction fails, we can examine the retrieved block to see if it's even the right content
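A minimal sketch of how Stage 1 can be driven; the prompt wording, JSON keys, and `call_llm` helper are assumptions for illustration rather than the exact production prompt.

```python
import json

BLOCK_DISCOVERY_PROMPT = """You are given a chunk of a credit agreement.
Find the exact block that defines "{field}" and return it word-for-word.
Do not paraphrase or interpret. Respond as JSON:
{{"quote": "<verbatim text>", "page": <int>, "section": "<str>", "lines": "<str>"}}
If no such block exists in this chunk, respond with {{"quote": null}}.

CHUNK:
{chunk}"""

def discover_block(field: str, chunk: str, call_llm) -> dict:
    """Stage 1: locate and quote the defining block, with source metadata."""
    raw = call_llm(BLOCK_DISCOVERY_PROMPT.format(field=field, chunk=chunk))
    return json.loads(raw)
```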
The prompt asks: "Find the exact block that defines the 'Applicable Margin' and return it word-for-word"
Key design decisions:
Used defined patterns for each field to guide the model
Required verbatim extraction, no interpretation
Mandated location metadata (page, section, line numbers)
Why this works:
Model can't hallucinate when it's required to quote directly
Every block has a documented source location
If extraction fails, we can examine the retrieved block to see if it's even the right content
Stage 2 - Value Extraction
The prompt asks: "From this block about 'Applicable Margin', extract: (1) base value, (2) adjustment trigger"
Key design decisions:
Extremely constrained context only from the relevant block
Strict schema with expected formats, units, and field types
Required "ambigous" markers when uncertain
Why this works:
Smaller context + strict schema = consistent output
Known output structure means components can be designed with confidence
Values are formatted for UI display, not as paragraphs
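And a matching sketch of Stage 2, where the context is only the quoted block and the output must fit a strict schema; the key names, formats, and schema check are illustrative assumptions.

```python
import json

VALUE_PROMPT = """From this block about "{field}", extract:
(1) base_value - a number with its unit (e.g. "2.25%")
(2) adjustment_trigger - a short phrase, or null
If the block does not state a value clearly, return "ambiguous" for that key.
Respond only as JSON with exactly those two keys.

BLOCK:
{block}"""

EXPECTED_KEYS = {"base_value", "adjustment_trigger"}

def parse_value(field: str, block: str, call_llm) -> dict:
    """Stage 2: parse display-ready values from a single quoted block."""
    result = json.loads(call_llm(VALUE_PROMPT.format(field=field, block=block)))
    if set(result) != EXPECTED_KEYS:   # schema check before anything reaches the UI
        raise ValueError(f"Unexpected keys: {set(result)}")
    return result
```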
The prompt asks: "From this block about 'Applicable Margin', extract: (1) base value, (2) adjustment trigger"
Key design decisions:
Extremely constrained context only from the relevant block
Strict schema with expected formats, units, and field types
Required "ambigous" markers when uncertain
Why this works:
Smaller context + strict schema = consistent output
Known output structure means components can be designed with confidence
Values are formatted for UI display, not as paragraphs
This architectural change fundamentally expanded what the product could do:
For the design:
UI components could now be designed with predictable data shapes, eliminating the "messy card" problem and enabling the consistent, scannable dashboard analysts needed.
For trust:
The two-stage approach made the model's reasoning visible: analysts could see both the extracted value and the source block it came from, building confidence through transparency.
For debugging:
When something went wrong, the team could now isolate whether the issue was in block discovery (Stage 1) or value parsing (Stage 2), dramatically speeding up iteration cycles.
For ML improvement:
The intermediate block artifacts created a natural annotation dataset. When analysts corrected values, we could trace back to see if the wrong block was retrieved or if parsing was the issue, directly informing model retraining priorities.
IMPACT
~50% reduction in review time
Reduced end-to-end agreement review from ~2 hours to ~1 hour by reorganizing fragmented terms into workflow-aligned views (e.g., pricing, covenants, triggers).
Helped boost model accuracy
Achieved through product-led iteration: redesigning extraction logic, tightening schemas, and identifying edge cases across 15+ real credit agreements.
Cut design-to-engineering feedback cycles from weeks to days
Built and iterated on a functional React-based MVP using Cursor.ai and rapid prototyping tools, allowing real model outputs to directly inform design decisions.
Built Shared Design Infrastructure for the Lab
Created a scalable, data-driven design system and supporting documentation (including financial concepts) that serve as a reference for future projects and improve onboarding across all teams.
REFLECTION
AI Tools Pushed Me Beyond UX Into System Design
Using AI tools like Cursor helped me move beyond just designing screens. I ended up building parts of the extraction logic as well, which changed how I think about the role of a product designer—as someone who shapes system behavior, not just interfaces.
Understanding the Domain Was Necessary to Simplify It
To design for credit analysts, I had to deeply understand how syndicated credit agreements work. Learning the domain was key to turning dense, fragmented information into something usable and coherent.
Early Access to Users and Data Matters More Than Process
Working in fintech made it clear that limited access to real users and agreements can slow down the right decisions. In trust-critical domains, getting early exposure to real data and analyst workflows is essential to designing something reliable.
Designing for Uncertainty Is Part of Designing for AI
This project reinforced that AI won’t always be right, especially in high-stakes workflows. Making uncertainty visible and designing clear fallback paths helped keep the product usable and trustworthy.