case study

Law Firm Document Search: Private AI for Contract Review and Compliance

A Bangkok law firm searches 140,000 contracts, tracks deadlines, and prepares client briefs using a private AI system on its own server.

contracts indexed: 140,000+

search time: seconds, not hours

data sources: 4 (documents, email, calendar, billing)

the problem

Twenty years of contracts with no way to search them

The firm handles corporate transactions, real estate, and regulatory compliance across Bangkok and the surrounding provinces. Forty lawyers, eight paralegals, and a support staff that has been filing contracts since the early 2000s.

The archive is what you would expect from twenty years of practice. Scanned PDFs in nested folders on a shared drive. Word documents named by a convention that changed three times. Paper originals in storage, partially digitized in 2016 and again in 2019 under a different naming scheme.

Executed versions that circulated in email threads never made it to the shared drive. The billing system's client and matter numbers did not map cleanly to the document folders.

When a lawyer needed to find a specific clause, a prior version of an agreement, or every contract with a particular counterparty, the process was manual. Search the shared drive by filename. Ask the paralegal who handled the matter. Dig through email. For due diligence on an acquisition, this could take days.

The firm evaluated cloud document management platforms, but every one required uploading the full archive to an external server. For a law firm handling sensitive client agreements, that was not acceptable.

the build

What was built

The system runs on a server in the firm's own office. No cloud services, no external APIs, no data leaving the network. The private AI processes documents on hardware the firm controls.

The first step was ingestion. The system crawled the shared drive, extracted text from every PDF and Word document, and built a searchable index. Scanned documents went through OCR. Each document was chunked, embedded, and stored in a vector database alongside its full text, metadata, and file path. The email archive was ingested separately over IMAP.
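To make the chunking step concrete, here is a minimal Python sketch that splits extracted text into overlapping windows and keeps each piece tied to its file path and metadata. The window size, the overlap, and the `Chunk` record are assumptions for illustration, not the firm's actual parameters, and the embedding model and vector database calls are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical record: one retrievable slice of a document plus its origin."""
    file_path: str
    chunk_index: int
    text: str
    metadata: dict


def chunk_text(text: str, size: int = 800, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap, so a clause
    cut at one boundary still appears whole in the adjacent chunk.
    The defaults are assumptions; the firm's real values are not stated."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def chunk_document(file_path: str, text: str, metadata: dict,
                   size: int = 800, overlap: int = 200) -> list[Chunk]:
    """Attach the file path and a copy of the metadata to every chunk,
    so any search hit can be traced back to the original file."""
    return [
        Chunk(file_path, i, piece, dict(metadata))
        for i, piece in enumerate(chunk_text(text, size, overlap))
    ]
```

Overlap costs some storage, but it keeps a termination clause that straddles a window boundary retrievable as one piece of text.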

The second layer was structured extraction. OpenClaw's agent parsed each contract to identify parties, effective dates, termination clauses, renewal deadlines, governing law provisions, and key obligations. These fields were stored as structured data in SQLite, making it possible to query across the full archive: "show me every lease with a renewal date in the next 90 days" or "list all contracts with counterparty X across all matters."
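Below is a sketch of how extracted fields might sit in SQLite so that both example questions become ordinary SQL. The table layout, column names, and the `'lease'` type label are hypothetical. Dates are stored as ISO 8601 strings so that text comparison matches date order.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS contracts (
    id             INTEGER PRIMARY KEY,
    file_path      TEXT NOT NULL,
    contract_type  TEXT,
    effective_date TEXT,   -- ISO 8601 (YYYY-MM-DD): string order equals date order
    renewal_date   TEXT,
    governing_law  TEXT
);
CREATE TABLE IF NOT EXISTS parties (
    contract_id INTEGER NOT NULL REFERENCES contracts(id),
    name        TEXT NOT NULL
);
"""


def renewals_within(conn: sqlite3.Connection, days: int, today: date):
    """Leases whose renewal date falls between today and today + days."""
    end = today + timedelta(days=days)
    return conn.execute(
        "SELECT id, file_path, renewal_date FROM contracts "
        "WHERE contract_type = 'lease' AND renewal_date BETWEEN ? AND ? "
        "ORDER BY renewal_date",
        (today.isoformat(), end.isoformat()),
    ).fetchall()


def contracts_with_party(conn: sqlite3.Connection, name: str):
    """Every contract naming a given counterparty, regardless of matter."""
    return conn.execute(
        "SELECT c.id, c.file_path FROM contracts c "
        "JOIN parties p ON p.contract_id = c.id "
        "WHERE p.name = ? ORDER BY c.id",
        (name,),
    ).fetchall()
```

Parameterized queries (`?` placeholders) keep user-supplied names out of the SQL text, which matters when the question arrives through a chat interface.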

The billing system integration mapped client and matter numbers to the document index. A lawyer could search by matter number and see every related document, email, and calendar entry in one result set.
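One way to bridge identifiers that were written differently in each system is to normalize them to a single key before grouping. The three formats below are invented examples, not the firm's conventions, and `normalize_matter` and `group_by_matter` are hypothetical helpers.

```python
import re
from collections import defaultdict

# Hypothetical: the same matter written three ways across sources,
# e.g. "2019-0412" (billing), "M2019/412" (folder name), "2019_412" (email tag).
MATTER_RE = re.compile(r"\bM?(\d{4})[-/_ ]0*(\d+)\b")


def normalize_matter(raw: str):
    """Reduce a matter reference to a canonical 'YYYY-NNNN' key, or None."""
    m = MATTER_RE.search(raw)
    if not m:
        return None
    year, seq = m.groups()
    return f"{year}-{int(seq):04d}"


def group_by_matter(records):
    """records: iterable of (source, matter_ref, item).
    Returns {matter_key: {source: [items]}}; unrecognized refs are skipped."""
    index = defaultdict(lambda: defaultdict(list))
    for source, ref, item in records:
        key = normalize_matter(ref)
        if key is not None:
            index[key][source].append(item)
    return {k: dict(v) for k, v in index.items()}
```

With documents, emails, and calendar entries keyed this way, a single matter number retrieves all of them in one lookup.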

OpenClaw orchestrates the full pipeline: document ingestion, structured extraction, search, and the conversational interface that lawyers use daily. Each component is a plugin with a typed schema.
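The following sketch shows, in general terms, what "a plugin with a typed schema" can mean: each tool declares its parameter types, and arguments proposed by the language model are validated before the handler runs. This is a hypothetical, framework-agnostic registry, not OpenClaw's actual plugin API, and the `search_contracts` tool is a stub.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    """Hypothetical tool wrapper: a name, a typed parameter schema, and a handler."""
    name: str
    params: dict[str, type]      # parameter name -> required Python type
    required: frozenset[str]
    handler: Callable[..., Any]

    def call(self, args: dict[str, Any]) -> Any:
        unknown = set(args) - set(self.params)
        if unknown:
            raise ValueError(f"{self.name}: unknown arguments {sorted(unknown)}")
        missing = self.required - set(args)
        if missing:
            raise ValueError(f"{self.name}: missing arguments {sorted(missing)}")
        for key, value in args.items():
            expected = self.params[key]
            # bool is a subclass of int, so reject it explicitly for int params.
            if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
                raise TypeError(f"{self.name}.{key}: expected {expected.__name__}, "
                                f"got {type(value).__name__}")
        return self.handler(**args)


REGISTRY: dict[str, Tool] = {}


def register(tool: Tool) -> None:
    REGISTRY[tool.name] = tool


def dispatch(name: str, args: dict[str, Any]) -> Any:
    """Route a model-issued tool call to its plugin after schema validation."""
    if name not in REGISTRY:
        raise KeyError(f"no such tool: {name}")
    return REGISTRY[name].call(args)


# Hypothetical stub plugin standing in for the real search component.
def _search_contracts(query: str, limit: int = 10) -> list[str]:
    return [f"result for {query!r}"][:limit]


register(Tool(
    name="search_contracts",
    params={"query": str, "limit": int},
    required=frozenset({"query"}),
    handler=_search_contracts,
))
```

Validating at the boundary means a malformed call from the model fails loudly with a clear message instead of reaching the search index or database.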

daily use

What it looks like in practice

A corporate associate preparing for due diligence types a question in the firm's internal chat: "find all joint venture agreements with companies registered in Chonburi province, signed between 2018 and 2023." The system returns a ranked list of documents with excerpts showing the relevant clauses and links to the original files.

A partner asks "what are our standard force majeure provisions and how have they changed since 2020?" The system pulls examples from across practice areas, compares the language, and summarizes how the wording has evolved.

Deadline tracking runs automatically. The system monitors extracted renewal and termination dates and creates calendar events when action windows open. A weekly digest goes to each practice group listing upcoming deadlines across their matters.
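If each extracted deadline carries a notice period, the window check and the weekly digest reduce to date arithmetic. The `Deadline` fields, the `notice_days` value, and the 30-day digest horizon are assumptions for illustration, and creating the calendar event itself is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Deadline:
    """Hypothetical record built from extracted renewal/termination dates."""
    matter: str
    practice_group: str
    kind: str          # e.g. "renewal" or "termination"
    due: date
    notice_days: int   # assumed: days before `due` that the action window opens

    @property
    def window_opens(self) -> date:
        return self.due - timedelta(days=self.notice_days)

    @property
    def key(self) -> tuple:
        return (self.matter, self.kind, self.due)


def needs_calendar_event(deadlines, today: date, already_created: set):
    """Deadlines whose action window is open today and have no event yet.
    Checking the whole window (not just 'opens today') means a skipped
    run does not silently drop an event."""
    return [
        d for d in deadlines
        if d.window_opens <= today <= d.due and d.key not in already_created
    ]


def weekly_digest(deadlines, today: date, horizon_days: int = 30):
    """Group deadlines due within the horizon by practice group, soonest first."""
    end = today + timedelta(days=horizon_days)
    digest = defaultdict(list)
    for d in sorted(deadlines, key=lambda d: d.due):
        if today <= d.due <= end:
            digest[d.practice_group].append(f"{d.due.isoformat()} {d.kind}: {d.matter}")
    return dict(digest)
```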

Client briefing preparation is conversational. "Summarize our relationship with this client: active contracts, total value, upcoming renewals, and any pending obligations" produces a briefing sourced from contracts, billing data, and email correspondence. New documents are indexed on arrival, so the archive stays current without manual effort.

the result

What changed

The firm's institutional knowledge used to live in the heads of senior partners and long-tenured paralegals. When someone left, their knowledge of the archive went with them. The system replaced that dependency with a searchable, structured record of everything the firm has produced.

Due diligence timelines shortened. Reviews that once required days of manual document gathering now start with a comprehensive search result within minutes. The remaining time goes to analysis, not assembly.

Compliance monitoring became proactive. The system surfaces upcoming deadlines and expiring agreements before they become problems. The weekly digest replaced a manual tracking spreadsheet that was always slightly out of date.

Junior lawyers became productive faster. Instead of learning filing conventions over months and building relationships with paralegals who knew where things lived, they could search the full archive from their first week.

The firm's managing partner chose the private AI architecture specifically because client confidentiality is the baseline, not a feature to toggle on.

the stack

Technical details

The core is a document processing pipeline backed by SQLite for structured data and a vector database for semantic search. Every contract, email, and attachment is stored as both raw text and embedded vectors. The structured extraction layer writes parsed fields (parties, dates, clauses, obligations) to relational tables that support exact queries alongside the fuzzy semantic search.

Document ingestion handles four paths: filesystem monitoring for the shared drive, IMAP polling for email attachments, a web interface for manual uploads, and a bulk import tool for the initial archive migration. OCR runs locally for scanned PDFs.
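For the filesystem path, a simple and portable approach is polling: take a snapshot of file modification times and sizes, then diff it against the previous snapshot. The source does not say which mechanism the firm uses, so this polling sketch and its watched file extensions are assumptions; OS-level change notifications are a common alternative.

```python
import os
from pathlib import Path

# Assumed set of watched extensions; not stated in the source.
WATCHED_SUFFIXES = {".pdf", ".docx", ".doc"}


def snapshot(root: str) -> dict[str, tuple[int, int]]:
    """Map each watched file under root to (mtime_ns, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if Path(name).suffix.lower() not in WATCHED_SUFFIXES:
                continue
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except FileNotFoundError:
                continue  # file removed between listing and stat
            state[path] = (st.st_mtime_ns, st.st_size)
    return state


def diff(old: dict, new: dict):
    """Return (added, changed, removed) paths between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return added, changed, removed
```

Each polling cycle would send `added` and `changed` paths through extraction and indexing, and drop `removed` paths from the index.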

The conversational interface connects through the firm's existing internal messaging system. OpenClaw's plugin architecture wraps the search index, the structured database, and the calendar integration as typed tools. A single query can hit the vector index for semantic relevance, filter by structured fields, and return results with source attribution.
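Here is a minimal sketch of such a combined query: exact filters on the structured fields run first, the surviving chunks are ranked by cosine similarity to the query embedding, and each hit is returned with its file path. The `Indexed` record, the filter format, and the toy two-dimensional vectors are hypothetical; a real vector database would rank with an index rather than a linear scan.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Indexed:
    """Hypothetical indexed chunk: source file, excerpt, embedding, extracted fields."""
    file_path: str
    excerpt: str
    vector: list[float]
    meta: dict[str, Any]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def hybrid_search(items, query_vec, filters: dict[str, Callable[[Any], bool]], k: int = 5):
    """Exact structured filters first, then semantic ranking of what survives."""
    candidates = [
        it for it in items
        if all(pred(it.meta.get(name)) for name, pred in filters.items())
    ]
    scored = sorted(
        ((cosine(query_vec, it.vector), it) for it in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Every hit carries its file path so answers can cite the original document.
    return [
        {"file": it.file_path, "excerpt": it.excerpt, "score": round(score, 3)}
        for score, it in scored[:k]
    ]
```

For the Chonburi joint venture question quoted earlier, the filters would check contract type, province, and signing year, while the embedding handles the looser wording of the request.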

Deadline monitoring runs as a cron job through OpenClaw's scheduler. The system runs on commodity server hardware in the firm's Bangkok office. The language model handles natural language queries and summarization. Document parsing, OCR, embedding, and structured extraction are deterministic code.

your documents already contain the answers.

The same architecture that powers document search for a law firm handles patient records for clinics, tracks health data for individuals, and coordinates operations for construction companies. The conversation starts with your archive and your constraints.
