case study
Law Firm Document Search: Private AI for Contract Review and Compliance
A mid-size Bangkok law firm with 20 years of contracts locked in filing cabinets and shared drives. A private AI system that searches every document, tracks every deadline, and prepares client briefs on the firm's own server. Nothing leaves the building.
contracts indexed
140,000+
search time
seconds, not hours
data sources
4 (documents, email, calendar, billing)
the problem
The Problem: Two Decades of Contracts with No Way to Search Them
the firm handles corporate transactions, real estate, and regulatory compliance across Bangkok and the surrounding provinces. forty lawyers, eight paralegals, and a support staff that has been filing contracts since the early 2000s. the archive is roughly what you would expect from twenty years of practice. scanned PDFs in nested folders on a shared drive. Word documents named by a convention that changed three times. paper originals in storage that were partially digitized in 2016 and then again in 2019 with different naming. email threads containing executed versions that never made it to the shared drive at all. a billing system with client and matter numbers that do not map cleanly to the document folders. when a lawyer needed to find a specific clause, a prior version of an agreement, or every contract with a particular counterparty, the process was manual. search the shared drive by filename. ask the paralegal who handled the matter. dig through email. for due diligence on an acquisition, this could take days. for a compliance review, it meant hoping someone remembered where the relevant agreements lived. the firm evaluated cloud document management platforms. every one of them required uploading the full archive to an external server. for a law firm handling sensitive client agreements, that was not acceptable. the data needed to stay inside the building.
the build
What Was Built: A Private Document Intelligence System
the system runs on a server in the firm's own office. no cloud services, no external APIs, no data leaving the network. the private AI processes documents on hardware the firm controls, and the index lives in a local database alongside the original files. the first step was ingestion. the system crawled the shared drive, extracted text from every PDF and Word document, and built a searchable index. scanned documents went through OCR. each document was chunked, embedded, and stored in a vector database alongside its full text, metadata, and file path. the email archive was ingested separately over IMAP, pulling attachments and threading conversations to their associated matters. the second layer was structured extraction. OpenClaw's agent parsed each contract to identify parties, effective dates, termination clauses, renewal deadlines, governing law provisions, and key obligations. these fields were stored as structured data in SQLite, making it possible to query across the full archive: "show me every lease with a renewal date in the next 90 days" or "list all contracts with counterparty X across all matters." the billing system integration mapped client and matter numbers to the document index. a lawyer could search by matter number and see every related document, email, and calendar entry in one result set. OpenClaw orchestrates the full pipeline: document ingestion, structured extraction, search, and the conversational interface that lawyers use daily. each component is a plugin with a typed schema. adding a new data source or extraction rule does not require rebuilding the system.
daily use
What It Looks Like in Practice
a corporate associate preparing for a due diligence review types a question in the firm's internal chat: "find all joint venture agreements with companies registered in Chonburi province, signed between 2018 and 2023." the system returns a ranked list of documents with excerpts showing the relevant clauses. each result links to the original file on the shared drive. a partner asks "what are our standard force majeure provisions and how have they changed since 2020?" the system pulls examples across practice areas, compares the language, and surfaces the evolution in a summary. work that used to require a junior associate spending an afternoon now takes a few seconds. deadline tracking runs automatically. the system monitors extracted renewal and termination dates and creates calendar events when action windows open. a weekly digest goes to each practice group listing upcoming deadlines across their matters. missed renewals are no longer a function of someone forgetting to check a spreadsheet. client briefing preparation is conversational. "summarize our relationship with [client name]: active contracts, total value, upcoming renewals, and any pending obligations" produces a briefing document sourced from contracts, billing data, and email correspondence. the lawyer reviews and edits rather than assembling from scratch. new documents are indexed on arrival. when a signed contract is saved to the shared drive or an executed version arrives by email, the system ingests it, extracts structured fields, links it to the relevant matter, and updates the deadline tracker. the archive stays current without manual effort.
the result
What Changed: Institutional Memory That Actually Works
the firm's institutional knowledge used to live in the heads of senior partners and long-tenured paralegals. when someone left, their knowledge of the archive left with them. the system replaced that dependency with a searchable, structured record of everything the firm has ever produced. due diligence timelines shortened. reviews that required days of manual document gathering now start with a comprehensive search result in minutes. the remaining time is spent on analysis, not assembly. compliance monitoring became proactive instead of reactive. the system surfaces upcoming deadlines and expiring agreements before they become problems. the weekly digest replaced a manual tracking spreadsheet that was always slightly out of date. junior lawyers became more productive immediately. instead of learning the filing conventions over months and building relationships with paralegals who knew where things lived, they could search the full archive from their first week. the institutional knowledge gap between a first-year associate and a ten-year partner, at least for document retrieval, largely closed. the private AI architecture was the deciding factor for the firm's managing partner. client confidentiality is not a feature to toggle on. it is the baseline. every query, every document, every extraction runs on hardware in the firm's office. the system answers to the firm's data governance policies, not a cloud provider's terms of service.
the stack
Technical Details: Local Infrastructure, Standard Protocols, OpenClaw Orchestration
the core is a document processing pipeline backed by SQLite for structured data and a vector database for semantic search. every contract, email, and attachment is stored as both raw text and embedded vectors. the structured extraction layer writes parsed fields (parties, dates, clauses, obligations) to relational tables that support exact queries alongside the fuzzy semantic search. document ingestion handles four paths: filesystem monitoring for the shared drive (new and modified files trigger re-ingestion), IMAP polling for email attachments, a web interface for manual uploads, and a bulk import tool for the initial archive migration. OCR runs locally for scanned PDFs. each path is independent and can be extended without touching the others. the conversational interface connects through the firm's existing internal messaging system. OpenClaw's plugin architecture wraps the search index, the structured database, and the calendar integration as typed tools. a single query can hit the vector index for semantic relevance, filter by structured fields (date range, counterparty, practice area), and return results with source attribution and direct links to original documents. deadline monitoring runs as a cron job through OpenClaw's scheduler. it queries the structured extraction table for upcoming dates, compares against the calendar, and creates or updates events as needed. the weekly digest is assembled and delivered through the same messaging channel lawyers use for ad hoc queries. the system runs on commodity server hardware in the firm's Bangkok office. no GPUs required for inference at this scale. the language model handles natural language queries and summarization. document parsing, OCR, embedding, and structured extraction are deterministic code. the boundary between model inference and computation is explicit, which matters for a firm that needs to explain how its tools work to clients and regulators.
private AI systems for law firms, clinics, and businesses running on your own infrastructure.
deployment model, privacy architecture, and the engagement process.
your documents already contain the answers.
the same architecture that powers document search for a law firm handles patient records for clinics, tracks health data for individuals, and coordinates operations for construction companies. the conversation starts with your archive and your constraints.