
How to Build: Bloodwork Tracking with Private AI

Ingest lab reports from any source, track biomarker trends, and automate retest scheduling. Private AI on your own hardware.


What it is: a private biomarker database with longitudinal tracking

A private AI system that keeps your complete bloodwork history in a local database, structured and queryable. It ingests reports from email, photos, PDFs, and direct messages, normalizes marker names across labs, and tracks every value over time. When results arrive, the system compares them against both lab reference ranges and whatever targets you've set based on your own health goals. Trend detection runs across every draw. A retest schedule advances automatically each time a marker comes in.

The part that makes this different from uploading reports to a health portal: you own the data, the stack, and the query surface. No vendor decides what charts you're allowed to see. No subscription tier gates access to your own history. When you want to know how your ferritin has moved over the past year, whether your homocysteine started improving after a dietary change, or how your inflammation markers looked during a heavy training block, you just ask. The answer comes from your database, not a dashboard someone else designed.

The agent monitors things on your behalf and sends you results through whatever channel you use. You don't check a portal. You don't open an app and scroll through tabs. The system comes to you.


How you use it: forwarding results, asking questions, and automatic alerts

You get bloodwork done. The results get to your system through whichever door is easiest. Forward the lab's email to your AI. Set up a rule so anything from your lab's domain lands in a monitored inbox automatically. Photograph the paper printout and send it to your bot on Telegram. Screenshot the patient portal and paste it in a chat. Save the PDF to a synced folder and let the watcher pick it up. Type the values yourself if you want. It doesn't matter. The data gets in, and the system handles the rest.

For most people, email is the path of least resistance. Labs already send results by email. One forwarding rule and every future draw flows in automatically. If your lab allows it, you can give them a dedicated address and skip the forwarding step entirely. Either way, it's one-time setup.

Once data is in, the system runs on a schedule in the background. A daily check surfaces anything due for retesting and sends a reminder if something is coming up. A weekly summary flags markers outside your personal targets. You only hear from the system when there's something to tell you.

The other half is conversational: ask whatever you want. How has my ApoB moved over the last four draws? Which markers are currently outside my targets? What changed between my March and September panels? Can you show me everything in the hormone category from this year? The agent queries your local database and gives you an answer in plain language. No predefined views, no export step, no portal login. It's your data, sitting in a database on your machine, and you can ask about it any way you want.

To adjust a target range, tell the agent. To update a retest cadence, tell the agent. To add a new marker you want to track, tell the agent. The agent calls the underlying CLI tool. You never touch the database directly.


Architecture: SQLite, Node.js CLI, and OpenClaw plugin

The stack described here is Node.js, SQLite, and a CLI that the AI agent calls as a tool. Python works equally well, with sqlite3 from the stdlib and argparse or click for the CLI. Postgres or DuckDB works if you prefer. The specific tools matter less than the structure: deterministic code handles all the data work, the agent handles conversation, and the CLI is the boundary between them.

Ingestion. Reports enter the system as structured JSON through the CLI's report-add command. A typical payload includes a report_type ("bloodwork"), a report_date, a source_name for the lab, and an array of results where each result has a marker name, numeric value, unit, and the lab's reference range. Keeping extraction separate from ingestion proper is a good design decision: the ingestion layer validates and inserts, while extraction logic lives elsewhere and can be swapped or improved without touching the database.
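A representative payload might look like this; the exact field names for the reference range (ref_low and ref_high here) are an assumption, and the values are illustrative:

```json
{
  "report_type": "bloodwork",
  "report_date": "2024-09-14",
  "source_name": "Greenfield Health Labs",
  "results": [
    { "marker": "hs-CRP", "value": 0.8, "unit": "mg/L", "ref_low": 0.0, "ref_high": 3.0 },
    { "marker": "Ferritin", "value": 62, "unit": "ng/mL", "ref_low": 12, "ref_high": 300 }
  ]
}
```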

Email ingestion is the most hands-off path. OpenClaw's email tool monitors a dedicated inbox via IMAP. When a message arrives from a known lab sender, the agent downloads the PDF attachment and extracts the marker table. For labs you use regularly, a deterministic PDF parser handles this reliably: a coding agent writes a one-time parser for that lab's specific layout using a library like pdfplumber (Python) or pdf-parse (Node), and every future report from that lab processes without AI involvement. For labs you use once or occasionally, the agent reads the PDF directly and extracts values from the image. Either way, output is validated against the schema before ingestion. When you configure email ingestion, filter by sender domain: only process emails from addresses you trust, and process the PDF attachment rather than the freeform email body. This keeps the prompt injection surface minimal.
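As a sketch of what such a one-time parser can look like, assuming pdf-parse for text extraction and a single hypothetical line layout (a real parser's regex is written against one specific lab's printout):

```javascript
const fs = require("fs");
const pdfParse = require("pdf-parse");

// Matches lines like "Ferritin    62   ng/mL   12 - 300" -- a hypothetical
// layout standing in for one lab's actual format.
const LINE = /^(.+?)\s{2,}([\d.]+)\s+(\S+)\s+([\d.]+)\s*-\s*([\d.]+)$/;

async function parseLabReport(path) {
  const { text } = await pdfParse(fs.readFileSync(path));
  const results = [];
  for (const raw of text.split("\n")) {
    const m = raw.trim().match(LINE);
    if (!m) continue;
    results.push({
      marker: m[1].trim(),
      value: Number(m[2]),
      unit: m[3],
      ref_low: Number(m[4]),
      ref_high: Number(m[5]),
    });
  }
  return results; // validate against the ingestion schema before report-add
}
```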

Image and messaging ingestion handles paper reports, patient portal screenshots, and anything photographed and sent through Telegram, Signal, or Discord. The agent reads the image, identifies the marker table, extracts values, and validates before ingesting. Clean tabular lab printouts work well. For anything ambiguous, the agent flags uncertain extractions for confirmation rather than ingesting silently.

File watch covers PDFs dropped into a synced folder. A watcher like chokidar (Node) or watchdog (Python) detects new files and triggers extraction. Whatever path delivers the data, the agent handles extraction and the CLI handles ingestion. The user doesn't touch structured data.
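A minimal version of the watcher in Node, assuming chokidar and a hypothetical ingestReport helper that runs extraction and calls report-add:

```javascript
const chokidar = require("chokidar");

chokidar
  .watch("/data/labs/incoming", { ignoreInitial: true, awaitWriteFinish: true })
  .on("add", async (filePath) => {
    if (!filePath.toLowerCase().endsWith(".pdf")) return; // PDFs only
    await ingestReport(filePath); // hypothetical: extract, validate, report-add
  });
```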

Marker normalization. Lab reports are inconsistent about naming. "CRP (High Sensitivity)", "hs-CRP", and "C-Reactive Protein, High Sensitivity" all mean the same marker. "Free T3", "FT3", and "Triiodothyronine, Free" are the same thing from different printers. The canonicalization layer normalizes these at ingestion time using a static alias lookup table mapping known variants to canonical snake_case identifiers: crp_hs, free_t3, ferritin, apob, homocysteine. This should be deterministic code, not LLM inference. A lookup table is correct every time, auditable, and doesn't change behavior when you update your model. Category assignment follows the same pattern: a hardcoded map from canonical marker name to category (ferritin to hematology, homocysteine to cardiovascular, dhea_s to hormone, alt to liver).
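A sketch of that layer as plain code, seeded with the variants named above; the table grows as new printouts arrive:

```javascript
// Raw lab strings (lowercased) mapped to canonical snake_case identifiers.
const MARKER_ALIASES = {
  "crp (high sensitivity)": "crp_hs",
  "hs-crp": "crp_hs",
  "c-reactive protein, high sensitivity": "crp_hs",
  "free t3": "free_t3",
  "ft3": "free_t3",
  "triiodothyronine, free": "free_t3",
};

// Canonical name to category, also hardcoded.
const MARKER_CATEGORIES = {
  ferritin: "hematology",
  homocysteine: "cardiovascular",
  dhea_s: "hormone",
  alt: "liver",
};

function canonicalize(rawName) {
  const canonical = MARKER_ALIASES[rawName.trim().toLowerCase()];
  if (!canonical) throw new Error(`unknown marker alias: ${rawName}`); // surface, don't guess
  return canonical;
}
```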

Storage. SQLite with better-sqlite3 (Node) or the stdlib sqlite3 module (Python) is the right starting point. No server to run, no configuration to maintain, a single file to back up. The point is a real relational database with queries, indexes, unique constraints, foreign keys, and transactions, not a folder of JSON files. When you want to ask "show me all my metabolic markers over five years," you want a SQL query. Three tables carry you a long way. health_reports stores one row per lab visit with report_date, report_type, source_name, and a unique constraint on (report_date, report_type) to prevent duplicates. health_results stores one row per marker per report, foreign-keyed to health_reports, with the canonical marker name, numeric value, unit, the lab's flag and reference range, and category. health_targets stores your personal optimal ranges: target_low, target_high, default_cadence_days, next_due, and notes. The database file lives on your local disk with no network interface. Disk encryption (FileVault on macOS, LUKS on Linux) handles at-rest protection automatically.
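A minimal sketch of that schema with better-sqlite3; the columns follow the description above:

```javascript
const Database = require("better-sqlite3");
const db = new Database("health.db");

db.exec(`
  CREATE TABLE IF NOT EXISTS health_reports (
    id          INTEGER PRIMARY KEY,
    report_date TEXT NOT NULL,
    report_type TEXT NOT NULL,
    source_name TEXT,
    UNIQUE (report_date, report_type)        -- blocks duplicate ingestion
  );

  CREATE TABLE IF NOT EXISTS health_results (
    id        INTEGER PRIMARY KEY,
    report_id INTEGER NOT NULL REFERENCES health_reports(id),
    marker    TEXT NOT NULL,                 -- canonical snake_case name
    value     REAL NOT NULL,
    unit      TEXT,
    flag      TEXT,                          -- the lab's own H/L flag
    ref_low   REAL,
    ref_high  REAL,
    category  TEXT
  );

  CREATE TABLE IF NOT EXISTS health_targets (
    marker               TEXT PRIMARY KEY,
    target_low           REAL,
    target_high          REAL,
    default_cadence_days INTEGER,
    next_due             TEXT,
    notes                TEXT
  );

  CREATE INDEX IF NOT EXISTS idx_results_marker ON health_results (marker);
`);
```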

CLI surface. Every operation is a subcommand that returns structured JSON. The CLI never outputs conversational text. The agent handles presentation. This separation makes the CLI independently testable (pipe to jq, assert on fields), usable from scripts, and trivially wrappable as an OpenClaw plugin. Key subcommands include: report-add for ingesting a new report, which returns inserted counts and a next_step field telling the agent what to do next; latest for current values across all markers with flags against both lab ranges and personal targets; marker for a longitudinal view of a single marker across all draws; category for all markers in a group; flags for everything out of personal target range ordered by severity; trend for direction computed from the last several draws using numeric comparison rather than LLM interpretation; report for a full report view or side-by-side comparison; review for a specific report with each marker's value, target, trend, and active medications; targets for listing all personal targets; and due-soon for markers approaching their retest date.
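Trend, for instance, is arithmetic rather than model judgment. A sketch, with an illustrative relative-change threshold:

```javascript
// Direction from the last several draws, oldest first. The 10% threshold
// is illustrative; tune it per marker.
function trendDirection(values, threshold = 0.1) {
  if (values.length < 2) return "insufficient_data";
  const first = values[0];
  const last = values[values.length - 1];
  const change = (last - first) / Math.abs(first);
  if (change > threshold) return "rising";
  if (change < -threshold) return "falling";
  return "stable";
}

trendDirection([31, 26, 22, 18]); // "falling"
```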

Scheduling. OpenClaw's cron system runs tools on a schedule and delivers output through whichever messaging channels you've connected. due-soon returns JSON. It doesn't send messages. That's OpenClaw's job. The CLI doesn't need Telegram credentials, and changing your notification channel doesn't require touching the CLI code. Two scheduled jobs cover most needs: a daily job for due-soon and a weekly job for flags. The email ingestion chain is worth understanding end-to-end: the cron job fires, checks the inbox, finds a new result from Greenfield Health Labs, downloads the PDF, runs the parser, validates the output, and calls report-add. If ingestion succeeds, it runs report-review, formats the output, and sends a summary to Telegram. The whole chain is OpenClaw orchestrating tools that are already registered in the same instance.

Integration points. Because everything runs as tools inside the same OpenClaw instance, the agent can combine them in a single conversation turn without any glue code. Medication correlation is the most useful: you started a new supplement and want to know whether your inflammatory markers moved. The medication tracker knows when you started it, the bloodwork tracker has your CRP history, and the agent presents the timeline without custom joining code. Calendar integration works the same way: after ingesting a new draw and advancing retest dates, the agent creates calendar events via CalDAV and sends you a summary. The most common cross-domain use is a question during the day: "How did my thyroid markers look during the period when my sleep scores were dropping?" Sleep data lives in one table, bloodwork in another, both local, both queryable, and the agent pulls from both in the same response.

Platform considerations. Bloodwork tracking doesn't depend on a phone data bridge. Reports come in through email, photos, or files regardless of platform. If you want bloodwork alongside continuous health metrics, you need a data bridge from the phone. On iOS, Health Auto Export pushes HealthKit data as JSON over HTTP to a local server behind a tunnel (Cloudflare Tunnel, Tailscale Funnel, or ngrok). On Android, Health Connect does the same. OpenClaw runs on macOS, Linux, or any Node.js environment. The server runs on whatever hardware you have: a Mac Mini in your Bangkok apartment, a Raspberry Pi, a VPS. The phone pushes to the server, and OpenClaw orchestrates everything else.


Development: building the CLI and wiring OpenClaw integration

How a coding agent builds this

A coding agent (Claude Code, Codex, or whatever you prefer) builds this iteratively. Resist the urge to design every subcommand before writing any code. You don't know which queries matter until you have real data to run against. In practice, gaps show up after the first real draw, not before.

Start with the schema and a single subcommand. Scaffold the three tables, indexes, constraints, and report-add. Nothing else yet. Get a single real lab report into the database and run raw SQL queries to confirm the data looks right. Fix the schema now if anything is wrong. Changing a schema after six months of data is painful in a way that changing it after one row is not.

Write tests at the same time as the code. This is the step most people skip and later regret. A coding agent works fast, introduces regressions fast, and generates changes that are hard to audit without a test suite. Write a test for report-add before building anything else: known JSON input, expected row counts, expected marker values in the DB, run against a fresh temp database created and destroyed per test. When the coding agent refactors canonicalization three iterations later and something breaks, the test fails immediately rather than surfacing two weeks later.
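A sketch of that first test with Node's built-in node:test runner; the bloodwork binary name, the --db flag, and the response field names are stand-ins for your actual CLI:

```javascript
const { test } = require("node:test");
const assert = require("node:assert");
const fs = require("fs");
const os = require("os");
const path = require("path");
const { execFileSync } = require("child_process");

test("report-add inserts one report and its results", () => {
  // Fresh temp database per test, removed afterwards.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "bw-"));
  const dbPath = path.join(dir, "test.db");
  const payload = JSON.stringify({
    report_type: "bloodwork",
    report_date: "2024-09-14",
    source_name: "Test Lab",
    results: [{ marker: "Ferritin", value: 62, unit: "ng/mL", ref_low: 12, ref_high: 300 }],
  });

  const out = JSON.parse(
    execFileSync("./bloodwork", ["report-add", "--db", dbPath], { input: payload })
  );
  assert.equal(out.reports_inserted, 1);
  assert.equal(out.results_inserted, 1);

  fs.rmSync(dir, { recursive: true });
});
```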

Read subcommands come one at a time. Add latest, then marker, then category, then flags. Get each one working and tested before starting the next. trend should wait until you have at least two real draws to validate against. review depends on personal targets being set, so build targets first. Build what you'll use immediately.

The OpenClaw plugin wrapper converts the CLI into a conversational tool. The plugin defines each subcommand as a named action with a typed parameter schema, calls the binary, and returns JSON output. The typing matters: the agent can't invoke a nonexistent operation because the schema won't validate it. Keep the plugin as a thin wrapper. Business logic belongs in the CLI, not the plugin.

Email integration is worth prioritizing early if your lab already sends results electronically. It removes the only manual step in the system. Start with sender-based filtering rules, then write the PDF extraction path for your recurring lab. A deterministic parser for your specific lab's format is a one-time investment that handles every future draw without AI involvement.

OpenClaw cron jobs come after the plugin. Set up the daily due-soon job first. Make sure the full delivery path works: job fires, CLI runs, output reaches Telegram (or wherever). Then add the weekly flags job. Two jobs is enough to start.

Git from the first scaffold, not once things are "ready." Commit after every subcommand. If the coding agent refactors marker canonicalization at iteration six and breaks something from iteration two, git diff tells you exactly what changed.

Best practices

Deterministic logic over LLM inference matters more here than almost anywhere else in a health data system. Canonicalization, trend detection, threshold comparison, flag generation: all of these should be deterministic code with deterministic test coverage. The alias lookup table for marker names doesn't drift based on which model you're running or how a prompt was phrased. If you catch yourself writing a prompt that says "determine whether this marker name is equivalent to the canonical name," stop and write a lookup table instead.

Structured data in a real database from the start. Bloodwork accumulates over years, and you will want to ask questions you haven't thought of yet. "Show me everything in my cardiovascular panel over five years" is a SQL query. It's not something you can answer from a folder of JSON files without writing custom reconstruction code every time.

Embed next_step guidance in CLI output. After report-add, the output includes a next_step field pointing to report-review. After report-review, each marker includes a set_due_command with the exact invocation needed to advance its retest date. The agent follows the workflow chain without those instructions living in a system prompt.
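In practice that might look like this in a report-add response; every field here besides next_step is illustrative:

```json
{
  "ok": true,
  "report_id": 14,
  "results_inserted": 12,
  "next_step": "run report-review --report-id 14 to compare against targets and trends"
}
```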

Back up what's irreplaceable. The code is in git. Your bloodwork history is not. A scheduled backup job dumps the SQLite file, compresses it, and pushes to a private repository or encrypted offsite storage.
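A sketch of the dump step, assuming better-sqlite3's online backup API; the paths are illustrative, and compression and the offsite push would follow:

```javascript
const Database = require("better-sqlite3");

const db = new Database("health.db", { readonly: true });
const stamp = new Date().toISOString().slice(0, 10);

// Online backup: safe to run while the database is in use.
db.backup(`backups/health-${stamp}.db`)
  .then(() => console.log("backup written"))
  .catch((err) => console.error("backup failed:", err));
```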

Tests for the canonicalization table specifically. Alias resolution is the most likely source of silent data quality problems. If a lab renames a marker and your alias table doesn't include the new name, the marker ingests as an unrecognized entry. A fixture-based test catches this immediately.

See also: Best Practices for Private AI Systems for the full list.


Models: vision for ingestion, deterministic code for everything else

Almost all of this system is deterministic code. Canonicalization, trend detection, threshold comparison, flag generation, and schema validation are all if-statements and arithmetic in the CLI. The model does three things: it interprets natural language questions, renders query results in readable form, and extracts structured data from PDFs and images during ingestion.

For extraction specifically, vision capability and structured output quality matter directly for data quality. A model that misreads a marker table produces wrong values in your database, which is worse than no ingestion at all. For conversation and result presentation, the requirements are loose and almost any recent model handles it well. For the coding agent that builds the system, you need strong code generation and enough context window to iterate on a full codebase without losing earlier decisions.

These three roles don't require the same model and in practice shouldn't all be optimized for the same deployment. If you're running locally, a capable vision model for ingestion and a smaller conversational model for queries is a reasonable split. If you're on API models, the cost profile across those three use cases is worth calculating before settling on an approach.

See Choosing Models for Private AI Systems for the full breakdown on local vs API, hardware requirements, and cost modeling.


Security: keeping bloodwork data private on local hardware

Bloodwork data is sensitive. The architecture keeps it on your hardware with no cloud copy, which handles the biggest risk. But the inbound surfaces are worth understanding.

Email is the primary attack surface. If your system monitors an inbox, anything delivered to that inbox enters the agent's context. Someone who knows the bot's email address can send it text, and that text gets read. The mitigation is sender filtering: only process emails from your lab's known domain, your own forwarding address, or a small allowlist. Process the PDF attachment, not the freeform email body. Unknown senders should be logged and skipped, not acted on. Prompt injection via email is a real attack class, and narrow sender filtering is the practical defense.
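A sketch of the filter; the domains and the message shape are hypothetical:

```javascript
// Only these senders may trigger ingestion; everyone else is logged and skipped.
const ALLOWED_SENDERS = [
  /@greenfieldhealthlabs\.example$/i, // the lab's sending domain
  /@my-forward\.example$/i,           // your own forwarding address
];

function shouldProcess(message) {
  const from = (message.fromAddress || "").trim();
  if (!ALLOWED_SENDERS.some((re) => re.test(from))) {
    console.log(`skipped: unknown sender ${from}`);
    return false;
  }
  // Act on the PDF attachment only, never the freeform body.
  return message.attachments.some((a) => a.contentType === "application/pdf");
}
```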

Messaging ingestion has the same shape on a smaller scale. If your bot is in a group chat or has a public address, anyone can send it content. In a private one-on-one chat, only you can. Configure messaging platforms to restrict who can initiate conversations with the bot.

HTTP webhook endpoints need a shared secret in the request header. The server rejects requests without it. Use a tunnel service (Cloudflare Tunnel, Tailscale Funnel, or ngrok) rather than exposing a local port directly. The tunnel handles HTTPS. The endpoint validates the payload schema before processing.
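A minimal sketch with Node's built-in http module; the header name, port, and environment variable are assumptions:

```javascript
const http = require("http");

const SECRET = process.env.WEBHOOK_SECRET; // shared with the phone-side exporter

http
  .createServer((req, res) => {
    // Reject anything without the shared secret before reading the body.
    if (req.headers["x-webhook-secret"] !== SECRET) {
      res.writeHead(403).end();
      return;
    }
    let body = "";
    req.on("data", (chunk) => (body += chunk));
    req.on("end", () => {
      // Validate the payload schema here before handing anything to ingestion.
      res.writeHead(204).end();
    });
  })
  .listen(8787); // reached only through the tunnel, never exposed directly
```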

Local storage is protected by OS-level disk encryption. FileVault on macOS, LUKS on Linux. The SQLite file has no network interface. Filesystem permissions control who can read or write it. Backups pushed offsite should be encrypted or stored in a private repository.

The overall exposure is low. Your data doesn't touch third-party infrastructure except through paths you've explicitly configured. The most realistic risks are a misconfigured email forwarding rule that accepts messages from unexpected senders, or a messaging channel left open to a group. Both are straightforward to manage.

See also: Security Considerations for Private AI Systems for the full reference.


Timeline: from first lab report to automated monitoring

The minimum useful version is a CLI that ingests a JSON payload and returns the latest values: report-add and latest. Get your first real lab report into the database and query it. At that point you have structured, queryable bloodwork data instead of a PDF you'll never find again.

Extend from there based on what you actually need next. When your second draw comes in, marker becomes worth having so you can see the longitudinal view. After talking to a doctor about optimal ranges, health_targets and flags become the most-used surfaces. Once you're tracking enough markers that you'd lose track of retest timing, due-soon with a scheduled check pays back immediately.

The OpenClaw plugin wrapper comes in when running subcommands by hand stops feeling natural. That usually happens around the third or fourth draw, when the questions start getting interesting and you want to ask them conversationally. Email integration is worth setting up early if your lab already sends results electronically, since it removes the only manual step. PDF parsers come in once you've drawn from the same lab twice and confirmed the format is stable.

Medication correlation, calendar integration, and cross-domain queries with sleep or activity data are late-stage additions. They depend on having useful history in the bloodwork database first. Build the core, use it, and add integrations where the actual gaps show up rather than the theoretical ones.


Personal use cases: catching trends and preparing for appointments

Most bloodwork history lives in a folder of PDFs. Each one is a snapshot. Together they're a record, but only if they're in one place and structured. The situations below are where having the full history in a private AI system changes what you know about your own health.

You've been managing borderline HbA1c for two years with diet changes. Your endocrinologist follows up every three months. With each report going into a PDF folder, your sense of progress is whatever gets summarized at the appointment. With this system, you ask "how has my HbA1c moved over the last two years?" before the visit and see the actual trajectory: four draws, specific values, the direction. The appointment becomes a conversation about your data rather than a briefing you're receiving for the first time.

A runner tracking ferritin has a specific problem: ferritin can drop quietly for months before symptoms show up. The OpenClaw agent runs a weekly flags check. One week it sends a message: ferritin is at 18 ng/mL, down from 31 six months ago, now below the personal target of 25. The lab's reference range flags 12 as the lower limit, so nothing triggered in the official report. The personal target caught it four months before it would have become a clinical problem.

Eighteen months ago, a functional medicine practitioner ordered a full cardiovascular panel including ApoB and Lp(a) alongside the usual lipids. That panel went to a different lab than the standard draws. Both sources now feed the same private AI system. When asking about cardiovascular risk markers, the answer includes everything from both labs sorted chronologically, with the specialty markers appearing alongside the standard ones.

Your system has a 90-day target cadence set on your inflammation panel. Three months after your last draw, a message arrives: CRP, fibrinogen, and homocysteine are all due within the next ten days. You book the appointment without opening a calendar or doing any mental tracking. Those cadences were set automatically when the last results came in.

Before your annual primary care appointment, you ask the OpenClaw agent to pull everything flagged against your personal targets across the last twelve months. Six markers come back. You ask for the trend on each. Three are improving, two are stable, one keeps creeping up. You walk in with specific questions rather than a vague concern about something you half-remember from a PDF months ago.

Your lab emailed results on a Friday evening. By Saturday morning, a message arrives on Telegram: everything in the metabolic panel is consistent with last quarter, but DHEA-S has dropped 30% from your previous draw and is now below your target range. The system noticed because it compared this draw to the prior one numerically. Your previous draw had been slightly low too, which is now visible in the trend view.


Business use cases: clinics, research teams, and multi-lab practices

Practices that order bloodwork regularly are already doing manual tracking of some kind. The question is what that costs in time and what gets missed when the volume gets high enough. The scenarios below are where a private AI system changes the workflow.

A longevity clinic runs comprehensive quarterly panels on twenty clients: full lipids including ApoB and Lp(a), inflammatory markers, complete hormones, metabolic, and micronutrients. Forty markers per client, quarterly cadence. Today that's a staff member reviewing PDFs, manually recording values, and emailing summaries. With this system, results ingest automatically from the lab's email, per-client history is in a structured database, and the physician gets a flags report before each consultation showing exactly which markers need discussion. The staff time shifts from data entry to patient contact.

A fertility clinic follows hormonal panels through treatment cycles. FSH, LH, estradiol, AMH, and progesterone all move fast and matter in specific windows. Results arrive from the same two labs on a predictable cadence. The system has a deterministic parser for each lab's format. Results ingest automatically, trends are tracked per patient, and the coordinator gets a daily flags summary showing anyone whose values are outside the target window for their current cycle phase.

An occupational health provider handles annual physicals for corporate clients: one hundred employees, one draw per year. Not continuous monitoring, but historical comparison matters. Is this employee's liver enzyme trend concerning relative to their prior draws? The system keeps cumulative history across years. When the annual panel comes in, the OpenClaw agent surfaces any meaningful changes since the previous draw without someone manually pulling up last year's values.

An independent practitioner sees patients across multiple specialties and uses four different labs depending on the panel needed: a hospital lab for standard draws, a specialty reference lab for hormones, a functional medicine lab for micronutrients, and a direct-to-consumer lab patients sometimes bring in themselves. Normalizing marker names across four sources into a single queryable timeline is the core problem. During a consultation at their Bangkok practice, the practitioner asks "show me everything we have on this patient's thyroid and inflammatory markers" and gets a unified answer from all four sources, sorted chronologically.

A clinical research group follows biomarker panels on study participants over twelve months. They need structured, consistently named data, not a folder of PDFs, and they need to query across the whole cohort: "show me all CRP values over time, grouped by cohort and sorted by draw date." The system's canonical marker names and SQLite storage make this a straightforward query. Exporting for analysis doesn't require a data cleaning step because the ingestion layer already normalized everything.

Our services

See how we build private health tracking systems for individuals and practices.

How it works

Learn about our process for building custom AI systems on your hardware.

Ready to build your bloodwork tracking system?

We help individuals and practices build private, AI-powered health data systems that run entirely on your own hardware.

book a consultation