# Extract data from documents

A **Read documents** action — also called an **extract** action — has Florent read your files and pull the values you care about into fields your team reviews. Set it up once, test it on a few sample files, then run it over many documents at once or use it as a step inside a playbook.

This page covers building and activating an extract action. For what an action is and the other kinds, see [Actions overview](/actions/overview). To run a finished action over many files and review every result, see [Batch action runs](/actions/batch-runs).

Only **Builders** and **Admins** can build, test, and activate actions. If you open an action editor without access, you'll see "You don't have access to actions." Ask an admin if you need access. See [Roles and permissions](/admin/roles-and-permissions).

## Create the action

In the sidebar, under **Building Blocks**, click **Actions**.

Click **New action**. In the **New action** dialog, under **What should it do?**, choose **Read documents** ("AI reads files and fills in fields your team reviews").

Give it a clear **Name** (for example, "Extract key details"). Add an optional **Description** — this is what teammates see in the actions list. Then click **Create action**.

You land straight in the action editor with a **Draft** badge. A draft can't be run yet — you'll set its output and instructions, test it, then activate it.

## The extract action editor

The editor is a single page with two setup panels — **Output** and **How it runs** — and a **Test bench** on the right. The header shows the action name, the **Draft** badge, a **Save** button (active only when you have unsaved changes), and an **Activate** button.

While the action is a draft, the header lists anything still blocking activation, such as:

* "Add extraction instructions before activating."
* "Pick a record type for the output."
* "Add at least one output field."

Work through both panels until those blockers clear and **Activate** lights up.
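
As a rough mental model, the blockers above behave like a validation pass over the draft. The sketch below is illustrative only: `DraftAction`, its attribute names, and `activation_blockers` are hypothetical and not part of any CloudRaker API, and the pairing of each message with an output choice is an assumption based on the activation rule described later on this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DraftAction:
    """Hypothetical in-memory model of a draft extract action."""
    instructions: str = ""
    output_target: str = "record_type"  # assumed values: "record_type" or "inline_fields"
    record_type: Optional[str] = None
    inline_fields: List[str] = field(default_factory=list)


def activation_blockers(action: DraftAction) -> List[str]:
    """Return the blocker messages that would still apply to this draft."""
    blockers = []
    if not action.instructions.strip():
        blockers.append("Add extraction instructions before activating.")
    if action.output_target == "record_type" and action.record_type is None:
        blockers.append("Pick a record type for the output.")
    if action.output_target == "inline_fields" and not action.inline_fields:
        blockers.append("Add at least one output field.")
    return blockers


# A draft with instructions but no inline fields still has one blocker.
draft = DraftAction(instructions="Extract the title and date.",
                    output_target="inline_fields")
print(activation_blockers(draft))  # ['Add at least one output field.']
```

An empty list corresponds to the point where **Activate** becomes available.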

## Set the output — what each run extracts

The **Output** panel decides what comes out of every document. A toggle picks where the extracted data lands.

Choose **Record type** to write each result into rows of an existing [record type](/record-types/overview), such as an "Invoice" record type. Pick one from the **Pick a record type** dropdown. Each option shows the record type's name, its field count, and its description, so you can tell similarly named types apart.

The fields you extract are the record type's fields. When the action runs and a result is approved, that result becomes a record in the project.

If the dropdown is empty, you have no record types yet. Create one in **Building Blocks → Record types** first — see [Build a record type](/record-types/build-a-record-type) — then come back.

Choose **Inline fields** to define fields that live only on this action, when you don't need a reusable record type.

* Click **Add field** to define each field (its label, key, and type).
* Each field shows a type badge, its label, its key, and a remove button.
* Use the **Label field** select to pick which field becomes each extracted record's title.
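
One way to picture inline fields is as a small list of field definitions plus a pointer to the field that supplies the title. This is a sketch under assumptions, not how Florent stores anything: `InlineField`, `record_title`, the type names, and the sample values are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class InlineField:
    """Hypothetical shape of one inline field: label, key, and type."""
    label: str  # human-readable name
    key: str    # identifier for the field
    type: str   # type names such as "text" and "date" are assumed here


def record_title(fields: List[InlineField], label_field_key: str,
                 values: Dict[str, str]) -> str:
    """Return the title of an extracted record, taken from the label field."""
    if label_field_key not in {f.key for f in fields}:
        raise ValueError(f"label field {label_field_key!r} is not a defined field")
    return values.get(label_field_key, "")


# Illustrative field definitions and extracted values (invented data).
fields = [
    InlineField(label="Invoice number", key="invoice_number", type="text"),
    InlineField(label="Issue date", key="issue_date", type="date"),
]
extracted = {"invoice_number": "INV-001", "issue_date": "2024-01-31"}
print(record_title(fields, "invoice_number", extracted))  # INV-001
```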

## Set how it runs — instructions, mode, and model

The **How it runs** panel tells Florent what to look for and how to process your files.

### Instructions

Write plain-language **Instructions** describing what to pull from each document — for example, "Extract the title, date, and key details from each document." This is the single biggest lever on quality. Be specific about what each field means and where it usually appears.

### Mode

The **Mode** dropdown sets how files map to results:

| Mode                                  | What it does                                                                                       |
| ------------------------------------- | -------------------------------------------------------------------------------------------------- |
| **Extract from each file**            | One result per input file. Use this when each document is its own record (one invoice per PDF).    |
| **Extract one combined result**       | One result drawn from all the files together. Use this when several files describe a single thing. |
| **Extract table rows from documents** | Pulls table rows out of documents — one result per row. Use this for line items or tabular data.   |
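
The difference between the three modes comes down to how many results a given set of files yields. The sketch below counts results per mode for a hypothetical input in which each file name maps to the table rows found in it. The function, the input shape, and the zero-result behavior for an empty input are assumptions for illustration.

```python
from typing import Dict, List

# Mode labels as they appear in the Mode dropdown.
EACH_FILE = "Extract from each file"
COMBINED = "Extract one combined result"
TABLE_ROWS = "Extract table rows from documents"


def result_count(mode: str, files: Dict[str, List[dict]]) -> int:
    """Count the results a run would yield for `files` under `mode`.

    `files` maps a file name to the table rows found in that file
    (a hypothetical stand-in for what the model reads).
    """
    if mode == EACH_FILE:
        return len(files)                      # one result per input file
    if mode == COMBINED:
        return 1 if files else 0               # one result for the whole set
    if mode == TABLE_ROWS:
        return sum(len(rows) for rows in files.values())  # one result per row
    raise ValueError(f"unknown mode: {mode!r}")


# Two invoices: the first has two line items, the second has one.
sample = {
    "invoice-a.pdf": [{"item": "Widget"}, {"item": "Gadget"}],
    "invoice-b.pdf": [{"item": "Cable"}],
}
print(result_count(EACH_FILE, sample))   # 2
print(result_count(COMBINED, sample))    # 1
print(result_count(TABLE_ROWS, sample))  # 3
```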

### Model

The **Model preset** picker chooses which AI model reads your documents. The default works for most cases; switch presets only if you have a reason to. To pin an exact model, set the **Provider** and **Model** below the preset.

## Test it on sample files

The **Test bench** on the right is a dry run. Use it to check that your instructions and output fields work before you activate. **Nothing it produces is saved**: no records, no drafts, no files.

Click **Add sample files** and pick up to three representative documents. Only PDF and DOCX files are accepted. If you add more than three, only the first three are used.
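
The sample-file rules can be summarized as a filter followed by a cap. The sketch below assumes that file type is judged by extension, case-insensitively, and that unsupported files are dropped before the first three are taken; the page does not state either detail, and `select_sample_files` is a hypothetical helper.

```python
from pathlib import PurePath
from typing import List

ACCEPTED_EXTENSIONS = {".pdf", ".docx"}  # only PDF and DOCX are accepted
MAX_SAMPLE_FILES = 3                     # only the first three are used


def select_sample_files(names: List[str]) -> List[str]:
    """Keep accepted file types, then keep at most the first three."""
    accepted = [n for n in names
                if PurePath(n).suffix.lower() in ACCEPTED_EXTENSIONS]
    return accepted[:MAX_SAMPLE_FILES]


picked = select_sample_files(
    ["a.pdf", "notes.txt", "b.DOCX", "c.pdf", "d.pdf"]
)
print(picked)  # ['a.pdf', 'b.DOCX', 'c.pdf']
```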

Click **Run test**. Florent reads each file and fills a results table with one row per field, showing the **Field**, the extracted **Value**, and a **Score** (how confident Florent is in that value).

Check that each value is right. If a file yields nothing, you'll see "No fields extracted." If something looks off, refine your **Instructions** or **Output** fields and run the test again. Warnings appear in an amber banner above the table.

Test with your messiest, most awkward documents, not your cleanest ones. If the action handles the hard cases, the easy ones take care of themselves.

The score you see here is a preview of the same **confidence** signal you'll use during review. When the action runs for real, each value also carries a **citation** back to the exact page and text it came from, so reviewers can verify before approving. See [Reviewing AI work](/work/reviewing-ai-work).

## Activate the action

When the activation blockers are gone — you have instructions and either a record type or at least one inline field — click **Activate**. The badge changes to **Published**, and the action becomes runnable: as its own [batch action run](/actions/batch-runs) and as an [Action task](/playbooks/task-types) inside a playbook.

A **Draft** or **Archived** action can't be run. It won't appear in the batch launcher's action list until it's **Published**.
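
The rule above amounts to a single check on the action's status. A minimal sketch, with a hypothetical `Status` enum and made-up action names, of how a launcher list might be filtered:

```python
from enum import Enum
from typing import Dict, List


class Status(Enum):
    DRAFT = "Draft"
    PUBLISHED = "Published"
    ARCHIVED = "Archived"


def is_runnable(status: Status) -> bool:
    """Only a Published action can be run."""
    return status is Status.PUBLISHED


def launcher_actions(actions: Dict[str, Status]) -> List[str]:
    """Names of the actions that would be offered for a batch run."""
    return [name for name, status in actions.items() if is_runnable(status)]


# Illustrative actions (names are invented).
actions = {
    "Extract key details": Status.PUBLISHED,
    "Extract line items": Status.DRAFT,
    "Old invoice reader": Status.ARCHIVED,
}
print(launcher_actions(actions))  # ['Extract key details']
```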

## After it's live

To edit a published action, open it, make your changes, and click **Save**. Changing the output or the run settings bumps the action's **version** — if a playbook pinned an older version, re-activate the action so that playbook adopts the change.
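
The version rule can be sketched as a comparison between the saved action and the edited one. Everything here is hypothetical (`ActionState`, `save`, the tuple shapes), and the sketch assumes that edits outside the output and run settings, such as a rename, leave the version unchanged, which this page does not confirm.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ActionState:
    """Hypothetical snapshot of a published action."""
    name: str
    output: Tuple[str, ...]        # for example, the output field keys
    run_settings: Tuple[str, ...]  # for example, (instructions, mode)
    version: int = 1


def save(current: ActionState, edited: ActionState) -> ActionState:
    """Save `edited`, bumping the version if output or run settings changed."""
    changed = (edited.output != current.output
               or edited.run_settings != current.run_settings)
    new_version = current.version + 1 if changed else current.version
    return replace(edited, version=new_version)


v1 = ActionState("Extract key details", ("title", "date"),
                 ("Extract the title and date.", "Extract from each file"))
renamed = save(v1, replace(v1, name="Extract details"))
print(renamed.version)  # 1
widened = save(v1, replace(v1, output=("title", "date", "total")))
print(widened.version)  # 2
```

In this model, a playbook that pinned version 1 would keep using it after `widened` is saved, until it adopts the newer version.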

The **Draft** / **Published** / **Archived** lifecycle, versioning, and the archive-not-delete rule are the same for every action and are covered in [Actions overview](/actions/overview).

## Where to go next

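* [Batch action runs](/actions/batch-runs): Launch a batch action run and review every extracted result in a grid.
* [Reviewing AI work](/work/reviewing-ai-work): Approve, correct, or reject each value using its citation back to the source.
* [Build a record type](/record-types/build-a-record-type): Shape the record type your extracted values are written into.
* [Actions overview](/actions/overview): See the action kinds and how built-in and custom actions differ.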