# API Reference

This is the HTTP API for submitting documents directly, without the Laravel SDK.
Use it to integrate from any language or runtime. For Laravel applications, the
SDK is the recommended path; it wraps everything described here.

All parsing through the API uses bring-your-own-storage (BYO): your files stay in
your own bucket and no document bytes pass through our servers. You hand us a
presigned URL to read the source and a presigned URL to write the result. The
managed storage mode that the SDK uses in local development is not available
through the API.

The base URL is `https://parseforartisans.com/api/v1`. All requests and responses
are JSON unless noted.

## Authentication

Every request is authenticated with a bearer token. Create an API key in the
dashboard under API Keys, then send it in the `Authorization` header:

`Authorization: Bearer <your-api-key>`

A key is scoped to the team it was created in. Requests with a missing or invalid
key return `401` with the error type `invalid_api_key`.

`GET /ping` is a lightweight authenticated endpoint for verifying a key. It
returns `200` when the key is valid.
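
A minimal sketch of an authenticated call in Python, using only the standard
library's `urllib`. The helper names `build_request` and `ping` are hypothetical,
not part of the SDK; `ping` treats `401` as an invalid key and lets any other
error propagate.

```python
import urllib.error
import urllib.request
from typing import Optional

BASE_URL = "https://parseforartisans.com/api/v1"


def build_request(path: str, api_key: str, method: str = "GET",
                  body: Optional[bytes] = None) -> urllib.request.Request:
    """Build an authenticated request; performs no network I/O."""
    headers = {"Authorization": f"Bearer {api_key}", "Accept": "application/json"}
    if body is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=body, headers=headers, method=method)


def ping(api_key: str, timeout: float = 10.0) -> bool:
    """Return True if GET /ping answers 200, False on 401; other errors propagate."""
    try:
        with urllib.request.urlopen(build_request("/ping", api_key), timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as exc:
        if exc.code == 401:  # invalid_api_key
            return False
        raise
```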

## Core concepts

**Asynchronous.** Submitting a document returns immediately with a job in the
`pending` state. The work happens in the background. You learn the outcome by
polling the status endpoint or by receiving a webhook.

**Client-generated id.** You generate the job `id` (a UUID) and send it with the
submission. This is the idempotency key: re-submitting the same `id` does not
create a second job but returns the state of the existing one. Generate a fresh
UUID per document and persist it so you can correlate the result later.

**Bring-your-own-storage.** You provide two presigned URLs for each job:

- `file_url`: a presigned `GET` URL we use to download the source document.
- `upload_url`: a presigned `PUT` URL we use to upload the resulting Markdown.

The parsed Markdown is written to your `upload_url` location. It never lands on
our storage, so there is no result-download endpoint in BYO mode. Once the job is
`completed`, read the Markdown from your own bucket.

**Presigned URL lifetime.** Both URLs must stay valid long enough to cover queue
time plus processing. Sign them with a generous expiry; one hour is a safe
default for typical documents and well beyond the processing time for most files.
If a URL has expired by the time we use it, the job fails.
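
If you sign with Amazon S3 (an assumption; other providers encode expiry
differently), a SigV4 presigned URL carries its signing time in `X-Amz-Date` and
its lifetime in seconds in `X-Amz-Expires`. The hypothetical helpers below check,
before submission, that a URL will stay valid for a budget you choose to cover
queue time plus processing.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def presigned_expiry(url: str) -> datetime:
    """UTC expiry of an S3 SigV4 presigned URL (assumes X-Amz-Date / X-Amz-Expires)."""
    params = parse_qs(urlparse(url).query)
    signed_at = datetime.strptime(params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    return signed_at + timedelta(seconds=int(params["X-Amz-Expires"][0]))


def has_enough_lifetime(url: str, budget: timedelta,
                        now: Optional[datetime] = None) -> bool:
    """True if the URL stays valid for at least `budget` from `now`."""
    now = now or datetime.now(timezone.utc)
    return presigned_expiry(url) - now >= budget
```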

## Submit a document

`POST /parse`

Submit one document for parsing. Returns `202` with the job id and status.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `id` | string (UUID) | yes | The job id you generate. Idempotency key. |
| `extension` | string | yes | The source file extension, lowercase, without a dot (for example `pdf`). Must be a supported type. |
| `filename` | string | no | An optional label for the source file. Does not affect routing; the type is determined by `extension`. |
| `source` | object | yes | The BYO storage descriptor. |
| `source.mode` | string | yes | Must be `byo`. |
| `source.file_url` | string (URL) | yes | Presigned `GET` URL for the source document. |
| `source.upload_url` | string (URL) | yes | Presigned `PUT` URL where the Markdown result is written. |
| `delivery` | object | no | How you want to be notified of completion. Defaults to polling. |
| `delivery.mode` | string | no | `poll` or `webhook`. Defaults to `poll`. |
| `delivery.callback_url` | string (URL) | no | Required when `delivery.mode` is `webhook`. Must be `https`. See Webhooks. |
| `options` | object | no | Parsing options. |
| `options.force_ocr` | boolean | no | Force OCR. OCR is auto-detected for scanned PDFs by default. |
| `options.ocr_language` | string | no | OCR language hint, for example `eng` or `eng+fra`. |
| `options.pages` | string | no | Restrict to a page range, for example `1-20`. Only valid for paginated formats. |
| `options.frontmatter` | boolean | no | Prepend YAML frontmatter (author, dates, page count) to the Markdown. |
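
A sketch of assembling and sending a submission in Python. `build_parse_payload`
and `submit` are illustrative names. The helper generates the UUID `id` when you
don't pass one, normalizes `extension` to lowercase without a dot, and adds
`delivery` only for webhooks, since polling is the default.

```python
import json
import urllib.request
import uuid
from typing import Any, Dict, Optional

BASE_URL = "https://parseforartisans.com/api/v1"


def build_parse_payload(file_url: str, upload_url: str, extension: str,
                        job_id: Optional[str] = None,
                        callback_url: Optional[str] = None,
                        options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Assemble a POST /parse body."""
    payload: Dict[str, Any] = {
        "id": job_id or str(uuid.uuid4()),           # client-generated idempotency key
        "extension": extension.lower().lstrip("."),  # lowercase, no leading dot
        "source": {"mode": "byo", "file_url": file_url, "upload_url": upload_url},
    }
    if callback_url is not None:
        payload["delivery"] = {"mode": "webhook", "callback_url": callback_url}
    if options:
        payload["options"] = options
    return payload


def submit(payload: Dict[str, Any], api_key: str, timeout: float = 30.0) -> Dict[str, Any]:
    """POST the payload; a 202 response carries {"id", "status"}.

    Non-2xx responses raise urllib.error.HTTPError.
    """
    req = urllib.request.Request(
        BASE_URL + "/parse",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Persist `payload["id"]` before calling `submit`. Because the id is the
idempotency key, retrying `submit` with the same payload after a network timeout
returns the existing job instead of creating a second one.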

### Response

`202 Accepted`

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | The job id. |
| `status` | string | The job status, `pending` on a new submission. |

Re-submitting an id that already exists returns `202` with the existing job's
current status rather than creating a new job.

### Supported extensions

`pdf`, `docx`, `pptx`, `xlsx`, `csv`, `doc`, `ppt`, `xls`, `eml`, `msg`. An
unsupported value returns `422` with the type `unsupported_type`.

The `pages` option is only meaningful for paginated formats: `pdf`, `pptx`,
`xlsx`, `csv`, `ppt`, `xls`. Sending it for any other extension returns `422`
with the type `unsupported_option`.
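
The rules above can be mirrored client-side to fail fast before a round trip.
This sketch hard-codes the lists from this page, so treat the server's `422`
response as authoritative if the supported set changes; `precheck` is a
hypothetical name.

```python
from typing import Any, Dict, Optional

# Copied from the lists above; the server remains the source of truth.
SUPPORTED_EXTENSIONS = {"pdf", "docx", "pptx", "xlsx", "csv", "doc", "ppt", "xls", "eml", "msg"}
PAGINATED_EXTENSIONS = {"pdf", "pptx", "xlsx", "csv", "ppt", "xls"}


def precheck(extension: str, options: Optional[Dict[str, Any]] = None) -> Optional[str]:
    """Return the error type the API would likely report, or None if the request looks valid."""
    if extension not in SUPPORTED_EXTENSIONS:
        return "unsupported_type"
    if options and "pages" in options and extension not in PAGINATED_EXTENSIONS:
        return "unsupported_option"
    return None
```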

## Check status

`GET /parse/{id}`

Returns the current state of a job. Scoped to the authenticated team; an unknown
id returns `404`.

| Field | Type | Description |
| --- | --- | --- |
| `id` | string | The job id. |
| `status` | string | One of `pending`, `processing`, `completed`, `failed`. |
| `page_count` | integer or null | Number of pages parsed, once known. |
| `credits_used` | integer or null | Credits consumed by the job, once known. |
| `started_at` | string or null | ISO 8601 timestamp when processing began. |
| `completed_at` | string or null | ISO 8601 timestamp when the job reached a terminal state. |
| `duration_ms` | integer or null | Processing duration in milliseconds, once complete. |
| `error` | object or null | Present only when `status` is `failed`. Contains `type` and `message`. |

The response is returned at the top level, with no `data` envelope. Additional
fields may be added over time, so parse defensively and ignore unknown fields.
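
One way to parse the status response defensively in Python: a dataclass (the
name `JobStatus` is illustrative) that keeps the documented fields and drops
anything it doesn't recognize, so new server-side fields don't break
deserialization.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class JobStatus:
    id: str
    status: str
    page_count: Optional[int] = None
    credits_used: Optional[int] = None
    started_at: Optional[str] = None
    completed_at: Optional[str] = None
    duration_ms: Optional[int] = None
    error: Optional[Dict[str, Any]] = None  # {"type": ..., "message": ...} when failed

    @classmethod
    def from_json(cls, body: Dict[str, Any]) -> "JobStatus":
        # Keep only known keys so fields added later are silently ignored.
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in body.items() if k in known})
```

Since webhook bodies are the same object as this response, the same parser
applies to them.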

### Status lifecycle

A job moves from `pending` to `processing`, then to a terminal state: either
`completed` or `failed`. Poll until `status` is terminal, or use a webhook to
avoid polling.
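
A polling loop sketch. The API doesn't specify a polling interval, so the
exponential backoff starting at one second and capped at 30 seconds, and the
ten-minute overall timeout, are assumptions to tune. `fetch_status` is any
callable that performs `GET /parse/{id}` and returns the decoded JSON; injecting
it (and `sleep`) keeps the loop testable.

```python
import time
from typing import Any, Callable, Dict

TERMINAL = {"completed", "failed"}


def wait_for_job(fetch_status: Callable[[], Dict[str, Any]],
                 initial_delay: float = 1.0,   # assumed, not documented
                 max_delay: float = 30.0,      # assumed, not documented
                 timeout: float = 600.0,       # assumed, not documented
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll until the job is completed or failed, doubling the delay each time."""
    delay = initial_delay
    waited = 0.0
    while True:
        body = fetch_status()
        if body["status"] in TERMINAL:
            return body
        if waited + delay > timeout:
            raise TimeoutError(f"job {body.get('id')} still {body['status']} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)
```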

## Retrieve the result

In BYO mode the Markdown is written to the `upload_url` you supplied at
submission. Once `status` is `completed`, read the object from your own bucket.
There is no result-download endpoint, because the bytes never reach our storage.

## Webhooks

Set `delivery.mode` to `webhook` and provide an `https` `delivery.callback_url`
to be notified when a job reaches a terminal state instead of polling. The
callback host must be publicly resolvable; URLs that resolve to private,
loopback, link-local, or cloud-metadata addresses are rejected at submission.
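
To catch a bad callback before submitting, you can approximate the rule
client-side with Python's `ipaddress` module. This is a hypothetical pre-check,
not the server's implementation: DNS answers can change between your check and
ours, and the server's rejection at submission is what counts. Cloud-metadata
endpoints such as `169.254.169.254` fall under the link-local check.

```python
import ipaddress
import socket
from typing import Optional
from urllib.parse import urlparse


def callback_url_problem(url: str) -> Optional[str]:
    """Return a reason the URL would likely be rejected, or None. Approximation only."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return "callback_url must use https"
    if not parsed.hostname:
        return "callback_url has no host"
    try:
        infos = socket.getaddrinfo(parsed.hostname, parsed.port or 443,
                                   proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return "host does not resolve"
    for info in infos:
        # sockaddr[0] is the address; strip any IPv6 zone suffix like "%eth0".
        addr = ipaddress.ip_address(info[4][0].split("%")[0])
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            return f"host resolves to non-public address {addr}"
    return None
```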

When the job finishes we send a `POST` to your callback URL. The request body is
the same JSON object returned by the status endpoint. Delivery is retried with
backoff (up to five attempts) if your endpoint is unavailable.

### Verifying the signature

Each webhook carries an `X-Parse-Signature` header so you can confirm it came
from us and was not modified. The header has the form:

`X-Parse-Signature: t=<timestamp>,v1=<signature>`

`t` is a Unix timestamp and `v1` is a hex-encoded HMAC-SHA256 signature. To
verify:

1. Read `t` and `v1` from the header.
2. Build the signed payload by joining the timestamp and the raw request body
   with a single period: `<t>.<raw-body>`. Use the exact bytes of the body, not
   a re-serialized copy.
3. Compute `HMAC-SHA256` over that string, using your team's webhook signing
   secret (the `whsec_` value from the dashboard) as the key, and hex-encode the
   digest.
4. Compare the result to `v1` using a constant-time comparison. Reject the
   request if they do not match.

To limit replay attacks, you can also reject requests whose `t` is too far from
the current time. Rotating the signing secret in the dashboard invalidates the
previous value, so update the secret your endpoint verifies against when you
rotate.
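
The verification steps translate directly into Python's `hmac` module. Two
assumptions are labeled in the code: the full `whsec_` string, prefix included,
is used as the key, and the 300-second replay tolerance is an illustrative
choice, not a documented value. The function name `verify_signature` is
hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_signature(raw_body: bytes, header: str, secret: str,
                     tolerance_s: Optional[int] = 300,  # assumed tolerance; None disables
                     now: Optional[float] = None) -> bool:
    """Verify an X-Parse-Signature header of the form 't=<timestamp>,v1=<hex>'."""
    parts = dict(item.strip().split("=", 1) for item in header.split(",") if "=" in item)
    t, v1 = parts.get("t"), parts.get("v1")
    if not t or not v1 or not t.isdigit():
        return False
    # Signed payload is "<t>.<raw-body>" over the exact bytes received.
    signed = t.encode("ascii") + b"." + raw_body
    # Assumption: the whole whsec_ value (prefix included) is the HMAC key.
    expected = hmac.new(secret.encode("utf-8"), signed, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode("ascii"), v1.encode("utf-8")):
        return False
    if tolerance_s is not None:
        current = time.time() if now is None else now
        if abs(current - int(t)) > tolerance_s:
            return False
    return True
```

Pass the raw request bytes your framework received, not a body that was parsed
and re-serialized, or the digest will not match.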

## Errors

Errors return a non-2xx status and a JSON body with an `error` object:

| Field | Type | Description |
| --- | --- | --- |
| `error.type` | string | A stable, machine-readable error code. |
| `error.message` | string | A human-readable description. |

| Type | Status | Meaning |
| --- | --- | --- |
| `invalid_api_key` | 401 | Missing or invalid API key. |
| `invalid_request` | 400 | The request body failed validation. |
| `unsupported_type` | 422 | The `extension` is not a supported file type. |
| `unsupported_option` | 422 | An option is not valid for the given file type, such as `pages` on a non-paginated format. |
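
A sketch of turning an error response into a Python exception that you can
branch on by `type`, which is stable, rather than by `message`. The class and
function names are hypothetical. Bodies that aren't the documented shape (for
example an HTML page from a proxy) fall back to the placeholder type `unknown`,
which is not an API error code.

```python
import json


class ParseApiError(Exception):
    """Raised for non-2xx responses carrying an {"error": {...}} body."""

    def __init__(self, status: int, type_: str, message: str):
        super().__init__(f"{status} {type_}: {message}")
        self.status = status
        self.type = type_
        self.message = message


def error_from_response(status: int, body: bytes) -> ParseApiError:
    """Build an exception from an error response, tolerating malformed bodies."""
    try:
        parsed = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        parsed = None
    err = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(err, dict):
        err = {}
    return ParseApiError(status, err.get("type", "unknown"), err.get("message", ""))
```

With `urllib`, catch `urllib.error.HTTPError` and call
`error_from_response(exc.code, exc.read())`.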
