API Reference

This is the HTTP API for submitting documents directly, without the Laravel SDK. Use it to integrate from any language or runtime. The SDK is the recommended path for Laravel applications and wraps everything described here.

All parsing through the API uses bring-your-own-storage (BYO): your files stay in your own bucket and no document bytes pass through our servers. You hand us a presigned URL to read the source and a presigned URL to write the result. The managed storage mode used by the SDK in local development is not available on the API.

The base URL is https://parseforartisans.com/api/v1. All requests and responses are JSON unless noted.

Authentication

Every request is authenticated with a bearer token. Create an API key in the dashboard under API Keys, then send it in the Authorization header:

Authorization: Bearer <your-api-key>

A key is scoped to the team it was created in. Requests with a missing or invalid key return 401 with the error type invalid_api_key.

GET /ping is a lightweight authenticated endpoint for verifying a key. It returns 200 when the key is valid.
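As a minimal sketch in Python (standard library only), the bearer header can be attached and a key checked against GET /ping roughly as follows. The helper names build_request and key_is_valid are illustrative and not part of any SDK; treating a 401 as "invalid key" follows the behavior described above.

```python
import urllib.error
import urllib.request

BASE_URL = "https://parseforartisans.com/api/v1"


def build_request(path, api_key, method="GET", body=None):
    """Return a urllib Request that carries the bearer token.

    `body`, when given, must already be JSON-encoded bytes.
    """
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Accept": "application/json",
    }
    if body is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=body, headers=headers, method=method
    )


def key_is_valid(api_key, timeout=10):
    """Call GET /ping: True on 200, False on 401, re-raise anything else."""
    req = build_request("/ping", api_key)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 401:
            return False
        raise
```

Calling key_is_valid once at startup is a cheap way to fail fast on a misconfigured key before any document is submitted.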

Core concepts

Asynchronous. Submitting a document returns immediately with a job in the pending state. The work happens in the background. You learn the outcome by polling the status endpoint or by receiving a webhook.

Client-generated id. You generate the job id (a UUID) and send it with the submission. This is the idempotency key: re-submitting the same id does not create a second job; it returns the state of the existing one. Generate a fresh UUID per document and persist it so you can correlate the result later.
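One way to picture the "generate once, persist, reuse" rule is the sketch below. JobRegistry is a hypothetical in-memory stand-in for your own database table, and the document key is whatever identifier your application already uses for the file; neither is part of the API.

```python
import uuid


class JobRegistry:
    """In-memory stand-in for a persistent document -> job id mapping."""

    def __init__(self):
        self._job_ids = {}

    def id_for(self, document_key):
        """Return the job id for a document, creating it on first use.

        Because the id is stored, a retried submission for the same
        document reuses it and therefore hits the existing job.
        """
        if document_key not in self._job_ids:
            self._job_ids[document_key] = str(uuid.uuid4())
        return self._job_ids[document_key]


registry = JobRegistry()
first = registry.id_for("invoices/2024-001.pdf")
retry = registry.id_for("invoices/2024-001.pdf")  # same id as `first`
```

In a real integration the mapping would be written to durable storage before the request is sent, so a crash between "generate" and "submit" does not lose the id.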

Bring-your-own-storage. You provide two presigned URLs for each job:

  • file_url: a presigned GET URL we use to download the source document.
  • upload_url: a presigned PUT URL we use to upload the resulting Markdown.

The parsed Markdown is written to your upload_url location. It never lands on our storage, so there is no result-download endpoint in BYO mode. Once the job is completed, read the Markdown from your own bucket.

Presigned URL lifetime. Both URLs must stay valid long enough to cover queue time plus processing. Sign them with a generous expiry; one hour is a safe default for typical documents and well beyond the processing time for most files. If a URL has expired by the time we use it, the job fails.

Submit a document

POST /parse

Submit one document for parsing. Returns 202 with the job id and status.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string (UUID) | yes | The job id you generate. Idempotency key. |
| extension | string | yes | The source file extension, lowercase, without a dot (for example pdf). Must be a supported type. |
| filename | string | no | An optional label for the source file. Does not affect routing; the type is determined by extension. |
| source | object | yes | The BYO storage descriptor. |
| source.mode | string | yes | Must be byo. |
| source.file_url | string (URL) | yes | Presigned GET URL for the source document. |
| source.upload_url | string (URL) | yes | Presigned PUT URL where the Markdown result is written. |
| delivery | object | no | How you want to be notified of completion. Defaults to polling. |
| delivery.mode | string | no | poll or webhook. Defaults to poll. |
| delivery.callback_url | string (URL) | no | Required when delivery.mode is webhook. Must be https. See Webhooks. |
| options | object | no | Parsing options. |
| options.force_ocr | boolean | no | Force OCR. OCR is auto-detected for scanned PDFs by default. |
| options.ocr_language | string | no | OCR language hint, for example eng or eng+fra. |
| options.pages | string | no | Restrict to a page range, for example 1-20. Only valid for paginated formats. |
| options.frontmatter | boolean | no | Prepend YAML frontmatter (author, dates, page count) to the Markdown. |

Response

202 Accepted

| Field | Type | Description |
| --- | --- | --- |
| id | string | The job id. |
| status | string | The job status, pending on a new submission. |

Re-submitting an id that already exists returns 202 with the existing job's current status rather than creating a new job.
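A submission might be assembled as in the sketch below. It assumes the dotted field names (source.file_url, delivery.mode, and so on) map to nested JSON objects, which the field list implies but does not show explicitly. build_parse_payload and submit are illustrative names, and the presigned URLs are expected to come from your own storage SDK.

```python
import json
import urllib.request
import uuid
from urllib.parse import urlparse

BASE_URL = "https://parseforartisans.com/api/v1"


def build_parse_payload(file_url, upload_url, extension, job_id=None,
                        filename=None, callback_url=None, options=None):
    """Build the JSON body for POST /parse (nested shape is an assumption)."""
    payload = {
        "id": job_id or str(uuid.uuid4()),
        # The API expects a lowercase extension without a leading dot.
        "extension": extension.lower().lstrip("."),
        "source": {
            "mode": "byo",
            "file_url": file_url,
            "upload_url": upload_url,
        },
    }
    if filename:
        payload["filename"] = filename
    if callback_url:
        if urlparse(callback_url).scheme != "https":
            raise ValueError("callback_url must use https")
        payload["delivery"] = {"mode": "webhook", "callback_url": callback_url}
    if options:
        payload["options"] = dict(options)
    return payload


def submit(payload, api_key, timeout=30):
    """POST the payload and return the decoded response body (id, status)."""
    req = urllib.request.Request(
        BASE_URL + "/parse",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```

Because the id is the idempotency key, calling submit again with the same payload after a network error should be safe: it returns the existing job's status instead of creating a duplicate.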

Supported extensions

pdf, docx, pptx, xlsx, csv, doc, ppt, xls, eml, msg. An unsupported value returns 422 with the type unsupported_type.

The pages option is only meaningful for paginated formats: pdf, pptx, xlsx, csv, ppt, xls. Sending it for any other extension returns 422 with the type unsupported_option.
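The two rules above can be mirrored client-side to catch mistakes before a request is sent. This is only a sketch: the sets are copied from the lists in this section, precheck is a hypothetical helper, and the API remains the authority on what it accepts.

```python
SUPPORTED_EXTENSIONS = frozenset(
    {"pdf", "docx", "pptx", "xlsx", "csv", "doc", "ppt", "xls", "eml", "msg"}
)
PAGINATED_EXTENSIONS = frozenset({"pdf", "pptx", "xlsx", "csv", "ppt", "xls"})


def precheck(extension, options=None):
    """Return the error type the API is documented to return, or None.

    `extension` is expected in the API's own form: lowercase, no dot.
    """
    if extension not in SUPPORTED_EXTENSIONS:
        return "unsupported_type"
    if options and "pages" in options and extension not in PAGINATED_EXTENSIONS:
        return "unsupported_option"
    return None
```

For example, precheck("docx", {"pages": "1-3"}) reports unsupported_option, while precheck("pdf", {"pages": "1-20"}) returns None.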

Check status

GET /parse/{id}

Returns the current state of a job. Scoped to the authenticated team; an unknown id returns 404.

| Field | Type | Description |
| --- | --- | --- |
| id | string | The job id. |
| status | string | One of pending, processing, completed, failed. |
| page_count | integer or null | Number of pages parsed, once known. |
| credits_used | integer or null | Credits consumed by the job, once known. |
| started_at | string or null | ISO 8601 timestamp when processing began. |
| completed_at | string or null | ISO 8601 timestamp when the job reached a terminal state. |
| duration_ms | integer or null | Processing duration in milliseconds, once complete. |
| error | object or null | Present only when status is failed. Contains type and message. |

The response is returned at the top level, with no data envelope. Additional fields may be added over time, so parse defensively and ignore unknown fields.

Status lifecycle

A job moves from pending to processing, then to a terminal state of either completed or failed. Poll until status is terminal, or use a webhook to avoid polling.
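A polling loop over this lifecycle could look like the sketch below. The interval, backoff factor, and one-hour timeout are assumptions chosen for illustration; no recommended polling cadence is documented here. The HTTP call is injected as fetch_status so the loop can be exercised without a network, and make_fetcher shows one possible implementation against GET /parse/{id}.

```python
import json
import time
import urllib.request
from urllib.parse import quote

BASE_URL = "https://parseforartisans.com/api/v1"
TERMINAL_STATES = {"completed", "failed"}


def make_fetcher(api_key, timeout=30):
    """Return a function that calls GET /parse/{id} and decodes the JSON."""
    def fetch_status(job_id):
        req = urllib.request.Request(
            f"{BASE_URL}/parse/{quote(job_id, safe='')}",
            headers={"Authorization": f"Bearer {api_key}"},
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read())
    return fetch_status


def wait_for_job(fetch_status, job_id, interval=2.0, max_interval=30.0,
                 timeout=3600.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until the job is completed or failed, backing off gradually.

    Raises TimeoutError if the next wait would pass the deadline.
    """
    deadline = clock() + timeout
    delay = interval
    while True:
        job = fetch_status(job_id)
        if job.get("status") in TERMINAL_STATES:
            return job
        if clock() + delay > deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
        sleep(delay)
        delay = min(delay * 1.5, max_interval)
```

A caller would then branch on the returned status: read the Markdown from its own bucket on completed, or inspect job["error"] on failed.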

Retrieve the result

In BYO mode the Markdown is written to the upload_url you supplied at submission. Once status is completed, read the object from your own bucket. There is no result-download endpoint, because the bytes never reach our storage.

Webhooks

Set delivery.mode to webhook and provide an https delivery.callback_url to be notified when a job reaches a terminal state instead of polling. The callback host must be publicly resolvable; URLs that resolve to private, loopback, link-local, or cloud-metadata addresses are rejected at submission.

When the job finishes we send a POST to your callback URL. The request body is the same JSON object returned by the status endpoint. Delivery is retried with backoff (up to five attempts) if your endpoint is unavailable.

Verifying the signature

Each webhook carries an X-Parse-Signature header so you can confirm it came from us and was not modified. The header has the form:

X-Parse-Signature: t=<timestamp>,v1=<signature>

t is a Unix timestamp and v1 is a hex-encoded HMAC-SHA256 signature. To verify:

  1. Read t and v1 from the header.
  2. Build the signed payload by joining the timestamp and the raw request body with a single period: <t>.<raw-body>. Use the exact bytes of the body, not a re-serialized copy.
  3. Compute HMAC-SHA256 over that string using your team's webhook signing secret (the whsec_ value from the dashboard) as the key, hex-encoded.
  4. Compare the result to v1 using a constant-time comparison. Reject the request if they do not match.

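Optionally, reject requests whose t differs from the current time by more than a tolerance you choose, to limit replay. Rotating the signing secret in the dashboard invalidates the previous value: update your verifier to the new secret, because signatures checked against the old one will no longer match.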
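The verification steps above can be sketched in Python as follows. Two details are assumptions rather than documented behavior: the whsec_ value is used as the HMAC key exactly as displayed (UTF-8 encoded, not decoded further), and the replay window defaults to 300 seconds. verify_webhook is an illustrative name.

```python
import hashlib
import hmac
import time


def parse_signature_header(header):
    """Split 't=<timestamp>,v1=<signature>' into (t, v1)."""
    parts = dict(
        item.strip().split("=", 1) for item in header.split(",") if "=" in item
    )
    return parts["t"], parts["v1"]  # KeyError if either part is missing


def verify_webhook(header, raw_body, secret, tolerance=300, now=None):
    """Return True only if the signature matches and t is recent enough.

    raw_body must be the exact bytes received, not re-serialized JSON.
    """
    try:
        t, v1 = parse_signature_header(header)
        timestamp = int(t)
    except (KeyError, ValueError):
        return False

    # Signed payload is "<t>.<raw-body>".
    signed_payload = t.encode("utf-8") + b"." + raw_body
    expected = hmac.new(
        secret.encode("utf-8"), signed_payload, hashlib.sha256
    ).hexdigest()

    # Constant-time comparison, done on bytes to accept any header content.
    if not hmac.compare_digest(expected.encode("utf-8"), v1.encode("utf-8")):
        return False

    # Optional replay protection; pass tolerance=None to skip it.
    if tolerance is not None:
        current = time.time() if now is None else now
        if abs(current - timestamp) > tolerance:
            return False
    return True
```

In a web framework, call this with the unparsed request body before decoding the JSON, and respond with a 4xx status when it returns False.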

Errors

Errors return a non-2xx status and a JSON body with an error object:

| Field | Type | Description |
| --- | --- | --- |
| error.type | string | A stable, machine-readable error code. |
| error.message | string | A human-readable description. |

| Type | Status | Meaning |
| --- | --- | --- |
| invalid_api_key | 401 | Missing or invalid API key. |
| invalid_request | 400 | The request body failed validation. |
| unsupported_type | 422 | The extension is not a supported file type. |
| unsupported_option | 422 | An option is not valid for the given file type, such as pages on a non-paginated format. |
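Client code can turn these error bodies into a typed exception so callers branch on error.type rather than on message text. The sketch below assumes nothing beyond the documented error object; ParseApiError, error_from_response, and send are illustrative names, and the "unknown" fallback covers bodies that are not the documented JSON (for example a proxy error page).

```python
import json
import urllib.error
import urllib.request


class ParseApiError(Exception):
    """Raised for non-2xx responses; carries status, type, and message."""

    def __init__(self, status, error_type, message):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message


def error_from_response(status, body_bytes):
    """Build a ParseApiError from an HTTP status and raw response body."""
    try:
        decoded = json.loads(body_bytes)
    except ValueError:
        decoded = None
    err = decoded.get("error") if isinstance(decoded, dict) else None
    if not isinstance(err, dict):
        err = {}
    return ParseApiError(
        status, err.get("type", "unknown"), err.get("message", "")
    )


def send(req, timeout=30):
    """Send a prepared urllib Request, raising ParseApiError on failure."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as http_err:
        raise error_from_response(http_err.code, http_err.read()) from http_err
```

A caller can then write `except ParseApiError as e:` and compare e.error_type against the stable codes in the table above.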