Usage
Parsing Documents
Use the Parse facade. It takes a path on a Laravel filesystem disk (resolved like
Storage::get(), so you never hand-build an OS path):
use ParseForArtisans\Facades\Parse;
$parse = Parse::file('contracts/foo.pdf')->parse(); // default disk
$parse = Parse::disk('s3')->file('contracts/foo.pdf')->parse();
Tie the job to one of your models with ->for($model). The ParseCompleted event hands that
record straight back, so you never store or match an id yourself:
$parse = Parse::file('contracts/foo.pdf')
    ->for($document) // associate with your model
    ->to('contracts/foo.md') // optional: defaults to parsed/contracts/foo.pdf.md
    ->parse();
$parse->id; // uuid
$parse->status(); // 'pending'
By default the output mirrors the source path under parsed/, keeping the full source filename
and appending .md. So contracts/foo.pdf becomes parsed/contracts/foo.pdf.md, and a foo.docx
next to it becomes parsed/contracts/foo.docx.md, so the two never collide. Override per-file
with ->to(...).
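The naming rule above can be sketched in a few lines of plain PHP. This only illustrates the documented default; it is not the SDK's own code, and the function name defaultOutputPath is hypothetical.

```php
<?php

// Hypothetical helper: mirrors the documented default, it is not part of the SDK.
function defaultOutputPath(string $sourcePath): string
{
    // Keep the full source filename (extension included) and append .md,
    // so foo.pdf and foo.docx map to different outputs.
    return 'parsed/' . $sourcePath . '.md';
}

echo defaultOutputPath('contracts/foo.pdf'), PHP_EOL;  // parsed/contracts/foo.pdf.md
echo defaultOutputPath('contracts/foo.docx'), PHP_EOL; // parsed/contracts/foo.docx.md
```

Because the original extension stays in the output name, two sources that differ only by extension get distinct Markdown files.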
Correlating results with your models
Parsing is async: the Markdown shows up later, in a ParseCompleted event. So you need a way
to know which of your records a finished document belongs to. ->for($model) handles this:
it records the association on the SDK's own parse_requests row (a polymorphic relation, so
nothing to add to your schema), and the event gives the model back:
Parse::file('contracts/foo.pdf')->for($document)->parse();
// later, in your listener:
$document = $event->request->parsable; // your Document model, typed
Want the reverse link on your model? Add one line, no migration:
class Document extends Model
{
    public function parse()
    {
        return $this->morphOne(\ParseForArtisans\Models\ParseRequest::class, 'parsable');
    }
}
$document->parse?->status(); // 'completed'
No model to point at (a one-off script, a batch of loose files, or a record you only
create after parsing)? Skip ->for() and attach your own context with ->withMeta():
$parse = Parse::file('contracts/foo.pdf')
    ->withMeta(['tenant_id' => $tenant->id, 'source' => 'bulk-import']) // any scalars you like
    ->parse();
// in your listener:
$tenantId = $event->request->meta['tenant_id'];
->for() and ->withMeta() compose: use the relation for the model, meta for extra context.
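A listener that copes with both styles might look like the sketch below. It is written against a plain object, so it assumes nothing about the event's namespace. In a real app you would type-hint the SDK's ParseCompleted event and register the listener the usual Laravel way. The class name RouteParseResult and the returned labels are hypothetical; only $event->request->parsable and $event->request->meta come from the text above.

```php
<?php

// Hypothetical listener: only ->request->parsable and ->request->meta come from the docs.
final class RouteParseResult
{
    public function handle(object $event): string
    {
        $request = $event->request;

        // Set when the parse was submitted with ->for($model).
        $model = $request->parsable ?? null;
        if ($model !== null) {
            return 'model:' . get_class($model);
        }

        // Otherwise fall back to whatever context ->withMeta() carried.
        $meta = $request->meta ?? [];
        if (isset($meta['tenant_id'])) {
            return 'tenant:' . $meta['tenant_id'];
        }

        return 'unlinked';
    }
}

// Stand-in event for trying the listener outside Laravel.
$event = (object) ['request' => (object) [
    'parsable' => null,
    'meta' => ['tenant_id' => 42, 'source' => 'bulk-import'],
]];

echo (new RouteParseResult())->handle($event), PHP_EOL; // tenant:42
```

The method returns a label only so the sketch is easy to try on its own; a real listener would more likely update the related record or dispatch follow-up work and return nothing.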
Parse a URL
Already have the document at a public URL? Skip the upload; we fetch it. Try it right now in
php artisan tinker with our sample file:
$parse = Parse::url('https://parseforartisans.com/samples/invoice.pdf')->parse();
From a file upload
There's no inline Markdown to return, and parsing happens in your bucket, so the idiomatic Laravel move is to store the upload, then parse the stored path:
use App\Models\Document; // assumed location of your own model
use Illuminate\Http\Request;
use ParseForArtisans\Facades\Parse;

public function store(Request $request)
{
    $request->validate([
        // for files, max is in kilobytes: 51200 KB = 50 MB
        'document' => ['required', 'file', 'mimes:pdf,docx,xlsx', 'max:51200'],
    ]);

    $path = $request->file('document')->store('uploads', 's3'); // your disk
    $document = Document::create(['source_path' => $path]);
    $parse = Parse::disk('s3')->file($path)->for($document)->parse();

    // Hand the id to your frontend so it can poll ->status(), or just wait for the event.
    return response()->json(['parse_id' => $parse->id]);
}
You don't tell us the file type; we detect it (PDF, Word, PowerPoint, Excel, email, and more) and route it automatically.
Batch processing
Hand Parse::files() an array (or collection) of paths instead of a single file. They're
submitted as one batch and you get back a collection of ParseRequest models, one per file:
use Illuminate\Support\Facades\Storage;

$paths = Storage::disk('s3')->files('contracts');
$batch = Parse::disk('s3')->files($paths)
    ->frontmatter(true) // options apply to every file in the batch
    ->parse();
$batch->count(); // how many were submitted
$batch->pluck('id'); // the parse references the SDK is tracking
Each file fires its own ParseCompleted event as it finishes, so your published listener
handles results the same way whether you submitted one file or ten thousand.
Submitting thousands at once? Wrap the ->parse() call in your own queued job so the submissions run in the background with retries. The SDK doesn't force a queue; it's your call.
Options
Chain options before ->parse():
$parse = Parse::file($path)
    ->for($document) // associate with one of your models, handed back in the event
    ->withMeta([...]) // attach arbitrary scalar context, handed back in the event
    ->ocr(true) // force OCR (auto-detected by default for scanned PDFs)
    ->ocrLanguage('spa') // OCR language(s) for scanned PDFs, e.g. 'spa' or 'eng+fra'
    ->pages('1-20') // only these pages
    ->frontmatter(true) // prepend YAML frontmatter (author, dates, page count)
    ->to('out/foo.md') // override output path (default: parsed/<source>.md, e.g. parsed/contracts/foo.pdf.md)
    ->parse();
Options apply where they fit the format: ocr() and ocrLanguage() affect PDFs, and pages() works on paginated formats (PDF, PowerPoint, spreadsheets).