
UnstructuredLoader

Compatibility

Only available on Node.js.

This notebook provides a quick overview for getting started with the UnstructuredLoader document loader. For detailed documentation of all UnstructuredLoader features and configurations, head to the API reference.

Overview

Integration details

Class | Package | Compatibility | Local | PY support
UnstructuredLoader | @langchain/community | Node-only | |

Setup

To access the UnstructuredLoader document loader, you'll need to install the @langchain/community integration package and create an Unstructured account to get an API key.

Local

You can run Unstructured locally on your computer using Docker. To do so, you need to have Docker installed; installation instructions are available in the official Docker documentation.

docker run -p 8000:8000 -d --rm --name unstructured-api downloads.unstructured.io/unstructured-io/unstructured-api:latest --port 8000 --host 0.0.0.0
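When the API runs in the local container started above, the loader has to be pointed at it instead of the hosted service. The sketch below assumes that UnstructuredLoader accepts an apiUrl option and that the container serves Unstructured's partition endpoint at /general/v0/general on the published port (8000 in the command above). localConnection is a hypothetical helper name, not part of LangChain.

```typescript
// Hypothetical helper: loader options that target a self-hosted Unstructured API.
// Assumption: the container exposes the partition endpoint at /general/v0/general.
interface UnstructuredConnectionOptions {
  apiUrl?: string;
  apiKey?: string;
}

function localConnection(
  port = 8000,
  host = "localhost"
): UnstructuredConnectionOptions {
  // No apiKey here: assumes the container was started without key enforcement,
  // as in the docker command above.
  return { apiUrl: `http://${host}:${port}/general/v0/general` };
}

// Usage with the loader (not executed here):
// const loader = new UnstructuredLoader(filePath, localConnection());
```

Keeping the host and port as parameters makes it easy to match a different `-p` mapping in the docker command.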

Credentials

Head to unstructured.io to sign up for Unstructured and generate an API key. Once you've done this, set the UNSTRUCTURED_API_KEY environment variable:

export UNSTRUCTURED_API_KEY="your-api-key"
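If the variable is missing, the problem may only surface when the loader sends its first request. The guard below is a minimal sketch that checks the key up front; requireApiKey is a hypothetical helper, and passing its result as the apiKey option is one assumed way to wire it in.

```typescript
// Hypothetical guard: read UNSTRUCTURED_API_KEY and fail early with a clear message.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env.UNSTRUCTURED_API_KEY?.trim();
  if (!key) {
    throw new Error(
      "UNSTRUCTURED_API_KEY is not set; export it or pass apiKey to UnstructuredLoader."
    );
  }
  return key;
}

// Usage with the loader (not executed here):
// const loader = new UnstructuredLoader(filePath, { apiKey: requireApiKey(process.env) });
```

Taking the environment as a parameter instead of reading process.env directly keeps the check easy to test.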

Installation

The LangChain UnstructuredLoader integration lives in the @langchain/community package:

yarn add @langchain/community

Instantiation

Now we can instantiate the loader:

import { UnstructuredLoader } from "@langchain/community/document_loaders/fs/unstructured";

// No options object is passed, so the API key comes from UNSTRUCTURED_API_KEY.
const loader = new UnstructuredLoader(
  "../../../../../../examples/src/document_loaders/example_data/notion.md"
);
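UnstructuredLoader can also take partitioning options alongside the file path. As an illustration, the sketch below chooses an Unstructured strategy per file. The heuristic (hi_res, which runs a layout model, for PDFs and images; fast otherwise) and the pickStrategy name are assumptions for this example, and the strategy option it would feed is shown only in a comment.

```typescript
import { extname } from "node:path";

// Strategy names used by Unstructured's partitioning API.
type Strategy = "hi_res" | "fast" | "ocr_only" | "auto";

// Assumed heuristic: only layout-heavy formats justify the slower hi_res strategy.
const LAYOUT_HEAVY = new Set([".pdf", ".png", ".jpg", ".jpeg", ".tiff"]);

function pickStrategy(filePath: string): Strategy {
  const ext = extname(filePath).toLowerCase();
  return LAYOUT_HEAVY.has(ext) ? "hi_res" : "fast";
}

// Usage with the loader (not executed here):
// const loader = new UnstructuredLoader(filePath, { strategy: pickStrategy(filePath) });
```

Using extname avoids misreading dots in directory names as file extensions.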

Load

const docs = await loader.load();
docs[0];
Document {
  pageContent: '# Testing the notion markdownloader',
  metadata: {
    filename: 'notion.md',
    languages: [ 'eng' ],
    filetype: 'text/plain',
    category: 'NarrativeText'
  },
  id: undefined
}
console.log(docs[0].metadata);
{
  filename: 'notion.md',
  languages: [ 'eng' ],
  filetype: 'text/plain',
  category: 'NarrativeText'
}
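Each returned Document corresponds to one element extracted by Unstructured, labeled with metadata.category (NarrativeText above; Title and other types appear for other inputs). The sketch below counts and filters elements by that field. ElementDoc is a hypothetical stand-in for LangChain's Document so the snippet runs without the package installed.

```typescript
// Hypothetical stand-in for LangChain's Document; field names mirror the output above.
interface ElementDoc {
  pageContent: string;
  metadata: { filename?: string; category?: string; [key: string]: unknown };
}

// Count how many elements of each category were returned.
function countByCategory(docs: ElementDoc[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const doc of docs) {
    const category = doc.metadata.category ?? "Unknown";
    counts.set(category, (counts.get(category) ?? 0) + 1);
  }
  return counts;
}

// Keep only elements whose category is in the allow-list (e.g. drop headers/footers).
function filterByCategory(docs: ElementDoc[], keep: string[]): ElementDoc[] {
  const allowed = new Set(keep);
  return docs.filter((doc) => allowed.has(doc.metadata.category ?? ""));
}
```

Real loader output can be passed in directly, since a Document has the same pageContent and metadata fields.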

Directories

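You can also load all of the files in a directory using UnstructuredDirectoryLoader, which inherits from DirectoryLoader. Files whose type the loader does not recognize are reported as "Unknown file type" and skipped, as the output below shows: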

import { UnstructuredDirectoryLoader } from "@langchain/community/document_loaders/fs/unstructured";

// The second argument takes loader options (such as apiKey or apiUrl); {} keeps the defaults.
const directoryLoader = new UnstructuredDirectoryLoader(
  "../../../../../../examples/src/document_loaders/example_data/",
  {}
);
const directoryDocs = await directoryLoader.load();
console.log("directoryDocs.length: ", directoryDocs.length);
console.log(directoryDocs[0]);
Unknown file type: Star_Wars_The_Clone_Wars_S06E07_Crisis_at_the_Heart.srt
Unknown file type: test.mp3
directoryDocs.length:  247
Document {
  pageContent: 'Bitcoin: A Peer-to-Peer Electronic Cash System',
  metadata: {
    filetype: 'application/pdf',
    languages: [ 'eng' ],
    page_number: 1,
    filename: 'bitcoin.pdf',
    category: 'Title'
  },
  id: undefined
}
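Because the directory loader also returns one Document per element, a single file such as bitcoin.pdf contributes many of the 247 entries above. The sketch below regroups them into one string per filename. It assumes elements arrive in document order, and textPerFile and ElementDoc are hypothetical names used only for this illustration.

```typescript
// Hypothetical stand-in for LangChain's Document; field names mirror the output above.
interface ElementDoc {
  pageContent: string;
  metadata: { filename?: string; page_number?: number; [key: string]: unknown };
}

// Join each file's elements, in the order returned, into a single text.
function textPerFile(docs: ElementDoc[]): Map<string, string> {
  const parts = new Map<string, string[]>();
  for (const doc of docs) {
    const name = doc.metadata.filename ?? "<unknown>";
    const list = parts.get(name) ?? [];
    list.push(doc.pageContent);
    parts.set(name, list);
  }
  const result = new Map<string, string>();
  parts.forEach((list, name) => result.set(name, list.join("\n\n")));
  return result;
}
```

Regrouping like this can be useful when a downstream step, such as a custom splitter, expects whole documents rather than individual elements.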

API reference

For detailed documentation of all UnstructuredLoader features and configurations, head to the API reference: https://api.js.langchain.com/classes/langchain_community_document_loaders_fs_unstructured.UnstructuredLoader.html

