This notebook shows how to load email (.eml) or Microsoft Outlook (.msg) files.
See the Unstructured setup guide for instructions on setting up Unstructured locally, including the required system dependencies.
Using Unstructured
%pip install --upgrade --quiet unstructured
from langchain_community.document_loaders import UnstructuredEmailLoader
loader = UnstructuredEmailLoader("./example_data/fake-email.eml")
data = loader.load()
data
[Document(page_content='This is a test email to use for unit tests.\n\nImportant points:\n\nRoses are red\n\nViolets are blue', metadata={'source': './example_data/fake-email.eml'})]
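If `./example_data/fake-email.eml` is not available locally, a comparable test file can be generated with Python's standard `email` package. The following is a minimal sketch, not part of the original notebook: the helper names `write_sample_eml` and `read_subject` and the sender and recipient addresses are hypothetical placeholders, and the body text simply mirrors the sample output above.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from pathlib import Path


def write_sample_eml(path: str) -> Path:
    """Write a small plain-text email to `path` (hypothetical helper)."""
    msg = EmailMessage()
    # Placeholder addresses, assumed for illustration only.
    msg["From"] = "Sender <sender@example.com>"
    msg["To"] = "Recipient <recipient@example.com>"
    msg["Subject"] = "Test Email"
    msg.set_content(
        "This is a test email to use for unit tests.\n\n"
        "Important points:\n\n"
        "Roses are red\n\n"
        "Violets are blue\n"
    )
    target = Path(path)
    # Create the parent directory (e.g. example_data/) if it is missing.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(msg.as_bytes())
    return target


def read_subject(path: Path) -> str:
    """Parse the file back and return its Subject header as a sanity check."""
    with path.open("rb") as fh:
        parsed = BytesParser(policy=policy.default).parse(fh)
    return str(parsed["Subject"])
```

Calling `write_sample_eml("./example_data/fake-email.eml")` before constructing the loader should give it a file to read. The loader's exact output for a generated file may differ from the output shown above.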
Retain Elements
Under the hood, Unstructured creates a separate "element" for each chunk of text. By default, the loader combines these elements into a single Document. To keep them separate, pass mode="elements"; each element is then returned as its own Document with element-level metadata.
loader = UnstructuredEmailLoader("example_data/fake-email.eml", mode="elements")
data = loader.load()
data[0]
Document(page_content='This is a test email to use for unit tests.', metadata={'source': 'example_data/fake-email.eml', 'file_directory': 'example_data', 'filename': 'fake-email.eml', 'last_modified': '2022-12-16T17:04:16-05:00', 'sent_from': ['Matthew Robinson <mrobinson@unstructured.io>'], 'sent_to': ['Matthew Robinson <mrobinson@unstructured.io>'], 'subject': 'Test Email', 'languages': ['eng'], 'filetype': 'message/rfc822', 'category': 'NarrativeText'})
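With mode="elements", each returned document carries a `category` entry in its metadata, as in the output above. As a rough sketch of how that might be used, the hypothetical helper below groups element texts by category. It assumes only that each object has `page_content` and `metadata` attributes. `SimpleDoc` is a stand-in defined here so the snippet runs without Unstructured installed, and the `"Uncategorized"` fallback label is an assumption, not something the loader produces.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SimpleDoc:
    """Stand-in for a loaded document (hypothetical, for illustration)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def group_by_category(docs):
    """Map each metadata['category'] value to the list of matching texts."""
    grouped = defaultdict(list)
    for doc in docs:
        # Fall back to an assumed label when no category is present.
        category = doc.metadata.get("category", "Uncategorized")
        grouped[category].append(doc.page_content)
    return dict(grouped)
```

Because the helper relies only on those two attributes, the list returned by `loader.load()` in elements mode could be passed to `group_by_category` in the same way.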