When you make calls to the Unstructured Workflow Endpoint, you might need to include a secret as part of the request. This secret is typically the contents of a private key file, or something similar, that a third-party service requires for programmatic authentication. Such secrets are usually required when you create source connectors or destination connectors that work with specific third-party services. Sending plaintext secrets over a network carries inherent risks. For stronger security, you can use Unstructured’s process for encrypting secrets locally, as follows:
  1. Call Unstructured to get the RSA public key associated with your Unstructured user account.
  2. Verify the public key’s authenticity.
  3. Use this key to encrypt your plaintext secret locally.
  4. Register the encrypted version of the secret with your Unstructured account. Unstructured returns a unique ID for the registered secret, along with the type of encryption that was used.
  5. Specify the registered secret’s ID and encryption type in the call to the Unstructured Workflow Endpoint as needed.
The source and destination connectors that require you to follow this process currently include the following: Unstructured plans to support this workflow with other source and destination connectors in the future. The following sections describe how to complete the preceding process.

Requirements

You can use Python, or a REST API client such as curl or Postman, to complete the following steps. You must have the following:
  • For Python, Python installed on your local development machine and the unstructured-client package installed into your local Python virtual environment.
  • For REST, a REST API client such as curl or Postman installed on your local development machine.
  • An Unstructured account, including a valid Unstructured API key for that account. To get your API key, do the following:
    1. Sign in to your Unstructured account:
      • If you do not already have an Unstructured account, go to https://unstructured.io/contact and fill out the online form to indicate your interest.
      • If you already have an Unstructured account, sign in by using the URL of the sign in page that Unstructured provided to you when your Unstructured account was created. After you sign in, the Unstructured user interface (UI) then appears, and you can start using it right away. If you do not have this URL, contact Unstructured Sales at sales@unstructured.io.
    2. Get your Unstructured API key:
      a. In the Unstructured UI, click API Keys on the sidebar.
      b. Click Generate API Key.
      c. Follow the on-screen instructions to finish generating the key.
      d. Click the Copy icon next to your new key to add the key to your system’s clipboard. If you lose this key, simply return and click the Copy icon again.
  • Some of the following steps also require you to specify the Unstructured Workflow Endpoint API URL for your Unstructured user account. This URL was provided to you when your Unstructured account was created. If you do not have this URL, contact Unstructured Sales at sales@unstructured.io.
    The default URL for the Unstructured Workflow Endpoint is https://platform.unstructuredapp.io/api/v1. However, you should always use the URL that was provided to you when your Unstructured account was created.
  • The following steps assume that you have the following two environment variables set locally:
    • UNSTRUCTURED_API_URL, set to the Workflow Endpoint API URL for your Unstructured user account.
    • UNSTRUCTURED_API_KEY, set to the API key for your Unstructured user account.
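The following Python sketch shows one way to confirm that both environment variables are set before you run the later steps. The load_unstructured_settings function is a hypothetical helper written for this example; it is not part of the unstructured-client package.

```python
import os


def load_unstructured_settings(environ=os.environ) -> dict:
    """Read the two environment variables that the following steps rely on.

    Hypothetical helper for illustration; not part of unstructured-client.
    """
    settings = {}
    missing = []
    for name in ("UNSTRUCTURED_API_URL", "UNSTRUCTURED_API_KEY"):
        value = environ.get(name, "").strip()
        if value:
            settings[name] = value
        else:
            missing.append(name)

    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    # Drop any trailing slash so that paths such as "/users/retrieve"
    # can be appended to the URL directly.
    settings["UNSTRUCTURED_API_URL"] = settings["UNSTRUCTURED_API_URL"].rstrip("/")
    return settings
```

Calling load_unstructured_settings() with no arguments reads from the current process environment and raises a RuntimeError that names any variable that is not set.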

Step 1: Get the RSA public key

In this step, you call the Unstructured Workflow Endpoint to get the public key for your Unstructured user account. This public key is contained within a certificate. The certificate’s chain is also provided so that you can verify the public key’s authenticity in the next step.
import os

from unstructured_client import UnstructuredClient
from unstructured_client.models.operations import RetrieveRequest

# This code assumes you want to use the default API URL for the 
# Unstructured Workflow Endpoint: https://platform.unstructuredapp.io/api/v1
# To use a different URL, set the UnstructuredClient constructor's 
# server_url parameter to the target URL.
with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
    response = client.users.retrieve(
        request=RetrieveRequest()
    )

    print(response.pem_auth_response.pem_key)
The output looks similar to the following:
-----BEGIN PUBLIC KEY-----
MII...YTv/
5VI...wrX
2Yy...YPG
TTt...Vwj
EU0...SXI
jAV...3Wu
ytz...kvi
yL+...ZDf
r+t...AE=
-----END PUBLIC KEY-----
curl --request 'POST' --location \
"$UNSTRUCTURED_API_URL/users/retrieve" \
--header 'accept: application/json' \
--header "unstructured-api-key: $UNSTRUCTURED_API_KEY"
The output looks similar to the following. Line breaks and whitespace have been added to the output for readability:
{
    "pem_key": "-----BEGIN PUBLIC KEY-----\nMII...AE=\n-----END PUBLIC KEY-----\n",
    "tenant_id": "324...183",
    "user_id": "eef...9d0"
}
Copy only the contents of the pem_key field from the output. Ignore the tenant_id and user_id fields.
  1. In the method drop-down list, select POST.
  2. In the address box, enter the following URL:
    {{UNSTRUCTURED_API_URL}}/users/retrieve
    
  3. On the Headers tab, enter the following headers:
    • Key: unstructured-api-key, Value: {{UNSTRUCTURED_API_KEY}}
    • Key: accept, Value: application/json
  4. Click Send. The response body looks similar to the following:
    {
        "pem_key": "-----BEGIN PUBLIC KEY-----\nMII...AE=\n-----END PUBLIC KEY-----\n",
        "tenant_id": "324...183",
        "user_id": "eef...9d0"
    }
    
  5. Copy only the contents of the pem_key field from the response body. Ignore the tenant_id and user_id fields.
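If you save the JSON response from curl or Postman instead of copying the pem_key field by hand, the following Python sketch shows one way to extract the key. The extract_pem_key function is a hypothetical helper written for this example. It assumes only the response shape shown in this step, and it checks the PEM header and footer, not the key's authenticity.

```python
import json


def extract_pem_key(response_body: str) -> str:
    """Return the pem_key field from a /users/retrieve JSON response body.

    Hypothetical helper for illustration; not part of unstructured-client.
    """
    data = json.loads(response_body)
    pem_key = data["pem_key"]

    # json.loads has already converted the "\n" escape sequences in the
    # JSON string into real line breaks, which PEM parsers expect.
    if not pem_key.startswith("-----BEGIN PUBLIC KEY-----"):
        raise ValueError("pem_key does not start with a PEM public key header")
    if "-----END PUBLIC KEY-----" not in pem_key:
        raise ValueError("pem_key is missing the PEM public key footer")
    return pem_key
```

Parsing the response as JSON avoids a common copy-and-paste problem: in the raw response text, the line breaks inside pem_key appear as the two characters \n, not as actual line breaks.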

Step 2: Verify the public key’s authenticity

Step 3: Encrypt the secret

In this step, you use the PEM version of the public key for your Unstructured user account that you got from the previous step to encrypt the target plain-text secret. The result is a JSON-formatted object that contains keys named encrypted_aes_key, aes_iv, encrypted_value, and type. All of the keys’ values except the one for type are Base64-encoded. This step can be completed only by using Python on your local development machine.
The following code requires you to install the cryptography package into your Python virtual environment. The following envelope_encrypt function encrypts the target plain-text string by using envelope encryption. You must supply the function with the PEM version of the public key for your Unstructured user account that you got from the previous step, and the plain-text version of the secret that you want to encrypt.
from cryptography.hazmat.primitives import serialization, hashes
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend
import os
import base64

def envelope_encrypt(public_key_pem: str, plaintext: str) -> dict:
    """
    Encrypts a string by using envelope encryption.
    
    Args:
        public_key_pem (str): The public key in PEM format.
        plaintext (str): The string to encrypt.

    Returns:
        dict: A dictionary with the Base64-encoded encrypted AES key, IV, and ciphertext, plus the encryption type.
    """

    # Load the public RSA key.
    public_key = serialization.load_pem_public_key(
        public_key_pem.encode("utf-8"),
        backend=default_backend()
    )

    # Generate a random AES key.
    aes_key = os.urandom(32)  # 256-bit AES key.

    # Generate a random IV.
    iv = os.urandom(16)

    # Encrypt by using AES-CFB.
    cipher = Cipher(
        algorithms.AES(aes_key),
        modes.CFB(iv),
    )
    encryptor = cipher.encryptor()
    ciphertext = encryptor.update(plaintext.encode("utf-8")) + encryptor.finalize()
    
    # Encrypt the AES key by using the RSA public key.
    encrypted_key = public_key.encrypt(
        aes_key,
        padding.OAEP(
            mgf=padding.MGF1(algorithm=hashes.SHA256()),
            algorithm=hashes.SHA256(),
            label=None
        )
    )

    # Return all encrypted components, Base64-encoded.
    return {
        "encrypted_aes_key": base64.b64encode(encrypted_key).decode("utf-8"),
        "aes_iv": base64.b64encode(iv).decode("utf-8"),
        "encrypted_value": base64.b64encode(ciphertext).decode("utf-8"),
        "type": "rsa_aes",
    }
You could call the preceding envelope_encrypt function with code similar to the following. This code gets the plain-text contents of the specified service account key file for a Google Cloud service account. The code then encrypts the plain-text contents by using the PEM version of the public key file for the user in the Unstructured account.
import json 

# Get the plain-text contents of the specified service account key file for 
# a Google Cloud service account.
# Alternatively, you could get the plain-text contents of the service account key file 
# by some other means, and then pass those contents as a string 
# directly to the envelope_encrypt function. 
google_drive_creds_json_file = "/Users/<username>/Downloads/<file-name>.json"

with open(google_drive_creds_json_file, "r") as f:
    google_json = json.load(f)
    secret_account_key = json.dumps(google_json)

# Encrypt the plain text by using the PEM version of the public key file for 
# the user in the Unstructured account.
encrypted_secret = envelope_encrypt(
    public_key_pem="""-----BEGIN PUBLIC KEY-----
MII...YTv/
5VI...wrX
2Yy...YPG
TTt...Vwj
EU0...SXI
jAV...3Wu
ytz...kvi
yL+...ZDf
r+t...AE=
-----END PUBLIC KEY-----""",
    plaintext=secret_account_key
)

print(json.dumps(encrypted_secret, indent=4))
The output looks similar to the following:
{
    "encrypted_aes_key": "x3+...9zD",
    "aes_iv": "k2N...g==",
    "encrypted_value": "gM1...A2m",
    "type": "rsa_aes"
}
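Before you register the result, you might want to confirm that it has the shape described in this step. The following sketch uses only the Python standard library. The check_envelope function is a hypothetical helper written for this example; it checks the structure only and cannot confirm that the secret was encrypted with the correct public key.

```python
import base64

# The four keys that this step says the encrypted object contains.
EXPECTED_KEYS = {"encrypted_aes_key", "aes_iv", "encrypted_value", "type"}


def check_envelope(envelope: dict) -> None:
    """Raise ValueError if the envelope does not have the expected shape.

    Hypothetical helper for illustration; not part of unstructured-client.
    """
    if set(envelope) != EXPECTED_KEYS:
        raise ValueError(f"Unexpected keys: {sorted(envelope)}")
    if envelope["type"] != "rsa_aes":
        raise ValueError("type must be 'rsa_aes'")

    decoded = {}
    for name in ("encrypted_aes_key", "aes_iv", "encrypted_value"):
        # validate=True rejects characters outside the Base64 alphabet.
        # The error it raises (binascii.Error) is a subclass of ValueError.
        decoded[name] = base64.b64decode(envelope[name], validate=True)

    # envelope_encrypt generates the IV with os.urandom(16).
    if len(decoded["aes_iv"]) != 16:
        raise ValueError("aes_iv must decode to 16 bytes")
```

For example, calling check_envelope(encrypted_secret) on the dictionary returned by envelope_encrypt returns without raising an error.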

Step 4: Register the encrypted secret

In this step, you call the Unstructured Workflow Endpoint again, this time to register the encrypted secret that you got from the previous step. The result is a JSON-formatted object that contains keys named id and type.
import os

from unstructured_client import UnstructuredClient
from unstructured_client.models.operations import StoreSecretRequest

# This code assumes you want to use the default API URL for the 
# Unstructured Workflow Endpoint: https://platform.unstructuredapp.io/api/v1
# To use a different URL, set the UnstructuredClient constructor's 
# server_url parameter to the target URL.
with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
    response = client.users.store_secret(
        request=StoreSecretRequest(
            encrypted_secret={
                "encrypted_aes_key": "x3+...9zD",
                "aes_iv": "k2N...g==",
                "encrypted_value": "gM1...A2m",
                "type": "rsa_aes"
            }
        )
    )

    print(response.secret_reference.model_dump_json(indent=4))
The output looks similar to the following:
{
    "id": "09e...260",
    "type": "rsa_aes"
}
curl --request 'POST' --location \
"$UNSTRUCTURED_API_URL/users/secrets" \
--header 'accept: application/json' \
--header 'Content-Type: application/json' \
--header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
--data \
'{
    "encrypted_aes_key": "x3+...9zD",
    "aes_iv": "k2N...g==",
    "encrypted_value": "gM1...A2m",
    "type": "rsa_aes"
}'
The output looks similar to the following. Line breaks and whitespace have been added to the output for readability:
{
    "id": "09e...260",
    "type": "rsa_aes"
}
  1. In the method drop-down list, select POST.
  2. In the address box, enter the following URL:
    {{UNSTRUCTURED_API_URL}}/users/secrets
    
  3. On the Headers tab, enter the following headers:
    • Key: unstructured-api-key, Value: {{UNSTRUCTURED_API_KEY}}
    • Key: accept, Value: application/json
    • Key: Content-Type, Value: application/json
  4. On the Body tab, select raw and JSON, and specify the encrypted secret, for example:
    {
        "encrypted_aes_key": "x3+...9zD",
        "aes_iv": "k2N...g==",
        "encrypted_value": "gM1...A2m",
        "type": "rsa_aes"
    }
    
  5. Click Send. The response body looks similar to the following:
    {
        "id": "09e...260",
        "type": "rsa_aes"
    }
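If you prefer to make the REST call from Python without the unstructured-client package, the following sketch builds the same POST request as the curl example by using only the standard library. The build_store_secret_request and register_secret functions are hypothetical helpers written for this example. The URL path and headers are taken from the curl example in this step, and error handling is omitted.

```python
import json
import urllib.request


def build_store_secret_request(
    api_url: str, api_key: str, envelope: dict
) -> urllib.request.Request:
    """Build the POST request that registers an encrypted secret.

    Hypothetical helper for illustration; not part of unstructured-client.
    """
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/users/secrets",
        data=json.dumps(envelope).encode("utf-8"),
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
            "unstructured-api-key": api_key,
        },
        method="POST",
    )


def register_secret(api_url: str, api_key: str, envelope: dict) -> dict:
    """Send the request and return the parsed JSON response body."""
    request = build_store_secret_request(api_url, api_key, envelope)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Separating request construction from sending makes it possible to inspect the URL, headers, and body before any secret material leaves your machine.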
    

Step 5: Use the registered secret’s reference ID

In this step, you use the registered secret’s ID and encryption type to specify the secret when you call the Unstructured Workflow Endpoint. This step shows how to specify the registered secret’s ID and encryption type when you create a new Google Drive source connector.
import os

from unstructured_client import UnstructuredClient
from unstructured_client.models.operations import CreateSourceRequest
from unstructured_client.models.shared import (
    CreateSourceConnector,
    SourceConnectorType,
    GoogleDriveSourceConnectorConfigInput
)

# This code assumes you want to use the default API URL for the 
# Unstructured Workflow Endpoint: https://platform.unstructuredapp.io/api/v1
# To use a different URL, set the UnstructuredClient constructor's 
# server_url parameter to the target URL.
with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
    response = client.sources.create_source(
        request=CreateSourceRequest(
            create_source_connector=CreateSourceConnector(
                name="<name>",
                type=SourceConnectorType.GOOGLE_DRIVE,
                config=GoogleDriveSourceConnectorConfigInput(
                    drive_id="1oK...bmf",
                    service_account_key={
                        "id": "09e...260",
                        "type": "rsa_aes"
                    }
                )
            )
        )
    )

    print(response.source_connector_information.model_dump_json(indent=4))
The output looks similar to the following:
{
    "config": {
        "drive_id": "1oK...bmf",
        "recursive": true,
        "service_account_key": "**********"
    },
    "created_at": "<date-time>",
    "id": "3c2...17e",
    "name": "<name>",
    "type": "google_drive",
    "updated_at": "<date-time>"
}
curl --request 'POST' --location \
"$UNSTRUCTURED_API_URL/sources" \
--header 'accept: application/json' \
--header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
--header 'content-type: application/json' \
--data \
'{
    "name": "<name>",
    "type": "google_drive",
    "config": {
        "drive_id": "1oK...bmf",
        "service_account_key": {
            "id": "09e...260",
            "type": "rsa_aes"
        }
    }
}'
The output looks similar to the following:
{
    "config": {
        "drive_id": "1oK...bmf",
        "recursive": true,
        "service_account_key": "**********"
    },
    "created_at": "<date-time>",
    "id": "3c2...17e",
    "name": "<name>",
    "type": "google_drive",
    "updated_at": "<date-time>"
}
  1. In the method drop-down list, select POST.
  2. In the address box, enter the following URL:
    {{UNSTRUCTURED_API_URL}}/sources
    
  3. On the Headers tab, enter the following headers:
    • Key: unstructured-api-key, Value: {{UNSTRUCTURED_API_KEY}}
    • Key: accept, Value: application/json
    • Key: Content-Type, Value: application/json
  4. On the Body tab, select raw and JSON, and specify the connector settings, for example:
    {
        "name": "<name>",
        "type": "google_drive",
        "config": {
            "drive_id": "1oK...bmf",
            "service_account_key": {
                "id": "09e...260",
                "type": "rsa_aes"
            }
        }
    }
    
  5. Click Send. The response body looks similar to the following:
    {
        "config": {
            "drive_id": "1oK...bmf",
            "recursive": true,
            "service_account_key": "**********"
        },
        "created_at": "<date-time>",
        "id": "3c2...17e",
        "name": "<name>",
        "type": "google_drive",
        "updated_at": "<date-time>"
    }
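For the REST examples, you can also generate the request body in Python rather than typing the JSON by hand, which helps avoid quoting mistakes. The following sketch produces a body with the same structure as the examples in this step. The google_drive_source_payload function is a hypothetical helper written for this example, and secret_ref is assumed to be the object with the id and type keys that was returned when you registered the secret.

```python
import json


def google_drive_source_payload(name: str, drive_id: str, secret_ref: dict) -> str:
    """Return the JSON request body for creating a Google Drive source connector.

    Hypothetical helper for illustration; not part of unstructured-client.
    """
    payload = {
        "name": name,
        "type": "google_drive",
        "config": {
            "drive_id": drive_id,
            # Pass the registered secret's reference, not the secret itself.
            "service_account_key": {
                "id": secret_ref["id"],
                "type": secret_ref["type"],
            },
        },
    }
    return json.dumps(payload, indent=4)
```

You could then pass the returned string as the value of curl's --data option, or paste it into the Body tab in Postman.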
    