Datalumo

Getting Started

Datalumo lets you upload your own content, search it by meaning, and embed AI-powered answers on your site. No machine learning setup required.

Create an account

Sign up at Datalumo to create your account. After registering, you'll be taken to your dashboard, where you can manage collections, entries, and integrations.

Collections

A collection is a container for related content. For example, you might create a "Support Articles" collection, a "Product FAQs" collection, or a "Knowledge Base" collection.

You can create collections from the dashboard or via the API.

Adding entries

Entries are the individual pieces of content inside a collection. Each entry has a title, text content, optional metadata, and source information.

Entries support two metadata fields: meta for filterable-only data (e.g. prices, categories) and searchable_meta for data that should also be included in the search index (e.g. SKUs, brand names).
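To make the distinction concrete, here is an illustrative entry payload (as a plain Python dict; the field names follow the API example later in this guide, and the example values are invented):

```python
# Illustrative entry showing the two metadata fields.
entry = {
    "title": "Wireless Mouse X200",
    "raw_text": "Ergonomic wireless mouse with a 12-month battery life.",
    # meta: filterable only -- never matched by the search index
    "meta": {"price": 29.99, "category": "accessories"},
    # searchable_meta: filterable AND included in the search index,
    # so a query for the SKU or brand can find this entry
    "searchable_meta": {"sku": "X200-BLK", "brand": "Acme"},
}
```

A search for "X200-BLK" would match this entry via searchable_meta, while the price in meta is only usable as a filter.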

File upload

From the dashboard, you can upload files to populate a collection. Supported formats:

  • CSV — each row becomes an entry
  • Excel (.xlsx) — each row becomes an entry
  • PDF — text is extracted and chunked automatically
  • JSON — array of objects, each becomes an entry
  • XML — elements are parsed into entries

When importing tabular data (CSV, Excel), you'll be asked to map columns to entry fields (title, text, meta, etc.).
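The mapping step works roughly like this sketch, where each CSV column is routed to an entry field (the mapping keys, the "meta." prefix convention, and the sample data are all illustrative, not the importer's actual implementation):

```python
import csv
import io

# Hypothetical column mapping chosen during import: CSV column -> entry field.
mapping = {"Name": "title", "Description": "raw_text", "Category": "meta.category"}

def rows_to_entries(csv_text, mapping):
    """Turn each CSV row into an entry dict using the column mapping."""
    entries = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = {"meta": {}}
        for column, field in mapping.items():
            if field.startswith("meta."):
                # Route "meta.x" mappings into the entry's meta object
                entry["meta"][field.split(".", 1)[1]] = row[column]
            else:
                entry[field] = row[column]
        entries.append(entry)
    return entries

csv_text = "Name,Description,Category\nRefund policy,Refunds within 30 days.,policy\n"
entries = rows_to_entries(csv_text, mapping)
```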

Sitemap crawler

The sitemap crawler automatically imports pages from your website and keeps them in sync. You can point it at a sitemap URL or a homepage.

Sitemap URL - provide a sitemap.xml URL and the crawler discovers all pages listed in it, including nested sitemap indexes.

Homepage URL - provide your homepage and the crawler follows internal links up to 2 levels deep to discover pages.

Once pages are discovered, you can review them, select which ones to include, and start crawling. The crawler extracts the main content from each page (stripping navigation, footers, scripts, and other non-content elements), converts it to clean text, and creates an entry for each page.
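The extraction idea can be sketched with a minimal HTML parser that drops text inside non-content elements (this is a rough illustration of the approach, not the crawler's actual extractor; the skip list and sample HTML are assumptions):

```python
from html.parser import HTMLParser

# Elements whose text is treated as non-content in this sketch
SKIP = {"nav", "header", "footer", "aside", "script", "style"}

class MainTextExtractor(HTMLParser):
    """Collects page text while skipping non-content elements."""
    def __init__(self):
        super().__init__()
        self.depth = 0   # nesting depth inside skipped elements
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.parts.append(data.strip())

html = ("<body><nav>Home | About</nav>"
        "<main><h1>Refunds</h1><p>Refunds within 30 days.</p></main>"
        "<footer>Copyright</footer></body>")
parser = MainTextExtractor()
parser.feed(html)
text = " ".join(parser.parts)
```

Only the text under main survives; the navigation and footer strings never reach the entry.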

URL filters

Use include and exclude patterns to control which pages are crawled. Patterns are matched as substrings against the full URL. For example, an exclude pattern of /blog/tag/ skips all tag archive pages.
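Substring matching can be sketched as follows (whether exclude patterns take precedence over include patterns is an assumption here; check the dashboard's behaviour if both are set):

```python
def url_allowed(url, include=None, exclude=None):
    """Patterns are plain substrings matched against the full URL.

    Assumption in this sketch: an exclude match wins over an include match.
    """
    if exclude and any(pattern in url for pattern in exclude):
        return False
    if include and not any(pattern in url for pattern in include):
        return False
    return True
```

For example, with exclude=["/blog/tag/"], a tag archive URL is skipped while regular blog posts are still crawled.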

Re-crawling

Crawlers can be set to re-crawl on a schedule:

  • Manual - re-crawl only when you trigger it
  • Daily - automatically re-crawl once per day
  • Weekly - automatically re-crawl once per week

When re-crawling, only pages with changed content are updated. Pages that return a 404 are automatically marked as removed and hidden from search.
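Change detection of this kind is typically done by fingerprinting the extracted text; here is a minimal sketch (the use of SHA-256 and this exact comparison scheme are assumptions, not Datalumo's documented internals):

```python
import hashlib

def content_hash(text):
    """Stable fingerprint of a page's extracted text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def changed_pages(previous, current):
    """previous: {url: hash from the last crawl}; current: {url: fresh text}.

    Returns the URLs whose extracted text differs from last time
    (including pages seen for the first time).
    """
    return [url for url, text in current.items()
            if content_hash(text) != previous.get(url)]
```

Unchanged pages produce the same hash and are left alone, so a re-crawl only rewrites entries that actually changed.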

Limits

  • The number of pages you can crawl depends on your plan (Free: 100, Pro: 1,000, Business: 5,000, Scale: unlimited)
  • The crawler respects robots.txt rules and skips disallowed pages
  • Pages are crawled at a rate of 2 requests per second per domain
  • Pages that fail to load (timeouts, server errors) are marked as failed and can be retried

Via the API

Push entries programmatically using the API. This is useful for syncing content from a CMS, database, or other system.

curl -X POST https://datalumo.app/api/v1/collections/{collection}/entries \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Refund policy",
    "raw_text": "Refunds are available for 30 days after purchase...",
    "meta": {"category": "policy"},
    "searchable_meta": {"tags": ["billing", "refunds"]},
    "source_type": "web",
    "source_id": "refund-policy-v2"
  }'

For bulk syncing, use the upsert or batch upsert endpoints to create or update entries by their source_type and source_id.
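A batch upsert request body might be assembled like this sketch (the "entries" wrapper key and the sample records are assumptions; see the API Reference for the endpoint's actual schema):

```python
import json

# Hypothetical batch upsert body. Entries are matched on source_type +
# source_id, so re-running the sync updates existing entries instead of
# creating duplicates.
entries = [
    {"source_type": "cms", "source_id": "post-1",
     "title": "Refund policy",
     "raw_text": "Refunds are available for 30 days after purchase."},
    {"source_type": "cms", "source_id": "post-2",
     "title": "Shipping",
     "raw_text": "Orders ship within 2 business days."},
]
body = json.dumps({"entries": entries})
```

Keeping source_id stable across runs (for example, the CMS post ID) is what makes the sync idempotent.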

See the API Reference for full details on all available endpoints.

Searching

Once your entries are in a collection, Datalumo automatically chunks and embeds the text for semantic search. You can search by meaning — not just keywords.

curl "https://datalumo.app/api/v1/collections/{collection}/search?query=how+do+refunds+work" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Accept: application/json"
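If you are building the search URL in code rather than by hand, remember to percent-encode the query string, as in this small sketch (the helper function is illustrative; only the endpoint path comes from the curl example above):

```python
from urllib.parse import quote, urlencode

BASE = "https://datalumo.app/api/v1/collections/{}/search"

def search_url(collection, query):
    """Build the search URL with a properly encoded query string."""
    return BASE.format(quote(collection)) + "?" + urlencode({"query": query})

url = search_url("support-articles", "how do refunds work")
```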

You can also generate AI summaries of matching results using the summarise endpoint, or have a full conversation using the chat endpoint.

Integrations

Integrations let you embed Datalumo functionality directly on your website. There are three types:

  • Chatbot — an AI chat widget that answers questions from your collection data
  • Search Box — a search interface with semantic results and optional AI summaries
  • Custom — a flexible integration for custom search implementations

Integrations can be created from the dashboard or via the API.

Embedding on your site

Once you've created an integration, you can embed it on your website using a script tag and the fluent builder API:

<script src="https://cdn.datalumo.app/embed.v1.js"></script>

<script>
Datalumo.chat('your-integration-id')
  .popup()
  .mount()
</script>

The embed script supports chatbots (popup, sidebar, inline) and search boxes (modal, inline) with customizable theming, colors, localization, and more.

See the Embedding documentation for full details on all available options.

For WordPress sites, see the WordPress Plugin documentation.