Quickstart

Important

The API and its documentation are still under development and not yet stable.

Get your API token

Please follow the steps on the Authentication page to make authenticated requests.

Create your first collection

A collection is a group that will hold different resources, like a website or a single page. You can create a collection using the web UI or using the API.

Here's a quick cURL request to create a test collection:

curl --request POST \
    --url https://api.embedding.io/v0/collections \
    --header 'Authorization: Bearer YOUR_API_KEY' \
    --json '{
        "name": "Paul Graham"
    }'

Response

{
    "id": "col_lPMjKLBRLZ4qVe",
    "name": "Paul Graham"
}
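
If you'd rather make the same call from code, here's a rough Python equivalent using only the standard library. The helper names build_post and send are ours, not part of any official client, and the headers mirror what cURL's --json flag sets. Treat it as a sketch rather than a reference implementation.

```python
import json
import urllib.request

API_BASE = "https://api.embedding.io/v0"  # base URL taken from the cURL example


def build_post(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request.

    Hypothetical helper for illustration; not part of an official SDK.
    """
    return urllib.request.Request(
        url=f"{API_BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # curl's --json flag sets both of these headers
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> object:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_post("/collections", {"name": "Paul Graham"}, "YOUR_API_KEY")
print(request.get_method(), request.full_url)
# collection = send(request)  # uncomment once you have a real API key
```

Building the request separately from sending it makes it easy to inspect the URL, headers and body before anything goes over the wire.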

Ingest content

Now that we have a collection, we can ingest some content.

For our example, we'll add the website https://paulgraham.com/.

curl --request POST \
    --url https://api.embedding.io/v0/collections/col_lPMjKLBRLZ4qVe/websites \
    --header 'Authorization: Bearer YOUR_API_KEY' \
    --json '{
        "domains": [
            "https://paulgraham.com/"
        ]
    }'

Response

[
    {
        "id": "web_1234",
        "domain": "paulgraham.com"
    }
]

Websites need to be crawled before their content can be queried. Before crawling, it's best to add a few filters so that you don't embed every page.

Filters

Use the ID of the website in the response above. Here, we'll choose to ingest all the pages except this one:

/rss.html

curl --request POST \
    --url https://api.embedding.io/v0/websites/web_1234/filters \
    --header 'Authorization: Bearer YOUR_API_KEY' \
    --json '{
        "type": "path",
        "match": "not-equals",
        "value": "/rss.html"
    }'

Response

{
    "id": "fil_5678",
    "type": "path",
    "match": "not-equals",
    "value": "/rss.html"
}

You can find more information on the page about how to use filters.
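
To make the effect of this filter concrete, here's a small local illustration in Python. It rests on an assumption about how a path / not-equals filter behaves: a page is kept unless its URL path is exactly the filter value. The function passes_filter is hypothetical and not part of the API, and the crawler's actual matching rules may differ.

```python
from urllib.parse import urlparse


def passes_filter(url: str, filter_rule: dict) -> bool:
    """Return True if the page would be kept under the ASSUMED semantics.

    Assumption: "not-equals" on "path" drops only an exact path match.
    """
    if filter_rule["type"] != "path" or filter_rule["match"] != "not-equals":
        raise ValueError("this sketch only covers path / not-equals filters")
    return urlparse(url).path != filter_rule["value"]


rule = {"type": "path", "match": "not-equals", "value": "/rss.html"}
for url in ("https://paulgraham.com/rss.html", "https://paulgraham.com/useful.html"):
    print(url, "kept" if passes_filter(url, rule) else "skipped")
```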

Crawl

All you need to do now is to crawl your website. You can specify an optional webhook_url that will be called once the crawl is complete.

curl --request POST \
    --url https://api.embedding.io/v0/websites/web_1234/crawls \
    --header 'Authorization: Bearer YOUR_API_KEY' \
    --json '{
        "webhook_url": "https://www.example.com/webhook"
    }'

Response

{
    "id": "cra_1234",
    "status": "in_progress",
    "webhook_url": "https://www.example.com/webhook"
}

This will crawl your website and embed all the pages that pass your filters. When your crawl is done, usually after a few minutes, you'll be able to query your collection.
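
The format of the webhook call isn't described on this page, so the following is only a sketch of what a receiver could look like. It assumes the service sends an HTTP POST to your webhook_url, possibly with a JSON body. The handler name, the port, and the decode_body helper are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def decode_body(raw: bytes):
    """Decode a webhook body as JSON, falling back to plain text."""
    try:
        return json.loads(raw)
    except ValueError:
        return raw.decode("utf-8", errors="replace")


class CrawlWebhookHandler(BaseHTTPRequestHandler):
    """Assumes the crawl notification arrives as a POST request."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = decode_body(self.rfile.read(length))
        print("Crawl notification received:", payload)
        self.send_response(204)  # acknowledge with an empty response
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted (port chosen arbitrarily)."""
    HTTPServer(("0.0.0.0", port), CrawlWebhookHandler).serve_forever()


# serve()  # blocks; expose this port at the URL you passed as webhook_url
```

Once the notification arrives, the collection should be ready to query.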

Query your collection

curl --request POST \
    --url https://api.embedding.io/v0/query \
    --header 'Authorization: Bearer YOUR_API_KEY' \
    --json '{
        "collection": "col_lPMjKLBRLZ4qVe",
        "query": "What makes a good essay?"
    }'

Response

[
    {
        "page": {
            "id": "pag_8dVBOXzj5XJxG9",
            "url": "https://paulgraham.com/useful.html",
            "title": "How to Write Usefully",
            "description": null,
            "og_type": null,
            "og_image": null,
            "h1": null
        },
        "metadata": [],
        "content": "February 2020  \nWhat should an essay be? Many people would say persuasive...",
        "index": 1,
        "score": 0.872881591
    },
    {
        "page": {
            "id": "pag_qxQKYLED16l0Np",
            "url": "https://paulgraham.com/best.html",
            "title": "The Best Essay",
            "description": null,
            "og_type": null,
            "og_image": null,
            "h1": null
        },
        "metadata": [],
        "content": "I already aim for it. Breadth and novelty are the two things I'm always chasing...",
        "index": 10,
        "score": 0.868309259
    }
    // ...
]
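
Each result pairs a piece of content with its page and a similarity score. Assuming several pieces of the same page can show up in one response (the index field suggests this, but it isn't stated here), you may want to keep only the best match per page. Here's a sketch in Python that works on a shortened copy of the sample response; the function best_chunk_per_page is ours, not part of the API.

```python
import json

# Shortened copy of the sample response (content trimmed, unused fields dropped).
sample_response = """
[
    {"page": {"url": "https://paulgraham.com/useful.html", "title": "How to Write Usefully"},
     "content": "February 2020 ...", "index": 1, "score": 0.872881591},
    {"page": {"url": "https://paulgraham.com/best.html", "title": "The Best Essay"},
     "content": "I already aim for it. ...", "index": 10, "score": 0.868309259}
]
"""


def best_chunk_per_page(results: list) -> list:
    """Keep the highest-scoring result for each page URL, best first."""
    best = {}
    for result in results:
        url = result["page"]["url"]
        if url not in best or result["score"] > best[url]["score"]:
            best[url] = result
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)


for hit in best_chunk_per_page(json.loads(sample_response)):
    print(f'{hit["score"]:.3f}  {hit["page"]["title"]}  {hit["page"]["url"]}')
```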