Quickstart
Get your API token
Please follow the steps on the Authentication page to make authenticated requests.
Create your first collection
A collection is a group that will hold different resources, like a website or a single page. You can create a collection using the web UI or using the API.
Here's a quick cURL command to create a test collection:
curl --request POST \
--url https://api.embedding.io/v0/collections \
--header 'Authorization: Bearer YOUR_API_KEY' \
--json '{
"name": "Paul Graham"
}'
Response
{
"id": "col_lPMjKLBRLZ4qVe",
"name": "Paul Graham"
}
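If you would rather call the API from a script, the same request can be sketched in Python using only the standard library. The helper names `build_request` and `post_json` are hypothetical; the base URL, bearer header, and JSON body mirror the cURL call above, and the `Content-Type` and `Accept` headers are the ones cURL's `--json` flag sets.

```python
import json
import urllib.request

API_BASE = "https://api.embedding.io/v0"  # base URL taken from the cURL examples


def build_request(path, payload, api_key):
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=f"{API_BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def post_json(path, payload, api_key):
    """Send the request and decode the JSON response body."""
    request = build_request(path, payload, api_key)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `post_json("/collections", {"name": "Paul Graham"}, api_key)` with a valid key should then return the collection object shown in the response above, and its `id` field is what the following steps need.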
Ingest content
Now that we have a collection, we can ingest some content.
For our example, we'll add the website https://paulgraham.com/.
curl --request POST \
--url https://api.embedding.io/v0/collections/col_lPMjKLBRLZ4qVe/websites \
--header 'Authorization: Bearer YOUR_API_KEY' \
--json '{
"domains": [
"https://paulgraham.com/"
]
}'
Response
[
{
"id": "web_1234",
"domain": "paulgraham.com"
}
]
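The request takes full URLs, while the sample response returns a list of objects keyed by bare domain. A small lookup like the one below can recover the website ID for a URL you submitted. This is a sketch that assumes the `domain` field is always the host part of the submitted URL, as in the sample; `website_id_for` is a hypothetical helper, not part of the API.

```python
from urllib.parse import urlparse


def website_id_for(url, websites):
    """Return the ID of the website whose domain matches the URL's host, or None."""
    domain = urlparse(url).netloc  # "https://paulgraham.com/" -> "paulgraham.com"
    for site in websites:
        if site["domain"] == domain:
            return site["id"]
    return None


# Sample response copied from the example above.
websites = [{"id": "web_1234", "domain": "paulgraham.com"}]
website_id = website_id_for("https://paulgraham.com/", websites)
```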
A website must be crawled before it can be queried. Before crawling, it's best to add a few filters so that you don't embed every page.
Filters
Use the ID of the website in the response above. Here, we'll choose to ingest all the pages except this one:
- /rss.html
curl --request POST \
--url https://api.embedding.io/v0/websites/web_1234/filters \
--header 'Authorization: Bearer YOUR_API_KEY' \
--json '{
"type": "path",
"match": "not-equals",
"value": "/rss.html"
}'
Response
{
"id": "fil_5678",
"type": "path",
"match": "not-equals",
"value": "/rss.html"
}
For more details, see the page on how to use filters.
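To exclude more than one page, you could generate one filter payload per path and POST each to the filters endpoint in turn. The sketch below only builds the payloads; `exclusion_filters` is a hypothetical helper, it uses only the `path` and `not-equals` values shown above, and it assumes that several filters on one website are all applied together, which this page does not confirm.

```python
def exclusion_filters(paths):
    """Build one 'path not-equals' filter payload per path to exclude."""
    payloads = []
    for path in paths:
        if not path.startswith("/"):
            # The example filter uses a leading slash, so require one here too.
            raise ValueError(f"path must start with '/': {path!r}")
        payloads.append({"type": "path", "match": "not-equals", "value": path})
    return payloads


# "/rss.html" comes from the example; "/index.html" is purely illustrative.
payloads = exclusion_filters(["/rss.html", "/index.html"])
```

Each payload has the same shape as the JSON body in the cURL request above.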
Crawl
All you need to do now is to crawl your website. You can specify an optional webhook_url that will be called once the crawl is complete.
curl --request POST \
--url https://api.embedding.io/v0/websites/web_1234/crawls \
--header 'Authorization: Bearer YOUR_API_KEY' \
--json '{
"webhook_url": "https://www.example.com/webhook"
}'
Response
{
"id": "cra_1234",
"status": "in_progress",
"webhook_url": "https://www.example.com/webhook"
}
This will crawl your website and embed all the pages that pass your filters. When your crawl is done, usually after a few minutes, you'll be able to query your collection.
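If you pass a webhook_url, something has to be listening at that address. Here is a minimal receiver sketched with Python's standard library. The shape of the webhook request is not described on this page, so the code assumes an HTTP POST with an optional JSON body and simply records whatever arrives; `parse_webhook_body`, `CrawlWebhookHandler`, and `make_server` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received_events = []  # parsed webhook bodies, newest last


def parse_webhook_body(raw):
    """Decode a webhook body, falling back to the raw text if it is not JSON."""
    if not raw:
        return {}
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"raw": raw.decode("utf-8", errors="replace")}


class CrawlWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):  # assumption: the webhook is delivered as a POST
        length = int(self.headers.get("Content-Length", 0))
        received_events.append(parse_webhook_body(self.rfile.read(length)))
        self.send_response(204)  # acknowledge with an empty response
        self.end_headers()


def make_server(port=8000):
    """Create (but do not start) the webhook server on localhost."""
    return HTTPServer(("127.0.0.1", port), CrawlWebhookHandler)
```

Running `make_server().serve_forever()` behind a publicly reachable URL would let you log the notification and trigger your first query once the crawl finishes.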
Query your collection
curl --request POST \
--url https://api.embedding.io/v0/query \
--header 'Authorization: Bearer YOUR_API_KEY' \
--json '{
"collection": "col_lPMjKLBRLZ4qVe",
"query": "What makes a good essay?"
}'
Response
[
{
"page": {
"id": "pag_8dVBOXzj5XJxG9",
"url": "https://paulgraham.com/useful.html",
"title": "How to Write Usefully",
"description": null,
"og_type": null,
"og_image": null,
"h1": null
},
"metadata": [],
"content": "February 2020 \nWhat should an essay be? Many people would say persuasive...",
"index": 1,
"score": 0.872881591
},
{
"page": {
"id": "pag_qxQKYLED16l0Np",
"url": "https://paulgraham.com/best.html",
"title": "The Best Essay",
"description": null,
"og_type": null,
"og_image": null,
"h1": null
},
"metadata": [],
"content": "I already aim for it. Breadth and novelty are the two things I'm always chasing...",
"index": 10,
"score": 0.868309259
}
// ...
]
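Each result pairs a chunk of page content with its source page and a score. As a sketch of how you might consume the response, the hypothetical helper below keeps results above a threshold and formats one line per match. It assumes that a higher score means a closer match, which is consistent with the ordering of the sample above but is not stated on this page.

```python
def summarize_results(results, min_score=0.0):
    """Return 'score  title  url' lines for results at or above min_score."""
    kept = [r for r in results if r["score"] >= min_score]
    kept.sort(key=lambda r: r["score"], reverse=True)  # best match first
    return [
        f'{r["score"]:.3f}  {r["page"]["title"]}  {r["page"]["url"]}'
        for r in kept
    ]


# Trimmed copies of the two results from the sample response.
results = [
    {"page": {"url": "https://paulgraham.com/useful.html",
              "title": "How to Write Usefully"}, "score": 0.872881591},
    {"page": {"url": "https://paulgraham.com/best.html",
              "title": "The Best Essay"}, "score": 0.868309259},
]
lines = summarize_results(results, min_score=0.87)
```

With the 0.87 threshold used here, only the first sample result is kept.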