Azure AI Search Python SDK — Installation, Authentication, and Index Management

10 min read
In this first part of our Azure AI Search Python SDK series, we install the azure-search-documents library, configure authentication, create and manage indexes programmatically, upload documents, and run your first full-text search queries — all from Python.

Introduction

Azure AI Search (formerly Azure Cognitive Search) exposes a rich REST API for index management, document ingestion, and querying. The azure-search-documents Python SDK wraps that API with a clean, type-annotated interface so you can build search-powered applications without writing raw HTTP requests.

In this first part, we cover the foundational workflow: installing the SDK, authenticating against your Azure Search resource, defining an index schema, uploading documents, and running full-text search queries. By the end, you will have a working Python script that creates an index, loads data, and returns search results.


Prerequisites

Requirement Detail
Python 3.8 or higher
Azure subscription portal.azure.com
Azure AI Search resource Free (F0) or Basic tier works for this guide
Search service endpoint https://<service-name>.search.windows.net
Admin API key or Entra ID For index and document management

Retrieve your endpoint and admin key from Azure Portal → Azure AI Search resource → Keys.


Installation

Install the azure-search-documents package from PyPI:

1
pip install azure-search-documents

For Microsoft Entra ID (Azure AD) authentication, also install the identity library:

1
pip install azure-identity

Verify the install:

1
2
import azure.search.documents
print(azure.search.documents.__version__)

The Three Core Clients

The SDK ships three clients, each scoped to a specific concern:

Client Import Purpose
SearchIndexClient azure.search.documents.indexes Create, update, delete, and list indexes
SearchIndexerClient azure.search.documents.indexes Manage indexers, data sources, and skillsets
SearchClient azure.search.documents Upload documents and run search queries

You will use all three in a real application. Index management uses SearchIndexClient; day-to-day querying uses SearchClient.

We can authenticate with Azure AI Search in different ways. Let us first look at the API key method.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
import os
from dotenv import load_dotenv
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient

load_dotenv()

ENDPOINT = os.environ["AZURE_SEARCH_ENDPOINT"]   # https://<name>.search.windows.net
ADMIN_KEY = os.environ["AZURE_SEARCH_ADMIN_KEY"]
INDEX_NAME = "hotels-sample"

credential = AzureKeyCredential(ADMIN_KEY)

# Index management client
index_client = SearchIndexClient(endpoint=ENDPOINT, credential=credential)

# Document search client for a specific index
search_client = SearchClient(
    endpoint=ENDPOINT,
    index_name=INDEX_NAME,
    credential=credential,
)

This is good for quick demo apps and experimentation. For production scenarios, using Microsoft Entra ID is the recommended method.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient

credential = DefaultAzureCredential()

index_client = SearchIndexClient(endpoint=ENDPOINT, credential=credential)
search_client = SearchClient(
    endpoint=ENDPOINT,
    index_name=INDEX_NAME,
    credential=credential,
)

Role assignment: The identity needs Search Index Data Contributor (documents) and Search Service Contributor (index management) roles on the Search resource.

.env setup

1
2
3
AZURE_SEARCH_ENDPOINT=https://<service-name>.search.windows.net
AZURE_SEARCH_ADMIN_KEY=your-admin-key-here
AZURE_SEARCH_QUERY_KEY=your-query-key-here

Use the admin key for index management and document upload. Use the query key (read-only) in front-end applications and APIs that only need to run searches.

Creating an index

An index schema specifies the fields that every document in the index must include. Each field has a name, a type, and a set of attributes that control its behaviour. The following tables show the field names and attributes.

Python class JSON type Use for
SimpleField Various IDs, flags, numeric data — not searchable by default
SearchableField Edm.String Full-text searchable text fields
ComplexField Object / Collection Nested objects
SearchField Any Full control — specify all attributes manually
Attribute Meaning
key=True Document unique identifier (exactly one required)
filterable=True Can be used in $filter expressions
sortable=True Can be used in $orderby
facetable=True Can be used in faceted navigation
searchable=True Full-text indexed
retrievable=True Returned in search results (default True)

To create an index, we define a list of fields and use the SearchIndex method.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
import os
from dotenv import load_dotenv
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    SimpleField,
    SearchableField,
    SearchFieldDataType,
)

load_dotenv()

ENDPOINT = os.environ["AZURE_SEARCH_ENDPOINT"]
ADMIN_KEY = os.environ["AZURE_SEARCH_ADMIN_KEY"]
INDEX_NAME = "hotels-sample"

credential = AzureKeyCredential(ADMIN_KEY)
index_client = SearchIndexClient(endpoint=ENDPOINT, credential=credential)

fields = [
    SimpleField(
        name="hotelId",
        type=SearchFieldDataType.String,
        key=True,
        filterable=True,
    ),
    SearchableField(name="hotelName", type=SearchFieldDataType.String, sortable=True),
    SearchableField(name="description", type=SearchFieldDataType.String),
    SearchableField(name="category", type=SearchFieldDataType.String, filterable=True, facetable=True),
    SimpleField(name="rating", type=SearchFieldDataType.Double, filterable=True, sortable=True, facetable=True),
    SimpleField(name="lastRenovationDate", type=SearchFieldDataType.DateTimeOffset, filterable=True, sortable=True),
    SimpleField(name="parkingIncluded", type=SearchFieldDataType.Boolean, filterable=True, facetable=True),
]

index = SearchIndex(name=INDEX_NAME, fields=fields)

result = index_client.create_or_update_index(index)
print(f"Index '{result.name}' created/updated successfully.")

create_or_update_index is idempotent. It creates the index on first run and updates the schema on subsequent runs (subject to field update restrictions). The list_indexes() and delete_index() methods of the index client are used to list and delete indices.

1
2
3
4
5
6
7
# List all indexes in the service
for idx in index_client.list_indexes():
    print(idx.name)

# Delete an index (irreversible — also deletes all documents)
index_client.delete_index(INDEX_NAME)
print(f"Index '{INDEX_NAME}' deleted.")

Uploading Documents

SearchClient handles document operations: upload, merge, merge-or-upload, and delete.

Upload (insert or replace)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
import os
from dotenv import load_dotenv
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

load_dotenv()

ENDPOINT  = os.environ["AZURE_SEARCH_ENDPOINT"]
ADMIN_KEY = os.environ["AZURE_SEARCH_ADMIN_KEY"]
INDEX_NAME = "hotels-sample"

search_client = SearchClient(
    endpoint=ENDPOINT,
    index_name=INDEX_NAME,
    credential=AzureKeyCredential(ADMIN_KEY),
)

documents = [
    {
        "hotelId": "1",
        "hotelName": "Secret Point Motel",
        "description": "The hotel is ideally located on the main commercial artery of the city.",
        "category": "Boutique",
        "rating": 4.0,
        "lastRenovationDate": "1970-01-18T00:00:00Z",
        "parkingIncluded": False,
    },
    {
        "hotelId": "2",
        "hotelName": "Twin Dome Motel",
        "description": "The hotel is situated in a 19th century plaza with rooms that are spacious.",
        "category": "Budget",
        "rating": 3.6,
        "lastRenovationDate": "1979-02-18T00:00:00Z",
        "parkingIncluded": False,
    },
    {
        "hotelId": "3",
        "hotelName": "Triple Landscape Hotel",
        "description": "The Hotel stands out for its excellent location and the care given to its clientele.",
        "category": "Resort and Spa",
        "rating": 4.8,
        "lastRenovationDate": "2015-09-20T00:00:00Z",
        "parkingIncluded": True,
    },
]

result = search_client.upload_documents(documents=documents)

for r in result:
    print(f"Document {r.key}: {'succeeded' if r.succeeded else 'FAILED — ' + r.error_message}")

merge_documents updates only the fields you provide. Fields not included are left unchanged:

1
2
3
4
5
6
7
updates = [
    {"hotelId": "1", "rating": 4.5},  # only rating changes
]

result = search_client.merge_documents(documents=updates)
for r in result:
    print(f"Merge {r.key}: {'ok' if r.succeeded else r.error_message}")

The most common pattern for ETL pipelines is insert if the document does not exist, update if it does:

1
result = search_client.merge_or_upload_documents(documents=documents)

And, to delete documents,

1
2
to_delete = [{"hotelId": "2"}]
result = search_client.delete_documents(documents=to_delete)

Batching: Each upload_documents call can include up to 1,000 documents in one batch. For large datasets, split your data into chunks and loop.


Running Search Queries

SearchClient.search() executes a full-text search query. The method returns an iterator of result documents.

1
2
3
4
results = search_client.search(search_text="motel")

for result in results:
    print(f"{result['hotelId']}{result['hotelName']} (rating: {result['rating']})")

By default all retrievable fields are returned. The @search.score metadata field is included automatically.

Selecting fields

Limit which fields are returned to reduce payload size:

1
2
3
4
5
6
7
results = search_client.search(
    search_text="motel",
    select=["hotelId", "hotelName", "rating"],
)

for result in results:
    print(result)

Accessing the search score

1
2
3
4
5
6
7
results = search_client.search(
    search_text="spacious rooms",
    select=["hotelId", "hotelName"],
)

for result in results:
    print(f"Score: {result['@search.score']:.4f}{result['hotelName']}")

Filtering results

filter accepts OData expressions:

1
2
3
4
5
6
7
8
9
results = search_client.search(
    search_text="*",           # wildcard — return everything
    filter="rating ge 4.0",
    select=["hotelId", "hotelName", "rating"],
    order_by=["rating desc"],
)

for result in results:
    print(f"{result['hotelName']}{result['rating']}")

Common OData operators:

Operator Meaning Example
eq equals category eq 'Boutique'
ne not equals parkingIncluded ne true
gt / ge greater than / or equal rating ge 4.0
lt / le less than / or equal rating lt 3.0
and / or / not logical rating ge 4.0 and parkingIncluded eq true

Sorting

1
2
3
4
5
results = search_client.search(
    search_text="hotel",
    order_by=["rating desc", "hotelName asc"],
    select=["hotelId", "hotelName", "rating"],
)

Pagination

top limits results per page; skip offsets the start:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
# Page 1 — first 5 results
results = search_client.search(
    search_text="hotel",
    top=5,
    skip=0,
    include_total_count=True,
)

print(f"Total matching documents: {results.get_count()}")
for result in results:
    print(result["hotelName"])

# Page 2 — next 5 results
results = search_client.search(search_text="hotel", top=5, skip=5)

Highlighting matched terms

Return snippets with matched terms wrapped in <em> tags:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
results = search_client.search(
    search_text="excellent location",
    highlight_fields="description",
    select=["hotelId", "hotelName"],
)

for result in results:
    highlights = result.get("@search.highlights", {}).get("description", [])
    print(f"{result['hotelName']}")
    for snippet in highlights:
        print(f"  → {snippet}")

Getting a single document by key

1
2
doc = search_client.get_document(key="1")
print(doc["hotelName"])

Document count

1
2
count = search_client.get_document_count()
print(f"Documents in index: {count}")

Autocomplete and Suggestions

Autocomplete and suggestions require a suggester defined on the index. Add one when creating your index:

1
2
3
4
5
6
7
8
9
from azure.search.documents.indexes.models import SearchSuggester

index = SearchIndex(
    name=INDEX_NAME,
    fields=fields,
    suggesters=[
        SearchSuggester(name="sg", source_fields=["hotelName", "description"])
    ],
)

Autocomplete

1
2
3
4
5
6
7
autocomplete_results = search_client.autocomplete(
    search_text="spa",
    suggester_name="sg",
)

for result in autocomplete_results:
    print(result["text"])

Suggestions (full document context)

1
2
3
4
5
6
7
8
suggestion_results = search_client.suggest(
    search_text="mot",
    suggester_name="sg",
    select=["hotelId", "hotelName"],
)

for result in suggestion_results:
    print(f"{result['@search.text']}{result['hotelName']}")

Let us now put all the knowledge gained in this article together to create a complete example.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
# azure_search_basics.py
import os
from dotenv import load_dotenv
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    SimpleField,
    SearchableField,
    SearchFieldDataType,
    SearchSuggester,
)

load_dotenv()

ENDPOINT   = os.environ["AZURE_SEARCH_ENDPOINT"]
ADMIN_KEY  = os.environ["AZURE_SEARCH_ADMIN_KEY"]
INDEX_NAME = "hotels-quickstart"

credential   = AzureKeyCredential(ADMIN_KEY)
index_client = SearchIndexClient(endpoint=ENDPOINT, credential=credential)
search_client = SearchClient(endpoint=ENDPOINT, index_name=INDEX_NAME, credential=credential)


# 1. Create index
def create_index():
    fields = [
        SimpleField(name="hotelId", type=SearchFieldDataType.String, key=True, filterable=True),
        SearchableField(name="hotelName", type=SearchFieldDataType.String, sortable=True),
        SearchableField(name="description", type=SearchFieldDataType.String),
        SearchableField(name="category", type=SearchFieldDataType.String, filterable=True, facetable=True),
        SimpleField(name="rating", type=SearchFieldDataType.Double, filterable=True, sortable=True, facetable=True),
        SimpleField(name="parkingIncluded", type=SearchFieldDataType.Boolean, filterable=True),
    ]
    index = SearchIndex(
        name=INDEX_NAME,
        fields=fields,
        suggesters=[SearchSuggester(name="sg", source_fields=["hotelName"])],
    )
    result = index_client.create_or_update_index(index)
    print(f"Index '{result.name}' ready.")


# 2. Upload documents
def upload_documents():
    docs = [
        {"hotelId": "1", "hotelName": "Secret Point Motel", "description": "On the main commercial artery.", "category": "Boutique", "rating": 4.0, "parkingIncluded": False},
        {"hotelId": "2", "hotelName": "Twin Dome Motel", "description": "Spacious rooms in a 19th century plaza.", "category": "Budget", "rating": 3.6, "parkingIncluded": False},
        {"hotelId": "3", "hotelName": "Triple Landscape Hotel", "description": "Excellent location and care for clientele.", "category": "Resort and Spa", "rating": 4.8, "parkingIncluded": True},
        {"hotelId": "4", "hotelName": "Sublime Cliff Hotel", "description": "Sublime Cliff Hotel is located in the heart of the historic city centre.", "category": "Boutique", "rating": 4.6, "parkingIncluded": True},
    ]
    result = search_client.upload_documents(documents=docs)
    for r in result:
        status = "ok" if r.succeeded else r.error_message
        print(f"  Upload {r.key}: {status}")


# 3. Search
def run_searches():
    print("\n--- Full-text: 'motel' ---")
    for r in search_client.search("motel", select=["hotelId", "hotelName", "rating"]):
        print(f"  {r['hotelId']} {r['hotelName']} ({r['rating']})")

    print("\n--- Filter: rating >= 4.5 ---")
    for r in search_client.search("*", filter="rating ge 4.5", select=["hotelName", "rating"], order_by=["rating desc"]):
        print(f"  {r['hotelName']}{r['rating']}")

    print("\n--- Autocomplete: 'sub' ---")
    for r in search_client.autocomplete("sub", "sg"):
        print(f"  {r['text']}")


if __name__ == "__main__":
    create_index()
    upload_documents()

    import time
    time.sleep(2)   # brief pause for indexing to propagate

    run_searches()

Indexing lag: Documents uploaded via the push API are not instantly queryable. In practice, they are available within a second or two, but add a short time.sleep() in scripts that upload and immediately query.

In this part, we covered:

  • Installing azure-search-documents and understanding the three core clients.
  • Two authentication options: API key and Microsoft Entra ID with DefaultAzureCredential.
  • Defining an index schema with SimpleField, SearchableField, and field attributes.
  • Creating, updating, listing, and deleting indexes with SearchIndexClient.
  • Uploading documents with upload, merge, merge-or-upload, and delete operations.
  • Running full-text searches with SearchClient.search() — selecting fields, filtering, sorting, paginating, and highlighting.
  • Adding a suggester and using autocomplete and suggestions.

In the next part on using the Python SDK, we move into advanced query capabilities: faceted navigation, scoring profiles, semantic ranking, vector search, and hybrid search that combines keyword and vector retrieval.

Share this article

Comments