# AI Architecture Overview

Welcome to the architecture documentation for Smart Search and AI Explorer. This page gives a high-level overview of how the system is structured and which technologies power its search and retrieval capabilities.

***

### Key Components

<details>

<summary>🚀 Solution Database</summary>

Every solution in our database is stored as a vector embedding, so it can be retrieved by context, relevance, and similarity rather than by exact wording alone.

* **Vectorization Model:** [`text-embedding-3-large`](https://openai.com/index/new-embedding-models-and-api-updates/) from [OpenAI](https://openai.com/)
* **Vector Database:** Powered by [Typesense](https://typesense.org/), optimized for hybrid search capabilities.
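
To illustrate what retrieval over vectorized solutions means, here is a minimal sketch in plain Python. It does not call OpenAI or Typesense: the three-dimensional vectors, the solution ids, and the function names are all hypothetical stand-ins, and real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_solutions(query_vector, solution_vectors, k=2):
    """Return the ids of the k solutions most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vector, vector), solution_id)
        for solution_id, vector in solution_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [solution_id for _, solution_id in scored[:k]]


# Hypothetical toy vectors standing in for stored embeddings.
SOLUTION_VECTORS = {
    "solar-pump": [0.9, 0.1, 0.0],
    "wind-turbine": [0.7, 0.3, 0.1],
    "water-filter": [0.0, 0.2, 0.9],
}
```

In a deployment like the one described here, the embedding model would produce the vectors and the vector database would perform the nearest-neighbor lookup; the sketch only shows the underlying idea.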

</details>

<details>

<summary>🔍 Search Engine</summary>

Smart Search uses a hybrid approach, combining **keyword-based search** with **semantic search** to maximize retrieval effectiveness:

* **Keyword Search:** Traditional exact or partial matching for direct queries.
* **Semantic Search:** Contextual matching using vector embeddings.
* **Ranking Factors:**
  * Length of query
  * Weights assigned to keyword and semantic search results

This approach ensures accurate and relevant results for both simple and complex queries.
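
One way the ranking factors above could interact is sketched below: the query length selects a weight, and that weight blends the keyword and semantic scores. The threshold, the weight values, and the function names are assumptions made for illustration, not the system's actual configuration.

```python
def semantic_weight(query, short_query_words=3):
    """Hypothetical rule: short queries lean on keywords, long ones on semantics."""
    return 0.3 if len(query.split()) <= short_query_words else 0.7


def hybrid_rank(query, keyword_scores, semantic_scores):
    """Blend two {id: score} dictionaries (scores in [0, 1]) into one ranking."""
    alpha = semantic_weight(query)
    ids = set(keyword_scores) | set(semantic_scores)
    blended = {
        doc_id: (1 - alpha) * keyword_scores.get(doc_id, 0.0)
        + alpha * semantic_scores.get(doc_id, 0.0)
        for doc_id in ids
    }
    # Highest blended score first; ties are broken by id for a stable order.
    return sorted(blended, key=lambda doc_id: (-blended[doc_id], doc_id))


# Made-up scores for three documents.
keyword = {"a": 1.0, "b": 0.2}
semantic = {"a": 0.1, "b": 0.9, "c": 0.5}
short_ranking = hybrid_rank("solar pump", keyword, semantic)
long_ranking = hybrid_rank("how can I clean water cheaply", keyword, semantic)
```

With these made-up scores, the short query ranks the strong keyword match `a` first, while the longer query promotes the strong semantic match `b`.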

</details>

<details>

<summary>🤖 AI-Powered Agent</summary>

At the heart of the system is an adaptive **RAG (Retrieval-Augmented Generation)** agent that enhances interaction and search capabilities. The agent architecture includes:

**LLM Layer**

The agent leverages the [**`GPT-4o-mini`**](https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/) model for natural language understanding and response generation.

**Vector Store Integration**

The agent connects to the vector store to retrieve relevant context and solutions, ensuring precise and informative responses.

**Adaptive Toolset**

Unlike standard RAG implementations, the agent includes:

* A layer of intelligence to determine the best tools to use for each request.
* Integration with [**LangChain**](https://www.langchain.com/) / [**LangGraph**](https://www.langchain.com/langgraph) for dynamic workflow execution.
* Deployment on the [**LangSmith Platform Cloud**](https://www.langchain.com/langsmith) for scalability and reliability.

</details>

<details>

<summary>📝 Summarization and Token Control</summary>

To minimize costs and reduce environmental impact, the agent incorporates:

* **Summarization:** Condenses retrieved information to optimize token usage before processing by the LLM.
* **Token Control:** Keeps conversations within a token budget by summarizing earlier turns and deleting irrelevant messages.

This approach not only saves resources but also aligns with sustainability goals.
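
A minimal sketch of token control along these lines follows. It assumes a whitespace word count as a stand-in for a real tokenizer and a placeholder in place of an LLM-generated summary; `count_tokens`, `summarize`, and `trim_history` are hypothetical names.

```python
def count_tokens(text):
    """Rough stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def summarize(messages):
    """Placeholder summarizer; a real agent would call an LLM here."""
    return f"[summary of {len(messages)} earlier messages]"


def trim_history(messages, max_tokens):
    """Keep the newest messages that fit the budget and summarize the rest.

    The budget covers only the messages kept verbatim; the summary line
    is added on top of it.
    """
    kept, used = [], 0
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()
    dropped = messages[: len(messages) - len(kept)]
    return ([summarize(dropped)] if dropped else []) + kept


history = [
    "find solar solutions",
    "here are three solar solutions for farms",
    "which one is cheapest",
]
trimmed = trim_history(history, max_tokens=11)
```

Here the two most recent messages fit the 11-token budget and stay verbatim, while the oldest one is replaced by a single summary line.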

</details>

<details>

<summary>🔀 Agent Workflow</summary>

The agent's workflow follows these steps:

1. **Start:** The user input is analyzed.
2. **Agent Decision:** The agent chooses the next action: calling a tool or querying the vector store.
3. **Summarization:** If required, retrieved data is summarized to reduce token consumption.
4. **Token Filtering:** Irrelevant or redundant data is filtered and removed.
5. **Error Handling:** If issues arise (e.g., recursion limits), the agent invokes an error handler.
6. **Final Response:** A concise and informative response is generated.
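
The steps above can be sketched as a small loop in plain Python. This is an illustration under stated assumptions, not the deployed graph: it does not use LangGraph, the step limit stands in for a recursion limit, truncation stands in for summarization, and every name (`run_agent`, `demo_decide`, `DEMO_TOOLS`) is hypothetical.

```python
class RecursionLimitError(Exception):
    """Raised when the loop runs out of steps before producing an answer."""


def summarize(text, max_words=20):
    """Stand-in summarizer: truncates instead of calling an LLM."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " ..."


def run_agent(user_input, decide, tools, max_steps=5):
    """Run decide -> tool -> summarize -> filter until an answer is produced."""
    context = []
    try:
        for _ in range(max_steps):
            action, argument = decide(user_input, context)  # agent decision
            if action == "answer":
                return argument  # final response
            result = summarize(tools[action](argument))  # summarization
            if result not in context:  # token filtering: skip redundant data
                context.append(result)
        raise RecursionLimitError(f"no answer after {max_steps} steps")
    except RecursionLimitError as error:  # error handling
        return f"Sorry, I could not complete the request ({error})."


def demo_decide(user_input, context):
    """Hypothetical policy: search once, then answer from the context."""
    if not context:
        return "search", user_input
    return "answer", f"Found: {context[0]}"


DEMO_TOOLS = {"search": lambda query: f"solutions matching '{query}'"}
```

Calling `run_agent("solar", demo_decide, DEMO_TOOLS)` performs one search and then answers. A policy that never answers exhausts `max_steps` and is routed to the error handler instead of looping forever.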

</details>

***

### System Workflow

<figure><img src="/files/XewU4CYwQQdSKg4DQ1AR" alt=""><figcaption><p>LangGraph architecture as displayed in LangGraph Studio</p></figcaption></figure>

1. **Query Processing:**
   * User inputs are processed to identify query type, length, and context.
   * Query is sent to the hybrid search engine.
2. **Retrieval:**
   * The vector database provides semantic matches.
   * Keyword matches are ranked and combined with semantic results.
3. **Agent Interaction:**
   * The RAG agent processes retrieved data.
   * Summarization reduces token usage for efficient LLM processing.
   * Adaptive tools may be invoked as needed.
4. **Response Generation:**
   * `GPT-4o-mini` generates natural language responses enriched by retrieved solutions.

***

### Technical Stack

| Component           | Technology                      |
| ------------------- | ------------------------------- |
| Vectorization Model | OpenAI `text-embedding-3-large` |
| Vector Database     | Typesense                       |
| Search Engine       | Hybrid (Keyword + Semantic)     |
| AI Agent Framework  | LangChain / LangGraph           |
| Deployment Platform | LangSmith Platform Cloud        |
| LLM                 | `GPT-4o-mini`                   |
| Summarization       | LangGraph Summarization Tools   |
| Reranking           | Cohere Rerank-v3.5              |

For detailed implementation instructions, refer to the technical key feature pages:

{% content-ref url="/pages/HJHIRJpZ4u740WJe8dYn" %}
[Smart Search](/resources/key-features/smart-search.md)
{% endcontent-ref %}

{% content-ref url="/pages/25yFlYDATVIgJ3uKKKAZ" %}
[AI Explorer](/resources/key-features/ai-explorer.md)
{% endcontent-ref %}


