From Local to Global Sensemaking: First Impressions of Microsoft GraphRAG (MS GraphRAG)


TL;DR: GraphRAG replaces vector search with a lightweight knowledge-graph index and a map-reduce summarization step. The result: LLMs can tackle global questions such as “What themes span this entire corpus?” while remaining fast and token-efficient. In head-to-head tests against GPT-4-powered vector RAG, GraphRAG won 72-83% of comparisons on answer comprehensiveness and 62-82% on diversity, while using up to 97% fewer context tokens for some query modes.

Knowledge Graph
Figure 1: An LLM-generated knowledge graph built using GPT-4 Turbo and Microsoft GraphRAG

What is Naive (Vector) RAG?

Retrieval-Augmented Generation (RAG) refers to a system where a user query is used to retrieve the most relevant information from external data sources. This retrieved information is then provided to an LLM, which uses it to generate an answer to the user query. Essentially, Naive RAG follows a two-stage approach: Retrieval and Generation.

Naive RAG Workflow
Figure 2: Traditional RAG workflow showing retrieval and generation stages

Retrieval Stage consists of two phases:

  • Indexing Phase: Documents are split into chunks, converted into vector embeddings using an embedding model, and stored in a vector index. This creates a searchable knowledge base where semantically similar content clusters together in the vector space.
  • Search Phase: When a user asks a question, the query is embedded using the same model, and the system performs approximate nearest neighbor search to find the most semantically similar document chunks. These are then ranked by similarity score to produce the top-k results.

Generation Stage: The retrieved top-k results are combined with the user query in a prompt template and fed to the language model to generate the final answer.
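
To make the two stages concrete, here is a minimal sketch of naive RAG (a toy example, not production code). It assumes the sentence-transformers package for embeddings; `generate()` is a hypothetical stand-in for whatever LLM client you use.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Indexing phase: chunk documents and embed each chunk.
chunks = [
    "NeoChip was acquired by Quantum Systems in 2016.",
    "Quantum Systems is headquartered in Zurich.",
]
index = model.encode(chunks, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Search phase: embed the query and rank chunks by cosine similarity."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = index @ q  # cosine similarity (embeddings are normalized)
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

def answer(query: str) -> str:
    """Generation stage: combine the top-k chunks with the query in a prompt."""
    context = "\n".join(retrieve(query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)  # hypothetical LLM call
```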

This approach works well for local questions where the answer can be found in a few relevant chunks. However, it struggles with global reasoning tasks that require understanding connections and themes across the entire corpus.

Why Does Classic Naive RAG Break Down?

Naive RAG excels when answers exist in a handful of chunks that fit the model’s context window. Global “sensemaking” questions, however, require reasoning across the entire corpus rather than just the top-k nearest neighbors. Sensemaking tasks depend on connections among entities (people, places, events) and the relationships between them. A query like “What overarching themes emerge across the entire dataset?” makes this clear.

To address this limitation, Microsoft proposed and implemented GraphRAG, which has become one of the most popular approaches. In this article, we will dive into how GraphRAG works, explore its key innovations, and see why it is proving so effective for global reasoning tasks.

How Does MS GraphRAG Work?

GraphRAG is a graph-based extension of Naive RAG designed for global sensemaking over large text corpora. It uses an LLM to first build a knowledge graph of entities and their relationships, then clusters related entities into hierarchical communities. Each community is summarized by the LLM, creating a structured overview of the corpus. At query time, GraphRAG applies a map-reduce strategy: community summaries are used to generate partial answers in parallel, which are then merged into a final, coherent response.

The first part of this workflow focuses on building the knowledge graph, in three stages:

1. Chunking the Corpus

In this step, the raw documents are split into smaller text chunks. Chunk size is a key design choice. Larger chunks reduce the number of LLM calls but may miss fine-grained information, while smaller chunks improve recall but come at a higher processing cost.
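
A minimal character-based chunker illustrates the trade-off; real pipelines typically split on token counts (via a tokenizer) rather than characters, but the size/overlap knobs are the same:

```python
def chunk_text(text: str, size: int = 1200, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks with a small overlap.

    Larger `size` means fewer LLM calls; smaller `size` improves recall
    of fine-grained facts at a higher processing cost.
    """
    chunks = []
    step = size - overlap  # assumes size > overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
    return chunks
```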

2. Extracting Entities, Relationships, and Claims

The LLM processes each chunk to extract important entities, their relationships, and relevant claims. For instance, from a sentence about a tech acquisition, the model might extract entities like NeoChip and Quantum Systems, the relationship acquired, and the claim that the acquisition happened in 2016.
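
A simplified sketch of what such an extraction call might look like. The actual GraphRAG prompts are longer and few-shot; `llm()` here is a hypothetical stand-in for an LLM client:

```python
import json

EXTRACTION_PROMPT = """\
From the text below, extract:
1. entities: [{{"name": ..., "type": ..., "description": ...}}]
2. relationships: [{{"source": ..., "target": ..., "description": ...}}]
3. claims: [{{"subject": ..., "claim": ..., "date": ...}}]
Return valid JSON.

Text: {chunk}
"""

def extract(chunk: str) -> dict:
    response = llm(EXTRACTION_PROMPT.format(chunk=chunk))  # hypothetical call
    return json.loads(response)

# For "NeoChip was acquired by Quantum Systems in 2016." we might expect:
# {"entities": [{"name": "NeoChip", ...}, {"name": "Quantum Systems", ...}],
#  "relationships": [{"source": "Quantum Systems", "target": "NeoChip",
#                     "description": "acquired"}],
#  "claims": [{"subject": "NeoChip", "claim": "acquired in 2016", ...}]}
```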

3. Constructing the Knowledge Graph

The extracted entities and relationships are aggregated across the entire corpus to build a knowledge graph. Entities become nodes, relationships become edges, and repeated mentions strengthen edge weights. Descriptions are summarized and duplicates are reconciled, resulting in a structured and interconnected representation of the dataset (documents).
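
A sketch of this aggregation using networkx, assuming the per-chunk extraction format from the previous step; in the full pipeline, the accumulated descriptions are later condensed by the LLM:

```python
import networkx as nx

def build_graph(extractions: list[dict]) -> nx.Graph:
    """Merge per-chunk extractions into one graph.

    Entities become nodes, relationships become edges, and repeated
    mentions of the same relationship increase the edge weight.
    """
    g = nx.Graph()
    for ex in extractions:
        for e in ex["entities"]:
            if g.has_node(e["name"]):
                g.nodes[e["name"]]["descriptions"].append(e["description"])
            else:
                g.add_node(e["name"], descriptions=[e["description"]])
        for r in ex["relationships"]:
            u, v = r["source"], r["target"]
            if g.has_edge(u, v):
                g[u][v]["weight"] += 1
                g[u][v]["descriptions"].append(r["description"])
            else:
                g.add_edge(u, v, weight=1, descriptions=[r["description"]])
    return g
```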

4. Community Detection: Structuring the Graph

Once the knowledge graph is constructed, GraphRAG applies a hierarchical community detection algorithm, specifically Leiden clustering, to group related entities into tightly connected subgraphs. This step reveals the internal structure of the corpus by identifying clusters of entities that are semantically or contextually related.

The process is recursive: it first finds broad, high-level clusters (root communities), then drills down to identify finer-grained sub-communities within them. This hierarchical grouping is key for enabling divide-and-conquer summarization, where each community is summarized independently before being merged into global insights.

In the visualization below, you can see this structure in action:

GraphRAG Communities
Figure 3: Hierarchical communities detected by Leiden clustering, at two levels of the hierarchy

Left (a): Root-level communities (Level 0) show the most general groupings across the corpus.

Right (b): Sub-communities (Level 1) reveal a more detailed breakdown within each root cluster.
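
Below is a sketch of this clustering step using graspologic’s `hierarchical_leiden`, the routine Microsoft’s open-source implementation builds on; the `max_cluster_size` value is illustrative:

```python
from collections import defaultdict
from graspologic.partition import hierarchical_leiden

def detect_communities(g, max_cluster_size: int = 10) -> dict:
    """Group nodes into hierarchical communities.

    Each result row maps one node to a community at a given level:
    level 0 holds the broad root communities, higher levels the
    finer-grained sub-communities within them.
    """
    rows = hierarchical_leiden(g, max_cluster_size=max_cluster_size)
    communities = defaultdict(set)  # (level, community_id) -> member nodes
    for row in rows:
        communities[(row.level, row.cluster)].add(row.node)
    return communities
```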

5. Community Summarization: Turning Structure into Insight

With the graph now organized into a hierarchy of communities, GraphRAG moves on to generating summaries for each of these clusters. These act like mini-reports describing the entities, relationships, and key claims in each community. Importantly, these summaries are useful not only for answering the final query but also as an independent, human-readable overview of the corpus structure.

Summarization happens in two main layers:

  • Leaf-level communities (the most granular clusters) are summarized by prioritizing their most important elements: prominent nodes and edges. The LLM includes as many of these as possible within its context window, starting with the most connected nodes and relationships (i.e., the “hubs” of that subgraph). A sketch of this selection logic follows the list.

  • Higher-level communities (larger clusters comprising sub-communities) are summarized bottom-up. The LLM processes them directly if all the detailed element summaries still fit into the context window. Otherwise, the system switches to using already-generated summaries of sub-communities, which are shorter and more abstract, to stay within the token limit.
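
The leaf-level selection logic might look roughly like this, reusing the networkx graph from earlier; `count_tokens()` and `llm()` are hypothetical stand-ins:

```python
def summarize_community(g, nodes: set, token_budget: int = 8000) -> str:
    """Summarize one leaf-level community, most important elements first."""
    sub = g.subgraph(nodes)
    # Rank edges by the combined degree of their endpoints, so the
    # "hubs" of the subgraph enter the prompt first.
    edges = sorted(
        sub.edges(data=True),
        key=lambda e: sub.degree(e[0]) + sub.degree(e[1]),
        reverse=True,
    )
    context, used = [], 0
    for u, v, data in edges:
        line = f"{u} -- {v}: {'; '.join(data['descriptions'])}"
        cost = count_tokens(line)  # hypothetical tokenizer helper
        if used + cost > token_budget:
            break  # stop once the context window budget is exhausted
        context.append(line)
        used += cost
    prompt = "Summarize this community:\n" + "\n".join(context)
    return llm(prompt)  # hypothetical LLM call
```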

6. Answering Queries: From Partial Insights to a Global Response

Once the community summaries are in place, GraphRAG can answer user queries through a map-reduce-style process that balances efficiency and depth. Because summaries exist at multiple levels of the hierarchy, the system can choose the right level of granularity for the question: broad, thematic queries may benefit from high-level summaries, while detailed queries might use lower-level, more specific ones.

Here is how the process works:

  • Prepare the input: Community summaries are shuffled and grouped into token-sized chunks. This prevents key information from clumping into a single block that the LLM might truncate or ignore.

  • Map stage: The LLM generates a partial answer to the user’s query for each chunk, along with a helpfulness score (0–100) indicating how relevant that chunk’s information is. Chunks with a score of 0 are discarded.

  • Reduce stage: The most helpful partial answers are then sorted by score and merged into a single context. The LLM uses this refined input to generate the final, global answer to the query.
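
A sketch of the full map-reduce loop; for simplicity it groups summaries by count rather than by an exact token budget, and `llm()` is again a hypothetical stand-in:

```python
import json
import random

def global_answer(query: str, summaries: list[str], group_size: int = 8) -> str:
    pool = summaries[:]   # copy so the caller's list is untouched
    random.shuffle(pool)  # spread related information across groups
    groups = [pool[i:i + group_size] for i in range(0, len(pool), group_size)]

    # Map stage: one partial answer plus a 0-100 helpfulness score per group.
    partials = []
    for group in groups:
        raw = llm(
            "Summaries:\n" + "\n".join(group)
            + f"\n\nQuestion: {query}\n"
            + 'Reply as JSON: {"answer": "...", "score": 0-100}'
        )
        out = json.loads(raw)
        if out["score"] > 0:  # discard unhelpful chunks
            partials.append(out)

    # Reduce stage: merge the most helpful partial answers into one context.
    partials.sort(key=lambda p: p["score"], reverse=True)
    merged = "\n".join(p["answer"] for p in partials)
    return llm(f"Question: {query}\n\nPartial answers:\n{merged}\n\n"
               "Combine these into a single, coherent answer.")
```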

Conclusion

GraphRAG represents a significant advancement over traditional vector-based RAG by combining knowledge graph construction with hierarchical summarization to support true global sensemaking. Unlike vector-only retrieval, GraphRAG can navigate the graph structure to uncover both directly relevant content and indirectly related information by following meaningful relationships between entities. This allows it to generate richer, more context-aware responses, especially in complex or exploratory tasks. By filtering and ranking retrieved elements based on the user’s question, GraphRAG enhances both the relevance and precision of answers. Its structured design also improves explainability, making it easier to trace the origins and logic behind each response. Altogether, GraphRAG delivers more comprehensive, accurate, and efficient answers at a fraction of the token cost of naive RAG, making it a compelling solution for working with large, interconnected datasets.