Pradyuman's Papershelf

📚 Curated Collection of Fascinating Research Papers and Articles

4 July 2024

A Comprehensive Survey on Vector Database: Storage and Retrieval Technique, Challenge

by Pradyuman

1. Introduction

This paper attempts to consolidate the available information on vector databases, including algorithms for solving the nearest neighbour search (NNS) and approximate nearest neighbour search (ANNS) problems using hash-based, tree-based, and graph-based approaches. It also describes the challenges faced when developing vector databases, and touches on use cases that combine large language models (LLMs) with vector databases, showing how each can help the other generate value. Finally, it introduces retrieval-based LLMs, an enhancement of LLMs and a hot topic of research.

2. Paper Summary

3. Key Concepts

4. Learnings

Nearest Neighbour Search (NNS)

This paper introduced me to the nearest neighbour search problem. I was familiar with basic search algorithms like linear search and binary search, but I realized how inefficient they would be on datasets with billions of data points. The problem can be solved in several ways, each with its own trade-offs, and the paper discusses the different algorithms in detail.
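
To make the problem concrete, here is a minimal sketch of exact nearest neighbour search as a plain linear scan. It is my own illustration, not code from the paper; the function names and the choice of Euclidean distance are assumptions. Every query touches every point, which is why this approach struggles at the scale of billions of vectors.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbour(query, points):
    """Return (index, distance) of the point closest to query.

    Linear scan: the cost grows with the number of points times the
    number of dimensions, for every single query.
    """
    best_index, best_dist = -1, math.inf
    for i, point in enumerate(points):
        dist = euclidean(query, point)
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, best_dist


# Hypothetical toy data, only for illustration.
points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
index, dist = nearest_neighbour((0.9, 1.2), points)
print(index, round(dist, 4))
```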

Approximate Nearest Neighbour Search (ANNS)

Although exact nearest neighbour search (NNS) can solve the problem, it comes at the cost of a higher memory footprint and even higher search times. To improve upon this, approximate nearest neighbour search (ANNS) algorithms were introduced, which trade a small amount of accuracy for much faster queries. This paper discusses various approaches to solving the ANNS problem, such as tree-based, graph-based, hashing-based, and quantization-based methods, providing deeper insights into these algorithms.
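
As a rough illustration of the hashing-based family, the sketch below uses random-hyperplane locality-sensitive hashing, one common scheme of this kind. It is my own simplified version and not necessarily the formulation used in the paper; all function names, the number of hyperplanes, and the toy vectors are assumptions. Vectors that point in similar directions tend to land in the same bucket, so a query only needs to be compared against the candidates in its own bucket instead of the whole dataset.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_planes, dim, seed=0):
    """Draw random hyperplane normals from a Gaussian distribution."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(v * h for v, h in zip(vector, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def build_index(vectors, hyperplanes):
    """Group vector indices into buckets keyed by their bit signature."""
    buckets = defaultdict(list)
    for i, vector in enumerate(vectors):
        buckets[signature(vector, hyperplanes)].append(i)
    return buckets


def candidates(query, buckets, hyperplanes):
    """Return indices sharing the query's bucket (may miss true neighbours)."""
    return buckets.get(signature(query, hyperplanes), [])


# Hypothetical toy data, only for illustration.
planes = make_hyperplanes(num_planes=4, dim=3, seed=42)
vectors = [[1.0, 0.0, 0.5], [-1.0, 2.0, 0.0], [0.3, 0.3, 0.3]]
buckets = build_index(vectors, planes)
print(candidates([2.0, 0.0, 1.0], buckets, planes))
```

The approximation shows up in `candidates`: a true nearest neighbour that falls on the other side of even one hyperplane is missed, which is the accuracy given up in exchange for speed. Practical systems usually reduce this by using several independent hash tables.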

Combining LLMs with Vector Databases

Although vector databases (vector DBs) and large language models (LLMs) are individually powerful tools, combining them opens the door to a whole new level of possibilities. LLMs can leverage the storage capabilities provided by vector databases to deliver more efficient responses. Additionally, using LLMs on top of vector databases can enhance search efficiency by adding context to queries, thereby optimizing the performance of vector DBs. Numerous possibilities arise from this combination that can be further explored.
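
One way to picture this combination is a retrieval step that runs before the LLM is called. The sketch below is a toy of my own, not the paper's method: `toy_embed` is a hypothetical stand-in that counts words instead of using a real embedding model, the document list plays the role of the vector database, and no actual LLM is invoked. It only shows how retrieved text could be placed into a prompt.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    query_vector = toy_embed(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query_vector, toy_embed(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Prepend the retrieved documents to the question as context."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical toy corpus, only for illustration.
documents = [
    "vector databases store embeddings for similarity search",
    "binary search works on sorted arrays",
    "llms generate text from a prompt",
]
query = "how do vector databases search embeddings"
top = retrieve(query, documents, k=1)
print(build_prompt(query, documents, k=1))
```

In a real system the sort over all documents would be replaced by an ANNS index, and the resulting prompt would be sent to an LLM.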

5. Conclusion

This research paper gave me great insight into vector databases and the underlying algorithms for the NNS problem. I would highly recommend it to anyone who wants to get started in this field; it might open new doors for you. It also leaves you with an interesting topic to research further, namely retrieval-based LLMs. I will definitely continue reading about this, and I hope you will too.


tags: data-engineering - AI