Large Language Models and Vector Databases for News Recommendations

Large Language Models (LLMs) have generated a global buzz in the machine learning community with the recent releases of generative AI tools such as ChatGPT, Bard, and others. One of the core ideas behind these solutions is to compute numerical representations of unstructured data (such as text and images) and find similarities between those representations.
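To make "finding similarities between representations" concrete, here is a minimal sketch using cosine similarity, a common choice for comparing embeddings. The three-dimensional vectors are hypothetical toy values standing in for real model embeddings, which typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" for three news headlines.
sports_a = [0.9, 0.1, 0.0]
sports_b = [0.8, 0.2, 0.1]
politics = [0.0, 0.2, 0.9]

print(round(cosine_similarity(sports_a, sports_b), 3))  # high: similar topics
print(round(cosine_similarity(sports_a, politics), 3))  # low: different topics
```

Cosine similarity ignores vector length and compares only direction, which is why it is a frequent default for text embeddings.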

However, bringing these concepts into a production environment comes with its own set of machine learning engineering challenges:

  • How to generate these representations quickly?
  • How to store them in a proper database?
  • How to quickly compute similarities for production environments?
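One common way to approach the third question, shown here as a minimal sketch under stated assumptions: normalize every vector once at indexing time so that cosine similarity reduces to a plain dot product, then keep only the k best scores with a heap instead of sorting every candidate. This brute-force scan is exact but still linear in the number of stored items; vector databases typically add approximate nearest-neighbor indexes to scale further (Qdrant, for example, uses HNSW). The article IDs and vectors below are hypothetical.

```python
import heapq
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def top_k(query: list[float], index: dict[str, list[float]], k: int) -> list[tuple[float, str]]:
    """Exact brute-force search: score every stored unit vector, keep the k highest."""
    q = normalize(query)
    scores = ((sum(a * b for a, b in zip(q, vec)), doc_id) for doc_id, vec in index.items())
    # heapq.nlargest keeps only k candidates in memory instead of sorting all scores.
    return heapq.nlargest(k, scores)


# Normalize once at indexing time (hypothetical article IDs and vectors).
index = {
    "article-1": normalize([0.9, 0.1, 0.0]),
    "article-2": normalize([0.8, 0.2, 0.1]),
    "article-3": normalize([0.0, 0.2, 0.9]),
}

for score, doc_id in top_k([1.0, 0.0, 0.0], index, k=2):
    print(f"{doc_id}: {score:.3f}")
```

Paying the normalization cost once per stored item, rather than once per query and item, is what keeps each query to a single multiply-and-add pass.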

In this article, I introduce two open-source solutions that aim to address these questions:

These tools are applied to NPR [2], a News Portal Recommendation dataset (openly available on Kaggle) that was built to support the academic community in developing recommendation algorithms. By the end of the article, you'll see how to:

All code for this article is available on GitHub.

Since generating embeddings can be an expensive process, we can use a vector database to store them once and then query them using a variety of search strategies.
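The benefit of storing embeddings is that each one is computed once and reused across many queries. The toy in-memory store below sketches that idea, together with one example of a query strategy: restricting the search to items whose metadata matches a filter before ranking them by similarity. This is not Qdrant's API; the class `ToyVectorStore`, its `add` and `search` methods, the `letter_embed` function, and the `category` field are all hypothetical, and the "embedding" is a simple letter-count vector rather than a language model.

```python
import math
from typing import Callable, Optional


def _cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database (not Qdrant's API)."""

    def __init__(self, embed: Callable[[str], list[float]]) -> None:
        self.embed = embed
        # doc_id -> (embedding, payload metadata)
        self.points: dict[str, tuple[list[float], dict]] = {}

    def add(self, doc_id: str, text: str, payload: dict) -> None:
        # Embed each document only once; adding a known ID again is a no-op.
        if doc_id not in self.points:
            self.points[doc_id] = (self.embed(text), payload)

    def search(self, query: str, limit: int = 3, where: Optional[dict] = None) -> list[tuple[str, float]]:
        query_vec = self.embed(query)
        hits = []
        for doc_id, (vec, payload) in self.points.items():
            # Filtering strategy: skip points whose payload does not match every key in `where`.
            if where and any(payload.get(key) != value for key, value in where.items()):
                continue
            hits.append((doc_id, _cosine(query_vec, vec)))
        hits.sort(key=lambda hit: hit[1], reverse=True)
        return hits[:limit]


calls = 0  # counts how many times the (pretend) expensive embedding runs


def letter_embed(text: str) -> list[float]:
    """Hypothetical cheap stand-in for an embedding model: counts of letters a-z."""
    global calls
    calls += 1
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


store = ToyVectorStore(letter_embed)
store.add("n1", "Local team wins the football final", {"category": "sports"})
store.add("n2", "Parliament passes new budget law", {"category": "politics"})
store.add("n3", "Football club signs new striker", {"category": "sports"})
store.add("n1", "Local team wins the football final", {"category": "sports"})  # already stored

print(calls)  # 3: four add calls, but only three distinct documents were embedded
print(store.search("football match", limit=2, where={"category": "sports"}))
```

A real vector database adds persistence, indexing for fast search, and richer filter expressions, but the division of labor is the same: embed once at write time, then run many cheap queries at read time.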

Several vector databases can handle this task, but in this article I use Qdrant, an open-source solution with client APIs for popular programming languages such as Python, Go, and TypeScript. For a comparison of these vector databases, see this article [4].
