{"id":2406789,"date":"2023-12-01T19:58:00","date_gmt":"2023-12-02T00:58:00","guid":{"rendered":"https:\/\/platoaistream.net\/plato-data\/building-a-rag-pipeline-for-semi-structured-data-with-langchain\/"},"modified":"2023-12-01T19:58:00","modified_gmt":"2023-12-02T00:58:00","slug":"building-a-rag-pipeline-for-semi-structured-data-with-langchain","status":"publish","type":"station","link":"https:\/\/platoaistream.net\/plato-data\/building-a-rag-pipeline-for-semi-structured-data-with-langchain\/","title":{"rendered":"Building A RAG Pipeline for Semi-structured Data with Langchain"},"content":{"rendered":"
Retrieval Augmented Generation (RAG) has been around for a while. Many tools and applications are being built around this concept, such as vector stores, retrieval frameworks, and LLMs, making it convenient to work with custom documents, including semi-structured data. Working with long, dense texts has never been easier. Conventional RAG works well with unstructured, text-heavy files like DOCs and PDFs. However, this approach does not sit well with semi-structured data, such as tables embedded in PDFs.<\/p>\n While working with semi-structured data, there are usually two concerns.<\/p>\n So, in this article, we will build a Retrieval Augmented Generation pipeline for semi-structured data with Langchain to address these two concerns.<\/p>\n\n
<h4>Learning Objectives<\/h4>\n
\n