[SIP-140] Proposal for data summarization using LLM #29495

ved-kashyap-samsung · 2024-07-05T00:38:41Z


Motivation

Summarizing the data returned by SQL queries using large language models (LLMs) adds value in the following ways:

Insight Extraction: LLMs can extract key insights from large datasets, providing users with concise summaries of the most relevant information.
Contextual Understanding: LLMs can contextualize data summaries based on the user's query, offering personalized insights tailored to their needs.
Automation: Automating the summarization process reduces the manual effort required to sift through vast amounts of data, increasing efficiency and productivity.
Consistency: LLMs can make summaries more consistent by following predefined rules, reducing the risk of human error and bias.
Scalability: As datasets grow, LLMs can scale to handle larger volumes of data while still providing accurate and relevant summaries, ensuring the usability of the system over time.

Proposed Change

A sample screenshot shows how the feature would be implemented using an LLM.
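One possible first step is turning a query's result set into a prompt. The sketch below is a hypothetical illustration, not part of the proposal: `build_summary_prompt`, `MAX_ROWS`, the prompt wording, and the row cap (added to keep the prompt within a model's context window) are all assumptions. It passes the user's question along with the data so the summary can be tailored to the query.

```python
"""Hypothetical sketch: serialize a SQL result set into a summarization prompt."""
from typing import Any, Sequence

# Assumption: cap the rows sent to the LLM so the prompt stays within its context window.
MAX_ROWS = 50


def build_summary_prompt(
    question: str,
    columns: Sequence[str],
    rows: Sequence[Sequence[Any]],
    max_rows: int = MAX_ROWS,
) -> str:
    """Build a plain-text prompt asking an LLM to summarize query results.

    `question` is the user's original query or intent, included so the
    summary can be tailored to what the user asked.
    """
    header = " | ".join(columns)
    # Render each row as pipe-separated values, keeping at most `max_rows` rows.
    lines = [" | ".join(str(value) for value in row) for row in rows[:max_rows]]
    parts = [
        "Summarize the key insights in the following query results.",
        f"User question: {question}",
        f"Columns: {header}",
        *lines,
    ]
    # Tell the model when it is only seeing part of the data.
    if len(rows) > max_rows:
        parts.append(f"(Showing {max_rows} of {len(rows)} rows.)")
    return "\n".join(parts)
```

For example, `build_summary_prompt("Which region sold most?", ["region", "sales"], [("EU", 120), ("US", 340)])` produces a prompt whose data lines are `EU | 120` and `US | 340`. Truncating rows is the simplest option. Pre-aggregating the data in SQL before summarization would be an alternative for large result sets.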

New or Changed Public Interfaces

There should be an option to choose the LLM, e.g. a self-hosted model (fine-tuned for the data) or an LLM offered as a service (e.g. from OpenAI or Google Bard).

We can create an abstraction layer for these LLMs so that the user only has to provide the LLM's configuration details through the UI. Example: screenshot attached.
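A minimal sketch of such an abstraction layer might look like the following. It assumes a common `LLMProvider` interface plus a registry keyed by the provider name the user picks in the UI. All names (`LLMConfig`, `LLMProvider`, `register_provider`, `get_provider`, `EchoProvider`) and config fields are hypothetical and not existing Superset APIs. `EchoProvider` is a stand-in that makes no network calls. A real implementation would call a self-hosted model server or a hosted API.

```python
"""Hypothetical sketch of an LLM abstraction layer for data summarization."""
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class LLMConfig:
    """Settings a user might enter through the UI (field names are assumptions)."""
    provider: str  # e.g. "self_hosted" or "openai"
    model: str
    endpoint: str = ""
    api_key: str = field(default="", repr=False)  # keep secrets out of repr/logs


class LLMProvider(ABC):
    """Common interface so the summarization feature is provider-agnostic."""

    def __init__(self, config: LLMConfig) -> None:
        self.config = config

    @abstractmethod
    def summarize(self, prompt: str) -> str:
        """Return a natural-language summary for the given prompt."""


# Registry mapping the provider name chosen in the UI to an implementation class.
_PROVIDERS: Dict[str, Callable[[LLMConfig], LLMProvider]] = {}


def register_provider(name: str):
    """Class decorator that adds a provider implementation to the registry."""
    def decorator(cls):
        _PROVIDERS[name] = cls
        return cls
    return decorator


def get_provider(config: LLMConfig) -> LLMProvider:
    """Instantiate the provider selected in the configuration."""
    try:
        factory = _PROVIDERS[config.provider]
    except KeyError:
        raise ValueError(f"Unknown LLM provider: {config.provider!r}") from None
    return factory(config)


@register_provider("echo")
class EchoProvider(LLMProvider):
    """Placeholder provider for testing. A real provider would send the prompt
    to config.endpoint (self-hosted) or a hosted API using config.api_key."""

    def summarize(self, prompt: str) -> str:
        # Echo the first prompt line, tagged with the configured model name.
        first_line = prompt.splitlines()[0] if prompt else ""
        return f"[{self.config.model}] {first_line}"
```

Usage: `get_provider(LLMConfig(provider="echo", model="demo")).summarize("Sales rose in Q2.\nDetails")` returns `"[demo] Sales rose in Q2."`. With this design, adding a new backend would mean registering one more subclass, and the UI would only need to store an `LLMConfig`.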

New dependencies

To be discussed

Migration Plan and Compatibility

To be discussed

Rejected Alternatives

NA

ved-kashyap-samsung added the sip Superset Improvement Proposal label Jul 5, 2024

dosubot bot added the design:proposal Design proposals label Jul 5, 2024

rusackas changed the title ~~[SIP] Proposal for data summarization using LLM~~ [SIP-140] Proposal for data summarization using LLM Jul 10, 2024


ved-kashyap-samsung commented Jul 5, 2024 • edited by rusackas
