[SIP-140] Proposal for data summarization using LLM #29495

Open
ved-kashyap-samsung opened this issue Jul 5, 2024 · 0 comments
Labels
design:proposal Design proposals sip Superset Improvement Proposal

Comments

Contributor

ved-kashyap-samsung commented Jul 5, 2024

[SIP-140] Proposal for data summarization using LLM

Motivation

Summarizing the data returned by SQL queries with large language models (LLMs) adds value in several ways:

  • Insight Extraction: LLMs can extract key insights from large result sets and give users concise summaries of the most relevant information.
  • Contextual Understanding: LLMs can frame a summary in the context of the user's query, so the insights are tailored to what the user asked.
  • Automation: Automating summarization reduces the manual effort of sifting through large amounts of data, which improves efficiency and productivity.
  • Consistency: LLMs promote consistent summaries by following predefined rules, which reduces the risk of human error and bias.
  • Scalability: As datasets grow, LLMs can handle larger volumes of data while still producing accurate and relevant summaries, which keeps the system usable over time.

Proposed Change

A sample screenshot of how the LLM-based summarization feature would look:

[Screenshot: summarization]
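
To make the idea concrete, here is a minimal sketch of how a query and its returned rows might be turned into a prompt for an LLM. The proposal does not specify any of this: the function name `build_summary_prompt`, the `max_rows` cap, and the prompt wording are all assumptions for illustration only.

```python
from typing import Any, Sequence


def build_summary_prompt(
    sql: str,
    columns: Sequence[str],
    rows: Sequence[Sequence[Any]],
    max_rows: int = 20,  # hypothetical cap to keep the prompt small
) -> str:
    """Render a SQL query and a sample of its result rows as plain text."""
    sample = rows[:max_rows]
    header = " | ".join(columns)
    body = "\n".join(" | ".join(str(value) for value in row) for row in sample)
    omitted = len(rows) - len(sample)
    # Tell the model when it is only seeing part of the result set.
    note = f"\n({omitted} more rows omitted)" if omitted > 0 else ""
    return (
        "Summarize the key insights in the following SQL result.\n"
        f"Query: {sql}\n"
        f"{header}\n{body}{note}"
    )


prompt = build_summary_prompt(
    "SELECT region, sales FROM orders",
    ["region", "sales"],
    [("EU", 10), ("US", 20), ("APAC", 5)],
    max_rows=2,
)
print(prompt)
```

Capping the number of rows is one possible way to keep large result sets within a model's input limit; sampling or pre-aggregating the data would be alternatives worth discussing.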

New or Changed Public Interfaces

There should be an option to choose the LLM, e.g. a self-hosted model (fine-tuned for the data) or an LLM as a service (from OpenAI, Google Bard).

We can create an abstraction layer over these LLMs so that the user only has to provide the configurable details for the chosen LLM through the UI. Example: screenshot attached.

[Screenshot: configure LLM parameters modal]
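
One way such an abstraction layer could look is sketched below. This is not an agreed design: `LLMConfig`, `LLMProvider`, `register_provider`, `get_provider`, and the `echo` provider are hypothetical names, and the config fields are guesses at what the modal might collect. Real providers (self-hosted or hosted) would subclass `LLMProvider` and call their own backend inside `summarize`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, Type


@dataclass
class LLMConfig:
    """Settings a user might enter in the UI (all fields are assumptions)."""
    provider: str          # e.g. "self_hosted" or a hosted service name
    model: str
    endpoint: str = ""
    api_key: str = ""
    temperature: float = 0.0


class LLMProvider(ABC):
    """Common interface every backend would implement."""

    def __init__(self, config: LLMConfig) -> None:
        self.config = config

    @abstractmethod
    def summarize(self, prompt: str) -> str:
        """Return a summary for the given prompt."""


_REGISTRY: Dict[str, Type[LLMProvider]] = {}


def register_provider(name: str) -> Callable[[Type[LLMProvider]], Type[LLMProvider]]:
    """Class decorator that makes a provider selectable by name."""
    def decorator(cls: Type[LLMProvider]) -> Type[LLMProvider]:
        _REGISTRY[name] = cls
        return cls
    return decorator


@register_provider("echo")
class EchoProvider(LLMProvider):
    """Stand-in backend for local testing; it makes no network call."""

    def summarize(self, prompt: str) -> str:
        return f"[{self.config.model}] {prompt[:40]}"


def get_provider(config: LLMConfig) -> LLMProvider:
    """Build the provider named in the user's configuration."""
    try:
        cls = _REGISTRY[config.provider]
    except KeyError:
        raise ValueError(f"Unknown LLM provider: {config.provider!r}") from None
    return cls(config)


provider = get_provider(LLMConfig(provider="echo", model="demo"))
print(provider.summarize("Total sales rose in Q2."))
```

With a registry like this, adding a new backend would not require changes to the calling code; the UI would only need to store a different `provider` name and its parameters.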

New dependencies

To be discussed

Migration Plan and Compatibility

To be discussed

Rejected Alternatives

NA

@ved-kashyap-samsung ved-kashyap-samsung added the sip Superset Improvement Proposal label Jul 5, 2024
@dosubot dosubot bot added the design:proposal Design proposals label Jul 5, 2024
@rusackas rusackas changed the title [SIP] Proposal for data summarization using LLM [SIP-140] Proposal for data summarization using LLM Jul 10, 2024