Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key #414

Open
451222664 opened this issue Jul 7, 2024 · 6 comments

Comments

@451222664
Copy link

This is my configuration:

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: lm-studio
  type: openai_chat # or azure_openai_chat
  model: bartowski/gemma-2-9b-it-GGUF/gemma-2-9b-it-Q6_K-Q8.gguf
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://localhost:1234/v1

parallelization:
  stagger: 0.3

async_mode: threaded # or asyncio

embeddings:
  async_mode: threaded # or asyncio
  llm:
    api_key: lm-studio
    type: openai_embedding # or azure_openai_embedding
    model: nomic-ai/nomic-embed-text-v1.5-GGUF/nomic-embed-text-v1.5.Q8_0.gguf
    api_base: http://localhost:1234/v1

This is my error log:

00:17:52,468 datashaper.workflow.workflow ERROR Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py", line 102, in cluster_graph
    output_df[[level_to, to]] = pd.DataFrame(
    ~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 4299, in __setitem__
    self._setitem_array(key, value)
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 4341, in _setitem_array
    check_key_length(self.columns, key, value)
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
00:17:52,469 graphrag.index.reporting.file_workflow_callbacks INFO Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key details=None
00:17:52,469 graphrag.index.run ERROR error running workflow create_base_entity_graph
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.11/site-packages/graphrag/index/run.py", line 323, in run_pipeline
    result = await workflow.run(context, callbacks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 369, in run
    timing = await self._execute_verb(node, context, callbacks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py", line 102, in cluster_graph
    output_df[[level_to, to]] = pd.DataFrame(
    ~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 4299, in __setitem__
    self._setitem_array(key, value)
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 4341, in _setitem_array
    check_key_length(self.columns, key, value)
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
00:17:52,470 graphrag.index.reporting.file_workflow_callbacks INFO Error running pipeline! details=None

This is my console log:

🚀 Reading settings from ragtest/settings.yaml
/opt/anaconda3/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59:
FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a
future version. Please use 'DataFrame.transpose' instead.
  return bound(*args, **kwds)
🚀 create_base_text_units
                                 id  ... n_tokens
0  4d58d18fc8bedcf601e27bb07cdc3f8e  ...      300
1  288d3e4ebc58510cc7153d89f5946a5f  ...      300
2  a13a2f2347995e03c804450b08354b12  ...      208
3  d53faf2c8abaa7cd58e253d514fe6ad3  ...        8

[4 rows x 5 columns]
🚀 create_base_extracted_entities
                                        entity_graph
0  <graphml xmlns="http://graphml.graphdrawing.or...
🚀 create_summarized_entities
                                        entity_graph
0  <graphml xmlns="http://graphml.graphdrawing.or...
❌ create_base_entity_graph
None
⠴ GraphRAG Indexer
├── Loading Input (InputFileType.text) - 1 files loaded (1 filtered) ━ 100% … 0…
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
└── create_base_entity_graph
❌ Errors occurred during the pipeline run, see logs for more details.

@AlonsoGuevara
Copy link
Contributor

Hi!
Can you please check in your cache files or output files if the entity extraction was succesful? Most errors on the clustering step relate to faulty entity extractions, either by 0 extracted entities or by wrong responses from the ll..

@451222664
Copy link
Author

It means that there is something wrong with the result of LLM processing, right?

"<|COMPLETE|> 


Let me know if you'd like to try another example!  I'm ready when you are."

@Nuclear6
Copy link

Nuclear6 commented Jul 8, 2024

This error should be caused by your embedding or model not loading correctly. You can refer to my configuration modification.

image

image image

@AnandMoorthy
Copy link

Hi @451222664

I am also getting same error!

Pasted the logs below, feels like an issue with ollama. Please confirm you are also getting same logs.

Screenshot 2024-07-08 214205

@AlonsoGuevara
Copy link
Contributor

Hi @451222664
By the response provided, yup, the LLM you're using is ignoring the format we are looking for in the output and it is being more "chatty".
I would suggest doing some prompt tuning to try to force the LLM into the format we need for parsing.

@AnandMoorthy
Copy link

Hi @451222664

I am also getting same error!

Pasted the logs below, feels like an issue with ollama. Please confirm you are also getting same logs.

Screenshot 2024-07-08 214205

It turns out ollama was not started properly, restarting the service fixed the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants