Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key #414

451222664 · 2024-07-07T16:24:49Z

This is my configuration:

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: lm-studio
  type: openai_chat # or azure_openai_chat
  model: bartowski/gemma-2-9b-it-GGUF/gemma-2-9b-it-Q6_K-Q8.gguf
  model_supports_json: true # recommended if this is available for your model.
  # max_tokens: 4000
  # request_timeout: 180.0
  api_base: http://localhost:1234/v1

parallelization:
  stagger: 0.3

async_mode: threaded # or asyncio

embeddings:
  async_mode: threaded # or asyncio
  llm:
    api_key: lm-studio
    type: openai_embedding # or azure_openai_embedding
    model: nomic-ai/nomic-embed-text-v1.5-GGUF/nomic-embed-text-v1.5.Q8_0.gguf
    api_base: http://localhost:1234/v1

This is my error log:

00:17:52,468 datashaper.workflow.workflow ERROR Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py", line 102, in cluster_graph
    output_df[[level_to, to]] = pd.DataFrame(
    ~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 4299, in __setitem__
    self._setitem_array(key, value)
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 4341, in _setitem_array
    check_key_length(self.columns, key, value)
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
00:17:52,469 graphrag.index.reporting.file_workflow_callbacks INFO Error executing verb "cluster_graph" in create_base_entity_graph: Columns must be same length as key details=None
00:17:52,469 graphrag.index.run ERROR error running workflow create_base_entity_graph
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.11/site-packages/graphrag/index/run.py", line 323, in run_pipeline
    result = await workflow.run(context, callbacks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 369, in run
    timing = await self._execute_verb(node, context, callbacks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/datashaper/workflow/workflow.py", line 410, in _execute_verb
    result = node.verb.func(**verb_args)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/graphrag/index/verbs/graph/clustering/cluster_graph.py", line 102, in cluster_graph
    output_df[[level_to, to]] = pd.DataFrame(
    ~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 4299, in __setitem__
    self._setitem_array(key, value)
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/frame.py", line 4341, in _setitem_array
    check_key_length(self.columns, key, value)
  File "/opt/anaconda3/lib/python3.11/site-packages/pandas/core/indexers/utils.py", line 390, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
00:17:52,470 graphrag.index.reporting.file_workflow_callbacks INFO Error running pipeline! details=None

This is my console log:

🚀 Reading settings from ragtest/settings.yaml
/opt/anaconda3/lib/python3.11/site-packages/numpy/core/fromnumeric.py:59:
FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a
future version. Please use 'DataFrame.transpose' instead.
  return bound(*args, **kwds)
🚀 create_base_text_units
                                 id  ... n_tokens
0  4d58d18fc8bedcf601e27bb07cdc3f8e  ...      300
1  288d3e4ebc58510cc7153d89f5946a5f  ...      300
2  a13a2f2347995e03c804450b08354b12  ...      208
3  d53faf2c8abaa7cd58e253d514fe6ad3  ...        8

[4 rows x 5 columns]
🚀 create_base_extracted_entities
                                        entity_graph
0  <graphml xmlns="http://graphml.graphdrawing.or...
🚀 create_summarized_entities
                                        entity_graph
0  <graphml xmlns="http://graphml.graphdrawing.or...
❌ create_base_entity_graph
None
⠴ GraphRAG Indexer
├── Loading Input (InputFileType.text) - 1 files loaded (1 filtered) ━ 100% … 0…
├── create_base_text_units
├── create_base_extracted_entities
├── create_summarized_entities
└── create_base_entity_graph
❌ Errors occurred during the pipeline run, see logs for more details.


AlonsoGuevara · 2024-07-07T18:07:44Z

Hi!
Can you please check in your cache files or output files whether the entity extraction was successful? Most errors in the clustering step relate to faulty entity extractions, caused either by 0 extracted entities or by wrong responses from the ll..
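One way to run this check is sketched below. It assumes you have already pulled the GraphML string out of the `entity_graph` column shown in the console log (how you load it from your cache or output files is left out), and it only counts `node` and `edge` elements. The function name and the two sample documents are hypothetical, and the sample namespace URL is a placeholder, so the tag match deliberately ignores the namespace.

```python
import xml.etree.ElementTree as ET


def count_graph_elements(graphml_text: str) -> tuple[int, int]:
    """Return (node_count, edge_count) for a GraphML document.

    The namespace is ignored so the check does not depend on the exact
    xmlns value used by the writer.
    """
    root = ET.fromstring(graphml_text)
    nodes = 0
    edges = 0
    for element in root.iter():
        # Tags look like "{namespace}node"; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "node":
            nodes += 1
        elif local_name == "edge":
            edges += 1
    return nodes, edges


# Hypothetical samples: an empty graph (the suspected failure case)
# and a tiny graph with two entities and one relationship.
EMPTY_GRAPH = (
    '<graphml xmlns="http://example.invalid/graphml">'
    '<graph edgedefault="undirected"/></graphml>'
)
SMALL_GRAPH = (
    '<graphml xmlns="http://example.invalid/graphml">'
    '<graph edgedefault="undirected">'
    '<node id="ALICE"/><node id="ACME"/>'
    '<edge source="ALICE" target="ACME"/>'
    "</graph></graphml>"
)

if __name__ == "__main__":
    print(count_graph_elements(EMPTY_GRAPH))
    print(count_graph_elements(SMALL_GRAPH))
```

If the count for your own graph comes back as zero nodes, the clustering step has nothing to work with, which fits the failure described above.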

451222664 · 2024-07-08T10:26:04Z

It means that there is something wrong with the result of LLM processing, right?

"<|COMPLETE|> 


Let me know if you'd like to try another example!  I'm ready when you are."

Nuclear6 · 2024-07-08T13:42:25Z

This error is most likely caused by your embedding model or chat model not loading correctly. You can refer to my configuration changes.

AnandMoorthy · 2024-07-08T16:12:37Z

Hi @451222664

I am also getting the same error!

I pasted the logs below; it feels like an issue with Ollama. Please confirm whether you are getting the same logs.

AlonsoGuevara · 2024-07-09T21:50:35Z

Hi @451222664
Judging by the response provided, yup, the LLM you're using is ignoring the output format we expect and is being more "chatty".
I would suggest doing some prompt tuning to try to force the LLM into the format we need for parsing.
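While tuning the prompt, a quick format check on the raw responses can show whether the model is still drifting. The sketch below is not GraphRAG's parser. It assumes records are parenthesized tuples separated by `##`, with fields separated by `<|>`, and that the response ends with `<|COMPLETE|>` (only the last of these appears in this thread; the other two delimiters are assumptions to adjust to your prompt). The function names are hypothetical.

```python
import re

# Assumed delimiters; change them to match your extraction prompt.
RECORD_DELIMITER = "##"
TUPLE_DELIMITER = "<|>"
# This marker appears in the response quoted earlier in the thread.
COMPLETION_DELIMITER = "<|COMPLETE|>"


def parse_records(response: str) -> list[list[str]]:
    """Return the entity/relationship records found before the completion marker."""
    body = response.split(COMPLETION_DELIMITER, 1)[0]
    records = []
    for chunk in body.split(RECORD_DELIMITER):
        match = re.search(r"\((.*)\)", chunk, flags=re.DOTALL)
        if match is None:
            continue  # no parenthesized tuple in this chunk
        fields = [f.strip().strip('"') for f in match.group(1).split(TUPLE_DELIMITER)]
        if fields[0].lower() in ("entity", "relationship"):
            records.append(fields)
    return records


def trailing_chatter(response: str) -> str:
    """Return any text the model added after the completion marker."""
    parts = response.split(COMPLETION_DELIMITER, 1)
    return parts[1].strip() if len(parts) == 2 else ""


# The response reported in this thread: no records, only chatter.
CHATTY = (
    "<|COMPLETE|> \n\n\nLet me know if you'd like to try another example!"
    "  I'm ready when you are."
)
# A hypothetical well-formed response under the assumed delimiters.
WELL_FORMED = (
    '("entity"<|>ALICE<|>PERSON<|>Alice is an engineer)##'
    '("relationship"<|>ALICE<|>ACME<|>Alice works at Acme<|>7)<|COMPLETE|>'
)

if __name__ == "__main__":
    print(len(parse_records(CHATTY)), repr(trailing_chatter(CHATTY)))
    print(parse_records(WELL_FORMED))
```

A response that yields zero records, or non-empty trailing text, is a sign the prompt still needs tightening for that model.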

AnandMoorthy · 2024-07-12T18:04:48Z

Hi @451222664

I am also getting the same error!

I pasted the logs below; it feels like an issue with Ollama. Please confirm whether you are getting the same logs.

It turns out Ollama was not started properly; restarting the service fixed the issue.

dvdtoth mentioned this issue Jul 8, 2024

[Bug] "ValueError: Columns must be same length as key" - Entity extraction fails due to invalid format returned by API #443

Closed
