
Running Pipeline.from_dict on the same payload crashes #7961

Open
1 task
tstadel opened this issue Jul 1, 2024 · 0 comments · May be fixed by #8030
Assignees
Labels
2.x Related to Haystack v2.0 P1 High priority, add to the next sprint type:bug Something isn't working


tstadel commented Jul 1, 2024

Describe the bug
When a pipeline_dict containing a nested instance (e.g. a document store) is passed to Pipeline.from_dict twice, a TypeError is raised on the second call.

Error message

TypeError: argument of type 'InMemoryDocumentStore' is not iterable

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[52], line 1
----> 1 p = Pipeline.from_dict(pipeline_config)

File ~/.local/share/hatch/env/virtual/deepset-cloud-custom-nodes/1TRZ-Wjw/deepset-cloud-custom-nodes/lib/python3.11/site-packages/haystack/core/pipeline/base.py:181, in PipelineBase.from_dict(cls, data, callbacks, **kwargs)
    179         # Create a new one
    180         component_class = component.registry[component_data["type"]]
--> 181         instance = component_from_dict(component_class, component_data, name, callbacks)
    182     pipe.add_component(name=name, instance=instance)
    184 for connection in data.get("connections", []):

File ~/.local/share/hatch/env/virtual/deepset-cloud-custom-nodes/1TRZ-Wjw/deepset-cloud-custom-nodes/lib/python3.11/site-packages/haystack/core/serialization.py:117, in component_from_dict(cls, data, name, callbacks)
    114     return default_from_dict(cls, data)
    116 if callbacks is None or callbacks.component_pre_init is None:
--> 117     return do_from_dict()
    119 with _hook_component_init(component_pre_init_callback):
    120     return do_from_dict()

File ~/.local/share/hatch/env/virtual/deepset-cloud-custom-nodes/1TRZ-Wjw/deepset-cloud-custom-nodes/lib/python3.11/site-packages/haystack/core/serialization.py:112, in component_from_dict.<locals>.do_from_dict()
    110 def do_from_dict():
    111     if hasattr(cls, "from_dict"):
--> 112         return cls.from_dict(data)
    114     return default_from_dict(cls, data)

File ~/.local/share/hatch/env/virtual/deepset-cloud-custom-nodes/1TRZ-Wjw/deepset-cloud-custom-nodes/lib/python3.11/site-packages/haystack/components/writers/document_writer.py:79, in DocumentWriter.from_dict(cls, data)
     77 if "document_store" not in init_params:
     78     raise DeserializationError("Missing 'document_store' in serialization data")
---> 79 if "type" not in init_params["document_store"]:
     80     raise DeserializationError("Missing 'type' in document store's serialization data")
     82 try:

TypeError: argument of type 'InMemoryDocumentStore' is not iterable
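
The traceback ends in a membership test (`"type" not in init_params["document_store"]`) that hits an object instead of a dict. A minimal, self-contained sketch of that failure mode (using a hypothetical `FakeDocumentStore` stand-in rather than Haystack itself; the assumption is that the first `from_dict` call replaces the nested serialized dict with an instantiated store inside the caller's payload):

```python
class FakeDocumentStore:
    """Hypothetical stand-in for InMemoryDocumentStore."""


init_params = {
    "document_store": {"type": "InMemoryDocumentStore", "init_parameters": {}}
}

# First pass (simulated): the serialized sub-dict is swapped in place
# for an instantiated object.
init_params["document_store"] = FakeDocumentStore()

# Second pass: the same check DocumentWriter.from_dict performs now fails,
# because `in` requires __contains__ or an iterable on the right-hand side.
try:
    "type" in init_params["document_store"]
except TypeError as e:
    print(e)  # argument of type 'FakeDocumentStore' is not iterable
```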

Expected behavior

  • Pipeline.from_dict does not mutate the payload it is given.
  • Pipeline.from_dict is idempotent: calling it twice on the same dict yields the same result.


To Reproduce

pipeline_yaml = """
components:
    text_converter:
        type: haystack.components.converters.txt.TextFileToDocument
        init_parameters:
            encoding: utf-8

    writer:
        type: haystack.components.writers.document_writer.DocumentWriter
        init_parameters:
            document_store:
                type: haystack.document_stores.in_memory.document_store.InMemoryDocumentStore
                init_parameters: {}
            policy: OVERWRITE

connections: # Defines how the components are connected
    - sender: text_converter.documents
      receiver: writer.documents

max_loops_allowed: 100
"""
import yaml

from haystack import Pipeline

pipeline_config = yaml.safe_load(pipeline_yaml)
p = Pipeline.from_dict(pipeline_config)  # works

p = Pipeline.from_dict(pipeline_config)  # crashes
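
Until this is fixed, a workaround (assuming the crash is caused purely by in-place mutation of the payload) is to hand `from_dict` a throwaway deep copy on every call. A self-contained sketch, with a hypothetical `consume` function standing in for `Pipeline.from_dict`:

```python
import copy


def consume(config):
    # Stands in for Pipeline.from_dict's suspected in-place mutation:
    # the nested serialized dict is swapped for an instantiated object.
    init = config["components"]["writer"]["init_parameters"]
    init["document_store"] = object()


pipeline_config = {
    "components": {
        "writer": {
            "init_parameters": {
                "document_store": {
                    "type": "haystack.document_stores.in_memory.document_store.InMemoryDocumentStore",
                    "init_parameters": {},
                }
            }
        }
    }
}

# Each call gets its own deep copy, so the caller's payload is never
# touched and repeat calls keep working.
consume(copy.deepcopy(pipeline_config))
consume(copy.deepcopy(pipeline_config))  # no crash

# The original payload still holds the serialized dict.
assert "type" in pipeline_config["components"]["writer"]["init_parameters"]["document_store"]
```

In the real reproduction this would be `Pipeline.from_dict(copy.deepcopy(pipeline_config))` for each call.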

@ArzelaAscoIi ArzelaAscoIi changed the title Running Pipeline.from_dict on the dame payload crashes Running Pipeline.from_dict on the same payload crashes Jul 1, 2024
@shadeMe shadeMe added type:bug Something isn't working 2.x Related to Haystack v2.0 P1 High priority, add to the next sprint labels Jul 15, 2024