Project files, scripts, configurations, and workflow publications for the Archives-Textract Test Project

prys0000/archives-handwriting-text-extract-project

archives-handwriting-text-extraction

The objective of this project is to create versatile text extraction and cleaning tools that can run either as a local application or through Amazon Textract. This flexibility lets the tools be adapted to the requirements of a specific repository or project, and supports local file processing and customization.

Both the local and the AWS code extract text from handwritten documents, perform text cleaning operations, and save the extracted and cleaned text to the existing metadata templates used by the repository.
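The extract, clean, and save steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual scripts: the function names, the cleaning rules, and the CSV column names (`file_name`, `extracted_text`) are all assumptions made for the example. The only AWS-specific parts are the boto3 `detect_document_text` call and the `Blocks` / `BlockType` / `Text` fields of its response.

```python
import csv
import re
from pathlib import Path


def lines_from_textract(response: dict) -> list[str]:
    """Collect the text of LINE blocks from a DetectDocumentText response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def clean_text(lines: list[str]) -> str:
    """Hypothetical cleaning rules: rejoin hyphenated breaks, collapse whitespace."""
    joined = "\n".join(line.strip() for line in lines if line.strip())
    joined = re.sub(r"-\n(?=[a-z])", "", joined)  # "manu-\nscript" -> "manuscript"
    return re.sub(r"\s+", " ", joined).strip()


def append_to_worksheet(csv_path: Path, file_name: str, text: str) -> None:
    """Append one row to a CSV worksheet; the column names are placeholders."""
    is_new = not csv_path.exists()
    with csv_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["file_name", "extracted_text"])
        if is_new:
            writer.writeheader()
        writer.writerow({"file_name": file_name, "extracted_text": text})


def extract_with_textract(image_path: Path) -> dict:
    """Send one image to Amazon Textract (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without AWS set up

    client = boto3.client("textract")
    return client.detect_document_text(Document={"Bytes": image_path.read_bytes()})
```

A local OCR engine could feed `clean_text` and `append_to_worksheet` in the same way, since both only expect a list of text lines. A real metadata template would likely have more columns than the two placeholders used here.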

Extracting text from handwritten documents and exporting it to metadata worksheets can significantly enhance the efficiency of processing archival collections. Here's how:

1. Time Efficiency:

  • Automated text extraction eliminates the need for manual transcription, saving a significant amount of time.

2. Bulk Processing:

  • Automation enables bulk processing, allowing the extraction of text from multiple documents simultaneously.

3. Efficient Review:

  • Archivists can quickly scan the extracted text for keywords, names, or dates to determine the document's significance without reading every page.

4. Cross-Collection Analysis:

  • Extracted text can be used for cross-collection analysis.
  • Researchers can analyze trends, topics, and themes across different collections, leading to deeper insights.
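As a rough illustration of the review and cross-collection points above, extracted text can be scanned for years and keywords with a few lines of Python. This is a sketch under stated assumptions rather than part of the project: the function names are hypothetical, the year pattern only covers 1600 to 2099, and keywords are matched as whole words, case-insensitively.

```python
import re
from collections import Counter

# Assumed range for plausible years in a collection: 1600-2099.
YEAR_PATTERN = re.compile(r"\b(1[6-9]\d{2}|20\d{2})\b")


def find_years(text: str) -> list[str]:
    """Return every four-digit year found in the text, in order of appearance."""
    return YEAR_PATTERN.findall(text)


def keyword_hits(text: str, keywords: list[str]) -> dict[str, int]:
    """Count whole-word, case-insensitive occurrences of each keyword."""
    lowered = text.lower()
    return {
        kw: len(re.findall(rf"\b{re.escape(kw.lower())}\b", lowered))
        for kw in keywords
    }


def keyword_totals(documents: dict[str, str], keywords: list[str]) -> Counter:
    """Sum keyword counts over many documents, keyed by file name."""
    totals = Counter()
    for text in documents.values():
        totals.update(keyword_hits(text, keywords))
    return totals
```

Handwriting recognition output often contains misread characters, so exact matching like this will miss some hits; fuzzy matching would be a natural extension.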

By integrating text extraction and metadata creation, archival processing becomes more streamlined, accessible, and conducive to meaningful research. Automation empowers archivists to manage and leverage archival content more effectively, ultimately enhancing the value and impact of the collection.

student contributors (graduate and undergraduate)

See acknowledgements for more information

communication

license

See LICENSE for more information.
