Presentation Schedule


Exploring Algorithmic Close Reading: Using Large Language Models as an Innovative Tool for Archival Research (94296)

Session Information: Archival Research
Session Chair: Xiaoshuang Jia

Wednesday, 14 May 2025 17:10
Session: Session 5
Room: Room 707 (7F)
Presentation Type: Oral Presentation

All presentation times are UTC + 9 (Asia/Tokyo)

This presentation introduces our novel approach to processing a collection of 3,000+ memoranda and related documents from post-World War II Japan. These materials, gathered from the National Archives in College Park, the National Diet Library in Tokyo, and the Gordon W. Prange Collection at the University of Maryland, detail negotiations between Allied Occupation staff, Japanese industry, and government officials over easing restrictions on domestic automobile production. While our project ultimately aims to analyze the rhetorical strategies used in these negotiations, this talk focuses on our methodological approach.

We begin with our success using Google’s Document AI with customized templates for extracting text and simple metadata (sender, recipient, dates, etc.). This serves as a baseline as we transition toward Large Language Model (LLM)-based tools, first as a replacement for our text-extraction pipeline, then as a tool for “algorithmic close reading,” leveraging natural-language prompts to analyze each document individually.
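As a rough illustration of what a per-document "algorithmic close reading" loop might look like, the sketch below builds one natural-language prompt per document and collects structured answers. It is not the authors' pipeline: the prompt wording, the JSON keys, and the names `build_prompt`, `close_read`, and `ask_model` are all assumptions made for this example. The model call is passed in as a plain function so that any LLM client (Gemini or otherwise) could be wrapped and supplied without the sketch depending on a particular API.

```python
import json
from typing import Callable, Dict, List

# Hypothetical prompt; the wording and the requested keys are assumptions.
PROMPT_TEMPLATE = (
    "You are assisting with archival research on Occupation-era memoranda.\n"
    "Read the document below and answer in JSON with the keys "
    '"sender", "recipient", "date", and "summary".\n\n'
    "DOCUMENT:\n{text}\n"
)


def build_prompt(text: str) -> str:
    """Insert one document's extracted text into the prompt template."""
    return PROMPT_TEMPLATE.format(text=text.strip())


def close_read(
    documents: Dict[str, str],
    ask_model: Callable[[str], str],
) -> List[dict]:
    """Send each document to the model separately and collect one record each.

    `ask_model` is a stand-in for whatever LLM client is in use: it takes a
    prompt string and returns the model's raw text reply.
    """
    results = []
    for doc_id, text in documents.items():
        raw = ask_model(build_prompt(text))
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            record = None
        # Keep unparseable or non-object replies for manual review
        # instead of silently dropping the document.
        if not isinstance(record, dict):
            record = {"error": "unparseable response", "raw": raw}
        record["doc_id"] = doc_id
        results.append(record)
    return results
```

Handling each document in its own request keeps every answer traceable to a single source, and retaining malformed replies makes it easier to spot where the model failed to follow the requested format.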

Our presentation outlines the specific tools involved, including Document AI, Gemini (Google’s LLM), and custom Python scripts for passing documents to these systems for analysis. We examine pitfalls encountered and potential mitigation strategies. Key research questions include: (1) How effective are LLMs in streamlining optical character recognition of mixed-quality typewritten documents? and (2) Can LLMs simulate and perhaps improve upon traditional close reading through rapid, guided analysis using simple programming tools and natural-language prompts? Preliminary results suggest LLMs can be a valuable, transformative tool for archival research, enabling scholars to explore primary sources more efficiently and in greater depth.
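Research question (1) implies some way of scoring extracted text against a trusted transcription. One common choice for OCR evaluation is character error rate (CER): the edit distance between the reference and the extracted text, divided by the reference length. The abstract does not say which metric the authors use, so the sketch below is only an assumption about how such a comparison could be made, and the function names are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete ca
                    curr[j - 1] + 1,     # insert cb
                    prev[j - 1] + cost,  # substitute, or match at no cost
                )
            )
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Scoring a hand-checked sample of pages with each tool in this way would give a like-for-like comparison between the template-based baseline and an LLM-based extraction step. Because CER is normalized by reference length, it can exceed 1.0 when the extracted text is much longer than the reference.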

Authors:
Lindsay Amthor Yotsukura, University of Maryland, United States
Sheila Zellner-Jenkins, University of Maryland, United States
Michael Wolk, University of Maryland, United States
Deeksha Ramakrishna, University of Maryland, United States
Brian Krznarich, University of Maryland, United States


About the Presenter(s)
Dr. Lindsay Yotsukura is an Associate Professor of Japanese language/linguistics at the University of Maryland, College Park. Her research utilizes Google’s DocAI and BigQuery tools to analyze correspondence from the Allied Occupation of Japan.

Brian Krznarich is a professional software developer with a background in computer science, Japanese, and linguistics. He moved to Japan in 2019 to continue studying Japanese and is interested in the intersection of language and technology.
https://www.linkedin.com/in/brian-krznarich

Lindsay Yotsukura on LinkedIn: https://www.linkedin.com/in/lindsay-yotsukura-7a88409/

Posted by James Alexander Gordon

Last updated: 2023-02-23 23:45:00