DH Monday: Supporting Curatorial Work with Machine Learning

By Audrey Altman, March 30, 2022.

Curatorial work with large archival collections is challenging. To find artifacts related to a chosen topic, curators have to sift through thousands, sometimes millions, of materials and determine which are truly relevant. Machine learning can help streamline curatorial workflows by analyzing complex datasets and making some of their underlying patterns legible. Yet computers do not understand the meaning of the patterns they uncover, and the results are imperfect and sometimes nonsensical. Curators can use their expertise to make sense of machine learning outputs and use them within a larger decision-making process about which materials belong in a curated compilation and which do not.

I recently co-developed an experimental machine learning tool for curators of large library or archival collections. The tool uses market basket analysis to identify topically related artifacts. Pairing human expertise with machine learning tools allows curators to work more efficiently and make large, complex archival collections legible to the public. Ultimately, the partnership between human and machine intelligence helped curators at the Digital Public Library of America uncover stories of people and communities that would otherwise have been lost or obscured.
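For readers unfamiliar with market basket analysis, here is a minimal sketch of the general technique in Python. It is not the DPLA tool itself: the function name `pair_rules`, the sample subject terms, and the support threshold are all assumptions made for illustration. Each record's terms are treated as a "basket," and pairs of terms that co-occur often are scored by support, confidence, and lift.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.2):
    """Score co-occurring term pairs by support, confidence, and lift.

    Returns (antecedent, consequent, support, confidence, lift) tuples,
    sorted from highest to lowest lift.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        terms = sorted(set(basket))  # ignore duplicate terms within a record
        item_counts.update(terms)
        pair_counts.update(combinations(terms, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of records containing both terms
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]       # P(y | x)
            lift = confidence / (item_counts[y] / n)  # P(y | x) / P(y)
            rules.append((x, y, support, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)


# Hypothetical subject terms for four records (illustrative only).
records = [
    ["suffrage", "women", "voting rights"],
    ["suffrage", "women"],
    ["voting rights", "civil rights"],
    ["suffrage", "women", "civil rights"],
]
rules = pair_rules(records, min_support=0.5)
for antecedent, consequent, support, confidence, lift in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 suggests two terms appear together more often than chance would predict, which is the kind of pattern a curator might then inspect by hand.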

The problem of curatorial overload

The motivation for this project originated in my experience assisting curators of the Black Women’s Suffrage Digital Collection.  Over several months, I worked with curators Shaneé Yvette Murrain and Kat Williams to explore the DPLA’s aggregation of over 40 million cultural heritage materials.  We identified artifacts related to the history of Black women’s participation in voting rights movements and included them in a curated compilation with its own discovery interface.

The curators and I developed an automated query to select materials with relevant keywords and phrases in their metadata records (brief descriptions that archivists give to each artifact).  Since the DPLA aggregation grows and changes on a regular basis, the query is executed every other week to get the latest materials and add them to the Black Women’s Suffrage collection.
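As a rough illustration of what a keyword-based selection like this might look like, the sketch below filters a small in-memory list of records. It does not reproduce the actual query or the DPLA data model: the field names, sample records, phrases, and the helper `matches_any_phrase` are all hypothetical.

```python
def matches_any_phrase(record, phrases):
    """Return True if any phrase appears, case-insensitively, in the record's text fields."""
    text = " ".join(
        value for value in record.values() if isinstance(value, str)
    ).lower()
    return any(phrase.lower() in text for phrase in phrases)


# Hypothetical metadata records and search phrases (illustrative only).
records = [
    {"id": 1, "title": "Letter on woman suffrage", "description": "Correspondence, 1913"},
    {"id": 2, "title": "Harbor photograph", "description": "View of the docks"},
    {"id": 3, "title": "Club minutes", "description": "Meeting on Voting Rights for women"},
]
phrases = ["woman suffrage", "voting rights"]

selected_ids = [r["id"] for r in records if matches_any_phrase(r, phrases)]
print(selected_ids)
```

Rerunning a filter like this on a schedule would pick up newly added records, in the spirit of the biweekly refresh described above.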

Click here to continue reading “Supporting Curatorial Work with Machine Learning.”

Audrey Altman is a senior software engineer at DPLA.  In this post, she discusses an experimental machine learning tool, designed for curators of the Black Women’s Suffrage Digital Collection.

