Under the Hood

Structuring a Documentary Archive with AI

This documentary project, The Human Foundation, is drawn from an archive of thousands of photographs taken in Doha, Qatar, between 2012 and 2015. The photographs, historical context, and original observations belong entirely to the photographer. An AI assistant (Perplexity) was used in 2026 to help articulate, edit, and sequence these memories into a cohesive journalistic photo essay.

Initially, the AI was used to generate detailed, standalone descriptions for over 30 candidate photographs. As the descriptions accumulated, however, it became clear that a disciplined editorial method was needed to turn the raw archive into a flowing narrative. This page documents that process.

1. Arriving at the Three-Chapter Structure

As the initial batch of photos was processed, it became evident that the images naturally fell into three distinct geographic and thematic categories, representing the full lifecycle of the migrant experience in Doha. This realization led to a "three-act structure":

Chapter I: The Shadow of West Bay focused on the active physical labor, the massive scale of the megaprojects, and the rigid, color-coded hierarchies of the construction sites.
Chapter II: The Erasure of Msheireb shifted the lens to the hidden living conditions, documenting the severe housing decay, displacement, and the encroachment of luxury development on the slums.
Chapter III: The Respite of Al Ghanim concluded the narrative by exploring the workers' sole day of rest, focusing on their reclaimed identity, community building, and micro-economies.

2. The "Macro / Micro" Captioning Methodology

During the initial drafting phase, a significant problem emerged: repetition. Because each photo description was written to stand alone, the AI repeatedly included the same historical background (e.g., explaining the kafala system, the 10-hour shifts, or the fact that Friday was the only day of rest) in almost every caption.

To solve this, we implemented a "Macro / Micro" structural approach for each chapter:

The "Macro" Context (Chapter Introductions): All the recurring historical, political, and sociological background was extracted from the individual photo descriptions and synthesized into a single, punchy introductory paragraph at the start of each chapter. This explained the phenomenon once so it did not have to be repeated.
The "Micro" Details (Streamlined Captions): Freed from the burden of explaining the overarching history, the individual photo captions were aggressively streamlined. Each photo was assigned a distinct Title and a specific Focus (e.g., "The digital lifeline" or "The vertical dichotomy of power"). The resulting captions focused purely on the unique visual evidence present in the frame—such as the color of a hard hat, a specific gesture, or the layout of an improvised flea market.

3. The Filtering and Curation Process

The most difficult phase of the project was editing down a large volume of high-quality photos. The photographer provided 34 fully described candidate photos. To prevent viewer fatigue and narrative repetition, the total was strictly capped at 22 photos.

To aid in the selection process, the photographer introduced an internal "Impact Rating" system (scaled 1 to 10) for each photo. The AI then grouped the photos by their "Focus" and selected the highest-rated image from each group, rejecting the rest. Here is how the final selection was made:
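The grouping rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool that was actually used. The Photo record, the select_by_focus helper, the tie-breaking behavior, and all sample IDs, Focus labels, and ratings are hypothetical. Only the rule itself (group by Focus, keep the highest-rated image in each group) comes from the process described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Photo:
    photo_id: str  # hypothetical ID, not taken from the real archive
    focus: str     # the caption's "Focus" label
    rating: int    # the photographer's Impact Rating, 1 to 10


def select_by_focus(photos):
    """Return (selected, rejected): the top-rated photo per Focus, then the rest."""
    best = {}
    for photo in photos:
        current = best.get(photo.focus)
        # On a tie the first photo seen is kept; this is an assumption,
        # since the source does not say how ties were resolved.
        if current is None or photo.rating > current.rating:
            best[photo.focus] = photo
    selected = list(best.values())
    rejected = [p for p in photos if p not in selected]
    return selected, rejected


# Invented sample data, for illustration only.
candidates = [
    Photo("X-01", "wide crowd on the grass", 8),
    Photo("X-02", "wide crowd on the grass", 9),
    Photo("X-03", "improvised flea market", 7),
    Photo("X-04", "shared phone screen", 6),
    Photo("X-05", "improvised flea market", 5),
]

selected, rejected = select_by_focus(candidates)
print([p.photo_id for p in selected])
print([p.photo_id for p in rejected])
```

In this sketch, X-01 is rejected despite its rating of 8 because another photo with the same Focus scores higher. A well-rated image can therefore still be cut when it competes with a stronger one in the same group.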

Chapter I: West Bay (12 candidates → 9 selected, 3 rejected)

We rejected three photos (1337-01, 1539-02, and 1540-04). Photo 1337-01 was discarded because it was a slightly less impactful, zoomed-in version of a highly rated shot (1337-02) showing workers resting against a reflective glass facade. Photos 1539-02 and 1540-04 were discarded because they were visual duplicates of the 17:00 exit bottleneck; keeping them would have stalled the pacing of the chapter's conclusion.

Chapter II: Msheireb (7 candidates → 4 selected, 3 rejected)

We rejected three photos (1325-03, 1328-19, and 1383-08). All three of these images featured close-ups of crumbling walls covered in handwritten signs, flyers, or eviction notices. While historically interesting, they were visually redundant when placed next to each other. We retained the four most visually distinct images: a wide architectural juxtaposition, a ruined interior, a humanizing detail (the "QATAR ♥ I" door), and a portrait of displaced workers.

Chapter III: Al Ghanim (15 candidates → 9 selected, 6 rejected)

This chapter required the heaviest editing. We rejected six photos (1016-22, 1016-23, 1039-16, 1219-21, 1039-12, and 1039-07). Many of these were wide-angle shots of the massive crowds on the grass, and we eliminated the lower-rated ones to avoid repeating the same visual information. Notably, photo 1039-12 was discarded despite its high rating of 8 because it showed virtually the same scene as the establishing shot chosen for the chapter (1039-14), just at a slightly different focal length. Clearing out the duplicates made room for highly specific micro-narratives, such as the improvised flea market and men sharing a mobile phone screen.
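As a quick consistency check, the per-chapter counts can be added up against the totals stated earlier (34 candidates, capped at 22). The sketch below uses only the numbers given in the chapter headings above; the dictionary layout and variable names are illustrative.

```python
# Per-chapter counts as stated in the chapter headings above.
chapters = {
    "West Bay": {"candidates": 12, "selected": 9, "rejected": 3},
    "Msheireb": {"candidates": 7, "selected": 4, "rejected": 3},
    "Al Ghanim": {"candidates": 15, "selected": 9, "rejected": 6},
}

# Within each chapter, selected plus rejected must equal the candidates.
for name, counts in chapters.items():
    assert counts["selected"] + counts["rejected"] == counts["candidates"], name

# Sum each column across the three chapters.
totals = {
    key: sum(counts[key] for counts in chapters.values())
    for key in ("candidates", "selected", "rejected")
}
print(totals)  # {'candidates': 34, 'selected': 22, 'rejected': 12}
```

The totals match the 34 candidates and the cap of 22 selected photos, leaving 12 rejected.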