The Daily Rio de Janeiro

All of Rio de Janeiro, every day

How Rio's City Archive Ended Up With Thousands of Duplicate Images — and What It's Doing About It

Decades of fragmented digitisation projects, competing municipal departments, and a 2019 server migration gone wrong left the Arquivo Geral da Cidade holding tens of thousands of redundant image files.

By Rio de Janeiro News Desk · Published 4 July 2026, 3:35 PM

Updated 4 July 2026, 9:54 PM

How we reported this

This article was generated by AI from the linked public sources. The Daily Rio de Janeiro is independently owned and covers Rio de Janeiro news free from advertiser or sponsor influence.

Photo: Daniel Maforte on Pexels

The Arquivo Geral da Cidade do Rio de Janeiro, headquartered on Rua Amoroso Lima in the Cidade Nova neighbourhood, is sitting on an estimated 40,000 duplicate digital image files accumulated over roughly 15 years of stop-start digitisation efforts. The problem did not arrive overnight. It is the documented residue of at least four separate scanning drives, each launched under a different municipal administration, each using different file-naming conventions, and none fully compatible with the others.

Why does this matter right now? The Prefeitura do Rio has been pushing hard since 2024 to complete a unified open-data portal — the Dados Abertos Rio platform — and the image-duplication backlog is one of the concrete obstacles preventing the archive from delivering a clean, searchable visual record of the city to the public. Every redundant file consumes server space, muddies search results, and adds labour costs every time a cataloguer has to manually verify whether two nearly identical .tif files are truly the same scan or represent different prints of the same photograph.

A Trail of Competing Projects

The roots of the problem go back to at least 2009, when the then-Secretaria Municipal de Cultura launched its first large-scale effort to digitise the archive's photographic collection, which spans everything from 19th-century portraits taken in the Santa Teresa neighbourhood to aerial survey photographs of the Baixada Fluminense from the 1970s. That initial project used a proprietary metadata schema that was never fully documented.

A second, larger digitisation drive began in 2014, ahead of that year's World Cup and the 2016 Olympics, when federal and municipal money briefly flowed toward cultural heritage projects. Scanning teams worked across multiple sites, including the Biblioteca Nacional on Avenida Rio Branco and partner institutions in Madureira, and delivered files in a different format (largely .jpg rather than archival .tif), often without cross-referencing what the 2009 project had already scanned.

The decisive complication came in August 2019, when the archive's primary server infrastructure was migrated to a new data centre managed by the Empresa Municipal de Informática, known as Iplanrio. According to internal technical documentation cited in a 2023 audit report by the Controladoria Geral do Município, the migration process created automatic backup copies of approximately 28,000 image folders before the old directory structure was decommissioned. Many of those backup copies were never deleted. The audit estimated the total redundant storage load at roughly 1.2 terabytes — modest by commercial standards, but significant for a municipal archive operating on a constrained IT budget.

What the Replacement Process Actually Involves

Replacing or merging duplicate images is not as simple as running a hash-matching algorithm and deleting files. Archivists at the Arquivo Geral have argued internally — and correctly — that two scans of the same physical photograph can carry different metadata, different resolution levels, and different conservation timestamps, meaning the lower-quality duplicate sometimes holds contextual information the higher-quality version lost. That argument has slowed automated culling proposals at least twice since 2021.
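To illustrate why exact hash matching falls short, here is a minimal Python sketch. It groups files by SHA-256 digest, which catches byte-identical copies such as leftover migration backups but cannot link a .jpg derivative to the .tif it came from. The file names and byte contents are hypothetical stand-ins, not real archive identifiers, and nothing here reflects the archive's actual tooling.

```python
import hashlib
from collections import defaultdict


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def group_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for name, data in files.items():
        groups[sha256_of(data)].append(name)
    # Only digests shared by more than one file are duplicate groups.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical names and contents (assumptions for illustration only).
files = {
    "2009/foto_0001.tif": b"scan-A",
    "backup_2019/foto_0001.tif": b"scan-A",   # untouched backup copy
    "2014/IMG_0001.jpg": b"scan-A re-encoded",  # same photo, different bytes
}

duplicate_groups = group_exact_duplicates(files)
print(duplicate_groups)
```

In this toy data only the backup copy is grouped with its original. The re-encoded file goes undetected, and even for the matched pair a digest says nothing about which copy carries the richer metadata, which is the archivists' objection to automatic deletion.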

The current approach, approved by the archive's technical council in March 2026, is a hybrid: automated perceptual-hash comparison will flag probable duplicates, but a team of three trained archivists will review each flagged pair before any file is permanently removed or merged. The process is expected to run through the end of 2026, targeting completion before January 2027, when the Dados Abertos Rio portal is scheduled for its next major public release.
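The flag-then-review idea can be sketched in a few lines of Python. The article does not say which perceptual-hash algorithm the archive uses, so this example assumes a simple average hash as a stand-in, applied to small grayscale grids that are presumed to be already decoded and downscaled. The file names, the 4x4 grids, and the `max_distance` threshold are all hypothetical.

```python
from itertools import combinations


def average_hash(pixels: list[list[int]]) -> int:
    """Average hash: one bit per pixel, set when the pixel is above the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def flag_for_review(hashes: dict[str, int], max_distance: int = 5):
    """Return (name, name, distance) for pairs close enough to need human review."""
    flagged = []
    for (n1, h1), (n2, h2) in combinations(sorted(hashes.items()), 2):
        distance = hamming(h1, h2)
        if distance <= max_distance:
            flagged.append((n1, n2, distance))
    return flagged


# Hypothetical 4x4 grayscale grids standing in for downscaled scans.
base = [[(r * 4 + c) * 16 for c in range(4)] for r in range(4)]
brighter = [[min(255, v + 10) for v in row] for row in base]  # same image, re-exposed
inverted = [[240 - v for v in row] for row in base]           # a different image

hashes = {
    "2009/foto_0001.tif": average_hash(base),
    "2014/IMG_0001.jpg": average_hash(brighter),
    "2009/foto_0002.tif": average_hash(inverted),
}

review_queue = flag_for_review(hashes)
print(review_queue)
```

Nothing is deleted in this sketch: the function only produces a review queue, mirroring the requirement that an archivist inspects each flagged pair before any file is removed or merged.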

For researchers, journalists, and community groups in neighbourhoods like Lapa, Saúde, and the Zona Norte who rely on the archive's photographic record to support urban heritage claims and planning disputes, the practical advice is straightforward: file any formal image requests through the archive's official Protocolo Geral system now, noting if you need archival-quality .tif versions, before the deduplication process potentially alters file identifiers and disrupts existing reference numbers. The archive's public reading room on Rua Amoroso Lima is open Tuesday through Friday, 9 a.m. to 5 p.m.

About this article

Published by The Daily Rio de Janeiro

Covering news in Rio de Janeiro. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
