The Daily Rio de Janeiro


How Rio's Public Archives Ended Up Flooded With Duplicate Images — And What Came Next

A decade of fragmented digitisation projects left the city's visual heritage riddled with redundant files, ballooning storage costs and a cataloguing backlog that administrators are still working to clear.


By Rio de Janeiro News Desk · Published 4 July 2026, 4:06 PM


Updated 5 July 2026, 5:05 AM

How we reported this

This article was generated by AI from the linked public sources. The Daily Rio de Janeiro is independently owned and covers Rio de Janeiro news free from advertiser or sponsor influence. Read our editorial standards →

Photo: Ej Agumbay on Pexels

Rio de Janeiro's municipal archive holds roughly 4.2 million digitised photographs, maps and urban planning documents. In late 2024, auditors working inside the Arquivo Geral da Cidade, on Avenida Gomes Freire in the Centro district, found that an estimated 30 percent of those files were exact or near-exact duplicates. The redundancy traces to at least seven separate scanning drives that different city agencies conducted between 2011 and 2023 without a shared database standard.

The finding matters now because the Prefeitura do Rio is in the middle of a R$18 million digital infrastructure overhaul, part of the broader Rio Digital programme launched in March 2025, and administrators cannot finalise storage contracts or complete public search portals until the duplicate problem is resolved. Every redundant file consumes server space that the city pays for at commercial cloud rates.

How Seven Scanning Projects Became One Very Expensive Mess

The story starts not with one bad decision but with the accumulated logic of competitive grant funding. Between 2011 and 2018, the Instituto Municipal de Urbanismo Pereira Passos, which oversees cartographic and photographic collections across the city, ran three separate digitisation campaigns. Each was funded under a different federal cultural heritage programme, with its own metadata schema and file-naming conventions. The Museu da Imagem e do Som, on Praça Rui Barbosa in Flamengo, ran two more campaigns in 2016 and 2020. The city's own Secretaria Municipal de Cultura commissioned a sixth scan of overlapping Carnaval and urban imagery in 2021. A seventh, tied to the 2022 commemoration of the city's founding anniversary, produced another round of high-resolution TIFFs from collections already partly digitised in 2016.

None of the seven projects checked what the others had already done. Each delivering agency had an incentive to report maximum volume, counting files scanned rather than unique files, because funding disbursements were tied to output totals. The result was a server estate spread across three data centres, including a leased facility in the Barra da Tijuca technology corridor, holding tens of thousands of images of the same Getúlio Vargas-era street scenes, Maracanã construction photographs and Zona Sul coastal surveys, filed under different names and dates and at incompatible resolutions.

A 2024 internal review commissioned by the Subsecretaria de Patrimônio Cultural e Bens Imóveis put the wasted annual storage cost at approximately R$340,000 — money spent maintaining files that add no informational value. That figure does not include the cataloguing hours spent by archivists who processed the same image multiple times under different accession numbers.
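For a rough sense of scale, the figures reported above can be combined in a back-of-envelope calculation. The sketch below assumes that the 30 percent duplicate estimate and the R$340,000 storage figure describe the same set of files, which the review does not confirm; the variable names are illustrative.

```python
# Back-of-envelope arithmetic using only figures quoted in this article.
# Assumption: the 30% duplicate estimate and the R$340,000 annual cost
# refer to the same set of files; the review does not state this.

TOTAL_FILES = 4_200_000          # digitised items in the archive
DUPLICATE_SHARE_PERCENT = 30     # auditors' late-2024 estimate
WASTED_COST_BRL = 340_000        # annual storage cost attributed to duplicates

# Integer arithmetic avoids floating-point rounding on the file count.
estimated_duplicates = TOTAL_FILES * DUPLICATE_SHARE_PERCENT // 100

# Implied average yearly cost of keeping one redundant file, in reais.
cost_per_duplicate_brl = WASTED_COST_BRL / estimated_duplicates

print(f"Estimated duplicates: {estimated_duplicates:,}")
print(f"Implied cost per duplicate per year: R${cost_per_duplicate_brl:.2f}")
```

On those assumptions, the archive would hold about 1.26 million redundant files at an implied cost of roughly R$0.27 each per year.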

The Push to Clean Up the Catalogue

Duplicate image replacement is the technical process of identifying redundant files, selecting the canonical version, updating database references and retiring the rest. It became a formal municipal priority in October 2025, when the Arquivo Geral signed a cooperation agreement with the Pontifícia Universidade Católica do Rio de Janeiro to deploy perceptual hashing algorithms across the full collection. A perceptual hash is a compact fingerprint of an image's visual content, so two scans of the same print produce similar hashes even when their file names, formats or resolutions differ. PUC-Rio's computer science department had developed similar tools for a private media client and adapted them for public-sector archival use.
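The idea behind perceptual hashing can be illustrated with a minimal sketch. The article does not say which algorithm the PUC-Rio team uses; the example below uses average hashing, one of the simplest variants, as a stand-in. It assumes each image has already been reduced to a small grayscale grid, and the function names and sample thumbnails are hypothetical.

```python
from typing import List

Grid = List[List[int]]  # grayscale values, 0-255, already downscaled


def average_hash(pixels: Grid) -> int:
    """Return one bit per pixel: 1 if brighter than the grid's mean, else 0."""
    flat = [value for row in pixels for value in row]
    if not flat:
        raise ValueError("pixel grid is empty")
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


# Hypothetical 4x4 thumbnails: a print, a slightly brighter rescan of it,
# and an unrelated image (here simply the inverted pattern).
first_scan = [
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
]
brighter_rescan = [[value + 5 for value in row] for row in first_scan]
unrelated = [[255 - value for value in row] for row in first_scan]

rescan_distance = hamming_distance(
    average_hash(first_scan), average_hash(brighter_rescan)
)
unrelated_distance = hamming_distance(
    average_hash(first_scan), average_hash(unrelated)
)
```

Here the brighter rescan hashes identically to the first scan (distance 0), while the inverted image differs in all 16 bits. A small distance suggests two files show the same picture; a production system would typically use larger grids and a more robust hash.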

Progress has been methodical rather than fast. By June 2026, the joint team had processed roughly 1.1 million of the 4.2 million files, retiring around 290,000 confirmed duplicates and flagging another 180,000 for human review because the hashing software returned similarity scores that fell below the automatic-deletion threshold. Archivists on Avenida Gomes Freire say the human-review queue is the real bottleneck, since each flagged image requires a trained eye to determine whether two near-identical photographs represent genuinely distinct historical moments or simply two scans of the same physical print.
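The split between automatic retirement and human review described above amounts to a threshold rule on the similarity score. The sketch below is one way such a rule might look; the article does not publish the thresholds, so both cut-off values, the 0-to-1 score scale and all names are assumptions for illustration only.

```python
from enum import Enum


class Decision(Enum):
    RETIRE = "retire automatically"
    REVIEW = "send to human review"
    KEEP = "keep as distinct"


# Hypothetical cut-offs on a 0.0-1.0 similarity scale; the real values
# used by the Arquivo Geral and PUC-Rio team are not given in the article.
AUTO_RETIRE_THRESHOLD = 0.98
REVIEW_THRESHOLD = 0.85


def triage(similarity: float) -> Decision:
    """Map a similarity score between two files to a handling decision."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be between 0.0 and 1.0")
    if similarity >= AUTO_RETIRE_THRESHOLD:
        return Decision.RETIRE
    if similarity >= REVIEW_THRESHOLD:
        # Close, but not close enough to delete without a trained eye.
        return Decision.REVIEW
    return Decision.KEEP


for score in (0.995, 0.90, 0.40):
    print(f"{score:.3f} -> {triage(score).value}")
```

Raising the automatic threshold lowers the risk of deleting a genuinely distinct photograph but lengthens the human-review queue, which is the trade-off the archivists describe.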

The practical consequence for researchers and journalists using the public-facing Acervo Digital portal is that search results have already become more reliable. Before deduplication began, a keyword search for Morro da Providência could return the same aerial photograph 14 times under different accession numbers. In collections that have already been processed, that figure is now typically two or three.

The Prefeitura expects the bulk of the automated processing to conclude by December 2026, with human review of flagged files continuing into the first quarter of 2027. Once complete, the city plans to publish a unified metadata standard — the first in its history — to govern all future digitisation contracts, a condition that grant-funded projects will be required to meet before disbursements are approved.


About this article

Published by The Daily Rio de Janeiro

Covering news in Rio de Janeiro. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
