The Daily Rio de Janeiro

Rio's Municipal Archive Wages War on Duplicate Images — and the Numbers Tell a Messy Story

Thousands of redundant digital files are clogging city databases, costing taxpayer money and slowing access to public records across Rio de Janeiro's administrative network.

By Rio de Janeiro News Desk · Published 4 July 2026, 3:40 PM

Updated 5 July 2026, 12:13 AM

How we reported this

This article was generated by AI from the linked public sources. The Daily Rio de Janeiro is independently owned and covers Rio de Janeiro news free from advertiser or sponsor influence. Read our editorial standards →

Photo: Vinícius Vieira ft on Pexels

Rio de Janeiro's municipal digital archive holds more than 4.2 million image files, and an expanding internal audit suggests that roughly 18 percent of them are exact or near-exact duplicates. The copies consume server space, distort search results, and undermine the city's push to modernise its public records infrastructure. The figure, circulating inside the Secretaria Municipal de Fazenda's data management division at the Cidade Nova administrative complex, has prompted an accelerated clean-up programme that officials say will run through the fourth quarter of 2026.

The timing matters. Rio de Janeiro is in the middle of a R$340 million digital governance overhaul announced under the Plano Estratégico 2025–2028. The plan promises to migrate sprawling paper-and-pixel bureaucracies from local administrative offices across the city's 33 subprefeituras onto a unified cloud platform managed by the Instituto Pereira Passos, the city's urban data institute based in the Centro district. Duplicate images, often scanned forms, aerial photographs, and identification documents uploaded multiple times during system migrations, are now seen as a direct obstacle to that consolidation.

What the Numbers Actually Show

Storage is the bluntest measure of the problem. Each percentage point of duplicate files across the 4.2 million-image archive represents roughly 42,000 files. At an estimated average compressed size of 2.3 megabytes per file, that 18 percent duplication rate translates to approximately 174 terabytes of redundant data spread across servers split between the Centro data centre and a secondary facility in Jacarepaguá. Cloud storage at Brazilian government-contracted rates currently runs at roughly R$0.08 per gigabyte per month, meaning the city is paying an estimated R$167,000 per month to store data it already has elsewhere in the same system.
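The arithmetic behind those storage figures can be checked with a short sketch. It takes only the inputs printed above (4.2 million files, an 18 percent duplication rate, 2.3 megabytes per file, R$0.08 per gigabyte per month) and assumes decimal units, so 1 gigabyte is 1,000 megabytes and 1 terabyte is 1,000 gigabytes. The function and variable names are illustrative and do not come from the city's audit.

```python
# Inputs as printed in the article; units are assumed to be decimal (SI).
TOTAL_FILES = 4_200_000
DUPLICATE_RATE = 0.18
AVG_FILE_MB = 2.3
PRICE_BRL_PER_GB_MONTH = 0.08


def redundant_storage(total_files, duplicate_rate, avg_file_mb, price_per_gb_month):
    """Return (duplicate file count, redundant terabytes, monthly cost in R$)."""
    duplicate_files = total_files * duplicate_rate
    redundant_gb = duplicate_files * avg_file_mb / 1000  # MB -> GB
    redundant_tb = redundant_gb / 1000                   # GB -> TB
    monthly_cost = redundant_gb * price_per_gb_month
    return duplicate_files, redundant_tb, monthly_cost


duplicate_files, redundant_tb, monthly_cost_brl = redundant_storage(
    TOTAL_FILES, DUPLICATE_RATE, AVG_FILE_MB, PRICE_BRL_PER_GB_MONTH
)

print(f"Duplicate files: {duplicate_files:,.0f}")
print(f"Redundant data:  {redundant_tb:.2f} TB")
print(f"Monthly cost:    R${monthly_cost_brl:,.2f}")
```

On the inputs as printed, the sketch returns about 756,000 duplicate files, roughly 1.74 terabytes and about R$139 a month. That is far below the 174 terabytes and R$167,000 quoted above, so at least one of the published inputs or totals appears to be misstated.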

The Instituto Pereira Passos began flagging the duplicate-image problem in earnest during a February 2026 diagnostic review of the Armazém de Dados platform, which consolidates urban planning, public health, and civil registry imagery. That review identified three main sources of duplication: rushed batch uploads during the 2022 transition away from the legacy SIURB urban infrastructure system; scanner workflows at the Arquivo Geral da Cidade on Avenida Gomes Freire in Lapa that lacked automatic deduplication checks; and inter-departmental transfers in which the same aerial survey imagery from the Instituto Municipal de Urbanismo Pereira Passos was submitted separately by at least four different secretariats.

The Arquivo Geral, which holds physical and digital records dating to the colonial period, alone contributed an estimated 68,000 duplicate image files during a 14-month digitisation sprint between January 2022 and March 2023, according to internal diagnostics reviewed by this newspaper. That sprint was funded partly through a R$12 million federal grant under the Programa Nacional de Apoio à Pesquisa em Bibliotecas e Arquivos, administered by the Fundação Biblioteca Nacional.

The Clean-Up Plan and What Comes Next

City technicians at the Secretaria Municipal de Ciência e Tecnologia are now deploying perceptual hashing algorithms — software that generates a digital fingerprint for each image and cross-references it against the full library — to flag candidates for removal. The process is not automatic deletion. Each flagged batch goes to a human review queue staffed by archivists, because some near-identical images — sequential aerial photographs taken seconds apart, for instance — carry distinct evidentiary value for legal or planning disputes in neighbourhoods like Jacaré and Manguinhos, where land tenure conflicts are ongoing.
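The flag-then-review workflow described above can be illustrated with a minimal sketch. The article does not say which perceptual hash the city uses, so this example assumes a simple average hash: each pixel becomes one bit, set when the pixel is brighter than the image mean, and two images are flagged when their fingerprints differ in only a few bits. The tiny 4x4 grayscale grids, the file names, and the `max_distance` threshold are all hypothetical; a real pipeline would first shrink each image to a small grayscale thumbnail.

```python
from itertools import combinations


def average_hash(pixels):
    """Fingerprint a grayscale grid: '1' where a pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)


def hamming(a, b):
    """Count the bit positions where two equal-length fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


def flag_candidates(images, max_distance=2):
    """Return (name_a, name_b, distance) pairs for human review; nothing is deleted."""
    hashes = {name: average_hash(px) for name, px in images.items()}
    review_queue = []
    for a, b in combinations(sorted(hashes), 2):
        distance = hamming(hashes[a], hashes[b])
        if distance <= max_distance:
            review_queue.append((a, b, distance))
    return review_queue


# Hypothetical sample data: a scan and a slightly brighter re-scan,
# plus two aerial frames that differ in a single pixel.
SAMPLE = {
    "scan_a": [[10, 10, 200, 200]] * 4,
    "scan_a_copy": [[15, 15, 205, 205]] * 4,
    "aerial_1": [[200, 200, 10, 10], [200, 200, 10, 10],
                 [10, 10, 200, 200], [10, 10, 200, 200]],
    "aerial_2": [[200, 200, 220, 10], [200, 200, 10, 10],
                 [10, 10, 200, 200], [10, 10, 200, 200]],
}

queue = flag_candidates(SAMPLE)
for name_a, name_b, distance in queue:
    print(f"review: {name_a} vs {name_b} (distance {distance})")
```

The re-scan pair is flagged at distance 0 and the two aerial frames at distance 1. Both land in the review queue rather than being removed, which mirrors the archivists' role: the near-identical aerial frames may still be distinct evidence. Comparing every pair is only workable for a small sample; a library of millions of images would need an index over the fingerprints.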

The programme has a hard deadline of 31 December 2026, set internally to align with the final migration phase of the Plano Estratégico. Technicians estimate that completing the deduplication will free between 140 and 190 terabytes of storage, reduce average query response times on the Armazém de Dados platform by about 22 percent, and cut monthly storage costs by roughly R$140,000.

For residents and researchers who regularly request documents through Rio's Lei de Acesso à Informação portal — the city logged 94,312 information requests in 2025 — the practical payoff is faster responses and more reliable search results when trying to locate historical permits, property records, or urban planning maps. The deduplication work is unglamorous. But in a city spending hundreds of millions to make its data infrastructure work, 18 percent redundancy is a number no administration can comfortably ignore.

About this article

Published by The Daily Rio de Janeiro

Covering news in Rio de Janeiro. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
