Rio de Janeiro's municipal digital archive holds more than 4.2 million image files, and an ongoing internal audit suggests that roughly 18 percent of them are exact or near-exact duplicates, consuming server space, distorting search results, and undermining the city's push to modernise its public records infrastructure. The figure, circulating inside the data management division of the Secretaria Municipal de Fazenda at the Cidade Nova administrative complex, has prompted an accelerated clean-up programme that officials say will run through the fourth quarter of 2026.
The timing matters. Rio de Janeiro is in the middle of a R$340 million digital governance overhaul announced under the Plano Estratégico 2025–2028, which promises to move the sprawling paper-and-pixel records held by local offices across the city's 33 subprefeituras onto a unified cloud platform managed by the Instituto Pereira Passos, the city's urban data institute based in the Centro district. Duplicate images (often scanned forms, aerial photographs, and identification documents uploaded multiple times during system migrations) are now seen as a direct obstacle to that consolidation.
What the Numbers Actually Show
Storage is the bluntest measure of the problem. Each percentage point of duplication across the 4.2 million-image archive represents roughly 42,000 files, so an 18 percent rate amounts to about 756,000 redundant images. At an estimated average compressed size of 2.3 megabytes per file, those duplicates occupy approximately 1.74 terabytes, spread across servers in the Centro data centre and a secondary facility in Jacarepaguá. Cloud storage at Brazilian government-contracted rates currently runs at roughly R$0.08 per gigabyte per month, meaning the city is paying an estimated R$139 per month to store data it already holds elsewhere in the same system.
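The storage figures chain together, so a few lines of arithmetic make them easy to check. This is a minimal sketch that uses only the inputs quoted in the article and assumes decimal units (1 TB = 1,000 GB = 1,000,000 MB); the function name `duplicate_footprint` is illustrative, not part of any city system.

```python
# Back-of-envelope estimate of duplicate-image storage and monthly cost,
# using the article's quoted inputs and decimal units (1 GB = 1,000 MB).

ARCHIVE_FILES = 4_200_000        # total image files in the archive
DUPLICATE_RATE = 0.18            # share that are exact or near-exact duplicates
AVG_SIZE_MB = 2.3                # estimated average compressed size per file
PRICE_BRL_PER_GB_MONTH = 0.08    # quoted government-contracted cloud rate


def duplicate_footprint(files: int, rate: float, avg_mb: float, price_per_gb: float):
    """Return (duplicate file count, terabytes occupied, monthly cost in reais)."""
    dup_files = round(files * rate)          # round away float noise in files * rate
    total_gb = dup_files * avg_mb / 1_000    # MB -> GB
    total_tb = total_gb / 1_000              # GB -> TB
    monthly_cost = total_gb * price_per_gb   # reais per month
    return dup_files, total_tb, monthly_cost


if __name__ == "__main__":
    n, tb, cost = duplicate_footprint(
        ARCHIVE_FILES, DUPLICATE_RATE, AVG_SIZE_MB, PRICE_BRL_PER_GB_MONTH
    )
    # prints: 756,000 duplicate files, 1.74 TB, R$139 per month
    print(f"{n:,} duplicate files, {tb:.2f} TB, R${cost:,.0f} per month")
```

Changing `AVG_SIZE_MB` or the per-gigabyte price shows how sensitive the cost estimate is to those two assumptions.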
The Instituto Pereira Passos began flagging the duplicate-image problem in earnest during a February 2026 diagnostic review of the Armazém de Dados platform, which consolidates urban planning, public health, and civil registry imagery. That review identified three main sources of duplication: rushed batch uploads during the 2022 transition away from the legacy SIURB urban infrastructure system; scanner workflows at the Arquivo Geral da Cidade on Avenida Gomes Freire in Lapa that lacked automatic deduplication checks; and inter-departmental transfers in which the same aerial survey imagery, produced by the institute itself (formally the Instituto Municipal de Urbanismo Pereira Passos), was submitted separately by at least four different secretariats.
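The Arquivo Geral's scanner workflows are described as lacking automatic deduplication checks. What such a check might look like at ingest is sketched here for exact copies only: a hypothetical `IngestIndex` class keyed on SHA-256 digests from Python's standard `hashlib`. It is an illustration under that assumption, not the city's system, and it cannot catch re-scans of the same page, whose bytes differ.

```python
import hashlib
from typing import Dict, Optional


class IngestIndex:
    """Hypothetical ingest-time duplicate check for exact (byte-identical) copies.

    Maps the SHA-256 digest of each stored file's contents to the name it was
    first stored under. Near-duplicates such as re-scans of the same page have
    different bytes and therefore different digests, so they pass this check.
    """

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}

    def ingest(self, name: str, data: bytes) -> Optional[str]:
        """Store `name` if its content is new and return None; if identical
        content was stored before, skip it and return the earlier name."""
        key = hashlib.sha256(data).hexdigest()
        existing = self._seen.get(key)
        if existing is not None:
            return existing
        self._seen[key] = name
        return None


if __name__ == "__main__":
    index = IngestIndex()
    print(index.ingest("scan_0001.tif", b"page one"))       # None (new content)
    print(index.ingest("scan_0001_copy.tif", b"page one"))  # scan_0001.tif (duplicate)
    print(index.ingest("scan_0002.tif", b"page two"))       # None (new content)
```

Hashing whole files in memory keeps the sketch short; for large aerial TIFFs, feeding the file to a `hashlib.sha256()` object in chunks via its `update()` method avoids loading everything at once.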
The Arquivo Geral, which holds physical and digital records dating to the colonial period, alone contributed an estimated 68,000 duplicate image files during a 14-month digitisation sprint between January 2022 and March 2023, according to internal diagnostics reviewed by this newspaper. That sprint was funded partly through a R$12 million federal grant under the Programa Nacional de Apoio à Pesquisa em Bibliotecas e Arquivos, administered by the Fundação Biblioteca Nacional.
The Clean-Up Plan and What Comes Next
City technicians at the Secretaria Municipal de Ciência e Tecnologia are now deploying perceptual hashing algorithms, software that reduces each image to a compact fingerprint of its visual content so that visually similar images yield similar fingerprints, and comparing those fingerprints across the full library to flag candidates for removal. The process does not delete anything automatically. Each flagged batch goes to a human review queue staffed by archivists, because some near-identical images (sequential aerial photographs taken seconds apart, for instance) carry distinct evidentiary value in legal or planning disputes in neighbourhoods such as Jacaré and Manguinhos, where land tenure conflicts are ongoing.
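The article does not say which perceptual hash the technicians use. As an illustration only, this sketch swaps in average hash (aHash), one of the simplest such algorithms. Images are represented as plain lists of grayscale values (0 to 255) so no imaging library is needed, and the 5-bit threshold and file names are hypothetical. Matches are collected into a review list rather than deleted, mirroring the human review step described in the text.

```python
from typing import Dict, List, Sequence, Tuple

Image = Sequence[Sequence[int]]  # grayscale rows, values 0-255


def average_hash(img: Image, size: int = 8) -> int:
    """Average hash (aHash): shrink to size x size by block-averaging, then
    set one bit per cell that is brighter than the mean of all cells."""
    h, w = len(img), len(img[0])
    if h < size or w < size:
        raise ValueError("image smaller than hash grid")
    cells: List[float] = []
    for r in range(size):
        r0, r1 = r * h // size, (r + 1) * h // size
        for c in range(size):
            c0, c1 = c * w // size, (c + 1) * w // size
            block = [img[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:  # row-major order, first cell ends up as the highest bit
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def review_candidates(hashes: Dict[str, int], threshold: int = 5) -> List[Tuple[str, str, int]]:
    """Return (name, name, distance) for pairs within `threshold` bits.
    Pairs are queued for an archivist, never deleted automatically."""
    names = sorted(hashes)
    queue: List[Tuple[str, str, int]] = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = hamming(hashes[a], hashes[b])
            if d <= threshold:
                queue.append((a, b, d))
    return queue


if __name__ == "__main__":
    base = [[x * 16 for x in range(16)] for _ in range(16)]  # horizontal gradient
    brighter = [[v + 10 for v in row] for row in base]       # same scene, brighter
    inverted = [[255 - v for v in row] for row in base]      # different content
    hashes = {"scan_a": average_hash(base), "scan_b": average_hash(brighter),
              "scan_c": average_hash(inverted)}
    print(review_candidates(hashes))  # [('scan_a', 'scan_b', 0)]
```

Because aHash compares each cell with the image's own mean, a uniform brightness shift leaves the hash unchanged, while an inverted image flips every bit. Comparing every pair is quadratic in the number of files; at the scale of millions of images a real deployment would need an index such as a BK-tree, which this sketch omits.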
The programme has a hard deadline: December 31, 2026, set internally to align with the final migration phase of the Plano Estratégico. Technicians estimate that completing the deduplication will free between 140 and 190 terabytes of storage, cut average query response times on the Armazém de Dados platform by about 22 percent, and reduce monthly storage costs by roughly R$140,000.
For residents and researchers who regularly request documents through Rio's Lei de Acesso à Informação portal — the city logged 94,312 information requests in 2025 — the practical payoff is faster responses and more reliable search results when trying to locate historical permits, property records, or urban planning maps. The deduplication work is unglamorous. But in a city spending hundreds of millions to make its data infrastructure work, 18 percent redundancy is a number no administration can comfortably ignore.