An estimated 40 percent of the images in Rio de Janeiro's municipal photographic archives are redundant, meaning roughly four in every ten images stored across public servers are duplicates of something already catalogued elsewhere. That figure, drawn from an internal audit framework discussed at last year's Fórum Rio de Dados in November 2025, has prompted a quiet but urgent push inside the Secretaria Municipal de Fazenda to modernise how the city manages its visual records.
The problem matters now for a specific reason. The Prefeitura do Rio is midway through a R$380 million digital transformation programme launched in 2024 under the Plano de Transformação Digital Carioca. A significant share of that investment targets data infrastructure, but duplicated image files inflate storage costs, slow retrieval systems and compromise the integrity of archives used by urban planners, journalists and civil servants alike. Bringing duplication under control before the programme's 2027 deadline is no longer optional.
Where the Bloat Is Worst
The heaviest concentrations of duplicated content sit inside two systems. The first is the photographic database maintained by the Instituto Pereira Passos, the city's official urban research and cartography agency, headquartered on Rua Heitor Beltrão in Tijuca. The institute's archive documents everything from street-level changes in Santa Teresa to infrastructure shifts along the Transoeste corridor in Campo Grande. Over years of incremental uploads from multiple city departments, the same images — often aerial shots of Barra da Tijuca development sites or flood-zone mapping imagery from the Baixada de Jacarepaguá — have been ingested repeatedly without automated de-duplication checks.
The second pressure point is the Armazém de Dados, Rio's open data platform, which crossed 1.2 million hosted files in early 2026. City technicians working on a reclassification project that began in March this year flagged more than 180,000 image files as probable duplicates using perceptual hashing tools, a technique that compares fingerprints derived from image content rather than file names. That is equivalent to roughly 15 percent of the platform's 1.2 million hosted files, all sitting in limbo pending manual review or automated purging.
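For readers unfamiliar with perceptual hashing, the idea can be illustrated with a minimal sketch. It is not the tool the city's technicians used, which is not named. It assumes each image has already been decoded and shrunk to an 8x8 grid of grayscale values, and it uses one simple variant (an "average hash"); the function names and the distance threshold of 5 are illustrative assumptions.

```python
from typing import List

Grid = List[List[int]]  # rows of grayscale values, 0-255 (assumed pre-shrunk to 8x8)


def average_hash(pixels: Grid) -> int:
    """Build a bit fingerprint: 1 where a pixel is brighter than the grid mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def probable_duplicate(first: Grid, second: Grid, max_distance: int = 5) -> bool:
    """Flag two images as probable duplicates when their fingerprints nearly match.

    max_distance is a hypothetical threshold chosen for illustration.
    """
    return hamming_distance(average_hash(first), average_hash(second)) <= max_distance


# Toy data: a gradient, a uniformly brightened copy of it, and a checkerboard.
original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
brightened = [[min(255, value + 3) for value in row] for row in original]
checkerboard = [[255 if (r + c) % 2 == 0 else 0 for c in range(8)] for r in range(8)]

print(probable_duplicate(original, brightened))    # same structure, different bytes
print(probable_duplicate(original, checkerboard))  # visually unrelated
```

Because the fingerprint depends on relative brightness rather than exact pixel values, a re-saved or lightly adjusted copy lands close to its source, while a file-name or byte-level comparison would treat the two as unrelated.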
Storage costs are not abstract. Municipal cloud contracts — renewed annually with providers through the Empresa Municipal de Informática, known as Iplanrio, based in Centro — run at approximately R$0.09 per gigabyte per month for archival tiers. Duplicated image files alone are estimated to consume upward of 600 terabytes of that allocation. The arithmetic is unflattering: the city may be spending close to R$650,000 a year storing images it already has.
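The arithmetic behind that estimate can be reproduced in a few lines. The sketch below uses only the two figures quoted above; the function name is illustrative, and the conversion of 1,000 gigabytes per terabyte is an assumption, since the contracts' billing unit is not stated.

```python
def annual_storage_cost_brl(
    volume_tb: float,
    price_per_gb_month_brl: float,
    gb_per_tb: int = 1_000,  # assumption: decimal terabytes
) -> float:
    """Return the yearly cost in reais of keeping volume_tb in archival storage."""
    monthly_cost = volume_tb * gb_per_tb * price_per_gb_month_brl
    return monthly_cost * 12


# Figures quoted in the article: 600 TB of duplicates at R$0.09 per GB per month.
decimal_estimate = annual_storage_cost_brl(600, 0.09)
binary_estimate = annual_storage_cost_brl(600, 0.09, gb_per_tb=1_024)

print(f"Decimal terabytes: R${decimal_estimate:,.0f} a year")
print(f"Binary terabytes:  R${binary_estimate:,.0f} a year")
```

With decimal terabytes the result is about R$648,000 a year, which matches the "close to R$650,000" figure. If the contracts counted 1,024 gigabytes to the terabyte, the same inputs would give about R$663,552.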
Why Fixing It Is Harder Than It Looks
Automated de-duplication sounds straightforward. In practice, public archives carry legal complications. Many images are timestamped public records, and deleting a file — even an apparent duplicate — requires sign-off from the city's Arquivo Geral, which operates under rules set by the 2011 Lei de Acesso à Informação. A compressed image and its original may look identical to an algorithm but carry different metadata chains that matter in legal or administrative contexts. That tension between efficiency and archival integrity has slowed the clean-up.
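The distinction between a byte-for-byte copy and a visually matching file with a different history can be made concrete. The sketch below is a hypothetical triage step, not a description of Iplanrio's workflow: the `ArchivedImage` record, the category labels and the threshold are all assumptions, and the code only sorts pairs for review. It deletes nothing, since the sign-off requirement applies either way.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchivedImage:
    name: str
    content: bytes    # raw file bytes as stored
    fingerprint: int  # precomputed perceptual hash (hypothetical field)


def triage(a: ArchivedImage, b: ArchivedImage, max_distance: int = 5) -> str:
    """Sort a pair of files into a review category without deleting anything."""
    # Identical SHA-256 digests mean the stored bytes are the same.
    if hashlib.sha256(a.content).digest() == hashlib.sha256(b.content).digest():
        return "exact-copy"
    # Different bytes but near-identical fingerprints: for example an original
    # and a compressed derivative, which may carry different metadata.
    if bin(a.fingerprint ^ b.fingerprint).count("1") <= max_distance:
        return "visual-match-different-bytes"
    return "distinct"


# Toy records standing in for real files.
original = ArchivedImage("aerial.tif", b"raw-bytes", 0b11110000)
reupload = ArchivedImage("aerial_copy.tif", b"raw-bytes", 0b11110000)
compressed = ArchivedImage("aerial.jpg", b"other-bytes", 0b11110001)
unrelated = ArchivedImage("flood_map.tif", b"zzz", 0b00001111)

for other in (reupload, compressed, unrelated):
    print(other.name, "->", triage(original, other))
```

Keeping the two kinds of match in separate queues would let archivists treat a re-upload of the same file differently from a derivative whose metadata chain may matter in a legal or administrative context.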
The Iplanrio team piloting the de-duplication project is working through a phased model. Phase one, targeting roughly 80,000 files, was scheduled for completion by June 2026, but that deadline has slipped by at least six weeks, according to the programme's public project tracker. Phase two, which covers the Instituto Pereira Passos holdings, is not expected to begin before the fourth quarter of this year.
For residents and civil society groups that rely on the Armazém de Dados for neighbourhood research, among them organisations such as the Observatório de Favelas, based in Maré, the practical effect of the backlog is slower search results and occasional retrieval errors when duplicate files carry conflicting tags. Cleaning up the archive will not just save money; it will make the data more useful to the people it is supposed to serve. The Prefeitura's own timeline for a fully de-duplicated municipal image archive now points to mid-2027, aligned with the broader digital transformation deadline. Whether the phased rollout catches up with that target will depend on staffing decisions at Iplanrio expected to be announced before September.