The Daily Rio de Janeiro


Rio's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Sobering Story

A growing backlog of redundant visual files is costing municipal agencies time, money and storage capacity across the city's public records systems.

By Rio de Janeiro News Desk · Published 4 July 2026, 3:47 PM

Updated 5 July 2026, 12:13 AM

How we reported this

This article was generated by AI from the linked public sources. The Daily Rio de Janeiro is independently owned and covers Rio de Janeiro news free from advertiser or sponsor influence. Read our editorial standards →

Photo: Victor Cayke on Pexels

Rio de Janeiro's municipal digital infrastructure carries an estimated 40 percent redundancy rate in its photographic archives, meaning four in every ten images stored across public servers are duplicates of something already catalogued elsewhere. That figure, drawn from an internal audit framework discussed at the Fórum Rio de Dados in November 2025, has prompted a quiet but urgent push inside the Secretaria Municipal de Fazenda to modernise how the city manages its visual records.

The problem matters now for a specific reason. The Prefeitura do Rio is midway through a R$380 million digital transformation programme launched in 2024 under the Plano de Transformação Digital Carioca. A significant share of that investment targets data infrastructure, but duplicated image files inflate storage costs, slow down retrieval systems and compromise the integrity of archives used by urban planners, journalists and civil servants alike. Getting the numbers under control before the programme's 2027 deadline is no longer optional.

Where the Bloat Is Worst

The heaviest concentrations of duplicated content sit inside two systems. The first is the photographic database maintained by the Instituto Pereira Passos, the city's official urban research and cartography agency, headquartered on Rua Heitor Beltrão in Tijuca. The institute's archive documents everything from street-level changes in Santa Teresa to infrastructure shifts along the Transoeste corridor in Campo Grande. Over years of incremental uploads from multiple city departments, the same images — often aerial shots of Barra da Tijuca development sites or flood-zone mapping imagery from the Baixada de Jacarepaguá — have been ingested repeatedly without automated de-duplication checks.

The second pressure point is the Armazém de Dados, Rio's open data platform, which crossed 1.2 million hosted files in early 2026. City technicians working on a reclassification project begun in March this year identified more than 180,000 image files as probable duplicates using perceptual hashing tools, a technique that compares compact fingerprints of image content rather than file names. That is roughly 15 percent of the platform's 1.2 million hosted files, sitting in limbo pending manual review or automated purging.
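For readers curious about how a perceptual hash works, here is a minimal Python sketch of one simple variant, the average hash. It is an illustration only: the city's technicians have not said which algorithm or tools they use, and the function names, the 8x8 grid and the distance threshold of 5 are assumptions made for this example.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixels (0-255), one inner list per row


def shrink(pixels: Grid, size: int = 8) -> Grid:
    """Downsample to size x size by averaging equal blocks.

    Assumes the image width and height are multiples of `size`.
    """
    block_h = len(pixels) // size
    block_w = len(pixels[0]) // size
    out = []
    for r in range(size):
        row = []
        for c in range(size):
            block = [pixels[y][x]
                     for y in range(r * block_h, (r + 1) * block_h)
                     for x in range(c * block_w, (c + 1) * block_w)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Fingerprint of size*size bits: 1 where a cell is brighter than the mean."""
    cells = [v for row in shrink(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def probable_duplicate(a: Grid, b: Grid, threshold: int = 5) -> bool:
    """Hypothetical rule: flag a pair when few fingerprint bits differ."""
    return hamming(average_hash(a), average_hash(b)) <= threshold


# Toy 16x16 images: a left-to-right gradient, a slightly brightened copy
# (standing in for a re-encoded file), and a top-to-bottom gradient.
original = [[x * 16 for x in range(16)] for _ in range(16)]
brightened = [[min(255, v + 3) for v in row] for row in original]
different = [[y * 16 for _ in range(16)] for y in range(16)]

print(hamming(average_hash(original), average_hash(brightened)))  # 0
print(hamming(average_hash(original), average_hash(different)))   # 32
```

The brightened copy would differ from the original byte for byte, yet its fingerprint is identical, so the pair is flagged. The unrelated image differs in 32 of 64 bits and is not. A production tool would first decode and convert real image files to grayscale, a step this sketch leaves out.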

Storage costs are not abstract. Municipal cloud contracts, renewed annually with providers through the Empresa Municipal de Informática, known as Iplanrio and based in Centro, run at approximately R$0.09 per gigabyte per month for archival tiers. Duplicated image files alone are estimated to consume upward of 600 terabytes of that allocation. The arithmetic is unflattering: 600 terabytes at that rate comes to about R$54,000 a month, so the city may be spending close to R$650,000 a year storing images it already has.
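The storage estimate can be checked with a few lines of Python. This is a back-of-the-envelope sketch using only the two figures quoted above. The function name is illustrative, and treating one terabyte as 1,000 gigabytes is an assumption, since the contracts' unit convention is not stated.

```python
PRICE_PER_GB_MONTH_BRL = 0.09  # archival-tier rate quoted in the article
DUPLICATE_TB = 600             # estimated duplicate volume quoted in the article
GB_PER_TB = 1000               # assumption: decimal terabytes


def annual_storage_cost(terabytes: float,
                        price_per_gb_month: float,
                        gb_per_tb: int = GB_PER_TB) -> float:
    """Yearly cost in reais of keeping `terabytes` of data in storage."""
    return terabytes * gb_per_tb * price_per_gb_month * 12


cost = annual_storage_cost(DUPLICATE_TB, PRICE_PER_GB_MONTH_BRL)
print(f"R${cost:,.0f} per year")  # R$648,000 per year
```

If the contracts instead count 1,024 gigabytes to the terabyte, the same calculation gives roughly R$663,552 a year, so the order of magnitude holds either way.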

Why Fixing It Is Harder Than It Looks

Automated de-duplication sounds straightforward. In practice, public archives carry legal complications. Many images are timestamped public records, and deleting a file — even an apparent duplicate — requires sign-off from the city's Arquivo Geral, which operates under rules set by the 2011 Lei de Acesso à Informação. A compressed image and its original may look identical to an algorithm but carry different metadata chains that matter in legal or administrative contexts. That tension between efficiency and archival integrity has slowed the clean-up.

The Iplanrio team piloting the de-duplication project is working through a phased model. Phase one, targeting roughly 80,000 files, was scheduled for completion by June 2026; that deadline has slipped by at least six weeks, according to the programme's public project tracker. Phase two, which covers the Instituto Pereira Passos holdings, is not expected to begin before the fourth quarter of this year.

For residents and civil society groups that rely on the Armazém de Dados for neighbourhood research, particularly organisations like the Observatório de Favelas, based in Maré, the practical effect of the backlog is slower search results and occasional retrieval errors when duplicate files carry conflicting tags. Cleaning up the archive will not just save money; it will make the data more useful to the people it is supposed to serve. The Prefeitura's own timeline for a fully de-duplicated municipal image archive now points to mid-2027, aligned with the broader digital transformation deadline. Whether the phased rollout catches up with that target will depend on staffing decisions at Iplanrio expected to be announced before September.


About this article

Published by The Daily Rio de Janeiro

Covering news in Rio de Janeiro. This article was generated by AI from the linked sources and was not reviewed by a human editor before publishing. See our editorial standards.
