Rio de Janeiro's Prefeitura confirmed earlier this year that its central digital document repository, managed through the Instituto Municipal de Urbanismo Pereira Passos on Rua Heitor Beltrão in Tijuca, had accumulated hundreds of thousands of duplicate image files — scanned maps, planning permits, infrastructure photographs and census documents that had been uploaded multiple times across at least four separate digitisation drives since 2009. The problem did not emerge overnight.
The redundancy issue matters now because the city is midway through a R$47 million urban data modernisation contract, approved by the Câmara Municipal in late 2024, that depends on a clean, indexed archive. Duplicate images inflate storage costs, slow retrieval and, more critically, create version-control chaos when planners in neighbourhoods like Santa Teresa or Jacarepaguá pull permit records and cannot tell which scan is the authoritative copy.
How the Backlog Built Up
The roots stretch back to 2009, when the city's Arquivo Geral in the Centro district launched its first mass-scanning campaign for 19th-century cadastral maps of the port zone. That effort used a proprietary file-naming convention. Three years later, a separate initiative under the Secretaria Municipal de Habitação adopted a different metadata standard to digitise favela regularisation documents in complexes including Rocinha and Complexo do Alemão. Neither system talked to the other.
Between 2016 and 2019, two more rounds of scanning occurred — one tied to the post-Olympics audit of public works contracts in Barra da Tijuca, another driven by a federal transparency mandate that required municipalities above one million residents to publish infrastructure images online. Each round ingested files from the previous round as if they were new source material, creating layered duplicates. By the time a technical audit was commissioned in March 2025, preliminary counts placed the duplication rate for certain planning-permit image folders above 60 percent.
The Pereira Passos institute has publicly acknowledged the situation in budget documentation submitted to the city council. Its 2025 annual report, also a public document, described the archive as containing overlapping datasets that required deduplication before the new urban data platform could be fully operational. The report stated no specific cleanup deadline.
The Deduplication Push and What Comes Next
The current remediation effort centres on a phased review. The first phase, covering scanned material related to the Porto Maravilha redevelopment zone along Avenida Rodrigues Alves, was scheduled for completion by June 2026. Technical staff are using hash-comparison tools to flag identical image files regardless of filename or upload date, then routing flagged items to archivists for manual confirmation before deletion.
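The hash-comparison step described above can be sketched in a few lines of Python. This is an illustration only, not the institute's actual tooling, which the available documents do not describe: the choice of SHA-256 and the function names file_digest and find_duplicate_groups are assumptions made for the example. It groups files by a digest of their contents, so two copies match even when their filenames and upload dates differ.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large scans are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicate_groups(root: Path) -> dict[str, list[Path]]:
    """Group files under `root` by content digest (hypothetical helper).

    Only groups with two or more files are returned. Filenames and
    timestamps play no part in the comparison.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

A comparison of this kind catches only byte-identical files. Two separate scans of the same paper map, or one image re-saved in another format, would produce different digests and go unflagged. The sketch also deletes nothing: it returns candidate groups, which fits a workflow where an archivist confirms each match before removal.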
The second phase will address favela urbanisation records, the largest single category of duplicate material. This is politically sensitive: these documents underpin land-titling processes in communities across the Zona Norte and Zona Oeste, and any error in the deduplication step could affect a household's legal standing over a property.
For residents and journalists trying to access public planning documents through the Carioca Digital portal, the practical advice is straightforward: when retrieving scanned permits or map files, check the upload metadata for the most recent version and cross-reference the document number against the original paper registry, which the Arquivo Geral on Rua Amoroso Lima in the Centro district maintains separately. Until the deduplication work is certified complete — a milestone the Prefeitura has not publicly dated — the digital archive should be treated as a working draft rather than a definitive record.
The broader lesson, visible in similar cleanup efforts undertaken by Bogotá's city government and Lisbon's Arquivo Municipal in recent years, is that digitisation without interoperability standards trades one administrative burden for another. Rio spent more than a decade scanning its past. It is now spending a significant portion of its current technology budget scanning the scans.