
Inspiring Tech Leaders
Dave Roberts talks with tech leaders from across the industry, exploring their insights, sharing their experiences, and offering valuable advice to help guide the next generation of technology professionals. This podcast gives you practical leadership tips and the inspiration you need to grow and thrive in your own tech career.
The Hidden Costs of Data Hoarding - What Every Tech Leader Needs to Know
In today's digital landscape, organisations are drowning in data while simultaneously being held accountable for how they manage it. The latest episode of the Inspiring Tech Leaders podcast tackles this critical challenge head-on, exploring the complexities of data retention, archiving, and purging.
Did you know the global datasphere is expected to reach 175 zettabytes this year? This exponential growth is forcing organisations to shift from a "keep everything forever" mentality to strategic data governance throughout the information lifecycle.
What struck me most was the hidden cost of data hoarding. Beyond the obvious storage expenses, excessive data creates security vulnerabilities, slows systems, complicates e-discovery processes, and can actually increase compliance risks. This podcast episode offers practical guidance on implementing effective archiving and purging strategies to mitigate these issues.
For those struggling with unstructured data (which typically accounts for 80-90% of all organisational data), the episode explores powerful tools like Microsoft Purview, Strac, BigID, and others that can help discover, classify, and manage this information at scale.
The future of data management clearly lies in AI-powered automation, evolving privacy regulations, cloud-based solutions, and sophisticated approaches to cross-border data governance. Organisations that master these elements will not only ensure compliance but also reduce costs and extract more value from their information assets.
This episode is essential listening for CIOs, compliance officers, data protection specialists, and anyone responsible for organisational data strategy.
What data management challenges is your organisation facing? I'd love to hear your thoughts in the comments.
Listen Again – How to Navigate the First 100 Days in a Technology Leadership Role. I break down the essentials every tech leader needs to know for a successful start. This is your playbook to navigate a high-impact first 100 days as a tech leader. Tune in, take notes, and start leading with purpose and vision! https://www.buzzsprout.com/1702192/episodes/16075330
I’m truly honoured that the Inspiring Tech Leaders podcast is now reaching listeners in over 70 countries and 1,000+ cities worldwide. Thank you for your continued support! If you’ve enjoyed the podcast, please leave a review and subscribe to ensure you're notified about future episodes. For further information visit - https://priceroberts.com
Welcome to the Inspiring Tech Leaders podcast, with me, Dave Roberts. Today we are discussing a topic that affects every organisation regardless of size or industry: managing the ever-growing mountain of data that businesses generate, store, and are responsible for maintaining.
In today's digital landscape, data has become both an invaluable asset and a significant liability. Organisations are generating and collecting more data than ever before, from customer information and financial records to operational metrics and communications. According to recent estimates, the volume of data created globally is doubling approximately every two years, creating enormous challenges for organisations trying to manage it effectively.
But it is not just about storage capacity. Today, we are going to explore the complexities of data retention, archiving, and purging, the critical processes that help organisations maintain compliance, reduce costs, and mitigate risks.
Throughout this episode, I will share insights, regulatory guidance, and provide real-world examples to help you navigate this complex but essential aspect of modern business operations.
Before we get started, let's address the fundamental question: why should organisations care about data retention, archiving, and purging?
First, there is the regulatory imperative. Many countries have increasingly stringent requirements around how long certain types of data must be kept and when it must be deleted. Non-compliance can result in significant fines, legal action, and reputational damage.
Second, there is the cost factor. Storing data indefinitely is expensive, not just in terms of storage infrastructure but also in terms of management overhead, security costs, and the resources required to search through vast data repositories when needed.
Third, there is the risk dimension. Every piece of data you retain represents a potential security risk. The longer you keep data, the more opportunities there are for breaches, leaks, or unauthorised access.
Finally, there is the operational efficiency angle. Organisations that implement effective data retention strategies can operate more efficiently, respond more quickly to information requests, and make better use of their valuable data assets.
So, whether you are a CIO, a compliance officer, a data protection specialist, or simply someone interested in how organisations manage their information assets, I hope you will find value in today's discussion.
Let's get started by looking at the current state of data growth and the challenges it presents for modern organisations.
In today's digital economy, organisations are experiencing an unprecedented explosion in data volume. Every click, transaction, communication, and interaction generates data that organisations must manage. According to recent industry reports, the global datasphere is expected to reach 175 zettabytes this year; that is 175 trillion gigabytes of data.
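As a quick check on that conversion, here is the arithmetic written out. It assumes decimal (SI) units, where one zettabyte is 10^21 bytes and one gigabyte is 10^9 bytes.

```python
# Decimal (SI) units are assumed: 1 ZB = 10**21 bytes, 1 GB = 10**9 bytes.
ZETTABYTE = 10**21
GIGABYTE = 10**9

datasphere_zb = 175
datasphere_gb = datasphere_zb * ZETTABYTE // GIGABYTE

# 175 followed by twelve zeros, i.e. 175 trillion gigabytes.
print(f"{datasphere_gb:,} GB")
```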
This exponential growth presents significant challenges for organisations of all sizes. Traditional approaches to data management, where companies simply expanded storage capacity to accommodate growing data volumes, are no longer sustainable from both cost and compliance perspectives.
Many organisations have defaulted to a "keep everything forever" approach, often driven by fears of deleting something important or concerns about regulatory compliance. However, this data hoarding mentality comes with significant hidden costs. While storage has become cheaper over time, the sheer volume of data being generated means storage costs can still be substantial, especially for high-performance or highly available storage systems.
More data requires more resources to manage, back up, and protect. Excessive data can slow down systems and applications, affecting productivity and user experience. Every piece of data you retain is a potential security vulnerability. The more data you have, the larger your attack surface.
When facing litigation or regulatory investigations, organisations must be able to quickly locate and produce relevant information. Excessive data makes this process more time-consuming, expensive, and error-prone. As we will discuss later, many regulations now mandate specific retention periods and deletion requirements. Keeping data longer than necessary can actually increase compliance risks.
Poor data retention practices do not just create technical challenges; they can have real business impacts.
The UK and US have distinctly different approaches to data retention regulations. The UK operates under a centralised, principles-based framework governed primarily by the UK GDPR and Data Protection Act 2018, which emphasises storage limitation without specifying exact retention periods, while the Ministry of Defence has specific requirements for defence and nuclear information.
In contrast, the US employs a fragmented, rules-based approach with a complex patchwork of federal, state, and industry-specific regulations including SEC Rule 17a-4, Sarbanes-Oxley Act, HIPAA, and the Bank Secrecy Act, often with explicitly defined retention timeframes. Key differences include the UK's centralised enforcement through the Information Commissioner's Office versus the US's distributed enforcement across multiple agencies. The UK also has flexible retention periods versus the US's specific timeframes.
Organisations handling financial data face heavy regulation in both jurisdictions, while those managing nuclear and defence information must balance transparency with national security concerns. For entities operating across borders, best practices include mapping all applicable regulations and implementing policies that satisfy the most stringent requirements in each jurisdiction.
Given these challenges, organisations are increasingly shifting their focus from simply storing data to governing it throughout its lifecycle. This means implementing policies and processes that determine what data should be kept and for how long, how data should be classified and protected, when and how data should be archived, and when and how data should be permanently deleted or purged.
This shift requires a more strategic approach to data management, one that balances business needs, regulatory requirements, cost considerations, and risk management.
Archiving and purging are essential components of any comprehensive data management strategy. While they serve different purposes, the processes help organisations maintain control over their data landscape, reduce costs, and ensure compliance with regulatory requirements.
Archiving is the process of moving data that is no longer actively used but still needs to be retained, for legal, historical, or other business reasons, to a separate storage system optimised for long-term retention. Archived data remains accessible but is typically stored on lower-cost, higher-capacity media.
Purging, on the other hand, involves the permanent deletion of data that has reached the end of its required retention period and no longer serves a business purpose. Proper purging ensures that data cannot be recovered, even with specialised tools.
Several technologies and approaches can support effective archiving. These include Hierarchical Storage Management, which automatically moves data between high-cost and low-cost storage media based on predefined policies. Frequently accessed data remains on high-performance storage, while rarely accessed data moves to less expensive media.
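To make the tiering idea concrete, here is a minimal sketch of a policy that picks a storage tier from how recently a file was accessed. The tier names, the 30-day and 365-day thresholds, and the `StoredFile` type are all assumptions for illustration, not the behaviour of any particular HSM product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical thresholds; a real policy would be agreed with the business.
HOT_WINDOW = timedelta(days=30)
WARM_WINDOW = timedelta(days=365)


@dataclass
class StoredFile:
    path: str
    last_accessed: datetime


def choose_tier(item: StoredFile, now: datetime) -> str:
    """Pick a storage tier from how long the file has been idle."""
    idle = now - item.last_accessed
    if idle <= HOT_WINDOW:
        return "hot"    # high-performance storage
    if idle <= WARM_WINDOW:
        return "warm"   # cheaper, slower storage
    return "cold"       # lowest-cost archive media


def plan_moves(items, current_tiers, now):
    """Return (path, from_tier, to_tier) for every file whose tier should change."""
    moves = []
    for item in items:
        target = choose_tier(item, now)
        current = current_tiers.get(item.path, "hot")  # unknown files assumed hot
        if target != current:
            moves.append((item.path, current, target))
    return moves
```

Keeping the decision (`choose_tier`) separate from the plan (`plan_moves`) means the policy can be reviewed and tested without moving any data.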
Content-Addressable Storage provides immutable storage for archived data, ensuring that once written, information cannot be modified or deleted until its retention period expires.
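As a rough illustration of that idea, the sketch below stores each object under the SHA-256 hash of its content and refuses deletion until a retention date has passed. It is a toy in-memory model with hypothetical class and method names, not how any specific product is implemented.

```python
import hashlib
from datetime import datetime


class RetentionLockedError(Exception):
    """Raised when a delete is attempted before the retention period ends."""


class ContentAddressableStore:
    """Toy in-memory store: the SHA-256 digest of the content is its address."""

    def __init__(self):
        self._objects = {}  # address -> (content bytes, retain_until)

    def put(self, content: bytes, retain_until: datetime) -> str:
        address = hashlib.sha256(content).hexdigest()
        if address in self._objects:
            # Same content already stored: keep it and never shorten retention.
            _, existing = self._objects[address]
            retain_until = max(existing, retain_until)
        self._objects[address] = (content, retain_until)
        return address

    def get(self, address: str) -> bytes:
        content, _ = self._objects[address]
        # Re-hash on read so tampering or corruption is detected.
        if hashlib.sha256(content).hexdigest() != address:
            raise ValueError("stored content does not match its address")
        return content

    def delete(self, address: str, now: datetime) -> None:
        _, retain_until = self._objects[address]
        if now < retain_until:
            raise RetentionLockedError(f"locked until {retain_until.isoformat()}")
        del self._objects[address]
```

Because the address is derived from the content, any change to the data would produce a different address, which is what makes silent modification detectable.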
Cloud-Based archiving solutions offer scalable, cost-effective options for long-term data retention, with various tiers of storage based on access frequency requirements.
Tape archives, despite being an older technology, still offer cost advantages for very large volumes of data that rarely need to be accessed.
Effective archiving is not just about moving data to different storage; it is about ensuring that archived information remains discoverable and usable when needed. This requires capturing descriptive information about archived data to facilitate future searches and creating searchable indexes of archived content.
It is also important to categorise archived data based on content type, department, project, or other relevant attributes, while also clearly marking when archived data can be purged.
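One way to picture this is a small metadata record per archived item, plus an index built over those records. The sketch below is an assumption-laden illustration: the field names, the keyword index, and the `purge_after` date are hypothetical, and a real archive would use a proper search service.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ArchiveRecord:
    record_id: str
    description: str
    content_type: str
    department: str
    purge_after: date  # earliest date the record may be purged
    keywords: set = field(default_factory=set)


class ArchiveIndex:
    """Minimal searchable index over archive metadata, not the content itself."""

    def __init__(self):
        self._records = {}
        self._by_keyword = defaultdict(set)

    def add(self, record: ArchiveRecord) -> None:
        self._records[record.record_id] = record
        words = set(record.description.lower().split())
        words |= {k.lower() for k in record.keywords}
        for word in words:
            self._by_keyword[word].add(record.record_id)

    def search(self, keyword: str, department: Optional[str] = None):
        ids = self._by_keyword.get(keyword.lower(), set())
        hits = [self._records[i] for i in sorted(ids)]
        if department is not None:
            hits = [r for r in hits if r.department == department]
        return hits

    def eligible_for_purge(self, today: date):
        """Record IDs whose purge date has been reached."""
        return sorted(
            r.record_id for r in self._records.values() if r.purge_after <= today
        )
```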
When litigation or regulatory investigations are pending or anticipated, organisations must implement legal holds that suspend normal archiving and purging processes for relevant data. This requires quickly identifying the data subject to legal holds and ensuring that held data is not modified or deleted. You should maintain records of what data is on hold and why, and also define the process for releasing holds when they are no longer needed.
Organisations face a fundamental tension between data protection principles that favour deletion, such as the right to be forgotten under GDPR, and regulatory requirements that mandate retention. Navigating this tension requires that you document exactly how long each type of data should be kept and periodically assess whether retained data still needs to be kept. It is important to ensure that purging decisions are consistent, documented, and legally defensible.
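The sketch below shows one way a legal hold register might work: it records which records are held and why, blocks purging while a hold is active, and keeps released holds as an audit trail. The class and method names are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class LegalHold:
    hold_id: str
    reason: str
    record_ids: set
    placed_at: datetime
    released_at: Optional[datetime] = None

    @property
    def active(self) -> bool:
        return self.released_at is None


class HoldRegistry:
    def __init__(self):
        self._holds = {}

    def place(self, hold_id, reason, record_ids, now):
        self._holds[hold_id] = LegalHold(hold_id, reason, set(record_ids), now)

    def release(self, hold_id, now):
        # Released holds stay in the registry as an audit trail.
        self._holds[hold_id].released_at = now

    def is_held(self, record_id) -> bool:
        return any(
            h.active and record_id in h.record_ids for h in self._holds.values()
        )

    def can_purge(self, record_id) -> bool:
        """Archiving and purging jobs would call this before touching a record."""
        return not self.is_held(record_id)

    def report(self):
        """What is (or was) on hold and why: (hold_id, reason, active)."""
        return [(h.hold_id, h.reason, h.active) for h in self._holds.values()]
```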
When it is time to purge data, organisations must ensure that deletion is complete and irreversible. Some methods for achieving this include cryptographic erasure, which involves encrypting data and then destroying the encryption keys, rendering the information unrecoverable.
There is also multiple-pass overwriting, which involves writing patterns of data over existing information multiple times to prevent recovery. Finally, physical destruction methods for media include degaussing and shredding.
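For illustration, here is a minimal sketch of multiple-pass overwriting for a single file, using only the Python standard library. The pass patterns (zeros, ones, then random data) and the function names are assumptions for this example. Be aware that on SSDs, copy-on-write filesystems, and anything with snapshots or backups, overwriting in place may not reach every physical copy, so treat this as a demonstration of the idea rather than a guaranteed sanitisation method.

```python
import os

CHUNK = 1024 * 1024  # write in 1 MiB chunks so large files do not fill memory


def _write_pass(handle, size, make_chunk):
    """Overwrite the whole file once, then force the data to the device."""
    handle.seek(0)
    remaining = size
    while remaining > 0:
        n = min(CHUNK, remaining)
        handle.write(make_chunk(n))
        remaining -= n
    handle.flush()
    os.fsync(handle.fileno())


def overwrite_file(path, passes=3):
    """Overwrite in place: zeros, ones, then random data, repeating as needed."""
    makers = [
        lambda n: b"\x00" * n,
        lambda n: b"\xff" * n,
        os.urandom,
    ]
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for i in range(passes):
            _write_pass(handle, size, makers[i % len(makers)])


def overwrite_and_delete(path, passes=3):
    """Overwrite the file, then remove its directory entry."""
    overwrite_file(path, passes)
    os.remove(path)
```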
It is critical that you document that data has been properly destroyed, especially when using third-party services, who should provide you with certificates of destruction for your records.
In today's complex IT environments, completely purging data can be challenging due to data duplication, where the same information may exist in multiple systems or backups. Data can also sometimes be stored in unauthorised or undocumented systems. Cloud adds particular complexity, as data may reside in various cloud services with different purging capabilities. And of course, data may persist in backup systems even after being purged from primary storage.
To manage the complexity of retention and purging at scale, organisations are increasingly turning to automated solutions that apply retention policies automatically based on data classification. These systems can also flag data for review when retention periods are about to expire and execute secure deletion according to predefined schedules.
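A minimal sketch of that kind of automation might look like the following. The classifications, the retention periods, and the 30-day review window are hypothetical values chosen for the example; real periods must come from your own legal and regulatory analysis.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods per classification, in days.
RETENTION_DAYS = {
    "financial_record": 7 * 365,
    "customer_support_ticket": 2 * 365,
    "marketing_lead": 365,
}
REVIEW_WINDOW = timedelta(days=30)  # flag items this close to expiry


@dataclass
class DataItem:
    item_id: str
    classification: str
    created: date
    on_legal_hold: bool = False


def evaluate(item: DataItem, today: date) -> str:
    """Return 'retain', 'review' or 'purge' for one item."""
    if item.on_legal_hold:
        return "retain"  # legal holds override the normal schedule
    days = RETENTION_DAYS.get(item.classification)
    if days is None:
        return "review"  # unclassified data needs a human decision
    expires = item.created + timedelta(days=days)
    if today >= expires:
        return "purge"
    if today >= expires - REVIEW_WINDOW:
        return "review"
    return "retain"
```

A scheduled job could run `evaluate` over an inventory and pass only the "purge" items to a secure deletion step, leaving "review" items for a person to decide.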
Effective archiving and purging should be part of a broader data lifecycle management programme. Before you can properly archive or purge data, you need to know what you have and its value. This involves identifying all data repositories across the organisation and classifying data based on sensitivity, business value, and regulatory requirements. Mapping data flows to understand how information moves through the organisation is crucial.
Develop clear, documented retention policies that specify how long different types of data should be retained and when data should be moved from active systems to archives. The policies should also set out when and how data should be purged, and what exceptions apply for legal holds or special circumstances.
It is important to select and implement appropriate technologies for data discovery and classification, policy-based archiving, and secure purging. Ongoing monitoring and reporting are critical to maintaining compliance.
It is also crucial to ensure that all stakeholders understand the importance of proper data lifecycle management, their roles and responsibilities in the process, and the consequences of non-compliance.
Continuously improve your programme through regular audits of archiving and purging practices, feedback from stakeholders, and updates to policies based on regulatory changes.
As we have discussed throughout this podcast, managing data retention, archiving, and purging is complex enough when dealing with structured data in databases and other organised systems. However, the challenge becomes significantly more difficult when we consider unstructured data, that is, information that does not fit neatly into predefined data models or organised fields.
Unstructured data typically accounts for 80-90% of all data within organisations and includes emails and attachments; word processing documents, spreadsheets, and presentations; chat and messaging conversations; images, audio, and video files; log files and sensor data; and much, much more!
This data often contains sensitive information subject to regulatory requirements, yet it is scattered across network drives, cloud storage, email systems, collaboration platforms, and even individual devices. Without proper tools, organisations struggle to identify what unstructured data they have and where it's located. They also struggle to determine what sensitive information it contains and how to apply appropriate retention policies.
There are tools that can help organisations locate, identify, and classify sensitive information within unstructured data sources. For example, Microsoft Purview Information Protection provides automated sensitive information discovery across Microsoft 365 environments, on-premises, and multi-cloud.
The platform can automatically identify over 200 types of sensitive data, including financial information, personally identifiable information, and health data. It also allows organisations to define custom sensitive information types specific to their industry or business.
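To give a feel for what a custom sensitive information type involves, here is a generic sketch that combines pattern matching with a checksum to cut down false positives. This is not Purview's API or configuration format; the `EMP-123456` employee ID format, the pattern names, and the functions are hypothetical and exist only for this example.

```python
import re

# Hypothetical custom types, e.g. an internal employee ID such as "EMP-123456".
CUSTOM_TYPES = {
    "employee_id": re.compile(r"\bEMP-\d{6}\b"),
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}
# 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum, used to reject card-like numbers that are not valid."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> dict:
    """Return {type_name: [matches]} for every type found in the text."""
    findings = {name: p.findall(text) for name, p in CUSTOM_TYPES.items()}
    cards = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            cards.append(digits)
    findings["payment_card"] = cards
    return {name: hits for name, hits in findings.items() if hits}
```

The checksum step is the important design choice here: a pattern alone would flag any 16-digit reference number, whereas the Luhn check keeps only numbers that could plausibly be payment cards.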
Strac is a comprehensive platform for sensitive data discovery and classification across SaaS, cloud, and endpoint devices. It uses machine learning and AI to enhance accuracy and efficiency in data discovery processes, which is crucial when dealing with large volumes of unstructured data.
What makes Strac particularly valuable is its ability to identify sensitive data within context, reducing false positives that can plague other discovery tools. It also integrates robust data loss prevention capabilities to secure sensitive information once discovered.
BigID offers an AI-powered platform for discovering, cataloguing, and managing unstructured data. Its automated classification and policy-driven governance capabilities help organisations comply with regulations like GDPR and CCPA.
BigID's approach focuses on identity-aware data discovery, connecting personal information to its owner across structured and unstructured data sources. This is particularly valuable for addressing right to be forgotten requests and other data subject rights under privacy regulations.
Data Governance and Lifecycle Management tools help implement retention policies and manage data throughout its lifecycle.
Cohesity specialises in unstructured data management with features like global deduplication and robust security measures. It provides data visibility and accessibility while maintaining compliance with retention requirements.
What sets Cohesity apart is its approach to consolidating data management functions, including backup, recovery, archiving, and analytics, into a single platform. This helps eliminate the data silos that often complicate retention and archiving efforts.
Rubrik offers integrated tools for data discovery, protection, and management with a Zero Trust Security model. It provides air-gapped, immutable backups crucial for rapid data recovery and cybersecurity.
Rubrik's policy-based automation allows organisations to define retention policies once and apply them consistently across their data estate. Its ability to create immutable backups is particularly valuable for industries with strict compliance requirements, such as financial services and healthcare.
OneTrust DataDiscovery automates data discovery and classification to inform business decisions among privacy, security, and data governance teams. It helps maintain compliance with regulatory frameworks by providing a comprehensive view of data across the organisation.
The platform's integration with broader privacy and compliance functions makes it particularly valuable for organisations looking to connect data discovery with privacy impact assessments, data subject rights management, and vendor risk management.
Exterro specialises in data discovery for legal and compliance purposes. It helps organisations discover, classify, and assess data risk for privacy compliance and cybersecurity.
Its e-discovery capabilities are particularly valuable for organisations that face frequent litigation or regulatory investigations. The platform can quickly identify relevant information across unstructured data sources, apply legal holds, and manage the entire e-discovery process.
Digital Guardian offers extensive solutions for identifying, classifying, and protecting sensitive data with robust data loss prevention capabilities. It helps maintain compliance with industry-specific regulations by providing context-aware data classification and protection.
The platform's endpoint focus makes it particularly valuable for organisations with remote workforces, where sensitive data may reside on individual devices outside the traditional network perimeter.
Implementing data discovery tools requires careful planning and execution. Before implementing tools, map where unstructured data exists across the organisation. Identify all repositories and storage locations, document data flows between systems, understand access patterns and permissions, and prioritise high-risk or high-volume areas.
Rather than attempting to discover all unstructured data at once, start with a pilot in one department or data repository. You can then refine processes based on initial findings and gradually expand the scope to cover the entire organisation.
Focus first on high-risk data categories. Data discovery requires input from multiple stakeholders, including IT teams for technical implementation, legal and compliance for regulatory requirements, business units for context and prioritisation, security teams for risk assessment, and records management for retention policies.
Data discovery is not a one-time project. You should schedule regular discovery scans and update classification rules as new data types emerge. It is important to refine policies based on changing regulations and continuously improve accuracy and coverage.
If you are looking to improve your organisation's approach to data retention, archiving, and purging, here are some practical next steps to consider.
Before you can effectively manage your data, you need to know what you have and where it's located. Start by mapping your data landscape, focusing first on high-risk areas like personal information, financial records, and sensitive business data.
Based on your regulatory requirements and business needs, create a clear policy that specifies how long different types of data should be retained. Ensure this policy is documented, communicated to all stakeholders, and regularly reviewed.
Evaluate and implement tools that can help automate data discovery, classification, retention, archiving, and purging. Remember that technology alone is not enough; it must be supported by clear policies and processes.
Ensure that everyone in your organisation understands their role in data management. This includes not just IT and compliance teams but also business users who create and handle data daily.
Data management isn't a one-time project but an ongoing process. Schedule regular audits of your retention practices, and continuously refine your approach based on changing regulations, business needs, and technologies.
Looking ahead, several trends are likely to shape the future of data retention, archiving, and purging. AI and automation will play an increasingly important role in identifying sensitive data, applying retention policies, and making decisions about archiving and purging. This will help organisations manage growing data volumes more efficiently.
Privacy regulations will continue to evolve, with more jurisdictions implementing GDPR-like requirements for data minimisation and the right to be forgotten. Organisations will need to balance these requirements with other regulatory obligations to retain information.
Cloud-based solutions will become the norm for data management, offering scalability, cost-effectiveness, and advanced features. However, organisations will need to ensure these solutions meet their specific regulatory and security requirements.
Cross-border data management will grow more complex as different jurisdictions implement varying requirements. Organisations will need sophisticated approaches to navigate these differences while maintaining compliance.
Managing organisational data effectively is no longer optional; it is a business imperative. Organisations that implement robust approaches to data retention, archiving, and purging will not only ensure regulatory compliance but also reduce costs, mitigate risks, and extract more value from their information assets.
I hope this discussion has provided valuable insights and practical guidance for your own data management journey. As always, I encourage you to share your experiences, challenges, and successes with data retention. Your feedback helps shape future episodes of the Inspiring Tech Leaders podcast and ensures I continue to address the topics that matter most to you.
Well, that’s all for today. If you enjoyed this episode, don’t forget to subscribe, leave a review, and share it with your network. You can find more insights, show notes, and resources at www.inspiringtechleaders.com
Thanks again for listening, and until next time, stay curious, stay connected, and keep pushing the boundaries of what's possible in tech.