
Learnings from using Sitecore ADM

Let's try to understand how the ADM module works, its limitations and tactics for optimising its performance.

Anna Bastron

Sitecore MVP

If you have worked with a Sitecore XP website that has been live for a long time, I am sure you are familiar with xDB cleanup requirements. I usually recommend agreeing on xDB data retention and setting up the cleanup process from day one, but this is not always possible. Some websites were developed on an older version of Sitecore that did not have built-in functionality for data removal, or the volume of xDB data was underestimated, or tracking was enabled later down the line without considering data retention.

This is a common issue and the exact reason why the Sitecore Analytics Database Manager (ADM) module exists. It lets you view xDB record statistics and remove collection data (raw analytical data collected by Sitecore XP) via the Sitecore UI while preserving data integrity.

However, when I tried to use ADM for the first time on Sitecore 9 and Sitecore 10 websites, I faced a few challenges, particularly when dealing with large volumes of old data. In this article, we’ll explore these challenges and look at optimisation strategies for efficient data cleanup.

How Sitecore ADM works

High-level overview of Sitecore ADM contact processing

One thing that is critical to understand is that any ADM cleanup process consists of two key phases: task generation and task processing.

Task generation

During this phase, the module retrieves all contacts from the xDB regardless of the date range specified by the user. It begins processing the most recent contacts first and systematically works backward through the database. The module reads additional data associated with each contact, such as interactions and facets, to determine whether a record meets the deletion criteria. If a contact qualifies for deletion, its ID is stored in the [Tasks] table of the [ADM.Tasks] database for later task processing.

Task processing

In the second phase, the module processes the tasks generated in the previous step, deleting all records marked for removal.
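To make the two phases concrete, here is a minimal Python sketch of the flow described above. It is an illustration only, not the module's actual code: the `Contact` class, its `last_interaction` field and both function names are hypothetical, and a plain dictionary stands in for the xDB.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Contact:
    contact_id: int
    last_interaction: date  # hypothetical field used as the deletion criterion


def generate_tasks(contacts, range_start, range_end):
    """Phase 1: scan every contact, newest first, and queue the ones in range."""
    tasks = []
    examined = 0
    for contact in sorted(contacts, key=lambda c: c.last_interaction, reverse=True):
        examined += 1  # every contact is read, even outside the requested range
        if range_start <= contact.last_interaction <= range_end:
            tasks.append(contact.contact_id)
    return tasks, examined


def process_tasks(store, tasks):
    """Phase 2: delete the records queued during task generation."""
    for contact_id in tasks:
        store.pop(contact_id, None)
    return len(tasks)


contacts = [
    Contact(1, date(2021, 3, 10)),
    Contact(2, date(2022, 5, 1)),
    Contact(3, date(2022, 11, 20)),
    Contact(4, date(2023, 7, 4)),
    Contact(5, date(2024, 1, 15)),
]
store = {c.contact_id: c for c in contacts}  # stand-in for the xDB

tasks, examined = generate_tasks(contacts, date(2022, 1, 1), date(2022, 12, 31))
deleted = process_tasks(store, tasks)
```

In this toy run only two contacts fall inside the 2022 range, yet all five are examined during task generation.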

While this approach seems robust at first, it comes with a few significant limitations, particularly around the performance and scalability of the task generation phase.

Challenges

1. Inefficient SQL queries

One of the main challenges is the way the ADM module works with the xDB Shard databases. Instead of relying on SQL indexes to efficiently filter contacts by date, the module retrieves all contacts and iterates through them, starting with the most recent. This results in slower data retrieval, especially when working with large datasets spanning several years.

For instance, even if you specify a short date range in 2022, the module will still need to process and check all contacts, including those from 2023-2024. This significantly increases the load on the shard databases, leading to higher data input/output operations and longer processing times.
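The difference between scanning every row and filtering through an index can be shown with a small, self-contained sketch. It uses Python's built-in SQLite rather than SQL Server, and the `contacts` table, its `last_modified` column and the index name are assumptions made for the example, not the real xDB Shard schema.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable steps SQLite plans to use for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the description


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts (id INTEGER PRIMARY KEY, last_modified TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO contacts (last_modified) VALUES (?)",
    [(f"{year}-06-15",) for year in (2021, 2022, 2023, 2024) for _ in range(250)],
)

RANGE_SQL = "SELECT id FROM contacts WHERE last_modified >= ? AND last_modified < ?"
params = ("2022-01-01", "2023-01-01")

# Without an index the only option is to read the whole table.
plan_without_index = query_plan(conn, RANGE_SQL, params)

# With an index on the date column the engine can seek straight to the range.
conn.execute("CREATE INDEX idx_contacts_last_modified ON contacts (last_modified)")
plan_with_index = query_plan(conn, RANGE_SQL, params)

matching = conn.execute(RANGE_SQL, params).fetchall()
```

Printing the two plans shows a table scan before the index exists and an index search afterwards, while the query returns the same 250 rows either way. The exact wording of the plan text varies between SQLite versions.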

2. Threading and timeout issues

By default, the ADM module operates with 4 threads and processes 1,000 contacts per batch. While these settings may be sufficient for smaller databases, you may want to tweak them if you have a larger database.

The module’s batching system divides the work between threads at the start of the process. If an SQL timeout occurs in one of the threads, the entire thread is aborted and the cleanup of its batch is not completed, so the whole process has to be restarted from scratch to pick up the remaining DB records. This can be frustrating, particularly with large datasets that take several hours to process.
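The effect of dividing the work up front can be simulated in a few lines of Python. This is a sketch of one possible reading of the behaviour described above, not the module's implementation: the `SimulatedSqlTimeout` exception, the helper names and the choice of which batch fails are all made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


class SimulatedSqlTimeout(Exception):
    """Stand-in for a database timeout; not a real ADM or SQL client exception."""


def split_evenly(items, parts):
    """Divide the work into contiguous slices once, before any thread starts."""
    size = -(-len(items) // parts)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_worker(worker_id, work, batch_size, failing_batch):
    """Process one slice in batches; a timeout abandons the rest of the slice."""
    done = []
    try:
        for batch_no, start in enumerate(range(0, len(work), batch_size)):
            if (worker_id, batch_no) == failing_batch:
                raise SimulatedSqlTimeout(f"worker {worker_id}, batch {batch_no}")
            done.extend(work[start:start + batch_size])
    except SimulatedSqlTimeout:
        pass  # the worker stops here; nothing reassigns its remaining batches
    return done


contact_ids = list(range(12_000))
slices = split_evenly(contact_ids, parts=4)  # 4 threads, as in the defaults

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [
        pool.submit(run_worker, worker_id, work, 1_000, (2, 1))
        for worker_id, work in enumerate(slices)
    ]
    processed = [cid for future in futures for cid in future.result()]

missed = sorted(set(contact_ids) - set(processed))
```

In this simulation a single timeout in the second batch of one worker leaves 2,000 of the 12,000 contacts unprocessed, because the other workers finish their own slices and never pick up the abandoned batches.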

3. Single task limit

Another challenge is that only one cleanup process can run at a time. Attempting to create a new task while another is running will terminate the existing process, which makes it critical to monitor and manage the process carefully.

4. Limited pause functionality

The task generation phase cannot be paused; pausing is supported only for task processing. For one website with a lot of xDB data to remove, there was an idea to run the cleanup in batches outside of business hours, but this turned out to be impossible: task generation cannot be paused, so it had to be done in one go.

Optimisation tactics

Here are some tricks that have helped me reduce processing times and get the best performance out of the cleanup:

1. Read documentation

I know this sounds obvious, but if you plan to use ADM and have not looked at the documentation that comes with it, do it now! It is quite technical and can answer many questions you already have.

2. Upscale to Premium tier

If your server or database resources are under strain, especially the Shards and ProcessingPools databases, you can try temporarily upscaling them for the cleanup. Premium Azure SKUs are optimised for high data I/O, which is critical for the intensive data processing that ADM performs.

3. Adjust performance settings

There is some flexibility in tweaking the performance of the ADM module. You can increase the NumberOfThreads, RetrieveDataBatchSize and NumberOfConnectionRetries settings in the ADM configuration files (see section "6. Performance Tuning" in the ADM module documentation). This helps speed up task generation and processing, but should be balanced against the available server and database resources.

4. Measure and plan accordingly

Although tempting, splitting the cleanup process into smaller date ranges may not improve performance. The ADM module still reads all contacts on every run, so this strategy may not be the most efficient in some cases. Instead, let the process run to completion at least once and note the time taken, so you can better estimate and plan additional runs.

Also, if you plan to perform the cleanup in multiple runs, start with the recent date ranges because they will be accessed first during the task generation phase.
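As a rough planning aid, the cost of splitting the cleanup into several runs can be estimated with simple arithmetic. The model below is an assumption, not something taken from the ADM documentation: it supposes that every run repeats the full contact scan, while the deletion work itself stays the same in total. The hour figures are placeholders to be replaced with your own measurements.

```python
def estimate_total_hours(scan_hours, delete_hours_total, runs):
    """Rough model: each run repeats the full scan; deletion work is shared out."""
    return runs * scan_hours + delete_hours_total


# Placeholder figures, to be replaced with timings from a first complete run.
one_run = estimate_total_hours(scan_hours=6.0, delete_hours_total=4.0, runs=1)
four_runs = estimate_total_hours(scan_hours=6.0, delete_hours_total=4.0, runs=4)
```

With these placeholder numbers, one run takes 10 hours in total and four runs take 28, because the scan is paid for four times.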

5. Consider direct SQL queries

If you have an older version of Sitecore, or if you struggle to run the ADM process on your xDB Shards even after tweaking the performance settings, there is an alternative approach: running direct SQL queries to clean up data (you can find a few useful scripts here). However, this method requires an understanding of SQL scripts and the xDB table structure, so do this only if you have backed up your Shard DBs and are confident in your skills.
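To give a feel for what such a cleanup involves, here is a generic sketch of deleting old records in small batches, removing dependent rows before their parent rows. It is not one of the scripts mentioned above and does not use the real xDB schema: the `contacts` and `interactions` tables, their columns and the `delete_old_contacts` helper are hypothetical, and it runs against an in-memory SQLite database rather than SQL Server.

```python
import sqlite3


def delete_old_contacts(conn, cutoff, batch_size):
    """Delete contacts last seen before `cutoff`, one small transaction per batch."""
    total = 0
    while True:
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM contacts WHERE last_seen < ? LIMIT ?",
            (cutoff, batch_size),
        )]
        if not ids:
            return total
        marks = ",".join("?" * len(ids))
        with conn:  # commits on success, rolls back if either delete fails
            # Child rows go first so no interaction is left without a contact.
            conn.execute(
                f"DELETE FROM interactions WHERE contact_id IN ({marks})", ids
            )
            conn.execute(f"DELETE FROM contacts WHERE id IN ({marks})", ids)
        total += len(ids)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, last_seen TEXT NOT NULL);
    CREATE TABLE interactions (id INTEGER PRIMARY KEY, contact_id INTEGER NOT NULL);
""")
conn.executemany(
    "INSERT INTO contacts (id, last_seen) VALUES (?, ?)",
    [(i, "2021-06-01" if i <= 7 else "2024-06-01") for i in range(1, 11)],
)
conn.executemany(
    "INSERT INTO interactions (contact_id) VALUES (?)",
    [(i,) for i in range(1, 11) for _ in range(2)],
)
conn.commit()

removed = delete_old_contacts(conn, cutoff="2022-01-01", batch_size=3)
remaining_contacts = conn.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]
remaining_interactions = conn.execute("SELECT COUNT(*) FROM interactions").fetchone()[0]
```

Keeping each batch in its own short transaction limits how long locks are held and how much work is lost if one statement fails. The real Shard databases contain more related tables than this toy schema, so any real script needs to account for all of them.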

Conclusion

Sitecore xDB cleanup is not always simple, and the ADM module can be a useful tool for managing and cleaning up contact and interaction data. I hope this article has helped you understand how the module works, its limitations and tactics for optimising its performance.
