Implementing a Basic Data Quality Program in Salesforce

TL;DR – Implementing a basic data quality program for Salesforce is straightforward, and a simple monitoring solution built in Salesforce can provide big benefits.

Why Talk About Data Quality?

Maintaining relevant, high-quality data is necessary to achieve successful adoption of Salesforce, and to realize the potential value of the application.

Salesforce success depends on creating a “flywheel effect” or “virtuous cycle” whereby users find valuable, useful information and insights in the application, which help them be more productive and effective. This motivates users to put more information into the application and to expand its usage to more scenarios, which in turn makes it more useful and valuable to users.

Poor-quality data reverses this flywheel: it makes the application harder to use and leads users to mistrust its information. That gives them an incentive to use the application less, which makes it less valuable and less useful, which in turn reduces usage further. The end result can be an abandoned application that has failed to deliver value to the business. In addition, poor data quality makes all of the supporting processes surrounding Salesforce (reporting, integration, business intelligence) more complicated, difficult, and expensive.

One Way to Think About Data Quality

Data quality is a broad term, and it can be difficult for organizations to catalog and quantify their data issues in a way that is concrete and actionable. It is helpful to think about the quality of data within a framework of six dimensions laid out by the DAMA UK organization in its paper, The Six Primary Dimensions for Data Quality Assessment. The six dimensions are:

Completeness

This is the state of having data for a given record. Do we have middle names on our customer records? For how many records do we have email addresses?

Uniqueness

What data should be unique, and is it? Do we have multiple customer records that represent the same real-life customer?

Timeliness

How soon after an event occurs is the data describing it available? If a lead visits your website and fills out a form, how long does it take for that data to be in your marketing database and available?

Validity

Does all of the data conform to the correct definition and format? What is the correct format for storing telephone numbers? Is the country where New York is located represented as “United States”, “US”, “U.S.A.”, or “America”?

Accuracy

Does our data accurately represent reality? Our contact record says that Jane Doe’s title is “CEO” – is it? Note that accuracy decays over time.

Consistency

Is our data represented the same way everywhere? Are the rules used to classify a Lead as “suspect” in our Salesforce system the same as in our marketing automation system and our BI platform?

Defining Salesforce Data Quality for Your Organization

Now that we have a way to talk about data quality and to describe data quality issues, you can start assessing the quality of your Salesforce data. One way to do this is to engage your stakeholders to create a definition of quality for each key object in Salesforce. You will end up with a list of “tests” that each record must pass to be considered “high quality”.

Let’s look at an example, for the Lead object. Your team might define a high-quality Lead as passing the following tests:

  1. First Name must be complete
  2. Last Name must be complete
  3. Company must be complete
  4. Address must be complete
  5. Email or Phone must be complete
  6. Title must be complete
  7. Email address, if provided, must be unique among all Leads
  8. Mobile Phone, if provided, must be unique among all Leads
  9. Phone numbers must be valid numbers, formatted according to the E.164 standard

So for the Lead, the team has defined nine tests across three of the data quality dimensions (completeness, uniqueness, and validity) to assess whether a Lead is high-quality. This process can be repeated to define quality for each Salesforce object.
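To make the idea of a “test” concrete, here is a minimal Python sketch of the tests that can be checked from a single Lead record (tests 1-6 and 9). It is an illustration only, not Salesforce code. The dictionary keys mirror standard Lead field names, while the function names, the test names, and the rule that a “complete” address needs street, city, postal code, and country are all assumptions made for this example. The two uniqueness tests need to look at every Lead at once, so they are left out here.

```python
import re

# E.164: a leading "+", then at most 15 digits, the first of which is non-zero.
E164_PATTERN = re.compile(r"^\+[1-9]\d{1,14}$")

# Assumption: "Address must be complete" means all of these parts are filled in.
ADDRESS_PARTS = ("Street", "City", "PostalCode", "Country")


def is_filled(lead, field):
    """Return True when the field holds a non-blank value."""
    value = lead.get(field)
    return value is not None and str(value).strip() != ""


def per_record_tests(lead):
    """Evaluate the tests that need only the record itself (tests 1-6 and 9)."""
    phones = [
        str(lead[f]).strip()
        for f in ("Phone", "MobilePhone")
        if is_filled(lead, f)
    ]
    return {
        "first_name_complete": is_filled(lead, "FirstName"),
        "last_name_complete": is_filled(lead, "LastName"),
        "company_complete": is_filled(lead, "Company"),
        "address_complete": all(is_filled(lead, p) for p in ADDRESS_PARTS),
        "email_or_phone_complete": is_filled(lead, "Email") or is_filled(lead, "Phone"),
        "title_complete": is_filled(lead, "Title"),
        # A lead with no phone numbers passes here; test 5 covers presence.
        "phones_valid_e164": all(E164_PATTERN.match(p) is not None for p in phones),
    }


example_lead = {
    "FirstName": "Jane", "LastName": "Doe", "Company": "Acme",
    "Street": "1 Main St", "City": "New York", "PostalCode": "10001",
    "Country": "US", "Email": "jane@example.com",
    "Phone": "+12125550123", "Title": "",
}
results = per_record_tests(example_lead)
failed = [name for name, passed in results.items() if not passed]
print(failed)  # ['title_complete']
```

Writing each test as a named yes/no check keeps the definition easy to review with stakeholders, and makes it simple to add or retire tests as the definition changes.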

A Basic Data Quality Monitoring Solution for Salesforce

Once the definition of data quality is complete, it is a fairly straightforward exercise to implement a monitoring solution in Salesforce, based loosely on Salesforce’s own published solution, “Data Quality Analysis Dashboards” on the AppExchange. Begin by creating a custom field of type Percent for each “test” on the relevant Salesforce object. A value of “100” indicates a pass and a value of “0” indicates a failure. For completeness tests, simple formula fields can be used. For uniqueness and validity tests, you will likely need to implement a scheduled Apex job to evaluate records.
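Uniqueness is the reason a batch job is needed: a record cannot know on its own whether another record shares its email address. In Salesforce this logic would live in scheduled Apex; the Python sketch below only illustrates one way the comparison could work. The function names, the case-insensitive matching, and the sample records are assumptions for this example.

```python
from collections import Counter


def normalize(value):
    """Compare values case-insensitively and ignore surrounding whitespace."""
    return value.strip().lower() if value else ""


def uniqueness_scores(leads, field):
    """Return {lead Id: 100 or 0} for an "if provided, must be unique" test."""
    counts = Counter(normalize(lead.get(field)) for lead in leads)
    scores = {}
    for lead in leads:
        key = normalize(lead.get(field))
        # A blank value passes: the test only applies when a value is provided.
        scores[lead["Id"]] = 100 if key == "" or counts[key] == 1 else 0
    return scores


sample_leads = [
    {"Id": "a", "Email": "Jane@Example.com"},
    {"Id": "b", "Email": "jane@example.com "},
    {"Id": "c", "Email": None},
    {"Id": "d", "Email": "sam@example.com"},
]
email_scores = uniqueness_scores(sample_leads, "Email")
print(email_scores)  # {'a': 0, 'b': 0, 'c': 100, 'd': 100}
```

The job would then write each score back to the matching percent field, so both records in a duplicate pair show a failure until one of them is merged or corrected.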

Once the fields representing tests are in place and being populated, you can surface the results in a couple of ways.

On Page Layouts

Showing the results on the record page helps keep data quality top of mind with your users.

A Score field shows the percentage of tests this record passes – 100% would indicate a high-quality record. A Failed Rules field lists the tests that this particular record fails – and tells the user what they can do to raise this record’s quality.
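As a rough sketch of how those two fields relate to the individual test fields, the Python below derives a score and a failed-rules message from a set of 100/0 test results. The test names, the hint wording, and the `summarize` function are hypothetical; in Salesforce the same result could come from formula fields or from the scheduled job.

```python
def summarize(test_results, hints):
    """Return (score, failed_rules) for one record.

    test_results maps a test name to 100 (pass) or 0 (fail).
    hints maps a test name to a short instruction for the user.
    """
    if not test_results:
        return 0.0, ""
    score = round(sum(test_results.values()) / len(test_results), 1)
    failed = [
        hints.get(name, name)  # fall back to the test name if no hint exists
        for name, value in test_results.items()
        if value == 0
    ]
    return score, "; ".join(failed)


lead_results = {
    "first_name_complete": 100, "last_name_complete": 100,
    "company_complete": 100, "address_complete": 100,
    "email_or_phone_complete": 100, "title_complete": 0,
    "email_unique": 100, "mobile_unique": 100, "phones_valid_e164": 0,
}
lead_hints = {
    "title_complete": "Add a Title",
    "phones_valid_e164": "Reformat phone numbers as E.164 (e.g. +12125550123)",
}
score, failed_rules = summarize(lead_results, lead_hints)
print(score)  # 77.8
print(failed_rules)
```

Phrasing each failed rule as an instruction rather than an error gives the user a clear next step for raising the record’s score.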

Via Dashboards

A dashboard can show an average score for the Account records, as well as the percentage of records passing each test. You can slice the data by user, department, record type, or geography to get a better handle on your data issues.
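Salesforce reports and dashboards do this aggregation for you, but the arithmetic is simple enough to sketch in Python. The record layout, the `Owner` slice field, and the function name below are assumptions for illustration only.

```python
from collections import defaultdict


def dashboard_rows(records, test_names, slice_by):
    """Group records by a slice field and compute pass rates and average score."""
    groups = defaultdict(list)
    for record in records:
        groups[record[slice_by]].append(record)

    rows = {}
    for key, members in groups.items():
        # Each test field holds 100 or 0, so its mean is the % of records passing.
        pass_rates = {
            test: sum(m[test] for m in members) / len(members)
            for test in test_names
        }
        # Every record has the same tests, so the mean of the pass rates
        # equals the mean of the per-record scores.
        average_score = sum(pass_rates.values()) / len(test_names)
        rows[key] = {
            "records": len(members),
            "average_score": average_score,
            "pass_rates": pass_rates,
        }
    return rows


accounts = [
    {"Owner": "alice", "email_unique": 100, "title_complete": 0},
    {"Owner": "alice", "email_unique": 100, "title_complete": 100},
    {"Owner": "bob", "email_unique": 0, "title_complete": 0},
]
rows = dashboard_rows(accounts, ["email_unique", "title_complete"], "Owner")
print(rows["alice"]["average_score"])  # 75.0
```

Changing the `slice_by` argument to a department, record type, or geography field gives the other views described above.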

You can also use Salesforce’s Reporting Snapshots feature to monitor data quality trends over time.

Data Quality Management

Where to go from here? How do you make data quality management a more central part of how you manage your Salesforce environment? A few ideas:

  1. With this basic monitoring solution in place, you can see where your biggest problems are: where the gap between the definition of quality and the state of your data is widest. Identify and execute remediation strategies to bring your data into line, perhaps with third-party enrichment tools or a deduplication exercise. You’ll be able to monitor the results of your efforts and know that you are moving the needle in the right direction.
  2. Stay close to your business stakeholders to understand how the definition of data quality may need to change in the future. Perhaps your organization is planning a direct mail campaign, and suddenly high-quality mailing address information is critical. Or perhaps it is considering opening a branch office, and lead data quality in that geography suddenly becomes more important. Your definition of data quality needs to change in response to the business and provide what it needs.
  3. Use a data quality lens to evaluate any major change to your Salesforce environment, such as a new integration or a new team onboarding: how is the change likely to affect data quality?

I hope this post has been thought-provoking and has provided a useful path for approaching the problem of data quality in Salesforce.