Basics of Marketing Data Cleaning & Quality

Posted November 12, 2024
BLOG COVER-Basics of Marketing Data Cleaning & Quality_@3x-100

Table of Contents

Data cleaning for better data quality may not be the sexiest topic in marketing operations. Still, it is essential to create reports that make sense from a numeric standpoint and to improve readability.

Data quality is also the key ingredient when:

  • Creating accurate segments
  • Spotting profiling trends and updating an ideal customer profile 
  • Maintaining a timeline of activities on a contact record
  • Calculating pipeline and bookings in the aggregate
  • And more!

 

This article covers how to think about data cleansing, the top fields we see prioritized in many B2B organizations, and how to prioritize which fields to tackle first.

What does “data cleaning” mean?

Data cleaning is taking raw or corrupted data and improving the quality through logic and normalization. Cleaning could mean scrubbing invalid characters from existing fields, creating a new field that categorizes your original field values into a manageable subset or grouping, or standardizing the format of a field (like analyzing a State/Province field and converting all values into a standard two-character abbreviation rather than the full name).

Data cleaning does more than make information more readable. Data cleaning can also prevent malicious attacks via form fills.

member sign in screen

A good example is Marketo. Platforms like Marketo don’t come with a character-type screening mechanism for form fields out of the box, meaning that an end-user could enter a value like “=UPLOAD(A1:Z5000): Source” (which is meaningless – this is an article on data cleaning, not hacking forms). Hackers can use this loophole to drop a formula into your database that gets activated once you export data to a program like Excel. 

Once the data is exported, your entire spreadsheet is uploaded to a remote server, and your lead data is stolen. Adding some code to your website to prevent form submissions with malicious characters or running form fills through a screening workflow in your marketing automation platform that strips out non-alphabetic characters is data cleaning and prevents these types of attacks.

we're sorry message when signing in

Data cleaning does not include deanonymization or creating a universal record ID. You can read more about the importance of these processes in our marketing data collection article.

What makes manually cleaning data challenging?

Manual data cleaning involves assigning someone to manually review records via a user interface or downloading data to a spreadsheet.

Manually reviewing data is tedious, and staying focused for very long is impossible (humans have the attention span of a goldfish, right?).

What makes manually cleaning data challenging_@3x-100

Spreadsheets can be helpful because of the ability to apply formulas and text delimiters, but you’re limited by your spreadsheet vendor’s processing capabilities and/or laptop’s capabilities. Too many sheets or tabs, unoptimized formulas, data volume, and formatting can all take a huge toll on processing time.

How does data cleaning impact data quality?

Data quality is often in the eye of the beholder. Standardized and normalized data make it easier for end users to run reports and feel comfortable with the output.

Let’s walk through a data hygiene issue that’s very common in CRM instances:

Let’s say your previous admin liked all State/Province values as the first three characters of any value and deleted anything else. Then you had another admin come in and remove the rule, which meant that marketing could upload lists with the abbreviated State/Province, and sales could use their data enrichment vendor’s default, which was the State/Province full name.

Salespeople often distrust marketing data and can jump to conclusions – whether the data deserves to be criticized. Let’s say one of your reps is trying to run a report that summarizes their target accounts by state, and they see the following in the State/Province column when trying to pull a report for Washington state:

  • WA
  • Was
  • Washington
  • Washington, D.C.
How does data cleaning impact data quality_@3x-100

“Was” could have been an abbreviation for Washington or Washington D.C. Further complicating matters, “WA” could represent Western Australia, Washington, or West Africa. While we know it isn’t too tricky to group values and apply a Country filter, the sales team now has grounds to question all data in your system – even though the abbreviations were technically correct due to rules put in place at the time of data entry.

How to prioritize which data to clean

We recommend prioritizing which data cleansing workflows to build first by:

  1. Degree of security risk
  2. Dependency for key metrics
  3. Degree of use across reports
  4. Readability

 

Fields included in forms we know historically have not gated special character submissions should be at the top of your list because of potential security vulnerabilities via formula injector attacks. We recommend stripping out all characters that do not belong in fields collected on your forms. 

For example, we would expect the following fields NOT to contain the values in this table:

Field Name
Characters to Eliminate
First Name
!@#$%^&*():’;”?/|\{[}]
Last Name
!@#$%^&*():’;”?/|\{[}]
Email
!#$%^&*():’;”?/|\{[}],
Title
!@#$%^&*():’;”?/|\{[}]
Phone Number
abcdfghijklmnopqrsuvwyz!@#$%^&*():’;”?/|\{[}]

There may be exceptions and searching for examples to review before applying a global update or processing your entire database through a data-cleansing workflow is always wise.

The next category to be prioritized is key metric driving fields. You should consider numeric values contained in fields, establishing rules for managing a NULL or blank value, and how related date fields are captured and stored.

The last priority should include fields that are relied upon for business workflows and common groupings and filters applied by business teams. You’ll want to work with your operations and finance peers to determine which fields are used by the rest of the business, but common fields by object are listed below:

Object
Field Label
Contact
Job Title, Job Role, Funnel Stage Dates, UTM Parameters
Account
Industry, Sub-Industry, Vertical, State, Country, Employee Range, Employee Count, Annual Revenue, Annual Revenue Range
Campaign Member
Member Status (by Campaign Type), Responded and Response Date (by Member Status and Campaign Type)

Other objects will have fields that need attention but may not be impacted by marketing operations workflows. Your peer group must determine who tackles which fields and why to establish a clear code of conduct to prevent overlapping or contradicting workflows.

A simplified example of data cleaning with a workflow

Many B2B SaaS companies sell by use case or a subset of benefits buyers most acutely feel in a particular industry. Some data vendors are better sources for particular industries than others, so having a wide range of data inputs isn’t uncommon.

As a best practice, create custom fields for your data vendor-specific information that will be used in commonly referenced fields. For example, you may have a primary Industry field that a group of custom fields feeds:

A simplified example of data cleaning with a workflow -A@3x-100

Having different custom fields will allow you to rank which fields are preferred based on the values in those fields. 

Let’s say that Vendor D is preferred for data around job title, email, and other contact data Vendor A is preferred for Manufacturing, and Vendor B wins out for technology and professional services industries. Our Industry field logic would look something like the following:

A simplified example of data cleaning with a workflow-b@3x-100

This logic workflow is also an opportunity to clean up information in these fields. For example, Vendor A may have the following classifications:

  • Software and services
  • Artificial intelligence
  • Cyber security
  • Data storage
  • Fintech
  • BioTech

 

These values could be aggregated into a single “Technology” Industry value by searching for text that contains “Tech” in your workflow (just make sure to check that “tech” as a phrase doesn’t appear in non-technical industries) and for the key phrases listed above. 

You may also run into scenarios where the SIC or NAICS codes are passed instead of the industry names. Use the same workflow to populate those values with the actual industry name to keep a consistent list of industries in your main Industry field.

A simplified example of data cleaning with a workflow-C@3x-100

Whether you have separate “source” fields for the different vendors or maintain a single Industry field, we recommend running the Industry field through your cleanup logic after downloading a backup. Then, you’ll want to run cleanup logic when any relevant fields are updated to maintain the Industry field going forward. 

In summary, when creating a data cleansing flow, you should: 

  1. Create new fields for different data vendors
  2. Create a backup file with Account ID and Industry values.
  3. Test your logic by looking for exceptions (for example, if you plan to assume that any industry with “tech” in the name should be filed under Technology, run a search to determine whether your assumption is true first)
  4. Run all existing Industry values through cleanup logic
  5. Create triggered Industry cleanup flow to maintain data hygiene going forward

 

Best of luck on your data quality journey!

View Our Other Thought Leadership