Recently, EverString’s Chief Operating Officer Amit Rai hosted an hour-long enterprise data roundtable, where a group of experts discussed how the quality and trustworthiness of business data can make a significant impact on team alignment and performance. Below is a recap of the cohort’s discussion, which includes real-world challenges, strategies to consider, and what’s on their data buying wishlist for 2021.
Data Discussions To Make A Difference
In an effort to support recovery from the recent California wildfires and the hurricanes in the Southeast, we’re donating $100 for each participant in our Roundtable Discussion Series to the American Red Cross.
Meet Our Data Panelists
- Asha Mahesh, Director of Data Science at Janssen Pharmaceuticals (a division of Johnson & Johnson), which focuses on the research and development of pharmaceutical products, from product discovery through late-stage clinical development. Often working around the clock, Asha and her team have been involved in the rapid development of an investigational COVID-19 vaccine candidate.
- Manjula Mahajan, Data & Analytics Leader at NetApp, supporting all business units from marketing to support. Responsible for all analytics and data, she supports all of the cross-functional data science teams within NetApp. On a personal note, Manjula has appreciated the extra quality time with her family, including her teenage children.
- Hayelom Tadesse, formerly the leader of Enterprise Data Platform Strategy at Iron Mountain and recently promoted to Sr. Director of CRM Platform Strategy. Personally, Hayelom has enjoyed exploring more of the outdoors with his family, noting that the shelter-in-place orders have given them a reason to visit new spots nearby.
Challenges & Impacts Of Managing Multiple Data Sources
Some challenges of big data management are nearly universal. The biggest ones discussed during our Roundtable centered on consolidating multiple data sources and building trust in the quality of external data through data cleansing, entity matching, de-duplication, customer hierarchy tracking, and governance.
Entity Matching & De-duplication
Given how much data moves within an organization, things get complex quickly, especially when it comes to merging and matching data sets with high accuracy. Manjula shared that because a data set often moves from place to place and platform to platform, it can be difficult to match records while still ensuring high quality.
For example, customer data may travel from the core system to the data warehouse, then into several other tools for data exploration, before reaching the data lake. This much movement causes problems across the board, such as discrepancies between reports, which can erode trust very quickly.
De-duplication is another major challenge when you combine multiple data sources, because duplicate records stand in the way of truly measuring the quality of your data. When records fail to match because company names differ slightly or addresses aren’t spelled exactly the same, these small discrepancies can grow into major problems.
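To make the "slightly different names and addresses" problem concrete, here is a minimal sketch of normalization plus fuzzy comparison using only Python’s standard library (`re` and `difflib.SequenceMatcher`). The suffix and abbreviation lists, the 0.85 similarity threshold, and the helper names (`normalize`, `is_probable_duplicate`, `dedupe`) are all hypothetical choices for illustration, not any panelist’s or vendor’s actual method.

```python
import re
from difflib import SequenceMatcher

# Hypothetical legal suffixes to ignore when comparing company names.
SUFFIXES = {"inc", "incorporated", "corp", "corporation", "llc", "ltd", "co"}

# Hypothetical abbreviations to expand so "St" and "Street" compare as equal.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "ste": "suite"}


def normalize(text: str, drop_suffixes: bool = False) -> str:
    """Lowercase, strip punctuation, expand abbreviations, optionally drop legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    if drop_suffixes:
        tokens = [t for t in tokens if t not in SUFFIXES]
    return " ".join(tokens)


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] computed by difflib."""
    return SequenceMatcher(None, a, b).ratio()


def is_probable_duplicate(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Flag two records as duplicates when both name and address are similar enough."""
    name_score = similarity(normalize(rec_a["name"], drop_suffixes=True),
                            normalize(rec_b["name"], drop_suffixes=True))
    addr_score = similarity(normalize(rec_a["address"]), normalize(rec_b["address"]))
    return name_score >= threshold and addr_score >= threshold


def dedupe(records: list[dict], threshold: float = 0.85) -> list[dict]:
    """Keep the first record from each group of probable duplicates (O(n^2) for clarity)."""
    kept: list[dict] = []
    for rec in records:
        if not any(is_probable_duplicate(rec, k, threshold) for k in kept):
            kept.append(rec)
    return kept


records = [
    {"name": "Acme Corp.", "address": "100 Main St, Springfield"},
    {"name": "ACME Corporation", "address": "100 Main Street, Springfield"},
    {"name": "Globex LLC", "address": "42 Elm Ave, Shelbyville"},
]
print([r["name"] for r in dedupe(records)])  # ['Acme Corp.', 'Globex LLC']
```

Normalizing before comparing does most of the work here: after lowercasing, dropping "Corp"/"Corporation", and expanding "St", the two Acme records become identical strings. The pairwise loop is quadratic, so larger data sets would typically add a blocking step (for example, comparing only records that share a postal code) before fuzzy scoring.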
Data scientists are hungry for data, but that data is often sensitive, especially in highly regulated markets. When data quality is low, any models using that data will be flawed from the start. Hayelom pointed out that when larger organizations go through acquisitions in particular, the challenges of managing multiple data sources multiply quickly. Having a proper strategy to handle those challenges is crucial for building confidence in the data.
Even after the applications are integrated, you are still managing multiple systems, hierarchies, and so on. To resolve this, Hayelom’s team uses its enterprise data platform to centralize and cleanse the data, and also focuses on optimizing data governance, including identifying the right data steward for each data set. You can build all the fancy layering and integration you want into your tech stack, but it still won’t solve the underlying challenge unless the data is governed effectively.
Data Volume, Variety & Accessibility
Asha noted that in life sciences data science, the variety of available data is a challenge, almost more so than its volume. Her industry needs to analyze so many different attributes and signals, many of them highly regulated, that knowing what you need is half the battle; computing it at scale is the other half. You must have high-quality data, and you need wide coverage too.
Accessibility and transparency pose hurdles as well. Asha mentioned that a common struggle is helping business users actually know which data sets are available to them. To help, Asha’s team focuses on curating metadata to improve accessibility and visibility. They have also invested in data cataloging tools, such as Alation, to help data scientists discover and request the data they need.
Effective Strategies To Improve Data Quality When Handling Multiple Data Sources
Implement a Single Source of Truth
You can have the most advanced data models in place, but if you put bad data in, you will get bad results out, no matter how strong the models themselves are. To address data quality, leaders are exploring new solutions that help them establish a single source of high-quality data from which the whole business and its teams can operate.
Hayelom shared that aggregating data in a single, centralized place within his team’s enterprise data platform, where all of it can be transformed on a single layer, has helped a great deal in handling business data meaningfully.
The platform approach is helping Manjula too. By bringing external data sets together with internal data, such as customer records, and building a single data platform that each business unit can pull from, Manjula helps stakeholders see a full, 360-degree view of each account.
Establish a Data Stewardship Framework
For Hayelom, having a formal data governance model in place, including clear definitions of data stewardship, was one of the biggest wins. This structure helped the team identify the right stakeholders from the beginning.
Then, once the data was in the lake, Hayelom and the team invested in technology solutions that enable governance controls, access management, and business projects.
Virtualize & Prioritize Your Data
Merging internal and external data sources is a challenge all around, and the big data lake approach can often produce very low-reliability data. Instead of building a massive data lake, Manjula takes a methodical approach, focusing on one data set at a time and prioritizing based on business impact and objectives.
Rather than physically moving data into separate stores, Manjula delivers integrated data sets virtually, as “views,” making the appropriate data available to the right business teams.
This pragmatic approach makes data discoverable through a catalog, enabling individual business units to take third-party data and enrich it based on their specific business needs.
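The idea behind a virtual "view" can be sketched with a SQL view, which stores only a query rather than a copy of the data. The example below uses Python’s built-in `sqlite3` module as a single-database stand-in; real data virtualization platforms federate queries across separate systems, which this sketch does not attempt. The table names (`crm_accounts`, `vendor_firmographics`), the view name, and the join on web domain are all hypothetical.

```python
import sqlite3

# In-memory database standing in for two hypothetical source systems.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Internal customer records (e.g. from a CRM): hypothetical schema.
cur.execute("CREATE TABLE crm_accounts (account_id INTEGER PRIMARY KEY, name TEXT, domain TEXT)")
# Third-party firmographic data keyed by web domain: hypothetical schema.
cur.execute("CREATE TABLE vendor_firmographics (domain TEXT PRIMARY KEY, industry TEXT, employees INTEGER)")

cur.executemany("INSERT INTO crm_accounts VALUES (?, ?, ?)", [
    (1, "Acme Corp", "acme.com"),
    (2, "Globex", "globex.com"),
])
cur.executemany("INSERT INTO vendor_firmographics VALUES (?, ?, ?)", [
    ("acme.com", "Manufacturing", 1200),
])

# A view stores only the query, not a copy of the data: every read joins the
# current contents of both tables, so nothing is physically moved.
cur.execute("""
    CREATE VIEW marketing_accounts AS
    SELECT a.account_id, a.name, f.industry, f.employees
    FROM crm_accounts AS a
    LEFT JOIN vendor_firmographics AS f ON f.domain = a.domain
""")

rows = cur.execute("SELECT * FROM marketing_accounts ORDER BY account_id").fetchall()
print(rows)  # [(1, 'Acme Corp', 'Manufacturing', 1200), (2, 'Globex', None, None)]

# Changes to a source table are immediately visible through the view.
cur.execute("INSERT INTO vendor_firmographics VALUES ('globex.com', 'Energy', 300)")
print(cur.execute("SELECT industry FROM marketing_accounts WHERE account_id = 2").fetchone())  # ('Energy',)
```

The `LEFT JOIN` keeps every internal account even when no third-party record exists yet, so a business unit consuming the view sees enrichment gaps as empty values rather than missing accounts.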
Approaches To Customer Data Platforms & Stitching Multiple Data Sources Together
To stitch data sets together across multiple sources, you need a common entity to tie back to, whether it’s a company name or an address. Many teams have found that traditional data vendors approach this matching process with fuzzy methods that produce considerably lower-quality data and frequent mismatches.
Instead, advanced data providers are leaning on the same kind of technology that has helped Google and Amazon succeed. This technology uses AI and machine learning to gather and assess digital footprints in real time, producing far clearer and more precise matches.
Manjula shared that while account data is important for B2B companies, it isn’t enough on its own; contacts must be enriched and verified too. To address this, she created a role dedicated to contact data management, including the all-important work of contact verification. Using external APIs and built-in fuzzy logic, her team brings in contacts from different data sources, merges them, and creates a master record for each customer account.
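Merging contacts into a master record usually involves two steps: deciding which records refer to the same person, then deciding which value "survives" for each field. The sketch below assumes a simple exact match on account plus normalized email and a hypothetical survivorship rule (newest non-empty value wins). It does not call any verification API, and a production pipeline would likely layer fuzzy name matching on top of the key. All field names and sample records are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical contact records for one account, pulled from three sources.
contacts = [
    {"source": "crm", "account": "acme.com", "email": "Jane.Doe@acme.com",
     "name": "Jane Doe", "title": "", "updated": date(2020, 3, 1)},
    {"source": "marketing", "account": "acme.com", "email": "jane.doe@acme.com",
     "name": "Jane M. Doe", "title": "VP Operations", "updated": date(2020, 8, 15)},
    {"source": "vendor", "account": "acme.com", "email": "jane.doe@acme.com ",
     "name": "", "title": "Vice President, Operations", "updated": date(2020, 6, 30)},
    {"source": "crm", "account": "acme.com", "email": "bob@acme.com",
     "name": "Bob Lee", "title": "Buyer", "updated": date(2020, 1, 10)},
]

# Fields resolved by the survivorship rule.
FIELDS = ("name", "title")


def match_key(contact: dict) -> tuple[str, str]:
    """Match contacts on account plus a trimmed, lowercased email address."""
    return contact["account"], contact["email"].strip().lower()


def build_master_records(records: list[dict]) -> dict[tuple[str, str], dict]:
    """Merge each group of matching contacts into a single master record.

    Survivorship rule (an assumption for this sketch): for every field, keep
    the non-empty value from the most recently updated source record.
    """
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)

    masters = {}
    for key, group in groups.items():
        newest_first = sorted(group, key=lambda r: r["updated"], reverse=True)
        master = {"account": key[0], "email": key[1],
                  "sources": sorted({r["source"] for r in group})}
        for field in FIELDS:
            master[field] = next((r[field] for r in newest_first if r[field]), "")
        masters[key] = master
    return masters


for master in build_master_records(contacts).values():
    print(master)
```

Recording which sources contributed to each master record (the `sources` list) keeps some lineage, which makes it easier to trace a questionable value back to where it came from.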
Data about a customer can be gathered from the pre-sales process all the way through customer support. Capturing that data and feeding it back to the go-to-market team can result in even higher engagement and lead generation in future programs.
Asha shared that instead of a deterministic, ID-to-ID method, her team has taken a probabilistic approach: matching against a broader set of data and looking for patterns to determine which record each entry belongs to. Several AI and machine learning algorithms can help put this kind of pattern-matching technique in place.
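To illustrate the difference between deterministic and probabilistic matching, here is a sketch of the classic Fellegi-Sunter record-linkage model. It is a well-known probabilistic technique swapped in for illustration; the panel did not say which algorithms Asha’s team uses. Each field contributes a log-likelihood weight depending on whether it agrees, and the total score is compared against thresholds. The m/u probabilities, the 0.85 name-similarity cutoff, and the thresholds are made-up assumptions; in practice they would be estimated from labeled pairs or with an algorithm such as expectation-maximization.

```python
import math
from difflib import SequenceMatcher

# Hypothetical m/u probabilities per field (Fellegi-Sunter style):
#   m = P(field agrees | the records truly match)
#   u = P(field agrees | the records do not match)
FIELD_PARAMS = {
    "name":   {"m": 0.95, "u": 0.01},
    "domain": {"m": 0.90, "u": 0.001},
    "city":   {"m": 0.85, "u": 0.05},
    "phone":  {"m": 0.70, "u": 0.0001},
}


def field_agrees(field: str, x: str, y: str) -> bool:
    """Exact comparison after trimming/lowercasing; fuzzy comparison for names."""
    x, y = x.strip().lower(), y.strip().lower()
    if field == "name":
        return SequenceMatcher(None, x, y).ratio() >= 0.85  # hypothetical cutoff
    return x == y


def match_score(a: dict, b: dict) -> float:
    """Sum of log2 likelihood-ratio weights over fields populated in both records."""
    score = 0.0
    for field, p in FIELD_PARAMS.items():
        x, y = a.get(field, ""), b.get(field, "")
        if not x or not y:
            continue  # missing value: no evidence either way
        if field_agrees(field, x, y):
            score += math.log2(p["m"] / p["u"])
        else:
            score += math.log2((1 - p["m"]) / (1 - p["u"]))
    return score


def classify(score: float, upper: float = 10.0, lower: float = 0.0) -> str:
    """Map a score to match / review / non-match using hypothetical thresholds."""
    if score >= upper:
        return "match"
    if score <= lower:
        return "non-match"
    return "review"


crm = {"id": "CRM-001", "name": "Acme Corp", "domain": "acme.com",
       "city": "Springfield", "phone": ""}
vendor = {"id": "V-98765", "name": "ACME Corp.", "domain": "acme.com",
          "city": "Springfield", "phone": "555-0100"}
other = {"id": "V-12345", "name": "Globex", "domain": "globex.com",
         "city": "Springfield", "phone": ""}

# A deterministic ID-to-ID match finds nothing: the two systems use different IDs.
print(crm["id"] == vendor["id"])           # False
print(classify(match_score(crm, vendor)))  # match
print(classify(match_score(crm, other)))   # non-match
```

Rare agreements carry more evidence than common ones: sharing a web domain (low u) adds far more weight than sharing a city (higher u), which is why the Globex record still scores as a non-match despite the shared city. Pairs landing between the thresholds go to a "review" bucket for a data steward rather than being forced into a decision.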
Predictions For The State of Data in 2021
Many data challenges remain, and the demand for quality B2B data will only increase over time. When asked for their forecast on the state of data in 2021, our panelists said this:
- Manjula recommended that, to solve the challenges discussed, organizations look for ways to reduce the movement of data. When users do not trust the data, that is a foundational issue to address. Cataloging is another noteworthy trend, since it empowers users to own their individual data use cases, with your guidance, governance, and support.
- Asha said she wants more tools that help teams quickly experiment with data sets, gauge their usefulness against business goals and priorities, and then make a decision efficiently.
A Data Buyer’s Wishlist
In closing, we asked our guests to wave an imaginary wand and share what would be on their wishlist as a data buyer in the coming year. In other words, what data feature or capability would excite them most in a firmographic data provider? They had this to say:
- Manjula underscored that data quality is her top priority when considering a data purchase, especially once that data starts moving through the organization.
- Asha also prioritized data quality, along with openness and transparency about data lineage, to ensure the data was properly collected and distributed.
Consolidate your research, calculate your potential savings, compare vendors, and access checklists, templates, and tools to help you make the most informed external data purchases. Access the B2B Data Buyer’s Toolkit, a 4-step guide to making informed data decisions.