deterministic/probabilistic data

Deterministic and probabilistic are contrasting terms that describe customer data and how it is collected. Deterministic data, which is often collected as first-party data, is information that is known to be true; it is based on unique identifiers that match one user to one dataset. Examples include email addresses, phone numbers, credit card numbers, usernames and customer IDs. Probabilistic data is information inferred from relational patterns and the likelihood of a certain outcome. A familiar example of probabilistic reasoning is weather forecasting, where a predicted value is based on past conditions and probability.

While deterministic data is consistent and more accurate, it can be hard to scale because it exists only for users who have identified themselves. Probabilistic data can solve the issue of scalability, but it is less precise. Therefore, most data management and marketing professionals combine both types of data to get the most valuable insights.

How deterministic and probabilistic data are collected

Deterministic and probabilistic data are collected in two different ways:

  • Deterministic data is typically collected when users enter their own information, such as when signing up for a service. Common channels for collecting deterministic data include online surveys, social media platforms, point-of-sale (POS) software and newsletters. Companies can then hash or encrypt the personally identifiable information (PII) and use deterministic matching, an exact comparison of those identifiers, to recognize the same profile at future logins.
  • Probabilistic data is usually collected anonymously from a user’s browsing behavior, for example through browser cookies or click tracking. The information is then aggregated to create a model of a customer, which can be compared to deterministic data points. Probabilistic matching occurs when a user’s behavioral data is judged likely to belong to a registered, known user. It can also be used in identity resolution to recognize the same user across multiple devices and applications.
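
The two matching styles described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production approach: the profile table, the signal names (`ip_prefix`, `user_agent`, `timezone`), the point values and the threshold are all hypothetical. Hashing with SHA-256 is used here as one simple way to avoid storing the raw email address.

```python
import hashlib

def normalize_email(email: str) -> str:
    # Trim whitespace and lowercase so the same address always hashes the same way.
    return email.strip().lower()

def hash_identifier(value: str) -> str:
    # One-way hash of the identifier; the raw PII is not stored in the lookup table.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# Hypothetical table of known (registered) profiles, keyed by hashed email.
known_profiles = {
    hash_identifier(normalize_email("ana@example.com")): {
        "customer_id": "cust-001",
        "signals": {"ip_prefix": "203.0.113", "user_agent": "Firefox/Linux",
                    "timezone": "Europe/Lisbon"},
    },
    hash_identifier(normalize_email("bo@example.com")): {
        "customer_id": "cust-002",
        "signals": {"ip_prefix": "198.51.100", "user_agent": "Safari/iOS",
                    "timezone": "America/Chicago"},
    },
}

def deterministic_match(email: str):
    # Exact match on a unique identifier: either the profile exists or it does not.
    profile = known_profiles.get(hash_identifier(normalize_email(email)))
    return profile["customer_id"] if profile else None

# Hypothetical points awarded when an anonymous signal equals a profile's signal.
SIGNAL_POINTS = {"ip_prefix": 50, "user_agent": 30, "timezone": 20}

def probabilistic_match(session_signals: dict, threshold: int = 70):
    # Score every known profile and keep the best one if it clears the threshold.
    best_id, best_score = None, 0
    for profile in known_profiles.values():
        score = sum(
            points
            for name, points in SIGNAL_POINTS.items()
            if name in session_signals
            and session_signals[name] == profile["signals"].get(name)
        )
        if score > best_score:
            best_id, best_score = profile["customer_id"], score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score

print(deterministic_match("  Ana@Example.com "))
print(probabilistic_match({"ip_prefix": "203.0.113", "user_agent": "Firefox/Linux"}))
```

The deterministic lookup returns a definite answer, while the probabilistic one returns a best guess together with a score, which is why the threshold matters: raising it trades reach for precision.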

Choosing between deterministic and probabilistic data

Deciding which data approach is best depends on the underlying business goal. If the goal is to identify actual buyers of a product for marketing or outreach purposes, deterministic data is the best option. However, if the goal is to convert new customers who may be interested in the product, probabilistic data can help.

Most data management processes use both methods together. More specifically, probabilistic data can be used to add value to deterministic data. One way is to use probabilistic data to extend the reach of a deterministic dataset: when something is unknown in the deterministic dataset, probabilistic data can give companies their best estimate. Another way is to use probabilistic data to learn more about the deterministic data, for example by finding out which known customers might be interested in other products or by understanding their preferred browsing behavior.
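
As a rough sketch of filling a gap in a deterministic dataset with a best estimate, the Python below infers a missing attribute from customers with similar behavior. The field names (`top_category`, `segment`) and the records are hypothetical, and the "most common value among peers" rule is only one simple way to produce an estimate. The inferred value is labeled with its source and a confidence so that it is never mistaken for declared data.

```python
from collections import Counter

# Hypothetical customer records. "segment" was declared by some customers
# (deterministic) and is unknown (None) for others.
customers = [
    {"id": "c1", "top_category": "running", "segment": "athlete"},
    {"id": "c2", "top_category": "running", "segment": "athlete"},
    {"id": "c3", "top_category": "running", "segment": "casual"},
    {"id": "c4", "top_category": "hiking", "segment": "outdoor"},
    {"id": "c5", "top_category": "running", "segment": None},
]

def infer_segment(target: dict, population: list):
    # Look at customers with the same browsing behavior and a known segment.
    peers = [
        c["segment"]
        for c in population
        if c["segment"] is not None and c["top_category"] == target["top_category"]
    ]
    if not peers:
        return None, 0.0
    value, count = Counter(peers).most_common(1)[0]
    # Confidence is the share of peers that have the most common segment.
    return value, count / len(peers)

def enrich(record: dict, population: list) -> dict:
    # Keep declared values as-is; only fill gaps, and mark them as inferred.
    if record["segment"] is not None:
        return {**record, "segment_source": "deterministic", "segment_confidence": 1.0}
    value, confidence = infer_segment(record, population)
    return {**record, "segment": value, "segment_source": "probabilistic",
            "segment_confidence": confidence}

enriched = [enrich(c, customers) for c in customers]
print(enriched[-1])
```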

Deterministic data can also be used to train probabilistic models. Once a probabilistic model is created, its output can be compared to the known deterministic data for validation. Without a solid foundation of deterministic data, the probabilistic model cannot be precise.
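
One common way to run that comparison is to treat deterministic matches as ground truth and measure the precision and recall of the probabilistic matches. The sketch below assumes hypothetical device pairs; the choice of precision and recall as the validation metrics is an illustration, not something prescribed here.

```python
def pair(a: str, b: str) -> frozenset:
    # Unordered pair, so (phone, laptop) and (laptop, phone) count as the same link.
    return frozenset((a, b))

def precision_recall(predicted: set, truth: set):
    # True positives are predicted links that the deterministic data confirms.
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical ground truth: device pairs confirmed by a login with the same account.
deterministic_links = {
    pair("phone-1", "laptop-1"),
    pair("phone-2", "laptop-2"),
    pair("phone-3", "laptop-3"),
    pair("tablet-4", "laptop-4"),
    pair("phone-5", "laptop-5"),
}

# Hypothetical output of a probabilistic model for the same population.
probabilistic_links = {
    pair("laptop-1", "phone-1"),
    pair("phone-2", "laptop-2"),
    pair("phone-3", "laptop-9"),   # wrong guess
    pair("tablet-4", "laptop-4"),
}

precision, recall = precision_recall(probabilistic_links, deterministic_links)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.60
```

Low precision means the model links users who are actually different people; low recall means it misses links that the deterministic data shows are real.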

Applications of deterministic and probabilistic data

When combined, deterministic and probabilistic data can be used for:

  • Properly executing cross-device tracking and attribution.
  • Validating the success of marketing campaigns toward new audiences.
  • Enhancing deterministic data with probabilistic information, such as profiling multiple family members that share the same account.
  • Creating buyer personas that can be used for customer segmentation.
  • Launching programmatic buying campaigns, such as making product suggestions.
  • Charting customer profiles in an accurate, real-time identity graph.
  • Expanding the reach of advertising across various audiences.
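
To make the cross-device and identity graph items above more concrete, here is a minimal sketch of an identity graph built with a union-find structure. It assumes that deterministic links (such as a login) are always merged, while probabilistic links are merged only above a confidence threshold. The class name, node labels, scores and threshold are all hypothetical.

```python
class IdentityGraph:
    """Groups identifiers that are believed to belong to the same person."""

    def __init__(self):
        self.parent = {}

    def find(self, node: str) -> str:
        # Register unseen nodes as their own group, then walk up to the group root.
        self.parent.setdefault(node, node)
        while self.parent[node] != node:
            self.parent[node] = self.parent[self.parent[node]]  # path halving
            node = self.parent[node]
        return node

    def link(self, a: str, b: str) -> None:
        # Merge the groups that contain a and b.
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def same_person(self, a: str, b: str) -> bool:
        return self.find(a) == self.find(b)

graph = IdentityGraph()

# Deterministic links: the same account logged in on both devices.
graph.link("email:ana", "device:phone-1")
graph.link("email:ana", "device:laptop-1")

# Probabilistic links: (node, node, model confidence). Values are made up.
probabilistic_links = [
    ("device:laptop-1", "device:tv-1", 0.92),
    ("device:phone-1", "device:tablet-7", 0.41),
]
CONFIDENCE_THRESHOLD = 0.8  # hypothetical cutoff

for a, b, score in probabilistic_links:
    if score >= CONFIDENCE_THRESHOLD:
        graph.link(a, b)

print(graph.same_person("device:phone-1", "device:tv-1"))      # True
print(graph.same_person("device:phone-1", "device:tablet-7"))  # False
```

Because merges in this structure cannot be undone, a real system would likely keep the source and score of each link so that a weak probabilistic merge can be revisited later.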


This was last updated in August 2019
