Ideal Datasets

For the ideal analysis experience with Max.ai, here are some things to consider about your dataset. 

Please note that Max will not function without a date column, at least one metric column, and clear column headers. Max currently only supports a single fact table.
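
As a rough pre-upload check, the sketch below tests a CSV export for the three minimum requirements listed above. It is not part of Max.ai. The function name `check_minimum_requirements`, the `YYYY-MM-DD` date format, and the "every value parses as a number" test for metrics are assumptions made for illustration.

```python
import csv
import io
from datetime import datetime


def check_minimum_requirements(csv_text, date_format="%Y-%m-%d"):
    """Return a list of problems; an empty list means the basics are present."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    rows = list(reader)
    problems = []

    if not headers or any(not h.strip() for h in headers):
        problems.append("missing or blank column header")
    if not rows:
        problems.append("no data rows")

    def every_value_parses(column, parser):
        # A column qualifies only if there is data and every value parses.
        if not rows:
            return False
        try:
            for row in rows:
                parser(row[column])
        except (ValueError, TypeError):
            return False
        return True

    date_cols = [h for h in headers
                 if every_value_parses(h, lambda v: datetime.strptime(v, date_format))]
    metric_cols = [h for h in headers
                   if h not in date_cols and every_value_parses(h, float)]

    if not date_cols:
        problems.append("no date column found")
    if not metric_cols:
        problems.append("no numeric metric column found")
    return problems


sample = "Order Date,State,Sales\n2023-01-01,TX,10.5\n2023-01-02,CA,20\n"
print(check_minimum_requirements(sample))
```

Note that this heuristic treats any all-numeric column (including a numeric ID) as a metric, so review the result rather than relying on it.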

The More Data, the Better


The more data you have, the more value you will get from Max. We recommend having the following present in your dataset:

  • At least 2 years of historical data. Two or more years of data gives Max.ai a stronger basis for forecasting metrics, which makes its predictions more accurate and more valuable.
  • The more metrics, the higher the value of analysis. Having several metrics in your data will add value to your analysis by allowing you to get a broader view of your data and compare performance across different metrics. 
  • More dimensions for additional viewpoints. Additional dimensions in your dataset give Max more ways to break out the analysis and present different views. If you upload a dataset of “Sales by State,” most of what Max can provide will be simple insights that you could probably have worked out yourself. If you upload a dataset of “Sales by State, Store Associate, and Product Category,” the insights will be more valuable and robust, and may surface information that you wouldn’t otherwise have discovered.
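
To see whether a dataset meets the 2-year recommendation, you can measure the span between its earliest and latest dates. The following is a minimal sketch, with a hypothetical helper name and sample dates, that uses 365.25 days as an approximate year length.

```python
from datetime import date


def history_span_years(dates):
    """Approximate number of years between the earliest and latest date."""
    return (max(dates) - min(dates)).days / 365.25


# Hypothetical order dates taken from a dataset's date column.
order_dates = [date(2021, 1, 4), date(2022, 6, 30), date(2023, 3, 15)]
span = history_span_years(order_dates)
print(f"{span:.1f} years of history; meets recommendation: {span >= 2}")
```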



Data Structure

Column headers are all that is required to upload your data, but headers that clearly describe the data in each column make it easier for Max to provide value. Because Max.ai does not perform any data mapping, all data definitions come from your column headers. Keep this in mind when evaluating the complexity of your data structure.

Examples of possible data structures, from best to worst:

  • Ideal: Category > subcategory > product (a simple hierarchy)
  • Workable: Category > subcategory OR segment > product 
  • Poor: Category > subcategory OR segment > product, where product can belong to multiple subcategories or segments
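
One way to spot the "poor" case is to look for products that appear under more than one subcategory. The sketch below assumes rows are dictionaries with `Product` and `Subcategory` keys; the function name and sample rows are illustrative only.

```python
from collections import defaultdict


def products_with_multiple_parents(rows, child="Product", parent="Subcategory"):
    """Map each child value to its parents, keeping only children with several."""
    parents = defaultdict(set)
    for row in rows:
        parents[row[child]].add(row[parent])
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}


rows = [
    {"Category": "Furniture", "Subcategory": "Chairs", "Product": "Stool"},
    {"Category": "Furniture", "Subcategory": "Tables", "Product": "Desk"},
    {"Category": "Furniture", "Subcategory": "Outdoor", "Product": "Stool"},
]
print(products_with_multiple_parents(rows))  # {'Stool': ['Chairs', 'Outdoor']}
```

An empty result suggests the product level rolls up cleanly into a single parent.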



Qualities of Ideal Data

An ideal dataset is consistent, granular, compatible, complete, and unique.

Consistent: The data needs to be consistent in format, structure, and labeling. For example, if you're aggregating sales data from different stores, all the stores should have the same types of sales data recorded in the same format.

In the example below, Order ID, Order Date, and Customer ID are all recorded in consistent formats, so Max can analyze the data without having to convert any items to match one another.

Consistent_Data.jpg

In this example, by contrast, two stores record their Order IDs, Customer IDs, and States in different formats. The Order IDs are still unique, so the format difference has no impact on Max's analysis. A customer who shops at both stores, however, would have two customer IDs, one in each format, and Max would not be able to recognize them as the same customer. Max would also be unable to aggregate metrics by state, because each store records state in a different format. (For example, Max would not combine sales from TX and Texas as being in the same state.)

Stores_Reporting_Differently.jpg
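
Inconsistent labels like these are best fixed before upload. The sketch below shows one way to normalize state values so that "TX" and "Texas" aggregate together. The mapping is deliberately partial and, like the function names, is an assumption made for illustration.

```python
from collections import defaultdict

# Partial, illustrative mapping; a real one would cover every state you use.
STATE_ABBREVIATIONS = {"texas": "TX", "california": "CA"}


def normalize_state(value):
    """Return a two-letter code when known; otherwise leave the value unchanged."""
    cleaned = value.strip()
    if len(cleaned) == 2:
        return cleaned.upper()
    return STATE_ABBREVIATIONS.get(cleaned.lower(), cleaned)


def sales_by_state(rows):
    totals = defaultdict(float)
    for row in rows:
        totals[normalize_state(row["State"])] += row["Sales"]
    return dict(totals)


rows = [
    {"State": "TX", "Sales": 100.0},
    {"State": "Texas", "Sales": 50.0},
    {"State": "CA", "Sales": 75.0},
]
print(sales_by_state(rows))  # {'TX': 150.0, 'CA': 75.0}
```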

 

Granular: The data needs to be at the same level of granularity. For example, if you're aggregating sales data from different stores, all the stores should record the same level of detail, such as daily or weekly sales totals. Provide the most granular data you have, without also including aggregates of those numbers. For instance, if you have city-level data, do not also provide state-level or country-level rows. The granularity should be consistent across all the metrics in the dataset, so use the finest level of detail that is available for every metric.

In this example we see that every sale is broken down at the Order ID level, even if each customer has multiple orders. This allows Max to analyze trends within customer spending and look at the items, subcategories, and categories that each customer is buying from.

Granular_Data.jpg

Here, sales are only as granular as the customer level. If a customer has multiple orders in their lifetime, they are totaled up by customer and Max cannot analyze anything at the individual order level.

Not_Granular_Data.jpg

Please note that you should not include total rows (for example, weekly sales totals when you already have daily sales data). Max.ai calculates totals as needed, and double counting may occur if totals or subtotals are included in the data.
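
The small sketch below illustrates the double-counting problem and one way to strip total rows before upload. It assumes total rows can be recognized by a label in the date column; the labels and function name are hypothetical, so adapt them to however totals are marked in your export.

```python
def drop_total_rows(rows, label_column="Date",
                    total_labels=("total", "subtotal", "weekly total")):
    """Keep only rows whose label column is not one of the known total labels."""
    return [r for r in rows
            if str(r[label_column]).strip().lower() not in total_labels]


rows = [
    {"Date": "2023-01-02", "Sales": 100},
    {"Date": "2023-01-03", "Sales": 150},
    {"Date": "Weekly Total", "Sales": 250},  # aggregate of the two rows above
]

with_totals = sum(r["Sales"] for r in rows)                      # 500, double counted
without_totals = sum(r["Sales"] for r in drop_total_rows(rows))  # 250, correct
print(with_totals, without_totals)
```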

 

Compatible: The data needs to be in a format that can be easily combined. For example, if you're aggregating sales data from different stores, they should all be in the same format, currency, etc.

In the data below, all sales figures are reported neatly in the same numerical format and currency.

Compatible_Data.jpg

Here, by contrast, sales are reported in several formats and multiple currencies, making it difficult for Max to provide a complete or accurate analysis of overall sales.

Incompatible_Data.jpg
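
To find values like these before upload, you can flag anything in a metric column that is not a plain number. The following sketch assumes the target format is an unformatted decimal such as `1200.50`; the pattern and function name are illustrative. It only flags values and does not attempt currency conversion.

```python
import re

# Plain decimal number: optional minus sign, digits, optional fractional part.
PLAIN_NUMBER = re.compile(r"-?\d+(\.\d+)?")


def find_incompatible_values(values):
    """Return the values that are not plain, unformatted numbers."""
    return [v for v in values if not PLAIN_NUMBER.fullmatch(v.strip())]


sales = ["1200.50", "$1,200.50", "1.200,50 EUR", "980"]
print(find_incompatible_values(sales))  # ['$1,200.50', '1.200,50 EUR']
```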

 

Complete: The data needs to be complete. If there are missing values or incomplete records, it may be difficult to aggregate the data accurately. For instance, if you have products, subcategories, and categories, but a subcategory isn’t provided for every product, Max's analysis at the subcategory level will be less accurate.

Below is an example of what ideal data would look like. All fields are complete for each data entry, making it easy for Max to interpret and analyze the data.

Complete_Data.jpg

In this dataset, however, there are many missing entries, including missing Order IDs, Product IDs, categories, and subcategories. This would make it difficult for Max to analyze how products, categories, or subcategories are performing.

Incomplete_Data.jpg
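
A quick count of missing values per column shows where a dataset is incomplete. This is a minimal sketch that treats `None`, empty strings, and whitespace-only strings as missing; the helper names and sample rows are hypothetical.

```python
def is_missing(value):
    return value is None or str(value).strip() == ""


def missing_counts(rows, columns):
    """Count missing values in each of the given columns."""
    return {col: sum(is_missing(row.get(col)) for row in rows) for col in columns}


rows = [
    {"Order ID": "1001", "Product ID": "P-1", "Subcategory": "Chairs"},
    {"Order ID": "", "Product ID": "P-2", "Subcategory": None},
    {"Order ID": "1003", "Product ID": " ", "Subcategory": "Tables"},
]
print(missing_counts(rows, ["Order ID", "Product ID", "Subcategory"]))
# {'Order ID': 1, 'Product ID': 1, 'Subcategory': 1}
```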

 

Unique: The data needs to be uniquely identifiable. For example, if you're aggregating customer data from different sources, each customer should have a unique identifier such as a customer ID number or email address.

In the example shown here, every order has a unique Order ID and each customer is assigned a unique Customer ID. This prevents confusion or overlap if, for instance, you have multiple customers with the same name.

Unique_Data.jpg
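
If each row is supposed to represent one order, repeated Order IDs point to a uniqueness problem. The sketch below lists any identifier that appears more than once; the function name and sample rows are assumptions. If your data has one row per line item within an order, repeated Order IDs are expected and this check would need a different key.

```python
from collections import Counter


def duplicate_ids(rows, id_column="Order ID"):
    """Return the identifier values that occur on more than one row."""
    counts = Counter(row[id_column] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


rows = [
    {"Order ID": "1001", "Customer ID": "C-01"},
    {"Order ID": "1002", "Customer ID": "C-02"},
    {"Order ID": "1001", "Customer ID": "C-03"},
]
print(duplicate_ids(rows))  # ['1001']
```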
