Product development is a risky proposition, particularly for hardware companies that make complex products with long development cycles. Oftentimes, capital has to be allocated against features several years prior to product release. With dozens of attributes and features all contending for the same pool of capital, how should an organization decide where to place strategic bets?
Nowhere is this problem more pervasive than in the automotive sector. A typical full generation refresh cycle for a vehicle takes 3-4 years to complete. Since moving the needle on vehicle attributes such as acceleration or handling can entail changes to complex vehicle subsystems that take years to develop, allocation of capital to vehicle attributes needs to happen up to five years prior to product launch.
Most hardware producers use a combination of science and art to figure out which attributes to invest in. They typically survey past and prospective customers, asking why they purchased the vehicle they did and what they want from the next one. They look at quality and reliability data and may even conduct conjoint studies to prioritize which attributes to pursue. Ultimately, they try to reconcile those insights with strategic brand priorities, for example, handling in the case of BMW vehicles or design in the case of Sony Vaio laptops. The challenge, however, is that the data sources almost always conflict. What do you do when Marketing's preferred data source, a prospective customer survey, says that interior space matters most, while Engineering's preferred quality data indicates that improvements in handling will drive the greatest lift? The result is typically a holy war, with the HiPPO (highest paid person's opinion) winning out in the end. Lack of data is not the issue in this environment. Paradoxically, the underlying issue is having too much data and not being able to separate the signal from the noise.
I worked with a major U.S. automotive company that had struggled with this problem for years and wanted to develop a more objective, data-driven approach to product development. This article describes the insights from that effort and how any organization developing complex, multi-attribute products can benefit tremendously from adopting a similar quantitative approach to attribute prioritization.
Core to this thought process is the underlying assumption that, in any given product category, the financial outcomes achieved by a specific competitive product (e.g. units sold, average price realized, market share, etc.) are a function of the product's attributes plus all other non-attribute factors, across all unique customer segments. For the more quantitatively inclined, this notion can be expressed in an equation as follows:
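One plausible way to write this down (the notation here is illustrative rather than the exact form used in the project) is:

Outcome(p, s) = f(Attribute_1(p), ..., Attribute_k(p), Other(p, s)) + error

where Outcome(p, s) is a commercial outcome (units sold, average price realized, market share) for product p in customer segment s, Attribute_1(p) through Attribute_k(p) are the product's objective performance scores on its k attributes, and Other(p, s) collects all non-attribute factors.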
The ‘other’ bucket of factors here can include a host of variables such as marketing investment, brand equity, distribution points and even macroeconomic and other exogenous variables.
In layman's terms, the key question is this: after controlling for all the input variables that can reasonably be controlled for, is it possible to attribute variation in a product's commercial outcomes to the objective performance of its underlying attributes? In the context of an automobile, if we had an objective way to measure 'true' performance on things like braking, handling, acceleration and space, could we build a model that told us, say for non-premium midsize vehicles, that every unit increase in braking performance results in 50,000 more vehicles sold, or that every unit reduction in wind noise decibel ratings results in 40,000 more units sold? If so, the marginal impact of each attribute on the outcome can be used to stack rank the attributes, and attribute prioritization then becomes a matter of comparing the current performance of each attribute against the marginal gain possible by improving it. This decision framework is illustrated visually below:
This premise sounds simple enough in theory but is incredibly difficult to test, given the myriad complexities involved in assembling the right data set and preparing it for analysis that yields easily comparable and interpretable results. The complexities are driven in large part by the disparate and fragmented structure of the data sources, which makes consolidating them extremely challenging.
After much iteration, however, our team of data scientists was able to demonstrate that ~30% of the out-of-sample variation in our OEM client's preferred outcome metric could be directly ascribed to vehicle attribute performance within a defined segment of competitive products, thereby definitively answering the question of which attributes matter most to customers. In fact, the attribute-driven portion of variation in commercial outcomes was up to twice as large for some customer segments. The executive team then used these insights to calibrate which attributes to allocate more capital towards, based on the customer segments they wanted to target.
Based on the success of this pilot effort, the model is now "in production" at this major automotive company, and its insights are a critical milestone deliverable in the product development process for each vehicle program.
The rest of this post describes some of the execution details of the analysis and the explicit set of steps you can follow to conduct a similar analysis for your own product category. These steps are described in the context of the automotive data landscape; however, the principles outlined are applicable to product development in any sector.
Step 1: Organizing and understanding your universe of data sources
Organizing all your data sources and understanding the dimensions available across each source is a critical first step. The most important dimensions are data granularity and attribute coverage, in addition to refresh cycle and geographical coverage. Continuing with our automotive example, commonly available data sets include:
- Historical vehicle purchase surveys: ~17 million consumer vehicles are sold in the U.S. annually, and 1.5% of those buyers fill out an in-depth, 2-hour vehicle purchase survey containing ~60-70 questions about demographics, reasons for purchasing their vehicle, alternative vehicles considered and usage patterns. This dataset is available for purchase commercially.
- Vehicle quality surveys: Certain third party research companies such as J.D. Power and others conduct extensive vehicle quality studies by asking vehicle owners about the ‘things that went wrong’ (TGWs) with their vehicles. These quality statistics are then published at the nameplate level in intricate detail (e.g. the number of times the volume knob in the entertainment system failed on the 2015 Civic, Malibu etc.)
- Vehicle satisfaction surveys: Similar to quality surveys, third party research companies such as J.D. Power and others also conduct extensive surveys across the universe of new vehicle buyers to understand how satisfied they are with various attributes of their vehicle. The results are published at the nameplate level by attribute (e.g. exterior design, handling, performance etc.)
- A sampling of actual dealer sales: This dataset contains sell-out volumes estimated through a sampling of actual dealer sales, showing which products are moving, total units sold, average transaction price, average consumer incentive rebate, etc. This data source is critical to understanding the commercial outcomes for a vehicle nameplate. While the data isn't available at the transaction level, it is available at an aggregated level, by week if needed.
- Marketing and distribution data: Third party research firms also compile automotive marketing and distribution data. This includes estimated fixed advertising dollars at the nameplate level and the number of dealerships at the Brand level.
- Macroeconomic and other exogenous factors: This includes data published by the BLS and Federal Reserve and contains macroeconomic indicators such as GDP, GDP growth, unemployment, interest rates, housing starts etc.
To summarize then, our data landscape may look like the following:
Step 2: Developing an attribute master and mapping feature fields from each data set to your desired attribute master
A key element we are concerned with in every data source is the attribute battery for which data is available. A situation that can add considerable complexity to our analysis is when the attribute battery and numerical rating scales differ across data sources, which they often do.
Two additional steps are required to address this issue:
- Create a master product attribute taxonomy at the level of granularity at which the organization is looking to make large capital allocation decisions. For a laptop manufacturer, the taxonomy might include attributes such as battery life, screen resolution, speed, keyboard comfort and exterior design. An automotive manufacturer may choose to consider exterior design, interior roominess, acceleration, handling, braking, cargo space, etc. It is important to note that attributes don't necessarily need to be tangible features; they can be an abstraction of features. The list should also be finite, ideally no more than 15-20 attributes. Settling on the right level of detail will require some judgment and input from the Product Development and Marketing functions, but suffice it to say, in the case of automobiles, an attribute like 'performance' is probably too high level while 'wheel rims' is too specific.
- Once a master attribute taxonomy is defined, the attribute markers in each individual data source need to be mapped to this taxonomy (a small illustrative mapping is sketched below). Scale differences across data fields are fine at this point and can be normalized later. Also, not every master attribute will have a corresponding data point in the raw survey data, and that too is fine; the results of the analysis will simply need to be contextualized for these gaps.
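As a concrete (and entirely hypothetical) illustration of what such a mapping can look like in code, with made-up source and field names:

```python
# Illustrative mapping from raw source fields to the master attribute taxonomy.
# All dataset and field names here are hypothetical placeholders.
ATTRIBUTE_MAP = {
    # (source dataset, raw field name) -> master attribute
    ("satisfaction_survey", "sat_stopping_distance"):   "Braking",
    ("quality_survey",      "tgw_brake_noise_per_1000"): "Braking",
    ("satisfaction_survey", "sat_cabin_roominess"):      "Interior roominess",
    ("satisfaction_survey", "sat_0_60_feel"):            "Acceleration",
    ("quality_survey",      "tgw_infotainment_per_1000"): "Entertainment system",
}

def master_attribute(source, field):
    """Return the master attribute a raw field maps to, or None if unmapped."""
    return ATTRIBUTE_MAP.get((source, field))
```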
Step 3: Merging each disparate data source into one consolidated matrix
Armed with insights on the structure of the data sources and attribute mappings, we can start to think about how to assemble a single consolidated matrix that incorporates all our relevant data sources.
Before doing so, let’s recap the objectives of this exercise. Our goal with the consolidated matrix is to simulate a market model where we can:
- A) Understand customer purchase reasons and segment the customer base along key dimensions of purchase heterogeneity
- B) Capture the performance of vehicles in a given product segment
- C) Control for advertising investments and brand level distribution points
- D) Control for the macroeconomic climate
- E) Capture actual market outcomes for a vehicle (e.g. units sold, price realized, share realized etc.)
Our overarching goal is then to try to predict market outcomes (E) as a function of B, C and D. Once we've done so at the aggregate level, we may want to segment our customers using stated purchase reasons (A) and repeat the exercise for each segment.
To merge our data sources, let's refer back to our summary data matrix.
Because the data exists at different levels of granularity, we can do one of two things: either roll up the more granular datasets to match the less granular ones, or keep the data at the lowest level of granularity and merge the less granular data onto it, with the understanding that there will be some repetition of values. I would advocate the latter, so as not to lose any potential signal in the data.
Since, according to our summary table, purchase surveys are our most granular data (available at the respondent level), we designate them as our primary data set and merge all other data sources onto it. This process is summarized in the illustration below.
To merge the other six datasets onto our primary data set (purchase surveys), which is at the respondent level, we need to use different field combinations for the joins. In the case of quality surveys, satisfaction surveys and fixed marketing data, we use the nameplate and purchase year fields. Distribution data can be joined to respondent surveys using vehicle brand. Macroeconomic data is joined on the purchase quarter in the survey and is the same for all nameplates in a given quarter. Lastly, average price, volume and market share can be merged with our primary survey dataset using the nameplate and purchase quarter fields.
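A minimal sketch of these joins in pandas (the file names and column names are assumptions for illustration, not the actual fields in the commercial datasets):

```python
import pandas as pd

# Primary dataset: one row per purchase-survey respondent (most granular).
surveys      = pd.read_csv("purchase_surveys.csv")      # nameplate, brand, purchase_year, purchase_quarter, reason_* ...
quality      = pd.read_csv("quality_surveys.csv")       # nameplate, purchase_year, tgw_* fields
satisfaction = pd.read_csv("satisfaction_surveys.csv")  # nameplate, purchase_year, sat_* fields
marketing    = pd.read_csv("fixed_marketing.csv")       # nameplate, purchase_year, ad_spend
distribution = pd.read_csv("distribution.csv")          # brand, dealer_count
macro        = pd.read_csv("macro.csv")                 # purchase_quarter, gdp_growth, unemployment
outcomes     = pd.read_csv("dealer_sales.csv")          # nameplate, purchase_quarter, units_sold, avg_price, share

# Merge everything onto the respondent-level primary dataset.
consolidated = (
    surveys
    .merge(quality,      on=["nameplate", "purchase_year"],    how="left")
    .merge(satisfaction, on=["nameplate", "purchase_year"],    how="left")
    .merge(marketing,    on=["nameplate", "purchase_year"],    how="left")
    .merge(distribution, on="brand",                            how="left")
    .merge(macro,        on="purchase_quarter",                 how="left")
    .merge(outcomes,     on=["nameplate", "purchase_quarter"],  how="left")
)
```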
We now have a single matrix in which we've consolidated all of our data sources. If we were to try to predict an outcome variable such as price from the input variable set, we would expect higher satisfaction scores, higher fixed marketing investments and greater distribution points to be positively correlated with higher realized price. Conversely, we would expect greater quality issues to be negatively correlated with favorable outcomes.
Step 4: Picking a peer group and time horizon for the analysis
The data matrix we have assembled thus far includes all vehicles and theoretically goes back in time to when these surveys first became available, which in the automotive world spans several decades. Prior to building a predictive model with this data, we need to make two critical decisions:
- What competitive products to filter for?
- What time horizon to filter for?
These decisions could have significant ramifications on the insights revealed via the model and therefore deserve some thoughtful consideration.
If we were to pick a competitive product set that was too broad, we would run the risk of diluting the attribute importance signal with noise that isn't representative of the underlying customer needs for our target product. On the other hand, if we filtered the competitive product set too narrowly, we would miss out on potential signal in the data. While it may be tempting to pick the handful of competitive products defined by the marketing department or by publicly available third party consumer rankings, doing so would be a disservice to ourselves. Best practice for defining a competitive set is to analyze actual consumer cross-shop behavior. This can be estimated by adding up two groups of products from the customer survey data. The first is the set of competitive products that our eventual customers considered most closely during their purchase decision but decided not to purchase. The second is the set of competitive products that customers ultimately purchased while seriously contemplating our target product. This distribution of products can have a very long tail (particularly in highly competitive product segments), so to keep it manageable we would cut it off once we've captured ~70-80% of the total purchases across both groups. We would then filter all our data sources in the consolidated matrix down to just this subset of competitive products that are most often cross-shopped with our target product.
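Here is a sketch of how that cross-shop cutoff might be computed from the purchase-survey data (the `top_alternative_considered` field and the target nameplate are hypothetical):

```python
import pandas as pd

def cross_shop_set(surveys: pd.DataFrame, target: str, coverage: float = 0.75):
    """Estimate the set of products most often cross-shopped with `target`.

    Group 1: what buyers of `target` most seriously considered but rejected.
    Group 2: what people who seriously considered `target` ended up buying.
    The head of the combined distribution is kept up to `coverage` (~70-80%).
    """
    considered_but_rejected = surveys.loc[
        surveys["nameplate"] == target, "top_alternative_considered"
    ]
    lost_sales_went_to = surveys.loc[
        surveys["top_alternative_considered"] == target, "nameplate"
    ]
    counts = pd.concat([considered_but_rejected, lost_sales_went_to]).value_counts()
    cumulative_share = counts.cumsum() / counts.sum()
    return cumulative_share[cumulative_share <= coverage].index.tolist()

competitive_set = cross_shop_set(surveys, target="Our Midsize Sedan")
filtered = consolidated[consolidated["nameplate"].isin(competitive_set + ["Our Midsize Sedan"])]
```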
The decision to select a time horizon for the analysis involves similar tradeoffs. The key consideration is to use a period long enough to give us sufficient data points for the analysis, while not so long that we introduce confounders driven by fundamental changes in the technology landscape or in macro consumer behavior. In the automotive space, this sweet spot is typically 6-10 years of historical data for most vehicle segments.
Step 5: A model construct
Now that we have filtered the data down to just the competitive product set (defined via cross-shop behavior) over the past 8 or so years, we can start to think about our predictive model’s construct.
Let’s say that we are standing in early 2017 and our data sources cover the period up to end of 2016. Furthermore, let’s assume that our goal is to make attribute decisions for a product that will be released in 2019. What this means is that we need to demonstrate predictive accuracy at least two years ahead of time. The figure below visually depicts this scenario:
In order to objectively test the out-of-sample predictive accuracy of this model, we need to train it using data up to 2014, holding out any data after that (indicated by the blue line). Once we've trained the model on this period, we test it by assuming that we're standing in the middle of 2015 with data up to the '14-'15 period available. We do this by running the model with input variables from the '14-'15 period to predict outcomes in the '16-'17 period. Since we also have actual historical outcomes for the '16-'17 period, we evaluate the model's accuracy by comparing predicted versus actual outcomes for that period. A high out-of-sample accuracy on the '16-'17 prediction gives us confidence in the generalizability of the model's coefficients. As a final step, we can retrain the model using all data available up to 2017 and examine the attribute coefficient values to decide which attributes to invest in, based on those that provide the greatest lift.
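A minimal sketch of this time-based construct, building on the (hypothetical) `consolidated` frame from the earlier merge sketch, pairs each nameplate-year's inputs with the outcome realized two years later and then splits on time:

```python
def make_lagged_frame(df, lead_years=2):
    """Pair each nameplate-year's inputs with the outcome realized `lead_years` later."""
    # Annual outcome per nameplate (mean across the respondent-level rows for that year).
    future = (df.groupby(["nameplate", "purchase_year"], as_index=False)["units_sold"]
                .mean()
                .rename(columns={"units_sold": "units_sold_future"}))
    future["purchase_year"] -= lead_years   # align the outcome at year t+2 back to input year t
    return df.merge(future, on=["nameplate", "purchase_year"], how="inner")

lagged = make_lagged_frame(consolidated, lead_years=2)
train = lagged[lagged["purchase_year"] <= 2014]   # fit on data available through 2014
test  = lagged[lagged["purchase_year"] == 2015]   # '15 inputs predicting '17 outcomes
```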
Step 6: Potential approaches to normalize the data
Recall that since the attribute fields in our predictor matrix come from different data sources, they are likely to be on different numerical scales. In the automotive space, one satisfaction data source may rate customer satisfaction by attribute on a 1-5 scale, while another rates attribute satisfaction on a categorical high/medium/low scale. Similarly, quality data is often measured as "issues per 1,000 vehicles", and so forth. Mean centering or statistical scaling are techniques available to resolve this issue. Z-score scaling essentially converts data to 'relative standard deviations' by subtracting the field's mean from each value and dividing the result by the field's standard deviation. Scaling the data this way is essential for producing comparable and interpretable results.
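A minimal sketch of z-score scaling with scikit-learn (the column names are hypothetical; ordinal categories such as high/medium/low would need to be mapped to numbers first):

```python
from sklearn.preprocessing import StandardScaler

attribute_cols = ["sat_braking", "sat_handling", "tgw_braking_per_1000"]  # hypothetical fields
scaler = StandardScaler()  # subtracts each column's mean and divides by its standard deviation
consolidated[attribute_cols] = scaler.fit_transform(consolidated[attribute_cols])
```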
Another issue you may encounter is having more than one attribute field mapping to a single master attribute (e.g. three different satisfaction or quality fields across datasets all mapping to your master 'Braking' attribute). This is potentially a problem because if you were to interpret variable coefficients from a linear model output, it would be challenging to assess the net effect of all three variables, as they would not be orthogonal and therefore not independent. A potential solution is to orthogonalize the data set using principal components; however, this comes at a cost to model interpretability. The details of orthogonalization are beyond the scope of this post, but note that some judgment will be necessary to determine whether this step is required.
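One way to orthogonalize such a group of related fields is to collapse them into a principal component, for example the (hypothetical) braking-related columns below:

```python
from sklearn.decomposition import PCA

braking_cols = ["sat_braking", "sat_stopping_feel", "tgw_braking_per_1000"]  # hypothetical fields mapped to 'Braking'
pca = PCA(n_components=1)
consolidated["braking_pc1"] = pca.fit_transform(consolidated[braking_cols])[:, 0]
# explained_variance_ratio_ indicates how much of the joint signal the single component retains
print(pca.explained_variance_ratio_)
```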
Step 7: Pros and cons of statistical vs. machine learning algorithms and interpreting results
We are now ready to build the actual predictive model. The list of algorithms available to us is long, ranging from traditional linear models to modern ensemble machine learning techniques. Before getting too carried away with modern ML techniques, though, it is important to understand the type of problem we are trying to solve and the tradeoffs across the methods available.
Broadly speaking, a fundamental tradeoff in traditional linear versus modern predictive machine learning algorithms is one of accuracy versus interpretability. Because interpretability is a critical issue in this model, many modern ML techniques may not lend themselves well to this type of problem. For a more in-depth discussion on the two types of predictive analytical use-cases read my post here (which I highly recommend).
The figure above summarizes a few popular predictive modeling algorithms in use today across a variety of use cases. In my experience, the algorithms well suited to this type of predictive modeling problem tend to be regularized regression or CART/random forest type algorithms. That said, we would want to test out-of-sample predictive accuracy using a metric such as MSE across several methods to determine which best suits our needs, while keeping in mind our critical interpretability constraint.
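A sketch of that comparison, using Lasso (a regularized linear model) and a random forest as two candidates and out-of-sample MSE as the yardstick, assuming the frames and column lists from the earlier sketches:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error

# Hypothetical feature list: scaled attribute fields plus marketing, distribution and macro controls.
feature_cols = attribute_cols + ["ad_spend", "dealer_count", "gdp_growth", "unemployment"]
X_train, y_train = train[feature_cols], train["units_sold_future"]
X_test,  y_test  = test[feature_cols],  test["units_sold_future"]

models = {
    "lasso": LassoCV(cv=5),
    "random_forest": RandomForestRegressor(n_estimators=500, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: out-of-sample MSE = {mse:,.0f}")

# For the interpretable Lasso model, the coefficients on the scaled attributes provide the ranking.
coefficients = dict(zip(feature_cols, models["lasso"].coef_))
```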
Once we've built a model with strong out-of-sample predictive accuracy using the construct described in Step 5, we can move to our final step, which entails assessing the attribute coefficient values to determine which attributes we should be investing in.
Plotting the results visually on a chart similar to the one below can be an extremely powerful way to interpret results.
Each dot on this chart represents a product attribute. The vertical axis is a numerical scale representing the coefficients of our predictive model's attribute variables (i.e. our model output). The horizontal axis is the relative performance of the product's attributes, calculated directly from the customer satisfaction data. The color represents the incremental capital cost of improving an attribute (e.g. in the case of automobiles, fuel economy may be high cost and the entertainment system low cost). By plotting all the attributes along these three dimensions (a minimal plotting sketch follows the list below), we now have a clear picture of:
- How the product is currently performing (or perceived to be performing) on each attribute
- The relative financial benefit of investing in one attribute over the other
- The relative cost of improving each attribute
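For concreteness, here is a minimal matplotlib sketch of such a chart; the attribute names, coefficients, performance scores and cost tiers below are made up purely for illustration:

```python
import matplotlib.pyplot as plt

# Hypothetical model outputs: attribute -> (relative performance, model coefficient, cost tier)
attributes = {
    "Braking":          ( 0.6, 0.9, "high"),
    "Handling":         ( 0.8, 0.7, "medium"),
    "Interior room":    (-0.4, 0.8, "medium"),
    "Wind noise":       (-0.7, 0.5, "low"),
    "Entertainment":    ( 0.2, 0.1, "low"),
}
cost_colors = {"low": "green", "medium": "orange", "high": "red"}

fig, ax = plt.subplots()
for name, (performance, coefficient, cost) in attributes.items():
    ax.scatter(performance, coefficient, color=cost_colors[cost])
    ax.annotate(name, (performance, coefficient))
ax.set_xlabel("Relative attribute performance (from satisfaction data)")
ax.set_ylabel("Model coefficient (marginal impact on outcome)")
plt.show()
```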
Two clusters in this chart are of particularly high importance. On the top right are attributes (labeled A) that our target product is rated highly on relative to competitive products and that matter tremendously from a commercial standpoint. These can be thought of as the attributes underpinning the product's competitive advantage, and investment should be made in them to ensure that advantage is maintained. On the top left is a cluster of attributes (labeled B) that offer the greatest opportunity for financial benefit if improved. Strategically selecting a handful of attributes in this cluster, particularly those that are low cost, can significantly improve the financial outcomes of the refreshed product.
Note that in the visual above, we have only plotted the attribute coefficients, not the coefficients associated with the "other" factors. Implicit in this is the assumption that attributes matter, in other words, that they demonstrated some type of correlation with financial outcomes. A potential finding of the analysis could have been that attribute performance does not matter and that financial results are simply a function of fixed marketing, distribution points and other macro/exogenous factors. I would argue that this is just as powerful an insight, as it allows the organization to focus on brand building, distribution and macro risk mitigation rather than worrying about attributes, since those other factors are what will objectively move the needle on financial outcomes.
Step 8: Expanding the analysis to discrete segments to find a strategic position
The analysis outlined thus far treats the product's buyers as one large consumer group. This is probably an oversimplification. In reality, there are likely to be multiple customer segments with different attitudes towards different product attributes.
Recall that in our automotive data landscape, the customer survey data contains purchase reasons as a set of fields in which consumers rated the importance of each attribute in their purchase decision. By running an unsupervised clustering algorithm (such as k-means) on these attribute importance fields, we can segment the target consumer population (and our dataset) into a set of discrete clusters. We can then repeat the predictive modeling exercise outlined above separately on each cluster to determine whether certain consumer segments value a set of attributes where our product has a strong advantage, or where it could potentially develop one.
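A sketch of that segmentation step with scikit-learn (the `reason_` column prefix and the number of clusters are assumptions):

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical purchase-reason importance ratings from the survey.
reason_cols = [c for c in consolidated.columns if c.startswith("reason_")]
X = StandardScaler().fit_transform(consolidated[reason_cols])

kmeans = KMeans(n_clusters=4, random_state=0, n_init=10)
consolidated["segment"] = kmeans.fit_predict(X)

# The attribute model from Steps 5-7 can then be refit within each segment:
# for seg, frame in consolidated.groupby("segment"): ...
```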
Step 9: Demonstrating the statistical link between engineering decisions and quality and satisfaction scores
What we've been able to do thus far is objectively demonstrate a correlation between a product's performance on a set of attributes (measured via satisfaction and quality scores) and its financial outcomes. What we have assumed implicitly is the link between an attribute's engineering specifications and its customer satisfaction and quality scores. This is important because the ultimate lever available to a hardware manufacturer wanting to modify an attribute is its engineering spec, so there needs to be evidence that engineering spec changes impact customer satisfaction. In the automotive world, this could mean testing whether reducing decibel levels through reinforced windows has an impact on wind noise satisfaction scores, or whether actual 0-60 times are correlated with satisfaction with vehicle acceleration.
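A minimal check of that link, for example correlating measured 0-60 times with acceleration satisfaction across nameplates (the spec file and column names are hypothetical):

```python
import pandas as pd

# Hypothetical engineering-spec table: one row per nameplate with measured 0-60 times.
specs = pd.read_csv("engineering_specs.csv")   # nameplate, zero_to_sixty_sec
sat = consolidated.groupby("nameplate")["sat_acceleration"].mean().reset_index()

merged = specs.merge(sat, on="nameplate")
# A strongly negative correlation (quicker cars rated higher) supports the spec -> satisfaction link.
print(merged["zero_to_sixty_sec"].corr(merged["sat_acceleration"]))
```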
The automotive OEM we worked with had established this link between engineering specs and customer perceptions over the years, so we were confident in the robustness of the insights revealed by the model. Being able to establish this link is critical to having confidence in the decisions taken as a result of the model's output.
Conclusion
As any statistician or data scientist knows, building a predictive model to rank the relative strength of each predictor variable's relationship with the response variable can be a tricky exercise. A fundamental limitation of the approach is that it can't help you understand phenomena that haven't occurred in the past. In an uncertain technology landscape like the one we face today (driven by the inevitability of autonomous vehicles, new modes of transportation and propulsion, alternate forms of energy creation, etc.), we need to keep these limitations front of mind so we don't blind ourselves to them.
That said, hardware manufacturers often have to make capex allocation decisions with billions of dollars at stake, and placing these bets with objective evidence of where the greatest returns have historically come from can have a tremendous impact on the profitability of these programs.