COVID Task Force 3: What Are We Doing With Your Data?
It’s been over 2 years since the pandemic began spreading across the globe. In response, PROS assembled the COVID-19 Task Force (CTF). As of this blog’s publication, 16 participants have signed the data consent form. This number will grow, as a few more have already verbally agreed to participate and are working through the legal hurdles.
We sincerely appreciate your willingness to contribute to the greater good of understanding how your industry is going to recover from this crisis. More importantly, we appreciate your trust in sharing your data with us. Rest assured that we will handle it with the utmost care and security.
Ever since our COVID-19 Task Force started recruiting participants, some of the most common questions we have received are: “What are you doing with our data, what do you expect to find, and when can we expect the results?” I completely understand your concern. If data is the new oil, then by sharing your most valuable asset with us, you are taking a huge leap of faith, so it’s perfectly understandable that you might be a little antsy.
Today, I am going to do my best to address your concerns.
What We Have Done So Far
We hosted a webinar for our airline participants in which we shared our approach, the third-party data we are using, and initial insights that apply to the airline industry as a whole, and outlined our next steps. The webinar was anonymized to protect the identity of the participants, and it ran twice to cover the global time zones. We were happy to see a total turnout of 85 unique attendees from 20 airlines.
This is more than the number of participating airlines we had at the time, because we opened the webinar to several airlines that have expressed strong interest in participating but still have questions and concerns. We used the webinar to address those concerns while sharing general findings at the regional level. As we progress and reveal more specific findings, future webinars will likely be open to participants only.
We did not host a similar webinar for our B2B participants, because our B2B customer base is very diverse, spanning 25 different industries. Beyond the differences in business, these customers also use our solution in different ways, with different data structures, and each requires slightly different analyses. This makes a general-interest webinar for B2B participants far less useful, so instead we have met with participants individually to share the findings relevant to each of them.
Our Approach to Forecasting Business Recovery
Since PROS is a SaaS solution provider with a global presence, we host transaction data from iconic brands around the world in our secure cloud environment. Analyzed in aggregate, these transactions could provide a glimpse of demand levels in different markets of the global economy. We want to study how this demand changes in response to the COVID-19 pandemic: when demand will recover as the pandemic subsides, how it will recover, and at what rate. This knowledge will enable our customers to prepare early as demand shifts again.
The problem we are trying to study can be reformulated as a simple machine learning problem in which we predict market demand as a function of the pandemic. Our approach involves a little math, so it’s more efficient to explain it in a video than in words. After all, a picture is worth a thousand words. I’ve tried to keep the math as simple as possible, so you should be able to follow the logic behind our model, how we are going to solve it, and what we can expect to find.
If you’ve watched the 7-minute video, congrats! Whether you are one of our airline or B2B customers, you should now have a general understanding of the kinds of analyses we will be performing on your data.
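For readers who like to see it in symbols, here is one generic way to write this prediction problem down. The notation is our own shorthand for this post and not necessarily the formulation used in the video: $y_t$ is a participant's demand at time $t$, $\mathbf{x}_t$ collects pandemic-related predictors such as epidemic counts and government-response indicators, and $f$ is a model whose parameters $\theta$ are fitted on historical data, for example by minimizing squared prediction error as assumed below.

```latex
\hat{y}_t = f(\mathbf{x}_t;\, \theta), \qquad
\hat{\theta} = \arg\min_{\theta} \sum_{t} \bigl( y_t - f(\mathbf{x}_t;\, \theta) \bigr)^2
```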
Your Data Is Safe with Us
As you can see from the video, we will start our analyses by using 2 public data sources as predictors of the demand data that you shared with us: the coronavirus government response data from Oxford University and the epidemic data from Johns Hopkins. Both sit upstream of demand, with the epidemic data one step further upstream than the government response. For the airlines, demand data means bookings; for our B2B customers, it may be volume, purchase frequency, or in some cases revenue.
These are extremely sensitive data! The good news is that all the analyses we have described so far are based on correlation and regression, so the absolute number of bookings or the exact dollar amount of revenue does not matter. In fact, most of our analyses will be performed on a rescaled or z-scored version of your data. Moreover, all airline booking data are pre-aggregated at the country level before being combined and analyzed globally.
To further protect the identity of our participants and prevent any insight from being reverse-engineered, all demand-related predictions will be reported in relative terms: as percentages or percent changes relative to another part of your own data (e.g. your demand in the previous year). Since only you know the denominator, your actual demand (whether bookings for the airlines or revenue for our B2B customers) cannot be inferred. We are diligent about protecting your identity, and we are serious when we say we will handle your data with the utmost security.
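To illustrate why absolute levels drop out, here is a minimal Python sketch, assuming NumPy and made-up numbers (not participant data). The helper names `aggregate_by_country` and `z_score`, the market rows, and the `stringency` series standing in for a government-response index are all hypothetical, not our production pipeline. It sums market-level bookings into country totals, then shows that z-scoring and Pearson correlation give the same answer whether bookings are counted in units or in thousands.

```python
import numpy as np

def aggregate_by_country(bookings, countries):
    """Sum market-level booking rows into one daily series per country."""
    totals = {}
    for row, country in zip(bookings, countries):
        # Start from 0 for a new country; adding a NumPy row yields an array.
        totals[country] = totals.get(country, 0) + row
    return totals

def z_score(series):
    """Rescale a series to mean 0 and standard deviation 1."""
    series = np.asarray(series, dtype=float)
    return (series - series.mean()) / series.std()

# Made-up daily bookings: one row per (hypothetical) market, one column per day.
bookings = np.array([
    [120.0, 90.0, 40.0, 30.0, 55.0],   # a market in US
    [80.0, 70.0, 20.0, 15.0, 35.0],    # another market in US
    [200.0, 150.0, 60.0, 50.0, 90.0],  # a market in DE
])
countries = ["US", "US", "DE"]
stringency = np.array([10.0, 30.0, 80.0, 85.0, 60.0])  # hypothetical policy index

us = aggregate_by_country(bookings, countries)["US"]

# Rescaling demand by any positive constant leaves its z-score unchanged...
same_shape = np.allclose(z_score(us), z_score(1000 * us))
# ...and therefore leaves its correlation with the predictor unchanged too.
r = np.corrcoef(stringency, us)[0, 1]
r_scaled = np.corrcoef(stringency, 1000 * us)[0, 1]
print(same_shape, round(r, 3), round(r_scaled, 3))
```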
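Here is a minimal sketch of relative reporting, assuming NumPy and synthetic weekly bookings; the function name `percent_change_vs_last_year` is hypothetical. Because the output is a ratio, multiplying both years by any unknown constant leaves the reported percentages unchanged, which is why they reveal nothing about the absolute level.

```python
import numpy as np

def percent_change_vs_last_year(this_year, last_year):
    """Demand as a % change relative to the same periods one year earlier."""
    this_year = np.asarray(this_year, dtype=float)
    last_year = np.asarray(last_year, dtype=float)
    return 100.0 * (this_year - last_year) / last_year

# Synthetic weekly bookings for matching weeks (not participant data).
last_year = np.array([1000.0, 1100.0, 1050.0, 1200.0])
this_year = np.array([900.0, 550.0, 210.0, 300.0])

reported = percent_change_vs_last_year(this_year, last_year)

# An outsider who sees only `reported` cannot recover the scale: any unknown
# multiplier k applied to both years produces exactly the same percentages.
k = 37.5
assert np.allclose(percent_change_vs_last_year(k * this_year, k * last_year), reported)
print(reported)
```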
More Data and Better Prediction
It’s important to emphasize that our goal is to build a tight chain of causality between the epidemic data and your demand data. To tighten that chain, we need more data sources to model the intermediary processes that may be missing in between. So we are actively pursuing partnerships and collaborations with additional third-party data providers.
Our initial model currently uses only public data sources (i.e. the epidemic data and the government response data). If we apply this initial model to the airline industry, we are essentially assuming the following:
- A change in the epidemic would cause the government to respond with different policies to the epidemic (e.g. travel ban, lockdown, etc.).
- Then a change in the government response (e.g. lifting a lockdown, etc.) would cause consumers to book more flights, resulting in a change in the airlines’ demand data.
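These two assumptions form a two-link chain. The sketch below is one simple way to fit such a chain, assuming each link is linear and fitted independently with ordinary least squares in NumPy; the actual model may use different functional forms, time lags, or additional predictors. The function names and the synthetic `cases`, `stringency`, and `bookings` series are hypothetical.

```python
import numpy as np

def fit_link(x, y):
    """Fit y approximately equal to a + b * x by ordinary least squares; return (a, b)."""
    design = np.column_stack([np.ones_like(x), x])
    (a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    return a, b

def fit_chain(stages):
    """Fit one link per adjacent pair of stages, from bottom rung to top rung."""
    return [fit_link(x, y) for x, y in zip(stages, stages[1:])]

def predict_chain(links, x):
    """Propagate a bottom-rung value up the chain, one fitted link at a time."""
    for a, b in links:
        x = a + b * x
    return x

# Synthetic daily data generated to follow the assumed causal order.
rng = np.random.default_rng(0)
n = 60
cases = rng.uniform(0, 100, size=n)                            # epidemic indicator
stringency = 10 + 0.8 * cases + rng.normal(0, 2, size=n)       # government response
bookings = 500 - 4.0 * stringency + rng.normal(0, 10, size=n)  # airline demand

links = fit_chain([cases, stringency, bookings])
print(links)
print(predict_chain(links, 50.0))  # predicted bookings for an epidemic level of 50
```

Because `fit_chain` fits one link per adjacent pair, the same call would accept a longer list of rungs (for example government response, public events, flight searches, and airfares) if those data sources become available.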
However, we are currently exploring and evaluating partnerships with various data providers in the airline industry. Through these partnerships, we could augment our initial model with flight search data, public events data, and airfare filing data. These data would let us build a tighter chain of causality between the epidemic and bookings. In this more extensive model, we would be assuming the following:
- A change in the epidemic would lead to a change in the government response.
- This would cause a change in the establishment, cancelation, or postponement of public events.
- This would, in turn, lead to a change in people’s flight searching behavior.
- People’s decision to book could then be influenced by the airfares offered.
- Finally, this would lead to a change in bookings.
As we include more data sources, we will have more data, and a greater variety of it, to cover the potentially missing processes between the epidemic and changes in demand. This is like climbing a ladder with more rungs between the bottom (the epidemic data) and the top (your demand data). It makes model training easier, because each data set only has to bridge the smaller gap between the processes immediately before and after it. Since each link of the causal chain is simpler than the whole, each data set only has to model a simpler relationship.
Even though this will be a lot more work for the entire COVID-19 Task Force, it should make our model better and more predictive, and this extended model could reveal more interesting insights. So we hope at least some of these partnerships will come to fruition.
I hope this summary gave you a peek into what we are doing with your data. If you’d like to participate, please consult your CSM about eligibility. You may still be able to join us and contribute to your industry’s recovery.
For the next update, I hope I’ll be able to share some general findings from our initial results. So stay tuned.
If you missed any of the previous updates, they are all accessible here: