As we look ahead to a new year, and reflect on the last, we consider how data science can be used to optimise the future. But to what degree can we trust past experiences and observations, essentially relying on historical data to predict the future? And with what level of accuracy?
In this episode of the DataCafé we ask: how can we optimise our predictions of future scenarios to maximise the benefit we can obtain from them while minimising the risk of unknowns?
Data Science is made up of many diverse technical disciplines that can help to answer these questions. Two among them are mathematical optimisation and machine learning. We explore how these two fascinating areas interact and how they can both help to turbo charge the other's cutting edge in the future.
We speak with Dimitrios Letsios from King's College London about his work in optimisation and what he sees as exciting new developments in the field by working together with the field of machine learning.
With interview guest Dr. Dimitrios Letsios, lecturer (assistant professor) in the Department of Informatics at King's College London and a member of the Algorithms and Data Analysis Group.
Some links above may require payment or login. We are not endorsing them or receiving any payment for mentioning them. They are provided as is. Often free versions of papers are available and we would encourage you to investigate.
Recording date: 23 October 2020
Interview date: 21 February 2020
Intro music by Music 4 Video Library (Patreon supporter)
Thanks for joining us in the DataCafé. You can follow us on Twitter @DataCafePodcast and feel free to contact us about anything you've heard here or think would be an interesting topic in the future.
Hello, and welcome to the DataCafé. I'm Jason.
And I'm Jeremy. And today we're talking about optimising the future.
Oh, wow. Okay, so what does optimising the future mean exactly, Jeremy?
So this is a really nice idea that I think fits in very beautifully with data science as a discipline in general. And it really comes from where the topic of optimization (and by optimization here, we're really talking about mathematical optimization) comes from as a discipline. Optimization, in years gone by, used to be something where scientists would go into a company and say: well, I think I can see that if we did your process slightly differently, if we optimised it, we could get something that was better. And it really pivoted off this notion that you could display this marvellous hindsight on an industrial process, maybe, and you could produce a result that was a massive improvement, a 20% saving on the time taken to do the process you're looking at. And then that would be applicable to all of the stuff that was going to happen in the future. So broadly, you know, yesterday's plan would be applicable today or tomorrow with very little change. And so it would make sense to come up with an optimization that would work yesterday, and will still work tomorrow.

Okay, so you've painted a picture there that there is an optimal way for a process to run. And we're looking at how it's been running so far, and then searching for an optimal way to run it again in the future.

Exactly. What's really important about this is that the future, and certainly we see this in the data science projects that we execute, the future is not necessarily the same as the past. And I think, you know, if there is one message to take from today's episode, it's pretty much that: you can't assume that just because you've seen a set of data yesterday, or last month, or last year, that that is exactly what you'll see tomorrow, or next week, or next month, or next year.
And so as a result of that, you've got to somehow take this into account when you're producing your optimization, or your new efficient model or process for the following day, or whatever future epoch you're looking at.
And it occurred to me, when we were looking into this topic, that we kind of alluded to it in a previous episode about vehicle routing. The problem there, we were looking at all the different routes that a vehicle, or a logistics hub, would send their vehicles out on. But then there's a pre-step to that, right, where all of the vehicles have to agree when they're going out; I guess there's a schedule. And that schedule has to work to the constraints that may be fed by the routes that they've got. So if I'm going out on a long route, I want to go out early, and maybe that's a constraint. And so in a very inefficient process, there's something delaying me, I'm not getting out early, I can't do my long route on time, for example. So you'd look at that and say: right, well, let's pull together all those constraints and figure out what's the optimal version here. These kinds of examples are really interesting; you can draw the parallels, I guess, across a lot of processes.
I think so. The vehicle routing example is a really good one, because there you've got this setup where you're trying to come up with an optimal sequence for how you drop parcels, or fridges, or whatever it is you're trying to deliver, to a set of locations. It's absolutely standard to say: here is a set of locations, here are the constraints that I am giving you, and you have to look through all of the possible combinations of the problem so that you can find a really good or optimal solution, which says you should visit these customers in this order with this set of vehicles. And if you do that, you get the potential for a fantastic result. But here's the problem: I don't necessarily know which customers are going to require deliveries tomorrow. Or maybe I do for tomorrow, but I don't know for next week. So if I'm trying to make a really good plan, I need to somehow factor that uncertainty, that lack of knowledge, into my optimization assessment, or my benchmarking of my optimization, so that I can come up with a reasonable look ahead to the future, which says: in the future, this is a good set of customers, one that will probably be typical of a Monday delivery pattern, or a Tuesday delivery pattern, or a November delivery pattern, or whatever it is.
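That combinatorial search is easy to make concrete. Here is a minimal Python sketch, with an invented depot and three invented customers, that brute-forces every visit order; it is exactly this exhaustive enumeration that blows up as the number of customers grows, which is why real vehicle routing needs cleverer optimization techniques.

```python
from itertools import permutations

# Hypothetical travel times (minutes) between a depot (D) and three customers.
times = {
    ("D", "A"): 10, ("A", "D"): 10, ("D", "B"): 15, ("B", "D"): 15,
    ("D", "C"): 20, ("C", "D"): 20, ("A", "B"): 12, ("B", "A"): 12,
    ("A", "C"): 25, ("C", "A"): 25, ("B", "C"): 8, ("C", "B"): 8,
}

def route_time(order):
    """Total time for depot -> customers in `order` -> depot."""
    legs = ["D", *order, "D"]
    return sum(times[(a, b)] for a, b in zip(legs, legs[1:]))

# Exhaustively check every visit order -- only feasible for tiny problems,
# since the number of orders grows factorially with the number of customers.
best = min(permutations("ABC"), key=route_time)
print(best, route_time(best))
```

With three customers there are only six orders to check; with thirty customers there are more orders than atoms in a glass of water, and the uncertainty about which customers even appear tomorrow makes the picture harder still.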
It's really interesting, because the way that you set it up sounds like you're laying out different versions of the future. You're laying out ways to decide: well, what do I know, and what version of the future can I work towards? What is the optimal that's attainable? So you're almost searching for that version, the one where the efficiencies are worth it. As opposed to something that's a hard prediction, a given that it's going to happen, you're adjusting your future based on what you know, and what you can learn from the previous behaviour and the constraints that are fed to you, essentially. The example that you gave there of dropping off fridges: even today, I was talking to some friends about the demand for white goods. And I think it's off the back of the lockdown that people are stocking up on freezers, because they're buying in bulk and putting all of their stuff into the freezers. So that's a whole new demand that wasn't there a year ago, that's now cropped up. And your model's got to change; the futures that you want to model off the back of that are different, and feed different decisions.
Absolutely. And if you're delivering fridges, I mean, they've got a whole set of horrible constraints. They're heavy; they probably need two or three people to carry them up into a customer's house. If you're delivering to a block of flats, you need to have a good idea of how many blocks of flats you're delivering to, because that's adding 15 to 20 minutes, probably, for each delivery. And if you get that wrong, then you're going to be running very late, and you're not going to be coming up with anything remotely optimal. So yeah, it's entirely relevant to that.

We had a nice way of setting this up, too, that comes up often: the idea of whether we're looking at descriptive, predictive or prescriptive modelling. Did you want to lay that out a bit? I always think it's a really nice way to describe it.

I agree. I think this doesn't just introduce the notion of these different styles of data science, but what it importantly does is show that they feed off each other and link together. Classically, you'll hear this in a lot of data science texts: there are three types of project, from a data science perspective, that you hear people talk about. Descriptive models, predictive models and prescriptive models. So what are they? Descriptive models are, briefly, what has happened. Can I look at the data I have, and just analyse what happened in the past, in my operation, in my customer sales, whatever it is that you've been asked to look at, and evaluate those past behaviours?
And we do that in our science class all the time, right? We run an experiment, and we look at the data afterwards and try to evaluate, well, what did our experiment show?
Yes, exactly: what's happened. And then there's the predictive. So that's taking it to the next level, if you like, which says: well, it's nice to know what happened, but actually, what I'm really interested in is what's going to happen. I'm going to do some time series analysis or some regression modelling, and I'm going to come up with some kind of view on how many things I'm going to sell, or how many customers, or which customers I'm going to sell to, in the future, a week on Friday.
Yeah, and we run our experiment again, but the result is different, isn't that right? You're going to get that variance being introduced; that's where the unknowns around the prediction come from...
Precisely. There's absolutely a level of uncertainty here, because it's the future. You know, this fits very nicely into a mantra I use, usually two or three times a week, which is: data science is not magic. Data science is about coming up with good quality, scientifically driven techniques which give you a handle on representing quite tricky problems, in this case, how I can represent or predict the future when it comes to sales figures or something like that. But inevitably, there's a deal of uncertainty, a deal of noise, in that prediction, in any prediction, whether it's predicting the US presidential election or whether it's predicting the weather. You can't know; you can only come up with your best guess, and ground that guess in some good quality techniques.
Even with the weather, I always find it kind of funny when we're told there is a 10% chance of rain. And I'm thinking: well, I can't bring 10% of my umbrella. What am I supposed to do with this information?
Exactly. Yeah, I was busy swearing at the weather forecasters yesterday, when I'd taken my family out for a nice walk. It was supposed to be beautifully dry and sunny, and we arrived and it was actually pouring with rain. But then, you know, kudos to them: ten minutes later it stopped, and the sun did come out. That was super forecasting.
Yeah. But actually, that leads nicely into the idea of the prescriptive part, because you will take that information forward and lay out: well, what are the kind of limiting factors when I plan my next day? And even at that simple level, you make a decision: have I got room to bring all the things I need to cater for the likelihood of different weather conditions? Even just, in my hand, do I carry my umbrella? At that simple level, it starts off with a couple of constraints. And this is where, when we start to look at it algorithmically, it's going to get very difficult very quickly, because even the simple level is already going to be a challenge to conceive of mathematically, programme, and then run on a computer, which I think we'll get to later on. But just to tie it up with that prescriptive element: you have some insight into the future, but you want to prescribe off the back of that, and make a decision off the back of that.
Critically, the decision is the key. In that scenario, can I make an optimal decision that will be appropriate to the future scenario that I'm hoping to anticipate, if you like? So if I've predicted something from my predictive data science, can I use that information to then make a decision, a prescriptive outcome, that will make sense and give me access to the efficiency I potentially could have got were I looking at yesterday's schedule, or yesterday's forecasting, and going: well, had I known it was going to rain at ten past eleven, then I wouldn't have left the house until half past ten, and so on. You can never be as good as looking to the past and going: I could always get a really fantastic, optimal schedule based on what I now know happened.

Hindsight. Exactly, it's all about the 20/20 hindsight.

And, you know, it does get people very, very annoyed, I think, especially if you talk to operational experts in this area and say: well, gosh, you could have done this so much better. It's like: yeah, yeah, but I didn't know.
It's obvious afterwards. Yeah, you can explain this when you've seen it happen.
Right. How was I supposed to know that my van would break down, and three of my customers would be out, and therefore I'd have to leave the fridge round the back?
So all of these concepts are so challenging mathematically, and we have specialists working on this. You interviewed Dimitrios Letsios, a lecturer in Data Science at King's College London. Do you want to tell us about what that interview entailed?
Yes, so Dimitrios had been working on a really interesting problem of looking at van schedules, and how vans were being used in a postal delivery setting, in particular looking at whether they could be used more efficiently by aggregating the jobs they were being sent on, the routes they're being sent on, in a way which allows the operation to use fewer vehicles. So, you know, if I have a duty, which is what the company would call it, that runs from 9:00 till 10:00, and another duty being sent out on another van, maybe leaving at 09:30 and coming back at 10:30, maybe I could take that second duty and put it on the first van, just have it start half an hour later. So he was looking at how we could rearrange the schedule of these duties on these vans and then use fewer vehicles, which is great, because that means less maintenance, less wear and tear on the vehicles, and ultimately, hopefully, an easier operation.
Yeah. And the environmental side comes into it too, with emissions targets.
Completely. Okay, so let's hear what Dimitrios had to say.
So I'm joined by Dimitrios Letsios, who's a lecturer in Data Science at King's College London, and we had a really fantastic time talking about optimization today, as he's applied it to some interesting problems in van scheduling. And what I wanted to ask you, Dimitrios, is: well, in data science we use optimization techniques a lot, but we find that there's a lot of noise, a lot of perturbation and change in the data that we're trying to parameterize, or feed into, our optimization model. So how have you been able to cope with that?
So I guess there are a couple of major approaches for dealing with optimization under uncertainty, when there is error. There is the stochastic optimization approach, which assumes that the parameters of an optimization problem are unknown, but they follow some probability distribution. But here, today, we focus more on the robust optimization side, where the parameters of an optimization problem are restricted to be in a well-defined uncertainty set. And so now we're dealing with the problem of finding a solution which is good not for one instance, but for every possible instance in the uncertainty set. And we want to minimise the worst-case realisation.
Because typically, in a traditional optimization model, you'd feed in a value. So the problem you were looking at today was one of how to schedule van departures from particular operational locations, given that you knew that the duty length was, say, two and a half hours long. And that's what you'd feed in: two and a half hours, and then a set of other lengths for the other vehicles. But in reality, it might be two hours, it might be two hours 45 minutes, it might be three hours. We just don't know.
Yeah, exactly, that's correct. But we usually have an estimation, and we have, you know, some bounds on where the true value is located. And that's what optimization under uncertainty from a deterministic viewpoint is about: finding a good solution for any possible realisation. And that's what we did in the context of van scheduling.
So one of the approaches that seems to be taken was where you look at the amount of time that you'd be putting onto your vehicles, and you'd say: well, let's assume it's a wet day, let's assume there's some disruption, maybe, which means that these duties we're putting onto the vehicles take a little longer. So we can add a factor, maybe a multiplicative factor, to the length of the duties, and then see if our schedule looks similar after optimization. Because if it doesn't, that's problematic, isn't it?
Yeah, that's correct. So basically, we want to compute a solution which is reasonably good after uncertainty is realised, after we know the true values. There is this perturbation factor, indeed, which indicates the level of uncertainty that we should expect. And the question is, once we have this information, once you have an idea of what perturbation you will see, because nobody can avoid perturbation, the question is how you can construct a solution which is robust, which will not fail when uncertainty is realised. And that's what this work was about: an approach for constructing robust schedules based on lexicographic optimisation.
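To make the robust idea concrete, here is a toy sketch of the worst-case feasibility check behind it. This is only an illustration, not Dimitrios's lexicographic method, and the duty times and perturbation factors are invented: the question is whether two duties can share one van for every duration in the uncertainty set.

```python
def worst_case_end(start, nominal, perturbation=0.25):
    """Latest finish if the duty runs up to (1 + perturbation) times its nominal length."""
    return start + nominal * (1 + perturbation)

def can_chain(duty_a, duty_b, perturbation=0.25):
    """True if duty_b can follow duty_a on one van for EVERY realisation in the set."""
    start_a, nominal_a = duty_a
    start_b, _ = duty_b
    return worst_case_end(start_a, nominal_a, perturbation) <= start_b

# Minutes after 9:00. Duty A starts at 0 and nominally lasts 60 minutes;
# duty B starts at 90. With a 25% perturbation, A could run to minute 75,
# so chaining is robust; with a 60% perturbation (minute 96), it is not.
print(can_chain((0, 60), (90, 45), perturbation=0.25))  # True
print(can_chain((0, 60), (90, 45), perturbation=0.60))  # False
```

A lexicographic approach goes further, ranking whole schedules by their worst component first, but this feasibility test is the core of what "good for every instance in the uncertainty set" means.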
Okay. So the idea is: if you had a small change in whatever you were inputting into your optimization model, and it caused the goodness of the solution to completely disappear, if you've lost all of that goodness just because there was a tiny, tiny change in the optimization input, then that would be quite difficult to realise in practice. If we were trying to suggest a new schedule to an operation, but the constraint we gave them was: you have to do it exactly as prescribed, the lengths of time you spend doing these jobs have to be exactly as given, well, that's not likely to happen, is it? So there's a real problem with something which isn't robust.
Yeah, exactly. So in such a case, we need to tolerate some level of flexibility, and we may use, potentially, recovery approaches or something along those lines. Yes.
And then towards the end of the talk you gave us, there was a very interesting slide where you talked about the vision that you had for optimization in a data science environment. Maybe you'd like to say a few words about that, and how you see this as an important general approach for optimization techniques dealing with data-heavy applications.
Yeah, so I find this viewpoint quite interesting; it's the one we adopted in the end. So data science is about making predictions and understanding the data, and so on, right? Optimization is about making optimal decisions. And optimization is very frequently used in data science anyway, for building these models. But another question that is posed is: once a nice predictive model has been produced, and it has an output, how can we exploit these predictive models, together with optimization, for generating optimal decisions? That's the challenge. And the output of a data science model can be different things. It can be some regression tree that you can potentially embed in an optimization problem, or it can be an estimated value of a parameter of an optimization problem, with a certain degree of uncertainty, and then we have an uncertainty set. Yeah, this is quite a general setting, and there are different ways to view it.
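As a toy end-to-end sketch of that pipeline (all the numbers here are invented): fit a simple least-squares regression to historical duty durations, take the residual spread as a crude uncertainty margin around the prediction, and feed the pessimistic estimate into a go/no-go scheduling decision.

```python
# Hypothetical history: (number of drops, observed duty duration in minutes).
history = [(5, 62), (8, 95), (10, 118), (12, 140), (6, 70)]

# Predictive step: fit duration ~ a * drops + b by ordinary least squares.
n = len(history)
sx = sum(x for x, _ in history)
sy = sum(y for _, y in history)
sxx = sum(x * x for x, _ in history)
sxy = sum(x * y for x, y in history)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n

# The residual spread gives a crude uncertainty margin around any prediction.
residuals = [y - (a * x + b) for x, y in history]
margin = max(abs(r) for r in residuals)

# Prescriptive step: will a 9-drop duty fit in a 2-hour slot, even in the
# pessimistic case? Prediction plus margin is the conservative estimate.
predicted = a * 9 + b
fits = predicted + margin <= 120
print(round(predicted, 1), fits)
```

The point of the sketch is the hand-off: the regression's output (and its error) becomes a parameter of the decision problem, rather than stopping at the prediction.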
I like this view a lot. And I like it because it speaks towards what you're going to use your output for; it speaks towards the decision, the context in which the optimization and the data science sit. And you're focusing on something that's not just good for what you've done, not just, given perfect knowledge of what happened, this is how you could have done it optimally, but, much more importantly, something that allows you to look to the future and say: this is what you could do in a predictive scenario.
Yeah, I think we agree; that's the line of thinking. So basically, it would be nice if predictive models were also used to better inform decisions about the future, about, you know, things like efficient resource allocation. And if they are combined with mathematical optimization approaches, then it seems like there are going to be very nice and influential outputs. Yeah.
Dimitrios, thank you very much.
Thank you very much, Jeremy, for hosting me here today and for your time.
That's really interesting, especially some of the context around that. It occurred to me that we have an element of randomness to account for, and Dimitrios was talking about this being a stochastic process, which means that there's an element of randomness to it, versus a deterministic viewpoint, where there is no randomness: if you know all of your initial conditions, your starting point, the deterministic view says you know every point going forward in the future. And I find this really interesting because, in our science learnings, we come across the double pendulum, and it paints the picture for me. When you swing the double pendulum, it carves out a certain way of sweeping through the air, and it looks chaotic in how it behaves. And if you start it again, it will sweep in a completely different chaotic way, swinging all over the place. It's one pendulum hanging from the other; they interact, and the physics causes a different pattern every single time you swing it. The only way a simulation of it from the same initial starting point can repeat exactly is that all the physics is accounted for. But in the real world, there's so much that's not accounted for, right down to the molecules of air that it's moving through. So it appears random, it looks random, or at least what we as human observers put on it is what we call chaotic; but it's only chaotic because the initial starting condition has been different in some way that we cannot account for.
I think it's important to realise where the randomness that Dimitrios is alluding to comes from. And it can come from many sources, but it doesn't mean it can be completely arbitrary. It doesn't mean it could be anything, that it might be 10 milliseconds before I arrive at my next destination, or it might be four and a half hours. What it means is: there is some uncertainty, and that uncertainty can have a model attributed to it; it can have a distribution. So where does that uncertainty come from? You've hit the nail on the head: it can be stuff that is inherently uncertain, as you said, almost down to the air molecules moving around. In the problem he was looking at, there was real uncertainty in the length of these duties he was capturing, and how long a duty was going to take was going to have a dramatic impact on whether he could create an efficient schedule for these vans. And what might impact the length of a duty? Well, the weather might; we've already talked about how uncertain that is, and that, I think, on many levels is inherently uncertain. But also things like the traffic patterns on the road. You might just make that red light on Monday, and on Tuesday you might not, and it's a really awful light, and you're stuck there for three or four minutes, and those three or four minutes push you into a different modality of traffic or something. You know, that sensitive dependence on initial conditions, which you alluded to, can indeed come into play with uncertainties around traffic patterns, and that kind of thing. So you might introduce randomness because of that.
And then you might just introduce randomness because you don't know quite enough about the problem. You think: well, it might take half an hour, but it might take 40 minutes, and I haven't got enough information to tell what the duration might actually be, precisely. So I'm just going to put in a little bit of uncertainty, to cover my back, almost.
We see this in project management: you've got to put some probabilistic estimate on your time to account for parts of a project. And a lot of the parallels that I would draw to project management come down to the critical path method, which is to look at the dependent tasks. If there are dependencies, those tasks have to follow sequentially, and then there are other tasks that can be run in parallel. And if you can parallelise some of it, it's the ones that are dependent that form the critical path: you can't run your project in a shorter time than that. And you're going to try to find the optimal way of arranging the tasks, the activities in your project, to make the critical path as short as possible. But there is a shortest possible one, and you have constraints, and you have those probabilistic estimates, all at play.
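The critical path calculation itself is a small piece of code. With a hypothetical five-task project (names and durations invented), the longest chain of dependent tasks sets the shortest possible project duration:

```python
# Hypothetical project: task -> (duration in days, prerequisite tasks).
tasks = {
    "design":  (3, []),
    "build_a": (5, ["design"]),
    "build_b": (2, ["design"]),
    "test":    (2, ["build_a", "build_b"]),
    "ship":    (1, ["test"]),
}

def critical_path(tasks):
    """Earliest finish per task; the longest dependency chain is the critical path."""
    finish, parent = {}, {}

    def earliest_finish(name):
        if name not in finish:
            dur, deps = tasks[name]
            # The latest-finishing prerequisite determines when this task can start.
            prior = max(deps, key=earliest_finish, default=None)
            start = earliest_finish(prior) if prior else 0
            finish[name] = start + dur
            parent[name] = prior
        return finish[name]

    end = max(tasks, key=earliest_finish)
    path, node = [], end
    while node:
        path.append(node)
        node = parent[node]
    return path[::-1], finish[end]

print(critical_path(tasks))
```

Here "build_b" runs in parallel and has slack, but if its estimate were badly wrong it could jump onto the critical path, which is exactly the uncertainty problem being discussed.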
And it's particularly hard in that scenario, where your critical path looks beautifully clear until some uncertainty creeps in and one of your key tasks takes a lot longer, or maybe even a shorter amount of time, than you anticipated, and then suddenly another element of your project drops onto the critical path unexpectedly. So that's, I think, a very nice analogy to the vehicle routing or logistics example.
So what do you think is the kind of cutting edge with this or some of the algorithmic approaches we should consider?
I think one thing I'd quite like to say is why this is so important in a data science, and actually in a machine learning, context. Machine learning is a hugely important part of the data science toolkit, and any data scientist who's engaged with any of the standard algorithms will know that almost all of them, whether it's some sort of regression, whether it's a deep learning neural network, whether it's a random forest or an NLP technique, almost all of them will start with "minimise the error on the following". But that minimise statement is a real clue, because that's an optimization. And in fact, you look at a lot of computational statistics: change point analysis, which we talked about in a previous episode, is all about minimising the error given a set of suggested change points.
Yeah, shout out to our episode on that.
Yeah, absolutely. So you see this optimization approach creeping in all over the place. It's in machine learning and in statistics, and therefore in data science. So understanding how you can deal with uncertainty in your data, and how you can incorporate uncertainty that is maybe coming from forecasts of your inputs, forecasts of the parameters for your techniques, is so important. It is not different from, or set aside from, machine learning and data science at all; it is absolutely central to it. So I think understanding the joined-at-the-hip connection between these approaches is part of the cutting edge, actually, because I don't think enough relevance, if you like, has been attributed to these approaches. And Dimitrios actually talks about how we can bring machine learning and optimization together, and one of the ways that they are so closely tied, and how improvements in one will hugely benefit the other, is in this idea that machine learning is a form of optimization from the start.

You asked about the cutting edge, and the uncertainty, then, is: how can we deal with uncertainty in the modelling? Sometimes there are parameters in your model, or elements that you want to input into your optimization algorithm, which are random, and they have a known distribution. So you can put that distribution in as part of the input. You can say: the amount of time it's going to take for this van to travel from Hackney to Islington is 10 minutes, except it can take 15 minutes on the outside. So what I'm going to do is say it's normally distributed with a mean of 12 and a half, something like that. And that would work, but you do need quite sophisticated approaches to then deal with those stochastic distributions as they trickle through the model and get analysed as part of the optimization.
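A minimal Monte Carlo sketch of that stochastic approach (the mean and spread are invented, loosely echoing the travel-time example above): sample each leg's travel time from a normal distribution and estimate the probability that a whole duty overruns its slot.

```python
import random

random.seed(42)  # fixed seed so the estimate is reproducible

def simulate_duty(legs, mean=12.5, sd=1.5):
    """One random realisation of a duty made of `legs` travel legs (minutes each)."""
    return sum(max(0.0, random.gauss(mean, sd)) for _ in range(legs))

# Estimate the chance that an 8-leg duty overruns a 110-minute slot.
trials = 100_000
overruns = sum(simulate_duty(8) > 110 for _ in range(trials))
print(overruns / trials)
```

This is the "trickle the distribution through the model" idea in its simplest form; a full stochastic optimization would then search over the decisions themselves, such as which duties go on which vans, against these sampled outcomes.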
And then the approach that Dimitrios was using on his problem was one where he was looking at more deterministic scenarios, where we say: well, although it could take 10 minutes, we're going to assume it's going to take 30 minutes, 40 minutes, 50 minutes; whatever we assume is essentially a conservative assumption for how long some piece of the operation is going to take. And in taking that conservative assumption, we robustify our output. So what does that mean? It means that if we were to see the traffic lights do the dirty on us, if we were to see a bad weather day, if we were to see a lot of customer activity which meant that we had to haul our fridges up three flights of stairs, then we've factored some of that into our optimization model from the outset, hopefully with some data backing it up. And that allows us to come up with something which is still optimal, but has some slack in it, so that we're not going to have an output which is unattainable, which looks beautifully efficient on the surface but is not robust to these perturbations, to these things that can go wrong, essentially.
The unknown unknowns as I refer to them in a lot of my projects. How can I get benefit from this?
So I think the important takeaway from this is that when optimization or machine learning, or data science in general, focuses on the decision that's going to be taken as a result of the output from the data science tool, whether it's an optimization output or a machine-learned output or whatever, then you really see the benefit. And if you don't focus on the decision, then you lose that connection. And if you lose that connection with what decision is going to be taken, the fact that I'm dispatching people, dispatching content, whatever it is, if you don't get that link to the decision, then you also lose the absolute most important thing, which is the connection to the impact that you're trying to achieve. So understanding the connection from the descriptive, through the predictive, to the prescriptive; connecting that to the decision which whoever is using your data science tool is going to take; and understanding the impact, what can go wrong with that impact, what can go right, and how to really maximise the going right: that is the end goal here. And it's what Dimitrios has really beautifully elucidated in his interview.
Cool. That's really interesting, Jeremy, I think, if anything from going over all of this in this episode, I see in the future us delving into that machine learning embedded in optimization techniques.
I'd love to do that. I think there's so much for the two approaches to learn and to benefit from each other. I think that would be a really exciting episode to do.
Yeah, it's all really nice mathematically, but I still don't know whether I have to bring my umbrella tomorrow or not.
Frankly, nor do I.
Thanks very much, Jeremy.
Thanks for joining us today at the DataCafé. You can like and review this on iTunes or your preferred podcast provider. Or if you'd like to get in touch, you can email us, jason at datacafe dot uk or jeremy at datacafe dot uk, or on Twitter at datacafepodcast. We'd love to hear your suggestions for future episodes.