Our friends at bet365 certainly know a thing or two about innovation, so how does this apply to their engineering and product? Find out in the blog below, written by Alan Reed, Head of Sports Development at Hillside Technology (bet365's tech innovation business)!
Maintaining leadership in a saturated market isn't easy. Doing so means consistently pushing boundaries to keep up with market demand while developing software that’s both sustainable and has the headroom for innovation.
That’s the theory anyway. Of course, it doesn’t cover every eventuality, as we found out when the product team came to us with a request to innovate one of our marquee services.
With over half a million concurrent players at peak times, Bet Builder is one of our most popular products. When we launched it in 2018, it was the first of its kind. Today, however, it's one of many. Bet Builder originally offered users premade bet builds they could edit, and the business saw an opportunity to add value and create differentiation by putting the customer in complete control of their build options.
With millions of potential combinations at users' fingertips, the question my department, Sports Development, had to answer was: how do you achieve the raw power needed to deliver that kind of flexibility, on the fly, to hundreds of thousands of users, while ensuring the system can calculate the odds in real time?
Presenting the changes onscreen to users wasn’t the problem. We knew we could make the user interface slick. However, to give them carte blanche on the bets they could build, we knew the challenge would be one of raw power.
To answer the question of ‘could it be done?’, we needed to stress the compute to see if it had the elasticity we needed. That meant using a rapid application framework to set up a string of pilots so we could observe what would happen.
Our initial theory was to inject some new software into the Bet Builder engine that would give it a turbo boost. But we couldn’t get anywhere near the speed or scale we needed.
The problem was that the original Bet Builder engine was designed to achieve a specific goal: to move people through the bet selection sequence as quickly as possible. That sequence was built on top of our existing data hierarchy, so it was familiar to customers. The bet builds were pre-selected, so we could move users along at pace.
Injecting new software into the old system wouldn't work because we were asking it to do something it hadn't been designed to do. Whereas the original Bet Builder had been designed for sequential action, the Bet Builder 2.0 software needed to be built for concurrency and scale. The two wouldn't be compatible. To get it to work, we would have to build a new system from scratch.
We needed to take a holistic approach. We had to look at every element individually and determine where the extra grunt would come from. That meant going back to the formulas and asking our mathematicians if it would be possible to make their models work more efficiently.
We knew we couldn't ask that part of the system to do more, but we could make the most of what we already had. While we wouldn't be able to build on the mathematical engine itself, we thought about how we could build an additional tier on the existing Bet Builder servers: one that would take the original price and multiply it out by all the possible combinations.
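To make the idea concrete, here's a rough Go sketch of what that extra tier does conceptually: take individually priced legs and multiply them out into every possible combination. The types, names, and prices are purely illustrative, and a straight product ignores the correlation handling our models actually perform, so treat it as a simplified picture rather than production code.

```go
package main

import "fmt"

// Leg is a single selection inside a bet build (hypothetical type for illustration).
type Leg struct {
	ID    string
	Price float64 // decimal odds produced upstream by the pricing models
}

// expandCombinations multiplies the individually priced legs out into every
// possible combination. Real Bet Builder pricing accounts for correlation
// between legs, so the straight product here is a deliberate simplification.
func expandCombinations(legs []Leg) map[string]float64 {
	prices := make(map[string]float64)
	var walk func(idx int, key string, price float64)
	walk = func(idx int, key string, price float64) {
		if idx == len(legs) {
			if key != "" {
				prices[key] = price
			}
			return
		}
		// Either skip this leg...
		walk(idx+1, key, price)
		// ...or include it in the combination.
		next := key
		if next != "" {
			next += "+"
		}
		walk(idx+1, next+legs[idx].ID, price*legs[idx].Price)
	}
	walk(0, "", 1.0)
	return prices
}

func main() {
	legs := []Leg{{"over2.5", 1.9}, {"btts", 1.8}, {"home-win", 2.1}}
	for combo, price := range expandCombinations(legs) {
		fmt.Printf("%-25s %.2f\n", combo, price)
	}
}
```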
We also knew that we’d need to find the answer in software. When building the first version of Bet Builder we thought that GPUs would give us an edge over CPUs, but the hardware’s architecture didn’t offer the speed needed.
GPUs are excellent at processing numbers, but our real problem was access to the data. Every time someone created a bet build, we'd need to process it in real time. It didn't matter how fast we could make the calculations if we couldn't rapidly access the data.
We couldn't just plug in a grid of GPUs and solve the problem, because they wouldn't help us fetch, redistribute, and reformat the massive data sets that were spread across different technologies, data sources, and platforms. We realised that our original assumption was wrong. The bottleneck wasn't our ability to compute, or even to compute concurrently. It was our ability to talk to disparate systems and to reformat and process that information.
We couldn’t just throw money at the problem because increasing the raw power wouldn’t solve it. We needed a new software architecture.
Golang was a good candidate. It would give us the velocity, the scale, and the availability, and could do all these things on relatively small kit. But it wasn’t going to solve the compute issue.
We couldn't use databases because they were too slow. However, we knew that if we could predict all the possible combinations that a user might want and hold them in state, then the player would be able to pick them off as if they were making selections from a menu.
First, the mathematicians squeezed as much power out of the models as they could. Development then compounded this by creating services that would extrapolate all the possible combinations into datasets that could be made available in state.
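Conceptually, the result of that extrapolation is a large in-memory dataset keyed by the combination itself, so pricing a user's build becomes a lookup rather than a calculation. The Go sketch below illustrates that shape only; the names and structure are assumptions, not the real service.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// selectionKey builds a canonical key for a set of selections, so the same
// combination always maps to the same precomputed entry regardless of the
// order the user picked the legs in. All names here are hypothetical.
func selectionKey(selections []string) string {
	sorted := append([]string(nil), selections...)
	sort.Strings(sorted)
	return strings.Join(sorted, "+")
}

// ComboStore holds every extrapolated combination in memory ("in state"),
// so serving a user's build is a lookup rather than a calculation.
type ComboStore struct {
	prices map[string]float64
}

func NewComboStore(prices map[string]float64) *ComboStore {
	return &ComboStore{prices: prices}
}

// PriceFor returns the precomputed price for the user's current selections.
func (s *ComboStore) PriceFor(selections []string) (float64, bool) {
	price, ok := s.prices[selectionKey(selections)]
	return price, ok
}

func main() {
	// In the real system these entries would come from the expansion tier;
	// they are hard-coded here purely for illustration.
	store := NewComboStore(map[string]float64{
		"btts+over2.5":          3.42,
		"btts+home-win+over2.5": 7.18,
	})
	if price, ok := store.PriceFor([]string{"over2.5", "btts"}); ok {
		fmt.Printf("combined price: %.2f\n", price)
	}
}
```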
One of the first things you're told when you begin in IT is that if you can compartmentalise a problem, you can solve it and then multiply that solution out. A lot of the time this works. However, it wasn't something we could do here because the problem was too complex.
In this instance, there were additional factors we couldn’t control. Namely change and the volatility of that change. We had the compute power and could distribute it, but the challenge was the communication chatter in between.
We had to find a way to hold all the selections our customers made in state, so that if someone else made the same selection, the system wouldn't have to recalculate it. It would simply dip into the cache. We then made the caches brittle, so each entry would exist for a few seconds and then disappear.
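As an illustration of what we mean by a brittle cache, here's a minimal Go sketch: entries live for a few seconds, identical requests within that window reuse the result, and anything older is recalculated. It's a deliberately simplified stand-in for the real thing; for brevity it even computes under the lock, which a production cache wouldn't do.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry pairs a cached price with the moment it expires.
type entry struct {
	price     float64
	expiresAt time.Time
}

// BrittleCache holds computed selections for only a few seconds: long enough
// for a burst of identical builds to reuse the result, short enough that
// stale prices vanish quickly. This is a hand-rolled sketch, not the
// production implementation.
type BrittleCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]entry
}

func NewBrittleCache(ttl time.Duration) *BrittleCache {
	return &BrittleCache{ttl: ttl, entries: make(map[string]entry)}
}

// GetOrCompute returns the cached price for key if it is still fresh,
// otherwise it runs compute once and stores the result with the cache's TTL.
func (c *BrittleCache) GetOrCompute(key string, compute func() float64) float64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[key]; ok && time.Now().Before(e.expiresAt) {
		return e.price
	}
	price := compute()
	c.entries[key] = entry{price: price, expiresAt: time.Now().Add(c.ttl)}
	return price
}

func main() {
	cache := NewBrittleCache(3 * time.Second)
	// Two identical requests in quick succession: the second is a cache hit.
	for i := 0; i < 2; i++ {
		price := cache.GetOrCompute("btts+over2.5", func() float64 {
			fmt.Println("recalculating...") // printed only once
			return 3.42
		})
		fmt.Printf("price: %.2f\n", price)
	}
}
```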
This was important because of the volume of bets we receive at any one time. People don't bet throughout the game. They place their bets just before it begins and during the commercial breaks. We see huge lulls in activity and then a mad rush.
The result was a system that learnt in real time and went faster the more it was used.
We had to look at the fundamentals of how we were processing the simulations. Conflation of events became a key technique. When an event happens in a game, multiple calculations are performed, almost simultaneously.
To avoid overwhelming the system, we conflate multiple outputs, and this is what is passed into the simulation. Every millisecond counts, and there is no need to process anything that isn't deemed the most relevant item.
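A simple way to picture conflation is a stage that keeps only the latest value seen for each market and forwards that on a short tick, discarding the intermediate updates. The Go sketch below shows that pattern in miniature; the Update type, the tick interval, and the market names are all hypothetical, not our actual pipeline.

```go
package main

import (
	"fmt"
	"time"
)

// Update is one raw output produced when something happens in a game
// (hypothetical type for illustration).
type Update struct {
	Market string
	Value  float64
}

// conflate reads raw updates and, on each tick, forwards only the latest
// value seen for each market. Values that arrive between ticks overwrite
// each other, so the simulation only processes the most relevant item.
func conflate(in <-chan Update, out chan<- Update, every time.Duration) {
	latest := make(map[string]Update)
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		select {
		case u, ok := <-in:
			if !ok {
				return
			}
			latest[u.Market] = u // older values for this market are discarded
		case <-ticker.C:
			for k, u := range latest {
				out <- u
				delete(latest, k)
			}
		}
	}
}

func main() {
	in := make(chan Update)
	out := make(chan Update)
	go conflate(in, out, 50*time.Millisecond)

	// Three rapid updates for the same market: in practice only the last
	// one reaches the simulation side.
	in <- Update{"total-goals", 2.1}
	in <- Update{"total-goals", 2.3}
	in <- Update{"total-goals", 2.4}

	fmt.Println(<-out) // {total-goals 2.4}
}
```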
Essentially, it's a probability puzzle. Yes, there are millions of combinations of bets, but reading into the data gave us some interesting parameters to work with. We knew what the most popular bets were, and that once the initial bet had been made, the number of options available to the user reduced significantly.
Once we understood what all the different permutations of choices were, it was then a combination of pushing data into stateful caches and fanning it out to all the permutations of the bets that someone could select.
By placing the prediction engine at the heart of the system, we don't have to compute anything on the fly because the calculations have already taken place. The result is that we can serve hundreds of thousands of concurrent users. Recently, at peak, we've had up to 600,000 people on it at the same time.
Of course, that doesn't mean we're sitting still. That's not how things work around here. We're now looking at our next challenge: offering Bet Builder across more opportunities, building it out across different sports, and then across numerous fixtures.
This blog is reshared with bet365's permission. The original can be viewed here.