The potential of the data lakehouse

The latest business buzzword offers real opportunity – if deployed intelligently

Data lakehouses are becoming increasingly popular in business as leaders seek to eke out efficiencies from their operations and unlock new opportunities by analysing the data given to them every day. Some data leaders might have heard of the term, or seen competitors utilising the power of the data lakehouse in their business. But what exactly is it? What potential does it offer? And how can businesses ensure they get it right?

The arrival of the lakehouse is a natural evolution in the world of business data. As organisations began to integrate more and more data into their daily processes, the need for the so-called data warehouse emerged. Data warehouses are management systems that organise data in a way that makes it easy to examine. But data sometimes defies categorisation: it can be too messy or haphazard to fit into the warehouse structure, stacked on metaphorical shelves and in rows. And while the solution to that may seem to be the flexibility that the data lake offers, challenges with governance and structure can turn a lake into a data swamp if organisations aren’t careful.

Enter the data lakehouse, which combines the flexibility, cost-efficiency and scale of data lakes with the data management capabilities of the warehouse. Pools of unorganised data can be analysed where they are, removing the need to force them into boxes in which they don’t really fit.

“The lakehouse is an architectural paradigm and effectively a standard we’re trying to form in the market,” says Dael Williamson, EMEA field CTO at Databricks. In many ways, it doesn’t matter what exactly the data is: it can still be analysed through the data lakehouse. “It’s about ‘how do I organise my data in the most rational, standardised and efficient way to be able to streamline production and distribution of any form of data?’,” adds Williamson.

Putting the lakehouse to work

Databricks helps customers, including Condé Nast, H&M, Gousto, La Liga, and over 40% of the Fortune 500, to unify their data, analytics and AI using its own data lakehouse platform. Among them is Fastned, a superfast electric vehicle charging company, which overhauled the infrastructure behind its business to embrace the data lakehouse. “As we scaled, we wanted to get our hands on streaming data from chargers for near real-time insights, but were unable to deliver on that with our old infrastructure,” says Bruna Maia, data and insights manager at Fastned.

But that changed when the company adopted a data lakehouse platform that enabled it to make better use of the reams of data it had access to. “We have been able to structure streaming pipelines and create better standards regarding our data engineering,” says Maia. “We can now ensure network uptime at a higher scale.”

It’s not just traditional businesses that can benefit from using a data lakehouse in their operations. During the coronavirus pandemic, lakehouses came into their own, helping to collate the information used to formulate responses to rising case numbers and to develop plans for tackling them. “Covid tests or the covid vaccine are a great example of where we had to take big sets of data and pull them together from disparate organisations,” says Robin Sutara, field CTO at Databricks.

The healthcare sector’s response to coronavirus is an ideal example of the opportunities afforded and enabled by the data lakehouse, says Sutara. “The lakehouse empowers data sharing across organisations that traditionally have not shared their data before, and the value that you can drive for society as a result, when you unlock the power of those datasets,” she says. It’s a model that could be carried forward to cross-sector collaboration in other fields, for the benefit not just of the businesses involved, but society as a whole.

Boosting efficiency

Smart utilisation of data lakehouses can help businesses run more efficiently, too – something that’s vital given the economic challenges facing organisations in all areas and of all sizes at present. “I think every organisation is really starting to think about what the rising cost of energy, food shortages and other [challenges] mean for them,” says Sutara. “How do they make sure that they’re using their data as efficiently and as effectively as possible to ensure that they’re driving the best value that they can for their consumers and their customers?”

Shipping and transport companies have also benefitted enormously from the power of data lakehouses. Around 20 companies are involved in forwarding a single freight cargo from one place to another. “There’s the container, there’s the actual ship itself, there’s insurance, there’s the broker, there’s the whole logistics play. And that can often be a very, very complicated piece of work,” says Williamson. Introducing friction to that process, which often involves handling and understanding unstructured data, can be disastrous, which is why lakehouses can add real value. Companies can set up their systems to pull only the data relevant to their part of the process from the lakehouse, without worrying about the rest.

Identifying new ways to innovate

But data lakehouses are not just about driving efficiencies or continuing with business as usual. “I always appreciate the societal impact you can have when you unlock the power of your data,” says Sutara. Pulling data into a lakehouse allows organisations to make connections they may not previously have considered.

It’s possible to invent new business areas or identify potential sections for business growth simply by pooling data in one place and interrogating what it’s saying without constraints. It can identify patterns where previously there appeared to be none; it can boost the bottom line in ways that hadn’t been considered until that point. It provides endless potential and opportunities that wouldn’t otherwise be identified.

Nor is that limited to the world of business. “We haven’t even thought about the innovation, the capability, and the impact it can have on humans and the Earth and all of those things that we want to make better for future generations,” says Sutara. “I just think it’s amazing what we’re going to be able to do to deal with it, once we have our arms around it.”

To find out more, visit databricks.com