
Data quality: your company’s biggest obstacle

The data analytics market has evolved rapidly: algorithms are applied to larger datasets, more frequently, and by a wider group of people and practitioners. Many of the sophisticated algorithms deployed are now more than 50 years old, but the recent ubiquity of almost limitless data storage and compute power, along with improved development tools, has enabled greater scale today and for the future of data analytics.

Data analytics systems have advanced to apply more complex analysis in a much more responsive manner. Analytics remains all about creating insight and providing answers to important questions, which businesses can exploit to achieve improved outcomes. Contemporary systems now apply machine learning to more diverse datasets, where the analysis is driven to a larger extent by the computer rather than the operator. The real-time arrival of new data enables insights to be discovered as soon as they occur.

Companies looking to make better use of data

Historically, the demands placed on corporate data have been less onerous. But as companies now look to leverage their data, rather than just record it, a rapidly growing issue is the veracity of that data. The quality of the analysis conducted is only ever as good as the quality of the data fed into a system. Improving the quality and management of data has not historically been a requirement, and this has led to the creation and use of data that isn't up to scratch for the type of analytics companies need to develop.
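The point that analysis is only as good as its input data can be made concrete with a simple pre-analytics veracity check. This is a minimal sketch, not any company's actual pipeline; the order data and the specific checks (missing values, duplicate identifiers, out-of-range amounts) are invented for illustration.

```python
import pandas as pd

# Hypothetical order data exhibiting typical quality problems:
# a missing value, a duplicated identifier and a negative amount.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "amount":   [25.0, None, 40.0, -5.0, 120.0],
})

# Basic veracity checks before any analytics run.
issues = {
    "missing_amounts":  int(orders["amount"].isna().sum()),
    "duplicate_ids":    int(orders["order_id"].duplicated().sum()),
    "negative_amounts": int((orders["amount"] < 0).sum()),
}

# Keep only rows that pass every check.
clean = (orders
         .dropna(subset=["amount"])
         .drop_duplicates(subset=["order_id"])
         .query("amount >= 0"))

print(issues)      # {'missing_amounts': 1, 'duplicate_ids': 1, 'negative_amounts': 1}
print(len(clean))  # 3
```

Counting the failures separately, rather than silently dropping rows, gives an organisation a measure of how far its data falls short before any model is trained on it.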

Analytics tools and technologies are to a large extent commoditised, but the single biggest challenge for organisations looking to exploit them is the creation and availability of adequate datasets. Establishing these foundations is typically a particular problem in organisations that have not historically fostered a data-centric culture.

“Sometimes relevant data doesn’t exist and when it does its location is often poorly understood, riddled with quality issues and spread across multiple systems of record,” says Paul Fermor, UK solutions director at Software AG. “The majority of organisations that have realised the potential value of their data are engaged in substantive projects to improve its quality, real-time availability and integration across systems.”

Is cloud the future of data analytics?

Senior leaders want data analytics capabilities for their organisation, but can often become frustrated with slow progress due to the underlying limitations of their existing core data infrastructure. This is particularly common in companies that have grown through acquisitions, leading to fragmented technology, teams and cultures.

The rise of cloud has helped resolve data infrastructure scalability concerns, providing data analytics software as a service. Cloud has ensured the latest tooling is readily available without the need to maintain and patch it, while traditional database administrators can now build machine-learning models without the specialist knowledge that was required just a few years ago.

“Cottage industries and data fiefdoms will gradually disintegrate; the future of data analytics is in the cloud,” says James Tromans, technical director at Google Cloud. “Those with the correct clearance can quickly start applying advanced data analytics to a valuable business problem in a way that simply wasn’t possible previously.”


The injection of new technical capability means organisations can transform with cloud infrastructure far more easily. As the infrastructure that algorithms run on becomes more widely available, major cloud providers are racing to add related technology such as live streaming and security. As well as the analytics, they want to cover the control and regulatory compliance of the data.

Smart data analysis key to beating competition

An example of a company already combining machine learning and data analytics in the cloud is Ocado Technology. Pairing TensorFlow with BigQuery, Ocado developed a mechanism for predicting and recognising incidents of fraud among millions of otherwise normal events, using data collected from past orders as well as known cases of fraud. By creating a reliable model, Ocado improved the precision of its fraud detection by a factor of 15.
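The general approach described above, training a classifier on labelled history (normal orders plus known fraud cases) and judging it by precision, can be sketched as follows. This is not Ocado's actual TensorFlow/BigQuery pipeline; it uses scikit-learn on synthetic data, and the features standing in for order signals are entirely invented.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score

rng = np.random.default_rng(0)

# Invented numeric features standing in for order signals,
# e.g. order value, account age, delivery-address changes.
n = 5000
X = rng.normal(size=(n, 3))

# Synthetic label: rare "fraud" cases correlated with the features,
# mimicking a history of mostly normal orders with known fraud.
y = ((X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)) > 2.2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Train on the labelled history, then measure precision:
# of the orders the model flags, what fraction are actually fraud.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
precision = precision_score(y_test, model.predict(X_test), zero_division=0)
print(round(precision, 2))
```

Precision is the natural metric here because the cost of fraud detection lies in false alarms: every legitimate order wrongly flagged is a customer inconvenienced and an investigator's time spent.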

“Businesses and industries are being disrupted by cheaper, more agile competition. How they respond is based on how well they are able to manage and exploit their data,” says Nick Whitfield, head of data and analytics at KPMG UK. “The culture around data needs to change. Organisations need to fully understand that if the quality of data at the point of creation is poor, this will undermine investment and focus in the future of data analytics.”

Mr Fermor at Software AG believes the future of data analytics will address harder problems, such as creating more human-like machines. “This might manifest itself in more convincing chatbots and artificial intelligence assistants, or improved medical diagnosis tools. There are also efforts to automate the machine-learning process, which is still driven by humans, and create a less technical, self-service approach to creating and deploying sophisticated models,” he says.

Organisations that want to become data enabled must evaluate their skills and operating models for the future of data analytics. Ensuring they are able to process and exploit data quickly will require an ecosystem of talented people, geared up to work at an unprecedented level of accuracy.