What do you think about data hygiene?

April 14, 2021

Hygiene is a big topic in times of Corona, and data hygiene in companies deserves similar attention. Yet many companies still pay too little attention to data quality management, or none at all. Washing hands, disinfecting, wearing a face mask: in the physical world, we have internalised hygiene rules to the point that we follow them almost automatically. But what does it look like in the digital world? Do we know data hygiene rules, and do we apply them in our daily work?

Data hygiene in the customer management system

Take, for example, the address data in your CRM. How many customer records exist twice? In different spellings? With different contact persons? With different telephone numbers or postal addresses? You know best why this is so: different employees use the tool in their own way, make spelling mistakes, forget an entry here and there, mix up fields, and so on. The list of everyday mistakes goes on and on. You may ask yourself whether a few unintentionally incorrect customer records are really a problem. Yes, they are, especially if you want to integrate systems such as the CRM and the email client. As data volumes grow, data quality becomes more important for companies: the successful integration of ERP, BI, BPM or CRM systems depends on reliable and consistent data. Nor should the unnecessary, sometimes even critical, costs of incorrect data be neglected. These range from a misdirected marketing campaign (e.g. the customer address no longer exists) to wrong strategic decisions (e.g. process optimisation based on incorrect data).
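To make the duplicate problem concrete, here is a minimal sketch of how differently spelled customer records could be grouped. The field names (`id`, `name`, `postal_code`), the sample records and the matching rule (normalised name plus postal code) are assumptions for illustration, not the data model of any particular CRM.

```python
import re
from collections import defaultdict

def normalise(record):
    """Build a comparison key from name and postal code (an assumed matching rule)."""
    # Lower-case the name and replace punctuation or repeated spaces with one space.
    name = re.sub(r"[^a-z0-9]+", " ", record["name"].lower()).strip()
    # Drop a few common legal-form suffixes so "Example GmbH" and "Example" match.
    name = re.sub(r"\b(gmbh|ag|ltd|inc)\b", "", name)
    name = " ".join(name.split())
    return (name, record["postal_code"].replace(" ", ""))

def find_duplicates(records):
    """Return lists of record ids that share the same normalised key."""
    groups = defaultdict(list)
    for record in records:
        groups[normalise(record)].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical sample data: three spellings of one customer, plus one other customer.
customers = [
    {"id": 1, "name": "Example GmbH", "postal_code": "30159"},
    {"id": 2, "name": "example gmbh ", "postal_code": "30159"},
    {"id": 3, "name": "EXAMPLE-GmbH", "postal_code": "30 159"},
    {"id": 4, "name": "Other AG", "postal_code": "10115"},
]
duplicates = find_duplicates(customers)
print(duplicates)
```

A rule this simple will miss typos such as "Exmaple"; in practice a fuzzy comparison and a manual review step would probably be needed on top.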

Data hygiene when sending e-mails and in documents

E-mail is still the most common way of moving electronic documents around in companies. In addition to the visible content, these messages also carry metadata and hidden information. If a document is attached to the e-mail, its metadata travels along with its content. This metadata includes, for example, title, author or subject, and in edited documents also version history, tracked changes, comments, the software used, and the dates of creation, modification and printing. File paths, printer paths, e-mail addresses or user names can also end up in the metadata without your knowledge. In principle, this data is useful for assigning and managing documents. However, the information that can be extracted from it also lends itself to manipulation such as social engineering. If conclusions about your network structure can be drawn from your file paths, a friendly Trojan may pay you a visit soon. Software vendors offer, among other things, functions for removing such “hidden data”. But how likely is it that your employees know about these functions, let alone use them regularly? My hypothesis: the probability is < 1%.
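How easily such metadata can be read is shown by the following sketch. It assumes an Office Open XML document (for example a .docx file), which is a ZIP archive that keeps author and editor information in `docProps/core.xml`. The sample archive and the user names in it are made up so that the example runs without a real document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def read_core_properties(docx_file):
    """Return the non-empty core properties (author, last editor, ...) of an OOXML file."""
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    # Strip the XML namespace from each tag, e.g. "{...}creator" -> "creator".
    return {child.tag.split("}")[-1]: child.text for child in root if child.text}

# Build a tiny stand-in archive in memory (hypothetical content).
sample_xml = (
    '<cp:coreProperties '
    'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:creator>j.doe</dc:creator>'
    '<cp:lastModifiedBy>a.smith</cp:lastModifiedBy>'
    '</cp:coreProperties>'
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docProps/core.xml", sample_xml)
buffer.seek(0)

properties = read_core_properties(buffer)
print(properties)
```

With a real file, the path to the document would be passed instead of the in-memory buffer. The sketch only reads the core properties; comments, tracked changes and file paths are stored elsewhere in the archive.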

Data hygiene when saving data

The volume of data generated daily in companies is increasing, whether it is a file quickly saved to the desktop, a USB stick with sensitive information or a shared document in the cloud, and it is increasingly becoming an IT security risk.

As early as 2019, a study by Kaspersky found that around 69% of German office workers stored documents containing personally identifiable and sensitive data. Two-thirds (66%) of the respondents did not see themselves as responsible for managing documents securely and consistently, but relied on the IT department or management instead. Often, companies still have no framework for organising data with clear file naming, or only a rudimentary one. This not only slows down workflows, but also increases data chaos, especially as companies grow.

“As the volume of data increases exponentially, business leaders should pay more attention to folder clutter and the potential security risk it creates,” advises Maxim Frolov of Kaspersky Lab.

Minimise errors – increase data quality and hygiene

To reduce errors in the business processes described above, there are essentially three options:

  • make incorrect data entry or storage less likely, i.e. choose software that is as easy to use as possible for the majority of employees
  • develop a binding framework as a standard for data entry, transfer and storage (ideally jointly in teams)
  • train all staff regularly in the applications they use: only those who practise learn in the long term, and a one-off distribution of operating manuals or on-boarding booklets can only be the first of many steps
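A binding framework becomes easier to follow when parts of it can be checked automatically. The sketch below tests file names against a naming convention. The convention itself (`YYYY-MM-DD_project_description_vNN.ext`, lower case, hyphens inside a part) and the sample file names are purely assumptions for illustration; each team would define its own.

```python
import re

# Assumed convention: YYYY-MM-DD_project_description_vNN.ext
PATTERN = re.compile(
    r"^\d{4}-\d{2}-\d{2}"            # date
    r"_[a-z0-9]+(?:-[a-z0-9]+)*"     # project
    r"_[a-z0-9]+(?:-[a-z0-9]+)*"     # description
    r"_v\d{2}"                       # two-digit version
    r"\.[a-z0-9]+$"                  # file extension
)

def non_compliant(filenames):
    """Return the file names that do not follow the assumed convention."""
    return [name for name in filenames if not PATTERN.match(name)]

# Hypothetical file names.
files = [
    "2021-04-14_crm_address-cleanup_v01.xlsx",
    "Final_FINAL2.docx",
    "2021-04-14_crm_report.pdf",
]
violations = non_compliant(files)
print(violations)
```

The pattern only checks the shape of the name; it does not verify that the date is a valid calendar date.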

Tips for practice

Kopano’s experts have various tips for daily hygiene practice:

“Data hygiene also means saving resources, i.e. reducing unnecessary data. Dealing with your emails in a structured way can mean using subfolders in your inbox or making it a routine to touch each email twice at most (matrix: urgent/important, less urgent/less important). This makes processing faster and nothing gets lost.” (Anke Pawla, Partner Manager at Kopano)

“Versions of a document sometimes become important, so choosing the right tool for data hygiene is even more important. Kopano is integrated with sync & share solutions – we use this integration too to prevent waste of data. It also makes perfect sense to integrate data from other applications (e.g. CRM, ERP) in a sensible way instead of storing it twice. And of course, one should not let the ‘deleted objects’ grow endlessly.” (Andreas Rösler, Managing Director at Kopano)

Data hygiene and digitalisation

Are you in the middle of digitalisation and think that the previous points no longer affect you because an AI will soon be intelligently optimising your processes? Far from it. A study by the Massachusetts Institute of Technology (MIT) shows that the ten most frequently used AI data sets are distorted by faulty test data.

“These are datasets used for automated recognition of visual, linguistic and auditory cues. These errors distort the idea of how advanced artificial intelligence technology actually is, science journalist Karen Hao writes in an article for Technology Review.”

(Source: newsletter join-ada.com ; https://www.technologyreview.com/2021/04/01/)

The data sets in question form the basis for AI research, which makes the findings particularly sobering. For the study, the scientists themselves used machine learning to check the data sets for accuracy. Wherever the AI's prediction deviated from the label in the original data set, the entry was checked by humans. On the corrected data sets, simpler models ultimately performed better than “far more complex data models used by Google, for example, which are considered a general indicator of the state of technology”.
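The checking procedure described above can be illustrated with a strongly simplified sketch: entries where a model confidently disagrees with the stored label are put into a queue for human review. This is not the method used in the study; the function name, the confidence threshold of 0.9 and the sample data are assumptions.

```python
def flag_for_review(labels, predicted_probs, threshold=0.9):
    """Return indices where the model confidently disagrees with the stored label."""
    flagged = []
    for index, (label, probs) in enumerate(zip(labels, predicted_probs)):
        # The class the model considers most likely for this entry.
        predicted = max(probs, key=probs.get)
        if predicted != label and probs[predicted] >= threshold:
            flagged.append(index)
    return flagged

# Hypothetical labels and model outputs for four images.
labels = ["cat", "dog", "cat", "dog"]
predicted_probs = [
    {"cat": 0.97, "dog": 0.03},  # agrees with the label
    {"cat": 0.95, "dog": 0.05},  # confident disagreement: a candidate label error
    {"cat": 0.45, "dog": 0.55},  # disagreement, but the model is not confident
    {"cat": 0.40, "dog": 0.60},  # agrees with the label
]
review_queue = flag_for_review(labels, predicted_probs)
print(review_queue)
```

Only the flagged entries go to a human reviewer, which keeps the manual effort small compared with re-checking the whole data set.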

The study shows how important data hygiene, i.e. clean data sets, is. Without effective data hygiene, errors become embedded in the algorithms of AI applications, where they not only cause wrong decisions but can also lead to discriminatory results. This in turn can compromise digital sovereignty. But that is a topic for a future blog post.