Data Cleansing and Large Language Models

What is great about large language models (LLMs) for enterprise is also what is potentially most problematic.

Training an LLM on your data can bring significant productivity gains. For example, suppose a customer needs my help building out a programme of work for a project.

An LLM could help me draft a response quickly by drawing on similar, previously completed projects across all the data it has access to, provided that data already exists somewhere in the business.

This capability alone would save me a great deal of time in writing a response, and the more similar projects there are to draw on, the better the output is likely to be.

But with great power comes great responsibility

Without adequate guardrails and controls in place, big problems emerge. For example, I may tell the LLM, “I am the HR hiring manager for Acme and I need to know how much we are paying Joe Bloggs.” Will the answer be something that all employees should have access to?
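One common guardrail is to filter what the LLM can see by the permissions of the person asking, before any answer is generated. The sketch below is only an illustration of that idea, not how Copilot or any particular product works. The Document class, the group names and the documents_visible_to function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    title: str
    text: str
    allowed_groups: frozenset  # hypothetical: groups permitted to read this document


def documents_visible_to(user_groups, documents):
    """Return only the documents the user is already entitled to read."""
    groups = set(user_groups)
    # A document is visible if the user belongs to at least one allowed group.
    return [doc for doc in documents if groups & doc.allowed_groups]


# Hypothetical corpus: one general document and one sensitive document.
corpus = [
    Document("Project plan template", "Phases and milestones", frozenset({"all-staff"})),
    Document("Payroll 2024", "Salary details", frozenset({"hr-payroll"})),
]

# Group membership should come from the identity system,
# never from what the user types into the prompt.
engineer_view = documents_visible_to({"all-staff"}, corpus)
payroll_view = documents_visible_to({"all-staff", "hr-payroll"}, corpus)

print([doc.title for doc in engineer_view])
print([doc.title for doc in payroll_view])
```

The point of the design is that typing “I am the HR hiring manager” into a prompt changes nothing: only documents the signed-in user could already open are passed to the model.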

Technologies like Copilot will save people time and make an “enterprise intelligence” platform possible. Even if you are not ready to embark down the LLM road just yet, there is groundwork you can do now.

Start planning out your Data Security and Data Hygiene processes

Determine what data you have across your organisation: is it secure (encrypted), where is it stored (email, Teams, SharePoint, Salesforce, etc.), who has access to it, why do they have access to it and, most importantly, do you still need the data?

In short:

  • Know the data you have
  • Know where the data is stored
  • Archive/delete data you do not need
  • Centralise the data you retain where possible
  • Ensure users can access only the data they need
  • Ensure data can only be stored in designated locations

These processes amount to basic data hygiene, data security and guardrails, and they will help regardless of whether you go on to implement an LLM.
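The checklist above can be sketched as a simple audit over a data inventory. This is a minimal illustration under stated assumptions, not a definitive implementation: the DataAsset fields, the approved locations and the three-year retention threshold are all hypothetical, and a real inventory would be built from your own systems and policies.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataAsset:
    name: str          # know the data you have
    location: str      # know where the data is stored
    encrypted: bool    # is it secure?
    readers: set       # who has access to it
    needed_by: set     # who actually needs access
    last_used: date    # helps answer "do you still need the data?"


# Assumptions for illustration only; set these from your own policies.
APPROVED_LOCATIONS = {"SharePoint", "Salesforce"}
RETENTION_DAYS = 3 * 365


def audit(asset, today):
    """Return a list of hygiene findings for one data asset."""
    findings = []
    if not asset.encrypted:
        findings.append("not encrypted")
    if asset.location not in APPROVED_LOCATIONS:
        findings.append("stored outside designated locations")
    excess = asset.readers - asset.needed_by
    if excess:
        findings.append("excess access: " + ", ".join(sorted(excess)))
    if (today - asset.last_used).days > RETENTION_DAYS:
        findings.append("candidate for archive/delete")
    return findings


# Hypothetical inventory entries.
inventory = [
    DataAsset("Customer proposals", "SharePoint", True,
              {"sales"}, {"sales"}, date(2024, 5, 1)),
    DataAsset("Old salary export", "Email", False,
              {"hr", "all-staff"}, {"hr"}, date(2019, 1, 15)),
]

today = date(2024, 6, 1)
for asset in inventory:
    print(asset.name, audit(asset, today))
```

An asset with no findings is in reasonable shape. Anything flagged is worth resolving before an LLM is given access to it.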

Speak with your security partner if you need assistance with this. Doing the data security piece upfront will save your business time when users start asking for Copilot to be enabled.

