Step 1 - Input profiling and validations
Open-source ETL software Talend Data Integration
offers a powerful alternative to Excel for running sales
compensation management systems. Featuring a simple drag-and-drop interface and the flexibility to build complex logic in Java,
Talend is not only easy to learn and use but also beats many other tools on processing performance.
The first step is to load the various inputs into the system. Loading data in Talend is as simple as it gets: drop the
component that corresponds to your type of data source, configure it, and Talend does the rest. Talend imports
data directly from databases and from a wide variety of file formats such as Excel, CSV, EBCDIC, XML,
delimited text, HL7, and JSON. It also integrates with popular third-party applications such as salesforce.com,
Marketo, and SugarCRM.
Once data has been imported, the most important task is to ensure the integrity of the inputs. This is where the tremendous
potential of Talend can be realized. Checks ranging from the simple, such as ensuring that the Employee ID is always an
8-digit number, to the complex, such as isolating cases in which an employee's sales exceed 10 times their previous month's
sales, can be built easily into the Talend process flow. Various checks can be designed and automated as part of your regular
sales compensation process. Some of these are highlighted below so that you can start using such checks more effectively.
Referential integrity/Check for missing data
First, define a master list (or identify a master file) for each of the key dimensions, such as the roster for Employee_ID
and the hierarchy for Geography_ID. Inputs loaded during every run are then compared against these master files to identify any
missing or extra data in the input files. For example, if the sales data contains no record of sales from territory X,
the kickout reports would indicate that data for territory X has perhaps not been received (see Fig 1). Ideally,
a master file for each field should be defined at the start of the process to ensure data completeness and correctness.
Similarly, there are also checks for data missing across different types of files. For example,
if the roster contains a record for employee 'Adam' but the sales data doesn't, this would be featured in a validation report.
Such checks need to be built only once and can then run automatically every month.
Fig 1. Checking if a particular territory is not present in the sales data.
Technical details: sales data is grouped by region using the tAggregateRow_3 component, and this list is matched against the
existing territories list in tMap_3. Inside tMap_3, an inner join is performed and the matches are flagged.
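Although in Talend this check is assembled from components rather than hand-written, the underlying logic is easy to express. Purely as an illustration, here is a minimal Python sketch of what the aggregate-and-join does: collect the territories that appear in the sales feed and diff them against the master list. The field names (`territory`) and territory codes are hypothetical, not taken from actual files.

```python
# Sketch of the referential-integrity check: aggregate sales by territory,
# then compare against the master territory list to find missing/extra data.
def territory_kickouts(sales_records, master_territories):
    seen = {rec["territory"] for rec in sales_records}
    master = set(master_territories)
    return {
        "missing": sorted(master - seen),  # territories with no sales data received
        "extra": sorted(seen - master),    # sales from territories not on the master list
    }

sales = [
    {"territory": "T001", "amount": 1200},
    {"territory": "T002", "amount": 800},
    {"territory": "T999", "amount": 50},   # not on the master list
]
report = territory_kickouts(sales, ["T001", "T002", "T003"])
# report flags T003 as missing from the feed and T999 as an unknown territory
```

The same pattern applies to any key dimension: swap in Employee_ID against the roster, or Geography_ID against the hierarchy.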
Correct format and structure of data
You can predefine the schema (format) of all your input files and use Talend to generate errors wherever a file does not follow
the predefined format. For example, you might specify that the Employee ID will always be a 6-digit number, or that the sales
date must always lie in the month for which data is being processed. Such checks help detect manual data entry errors, and they
are extremely simple to build since they require only predefining the format.
Fig 2 shows a job that rejects all records with an incorrect format, stores them in a file, and processes the
remaining valid data.
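The two example rules above (a 6-digit Employee ID and a sale date within the processing month) can be sketched in a few lines of Python; the record layout here is hypothetical and stands in for whatever schema your files actually use.

```python
import re
from datetime import date

# Validate a record against the predefined schema, mirroring the Fig 2 job:
# invalid rows are routed to a reject file, valid rows flow onward.
def validate_record(rec, processing_month):
    errors = []
    if not re.fullmatch(r"\d{6}", rec["employee_id"]):
        errors.append("Employee ID must be a 6-digit number")
    d = rec["sale_date"]
    if (d.year, d.month) != processing_month:
        errors.append("Sale date must fall in the processing month")
    return errors

records = [
    {"employee_id": "123456", "sale_date": date(2024, 3, 14)},
    {"employee_id": "12AB56", "sale_date": date(2024, 2, 28)},  # two violations
]
valid = [r for r in records if not validate_record(r, (2024, 3))]
rejects = [r for r in records if validate_record(r, (2024, 3))]
```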
Check for duplicates
Checking for duplicates is important to ensure the accuracy of data. Depending on the nature of the business and the type of
business process, the duplicate check can be defined at any predefined combination level (e.g., one record should exist for each
Data Month-Employee ID-Role combination). Simple uniqueness checks, such as validating that a single sale record exists per
OrderID, can be built in Talend to either remove duplicate records or highlight such instances. Fig 3 shows a simple Talend job
that does this.
Fig 3. Checking duplicate records. In the tUniqRow component we specify the combination of columns that constitutes a unique record.
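For illustration, here is the tUniqRow idea in Python: records whose key combination has already been seen are split off as duplicates, while first occurrences flow on. The field names are hypothetical.

```python
# Split records into unique and duplicate flows at a chosen combination
# level, as tUniqRow does -- here one record per (data_month, employee_id, role).
def split_duplicates(records, key_fields):
    seen, unique, dupes = set(), [], []
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        (dupes if key in seen else unique).append(rec)
        seen.add(key)
    return unique, dupes

rows = [
    {"data_month": "2024-03", "employee_id": "123456", "role": "Rep", "sales": 10},
    {"data_month": "2024-03", "employee_id": "123456", "role": "Rep", "sales": 12},
]
unique, dupes = split_duplicates(rows, ["data_month", "employee_id", "role"])
# the first record is kept; the second is highlighted as a duplicate
```

Whether duplicates are silently dropped or routed to a review report is a business decision; the split keeps both options open.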
Record count reconciliation
Checks can be built to ensure that the number of records loaded matches the number of records actually processed by the
upstream system. This ensures that the data is complete and has not been corrupted during transmission.
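A control-total check of this kind is a one-liner; the sketch below assumes the upstream system supplies its record count alongside the feed (the exact mechanism, such as a trailer record, varies by source).

```python
# Control-total check: the number of records loaded must equal the count
# reported by the upstream system for the same transmission.
def reconcile_counts(records, upstream_count):
    return len(records) == upstream_count

loaded = ["rec1", "rec2", "rec3"]
ok = reconcile_counts(loaded, 3)         # counts match -> proceed
truncated = reconcile_counts(loaded, 5)  # upstream sent 5, we loaded 3 -> flag
```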
Trend and threshold checks
These checks highlight specific records with changes that defy historical trends. We can define thresholds to ensure that
abnormal values are at least highlighted and reviewed for consistency before being processed. For example,
say the average sales per month for the last 12 months is 90k, but the sales data received this month drops to 30k.
There is a real possibility that this data is incomplete, or that there is some other issue with the input file or the upstream
system, and catching this at the validation stage can prevent processing of incorrect data. Such analysis is particularly easy
to perform for pharma companies, since they receive historical data every month along with the current month's data. Row-level
validations, such as requiring the revenue of a single order to fall between $100 and $5k, can also be automated. Additionally,
each of these inputs can be summarized using descriptive statistics and visualizations at various levels to identify any other
anomalies in the data. A summary report with figures such as the maximum revenue of any order, the longest contract length,
and the average rep order volume can easily be generated after the data is processed, and abnormalities will pop out in the summary.
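Both checks from the text can be sketched directly: a trend alert comparing this month against the trailing 12-month average, and a row-level revenue threshold. The 50% tolerance here is an assumed value for illustration; in practice the threshold would be tuned per business process.

```python
from statistics import mean

# Trend check: flag the current month's total when it deviates from the
# trailing 12-month average by more than a chosen tolerance (50% assumed here).
def trend_alert(history, current, tolerance=0.5):
    avg = mean(history)
    return abs(current - avg) > tolerance * avg

# Row-level threshold: each order's revenue must fall between $100 and $5k.
def revenue_outliers(orders, low=100, high=5_000):
    return [o for o in orders if not (low <= o["revenue"] <= high)]

history = [90_000] * 12                 # ~90k average sales per month
flagged = trend_alert(history, 30_000)  # this month drops to 30k -> flagged
orders = [{"id": 1, "revenue": 250}, {"id": 2, "revenue": 12_000}]
outliers = revenue_outliers(orders)     # order 2 exceeds the $5k ceiling
```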
Business logic checks
Checks specific to the business process can be built in a similar way to the sanity checks above. For example,
a check that kicks out whenever the roster file shows an employee from one region reporting to a manager in a different
region can be built easily. An analysis of the span of control for each manager can also be built to highlight any manager
who is particularly over-leveraged.
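Both business-logic checks mentioned above can be sketched from a roster file; the field names and the flat employee-to-manager layout here are assumptions for illustration.

```python
from collections import Counter

# Kickout: an employee's region must match their manager's region.
def region_mismatches(roster):
    region_of = {r["employee_id"]: r["region"] for r in roster}
    return [r["employee_id"] for r in roster
            if r["manager_id"] and region_of.get(r["manager_id"]) != r["region"]]

# Span of control: number of direct reports per manager.
def span_of_control(roster):
    return Counter(r["manager_id"] for r in roster if r["manager_id"])

roster = [
    {"employee_id": "M1", "manager_id": None, "region": "East"},
    {"employee_id": "E1", "manager_id": "M1", "region": "East"},
    {"employee_id": "E2", "manager_id": "M1", "region": "West"},  # cross-region report
]
# E2 is kicked out for the region mismatch; M1 has a span of control of 2
```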
In this post we focused on Incentius' validation framework and how validations can be designed and implemented in Talend
to ensure the accuracy of input data before final sales compensation is processed. This approach helps increase the accuracy of
sales compensation, reducing reruns and saving your analysts' valuable time.
Watch out for our subsequent blog posts on this topic. They will include techniques to actually build a sales compensation
plan on Talend and build effective output validations and analysis.
If you got this far, we think you’d like our future blog content, too. Please subscribe on the right side.