Onsite Bureau Services

These specialized services are required for carrying out data cleansing, data standardization, deduplication, and clustering towards forming a unique customer master repository, and in some cases are prerequisites for the deployment of the solution. These services can be provided onsite or offsite, depending on how the data is shared.


Methodology & Approach

Base data formation

For this purpose, the data from all source systems will be considered. The client will share the structure in which the data will be provided, the source system information, the fields that define the Customer Unique Id, and the record count of each source system. Data will be provided from these source systems in a mutually agreed format for the One Time Dedupe.
Note: The uniqueness defined through the fields mentioned above should be strictly adhered to.
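
One way to check a delivered extract against what the client declared is sketched below. It assumes records arrive as dictionaries with hypothetical `source` and `customer_id` fields and that the declared record counts are supplied per source system; none of these names come from the agreed format, which is decided with the client.

```python
from collections import Counter


def validate_base_data(records, declared_counts):
    """Return a list of human-readable problems found in the extract.

    `records` holds dicts with hypothetical keys `source` and `customer_id`;
    `declared_counts` maps source system name -> expected record count.
    """
    records = list(records)
    problems = []

    # Compare delivered record counts with the counts declared per source.
    actual_counts = Counter(r["source"] for r in records)
    for source, expected in sorted(declared_counts.items()):
        actual = actual_counts.get(source, 0)
        if actual != expected:
            problems.append(f"{source}: expected {expected} rows, got {actual}")

    # The unique-id field must not repeat within a source system.
    id_counts = Counter((r["source"], r["customer_id"]) for r in records)
    for (source, cid), n in sorted(id_counts.items()):
        if n > 1:
            problems.append(f"{source}: customer_id {cid} appears {n} times")

    return problems


# Illustrative data only.
sample = [
    {"source": "CRM", "customer_id": "C1"},
    {"source": "CRM", "customer_id": "C1"},
    {"source": "LOANS", "customer_id": "L7"},
]
issues = validate_base_data(sample, {"CRM": 2, "LOANS": 2})
```

Here `issues` reports the short LOANS delivery and the repeated CRM identifier, which is the kind of finding that would be raised with the client before the dedupe run.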

Data profiling, Cleaning & Standardisation

Data profiling will be done and the statistics shared during implementation. Based on the profiling, the cleaning and standardization required for deduplication will be carried out as necessary, after obtaining the client's approval during the course of implementation.
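
As a rough illustration of what profiling statistics and a standardization step can look like, the sketch below counts missing and distinct values for one field and applies a simple name normalization. The field name `name` and the normalization rules (uppercase, strip punctuation, collapse spaces) are assumptions for the example; the actual rules are agreed with the client during implementation.

```python
import re


def profile_field(records, field):
    """Basic profile of one field: total, missing and distinct counts."""
    values = [r.get(field) for r in records]
    filled = [v for v in values if v is not None and str(v).strip() != ""]
    return {
        "total": len(values),
        "missing": len(values) - len(filled),
        "distinct": len(set(filled)),
    }


def standardize_name(name):
    """Uppercase, replace punctuation with spaces, collapse repeated spaces."""
    cleaned = re.sub(r"[^A-Za-z0-9 ]+", " ", name).upper()
    return " ".join(cleaned.split())


# Illustrative data only.
raw = [
    {"name": "Ravi  Kumar."},
    {"name": "ravi kumar"},
    {"name": ""},
    {"name": None},
]
before = profile_field(raw, "name")

cleaned = [
    {"name": standardize_name(r["name"]) if r["name"] else r["name"]}
    for r in raw
]
after = profile_field(cleaned, "name")
```

Comparing `before` and `after` shows the two spellings of the same name collapsing to one distinct value once they are standardized, while the missing count is unchanged.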

Deduplication & Clustering

After cleansing and standardisation, deduplication and clustering are carried out. The rules for deduplication and clustering will be decided during the course of implementation, considering the data quality, the reliability of the available parameters, etc. Clustering is done to form the More Probable Clusters (MPC) and Less Probable Clusters (LPC).
A cluster is a set of records identified as pertaining to the same entity. The entity could be an individual or a company. All records identified as belonging to a single entity are tagged together with a common ClusterID.

The clustering is based on graph theory. Because of linkage, records can be part of the same cluster even if they are not directly matched: if record A matches B and B matches C, all three fall in one cluster.
With the objective of maximizing both recall and precision, dual clustering is adopted. SetMatch makes provision for two levels of clustering: one based on very stringent rules and the other based on a liberal set of rules. The former is referred to as the More Probable Cluster, MPC (formed from confident matches), and the latter as the Less Probable Cluster, LPC (formed from less confident matches). MPC targets the highest precision and LPC the highest recall. Every record of the customer table will be assigned both ClusterIDs. MPCs are formed automatically and may be used for all business requirements. LPCs may be reviewed manually at any point of time as desired.
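
The linkage and dual clustering described above can be illustrated with connected components over a match graph. This is a minimal sketch and not SetMatch's implementation: the record fields and the two rule functions `strict_rule` and `liberal_rule` are hypothetical, and the all-pairs comparison is only practical for small samples.

```python
from itertools import combinations


def cluster(records, is_match):
    """Map each record id to a cluster id (connected components, union-find).

    The cluster id is the smallest record id in the component.
    """
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if is_match(records[a], records[b]):
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[max(ra, rb)] = min(ra, rb)
    return {rid: find(rid) for rid in records}


def strict_rule(x, y):
    """Hypothetical stringent rule: name and date of birth both agree."""
    return x["name"] == y["name"] and x["dob"] == y["dob"]


def liberal_rule(x, y):
    """Hypothetical liberal rule: name or phone agrees."""
    return x["name"] == y["name"] or x["phone"] == y["phone"]


# Illustrative data only.
records = {
    1: {"name": "RAVI KUMAR", "dob": "1980-01-01", "phone": "111"},
    2: {"name": "RAVI KUMAR", "dob": "1980-01-01", "phone": "222"},
    3: {"name": "R KUMAR", "dob": "1980-01-01", "phone": "222"},
    4: {"name": "ANITA RAO", "dob": "1975-05-05", "phone": "333"},
}

mpc = cluster(records, strict_rule)   # stringent rules, favours precision
lpc = cluster(records, liberal_rule)  # liberal rules, favours recall
```

In this sample, records 1 and 3 do not match each other directly under the liberal rule, yet they share an LPC cluster because both link to record 2. Under the strict rule only records 1 and 2 are grouped, so record 3 keeps its own MPC cluster.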

Manual Resolution & Freezing of the Clusters

An iterative process is followed to arrive at the MPC and LPC. Once they are frozen, the clusters are reviewed manually to reduce the gap between the LPC and MPC. Thus only the (LPC-MPC) records, that is, records grouped together under the LPC rules but not under the MPC rules, need to be verified.
Note: Manual resolution has to be carried out by the client.
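
A simple way to derive the review workload from the two ClusterIDs is sketched below. It interprets the (LPC-MPC) gap as those LPC clusters that span more than one MPC cluster, which is an assumption about how the gap is measured; the `mpc` and `lpc` mappings are illustrative.

```python
from collections import defaultdict


def review_queue(mpc, lpc):
    """Return LPC clusters that contain more than one MPC cluster.

    `mpc` and `lpc` map record id -> cluster id. An LPC cluster whose
    records all share one MPC id is already settled and needs no review.
    """
    mpc_ids_per_lpc = defaultdict(set)
    for record_id, lpc_id in lpc.items():
        mpc_ids_per_lpc[lpc_id].add(mpc[record_id])
    return {
        lpc_id: sorted(mpc_ids)
        for lpc_id, mpc_ids in mpc_ids_per_lpc.items()
        if len(mpc_ids) > 1
    }


# Illustrative assignments only.
mpc = {1: 1, 2: 1, 3: 3, 4: 4}
lpc = {1: 1, 2: 1, 3: 1, 4: 4}
queue = review_queue(mpc, lpc)
```

Here the reviewer is asked a single question: whether MPC clusters 1 and 3, which the liberal rules placed together, really belong to the same entity. Record 4 never enters the queue.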

Address Enrichment

Accurate postal address information about a customer (individual or corporate) is critical for business, as it is the only information through which official correspondence is made with the customer. The correspondence can range from verification of proof of existence at that address to mailing offers about new product launches. Although this information is so important, addresses are often recorded with errors and variations, which poses a real-world problem. As with the name field, there could be spelling errors, abbreviations of common names, and improper postal code information. These errors lead to poor data quality, and the associated direct and indirect costs to the company are very high.

Address enrichment is the process of correcting the information and validating the address by comparing it with standard reference data, mostly the postal dictionary. In advanced countries such as the UK and the USA, the information can be validated up to the door number within the address, whereas in countries like India it is still a challenge. With no standard representation of addresses and a reliance on landmarks, this is a very complex task, particularly for addresses in rural areas.

Posidex has built a set of processes to make this task simpler and can help organizations enrich addresses by comparing them with a standard postal dictionary. These processes can match the information in the address with the reference in the dictionary in spite of the variations. They can also suggest postal codes where they are missing and validate the address against the postal code provided.
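
To make the idea of matching against a postal dictionary concrete, the sketch below uses approximate string comparison from Python's standard `difflib` module. It is not the Posidex process: the dictionary rows, the 0.8 similarity cutoff, and the function names are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher

# Illustrative rows only, not an authoritative postal dictionary.
POSTAL_DICTIONARY = [
    {"locality": "BANJARA HILLS", "city": "HYDERABAD", "pincode": "500034"},
    {"locality": "JUBILEE HILLS", "city": "HYDERABAD", "pincode": "500033"},
    {"locality": "INDIRANAGAR", "city": "BENGALURU", "pincode": "560038"},
]


def best_locality(address, dictionary, cutoff=0.8):
    """Find the dictionary row whose locality best matches part of the address."""
    tokens = re.sub(r"[^A-Za-z0-9 ]+", " ", address).upper().split()
    best, best_score = None, 0.0
    for row in dictionary:
        width = len(row["locality"].split())
        # Slide a window of the same word count across the address.
        for i in range(len(tokens) - width + 1):
            window = " ".join(tokens[i:i + width])
            score = SequenceMatcher(None, window, row["locality"]).ratio()
            if score > best_score:
                best, best_score = row, score
    return best if best_score >= cutoff else None


def enrich(address, pincode, dictionary=POSTAL_DICTIONARY):
    """Suggest a missing postal code or validate the one provided."""
    row = best_locality(address, dictionary)
    if row is None:
        return {"status": "unmatched", "pincode": pincode}
    if not pincode:
        return {"status": "suggested", "pincode": row["pincode"]}
    status = "valid" if pincode == row["pincode"] else "mismatch"
    return {"status": status, "pincode": pincode, "expected": row["pincode"]}


misspelt = "Plot 12, Road No 3, Banjra Hils, Hyderabad"
suggested = enrich(misspelt, "")
checked = enrich(misspelt, "500033")
unknown = enrich("Near temple, Rampur village", "")
```

The misspelt locality still resolves to its dictionary entry, so a missing postal code can be suggested and a conflicting one flagged. A landmark-only address falls below the cutoff and is left unmatched, which reflects the difficulty with rural addresses noted above.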