COMPANY NAME MATCHER

One challenge in querying company names across different databases is that the same entity is often written differently. For example, a company may appear as “ABC” in the Bloomberg system but as “AB C” in another data source, and we need to confirm that both refer to the same company.

I have decided to tackle this problem in two stages.

STAGE 1: ACCURATE REGISTERED BUSINESS ENTITY NAMES

A quick search for "SINGTEL" revealed more than 30 different business entities registered in Singapore, some with confusingly similar names such as "SINGTEL AUSTRALIA HOLDING PTE LTD" and "SINGTEL AUSTRALIA INVESTMENT LTD.". A logical first step is to cross-check the names in our dataset against the Accounting and Corporate Regulatory Authority's (ACRA) API, which we will access via data.gov.sg. Where a name matches a registered entity exactly, we can confirm it with a high degree of accuracy.
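The exact-match check could be sketched roughly as follows. This is a minimal illustration, not the repository's code: it assumes the registered entity names have already been retrieved from ACRA via data.gov.sg into a plain list (the retrieval step is omitted), and the helper names `normalize` and `exact_matches` are hypothetical.

```python
import re


def normalize(name: str) -> str:
    """Uppercase, replace punctuation with spaces, and collapse whitespace."""
    name = name.upper()
    name = re.sub(r"[^A-Z0-9 ]+", " ", name)
    return " ".join(name.split())


def exact_matches(names, registered):
    """Map each input name to its registered entity name, or None if no exact match.

    `registered` is assumed to be a list of entity names already
    downloaded from ACRA (hypothetical input for this sketch).
    """
    lookup = {normalize(r): r for r in registered}
    return {n: lookup.get(normalize(n)) for n in names}


registered = [
    "SINGTEL AUSTRALIA HOLDING PTE LTD",
    "SINGTEL AUSTRALIA INVESTMENT LTD.",
]
result = exact_matches(
    ["Singtel Australia Investment Ltd", "Singtel Asia Pte Ltd"],
    registered,
)
```

Here the first input resolves to "SINGTEL AUSTRALIA INVESTMENT LTD." and the second maps to `None`, so it would be passed on to the next stage. Note that this normalization keeps internal spaces, so a pair like "ABC" and "AB C" would still not count as an exact match.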

STAGE 2: INTERNAL MATCHING

For names without a perfect match, we then create a function that computes similarity scores within the dataset. We use Levenshtein distance as the metric for the difference between company names. The Levenshtein distance between two strings is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one into the other.
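To make the metric concrete, here is a small pure-Python sketch of Levenshtein distance using the standard dynamic-programming approach. It is for illustration only; the project itself may rely on a library instead. The `similarity` helper and its normalization by the longer string's length are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution (or match)
                )
            )
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical score in [0, 1]: 1 minus distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


print(levenshtein("ABC", "AB C"))  # 1: a single inserted space
print(similarity("ABC", "AB C"))   # 0.75
```

For very short names a single edit changes the score a lot, so any similarity threshold would probably need tuning against real data.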

CONCLUSION/LIMITATIONS

Spelling errors - We were lucky that our test examples did not contain severe typos. As an improvement, we could add a third stage to handle some of the remaining non-matches. A first step in stage 3 could be to search ACRA again with less stringent matching criteria. Any names still unmatched after that might need to be sorted manually.

Singapore-only - ACRA covers only Singapore-registered companies. To improve our approach, we might want to consider a more global database of companies.
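The relaxed matching suggested for a possible third stage could look something like the sketch below. It uses `difflib.get_close_matches` from the Python standard library, which ranks candidates by `SequenceMatcher` ratio rather than Levenshtein distance, so it is a stand-in technique and not the metric used in stage 2. The `registered` list, the `suggest` helper and the 0.85 cutoff are all assumptions for illustration.

```python
from difflib import get_close_matches

# Hypothetical sample of registered entity names (assumed already retrieved).
registered = [
    "SINGTEL AUSTRALIA HOLDING PTE LTD",
    "SINGTEL AUSTRALIA INVESTMENT LTD.",
]


def suggest(name: str, candidates, cutoff: float = 0.85, n: int = 3):
    """Return up to n candidates whose similarity ratio is at least cutoff, best first."""
    return get_close_matches(name.upper(), candidates, n=n, cutoff=cutoff)


# A name with a typo ("AUSTRLIA") still surfaces the intended entity first.
typo_candidates = suggest("Singtel Austrlia Holding Pte Ltd", registered)

# An unrelated name yields no suggestions and would fall through to manual review.
no_candidates = suggest("XYZ", registered)
```

Lowering the cutoff recovers more typos but also returns more false candidates, which matters when registered names are as similar as the two above.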

