Fuzzy Graphs (for addresses)

Hi,

I have a situation where I don't know beforehand whether two entities are related because the match is "fuzzy", for example 123 Main Street and 123 Main St for an address.

Is there any recommended approach to handle this?

At the moment, I am considering using a direct connection to Solr and performing custom scoring afterwards, but for a big list of addresses that sounds very slow.

1 Answer

@danielleplex_185261 I would've gone with Solr (DSE Search) as well since it seems to be a good fit for that use case.

EDIT: After discussing this internally, the recommendation is to clean the addresses before loading them into DSE Graph, if that is at all possible, using tools such as (open-source). Even with Solr scoring, it would be difficult to differentiate, say, "124 Main St" from "123 Main St" because they are only one character apart. Cheers!

P.S. Credit to Kelly Mondor for the idea. :)
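
To make the two points above concrete, here is a rough Python sketch (not anything shipped with DSE). It assumes a tiny, hypothetical abbreviation table and made-up helper names (normalize_address, edit_distance). Cleaning collapses "123 Main Street" and "123 Main St" into the same string, while plain edit distance shows why scoring alone struggles: two genuinely different addresses can be just one edit apart.

```python
import re

# Hypothetical abbreviation table; a real cleanup step would need a much fuller list.
SUFFIXES = {"street": "st", "avenue": "ave", "road": "rd", "boulevard": "blvd"}


def normalize_address(raw: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace, abbreviate suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", raw.lower()).split()
    return " ".join(SUFFIXES.get(token, token) for token in tokens)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Same place, different spelling: identical after cleaning.
print(normalize_address("123 Main Street") == normalize_address("123 Main St."))

# Different places, nearly identical strings: distance of 1.
print(edit_distance("124 Main St", "123 Main St"))
```

Once addresses are normalized like this before the load, matching becomes an exact comparison on the cleaned value rather than a fuzzy score.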

David Jones-Gilardi commented

Cleaning the data before data load is a great idea. Even with the fuzzy searching capabilities of Solr it will make your life much easier when determining search parameters.

danielleplex_185261 commented

It is easier said than done, but I guess that's the only option.

Erick Ramirez commented

That's understandable and we're very sympathetic.

But if your application is able to do validation/verification at input time, your data will be more useful to you. Cheers!
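
As a minimal illustration of validating at input time, the sketch below rejects anything that does not look like "<house number> <street name>" and tidies the whitespace of what it accepts. The pattern and the validate_address name are assumptions made for this example only. Real-world addresses vary far more than this, so treat it as a starting point rather than a complete validator.

```python
import re
from typing import Tuple

# Assumed input shape: a house number followed by a street name.
ADDRESS_PATTERN = re.compile(r"^\s*(\d+)\s+([A-Za-z][A-Za-z0-9 .'-]*)$")


def validate_address(raw: str) -> Tuple[str, str]:
    """Return (house_number, street) or raise ValueError if the input does not fit."""
    match = ADDRESS_PATTERN.match(raw)
    if match is None:
        raise ValueError(f"unrecognized address format: {raw!r}")
    number, street = match.groups()
    # Collapse repeated or trailing spaces in the street part.
    return number, " ".join(street.split())


print(validate_address("  123   Main  St "))
```

Rejecting or correcting bad input at this point means less cleanup is needed later, before the data reaches the graph.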

