Bringing together the Apache Cassandra experts from the community and DataStax.


danielleplex_185261 asked ·

Fuzzy Graphs (for addresses)

Hi,

I have a situation where I don't know beforehand whether two entities are related, because the match between them is "fuzzy". For example, "123 Main Street" and "123 Main St" are the same address.


Is there any recommended approach to handle this?


At the moment, I am considering using a direct connection to Solr and performing custom scoring afterwards, but for a big list of addresses that sounds very slow.

graph

1 Answer

Erick Ramirez answered ·

@danielleplex_185261 I would've gone with Solr (DSE Search) as well since it seems to be a good fit for that use case.

EDIT: After discussing this internally, the recommendation is to clean the addresses before loading them into DSE Graph, if that is at all possible, using tools such as https://github.com/openvenues/libpostal (open-source). Even with Solr scoring, it would be difficult to differentiate, say, "124 Main St" from "123 Main St", because they are only one character apart. Cheers!

P.S. Credit to Kelly Mondor for the idea. :)
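
To illustrate the point about "124 Main St" and "123 Main St", here is a minimal sketch using plain edit distance. This is only an illustration: Solr's fuzzy scoring is more involved than raw Levenshtein distance, and the `levenshtein` helper below is written for this example, not taken from Solr or DSE.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


# Two spellings of the same address
same_place = levenshtein("123 Main Street", "123 Main St")
# Two different addresses on the same street
different_place = levenshtein("124 Main St", "123 Main St")

print(same_place, different_place)  # prints "4 1"
```

By this measure the two different houses look closer (distance 1) than the two spellings of the same house (distance 4), which is why similarity scoring alone is a weak signal for deciding that two address vertices are the same entity.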


Cleaning the data before data load is a great idea. Even with the fuzzy searching capabilities of Solr it will make your life much easier when determining search parameters.
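
A rough sketch of what cleaning before load could look like, assuming a very small hand-written abbreviation table. The names `ABBREVIATIONS`, `normalize_address` and `group_by_normal_form` are hypothetical and only for illustration; a real pipeline would hand the normalization to a dedicated tool such as libpostal rather than a lookup table like this.

```python
import re

# Hypothetical, deliberately tiny abbreviation table. It is naive: "st" can
# also mean "saint", which is one reason to prefer a dedicated address parser.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}


def normalize_address(raw: str) -> str:
    """Lowercase, drop punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", raw.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def group_by_normal_form(addresses):
    """Map each normalized address to the raw spellings that produced it."""
    groups = {}
    for raw in addresses:
        groups.setdefault(normalize_address(raw), []).append(raw)
    return groups


groups = group_by_normal_form(["123 Main Street", "123 Main St.", "124 Main St"])
for normal_form, spellings in groups.items():
    print(normal_form, "<-", spellings)
```

Loading one entity per normalized form turns the fuzzy comparison into an exact match on the cleaned value, so "123 Main Street" and "123 Main St." end up together while "124 Main St" stays separate.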

danielleplex_185261 replied to David Jones-Gilardi ♦ ·

It is easier said than done, but I guess that's the only option.


That's understandable and we're very sympathetic.

But if your application is able to do validation/verification at input time, your data will be more useful to you. Cheers!
