Paul Suganthan
paulgc@google.com
1600 Amphitheatre Pkwy
Mountain View, CA 94043

 

 

 

 


 

I have been a software engineer at Google Research since March 2018, working as part of the Google Brain team on problems at the intersection of data management and machine learning (ML). Specifically, I am one of the core contributors to TensorFlow Data Validation, an open-source library that helps developers understand, validate, and monitor their ML data at scale.

 

Prior to joining Google, I received my Ph.D. in Computer Sciences from the University of Wisconsin-Madison. Before coming to the United States, I obtained my bachelor's degree in Computer Science from the College of Engineering, Guindy, Anna University, India.

 

You can find my C.V. here. 

 


 

Publications

 

Below is a list of my publications.

 

·      TensorFlow Data Validation: Data Analysis and Validation in Continuous ML Pipelines, E. Caveness, Paul Suganthan G.C., Z. Peng, N. Polyzotis, S. Roy, M. Zinkevich. SIGMOD-20 Demo Track (To Appear).

 

·      Entity Matching Meets Data Science: A Progress Report from the Magellan Project, Y. Govind, P. Konda, Paul Suganthan G.C., P. Martinkus, P. Nagarajan, A. Soundararajan, H. Li, S. Mudgal, J. Ballard, H. Zhang, A. Ardalan, S. Das, D. Paulsen, A. Singh Saini, E. Paulson, Y. Park, M. Carter, M. Sun, G. Fung, A. Doan. SIGMOD-19 Industrial Track.

 

·      Smurf: Self-Service String Matching Using Random Forests, Paul Suganthan G.C., Adel Ardalan, AnHai Doan, Aditya Akella. VLDB-19 (To Appear).

 

·      CloudMatcher: A Hands-Off Cloud/Crowd Service for Entity Matching, Y. Govind, E. Paulson, P. Nagarajan, P. Suganthan G.C., A. Doan, Y. Park, G. Fung, D. Conathan, M. Carter, M. Sun. VLDB-18 demo.

 

·      Magellan: Toward Building Entity Matching Management Systems, P. Konda, S. Das, P. Suganthan G.C., P. Martinkus, A. Doan, A. Ardalan, J. R. Ballard, Y. Govind, H. Li, F. Panahi, H. Zhang, J. Naughton, S. Prasad, G. Krishnan, R. Deep, V. Raghavendra. SIGMOD Research Highlights 2017.

 

·      Toward a System Building Agenda for Data Integration (and Data Science), A. Doan, P. Konda, P. Suganthan G.C., A. Ardalan, J. Ballard, S. Das, Y. Govind, H. Li, P. Martinkus, S. Mudgal, E. Paulson, H. Zhang. IEEE Data Engineering Bulletin, Special Issue on Large-Scale Data Integration, 2018.

 

·      MatchCatcher: A Debugger for Blocking in Entity Matching, H. Li, P. Konda, Paul Suganthan G.C., A. Doan, B. Snyder, Y. Park, G. Krishnan, R. Deep, V. Raghavendra. EDBT-18.

 

·      Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services, S. Das, P. Suganthan G.C., A. Doan, J. Naughton, G. Krishnan, R. Deep, E. Arcaute, V. Raghavendra, Y. Park. SIGMOD-17.

 

·      Human-in-the-Loop Challenges for Entity Matching: A Midterm Report, A. Doan, A. Ardalan, J. Ballard, S. Das, Y. Govind, P. Konda, H. Li, S. Mudgal, E. Paulson, P. Suganthan G.C., H. Zhang. HILDA Workshop @ SIGMOD-17.

 

·      CloudMatcher: A Cloud/Crowd Service for Entity Matching, Y. Govind, E. Paulson, M. Ashok, P. Suganthan G.C., A. Hitawala, A. Doan, Y. Park, P. Peissig, E. LaRose, J. Badger. BIGDAS Workshop @ KDD-17.

 

·      Magellan: Toward Building Entity Matching Management Systems, P. Konda, S. Das, P. Suganthan G.C., A. Doan, A. Ardalan, J. R. Ballard, H. Li, F. Panahi, H. Zhang, J. Naughton, S. Prasad, G. Krishnan, R. Deep, V. Raghavendra. VLDB-16.

 

·      Magellan: Toward Building Entity Matching Management Systems over Data Science Stacks, P. Konda, S. Das, P. Suganthan G.C., A. Doan, A. Ardalan, J. R. Ballard, H. Li, F. Panahi, H. Zhang, J. Naughton, S. Prasad, G. Krishnan, R. Deep, V. Raghavendra. VLDB-16, demo.

 

·      Why Big Data Industrial Systems Need Rules and What We Can Do About It, Paul Suganthan G.C., C. Sun, Krishna Gayatri K., H. Zhang, F. Yang, N. Ram, S. Prasad, E. Arcaute, G. Krishnan, R. Deep, V. Raghavendra, A. Doan. SIGMOD (Industrial Track) 2015.

 

·      Social Media Analytics: the Kosmix Story, with many authors. IEEE Data Engineering Bulletin, Sept 2013.

 

·      AJAX Crawler, Paul Suganthan G.C. IEEE ICDSE 2012.

 


 

Professional Activities

  • PC Member: IEEE BigData 2018/19, WSDM 2019/20, SDM 2019, WWW 2019/20, SIGMOD 2020.
  • Reviewer: IEEE TKDE, ACM JDIQ, VLDB Journal.
  • External Reviewer: SIGMOD 2018.

 


 

Open Source Contributions

I have been the main developer of two Python packages for scalable string matching, py_stringmatching and py_stringsimjoin, and have managed their end-to-end development and release process.

 

 

The packages are currently used at multiple organizations (such as RIT, Johnson Controls, and Marshfield Clinic) and in data science classes at UW-Madison. They are available on PyPI and Conda. Feel free to contact me if you run into any issues with the packages.