Paul Suganthan
I am a software engineer at Google Research. Since March 2018, I have worked as part of the Google Brain team on problems at the intersection of data management and machine learning (ML). Specifically, I am one of the core contributors to TensorFlow Data Validation, an open-source library that helps developers understand, validate, and monitor their ML data at scale. Prior to joining Google, I received my Ph.D. in Computer Sciences from the University of Wisconsin-Madison. Before coming to the United States, I obtained my Bachelor's degree in Computer Science from the College of Engineering, Guindy, Anna University, India. You can find my C.V. here.

Publications

Below is a list of my publications.

· TensorFlow Data Validation: Data Analysis and Validation in Continuous ML Pipelines, E. Caveness, P. Suganthan G. C., Z. Peng, N. Polyzotis, S. Roy, M. Zinkevich. SIGMOD-20 Demo Track (To Appear).
· Entity Matching Meets Data Science: A Progress Report from the Magellan Project, Y. Govind, P. Konda, P. Suganthan G. C., P. Martinkus, P. Nagarajan, A. Soundararajan, H. Li, S. Mudgal, J. Ballard, H. Zhang, A. Ardalan, S. Das, D. Paulsen, A. Singh Saini, E. Paulson, Y. Park, M. Carter, M. Sun, G. Fung, A. Doan. SIGMOD-19 Industrial Track.
· Smurf: Self-Service String Matching Using Random Forests, P. Suganthan G. C., A. Ardalan, A. Doan, A. Akella. VLDB-19 (To Appear).
· CloudMatcher: A Hands-Off Cloud/Crowd Service for Entity Matching, Y. Govind, E. Paulson, P. Nagarajan, P. Suganthan G. C., A. Doan, Y. Park, G. Fung, D. Conathan, M. Carter, M. Sun. VLDB-18 Demo.
· Magellan: Toward Building Entity Matching Management Systems, P. Konda, S. Das, P. Suganthan G. C., P. Martinkus, A. Doan, A. Ardalan, J. R. Ballard, Y. Govind, H. Li, F. Panahi, H. Zhang, J. Naughton, S. Prasad, G. Krishnan, R. Deep, V. Raghavendra. SIGMOD Research Highlights 2017.
· Toward a System Building Agenda for Data Integration (and Data Science), A. Doan, P. Konda, P. Suganthan G. C., A. Ardalan, J. Ballard, S. Das, Y. Govind, H. Li, P. Martinkus, S. Mudgal, E. Paulson, H. Zhang. IEEE Data Engineering Bulletin, Special Issue on Large-Scale Data Integration, 2018.
· MatchCatcher: A Debugger for Blocking in Entity Matching, H. Li, P. Konda, P. Suganthan G. C., A. Doan, B. Snyder, Y. Park, G. Krishnan, R. Deep, V. Raghavendra. EDBT-18.
· Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services, S. Das, P. Suganthan G. C., A. Doan, J. Naughton, G. Krishnan, R. Deep, E. Arcaute, V. Raghavendra, Y. Park. SIGMOD-17.
· Human-in-the-Loop Challenges for Entity Matching: A Midterm Report, A. Doan, A. Ardalan, J. Ballard, S. Das, Y. Govind, P. Konda, H. Li, S. Mudgal, E. Paulson, P. Suganthan G. C., H. Zhang. HILDA Workshop @ SIGMOD-17.
· CloudMatcher: A Cloud/Crowd Service for Entity Matching, Y. Govind, E. Paulson, M. Ashok, P. Suganthan G. C., A. Hitawala, A. Doan, Y. Park, P. Peissig, E. LaRose, J. Badger. BIGDAS Workshop @ KDD-17.
· Magellan: Toward Building Entity Matching Management Systems, P. Konda, S. Das, P. Suganthan G. C., A. Doan, A. Ardalan, J. R. Ballard, H. Li, F. Panahi, H. Zhang, J. Naughton, S. Prasad, G. Krishnan, R. Deep, V. Raghavendra. VLDB-16.
· Magellan: Toward Building Entity Matching Management Systems over Data Science Stacks, P. Konda, S. Das, P. Suganthan G. C., A. Doan, A. Ardalan, J. R. Ballard, H. Li, F. Panahi, H. Zhang, J. Naughton, S. Prasad, G. Krishnan, R. Deep, V. Raghavendra. VLDB-16 Demo.
· Why Big Data Industrial Systems Need Rules and What We Can Do About It, P. Suganthan G. C., C. Sun, Krishna Gayatri K., H. Zhang, F. Yang, N. Ram, S. Prasad, E. Arcaute, G. Krishnan, R. Deep, V. Raghavendra, A. Doan. SIGMOD-15 Industrial Track.
· Social Media Analytics: The Kosmix Story, with many co-authors. IEEE Data Engineering Bulletin, Sept. 2013.
· AJAX Crawler, P. Suganthan G. C. IEEE ICDSE 2012.

Professional Activities
I have been the main developer of two Python packages that provide tools for scalable string matching: py_stringmatching and py_stringsimjoin. I manage the end-to-end development and release process for both packages. They are currently used at several organizations (such as RIT, Johnson Controls, and Marshfield Clinic) and in data science classes at UW-Madison, and both are available on PyPI and Conda. Feel free to contact me if you run into any issues with the packages.