Aggregate Unique Values From Multiple Columns With Pandas GroupBy


Answer:

Use groupby and agg, and aggregate only unique values by calling Series.unique:

df.astype(str).groupby('prop1').agg(lambda x: ','.join(x.unique()))

            prop2       prop3      prop4
prop1
K20       12,1,66  travis,leo   10.0,4.0
L30    3,54,11,10    bob,john  11.2,10.0

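By default, groupby sorts the result by group key. To keep the groups in their order of first appearance, pass sort=False:

df.astype(str).groupby('prop1', sort=False).agg(lambda x: ','.join(x.unique()))

            prop2       prop3      prop4
prop1
L30    3,54,11,10    bob,john  11.2,10.0
K20       12,1,66  travis,leo   10.0,4.0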
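For readers who want to run the approach end to end, here is a minimal sketch. The answer does not show the input frame, so the sample data below is an assumption, picked so that it reproduces the output above; `join_unique` is just an illustrative name for the lambda.

```python
import pandas as pd

# Hypothetical sample data (not from the original post), chosen so the
# aggregated result matches the output shown in the answer.
df = pd.DataFrame({
    "prop1": ["L30", "L30", "L30", "L30", "K20", "K20", "K20"],
    "prop2": [3, 54, 11, 10, 12, 1, 66],
    "prop3": ["bob", "bob", "john", "bob", "travis", "travis", "leo"],
    "prop4": [11.2, 10.0, 10.0, 10.0, 10.0, 4.0, 10.0],
})

def join_unique(values):
    # Series.unique keeps values in order of first appearance.
    return ",".join(values.unique())

# astype(str) first, so that str.join works on the numeric columns too.
sorted_result = df.astype(str).groupby("prop1").agg(join_unique)
ordered_result = df.astype(str).groupby("prop1", sort=False).agg(join_unique)

print(sorted_result)
print(ordered_result)
```

The only difference between the two results is the row order: `sorted_result` lists K20 before L30, while `ordered_result` follows the order in which the keys first appear in `df`.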

If handling NaNs is important, call fillna in advance:

import re

df.fillna('').astype(str).groupby('prop1').agg(
    lambda x: re.sub(',+', ',', ','.join(x.unique()))
)

            prop2       prop3      prop4
prop1
K20       12,1,66  travis,leo   10.0,4.0
L30    3,54,11,10    bob,john  11.2,10.0
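One caveat with the regex approach: it collapses runs of commas, but it can still leave a leading or trailing comma when the empty string is the first or last unique value in a group. A possible alternative, sketched here as a swapped-in technique rather than the answer's own method, is to drop missing values inside the aggregation function before joining. The frame `df_nan` and the helper name `join_unique_non_null` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values (not from the original post).
df_nan = pd.DataFrame({
    "prop1": ["L30", "L30", "K20", "K20"],
    "prop2": [np.nan, 54, 12, 1],
    "prop3": ["bob", None, "travis", "leo"],
})

def join_unique_non_null(values):
    # Drop NaN/None first, then stringify and join the unique values.
    # A group with no valid values yields an empty string.
    return ",".join(values.dropna().astype(str).unique())

result = df_nan.groupby("prop1", sort=False).agg(join_unique_non_null)
print(result)
```

Because prop2 contains NaN, pandas stores it as float, so its values are rendered as '54.0', '12.0' and so on. Convert the column beforehand if integer formatting matters.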
