Discussion:
[pydata] columns overlap but no suffix specified
Soren
2015-09-27 19:47:36 UTC
Permalink
Hi,

when I want to join two tables on a specific column pandas by default asks
for a suffices.
However, the column that is used for the join must have the same values.
Including both columns in the output therefore is redundant.
You would always join the table and throw away one of the columns,
wouldn't you?

Can I change this behavior somehow, so that the column appears without
suffix and just one time instead of two?
That would be much more convenient in my opinion.
regards
Sören
--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Paul Hobson
2015-09-28 02:34:41 UTC
Permalink
The columns on which the join is performed should not be duplicated. Could
you post an exampled demonstrating this behavior?
Post by Soren
Hi,
when I want to join two tables on a specific column pandas by default asks
for a suffices.
However, the column that is used for the join must have the same values.
Including both columns in the output therefore is redundant.
You would always join the table and throw away one of the columns,
wouldn't you?
Can I change this behavior somehow, so that the column appears without
suffix and just one time instead of two?
That would be much more convenient in my opinion.
regards
Sören
--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Soren
2015-09-27 20:30:07 UTC
Permalink
Where is the code that I posted??
--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Paul Hobson
2015-09-28 20:42:17 UTC
Permalink
So there's the thing:

df.join -- use this when joining on an (subset of an) index
df.merge -- use this when joining on a subset of the columns

So using merge on your data as-is:
print(left.merge(right, on=['key']))

A key C
0 A0 K0 C1
1 A1 K1 C0
2 A2 K2 C2
3 A3 K3 C3


ll = left.set_index('key')
rr = right.set_index('key')
print(ll.join(rr))

A C
key
K0 A0 C1
K1 A1 C0
K2 A2 C2
K3 A3 C3
Post by Soren
Where is the code that I posted??
--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Soren
2015-09-29 01:48:46 UTC
Permalink
Would it be difficult to change the behavior of join in a way that it does
fill up the NaN's?
I mean, is there a meaningful reason for why join is behaving like that?

E.g. performance?
--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Paul Hobson
2015-09-29 15:18:55 UTC
Permalink
I don't know the answers to your question other than: "join" is for
indices, "merge" is for columns.
Post by Soren
Would it be difficult to change the behavior of join in a way that it does
fill up the NaN's?
I mean, is there a meaningful reason for why join is behaving like that?
E.g. performance?
--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Soren
2015-09-28 15:54:40 UTC
Permalink
import pandas as pd
left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'key': ['K0', 'K1', 'K2', 'K3']})
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
'key': ['K1', 'K0', 'K2', 'K3']})
print left.join(right,on='key',rsuffix='_r',how='right')

print left.join(right,on='key')
ValueError: columns overlap but no suffix specified: Index([u'key'],
dtype='object')

print left.join(right,on='key',rsuffix='_r',how='left')
A key C key_r
0 A0 K0 NaN NaN
1 A1 K1 NaN NaN
2 A2 K2 NaN NaN
3 A3 K3 NaN NaN
--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Loading...