[pydata] columns overlap but no suffix specified

Discussion:

Soren

2015-09-27 19:47:36 UTC

Hi,

when I want to join two tables on a specific column pandas by default asks
for a suffices.
However, the column that is used for the join must have the same values.
Including both columns in the output therefore is redundant.
You would always join the table and throw away one of the columns,
wouldn't you?

Can I change this behavior somehow, so that the column appears without
suffix and just one time instead of two?
That would be much more convenient in my opinion.
regards
SÃ¶ren

--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pydata+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Paul Hobson

2015-09-28 02:34:41 UTC

Permalink

The columns on which the join is performed should not be duplicated. Could
you post an exampled demonstrating this behavior?

Post by Soren
Hi,
when I want to join two tables on a specific column pandas by default asks
for a suffices.
However, the column that is used for the join must have the same values.
Including both columns in the output therefore is redundant.
You would always join the table and throw away one of the columns,
wouldn't you?
Can I change this behavior somehow, so that the column appears without
suffix and just one time instead of two?
That would be much more convenient in my opinion.
regards
SÃ¶ren
--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an
For more options, visit https://groups.google.com/d/optout.

Soren

2015-09-27 20:30:07 UTC

Permalink

Where is the code that I posted??

Paul Hobson

2015-09-28 20:42:17 UTC

Permalink

So there's the thing:

df.join -- use this when joining on an (subset of an) index
df.merge -- use this when joining on a subset of the columns

So using merge on your data as-is:
print(left.merge(right, on=['key']))

A key C
0 A0 K0 C1
1 A1 K1 C0
2 A2 K2 C2
3 A3 K3 C3

ll = left.set_index('key')
rr = right.set_index('key')
print(ll.join(rr))

A C
key
K0 A0 C1
K1 A1 C0
K2 A2 C2
K3 A3 C3

Post by Soren
Where is the code that I posted??
--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an
For more options, visit https://groups.google.com/d/optout.

Soren

2015-09-29 01:48:46 UTC

Permalink

Would it be difficult to change the behavior of join in a way that it does
fill up the NaN's?
I mean, is there a meaningful reason for why join is behaving like that?

E.g. performance?

Paul Hobson

2015-09-29 15:18:55 UTC

Permalink

I don't know the answers to your question other than: "join" is for
indices, "merge" is for columns.

Post by Soren
Would it be difficult to change the behavior of join in a way that it does
fill up the NaN's?
I mean, is there a meaningful reason for why join is behaving like that?
E.g. performance?
--
You received this message because you are subscribed to the Google Groups "PyData" group.
To unsubscribe from this group and stop receiving emails from it, send an
For more options, visit https://groups.google.com/d/optout.

Soren

2015-09-28 15:54:40 UTC

Permalink

import pandas as pd
left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'key': ['K0', 'K1', 'K2', 'K3']})
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
'key': ['K1', 'K0', 'K2', 'K3']})
print left.join(right,on='key',rsuffix='_r',how='right')

print left.join(right,on='key')
ValueError: columns overlap but no suffix specified: Index([u'key'],
dtype='object')

print left.join(right,on='key',rsuffix='_r',how='left')
A key C key_r
0 A0 K0 NaN NaN
1 A1 K1 NaN NaN
2 A2 K2 NaN NaN
3 A3 K3 NaN NaN