toss
I am used to R for data analysis, and I feel that pandas is a bit anti-human to use. When processing data with pandas in python today, the two data frames simply cannot be merged.
I have two data frames, the column names are unknown, I can only know the index, and which two indexes are used as the primary key merge. (Don’t ask me why the listing is unknown, because I am a development tool).
The idea is to find the primary key column, rename it, and merge it.
df1.columns.values[args.marker1-1]="markerID"
df2.columns.values[args.marker2-1]="markerID"
pd.merge(df1,df2,on='markerID')
But the result is that it cannot be merged anyway. Report an error KeyError:'markerID'
.
I checked it online and said that columns.values
assignments cannot be used directly , and they must be separated. then:
colnames_df1 = df1.columns
colnames_df1[args.marker1-1] = "markerID"
df1.columns = colnames_df1
colnames_df2 = df2.columns
colnames_df2[args.marker1-1] = "markerID"
df2.columns = colnames_df2
pd.merge(df1,df2,on='markerID')
There is actually no difference.TypeError: Index does not support mutable operations
Print out all the column names of the data frame. It seems normal. Why can’t it be merged?
Use a very rough method:
pd.merge(df1,df2,left_index=True,right_index=True)
For most of the data, but some data is not, and no error is reported, the result is wrong, there is a great risk.
Solution
Finally, I checked on the Internet and finally found the answer.
Reason:
The primary key names of the two data frames look the same, but may actually be different because they may contain spaces. how come? No solution.
The solution is to remove any spaces that may exist in the column names.
method one:
df1.columns = df1.columns.str.strip()
df2.columns=df2.columns.str.strip()
Method 2:
Remove when the data is read in.
pd.read_csv(file,sep='\s*,\s*')
# delimiter includes x*whitespace before and after
The data corresponding to me is:
df1.columns = df1.columns.str.strip()
df2.columns = df2.columns.str.strip()
mrkid = df1.columns.values[args.marker-1]
df1.columns.values[args.marker-1]="markerID"
mergesnp = pd.merge(df1,df2,on='markerID')
mergesnp.columns.values[args.marker-1]=mrkid
Similar Posts:
- Python Pandas: Read_Excel() and to_Excel() function
- [Solved] MySQL ERROR 1062 (23000): Duplicate entry ‘0’ for key ‘PRIMARY’
- used in key specification without a key length
- Treatment of MySQL database keyword as column name by mybatisplus — sqlsyntax errorexception: you have an error in your SQL syntax;
- Problem solving: error in reading CSV file by Panda: typeerror: invalid type comparison
- mysql Column count doesn’t match value count at row 1
- How to Solve R Error: Can’t bind data because some arguments have the same name (The R Programming Language)
- [Solved] MySQL uses the workbench tool, and the table status is read only
- DataRow Write Data Error: column 0 or column 1 cannot be found
- MySQL database insert into statement with parameters Error