I am used to R for data analysis, and pandas feels rather unintuitive to me. While processing data with pandas in Python today, I simply could not get two data frames to merge.
I have two data frames whose column names are not known in advance. All I know is the position of the column in each frame that serves as the primary key for the merge. (Don't ask why the column names are unknown: I am developing a tool.)
The idea is to find the primary key column in each frame, rename it to a common name, and merge on that name.
df1.columns.values[args.marker1-1] = "markerID"
df2.columns.values[args.marker2-1] = "markerID"
pd.merge(df1, df2, on='markerID')
But the merge still failed and reported an error.
I looked it up online and read that assigning into columns.values directly does not work: the column names have to be taken out, modified, and assigned back as a separate step. So I tried:
colnames_df1 = df1.columns
colnames_df1[args.marker1-1] = "markerID"
df1.columns = colnames_df1
colnames_df2 = df2.columns
colnames_df2[args.marker2-1] = "markerID"
df2.columns = colnames_df2
pd.merge(df1, df2, on='markerID')
It made no real difference. This time the error was:
TypeError: Index does not support mutable operations
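The error comes from the fact that a pandas Index is immutable, so assigning to a single element of df.columns is rejected. A minimal sketch of one way to rename a column by position instead is to copy the names into a plain list first. The frame and the position pos below are made up for illustration (pos stands in for args.marker1):

```python
import pandas as pd

# Hypothetical example frame; in the real tool the names are unknown in advance.
df = pd.DataFrame({"snp": ["rs1", "rs2"], "chr": [1, 2]})
pos = 1  # 1-based position of the key column (assumed, like args.marker1)

names = df.columns.tolist()   # a plain Python list is mutable
names[pos - 1] = "markerID"   # rename by position
df.columns = names            # assign the complete set of labels back

renamed_columns = list(df.columns)
```

Replacing the whole set of labels at once avoids touching the Index object itself.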
I printed all the column names of both data frames and they looked normal. So why couldn't the frames be merged?
I then tried a very crude workaround. It worked for most of the data, but not all of it, and where it failed no error was reported: the result was simply wrong, which is a serious risk.
After more searching online, I finally found the answer.
The primary key names of the two data frames look the same but may actually be different, because they may contain spaces. How did they get there? I have no explanation.
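A small sketch of how such a hidden difference can be made visible. The two frames are invented for illustration; the second one has a trailing space in its key column name:

```python
import pandas as pd

# Hypothetical frames: the key column of df2 carries a trailing space.
df1 = pd.DataFrame({"markerID": ["rs1"], "a": [1]})
df2 = pd.DataFrame({"markerID ": ["rs1"], "b": [2]})

# Printing the names as a list shows them quoted, so stray spaces stand out.
print(df1.columns.tolist())
print(df2.columns.tolist())

# Names that change when stripped are the suspicious ones.
padded = [c for c in df2.columns if c != c.strip()]
```

Here padded collects every column name in df2 that has leading or trailing whitespace.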
The solution is to strip any leading or trailing whitespace from the column names.
df1.columns = df1.columns.str.strip()
df2.columns = df2.columns.str.strip()
Alternatively, remove the whitespace when the data is read in:
pd.read_csv(file, sep=r'\s*,\s*', engine='python')  # the delimiter absorbs any whitespace before and after each comma
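To see the effect of such a separator, here is a minimal sketch that reads a made-up CSV string instead of a file. The regular-expression separator is used with the Python parsing engine, and the sample data is purely illustrative:

```python
import io
import pandas as pd

# Made-up CSV text with stray spaces around the commas.
raw = "markerID , value\nrs1 , 10\nrs2 , 20\n"

# A regex separator is handled by the Python parsing engine.
df = pd.read_csv(io.StringIO(raw), sep=r"\s*,\s*", engine="python")

columns = list(df.columns)
markers = df["markerID"].tolist()
```

This pattern only absorbs whitespace that sits next to a comma, so stripping the column names afterwards is still a reasonable safeguard.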
Applied to my own case, the code becomes:
df1.columns = df1.columns.str.strip()
df2.columns = df2.columns.str.strip()
mrkid = df1.columns[args.marker-1]  # remember the original name of the key column
df1 = df1.rename(columns={mrkid: "markerID"})
mergesnp = pd.merge(df1, df2, on='markerID')
mergesnp = mergesnp.rename(columns={"markerID": mrkid})  # restore the original name
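As an alternative to renaming, pd.merge also accepts left_on and right_on, so the two key columns can keep their own names. The sketch below is not the approach used above, just a different technique under the same assumptions; the helper merge_by_position and the sample frames are hypothetical:

```python
import pandas as pd

def merge_by_position(df1, df2, pos1, pos2):
    """Merge on key columns given by 1-based position (hypothetical helper)."""
    # Strip whitespace from every column name (assumes all names are strings).
    df1 = df1.rename(columns=str.strip)
    df2 = df2.rename(columns=str.strip)
    key1 = df1.columns[pos1 - 1]
    key2 = df2.columns[pos2 - 1]
    # Both key columns are kept in the result because their names differ.
    return pd.merge(df1, df2, left_on=key1, right_on=key2)

# Made-up frames: the key is column 1 of left and column 2 of right.
left = pd.DataFrame({" snp ": ["rs1", "rs2"], "a": [1, 2]})
right = pd.DataFrame({"b": [3, 4], "marker": ["rs2", "rs1"]})
merged = merge_by_position(left, right, 1, 2)
```

Because nothing is renamed to a placeholder, there is no need to restore the original key name afterwards.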