While reading a data file recently, I ran into the error named in the title.
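library(dplyr)  # provides %>% and select()
args[1] <- "RT_10-VS-RT_0"
# the file is tab-delimited text despite its .xls extension
all <- read.delim(paste0(args[1], ".xls"), header = TRUE, check.names = FALSE)
dat <- all %>% dplyr::select(Protein_ID, starts_with("Ratio"), starts_with("Qvalue"), starts_with("KEGG"), Description, Protein_Sequence)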
This happens because the select function cannot operate on a data frame with repeated column names. The error is reported even if the duplicated column is not among the columns you select.
You can use the following snippet to check for duplicate column names:
# check for repeated column names
> tibble::enframe(names(all)) %>% count(value) %>% filter(n > 1)
# A tibble: 1 x 2
  value          n
  <chr>      <int>
1 Protein_ID     2
The output shows that there are two columns named Protein_ID.
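The same check can also be done in base R, without tibble or dplyr. The sketch below is only an illustration on a toy data frame: the data and the names df and dup_names are made up for the example. duplicated() flags every repeat of a name after its first occurrence.

```r
# Toy data frame with a repeated column name (hypothetical data).
# check.names = FALSE keeps the duplicate instead of renaming it.
df <- data.frame(
  Protein_ID = c("P1", "P2"),
  Mass       = c(10.5, 20.1),
  Protein_ID = c("P1", "P2"),
  check.names = FALSE
)

# Names that occur more than once, each listed a single time.
dup_names <- unique(names(df)[duplicated(names(df))])
print(dup_names)
```

An empty result (character(0)) would mean that all column names are unique.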
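How do we fix it? Read the file with readr instead: it renames repeated column names automatically (here the second Protein_ID becomes Protein_ID_1, as the warning at the end of the output shows).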
all <- readr::read_delim(paste0(args[1],".xls"),delim = "\t") %>%
dplyr::select(Protein_ID,starts_with("Ratio"),starts_with("Qvalue"),starts_with("KEGG"),Description,Protein_Sequence)
Parsed with column specification:
cols(
.default = col_character(),
No. = col_double(),
Mass = col_double(),
Protein_Coverage = col_double(),
`Mean_Ratio_RT_10_118/RT_0_117` = col_double(),
`Tremble Identity` = col_double(),
`Tremble E-value` = col_double()
)
See spec(...) for full column specifications.
Warning: 29 parsing failures.
row col expected actual file
1001 Tremble Identity a double - 'RT_10-VS-RT_0.xls'
1001 Tremble E-value a double - 'RT_10-VS-RT_0.xls'
1410 Mean_Ratio_RT_10_118/RT_0_117 a double n/a 'RT_10-VS-RT_0.xls'
1871 Tremble Identity a double - 'RT_10-VS-RT_0.xls'
1871 Tremble E-value a double - 'RT_10-VS-RT_0.xls'
.... ............................. ........ ...... ...................
See problems(...) for more details.
Warning message:
Duplicated column names deduplicated: 'Protein_ID' => 'Protein_ID_1' [14]
Besides the duplicate-column warning for Protein_ID, the output lists the rows and columns where parsing failed: readr guessed col_double for those columns, but some cells contain "-" or "n/a". To get rid of the long "Parsed with column specification" message, either specify the column types explicitly when reading, or pass col_types = cols() to keep the guessed types without printing them.
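For the first option, explicit column types, a minimal sketch might look like the following. It is not the author's script: the temporary file, its three columns, and the choice to treat "-" and "n/a" as missing values through the na argument are assumptions made for illustration.

```r
library(readr)

# Hypothetical tab-delimited file with a few of the columns seen above;
# "-" and "n/a" stand in for missing values, as in the parsing failures.
tmp <- tempfile(fileext = ".tsv")
writeLines(c(
  "Protein_ID\tMass\tTremble Identity",
  "P1\t10.5\t98.2",
  "P2\t20.1\t-",
  "P3\tn/a\t87.0"
), tmp)

dat <- read_delim(
  tmp,
  delim = "\t",
  na = c("", "NA", "-", "n/a"),      # read these strings as NA
  col_types = cols(
    .default = col_character(),      # every other column stays text
    Mass = col_double(),
    `Tremble Identity` = col_double()
  )
)

print(dat)
```

Because the types are given explicitly, readr has nothing to guess and prints no column specification. Listing the placeholder strings in na also keeps them from being reported as parsing failures. The second option, col_types = cols(), is what the original run below uses.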
all <- readr::read_delim(paste0(args[1],".xls"),delim = "\t",col_types = readr::cols()) %>%
  dplyr::select(Protein_ID,starts_with("Ratio"),starts_with("Qvalue"),starts_with("KEGG"),Description,Protein_Sequence)
Warning: 29 parsing failures.
row col expected actual file
1001 Tremble Identity a double - 'RT_10-VS-RT_0.xls'
1001 Tremble E-value a double - 'RT_10-VS-RT_0.xls'
1410 Mean_Ratio_RT_10_118/RT_0_117 a double n/a 'RT_10-VS-RT_0.xls'
1871 Tremble Identity a double - 'RT_10-VS-RT_0.xls'
1871 Tremble E-value a double - 'RT_10-VS-RT_0.xls'
.... ............................. ........ ...... ...................
See problems(...) for more details.
Warning message:
Duplicated column names deduplicated: 'Protein_ID' => 'Protein_ID_1' [14]
The warnings are still shown, and it is best to keep them: they tell you which cells failed to parse and which column was renamed.
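If switching readers is not an option, one possible alternative is to keep read.delim and make the names unique yourself before calling select. The sketch below uses base R's make.unique() on a toy data frame; the data and the names raw_df and dat are hypothetical stand-ins for the real file.

```r
library(dplyr)

# Toy data frame standing in for read.delim(..., check.names = FALSE).
raw_df <- data.frame(
  Protein_ID = c("P1", "P2"),
  Ratio_A    = c(1.2, 0.8),
  Protein_ID = c("P1", "P2"),
  check.names = FALSE
)

# make.unique() leaves the first occurrence alone and appends
# .1, .2, ... to later repeats of the same name.
names(raw_df) <- make.unique(names(raw_df))

dat <- raw_df %>% select(Protein_ID, starts_with("Ratio"))
print(names(raw_df))
print(names(dat))
```

Setting check.names = TRUE in read.delim also makes names unique, but it additionally rewrites names that are not syntactically valid in R, such as those containing "/" or spaces, so renaming only the duplicates keeps the other column names intact.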