Can’t bind data error

While reading in a data file recently, I ran into the error in the title: Can’t bind data because some arguments have the same name.

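library(dplyr)
args <- commandArgs(trailingOnly = TRUE)  # assumed source of args; the original snippet does not show it
args[1] <- "RT_10-VS-RT_0"                # hard-coded file prefix for this example
all <- read.delim(paste0(args[1], ".xls"), header = TRUE, check.names = FALSE)
dat <- all %>% dplyr::select(Protein_ID, starts_with("Ratio"), starts_with("Qvalue"), starts_with("KEGG"), Description, Protein_Sequence)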

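This happens because, in the dplyr version used here, select() cannot work on a data frame that has repeated column names. read.delim() kept the repeated names because check.names = FALSE turns off the renaming that would otherwise make them unique. (The error is reported even if none of the columns you select is a duplicate.)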
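To see how the repeated names get into the data frame in the first place, here is a minimal sketch using made-up inline data (the column values are hypothetical, not taken from the real file). It compares read.delim() with and without check.names = FALSE.

```r
# Made-up tab-separated input with a repeated header name (hypothetical data)
txt <- "Protein_ID\tMass\tProtein_ID\nP1\t10.5\tP1\nP2\t20.1\tP2"

# check.names = FALSE keeps the header exactly as written, duplicates included
raw <- read.delim(text = txt, header = TRUE, check.names = FALSE)
names(raw)

# The default (check.names = TRUE) makes the names syntactically valid and unique
fixed <- read.delim(text = txt, header = TRUE)
names(fixed)
```

With the default setting the second Protein_ID is renamed, but so is any other header that is not a syntactically valid R name, which is usually why check.names = FALSE is chosen for files like this one.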

You can use the following script to check for duplicate column names:

# check for repeated column names
> tibble::enframe(names(all)) %>% count(value) %>% filter(n > 1)
# A tibble: 1 x 2
  value          n
  <chr>      <int>
1 Protein_ID     2

The output shows that there are two columns named Protein_ID.
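The same check can also be done in base R, without tibble or dplyr. The sketch below uses a small made-up data frame (df and its values are hypothetical) rather than the real file.

```r
# Hypothetical data frame with a repeated column name
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
names(df) <- c("Protein_ID", "Mass", "Protein_ID")

# duplicated() flags every occurrence of a name after the first one
dup_names <- unique(names(df)[duplicated(names(df))])
dup_names

# Count how often each name occurs and keep only the repeated ones
counts <- table(names(df))
counts[counts > 1]
```

If you would rather keep using read.delim(), one base R option is names(df) <- make.unique(names(df)), which renames the repeats before select() is called.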

How to solve it? Read the file with readr instead, which detects the repeated column names and renames them automatically.

all <- readr::read_delim(paste0(args[1],".xls"),delim = "\t") %>% 
  dplyr::select(Protein_ID,starts_with("Ratio"),starts_with("Qvalue"),starts_with("KEGG"),Description,Protein_Sequence)

Parsed with column specification:
cols(
  .default = col_character(),
  No. = col_double(),
  Mass = col_double(),
  Protein_Coverage = col_double(),
  `Mean_Ratio_RT_10_118/RT_0_117` = col_double(),
  `Tremble Identity` = col_double(),
  `Tremble E-value` = col_double()
)
See spec(...) for full column specifications.
Warning: 29 parsing failures.
 row                           col expected actual                file
1001 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
1001 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
1410 Mean_Ratio_RT_10_118/RT_0_117 a double    n/a 'RT_10-VS-RT_0.xls'
1871 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
1871 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
.... ............................. ........ ...... ...................
See problems(...) for more details.

Warning message:
Duplicated column names deduplicated: 'Protein_ID' => 'Protein_ID_1' [14]

The warnings list the rows and columns where parsing failed (columns guessed as col_double contain values such as - or n/a), and they also report that the duplicate Protein_ID column was renamed. To get rid of the long "Parsed with column specification" message, either specify the column types explicitly when reading, or pass col_types = cols() to accept the guessed defaults.

all <- readr::read_delim(paste0(args[1], ".xls"), delim = "\t", col_types = readr::cols()) %>%
  dplyr::select(Protein_ID, starts_with("Ratio"), starts_with("Qvalue"), starts_with("KEGG"), Description, Protein_Sequence)

Warning: 29 parsing failures.
 row                           col expected actual                file
1001 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
1001 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
1410 Mean_Ratio_RT_10_118/RT_0_117 a double    n/a 'RT_10-VS-RT_0.xls'
1871 Tremble Identity              a double    -   'RT_10-VS-RT_0.xls'
1871 Tremble E-value               a double    -   'RT_10-VS-RT_0.xls'
.... ............................. ........ ...... ...................
See problems(...) for more details.

Warning message:
Duplicated column names deduplicated: 'Protein_ID' => 'Protein_ID_1' [14]

The warnings are still shown, and it is best to keep them: they flag the values that could not be parsed and remind you that a column was renamed.
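If the parsing failures turn out to be placeholder strings rather than real problems, one option is to specify the column types explicitly and tell readr which strings mean "missing". The following is only a sketch under that assumption: the file, column names and values are made up, and you should confirm that - and n/a really stand for missing values in your own data before doing this.

```r
library(readr)

# Made-up tab-separated file; "-" and "n/a" stand in for missing values (hypothetical data)
path <- tempfile(fileext = ".tsv")
writeLines(c("Protein_ID\tIdentity\tRatio",
             "P1\t98.5\t1.2",
             "P2\t-\tn/a"), path)

dat <- read_delim(
  path,
  delim = "\t",
  # Treat the placeholder strings as missing instead of letting them fail to parse
  na = c("", "NA", "-", "n/a"),
  # Declare the types up front so readr neither guesses nor prints the specification
  col_types = cols(
    Protein_ID = col_character(),
    Identity   = col_double(),
    Ratio      = col_double()
  )
)
```

Declaring the types by hand takes more typing than col_types = cols(), but a column that does not match the declared type then shows up as a parsing problem instead of being silently guessed as something else.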
