Complex assignment in R

(Disclaimer: I’m not recommending the use of R, this question just came up in some of Nadiah’s work. I suggest Seaborn.)

If x is a dataframe in R, then colnames(x) gives you the column names:

> colnames(x)
[1] "Column1" "Column2" "Column3"

R supports “complex assignment”, so you can do this:

> colnames(x) <- c('a', 'b', 'c')
> colnames(x)
[1] "a" "b" "c"

and even sub-assignment:

> colnames(x)[1] <- 'uhh'
> colnames(x)
[1] "uhh" "b"   "c"

As a Haskell developer this looks quite strange because you normally bind to a name, not an expression, and especially not a sub-expression like colnames(x)[1].

I wondered if colnames(x) might be an object that overrides the bind somehow. That’s how I might achieve something similar in Python, although there is no __assign__ to override as far as I am aware.

Looking in the R source code we find colnames in src/library/base/R/matrix.R:

$ git grep ^colnames | grep -v tests
share/dictionaries/en_stats.txt:colnames
src/library/base/R/matrix.R:colnames <- function(x, do.NULL = TRUE, prefix = "col")
src/library/base/man/chol.Rd:colnames(x) <- letters[20:22]
src/library/base/man/colnames.Rd:colnames(x, do.NULL = TRUE, prefix = "col")
src/library/base/man/colnames.Rd:colnames(x) <- value
src/library/base/man/colnames.Rd:colnames(m2, do.NULL = FALSE)
src/library/base/man/colnames.Rd:colnames(m2) <- c("x","Y")
src/library/base/man/dimnames.Rd:colnames0 <- function(x) dimnames(x)[[2]]
src/library/base/man/isSymmetric.Rd:colnames(D3) <- c("X", "Y", "Z")
src/library/datasets/data/EuStockMarkets.R:colnames(EuStockMarkets) <- c("DAX", "SMI", "CAC", "FTSE")
src/library/grDevices/man/col2rgb.Rd:colnames(crgb) <- cc
src/library/stats/man/cor.Rd:colnames(swM) <- abbreviate(colnames(swiss), min=6)
src/library/stats/man/kmeans.Rd:colnames(x) <- c("x", "y")
src/library/stats/man/printCoefmat.Rd:colnames(cmat) <- c("Estimate", "Std.Err", "Z value", "Pr(>z)")
src/library/tools/R/sotools.R:colnames(so_symbol_names_table) <-
src/library/tools/man/CRANtools.Rd:colnames(pdb)

After the definition of colnames there is an oddly named colnames<-. At first sight this looks like a bit of a troll, creating a function with bind in its name:

colnames <- function(x, do.NULL = TRUE, prefix = "col")
{
    if(is.data.frame(x) && do.NULL)
    return(names(x))
    dn <- dimnames(x)
    if(!is.null(dn[[2L]]))
    dn[[2L]]
    else {
        nc <- NCOL(x)
    if(do.NULL) NULL
        else if(nc > 0L) paste0(prefix, seq_len(nc))
        else character()
    }
}

`colnames<-` <- function(x, value)
{
    if(is.data.frame(x)) {
        names(x) <- value
    } else {
        dn <- dimnames(x)
        if(is.null(dn)) {
            if(is.null(value)) return(x)
            if((nd <- length(dim(x))) < 2L)
                stop("attempt to set 'colnames' on an object with less than two dimensions")
            dn <- vector("list", nd)
        }
        if(length(dn) < 2L)
            stop("attempt to set 'colnames' on an object with less than two dimensions")
        if(is.null(value)) dn[2L] <- list(NULL) else dn[[2L]] <- value
        dimnames(x) <- dn
    }
    x
}

When we see

foo(x) <- y

it can be expanded as

`foo<-`(x, y)

It turns out there are three functions involved in evaluating an expression like

colnames(x)[1] <- 'uhh'

Apart from colnames and colnames<-, there is also a complex assignment for list-like things called [<-.

Let’s make our own versions of each and add some debug output:

`colnames2<-` <- function(x, value)
{
    cat("colnames2<- ::: x\n")
    print(x)
    cat("\n")
    cat("colnames2<- ::: value\n")
    print(value)
    cat("\n")

    names(x) <- value
    x
}

colnames2 <- function(x)
{
    cat("colnames2 ::: x\n")
    print(x)
    cat("\n")

    return(colnames(x))
}

`[<-` <- function(x, idx, value)
{
    cat("square-bracket bind ::: x\n")
    print(x)
    cat("\n")
    cat("square-bracket bind ::: idx\n")
    print(idx)
    cat("\n")
    cat("square-bracket bind ::: value\n")
    print(value)
    cat("\n")

    x[[idx]] <- value
    x
}

Here is a test script:

df <- data.frame(Column1=character(),
                 Column2=character(),
                 Column3=character(),
                 stringsAsFactors=FALSE)

colnames2(df)[1] <- c('uhh')

And this is the output:

$ Rscript stupid.R 
colnames2 ::: x
[1] Column1 Column2 Column3
<0 rows> (or 0-length row.names)

square-bracket bind ::: x
[1] "Column1" "Column2" "Column3"

square-bracket bind ::: idx
[1] 1

square-bracket bind ::: value
[1] "uhh"

colnames2<- ::: x
[1] Column1 Column2 Column3
<0 rows> (or 0-length row.names)

colnames2<- ::: value
[1] "uhh"     "Column2" "Column3"

Final value:

[1] uhh     Column2 Column3
<0 rows> (or 0-length row.names)

First, the column names:

colnames2(x)

Next, list sub-assignment:

``[<-``(colnames2(x), 1, "uhh")

and

``colnames2<-``(x, ["uhh", "Column2", "Column3"])

In full, the expression colnames2(x)[1] <- 'uhh' is equivalent to

x <- ``colnames2<-``(x, ``[<-``(colnames2(x), 1, 'uhh'))

Note that the return value of colnames2<- is the input dataframe/matrix.