Dynamically assigning calculation results in R

I'm in my first week of programming in R, and while I've made much progress on solving specific issues, I need advice on a larger scale.

I have a directory full of data files in CSV format. The file names specifically identify the data source. I need to import the data, condition the data through various calculations, and keep the results of each file's conditioning for analysis and review. I have successfully learned to open and do extensive conditioning of the data on an individual file basis. The conditioning results in multiple calculation outputs. I need to automate this process and dynamically name the results based on the respective file name.

Since the data conditioning is the same for each file, I've written a function that can be called for each file. I understand functions operate in their own environment which disappears after the function runs. I can dynamically name variables using paste to build names and assign to assign results to those names. Those assignments will then be lost when the function closes.

I'm not certain of the optimal way to step through all the files and keep all the individual calculation results available in the workspace. I know I'm "supposed to" write the function output to a single list which I can later index. However, I will have hundreds of calculation results and later indexing will be complicated. Let's say two of the files contain air temperature measurements at different locations. Since I dynamically name my calculation results based on the descriptive file names, I can have results stored as Temperature.Air.Location1 and Temperature.Air.Location2. I much prefer the ability to later calculate a temperature delta by simply typing Temperature.Air.Location1 - Temperature.Air.Location2 instead of having to look up the corresponding indices of a large list.

I'm certain there is an elegant way of achieving this that's staring me in the face, but I'm afraid I've gotten so wrapped up in learning about functions, interpolation, and plotting in R that I've lost sight of the big picture. Any advice is much appreciated.

EDIT TO ADD EXAMPLE CODE: In this portion of the function, I'm converting a table to x, y, z coordinates as well as interpolating the values.

CalibrationImport.Table <- function(filename, parametername, xmin, xmax, ymin, ymax){
  Path.File <- paste0(Path.Folder,filename)
  assign(parametername, read.csv(Path.File, header = FALSE))

  # Extract x coordinates from original table
  assign(paste0(parametername,".x"), get(parametername)[1, ])
  assign(paste0(parametername,".x"), unlist(get(paste0(parametername,".x"))[-1], use.names=FALSE))
  assign(paste0(parametername,".x"), c(t(replicate(nrow(get(parametername))-1, get(paste0(parametername,".x"))))))

  # Extract y coordinates from original table
  assign(paste0(parametername,".y"), get(parametername)[ ,1])
  assign(paste0(parametername,".y"), unlist(get(paste0(parametername,".y"))[-1], use.names=FALSE))
  assign(paste0(parametername,".y"), c(replicate(ncol(get(parametername))-1, get(paste0(parametername,".y")))))

  # Extract data for original table
  assign(paste0(parametername,".z"), unlist(get(parametername)[-1, -1], use.names=FALSE))

  # Interpolate 100x100 surface
  assign(paste0(parametername,".i"), interp(get(paste0(parametername,".x")), get(paste0(parametername,".y")), get(paste0(parametername,".z")),
                                        xo=seq(xmin, xmax, length=100), yo=seq(ymin, ymax, length=100)))
}

Answers


In general the workflow that works well for me is to use lapply. For example:

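# pattern is a regular expression, not a glob, so match a literal ".csv" at the end
file_names = list.files(pattern = "\\.csv$")
data_list = lapply(file_names, read.csv)

perform_interpolation = function(dataset) {
   # Perform interpolation on dataset
   # (placeholder: returns the input unchanged until the real step is filled in)
   interpolated_dataset = dataset
   return(interpolated_dataset)
}
interpolated_data_list = lapply(data_list, perform_interpolation)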

Here I have lists of objects which I transform using functions (i.e. functional programming). The crux is to have simple functions that take a few inputs, and generate one output object.

Without more specifics from you, it is hard to provide more detailed advice.
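A list does not have to be indexed by number. As a minimal sketch, assuming two made-up CSV files written to a temporary directory (the directory name, the file names, and the `value` column are all hypothetical), the list elements can be named after the files and then accessed by name:

```r
# Hypothetical example data: two small CSV files in a temporary directory
data_dir <- file.path(tempdir(), "named_list_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(value = c(20, 21, 22)),
          file.path(data_dir, "Temperature.Air.Location1.csv"), row.names = FALSE)
write.csv(data.frame(value = c(18, 18, 19)),
          file.path(data_dir, "Temperature.Air.Location2.csv"), row.names = FALSE)

# Read every CSV file and name each list element after its file (extension dropped)
file_paths <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
data_list <- lapply(file_paths, read.csv)
names(data_list) <- tools::file_path_sans_ext(basename(file_paths))

# Elements can now be addressed by name rather than by numeric index
delta <- data_list$Temperature.Air.Location1$value -
  data_list$Temperature.Air.Location2$value
print(delta)
```

With names attached, `data_list$Temperature.Air.Location1` reads almost the same as a standalone variable, and the whole collection can still be passed to lapply in one go.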


Don't use assign inside the function; use it outside to assign the result of the function, i.e.

 `assign( "name1" , myfunc(x) )`

If you are applying it to your directory of CSV files, you can do something akin to this:

# pattern is a regular expression: escape the dot and anchor it at the end of the name
fl <- list.files( "path/to/my/directory" , pattern = "\\.csv$" )

for( i in seq_along(fl) ){   # seq_along() also behaves correctly when fl is empty
  assign( paste0( "file." , i ) , myfunc( fl[i] ) )
}

This is one of the classic uses of a for loop: applying it for its side effects.

However, you have hundreds of files so an lapply might be better, which will return results in a list, and is syntactically very simple:

myresults <- lapply( fl , myfunc )

However, you may need to rewrite parts of your function so it doesn't assign anything, but instead returns the values you want to keep. Use assignment (i.e. <- ) to put the return values in an object in the workspace. Without a reproducible example this can only be a rough sketch.
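As a rough illustration of that kind of rewrite, here is a sketch of the coordinate extraction from the question returning a named list instead of calling assign. The function name `import_table` and the demo file are hypothetical, the table layout (x values in the first row, y values in the first column) is assumed from the question's code, and the interpolation step is left out:

```r
# Hypothetical rewrite: return the pieces in a named list instead of calling assign()
import_table <- function(path) {
  tab <- as.matrix(read.csv(path, header = FALSE))
  x <- unname(tab[1, -1])                  # first row (minus corner cell): x coordinates
  y <- unname(tab[-1, 1])                  # first column (minus corner cell): y coordinates
  z <- unname(tab[-1, -1, drop = FALSE])   # remaining cells: data values
  list(
    x = rep(x, each = length(y)),   # each x repeated once per y value
    y = rep(y, times = length(x)),  # the y sequence repeated once per x value
    z = c(z)                        # values flattened column by column
  )
}

# Made-up 2 x 3 table with a dummy corner cell, written to a temporary file
demo_file <- tempfile(fileext = ".csv")
writeLines(c("0,10,20,30",
             "1,101,102,103",
             "2,201,202,203"), demo_file)

result <- import_table(demo_file)
str(result)
```

Because the function returns one object, the caller decides what to call it, for example `Temperature.Air.Location1 <- import_table(...)`, and the individual pieces are available as `result$x`, `result$y`, and `result$z`.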

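If you want to retain the names of the files, sapply might be better: when the input is a character vector, USE.NAMES = TRUE (the default) names each result after the corresponding file name. It simplifies the results to a vector or matrix when it can, and otherwise returns a named list: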

sapply( fl , myfunc , USE.NAMES = TRUE )
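If separate workspace variables are still preferred, a named list can be copied into the global environment in one step with list2env. A minimal sketch, in which `myfunc` is a hypothetical stand-in that returns made-up numbers and `fl` is a hard-coded vector of file names rather than a real directory listing:

```r
# Hypothetical stand-in for the real per-file function (returns made-up values)
myfunc <- function(file_name) {
  if (grepl("Location1", file_name)) c(20, 21) else c(18, 19)
}

# Hard-coded file names for illustration; normally these come from list.files()
fl <- c("Temperature.Air.Location1.csv", "Temperature.Air.Location2.csv")

# simplify = FALSE always returns a list; USE.NAMES = TRUE names it after the inputs
results <- sapply(fl, myfunc, simplify = FALSE, USE.NAMES = TRUE)
names(results) <- tools::file_path_sans_ext(names(results))

# Copy each list element into the workspace as a variable of the same name
invisible(list2env(results, envir = globalenv()))

print(Temperature.Air.Location1 - Temperature.Air.Location2)
```

This keeps the function free of assign while still allowing expressions such as `Temperature.Air.Location1 - Temperature.Air.Location2`; the list remains available for anything that needs to loop over all results.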
