Discussion:
[R] rbind wastes memory
lutz.thieme at amd.com
2005-05-30 13:51:50 UTC
Hello everybody,

If I try to rbind() a number of large data frames, I run out of memory because R
wastes memory and seems to "forget" to release it.

For example, I have 10 files. Each contains a large data frame "ds" (3500 columns
by 800 rows) which needs ~20 MB of RAM when it is loaded as the only object.
Now I try to bind all data frames into one large data frame and need more than
1165 MB (!) of RAM (to simplify the R code, I use the same file ten times):

________ start example 1 __________
load(myFile)
ds.tmp <- ds
for (Cycle in 1:10) {
    ds.tmp <- rbind(ds.tmp, ds)
}
________ end example 1 __________



Stepping into the details, I found the following (the comment shows RAM usage
after each line was executed):
load(myFile)             # 40MB (19MB for R itself)
ds.tmp <- ds             # 40MB; => only a pointer seems to be copied
x <- rbind(ds.tmp, ds)   # 198MB
x <- rbind(ds.tmp, ds)   # 233MB; the same instruction a second time leads to
                         # 35MB more RAM usage - why?


I played around with this but couldn't find a solution. For example, I bound each
data frame step by step, removed the variables and forced garbage collection in
between, but I still need 1140 MB (!) of RAM:

________ start example 2 __________
tmpFile<- paste(myFile,'.tmp',sep="")
load(myFile)
ds.tmp <- ds
save(ds.tmp, file=tmpFile, compress=T)

for (Cycle in 1:10) {
    ds <- NULL
    ds.tmp <- NULL
    rm(ds, ds.tmp)
    gc()
    load(tmpFile)
    load(myFile)
    ds.tmp <- rbind(ds.tmp, ds)
    save(ds.tmp, file=tmpFile, compress=T)
    cat(Cycle, ': ', object.size(ds), object.size(ds.tmp), '\n')
}
________ end example 2 __________


platform i386-pc-solaris2.8
arch i386
os solaris2.8
system i386, solaris2.8
status
major 1
minor 9.1
year 2004
month 06
day 21
language R




How can I avoid running into this memory problem? Any ideas are very much appreciated.
Thank you in advance & kind regards,



Lutz Thieme
AMD Saxony / Product Engineering
AMD Saxony Limited Liability Company & Co. KG
M/S E22-PE, Wilschdorfer Landstr. 101, D-01109 Dresden, Germany
phone: +49-351-277-4269
fax:   +49-351-277-9-4269
Douglas Bates
2005-05-30 14:03:04 UTC
Post by lutz.thieme at amd.com
If I try to rbind() a number of large data frames, I run out of memory because R
wastes memory and seems to "forget" to release it.
[...]
How can I avoid running into this memory problem? Any ideas are very much appreciated.

If you are going to look at the memory usage you should use gc(), and
perhaps repeated calls to gc(), before checking the memory footprint.
This will force a garbage collection.

Also, you will probably save memory by treating your data frames as
lists and concatenating them, then converting the result to a data frame.
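
For instance, a minimal sketch of both ideas (the object and file names here are
illustrative, not taken from the original post):

## 'myFiles' stands for the ten file names; load each frame into a list
## entry so that no intermediate rbind() results pile up.
frames <- lapply(myFiles, function(f) { load(f); ds })

## Combine column by column with unlist() and build a data frame only once.
## (Assumes plain numeric or character columns; factor columns may need
## extra care.)
cols <- lapply(names(frames[[1]]),
               function(nm) unlist(lapply(frames, "[[", nm), use.names = FALSE))
names(cols) <- names(frames[[1]])
big <- as.data.frame(cols)

## Call gc() (perhaps twice) before reading off the memory footprint.
gc(); gc()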
Duncan Murdoch
2005-05-30 14:08:01 UTC
Post by lutz.thieme at amd.com
[...]
Stepping into the details, I found the following (the comment shows RAM usage
after each line was executed):
load(myFile)             # 40MB (19MB for R itself)
ds.tmp <- ds             # 40MB; => only a pointer seems to be copied
x <- rbind(ds.tmp, ds)   # 198MB
x <- rbind(ds.tmp, ds)   # 233MB; the same instruction a second time leads to
                         # 35MB more RAM usage - why?

I'm guessing your problem is fragmented memory. You are creating big
objects, then making them bigger. This means R needs to go looking for
large allocations for the replacements, but they won't fit in the spots
left by the things you've deleted, so those are being left empty.

A solution to this is to use two passes: first figure out how much
space you need, then allocate it and fill it. E.g.

for (Cycle in 1:10) {
    rows[Cycle] <- .... some calculation based on the data ...
}

ds.tmp <- data.frame(x=double(sum(rows)), y=double(sum(rows)), ...

for (Cycle in 1:10) {
    ds.tmp[ appropriate rows, ] <- new data
}
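
A concrete version of the two-pass idea, assuming all ten frames have identical
columns (the file handling and names below are illustrative, not from the thread):

files <- rep(myFile, 10)               # stand-in for the ten file names

## Pass 1: count rows without keeping the frames around.
rows <- integer(length(files))
for (Cycle in seq_along(files)) {
    load(files[Cycle])                 # defines 'ds'
    rows[Cycle] <- nrow(ds)
    rm(ds); gc()
}

## Pass 2: allocate the full result once, then fill it block by block.
load(files[1])
ds.tmp <- ds[rep(1L, sum(rows)), ]     # template with the right columns and size
offset <- 0
for (Cycle in seq_along(files)) {
    load(files[Cycle])
    ds.tmp[offset + seq_len(nrow(ds)), ] <- ds
    offset <- offset + nrow(ds)
    rm(ds); gc()
}
rownames(ds.tmp) <- NULL               # drop the duplicated row names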


Duncan Murdoch
Roger D. Peng
2005-05-30 15:00:12 UTC
Rather than calling 'rbind' in a loop, try putting your data frames in a list and
then doing something like 'do.call("rbind", list.of.data.frames)'.
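
For example, a minimal sketch (the file handling is illustrative, not from the post):

## Load every file into one list of data frames, then rbind them in a
## single call instead of growing the result inside a loop.
frame.list <- lapply(rep(myFile, 10), function(f) {
    load(f)                            # defines 'ds' in this function's frame
    ds
})
big <- do.call("rbind", frame.list)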

-roger
Post by lutz.thieme at amd.com
If I try to rbind() a number of large data frames, I run out of memory because R
wastes memory and seems to "forget" to release it.
[...]
How can I avoid running into this memory problem? Any ideas are very much appreciated.
--
Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng/