Discussion:
[R] - counting factor occurrences within a group: tapply()
Ian Chidister
2009-07-29 16:57:10 UTC
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090729/9b62d7c1/attachment-0001.pl>
jim holtman
2009-07-29 17:17:07 UTC
This is probably what you want; you need to count the number of unique values:
tapply(Trees$SppID, Trees$PlotID, function(x) length(unique(x)))
BU3F10 BU3F11 BU3F12
1 2 4
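Here is a self-contained version of the one-liner above that rebuilds the example data from the original post. It assumes the `Trees` data frame exactly as posted, and `n_spp` is only an illustrative name. `unique()` keeps just the distinct species codes within each plot, so `length()` of that result is the number of species per plot, however many levels the full factor carries.

```r
# Example data, as posted in the original question
Trees <- data.frame(
  SppID  = factor(c(rep("QUEELL", 2), rep("QUEALB", 3), "CORAME", "ACENEG", "TILAME")),
  BA     = c(907.9, 1104.4, 113.0, 143.1, 452.3, 638.7, 791.7, 804.3),
  PlotID = factor(c("BU3F10", rep("BU3F11", 2), rep("BU3F12", 5)))
)

# For each plot, reduce the species codes to their distinct values and count them
n_spp <- tapply(Trees$SppID, Trees$PlotID, function(x) length(unique(x)))
print(n_spp)
```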
Dear List,
I'm an [R] novice starting analysis of an ecological dataset containing the
basal areas of different tree species in a number of research plots.
Trees<-data.frame(SppID=as.factor(c(rep('QUEELL',2), rep('QUEALB',3),
'CORAME', 'ACENEG', 'TILAME')), BA=c(907.9, 1104.4, 113.0, 143.1, 452.3,
638.7, 791.7, 804.3), PlotID=as.factor(c('BU3F10', rep('BU3F11',2),
rep('BU3F12',5))))
Trees
   SppID     BA PlotID
1 QUEELL  907.9 BU3F10
2 QUEELL 1104.4 BU3F11
3 QUEALB  113.0 BU3F11
4 QUEALB  143.1 BU3F12
5 QUEALB  452.3 BU3F12
6 CORAME  638.7 BU3F12
7 ACENEG  791.7 BU3F12
8 TILAME  804.3 BU3F12
Fields are (in order): Tree Species Code, Basal Area, and Plot Code.
I've been successful in computing summary statistics by species or plot:
tapply(BA, PlotID, sum)
BU3F10 BU3F11 BU3F12
 907.9 1217.4 2830.1
*My Question* I'd like to perform a similar function that tells me how many
species are in each plot. I thought this would be possible using something like:
tapply(SppID, PlotID, nlevels)
BU3F10 BU3F11 BU3F12
     5      5      5
However, this outputs the total number of levels for the factor SppID rather
than the number of species in each plot category, which would look like:
BU3F10 BU3F11 BU3F12
     1      2      4
I understand, from reading the archive, that this occurs because R does not
subset factor levels, but I'm wondering if there's a simple way around this.
Thanks for your help,
Ian Chidister
Environment and Resources
The Nelson Institute for Environmental Studies
University of Wisconsin-Madison, USA
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
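The question's explanation is correct. Subsetting a factor keeps all of its levels, so `nlevels()` returns 5 for every plot. Below is a minimal sketch of two workarounds, assuming the example `Trees` data from the post. `plot10`, `presence`, `n_spp_levels` and `n_spp_table` are illustrative names. `droplevels()` is a base R function that, as far as I know, was introduced in R 2.12.0, after this thread, so it may not exist in the version the poster was using.

```r
# Example data, as posted in the original question
Trees <- data.frame(
  SppID  = factor(c(rep("QUEELL", 2), rep("QUEALB", 3), "CORAME", "ACENEG", "TILAME")),
  BA     = c(907.9, 1104.4, 113.0, 143.1, 452.3, 638.7, 791.7, 804.3),
  PlotID = factor(c("BU3F10", rep("BU3F11", 2), rep("BU3F12", 5)))
)

# Subsetting keeps every level of the full factor, which is why
# nlevels() reported 5 for each plot
plot10 <- Trees$SppID[Trees$PlotID == "BU3F10"]
print(nlevels(plot10))  # 5, although only QUEELL occurs in this plot

# Workaround 1: drop the unused levels inside each group, then count levels
n_spp_levels <- tapply(Trees$SppID, Trees$PlotID,
                       function(x) nlevels(droplevels(x)))

# Workaround 2: cross-tabulate plots by species and count the non-zero cells
presence <- table(Trees$PlotID, Trees$SppID) > 0
n_spp_table <- rowSums(presence)

print(n_spp_levels)
print(n_spp_table)
```

The table route also keeps a plot-by-species presence matrix (`presence`), which shows exactly which species were counted in each plot.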
Daniel Malter
2009-07-29 17:20:11 UTC
does "length" instead of "nlevels" do what you want to do?

with(Trees,tapply(SppID,PlotID,unique))
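Daniel's call returns the distinct species codes for each plot rather than a count. Because those results differ in length from plot to plot, `tapply()` gives back a list. The minimal sketch below assumes the example `Trees` data from the post and shows how to turn that list into counts. It also shows why plain `length` counts trees rather than species. `spp_by_plot`, `n_spp` and `n_trees` are illustrative names.

```r
# Example data, as posted in the original question
Trees <- data.frame(
  SppID  = factor(c(rep("QUEELL", 2), rep("QUEALB", 3), "CORAME", "ACENEG", "TILAME")),
  BA     = c(907.9, 1104.4, 113.0, 143.1, 452.3, 638.7, 791.7, 804.3),
  PlotID = factor(c("BU3F10", rep("BU3F11", 2), rep("BU3F12", 5)))
)

# One list element per plot, holding the distinct species codes found there
spp_by_plot <- with(Trees, tapply(SppID, PlotID, unique))

# The length of each element is the number of species in that plot
n_spp <- sapply(spp_by_plot, length)

# length() on its own counts rows (trees), so a species is counted once per tree
n_trees <- with(Trees, tapply(SppID, PlotID, length))

print(n_spp)
print(n_trees)
```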

daniel


-------------------------
cuncta stricte discussurus
-------------------------

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Ian Chidister
Sent: Wednesday, July 29, 2009 12:57 PM
To: r-help at r-project.org
Subject: [R] - counting factor occurrences within a group: tapply()

Dear List,

I'm an [R] novice starting analysis of an ecological dataset containing the
basal areas of different tree species in a number of research plots.
Trees<-data.frame(SppID=as.factor(c(rep('QUEELL',2), rep('QUEALB',3),
'CORAME', 'ACENEG', 'TILAME')), BA=c(907.9, 1104.4, 113.0, 143.1, 452.3,
638.7, 791.7, 804.3), PlotID=as.factor(c('BU3F10', rep('BU3F11',2),
rep('BU3F12',5))))
Trees
SppID BA PlotID
1 QUEELL 907.9 BU3F10
2 QUEELL 1104.4 BU3F11
3 QUEALB 113.0 BU3F11
4 QUEALB 143.1 BU3F12
5 QUEALB 452.3 BU3F12
6 CORAME 638.7 BU3F12
7 ACENEG 791.7 BU3F12
8 TILAME 804.3 BU3F12

Fields are (in order): Tree Species Code, Basal Area, and Plot Code.

I've been successful in computing summary statistics by species or plot:
tapply(BA, PlotID, sum)
BU3F10 BU3F11 BU3F12
907.9 1217.4 2830.1

*My Question* I'd like to perform a similar function that tells me how many
species are in each plot. I thought this would be possible using something like:
tapply(SppID, PlotID, nlevels)
BU3F10 BU3F11 BU3F12
5 5 5

However, this outputs the total number of levels for the factor SppID rather
than the number of species in each plot category, which would look like:

BU3F10 BU3F11 BU3F12
1 2 4

I understand, from reading the archive, that this occurs because R does not
subset factor levels, but I'm wondering if there's a simple way around this.


Thanks for your help,

Ian Chidister

Environment and Resources
The Nelson Institute for Environmental Studies
University of Wisconsin-Madison, USA

Ian Chidister
2009-07-29 18:13:57 UTC
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090729/fffea874/attachment-0001.pl>
jim holtman
2009-07-29 18:22:43 UTC
One way is to exclude the NAs from consideration by creating a new
object without NAs in that column:

newTrees <- Trees[!is.na(Trees$SppID),]
tapply(newTrees$SppID, newTrees$PlotID, function(x) length(unique(x)))
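The full data set with NAs was not posted. The sketch below therefore assumes a hypothetical variant of the example in which one tree in plot BU3F12 (the ACENEG tree) has a missing species code. It first shows the NA being counted as an extra "species", then applies the row-filtering fix above. `n_with_na` is an illustrative name.

```r
# Hypothetical variant of the posted data: the ACENEG tree's species code is missing
Trees <- data.frame(
  SppID  = factor(c(rep("QUEELL", 2), rep("QUEALB", 3), "CORAME", NA, "TILAME")),
  BA     = c(907.9, 1104.4, 113.0, 143.1, 452.3, 638.7, 791.7, 804.3),
  PlotID = factor(c("BU3F10", rep("BU3F11", 2), rep("BU3F12", 5)))
)

# unique() keeps NA as a distinct value, so the count for BU3F12 is inflated
n_with_na <- tapply(Trees$SppID, Trees$PlotID, function(x) length(unique(x)))

# Keep only rows with a known species code, then count as before
newTrees <- Trees[!is.na(Trees$SppID), ]
n_spp <- tapply(newTrees$SppID, newTrees$PlotID, function(x) length(unique(x)))

print(n_with_na)
print(n_spp)
```

`PlotID` stays a factor after the row filter. As a result, a plot whose trees all lack species codes keeps its level, and `tapply()` reports `NA` for it rather than 0.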
Hi All-
Thanks for your quick responses. I was looking for unique instances, so
Jim's and Daniel's suggestions got the job done. Using "length" alone
didn't discriminate between multiple occurrences of the same species and
multiple species.
I do have one follow-up question: my full data set (not the example data) has
a number of NAs in the SppID column, and [r] is currently counting the NAs
tapply(SppID, PlotID, function(Trees, na.rm=T) length(unique(Trees, na.rm=T)))
tapply(SppID, PlotID, function(Trees) length(unique(Trees)), na.rm=T)
which doesn't seem to convince [r] to ignore the NAs. What am I doing
wrong?
wrong?
Thanks,
Ian
jim holtman
2009-07-29 18:24:32 UTC
Or even easier:

tapply(Trees$SppID, Trees$PlotID, function(x) length(unique(na.omit(x))))
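Here is a sketch of the `na.omit()` variant on hypothetical data: one missing species code in BU3F12, assumed because the full data set was not posted. It also suggests why the attempts in the follow-up question did not help. `tapply()` passes extra arguments such as `na.rm = T` on to `FUN`, and `unique()` has no `na.rm` argument of its own. In the first attempt (argument renamed to `x` here), `unique()`'s `...` silently absorbs the flag. In the second, the anonymous function does not accept the flag, so the call stops with an "unused argument" error. `first_try` and `second_try` are illustrative names.

```r
# Hypothetical variant of the posted data: the ACENEG tree's species code is missing
Trees <- data.frame(
  SppID  = factor(c(rep("QUEELL", 2), rep("QUEALB", 3), "CORAME", NA, "TILAME")),
  BA     = c(907.9, 1104.4, 113.0, 143.1, 452.3, 638.7, 791.7, 804.3),
  PlotID = factor(c("BU3F10", rep("BU3F11", 2), rep("BU3F12", 5)))
)

# na.omit() drops missing codes within each plot before distinct values are counted
n_spp <- tapply(Trees$SppID, Trees$PlotID,
                function(x) length(unique(na.omit(x))))

# First attempt from the thread: na.rm is absorbed by unique()'s ... and ignored,
# so the NA is still counted
first_try <- tapply(Trees$SppID, Trees$PlotID,
                    function(x, na.rm = TRUE) length(unique(x, na.rm = TRUE)))

# Second attempt: tapply() forwards na.rm to FUN, which has no such argument
second_try <- try(tapply(Trees$SppID, Trees$PlotID,
                         function(x) length(unique(x)), na.rm = TRUE),
                  silent = TRUE)

print(n_spp)
print(first_try)
print(inherits(second_try, "try-error"))  # TRUE
```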
Ian Chidister
2009-07-29 18:29:02 UTC
An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090729/90dd97b4/attachment-0001.pl>