[R] cor vs cor.test
Godmar Back
2009-07-07 14:25:39 UTC
Hi,

I am trying to use R for some survey analysis, and need to compute the
significance of some correlations. I read the man pages for cor and
cor.test, but I am confused about

- whether these functions are intended to work the same way
- how these functions handle NA values
- whether cor.test supports 'use = "complete.obs"'.

Some example output may explain why I am confused:

-----------------------------------------------
cor(q[[9]], q[[10]])
                  perceivedlearningcurve
overallimpression              0.7440637
-----------------------------------------------
cor.test(q[[9]], q[[10]])
Error in `[.data.frame`(x, OK) : undefined columns selected
-----------------------------------------------

(I assume that's because of R's generous type coercions.... does R
have a "typeof" operator to learn what the type of q[[9]] is?)
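
On the "typeof" question: base R does have a function literally named typeof(), alongside class(), is.numeric() and str(). A minimal sketch, assuming q is a list of one-column data frames (which is what the error message suggests). The object below is a made-up stand-in with invented values, not the actual survey data:

```r
# Hypothetical stand-in for the survey object: a list of one-column data frames.
q <- list(data.frame(overallimpression      = c(4, 5, 3, NA, 2)),
          data.frame(perceivedlearningcurve = c(3, 5, 2, 4, NA)))

class(q[[1]])        # "data.frame"
typeof(q[[1]])       # "list": a data frame is stored as a list of columns
is.numeric(q[[1]])   # FALSE, even though its only column is numeric
str(q[[1]])          # compact summary of the structure

# Extracting the single column yields a plain numeric vector.
class(q[[1]][, 1])   # "numeric"
typeof(q[[1]][, 1])  # "double"
```

class() is usually the more informative of the two here, since typeof() reports only the underlying storage mode.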

-----------------------------------------------
cor.test(q[[9]][,1], q[[10]][,1])

        Pearson's product-moment correlation

data:  q[[9]][, 1] and q[[10]][, 1]
t = 12.9877, df = 136, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.6588821 0.8104055
sample estimates:
      cor
0.7440637
-----------------------------------------------
cor(q[[9]], q[[51]])
                  usefulnessautodetectionbox_ord
overallimpression                             NA
-----------------------------------------------
WORKS, and uses complete observations only
cor(q[[9]], q[[51]], use="complete.obs")
                  usefulnessautodetectionbox_ord
overallimpression                      0.2859895
-----------------------------------------------
WORKS, apparently, but does not require 'use="complete.obs"' (!?)
cor.test(q[[9]][,1], q[[51]][,1])

        Pearson's product-moment correlation

data:  q[[9]][, 1] and q[[51]][, 1]
t = 3.1016, df = 108, p-value = 0.002456
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.1043351 0.4491779
sample estimates:
      cor
0.2859895
-----------------------------------------------

The help page for cor.test states that na.action defaults to getOption("na.action"):
getOption("na.option")
NULL
-----------------------------------------------

No action is taken, yet cor.test appears to only use complete observations (!?)

Others believe that cor.test accepts 'use=complete.obs':
http://markmail.org/message/nuzqeouqhbb7f6ok

--------------

Needless to say, this makes writing robust code very hard.

I'm wondering what the rationale for the inconsistencies between cor
and cor.test is.

Thanks!

- Godmar
Peter Ehlers
2009-07-07 15:42:47 UTC
?cor says that cor() can be applied to
'numeric vector, matrix or data frame'

?cor.test requires
'numeric vectors of data values'
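
The difference in accepted inputs can be sketched as follows. The data are simulated and the names x, y, a and b are placeholders, not anything from the original survey:

```r
set.seed(1)
x <- data.frame(a = rnorm(20))   # one-column data frames, invented data
y <- data.frame(b = rnorm(20))

# cor() accepts data frames and returns a correlation matrix (1 x 1 here),
# which is why its output carries row and column names.
m <- cor(x, y)
dim(m)
dimnames(m)

# cor.test() expects numeric vectors; passing data frames raises an error.
# The exact message depends on the R version, so only the failure is checked.
res <- tryCatch(cor.test(x, y), error = function(e) e)
inherits(res, "error")

# Extracting the columns first works, and the estimate agrees with cor().
ct <- cor.test(x[, 1], y[, 1])   # x$a and y$b would work equally well
ct$estimate
ct$p.value
```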

So, what's your q?

As to na.action:
?cor.test makes no reference to na.action for the default method.
Looking at the code of cor.test.default shows that only complete
cases are used. The formula method does have an argument na.action
and it works just fine for me.
Try getOption('na.action') and you'll probably find that it is set
                  ^^^^^^
to 'na.omit'.
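
A small sketch of the NA behaviour described above, on simulated data (all names and values are invented for illustration). It compares the default method of cor.test with cor(..., use = "complete.obs") and shows the na.action argument of the formula method:

```r
set.seed(42)
x <- rnorm(30)
y <- x + rnorm(30)
x[c(3, 7)]  <- NA                # introduce some missing values
y[c(7, 12)] <- NA

ok <- complete.cases(x, y)       # pairs with no NA in either variable
sum(ok)                          # 27 of the 30 pairs are complete

is.na(cor(x, y))                 # TRUE: cor() propagates NA by default
r_cor  <- cor(x, y, use = "complete.obs")
r_test <- cor.test(x, y)         # default method: no NA argument needed

# Same estimate, and the degrees of freedom are (complete pairs) - 2.
all.equal(unname(r_test$estimate), r_cor)
unname(r_test$parameter) == sum(ok) - 2

# The formula method does take na.action.
d <- data.frame(x = x, y = y)
r_form <- cor.test(~ x + y, data = d, na.action = na.omit)

# With na.fail the call stops instead of silently dropping rows.
failed <- tryCatch(cor.test(~ x + y, data = d, na.action = na.fail),
                   error = function(e) e)
inherits(failed, "error")
```

This matches the output in the original post, where df drops from 136 to 108 once a column containing NAs is involved.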

-Peter Ehlers
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Godmar Back
2009-07-07 17:14:36 UTC
Thanks, Peter.

You're right, I mistyped and getOption('na.action') shows na.omit.

Perhaps my question was more commentary about my perceived lack of
rationale and orthogonality in R than it should have been. Presumably,
q[[i]] is a data frame and q[[i]][,1] is a numeric vector, so cor and
cor.test work differently. The interfaces for how to handle NAs
between the two functions are completely different. Why design things
this way, though?

- Godmar