[R] Reading text file with fortran format

Discussion:

Steven Yen

2014-09-30 21:04:05 UTC

Hello

I read data with fortran format:
mydata<-read.fortran('foo.txt',
c("4F10.4","F8.3","3F3.0","20F2.0"))
colnames(mydata)<-c("q1","q2","q3","q4","income","hhsize",
"weekend","dietk","quart1","quart2","quart3","male","age35",
"age50","age65","midwest","south","west","nonmetro",
"suburb","black","asian","other","hispan","hhtype1",
"hhtype2","hhtype3","emp_stat")
dstat(mydata,digits=6)

I produced the following sample statistics for the first 4
variables (q1,q2,q3,q4):

       Mean  Std.dev Min      Max  Obs
q1 0.000923 0.002509   0 0.035245 5649
q2 0.000698 0.001681   0 0.038330 5649
q3 0.000766 0.002138   0 0.040100 5649
q4 0.000373 0.001140   0 0.026374 5649

The correct sample statistics are:
Variable| Mean Std.Dev. Minimum Maximum
--------+----------------------------------------------------
Q1| 9.227632 25.09311 0.0 352.4508
Q2| 6.983078 16.80984 0.0 383.2995
Q3| 7.657381 21.38337 0.0 400.9950
Q4| 3.727952 11.40446 0.0 263.7398
INCOME| 16.01603 13.70296 0.0 100.0
HHSIZE| 2.586475 1.464282 1.0 16.0

In other words, values for q1-q4 were scaled down by a factor of 10,000.
My raw data look like (with proper format)

0.0000 0.0000 0.0000 0.0000 48.108...
0.0000 0.0000 0.0000 0.0000 11.640...
35.3450 0.0000 95.7656 0.0000 4.667...
0.0000 0.0000 0.0000 0.0000 9.000...
84.0000 4.8038 0.0000 3.1886 2.923...
0.0000 0.0000 0.0000 1.1636 10.000...
0.0000 10.7818 109.7884 0.0000 17.000...
0.0000 7.9528 0.0000 4.7829 35.000...

True, the data here are space delimited, but I also need to read other
files where the data are not space delimited.

Any idea/suggestion would be appreciated.


Nordlund, Dan (DSHS/RDA)

2014-09-30 22:18:32 UTC

The read.fortran function appears to work differently from how FORTRAN itself would read the data when the numbers already contain decimal points. If memory serves, FORTRAN ignores the decimal portion of the format when it finds a decimal point in the field it reads. The read.fortran function appears to read the number 'as is' and then multiply it by 10^-d, where d is the number of decimal places in the format. Since your data already contain decimal points, you should specify the format with 0 decimal places, i.e.

mydata<-read.fortran('foo.txt',
c("4F10.0","F8.0","3F3.0","20F2.0"))
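A minimal sketch of the behaviour described above, using a made-up two-line file rather than the poster's foo.txt. The file contents and the names tmp, with_d and no_d are illustrative assumptions. It reads the same fields once with a nonzero d and once with d = 0:

```r
# Toy data (an assumption, not the poster's file): two 10-character
# fields per line, each already containing an explicit decimal point.
tmp <- tempfile()
writeLines(c("   35.3450   95.7656",
             "   84.0000    4.8038"), tmp)

# Nonzero d: the parsed values are multiplied by 10^-4.
with_d <- read.fortran(tmp, c("2F10.4"))

# d = 0: the values come through unscaled.
no_d <- read.fortran(tmp, c("2F10.0"))

unlink(tmp)

print(with_d)
print(no_d)
```

If the explanation holds, with_d should equal no_d divided by 10,000, which is the same factor seen in the summary statistics for q1-q4.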

hope this is helpful,

Dan

Daniel J. Nordlund, PhD
Research and Data Analysis Division
Services & Enterprise Support Administration
Washington State Department of Social and Health Services

Steven Yen

2014-09-30 22:22:37 UTC

Thanks to all.
Steven Yen

Steven Yen

2014-10-14 05:55:41 UTC

Hello
Any idea how to read a text file with fortran format, WITH MULTIPLE
RECORDS? My fortran format is as follows, and I do know I need to
change F7.4 to F7.0, and 2F2.0 to 2I2, etc.
I just have no idea how to handle the "slash" (/) which dictates a
jump to the next record in fortran. Thank you all.
---

10 FORMAT(F8.0,4F2.0,6F7.4,F8.4,3F4.1,2F3.0,36F2.0,11F8.5
* /2F8.5,F10.4,F2.0,F8.1,F10.4,F11.4,F6.2,F2.0,3F10.4,2F12.7,2F2.0,
* F4.0,2F2.0,F8.5,5F2.0)
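
One possible approach, going by the read.fortran help page in the utils package: for a multiline record, format can be given as a list with one character vector per line, so each "/" in the FORMAT would start a new list element. Below is a minimal sketch with made-up data (two lines per record). The file contents and the names tmp, fmt, rec and yen_fmt are illustrative. yen_fmt is an untested translation of the FORMAT above; it assumes every numeric field in the real file carries an explicit decimal point, so every d is set to 0.

```r
# Toy file (an assumption): each record spans two lines.
#   line 1: F6.0 then I2   (8 characters)
#   line 2: F6.0           (6 characters)
tmp <- tempfile()
writeLines(c("  1.50 2",
             " 10.25",
             "  3.75 4",
             " 20.50"), tmp)

# One character vector per line of the record.
fmt <- list(c("F6.0", "I2"), c("F6.0"))
rec <- read.fortran(tmp, fmt)
unlink(tmp)

print(rec)  # one row per two-line record

# Untested translation of the poster's FORMAT statement, split at the "/".
# Assumes explicit decimal points in the data, hence d = 0 throughout;
# keep the original d for any field stored without a decimal point.
yen_fmt <- list(
  c("F8.0", "4F2.0", "6F7.0", "F8.0", "3F4.0", "2F3.0", "36F2.0", "11F8.0"),
  c("2F8.0", "F10.0", "F2.0", "F8.0", "F10.0", "F11.0", "F6.0", "F2.0",
    "3F10.0", "2F12.0", "2F2.0", "F4.0", "2F2.0", "F8.0", "5F2.0")
)
# mydata <- read.fortran("foo.txt", yen_fmt)  # file name is a placeholder
```

The list form keeps the one-to-one correspondence with the FORTRAN statement, so the column order in the result follows the FORMAT from left to right across both lines.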