Discussion:
[R] How to sum some columns based on their names
Kuma Raj
2014-10-13 12:57:45 UTC
Permalink
I want to sum columns based on their names. As an exampel how could I
sum columns which contain 6574, 7584 and 85 as column names? In
addition, how could I sum those which contain 6574, 7584 and 85 in
ther names and have a prefix "f". My data contains several variables
with

I want to sum columns based on their names. As an exampel how could I
sum columns which contain 6574, 7584 and 85 as column names? In
addition, how could I sum those which contain 6574, 7584 and 85 in
ther names and have a prefix "f". My data contains several variables
with

dput(df1)
structure(list(date = structure(c(1230768000, 1230854400, 1230940800,
1231027200, 1231113600, 1231200000, 1231286400, 1231372800, 1231459200,
1231545600, 1231632000), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
f014card = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), f1534card = c(0,
1, 1, 0, 0, 1, 0, 0, 1, 0, 1), f3564card = c(1, 6, 1, 5,
5, 4, 4, 7, 6, 4, 6), f6574card = c(3, 6, 4, 5, 5, 2, 10,
3, 4, 2, 4), f7584card = c(13, 6, 1, 4, 10, 6, 8, 12, 10,
4, 3), f85card = c(5, 3, 1, 0, 2, 10, 7, 9, 1, 7, 3), m014card = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), m1534card = c(0, 0, 1, 0,
0, 0, 0, 1, 1, 1, 0), m3564card = c(12, 7, 4, 7, 12, 13,
12, 7, 12, 2, 11), m6574card = c(3, 4, 8, 8, 8, 10, 7, 6,
7, 7, 5), m7584card = c(8, 10, 5, 4, 12, 7, 14, 11, 9, 1,
11), m85card = c(1, 4, 3, 0, 3, 4, 5, 5, 4, 5, 0)), .Names = c("date",
"f014card", "f1534card", "f3564card", "f6574card", "f7584card",
"f85card", "m014card", "m1534card", "m3564card", "m6574card",
"m7584card", "m85card"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))
Charles Determan Jr
2014-10-13 13:05:43 UTC
Permalink
You can use grep with some basic regex, index your dataframe, and colSums

colSums(df[,grep("*6574*|*7584*|*85*", colnames(df))])
colSums(df[,grep("f6574*|f7584*|f85*", colnames(df))])


Regards,
Dr. Charles Determan
Post by Kuma Raj
I want to sum columns based on their names. As an exampel how could I
sum columns which contain 6574, 7584 and 85 as column names? In
addition, how could I sum those which contain 6574, 7584 and 85 in
ther names and have a prefix "f". My data contains several variables
with
I want to sum columns based on their names. As an exampel how could I
sum columns which contain 6574, 7584 and 85 as column names? In
addition, how could I sum those which contain 6574, 7584 and 85 in
ther names and have a prefix "f". My data contains several variables
with
dput(df1)
structure(list(date = structure(c(1230768000, 1230854400, 1230940800,
1231027200, 1231113600, 1231200000, 1231286400, 1231372800, 1231459200,
1231545600, 1231632000), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
f014card = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), f1534card = c(0,
1, 1, 0, 0, 1, 0, 0, 1, 0, 1), f3564card = c(1, 6, 1, 5,
5, 4, 4, 7, 6, 4, 6), f6574card = c(3, 6, 4, 5, 5, 2, 10,
3, 4, 2, 4), f7584card = c(13, 6, 1, 4, 10, 6, 8, 12, 10,
4, 3), f85card = c(5, 3, 1, 0, 2, 10, 7, 9, 1, 7, 3), m014card = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), m1534card = c(0, 0, 1, 0,
0, 0, 0, 1, 1, 1, 0), m3564card = c(12, 7, 4, 7, 12, 13,
12, 7, 12, 2, 11), m6574card = c(3, 4, 8, 8, 8, 10, 7, 6,
7, 7, 5), m7584card = c(8, 10, 5, 4, 12, 7, 14, 11, 9, 1,
11), m85card = c(1, 4, 3, 0, 3, 4, 5, 5, 4, 5, 0)), .Names = c("date",
"f014card", "f1534card", "f3564card", "f6574card", "f7584card",
"f85card", "m014card", "m1534card", "m3564card", "m6574card",
"m7584card", "m85card"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Dr. Charles Determan, PhD
Integrated Biosciences

[[alternative HTML version deleted]]
Jeff Newmiller
2014-10-13 13:32:24 UTC
Permalink
Your regular expressions are invalid, Charles. You seem to be thinking of file name globbing as at the command line.
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
Post by Charles Determan Jr
You can use grep with some basic regex, index your dataframe, and colSums
colSums(df[,grep("*6574*|*7584*|*85*", colnames(df))])
colSums(df[,grep("f6574*|f7584*|f85*", colnames(df))])
Regards,
Dr. Charles Determan
Post by Kuma Raj
I want to sum columns based on their names. As an exampel how could I
sum columns which contain 6574, 7584 and 85 as column names? In
addition, how could I sum those which contain 6574, 7584 and 85 in
ther names and have a prefix "f". My data contains several variables
with
I want to sum columns based on their names. As an exampel how could I
sum columns which contain 6574, 7584 and 85 as column names? In
addition, how could I sum those which contain 6574, 7584 and 85 in
ther names and have a prefix "f". My data contains several variables
with
dput(df1)
structure(list(date = structure(c(1230768000, 1230854400, 1230940800,
1231027200, 1231113600, 1231200000, 1231286400, 1231372800,
1231459200,
Post by Kuma Raj
1231545600, 1231632000), class = c("POSIXct", "POSIXt"), tzone =
"UTC"),
Post by Kuma Raj
f014card = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), f1534card = c(0,
1, 1, 0, 0, 1, 0, 0, 1, 0, 1), f3564card = c(1, 6, 1, 5,
5, 4, 4, 7, 6, 4, 6), f6574card = c(3, 6, 4, 5, 5, 2, 10,
3, 4, 2, 4), f7584card = c(13, 6, 1, 4, 10, 6, 8, 12, 10,
4, 3), f85card = c(5, 3, 1, 0, 2, 10, 7, 9, 1, 7, 3), m014card =
c(0,
Post by Kuma Raj
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), m1534card = c(0, 0, 1, 0,
0, 0, 0, 1, 1, 1, 0), m3564card = c(12, 7, 4, 7, 12, 13,
12, 7, 12, 2, 11), m6574card = c(3, 4, 8, 8, 8, 10, 7, 6,
7, 7, 5), m7584card = c(8, 10, 5, 4, 12, 7, 14, 11, 9, 1,
11), m85card = c(1, 4, 3, 0, 3, 4, 5, 5, 4, 5, 0)), .Names =
c("date",
Post by Kuma Raj
"f014card", "f1534card", "f3564card", "f6574card", "f7584card",
"f85card", "m014card", "m1534card", "m3564card", "m6574card",
"m7584card", "m85card"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Jeff Newmiller
2014-10-13 13:30:27 UTC
Permalink
Learn regular expressions.. there are many websites and books that describe how they work. R has a number of functions that use them...

?regexp
?grep

For example...

grep("^[^0-9]*(6574|85|7584)[^0-9]*$",names(dta))

where dta is your data frame. You can read that regular expression as zero or more characters that are not digits at the beginning of the string, followed by any of three specified sequences of digits, followed by zero or more non-digit characters at the end of the string.

You can then use that function as the column specification index to look only at certain columns. The sapply function can apply the sum function to all of those columns:

sapply(dta[,grep("^[^0-9]*(6574|85|7584)[^0-9]*$",names(dta))],sum)
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
Post by Kuma Raj
I want to sum columns based on their names. As an exampel how could I
sum columns which contain 6574, 7584 and 85 as column names? In
addition, how could I sum those which contain 6574, 7584 and 85 in
ther names and have a prefix "f". My data contains several variables
with
I want to sum columns based on their names. As an exampel how could I
sum columns which contain 6574, 7584 and 85 as column names? In
addition, how could I sum those which contain 6574, 7584 and 85 in
ther names and have a prefix "f". My data contains several variables
with
dput(df1)
structure(list(date = structure(c(1230768000, 1230854400, 1230940800,
1231027200, 1231113600, 1231200000, 1231286400, 1231372800, 1231459200,
1231545600, 1231632000), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
f014card = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), f1534card = c(0,
1, 1, 0, 0, 1, 0, 0, 1, 0, 1), f3564card = c(1, 6, 1, 5,
5, 4, 4, 7, 6, 4, 6), f6574card = c(3, 6, 4, 5, 5, 2, 10,
3, 4, 2, 4), f7584card = c(13, 6, 1, 4, 10, 6, 8, 12, 10,
4, 3), f85card = c(5, 3, 1, 0, 2, 10, 7, 9, 1, 7, 3), m014card = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0), m1534card = c(0, 0, 1, 0,
0, 0, 0, 1, 1, 1, 0), m3564card = c(12, 7, 4, 7, 12, 13,
12, 7, 12, 2, 11), m6574card = c(3, 4, 8, 8, 8, 10, 7, 6,
7, 7, 5), m7584card = c(8, 10, 5, 4, 12, 7, 14, 11, 9, 1,
11), m85card = c(1, 4, 3, 0, 3, 4, 5, 5, 4, 5, 0)), .Names = c("date",
"f014card", "f1534card", "f3564card", "f6574card", "f7584card",
"f85card", "m014card", "m1534card", "m3564card", "m6574card",
"m7584card", "m85card"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Arnaud Michel
2014-10-14 08:46:27 UTC
Permalink
Hello

I have 2 df Dem and Rap.
I would want to build all the df (dfnew) by associating these two df
(Dem and Rap) in the following way :

For each value of Dem$Nom (dfnew$Demandeur), I associate 2 different
values of Rap$Nom (dfnew$Rapporteur1 and dfnew$Rapporteur2) in such a way

* for each dfnew$Demandeur, dfnew$Rapporteur1 does not have the same
value for Departement as Dem$Departement
* for each dfnew$Demandeur, dfnew$Rapporteur2 does not have the same
value for Unite as Dem$Unite
* the value of table(dfnew$Rapporteur1) and the value of
table(dfnew$Rapporteur2) must be balanced and not too different
(Accepted differences : 1)

table(dfnew$Rapporteur1)
Rapporteur01 Rapporteur02 Rapporteur03 Rapporteur04 Rapporteur05
4 4 4 4
4

Thanks for your help
Michel

Dem <- structure(list(Nom = c("John", "Jim", "Julie", "Charles",
"Michel",
"Emma", "Sandra", "Elodie", "Thierry", "Albert", "Jean", "Francois",
"Pierre", "Cyril", "Damien", "Jean-Michel", "Vincent", "Daniel",
"Yvan", "Catherine"), Departement = c("D", "A", "A", "C", "D",
"B", "D", "B", "C", "D", "B", "B", "B", "A", "C", "D", "B", "A",
"D", "D"), Unite = c("Unite8", "Unite4", "Unite4", "Unite7",
"Unite9", "Unite1", "Unite6", "Unite5", "Unite7", "Unite3", "Unite2",
"Unite6", "Unite8", "Unite8", "Unite3", "Unite8", "Unite9", "Unite7",
"Unite9", "Unite5")), .Names = c("Nom", "Departement", "Unite"
), row.names = c(NA, -20L), class = "data.frame")

Rap <- structure(list(Nom = c("Rapporteur01", "Rapporteur02",
"Rapporteur03",
"Rapporteur04", "Rapporteur05"), Departement = c("C", "D", "C",
"C", "D"), Unite = c("Unite10", "Unite6", "Unite5", "Unite5",
"Unite4")), .Names = c("Nom", "Departement", "Unite"), row.names = c(NA,
-5L), class = "data.frame")

dfnew <- structure(list(Demandeur = structure(c(13L, 12L, 14L, 3L, 15L,
8L, 17L, 7L, 18L, 1L, 10L, 9L, 16L, 4L, 5L, 11L, 19L, 6L, 20L,
2L), .Label = c("Albert", "Catherine", "Charles", "Cyril", "Damien",
"Daniel", "Elodie", "Emma", "Francois", "Jean", "Jean-Michel",
"Jim", "John", "Julie", "Michel", "Pierre", "Sandra", "Thierry",
"Vincent", "Yvan"), class = "factor"), Rapporteur1 = structure(c(3L,
1L, 3L, 5L, 1L, 5L, 1L, 2L, 5L, 4L, 2L, 4L, 2L, 3L, 5L, 4L, 4L,
2L, 3L, 1L), .Label = c("Rapporteur01", "Rapporteur02", "Rapporteur03",
"Rapporteur04", "Rapporteur05"), class = "factor"), Rapporteur2 =
structure(c(1L,
3L, 4L, 4L, 2L, 4L, 5L, 1L, 2L, 3L, 3L, 3L, 5L, 5L, 1L, 1L, 2L,
5L, 4L, 2L), .Label = c("Rapporteur01", "Rapporteur02", "Rapporteur03",
"Rapporteur04", "Rapporteur05"), class = "factor")), .Names =
c("Demandeur",
"Rapporteur1", "Rapporteur2"), row.names = c(NA, -20L), class =
"data.frame")
--
Michel ARNAUD
Cirad Montpellier


[[alternative HTML version deleted]]
David.Kaethner
2014-10-14 11:18:22 UTC
Permalink
Hello,

here's a draft of a solution. I hope it's not overly complicated.

# find all possible combinations
combi <- expand.grid(Dem$Nom, Rap$Nom); names(combi) <- c("Dem", "Rap")

# we need the corresponding departments and units
combi$DemDep <- apply(combi, 1, function(x) Dem$Departement[x[1] == Dem$Nom])
combi$DemUni <- apply(combi, 1, function(x) Dem$Unite[x[1] == Dem$Nom])
combi$RapDep <- apply(combi, 1, function(x) Rap$Departement[x[2] == Rap$Nom])
combi$RapUni <- apply(combi, 1, function(x) Rap$Unite[x[2] == Rap$Nom])

# we exclude the combinations that we don't want
dep <- combi[combi$DemDep != combi$RapDep, c("Dem", "Rap")]
dep$id <- as.numeric(dep$Rap)
uni <- combi[combi$DemUni != combi$RapUni, c("Dem", "Rap")]
uni$id <- as.numeric(uni$Rap)

# preliminary result
resDep <- reshape(dep,
timevar = "id",
idvar = "Dem",
direction = "wide"
)

resUni <- reshape(uni,
timevar = "id",
idvar = "Dem",
direction = "wide"
)

In resDep and resUni you find the results for Rapporteur1 and Rapporteur2. NAs indicate where conditions did not match. For Rap1/Rap2 you can now choose any column from resDep and resUni that is not NA for that specific Demandeur. I wasn't exactly sure about your third condition, so I'll leave that to you. But with the complete possible matches, you have a more general solution.

Btw, you can construct data.frames just like this:

Dem <- data.frame(
Nom = c("John", "Jim", "Julie", "Charles", "Michel", "Emma", "Sandra", "Elodie", "Thierry", "Albert", "Jean", "Francois", "Pierre", "Cyril", "Damien", "Jean-Michel", "Vincent", "Daniel", "Yvan", "Catherine"),
Departement = c("D", "A", "A", "C", "D", "B", "D", "B", "C", "D", "B", "B", "B", "A", "C", "D", "B", "A", "D", "D"),
Unite = c("Unite8", "Unite4", "Unite4", "Unite7", "Unite9", "Unite1", "Unite6", "Unite5", "Unite7", "Unite3", "Unite2", "Unite6", "Unite8", "Unite8", "Unite3", "Unite8", "Unite9", "Unite7", "Unite9", "Unite5")
)

-dk

-----Urspr?ngliche Nachricht-----
Von: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] Im Auftrag von Arnaud Michel
Gesendet: Dienstag, 14. Oktober 2014 10:46
An: r-help at r-project.org
Betreff: [R] To build a new Df from 2 Df

Hello

I have 2 df Dem and Rap.
I would want to build all the df (dfnew) by associating these two df (Dem and Rap) in the following way :

For each value of Dem$Nom (dfnew$Demandeur), I associate 2 different values of Rap$Nom (dfnew$Rapporteur1 and dfnew$Rapporteur2) in such a way

* for each dfnew$Demandeur, dfnew$Rapporteur1 does not have the same
value for Departement as Dem$Departement
* for each dfnew$Demandeur, dfnew$Rapporteur2 does not have the same
value for Unite as Dem$Unite
* the value of table(dfnew$Rapporteur1) and the value of
table(dfnew$Rapporteur2) must be balanced and not too different
(Accepted differences : 1)

table(dfnew$Rapporteur1)
Rapporteur01 Rapporteur02 Rapporteur03 Rapporteur04 Rapporteur05
4 4 4 4
4

Thanks for your help
Michel

Dem <- structure(list(Nom = c("John", "Jim", "Julie", "Charles", "Michel", "Emma", "Sandra", "Elodie", "Thierry", "Albert", "Jean", "Francois", "Pierre", "Cyril", "Damien", "Jean-Michel", "Vincent", "Daniel", "Yvan", "Catherine"), Departement = c("D", "A", "A", "C", "D", "B", "D", "B", "C", "D", "B", "B", "B", "A", "C", "D", "B", "A", "D", "D"), Unite = c("Unite8", "Unite4", "Unite4", "Unite7", "Unite9", "Unite1", "Unite6", "Unite5", "Unite7", "Unite3", "Unite2", "Unite6", "Unite8", "Unite8", "Unite3", "Unite8", "Unite9", "Unite7", "Unite9", "Unite5")), .Names = c("Nom", "Departement", "Unite"
), row.names = c(NA, -20L), class = "data.frame")

Rap <- structure(list(Nom = c("Rapporteur01", "Rapporteur02", "Rapporteur03", "Rapporteur04", "Rapporteur05"), Departement = c("C", "D", "C", "C", "D"), Unite = c("Unite10", "Unite6", "Unite5", "Unite5", "Unite4")), .Names = c("Nom", "Departement", "Unite"), row.names = c(NA, -5L), class = "data.frame")

dfnew <- structure(list(Demandeur = structure(c(13L, 12L, 14L, 3L, 15L, 8L, 17L, 7L, 18L, 1L, 10L, 9L, 16L, 4L, 5L, 11L, 19L, 6L, 20L, 2L), .Label = c("Albert", "Catherine", "Charles", "Cyril", "Damien", "Daniel", "Elodie", "Emma", "Francois", "Jean", "Jean-Michel", "Jim", "John", "Julie", "Michel", "Pierre", "Sandra", "Thierry", "Vincent", "Yvan"), class = "factor"), Rapporteur1 = structure(c(3L, 1L, 3L, 5L, 1L, 5L, 1L, 2L, 5L, 4L, 2L, 4L, 2L, 3L, 5L, 4L, 4L, 2L, 3L, 1L), .Label = c("Rapporteur01", "Rapporteur02", "Rapporteur03", "Rapporteur04", "Rapporteur05"), class = "factor"), Rapporteur2 = structure(c(1L, 3L, 4L, 4L, 2L, 4L, 5L, 1L, 2L, 3L, 3L, 3L, 5L, 5L, 1L, 1L, 2L, 5L, 4L, 2L), .Label = c("Rapporteur01", "Rapporteur02", "Rapporteur03", "Rapporteur04", "Rapporteur05"), class = "factor")), .Names = c("Demandeur", "Rapporteur1", "Rapporteur2"), row.names = c(NA, -20L), class =
"data.frame")


--
Michel ARNAUD
Cirad Montpellier


[[alternative HTML version deleted]]
Arnaud Michel
2014-10-15 06:51:16 UTC
Permalink
Thank you David
Now, the problem is to list all the combinations which verify the
condition III (ie every Rapporteur has to have more or less the same
number of demandeur)
Have you any idea ?
Michel
Post by David.Kaethner
Hello,
here's a draft of a solution. I hope it's not overly complicated.
# find all possible combinations
combi <- expand.grid(Dem$Nom, Rap$Nom); names(combi) <- c("Dem", "Rap")
# we need the corresponding departments and units
combi$DemDep <- apply(combi, 1, function(x) Dem$Departement[x[1] == Dem$Nom])
combi$DemUni <- apply(combi, 1, function(x) Dem$Unite[x[1] == Dem$Nom])
combi$RapDep <- apply(combi, 1, function(x) Rap$Departement[x[2] == Rap$Nom])
combi$RapUni <- apply(combi, 1, function(x) Rap$Unite[x[2] == Rap$Nom])
# we exclude the combinations that we don't want
dep <- combi[combi$DemDep != combi$RapDep, c("Dem", "Rap")]
dep$id <- as.numeric(dep$Rap)
uni <- combi[combi$DemUni != combi$RapUni, c("Dem", "Rap")]
uni$id <- as.numeric(uni$Rap)
# preliminary result
resDep <- reshape(dep,
timevar = "id",
idvar = "Dem",
direction = "wide"
)
resUni <- reshape(uni,
timevar = "id",
idvar = "Dem",
direction = "wide"
)
In resDep and resUni you find the results for Rapporteur1 and Rapporteur2. NAs indicate where conditions did not match. For Rap1/Rap2 you can now choose any column from resDep and resUni that is not NA for that specific Demandeur. I wasn't exactly sure about your third condition, so I'll leave that to you. But with the complete possible matches, you have a more general solution.
Dem <- data.frame(
Nom = c("John", "Jim", "Julie", "Charles", "Michel", "Emma", "Sandra", "Elodie", "Thierry", "Albert", "Jean", "Francois", "Pierre", "Cyril", "Damien", "Jean-Michel", "Vincent", "Daniel", "Yvan", "Catherine"),
Departement = c("D", "A", "A", "C", "D", "B", "D", "B", "C", "D", "B", "B", "B", "A", "C", "D", "B", "A", "D", "D"),
Unite = c("Unite8", "Unite4", "Unite4", "Unite7", "Unite9", "Unite1", "Unite6", "Unite5", "Unite7", "Unite3", "Unite2", "Unite6", "Unite8", "Unite8", "Unite3", "Unite8", "Unite9", "Unite7", "Unite9", "Unite5")
)
-dk
-----Urspr?ngliche Nachricht-----
Von: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] Im Auftrag von Arnaud Michel
Gesendet: Dienstag, 14. Oktober 2014 10:46
An: r-help at r-project.org
Betreff: [R] To build a new Df from 2 Df
Hello
I have 2 df Dem and Rap.
For each value of Dem$Nom (dfnew$Demandeur), I associate 2 different values of Rap$Nom (dfnew$Rapporteur1 and dfnew$Rapporteur2) in such a way
* for each dfnew$Demandeur, dfnew$Rapporteur1 does not have the same
value for Departement as Dem$Departement
* for each dfnew$Demandeur, dfnew$Rapporteur2 does not have the same
value for Unite as Dem$Unite
* the value of table(dfnew$Rapporteur1) and the value of
table(dfnew$Rapporteur2) must be balanced and not too different
(Accepted differences : 1)
table(dfnew$Rapporteur1)
Rapporteur01 Rapporteur02 Rapporteur03 Rapporteur04 Rapporteur05
4 4 4 4
4
Thanks for your help
Michel
Dem <- structure(list(Nom = c("John", "Jim", "Julie", "Charles", "Michel", "Emma", "Sandra", "Elodie", "Thierry", "Albert", "Jean", "Francois", "Pierre", "Cyril", "Damien", "Jean-Michel", "Vincent", "Daniel", "Yvan", "Catherine"), Departement = c("D", "A", "A", "C", "D", "B", "D", "B", "C", "D", "B", "B", "B", "A", "C", "D", "B", "A", "D", "D"), Unite = c("Unite8", "Unite4", "Unite4", "Unite7", "Unite9", "Unite1", "Unite6", "Unite5", "Unite7", "Unite3", "Unite2", "Unite6", "Unite8", "Unite8", "Unite3", "Unite8", "Unite9", "Unite7", "Unite9", "Unite5")), .Names = c("Nom", "Departement", "Unite"
), row.names = c(NA, -20L), class = "data.frame")
Rap <- structure(list(Nom = c("Rapporteur01", "Rapporteur02", "Rapporteur03", "Rapporteur04", "Rapporteur05"), Departement = c("C", "D", "C", "C", "D"), Unite = c("Unite10", "Unite6", "Unite5", "Unite5", "Unite4")), .Names = c("Nom", "Departement", "Unite"), row.names = c(NA, -5L), class = "data.frame")
dfnew <- structure(list(Demandeur = structure(c(13L, 12L, 14L, 3L, 15L, 8L, 17L, 7L, 18L, 1L, 10L, 9L, 16L, 4L, 5L, 11L, 19L, 6L, 20L, 2L), .Label = c("Albert", "Catherine", "Charles", "Cyril", "Damien", "Daniel", "Elodie", "Emma", "Francois", "Jean", "Jean-Michel", "Jim", "John", "Julie", "Michel", "Pierre", "Sandra", "Thierry", "Vincent", "Yvan"), class = "factor"), Rapporteur1 = structure(c(3L, 1L, 3L, 5L, 1L, 5L, 1L, 2L, 5L, 4L, 2L, 4L, 2L, 3L, 5L, 4L, 4L, 2L, 3L, 1L), .Label = c("Rapporteur01", "Rapporteur02", "Rapporteur03", "Rapporteur04", "Rapporteur05"), class = "factor"), Rapporteur2 = structure(c(1L, 3L, 4L, 4L, 2L, 4L, 5L, 1L, 2L, 3L, 3L, 3L, 5L, 5L, 1L, 1L, 2L, 5L, 4L, 2L), .Label = c("Rapporteur01", "Rapporteur02", "Rapporteur03", "Rapporteur04", "Rapporteur05"), class = "factor")), .Names = c("Demandeur", "Rapporteur1", "Rapporteur2"), row.names = c(NA, -20L), class =
"data.frame")
--
Michel ARNAUD
Cirad Montpellier
[[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Michel ARNAUD
Cirad Montpellier
Continue reading on narkive:
Loading...