R conditional sum in data frame depending on word in a column

I have a data frame containing words and numeric entries. I want to sum all the numeric entries for rows that share the same entry in the word column.

District name   Population   Child birth rate
A               30,000       .7
A               20,000       .5
B               10,000       .09
B               15,000       .6
C               80,000       .007

I want to sum up the population and child birth rates on the district level. I tried using lapply and sum, but I can't figure it out.

The result of dput(head(mydata)) is:

structure(list(District = structure(c(5L, 5L, 5L, 5L, 5L, 5L), .Label = c("Charlottenburg-Wilmersdorf", 
"Friedrichshain-Kreuzberg", "Lichtenberg", "Marzahn-Hellersdorf", 
"Mitte", "Neukoelln", "Pankow", "Reinickendorf", "Spandau", "Steglitz-Zehlendorf", 
"Tempelhof-Schoeneberg", "Treptow-Koepenick"), class = "factor"), 
Population = c(81205L, 70911L, 5629L, 12328L, 78290L, 84789L
), Overall.crime = c(27864L, 13181L, 943L, 4515L, 15673L, 
16350L), Robbery = c(315L, 195L, 20L, 79L, 232L, 261L), Mugging = c(183L, 
81L, 9L, 54L, 111L, 118L), Assault = c(2016L, 1046L, 51L, 
468L, 1679L, 1718L), Molestation.Stalking = c(480L, 429L, 
16L, 114L, 567L, 601L), Theft = c(13587L, 4961L, 396L, 2019L, 
6725L, 6954L), Car.Theft = c(185L, 149L, 10L, 28L, 159L, 
159L), Bycicle.Theft = c(1444L, 561L, 95L, 123L, 588L, 595L
), Burglary = c(557L, 297L, 37L, 87L, 397L, 528L), Arson = c(36L, 
51L, 7L, 15L, 28L, 56L), Property.Damage = c(2113L, 871L, 
64L, 260L, 1257L, 1172L), Drug.Offenses = c(781L, 538L, 24L, 
87L, 604L, 492L)), .Names = c("District", "Population", "Overall.crime", 
"Robbery", "Mugging", "Assault", "Molestation.Stalking", "Theft", 
"Car.Theft", "Bycicle.Theft", "Burglary", "Arson", "Property.Damage", 
"Drug.Offenses"), row.names = c(NA, 6L), class = "data.frame")

I had spared you all those German names before, but I guess that was stupid since the problem is within the data...

Using ddply gives me the following error:

Error in df$Population : object of type 'closure' is not subsettable

Thank you for any help!

Answers


Using the data you originally posted, did you mean to do this?

df <- read.table( text = "District_name   Population   Child_birth_rate
A               30000       .7
A               20000       .5
B               10000       .09
B               15000       .6
C               80000       .007" , header = TRUE )

aggregate( cbind( Population , Child_birth_rate ) ~ District_name , data = df , sum )
#  District_name Population Child_birth_rate
#1             A      50000            1.200
#2             B      25000            0.690
#3             C      80000            0.007

Is it a good idea to sum the birth rate?
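
On that last point: rates are generally not additive across rows, so a population-weighted mean is often closer to what is wanted. Here is a minimal base R sketch using the same toy data. It assumes the birth rate is per capita, and the column name Weighted_rate and the variable names are my own, not anything from your data.

```r
# Toy data from the question (thousands separators removed)
df <- data.frame(
  District_name    = c("A", "A", "B", "B", "C"),
  Population       = c(30000, 20000, 10000, 15000, 80000),
  Child_birth_rate = c(0.7, 0.5, 0.09, 0.6, 0.007)
)

# Split the rows by district, then build one summary row per group
groups <- split(df, df$District_name)
result <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    District_name = g$District_name[1],
    Population    = sum(g$Population),
    # Assumption: the rate is per capita, so weight it by population
    Weighted_rate = weighted.mean(g$Child_birth_rate, g$Population)
  )
}))
rownames(result) <- NULL
print(result)
```

For district A this gives (30000 * 0.7 + 20000 * 0.5) / 50000 = 0.62 instead of the summed 1.2.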

Using your actual data, it might be more convenient to use ddply from plyr to aggregate in a similar fashion (but here using sum and mean on two different columns):

library( plyr )
# Sum the population and average the robbery counts within each district
ddply( mydata , "District" , function(df) c( "Pop" = sum( df$Population ), "Robbery" = mean( df$Robbery ) ) )
#  District    Pop  Robbery
#1    Mitte 333152 183.6667
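
As for the error in your question: "object of type 'closure' is not subsettable" means that $ was applied to a function. A likely cause (this is a guess, since the failing ddply call is not shown) is that df was used where no data frame of that name existed, so R found the built-in stats function df, the F-distribution density, instead. A small base R sketch of the failure and of why naming the function argument df avoids it:

```r
# In a fresh session, `df` is the F-distribution density function from stats
is_function <- is.function(stats::df)

# Applying `$` to a function raises an error; capture it instead of stopping
err <- tryCatch(stats::df$Population, error = function(e) e)
print(conditionMessage(err))

# Inside function(df) ..., the argument `df` shadows the stats function,
# so df$Population refers to the data frame that was passed in
pop_sum <- function(df) sum(df$Population)
total <- pop_sum(data.frame(Population = c(81205L, 70911L)))
print(total)
```

In the ddply call above, each per-district chunk of mydata is passed in as that df argument, which is why df$Population works there.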
