Advanced tips and tricks with data.table

Tips and tricks learned along the way

This is mostly a running list of data.table tricks picked up from StackOverflow posts or, more often than not, from experimenting for hours. I'd like to persist these discoveries somewhere with more memory than my head (hello internet) so I can reuse them after my mental memory forgets them. A less organized and concise addition to DataCamp's sweet cheat sheet for the basics.

Most, if not all, of these techniques were developed for real data science projects and provided some value to my data engineering. I've generalized everything to the mtcars dataset, which might not make this value immediately clear in this slightly contrived context. This list is not intended to be comprehensive, as DataCamp's data.table cheatsheet is. OK, enough disclaimers!

Some more advanced functionality is covered elsewhere by data.table creator Matt Dowle.

1. DATA STRUCTURES & ASSIGNMENT

Columns of lists

Summary table (long and narrow)

This could be useful, but is easily achievable using traditional methods.

Summary table (short and narrow)

Add all categories of gear for each cyl to the original data.table as a list. This is more nifty. It's so simple, I find myself using this trick to quickly explore data ad hoc at the command line. It can also be useful for more serious data engineering.

    # dt is a data.table built from mtcars
    dt[, gearsL := list(list(unique(gear))), by=cyl]   # original, ugly
    dt[, gearsL := .(list(unique(gear))), by=cyl]      # improved, pretty

Each row now carries the full set of gears for its cyl. For example, the first row (gear 4, cyl 6) gets gearsL = 4,3,5.

Update: Per these comments on StackOverflow referencing my post, dt[, gearsL := list(list(unique(gear))), by=cyl] can be more elegantly written as dt[, gearsL := .(list(unique(gear))), by=cyl]. Thanks for pointing out my unnecessarily verbose and unusual syntax! I think I wrote the first thing that worked when I posted this, not realizing the normal .() syntax was equivalent.

Accessing elements from a column of lists

Extract the second element of each list in gearsL and create the column gearL1.
This isn't that groundbreaking, but it explores how to access elements of columns which are constructed of lists of lists.

    dt[, gearL1 := lapply(gearsL, function(x) x[2])]
    dt[, gearS1 := sapply(gearsL, function(x) x[2])]
    head(dt)

gearL1 comes back as a list column, while sapply simplifies gearS1 to a numeric vector (3 3 3 3 5 3 for the first six rows).

Update 9/24/2015: Per Matt Dowle's comments, a slightly more syntactically succinct way of doing this:

    dt[, gearL1 := lapply(gearsL, `[`, 2)]
    dt[, gearS1 := sapply(gearsL, `[`, 2)]

Calculate all the gears for all cars of each cyl, excluding the current row. This can be useful for comparing observations to the mean of groups, where the group mean is not biased by the observation of interest.

    dt[, other_gear := mapply(function(x, y) setdiff(x, y), x=gearsL, y=gear)]
    head(dt)

Update 9/24/2015: Per Matt Dowle's comments, this achieves the same as above.

    dt[, other_gear := mapply(setdiff, gearsL, gear)]

Suppressing intermediate output with {}

This is actually a base R trick that I didn't discover until working with data.table. See ?`{` for some documentation and examples. I've only used it within the j slot of data.table. I find it pretty useful for generating columns when I need to perform some multi-step vectorized operation. It can clean up code by allowing you to reference the same temporary variable instead of recomputing it.

By default, the braces just return the last object defined in them, unnamed.
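A minimal sketch of that behavior on mtcars. The temporary names (tmp1, tmp2, tmp3) and the statistic computed are illustrative assumptions, not necessarily what the original post used. The first call returns only the last expression in the braces; the second, for comparison, returns a named list of what to keep.

```r
library(data.table)

dt <- data.table(mtcars)

# Each statement in the braces is evaluated in turn within each cyl group.
# Only the value of the last expression is returned, so tmp1 and tmp2 act
# as temporaries and the result column gets the default name V1.
res_default <- dt[, {
  tmp1 <- mean(mpg)               # group mean
  tmp2 <- mean(abs(mpg - tmp1))   # mean absolute deviation from that mean
  round(tmp2, 2)                  # last expression: the only value returned
}, by = cyl]

# Ending the block with a named list controls which temporaries are kept
# and what the result columns are called.
res_named <- dt[, {
  tmp1 <- mean(mpg)
  tmp2 <- mean(abs(mpg - tmp1))
  tmp3 <- round(tmp2, 2)
  list(tmp2 = tmp2, tmp3 = tmp3)
}, by = cyl]

print(res_default)
print(res_named)
```

Both results have one row per cyl; tmp1 never appears in either, because it is never part of the returned value.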
We can be more explicit by passing a named list of what we want to keep. The statements inside the braces can also be written on separate lines, without semicolons.

This is trickier with := assignments. I don't think := is intended to work when wrapped in {}. Assigning multiple columns with := at once does not let the later columns build on the earlier ones. Chaining and then dropping unwanted variables is a messy workaround; I'm still exploring this one.

    dt[, tmp1 := mean(mpg), by=cyl][, tmp2 := mean(abs(mpg-tmp1)), by=cyl][, tmp1 := NULL]
    head(dt)

Fast looping with set

I still haven't worked much with the loop + set framework. I've been able to achieve pretty much everything with :=, which is more flexible and powerful. However, if you must loop, set is orders of magnitude faster than native R assignments within loops. Here's a snippet from the data.table news:

    New function set(DT,i,j,value) allows fast assignment to elements of DT.
    Similar to := but avoids the overhead of [.data.table.
    Less flexible than :=, but as flexible as matrix subassignment.
    Similar in spirit to setnames(), setcolorder().

The snippet builds a matrix M, converts it with DF = as.data.frame(M) and DT = as.data.table(M), and then times these four assignments inside a loop over i:

    DF[i, 1L] <- i       # data.frame assignment
    DT[i, V1 := i]       # := through [.data.table
    M[i, 1L] <- i        # matrix assignment
    set(DT, i, 1L, i)    # set()

The task below can also be done with .SD. I was actually directed to this solution after I posed this question on StackOverflow. I was also pleased to learn that the functionality I was looking for, applying a function to a subset of columns with .SDcols while preserving the untouched columns, was added as a feature request.

    dt <- data.table(mtcars)[, 1:5, with=FALSE]
    for (j in c(1L, 2L, 4L)) set(dt, j=j, value=-dt[[j]])   # integers using 'L' passed for efficiency
    for (j in c(3L, 5L)) set(dt, j=j, value=paste0(dt[[j]], '!!'))

Using shift to lead/lag vectors and lists

Note this feature is only available in version 1.x (GitHub, not CRAN).

Base R surprisingly does not have great tools for dealing with leads/lags of vectors that most social science statistical software (Stata, SAS, even FAME, which I used in my formative data years) comes equipped with out of the box.

For comparison, a hand-rolled lag (the slow way), applied by entity on a panel of dated observations:

    lagpad <- function(x, k) c(rep(NA, k), x)[1:length(x)]
    dt[, indpct_slow := (ind/lagpad(ind, 1)) - 1, by=entity]
    head(dt, 10)
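A minimal sketch of shift used with by, assuming a data.table version that exports shift(). The toy panel and its column names (entity, year, ind) are hypothetical stand-ins chosen so the result is easy to check by eye.

```r
library(data.table)

# Hypothetical toy panel: two entities observed over three years.
dt <- data.table(
  entity = rep(c("a", "b"), each = 3),
  year   = rep(2013:2015, times = 2),
  ind    = c(10, 12, 9, 4, 5, 8)
)
setkey(dt, entity, year)  # sort so "previous row" means "previous year"

# shift() lags by default; type = "lead" looks forward instead.
# Grouping with by keeps values from leaking across entities.
dt[, ind_lag1  := shift(ind, 1), by = entity]
dt[, ind_lead1 := shift(ind, 1, type = "lead"), by = entity]

# Percent change from the previous year within each entity.
dt[, ind_pct := ind / ind_lag1 - 1, by = entity]

print(dt)
```

The first row of each entity has no previous year, so its lag and percent change are NA; likewise the last row of each entity has an NA lead.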
Create multiple columns with := in one statement

This is useful, but note that the columns operated on must be atomic vectors or lists. That is, they must exist before running the computation. Columns that reference other columns created in the same statement need to be built individually or chained.

Assign a column with := named with a character object

This is the advised way to assign a new column whose name you have already determined and saved as a character. Simply surround the character object in parentheses on the left-hand side of :=.

The old, now deprecated, way passes the bare character object together with with=FALSE. It still works for now, but is not advised.

2. BY

Calculate a function over a group (using by) excluding each entity in a second category

This title probably doesn't immediately make much sense. Let me explain what I'm going to calculate, and why, with an example. We want to compare the mpg of each car to the average mpg of cars in the same class (the same # of cylinders). However, we don't want the group average to be biased by the car we are comparing against it.

This assumption doesn't appear useful in this example, but assume that gear+cyl uniquely identify the cars. In the real project where I faced this problem, I was calculating an indicator related to an appraiser relative to the average of all other appraisers in their group, with gear playing the role of the appraiser's ID.

METHOD 1: in-line

0.a Biased mean: simple mean by cyl

However, we want to know, for each row, what the mean is among all the other cars with the same # of cyls, excluding that car.

1.a .GRP without setting key

    dt[, dt[!gear %in% unique(dt$gear)[.GRP], mean(mpg), by=cyl], by=gear]   # unbiased mean

Update 9/24/2015: Per Matt Dowle's comments, this also works with slightly less code. For my simple example, there was also a marginal speed gain. Time savings relative to the .GRP method will likely increase with the complexity of the problem.

    dt[, dt[!gear %in% .BY[[1]], mean(mpg), by=cyl], by=gear]   # unbiased mean

1.b Same as 1.a, but a little faster

    uid <- unique(dt$gear)
    dt[, dt[!gear %in% (uid[.GRP]), mean(mpg), by=cyl], by=gear][order(cyl, gear)]   # unbiased mean

Why does this work?

1.a Pulling it apart with .GRP
    dt[, .GRP, by=cyl]
    dt[, .(.GRP, unique(dt$gear)[.GRP]), by=cyl]
    dt[, dt[, .(.GRP, unique(dt$gear)[.GRP]), by=cyl], by=gear]

1.b Setting key

    setkey(dt, gear)
    uid <- unique(dt$gear)
    dt[, dt[!.(uid[.GRP]), mean(mpg), by=cyl], by=gear]   # unbiased mean

METHOD 2: using {} and .SD

{} is used to suppress intermediate operations.

Building up

No surprises here.

    dt[, .SD[, mean(mpg)], by=gear]           # same as dt[, mean(mpg), by=gear]
    dt[, .SD[, mean(mpg), by=cyl], by=gear]   # same as dt[, mean(mpg), by=.(cyl, gear)]

Nested data.tables and by statements

This chunk shows what happens with two by statements nested within two different data.tables. It is for explanatory purposes only and is not necessary for our task. n counts the number of cars in each cyl; N counts the number of cars by cyl and gear.

    dt[, {
      vbar = sum(mpg)
      n = .N
      .SD[, .(n, .N, sum_in_gear_cyl=sum(mpg), sum_in_cyl=vbar), by=gear]
    }, by=cyl]

Calculating the unbiased mean

This is in a summary table. It would need to be merged back onto dt if that is desired.

    dt[, {
      vbar = mean(mpg)
      n = .N
      .SD[, (n*vbar - sum(mpg))/(n - .N), by=gear]
    }, by=cyl]

METHOD 3: Super Fast Mean calculation

Non-function direct way

Using a vectorized approach to calculate the unbiased mean for each combination of gear and cyl. Mechanically, it calculates the biased average for all cars by cyl.
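A minimal sketch of one vectorized way to get this number, offered as an assumption about the approach rather than the post's exact code. It takes the total mpg for a cyl, subtracts the total for the gear being excluded, and divides by the number of cars left, which amounts to (n*vbar - group sum) / (n - group count). The helper column names (cyl_sum, cyl_n, grp_sum, grp_n, mpg_unbiased_mean) are illustrative.

```r
library(data.table)

dt <- data.table(mtcars)[, .(cyl, gear, mpg)]

# Totals for every cyl group: the ingredients of the "biased" mean.
dt[, `:=`(cyl_sum = sum(mpg), cyl_n = .N), by = cyl]

# Totals for every gear-within-cyl group: the cars to be excluded.
dt[, `:=`(grp_sum = sum(mpg), grp_n = .N), by = .(cyl, gear)]

# Mean mpg of cars with the same cyl but a different gear.
# If a cyl had only one gear, cyl_n == grp_n and the result would be NaN.
dt[, mpg_unbiased_mean := (cyl_sum - grp_sum) / (cyl_n - grp_n)]

# Cross-check one cell against a direct subset:
# 4-cylinder cars, leaving out those with 4 gears.
check <- dt[cyl == 4 & gear != 4, mean(mpg)]

print(unique(dt[, .(cyl, gear, mpg_unbiased_mean)])[order(cyl, gear)])
```

Because every step is a grouped sum or count followed by plain column arithmetic, nothing is re-subset per group, and the result is already attached to each row of dt, so no merge is needed.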