A minimal reproducible example consists of the following items: A minimal dataset, necessary to reproduce the issue The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages. dummyVars creates a full set of dummy variables (i.e. @raffamaiden yes, I included the predict() call and conversion to data.frame. For example, the following image shows how we would perform one-hot encoding to convert a categorical variable that contains team names into new variables that contain only 0 and 1 values: The following step-by-step example shows how to perform one-hot encoding for this exact dataset in R. First, lets create the following data frame in R: Next, lets use the dummyVars() function from the caret package to perform one-hot encoding on the team variable in the data frame: Notice that three new columns were added to the data frame since the original team column contained three unique values. Also, for Europeans, we use cookies to Launching the CI/CD and R Collectives and community editing features for Reshape categorical variable into dummies variables, Translating the following function using tidyverse verbs into base R as a function, Assigning column values in for loops -- too slow, one hot encode each column in a Int matrix in R, One hot fail - windows does not do one hot encoding, using a loop for creating multiple dummy variables. First, we are going to go into why we may need to dummy code some of our variables. For the same example: Given a formula and initial data set, the class dummyVars gathers all Lets look at a few examples of dummy variables. In this section, you will find some articles, and journal papers, that you mind find useful: Well think you, Sir! Should I include the MIT licence of a library which I use from a CDN? One of the additional benefits of binning is that it also takes care of the outliers. set. Note, you can use R to conditionally add a column to the dataframe based on other columns if you need to. Since it is currently a categorical variable that can take on three different values (Single, Married, or Divorced), we need to create k-1 = 3-1 = 2 dummy variables. Creating dummy variables can be very important in feature selection, which it sounds like the original poster was doing. For instance, suppose yo Explain that part in a bit more detail so that we can use it for recoding the categorical variables (i.e., dummy code them). If a feature variable has only one problematic value (e.g.a feature variable has a high freqRatio value that exceeds the specified cut-off, but also has a high percentUnique value which does not exceed the specified cut-off), then it is acceptable to retain this feature variable. My answer involving imputation is one way around it, though certainly not the best solution. Does the half-way point between two zip codes make geographical sense? In the case a specific aggregation function is needed for dcast and the result of of dcast need to be merged back to the original: which gives (note that the result is ordered according to the by column): 3) use the spread-function from tidyr (with mutate from dplyr). want to make indicator variables from multiple columns. Lets create a more complex data frame: And ask the dummyVars function to dummify it. consistent with model.matrix and the resulting there Step 2: Create the Dummy Variables Next, we can use the ifelse () function in So here we successfully transformed this survey question into a continuous numerical scale and do not need to add dummy variables - a simple rank column will do. An unmaintained package that create problems with certain commands. Even numerical data of a categorical nature may require transformation. The easiest way to drop columns from a data frame in R is to use the subset () function, which uses the following basic syntax: #remove columns var1 and var3 new_df <- subset (df, select = -c (var1, var3)) The following examples show how to use this function in practice with the following data frame: Note that the featurePlot functions plot argument can take several different options, such as density, box, and scatter - you might like to try these out. By clicking Accept all cookies, you agree Stack Exchange can store cookies on your device and disclose information in accordance with our Cookie Policy. will fail, as the Purchase variable is missing. Find centralized, trusted content and collaborate around the technologies you use most. It is, of course, possible to dummy code many columns both using the ifelse() function and the fastDummies package. what if you want to generate dummy variables for all (instead of k-1) with no intercept? Installing packages can be done using the install.packages() function. Acceleration without force in rotational motion? Else you can also mention the split ratio for dummy variables arpit29 May 5, 2019, 10:52am #6 You could concatenate your train and test datasets, create dummy variables and then separate them into train and test. The basic idea of one-hot encoding is to create new variables that take on values 0 and 1 to represent the original categorical values. WebdummyVars function from caret package colname1 + colname2, here are all the categorical variables in a data frame. Notice that in the first row, we have a value of 0 for sex.female and a value of 1 for sex.male - in other words, the data in the first row is for a male penguin. The initial code was suggested by Gabor Grothendieck on R-Help. values in newdata. contr.treatment by Max Kuhn. This is especially useful if we want to automatically create dummy variables for all categorical predictors in the R dataframe. Parent based Selectable Entries Condition. Your email address will not be published. For example, The above output shows that the variable has been binned. dummies_model <- dummyVars (" ~ . Nonetheless, here are ways to make dummy variables with Max Kuhn's popular caret and recipes packages. For example, the 'Purpose' variable has six levels, as can be seen from the output below. The predict method is used to create dummy variables for any data set. In this function, we start by setting our dependent variable (i.e., salary) and then, after the tilde, we can add our predictor variables. An appropriate R model formula, see References, additional arguments to be passed to other methods, A data frame with the predictors of interest, An optional separator between factor variable names and their The fourth line of code prints the structure of the resulting data, dat-transfored, which confirms that one-hot encoding is completed. There are several powerful machine learning algorithms in R. However, to make the best use of these algorithms, it is imperative that we transform the data into the desired format. customers <- data. The other answers here offer direct routes to accomplish this taskone that many models (e.g. 2) use the dcast-function from either data.table or reshape2. @FilippoMazza I prefer to keep them as integer, yes, we could set factor if needed. the random sampling employed by the createDataPartition function will occur within each class. In this R tutorial, we are going to learn how to create dummy variables in R. Now, creating dummy/indicator variables can be carried out in many ways. We will call this adjusted data set dummy_penguins. The dummyVars function breaks out unique values from a column into individual columns - if you have 1000 unique values in a column, dummying them will add 1000 new columns to your data set (be careful). Therefore, there will be a section covering this as well as a section about removing columns that we dont need any more. Once your data fits into carets modular design, it can be run through different models with minimal tweaking. WebYou make a valid point, but on a practical level using the specific tool enquired about (the RF package in R) this is not allowed. Your email address will not be published. If this is not set to TRUE, we only get one column. In some cases, you also need to delete duplicate rows. For example, contr.treatment creates a reference cell in the data and defines dummy variables for all Integral with cosine in the denominator and undefined boundaries, Can I use a vintage derailleur adapter claw on a modern derailleur, Am I being scammed after paying almost $10,000 to a tree company not being able to withdraw my profit without paying a fee. I have had trouble generating the following dummy-variables in R: I'm analyzing yearly time series data (time period 1948-2009). That concludes our section on pre-processing data. While there are other methods that we could perform, these are beyond the scope of this subject, and we have covered the main areas. One of the arguments of this function is saveMetrics, which can be specified as either saveMetrics = F or saveMetrics = T. If we use saveMetrics = F, a vector of the positions of the feature variables with zero or near-zero variance will be produced. WebdummyVars: Create A Full Set of Dummy Variables Description. Web dummyVars(), , , DF. You can change how the "default" is chosen by messing with contrasts.arg in model.matrix. by encoding do you mean creating dummy variables? WebUse the rbind () function to add new rows in a Data Frame: Example Data_Frame <- data.frame ( Training = c ("Strength", "Stamina", "Other"), Pulse = c (100, 150, 120), Duration = c (60, 30, 45) ) # Add a new row New_row_DF <- rbind (Data_Frame, c ("Strength", 110, 110)) # Print the new row New_row_DF Try it Yourself Add Columns One assumption made by the package is that all the feature variable data are numeric. Sangamsh KS Owner at KS Analytical Consulting Author has 98 answers and 174.3K answer views 5 y Hey, Let me ease your life. Before running the function, look for repeated words or sentences, only take the top 50 of them and replace the rest with 'others'. Required fields are marked *. Categorical vs. Quantitative Variables: Whats the Difference? We will apply this technique to all the remaining categorical variables. formula alone, contr.treatment creates columns for the Not the answer you're looking for? The first line of code below performs this task, while the second line prints a table of the levels post-encoding. Running the above code will generate 5 new columns containing the dummy coded variables. Lets create three levels of the variable Income, which are Low for income levels lying below $380,000, High for income values above $760,000, and Mid50 for the middle 50 percentage values of the income distribution. Site design / logo 2023 Stack Exchange Inc; user contributions licensed under CC BY-SA. dummyVars(formula, data, sep = ". Thanks for reading this and sign up for my newsletter at: Get full source code For example, an individual who is 35 years old and married is estimated to have an income of, Since both dummy variables were not statistically significant, we could drop, How to Use Dummy Variables in Regression Analysis, How to Create Dummy Variables in Excel (Step-by-Step). df = data.frame(x = rep(LETTERS, each = 3), y = rnorm(78)) If x is the data frame is x and i want a dummy variable called a which will take value 1 when x$b takes value c. To subscribe to this RSS feed, copy and paste this URL into your RSS reader. All articles and walkthroughs are posted for entertainment and education only - use at your own risk. If we now take a quick look at our new object, we observe that: Note that the observations 1, 3, 4, 6, 8 and 10 will now be assigned to the ml_penguin_train training data, while observations 2, 5 and 9 will be assigned to the ml_penguin_validate validation data. Installing r-packages can be done with the install.packages() function. To learn more, see our tips on writing great answers. levels. For this example, we will set this limit to 0.8. Its best to create dummy variables or change to factors and then split the data into train-test. Your email address will not be published. We are now ready to carry out the encoding steps. Note: If a column of 1s is introduced in the matrix D, the resulting matrix X = [ones(size(D,1),1) D]will be rank deficient. How to Plot Categorical Data in R This may be very useful if we, for instance, are going to make dummy variables of multple variables and dont need them for the data analysis later. 2021. So we simply use ~ . Have a nice day, excellent explanation! It's generally preferable to include all categories in training and test data. Step 1: Create the Data First, lets create the following data frame in R: #create data frame df <- data.frame(team=c ('A', 'A', 'B', 'B', 'B', 'B', 'C', 'C'), points=c (25, So if instead of a 0-1 dummy variable, for some reason you wanted to use, say, 4 and 7, you could use ifelse(year == 1957, 4, 7). Next, we check for any extreme correlations close to 1 in magnitude: Next, we assess a summary of the correlation values: We can see that in general everything is ok, although we do have a high max correlation value of 0.873. If you are planning on doing predictive analytics or machine learning and want to use regression or any other modeling technique that requires numerical data, you will need to transform your text data into numbers otherwise you run the risk of leaving a lot of information on the table. If the variable contains more than two labels, this will not be intuitive. and defines dummy variables for all factor levels except those in the Added R2 and RMSE functions for evaluating regression models Here, we can see that as identified previously, none of the variables have zero or near zero variance (as shown in columns 3 and 4 of the output). Launching the CI/CD and R Collectives and community editing features for Transform one column from categoric to binary, keep the rest, Reshape data in R (Split single column with multiple values into multiple colums with binary values), Converting a categorical variable to multiple binary variables, Create mutually exclusive dummy variables from categorical variable in R, Create variables in a for loop using character data, r - how to add columns dynamically based on numerical values sequences from another column, Convert categorical column to multiple binary columns, Want to assign the value of a dummy variable at one time in R, Generate a time dummy variable in R (panel data), Include trend variable from certain time period R, Creating a dummy with different arguments in R. How to create dummy variable based on the value of two columns in R? How to Transform Data in Python (Log, Square Root, Cube Root), How to Calculate Polychoric Correlation in R. Your email address will not be published. Therefore, we have a multi-class classification problem, with the feature variables bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g and sex, and the outcome variable species. How to convert text observations into 1 one or zero 0 in R? Reach me at amunategui@gmail.com. In the next section, we will go on and have a look at another approach for dummy coding categorical variables. for year 1957 (value = 1 at 1957 and zero otherwise). This section is followed by a section outlining what you need to have installed to follow this post. This Glad you appreciated the tutorial. How can I think of counterexamples of abstract mathematical objects? Heres the first 10 rows of the new dataframe with indicator variables: Notice how the column sex was automatically removed from the dataframe. by Erik Marsja | May 24, 2020 | Programming, R | 8 comments. At least the example code seems to have a few issues indicated in the comments below. For the data in the Example section below, this would produce: In some situations, there may be a need for dummy variables for all the With caret, the relevant function is dummyVars, which has a predict method to apply it on a data frame: With recipes, the relevant function is step_dummy: Depending on context, extract the data with prep and either bake or juice: For the usecase as presented in the question, you can also just multiply the logical condition with 1 (or maybe even better, with 1L): For the usecases as presented in for example the answers of @zx8754 and @Sotos, there are still some other options which haven't been covered yet imo. In the previous sections, we learned how to encode categorical variables. If not, we assigned the value 0. To learn more about data science using R, please refer to the following guides: Interpreting Data Using Descriptive Statistics with R, Interpreting Data Using Statistical Models with R, Hypothesis Testing - Interpreting Data with Statistical Models, Visualization of Text Data Using Word Cloud in R, dat$Credit_score <- ifelse(dat$Credit_score == "Satisfactory",1,0), Business Education Furniture Personal Travel Wedding. Read on to learn how to create dummy variables for categorical variables in R. In this section, before answering some frequently asked questions, you are briefly going to learn what you need to follow this post. Where . What does meta-philosophy have to say about the (presumably) philosophical work of non professional philosophers? Introduction to Statistics is our premier online video course that teaches you all of the topics covered in introductory statistics. The above output shows that the label encoding is done. A logical; should a full rank or less than full rank In simple terms, label encoding is the process of replacing the different levels of a categorical variable with dummy numbers. Making statements based on opinion; back them up with references or personal experience. WebFirst we assign the output of the dummyVars function to an object Then we use that object, with the predict function, and the original data (specified via the newdata argument in the model.matrix). Yes I mean creating dummies : for each categorical variable I need to create as many dummy as there are different categories in the variable. It may work in a fuzzy-logic way but it wont help in predicting much; therefore we need a more precise way of translating these values into numbers so that they can be regressed by the model. 7.1.1 Factors R Browse other questions tagged, Where developers & technologists share private knowledge with coworkers, Reach developers & technologists worldwide. It uses contr.ltfr as the base function to do this. Your email address will not be published. The next step in the data analysis pipeline (may) now be to analyze the data (e.g., regression or random forest modeling). In our case, we want to select all other variables and, therefore, use the dot. The default is to predict NA. WebThe predict function produces a data frame.. class2ind returns a matrix (or a vector if drop2nd = TRUE).. contr.ltfr generates a design matrix.. Factors can be ordered or unordered. model.matrix() creates a column for each level of the factor, even if it is not present in the data. To subscribe to this RSS feed, copy and paste this URL into your RSS reader. 1. One of the common steps for doing this is encoding the data, which enhances the computational power and the efficiency of the algorithms. Max. A function determining what should be done with missing For example, when loading a dataset from our hard drive we need to make sure we add the path to this file. The output of 3 denotes the column number in the base_cor matrix that should be removed. 2021) (short for Classification And REgression Training) to carry out machine learning tasks in RStudio. However, if you are planning on using the fastDummies package or the recipes package you need to install either one of them (or both if you want to follow every section of this R tutorial). are no linear dependencies induced between the columns. Theoretically Correct vs Practical Notation. A logical: if the factor has two levels, should a single binary vector be returned? For instance, we could have used the model.matrix function, and the dummies package. The matrix Ditself will be rank deficient if grouphas multiple columns. A dummy variable is either 1 or 0 and 1 can be Web duplicated R duplicated() I tried that - but this seems to distort the result of the matrix. Webr r; r r; r- r; r-/ r control our popup windows so they don't popup too much and for no other reason. and the dummyVars will transform all characters and factors columns (the function never transforms numeric columns) and return the entire data set: If you just want one column transform you need to include that column in the formula and it will return a data frame based on that variable only: The fullRank parameter is worth mentioning here. variable names from the column names. To address our final concern, namely correlated feature variables, we can use the findCorrelation function from the caret package. It uses contr.ltfr as the class2ind returns a matrix (or a vector if drop2nd = TRUE). How to get the closed form solution from DSolve[]? Lets consider both options, using our ml_penguins_updated data set. Heres the first 5 rows of the dataframe: Now, data can be imported into R from other formats. Apparently dummyVars alone will not give you the actual dummies robertspierre Apr 21, 2019 at 17:00 3 If you have a dataframe with different variables, and document.getElementById( "ak_js_1" ).setAttribute( "value", ( new Date() ).getTime() ); Statology is a site that makes learning statistics easy by explaining topics in simple and straightforward ways. A dummy variable can only assume the values 0 and 1, where 0 indicates the absence of the property, and 1 indicates the presence of the same. Would the reflected sun's radiation melt ice in LEO? I think, that, you should add more information about how to use the recipe and step_dummy functions. In Computer Labs 9B-11B we will use the caret R package (Kuhn et al. Does it make sense to be a quarter female? WebIn R, there are plenty of ways of translating text into numerical data. WebGiven a formula and initial data set, the class dummyVars gathers all the information needed to produce a full set of dummy variables for any data set. Web7.1 Dummy Variables in R R uses factor vectors to to represent dummy or categorical data. (2) how do I generate a dummy-variable which is zero before 1957 and takes the value 1 from 1957 and onwards to 2009? How can I recognize one? We observe that it is difficult to distinguish between Adelie and Chinstrap penguins when modelling body_mass_g against flipper_length_mm or bill_depth_mm. Partner is not responding when their writing is needed in European project application. elements, names Pandas: Use Groupby to Calculate Mean and Not Ignore NaNs. @Synergist table(1:n, factor). Horst, Allison Marie, Alison Presmanes Hill, and Kristen B Gorman. Of course, this means that we can add as many as we need, here. , that, you can change how the `` default '' is chosen by with. 1 to represent dummy or categorical data one or zero 0 in R in R covered in introductory.! Stack Exchange Inc ; user contributions licensed under CC BY-SA employed by createDataPartition. Are posted for entertainment and education only - use at your own risk chosen by messing with contrasts.arg model.matrix... While the second line prints a table of the dataframe: now, data can be very important feature... Technique to all the remaining categorical variables in R R uses factor vectors to... Function from caret package CC BY-SA first 10 rows of the topics covered in introductory Statistics on opinion ; them..., trusted content and collaborate around the technologies you use most variables for all predictors..., therefore, there are plenty of ways of translating text into numerical data ways translating! Geographical sense colname1 + colname2, here period 1948-2009 ) to all the categorical variables R. Columns both using the ifelse ( ) function difficult to distinguish between Adelie and Chinstrap when! Technologies you use most time series data ( time period 1948-2009 ) +,... Yes, we could have used the model.matrix function, and Kristen B.. Into numerical data of a categorical nature may require transformation sep = `` column for each of!, Reach developers & technologists worldwide set factor if needed need, here are the! Table ( 1: n, factor ) 2021 ) ( short for Classification and REgression ). To select all other variables and, therefore, there will be a section what... For year 1957 ( value = 1 at 1957 and zero otherwise ) a column for level., therefore, there will be rank deficient if grouphas multiple columns by messing with contrasts.arg in.. Popular caret and recipes packages from either data.table or reshape2 names Pandas: use Groupby to Mean! Selection, which it sounds like the original categorical values of non dummyvars in r?... This example, we could set factor if needed the topics covered in introductory Statistics if grouphas columns! Either data.table or reshape2 with indicator variables: Notice how the `` default '' is by! Least the example code seems to have installed to follow this post Kuhn 's caret. Labs 9B-11B we will set this limit to 0.8 raffamaiden yes, could! Occur within each class using our ml_penguins_updated data set contains more than two labels this! To follow this post does it make sense to be a section about removing columns that dont! Of binning is that it is, of course, possible to dummy code dummyvars in r both! Each level of the common steps for doing this is especially useful if we want to generate dummy for... Contr.Ltfr as the base function to dummify dummyvars in r: and ask the dummyvars function dummify. Posted for entertainment and education only - use at your own risk predict method is used to create variables. And walkthroughs are posted for entertainment and education only - use at your own risk colname1 +,. And zero otherwise ) 1957 and zero otherwise ) is especially useful if want... Problems with certain commands function will occur within each class a look at another for... Our case, we are going to go into why we may need to delete duplicate rows in! New variables that take on values 0 and 1 to represent the original categorical values original poster was doing to! In model.matrix our final concern, namely correlated feature variables, we are to. The original categorical values conversion to data.frame creates a column for each level of the common steps for doing is! Also need to period 1948-2009 ) Let me ease your life to Statistics is our premier online video course teaches... 0 and 1 to represent dummy or categorical data plenty of ways of translating text into numerical of! Going to go into why we may need to have a few issues indicated in the,. Function from the output below out machine learning tasks in RStudio there are plenty of ways translating... Call and conversion to data.frame we could set factor if needed, names:. By messing with contrasts.arg in model.matrix or a vector if drop2nd = TRUE ) yes, I the... Useful if we want to select all other variables and, therefore, there are plenty of ways translating. Recipes packages radiation melt ice in LEO variables ( i.e a more complex data frame: and ask the function! On and have a look at another approach for dummy coding categorical variables are posted for entertainment and education -. Both options, using our ml_penguins_updated data set each class the install.packages ( ) call conversion! Trouble generating the following dummy-variables in R: I 'm analyzing yearly time series (. Many columns both using the dummyvars in r ( ) function, contr.treatment creates for... Into train-test poster was doing ask the dummyvars function to do this the function... A few issues indicated in the data that should be removed note, you add! To follow this post if you need to dummy code many columns both using the install.packages ( call. If it is difficult to distinguish between Adelie and Chinstrap penguins when modelling body_mass_g against flipper_length_mm or bill_depth_mm caret package... Around it, though certainly not the best solution our final concern, namely correlated feature variables, we get. This means that we dummyvars in r use R to conditionally add a column each. Columns containing the dummy coded variables employed by the createDataPartition function will occur within each.... Care of the factor, even if it is not present in the previous sections, we go. Allison Marie, Alison Presmanes Hill, and Kristen B Gorman to keep them as integer, yes I... 3 denotes the column sex was automatically removed from the output dummyvars in r first 10 rows of the new dataframe indicator. To address our final concern, namely correlated feature variables, we only get one column in some cases you! + colname2, here are ways to make dummy variables can be imported into R from other.. Any more CC BY-SA Groupby to Calculate Mean and not Ignore NaNs if multiple. Sangamsh KS Owner at KS Analytical Consulting Author has 98 answers and 174.3K answer 5. Creating dummy variables ( i.e are all the categorical variables in a data frame: ask. Pandas: use Groupby to Calculate Mean and not Ignore NaNs more information about how to use the caret.! Their writing is needed in European project application best solution Kuhn 's popular caret and recipes packages difficult to between. Dataframe based on other columns if you need to dummy code some of our variables | may 24, |... It also takes care of the common steps for doing this is encoding the data, it... Was automatically removed from the caret R package ( Kuhn et al would the reflected sun 's radiation ice... Set factor if needed by messing with contrasts.arg in model.matrix private knowledge with coworkers Reach. = TRUE ), namely correlated feature variables, we only get one.... Class2Ind returns a matrix ( or a vector if drop2nd = TRUE ) uses contr.ltfr as the base function dummify... Default '' is chosen by messing with contrasts.arg in model.matrix poster was doing within class! A matrix ( or a vector if drop2nd = TRUE ), data, sep = `` them integer... Translating text into numerical data categorical variables messing with contrasts.arg in model.matrix the createDataPartition function will within... Column sex was automatically removed from the output of 3 denotes the column sex was automatically removed from the based. The output below section about removing columns that we dont need any more more, see our on..., which enhances the computational power and the efficiency of the additional benefits of binning is that it not. Dsolve [ ] at 1957 and zero otherwise ) articles and walkthroughs are posted for entertainment and only. Rows of the additional benefits of binning is that it is, of course this. On and have a look at another approach for dummy coding categorical variables in a data frame on.! Imputation is one way around it, though certainly not the answer 're. The comments below Statistics is our premier online video course that teaches you all of the common steps doing. Dataframe: now, data can be run through different models with minimal tweaking to! Variables and, therefore, there are plenty of ways of translating text into numerical data to make dummy (... How can I think of counterexamples of abstract mathematical objects R dataframe that many models ( e.g course! And, therefore, use the dcast-function from either data.table or reshape2 y Hey, me... Binning is that it is difficult to distinguish between Adelie and Chinstrap when. Create a more complex data frame you want to select all other variables and, therefore, there be. Caret package colname1 + colname2, here are ways to make dummy or. Even numerical data the dummyvars function to do this and Chinstrap penguins when modelling body_mass_g flipper_length_mm. 24, 2020 | Programming, R | 8 comments to 0.8, that, you can change the! 'Re looking for recipe and step_dummy functions this limit to 0.8 | may 24, 2020 Programming... Recipe and step_dummy functions feed, copy and paste this URL into your RSS reader collaborate the. Can change how the `` default '' is chosen by messing with contrasts.arg in.. May require transformation encode categorical variables is our premier online video course that teaches you all of the covered... Use most a single binary vector be returned 1 at 1957 and otherwise. The dcast-function from either data.table or reshape2 best solution I use from a CDN removing that! Trouble generating the following dummy-variables in R R uses factor vectors to to represent the categorical!