Introduction to SAS macros

December 4, 2006

SAS macros can be very handy when you are doing certain things over and over again. We are going to learn macros by first learning the basic functions and then by working through two applications. The first application generates some basic univariate statistics that are in a nice format for exporting to Excel. In the next class, we will use the data to generate some tables in Excel. The second application will be to winsorize data. I use this macro (or a variation on it) for virtually all the projects I work on.

Macro Basics

1. Macro statements

Macro statements begin with %. Following are some common statements.

%let – The let statement assigns a value or characters to a macro variable. For example:

    %let count=5;  *** This assigns the value 5 to the macro variable count;

If I later want to reference the macro variable count, I reference it as &count. At this point &count is equal to 5.

    %let list=john bill fred;  ** This assigns john bill fred to the variable list;

Note that you do not need to put the names in quotes even though it is a string.
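A minimal sketch of how these assignments resolve, assuming an ordinary SAS session. It uses %put, which writes text to the log, only to display what each reference resolves to. The macro variable greeting is hypothetical and is introduced just for illustration.

```sas
/* Assign values to two macro variables; no quotes are needed. */
%let count=5;
%let list=john bill fred;

/* A macro variable is referenced by prefixing its name with an ampersand. */
%put count resolves to &count;
%put list resolves to &list;

/* Hypothetical example: a reference can be combined with other text. */
%let greeting=Hello &list;
%put &greeting;
```

Each %put line appears in the log with the reference replaced by the stored text.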

The SAS macro facility treats everything as a string.

%eval – The eval function allows you to evaluate or compute a mathematical operation. For example:

    %let count=%eval(&count+1);

If count was equal to 5 before the let statement was executed, it now is equal to 6 (5+1).

%qscan – QSCAN searches for a word in a list that is specified by its position in that list.

Syntax:

    %qscan(argument, n, delimiters)

Argument – the list you are searching across (it is usually a macro variable).
N – the number of positions to move over.
Delimiters – This describes how each word or expression will be delimited.

If you want to delimit the words by a space, you should enclose the space with %str. %str is like putting something in quotes. Your delimiter will normally look like %str( ). Note that there is one space within the parentheses.

Example: Suppose we want to obtain the second word in the macro variable &list (that we defined above to be "john bill fred") and then put that in the variable called firstname.

    %let firstname=%qscan(&list,2,%str( ));  *here the contents of the variable firstname is bill;

%put – the put statement prints the contents to the log window.

For example:

    %put &list;  ** this will print the contents of the variable list to the log window;

2. Macro names and execution

The functions discussed above can be executed anywhere in a SAS program. Often, though, we execute them inside a macro. Suppose we want to print the contents of the variable, list, one name at a time. We will use a %do loop (similar to the do loops we used in Perl).

%do – Do loops often come in handy. For example, suppose we want to print the contents of the variable &list, one name at a time.

    %let list=John Bill Fred;
    %macro printnames;  ** This marks the start of the macro printnames;
    %do i=1 %to 3;  * This is the start of a do loop with i as the counter;
    /* Following, we print "Name: " followed by a word from the macro variable list. It searches for the ith word (i is the counter), delimited by a space. */
    %put Name: %qscan(&list,&i,%str( ));
    %end;
    %mend;
    * This next statement executes the macro;
    %printnames;

Output from the macro printnames:

    Name: John
    Name: Bill
    Name: Fred

... selected a random sample in that exercise.

We created a table called event. Let's create a table that contains descriptive statistics for the variables cr_ret, car, and ue. In this case, we want to send a couple of parameters to the macro; these parameters will be variables used in the macro. This is useful if you want to execute the macro multiple times but change certain parameters.
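A minimal sketch of parameter passing on its own, separate from the univariate example. The macro name showparams and its parameters dsname and statname are hypothetical. Inside the macro, each parameter behaves like a macro variable and is referenced with an ampersand.

```sas
/* Hypothetical macro with two positional parameters. */
%macro showparams(dsname, statname);
  %put Output dataset: &dsname;
  %put Statistic: &statname;
%mend showparams;

/* The same macro is executed twice with different parameter values. */
%showparams(cmean, mean)
%showparams(cstd, std)
```

Each call writes two lines to the log, with the values supplied in that call substituted for &dsname and &statname.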

We are going to send it a variable called RESULTS and a variable called STAT. RESULTS is the name of the file we are going to generate and STAT is the statistic we want.

    %let list1=car cr_ret ue;  *include the list of variables;
    %let dataf=event;  *provide the dataset name for univariate analysis;
    %macro uni(results,stat);
    proc univariate noprint data=&dataf;  *execute the univariate procedure;
    var &list1;  *the list of variables;
    output out=&results &stat=&list1;  *the name of the output file, the statistic and variables;
    data &results;
    set &results;
    stat="&stat";  *create a variable that names the statistic that was computed;
    run;
    %MEND;
    %uni(cmedian,median)  *execute the macro computing the median and creating an output dataset cmedian;
    %uni(cmean,mean)  *execute the macro computing the mean and creating an output dataset cmean;
    %uni(cn,n)  *execute the macro computing N and creating an output dataset cN;
    %uni(cstd,std)  *execute the macro computing std and creating an output dataset cstd;
    data destats;
    retain stat;  *this retain statement puts the variable stat in the first column of the dataset;
    set cmedian cmean cn;

Following is a printout of the dataset called destats:

    Obs  stat      car      cr_ret   ue
    1    median    0.0155   0.0199   0.0018
    2    mean      0.0039   0.0063   0.4425
    3    N        70.0000  70.0000  77.0000

3.0 Macro Example 2 – The %winsorize macro

Oftentimes, at least in accounting research, we want to winsorize variables.

That is, for any observations containing values in the lowest or highest percentile of the distribution, we want to set the values to the maximum (minimum) of the second highest (lowest) percentile. I have constructed a macro, called winsorize, to do this.

Winsorizing data in SAS

    /**** Prepared by Andy Leone
    In this example, I have a dataset, called compdat2, that contains total assets (data6), sales (data12) and income (data18) for a random sample of firms from Compustat. I want to winsorize each of these variables. The winsorized values will have the original variable names preceded by a w. For example, wdata6 will contain the winsorized value of data6. The first thing we do is assign certain values to macro variables. ****/
    /**** list of variables you want to winsorize ****/
    %let winsvars=data6 data12 data18;
    /**** name of the input dataset ****/
    %let indat=compdat2;
    /**** name of the output dataset ****/
    %let outset=wcompdat2;
    %macro winsorize;
    /* Here we are going to initialize a few macro variable names. */
    /* rvars is the macro variable that will contain the list of variable names for the ranks of each of the variables we are winsorizing. */
    %let rvars=;
    /* lowvars is the macro variable that will contain the list of variable names that will contain the value that we will assign if the observation has a rank of 0 (i.e., it falls in the lowest percentile). */
    %let lowvars=;
    /* highvars is the macro variable that will contain the list of variable names that will contain the value that we will assign if the observation has a rank of 99 (i.e., it falls in the highest percentile). */
    %let highvars=;
    /* winvars is the macro variable that will contain the list of variable names that will contain the final winsorized value. */
    %let winvars=;
    /* Next we work through the list of variables to winsorize. The trick is that the number of variables that we want to winsorize can vary, so we need to check and see how many variables we are winsorizing. To do this, we execute a macro called words. This macro is located below the winsorize macro, and simply counts the number of words in our list of variables (winsvars). The number of variables is then assigned to the variable called varcount. */
    %let varcount=%words(&winsvars);
    /* The do loop is executed here. */
    %do i=1 %to &varcount;
    /* Here we are creating the list of variable names for the macro variables we initialized above. In our example, the macro variable &winsvars contains data6 data12 data18. We want the variable rvars to contain rdata6 rdata12 rdata18, because these are the names we are going to assign to the ranks we obtain from proc rank. The let statement assigns rvars its current contents plus an r with the next word in the winsvars list concatenated on. The first time through, it will be rdata6; the second time through, it will be rdata6 rdata12. */
    %let rvars=&rvars r%qscan(&winsvars,&i,%str( ));
    /* We do the same thing to construct our list for the macro variables lowvars, highvars, and winvars. */
    %let lowvars=&lowvars lw%qscan(&winsvars,&i,%str( ));
    %let highvars=&highvars h%QSCAN(&winsvars,&i,%STR( ));
    %let winvars=&winvars w%qscan(&winsvars,&i,%str( ));
    %end;
    /* After we run through the loop, we want to trim off any extra spaces from the beginning or end of the variables we created. */
    %let rvars=%trim(&rvars);
    %let lowvars=%trim(&lowvars);
    %let highvars=%trim(&highvars);
    %let winvars=%trim(&winvars);
    /* Now that we have our names constructed, we can go through the process. Here, we execute the rank procedure, which assigns the rank to each variable in the dataset. I use the groups=100 option to tell the procedure to create percentiles. If I wanted deciles, I would use groups=10. The dataset is whatever dataset we assigned with the macro variable indat. */

    /* The ranks are going to be assigned to the list of variable names we created above, rvars. The variables that we are going to rank are the variables in the list winsvars. */
    proc rank data=&indat out=rgeg4 groups=100;
    ranks &rvars;
    var &winsvars;
    /* So the procedure that will be executed actually looks like this:
       proc rank data=compdat2 out=rgeg4 groups=100;
       ranks rdata6 rdata12 rdata18;
       var data6 data12 data18; */
    /* Next, we assign the observations with a rank of 0 the minimum value from the second percentile (those with a rank of 1). For example, suppose the smallest value of data6 for observations in the second percentile of data6 (rank=1) is 0.8. Then we are going to assign all observations with a rank of 0 a value of 0.8 for wdata6 (winsorized data6). */
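As an aside from the listing, the rank-and-replace idea can be sketched for a single variable without any macro loop. This is a standalone illustration under stated assumptions, not part of the winsorize macro: the datasets toy, ranked and wtoy, the variables x, rx and wx, and the macro variables lowval and highval are all hypothetical. It also swaps in a CASE expression inside PROC SQL, where the macro itself builds helper variables in a data step.

```sas
/* Hypothetical toy data: 200 observations of a variable x. */
data toy;
  do id = 1 to 200;
    x = id;
    output;
  end;
run;

/* groups=100 assigns percentile ranks 0 through 99 to rx. */
proc rank data=toy out=ranked groups=100;
  var x;
  ranks rx;
run;

/* Smallest x in rank group 1 and largest x in rank group 98.
   With no GROUP BY, each aggregate is taken over the whole dataset. */
proc sql noprint;
  select min(case when rx = 1 then x else . end),
         max(case when rx = 98 then x else . end)
    into :lowval, :highval
    from ranked;
quit;

/* Replace the extreme groups (ranks 0 and 99) with those boundary values. */
data wtoy;
  set ranked;
  wx = x;
  if rx = 0 then wx = &lowval;
  else if rx = 99 then wx = &highval;
run;
```

INTO stores the two numbers as text in macro variables, which is convenient for a sketch but may round values that have many significant digits.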

    /* In this data step, we first assign the variables in the lists &lowvars (&highvars) values equal to the corresponding variable, but only if the rank is 1 (98). For example, the variable lwdata6 will be set equal to data6, but only if the observation has a rank of 1 (rdata6=1). lwdata6 will have a missing value if rdata6 is not equal to 1. Similarly, the variable hdata6 will be set equal to data6, but only if the observation has a rank of 98 (rdata6=98). This will be done for each of the variables we are winsorizing. */
    data windata;
    set rgeg4;
    /* Here I am using arrays (like we did in Perl). For the first array, I am creating an array called var, and this array will contain the list of variables we are winsorizing. */

    /* I am also explicitly telling SAS how many elements are in the array. Remember, varcount contains the number of variables we are winsorizing. */
    array var(&varcount) &winsvars;
    array rankv(&varcount) &rvars;
    array lowgg(&varcount) &lowvars;
    array highgg(&varcount) &highvars;
    /* Notice how I am executing a do loop here but I don't have % in front of it, because it is within a data step. When you do this, you don't use %. I am doing it this way because I want this do loop to be executed for each observation in the dataset. */
    do i=1 to &varcount;
    if rankv(i)=1 then lowgg(i)=var(i);
    if rankv(i)=98 then highgg(i)=var(i);
    end;
    /* The next thing we are going to do is create a sql statement to obtain the maximum and minimum values for the groupings 98 and 1. This is what we will set groups 99 and 0 to. */
    /* The following just creates the select statements. */
    /* For each variable that I winsorize, I need a statement to obtain the min and max. For example, I have a variable called lwdata6, which contains values of data6 for all observations with a rank of 1 (rdata6=1). I want the sql statement to get the minimum value of lwdata6 because that will be the minimum value for group 1. I will create an sql statement that selects Min(lwdata6) as lwdata6. Since I don't have a group by statement, it will obtain the value for the entire dataset (remember lwdata6 has a missing value for all observations except those in group 1). */
    *In our example, minset will end up looking like this: Min(lwdata6) as lwdata6, Min(lwdata12) as lwdata12,