R basics, workspace and working directory, RStudio projects
========================================================
### Basics of working with R in the Console and RStudio goodies
Launch RStudio (and, therefore, R).
Notice the default panes:
* Console (entire left)
* Workspace/History (tabbed in upper right)
* Files/Plots/Packages/Help (tabbed in lower right)
FYI: you can change the default location of the panes:
<http://www.rstudio.com/ide/docs/using/customizing#pane-layout>
Go into the Console, where we interact with the live R process.
Make an assignment and then inspect the object you just created.
```{r}
x <- 3 * 4
x
```
All R statements where you create objects -- "assignments" -- have this
form:
```{r, eval = FALSE}
objectName <- value
```
and in my head I hear, e.g., "x gets 12".
You will make lots of assignments and the operator `<-` is a pain to
type. Don't be lazy and use `=`, although it would work, because it
will just sow confusion later. Instead, utilize RStudio's keyboard
shortcut:
* On Windows and Linux, press Alt and the minus sign: Alt + -
* On Mac OS, press Option (also labelled alt on JB's keyboard!) and
the minus sign: Option + -
Notice that RStudio automagically surrounds `<-` with spaces, which
demonstrates a useful code formatting practice. Code is miserable to
read on a good day. Give your eyes a break and use spaces.
RStudio offers many handy [keyboard shortcuts](http://www.rstudio.com/ide/docs/using/keyboard_shortcuts).
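Here is a small sketch of the kind of confusion `=` can sow. Inside a function call, `=` names an argument, whereas `<-` makes an assignment in your workspace and then passes the value along. The choice of `median()` and the helper name `xAfterEquals` are mine, purely for illustration.

```{r}
x <- 12
median(x = c(1, 5, 9))   # `=` matches the argument named x; the workspace x is untouched
xAfterEquals <- x        # still 12
median(x <- c(1, 5, 9))  # `<-` overwrites the workspace x, then passes it to median()
x                        # now the vector 1, 5, 9
```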
Object names cannot start with a digit and cannot contain certain
other characters such as a comma or a space. You will be wise to adopt
a convention for demarcating words in names. (Someone even wrote [an article about this](http://journal.r-project.org/archive/2012-2/RJournal_2012-2_Baaaath.pdf) in the R Journal!)
```
iUseCamelCase
other.people.use.periods
even_others_use_underscores
```
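If you are unsure whether a name is legal, one rough check is to compare it with what `make.names()` would turn it into. This is only a sketch of one possible approach; the candidate names and the variable `isSyntactic` are made up for the example.

```{r}
candidates <- c("iUseCamelCase", "2fast", "has space", "with,comma", "snake_case")
# make.names() repairs illegal names; a name that comes back unchanged was already legal
isSyntactic <- make.names(candidates) == candidates
data.frame(candidates, isSyntactic)
```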
Make another assignment
```{r}
thisIsAReallyLongName <- 2.5
```
To inspect this newly created object, try out RStudio's completion facility: type the first few characters, press TAB, add characters until you disambiguate, then press return. Tab completion, which is offered here by the RStudio IDE, is extremely helpful. Exploit this whenever possible for maximum efficiency and minimum aggravation.
Make another assignment
```{r}
jennyRocks <- 2^3
```
Let's try to inspect:
```{r, error = TRUE}
jennyrocks
jenyRocks
```
Figure out for yourself why the above does not work.
Implicit contract with the computer / scripting language: Computer
will do tedious computation for you. In return, you will be completely
precise in your instructions. Typos matter. Case matters. __Get better
at typing.__
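To see how literal R is about names, here is a minimal sketch that asks for a slightly misspelled object and captures the complaint instead of stopping. The variable `complaint` is a name I made up, and the exact wording of the message may vary with your R version and language settings.

```{r}
jennyRocks <- 2^3
# Ask for the wrong spelling and keep the error message as a character string
complaint <- tryCatch(jennyrocks, error = function(e) conditionMessage(e))
complaint
jennyRocks  # the correctly spelled name works fine
```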
R has a mind-blowing collection of built-in functions that are
accessed like so
```{r, eval = FALSE, tidy = FALSE}
functionName(arg1 = val1, arg2 = val2, and so on)
```
Let's try using `seq()` which helps make regular sequences of numbers
and, while we're at it, demo more helpful features of RStudio.
Type `se` and hit TAB. A pop up shows you possible
completions. Specify `seq()` by typing more to disambiguate or using
the up/down arrows to select. Notice the floating tool-tip-type help
that pops up, reminding you of a function's arguments. If you want
even more help, press F1 as directed to get the full documentation in
the help tab of the lower right pane. Now open the parentheses and
notice the automatic addition of the closing parenthesis and the
placement of cursor in the middle. Type the arguments `1,10` and hit
return. RStudio also exits the parenthetical expression for you. IDEs
are great.
```{r}
seq(1, 10)
```
The above also brings up another topic: how R resolves function
arguments. You can always specify in `name = value` form. But if you
do not, _R attempts to resolve by position_. So above, it is assumed
that we want a sequence `from = 1` that goes `to = 10`. Since we didn't
specify step size, the default value of `by` in the function
definition is used, which ends up being 1 in this case. For functions
I call often, I might use this "resolve by position" functionality for the first
argument or maybe the first two. After that, I always use `name =
value`.
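A quick sketch to convince yourself that matching by position and matching by name give the same answer, and that named arguments can come in any order. The object names (`byPosition`, `byName`, `shuffled`, `mixed`) are just for this illustration.

```{r}
byPosition <- seq(1, 10)            # resolved by position: from = 1, to = 10
byName <- seq(from = 1, to = 10)    # same call, fully named
shuffled <- seq(to = 10, from = 1)  # named arguments may appear in any order
identical(byPosition, byName)
identical(byName, shuffled)
mixed <- seq(1, 10, by = 3)         # first two by position, the rest by name
mixed
```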
Make this assignment and notice that RStudio helps with quotation marks, just like it did with parentheses.
```{r}
yo <- "hello world"
```
If you just make an assignment, you don't get to see the value, so
then you're tempted to immediately inspect.
```{r}
y <- seq(1, 10)
y
```
This common action can be shortened by surrounding the assignment with
parentheses, which causes assignment and "print to screen" to happen.
```{r}
(y <- seq(1, 10))
```
Not all functions have (or require) arguments:
```{r}
date()
```
Now look at your workspace -- in the upper right pane. __The workspace is
where user-defined objects accumulate.__ You can also get a listing of
these objects with commands:
```{r, eval = FALSE}
objects()
ls()
```
If you want to remove something you can do this
```{r}
rm(y)
```
To remove everything:
```{r}
rm(list = ls())
```
or click the broom in the workspace pane.
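If wiping everything feels too drastic, `ls()` accepts a `pattern` argument, so you can list or remove just a subset of objects. A small sketch, assuming you name related objects with a common prefix (the `wsDemo` prefix and the helper names `before` and `remaining` are invented here):

```{r}
wsDemoA <- 1
wsDemoB <- 2
before <- ls(pattern = "^wsDemo")     # only objects whose names start with wsDemo
before
rm(list = ls(pattern = "^wsDemo"))    # remove just those objects
remaining <- ls(pattern = "^wsDemo")  # should now be empty
remaining
```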
### Workspace and working directory
One day you will need to quit R, go do something else and return to
your analysis later, perhaps months or years later.
One day you will have multiple analyses going that use R and you want
to keep them separate.
One day you will need to hand an analysis over to someone else to critique, extend, or reuse.
One day you will need to bring data from the outside world into R and
send numerical results and figures from R back out into the world.
To handle these real life situations, you need to make two decisions:
* What about your analysis is "real", i.e. you will save it as your
lasting record of what happened?
* Where does your analysis "live"?
#### Workspace, .RData
As a beginning R user, it's OK to consider your workspace
"real". _Very soon_, I urge you to evolve to the next level, where you
consider your saved R scripts as "real". (In either case, of course
the input data is very much real and requires preservation!) With the
input data and the R code you used, you can reproduce
_everything_. You can make your analysis fancier. You can get to the
bottom of puzzling results and discover and fix bugs in your code. You
can reuse the code to conduct similar analyses in new projects. You
can remake a figure with a different aspect ratio or save it as TIFF
instead of PDF. Etc etc.
First, let's imagine that you regard your workspace as "real". You save it and reload it over and over again (consciously or unconsciously). It's probably heartbreaking when R or your whole machine crashes and you need to start over. You're going to either redo a lot of typing (making mistakes all the way) or will have to mine your R history for the commands you used. Rather than
[becoming an expert on managing the R history](http://www.rstudio.com/ide/docs/using/history), a better use of your time and psychic energy is to keep your "good" R code in a script for future reuse.
But, because it can be useful sometimes, go ahead and note that the commands you've recently executed appear in the History tab of the upper right pane.
You don't have to choose right now and the two strategies are not
incompatible. First, let's demo the save / reload the workspace approach.
Upon quitting R, you have to decide if you want to save your
workspace, for potential restoration the next time you launch R. Depending on
your set up, R or your IDE, e.g. RStudio, will probably prompt you to
make this decision.
Before proceeding, make sure your workspace contains a few objects. If you cleaned out your workspace above, you could find some assignments in your command history and use the "To Console" button or copy/paste to resubmit.
Quit R/RStudio, either from the menu, using a keyboard shortcut, or by
typing `q()` in the Console. You'll get a prompt like this:
> Save workspace image to ~/.RData?
_Note where the workspace image is to be saved_ and then click `Save`. This will probably happen in your _home directory_, but the exact details will be machine- and OS-dependent.
Using your favorite method, visit the directory where the image was saved and verify there is a file named `.RData` with a very recent modification timestamp. It's a binary file, specific to R, so nothing good will come of trying to open and view this file in, e.g., a text editor. You will also see a file `.Rhistory`, holding the commands submitted in your recent session. This is plain text, so feel free to open and view it.
Restart RStudio. In the Console you will see a line like this:
```
[Workspace loaded from ~/.RData]
```
indicating that your workspace has been restored. Look in the Workspace pane and you'll see the same objects as before. In the History tab of the same pane, you should also see your command history. You're back in business. This way of starting and stopping analytical work will not serve you well for long but it's a start.
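You can do the same save and restore by hand, which may make the `.RData` file feel less magical. Below is a minimal sketch using `save()` and `load()` on a file in a temporary directory, so your real `~/.RData` is left alone. The object `keepMe` and the file name `demo.RData` are hypothetical.

```{r}
keepMe <- 42
imgFile <- file.path(tempdir(), "demo.RData")
save(keepMe, file = imgFile)  # write the object to a binary R data file
rm(keepMe)                    # the object is now gone from the workspace
load(imgFile)                 # ... and restored from the file
keepMe
```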
#### Working directory
Any process running on your computer has a notion of its "working directory". In R, this is where R will look, by default, for files you ask it to load. It is also where, by default, any files you write to disk will go. Chances are your current working directory is the directory we inspected above, i.e. the one where RStudio wanted to save the workspace, which is probably also your home directory.
You can explicitly check your working directory with:
```{r, eval = FALSE}
getwd()
```
It is also displayed at the top of the RStudio console.
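To make "where R will look, by default" concrete: a bare file name is interpreted relative to the working directory. A small sketch, where `relPath` and `fullPath` are names I chose and the file does not need to exist:

```{r}
relPath <- "avgX.txt"
fullPath <- file.path(getwd(), relPath)  # what the bare file name refers to
fullPath
# Both spellings point at the same location, so they agree on whether the file exists
file.exists(relPath) == file.exists(fullPath)
```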
As a beginning R user, it's OK to let your home directory or any other weird directory on your computer be R's working directory. _Very soon_, I urge you to evolve to the next level, where you organize your analytical projects into directories and, when working on a project, set R's working directory to the associated directory.
__Although I do not recommend it__, in case you're curious, you can set R's working directory at the command line like so:
```{r, eval = FALSE}
setwd("~/myCoolProject")
```
__Although I do not recommend it__, you can also use RStudio's Files pane to navigate to a directory and then set it as working directory from the menu: Session --> Set Working Directory --> To Files Pane Location. (You'll see even more options there). Or within the Files pane, choose __More__ and __Set As
Working Directory__.
But there's a better way. A way that also puts you on the path to managing your R work like an expert.
### RStudio projects
Keeping all the files associated with a project organized together -- input data, R scripts, analytical results, figures -- is such a wise and common practice that RStudio has built-in support for this via its _projects_.
<http://www.rstudio.com/ide/docs/using/projects>
Let's make one to use for the rest of this tutorial. Do this: Projects menu --> Create project.... New Project. The directory name you choose here will be the project name. Call it whatever you want (but bear in mind that good names are short and informative).
<!--- I created a directory and, therefore RStudio project, called `swc` in my `tmp` directory, FYI. -->
<!--- https://github.com/yihui/knitr/issues/277 -->
<!--- chunk below does not execute because I don't even have such a directory right now -->
```{r, echo=FALSE, eval=FALSE}
setwd("~/tmp/swc")
getwd()
```
Now verify that the directory associated with your project is also the working directory of our current R process:
```{r, eval=FALSE}
getwd()
```
_I won't print my output here because this document itself does not reside in the RStudio Project we just created and it will be confusing._
Let's enter a few commands in the Console, as if we are just beginning an analytical project. I'm going to set the intercept $a$ and slope $b$ of a line, generate some $x$ values uniformly on the interval $[0, 1]$, and finally generate $y$ values as $a + bx$ plus some noise from a Gaussian distribution.
To emulate a real analysis, let's write a numerical result to file for later use -- the average of the $x$'s -- and let's save a scatterplot to PDF -- a scatterplot of $y$ versus $x$ with the true data-generating line superimposed.
```{r}
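a <- 2                                        # intercept
b <- -3                                       # slope
sigSq <- 0.5                                  # noise variance
x <- runif(40)                                # 40 x values, uniform on [0, 1]
y <- a + b * x + rnorm(40, sd = sqrt(sigSq))  # line plus Gaussian noise
(avgX <- mean(x))                             # compute and print the average of x
write(avgX, "avgX.txt")                       # save the numerical result
plot(x, y)
abline(a, b, col = "purple")                  # superimpose the true line
dev.print(pdf, "toylinePlot.pdf")             # copy the current plot to PDF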
```
Let's say this is a good start of an analysis and you're ready to preserve the logic and code. Visit the History tab of the upper right pane. Select these commands, skipping any that didn't work or contained typos. Click "To Source". Now you have a
new pane containing a nascent R script. Click on the floppy disk to save. Give it a name ending in `.R` (I used `toyline.R`) and note that, by default, it will go in the directory associated with your project.
Quit RStudio. Inspect the folder associated with your project if you wish. Understand why certain files are or are not there. View the PDF in an external viewer, view the plain text files (the script and the average of the $x$'s) any way you wish.
Restart RStudio. Notice that things, by default, restore to where we were earlier, e.g. objects in the workspace, the command history, which files are open for editing, where we are in the file system browser, the working directory for the R process, etc. __These are all Good Things.__
Change some things about your code. Top priority would be to set a sample size `n` at the top, e.g. `n <- 40`, and then replace all the hard-wired 40's with `n`. Change some other minor-but-detectable stuff, e.g. alter the sample size `n`, the slope of the line `b`, the color of the line ... whatever. Clean out your workspace and then practice the different ways to re-run the code:
* Walk through line by line by keyboard shortcut (command + enter) or mouse (click Run in the upper right corner of editor pane).
* Source the entire document -- equivalent to entering `source('toyline.R')` in the Console -- by keyboard shortcut (shift command S) or mouse (click Source in the upper right corner of editor pane or select from the mini-menu accessible from the associated down triangle).
* Source with echo from the Source mini-menu.
Visit your figure in an external viewer to verify that the PDF is changing as you expect.
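For reference, here is one way the revised script might look after replacing the hard-wired 40's with `n`. Treat it as a sketch, not the one true `toyline.R`. One deliberate difference from the Console session: it opens the PDF device with `pdf()` and closes it with `dev.off()` instead of using `dev.print()`, on the assumption that you may also want to run the script when no plot window is open.

```{r}
n <- 40                      # sample size, set once at the top
a <- 2                       # intercept
b <- -3                      # slope
sigSq <- 0.5                 # noise variance
x <- runif(n)
y <- a + b * x + rnorm(n, sd = sqrt(sigSq))
avgX <- mean(x)
write(avgX, "avgX.txt")
pdf("toylinePlot.pdf")       # open a PDF device directly
plot(x, y)
abline(a, b, col = "purple")
invisible(dev.off())         # close the device so the file is written
```

Now changing the sample size is a one-line edit, and the figure and the saved average stay in sync with it.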
In your favorite OS-specific way, search your files for "toylinePlot.pdf" and presumably you will find the PDF itself (no surprise) but _also the script that created it (`toyline.R`)_. This latter phenomenon is a huge win. One day you will want to remake a figure or just simply understand where it came from. If you rigorously save figures to file __with R code and not ever ever ever the mouse or the clipboard__, you will sing my praises one day. Trust me.
### stuff
It is traditional to save R scripts with a `.R` or `.r` suffix. Follow this convention unless you have some extraordinary reason not to.
Comments start with one or more `#` symbols. Use them. RStudio helps you (de)comment selected lines with Ctrl+Shift+C (windows and linux) or Command+Shift+C (mac). Also available from the Code menu.
Clean out the workspace, i.e. pretend like you've just revisited this project after a long absence, using the broom icon or `rm(list = ls())`. It's a good idea to do this, restart R (available from the Session menu), and re-run your analysis to truly check that the code you're saving is complete and correct (or at least rule out obvious problems!).
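A lighter-weight companion to that check is to source a script into a fresh environment and look at what it defines. This is only a sketch: the tiny script written to a temporary file, and the names `scriptFile` and `cleanEnv`, are invented for the example. Note that code sourced this way can still see objects in your workspace, so restarting R remains the stronger test.

```{r}
scriptFile <- file.path(tempdir(), "toyDemo.R")
writeLines(c("n <- 5", "x <- seq_len(n)", "avgX <- mean(x)"), scriptFile)
cleanEnv <- new.env()
source(scriptFile, local = cleanEnv)  # objects the script creates land in cleanEnv
ls(cleanEnv)                          # exactly what the script defines
get("avgX", envir = cleanEnv)
```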
This workflow will serve you well in the future:
* Create an RStudio project for an analytical project
* Keep inputs there (we'll soon talk about importing)
* Keep scripts there; edit them, run them in bits or as a whole from
there
* Keep outputs there (like the PDF written above)
Avoid using the mouse for pieces of your analytical workflow, such as loading a dataset or saving a figure. This is terribly important for reproducibility and for making it possible to determine retrospectively how a numerical table or PDF was actually produced (searching the local disk for the filename, among .R files, will lead to the relevant script).
Many long-time users never save the workspace, never save .RData files (I'm one of them), never save or consult the history. Once/if you get to that point, there are options available in RStudio to disable the loading of .RData and permanently suppress the prompt on exit to save the workspace (go to Tools->Options->General).
For the record, when loading data into R and/or writing outputs to file, you can always specify the absolute path and thereby insulate yourself from the current working directory. This is rarely useful when using RStudio. My older workflow, based on Emacs + ESS, did use this approach, but with personal helper functions to ease the pain.
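In case the absolute path approach is unfamiliar, here is a minimal sketch. It writes to and reads from a file identified by its full path, so the result does not depend on `getwd()`. The temporary directory and the file name `absDemo.txt` are stand-ins for wherever your real files live.

```{r}
outFile <- file.path(normalizePath(tempdir()), "absDemo.txt")  # a full path
write(0.5, outFile)                  # lands at outFile regardless of getwd()
readBack <- scan(outFile, quiet = TRUE)
readBack
```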
Links that may be relevant -- but may not be!
[Working in the console (RStudio)](http://www.rstudio.com/ide/docs/using/console)
[RStudio keyboard shortcuts](http://www.rstudio.com/ide/docs/using/keyboard_shortcuts)
[Big list of RStudio documentation](http://www.rstudio.com/ide/docs/)
<!--- If I use RStudio's Knit HTML button to preview the HTML, it clutters up my directory with various files. Tidy up. Although everything suggests (for example, look at RStudios' doc page about Customizing Markdown Rendering) that the base64_images images option should be set, I believe it is somehow not true when using Knit HTML. In which case we can't safely delete the figure directory. Get to the bottom of this later. For now, explicit deletion of the HTML product and the figure directory is necessary. -->
```{r, echo = FALSE}
unlink(c("avgX.txt", "toylinePlot.pdf"))
```
<div class="footer">
This work is licensed under the <a href="http://creativecommons.org/licenses/by-nc/3.0/">CC BY-NC 3.0 Creative Commons License</a>.
</div>