---
title: Resources for lab developers
output:
  html_document:
    toc: true
    toc_float: true
---
As the direct author of the code, you will have the most intimate knowledge of the code and project details. This means that your ability to communicate with others in your lab is critical to the smoothness and success of the project.
## General tips:
- Learning data science can involve a steep learning curve. **Be patient with yourself!** Anyone who has ever written code has wanted to pull their hair out at one time or another after getting stuck on a tricky issue.
- For this reason, we recommend trying to **get plugged into coding groups**. They can help you get unstuck, offer general support, and share tips for improving your code.
- **Seek out resources** that can help give you the basics of coding. Take a look at our [pretty long list](./more_resources.html). Think about how you best learn. Do you like [reading through material on your own](./more_resources.html#Free-online-resources), or do you prefer a more formal class setting like what a [workshop](./more_resources.html#training-networks) might provide? If you want a mix of both structure and work-at-your-own-pace, [online courses](./more_resources.html#learn-on-your-own-courses) may be the way to start.
- **Data science is best performed as a team effort**. There's no such thing as the "lone genius" in data science. Don't be afraid to ask others for help! Sometimes just explaining the problem will help you see a solution!
- Knowing when to ask for help and when to keep plugging away at a problem on your own is a skill. With the help and guidance of your lab manager and/or lab leader, you can work on striking the right balance. If you are stuck on a problem and making no progress, staying frustrated is an inefficient use of everyone's time and energy, so **definitely reach out for help**! Others want you and your project to succeed and are generally happy to help you!
- If you are in a small lab or a lab without much programming experience, you may need to get creative in finding help. At the data science lab, one of our goals will be to help you get connected with others to collaborate with. You can also take a look at [online and in-person coding groups](./more_resources.html#groups-for-discussing-code).
## Tips on debugging
So much of data science is formatting and troubleshooting, and the tips below can help with both. Bugs don't always come in the form of actual error messages either. The worst bugs are often the silent ones that have messed up your results without you knowing.
We are borrowing from the [debugging guide from the Childhood Cancer Data Lab here](https://github.com/AlexsLemonade/training-modules/blob/master/intro-to-R-tidyverse/00b-debugging_resources.md):
### 1) Carefully read any and all error messages
This may seem like a silly thing to include as a tip, but it's very easy to gloss over an error message without actually reading it. Often, R may be telling you exactly what is wrong, but if you don't take the time to understand what the error message means, you will have trouble fixing the error. Error messages often refer to R terms (e.g., "argument", "directory"), so if you need a refresher on what some terms mean, try going through one of the [intro to R tutorials we recommend](./more_resources.html#r_and_tidyverse).
Secondly, realize that not receiving an error message doesn't mean that your code did what you intended it to. You will also need to carefully review your code (and your results) to try to find "silent" bugs (situations where R did exactly what you asked, but you didn't get what you intended).
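
As an illustration of a silent bug, here is a minimal sketch in base R. The sample IDs are made up for this example. Comparing with `==` against a vector of a different length recycles the shorter vector, so R returns a result without any error or warning, just not the one that was intended.

```r
# Made-up sample IDs, for illustration only
sample_ids <- c("s1", "s2", "s3", "s4")
wanted <- c("s2", "s1")

# Silent bug: `==` recycles `wanted` and compares position by position
# ("s1" vs "s2", "s2" vs "s1", "s3" vs "s2", "s4" vs "s1"), so nothing matches
silent_result <- sample_ids[sample_ids == wanted]
print(silent_result)

# What was intended: keep every ID that appears anywhere in `wanted`
intended_result <- sample_ids[sample_ids %in% wanted]
print(intended_result)
```

Printing and inspecting intermediate results like this is often the only way to notice that something went wrong.
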
### 2) Identify which line and phrase of code is the source of the error.
If you ran many lines of code, you may not know which part of your code is the origin of the error message. Isolating the source of the error and trying to better understand your problem should be your first course of action. The best way to do this is to run each line, and each phrase within a line, by itself, one at a time.
Chunk out your code and test the individual bits. Do you have a lot of lines of code, a lot of arguments, or multiple functions at once? Try each piece by itself to narrow down which piece appears to be the origin of the problem.
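
One way this can look in practice is sketched below, using made-up measurements. A nested one-liner returns `NA`, and running each phrase separately shows which piece is responsible.

```r
# Made-up measurements, stored as text, for illustration only
measurements <- c("3.5", "4.1", "n/a", "2.8")

# Everything in one line: the result is NA and it is unclear which step is responsible
all_at_once <- round(mean(as.numeric(measurements)), 1)

# Step 1: run the innermost phrase alone and look at what it returns
as_numbers <- as.numeric(measurements)  # warns that "n/a" could not be converted
print(as_numbers)

# Step 2: run the next phrase on that intermediate result
average <- mean(as_numbers)  # NA, because mean() does not skip missing values by default
print(average)

# Step 3: with the source found, adjust only that piece
average_fixed <- round(mean(as_numbers, na.rm = TRUE), 1)
print(average_fixed)
```
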
### 3) Be sure that the code you think you have run has all successfully run and in order.
It could be that the problem with your code isn't that it doesn't work as written, but that you didn't run it or didn't run it in the correct order. This should be one of the first things you check, along with confirming that the objects you believe should be in your environment are actually there.
It's also good practice to periodically quit your current R session and start a new one, in addition to clearing your R notebook output. If you are encountering problems and haven't refreshed your R session, you may want to do that before further troubleshooting.
In the course of troubleshooting, you will want to re-run all of your code, perhaps many, many times, in order to get to the bottom of the problem.
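
A quick way to confirm that an object is really in your environment is sketched below. The object name `expression_df` and its contents are hypothetical and only serve as an example.

```r
# `expression_df` is a hypothetical object name used only for illustration

# Before the chunk that creates it has run, this check reports that it is missing
print(exists("expression_df"))

# Running the chunk that defines the object
expression_df <- data.frame(gene = c("A", "B"), count = c(10, 20))

# Now the check passes, and ls() lists everything in the current environment
print(exists("expression_df"))
print("expression_df" %in% ls())
```
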
### 4) Google your error message
The main advantage of Googling your errors is that you are likely not the first person to encounter the problem. Certain phrases and terms in the error message will yield more fruitful search results than others.
When you do Google, a few common sources that will probably come up that we recommend looking at are:
#### a) [StackOverflow](https://stackoverflow.com/)
StackOverflow is a forum where people post questions and problems they encounter in their code.
#### b) [GitHub Issues](https://docs.github.com/en/issues/tracking-your-work-with-issues/about-issues)
People also will post their problems to GitHub issues. Often these are more geared toward fixing problems with the package or software itself, but this is a way to potentially get direct help on an issue from the authors of the package you are using.
#### c) [R-bloggers](https://www.r-bloggers.com/)
R-bloggers has examples of R code that you can use to figure out how to construct various analyses. This is a good resource for example code, although its format isn't built for asking exact questions like StackOverflow.
### 5) Look at the documentation for a function to make sure you are using it correctly
Once you've better determined the origin of the problem, you should use whatever documentation is available to you regarding the problematic code. When using a new function from a package you are unfamiliar with, it's worthwhile to skim through the documentation so you know how to properly use the functions. For base R functions, Tidyverse functions, and some Bioconductor packages, the documentation will give you a lot of the information you need. However, you will also likely find that not all documentation is thorough or clear.
As we discussed in the intro_to_R notebook, objects have structures and types. Input that doesn't match a function's requirements is a common source of errors. Pay special attention to what the documentation says about the kind of input the function expects and the kind of output it returns.
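
To make the point about input types concrete, here is a small sketch in base R with made-up treatment labels. It checks an object's type before using it and shows what happens when `levels()`, which is mainly meant for factors, is given a character vector instead.

```r
# Made-up treatment labels, for illustration only
treatment <- c("control", "drug", "control")

# Check what kind of object you actually have before passing it to a function
print(class(treatment))
str(treatment)

# Given a character vector, levels() returns NULL without raising an error
levels_from_character <- levels(treatment)
print(levels_from_character)

# Converting to a factor gives levels() the kind of input it is designed for
treatment_factor <- factor(treatment)
levels_from_factor <- levels(treatment_factor)
print(levels_from_factor)
```
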
#### Use the RStudio help bar
Here's a screenshot from the help window in RStudio. Note that here we searched for the `levels` function. R documentation includes information about what the expected arguments are as well as examples of how to use a function. Note here that this documentation tells us that the input for `x` is probably a factor.

[Screenshot: search_bar]

For Bioconductor package functions, look at their page on [bioconductor.org](https://bioconductor.org/). The documentation on Bioconductor pages has information that can be valuable for troubleshooting. Vignettes can have good example workflows to get started with (you can use the `browseVignettes()` function to open them). In addition, every Bioconductor package has a PDF reference where all the functions and objects for that package are described. These can take some getting used to, but they generally have helpful information.
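
The same documentation can also be reached from the R console. The sketch below uses only base R and `utils` functions. The calls that open help pages or a browser are wrapped in `interactive()` so that the script can also be run non-interactively.

```r
# Quick look at a function's argument names without leaving the console
levels_arguments <- names(formals(args(levels)))
print(levels_arguments)

# These open help pages or a browser, so only run them in an interactive session
if (interactive()) {
  help("levels")      # same page as typing ?levels
  example("levels")   # runs the examples from the help page
  browseVignettes()   # lists vignettes for installed packages
}
```
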
### 6) Google it again
It's unlikely that your first attempt at Googling will lead you straight to an answer, so keep trying with different wordings. Through trial and error, and as Google's algorithms learn what you look for, your search results can eventually lead you to helpful examples and forums.
### 7) Look at the source code for that function
This should rarely be your first approach to solving a problem, since it is difficult and doesn't always pay off. It requires a bit more practice at reading code, so it may not be the most fruitful approach, depending on the readability and complexity of the code.
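
If you do want to read a function's source, a few base R and `utils` tools can help, as sketched below. The functions `sd` and `levels` are used here only as familiar examples.

```r
# Printing a function without parentheses shows its source code
print(sd)

# Many functions are generics that dispatch to methods, so list the methods first
print(methods("levels"))

# Then look up a specific method, even if it is not exported
print(getAnywhere("levels.default"))

# The body can also be captured as text if you want to search through it
sd_source <- deparse(body(sd))
print(sd_source)
```
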
### 8) Post to an appropriate forum on StackOverflow or a GitHub Issue
After you've thoroughly searched the internet for solutions and still haven't resolved your problem, you can post it online to ask for help.
## Tips on asking for help
Asking for help is a skill in itself! You will receive help from others more successfully if you communicate the problem clearly from the get-go. The article linked below has more details. We will cover the basics here.
**The best coding help requests include** (paraphrased [from the article by Jere Xu](https://betterprogramming.pub/how-to-properly-ask-for-help-in-coding-ad202751aaca)):
- **A description of the problem**: Explain what you are trying to do in the first place and which language and frameworks you are using. Give context!
- **The code that produced the problem**: Provide the exact code and data you used, and detail exactly what you did to get to the problem.
- **Expected result**: Describe what the intended result of your code is and maybe even show a real-world example (assuming that what you’re building isn’t completely new).
- **Actual result**: Copy and paste the exact error message you are getting or otherwise provide a screenshot of the problem you are seeing.
- **Environment**: What operating system and version are you running on? What package managers and libraries are you using?
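
Put together, a help request in R might be backed by a small script like the sketch below. The data are made up, and the comments map each part to the list above. Treat it as one possible layout, not a required template.

```r
# Description: computing the mean dose across samples (made-up data)
# Running dput(doses) prints code that recreates the data, which is handy for sharing it
doses <- data.frame(
  sample = c("s1", "s2", "s3"),
  dose = factor(c("10", "20", "5"))
)

# Code that produces the problem
mean_dose <- mean(as.numeric(doses$dose))
print(mean_dose)

# Expected result: about 11.67 (the mean of 10, 20, and 5)
# Actual result: 2, with no error message

# Environment: R version, operating system, and loaded packages
session_details <- sessionInfo()
print(session_details)
```

Keeping the example this small makes it easy for someone else to run it and see the same result you do.
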
## Reading about code review
- For tips for writing good code and creating reproducible projects: [Reproducibility in Cancer Informatics](https://jhudatascience.org/Reproducibility_in_Cancer_Informatics/)
- For tips for writing pull requests: [Engaging in Code Review as an Author](https://jhudatascience.org/Adv_Reproducibility_in_Cancer_Informatics/engaging-in-code-review---as-an-author.html) by Candace Savonen.
- For tips for reviewing pull requests: [Engaging in Code Review as a Reviewer](https://jhudatascience.org/Adv_Reproducibility_in_Cancer_Informatics/engaging-in-code-review---as-a-reviewer.html) by Candace Savonen.
- For more on effective code review: [A zen manifesto for effective code reviews](https://www.freecodecamp.org/news/a-zen-manifesto-for-effective-code-reviews-e30b5c95204a/) by Jean-Charles Fabre.
- About [Good scientific coding practices](https://github.com/AlexsLemonade/training-modules/blob/master/intro-to-R-tidyverse/00c-good-scientific-coding-practices.md) by the Childhood Cancer Data Lab.
- [Tips for debugging code from the Childhood Cancer Data Lab](https://github.com/AlexsLemonade/training-modules/blob/master/intro-to-R-tidyverse/00b-debugging_resources.md)
- For good practices on code review: [Best practices for Code Review](https://smartbear.com/en/learn/code-review/best-practices-for-peer-code-review/) from SmartBear.
**See our list of [recommended resources](./more_resources.html) to get going!**