FAQs
If you’re ready to move on to larger datasets and projects, these FAQs outline recommended tools and practices.
Best Practices
What is DRY and why does it matter?
DRY stands for ‘Don’t Repeat Yourself’. It’s a principle that encourages reusable, modular code to reduce duplication and improve maintainability.
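As a minimal sketch of the idea, the rescaling logic below is written once and reused for two lists instead of being copied. The normalize helper and the sample data are hypothetical, chosen only for illustration.

```python
def normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # All values are identical; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical example data.
heights = [150, 160, 170, 180]
weights = [50, 65, 80]

# The logic lives in one place, so a fix or change applies everywhere.
normalized_heights = normalize(heights)
normalized_weights = normalize(weights)
```

If the rescaling rule ever changes, only the one function needs editing, which is the maintainability benefit DRY is after.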
What is code documentation and why should I use it?
Documentation explains what your code does and why. Good documentation makes it easier to debug, share, and revisit code later.
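One common form of documentation in Python is a docstring plus a short inline comment. The function below is a hypothetical example, not taken from any particular project; it shows the "what" in the docstring and the "why" in the comment.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit.

    Args:
        celsius: Temperature in degrees Celsius (int or float).

    Returns:
        The equivalent temperature in degrees Fahrenheit, as a float.
    """
    # Standard conversion formula: F = C * 9/5 + 32.
    return celsius * 9 / 5 + 32


boiling_point_f = celsius_to_fahrenheit(100)
```

The docstring is also available at runtime through help(celsius_to_fahrenheit), so the explanation travels with the code.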
Tools
Should I use notebooks or scripts?
Use notebooks for exploration and explanation. Use scripts for automation, scalability, and production workflows.
Why is version control important?
Version control helps you track changes, collaborate with others, and revert to earlier versions if something goes wrong. Git is the most widely used version control system.
Datasets
What are the most common data issues?
Common data issues include missing values, outliers, and incorrectly formatted (malformed) entries. These can distort analysis results and should be handled carefully before modeling.
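The three issues named above can be checked for with a few lines of standard-library Python. This is only a sketch under assumptions: the readings are made up, and the 1.5 x IQR rule used to flag outliers is one common rule of thumb, not the only option.

```python
import statistics

# Hypothetical raw readings: one missing value, one malformed entry,
# and one extreme value.
raw_readings = ["10.2", "9.8", None, "10.1", "abc", "10.0", "55.0", "9.9"]

missing_count = 0
malformed = []
clean = []
for item in raw_readings:
    if item is None:
        # Missing value: count it and move on.
        missing_count += 1
        continue
    try:
        clean.append(float(item))
    except ValueError:
        # Present but not parseable as a number.
        malformed.append(item)

# Flag values more than 1.5 * IQR outside the first and third quartiles.
q1, _, q3 = statistics.quantiles(clean, n=4)
iqr = q3 - q1
outliers = [v for v in clean if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]
```

What to do with the flagged values (drop, impute, or investigate) depends on the dataset and the question being asked.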
What are summary statistics?
Summary statistics condense a dataset into a few numbers that describe its center, spread, and range. Common examples include mean, median, standard deviation, minimum, and maximum.
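The statistics listed above can be computed with Python's built-in statistics module. The scores below are hypothetical sample data, and stdev here is the sample standard deviation.

```python
import statistics

# Hypothetical sample data.
scores = [72, 85, 90, 66, 85]

summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "stdev": statistics.stdev(scores),  # sample standard deviation
    "min": min(scores),
    "max": max(scores),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```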
What is basic analysis?
Basic analysis often includes computing correlations, visualizing distributions, identifying trends, and checking assumptions before building models.
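As an illustration of the correlation step only, the sketch below computes a Pearson correlation coefficient by hand. The pearson_correlation helper and the study-hours data are hypothetical; in practice a library routine would usually be used instead.

```python
import math
import statistics


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    # Sum of products of deviations from the means.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical data: hours studied and the matching exam scores.
hours_studied = [1, 2, 3, 4, 5]
exam_scores = [52, 58, 65, 70, 78]

r = pearson_correlation(hours_studied, exam_scores)
```

A value of r close to 1 indicates a strong positive linear relationship, close to -1 a strong negative one, and near 0 little linear relationship. Correlation alone does not show that one variable causes the other.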