A Python developer’s guide to the optimal setup for collaboration, production readiness and time saving
Note: The setup instructions and useful commands for the recommended tools can be found in the python-dev-setup repo, which is referenced throughout the article.
Coding classes betrayed me. I was starting my first professional Python project and nothing was set up right on my machine:
- My Windows Command Prompt was not compatible with the BASH shell scripts intended to make project setup easy.
- I didn’t have a good Integrated Development Environment (IDE) for working with .py instead of .ipynb files.
- When I tried running the code, it quickly errored because my Python package versions were different than the versions my team members had installed.
I was used to running code in Jupyter notebooks launched from Anaconda and pip installing whatever packages were needed ad hoc. Environment setup ideal for true development was not a lesson I was taught.
I was scrambling, desperate not to seem incompetent, because a key piece of my education had been ignored.
The truth is, you might know a lot about Python coding in an educational setting, but if you don’t know anything about environment setup, you are in for a rude awakening when you start your first production project.
What exactly do I mean by “environment setup”?
I mean having all the tools you need to efficiently perform the following 5 tasks:
- Version control your code
- Run scripts from command line
- Edit and debug your code
- Manage the Python version you use to run your code
- Manage the Python package versions you use to run your code
Why is environment setup important?
Having a good setup is crucial for a number of reasons:
1. Team Collaboration
When multiple people are working on the same code base, it is hard to collaborate if environment setup is not uniform among all team members. Code that works on one team member’s machine may break when another team member runs it on their own machine if they are using different versions of Python or of Python packages.
2. Production Readiness
The problem extends beyond team collaboration to production. If the production environment is different than the machine you’ve been developing on, code may break once deployed. In programming, it is crucial to control as many external variables as possible so that production behavior is as predictable as possible.
3. Time Saving
When you are first getting started on a new project or testing someone else’s code, you can waste a lot of time preparing your machine to run their code if environment setup is not straightforward.
And the wasted time is not limited to the beginning of a project. If setup is not standardized, developers will continue to lose time troubleshooting version incompatibility errors that could easily have been avoided.
I hear you but what do I need to do?
Environment setup can be confusing and complicated. I have struggled through the process many times myself.
The variety of options available for accomplishing the 5 key tasks of environment setup (see the What exactly do I mean by “environment setup”? section above) can be overwhelming.
To save you time, effort and frustration, I have listed the tools I have found most helpful (as well as their key features) below. I have tested these tools across operating systems to ensure optimal setup no matter your OS. Read on for all things needed to create a great environment setup on your machine and reference the python-dev-setup repo for install instructions/tips!
1. Version Control Your Code
Recommend: Git
Git is the version control software behind GitHub and Bitbucket. It is helpful for more than just basic version control; it is also key to:
- Team collaboration: You can make and test changes to the code on your local machine before pushing them to the team-shared remote code base. Additionally, you can create branches off the master branch so that your development work doesn’t impact the rest of the team or the code in production. Lastly, you can easily merge other team members’ branches into your own branch or the master branch when development is complete.
- Release management: The branching system is also key for release management. Depending on project requirements, different enhancements may have different timelines for when they should be deployed to production. By isolating new features in separate branches, you can easily deploy each feature independently.
For more information on Git, see: What is Git and Why Should You Use it?
To get started using Git:
- Install Git — Mac, Linux, Windows
- Configure SSH authentication so that you don’t need to enter your username and password every time you pull from or push to your GitHub/Bitbucket repo (Mac/Linux, Windows)
- Reference this list of useful git commands
- Download the archive_branch.sh script and add it to the root of your project to automate the tedious process of archiving inactive git branches
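The contents of archive_branch.sh are not reproduced in this article, so the following is only a rough sketch of one common way to archive an inactive branch: tag the branch tip under an archive/ prefix, then delete the branch. The function names and the archive/ tag prefix are my assumptions for illustration, not necessarily what the repo's script does.

```python
import subprocess


def archive_branch_commands(branch: str, remote: str = "origin") -> list[list[str]]:
    """Build the git commands that archive `branch` as a tag and then delete it.

    Hypothetical helper: the `archive/<branch>` tag naming is an assumption.
    """
    tag = f"archive/{branch}"
    return [
        ["git", "tag", tag, branch],                  # keep the branch tip reachable via a tag
        ["git", "push", remote, tag],                 # share the tag with the team
        ["git", "branch", "-D", branch],              # delete the local branch
        ["git", "push", remote, "--delete", branch],  # delete the remote branch
    ]


def archive_branch(branch: str, remote: str = "origin", dry_run: bool = True) -> None:
    """Print each command, and run it only when dry_run is False."""
    for cmd in archive_branch_commands(branch, remote):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling archive_branch("old-feature") only prints the commands, which is a safe way to review them before passing dry_run=False.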
2. Run Scripts from Command Line
Recommend: Terminal for Mac Users, Git Bash for Windows Users
When developing within an integrated system, you will have to ditch notebooks and get used to running scripts and commands from the command line. Mac’s Terminal app is well suited to this because macOS is Unix-based, so it behaves much like the Linux machines that most apps are deployed to in production due to their cost effectiveness.
On Windows, you can get very close to Mac terminal functionality by using the Git Bash terminal that comes with Git for Windows. Key differences in functionality include:
- notepad instead of open for opening a file
- ctrl + ins to copy from the terminal and shift + ins to paste into it
- python -i instead of just python to use an interactive Python shell in the terminal
Format Your Command Line Interface
No matter the command line interface (CLI) you use, it helps to format your CLI to work well with Git so that you know what branch you are working on and don’t accidentally commit code to the wrong branch.
See the Format Your Terminal section of the python-dev-setup repo for instructions.
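A branch-aware prompt works by asking Git which branch HEAD currently points to. As a rough illustration of where that information lives, here is a small sketch that reads it directly, assuming a standard repository layout in which the .git/HEAD file contains a line such as "ref: refs/heads/<branch>". The function names are hypothetical, and this is not the prompt setup from the repo.

```python
from pathlib import Path
from typing import Optional


def branch_from_head(head_text: str) -> Optional[str]:
    """Return the branch name from the contents of .git/HEAD, or None if detached."""
    head_text = head_text.strip()
    prefix = "ref: refs/heads/"
    if head_text.startswith(prefix):
        return head_text[len(prefix):]
    return None  # detached HEAD: the file holds a commit hash instead


def current_branch(repo_root: str = ".") -> Optional[str]:
    """Read .git/HEAD under repo_root; returns None outside a repo or when detached."""
    head_file = Path(repo_root) / ".git" / "HEAD"
    if not head_file.is_file():
        return None
    return branch_from_head(head_file.read_text())
```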
3. Edit and Debug Your Code
Recommend: VS Code
While there are many IDEs out there, Visual Studio Code (VS Code) is my favorite editor for local code editing (I have also tried Atom, PyCharm, and Spyder). I like VS Code for Python development because of its superior ability to trace function and class definitions across nested files. In my experience, this feature stops working in PyCharm as the trace becomes more complex.
VS Code also has great source control features:
- You can easily see which branch of the repo you are viewing in the bottom left of your screen
- If you are working across multiple repos (e.g. using one repo as a standard tools library that you import as a locally editable package), you can easily see the changes across repos without having to “cd” and “git status” multiple times. For example, if you add a script in both an app repo and a tools repo, you can view these changes simultaneously in VS Code’s Source Control tab.
Additionally, VS Code’s Debugging tool is super helpful, as long as you know the following tricks:
- Select your Python Interpreter as your project’s venv (NOTE: we will talk more about venvs later, but it is important that the venv is in the root of your project)
- Select a Python file to run
- Set breakpoints (if desired) by clicking in the left margin
- Navigate to the Debug tab and choose “Python File” (“Debug the currently active Python file”) for the Debug Configuration
NOTE: For this to work, your VS Code workspace must be open to the directory the script is meant to be run from. Thus, it is easiest to build your scripts to be run from the root of your project.
- Step through the code and view the variables and data created along the way.
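The note about running scripts from the project root can be made more forgiving by having a script locate the root itself. This is a minimal sketch, assuming the root is the nearest parent directory containing a pyproject.toml file or a .git folder; the marker list and the function name are my own choices for illustration.

```python
from pathlib import Path
from typing import Optional

# Assumption: these files or folders mark the root of a project.
ROOT_MARKERS = ("pyproject.toml", ".git")


def find_project_root(start: Path) -> Optional[Path]:
    """Walk upward from `start` until a directory containing a root marker is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if any((candidate / marker).exists() for marker in ROOT_MARKERS):
            return candidate
    return None  # no marker found anywhere above `start`
```

Inside a script, find_project_root(Path(__file__).parent) gives a stable base for building file paths, regardless of which directory the debugger was launched from.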
Lastly, when it is appropriate to use Jupyter notebook files (e.g. for testing code snippets), VS Code has a Jupyter extension to support this. All you need to do is set the kernel to be the venv in the root of your project and install the ipykernel package when prompted.
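If you are ever unsure whether VS Code or a notebook kernel really picked up the project’s venv, a quick check from inside Python can help. This is a small sketch that relies on the standard library only: in a virtual environment, sys.prefix differs from sys.base_prefix. The function name is just for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    # sys.executable shows which interpreter is actually running this code.
    print(f"interpreter: {sys.executable}")
    print(f"virtual environment active: {in_virtualenv()}")
```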
Reference the Install VS Code and Key Extensions section of the python-dev-setup repo to get started using VS Code.
The VS Code Exception: Remote Code Editing
Despite my love for VS Code when editing local files, I have been disappointed by its remote edit packages. Thus, when working with remote files (e.g. editing code hosted on a Linux machine), I have found Atom to be most useful due to its “ftp-remote-edit” package.
4. Manage the Python Version You Use to Run Your Code
Recommend: Pyenv
Pyenv is a tool that allows you to seamlessly manage multiple Python versions on your machine at once. While the same task can be accomplished with Anaconda, Anaconda is a heavier install and, depending on the size of your organization, may require a paid license for commercial use.
To get started using pyenv:
- Install Pyenv — Mac, Linux, Windows
- Reference this list of useful pyenv commands
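Pyenv records a project’s pinned version in a .python-version file when you run “pyenv local”. As a hedged sketch, a script can compare that pin against the interpreter that is actually running it. This assumes the file holds a single plain version number such as 3.10.4 (it can also hold other values, which this sketch does not handle), and the function names are hypothetical.

```python
import platform
from pathlib import Path


def version_matches(pinned: str, running: str) -> bool:
    """True when the running version starts with the pinned one, component by component.

    A pin of "3.10" accepts any 3.10.x; a pin of "3.10.4" must match exactly.
    """
    pinned_parts = pinned.strip().split(".")
    running_parts = running.strip().split(".")
    return running_parts[: len(pinned_parts)] == pinned_parts


def check_python_version(project_root: str = ".") -> bool:
    """Compare the interpreter running this script with the .python-version pin."""
    pin_file = Path(project_root) / ".python-version"
    if not pin_file.is_file():
        return True  # nothing pinned, nothing to check
    lines = pin_file.read_text().splitlines()
    if not lines:
        return True  # empty file, nothing to check
    return version_matches(lines[0], platform.python_version())
```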
.bash_profile vs .bashrc
While reviewing the Mac and Linux pyenv install instructions you may notice that they are very similar barring one key difference — the Linux instructions include a .bashrc file.
In Linux, you can use the crontab to schedule jobs. On my current project we use cron to run a script every minute to check if our apps are running and restart them if they aren’t. We use pyenv to control our Python version, and initially installed pyenv on Linux the same way we had on our Macs (only including pyenv in the .bash_profile file). We could not figure out why our cron jobs weren’t running when the same command worked fine when executed from the command line.
That’s when we discovered the importance of the .bashrc file. Cron jobs run in a non-interactive, non-login shell, so the .bash_profile file, which is only read by login shells, is never loaded. On our machines, the pyenv setup also had to be in the .bashrc file before the cron jobs could pick it up. So, it is important to add pyenv to both the .bash_profile and .bashrc files during the Linux pyenv install.
See this article for more details on .bashrc vs .bash_profile.
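When a command works in your terminal but not under cron, it helps to see what the cron environment actually contains. Here is a minimal diagnostic sketch that could be scheduled temporarily as a cron job, with its output redirected to a file, and compared against the same script run from your terminal. The function name environment_report is made up for illustration.

```python
import os
import shutil


def environment_report(tool: str = "pyenv") -> dict:
    """Collect what the current process can see: its shell, PATH entries, and where `tool` resolves."""
    path_entries = os.environ.get("PATH", "").split(os.pathsep)
    return {
        "shell": os.environ.get("SHELL", "unknown"),
        "path_entries": path_entries,
        "tool_location": shutil.which(tool),  # None means the tool is not on PATH
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If tool_location is None under cron but not in your terminal, the job is running without the pyenv setup from your shell startup files.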
5. Manage the Python Package Versions You Use to Run Your Code
Recommend: Poetry
Creating a Python virtual environment is crucial for dependency management. For more information on what a virtual environment is and why you should always use one, check out this article.
While pip installing from an up-to-date requirements.txt file that specifies hard-coded package versions is better than nothing, this method fails to account for your dependencies’ dependencies. For example, you may think you only installed pandas to run your code, but the pandas library itself depends on several other packages (numpy, for example).
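To see a package’s own dependencies for yourself, you can ask the standard library’s importlib.metadata what an installed package declares. This is a rough sketch with hypothetical helper names; it lists only direct requirements, skips optional extras, and returns an empty list for packages that are not installed.

```python
import re
from importlib import metadata


def requirement_name(requirement: str) -> str:
    """Extract the bare package name from a requirement string like 'numpy>=1.22'."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    return match.group(0) if match else ""


def direct_dependencies(package: str) -> list[str]:
    """Names of the packages that an installed `package` declares as requirements."""
    try:
        requirements = metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return []  # the package is not installed in this environment
    # Requirements guarded by an "extra ==" marker are optional, so leave them out.
    required = [req for req in requirements if "extra ==" not in req]
    return sorted({requirement_name(req) for req in required})
```

Running direct_dependencies("pandas") in an environment where pandas is installed shows the packages that came along with it; each of those can be passed back in to go one level deeper.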
You can easily cause errors when creating a virtual environment from a requirements.txt if, for example:
- You specify a numpy version that is incompatible with the pandas version you specified
- You specify a version of another package that requires a numpy version incompatible with the numpy version your pandas version needs
Even if there are no errors when creating the virtual environment from the requirements.txt file, team members may wind up with slightly different sub-dependency versions which may cause problems down the line.
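One way to spot this kind of drift is to compare the output of “pip freeze” from two machines. The sketch below is a simple illustration with made-up function names; it only looks at lines of the form name==version and ignores everything else.

```python
def parse_freeze(text: str) -> dict[str, str]:
    """Turn `pip freeze` style lines (name==version) into a {name: version} mapping."""
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and entries without an exact pin
        name, version = line.split("==", 1)
        versions[name.lower()] = version
    return versions


def version_drift(mine: str, theirs: str) -> dict[str, tuple[str, str]]:
    """Packages installed in both environments but at different versions."""
    a, b = parse_freeze(mine), parse_freeze(theirs)
    shared = sorted(a.keys() & b.keys())
    return {name: (a[name], b[name]) for name in shared if a[name] != b[name]}
```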
Thinking about sub-dependencies can easily make your head spin. Thankfully, poetry accounts for all of these inter-related dependencies and creates a “poetry.lock” file that you can push to your repo. All your teammates need to do to mirror your setup is run the “poetry install” command.
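If you are curious what the lock file actually pins, it is a TOML document with one [[package]] entry per dependency, each carrying a name and a version. Here is a small sketch for reading those pins, assuming Python 3.11 or later for the built-in tomllib module (on older versions it falls back to the tomli backport, which would need to be installed). The function name is hypothetical.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    # Assumption: on older Pythons the tomli backport is installed.
    import tomli as tomllib


def locked_versions(lock_text: str) -> dict[str, str]:
    """Map each package recorded in a poetry.lock document to its pinned version."""
    data = tomllib.loads(lock_text)
    return {pkg["name"]: pkg["version"] for pkg in data.get("package", [])}
```

For a real project, pass in the file contents, for example locked_versions(open("poetry.lock", encoding="utf-8").read()).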
Is there a case where the lock file may actually cause issues among team members?
Yes, but this case is the exception. For example, if your code is loading other repos as locally editable packages, you won’t want your team members to be locked into the absolute paths on your machine, where you may be working on different git branches of the sub-repos.
If you encounter a situation like this, you can always revert to pip installing dependencies from a requirements.txt with hard-coded versions after controlling your Python version with pyenv.
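If you do fall back to a requirements.txt, it is worth checking that every entry really is hard-coded. This is a minimal sketch with a made-up function name that flags any requirement line lacking an exact “==” pin; it skips comments and pip option lines such as editable installs.

```python
def unpinned_requirements(requirements_text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version with `==`."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue  # skip blanks and pip options such as -r or -e
        if "==" not in line:
            unpinned.append(line)
    return unpinned
```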
To get started with poetry:
- Install poetry
- Create a venv with poetry
- Reference these useful poetry commands
If you followed along with the instructions in this article…
Congratulations, you now have an awesome environment setup!
You have all the tools you need to effectively:
- Version control your code
- Run scripts from command line
- Edit and debug your code
- Manage the Python version you use to run your code
- Manage the Python package versions you use to run your code
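As a final sanity check, a few lines of Python can confirm that the recommended tools are reachable from your command line. This is only a sketch: the executable names below are my assumptions about how the tools are installed (for example, the VS Code “code” command may need to be added to your PATH separately).

```python
import shutil

# Assumption: these are the executable names the tools install under.
TOOLS = ("git", "code", "pyenv", "poetry")


def missing_tools(tools=TOOLS) -> list[str]:
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All tools found.")
```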
I hope this helps, and you find yourself ready to hit the ground running the next time you start a new project!
Fall in Love with Your Environment Setup was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.