[Bernard and Pradyun spoke on Feb 6 and Feb 7 to give Bernard an understanding of pip's architecture.]
Feb 6th
Pradyun/Bernard discussion, 6 Feb 2020
Which side do we want to start from? User-facing or broader architecture?
At a reasonably high level, what are the moving parts of pip?
Software architecture:
- pip is for installing packages, like a car is for going from A to B. The main bits within pip are:
- a chunk of logic for downloading the things we
- a chunk of logic for converting what we've downloaded into something we can install
- deciding amongst the things we see we should download and install
- final chunk for installing things
- one chunk for uninstalling things
- methods for seeing the environment - e.g. its state, what's installed
- search - for discovering things - e.g. looking for a package for reading YAML files (pip search yaml)
The different parts of pip:
- env
- config files - save the user having to express the syntax every time (e.g., pointing to the user's own dev infra for downloading packages, etc.) https://pip.pypa.io/en/stable/user_guide/#config-file
- cli - parsing user input syntax
- implementation files -
https://pradyunsg.me/talks/slides/2019-python-packaging-overview.pdf -- some relevant notes
https://pradyunsg.me/talks/ -- has the video of the talk corresponding to those slides
Chunks of work (https://github.com/pypa/pip/milestone/10)
knowing what the key func is
figuring out if the way we print the output is useful
making info messages look less like error messages to users
making the error messages more helpful
giving users tools to debug the problem
there is a tradeoff here - every detailed message needs code to produce it, so the more error messages we add, the more complexity we add to the codebase
which switches are useful? which are not?
How possible is it for us to map all the pip command journeys' flows? Essentially
Look at how long this is: https://pip.pypa.io/en/stable/reference/pip_install/ (complexity in pip's options)
Feb 7th
Pradyun / Bernard discussion 7 Feb 2020
From Zulip conversation: "UX research on "professional" workflows, closed-source/"industry" usage. That's a base of users that we have very little access to." I'm not sure what routes we can find into that...
- research question: what are the workflows of professional users in "closed-source/industry"?
pip install / wheel / download
Install is a superset of wheel, which is a superset of download - to draw *very* broad strokes
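The "superset" relationship above can be pictured as nested lists of stages. This is a minimal sketch only: the stage names (`resolve`, `fetch`, `build_wheel`, `install`) and the `extra_stages` helper are hypothetical, chosen for illustration, and are not pip's internal API.

```python
# Hypothetical stage names for illustration; these are not pip internals.
DOWNLOAD_STAGES = ["resolve", "fetch"]
WHEEL_STAGES = DOWNLOAD_STAGES + ["build_wheel"]
INSTALL_STAGES = WHEEL_STAGES + ["install"]

COMMAND_STAGES = {
    "download": DOWNLOAD_STAGES,
    "wheel": WHEEL_STAGES,
    "install": INSTALL_STAGES,
}


def extra_stages(bigger, smaller):
    """Return the stages `bigger` runs on top of `smaller`, in order."""
    return [s for s in COMMAND_STAGES[bigger] if s not in COMMAND_STAGES[smaller]]


if __name__ == "__main__":
    print(extra_stages("install", "download"))
```

In these very broad strokes, each command is the previous one plus one more stage, which is why output and error handling for install also touches wheel and download.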
a chunk of logic for converting what we've downloaded into something we can distribute/install
pip build logic - since PEP 517 we've not had a good story for users who want to use sources
Bernard is fairly new to software dev; has experience w/ hardware + electronics; used software mostly as a consumer/user in the past.
Research question (needs better work):
- how do users expect to build packages into a format they can easily distribute -- wheel distributions and source distributions
Say I'm a dev of a project that I want to publish on PyPI. PyPI accepts certain types of files - wheels and source distributions. Going from a local dir on my PC (which has all the files) to the state where I can upload a built distribution to PyPI or a company server - that step is unclear.
The reasons are described in this thread: https://discuss.python.org/t/building-distributions-and-drawing-the-platypus/2062 . (Ignore the last 2/3 comments; there's a lot of discussion.)
Q for me: what UX work needs to be carried out for this?
pip list vs pip show vs pip search vs pip freeze
These are all similar - they give information about packages
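To make "information about packages" concrete, here is a small sketch that produces a freeze-like listing of the current environment using the standard library's `importlib.metadata` (Python 3.8+). The function name `freeze_like` is made up for this note, and this is not how pip itself implements list or freeze.

```python
from importlib import metadata


def freeze_like():
    """Return sorted 'name==version' lines for distributions in this environment."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken or missing metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    for line in freeze_like():
        print(line)
```

The same underlying data (name, version, metadata fields) could feed a list-style, show-style or freeze-style view, which is part of why these commands feel so similar to users.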
Looked at "search" related issues.
research questions:
- can we understand where users use these different functions in their workflow and what are they looking to do?
- how are users using these functions?
- are they discoverable?
Design: ideally the output of this would be to "put all of this functionality into 2 buckets"
review: install a bunch of packages and use the different
search tagged issues: https://github.com/pypa/pip/labels/C%3A%20search
2 patterns that users had issues with: no error message at all, or errors and info mixed together / "mysterious" messages
Q to Pradyun: is there a good approach to working out how to improve those messages?
- e.g.: paths of files that it's related to.
It depends on the context of how to display errors.
We have output that has holes that we want to fill - we need to work out what context needs to be given to the user
https://github.com/pypa/pip/issues/5182 -- as an example - instead of displaying an exception, better to wrap it into a "we tried to do this, but it failed".
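A rough sketch of the "we tried to do this, but it failed" idea, in Python. The names `UserFacingError` and `attempting` are hypothetical and do not exist in pip; the point is only that the low-level exception gets wrapped with the action that was being attempted.

```python
from contextlib import contextmanager


class UserFacingError(Exception):
    """Hypothetical error type carrying a message written for the user."""


@contextmanager
def attempting(action):
    """Wrap low-level OS errors with a description of what was being attempted."""
    try:
        yield
    except OSError as exc:
        # Keep the original exception chained for debugging (verbose output).
        raise UserFacingError(f"We tried to {action}, but it failed: {exc}") from exc


if __name__ == "__main__":
    try:
        with attempting("read the config file"):
            raise FileNotFoundError("no such file: pip.conf")
    except UserFacingError as err:
        print(err)
```

The original exception stays attached via `__cause__`, so a verbose mode could still show the full traceback while regular output shows only the wrapped message.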
Past case of similar issues
Best practices for writing error messages: https://www.nngroup.com/articles/ten-usability-heuristics/
desk research: are there design guidelines for writing error messages that promote heuristic #9 (help users recognize, diagnose, and recover from errors)?
- eg. awscli, a CLI provided by Amazon; that Pradyun hopes has had UX eyeballs/inputs. Their resources might not be public though. :P
- continue categorising the "print better error message" issues.
- look at closed issues to see how they approached the solution
Good outcome: Maintainer guidelines for error messages
Optimistic outcome: Maintainer guidance on identifying areas where we might have such errors
https://wiki.python.org/psf/Pip2020DonorFundedRoadmap
verbosity / output + failure visibility: broader than error messages. Is this in or out of scope?
USER STORY
As a user, I want to know what's going on - make the display less mysterious
- how do we want to structure our output? where (and in what order) do users expect to see error messages and actions they have to react to? what is a useful information architecture for messages to be displayed? (install.py#L385)
The objective was to redo the output for the pip install command (and connected to wheel / download)
Do we want to interrupt the user when there is a dependency resolution issue, or print an error at the end?
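One way to picture the "print at the end" option is a function that keeps progress messages in order and moves warnings and errors to the bottom. This is a sketch under assumptions: the `(level, message)` event shape and the `render_output` name are invented for illustration and are not pip's logging design.

```python
def render_output(events):
    """Return output lines with info first and problems grouped at the end.

    `events` is a list of (level, message) tuples, where level is
    "info", "warning" or "error".
    """
    progress = [msg for level, msg in events if level == "info"]
    problems = [f"{level.upper()}: {msg}" for level, msg in events if level != "info"]
    return progress + problems


if __name__ == "__main__":
    sample = [
        ("info", "step 1 done"),
        ("warning", "package A conflicts with package B"),
        ("info", "step 2 done"),
    ]
    print("\n".join(render_output(sample)))
```

The alternative, interrupting as soon as the problem appears, would emit each event in its original position instead of regrouping.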
TODO read this:
https://github.com/pypa/pip/issues/4649#issuecomment-333690763
https://github.com/pypa/pip/issues/4649#issuecomment-333738894
Does the structure make sense?
- See how other package managers handle information output
Cascading tasks
- have helpers in the code; there is a lot of output - do we want to show *everything*?
Experiment
experiment: take a package with lots of dependencies
- look at the regular output (regular = that which is not verbose), look at the verbose output, and see what is critical and what is not
- research question: identify what is critical, what is not critical.
how much output do users expect to see? how much output is useful to users?
what is missing from regular output, that's in verbose? (regular = that which is not verbose)
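For the experiment, a small helper could list the lines that appear only in the verbose output. A sketch, assuming both outputs have already been captured as strings; the function name `only_in_verbose` and the sample text are invented and are not real pip logs.

```python
def only_in_verbose(regular_output, verbose_output):
    """Return the non-empty lines of the verbose output that the regular output lacks."""
    regular_lines = {line.strip() for line in regular_output.splitlines() if line.strip()}
    extra = []
    for line in verbose_output.splitlines():
        stripped = line.strip()
        if stripped and stripped not in regular_lines:
            extra.append(stripped)
    return extra


if __name__ == "__main__":
    # Invented sample text, for illustration only.
    regular = "step A done\nstep B done\n"
    verbose = "step A started\nstep A done\ndetail about step B\nstep B done\n"
    for line in only_in_verbose(regular, verbose):
        print(line)
```

Each line this returns is a candidate to classify by hand as critical or not critical.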
Are there "defined" steps in the complication?
- Yes and no - there are stages, not distinct steps.
- For each package that we see, we download it, build it, see its dependencies and do the same for them (looking at the env)
- how do we decide
- what to download
- what do we build?
- fetch dependencies
- repeat
- then finally install
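The loop above can be sketched as a simple traversal. This is a toy illustration only: the in-memory `TOY_INDEX`, the package names and the `plan_install` function are all hypothetical, and there is no version handling or conflict detection, so it is not a real resolver.

```python
from collections import deque

# Hypothetical in-memory "index": package name -> names of its dependencies.
TOY_INDEX = {
    "app": ["http-lib", "yaml-lib"],
    "http-lib": ["url-lib"],
    "url-lib": [],
    "yaml-lib": [],
}


def plan_install(requested, index, installed=frozenset()):
    """Return the packages to process, in the order they are discovered."""
    plan = []
    queue = deque(requested)
    while queue:
        name = queue.popleft()
        if name in plan or name in installed:
            continue  # already planned, or already present in the environment
        # In real life this is where the package would be downloaded and
        # possibly built, because that is how its dependencies become known.
        plan.append(name)
        queue.extend(index[name])  # fetch dependencies, then repeat
    return plan  # only after the loop finishes would anything be installed


if __name__ == "__main__":
    print(plan_install(["app"], TOY_INDEX))
```

The sketch shows why these are stages rather than distinct steps: download, build and dependency discovery are interleaved per package, and only the final install happens once at the end.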
The resolver
resolver - this resolves what to download and build; it gives us a target state and how to get there.
installing - pip install acts on the result of the resolver
the nuance of download, install, wheel:
wheel - command for the user to run, to get wheels for all packages they will use. The user will end up with a wheel file for each of the packages and deps in a dir.
when they do download they'll get wheels and sources. Wheels have static metadata, sources do not. (Fixing this was the purpose of the proposed metadata 2.0. The lack of static metadata also makes dependency resolution harder.)
This issue is grounded in the lack of metadata we have about packages.
In the process of downloading, pip might end up building wheels anyway: in the dependency resolution loop, pip can potentially build a wheel (wheels have dependency information). Not sure where those wheels end up (not so important).
For everything in the dependency resolution process, pip will build wheels (if it's not in the env already), after the resolution.
If something was downloaded without a wheel, pip wheel will build one, or error out if that's not possible.
download - we'll dump this into a dir the user
- TODO: user documentation about pip wheel and pip install
- TODO: discuss the nuance between install / download / wheel
suggestion: look at .py files for details: https://github.com/pypa/pip/tree/master/src/pip/_internal/commands
review the command line usability for automated scripts - pfmoore filed it -- https://github.com/pypa/pip/issues/6099
"When pip blows up, it usually ends up breaking someone's pipeline. A lot of people are going to get pinged, probably in the middle of the night."
technical/professional users
I get paid to work on python, everyday. I have specific workflows, I want control on how I build, where I download from, how pip is called
1. scientific
- i have to compile
- research data science
- corporate application of data science
- more complicated build logic - using legacy/fortran/c code
2. corporate
- software/web/dev
- easier/simpler requirements
- strict control of what we develop with
3. distros
- who redistribute python "stuff"
- they are an intersection of 1 and 2
- differing requirements - they have their own audience (3) but serve 2 + 3
4. non-technical/non-professional users
- no idea how to even start with these; what's a priority etc - Pradyun
TODO: reread https://packaging.python.org/glossary/ as a result of today's discussion
Non-professional users will find it easier to find errors/warnings at the end
- TODO (Pradyun): Move the warn-about-conflicts to the end, based on what we discussed
Bernard to make a branch for this work, make it and test it. Maybe work with Pradyun on the messaging.
- TODO bernard: give Pradyun a link to where I will be putting these notes.
- TODO bernard: Improve and restore packaging problems wiki
TODO bernard: reread and digest: https://github.com/pypa/packaging-problems/wiki/User-experience-issues