terriko: (Default)
2024-09-27 06:00 pm

Best practices in practice: Black, the Python code formatter

This is crossposted from Curiousity.ca, my personal maker blog. If you want to link to this post, please use the original link since the formatting there is usually better.


I’m starting a little mini-series about some of the “best practices” I’ve tried out in my real-life open source software development. These can be specific tools, checklists, workflows, whatever. Some of these have been great, some of them have been not so great, but I’ve learned a lot. I wanted to talk a bit about the usability and assumptions made in various tools and procedures, especially relative to the wider conversations we need to have about open source maintainer burnout, mentoring new contributors, and improving the security and quality of software.





So let’s start with a tool that I love: Black.





Black’s tagline is “the uncompromising Python code formatter” and it pretty much is what it says on the tin: it can be used to automatically format Python code, and it’s reasonably opinionated about how that’s done, with very few options to change. It starts with PEP 8 compliance (that’s the Python style guide, for those of you who don’t need to memorize such things) and takes it further. I’m not going to talk about the design decisions they made, but the Black style guide is actually an interesting read if you’re into this kind of thing.





I’m probably a bit more excited about style guides than the average person because I spent several years reading and marking student code, including being a teaching assistant for a course on Perl, a language that is famously hard to read. (Though I’ve got to tell you, the first-year undergraduates’ Java programs were absolutely worse to read than Perl.) And then, in case mounds of beginner code weren’t enough of a challenge, I was also involved in a fairly well-known open source project (GNU Mailman) that had a decade of code to its name even when I joined, so I was learning a lot about the experience of integrating code from many contributors into a single code base. Both of these are… kind of exhausting? I was young enough to not be completely set in my ways, but especially with the beginner Java code, it became really clear that debugging was harder when the formatting was adding a layer of obfuscation to the code. I’d have loved to have an autoformatter for Java, because so many students could find their bugs more easily once I showed them how to fix their indents or braces.





And then I spent years as an open source project maintainer rather than just a contributor, so it was my job to enforce style as part of code reviews. And… I kind of hated that part of it? It’s frustrating to have the same conversation with people over and over about style and to keep leaving the same code review comments. On top of that, sometimes people don’t *agree* with the style and want to argue about it, or they can’t be bothered to come back and fix it themselves, so I either have to leave a potentially good bug fix on the floor or fix it myself. Formatting code elegantly can be fun once in a while, but doing it over and over and over and over quickly got old for me.





So when I first heard about Black, I knew it was a thing I wanted for my projects.





Now when someone submits a thing to my code base, Black runs alongside the other tests, and they get feedback right away if their code doesn’t meet our coding standards. It takes hardly any time to run, so people can get feedback very fast. Many new contributors even notice the failing required test, go do some reading, and fix it before I even see it. For those who don’t fix issues before I get there, I get a much easier conversation that amounts to “run black on your files and update the pull request.” I don’t have to explain what they got wrong and why it matters; they don’t even need to understand what happens when the auto-formatter runs. It just cleans things up and we move on with life.
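For anyone curious what “Black runs alongside the other tests” can look like, here’s a minimal sketch of a CI step written in Python. It assumes Black is installed in the CI environment and relies on `black --check --diff`, which doesn’t write files back and exits 0 when nothing would change, 1 when some files would be reformatted, and 123 on an internal error (such as a file it can’t parse). The function names and the wording of the messages are my own invention, not something Black ships.

```python
"""Hypothetical CI step: fail the build if Black would reformat anything."""
import subprocess

# Exit codes documented for `black --check`.
BLACK_OK = 0
BLACK_WOULD_REFORMAT = 1
BLACK_INTERNAL_ERROR = 123


def feedback_for(returncode: int) -> str:
    """Turn Black's exit code into a message a new contributor can act on."""
    if returncode == BLACK_OK:
        return "Formatting looks good."
    if returncode == BLACK_WOULD_REFORMAT:
        return "Run black on your files and update the pull request."
    if returncode == BLACK_INTERNAL_ERROR:
        return "Black hit an error; check that every file parses as valid Python."
    return f"Unexpected exit code from black: {returncode}"


def main() -> int:
    """Run Black in check mode over the repository and report the result."""
    try:
        # --check: don't write files back; --diff: print what would change.
        result = subprocess.run(["black", "--check", "--diff", "."], check=False)
    except FileNotFoundError:
        print("black is not installed in this environment (try `pip install black`).")
        return 1
    print(feedback_for(result.returncode))
    return result.returncode
```

In a real pipeline you would end the script with something like `sys.exit(main())` so the job fails whenever Black does. The nice part of this design is that the friendly message is what a maintainer would otherwise be typing by hand in every review.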





I feel like the workflow might actually be better if Black was run in our continuous integration system and automatically updated the submitted code, but there are some challenges there around security and permissions that we haven’t gotten around to solving. And honestly, it’s kind of nice to have an easy, low-stress opening conversation, like “train the new contributors to use the tools we use” or “share a link to the contributors doc,” so I haven’t been as motivated as I might be to fix things. I could probably have a bot leave those comments, and maybe one of these days we’ll do that, but I’m going to have to look at the code for code review anyhow, so I usually just add it into the code review comments.





The other thing that Black itself calls out in their docs is that by conforming to a standard auto-format, we really reduce the differences between existing code and new code. It’s pretty obvious when a first attempt has a pile of random extra lines and is failing the Black check. We get a number of contributors using different integrated development environments (IDEs) that are pretty opinionated themselves, and it’s been freeing not to deal with whitespace nonsense in pull requests or have people try to tell me of the glory of their IDE of choice when I ask them to fix it. Some Python IDEs actually support Black, so sometimes I can just tell them to flip a switch or whatever and then they never have to think about it again either. Win for us all!





So here are the highlights of why I use Black:





As a contributor:






  1. Black lets me not think about style; it’s easy to fix before I put together a pull request or patch.




  2. It saves me from the often confusing messages you get from other style checkers.




  3. Because I got into the habit of running it before I even run my code or tests, it serves as a quick mistake checker.




  4. Some of the style choices, like forcing trailing commas in lists, make editing existing code easier and I suspect increase code quality overall because certain types of bug are more obvious.
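Here’s a small illustration of the kind of bug I mean in point 4, using Python’s implicit string concatenation as the example (my choice, not something the Black docs single out). When a multi-line list is missing a comma, Python silently glues two adjacent string literals together instead of raising an error. Black won’t add the missing comma for you, but the layout it uses once it splits a collection across lines (one item per line, every item followed by a comma, including the last) makes the odd line out stand out, and adding a new item later only touches one line of the diff.

```python
# A multi-line list with a missing comma: Python silently concatenates
# the adjacent string literals, so "green" "blue" becomes "greenblue".
buggy_colors = [
    "red",
    "green"
    "blue",
]

# The same list written the way Black lays out a collection it has split
# across lines: one item per line, each followed by a trailing comma.
fixed_colors = [
    "red",
    "green",
    "blue",
]

print(buggy_colors)  # ['red', 'greenblue']
print(fixed_colors)  # ['red', 'green', 'blue']
```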





As an open source maintainer:






  1. Black lets me not think about style.




  2. It makes basic code quality conversations easier. I used to have a *lot* of conversations about style, and people get really passionate about it, but it wasted a lot of time when the end result was usually going to be “conform to our style if you want to contribute to this project.”




  3. Fixing bad style is fast, either for the contributor or for me as needed.




  4. It makes code review easier because there aren’t obfuscating style issues.




  5. It allows for very quick feedback for users even if all our maintainers are busy. Since I regularly work with people in other time zones, this can potentially save days of back and forth before code can be used.




  6. It provides a gateway for users to learn about code quality tools. I work with a lot of new contributors through Google Summer of Code and Hacktoberfest, so they may have no existing framework for professional development. But also even a lot of experienced devs haven’t used tools like Black before!




  7. It provides a starting point for mentoring users about pre-commit checks, continuous integration tests, and how to run things locally. We’ve got other starting points but Black is fast and easy and it helps reduce resistance to the harder ones.




  8. It reduces “bikeshedding” about style. Bikeshedding can be a real contributor to burnout for both maintainers and contributors, and this removes one place where I’ve seen it occur regularly.




  9. It decreases the cognitive overhead of reading and maintaining a full code base that includes a bunch of code from different contributors, or even from the same contributor years later. If you’ve spent any time with code that’s been around for decades, you know what I’m talking about.




  10. In short: it helps reduce burnout for me and my co-maintainers.
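Point 7 mentions pre-commit checks. One low-tech way to build that habit is a plain Git hook: Git runs any executable saved as `.git/hooks/pre-commit` and aborts the commit if it exits non-zero. The sketch below is a hypothetical hook written in Python, assuming Black is installed. It asks Git for the staged files, keeps the Python ones, and runs `black --check` on just those. Many projects use the separate pre-commit framework for this instead, which I’m not showing here.

```python
#!/usr/bin/env python3
"""Hypothetical Git pre-commit hook: refuse to commit if Black would reformat staged files."""
import subprocess


def python_files(paths):
    """Keep only the paths that look like Python source files."""
    return [p for p in paths if p.endswith(".py")]


def staged_paths():
    """List files that are staged (added, copied, or modified) for this commit."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def main():
    """Return 0 if the staged Python files are Black-clean, non-zero otherwise."""
    files = python_files(staged_paths())
    if not files:
        return 0  # nothing for Black to look at
    # Note: this checks the working-tree copy of each file, which can differ
    # from the staged copy if you edited the file again after `git add`.
    result = subprocess.run(["black", "--check", *files], check=False)
    if result.returncode != 0:
        print("Black would reformat some files: run `black` on them and re-stage.")
    return result.returncode
```

To use it as an actual hook, add `import sys` and `sys.exit(main())` to the end of the file, save it as `.git/hooks/pre-commit`, and make it executable. Checking only the staged Python files keeps it fast, which matters if you want new contributors to leave it switched on.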





So yeah, that’s Black. It improves my experience as an open source maintainer and as a mentor for new contributors. I love it, and maybe you would too? I highly recommend trying it out on your own code and new projects. (It’s good for existing projects too, even big established ones, but choosing to apply it to an existing code base gets into bikeshedding territory, so proceed with caution!)





It’s only for Python, but if you have similar auto-formatters for other languages that you love, let me know! I’d love to have some to recommend to my colleagues at work who focus on other languages.

terriko: I am a serious academic (Twilight Sparkle looking confused) (Serious Academic)
2014-05-30 10:05 pm

You can leave academia, but you can't get the academic spam out of your inbox

When I used to do research on spam, I wound up spending a lot of time listening to people's little pet theories. One that came up plenty was "oh, I just never post my email address on the internet" which is fine enough as a strategy depending on what you do, but is rather infeasible for academics who want to publish, as custom says we've got to put our email addresses on the paper. This leads to a lot of really awesome contacts with other researchers around the world, but sometimes it leads to stuff like the email I got today:


Dear Terri,

As stated by the Carleton University's electronic repository, you authored the work entitled "Simple Security Policy for the Web" in the framework of your postgraduate degree.

We are currently planning publications in this subject field, and we would be glad to know whether you would be interested in publishing the above mentioned work with us.

LAP LAMBERT Academic Publishing is a member of an international publishing group, which has almost 10 years of experience in the publication of high-quality research works from well-known institutions across the globe.

Besides producing printed scientific books, we also market them actively through more than 80,000 booksellers.

Kindly confirm your interest in receiving more detailed information in this respect.

I am looking forward to hearing from you.


Best regards,
Sarah Lynch
Acquisition Editor

LAP LAMBERT Academic Publishing is a trademark of OmniScriptum
GmbH & Co. KG

Heinrich-Böcking-Str. 6-8, 66121, Saarbrücken, Germany
s.lynch(at)lap-publishing.com / www. lap-publishing .com

Handelsregister Amtsgericht Saarbrücken HRA 10356
Identification Number (Verkehrsnummer): 13955
Partner with unlimited liability: VDM Management GmbH
Handelsregister Amtsgericht Saarbrücken HRB 18918
Managing director: Thorsten Ohm (CEO)


Well, I guess it's better than the many misspelled emails I get offering to let me buy a degree (I am *so* not the target audience for that, thanks), and at least it's not incredibly crappy conference spam. In fact, I'd never heard of this publisher before, so I did a bit of searching.

Let's just post a few of the summaries from that search:

From Wikipedia:
The Australian Higher Education Research Data Collection (HERDC) explicitly excludes the books by VDM Verlag and Lambert Academic Publishing from ...


From the well-titled Lambert Academic Publishing (or How Not to Publish Your Thesis):
Lambert Academic Publishing (LAP) is an imprint of Verlag Dr Muller (VDM), a publisher infamous for selling cobbled-together "books" made ...


And most amusingly, the reason I've included the phrase "academic spam" in the title:
I was contacted today by a representative of Lambert Academic Publishing requesting that I change the title of my blog post "Academic Spam", ...


So yeah, no. My thesis is already published, thanks, and Simple Security Policy for the Web is freely available on the web for probably obvious reasons. I never did convert the darned thing to HTML, though, which is mildly unfortunate in context!
terriko: (Default)
2014-05-30 08:34 pm

PlanetPlanet vs iPython Notebook [RESOLVED: see below]

Short version:

I'd like some help figuring out why RSS feeds that include iPython notebook contents (or more specifically, the CSS from iPython notebooks) are showing up really messed up in the PlanetPlanet blog aggregator. See the Python Summer of Code aggregator and search for an MNE-Python post to see an example of what's going wrong.

Bigger context:

One of the things we ask of Python's Google Summer of Code students is regular blog posts. This is a way of encouraging them to be public about their discoveries and share their process and thoughts with the wider Python community. It's also very helpful to me as an org admin, since it makes it easier for me to share and promote the students' work. It also helps me keep track of everyone's projects without burning myself out trying to keep up with a huge number of mailing lists, one for each "sub-org" under the Python umbrella. Python sponsors students to work not only on the language itself but also on projects that make heavy use of Python. In 2014, we have around 20 sub-orgs, so that's a lot of mailing lists!

One of the tools I use is PlanetPlanet, software often used for making free software "planets" or blog aggregators. It's easy to use and run, and while it's old, it doesn't require me to install and run an entire larger framework that I would then have to keep up to date. It basically builds a static page using a shell script run by a cron job. From a security perspective, all I have to worry about is that my students will post something terrible that then gets aggregated, but I'd have to worry about that no matter what blogroll software I used.

But for some reason, this year we've had problems with some feeds, and it *looks* like the problem is specifically that PlanetPlanet can't handle iPython notebook formatted content in a blog post. This is pretty awkward, as iPython notebook is an awesome tool that I think we should be encouraging students to use for experimenting in Python, and it really irks me that it's not working. Chrome and Firefox seem to parse the feed reasonably, which makes me think that somehow PlanetPlanet is the thing that's losing a <style> tag somewhere. The blogs in question seem to be on Blogger, so it's also possible that Google is munging the stylesheet in a way that PlanetPlanet doesn't parse.
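One way to test the "lost <style> tag" theory is to pull the <style> blocks out of an entry's HTML as it appears in the raw feed and again as it appears on the aggregated page, then compare. Here's a minimal sketch using only the standard library's `html.parser`; the `style_blocks` helper and the sample HTML are made up for illustration, and fetching the feed and the planet page is left out.

```python
from html.parser import HTMLParser


class StyleCollector(HTMLParser):
    """Collect the text content of every <style> element in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.styles = []
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._in_style = True
            self.styles.append("")

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than overwrite.
        if self._in_style:
            self.styles[-1] += data


def style_blocks(html):
    """Return the CSS text of each <style> element found in `html`."""
    parser = StyleCollector()
    parser.feed(html)
    parser.close()
    return parser.styles


entry_html = '<style>.cell { margin: 0 }</style><div class="cell">In [1]:</div>'
print(style_blocks(entry_html))  # ['.cell { margin: 0 }']
print(style_blocks('<div class="cell">In [1]:</div>'))  # []
```

If the raw feed yields style blocks but the aggregated page yields none (or shows the CSS as visible text), that would point at the aggregator rather than at Blogger, though it wouldn't prove it on its own.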

I don't suppose this bug sounds familiar to anyone? I did some quick googling, but unfortunately the terms are all sufficiently popular when used together that I didn't find any reference to this bug. I was hoping for a quick fix from someone else, but I don't mind hacking PlanetPlanet myself if that's what it takes.

Anyone got a suggestion of where to start on a fix?

Edit: Just because I saw someone linking this on Twitter, I'll update the main post: I tried Mary's suggestion of Planet Venus (see comments below) on Monday and it seems to have done the trick, so hurrah!
terriko: (Default)
2014-04-26 11:33 am

Mailman 3.0 Suite Beta!

I'm happy to say that...



Mailman 3.0 suite is now in beta!

As many of you know, Mailman's been my open source project of choice for a good many years. It's the most popular open source mailing list manager, with millions of users worldwide, and it's been quietly undergoing a complete rewrite and reworking for version 3.0 over the past few years. I'm super excited to have it at the point where more people can really start trying it out. We've divided it into several pieces: the core, which sends the mail; the web interface, which handles web-based subscriptions and settings; and the new web archiver. There's also a set of scripts to bundle them all together. (Announcement post with all the links.)

While I've done more work on the web interface and a little on the core, I'm most excited for the world to see the new archiver, Hyperkitty, which is a really huge and beautiful change from the older Pipermail.

You can take a look at Hyperkitty live on the Fedora mailing list archives if you're curious! I'll bet it'll make you want your other open source lists to convert to Mailman 3 sooner rather than later. Plus, on top of being already cool, it's much easier to work with and extend than the old Pipermail, so if you've always wanted to view your lists in some new and cool way, you can dust off your Django skills and join the team!


Do remember that the suite is in beta, so there are still some bugs to fix and probably a few features to add, but we do know that people are running Mailman 3 live on some lists, so it's reasonably safe to use if you want to try it out on some smaller lists. In theory, it can coexist with Mailman 2, but I admit I haven't tried that out yet. I will be trying it, though: I'm hoping to switch some of my own lists over soon, but probably not for a couple of weeks due to other life commitments.

So yeah, that's what I did at the PyCon sprints this year. Pretty cool, eh?
terriko: (Default)
2014-03-29 12:33 pm

Sparkfun's Arduino Day Sale: looking for inspiration!


Arduino Day 2014


Sparkfun has a bunch of Arduinos on crazy sale today, and they're allowing backorders. It's a one day sale, ending just before midnight US mountain time, so you've still got time to buy your own! Those $3 minis are amazing.

I wound up buying the maximum number I could, since I figure if I don't use them myself, they'll make nice presents. I have plans for two of the mini ones already, as part of one of my rainy-day projects that's only a little past the drawing board and into the "let's practice Arduino coding and reading sensor data" stage. But the rest are waiting for new plans!

I feel a teensy bit guilty about buying so many Arduinos when I haven't even found a good use for the Raspberry Pi I got at PyCon last year. I did buy it a pretty rainbow case and a cable, but my original plan to use it as the brains for a homemade CNC machine got scuttled when John went and bought a nice Handibot CNC router.

A pretty picture of the Pibow rainbow Raspberry Pi case, from this most excellent post about it. They're on sale today too if you order through Pimoroni.

I've got a few arty projects with light that might be fun, but I kind of wanted to do something a bit more useful with it. Besides, I've got some arty blinky-light etextile projects that are going to happen first, and by the time I'm done with those, I think I'll want something different.

And then there's the Galileo, which obviously is a big deal at work right now. One of the unexpected perks of my job is the maker community: I've been hearing all about the cool things people have tried with their dev boards and seeing cool projects, and for a while we even had a biweekly meet-up going to chat with some of the local Hillsboro makers. I joined too late to get a chance at a board from the internal program, but I'll likely be picking one up on my own dime once I've figured out how I'm going to use it! (John already has one, and the case he made for it came off the 3D printer this morning, and I'm jealous!)

So... I'm looking for inspiration: what's the neatest Arduino/Raspberry Pi/Galileo/etc. project you've seen lately?
terriko: (Default)
2014-03-01 10:39 pm

Google Summer of Code: What do I do next?

Python's in as a mentoring organization again this year, and I'm running the show again. Exciting and exhausting!

In an attempt to cut down on the student questions that go directly to me, I made a flow chart of "what to do next" :

[Flowchart: "what do I do next?" for prospective Google Summer of Code students]

(there's also a more accessible version posted at the bottom of our ideas page)

I am amused to tell you all that it's already significantly cut down the number of "what do I do next?" emails I've gotten as an org admin compared to this time last year. I'm not sure whether it's more eye-catching, better placed, or something else that makes it more effective, since those instructions could be found in the section for students before. We'll see if its magical powers hold once the student application period opens, though!
terriko: (Default)
2013-10-17 05:23 pm

I'm joining Intel's Open Source Technology Center!

I'm pleased to announce that I will be joining Intel's Open Source Technology Center (OTC), starting October 21st.

This is a big transition for me: not only have I physically moved to the Portland area from Albuquerque, but I'm also moving from academia to industry. However, I'm not moving away from either security or research: my official job title is "Security Researcher - Software Security Engineer."

There are lots of crazy smart people at Intel, especially at OTC, and I'm really excited (and a little scared!) about joining their ranks. This is exactly the job I wanted: I'll be doing security in an open source context (not only behind closed doors!), working with interesting people on interesting projects, and I'll be positioned such that my work can have an impact on the state of computer security in a global sense. It sounds like I'll be working primarily on web and Android security, which is challenging, fascinating, intimidating, and highly important. Wish me luck!
terriko: (Default)
2013-09-27 11:07 am

FYI: GSoC midterm emails

FYI: Google Summer of Code emails from midterms are being re-sent right now due to a bug in Melange. It's safe to ignore these. These ARE NOT typoed final evaluation emails: final evaluations close in an hour, and Google will be sending those emails on Oct 1st.

(Edit: Apparently the final eval emails went out early too, so you may have the correct emails now, a few days early by Google's original schedule. Congrats and condolences to all!)
terriko: (Default)
2013-08-15 01:41 pm

Interview with me up on FastCoLabs

Today is a good day: I get to be famous for being snarky!

There's a short interview with me up on FastCoLabs today, regarding my (in)famous slideshare presentation about women, biology, and computer science.

She did a nice job of trimming down my original answers, but I am sad that she missed the part where I said I didn't answer the question about what does cause the disparity in my slideshare presentation because half the point of the presentation was to get people to think rather than mindlessly accept shortened arguments with good face validity. (The corollary being that there's a meta-joke in the presentation because it is a shortened argument with good face validity.)

I edited out some of the other snarky things I said before I sent 'em. It's probably just as well. ;)

Anyhow, in case anyone reading this hasn't seen the original presentation before, I'll just embed it here:



In case the embed doesn't show up for you, here's a link: How does biology explain the low numbers of women in computer science? Hint: it doesn't.

Enjoy!
terriko: (Default)
2013-08-07 04:29 pm

Congratulations to Python's Google Summer of Code students and mentors!

Congratulations to all 36 of Python's students and our many mentors; everyone passed midterms and will be continuing for the second half of the summer!

The midterms wrapped up while I was still recovering from surgery, so I've only just this week started going through the midterm reports submitted by students and mentors. It's a real treat to hear stories from students about how helpful their mentors and communities have been, how they've been able to bring perspective to hard problems and help students reshape their ideas and learn. The mentors have told stories about students who were clever, thought deeply about problems, and were willing to adapt to work better with their communities.

I'm really looking forwards to seeing what our students produce in the second half of the summer! If you're curious, don't forget that you can check out the aggregated blogs from all of Python's GSoC students. And students, don't forget that though we took some time off so everyone could work on midterms, we're now in week 8, which means you should have another blog post up by Monday, August 12.
terriko: (Default)
2013-07-14 12:42 pm

Mailman Virtual Hackathon

We're having a Mailman virtual hackathon right now in #mailman on freenode. The plan is to run until around 2300 UTC today, so another 4h or so. Link for figuring out what that means in your time zone.

We're doing a variety of things: bug triage and fixing, discussion of architecture, new feature development, helping each other with any blocking problems, spouting off crazy new ideas, code review and merging, etc. We're especially hoping to make sure we clear any issues we can relating to GSoC projects, but there's plenty of work to go around. New folk are welcome too.

If you don't read this until after the fact, don't despair! There will likely be another such hackathon next Sunday, July 21. Keep an eye on the mailman-developers list for more details.
terriko: (Default)
2013-07-02 12:10 am

What should you say in your status updates?

The Google Summer of Code students working under the Python umbrella are required to blog about their work over the summer, at least bi-weekly. It helps me keep track of how the students are doing, and hopefully helps their mentors keep track as well. I've just emailed a bunch of folk whose blog posts for the first two weeks are now late, and I included the following list of questions to get them thinking about what to write:

1. What have you accomplished the past two weeks (list specific items accomplished)?
2. What issues or roadblocks have you encountered the past two weeks?
3. Have they been resolved, and if so, how?
4. Do any of the issues or roadblocks still exist and what steps have been taken to resolve them?
5. Is further assistance necessary to resolve existing issues?
6. What do you plan to accomplish in the next two weeks?
7. How does your progress compare to your project schedule?

This list is the one that Systers uses for their required weekly status updates, and it's one that I've found very useful as a guide even for my own status updates at my day job. So I figured I'd post it here in case any of you are stumped on what to include in your next status update!
terriko: (Default)
2013-07-01 04:25 pm

Quick reminder for PSF GSoC students: Your first blog posts are due today!

Just a reminder for all the Python Software Foundation Google Summer of Code students: you are required to blog, and although I've given you two weeks to settle in, you need to have at least one blog post written by today if you're hoping to pass this term. You've got a few hours before I start reading and sending emails cc'ing your mentors, so if you haven't started yet, hop to it!
terriko: (Default)
2013-06-09 10:07 pm

Python student blogs

One of the things that Python asks of all students under our "umbrella" is that they blog regularly about their projects. This helps me keep track of how all the students are doing, and helps advertise the interesting work they'll be doing to a larger community. I've set up a blog aggregator here for Python's Summer of Code Updates and you can see that folk are already talking about their projects as they settle in.

Coding starts June 17th. Here's to a great summer!
terriko: (Pi)
2013-06-09 06:18 pm

Welcome Summer of Code 2013 students!

The Python Software Foundation has 36 Google Summer of Code students starting next week!

If you'd like to learn more about any of the student projects as they were proposed, you can also see the list and descriptions on the GSoC Website. But here's a list, grouped by project:


Core Python
Phil Webster, IDLE Improvements
Jayakrishnan Rajagopalasarma, IDLE Improvements




ASCEND
Ksenija Bestuzheva, ASCEND: dynamic modelling improvements
Pallav Tinna, Porting to gtk3 and GUI improvements




Astropy
Madhura Parikh, Astropy: Develop the Astroquery toolkit into a coherent package
Axel Donath, AstroPy: Extending the functionality of the photutils package.



GNU Mailman
Manish Gill, Mailman: Authenticated REST-API in Postorius/Django.
Abhilash Raj, GNU Mailman - Integration of OpenPGP




Kivy
Abhinav, Kivy: Kivy Designer
Ivan Pusic, PyOBJus



MNE-Python
Mainak Jas, Real-time Machine Learning for MEG in MNE-Python
Roman Goj, MNE-Python: Implement time-frequency beamformers




OpenHatch
David Lu, Data Driven Mentorship App
Tarashish Mishra, OpenHatch: Rewrite training missions using oppia (Training missions, version 2)



PyDy
Tarun Gaba, PyDy: Visualization of the simulated motion of multibody systems




PyPy
Manuel Jacob, Implementing Python 3.3 features for PyPy
Tyler Wade, wxPython Bindings for PyPy using CFFI




Pyramid
Andraž Brodnik, Better Debug tools
Domen Kožar, Substance D improvements




PySoy
Juhani Åhman, PySoy: Improve Android and HTML5 Soy clients




Scikit-Image
Chintak Sheth, scikit-image: Image Inpainting for Restoration
Marc de Klerk, scikit-image: Segmentation Algorithms as a basis for an OpenCL feasible study
Ankit Agrawal, scikit-image : Implementation of STAR and Binary Feature Detectors and Descriptors



Scikit-learn
Kemal Eren, scikit-learn: Biclustering algorithms, scoring, and data generation
Nicolas Trésegnie, Scikit-learn : online low rank matrix completion


SciPy
Surya Kasturi, SciPy: Improving functionality and Maintainability of SciPy Central
Arink Verma, SciPy/NumPy : Performance parity between numpy arrays and Python scalars
Blake Griffith, Improvements to the sparse package of Scipy: support for bool dtype and better interaction with NumPy




SfePy
Ankit Mahato, SfePy: Enhancing the solver to simulate solid-liquid phase change phenomenon in convective-diffusive situations


Statsmodels
Ana Martínez Pardo, Statsmodels: Discrete choice models
Chad Fulton, Statsmodels: Time Series Analysis Extensions (esp. regime-switching models)


SunPy
Michael J. Malocha, SunPy - Interfacing with Heliophysics Databases
Simon Liedtke, SunPy: Database of local data



Tahoe-LAFS
Mark Berger, Upload Strategy of Happiness in Tahoe-LAFS


Twisted
Shiyao Ma, Twisted: Switching to Formal Parsers
Kai Zhang, Twisted: Deferred Cancellation

We had a great number of talented applicants and I only wish we'd been able to take more of them. Congratulations to those accepted and to the rest of you, I hope you'll apply again next year!