Wednesday, April 26, 2017

Who Contributes to PostgreSQL Development?

In a talk which I gave at PGCONF.IN and, in a shorter version, at PGCONF.US, I had a few slides on who contributes to PostgreSQL development.  Here, I'd like to present a slightly expanded version of the information which was in the talk.  The information in this post considers calendar year 2016 and comes from two sources.

First, I went through the PostgreSQL commit log for 2016, manually tagged each commit by principal author, and recorded the number of new lines of code added by that commit based on git diff --stat -w -M, options which are intended to suppress (more or less successfully) whitespace-only changes and changes due to file renames.  I also manually eliminated a few large mechanical commits, principally translation updates.  Second, Thom Brown extracted the authors of every email sent to the pgsql-hackers mailing list during 2016, and I then cleaned that up and normalized the names in an attempt to make sure that all emails sent by the same person were counted under one name.  Note that, because this data is all for calendar year 2016, it includes the end of the PostgreSQL 9.6 development cycle and the beginning of the PostgreSQL 10 development cycle.
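
As a rough illustration of the line-counting step, here is a minimal Python sketch that pulls the number of added lines out of the summary line printed by git diff --stat.  This is not the script used for this post; the function name and the sample strings are made up for the example, and the tagging of principal authors was still done by hand.

```python
import re

# Matches the "N insertion(s)(+)" part of a `git diff --stat` summary line.
_INSERTIONS = re.compile(r"(\d+) insertions?\(\+\)")


def insertions_from_stat_summary(summary: str) -> int:
    """Return the number of added lines reported in a --stat summary line.

    Returns 0 when the summary mentions no insertions (for example, a
    commit that only deletes lines).
    """
    match = _INSERTIONS.search(summary)
    return int(match.group(1)) if match else 0


# Hypothetical sample lines, in the format git prints.
print(insertions_from_stat_summary(" 3 files changed, 120 insertions(+), 7 deletions(-)"))
print(insertions_from_stat_summary(" 1 file changed, 4 deletions(-)"))
```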

I feel that this data, taken together, presents a reasonable view of who is contributing to PostgreSQL development.  From the commit log data, we can see who is writing code, and also who is committing that code when it gets written.  From the email counts, we can see who is participating in mailing list discussions, which captures - at least to some degree - the work of reviewing patches, providing feedback on designs, reporting problems, etc.  Neither measure is perfect; notably, anyone who was frequently the second author on a patch might be under-represented in these numbers, and two people could have written the same number of emails yet one of them might have written much more detailed, thoughtful, and useful emails.  Nonetheless, I believe these numbers do a fairly good job of capturing who did the work of moving PostgreSQL development forward during calendar year 2016.

Note that this considers only core development.  Many other people contribute by working on projects such as pgAdmin, pgpool, pgbouncer, and various PostgreSQL connectors; others contribute to user education, advocacy, web site maintenance, and other efforts.  I think it would be useful to see statistics on those types of contributions as well, but I leave it to people more familiar with those areas to judge how such contributions would be best measured.

Disclaimers aside, and before we get into the details, here are some quick overall statistics:
  • In 2016, 141 people contributed at least 1 new line of code to PostgreSQL.  37 of those people account for 90% of the new lines of code contributed to PostgreSQL in 2016, and 14 of them account for 66% of the new lines of code contributed to PostgreSQL during 2016.
  • In 2016, 18 committers committed at least one patch for which they were not the principal author.  90% of the lines of code committed by someone other than the principal author were committed by 6 committers, and 66% of those lines were committed by 2 committers.
  • In 2016, 528 people (modulo duplicate email addresses that I couldn't identify as belonging to the same person) sent at least 1 email to pgsql-hackers.  90% of those emails were sent by 78 people, and 66% of those emails were sent by 23 people.
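
Figures of the form "N people account for 90%" can be computed by sorting contributors from largest to smallest and counting how many are needed to reach the threshold.  The Python sketch below shows one way to do that; the function name and the sample counts are invented for illustration and are not the 2016 data.

```python
def smallest_group_covering(counts: dict[str, int], fraction: float) -> list[str]:
    """Return the fewest contributors whose counts sum to >= fraction of the total."""
    total = sum(counts.values())
    target = fraction * total
    running = 0
    group = []
    # Take contributors from largest to smallest until the target is reached.
    for name, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        if running >= target:
            break
        group.append(name)
        running += count
    return group


# Made-up counts, for illustration only (not the 2016 data).
sample = {"alice": 50, "bob": 30, "carol": 15, "dave": 5}
print(len(smallest_group_covering(sample, 0.66)))  # alice + bob cover 80 of 100
print(len(smallest_group_covering(sample, 0.90)))  # carol is needed too: 95 of 100
```
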
Now, here are the detailed tables.  First, here are the 37 people who were the principal authors of 90% of the lines of new code contributed during 2016.  Non-committers are marked with an asterisk.  "lines" shows the number of lines of code for which that person was the principal author, "pct_lines" shows that as a percentage of the total lines contributed, and "commits" is the number of commits across which those lines were spread.

 #  |         author         | lines | pct_lines | commits
----+------------------------+-------+-----------+---------
  1 | Tom Lane               | 62077 |     29.20 |     637
  2 | Amit Langote [*]       |  9889 |      4.65 |      30
  3 | Robert Haas            |  9685 |      4.55 |     108
  4 | Stephen Frost          |  9177 |      4.32 |      46
  5 | Teodor Sigaev          |  8345 |      3.92 |      28
  6 | Michael Paquier [*]    |  7778 |      3.66 |     106
  7 | Andres Freund          |  5913 |      2.78 |      61
  8 | David Rowley [*]       |  5582 |      2.63 |      26
  9 | Alexander Korotkov [*] |  5174 |      2.43 |      11
 10 | Peter Eisentraut       |  4877 |      2.29 |     161
 11 | Heikki Linnakangas     |  4378 |      2.06 |      42
 12 | Thomas Munro [*]       |  3535 |      1.66 |      31
 13 | Magnus Hagander        |  3494 |      1.64 |      26
 14 | Amit Kapila [*]        |  3480 |      1.64 |      35
 15 | Kevin Grittner         |  3103 |      1.46 |      23
 16 | Andreas Karlsson [*]   |  3062 |      1.44 |      33
 17 | Bruce Momjian          |  3049 |      1.43 |      27
 18 | Fabien Coelho [*]      |  2768 |      1.30 |      22
 19 | Shigeru Hanada [*]     |  2752 |      1.29 |       3
 20 | Alvaro Herrera         |  2636 |      1.24 |      56
 21 | Jeevan Chalke [*]      |  2454 |      1.15 |       2
 22 | Etsuro Fujita [*]      |  2378 |      1.12 |      21
 23 | Kyotaro Horiguchi [*]  |  2171 |      1.02 |      19
 24 | Masahiko Sawada [*]    |  2129 |      1.00 |      20
 25 | Peter Geoghegan [*]    |  2121 |      1.00 |      25
 26 | Tomas Vondra [*]       |  2084 |      0.98 |      10
 27 | Craig Ringer [*]       |  2063 |      0.97 |      17
 28 | Artur Zakirov [*]      |  1962 |      0.92 |      13
 29 | Andrew Gierth [*]      |  1726 |      0.81 |       5
 30 | Dean Rasheed           |  1627 |      0.77 |      10
 31 | Daniel Vérité [*]      |  1530 |      0.72 |       4
 32 | Emre Hasegeli [*]      |  1497 |      0.70 |       5
 33 | Joe Conway             |  1471 |      0.69 |      10
 34 | Noah Misch             |  1430 |      0.67 |      31
 35 | Jim Nasby [*]          |  1404 |      0.66 |      11
 36 | Petr Jelinek [*]       |  1400 |      0.66 |      10
 37 | Pavel Stehule [*]      |  1254 |      0.59 |       6


Next, here are all of the committers who committed code for which they were not the principal author during 2016; in other words, these are committers who committed code written by non-committers (or, rarely, by another committer).  "lines" is the number of new lines of code added, "pct_lines" is that same number as a percentage of the total, and "commits" is the number of commits across which those lines were spread.

 #  |     committer      | lines | pct_lines | commits
----+--------------------+-------+-----------+---------
  1 | Robert Haas        | 37726 |     40.03 |     241
  2 | Tom Lane           | 25293 |     26.84 |     204
  3 | Alvaro Herrera     |  7611 |      8.08 |      59
  4 | Teodor Sigaev      |  7252 |      7.70 |      32
  5 | Heikki Linnakangas |  4191 |      4.45 |      33
  6 | Peter Eisentraut   |  3588 |      3.81 |      56
  7 | Andres Freund      |  2558 |      2.71 |      22
  8 | Simon Riggs        |  1886 |      2.00 |      21
  9 | Fujii Masao        |  1626 |      1.73 |      21
 10 | Magnus Hagander    |   638 |      0.68 |      30
 11 | Noah Misch         |   533 |      0.57 |      10
 12 | Andrew Dunstan     |   426 |      0.45 |       6
 13 | Kevin Grittner     |   401 |      0.43 |       8
 14 | Stephen Frost      |   381 |      0.40 |       6
 15 | Dean Rasheed       |    56 |      0.06 |       1
 16 | Bruce Momjian      |    49 |      0.05 |      10
 17 | Joe Conway         |    23 |      0.02 |       1
 18 | Michael Meskes     |     5 |      0.01 |       1


Finally, here are the 78 people who sent 90% of the emails to pgsql-hackers in 2016, with the number of emails sent by each and the same as a percentage of the total.

 #  |         author          | emails | pct_emails
----+-------------------------+--------+------------
  1 | Tom Lane                |   2911 |      11.00
  2 | Robert Haas             |   2682 |      10.14
  3 | Michael Paquier         |   1679 |       6.35
  4 | Andres Freund           |   1344 |       5.08
  5 | Alvaro Herrera          |    913 |       3.45
  6 | Amit Kapila             |    789 |       2.98
  7 | Craig Ringer            |    680 |       2.57
  8 | Peter Eisentraut        |    631 |       2.38
  9 | Pavel Stehule           |    583 |       2.20
 10 | Amit Langote            |    562 |       2.12
 11 | Peter Geoghegan         |    551 |       2.08
 12 | Kyotaro Horiguchi       |    502 |       1.90
 13 | Stephen Frost           |    443 |       1.67
 14 | Jim Nasby               |    437 |       1.65
 15 | Bruce Momjian           |    364 |       1.38
 16 | Tomas Vondra            |    349 |       1.32
 17 | Fabien Coelho           |    330 |       1.25
 18 | Magnus Hagander         |    311 |       1.18
 19 | Simon Riggs             |    311 |       1.18
 20 | Ashutosh Bapat          |    293 |       1.11
 21 | Petr Jelinek            |    283 |       1.07
 22 | David Steele            |    270 |       1.02
 23 | Heikki Linnakangas      |    264 |       1.00
 24 | Kevin Grittner          |    262 |       0.99
 25 | Thomas Munro            |    261 |       0.99
 26 | David G. Johnston       |    259 |       0.98
 27 | Etsuro Fujita           |    252 |       0.95
 28 | Haribabu Kommi          |    217 |       0.82
 29 | Noah Misch              |    215 |       0.81
 30 | Masahiko Sawada         |    209 |       0.79
 31 | David Rowley            |    205 |       0.77
 32 | Andrew Dunstan          |    178 |       0.67
 33 | Jeff Janes              |    177 |       0.67
 34 | Joshua D. Drake         |    177 |       0.67
 35 | Fujii Masao             |    162 |       0.61
 36 | Alexander Korotkov      |    155 |       0.59
 37 | Dilip Kumar             |    154 |       0.58
 38 | Joe Conway              |    151 |       0.57
 39 | Takayuki Tsunakawa      |    133 |       0.50
 40 | Tatsuo Ishii            |    127 |       0.48
 41 | Aleksander Alekseev     |    126 |       0.48
 42 | Andreas Karlsson        |    126 |       0.48
 43 | Greg Stark              |    123 |       0.46
 44 | Corey Huinker           |    122 |       0.46
 45 | David Fetter            |    113 |       0.43
 46 | Vitaly Burovoy          |    107 |       0.40
 47 | Merlin Moncure          |    106 |       0.40
 48 | Claudio Freire          |    105 |       0.40
 49 | Julien Rouhaud          |    100 |       0.38
 50 | Teodor Sigaev           |     93 |       0.35
 51 | Kouhei Kaigai           |     91 |       0.34
 52 | Josh Berkus             |     89 |       0.34
 53 | Artur Zakirov           |     87 |       0.33
 54 | Konstantin Knizhnik     |     83 |       0.31
 55 | Anastasia Lubennikova   |     80 |       0.30
 56 | Daniel Vérité           |     80 |       0.30
 57 | Oleksandr Shulgin       |     80 |       0.30
 58 | Yury Zhuravlev          |     79 |       0.30
 59 | Mithun Cy               |     76 |       0.29
 60 | Christoph Berg          |     75 |       0.28
 61 | Andreas Seltenreich     |     74 |       0.28
 62 | Christian Ullrich       |     74 |       0.28
 63 | Pavan Deolasee          |     71 |       0.27
 64 | Vladimir Sitnikov       |     71 |       0.27
 65 | Dean Rasheed            |     68 |       0.26
 66 | Oleg Bartunov           |     67 |       0.25
 67 | Stas Kelvich            |     67 |       0.25
 68 | Jesper Pedersen         |     63 |       0.24
 69 | Thom Brown              |     63 |       0.24
 70 | Karl O. Pinc            |     61 |       0.23
 71 | Robbie Harwood          |     58 |       0.22
 72 | Andrew Gierth           |     55 |       0.21
 73 | Vik Fearing             |     55 |       0.21
 74 | Andrew Borodin          |     53 |       0.20
 75 | Victor Wagner           |     52 |       0.20
 76 | Chapman Flack           |     50 |       0.19
 77 | Fabrízio de Royes Mello |     50 |       0.19
 78 | Rushabh Lathia          |     50 |       0.19
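
The email counts depend on folding different spellings of the same sender into one name.  The cleanup for this post was done by hand; the Python sketch below only illustrates the general idea, and the alias table, function names, and matching rule (ignore case, accents, and extra whitespace) are assumptions made for the example.  Matching on names alone can also merge two different people who share a name, so the results would still need a manual check.

```python
import unicodedata
from collections import Counter

# Hypothetical alias table mapping a comparison key to one canonical name.
ALIASES = {
    "daniel verite": "Daniel Vérité",
}


def _key(name: str) -> str:
    """Build a comparison key: strip accents, collapse whitespace, lowercase."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.split()).lower()


def count_emails(senders: list[str]) -> Counter:
    """Count emails per person, merging senders that share a comparison key."""
    counts = Counter()
    canonical = {}
    for sender in senders:
        key = _key(sender)
        # Prefer the alias table; otherwise keep the first spelling seen.
        name = ALIASES.get(key) or canonical.setdefault(key, sender.strip())
        counts[name] += 1
    return counts
```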


Thanks to all who contributed to PostgreSQL development during 2016!  A database dump of the data used to construct these reports is available for those who may find it useful.

5 comments:

  1. Very interesting statistics! This makes me think that I have to do more...

  2. Cool!
    It might be interesting to have an overview of the enterprises behind the development, or something like that, just to have an idea.

    1. I started doing this for myself but gave up after tagging 68 of 143 authors. Some people are hard to figure out. I do not think the comments here support preformatted text so you can see the result in my Reddit comment.

      https://www.reddit.com/r/PostgreSQL/comments/67uk65/who_contributes_to_postgresql_development/dgtxnt6/

    2. Nice! thanks for the info
