

Planet Python

Last update: September 28, 2015 07:49 AM

September 27, 2015


PyCon

PyConZA 2015: 1 & 2 October, Johannesburg

South Africa’s fourth PyCon kicks off in just three days' time in Johannesburg!

The conference takes place at the University of the Witwatersrand on 1 & 2 October, with sprints at JoziHub on 3 & 4 October.

Schedule highlights and all the other details, including the full schedule, are at za.pycon.org.

September 27, 2015 05:42 PM


Ludovic Gasc

Macro-benchmark with Django, Flask and AsyncIO (aiohttp.web+API-Hour)

Disclaimer: if you have a bias against and/or dislike AsyncIO, please read my previous blog post before starting a war.

Warning: since publishing this article, my first public benchmark, I've received a lot of remarks. Even though I tried to avoid errors and stay as close as possible to the "truth", I made one mistake in this benchmark: no keepalive for Flask and Django. That's why I ran a second benchmark, and API-Hour now participates in the FrameworkBenchmarks contest, to get the most realistic numbers on these questions.
Thanks to everybody who helped me by providing all the pieces of information that improved my knowledge.
Please forgive me; first times are always catastrophic, especially in public ;-)

Context of this macro-benchmark

Today, I propose to benchmark an HTTP daemon based on AsyncIO, and to compare the results with Flask and Django versions.

For those who haven't followed AsyncIO news, aiohttp.web is a light Web framework based on aiohttp. It's like Flask, but with fewer internal layers.
aiohttp is an implementation of HTTP on top of AsyncIO.
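For readers who haven't seen it, a minimal aiohttp.web application looks roughly like this (a hypothetical sketch with a recent aiohttp; the route and handler are mine, not the benchmark's):

```python
from aiohttp import web

async def index(request):
    # Handlers are coroutines: one process interleaves many requests.
    return web.json_response({'hello': 'world'})

def make_app():
    # An Application with one route; no WSGI layer underneath.
    app = web.Application()
    app.router.add_route('GET', '/', index)
    return app

# To actually serve it: web.run_app(make_app(), port=8080)
```

The Flask resemblance is deliberate: a route table and plain handler functions, but everything runs on the AsyncIO event loop.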

Moreover, API-Hour helps you run multiprocess daemons with AsyncIO.
With this tool, we can compare Flask, Django and aiohttp.web under the same conditions.
This benchmark is based on a concrete need of one of our customers: they wanted a REST/JSON API to interact with their telephony server, based on Asterisk.
One of the WebServices returns the list of agents with their status. This WebService is heavily used, because it backs their public Website (which itself has serious traffic) to show who is available.

First, I built an HTTP daemon based on Flask and Gunicorn, which gave honourable results. Later on, I replaced the HTTP part and pushed into production a daemon based on aiohttp.web and API-Hour.
A subset of these daemons is used for this benchmark.
I added a Django version because, with Django and Flask, I certainly cover 90% of the tools used by Python Web developers.

I tried to use the same parameters for each daemon: for example, I obviously use the same number of workers, 16 in this benchmark.

I don't benchmark Django's manage.py runserver or Flask's dev HTTP server; I use Gunicorn, as most people do in production, to compare apples with apples.

Hardware

Network benchmark

I get almost 1 Gbit/s on this network:

On Server:
$ iperf -c 192.168.2.101 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 28.6 MByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 192.168.2.101, TCP port 5001
TCP window size: 28.6 MByte (default)
------------------------------------------------------------
[ 5] local 192.168.2.100 port 24831 connected with 192.168.2.101 port 5001
[ 4] local 192.168.2.100 port 5001 connected with 192.168.2.101 port 16316
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.1 sec 1.06 GBytes 903 Mbits/sec
[ 5] 0.0-10.1 sec 1.11 GBytes 943 Mbits/sec

On Client:
$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 28.6 MByte (default)
------------------------------------------------------------
[ 4] local 192.168.2.101 port 5001 connected with 192.168.2.100 port 24831
------------------------------------------------------------
Client connecting to 192.168.2.100, TCP port 5001
TCP window size: 28.6 MByte (default)
------------------------------------------------------------
[ 6] local 192.168.2.101 port 16316 connected with 192.168.2.100 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0-10.0 sec 1.06 GBytes 908 Mbits/sec
[ 4] 0.0-10.2 sec 1.11 GBytes 927 Mbits/sec


System configuration
It's important to configure your PostgreSQL instance as a production server.
You also need to configure your Linux kernel to handle a lot of open sockets, along with some TCP tricks.
Everything is in the benchmark repository.
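As an illustration only (these keys are standard Linux sysctl knobs, but the values here are examples, not the ones from the benchmark repository), that kind of tuning usually lives in /etc/sysctl.conf:

```shell
# Example /etc/sysctl.conf excerpt for a load-test box:
fs.file-max = 1000000                       # many open file descriptors
net.core.somaxconn = 4096                   # deeper accept() backlog
net.ipv4.ip_local_port_range = 1024 65535   # more ephemeral client ports
net.ipv4.tcp_tw_reuse = 1                   # recycle TIME_WAIT sockets faster
# Apply with: sysctl -p
```

Check the benchmark repository for the exact values used in these runs.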

Client benchmark tool

From my experience with AsyncIO, Apache Benchmark (ab), Siege, Funkload and other old-fashioned HTTP benchmark tools don't hit hard enough for an API-Hour daemon.
For now, I use wrk and wrk2 for benchmarking.
wrk hits as fast as possible, whereas wrk2 hits at a constant rate.
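The difference between the two hit patterns can be sketched in a few lines of Python (a toy illustration, not wrk's actual implementation):

```python
def paced_timestamps(rate_per_s, duration_s):
    """Send times for a constant-rate (wrk2-style) client:
    one request every 1/rate seconds, regardless of responses."""
    interval = 1.0 / rate_per_s
    count = int(duration_s * rate_per_s)
    return [i * interval for i in range(count)]

# A wrk-style client has no schedule at all: it fires the next request
# the instant the previous response arrives, so its rate is simply
# whatever the server can sustain.
schedule = paced_timestamps(rate_per_s=4000, duration_s=0.01)
```

The constant-rate schedule is what makes latency numbers comparable between daemons: every daemon receives exactly the same offered load.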

Metrics observed

I record three metrics:
  1. Requests/sec: the least interesting of the metrics (see below).
  2. Error rate: the sum of all errors (socket timeouts, socket read/write errors, 5XX errors...).
  3. Reactivity: certainly the most interesting of the three; it measures the time our client will actually wait.
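As a hedged illustration (the names are mine, not from the benchmark scripts), the three metrics could be derived from a list of recorded samples like this:

```python
from statistics import mean

def summarize(samples, duration_s):
    """samples: list of (latency_s, is_error) tuples from one run."""
    ok = [lat for lat, err in samples if not err]
    return {
        'requests_per_s': len(ok) / duration_s,          # metric 1
        'errors': sum(1 for _, err in samples if err),   # metric 2
        'avg_latency_s': mean(ok) if ok else None,       # metric 3
    }
```

The average latency is computed over successful requests only; a timed-out request shows up in the error count instead.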

WebServices daemons

You can find all the source code in the API-Hour repository: https://github.com/Eyepea/API-Hour/tree/master/benchmarks
Each daemon has at least two WebServices.
On the Flask daemon, I added an /agents_with_pool endpoint to use a database connection pool with Flask, but it isn't really good, as you'll see later.
On the Django daemon, I added an /agents_with_orm endpoint to measure the overhead of using the Django ORM instead of raw SQL. Warning: I didn't find a way to generate exactly the same query.

Methodology

Each daemon runs alone, to preserve resources.
Between runs, the daemon is restarted to make sure one test doesn't pollute the next.

First turn

At the beginning, to get an idea of the maximum number of HTTP queries each daemon can support, I run a quick (30-second) attack on localhost.

Warning! This benchmark doesn't represent what you would get in production, because there is no network limitation or latency; it's only for calibration.

Simple JSON document

In each daemon folder of the benchmarks repository, you can read the raw output of each wrk run.
To simplify reading, I summarize the captured values in a table and graphs:


                        Requests/s   Errors   Avg Latency (s)
Django+Gunicorn              70598     4489              7.7
Flask+Gunicorn               79598     4433            13.16
aiohttp.web+API-Hour        395847        0             0.03

[Graphs: Requests by second (higher is better); Errors (lower is better); Latency in seconds (lower is better)]

Agents list from database


                        Requests/s   Errors   Avg Latency (s)
Django+Gunicorn                583     2518            0.324
Django ORM+Gunicorn            572     2798            0.572
Flask+Gunicorn                 634     2985            13.16
Flask (connection pool)       2535    79704            12.09
aiohttp.web+API-Hour          4179        0            0.098

[Graphs: Requests by second (higher is better); Errors (lower is better); Latency in seconds (lower is better)]


Conclusions for the next round

Under high load, Django doesn't behave the same way as Flask: both handle more or less the same request rate, but Django penalizes the overall latency of HTTP queries less. The drawback is that its slow HTTP queries are very slow (26.43 s for Django compared to 13.31 s for Flask).
I removed the Django ORM test from the next round, because the generated SQL query isn't exactly the same and the performance difference with a raw SQL query is negligible.
I also removed the Flask DB connection pool, because its error rate is too high compared to the other tests.

Second round

Here, I use wrk2, and changed the run time to 5 minutes.
A longer run time is very important, because resource availability can change over time.
There are at least two reasons for this:

1. Your test environment runs on top of an OS which continues its activity during the test.
Therefore, you need a long run to be less sensitive to transient use of your test machine's resources by other things,
like another OS daemon or a cron job triggered meanwhile.

2. The ramp-up of your test will gradually consume more resources at different levels: at the level of your Python scripts and libraries,
as well as at the level of your OS / (virtual) machine.
This decrease in available resources is not necessarily instantaneous, nor linear.
It is a typical source of bad surprises after deployment to production.
Here too, to be as close as possible to a production scenario, you need to give your test time to reach a steady state, eventually saturating some resources.
Ideally you'd saturate the network first (which in this case is like winning the jackpot).

Here, I'm testing at a constant 4000 queries per second, this time over the network.

Simple JSON document


                        Requests/s   Errors   Avg Latency (s)
Django+Gunicorn               1799    26883               97
Flask+Gunicorn                2714    26742               52
aiohttp.web+API-Hour          3995        0            0.002

[Graphs: Requests by second (higher is better); Errors (lower is better); Latency in seconds (lower is better)]

Agents list from database


                        Requests/s   Errors   Avg Latency (s)
Django+Gunicorn                278    37480            141.6
Flask+Gunicorn                 304    40951            136.8
aiohttp.web+API-Hour          3698        0             7.84

[Graphs: Requests by second (higher is better); Errors (lower is better); Latency in seconds (lower is better)]

(Extra) Third round

For fun, I used the same setup as the second round, but with only 10 requests/second for 30 seconds, to see whether, under a low load, the sync daemons could be quicker, since AsyncIO adds some overhead.

Agents list from database


                        Requests/s   Errors   Avg Latency (s)
Django+Gunicorn                 10        0          0.01936
Flask+Gunicorn                  10        0          0.01874
aiohttp.web+API-Hour            10        0          0.00642

[Graph: Latency in seconds (lower is better)]

Conclusion

AsyncIO with aiohttp.web and API-Hour increases the number of requests per second but, more importantly, you get no socket errors and no 5XX errors, and the waiting time for each user is much better, even under low load. This benchmark uses an ideal network setup, and therefore doesn't cover the much worse scenario where your clients reach your Website over a slow network (think smartphone users).

It has been said often: if your webapp is your business, reducing waiting time is a key win for you.

Some clues to improve AsyncIO performances

Even if this looks like good performance, we shouldn't rest on our laurels; we can certainly find more optimizations:

  1. Use an alternative event loop: I've tried replacing the AsyncIO event loop and network layer with aiouv and quamash. For now, it doesn't really have a huge impact; maybe it will in the future.
  2. Multiplex protocols from frontend to backend: HTTP/2 is a multiplexed protocol, which means you can stack several HTTP queries without waiting for the first response. This pattern should increase AsyncIO performance, but it must be validated by a benchmark.
  3. If you have another idea, don't hesitate to post it in the comments.

Don't take architectural decisions based on micro-benchmarks

It's important to be very cautious with benchmarks, especially micro-benchmarks. Check several different benchmarks, using different scenarios, before settling on an architecture for your application.

Don't forget this is all about IO-bound

If I worked for an organisation with a lot of CPU-bound projects (a scientific organisation, for example), my speech would be totally different.
But my day-to-day challenges are more about I/O than CPU, probably like those of most Web developers.

Don't simply take me as a mentor. The needs and problems of one person or organisation are not necessarily the same as yours, even if that person is considered a "guru" in one open-source community or another.

We should all try to keep a rational, scientific approach, rather than a religious one, when selecting our tools.
I hope this post gives you some ideas to experiment with. Feel free to share your tips for increasing performance; I'd be glad to include them in my benchmarks!

I hope that these benchmarks will be an eye-opener for you.

September 27, 2015 02:27 PM


Carl Trachte

MSSQL sqlcmd -> bcp csv dump -> Excel

A couple of months back I had a one-off assignment to dump some data from a vendor-provided relational database to a csv file and then from there to Excel (essentially a fairly simple ETL - extract, transform, load - exercise).  It was a little trickier than I had planned.  Disclaimer:  this may not be the best approach, but it worked . . . at least twice . . . on two different computers, and that was sufficient.

Background:

Database:  the relational database provided by the vendor is the back end to a graphic mine planning application.  It does a good job of storing geologic and mine planning data, but requires a little work to extract the data via SQL queries.

Weighted Averages:  specifically, the queries are required to do tonne-weighted averages and binning.  Two areas that I've worked in, mine planning and mineral processing (mineral processing could be considered a subset of metallurgy or chemical engineering), require a lot of work with weighted averages.  Many of the database programming examples online deal with retail and focus on sales in the form of sums of sales by location.  The weighted average by tonnes or gallons of flow requires a bit more SQL code.
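The tonne-weighted average itself is simple arithmetic; here is a minimal Python sketch of the same sum(tonnes * grade) / sum(tonnes) calculation the SQL further on performs (function name and sample data are illustrative):

```python
def weighted_average(pairs):
    """pairs: iterable of (tonnes, grade) tuples.
    Returns sum(t * g) / sum(t), the tonne-weighted grade."""
    pairs = list(pairs)
    total_tonnes = sum(t for t, _ in pairs)
    return sum(t * g for t, g in pairs) / total_tonnes

# e.g. two cuts on the same drift, tonnes and pergium g/tonne:
wavg = weighted_average([(28437.0, 15.23), (13296.0, 4.22)])
```

A plain average of the two grades would overweight the smaller cut; weighting by tonnes is what makes the drift-level numbers physically meaningful.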

Breaking Up the SQL and the CSV Dump Problem:  in order to break the weighted average and any associated binning into smaller, manageable chunks of functionality, I used MSSQL (Microsoft SQL Server) global temporary tables in my queries.  Having my final result set in one of these global temporary tables allowed me to dump it to a csv file using the MSSQL bcp utility.  There are other ways to get a result set and produce a csv file from it with Python.  I wanted to isolate as much functionality within the MSSQL database as possible.  Also, the bcp utility gives some feedback when it fails - this made debugging or troubleshooting the one off script easier, for me, at least.

As far as the SQL goes, I may have been able to do this with a single query without too much trouble.  There are tools within Transact-SQL for pivoting data and doing the sort of things I naively and crudely do with temporary tables.  That said, in real life, the data are seldom this simple and this clean.  There are far more permutations and exceptions.  The real life version of this problem has fourteen temporary tables versus the four shown here.

Sanitized Mock Up Scenario:  there's no need to go into depth on our vendor's database schema or the specific technical problem - both are a tad complicated.  I like doing tonne-weighted averages with code, but it's not everyone's cup of tea.  In the interest of simplifying this whole thing and making it more fun, I've based it on the old Star Trek episode "The Devil in the Dark" about an underground mine on a distant planet.






Mock Data:  we're modeling mined out areas and associated tonnages of rock bearing pergium, gold, and platinum in economic concentrations.  (I don't know what pergium is, but it was worth enough that going to war with Mother Horta seemed like a good idea).  Here is some code to create the tables and fill in the data (highly simplified schema - each mined out area is a "cut").

SQL Server 2008 R2 (Express) - table creation and mock data SQL code.  I'm not showing the autogenerated db creation code - it's lengthy - suffice it to say the database name is JanusVIPergiumMine.  Also, there are no keys in the tables for the sake of simplicity.

USE JanusVIPergiumMine;

CREATE TABLE cuts (
    cutid INT,
    cutname VARCHAR(50),
    monthx VARCHAR(30),
    yearx INT);

CREATE TABLE cutattributes (
    cutid INT,
    attributex VARCHAR(50),
    valuex VARCHAR(50));

CREATE TABLE tonnes(
    cutid INT NULL,
    tonnes FLOAT);

CREATE TABLE dbo.gradesx(
 cutid int NULL,
 gradename varchar(50) NULL,
 gradex float NULL);

DELETE FROM cuts;

INSERT INTO cuts
    VALUES (1, 'HappyPergium1', 'April', 2015),
           (2, 'HappyPergium12', 'April', 2015),
           (3, 'VaultofTomorrow1', 'April', 2015),
           (4, 'VaultofTomorrow2', 'April', 2015),
           (5, 'Children1', 'April', 2015),
           (6, 'Children2', 'April', 2015),
           (7, 'VandenbergsFind1', 'April', 2015),
           (8, 'VandenbergsFind2', 'April', 2015);

DELETE FROM cutattributes;

INSERT INTO cutattributes
    VALUES (1, 'Drift', 'Level23East'),
           (2, 'Drift', 'Level23East'),
           (3, 'Drift', 'Level23West'),
           (4, 'Drift', 'Level23West'),
           (5, 'Drift', 'BabyHortasCutEast'),
           (6, 'Drift', 'BabyHortasCutEast'),
           (7, 'Drift', 'BabyHortasCutWest'),
           (8, 'Drift', 'BabyHortasCutWest');

DELETE FROM tonnes;

INSERT INTO tonnes
    VALUES (1, 28437.0),
           (2, 13296.0),
           (3, 13222.0),
           (4, 6473.0),
           (5, 6744.0),
           (6, 8729.0),
           (7, 10030.0),
           (8, 2345.0);

DELETE FROM gradesx;

INSERT INTO gradesx
    VALUES (1, 'Au g/tonne', 6.44),
           (1, 'Pt g/tonne', 0.54),
           (1, 'Pergium g/tonne', 15.23),
           (2, 'Au g/tonne', 7.83),
           (2, 'Pt g/tonne', 0.77),
           (2, 'Pergium g/tonne', 4.22),
           (3, 'Au g/tonne', 0.44),
           (3, 'Pt g/tonne', 3.54),
           (3, 'Pergium g/tonne', 2.72),
           (4, 'Au g/tonne', 0.87),
           (4, 'Pt g/tonne', 2.87),
           (4, 'Pergium g/tonne', 1.11),
           (5, 'Au g/tonne', 12.03),
           (5, 'Pt g/tonne', 0.33),
           (5, 'Pergium g/tonne', 10.01),
           (6, 'Au g/tonne', 8.72),
           (6, 'Pt g/tonne', 1.38),
           (6, 'Pergium g/tonne', 5.44),
           (7, 'Au g/tonne', 7.37),
           (7, 'Pt g/tonne', 1.59),
           (7, 'Pergium g/tonne', 4.05),
           (8, 'Au g/tonne', 3.33),
           (8, 'Pt g/tonne', 0.98),
           (8, 'Pergium g/tonne', 3.99);

Python Code to Run the Dump/ETL to CSV:  this is essentially a series of subprocess calls to MSSQL's sqlcmd and bcp.  What made this particularly brittle and hairy is the manner in which the lifetime of temporary tables is determined in MSSQL.  To get the temporary table with my results to persist, I had to wrap its creation inside a single process.  I'm ignorant as to the internal workings of buffers and memory here, but the MSSQL sqlcmd commands do not execute or write to disk exactly when you might expect them to.  Nothing is really completed until the process hosting sqlcmd is killed.

At work I actually got the bcp format file generated on the fly - I wasn't able to reproduce this behavior for this mock exercise.  Instead, I generated a bcp format file for the target table dump "by hand" and put the file in my working directory.

As I show further on, this SQL data dump will be run from a button within an Excel spreadsheet.

Mr. Spock, or better said, Horta Mother says it best:


Subprocesses, sqlcmd, bcp, Excel . . .

PAAAAAIIIIIIIN!



#!C:\Python34\python

# blogsqlcmdpull.py

# XXX
# Changed my laptop's name to MYLAPTOP.
# Yours will be whatever your computer
# name is.

import os
import subprocess as subx
import shlex
import time
import argparse

# Need to make sure you are in proper Windows directory.
# Can vary from machine to machine based on
# environment variables.
# Googled StackOverflow.
# 5137497/find-current-directory-and-files-directory
EXCELDIR = os.path.dirname(os.path.realpath(__file__))
os.chdir(EXCELDIR)
print('\nCurrent directory is {:s}'.format(os.getcwd()))

parser = argparse.ArgumentParser()
# 7 digit argument like 'Apr2015'
# Feed in at command line
parser.add_argument('monthyear',
    help='seven digit, month abbreviation (Apr2015)',
    type=str)
args = parser.parse_args()
MONTHYEAR = args.monthyear

# Use Peoplesoft/company id so that more than
# one user can run this at once if necessary
# (note:  will not work if one user tries to
#         run multiple instances at the same
#         time - theoretically <not tested>
#         tables will get mangled and data
#         will be corrupt.)
USER = os.getlogin()

CSVDUMPNAME = 'csvdumpname'
CSVDUMP = 'nohandjamovnumbersbcp'
CSVEXT = '.csv'
HOMESERVERNAME = 'homeservername'
LOCALSERVER = r'MYLAPTOP\SQLEXPRESS'
USERNAME = 'username'

# Need to fill in month, year
# with input from Excel spreadsheet.
QUERYDICT = {'month':"'{:s}'",
             'year':0,
             USERNAME:USER}

# For sqlcmd and bcp
ERRORFILENAME = 'errorfilename'
STDOUTFILENAME = 'stdoutfilename'
ERRX = 'sqlcmderroutput.txt'
STDOUTX = 'sqcmdoutput.txt'
EXIT = '\nexit\n'
UTF8 = 'utf-8'
GOX = '\nGO\n'

# 2 second pause.
PAUSEX = 2
SLEEPING = '\nsleeping {pause:d} seconds . . .\n'

# XXX - Had to generate this bcp format file
#       from table in MSSQL Management Studio -
#       dos command line:
# bcp ##TARGETX format nul -f test.fmt -S MYLAPTOP\SQLEXPRESS -t , -c -T

# XXX - you can programmatically extract
#       column names from the bcp format
#       file or
#       you can dump them from SQLServer
#       with a separate query in bcp -
#       I have done neither here
#       (I hardcoded them).
FMTFILE = 'formatfile'
COLBCPFMTFILE = 'bcp.fmt'

CMDLINEDICT = {HOMESERVERNAME:LOCALSERVER,
               'exit':EXIT,
               CSVDUMPNAME:CSVDUMP,
               ERRORFILENAME:ERRX,
               STDOUTFILENAME:STDOUTX,
               'go':GOX,
               USERNAME:USER,
               'pause':PAUSEX,
               FMTFILE:COLBCPFMTFILE}

# Startup for sqlcmd interactive mode.
SQLPATH = r'C:\Program Files\Microsoft SQL Server'
SQLPATH += r'\100\Tools\Binn\SQLCMD.exe'
SQLCMDEXE = [SQLPATH]
SQLCMDARGS = shlex.split(
    '-S{homeservername:s}'.format(**CMDLINEDICT),
    posix=False)
SQLCMDEXE.extend(SQLCMDARGS)

BCPSTR = ':!!bcp "SELECT * FROM ##TARGETX{username:s};" '
BCPSTR += 'queryout {csvdumpname:s}.csv -t , '
BCPSTR += '-f {formatfile:s} -S {homeservername:s} -T'
BCPSTR = BCPSTR.format(**CMDLINEDICT)

def cleanslate():
    """
    Delete files from previous runs.
    """
    # XXX - only one file right now.
    files = [CSVDUMP + CSVEXT]
    for filex in files:
        if os.path.exists(filex) and os.path.isfile(filex):
            os.remove(filex)
    return 0

MONTHS = {'Jan':'January',
          'Feb':'February',
          'Mar':'March',
          'Apr':'April',
          'May':'May',
          'Jun':'June',
          'Jul':'July',
          'Aug':'August',
          'Sep':'September',
          'Oct':'October',
          'Nov':'November',
          'Dec':'December'}

def parseworkbookname():
    """
    Get month (string) and year (integer)
    from name of workbook (Apr2015).
    Return as month, year 2 tuple.
    """
    # XXX
    # Write this out - will eventually
    # need error checking/try-catch
    monthx = MONTHS[MONTHYEAR[:3]]
    yearx = int(MONTHYEAR[3:])
    return monthx, yearx

# Global Temporary Tables
TONNESTEMPTBL = """
CREATE TABLE ##TONNES{username:s} (
    yearx INT,
    monthx VARCHAR(30),
    cutid INTEGER,
    drift VARCHAR(30),
    tonnes FLOAT);
"""

FILLTONNES = """
USE JanusVIPergiumMine;

DECLARE @DRIFT CHAR(5) = 'Drift';
INSERT INTO ##TONNES{username:s}
    SELECT cutx.yearx,
           cutx.monthx,
           cutx.cutid,
           cutattrx.valuex AS drift,
           tonnesx.tonnes
    FROM cuts cutx
        INNER JOIN cutattributes cutattrx
            ON cutx.cutid = cutattrx.cutid
        INNER JOIN tonnes tonnesx
            ON cutx.cutid = tonnesx.cutid
    WHERE cutx.yearx = {year:d} AND
          cutx.monthx = {month:s} AND
          cutattrx.attributex = @DRIFT;
"""

GRADESTEMPTBL = """
CREATE TABLE ##GRADES{username:s} (
    cutid INTEGER,
    drift VARCHAR(30),
    gradenamex VARCHAR(50),
    graden FLOAT);

"""

FILLGRADES = """
USE JanusVIPergiumMine;
DECLARE @DRIFT CHAR(5) = 'Drift';
INSERT INTO ##GRADES{username:s}
    SELECT cutx.cutid,
           cutattrx.valuex AS drift,
           gradesx.gradename,
           gradesx.gradex
    FROM cuts cutx
        INNER JOIN cutattributes cutattrx
            ON cutx.cutid = cutattrx.cutid
        INNER JOIN gradesx
            ON cutx.cutid = gradesx.cutid
    WHERE cutx.yearx = {year:d} AND
          cutx.monthx = {month:s} AND
          cutattrx.attributex = @DRIFT;
"""

# Sum and tonne-weighted averages
MONTHLYPRODDATASETTEMPTBL = """
CREATE TABLE ##MONTHLYPRODDATASET{username:s} (
    yearx INT,
    monthx VARCHAR(30),
    drift VARCHAR(30),
    tonnes FLOAT,
    gradename VARCHAR(50),
    grade FLOAT);
"""

FILLMONTHLYPRODDATASET = """
INSERT INTO ##MONTHLYPRODDATASET{username:s}
    SELECT tonnesx.yearx,
           tonnesx.monthx,
           tonnesx.drift,
           SUM(tonnesx.tonnes) AS tonnes,
           gradesx.gradenamex AS gradename,
           SUM(tonnesx.tonnes * gradesx.graden)/
           SUM(tonnesx.tonnes) AS graden
    FROM ##TONNES{username:s} tonnesx
        INNER JOIN ##GRADES{username:s} gradesx
            ON tonnesx.cutid = gradesx.cutid
    GROUP BY tonnesx.yearx,
             tonnesx.monthx,
             tonnesx.drift,
             gradesx.gradenamex;
"""

# Pivot
TARGETXTEMPTBL = """
CREATE TABLE ##TARGETX{username:s} (
    yearx INT,
    monthx VARCHAR(30),
    drift VARCHAR(30),
    tonnes FLOAT,
    pergium FLOAT,
    Au FLOAT,
    Pt FLOAT);
"""

FILLTARGETX = """
DECLARE @PERGIUM CHAR(15) = 'Pergium g/tonne';
DECLARE @GOLD CHAR(10) = 'Au g/tonne';
DECLARE @PLATINUM CHAR(10) = 'Pt g/tonne';
INSERT INTO ##TARGETX{username:s}
    SELECT mpds.yearx,
           mpds.monthx,
           mpds.drift,
           MAX(mpds.tonnes) AS tonnes,
           MAX(perg.grade) AS pergium,
           MAX(au.grade) AS Au,
           MAX(pt.grade) AS Pt
    FROM ##MONTHLYPRODDATASET{username:s} mpds
        INNER JOIN ##MONTHLYPRODDATASET{username:s} perg
            ON perg.drift = mpds.drift AND
            perg.gradename = @PERGIUM
        INNER JOIN ##MONTHLYPRODDATASET{username:s} au
            ON au.drift = mpds.drift AND
            au.gradename = @GOLD
        INNER JOIN ##MONTHLYPRODDATASET{username:s} pt
            ON pt.drift = mpds.drift AND
            pt.gradename = @PLATINUM
    GROUP BY mpds.yearx,
             mpds.monthx,
             mpds.drift
    ORDER BY mpds.drift;
"""

# 1) Create global temp tables.
# 2) Fill global temp tables.
# 3) Get desired result set into the target global temp table.
# 4) Run bcp against target global temp table.
# 5) Drop global temp tables.
CREATETABLES = {1:TONNESTEMPTBL,
                2:GRADESTEMPTBL,
                3:MONTHLYPRODDATASETTEMPTBL,
                4:TARGETXTEMPTBL}
FILLTABLES = {1:FILLTONNES,
              2:FILLGRADES,
              3:FILLMONTHLYPRODDATASET,
              4:FILLTARGETX}

def getdataincsvformat():
    """
    Retrieve data from MSSQL server.
    Dump into csv text file.
    """
    numtables = len(CREATETABLES)
    with open('{errorfilename:s}'.format(**CMDLINEDICT), 'w') as e:
        with open('{stdoutfilename:s}'.format(**CMDLINEDICT), 'w') as f:
            sqlcmdproc = subx.Popen(SQLCMDEXE, stdin=subx.PIPE,
                    stdout=f, stderr=e)
            for i in range(numtables):
                cmdx = (CREATETABLES[i + 1]).format(**QUERYDICT)
                print(cmdx)
                sqlcmdproc.stdin.write(bytes(cmdx +
                    '{go:s}'.format(**CMDLINEDICT), UTF8))
                print(SLEEPING.format(**CMDLINEDICT))
                time.sleep(PAUSEX)
            for i in range(numtables):
                cmdx = (FILLTABLES[i + 1]).format(**QUERYDICT)
                print(cmdx)
                sqlcmdproc.stdin.write(bytes(cmdx +
                    '{go:s}'.format(**CMDLINEDICT), UTF8))
                print(SLEEPING.format(**CMDLINEDICT))
                time.sleep(PAUSEX)
            print('bcp csv dump command (from inside sqlcmd) . . .')
            sqlcmdproc.stdin.write(bytes(BCPSTR, UTF8))
            print(SLEEPING.format(**CMDLINEDICT))
            time.sleep(PAUSEX)
            sqlcmdproc.stdin.write(bytes('{exit:s}'.format(**CMDLINEDICT), UTF8))
    return 0
         
monthx, yearx = parseworkbookname()

# Get rid of previous files.
print('\ndeleting files from previous runs . . .\n')
cleanslate()

# Get month and year into query dictionary.
QUERYDICT['month'] = QUERYDICT['month'].format(monthx)
QUERYDICT['year'] = yearx

getdataincsvformat()

print('done')

It's ugly, but it works.

Keeping with the Horta theme, this would be a good spot for an image break:

Damnit, Jim, I'm a geologist not a database programmer.

You're an analyst, analyze.

Load to Excel:  this is fairly straightforward - COM programming with Mark Hammond and company's venerable win32com.  The only working version of the win32com library on the laptop I'm writing this blog entry on was from a Python 2.5 release that came with an old version of our mine planning software (MineSight/Hexagon) - the show must go on!

#!C:\MineSight\mpython

# blognohandjamnumberspython2.5.py

# mpython is Python 2.5 on this machine.
# Had to remove collections.namedtuple
# (used dictionary instead) and new
# string formatting (reverted to use
# of ampersand for string interpolation).

# Lastly, did not have argparse at my
# disposal.

from __future__ import with_statement

"""
Get numbers into spreadsheet
without having to hand jam
everything.
"""


import os
from win32com.client import Dispatch

# Plan on receiving Excel file's
# path from call from Excel workbook.


import sys


# Path to Excel workbook.
WB = sys.argv[1]
# Worksheet name.
WSNAME = sys.argv[2]


BACKSLASH = '\\'

# Looking for data file in current directory.
# (same directory as Python script)
CSVDUMP = 'nohandjamovnumbersbcp.csv'


# XXX - repeated code from data dump file.
CURDIR = os.path.dirname(os.path.realpath(__file__))
os.chdir(CURDIR)
print('\nCurrent directory is %s' % os.getcwd())


# XXX - I think there's a more elegant way to
#       do this path concatenation with os.path.
CSVPATH = CURDIR + BACKSLASH + CSVDUMP


# Fields in csv dump.
YEARX = 'yearx'
MONTHX = 'monthx'
DRIFT = 'drift'
TONNES = 'tonnes'
PERGIUM = 'pergium'
GOLD = 'Au'
PLATINUM = 'Pt'


FIELDS = [YEARX,
          MONTHX,
          DRIFT,
          TONNES,
          PERGIUM,
          GOLD,
          PLATINUM]


# Excel cells.
# Map this to csv dump and brute force cycle to fill in.
ROWCOL = '%s%d'

COLUMNMAP = dict((namex, colx) for namex, colx in
        zip(FIELDS, ['A', 'B', 'C', 'D',
            'E', 'F', 'G']))


EXCELX = 'Excel.Application'

def getcsvdata():
    """
    Puts csv data (CMP dump) into
    a list of data structures
    and returns list.
    """
    with open(CSVPATH, 'r') as f:
        records = []
        for linex in f:
            # XXX - print for debugging/information
            print([n.strip() for n in linex.split(',')])
            records.append(dict(zip(FIELDS,
                (n.strip() for n
                    in linex.split(',')))))
    return records


# Put Excel stuff here.
def getworkbook(workbooks):
    """
    Get handle to desired workbook
    """
    for x in workbooks:
        print(x.FullName)
        if x.FullName == WB:
            # XXX - debug/information print statement
            print('EUREKA')
            break
    return x


def fillinspreadsheet(records):
    """
    Fill in numbers in spreadsheet.

    Side effect function.
    records is a list of named tuples.
    """
    excelx = Dispatch(EXCELX)
    wb = getworkbook(excelx.Workbooks)
    ws = wb.Worksheets.Item(WSNAME)
    # Start entering data at row 4.
    row = 4
    for recordx in records:
        for x in FIELDS:
            column = COLUMNMAP[x]
            valuex = recordx[x]
            cellx = ws.Range(ROWCOL % (column, row))
            # Selection makes pasting of new value visible.
            # I like this - not everyone does.  YMMV
            cellx.Select()
            cellx.Value = valuex
        # On to the next record on the next row.
        row += 1
    # Come back to origin of worksheet at end.
    ws.Range('A1').Select()
    return 0
               
cmprecords = getcsvdata()
fillinspreadsheet(cmprecords)

print('done')

On to the VBA code inside the Excel spreadsheet (macros) that execute the Python code:

Option Explicit


Const EXECX = "C:\Python34\python "
Const EXECXII = "C:\MineSight\mpython\python\2.5\python "
Const EXCELSCRIPT = "blognohandjamnumberspython2.5.py "
Const SQLSCRIPT = "blogsqlcmdpull.py "


Sub FillInNumbers()

    Dim namex As String
    Dim wb As Workbook
    Dim ws As Worksheet
   
    Dim longexecstr As String
   
    Set ws = Selection.Worksheet
    'Try to get current worksheet name to feed values to query.
    namex = ws.Name
   
    longexecstr = EXECXII & " " & ActiveWorkbook.Path
    longexecstr = longexecstr & Chr(92) & EXCELSCRIPT
    longexecstr = longexecstr & ActiveWorkbook.Path & Chr(92) & ActiveWorkbook.Name
    longexecstr = longexecstr & " " & namex

    VBA.Interaction.Shell longexecstr, vbNormalFocus
   
End Sub


Sub GetSQLData()
    Dim namex As String
    Dim ws As Worksheet
   
    Set ws = Selection.Worksheet
    'Try to get current worksheet name to feed values to query.
    namex = ws.Name

    VBA.Interaction.Shell EXECX & ActiveWorkbook.Path & _
        Chr(92) & SQLSCRIPT & namex, vbNormalFocus
   
End Sub


I always use Option Explicit in my VBA code - that's not particularly pythonic, but being pythonic inside the VBA interpreter can be hazardous.  As always, YMMV.

Lastly, a rough demo and a data check.  We'll run the SQL dump from the top button on the Excel worksheet:




And now we'll run the lower button to put the data into the spreadsheet.  It's probably worth noting here that I did not bother doing any type conversions on the text coming out of the SQL csv dump in my Python code.  That's because Excel handles that for you.  It's not free software (Excel/Office) - might as well get your money's worth.


We'll do a check on the first row for tonnes and a pergium grade.  Going back to our original data:

Cuts 1 and 2 belong to the drift Level23East.

Tonnes:

VALUES (1, 28437.0),
       (2, 13296.0),


Total:  41733

Looks good; we know we got the sum of tonnes right.  Now the tonne-weighted average:

Pergium:

(1, 'Pergium g/tonne', 15.23),
(2, 'Pergium g/tonne', 4.22),


(28437 * 15.23 + 13296 * 4.22)/41733 = 11.722

It checks out.  Do a few more checks and send it out to the Janus VI Pergium Mine manager.
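The two checks above can also be scripted instead of done by hand. This is a small sketch (not part of the spreadsheet workflow) that recomputes the sum of tonnes and the tonne-weighted pergium grade for the two Level23East cuts:

```python
# Re-check of the numbers above: sum of tonnes and the
# tonne-weighted pergium grade for cuts 1 and 2 (Level23East).
cuts = [
    # (cut, tonnes, pergium g/tonne)
    (1, 28437.0, 15.23),
    (2, 13296.0, 4.22),
]

total_tonnes = sum(t for _, t, _ in cuts)
weighted_pergium = sum(t * g for _, t, g in cuts) / total_tonnes

print(total_tonnes)                # 41733.0
print(round(weighted_pergium, 3))  # 11.722
```

Same answers as the hand calculation, so the spreadsheet numbers agree with the source data.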

Notes:


This is a messy one-off mousetrap.  That said, this is often how the sausage gets made in a non-programming, non-professional development environment.  We do have an in-house Python developer, Lori.  Often she's given something like this and told to clean it up and make it into an in-house app.  That's challenging.  Ideally, the mining professional writing the one-off and the dev get together and cross-educate vis-à-vis the domain space (mining) and the developer space (programming, good software design and practice).  It's a lot of fun but the first go around is seldom pretty.

Thanks for stopping by.



Leonard Nimoy
1931 - 2015

September 27, 2015 02:56 AM


A. Jesse Jiryu Davis

"Dodge Disaster And March To Triumph As A Mentor", at Software-as-Craft Philadelphia

Old Franklin Library


I'm speaking Tuesday night in Philadelphia at the Software As Craft Meetup. It's 6pm at PromptWorks, 1211 Chestnut Street, Suite 400. I hope you can come!

Craftspeople have always learned by apprenticing with masters. The master-apprentice relationship is older than professions—older than humanity, even. If you want to nurture this ancient tradition, you must be a great mentor to your junior colleagues.

But what if you don't know how? If you aren't confident in your mentorship ability, come to my talk. I've been atrocious at it in the past, then succeeded. I have a failproof plan to make any mentorship triumphant.

September 27, 2015 02:51 AM

September 26, 2015


Nikola

Environment Survey Results and the Future of Python 2.7 in Nikola

Recently, the Nikola team asked the community about the Python versions used to run Nikola. We received a total of 138 responses. The survey results are as follows:

Most users already are on Python 3

Two thirds (66%) of Nikola users are already on Python 3. Some users run both versions concurrently (for example, on different operating systems).

[Chart: Python 2 and 3 users — Python 2 only: 33.33%, Python 3 only: 52.90%, Python 2 and 3: 13.77%]

A diverse selection of operating systems

The most popular operating system is Ubuntu (with 54 users), followed by Debian and Mac OS X (37 users each). There were also 27 Windows users taking part in the survey. This was a multiple-choice question; there were 211 data points in total.

There were multiple operating system versions listed in the second (optional) question about operating systems, ranging from Debian 6 squeeze (oldstable) to the newest versions.

[Chart: Operating systems used by survey participants — Ubuntu, Debian, Mac OS X, Windows, Arch Linux, BSD, Linux Mint, Fedora, RedHat/CentOS, openSUSE, Gentoo, Other]

Switching is not a problem for the community

Out of all the participants, only 10 (7.25%) said that they would not install a Python 3 interpreter, even if Nikola were to require one. More than 76% of our users have a Python 3 interpreter installed already.

[Chart: Python 3 interpreter usage — Have a Python 3 interpreter: 76.81%, Can install Python 3 if required: 15.94%, Refuse to install Python 3: 7.25%]

Comments, concerns and suggestions

After answering the four survey questions, users could leave comments. Many of them were positive and commended the decision. Some participants even claimed that they do not like Python 2.x and it might even deter them from contributing. Many participants asked us to kill 2.7 and to go forward with Python 3.

There were, however, a few participants who had concerns about our decision. A few users cited the inability to migrate due to having existing Python 2-only software. For most people, this is not a problem: Python 2 and 3 can coexist on one machine, with separate packages and binaries. Those users can simply use Nikola with Python 3.x and their legacy software with 2.7, and everything will work without any problems. The only people who could have a problem are those using 2.7-only code in plugins.

Some users also suggested bundling an interpreter with Nikola (as Dropbox does, for example). We believe that this is not a good thing to do. Bundling an interpreter is a lot of work, and Dropbox does it because they want to protect their code (not applicable in an open-source project) and because they have a large non-technical user base (scared by black terminal windows used by h4x0rz). However, there is a partial solution: if you prefer, you can use a Docker image that runs Nikola under Arch Linux, created by Rob Brewer (and which has been blessed by the Nikola team).

Yet another suggestion was to make Nikola available via Homebrew or MacPorts. While there is no package for Nikola available in those repositories, our Getting Started Guide recommends using Homebrew, MacPorts or Fink with virtualenv and pip on OS X, by providing install instructions for those three repositories (using Python 3). If you want a package for one of those systems, you can always contribute your own (if this is possible in those communities) — the Nikola community will be happy to use it.

There were also concerns about the ARM architecture. We have had reports of users running Nikola on ARM–based devices (including one survey participant) — just keep in mind that it is really slow on SD cards (which is the main storage device of Raspberry Pi).

The future of Python 2.7 in Nikola

Taking the results of this survey into consideration, the Nikola developers decided that Python 2.7 support will be dropped in Nikola v8.0.0. This version will be released in early 2016. Before that date, we will migrate all our remaining infrastructure to Python 3.x. The next version of Nikola, v7.7.2, will be released on 2015-10-03 and will display a warning if the user is running Python 2.7. We might keep the compatibility hacks after v8.0.0, but we will not officially support using Python 2.7 with Nikola.

Motivation (added 2015-10-27)

Supporting two Python versions is a lot of work. We have to use various compatibility hacks to make Unicode work properly, and then there are still issues caused by the fact that Python 2 was not built with Unicode in mind. The developers decided that it is time to let Python 2 go and thus make development much easier.

Switching to Python 3.x

If you are running Nikola with Python 2.7, you should switch to Python 3.x soon. Doing so is simple and is a one–time process. You should follow the Getting Started Guide for more information, or read the instructions below:

Windows

  1. Install Python 3.5 from the official website (python.org)
  2. Install virtualenv using py -m pip install virtualenv
  3. Create a virtualenv and activate it (for more information, read virtualenv documentation)
  4. Install lxml and Pillow wheels from Christoph Gohlke’s website (using pip install c:\paths\to\the\two\files.whl)
  5. Install Nikola using pip install "Nikola[extras]"

Mac OS X

Follow the “Installing on OS X” section of the Getting Started Guide to install Nikola and Python from Homebrew, MacPorts or Fink.

Linux

To install Nikola using Python 3.x on Linux, you should first identify your installation method.

If you use a distribution package (e.g. python-nikola from Arch Linux’s AUR, or Fedora’s packages), you should look for the Python 3 version of those packages. If those are not available, you should install Nikola manually and report a bug with your distribution.

If you installed Nikola manually, we recommend creating a virtualenv for it. Please follow the instructions in the Getting Started Guide (you might need to see the troubleshooting hints and adjust them for your OS/Python 3 package name).

(Note that Nikola requires Python 3.3 or newer; if you are running a really old distribution, it might not be available.)

Migrating a site

You can use your existing Nikola site with Python 3, without any special modifications to the code. However, you will likely receive this error when you run nikola build for the first time:

doit.dependency.DatabaseException: Dependencies file in '.doit.db' seems to use an old format or is corrupted.
To fix the issue you can just remove the database file(s) and a new one will be generated.

In case you do, you can just remove the mentioned .doit.db file and run nikola build again. Note that this will lead to rebuilding your site from scratch — but this is a one–time process, and the next rebuild should be an incremental one.

PS: You can also see the results on the Google Forms results summary page. If you want to do your own data analysis, we can share the raw data (.csv) — contact me (Chris Warrick) if you would like to get access. The charts in this post were generated courtesy of pygal, using the :chart: directive, which is built into Nikola.

September 26, 2015 10:00 AM

September 25, 2015


Peter Bengtsson

ElasticSearch, snowball analyzer and stop words

Disclaimer: I'm an ElasticSearch noob. Go easy on me

I have an application that uses ElasticSearch's more_like_this query to find related content. It basically works like this:

>>> index(index, doc_type, {'id': 1, 'title': 'Your cool title is here'})
>>> index(index, doc_type, {'id': 2, 'title': 'About is a cool headline'})
>>> index(index, doc_type, {'id': 3, 'title': 'Titles are your big thing'})

Then you can pick one ID (1, 2 or 3) and find related ones.
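
For context, the shape of a more_like_this query driving this (against the 1.x-era Elasticsearch API used in the rest of this post) would be roughly the following; the field values here are placeholders, and min_term_freq/min_doc_freq are lowered only so a tiny three-document index returns anything. Treat this as a sketch, not the application's actual query:

```json
{
  "query": {
    "more_like_this": {
      "fields": ["title"],
      "like_text": "Your cool title is here",
      "min_term_freq": 1,
      "min_doc_freq": 1
    }
  }
}
```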
We can tell by looking at these three silly examples that 1 and 2 have the words "is" and "cool" in common. 1 and 3 have "title" (stemming taken into account) and "your" in common. However, is there much value in connecting these documents on the words "is" and "your"? I think not. Those are stop words: words like "the", "this", "from", "she", etc.; basically words that are commonly used as "glue" between more unique and specific words.

Anyway, if you index something in ElasticSearch as a text field you get, by default, the "standard" analyzer to analyze the incoming stuff to be indexed. The standard analyzer just splits the words on whitespace. A more compelling analyzer is the Snowball analyzer (original here) which supports intelligent stemming (turning "wife" ~= "wives") and stop words.

The problem is that the snowball analyzer has a very different set of stop words. We did some digging and thought this was the list it bases its English stop words on. But this was wrong. Note that that list has words like "your" and "about" listed there.

The way to find out how your analyzer treats a string and turns it into tokens is to use the _analyze tool. For example:

curl -XGET 'localhost:9200/{myindexname}/_analyze?analyzer=snowball' -d 'about your special is a the word' | json_print
{
  "tokens": [
    {
      "end_offset": 5,
      "token": "about",
      "type": "",
      "start_offset": 0,
      "position": 1
    },
    {
      "end_offset": 10,
      "token": "your",
      "type": "",
      "start_offset": 6,
      "position": 2
    },
    {
      "end_offset": 18,
      "token": "special",
      "type": "",
      "start_offset": 11,
      "position": 3
    },
    {
      "end_offset": 32,
      "token": "word",
      "type": "",
      "start_offset": 28,
      "position": 7
    }
  ]
}

So what you can see is that it finds the tokens "about", "your", "special" and "word", but it ignored the stop words "is", "a" and "the". Hmm... I'm not happy with that. I don't think "about" and "your" are particularly helpful words.

So, how do you define your own stop words and override the one in the Snowball analyzer? Well, let me show you.

In code, I use pyelasticsearch so the index creation is done in Python.

STOPWORDS = (
    "a able about across after all almost also am among an and "
    "any are as at be because been but by can cannot could dear "
    "did do does either else ever every for from get got had has "
    "have he her hers him his how however i if in into is it its "
    "just least let like likely may me might most must my "
    "neither no nor not of off often on only or other our own "
    "rather said say says she should since so some than that the "
    "their them then there these they this tis to too twas us "
    "wants was we were what when where which while who whom why "
    "will with would yet you your".split()
)

def create():
    es = get_connection()
    index = get_index()
    es.create_index(index, settings={
        'settings': {
            'analysis': {
                'analyzer': {
                    'extended_snowball_analyzer': {
                        'type': 'snowball',
                        'stopwords': STOPWORDS,
                    },
                },
            },
        },
        'mappings': {
            doc_type: {
                'properties': {
                    'title': {
                        'type': 'string',
                        'analyzer': 'extended_snowball_analyzer',
                    },
                }
            }
        }
    })

With that in place, now delete your index and re-create it. Now you can use the _analyze tool again to see how it analyzes text on this particular field. But note that to do this we need to know the name of the index we used (so replace {myindexname} in the URL):

$ curl -XGET 'localhost:9200/{myindexname}/_analyze?field=title' -d 'about your special is a the word' | json_print
{
  "tokens": [
    {
      "end_offset": 18,
      "token": "special",
      "type": "",
      "start_offset": 11,
      "position": 3
    },
    {
      "end_offset": 32,
      "token": "word",
      "type": "",
      "start_offset": 28,
      "position": 7
    }
  ]
}

Cool! Now we see that it considers "about" and "your" as stop words. Much better. This is handy too because you might have certain words that are not very common globally, but within your application are heavily repeated and not very useful.

Thank you willkg and Erik Rose for your support in tracking this down!

September 25, 2015 09:03 PM


David MacIver

Future directions for Hypothesis

There’s something going on in the Hypothesis project right now: there are currently three high-quality pull requests open from people who aren’t me, adding new functionality.

Additionally, Alexander Shorin (author of the characters strategy one) has a CouchDB backed implementation of the Hypothesis example database which I am encouraging him to try to merge into core.

When I did my big mic drop post it was very unclear whether this was going to happen. One possible outcome of feature freeze was simply that Hypothesis was going to stabilize at its current level of functionality except for when I occasionally couldn’t resist the urge to add a feature.

I’m really glad it has though. There’s a vast quantity of things I could do with Hypothesis, and particularly around data generation and integrations it’s more or less infinitely parallelisable and doesn’t require any deep knowledge of Hypothesis itself, so getting other people involved is great and I’m very grateful to everyone who has submitted work so far.

And I’d like to take this forward, so I’ve updated the documentation and generally made the situation more explicit:

Firstly, it now says in the documentation that I do not do unpaid Hypothesis feature development. I will happily take sponsorship for new features, but for the rest of it I will absolutely help you every step of the way in writing and designing the feature, but it’s up to the community to actually drive the work.

Secondly, I’ve now labelled all enhancements that I think are accessible for someone else to work on. Some of these are large-ish and people will need me (or, eventually, someone else!) to lend a hand with, but I think they all have the benefit of being relatively self-contained and approachable without requiring too deep an understanding of Hypothesis.

Will this work? Only time (and effort) will tell, but I think the current set of pull requests demonstrates that it can work, and the general level of interest I see from most people I introduce Hypothesis to seems to indicate that it’s got a pretty good fighting chance.

September 25, 2015 01:48 PM


Marc-André Lemburg

Starting a Python blog

Long ago, in the late 1990s, I had a website on Christian Tismer’s Starship Python to show my Python projects to the community and report on new developments, provide tips, hints and small utilities. The site was called “Marc’s Python Pages”:

image

Turning a hobby into business

I then launched my company eGenix.com Software, Skills and Services GmbH in 2000 to market a web application server I had been working on for a couple of years. As it turned out, I was too early with the product. The market was still thriving using CGI scripts, Perl and a couple of static pages to run websites. At the same time, the Internet bubble burst, so it wasn’t exactly perfect timing for starting a DotCom company.

I eventually sold a single license of the application server to Steilmann in Bochum and then turned to consulting work, using the eGenix mx Extensions I had written for the application server as a way to market the company and myself to companies using Python.

This worked reasonably well and I have since run several projects, in-house at clients or outsourced to eGenix, and continued to add new useful commercial products to our portfolio - mostly around the commercial ODBC database Python interface mxODBC I had originally written for the application server.

Commercial and Open Source Software

Incidentally, the mxODBC development in 1997 also triggered the development of a date/time library mxDateTime at the time, since Python had no way of storing date/time values as objects apart from using Unix ticks values. Since it was a basic building block, I open-sourced it, in the same spirit as Python itself was open source (even though the term wasn’t known at the time). mxDateTime became the de facto standard for date/time storage, until Python itself received a datetime module.

I also open sourced several other C extensions for Python, which were all used in the application server, such as mxTextTools, mxTools, mxProxy, etc. - what was then to become the eGenix mx Extensions.

Still enjoying Python and its community

Over 15 years later, I still haven’t lost interest in Python, which I think means something. I continue to enjoy working with it and for it, and I enjoy the community that has developed around Python every single day.

I usually write lots of emails on mailing lists to discuss and stay in touch with people. Lately, I found I was missing a more persistent way of writing down ideas and snippets, something along the lines of what once were the Starship pages, so here you go… hope you’ll enjoy the ride.

If you want to follow the blog, please see the contact page. It’s currently possible to use RSS, Twitter and Tumblr for this.

Enjoy,

Marc-André

PS: The website is not yet complete, e.g. the project pages don’t work yet. I’ll add more content over the next few weeks.

September 25, 2015 07:39 AM


Codementor

Python Q&A with #1 Stack Overflow Python Expert

Martijn Pieters, Stack Overflow Python Legend

Codementor Python expert and Stack Overflow legend Martijn Pieters joined us for a session in office hours to share his insights about Python programming. Here are several questions asked by our viewers, and hopefully you will find Martijn’s responses useful as well!

The text below is a summary done by the Codementor team and may vary from the original video. If you see any issues, please let us know!


 

Object-Oriented Programming Composition (OOD) and How to use it

A composition is any containment relationship, so anything you put inside of something else is a containment. In inheritance, you create a hierarchy of objects that define behavior: for example, a vehicle class can be sub-classed (into car, truck, plane, etc.). However, you will never sub-class a vehicle into a person, though a vehicle can contain people. To understand the difference between containment and inheritance, you can think of them in this way: a car can hold at most 6 people, and a bus can hold about twenty. You can change what a class (car or bus) contains, but you can’t change the class itself. Thus, although containment and inheritance are related in this way, they are different concepts.
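
A tiny sketch of the distinction the speaker describes; the class names and the capacity numbers are just illustrations taken from the analogy, not any particular library:

```python
class Person:
    pass

class Vehicle:
    def __init__(self, capacity):
        self.capacity = capacity
        self.passengers = []          # composition: a Vehicle *contains* Person objects

    def board(self, person):
        if len(self.passengers) < self.capacity:
            self.passengers.append(person)

class Car(Vehicle):                   # inheritance: a Car *is a* Vehicle
    def __init__(self):
        super().__init__(capacity=6)

class Bus(Vehicle):                   # a Bus is also a Vehicle, with a different capacity
    def __init__(self):
        super().__init__(capacity=20)

car = Car()
car.board(Person())
print(isinstance(car, Vehicle), len(car.passengers))  # True 1
```

The is-a relationship lives in the class hierarchy, while the contains relationship lives in the instance attributes, which is exactly why you can change what a vehicle holds without changing what it is.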

Is Mixin in Python a Legacy from Other Languages and is it a Necessary or Useful Feature?

Police Bicycle (License)

I think mixins are an extremely useful feature, due to their ability to create an additional class that defines a different aspect that doesn’t fit in the rest of your structure. The concept of a mixin is somewhat like emergency vehicles, in giving a set of classes a bit of extra functionality. For example, you can have a truck that is also an emergency vehicle and therefore needs a siren. However, not all trucks are emergency vehicles, and other types of vehicles may also be emergency vehicles, such as motorcycles. Motorcycles are part of the police force, and some police even use bicycles these days, but not all motorcycles or bikes are emergency vehicles. Mixin classes can definitely help with this sort of situation.
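
A minimal sketch of that analogy in Python; the class names follow the speaker's example and are not from any real codebase:

```python
class Vehicle:
    pass

class SirenMixin:
    """One aspect (a siren) that can be mixed into any class needing it."""
    def sound_siren(self):
        return "wee-ooo"

class Truck(Vehicle):                      # an ordinary truck: no siren
    pass

class FireTruck(SirenMixin, Truck):        # a truck that is also an emergency vehicle
    pass

class PoliceBicycle(SirenMixin, Vehicle):  # unrelated class, same extra aspect
    pass

print(FireTruck().sound_siren())         # wee-ooo
print(hasattr(Truck(), "sound_siren"))   # False
```

The siren behaviour is defined once and attached to otherwise unrelated branches of the hierarchy, which is the point of a mixin.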

Can Python Optimization Work on Classes with no Assigned Variables?

Python cannot optimize classes that don’t have assigned variables, as it cannot know what your class names refer to until you run the program. Class names themselves are dynamic, so the code could in principle swap out those names into something else. Python Optimization only works on immutable objects.
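One optimisation that can be observed directly is constant folding over immutable values: the compiler folds 1 + 2 at compile time, but leaves names alone, because they can be rebound to anything at runtime. A quick check using the standard compile machinery (the file name "&lt;example&gt;" is arbitrary):

```python
# Expressions over immutable constants are folded at compile time;
# names are left for runtime lookup because they are dynamic.
folded = compile("1 + 2", "<example>", "eval")
dynamic = compile("a + b", "<example>", "eval")

print(3 in folded.co_consts)   # True: 1 + 2 became the constant 3
print(dynamic.co_names)        # ('a', 'b'): resolved only at runtime
```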

Will CPython move from using a stack interpreter to a register-based interpreter to make the use of Python on resource-constrained devices like smartphones more feasible?

Background: Python bytecode currently uses a stack, a piece of memory onto which the most recent values are pushed. A bytecode instruction takes the top one or two values off the stack, and replaces the top of the stack with the result of the expression (e.g. for 1 + 2, the values 1 and 2 go onto the stack; the add bytecode takes them, adds them up, and pushes 3 back onto the stack. The next bytecode may store 3 into a variable or pass it to a function). A register-based interpreter, on the other hand, uses a limited number of registers and stores values there instead of getting them from a stack, so bytecode has to address registers. The advantages are that you can reuse register values, and just-in-time (JIT) compilers can use registers more efficiently than stack interpreters, so it is possible to control memory use better in register-based interpreters. However, a stack is easier to code for.
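
The stack behaviour described above can be inspected with the standard dis module. Exact opcode names vary between CPython versions (e.g. BINARY_ADD vs. BINARY_OP), but the push/operate/push-result shape is visible in any of them:

```python
import dis

# Disassemble "a + b": both names are pushed onto the value stack,
# then a binary-add opcode pops them and pushes the result.
code = compile("a + b", "<example>", "eval")
for instr in dis.Bytecode(code):
    print(instr.opname)
```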

I am not actually a part of the core Python team of developers, so I don’t know what they are thinking about or whether they have ever considered using a register-based interpreter. However, I do know that at Google there was a project called Unladen Swallow, which was a project to see if the Python interpreter could be reworked to perform faster. They ended up staying with a stack rather than a register-based interpreter. Unfortunately, the Unladen Swallow project was discontinued, but a lot of its ideas did go into the PyPy implementation. In conclusion, this question has been thought of and it has undergone some discussion, but I think it is unlikely Python will go through an overhaul to adopt something entirely different.

Q: Why do You Have Such a Strong Opinion against a Following Feature on Stack Overflow?

Preface: On Meta Stack Exchange, Martijn argued against a following feature for Stack Overflow, his point being that Stack Overflow is not a social network and should never be a social network. However, to many beginners, Stack Overflow experts are much clearer than the documentation, so they would literally follow him and other experts with huge reputations, as they would get more out of scrolling through their answers and reading them. Thus, this viewer is curious why Martijn is so opposed to the following feature.

The following feature brings in the danger where people start to focus too much on a subset of experts, which is why I am so against the following feature. I was once new as well. Though my account says I’ve been on Stack Overflow for five years, I didn’t do anything at the very beginning until 2–2.5 years ago. Before that, I only answered a couple of Plone questions here and there, so I’m actually relatively new to the scene. I haven’t been around as long as Jon Skeet, for example. Therefore, as a relative newcomer, I wouldn’t want to have to buy my way into an established network of who people follow. Moreover, Stack Overflow, to me, is all about the content itself with the questions and answers, so I’d rather see a feed of all the questions and highly voted posts, because I’m not the only guy out there writing Python answers.

That said, you can get an RSS feed on all Stack Overflow answerers if you wanted to. There is an RSS icon on every page, where you can just scroll down on the bottom right-hand side and click on them to get the url for the feed. You can even follow all questions and tags. As I learned Python too long ago, I don’t know what sites to recommend for learning Python, but I’d probably follow Planet Python’s RSS feed if I were a beginner.


Other posts in this series with Martijn Pieters:

Martijn Pieters

Need Martijn’s help? Book an 1-on-1 session!

or join us as an expert mentor!

September 25, 2015 04:29 AM

6 Useful Python Libraries Recommended by #1 Stack Overflow Answerer

Martijn Pieters, Stack Overflow Python Legend

Codementor Python expert and Stack Overflow legend Martijn Pieters sat down with us during Office Hours and shared some of the less-well known libraries that he likes to use for web development or daily tasks.

The text below is a summary done by the Codementor team and may vary from the original video. If you see any issues, please let us know!


 

Requests: HTTP for Humans

I am a huge fan of the requests library. If you have to do anything with HTTP requests to other servers, go straight past the urllib that comes with Python and install requests. It is a fantastic tool with a much more intuitive API, which makes handling web requests much easier.

Beautiful Soup

Another great tool for web scraping is Beautiful Soup. When you’re web scraping and looking at web pages to extract information from, you should check this tool out. Beautiful Soup is a great HTML parsing library that makes sense of even messy HTML “soup”.

Robobrowser


I recently came across a new library that marries the aforementioned two tools—Robobrowser. It is based on the requests library and Beautiful Soup, and replaces an older tool called mechanize, which does similar things. Namely, Robobrowser lets you simulate a browser that goes out to the web, fetches HTML pages, lets you fill in forms, and submits those. Therefore, if you have to crawl a web page that requires forms, or follow links to get to the data you want to extract, take a look at Robobrowser, which can also be found on GitHub. Though it’s a relatively young project, I like the way the API works and the tools it uses.

Autobahn


On the server side of web development, Autobahn has piqued my interest. This tool makes programming WebSockets easy, which is very helpful to me as a web developer. It’s also interesting as it makes use of Python 3.4’s new asyncio library.

Flake8 Lint


For daily development, I like to use flake8 lint, which is based on Flake8. This Sublime Text plugin combines Pep8, which is for the Python style guide testing, together with Pyflakes, which detects common errors in your code (e.g. imports you forgot to add or imports you have too many of). Flake8 makes writing code much easier for me, and it integrates quite nicely with modern IDEs. I use sublime text and I have a plugin that puts flake8 error messages right at my fingertips. This is not a new project, but it is certainly something I use every day.

Scrapy

I haven’t actually come across a project where I use Scrapy, but to my knowledge Scrapy is excellent at extracting patterns for specific data. For example, if you want to scrape the latest football scores of Britain’s major leagues, you’d have to go to multiple sites and pages to extract the same information. Scrapy is good for such a task, as it can automate searching and walk through a site to extract very structured data you can specify through expressions. I’d love to have a project where I could use Scrapy, but I haven’t had the chance yet.


Other posts in this series with Martijn Pieters:

Martijn Pieters

Need Martijn’s help? Book a 1-on-1 session!

or join us as an expert mentor!

September 25, 2015 04:29 AM

Martijn Pieters on the Future of Django

 

Martijn Pieters, Stack Overflow Python Legend

Codementor Python expert and Stack Overflow legend Martijn Pieters joined us for an Office Hour session, and he dedicated some time to answering some of our viewers’ questions. With the rise of Ruby on Rails’ popularity in development, one of the viewers asked Martijn about his thoughts on the future of Django in web development and other system tools.

The text below is a summary done by the Codementor team and may vary from the original video. If you see any issues, please let us know!


 

 

I’m a web developer myself, and that’s a large focus of what I do, and I have always been strong in that. I started in Zope, which was a pioneer in its age, and I think Python is still pioneering many things in web development today.

In the meantime, the scientific community has really taken Python into its heart, making some fantastic tools such as SciPy, NumPy, and Pandas. These all capitalize on Python’s ease of use and dynamic nature to explore data sets, which allows these tools to do some great science. Recently, in universities, Python has taken over Java as the number one learning language, so we’ll see a lot more of Python in the future as long as it finds new places to play to its strengths—the ability to let people write readable code and to rapidly evolve the code. Therefore, Python is sure to earn a place in more areas, even ones I can’t foresee.


Other posts in this series with Martijn Pieters:

Martijn Pieters

Need Martijn’s help? Book a 1-on-1 session!

or join us as an expert mentor!

September 25, 2015 04:29 AM

Python 2.7 vs Python 3.4 ─ What should Python Beginners choose?

Martijn Pieters, Stack Overflow Python Legend

Codementor Python expert and Stack Overflow legend Martijn Pieters joined us for an Office Hour session, in which he provided some informative opinions on several questions asked by our viewers.

As Martijn used Python 3.4 in his demo, one of the questions asked was whether there is a good reason for Python beginners to move from 2.7 to 3.4.

The text below is a summary done by the Codementor team and may vary from the original video. If you see any issues, please let us know!


 

Python 2.7 vs Python 3.4 ── What’s the difference and what should Python Beginners choose?

“This really depends on what third-party libraries you rely on.”

Personally, until Python 3.3 or 3.4, I would have stuck with Python 2.7, but I am now using Python 3.4 because of all the neat new features, and I think those changes will also make life easier for beginners. For example, the object-oriented path library and the enumeration addition should be very helpful, and it’s not that difficult to switch over to Python 3. I think the Python developers have nailed the 3.x line, as it performs really well, and the interpreter handles history better than previous releases. Personally I really like the auto-completion features, which is why I used Python 3.4 in my demos. Moreover, Python 3.3 and 3.4 are getting all the developers’ attention, which means all the cool new additions are gathered in those versions, and they will be moving the ecosystem forward with the 3.x line. I believe 2.7 will be the last version of the 2.x series.

However, in Python 2.x a beginner won’t have to worry about all the implicit encoding and decoding issues that may arise when mixing byte strings and Unicode strings, so that may be something to consider.

The greatest difference between Python 2 and 3 is the libraries you use. If you are doing web development and use a lot of libraries that only support Python 2.x and have not yet been ported to Python 3.x, you might not have the same experience that makes Python so great.

In conclusion, beginners should have no problem switching over to Python 3, but if you are a beginner who depends on a lot of libraries that are 2.x only, then maybe it’s not such a great idea.


Other posts in this series with Martijn Pieters:

Martijn Pieters

Need Martijn’s help? Book a 1-on-1 session!

or join us as an expert mentor!

September 25, 2015 04:29 AM

Python Internals: Codementor Office Hours with Martijn Pieters

See the video here: Stack Overflow Legend Martijn Pieters: Python Optimization and How it Can Affect Your Code

Stack Overflow #1 Python Answerer Martijn Pieters

Codementor Python expert mentor Martijn Pieters is a legend on Stack Overflow.   A Stack Overflow all-star, he’s amassed a reputation of over 250,000 – ranking him at #2 so far this year and among the top 40 answerers of all time – over the course of responding to more than 10,000 questions.

Our interview with Martijn hit the front page of Hacker News  a few weeks ago, and now we’re super excited to have him as our next guest for our upcoming Office Hours!

Open Office Hours with Martijn Pieters: Python Internals: Optimization Choices Made

On Aug 6 at 11am PDT, Martijn will host an open office hours session to talk about Python, specifically about Python internals and the optimization choices made. In Martijn’s own words:

The CPython developers have made specific optimisations in the interpreter that may affect how your Python code runs. We’ll explore these choices and what they mean for your Python code.


What are Codementor Office Hours? 
This is a special free event sponsored by Codementor. You are invited to a free session with Codementor Python expert mentor Martijn Pieters. In an interactive small group setting, Martijn will discuss with you Python internals and any Python-related questions you’d like to ask about.

When: Wednesday August 6th, 11am PDT / 2pm EDT
Where: Codementor Office Hours @ Google Hangouts
Cost: Free 

Only 8 spaces available - RSVP now!

To RSVP: Tweet about this and tell us:
a) why you’d like to attend and
b) (optional) what questions do you have for Martijn

September 25, 2015 04:29 AM

Tutorial: How to Create Custom Exceptions in Python

Ankur Ankan is the lead developer of pgmpy, a Python library for Probabilistic Graphical Models. He mentors students at Google Summer of Code and has a great passion for Python programming.

This tutorial was originally posted in his blog.


 

While writing tests we often need to compare objects defined by us and we then have to compare each attribute of the object one by one.

Let’s take an example of a simple Car class. The Car object has two attributes speed and color.

class Car:
    def __init__(self, speed, color):
        self.speed = speed
        self.color = color

So if you try writing a test for it using the unittest.TestCase.assertEqual method:

import unittest

class TestCar(unittest.TestCase):
    def test_car_equal(self):
       car1 = Car(40, 'red')
       car2 = Car(40, 'red')
       self.assertEqual(car1, car2)

If you now try to run this test, the test case will fail.

self.assertEqual(Car(40, 'red'), Car(40, 'red'))
AssertionError: <assert.Car object at 0x7f8edbc078d0> != <assert.Car object at 0x7f8edbc07908>

This is because unittest doesn’t know how to compare two Car objects.

So, we have two options to make this work:
1. We can define an __eq__ method in the Car class.
2. We can create a custom assertion class.

1. __eq__ method

__eq__ is one of the rich comparison methods. Whenever == is used on any object its __eq__ method is called. For example car1 == car2 internally calls car1.__eq__(car2).

Our new car class:

class Car:
    def __init__(self, speed, color):
        self.speed = speed
        self.color = color

    def __eq__(self, another_car):
        if (type(self) == type(another_car) and
                self.speed == another_car.speed and
                self.color == another_car.color):
            return True
        else:
            return False

And now the same test passes. But when car1 is not equal to car2, the test fails and the output is not at all informative about why the test failed. It simply prints:

self.assertEqual(car1, car2)
AssertionError: <Car object at 0x7f1f86af6a20> != <Car object at 0x7f1f86af6a58>

So, one drawback of this approach is that we can't show a message about why the equality failed. Printing an error message from __eq__ wouldn't be good design either: whenever the user did a simple car1 == car2 comparison and the cars were not equal, __eq__ would print the details of why.

Now, we see the second approach in which we can give a detailed message about why the equality failed.

2. Custom assertion class

We can write a custom assertion class for our Car class.

class AssertCarEqual:
    def assert_car_equal(self, car1, car2):
        if type(car1) != type(car2):
            raise AssertionError("car1 is of type: ", type(car1), " and car2 is of type: ", type(car2))
        elif car1.speed != car2.speed:
            raise AssertionError("speed of car1: ", car1.speed, " and speed of car2: ", car2.speed)
        elif car1.color != car2.color:
            raise AssertionError("color of car1: ", car1.color, " and color of car2: ", car2.color)

And modifying the test like this:

class TestCar(unittest.TestCase, AssertCarEqual):
    def test_car_equal(self):
        car1 = Car(40, 'blue')
        car2 = Car(40, 'red')
        self.assert_car_equal(car1, car2)

Now when we run the test, the test fails and gives a message about why the test failed.

raise AssertionError("color of car1: ", car1.color, " and color of car2: ", car2.color)
AssertionError: ('color of car1: ', 'blue', ' and color of car2: ', 'red')
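For completeness: unittest itself supports registering a per-type comparison function via TestCase.addTypeEqualityFunc, so a plain assertEqual can produce the detailed message directly. A minimal sketch (the message wording here is my own, not from the tutorial):

```python
import unittest


class Car:
    def __init__(self, speed, color):
        self.speed = speed
        self.color = color


class TestCar(unittest.TestCase):
    def setUp(self):
        # Tell assertEqual how to compare two Car instances.
        self.addTypeEqualityFunc(Car, self.assert_car_equal)

    def assert_car_equal(self, car1, car2, msg=None):
        # Raise the test framework's failure exception with a helpful message.
        if car1.speed != car2.speed:
            raise self.failureException(
                "speed of car1: %s and speed of car2: %s" % (car1.speed, car2.speed))
        if car1.color != car2.color:
            raise self.failureException(
                "color of car1: %s and color of car2: %s" % (car1.color, car2.color))

    def test_car_equal(self):
        # assertEqual now dispatches to assert_car_equal automatically.
        self.assertEqual(Car(40, 'red'), Car(40, 'red'))
```

With this, == stays untouched and the detailed message only appears in test failures.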


September 25, 2015 04:29 AM


Stephen Ferg

Workaround for flask/babel/sphinx bug on Python 3+

I’m using Python 3.4 on Windows. Recently I tried to install and use Sphinx.  When I did, I encountered an error that ended with the string

an integer is required (got type str)

Googling that string, I found an explanation of the problem on stackoverflow, HERE. As Andy Skirrow wrote on August 3, 2015, this is a bug in the current distribution of babel.

a pickled file babel/global.dat is included in the package and this can't be read by python 3 because it was created by a script running under python 2.

The problem (as I understand it) is that Python 2.x pickles/unpickles datetime objects as ASCII strings, but Python 3.x pickles/unpickles them as Unicode strings.

The babel folks are working on it. But until they fix it, I needed a really simple-minded solution.  I needed something that would fit my brain.

Fortunately, I still had Python 2.7 installed on my PC.  So here is what I did.

  1. I went into the Python34 site-packages/babel folder, found the global.dat file, and copied it into the Python27 folder.
  2. I wrote a program fix_a to unpickle (load) the global.dat file and print its repr as a string. I saved this program in the Python27 folder, and ran it under Python 2.7.
  3. I wrote a program fix_b that imported the datetime module and repickled the data into a file named global.dat. I saved this program in the Python34 folder, and ran it under Python 3.4.
  4. I copied the new global.dat file over the original global.dat file in the Python34 site-packages/babel folder.

It worked.  Sphinx is now working fine.


The text of fix_a.py is:

import pickle
f = open("global.dat","rb")
obj = pickle.load(f)
f.close()
print(repr(obj))

and I ran it this way, from inside the Python27 folder:

python -m fix_a > junk.txt

The text of fix_b.py is:

import datetime
import pickle
d = [copied the text of junk.txt here]
f = open("global.dat", "wb")
pickle.dump(d, f)
f.close()

and I ran it this way, from inside the Python34 folder:

python -m fix_b
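For what it's worth, Python 3's pickle.load accepts an encoding argument that can often read Python 2 pickles directly ('latin1' is the commonly recommended value for pickles containing datetime objects; 'bytes' would turn dict keys into bytes). That would collapse the two-interpreter dance into one Python 3 script. A sketch, untested against babel's actual file (the function name is mine):

```python
import pickle


def repickle_for_py3(path):
    """Re-pickle a Python-2-era pickle file in place so Python 3 can read it.

    Run under Python 3 only. 'latin1' round-trips Python 2 datetime
    pickles, which is the usual stumbling block.
    """
    with open(path, "rb") as f:
        data = pickle.load(f, encoding="latin1")
    with open(path, "wb") as f:
        pickle.dump(data, f)
    return data
```

Pointed at site-packages/babel/global.dat, this would rewrite the file in place, so keep a backup copy first.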

To save you some trouble, here is the Python 3.4 global.dat file that I made.  Because WordPress wouldn’t allow me to upload it with a “dat” extension, it has a “doc” extension.  When you download it, you should rename it and give it a “dat” extension.

global.doc


September 25, 2015 04:09 AM

September 24, 2015


Julien Tayon

Querying complex unknown database with python

In my job, I sometimes have to deal with web applications that are pointing to half a dozen databases with an overall of more than 160 tables.

Not to say it can be complex, but sometimes it is :)

So I made a small script using 2 of my favourite technologies to deal with that: graphviz, and sqlsoup.

Sqlsoup is a lesser-known child of the great software that Mike Bayer (zzzeek) made: sqlalchemy.

I have been told that when you know an ORM you don't need to know SQL. I strongly disagree (especially when it comes to performance issues with the wrong types of indexes, or a lack of them). However, when you use the declarative sqlalchemy syntax, sqlalchemy does a lot of things right that help a lot: it creates foreign keys when you use backrefs (unless you use MyISAM). And sometimes your model gets out of sync (oh! someone made an ALTER TABLE in raw SQL): you need to be able to do stuff on the data, and your model does not help.

And these foreign keys help a lot in constructing an entity relationship diagram of the python objects that may be used... to navigate easily in a flow of data. The model may be a nice map, but sometimes you need to rebuild it when the map is not the territory anymore. And time is pressing.


gist here https://gist.github.com/jul/e255d76590930545d383
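The gist itself leans on sqlsoup for the introspection; the graphviz half boils down to emitting one DOT edge per foreign key. A standalone sketch of that half (the helper name and the hard-coded foreign-key map are mine, mimicking the quickstart's auth tables):

```python
def tables_to_dot(foreign_keys):
    """Render a dict {table: [(column, referred_table), ...]} as a DOT graph."""
    lines = ["digraph schema {", "  rankdir=LR;", "  node [shape=box];"]
    for table, fks in sorted(foreign_keys.items()):
        lines.append('  "%s";' % table)
        for column, target in fks:
            # One edge per foreign key, labelled with the referring column.
            lines.append('  "%s" -> "%s" [label="%s"];' % (table, target, column))
    lines.append("}")
    return "\n".join(lines)


# Example input: the auth tables a TurboGears quickstart creates.
fks = {
    "tg_user": [],
    "tg_group": [],
    "tg_permission": [],
    "tg_user_group": [("user_id", "tg_user"), ("group_id", "tg_group")],
    "tg_group_permission": [("group_id", "tg_group"),
                            ("permission_id", "tg_permission")],
}
print(tables_to_dot(fks))
```

Piping the output into `dot -Tpng -o out.png` would give the picture below.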


Here, as a test case, I used a turbogears quickstart. I consider turbogears an under-rated framework; it is very well written and correct, and it has a lot of production features I like: the possibility to easily swap the technologies you don't like in the stack (mako or genshi, it is your choice), database versioning...

The only data a quickstart builds is for authorization/authentication.

Here is the result of this script (of course, installing graphviz is mandatory to transform the out.dot file into a picture, and as you can see the Windows version has some font glitches):

turbogears quickstart database construction

And what is nice is that, using the interactive option, you can go directly into the database with your diagram under your nose and get at your data:

ipython -i generate_diagram.py postgresql://tg@localhost/tg
 

#postgresql://tg@localhost/tg problem with u'migrate_version'
#(...)
#SQLSoupError("table 'migrate_version' does not have a primary key defined",) 
#nb col = 17 
#nb fk = 4 
In [1]: db.tg_permission.join(db.tg_group_permission).join(db.tg_group).join(db.tg_user_group).join(db.tg_user).filter(db.tg_user.user_name == 'editor').all()
Out[1]:
[MappedTg_permission(permission_id=2, permission_name=u'read', description=u'editor can read'),
 MappedTg_permission(permission_id=3, permission_name=u'see stats', description=u'can see stats')]

I thought of making a module. But sometimes, in the turmoil of production, you don't have the time for it. A single short script with very few requirements, one that can be tweaked quickly to adapt, is sometimes all you need.

Note: Alembic (also from Mike Bayer), which is nicely included in turbogears, can of course generate sqlalchemy models from a db and even diff a database against an sqlalchemy declarative model. I don't know this guy, but I must admit that even if sqlalchemy is a tad heavy, when used correctly his software is great. And even if it is heavy, pip install works fairly well given the stability and complexity of the API.

Mister Mike Bayer, if you read me: I am a fanboy of yours and I like how you answer people on the mailing list.

PS: yes I know about the graphviz module. But I use graphviz so often, it is faster for me to just write directly in a file.

September 24, 2015 11:44 PM


Mahmoud Hashemi

Remap: Nested Data Multitool for Python

This entry is the first in a series of "cookbooklets" showcasing more advanced Boltons. If all goes well, the next 5 minutes will literally save you 5 hours.

Intro

Data is everywhere, especially within itself. That's right, whether it's public APIs, document stores, or plain old configuration files, data will nest. And that nested data will find you.

UI fads aside, developers have always liked "flat". Even Python, so often turned to for data wrangling, only has succinct built-in constructs for dealing with flat data. List comprehensions, generator expressions, map/filter, and itertools are all built for flat work. In fact, the allure of flat data is likely a direct result of this common gap in most programming languages.
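To make that gap concrete, here is what even a simple nested cleanup, dropping empty-string values, looks like when hand-rolled with only those flat built-ins (a stdlib-only sketch of mine; the remap one-liner later in this post does the same job):

```python
def drop_blanks(value):
    """Recursively copy a nested structure, dropping empty-string values.

    Every container type needs its own branch, and every new rule
    means another pass through this boilerplate.
    """
    if isinstance(value, dict):
        return {k: drop_blanks(v) for k, v in value.items() if v != ""}
    if isinstance(value, list):
        return [drop_blanks(v) for v in value if v != ""]
    return value


nested = {"gravatar_id": "", "actor": {"login": "mahmoud", "url": ""}}
print(drop_blanks(nested))  # {'actor': {'login': 'mahmoud'}}
```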

Let's change that. First, let's meet this nested adversary. Provided you overlook my taste in media, it's hard to fault nested data when it reads as well as this YAML:

reviews:
  shows:
    - title: Star Trek - The Next Generation
      rating: 10
      review: Episodic AND deep. <3 Data.
      tags: ['space']
    - title: Monty Python's Flying Circus
      rating: 10
      tags: ['comedy']
  movies:
    - title: The Hitchhiker's Guide to the Galaxy
      rating: 6
      review: So great to see Mos Def getting good work.
      tags: ['comedy', 'space', 'life']
    - title: Monty Python's Meaning of Life
      rating: 7
      review: Better than Brian, but not a Holy Grail, nor Completely Different.
      tags: ['comedy', 'life']
      prologue:
        title: The Crimson Permanent Assurance
        rating: 9

Even this very straightforwardly nested data can be a real hassle to manipulate. How would one add a default review for entries without one? How would one convert the ratings to a 5-star scale? And what does all of this mean for more complex real-world cases, exemplified by this excerpt from a real GitHub API response:

[{
    "id": "3165090957",
    "type": "PushEvent",
    "actor": {
      "id": 130193,
      "login": "mahmoud",
      "gravatar_id": "",
      "url": "https://api.github.com/users/mahmoud",
      "avatar_url": "https://avatars.githubusercontent.com/u/130193?"
    },
    "repo": {
      "id": 8307391,
      "name": "mahmoud/boltons",
      "url": "https://api.github.com/repos/mahmoud/boltons"
    },
    "payload": {
      "push_id": 799258895,
      "size": 1,
      "distinct_size": 1,
      "ref": "refs/heads/master",
      "head": "27a4bc1b6d1da25a38fe8e2c5fb27f22308e3260",
      "before": "0d6486c40282772bab232bf393c5e6fad9533a0e",
      "commits": [
        {
          "sha": "27a4bc1b6d1da25a38fe8e2c5fb27f22308e3260",
          "author": {
            "email": "mahmoud@hatnote.com",
            "name": "Mahmoud Hashemi"
          },
          "message": "switched reraise_visit to be just a kwarg",
          "distinct": true,
          "url": "https://api.github.com/repos/mahmoud/boltons/commits/27a4bc1b6d1da25a38fe8e2c5fb27f22308e3260"
        }
      ]
    },
    "public": true,
    "created_at": "2015-09-21T10:04:37Z"
}]

The astute reader may spot some inconsistency and general complexity, but don't run away.

Remap, the recursive map, is here to save the day.

Remap is a Pythonic traversal utility that creates a transformed copy of your nested data. It uses three callbacks -- visit, enter, and exit -- and is designed to accomplish the vast majority of tasks by passing only one function, usually visit. The API docs have full descriptions, but the basic rundown is:

  • visit(path, key, value) is called on each item and returns the new (key, value) pair, or a bool to keep or drop the item as-is
  • enter(path, key, value) controls if and how a container is traversed, returning the (new_parent, items) to recurse over
  • exit(path, key, old_parent, new_parent, new_items) fills the new container with the visited items and returns it

It may sound complex, but the examples shed a lot of light. So let's get remapping!

Normalize keys and values

First, let's import the modules and data we'll need.

import json
import yaml  # https://pypi.python.org/pypi/PyYAML
from boltons.iterutils import remap  # https://pypi.python.org/pypi/boltons

review_map = yaml.load(media_reviews)

event_list = json.loads(github_events)

Now let's turn back to that GitHub API data. Earlier one may have been annoyed by the inconsistent type of id. event['repo']['id'] is an integer, but event['id'] is a string. When sorting events by ID, you would not want string ordering.

With remap, fixing this sort inconsistency couldn't be easier:

from boltons.iterutils import remap

def visit(path, key, value):
    if key == 'id':
        return key, int(value)
    return key, value

remapped = remap(event_list, visit=visit)

assert remapped[0]['id'] == 3165090957

# You can even do it in one line:
remap(event_list, lambda p, k, v: (k, int(v)) if k == 'id' else (k, v))

By default, visit gets called on every item in the root structure, including lists, dicts, and other containers, so let's take a closer look at its signature. visit takes three arguments we're going to see in all of remap's callbacks:

key and value are exactly what you would expect, though it may bear mentioning that the key for a list item is its index. path refers to the keys of all the parents of the current item, not including the key. For example, looking at the GitHub event data, the commit author's name's path is (0, 'payload', 'commits', 0, 'author'), because the key, name, is located in the author of the first commit in the payload of the first event.

As for the return signature of visit, it's very similar to the input. Just return the new (key, value) you want in the remapped output.

Drop empty values

Next up, GitHub's move away from Gravatars left an artifact in their API: a blank 'gravatar_id' key. We can get rid of that item, and any other blank strings, in a jiffy:

drop_blank = lambda p, k, v: v != ""
remapped = remap(event_list, visit=drop_blank)

assert 'gravatar_id' not in remapped[0]['actor']

Unlike the previous example, instead of a (key, value) pair, this visit is returning a bool. For added convenience, when visit returns True, remap carries over the original item unmodified. Returning False drops the item from the remapped structure.

With the ability to arbitrarily transform items, pass through old items, and drop items from the remapped structure, it's clear that the visit function makes the majority of recursive transformations trivial. So many tedious and error-prone lines of traversal code turn into one-liners that usually remap with a visit callback is all one needs. With that said, the next recipes focus on remap's more advanced callable arguments, enter and exit.

Convert dictionaries to OrderedDicts

So far we've looked at actions on remapping individual items, using the visit callable. Now we turn our attention to actions on containers, the parent objects of individual items. We'll start doing this by looking at the enter argument to remap.

# from collections import OrderedDict
from boltons.dictutils import OrderedMultiDict as OMD
from boltons.iterutils import remap, default_enter

def enter(path, key, value):
    if isinstance(value, dict):
        return OMD(), sorted(value.items())
    return default_enter(path, key, value)

remapped = remap(review_map, enter=enter)
assert remapped['reviews'].keys()[0] == 'movies'
# True because 'reviews' is now ordered and 'movies' comes before 'shows'

The enter callable controls both if and how an object is traversed. Like visit, it accepts path, key, and value. But instead of (key, value), it returns a tuple of (new_parent, items). new_parent is the container that will receive items remapped by the visit callable. items is an iterable of (key, value) pairs that will be passed to visit. Alternatively, items can be False, to tell remap that the current value should not be traversed, but that's getting pretty advanced. The API docs have some other enter details to consider.

Also note how this code builds on the default remap logic by calling through to the default_enter function, imported from the same place as remap itself. Most practical use cases will want to do this, but of course the choice is yours.

Sort all lists

The last example used enter to interact with containers before they were traversed. This time, to sort all lists in a structure, we'll use remap's final callable argument: exit.

from boltons.iterutils import remap, default_exit

def exit(path, key, old_parent, new_parent, new_items):
    ret = default_exit(path, key, old_parent, new_parent, new_items)
    if isinstance(ret, list):
        ret.sort()
    return ret

remap(review_map, exit=exit)

Similar to the enter example, we're building on remap's default behavior by importing and calling default_exit. Looking at the arguments passed to exit and default_exit, there's the path and key that we're used to from visit and enter. value is there, too, but it's named old_parent, to differentiate it from the new value, appropriately called new_parent. At the point exit is called, new_parent is just an empty structure as constructed by enter, and exit's job is to fill that new container with new_items, a list of (key, value) pairs returned by remap's calls to visit. Still with me?

Either way, here we don't interact with the arguments. We just call default_exit and work on its return value, new_parent, sorting it in-place if it's a list. Pretty simple! In fact, very attentive readers might point out this can be done with visit, because remap's very next step is to call visit with the new_parent. You'll have to forgive the contrived example and let it be a testament to the rarity of overriding exit. Without going into the details, enter and exit are most useful when teaching remap how to traverse nonstandard containers, such as non-iterable Python objects. As mentioned in the "drop empty values" example, remap is designed to maximize the mileage you get out of the visit callback. Let's look at an advanced usage reason that's true.

Collect interesting values

Sometimes you just want to traverse a nested structure, and you don't need the result. For instance, suppose we wanted to collect the full set of tags used in media reviews. Let's create a remap-based function, get_all_tags:

def get_all_tags(root):
    all_tags = set()

    def visit(path, key, value):
        all_tags.update(value['tags'])
        return False

    remap(root, visit=visit, reraise_visit=False)

    return all_tags

print(get_all_tags(review_map))
# set(['space', 'comedy', 'life'])

Like the first recipe, we've used the visit argument to remap, and like the second recipe, we're just returning False, because we don't actually care about contents of the resulting structure.

What's new here is the reraise_visit=False keyword argument, which tells remap to keep any item that causes a visit exception. This practical convenience lets visit functions be shorter, clearer, and just more EAFP. Reducing the example to a one-liner is left as an exercise to the reader.

Add common keys

As a final advanced remap example, let's look at adding items to structures. Through the examples above, we've learned that visit is best-suited for 1:1 transformations and dropping values. This leaves us with two main approaches for addition. The first uses the enter callable and is suitable for making data consistent and adding data which can be overridden.

base_review = {'title': '',
               'rating': None,
               'review': '',
               'tags': []}

def enter(path, key, value):
    new_parent, new_items = default_enter(path, key, value)
    try:
        new_parent.update(base_review)
    except:
        pass
    return new_parent, new_items

remapped = remap(review_map, enter=enter)

assert remapped['reviews']['shows'][1]['review'] == ''
# True, the placeholder review is holding its place

The second method uses the exit callback to override values and calculate new values from the new data.

def exit(path, key, old_parent, new_parent, new_items):
    ret = default_exit(path, key, old_parent, new_parent, new_items)
    try:
        ret['review_length'] = len(ret['review'])
    except:
        pass
    return ret

remapped = remap(review_map, exit=exit)

assert remapped['reviews']['shows'][0]['review_length'] == 27
assert remapped['reviews']['movies'][0]['review_length'] == 42
# True times two.

By now you might agree that remap is making such feats positively routine. Come for the nested data manipulation, stay for the number jokes.

Corner cases

This whole guide has focused on data that came from "real-world" sources, such as JSON API responses. But there are certain rare cases which typically only arise from within Python code: self-referential objects. These are objects that contain references to themselves or their parents. Have a look at this trivial example:

self_ref = []
self_ref.append(self_ref)

The experienced programmer has probably seen this before, but most Python coders might even think the second line is an error. It's a list containing itself, and it has the rather cool repr: [[...]].

Now, this is pretty rare, but reference loops do come up in programming. The good news is that remap handles these just fine:

print(repr(remap(self_ref)))
# prints "[[...]]"

The more common corner case that arises is that of duplicate references, which remap also handles with no problem:

my_set = set()

dupe_ref = (my_set, [my_set])
remapped = remap(dupe_ref)

assert remapped[0] is remapped[-1][-1]
# True, of course

Two references to the same set go in, two references to a copy of that set come out. That's right: only one copy is made, and then used twice, preserving the original structure.

Wrap-up

If you've made it this far, then I hope you'll agree that remap is useful enough to be your new friend. If that wasn't enough detail, then there are the docs. remap is well-tested, but making something this general-purpose is a tricky area. Please file bugs and requests. Don't forget about pprint and repr/reprlib, which can help with reading large structures. As always, stay tuned for future boltons cookbooklets, and much much more.


September 24, 2015 07:25 PM


Evennia

Pushing through a straw

Recently, a user reported a noticeable delay between sending a command in-game to multiple users and the result of that command appearing to everyone. This wasn't noticeable when testing alone, but I could confirm there was sometimes almost a second of delay between entering the command and some users seeing the result. A second is very long for stuff like this. Processing time for a single command is usually in the milliseconds. What was going on?

Some background 

Evennia has two components, the Portal and the Server, running as two separate processes. The basic principle is that players connecting to an Evennia instance connect to the Portal side - this is the outward-facing part of Evennia. The connection data and any other input and output will be piped from the Portal to the Server and back again.

The main reason for this setup is that it allows us to completely reset the Server component (reloading module data in memory is otherwise error-prone, or at least very tricky to make work generically in Python) without anyone getting disconnected from the Portal. On the whole it works very well.

Debugging


Tracing timings throughout the entire Server<->Portal pipeline quickly led me to rule out the command handler or any of the core systems: the time spent on both sides of the process divide was only a few milliseconds. But in the transfer between Portal and Server, an additional 900 milliseconds were suddenly added! This was clearly the cause of the delay.

Turns out that it all came down to faulty optimization. At some point I built a batch-send algorithm between the Server and Portal. The idea was to group command data together if they arrived too fast - bunch them together and send them as a single batch. In theory this would be more efficient once the rate of command sending increased. It was partly a DoS protection, partly a way to optimize transfer handling.

The (faulty) idea was to drop incoming data into a queue and, if the rate was too high, wait to empty that queue until a certain commands-per-second rate was fulfilled. There was also a timed routine that force-emptied the queue every second, to make sure it was cleaned up even if no one entered any further commands.

In retrospect it sounds silly, but the "rate of commands" was based on a simple two-time-point derivative:

rate = 1 / (now - last_command_time)

If this rate exceeded a set value, the batch-queuing mechanism would kick in. The issue with this (which is easy to see now) is that if you set your limit at, say, 100 commands/second, two commands can happen to enter so close to each other in time that their momentary rate far exceeds that limit, just based on the time passed between them. But there are not actually 100 commands coming in every second, which is what the mechanism was really meant to react to.

So basically, using a moment-to-moment rate like this is just too noisy to be useful; the value will jump all over the place. The slowdown we saw was the DoS protection kicking in: when you relay data to other users, each of them will receive "commands" in quick succession - fast enough to trigger the limiter. These would be routed to the queue, and the sometimes-delay simply depended on when the queue cleanup mechanism happened to kick in.

Resolution


Once I had identified the rate-measuring bug, the obvious solution would have been to gather command rates over a longer time period and take an average: that filters out the moment-to-moment noise and gives an actually useful rate.

Instead I ended up going with an even simpler solution: every command that comes in bumps a counter. If I want a command rate limit of 100 commands/second, I wait until that counter reaches 100. At that point I check the time difference between now and when the counter was last reset. If this value is below 1, then our command rate is higher than 100/second and I can kick in whatever queuing or limiting is needed. The drawback is that until you have 100 commands you won't know the rate. In practice, though, once the rate is high enough to be of interest, this simple solution gives an automatic check with minimal overhead.
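That count-until-limit check fits in a few lines. A minimal sketch of my own reconstruction from the description above (not Evennia's actual code; names are mine):

```python
import time


class CommandRateLimiter:
    """Count commands and only consult the clock every `limit` commands."""

    def __init__(self, limit=100, window=1.0):
        self.limit = limit          # commands allowed per window
        self.window = window        # window length, in seconds
        self.count = 0
        self.window_start = time.time()

    def hit(self):
        """Register one command; return True if the rate limit was exceeded."""
        self.count += 1
        if self.count < self.limit:
            # Not enough samples yet to know the rate.
            return False
        elapsed = time.time() - self.window_start
        # Reset the counter and window for the next measurement.
        self.count = 0
        self.window_start = time.time()
        # `limit` commands arrived in under `window` seconds -> too fast.
        return elapsed < self.window
```

The per-command cost is one increment and one comparison; the clock is only read once per `limit` commands.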

In the end I actually removed the batch-sending component completely and instead added command DoS protection higher up on the Portal side. The Command-input is now rate limited using the same count-until-limit mechanism. Seems to work fine. People have no artificial slowdowns anymore and the DoS limiter will only kick in at loads that are actually relevant. And so all was again well in Evennia world.

September 24, 2015 06:28 PM


Caktus Consulting Group

Introduction to Monte Carlo Tree Search

For DjangoCon 2015, Jeff Bradberry created an A.I. for our booth game, Ultimate Tic Tac Toe. Reprinted here from jeffbradberry.com is his explanation of the Monte Carlo Tree Search used to build the A.I.

The subject of game AI generally begins with so-called perfect information games. These are turn-based games where the players have no information hidden from each other and there is no element of chance in the game mechanics (such as by rolling dice or drawing cards from a shuffled deck). Tic Tac Toe, Connect 4, Checkers, Reversi, Chess, and Go are all games of this type. Because everything in this type of game is fully determined, a tree can, in theory, be constructed that contains all possible outcomes, and a value assigned corresponding to a win or a loss for one of the players. Finding the best possible play, then, is a matter of doing a search on the tree, with the method of choice at each level alternating between picking the maximum value and picking the minimum value, matching the different players' conflicting goals, as the search proceeds down the tree. This algorithm is called Minimax.

The problem with Minimax, though, is that it can take an impractical amount of time to do a full search of the game tree. This is particularly true for games with a high branching factor, or high average number of available moves per turn. This is because the basic version of Minimax needs to search all of the nodes in the tree to find the optimal solution, and the number of nodes in the tree that must be checked grows exponentially with the branching factor. There are methods of mitigating this problem, such as searching only to a limited number of moves ahead (or ply) and then using an evaluation function to estimate the value of the position, or by pruning branches to be searched if they are unlikely to be worthwhile. Many of these techniques, though, require encoding domain knowledge about the game, which may be difficult to gather or formulate. And while such methods have produced Chess programs capable of defeating grandmasters, similar success in Go has been elusive, particularly for programs playing on the full 19x19 board.

However, there is a game AI technique that does do well for games with a high branching factor and has come to dominate the field of Go playing programs. It is easy to create a basic implementation of this algorithm that will give good results for games with a smaller branching factor, and relatively simple adaptations can build on it and improve it for games like Chess or Go. It can be configured to stop after any desired amount of time, with longer times resulting in stronger game play. Since it doesn't necessarily require game-specific knowledge, it can be used for general game playing. It may even be adaptable to games that incorporate randomness in the rules. This technique is called Monte Carlo Tree Search. In this article I will describe how Monte Carlo Tree Search (MCTS) works, specifically a variant called Upper Confidence bound applied to Trees (UCT), and then will show you how to build a basic implementation in Python.

Imagine, if you will, that you are faced with a row of slot machines, each with different payout probabilities and amounts. As a rational person (if you are going to play them at all), you would prefer to use a strategy that will allow you to maximize your net gain. But how can you do that? For whatever reason, there is no one nearby, so you can't watch someone else play for a while to gain information about which is the best machine. Clearly, your strategy is going to have to balance playing all of the machines to gather that information yourself, with concentrating your plays on the observed best machine. One strategy, called Upper Confidence Bound 1 (UCB1), does this by constructing statistical confidence intervals for each machine

x̄_i ± √((2 ln n) / n_i)

where:
  • x̄_i: the mean payout for machine i
  • n_i: the number of plays of machine i
  • n: the total number of plays

Then, your strategy is to pick the machine with the highest upper bound each time. As you do so, the observed mean value for that machine will shift and its confidence interval will become narrower, but all of the other machines' intervals will widen. Eventually, one of the other machines will have an upper bound that exceeds that of your current one, and you will switch to that one. This strategy has the property that your regret, the difference between what you would have won by playing solely on the actual best slot machine and your expected winnings under the strategy that you do use, grows only as O(ln n). This is the same big-O growth rate as the theoretical best for this problem (referred to as the multi-armed bandit problem), and has the additional benefit of being easy to calculate.
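In code, UCB1 selection is only a few lines. A sketch with hypothetical bookkeeping (stats would be updated with each observed payout):

```python
import math


def ucb1_select(stats, total_plays):
    """Pick the machine with the highest UCB1 upper bound.

    `stats` maps machine -> (total_payout, plays); unplayed machines
    are tried first, since their bound is effectively infinite.
    """
    best, best_bound = None, float("-inf")
    for machine, (payout, plays) in stats.items():
        if plays == 0:
            return machine
        # mean payout plus the exploration bonus from the formula above
        bound = payout / plays + math.sqrt(2 * math.log(total_plays) / plays)
        if bound > best_bound:
            best, best_bound = machine, bound
    return best
```

A machine with a lower observed mean can still win the selection if it has been played few times, which is exactly the explore/exploit balance described above.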

And here's how Monte Carlo comes in. In a standard Monte Carlo process, a large number of random simulations are run, in this case, from the board position that you want to find the best move for. Statistics are kept for each possible move from this starting state, and then the move with the best overall results is returned. The downside to this method, though, is that for any given turn in the simulation, there may be many possible moves, but only one or two that are good. If a random move is chosen each turn, it becomes extremely unlikely that the simulation will hit upon the best path forward. So, UCT has been proposed as an enhancement. The idea is this: any given board position can be considered a multi-armed bandit problem, if statistics are available for all of the positions that are only one move away. So instead of doing many purely random simulations, UCT works by doing many multi-phase playouts.

The first phase, selection, lasts while you have the statistics necessary to treat each position you reach as a multi-armed bandit problem. The move to use, then, would be chosen by the UCB1 algorithm instead of randomly, and applied to obtain the next position to be considered. Selection would then proceed until you reach a position where not all of the child positions have statistics recorded.

Selection. Here the positions and moves selected by the UCB1 algorithm at each step are marked in bold. Note that a number of playouts have already been run to accumulate the statistics shown. Each circle contains the number of wins / number of times played.

The second phase, expansion, occurs when you can no longer apply UCB1. An unvisited child position is randomly chosen, and a new record node is added to the tree of statistics.

Expansion. The position marked 1/1 at the bottom of the tree has no further statistics records under it, so we choose a random move and add a new record for it (bold), initialized to 0/0.

After expansion occurs, the remainder of the playout is in phase 3, simulation. This is done as a typical Monte Carlo simulation, either purely random or with some simple weighting heuristics if a light playout is desired, or by using some computationally expensive heuristics and evaluations for a heavy playout. For games with a lower branching factor, a light playout can give good results.

Simulation. Once the new record is added, the Monte Carlo simulation begins, here depicted with a dashed arrow. Moves in the simulation may be completely random, or may use calculations to weight the randomness in favor of moves that may be better.

Finally, the fourth phase is the update or back-propagation phase. This occurs when the playout reaches the end of the game. All of the positions visited during this playout have their play count incremented, and if the player for that position won the playout, the win count is also incremented.

Back-Propagation. After the simulation reaches an end, all of the records in the path taken are updated. Each has its play count incremented by one, and each that matches the winner has its win count incremented by one, here shown by the bolded numbers.

This algorithm may be configured to stop after any desired length of time, or on some other condition. As more and more playouts are run, the tree of statistics grows in memory and the move that will finally be chosen will converge towards the actual optimal play, though that may take a very long time, depending on the game.

For more details about the mathematics of UCB1 and UCT, see Finite-time Analysis of the Multiarmed Bandit Problem and Bandit based Monte-Carlo Planning.

Now let's see some code. To separate concerns, we're going to need a Board class, whose purpose is to encapsulate the rules of a game and which will care nothing about the AI, and a MonteCarlo class, which will only care about the AI algorithm and will query into the Board object in order to obtain information about the game. Let's assume a Board class supporting this interface:

class Board(object):
    def start(self):
        # Returns a representation of the starting state of the game.
        pass

    def current_player(self, state):
        # Takes the game state and returns the current player's
        # number.
        pass

    def next_state(self, state, play):
        # Takes the game state, and the move to be applied.
        # Returns the new game state.
        pass

    def legal_plays(self, state_history):
        # Takes a sequence of game states representing the full
        # game history, and returns the full list of moves that
        # are legal plays for the current player.
        pass

    def winner(self, state_history):
        # Takes a sequence of game states representing the full
        # game history.  If the game is now won, return the player
        # number.  If the game is still ongoing, return zero.  If
        # the game is tied, return a different distinct value, e.g. -1.
        pass

For the purposes of this article I'm not going to flesh this part out any further, but for example code you can find one of my implementations on GitHub. However, it is important to note that we will require that the state data structure is hashable and equivalent states hash to the same value. I personally use flat tuples as my state data structures.
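To give a feel for what such a flat-tuple state might look like, here is a toy tic-tac-toe fragment of the Board interface (my own sketch, not the article's example code; winner() is omitted for brevity). The state is nine cells (0 = empty, 1 or 2 = a player's mark) plus the player to move:

```python
class TicTacToeBoard(object):
    def start(self):
        # Nine empty cells, player 1 to move.
        return (0,) * 9 + (1,)

    def current_player(self, state):
        return state[-1]

    def next_state(self, state, play):
        # Place the current player's mark and flip the player to move.
        cells, player = list(state[:9]), state[-1]
        cells[play] = player
        return tuple(cells) + (2 if player == 1 else 1,)

    def legal_plays(self, state_history):
        state = state_history[-1]
        return [i for i in range(9) if state[i] == 0]
```

Because the state is a plain tuple, it is hashable and two games that reach the same position produce equal states, so they share a single entry in the statistics tables.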

The AI class we will be constructing will support this interface:

class MonteCarlo(object):
    def __init__(self, board, **kwargs):
        # Takes an instance of a Board and optionally some keyword
        # arguments.  Initializes the list of game states and the
        # statistics tables.
        pass

    def update(self, state):
        # Takes a game state, and appends it to the history.
        pass

    def get_play(self):
        # Causes the AI to calculate the best move from the
        # current game state and return it.
        pass

    def run_simulation(self):
        # Plays out a "random" game from the current position,
        # then updates the statistics tables with the result.
        pass

Let's begin with the initialization and bookkeeping. The board object is what the AI will be using to obtain information about where the game is going and what the AI is allowed to do, so we need to store it. Additionally, we need to keep track of the state data as we get it.

class MonteCarlo(object):
    def __init__(self, board, **kwargs):
        self.board = board
        self.states = []

    def update(self, state):
        self.states.append(state)

The UCT algorithm relies on playing out multiple games from the current state, so let's add that next.

import datetime

class MonteCarlo(object):
    def __init__(self, board, **kwargs):
        # ...
        seconds = kwargs.get('time', 30)
        self.calculation_time = datetime.timedelta(seconds=seconds)

    # ...

    def get_play(self):
        begin = datetime.datetime.utcnow()
        while datetime.datetime.utcnow() - begin < self.calculation_time:
            self.run_simulation()

Here we've defined a configuration option for the amount of time to spend on a calculation, and get_play will repeatedly call run_simulation until that amount of time has passed. This code won't do anything particularly useful yet, because we still haven't defined run_simulation, so let's do that now.

# ...
from random import choice

class MonteCarlo(object):
    def __init__(self, board, **kwargs):
        # ...
        self.max_moves = kwargs.get('max_moves', 100)

    # ...

    def run_simulation(self):
        states_copy = self.states[:]
        state = states_copy[-1]

        for t in xrange(self.max_moves):
            legal = self.board.legal_plays(states_copy)

            play = choice(legal)
            state = self.board.next_state(state, play)
            states_copy.append(state)

            winner = self.board.winner(states_copy)
            if winner:
                break

This adds the beginnings of the run_simulation method, which chooses a random move from the set of legal moves each turn until the end of the game. We have also introduced a configuration option for limiting the number of moves forward that the AI will play.

You may notice at this point that we are making a copy of self.states and adding new states to it, instead of adding directly to self.states. This is because self.states is the authoritative record of what has happened so far in the game, and we don't want to mess it up with these speculative moves from the simulations.

Now we need to start keeping statistics on the game states that the AI hits during each run of run_simulation. The AI should pick the first unknown game state it reaches to add to the tables.

class MonteCarlo(object):
    def __init__(self, board, **kwargs):
        # ...
        self.wins = {}
        self.plays = {}

    # ...

    def run_simulation(self):
        visited_states = set()
        states_copy = self.states[:]
        state = states_copy[-1]
        player = self.board.current_player(state)

        expand = True
        for t in xrange(self.max_moves):
            legal = self.board.legal_plays(states_copy)

            play = choice(legal)
            state = self.board.next_state(state, play)
            states_copy.append(state)

            # `player` here and below refers to the player
            # who moved into that particular state.
            if expand and (player, state) not in self.plays:
                expand = False
                self.plays[(player, state)] = 0
                self.wins[(player, state)] = 0

            visited_states.add((player, state))

            player = self.board.current_player(state)
            winner = self.board.winner(states_copy)
            if winner:
                break

        for player, state in visited_states:
            if (player, state) not in self.plays:
                continue
            self.plays[(player, state)] += 1
            if player == winner:
                self.wins[(player, state)] += 1

Here we've added two dictionaries to the AI, wins and plays, which will contain the counts for every game state that is being tracked. The run_simulation method now checks to see if the current state is the first new one it has encountered this call, and, if so, adds the state to both plays and wins, setting both values to zero. This method also adds every game state that it goes through to a set, and at the end updates plays and wins with those states in the set that are in the plays and wins dicts. We are now ready to base the AI's final decision on these statistics.

from __future__ import division
# ...

class MonteCarlo(object):
    # ...

    def get_play(self):
        self.max_depth = 0
        state = self.states[-1]
        player = self.board.current_player(state)
        legal = self.board.legal_plays(self.states[:])

        # Bail out early if there is no real choice to be made.
        if not legal:
            return
        if len(legal) == 1:
            return legal[0]

        games = 0
        begin = datetime.datetime.utcnow()
        while datetime.datetime.utcnow() - begin < self.calculation_time:
            self.run_simulation()
            games += 1

        moves_states = [(p, self.board.next_state(state, p)) for p in legal]

        # Display the number of calls of `run_simulation` and the
        # time elapsed.
        print games, datetime.datetime.utcnow() - begin

        # Pick the move with the highest percentage of wins.
        percent_wins, move = max(
            (self.wins.get((player, S), 0) /
             self.plays.get((player, S), 1),
             p)
            for p, S in moves_states
        )

        # Display the stats for each possible play.
        for x in sorted(
            ((100 * self.wins.get((player, S), 0) /
              self.plays.get((player, S), 1),
              self.wins.get((player, S), 0),
              self.plays.get((player, S), 0), p)
             for p, S in moves_states),
            reverse=True
        ):
            print "{3}: {0:.2f}% ({1} / {2})".format(*x)

        print "Maximum depth searched:", self.max_depth

        return move

We have added three things in this step. First, we allow get_play to return early if there are no choices or only one choice to make. Next, we've added output of some debugging information, including the statistics for the possible moves this turn and an attribute that will keep track of the maximum depth searched in the selection phase of the playouts. Finally, we've added code that picks out the move with the highest win percentage out of the possible moves, and returns it.

But we are not quite finished yet. Currently, our AI is using pure randomness for its playouts. We need to implement UCB1 for positions where the legal plays are all in the stats tables, so the next trial play is based on that information.

# ...
from math import log, sqrt

class MonteCarlo(object):
    def __init__(self, board, **kwargs):
        # ...
        self.C = kwargs.get('C', 1.4)

    # ...

    def run_simulation(self):
        # A bit of an optimization here, so we have a local
        # variable lookup instead of an attribute access each loop.
        plays, wins = self.plays, self.wins

        visited_states = set()
        states_copy = self.states[:]
        state = states_copy[-1]
        player = self.board.current_player(state)

        expand = True
        for t in xrange(1, self.max_moves + 1):
            legal = self.board.legal_plays(states_copy)
            moves_states = [(p, self.board.next_state(state, p)) for p in legal]

            if all(plays.get((player, S)) for p, S in moves_states):
                # If we have stats on all of the legal moves here, use them.
                log_total = log(
                    sum(plays[(player, S)] for p, S in moves_states))
                value, move, state = max(
                    ((wins[(player, S)] / plays[(player, S)]) +
                     self.C * sqrt(log_total / plays[(player, S)]), p, S)
                    for p, S in moves_states
                )
            else:
                # Otherwise, just make an arbitrary decision.
                move, state = choice(moves_states)

            states_copy.append(state)

            # `player` here and below refers to the player
            # who moved into that particular state.
            if expand and (player, state) not in plays:
                expand = False
                plays[(player, state)] = 0
                wins[(player, state)] = 0
                if t > self.max_depth:
                    self.max_depth = t

            visited_states.add((player, state))

            player = self.board.current_player(state)
            winner = self.board.winner(states_copy)
            if winner:
                break

        for player, state in visited_states:
            if (player, state) not in plays:
                continue
            plays[(player, state)] += 1
            if player == winner:
                wins[(player, state)] += 1

The main addition here is the check to see if all of the results of the legal moves are in the plays dictionary. If they aren't available, it defaults to the original random choice. But if the statistics are all available, the move with the highest value according to the confidence interval formula is chosen. This formula adds together two parts. The first part is just the win ratio, but the second part is a term that grows slowly as a particular move remains neglected. Eventually, if a node with a poor win rate is neglected long enough, it will begin to be chosen again. This term can be tweaked using the configuration parameter C added to __init__ above. Larger values of C will encourage more exploration of the possibilities, and smaller values will cause the AI to prefer concentrating on known good moves. Also note that the self.max_depth attribute from the previous code block is now updated when a new node is added and its depth exceeds the previous self.max_depth.
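To see the effect of C concretely, compare two candidate moves: one well explored with a good win ratio, and one barely explored with a worse ratio (the numbers here are illustrative choices of my own, not from the article):

```python
from math import log, sqrt

def ucb_value(wins, plays, log_total, C):
    # Win ratio plus the tunable exploration term, as in the code above.
    return wins / float(plays) + C * sqrt(log_total / plays)

log_total = log(105)  # 100 + 5 total plays across the two moves

# Move A: well explored, 60 wins in 100 plays.
# Move B: barely explored, 2 wins in 5 plays.
a_low  = ucb_value(60, 100, log_total, 0.2)
b_low  = ucb_value(2, 5, log_total, 0.2)
a_high = ucb_value(60, 100, log_total, 1.4)
b_high = ucb_value(2, 5, log_total, 1.4)

# With a small C the known-good move A scores higher; with a larger C
# the exploration bonus on the neglected move B overcomes its worse
# win ratio, so the selection phase would try B again.
```

This is exactly the exploration/exploitation dial: tune C down for games where playouts are expensive and you trust early statistics, up for games with many deceptive moves.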

So there we have it. If there are no mistakes, you should now have an AI that will make reasonable decisions for a variety of board games. I've left a suitable implementation of Board as an exercise for the reader, but one thing I've left out here is a way of actually allowing a user to play against the AI. A toy framework for this can be found at jbradberry/boardgame-socketserver and jbradberry/boardgame-socketplayer.

This version that we've just built uses light playouts. Next time, we'll explore improving our AI by using heavy playouts, by training some evaluation functions using machine learning techniques and hooking in the results.

UPDATE: The diagrams have been corrected to more accurately reflect the possible node values.

September 24, 2015 01:47 PM


Justin Mayer

Tacklebox for the Fish Shell

So you have installed the Fish shell. Now what?

Rather than manually port all of your Bash/Zsh configuration, you might first consider whether someone has already implemented your desired function in Fish. My Tacklebox and Tackle projects are designed to make it easier for you to utilize existing Fish shell plugins, modules, and shell prompt themes — as well as to have an easy way to add your own. You can read more about it in the Tacklebox announcement post.

Assuming you have followed the installation and usage docs and have installed Tacklebox and Tackle, you can now take advantage of some handy tools, as we’ll see below.

Keeping up-to-date

I often forget which aspects of my computing environment require manual updating, and even when I remember them, recalling the proper update command invocation for each is an additional burden. The up command simplifies this process considerably.

Use Vundle to manage Vim plugins? Update them via:

up vundle

Update Python packages via:

up python

Update Vundle, Python, Homebrew, Fish completions, and others via:

up all

Extract

Similar to the above, it’s often hard to remember which command invocations to use when extracting compressed archives. Now you can simply use one simple command for a wide variety of file types:

extract archive.tar.xz

If you have Pixz and a compatible version of Tar (see respective docs for more detail), there’s also a handy compress command for fast compression of files and directories:

compress your-file-or-directory

Python virtual environments

If you previously used virtualenvwrapper to interact with Python virtual environments via Bash, in Fish you should use the Virtualfish module, which is bundled with Tackle. To create a new project in ~/Projects/ and a new virtual environment in ~/Virtualenvs/:

vf project your-project-name

If you already have an existing project, you can create just the virtual environment via:

vf new your-virtual-env-name

When you want to work on your project and activate its virtual environment:

workon your-project-name

Pip

If, like me, you prefer to restrict Pip to virtual environments so as to avoid accidentally installing a Python package into your global site-packages directory, it can be cumbersome when you occasionally want to invoke Pip globally. You can use the gpip command to make this easier. For example, to see whether Pip, Setuptools, Wheel, or other global packages are outdated:

gpip list --outdated

Closing remarks

These are just a few of the things Tacklebox and Tackle can do to make your Fish shell a more productive place to be. If you have any enhancement ideas, please submit an issue or pull request on the relevant repository. Also, follow me on Twitter to be notified when new handy features are added to Tacklebox and Tackle.

September 24, 2015 07:00 AM


Django Weblog

Django 1.9 alpha 1 released

As part of the Django 1.9 release process, today we've released Django 1.9 alpha 1, a preview/testing package that represents the first stage in the 1.9 release cycle and an opportunity for you to try out the changes coming in Django 1.9.

Django 1.9 has a myriad of goodies which you can read about in the in-development 1.9 release notes.

This alpha milestone marks a complete feature freeze. The current release schedule calls for a beta release in about a month and a release candidate about a month from then. We'll only be able to keep this schedule if we get early and frequent testing from the community. Updates on the release schedule are available on the django-developers mailing list.

As with all alpha and beta packages, this is not for production use. But if you'd like to take some of the new features for a spin, or to help find and fix bugs (which should be reported to the issue tracker), you can grab a copy of the alpha package from our downloads page or on PyPI. As always, signed MD5, SHA1, and SHA256 checksums of the 1.9 alpha package are available.

The PGP key ID used for this release is Tim Graham: 1E8ABDC773EDE252.

September 24, 2015 12:38 AM

September 23, 2015


Reuven Lerner

Registration is open for my October Webinars (about regexps and technical training)

September has been busy with work and holidays, but I’m gearing up for an exciting and busy October. Among other things, I’m giving two (free) Webinars in that month, and you can already register for them:

The post Registration is open for my October Webinars (about regexps and technical training) appeared first on Lerner Consulting Blog.

September 23, 2015 10:26 PM


PyCon

Thank you to our Launch-Day sponsors

The new PyCon 2016 website is now live! The conference volunteers have worked hard to include all of the essential details about the schedule, venue, and hotels ahead of the Call for Proposals next week and the opening of Registration in mid-October.

Our launch-day sponsors this year — those organizations that have already pledged support toward keeping PyCon affordable for as wide a range of attendees as possible — are from a broad array of fields that illustrate just how widely Python is used in today’s world:

For more details, see the detailed sponsor descriptions on our Sponsors Page. We look forward to seeing every one of these sponsors at the conference.

In the meantime, the PyCon volunteer staff will be busy rolling out new information about the conference every week here on the site as well as on our social media accounts — so stay tuned!

Important Dates:

2015

2016

September 23, 2015 07:34 PM


Brian Okken

Test Classes: No OO experience required (PT005)

You do not need to understand object-oriented programming to make use of test classes. They are a way of organizing code, offering a level of granularity between module and function. This episode covers some of the confusion around using test classes, and a few benefits of using them. Listen Now on iTunes, Stitcher, SoundCloud, […]

The post Test Classes: No OO experience required (PT005) appeared first on Python Testing.

September 23, 2015 06:52 PM


Caktus Consulting Group

PyCon 2016: Behind the Design

Having helped to design an award-winning event site for last year’s PyCon in Montreal, we are thrilled to collaborate again with the Python Software Foundation (PSF) on this year’s site for PyCon 2016.

PyCon 2016 will be held in Portland, OR, and PSF knew they wanted the 2016 website to convey the distinctive mood and look of that city. Working collaboratively with PSF, designer Trevor Ray drew on everything from the unique architecture of the city's craftsman-style bungalow houses and the surrounding mountainous landscape to the cool color scheme of the Pacific Northwest region. The team developed a site that leads the user on a journey: as they scroll, users are brought further into the city, from the low, rolling, misty, forest-topped mountains on the outskirts, into the very heart of its neighborhoods.

Trevor also redesigned the PyCon logo for 2016, giving it a peaked shape, hand-lettered typography (a reference to the thriving craft community of Portland), and a classic, balanced layout. Ultimately, our team and PSF worked together to achieve a site that we hope is welcoming and functional.

We’re excited for this year’s PyCon, and we hope the site excites attendees as well. Only 249 days till PyCon!

September 23, 2015 06:30 PM