MickBlog: europython

Europython 2007 day 3

My quick notes for Europython day 3:

9:30 - 10:00 Managing a multilingual decentralised internet network

  • Health and safety agency
  • Distribute information via web
  • Lengthy publication workflow
  • Published to web, optionally to print, then translated using xliff
  • Mostly stored in plone
  • Wrote bulk upload tool
  • Use FCKeditor, switching to edit-on Pro
  • Use ldap to manage workflow
  • plone workflows
  • xliff for translation
  • use zope catalog for search
  • thesaurus of 2400 terms, hierarchical, fully translated
  • publication product to manage printed pdf publications
  • upload pdf and metadata, title and indexes generated
  • syndication tool to add feeds and generate feeds
  • can subscribe to topics and receive emails
  • drag and drop bulk upload tool
  • small java applet
  • search and replace tool
  • 200k users per month
  • 50 plone portals
  • why open source?
    • publicly funded, public agency
    • previously spent 60K a year on proprietary software, license and maint alone
    • had to pay to modify anything
    • couldn't afford large amount of users due to licences
  • planning to release all software at beginning of 2008
  • working with OSOR (Open Source Observatory and Repository)

10:00 - 10:30 Technical issues of a multilingual decentralised internet network

  • Started in 2004 collecting requirements
  • plone 2.1
  • squid 2.5
  • elevateIT 1.0 - add ons created for plone, especially hierarchical metadata
  • upgrading to plone 2.5/3.0, zope 2.9/2.10, squid 2.6
  • zeo redundant setup, replication
  • challenge communicating with x users from y departments and z countries
  • 50% of users never say anything, 40% need basic support and 10% quite eager
  • needed a system that can change quickly
  • need to be able to change requirements while working on system
  • 75% of requests arrive after prototype is done
  • lots of ui requests
  • issue of edit-on Pro license:
    • needed wai compliant editor
    • named user licenses, no idea who the icelandic webmaster is at the moment!
    • site license, everything hosted on one place, but users spread out over europe
    • per processor license, 2 zeo dual proc machines, 6 front ends, 2 squid servers = 12 procs, ouch
    • negotiated site license
  • bad to rely on folders to structure site, one piece of content visible in many places
  • also bad because several pieces of information might be produced by different teams working in different folders, can't give everyone access to every folder
  • bad to change content because it needs to be retranslated into 22 languages
  • use metadata to solve multiple folder issue
  • dublin core not sufficient
  • meta data keywords not the same as ones used by editors, some terms too technical for search engine indexes
  • thesaurus needs specialized metadata, e.g. term, theme, NACE code
  • each piece of metadata creates a special view
  • web manager vs marketing guy
    • web manager wants clear structured content
    • marketing wants flashier pages, with custom design for different sections
  • too much customization leads to hard upgrades
  • good metadata is hard, e.g. long titles break breadcrumbs onto multiple lines
  • migrating existing html hard, e.g. hard to find title, <title> tag not always used, neither is <h1>
  • metadata allows for alert system, subscribing to topics
  • originally thought it was ok to subscribe to places in site and search queries
  • users don't need that flexibility, and it overloaded the servers
  • pdf product supports image generation, pdf property parsing, language detection, config file upload, xliff support, meta taggable chapter links
  • example of over complicated tool
  • need to support more esoteric practises
  • translation needs to fallback when not available
  • not everything translated, budget limits
  • can't use lingua plone
  • i18nattribute approach
  • object stores each translation in dict under language key
  • system can retrieve by language and fallback to original
  • content added in english and later translated
  • generate xliff from content and send them out
  • european translators use trados which supports xliff but they don't officially support this
  • people use html editors too
  • so export to html
  • frameworks use gettext, updates require system restart and managing po files problematic
  • 160K links to external docs stored
  • people work on these links
  • workflow commenting feature became major way of communication between content managers
  • commenting state available from every state, which always returns to former state immediately, allows users to add comments without changing state
  • initially political issue of each country having own portal
  • all portals there n times
  • skins loaded n times from cache
  • each portal has its own catalog, network search harder
  • put everything in one portal and use sub folders a good trick
  • e-tags, expiry date not good enough for caching
  • smart caching of authenticated connections using e-tags vital if lot of users logged in
  • catalog is bottleneck, put it in its own db with own cache
  • if have multiple sites, make sure to use 301 redirects or they get crawled multiple times
  • binary data stored externally
  • nagios for monitoring
  • can monitor load, conflict errors, used threads, memory usage, download times
  • http://osha.europa.eu/
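The per-language dict with fallback described above could look something like the following. This is only a minimal sketch of the idea; the class name, method names and sample strings are my own assumptions, not the code shown in the talk.

```python
class I18nAttribute:
    """Hypothetical attribute that keeps one text per language key."""

    def __init__(self, original_language, text):
        self.original_language = original_language
        # Every translation lives in one dict, keyed by language code.
        self.translations = {original_language: text}

    def set(self, language, text):
        self.translations[language] = text

    def get(self, language):
        # Fall back to the original text when no translation exists yet.
        original = self.translations[self.original_language]
        return self.translations.get(language, original)


# Content is added in English first and translated later.
title = I18nAttribute("en", "Safety at work")
title.set("de", "Sicherheit bei der Arbeit")
german = title.get("de")
icelandic = title.get("is")  # not translated yet, so the English text
```

The fallback means a page can be shown in any of the 22 languages even when the budget has not stretched to translating every item.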

11:00 - 11:30 Django Show and Tell

  • http://grono.net/
    • very big, popular social site
    • lots of features
    • pre magic removal
    • custom upload queue tool for files
    • Jacob Kaplan-Moss: tool based on 2to3 being worked on to convert pre-magic to post-magic (apparently)
    • memcached
    • use bzr, sucks! Too slow :)
  • http://commune.ro/
    • just launched
    • classified ads site
    • all about locality
    • got lots of content from local papers

11:30 - 12:00 Measuring Web Services

  • Starts off talking about his trip to egypt :)

  • Working on super secret web app

  • web 2.0 architecture

  • Set of servers serving GET static content, 30 images, 3 css, 1 dtd, 1 html

  • Another set of servers with business logic, using json

  • UI in javascript, served from static files, makes calls to json servers

  • Use python for static server, apache overkill. < 200 lines of code

  • rpc server in python

  • simple framework for responding to json rpc requests

  • by writing own servers don't have logging and performance tools you'll get with apache

  • monitoring pyramid: profiling (smallest part), monitoring, logs (biggest part):

    • logging: big, after the fact files
    • monitoring: check server is working
    • profiling: what is this individual call doing?
  • business people really like logs, lots of business info in there

  • can estimate user population sizes, times of day, geographic locations

  • load and performance testing before launch important

  • graph latency 95% (ms) vs load (qps), define acceptable response time ms level and graph using that. load number used to define acceptable load level on machines

  • when go live get real data, load (qps) vs time

  • nice to know where to allocate resources, so see usage by country

  • logging:

    • every request gets unique number
    • time in ms resolution
    • remote ip
    • method call name and params
    • errors
    • value returned
  • example:

    1, 2007-07-15 16:08.431, 192.68.1.32, GetStockQuote, ("GOOG",)
    ...
    1, 2007-07-15 16:09.561, (524.98, 10.87)

  • suggestion to log duration of requests, handy information

  • need traceback, sql queries, and other information in logs

  • monitoring process which relaunches server on crash

  • monitoring service logs some metrics of server as seen from outside

  • monitor flapping, server repeatedly going up and down

  • so if crashes > n times an hour set off pagers

  • monitor latency spikes, get 95% latency over last day, hour, 15 minutes, 5 minutes, minute

  • failure spikes, again day to minute

  • User vs system generated errors

  • Don't want StockDoesNotExist error triggering pagers, but do want KeyError

  • Derive from UserException and serialize, for others handle differently

  • Watch for UserException spikes, can be DOS

  • Watch for sudden dips in load compared to normal levels

  • don't panic though, could be the world cup

  • on servers have special rpc request which runs particular function in profiler on live servers

  • care about wall clock time, not cpu time

  • cpu time doesn't pick up on blocked i/o

  • use cProfile in 2.5 or profile

  • hotshot generates random data :)

  • have special decorators which control what's logged, e.g. suppress 3rd arg and just record size, ideal for spell checking for example

  • also don't want to record private information
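A rough sketch of how the logging ideas above might fit together: a request number per call, wall clock duration, user errors kept apart from system errors, and a decorator that records only the size of a sensitive argument. All names here (logged, latency_95, PRICES, CheckSpelling) are my own assumptions for illustration, not the speaker's code.

```python
import functools
import itertools
import logging
import time

log = logging.getLogger("rpc")
_request_ids = itertools.count(1)


class UserException(Exception):
    """Caused by bad input: reported to the caller, should not page anyone."""


class StockDoesNotExist(UserException):
    pass


def logged(suppress=()):
    """Log each call; argument positions in `suppress` are logged as sizes."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            request_id = next(_request_ids)
            shown = tuple(
                "<%d chars>" % len(arg) if i in suppress else arg
                for i, arg in enumerate(args)
            )
            log.info("%d, %s, %r", request_id, func.__name__, shown)
            start = time.time()  # wall clock time, not cpu time
            try:
                result = func(*args)
            except UserException as exc:
                log.warning("%d, user error, %r", request_id, exc)
                raise
            except Exception:
                # System error: keep the traceback, this one should alert.
                log.exception("%d, system error", request_id)
                raise
            duration_ms = (time.time() - start) * 1000.0
            log.info("%d, %.1f ms, %r", request_id, duration_ms, result)
            return result
        return wrapper
    return decorator


def latency_95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


PRICES = {"GOOG": (524.98, 10.87)}  # made-up data for the example


@logged()
def GetStockQuote(symbol):
    try:
        return PRICES[symbol]
    except KeyError:
        raise StockDoesNotExist(symbol)


@logged(suppress=(0,))
def CheckSpelling(text):
    return len(text.split())
```

A monitoring process could then feed recent durations into latency_95 for each time window and count UserException warnings separately from system errors.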

12:00 - 12:30 Measuring Python Performance

  • Using particular code (pic) for example
  • note that time.clock() and time.time() have different resolutions on different platforms
  • simple way to measure wall time
  • matplotlib for histogram
  • quantized time plot, multiples of smallest time size on machine
  • do standard deviation histogram, poisson not gaussian
  • plot how average changes over time and how the minimum changes
  • minimum more useful, since as get faster get closer to 0
  • should throw away outliers on both sides
  • timing rule: each timing run calls test function some "number" of times
  • switched from .get() to try/except, 3 times slower
  • write utility functions to run test code
  • using magic number of "1000" runs, better to calculate
  • calibrate function
  • Guido asks if reinventing timeit module, Andrew points out he is explaining how it works :)
  • how does string length affect performance?
  • try/except faster for larger strings
  • defaultdict is fastest approach btw
  • timeit not easy to use from interactive shell
  • using hotshot for profiling
  • pstats.Stats interface dates from 1994
  • cumbersome
  • hotshot2kcachegrind from mailing list
  • makes hotspots very obvious
  • for example shows stat on network file system very slow
  • zipimport neat trick to fix this
  • has profiler which generates kcachegrind files directly
  • dtrace looks like good new tool
  • slides and text on http://dalkescientific.com/writings/
  • also kcachegrind conversion tool for new profiler
  • could we rewrite kcachegrind in python?
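For reference, the kind of comparison discussed above can be reproduced with the timeit module. This is a sketch under my own assumptions: the talk's actual test code is not in these notes, so character counting stands in for it, and the repeat and number values are arbitrary rather than calibrated.

```python
import timeit
from collections import defaultdict

TEXT = "the quick brown fox jumps over the lazy dog " * 50


def count_get(text):
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    return counts


def count_try(text):
    counts = {}
    for ch in text:
        try:
            counts[ch] += 1
        except KeyError:
            counts[ch] = 1
    return counts


def count_defaultdict(text):
    counts = defaultdict(int)
    for ch in text:
        counts[ch] += 1
    return counts


def best_time(func, repeat=5, number=20):
    """Seconds per call, using the minimum of several timing runs."""
    timer = timeit.Timer(lambda: func(TEXT))
    # Each run calls func `number` times; the minimum is the least noisy.
    return min(timer.repeat(repeat=repeat, number=number)) / number


results = {
    func.__name__: best_time(func)
    for func in (count_get, count_try, count_defaultdict)
}
```

Printing results on your own machine shows the relative ordering; changing the length of TEXT is an easy way to see how the input size shifts it.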

14:00 - 15:00 Keynote by David Axmark

  • Long history, first database, unireg, created in 1982
  • Started with the idea of doing business with a freely available db
  • Started from day one with a commercial agenda
  • Worked "part time" for the first few years
  • Started no dedicated company for MySQL initially, used old companies, bad idea, causes lots of complications later
  • Worked on an "if you use this for more than 30 days we'd like you to pay" principle
  • When switched to GPL their income went down 80%
  • Developed to solve their problems
  • Developed for practical use
  • Stability was key, aimed for speed over features
  • Easy install rule, must install and use within 15 minutes
  • Documented everything, even if english was a bit ropey
  • To get used you need both the feature and understanding
  • Documentation was more up to date than the code you downloaded, since it was built from latest source, while download was latest release
  • Initial top income was windows licenses
  • General principles:
    • Get code as good as possible first time
    • modular architecture
    • repeatable bug reports got highest priority. really tried to get good reports as fast as possible, so responded very quickly, often with a diff directly back to submitter. Interesting point as to why open source is so good, get many more bug reports
    • fix the small things, detailed feedback is valuable
    • community made of many different groups, made mysql work with as many as they could, so worked with as many languages as possible
    • hired experts regardless of location
  • How not to handle bugs:
    • tried to compile old pre mysql library with big compiler from big company in nw us
    • when compiling with high optimization string functions crashed
    • did bug report including c, generated assembly with arrows pointing to bugs
    • 3+ months later they worked on it, new minor release with 5 bugs fixed and wanted them to buy it. After wrangling with manager got free upgrade
    • a year later and a major release and all the bugs are back
    • never submitted another report
  • BTW, Monty's daughter is called My, it's not "my" sql, it's "me" sql :)
  • Supports lots of platforms, 64bit in 2000, clean code so was a recompile away
  • All code compiled from one tree, as all code written with portability in mind
  • Lots of storage engines, many customers have custom ones
  • Some interesting ones, like archive which is select and insert only, no delete or update, cluster which has nothing in memory, black hole which goes to /dev/null
  • Falcon backend which is web oriented, ACID and 64bit optimised
  • MyISAM++ has crash recovery, data warehouse, block oriented with data caching
  • Lots of commercial back ends from different companies, e.g. high compression or high volume logging
  • mysql cluster: fault tolerant, shared nothing, high availability, scalable, high performance, currently slow on joins, ndb/python binding to low level. Have to adapt db app design to it a bit.
  • primary keys are blazingly fast
  • mysql 5.0 has all the "normal" db features now
  • LOAD DATA extension which can do transformation and calculations on the fly at load time from text files
  • 5.1 adding partitioning, row based replication, full text indexing, replication of cluster, xml support, event system, logs now in tables
  • 6.0 adding multiple source replication with conflict handling, backup api, new storage engines
  • Anything python needs?
    • Issue of how foreign key constraints are handled in django test fixtures, strays from sql specs. Being fixed.
  • Roadmap: more plugins, alter table, query profiling, stored procedure debugging, sql standards
  • Shifting to much more public development, developers are on public irc now, internal mailing lists opened up
  • Spend a lot of time fighting software patents
  • questions:
    • what does postgresql have that's missing and missed the most. Used to be easy, much harder now. DDL (?) transactions
    • feature mysql has that people don't know about. Used to be transactions, probably the different storage engines
    • mysql has lots of tiny deviations from the sql standards, annoying. Working on it, lots of things improved in mysql 5.0. Deliberately stray from standard in places, since it's broken. Should be info on this somewhere on site.
    • Have any patents? Got some when acquired a company. Think it might be useful to pool patents in open source community if it was legally possible.
    • What's the key reason for mysql's success? Partially technology and practical approach to using technology, listened to people's needs. Made it very easy to get going. Started designing it for the web, so good timing and grew with the web.
    • Why no mysql web admin kit from mysql? Phpmyadmin good so don't want to compete. When decided to do a desktop gui there was lack of any good one.
    • What are current gains and losses, and need more finances? It varies, make lots some times, so about break even.

15:15 - 15:45 Future of EuroPython Plenary Session

  • Previous europythons had a very large number of people helping to organise
  • First couple of years had lots of people from local area organising, last year had only one person
  • This year had only one person again
  • Need to find out why people aren't volunteering
  • Lots of roles needed for next europython
  • Document will be uploaded to site
  • Anyone who wants to put on europython 2009 must help organise 2008
  • Big issues with registration system
  • pycon uk and europython should join forces
  • big concern that it might eat too much of a volunteer's time
  • europython volunteers list which should be low volume
  • announcing volunteer roles only, no conversations
  • one trick other conferences do is invite roadies who get discounts
  • one observation from pycon uk is people get obsessed with technology of site
  • http://zookeeper.org/ (?) is conference system
  • one previous trick was maintaining a list of people doing cool stuff people wanted talks on and pester them for talks
  • zope folks did this and pestered zope people, worked well
  • need more recording of talks so can hold more in parallel
  • one reason for holding in vilnius was to encourage people from eastern europe. didn't happen so much this year due to lack of contacts, but much better next year

16:00 - 17:45 Lightning Talks

  • No detailed notes, I had to leave 1/2 way for my plane and I was the person holding up the one minute left page :)

  • I can remember a couple of lightning talks

  • Martian is part of grok, it configures your app by searching modules for certain patterns. E.g. instead of:

    class MyClass(SomeObj):
        ...
    registry.register(MyClass)
    

    You can just do:

    class MyClass(SomeObj):
        ...
    

    And you configure a grokker which searches for certain patterns and configures your application. This has lots of advantages: less typing, less error prone and defers configuration to after import time.

  • grok announcement (complete with gravely film trailer voice, sound effects and music). http://grok.zope.org/

  • pydoctor, a python api documentation tool, features AST parsing instead of module importing, and has a web server mode which lets you edit docstrings. http://codespeak.net/~mwh/pydoctor/

  • a codec hack which lets you use curly braces and semi-colons instead of whitespace in python (boo). http://timhatch.com/projects/pybraces/

  • More pypy stuff, including scheme, javascript and prolog interpreters
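To make the Martian idea a bit more concrete, here is a toy version of scanning a module for a pattern and registering what is found. This is not Martian's real API; grok_module, the registry dict and the throwaway module are all assumptions for illustration.

```python
import types


class SomeObj:
    """Hypothetical base class marking things that should be registered."""


def grok_module(module, base, registry):
    """Register every subclass of `base` found in `module`, by name."""
    for name, value in vars(module).items():
        is_class = isinstance(value, type)
        if is_class and issubclass(value, base) and value is not base:
            registry[name] = value
    return registry


class MyClass(SomeObj):
    pass


class Unrelated:
    pass


# A throwaway module standing in for real application code.
demo = types.ModuleType("demo")
demo.MyClass = MyClass
demo.Unrelated = Unrelated

# Configuration happens here, after import time, with no register() calls.
registry = grok_module(demo, SomeObj, {})
```

The class definitions stay free of registration boilerplate, and the scan decides what to configure once everything has been imported.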

Europython 2007 day 2

My quick notes for Europython so far:

9:00 - 9:30 RPython: Need for speed aka C and C# considered harmful

  • RPython overview
  • Subset of python
  • Side effect of pypy development
  • Wanted to use python as much as possible
  • Statically typed
  • No dynamic abilities like adding methods
  • Java like objects
  • Only __init__ and __del__ supported
  • No introspection
  • Test and debug on CPython
  • Type inference
  • Few existing python modules work, due to more dynamic features being used
  • More static version of open() used
  • Can't compile regexes, but can use compiled objects
  • Not intended to be a general purpose framework
  • Compilation can give strange errors
  • However it's much faster than python, only reason you'd use it :)
  • Can write extension modules for CPython in RPython
  • Still incomplete
  • CLI backend produces assemblies
  • Fast as C#
  • 4 - 100 times faster than ironpython
  • Experimental access to .net libraries
  • CarbonPython highly experimental
  • Produces a .net dll
  • Usable by other .net languages
  • Iron + Carbon = steel
  • Need to define multiple entry points
  • Demo showing ironpython running code in 10 secs and carbonpython's produced dll running in 0.7 secs
  • JVM similar to cli backend
  • Shares code with cli backend
  • Not as mature
  • Only produces executables at the moment
  • JS backend can produce complex code
  • Can do ajax
  • Can bind to libraries, e.g. mochikit
  • Example of terminal running on server
  • Rpython has a lot of rough edges
  • Harder to use than python
  • More convenient than other languages like C
  • Much faster than CPython

9:30 - 10:00 How our Python trading platform got 40 times faster by switching to RPython

  • EWT is a trading company, mix of traders (football players) and nerds. The traders are very noisy
  • Based in LA. Apparently have a palm obsession
  • Get 6am calls due to NY opening hours
  • Co located servers with exchanges
  • 10^5 market updates a sec
  • can place 2500 orders a sec
  • Big python daemon with twisted services
  • C modules
  • Wanted easier way to maintain fast code
  • Example of a binary tree translation
  • No __iter__ so need to use a while loop
  • No __str__
  • Rpython can be seen as a compiler for python or an interpreter for C
  • Using rctypes to interface with external libs
  • rctypes is ctypes
  • Can generate ctypes code from c headers
  • Rpython has confusing errors, usually have to comment out code to isolate problem.
  • Debug with gdb
  • Generally quite easy to follow in gdb
  • Generated code can be scary
  • Rpython has no special methods, lack of builtins, lack of modules, no long, no list sort, and more.
  • Can be frustrating to convert python code into rpython
  • Using rctypes can interface to python c interface and embed interpreter
  • Can generate code on the fly before compilation
  • Definitely hit a wall when using rpython
  • Starting rpython from scratch easier than porting python code
  • Runtime segfaults

10:00 - 10:30 Twisted and Zope in real time monitoring for oil and gas industries

  • A company working on wireless monitoring had some new hardware but no software
  • Oil wells need regular adjustment.
  • Many wells have dials which are read manually every day
  • Traditional telemetry is very risky near oil wells, source of sparks
  • New approach uses low power wireless network
  • Short range but long enough to go to equipment outside of danger area
  • Sensors talk to control mast which relays via TCP and telnet
  • Had to choose a comms option
  • Also wanted an rpc server
  • Used LineReceiver for protocol
  • Very simple line based protocol
  • Demo

11:00 - 11:30 Taking advantage of multiple CPUs for games - simply.

  • Making threading simple
  • threadmap
  • Using threads with pygame
  • Provide a few simple ways to use threads
  • For games multiple processes don't work well, need shared memory
  • Different OS's and versions have different threading characteristics
  • Is it worth using multiple cpus? Generally optimize for slower (single cpu) machines first, use multiple cpus as a late stage.
  • GIL is good for preventing a lot of thread nastiness. Pattern used in other projects too, but generally evolve to more fine grained locking.
  • Atomic ints are very platform specific and vary greatly. Not really available to python code outside of C
  • SDL putting together portable atomic int library
  • Python and SDL threading models quite similar, both provide portable threading.
  • Pygame and SDL are already threaded. e.g. sound thread
  • Need to choose which parts of the game to thread
  • Generally similar performance bottlenecks, e.g. drawing images
  • Would generally thread image drawing to screen, blit, flip and update
  • Want to try and keep threads separate from each other, reduces locking issues
  • Need to look at the source to figure out which pygame methods are thread safe, docs being updated with this information

11:30 - 12:00 Python in a large commercial application

  • Works for tideway
  • Used to work in AT&T, lots of cool research
  • Maintainer of omniORB
  • Tideway maps and enumerates servers and services in an organization
  • Maps tens of thousands of servers
  • Maps connections and dependencies between servers
  • Can map virtual machines
  • Shows things like packages installed, network interfaces, services
  • Tracks history of changes and infrastructure
  • Infrastructure of services
  • Reasoning service figures out what to do, event coordination
  • Discovery service, talks to machines
  • Data store
  • Knowledge add on knows about software and hardware, e.g. EOL dates
  • Data store is an object graph storage, SQLish query language, full text index
  • 305K lines of python, includes 65K testing
  • 11K of java, used for things like the graphing, tom sawyer
  • 10K of C++
  • 11K of home grown languages Reasoning, compiles to python
  • Use apache, webware, tomcat (java), tom sawyer (java), berkeley db, nmap, ply, pexpect, pyxml, pycrypto, pysnmp, python-ldap, omniORB, JacORB (java)
  • Agile methodology, use http://www.rallydev.com/
  • Lots of tests, unit and QA acceptance tests
  • Code review
  • Customers shouldn't care but think it's not a real language
  • Python used initially as proof of concept, never changed :)
  • Good:
    • Maintainable
    • Dynamic typing hasn't caused bugs
    • Performs fine
    • Dynamic nature is very helpful
    • Powerful features
  • Bad:
    • Perception, not java or c#
    • Hiring is harder
    • Integration: java apis, web services (WSDL, SOAP)
    • Web framework of the day, moving to java spring mvc for future ui work
    • Performance, migrating bits to C++
    • Documentation
    • Doesn't always suit all developers' brains

12:00 - 12:30 Pyweek: Making games in 7 days

  • Alternative title: the magical power of deadlines!
  • Most people in room had tried to make a game, very few had finished
  • Forces you to finish it
  • 1 week, themed, from scratch, published libraries ok, free or new assets, no prizes, peer reviewed
  • Showing games, e.g. cow abducting one
  • Takes place every 6 months
  • Generally code through the weekend, sleep, work in the week and code through the second weekend
  • Starts at 00:00 GMT for everyone
  • Their first project was stim, the incredible machine on steam
  • Was very hard to make, and very hard to play. Too ambitious
  • Lessons learned: production values pay off, complexity kills
  • Their team won, Typus Pocus
  • Aimed for simplicity, high production values, playable from day one, aimed at aunt
  • Found stortroopers.com which has good free art for people
  • pyweek 5 in August

16:00 - 17:00 Lightning Talks

  • MyEuroPython

  • Death to zope instances

    • Zope 2 used plugins
    • Zope 3 is now a framework
    • code uses zope
    • write, add dependencies, deploy
    • make-zope-app and deploy-zope-app
    • demo
    • Uses paste
  • Python component architecture

    • Extend by adaptation not inheritance
    • Hypothetical architecture
    • Interfaces to define, registry to register, utilities to use, adapters to extend
    • Example Divider
    • Modulo adapter
  • Chandler i18n overview

    • What is chandler
    • Some key stuff missing in python, e.g. localised sorting
    • Unicode and date support
  • lolpython

  • Mobile Web Server

    • web server with a globally accessible url

    • A phone

    • apache and mod_python

    • internet

    • browser

    • can't access phone via network

    • use a gateway

    • *.mobilesite.net - free

    • Takes a picture from the browser :)

    • Send messages, browse appointments

  • z3c.formjs demo

    • framework built on z3c.form which provides hooks into form code to add javascript and ajax
    • http://demo.carduner.net/
    • javascript framework agnostic
    • chat room demo
  • pycon uk

    • aims to be cheap and accessible to all
    • usual talks and events
    • tutorials, beginner and advanced
    • sprints, 2 days before and 2 days after
    • early bird rate until end of july
    • accepting talks
    • submit by 31st of august to make it to conf cd
    • http://www.pyconuk.org/
  • Humanized Enso

  • egee grid

  • Show me bazaar

    • bazaar workshop @ 11:30am in Zeta ? room
    • ask any canonical person about bazaar too
    • django hiring django developers
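The component architecture talk's "extend by adaptation" idea can be sketched in plain Python without zope.interface. Everything below is an assumption on my part: the notes only mention a Divider example and a Modulo adapter, so the registry, the interface name and the method names are invented for illustration.

```python
class AdapterRegistry:
    """Hypothetical registry mapping (class, interface name) to a factory."""

    def __init__(self):
        self._adapters = {}

    def register(self, adapted_class, interface_name, factory):
        self._adapters[(adapted_class, interface_name)] = factory

    def adapt(self, obj, interface_name):
        # Walk the MRO so adapters registered for a base class also apply.
        for cls in type(obj).__mro__:
            factory = self._adapters.get((cls, interface_name))
            if factory is not None:
                return factory(obj)
        raise LookupError("no adapter for %r" % interface_name)


class Divider:
    def __init__(self, numerator, denominator):
        self.numerator = numerator
        self.denominator = denominator

    def divide(self):
        return self.numerator // self.denominator


class ModuloAdapter:
    """Adds a remainder operation without subclassing or editing Divider."""

    def __init__(self, divider):
        self.divider = divider

    def remainder(self):
        return self.divider.numerator % self.divider.denominator


registry = AdapterRegistry()
registry.register(Divider, "IRemainder", ModuloAdapter)

seven_by_two = Divider(7, 2)
quotient = seven_by_two.divide()
rest = registry.adapt(seven_by_two, "IRemainder").remainder()
```

Divider itself never changes; new behaviour is attached from outside by registering another adapter.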

17:15 - 18:15 Guido's Keynote

  • Python 3000
  • Fixing early design mistakes, first incompatible version
  • p3yk and py3k-struni branches
  • August 2007 3.0 alpha 1, 3.0 in august 2008
  • 2.6 alpha 1 December 2007, final in june 2008
  • After 3.0a1 big reorg of standard library will happen
  • 3.0 backwards incompatible
  • 2.6 fully backwards compatible. Py3k warnings mode, and many features backported using __future__
  • 2to3 source conversion tool
  • only looks at syntax, no dataflow analysis or type inference
  • e.g. when converting d.keys() to list(d.keys()) it can't tell a dict from a user object with a keys method, so it converts anyway
  • General plan for porting:
    1. unit tests
    2. run under 2.6 with py3k warnings
    3. use 2to3, don't hand edit output
    4. test under 3.0
    5. fix problems and rerun 2to3
    6. release 2.6 and 3.0 versions
  • If you go with the approach of converting from the 2.6 branch to the 3.0 branch using 2to3, you can easily support both versions at once
  • Start using newer features now, new style classes, sorted(), xrange(), int//int, relative import, new exception hierarchy, segregate unicode processing into a separate module
  • Don't try to write source level compatibility
  • Go to 2.6 first, don't go straight from < 2.6 to 3.0
  • New features:
  • utf-8 is default for source
  • can use unicode in identifiers
  • standard library remains ascii
  • still need to figure out normalization, which alphabets are supported and support for right to left
  • main reason for supporting all this is to aid non-native english people learning python
  • unicode strings
  • java models, all strings unicode and a bytes type
  • need to specify encoding to go between them
  • dropping u"" prefix
  • .encode() always goes from str to bytes, .decode() always from bytes to str
  • base64, rot13, bz2 "codecs" dropped
  • bytes type
  • mutable array of small ints
  • implemented using unsigned char[]
  • New i/o library
  • stackable components inspired by perl and java
  • low level unbuffered bytes io, platform specific
  • middle level buffer
  • top level unicode encoding/decoding
  • compatible api
  • open(filename) returns buffered text file
  • open(filename, "b") returns buffered binary file
  • print is a function
  • automatic translation is 98% correct
  • string formatting
  • now use "{0} {foo}" placeholders, positional and dict lookup
  • "This is {0} {name}".format("a", name="string")
  • {0.foo}, {0[name]}, {0:8}
  • All classes are now new style, class A: and class A(object) both new style
  • classes can be decorated
  • signature annotations
  • def foo(a: "something", b: range(10)) -> 42: ...
  • no assigned meaning, you can inspect foo.func_annotations
  • e.g. could write a type checking decorator
  • new metaclass syntax, class C(B1, B2, metaclass=MC): ...
  • __metaclass__ gone, keywords passed to MC.__new__()
  • class definitions support function call syntax: *args, **kwargs and keywords
  • MC.__prepare__ returns namespace dict for class body execution
  • issubclass and isinstance use __subclasscheck__ and __instancecheck__
  • used for virtual inheritance in ABCs
  • Abstract Base Classes, abc.py
  • Voluntary base classes, e.g. Iterable
  • mix ins, e.g. DictMixin, provides abstract methods you must override, as well as mix in concrete helper methods
  • can register virtual subclasses, e.g. A.register(C), issubclass(C, A) -> True, C isn't modified, but must implement A's abstract methods
  • standard ABCs: e.g. Hashable, Iterable, Container, Mapping, IOBase, Number, etc
  • collections.py, io.py and numbers.py
  • exceptions must derive from BaseException now
  • must be instantiated
  • except E as v
  • the new variable contains __traceback__, __cause__ and __context__, when leaving except: block it will be deleted to save memory
  • exceptions are chained, if one is raised inside another can trace back through them
  • int and long are now just int which is a long
  • int division returns a float, 1/2 -> 0.5
  • 0o777 is now octal instead of 0777
  • 0b10010 is new binary notation
  • Move to iterables/iterators instead of lists
  • range is like xrange
  • zip, map, filter return iterators
  • dictionary views inspired by java collections
  • iterkeys, iteritems and itervalues gone
  • dict.keys, .items and .values return dict views
  • lightweight object which can be iterated repeatedly, keys and items have set semantics, values has collections semantics (iter and len support)
  • view will change if underlying dictionary changes
  • default comparison changed, didn't make sense, depended on type names and memory addresses, e.g. sorting [1,2,""] now raises TypeError
  • default <, >, <=, >= raise TypeError
  • nonlocal keyword lets you assign variables in outer scope
  • new super() call, super() equivalent to super(ThisClass, self)
  • set literals, {1,2,3}
  • no empty set literal though
  • set comprehension, {f(x) for x in S if P(x)}
  • lots of little things, .next() -> __next__(), f.func_code -> f.__code__, reduce is dead, lambda lives
  • C api will probably change
  • probably changes to do with unicode and bytes
  • will definitely need recompile
  • trying not to change C APIs, adding or deleting APIs. Don't want to change semantics of existing code
  • last slide list of arguments why reduce is going :)
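
A few of the features above in one runnable sketch (my own illustration, not from the slides). It assumes the released Python 3 spelling `__annotations__` rather than `func_annotations`; the names `typechecked`, `repeat`, `Greeter` and `English` are made up for the example.

```python
import functools
import inspect
from abc import ABC, abstractmethod


def typechecked(func):
    """Check call arguments against annotations that are classes."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = func.__annotations__.get(name)
            # Annotations carry no built-in meaning, so this decorator only
            # interprets the ones that are classes and ignores the rest.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@typechecked
def repeat(text: str, times: int, note: "free-form, ignored" = None) -> str:
    return text * times


class Greeter(ABC):
    @abstractmethod
    def greet(self):
        """Return a greeting string."""


class English:  # does not inherit from Greeter
    def greet(self):
        return "hello"


Greeter.register(English)  # virtual subclass, English itself is unchanged

# Dictionary views track the underlying dict instead of copying it.
ages = {"ann": 31}
names = ages.keys()
ages["bob"] = 27

print(repeat("ab", 2))               # abab
print(issubclass(English, Greeter))  # True
print(sorted(names))                 # ['ann', 'bob']
```

Calling `repeat("ab", "2")` raises TypeError from the decorator before the function body runs.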

Questions:

  • str.translate using unicode.translate semantics (a dict)
  • no print method on file object, would force everyone to implement
  • regexes on byte strings? yes
  • only a couple of people think they will stick with 2.6 for years
  • quite a few people want to switch to 3.0 and drop 2.6 support
  • tkinter will be unicode friendly, tk is
  • will performance be impacted by using unicode strings? Too early to tell
  • question of newline translation, probably should open in binary and handle manually
  • also spotted that unicode.translate uses a table of ordinals
  • what will the translation tool not be able to help with? One big area is string literals that contain binary data
  • any plans to port array implementation like numpy to stdlib? No, due to incompatible release cycles. One part of numpy will make it in: a redesign of the buffer api (PEP 3118). Will be able to describe structure of binary blobs
  • getting rid of xml? Would like to see solution to xml plus problem. Lot of cruft needs to be removed from std lib anyway
    • Want to be more consistent in module and package naming, e.g. StringIO.StringIO vs StringIO. Funnily, solved by moving into io library. Will also grow a BytesIO companion.
    • Modules in the same category, e.g. db related, will go into a package.
    • Brett would like help on this
  • Will generic functions be in 3.0? Currently very much up in the air. Phillip Eby hasn't had time to revise PEP to make it easier to understand. Would like either more examples or smaller subset
  • Machinery for adaptation using ABCs? No, none. Makes it easy to write your own. A good implementation would probably make it in.
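
To make the str.translate and byte-regex answers concrete, a small sketch as it works in released Python 3 (the sample strings are mine):

```python
import re

# Keys are Unicode ordinals; values may be ordinals, strings or None (delete).
table = {ord("a"): "4", ord("e"): ord("3"), ord("!"): None}
leet = "leet speak!".translate(table)

# str.maketrans builds an equivalent ordinal table from plain strings.
same = "leet speak!".translate(str.maketrans("ae", "43", "!"))

# Regexes work on byte strings as long as the pattern is bytes too.
numbers = re.findall(rb"\d+", b"a1b22c333")

print(leet)     # l33t sp34k
print(numbers)  # [b'1', b'22', b'333']
```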

Europython 2007 Day 1

Europython is upon us and I'm sitting here enjoying hotel wifi and coffee in (fairly) sunny Vilnius.

So far it's quite different to PyCon. Apart from an almost completely different set of people, I actually got to spend time chatting with Guido over a beer. Fanboy alert! ;)

Anyway, here's some notes I took on some of the talks I've sat through so far. You can view other folk's coverage of europython:

10:00 - 10:30 Pythonic Interfaces

  • Talking about class interfaces
  • Enforces interface
  • Looked at other interface libraries
    • zope.interface, pep 245, cookbook, pyprotocols
    • poorly documented and lack of examples
  • "interface" package
  • Abstract base classes
  • Very java inspired
  • PEP 3119 very similar
  • Some performance penalty, caches classes for speed
  • http://www.mikeware.com/
  • Compared to good test suite coverage?
  • Not released yet
  • Writing a PEP, will post library later?!?
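
Since the speaker's "interface" package wasn't released, here is a rough sketch of the same idea using the abc module that came out of PEP 3119 instead. `Readable`, `Incomplete` and `Complete` are hypothetical names.

```python
from abc import ABC, abstractmethod


class Readable(ABC):
    """The interface: anything Readable must provide read()."""

    @abstractmethod
    def read(self, size):
        """Return up to `size` characters."""


class Incomplete(Readable):
    pass  # forgot to implement read()


class Complete(Readable):
    def read(self, size):
        return "x" * size


# The interface is enforced when the class is instantiated.
try:
    Incomplete()
    error = None
except TypeError as exc:
    error = str(exc)

print(error is not None)   # True
print(Complete().read(3))  # xxx
```

The check happens once per instantiation, not on every method call, which keeps the cost low.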

11:00 - 11:30 Case Study of a Pylons Project

  • Overview of pylons and how it worked out
  • http://www.developers.org.ua/
  • Switched from php to python
  • supervisord, cherrypy, paste, pylons, postgres and mysql, apache + php for legacy stuff
  • Switched from myghty to mako
  • Took a while to figure out how to setup sqlalchemy stuff for 3 dbs in pylons
  • Use routes, minor addons to BaseController to handle i18n, db and validation errors
  • gettext and unicode for i18n
  • mako and paste encode correctly
  • authentication
    • tried authentication with authkit with no success
    • next tried paste.auth but had problems too
    • fell back to wordpress for authentication
  • forms
    • tried toscawidgets but gave up
    • used htmlfill and formencode
    • wrote validation code: validate and error decorators
  • paste.fixture for testing
  • fiddly setting up wsgi stack by hand for testing
  • deployment
    • using mod_proxy and pylons process
    • supervisor2 to monitor
    • runs staging and production copies
  • good:
    • pylons approach to project structure
    • routes, sqlalchemy, mako
    • i18n
    • wsgi
  • bad
    • documentation, had to use source
    • lack of features
    • deployment
  • pylons is a hacker's framework
  • slow start to pylons but paying off
  • approx 2000 visitors a day
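
For a feel of what driving a WSGI app by hand in a test involves, a minimal sketch using only the standard library's wsgiref. It is not the project's code and does not use paste.fixture; `app` and `call_app` are made-up names.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A tiny WSGI application standing in for the real site."""
    body = ("Hello from " + environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(application, path):
    """Call a WSGI app directly, with no server or sockets involved."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(application(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_app(app, "/jobs")
print(status)  # 200 OK
print(body)    # b'Hello from /jobs'
```

Every piece of middleware in the stack would have to be wrapped around `app` before the call, which is where the fiddliness comes from.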

12:00 - 12:30 Tux Droid, a python-fueled robot

  • Robot controlled wirelessly from PC
  • USB dongle
  • remote control
  • buttons on head, wings
  • wing motors
  • base spinny motor
  • blinking eyes
  • microphone
  • beak motors
  • lighting eyes
  • IR receiver (for remote)
  • light sensor
  • IR LED to remote control stuff from tux
  • audio in/out
  • i2c connector
  • volume control
  • can store sounds in tux
  • 8GB flash, 512MB ram (?)
  • daemon exposes tux via tcp to application code
  • can play with tux in python shell
  • refactoring api to be more pythonic
  • fully open source, schematics, firmware
  • http://www.tuxisalive.com/

14:00 - 15:00 PyPy 1.0

  • Overview of what pypy is.

  • "I've worked on 3 implementations of new style classes, one for jython, one for python and one for pypy. Hopefully that is my last"

  • 1.0 contains a fully compliant python interpreter, a tool chain which produces C, LLVM and .net python interpreters, JIT, optimizations, CLR friendly backend for PyPy.net and taint and proxy object spaces.

  • Can mix and match features in produced interpreters

  • Needs polishing and missing some important extension modules

  • Special language features and rpython already useful

  • Performance is getting quite good, approximately 1.2x to 3x slower on average. In one case 6x slower.

  • Working on GC performance

  • JIT true trump card

  • JIT huge investment in resources, hard to evolve

  • Created a JIT generation framework (timeshifter)

  • Partial evaluation techniques to generate dynamic compiler

  • Psyco inspired

  • Of course hard to implement

  • Types in a dynamic language are a problem

  • e.g. know an operation is an addition, but of what? Makes constant folding hard

  • Solution is to "promote" run time interpreter parts to compiler parts, get information from the evaluation

  • Use other tricks like lazy allocation of objects

  • Dynamic generation process language agnostic

  • E.g. someone wrote a prolog interpreter in rpython, so get JIT for prolog

  • ia32 and ppc backends

  • Int arithmetic optimized

  • in speed range of gcc -O0

  • In some cases, e.g. demo 63x faster than CPython!

  • "If you need exactly this example..."

  • Implemented as a transformation of the low level control flow graphs, use graph "colouring" to analyse. Timeshifting, turning run time graph into compile time graph

  • "green" - compile time value

  • "red" - run time value

  • see graph pics

  • Specialize function for given values of green values, e.g. f(x, y) becomes f_3(y) when x = 3

  • Need to handle conditional cases, e.g. code conditional on red code.

  • Also need to handle merges, where can reuse code

  • Need to catch loops

  • Can manually hint promotion:

    def f(x,y):
      x1 = hint(x, promote=True)
      return x1 * x1 + y * y
    

    First generates:

    def f_(x, y):
      switch x:
        pass
      default:
        compile_more(value=x)
    

    After called with 3:

    def f_(x, y):
      switch x:
        case 3:
          return 9 + y * y
      default:
        compile_more(value=x)
    

    Obviously works well with repeated calls with same values
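
The pseudocode above can be imitated in plain Python. This is only an analogy I wrote to get the idea straight, not how PyPy implements it: a dict stands in for the growing switch, and a closure stands in for the freshly compiled case. `PromotedF` is a made-up name.

```python
class PromotedF:
    """f(x, y) = x*x + y*y, specialized on each value of x that turns up."""

    def __init__(self):
        self.cases = {}        # plays the role of the growing switch
        self.compilations = 0  # how often compile_more had to run

    def compile_more(self, x):
        self.compilations += 1
        x_squared = x * x      # folded once, x is now a "green" constant

        def specialized(y):    # only the "red" value y remains
            return x_squared + y * y

        self.cases[x] = specialized
        return specialized

    def __call__(self, x, y):
        case = self.cases.get(x)
        if case is None:       # the "default:" branch
            case = self.compile_more(x)
        return case(y)


f = PromotedF()
print(f(3, 4))         # 25
print(f(3, 1))         # 10
print(f.compilations)  # 1
```

The second call with x = 3 reuses the existing case, which is why repeated calls with the same values pay off.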

15:00 - 15:30 PyPy Python Interpreter(s) Features

  • Adding new features to the interpreter

  • Integrating into different environments, e.g. clr

  • Handling calls to and from the target environment

  • Features are independent of backend, e.g. security tainting of objects

  • Transparent proxies, e.g. distribution or persistence

  • Demo of rope, behaves like a string :)

  • million char string constructs faster

  • transparent proxy of list shared between processes

  • frames shared transparently

  • e.g. remote_open("/etc/passwd").read()

  • can attach pdb to traceback of remote object

  • orthogonal persistence

  • taint object space allows you to control where objects get accessed. E.g. prevents sensitive object getting logged:

    >>> x = 3
    >>> x
    3
    >>> taint(x)
    >>> x
    ...exception...
    >>> l = [x]
    >>> l
    ...exception...
    
  • tainted object can't cross i/o barrier until untainted. But can still fully manipulate.

  • stackless

  • getting faster and faster

  • If you're not too dependent on extension modules pypy is getting very close to being able to run your application as is

  • OS threads don't work very well
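
The taint behaviour can be roughly emulated in ordinary Python with a wrapper class. This is only a sketch of the idea: PyPy does it inside the object space for every object, and `Tainted`, `TaintError` and this `untaint` are hypothetical names, not PyPy's API.

```python
class TaintError(Exception):
    """Raised when a tainted value is about to leak out."""


class Tainted:
    def __init__(self, value):
        self._value = value

    def __add__(self, other):
        # Computation is allowed, but the result stays tainted.
        other_value = other._value if isinstance(other, Tainted) else other
        return Tainted(self._value + other_value)

    __radd__ = __add__  # fine for commutative addition of numbers

    def __repr__(self):
        raise TaintError("tainted value cannot be displayed")

    __str__ = __repr__


def untaint(obj):
    """Explicitly declassify a value so it may cross the I/O barrier."""
    return obj._value if isinstance(obj, Tainted) else obj


secret = Tainted(3)
total = secret + 4     # still fully usable in computations

try:
    print([secret])    # the list repr needs the element repr
except TaintError as exc:
    print("blocked:", exc)

print(untaint(total))  # 7
```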

16:00 - 16:30 z3c.dav – an implementation of WebDAV for Zope3

  • z3c.dav
  • Implementation of dav related RFCs.
  • DAV is built on HTTP to add content authoring operations
  • Adds features like locking and a data model
  • New HTTP verbs and responses
  • Defined new dav related interfaces and utilities in zope3
  • For each IDAVProperty DAV property: looks up utilities, looks up a widget to render it, and finally an adapter to get the values
  • IDAVWidgets control rendering of the property
  • z3c.dav defines basic types and interfaces
  • locking implemented via IDAVLockManager
  • Uses zope.locking
  • HTTP If-Match and If-None-Match headers implemented to allow clients to access locked objects with the correct tokens
  • COPY and MOVE implemented using zope.copypastemove
  • New status codes implemented
  • Currently working to get "litmus tests" working against dav implementation.
  • Need to reimplement dublin core adapters
  • Should be possible to backport to zope 2.x
  • Released on cheeseshop
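
To show what the new verbs and responses look like on the wire, a sketch that builds a PROPFIND request body and reads a 207 Multi-Status reply with the standard library. It does not use z3c.dav; the sample reply and the helper names are my own.

```python
import xml.etree.ElementTree as ET

DAV = "DAV:"  # the WebDAV XML namespace
ET.register_namespace("D", DAV)


def propfind_body(*names):
    """Build the XML body of a PROPFIND asking for the named properties."""
    root = ET.Element(f"{{{DAV}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV}}}prop")
    for name in names:
        ET.SubElement(prop, f"{{{DAV}}}{name}")
    return ET.tostring(root, encoding="unicode")


def parse_multistatus(xml_text):
    """Map each href in a 207 Multi-Status body to its displayname."""
    root = ET.fromstring(xml_text)
    found = {}
    for response in root.findall(f"{{{DAV}}}response"):
        href = response.findtext(f"{{{DAV}}}href")
        name = response.findtext(
            f"{{{DAV}}}propstat/{{{DAV}}}prop/{{{DAV}}}displayname"
        )
        found[href] = name
    return found


# Hand-written sample of what a server might send back.
SAMPLE_REPLY = """<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>/docs/report.txt</D:href>
    <D:propstat>
      <D:prop><D:displayname>report.txt</D:displayname></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>"""

request_xml = propfind_body("displayname", "getcontentlength")
print(parse_multistatus(SAMPLE_REPLY))
```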

16:30 - 17:00 unittest is Broken

  • Can't add extensions in a modular manner, can't distribute them
  • Different parts are separate (loading, running, reporting, etc) but very coupled.
  • New framework, test_harness and compose a runner from different parts
  • Bunch of extensions written already, e.g. todo, xml output, skipping, separate interpreter per test, etc
  • division of labour, different steps in test running process, each doing a discrete part
  • Why not one of the other frameworks?
  • They don't do what Collin needed, in particular ability to easily extend
  • http://oakwinter.com/code/test_harness/
  • Question of comparison with trial, how to handle deferreds? Should be possible
  • Backwards compatible with unittest
  • Don't need to organise into classes
  • Doesn't have the same test discovery sophistication of py.test or nose yet
  • Holger suggested an open space shoot out between the frameworks
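
For comparison, the separate stages can be spelled out with stock unittest too. This sketch is not test_harness (I didn't note its API); it just shows loading, running and reporting as distinct steps, with plain functions instead of classes. `check_addition`, `check_broken` and `RecordingResult` are made-up names.

```python
import unittest


def check_addition():
    assert 1 + 1 == 2


def check_broken():
    assert 1 + 1 == 3


class RecordingResult(unittest.TestResult):
    """A reporting part that also remembers which tests were started."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startTest(self, test):
        super().startTest(test)
        self.seen.append(test.id())


# Loading: wrap plain functions, no TestCase subclasses needed.
suite = unittest.TestSuite(
    unittest.FunctionTestCase(fn) for fn in (check_addition, check_broken)
)

# Running: the suite drives the tests and feeds the result object.
result = RecordingResult()
suite.run(result)

# Reporting: anything can be derived from the result afterwards.
summary = f"{result.testsRun} run, {len(result.failures)} failed"
print(summary)  # 2 run, 1 failed
```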

17:00 - 17:30 A practical example of Test Driven Development for a GUI using wxPython

  • gui code very error prone
  • gui libraries are not unit testable
  • Use mock objects to simulate gui code
  • tools to perform acceptance testing of gui available
  • Using pmock (http://pmock.sf.net/)
  • using pywinauto for acceptance tests
  • working on sample data entry application
  • mocking view and model in tests
  • Walks through writing a test first, then the code to make the test pass
  • Changes a view to a mock object
  • Sets expected call on the mock object
  • Sets return value on mock object
  • Adds and refactors tests
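
The mocking steps look roughly like this. The talk used pmock; this sketch swaps in the standard library's unittest.mock, and `EntryPresenter` with its view and model methods is a hypothetical stand-in for the sample data entry application.

```python
from unittest import mock


class EntryPresenter:
    """Logic between a view (the GUI) and a model (the data)."""

    def __init__(self, view, model):
        self.view = view
        self.model = model

    def save(self):
        name = self.view.get_name()
        if not name:
            self.view.show_error("Name is required")
            return False
        self.model.add_entry(name)
        return True


# The view becomes a mock object, so no wxPython window is created.
view = mock.Mock()
view.get_name.return_value = "Ada"  # set the return value on the mock
model = mock.Mock()

saved = EntryPresenter(view, model).save()

# Check the expected calls on the mocks.
model.add_entry.assert_called_once_with("Ada")
view.show_error.assert_not_called()
print(saved)  # True
```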

17:45 - 18:45 Keynote by Simon Willison

  • Talking about OpenID
  • What is OpenID
  • Covers common questions
  • JanRain provides libraries for sites
  • idproxy.net proxies yahoo! accounts, could also do google and others
  • openid.net - developer oriented
  • openidenabled.com - general site
  • vapourware announcement - want to make pylons and django request/response objects compatible
  • openid aggregators, pull info from all openid providers