Working with time zones is generally considered one of the nastier parts of handling time series. In particular, daylight saving time (DST) transitions are a common source of complications. In this article, I'll walk you through how to handle time zones with Python.
Many time series users choose to work with time series in Coordinated Universal Time (UTC), which is the successor to Greenwich Mean Time and is the current international standard.
Time zones are expressed as offsets from UTC; for example, New York is four hours behind UTC during daylight saving time and five hours behind for the rest of the year.
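To make the offset idea concrete, here is a minimal sketch that reads the New York offset on a summer day and on a winter day. It assumes Python 3.9 or later and uses the standard-library zoneinfo module rather than the pytz library discussed later in this article; the two dates are arbitrary picks for illustration.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # standard library since Python 3.9

new_york = ZoneInfo("America/New_York")

# Noon on a summer day (DST in effect) and on a winter day (standard time).
summer = datetime(2012, 7, 1, 12, 0, tzinfo=new_york)
winter = datetime(2012, 1, 1, 12, 0, tzinfo=new_york)

# utcoffset() returns the offset from UTC as a timedelta.
summer_offset = summer.utcoffset()
winter_offset = winter.utcoffset()

print(summer_offset == timedelta(hours=-4))  # True
print(winter_offset == timedelta(hours=-5))  # True
```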
Introduction to Time Zone
A time zone is a region of the Earth that observes the same standard time. Time zones are often based on country borders or lines of longitude. Greenwich Mean Time (GMT) is the mean solar time at the Royal Observatory in Greenwich, London, which is considered to be at zero degrees longitude.
Although GMT and Coordinated Universal Time (UTC) essentially reflect the same time, GMT is a time zone, while UTC is a time standard used as the basis for civil time and time zones around the world.
Although GMT was once a time standard, it is now primarily used as the time zone for some countries in Africa and Western Europe. UTC, which is based on very precise atomic clocks and the rotation of the Earth, is the new standard today.
Handling Time Zones with Python
In Python, time zone information comes from the third-party pytz library, which exposes the Olson database, a compilation of information about global time zones.
This is especially important for historical data, as DST transition dates (and even UTC offsets) have been changed several times according to the whims of local governments. In the United States, DST transition times have been changed several times since 1900.
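As a small illustration of why the historical rules matter, the sketch below compares the same calendar date in two consecutive years. In 2006 US daylight saving time began on 2 April, while from 2007 onward it begins on the second Sunday of March, so 20 March falls on different sides of the transition. The date is my own pick for illustration, and the sketch assumes pandas is installed with time zone data available.

```python
from datetime import timedelta

import pandas as pd

# Noon on 20 March, New York wall-clock time, in two consecutive years.
noon_2006 = pd.Timestamp("2006-03-20 12:00", tz="US/Eastern")
noon_2007 = pd.Timestamp("2007-03-20 12:00", tz="US/Eastern")

offset_2006 = noon_2006.utcoffset()  # still on standard time in 2006
offset_2007 = noon_2007.utcoffset()  # already on daylight time in 2007

print(offset_2006 == timedelta(hours=-5))  # True
print(offset_2007 == timedelta(hours=-4))  # True
```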
For detailed information about pytz, consult that library's documentation. For the purposes of this article, pandas wraps the functionality of pytz, so you can ignore its API apart from the time zone names. Time zone names can be found interactively and in the documentation:
import pytz
pytz.common_timezones[-5:]
['US/Eastern', 'US/Hawaii', 'US/Mountain', 'US/Pacific', 'UTC']
To get a time zone object from pytz, use pytz.timezone:
tz = pytz.timezone('US/Eastern')
tz
<DstTzInfo 'US/Eastern' EST-1 day, 19:00:00 STD>
pandas methods accept either time zone names or these objects. I recommend using just the names.
Localization and Time Zone Conversion with Python
By default, pandas time series are time zone naive. Consider the following time series:
import numpy as np
import pandas as pd
rng = pd.date_range('3/9/2012 9:30', periods=6, freq='D')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
The index's tz field is None:
print(ts.index.tz)
None
Date ranges can be generated with a time zone set:
pd.date_range('3/9/2012 9:30', periods=10, freq='D', tz='UTC')
[2012-03-09 09:30:00, …, 2012-03-18 09:30:00]
Length: 10, Freq: D, Timezone: UTC
Conversion from naive to localized is handled by the tz_localize method:
ts_utc = ts.tz_localize('UTC')
ts_utc
2012-03-09 09:30:00+00:00    0.414615
2012-03-10 09:30:00+00:00    0.427185
2012-03-11 09:30:00+00:00    1.172557
2012-03-12 09:30:00+00:00   -0.351572
2012-03-13 09:30:00+00:00    1.454593
2012-03-14 09:30:00+00:00    2.043319
Freq: D
ts_utc.index
[2012-03-09 09:30:00, …, 2012-03-14 09:30:00]
Length: 6, Freq: D, Timezone: UTC
Once a time series has been localized to a particular time zone, it can be converted to another time zone using tz_convert:
ts_utc.tz_convert('US/Eastern')
2012-03-09 04:30:00-05:00    0.414615
2012-03-10 04:30:00-05:00    0.427185
2012-03-11 05:30:00-04:00    1.172557
2012-03-12 05:30:00-04:00   -0.351572
2012-03-13 05:30:00-04:00    1.454593
2012-03-14 05:30:00-04:00    2.043319
Freq: D
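The difference between the two methods is easy to blur: tz_localize attaches a zone to a naive timestamp without changing the wall-clock reading, while tz_convert keeps the same instant and changes the reading. A minimal sketch, using a one-element index made up for illustration:

```python
import pandas as pd

naive = pd.DatetimeIndex(["2012-03-09 09:30"])

# tz_localize: same wall-clock reading, now interpreted as UTC.
utc = naive.tz_localize("UTC")

# tz_convert: same instant, re-expressed on New York wall clocks.
eastern = utc.tz_convert("US/Eastern")

print(utc[0])      # 2012-03-09 09:30:00+00:00
print(eastern[0])  # 2012-03-09 04:30:00-05:00

# Both timestamps refer to the same moment in time.
print(utc[0] == eastern[0])  # True
```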
In the case of the above time series, which straddles a DST transition in the US/Eastern time zone, we could localize to US/Eastern and then convert to, for example, UTC or Berlin time:
ts_eastern = ts.tz_localize('US/Eastern')
ts_eastern.tz_convert('UTC')
2012-03-09 14:30:00+00:00    0.414615
2012-03-10 14:30:00+00:00    0.427185
2012-03-11 13:30:00+00:00    1.172557
2012-03-12 13:30:00+00:00   -0.351572
2012-03-13 13:30:00+00:00    1.454593
2012-03-14 13:30:00+00:00    2.043319
Freq: D
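The Berlin conversion follows the same pattern. The sketch below rebuilds the series with placeholder values (an assumption, so the example is reproducible) and uses the zone name Europe/Berlin. Berlin did not switch to summer time until 25 March 2012, so the only shift in the result comes from the US transition on 11 March.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2012-03-09 09:30", periods=6, freq="D")
# Placeholder values instead of random numbers, for reproducibility.
ts = pd.Series(np.arange(len(rng), dtype=float), index=rng)

ts_eastern = ts.tz_localize("US/Eastern")
ts_berlin = ts_eastern.tz_convert("Europe/Berlin")

# Berlin wall-clock hour for each 09:30 New York observation.
print(ts_berlin.index.hour.tolist())  # [15, 15, 14, 14, 14, 14]
```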
Localizing naive timestamps also checks for ambiguous or nonexistent times around daylight saving time transitions.
I hope you liked this article on time zone handling with Python. Please feel free to ask your questions in the comments section below.
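As an addendum on the ambiguous and nonexistent times mentioned above, here is a small sketch of how those checks surface in practice. It assumes a reasonably recent pandas in which tz_localize accepts the ambiguous and nonexistent keyword arguments. The two timestamps are my own picks: 02:30 on 11 March 2012 was skipped on US/Eastern clocks, and 01:30 on 4 November 2012 occurred twice.

```python
from datetime import timedelta

import numpy as np
import pandas as pd

gap = pd.DatetimeIndex(["2012-03-11 02:30"])      # skipped (spring forward)
overlap = pd.DatetimeIndex(["2012-11-04 01:30"])  # occurs twice (fall back)

# With the defaults, both cases are rejected rather than guessed at.
for idx in (gap, overlap):
    try:
        idx.tz_localize("US/Eastern")
    except Exception as exc:  # the exception class may vary by pandas version
        print("rejected:", type(exc).__name__)

# Opt-in resolutions via keyword arguments.
shifted = gap.tz_localize("US/Eastern", nonexistent="shift_forward")
as_nat = overlap.tz_localize("US/Eastern", ambiguous="NaT")
# A boolean array marks which entries should be read as DST times.
as_dst = overlap.tz_localize("US/Eastern", ambiguous=np.array([True]))

print(shifted[0])             # moved to the first valid time after the gap
print(as_nat[0])              # NaT
print(as_dst[0].utcoffset())  # offset of the DST reading
```

Which resolution is appropriate depends on the data; storing timestamps in UTC and converting only for display avoids the question entirely.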