How to use Python dataclasses
Take advantage of Python dataclasses to make your Python classes less verbose and more powerful at the same time.
Everything in Python is an object, or so the saying goes. If you want to create your own custom objects, with their own properties and methods, you use Python’s class object to make that happen. But creating classes in Python sometimes means writing loads of repetitive, boilerplate code to set up the class instance from the parameters passed to it or to create common functions like comparison operators.
Dataclasses, introduced in Python 3.7 (and backported to Python 3.6), provide a handy way to make classes less verbose. Many of the common things you do in a class, like instantiating properties from the arguments passed to the class, can be reduced to a few basic instructions.
Python dataclass example
Here is a simple example of a conventional class in Python:
class Book:
    '''Object for tracking physical books in a collection.'''

    def __init__(self, name: str, weight: float, shelf_id: int = 0):
        self.name = name
        self.weight = weight  # in grams, for calculating shipping
        self.shelf_id = shelf_id

    def __repr__(self):
        return (f"Book(name={self.name!r}, "
                f"weight={self.weight!r}, shelf_id={self.shelf_id!r})")
The biggest headache here is the way each of the arguments passed to __init__ has to be copied to the object’s properties. This isn’t so bad if you’re only dealing with Book, but what if you have to deal with Bookshelf, Library, Warehouse, and so on? Plus, the more code you have to type by hand, the greater the chances you’ll make a mistake.
Here is the same Python class, implemented as a Python dataclass:
from dataclasses import dataclass

@dataclass
class Book:
    '''Object for tracking physical books in a collection.'''
    name: str
    weight: float
    shelf_id: int = 0
When you specify properties, called fields, in a dataclass, @dataclass automatically generates all of the code needed to initialize them. It also preserves the type information for each property, so if you use a static type checker like mypy, it can verify that you’re supplying the right kinds of variables to the class constructor.
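As a rough illustration of what the decorator generates, the sketch below inspects the constructor of the dataclass version of Book. The sample values are made up for the example. Note that the annotations are recorded but not enforced at runtime; catching mismatches is the job of a type checker.

```python
import inspect
from dataclasses import dataclass


@dataclass
class Book:
    '''Object for tracking physical books in a collection.'''
    name: str
    weight: float
    shelf_id: int = 0


# The generated __init__ takes the fields in declaration order.
print(inspect.signature(Book))

# Fields can be passed positionally or by keyword; shelf_id falls back to 0.
book = Book("Dune", weight=412.0)
print(book.shelf_id)

# Annotations are not checked at runtime: this runs without error,
# although a type checker would flag both arguments.
odd = Book(name=123, weight="heavy")
```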
Another thing @dataclass does behind the scenes is automatically create code for a number of common dunder methods in the class. In the conventional class above, we had to create our own __repr__. In the dataclass, this is unnecessary; @dataclass generates the __repr__ for you.
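A short sketch of two of those generated methods in action, using made-up sample values: the generated __repr__ prints the fields, and the generated __eq__ compares instances field by field rather than by identity.

```python
from dataclasses import dataclass


@dataclass
class Book:
    '''Object for tracking physical books in a collection.'''
    name: str
    weight: float
    shelf_id: int = 0


a = Book("Dune", 412.0)
b = Book("Dune", 412.0)

print(a)       # Book(name='Dune', weight=412.0, shelf_id=0)
print(a == b)  # True: the generated __eq__ compares field values
print(a is b)  # False: they are still two distinct objects
```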
Once a dataclass is created it is functionally identical to a regular class. There is no performance penalty for using a dataclass, save for the minimal overhead of the decorator when declaring the class definition.
Customize Python dataclass fields with the field function
The default way dataclasses work should be okay for the majority of use cases. Sometimes, though, you need to fine-tune how the fields in your dataclass are initialized. To do this, you can use the field function.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Book:
    '''Object for tracking physical books in a collection.'''
    name: str
    condition: str = field(compare=False)
    weight: float = field(default=0.0, repr=False)
    shelf_id: int = 0
    chapters: List[str] = field(default_factory=list)
When you set a default value to an instance of field, it changes how the field is set up depending on what parameters you give field. These are the most commonly used options for field (there are others):
default: Sets the default value for the field. You need to use default if you a) use field to change any other parameters for the field, and b) want to set a default value on the field on top of that. In this case we use default to set weight to 0.0.
default_factory: Provides a function, which takes no parameters, that returns some object to serve as the default value for the field. In this case, we want chapters to be an empty list.
repr: Controls whether the field shows up in the automatically generated __repr__ for the dataclass (the default is True). In this case we don’t want the book’s weight shown in the __repr__, so we use repr=False to omit it.
compare: Controls whether the field is included in the comparison methods automatically generated for the dataclass (the default is True). Here, we don’t want condition to be used as part of the comparison for two books, so we set compare=False.
Note that we have had to adjust the order of the fields so that the non-default fields come first.
Use __post_init__ to control Python dataclass initialization
At this point you’re probably wondering: If the __init__ method of a dataclass is generated automatically, how do I get control over the init process to make finer-grained changes?
Enter the __post_init__ method. If you include the __post_init__ method in your dataclass definition, you can provide instructions for modifying fields or other instance data.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Book:
    '''Object for tracking physical books in a collection.'''
    name: str
    weight: float = field(default=0.0, repr=False)
    shelf_id: Optional[int] = field(init=False)  # may be None, so Optional
    chapters: List[str] = field(default_factory=list)
    condition: str = field(default="Good", compare=False)

    def __post_init__(self):
        if self.condition == "Discarded":
            self.shelf_id = None
        else:
            self.shelf_id = 0
In this example, we have created a __post_init__ method to set shelf_id to None if the book’s condition is initialized as "Discarded". Note how we use field to initialize shelf_id, and pass init as False to field. This means shelf_id won’t be initialized in __init__.
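A brief usage sketch of this pattern, with made-up book titles. It uses a trimmed version of the class, annotates shelf_id as Optional[int] because it may hold None, and shows that a field declared with init=False is not accepted by the constructor.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Book:
    '''Object for tracking physical books in a collection.'''
    name: str
    shelf_id: Optional[int] = field(init=False)
    condition: str = field(default="Good", compare=False)

    def __post_init__(self):
        # Runs right after the generated __init__ has set the other fields.
        self.shelf_id = None if self.condition == "Discarded" else 0


kept = Book("Dune")
tossed = Book("Old Almanac", condition="Discarded")
print(kept.shelf_id, tossed.shelf_id)

# shelf_id is not a constructor parameter, so passing it raises TypeError.
try:
    Book("Dune", shelf_id=3)
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```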
Use InitVar to control Python dataclass initialization
Another way to customize Python dataclass setup is to use the InitVar type. This lets you specify a field that will be passed to __init__ and then to __post_init__, but won’t be stored in the class instance.
By using InitVar, you can take in parameters when setting up the dataclass that are only used during initialization. An example:
from dataclasses import dataclass, field, InitVar
from typing import List, Optional

@dataclass
class Book:
    '''Object for tracking physical books in a collection.'''
    name: str
    condition: InitVar[Optional[str]] = None  # passed to __post_init__ only
    weight: float = field(default=0.0, repr=False)
    shelf_id: Optional[int] = field(init=False)  # may be None, so Optional
    chapters: List[str] = field(default_factory=list)

    def __post_init__(self, condition):
        if condition == "Discarded":
            self.shelf_id = None
        else:
            self.shelf_id = 0
Setting a field’s type to InitVar (with its subtype being the actual field type) signals to @dataclass to not make that field into a dataclass field, but to pass the data along to __post_init__ as an argument.
In this version of our Book class, we’re not storing condition as a field in the class instance. We’re only using condition during the initialization phase. If we find that condition was set to "Discarded", we set shelf_id to None, but we don’t store condition in the class instance.
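One way to confirm that behavior is to look at what the instance actually holds. The sketch below is a simplified variant, not the exact class above: it assumes condition has no default value, which keeps the check clean, and the book title is made up.

```python
from dataclasses import InitVar, dataclass, field, fields
from typing import Optional


@dataclass
class Book:
    '''Object for tracking physical books in a collection.'''
    name: str
    condition: InitVar[str]  # assumption: required here, no default
    shelf_id: Optional[int] = field(init=False)

    def __post_init__(self, condition):
        # condition arrives as an argument and is never assigned to self.
        self.shelf_id = None if condition == "Discarded" else 0


book = Book("Old Almanac", condition="Discarded")

print(book.shelf_id)
print([f.name for f in fields(book)])  # condition is not a dataclass field
print(vars(book))                      # nor is it stored on the instance
```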
When to use Python dataclasses — and when not to use them
One common scenario for using dataclasses is as a replacement for the namedtuple. Dataclasses offer many of the same conveniences and more, and they can be made immutable (as namedtuples are) by simply using @dataclass(frozen=True) as the decorator.
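A minimal sketch of a frozen dataclass, with made-up sample values. Assigning to a field raises FrozenInstanceError, and because frozen=True is combined with the default eq=True, instances are also hashable and can serve as dictionary keys.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Book:
    '''Immutable record for a physical book.'''
    name: str
    weight: float = 0.0


book = Book("Dune", 412.0)

try:
    book.weight = 500.0
    frozen = False
except FrozenInstanceError:
    frozen = True
print(frozen)

# Equal frozen instances hash alike, so either can be used as a lookup key.
locations = {book: "A3"}
print(locations[Book("Dune", 412.0)])
```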
Another possible use case is replacing nested dictionaries, which can be clumsy to work with, with nested instances of dataclasses. If you have a dataclass Library, with a list property shelves, you could use a dataclass ReadingRoom to populate that list, and then add methods to make it easy to access nested items (e.g., a book on a shelf in a particular room).
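Here is one possible sketch of that idea. Only Library, shelves, and ReadingRoom come from the description above; the books field, the find_book method, and the sample data are hypothetical names chosen for illustration.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Book:
    name: str


@dataclass
class ReadingRoom:
    name: str
    books: List[Book] = field(default_factory=list)  # hypothetical field


@dataclass
class Library:
    shelves: List[ReadingRoom] = field(default_factory=list)

    def find_book(self, room_name: str, book_name: str) -> Optional[Book]:
        '''Hypothetical helper: look up a book by room and title.'''
        for room in self.shelves:
            if room.name != room_name:
                continue
            for book in room.books:
                if book.name == book_name:
                    return book
        return None


library = Library(shelves=[ReadingRoom("East Room", [Book("Dune")])])

found = library.find_book("East Room", "Dune")
missing = library.find_book("East Room", "Emma")
print(found, missing)

# asdict converts the nested dataclasses back into plain dictionaries.
as_plain = asdict(library)
print(as_plain)
```

Compared with raw nested dictionaries, the attribute access and the helper method keep lookups readable, while asdict remains available when a plain dictionary is needed, for example for serialization.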
But not every Python class needs to be a dataclass. If you’re creating a class mainly as a way to group together a bunch of static methods, rather than as a container for data, you don’t need to make it a dataclass. For instance, a common pattern with parsers is to have a class that takes in an abstract syntax tree, walks the tree, and dispatches calls to different methods in the class based on the node type. Because the parser class has very little data of its own, a dataclass isn’t useful here.
Source: infoworld