The essential steps:

Define the source of your inputs (X values).

    ImageItemList.from_folder(path)
    ## This step produces an `ItemList` instance.
    ## ItemBase: The ItemBase class defines what an “item” in your inputs or your targets looks like.
    ## ItemList: An ItemList defines a collection of items (e.g., ItemBase objects), including how they are individually fetched and displayed.

Define how you want to split your inputs into training and validation datasets using one of the built-in mechanisms for doing so.

    (ImageItemList.from_folder(path)
        .split_by_folder())
    ## This step produces an `ItemLists` instance.
    ## ItemLists: A collection of ItemList instances for your inputs or targets. The `split` function above returns a separate ItemList instance for each of your training and validation sets in an `ItemLists` object.

Define the source of your targets (that is, your y values) and combine them with the inputs of your training and validation datasets in the form of fastai `LabelList` objects. `LabelList` subclasses the PyTorch `Dataset` class.

    (ImageItemList.from_folder(path)
        .split_by_folder()
        .label_from_folder())
    ## This step produces a `LabelLists` instance.
    ## LabelList: A LabelList is a PyTorch Dataset that combines your input and target ItemList classes (an inputs ItemList + a targets ItemList = a LabelList).
    ## LabelLists: A collection of LabelList instances you get as a result of your labeling function. Again, a `LabelList` is a PyTorch Dataset and essentially defines the things (your inputs and, optionally, your targets) fed into the forward function of your model.
    ## Pre-Processing: This is also where any PreProcessor classes you’ve passed into your ItemList class run. These classes define things you want done to your data once, before they are turned into PyTorch Datasets/DataLoaders. Examples include tokenizing and numericalizing text, filling in missing values in tabular data, etc. You can define a default `PreProcessor`, or a collection of PreProcessors, that you want run by overriding the `_processor` class variable in your custom ItemList.

Add a test dataset (optional).

    data = (ImageItemList.from_folder(path)
            .split_by_folder()
            .label_from_folder()
            .add_test_folder())
    ## If you add a test set, as we do above, the same pre-processing applied to your validation set will be applied to your test set.

Add transforms to your `LabelList` objects (optional). Here you can apply data augmentation to your inputs, your targets, or both.

    data = (ImageItemList.from_folder(path)
            .split_by_folder()
            .label_from_folder()
            .add_test_folder()
            .transform(tfms, size=64))
    ## Transforms define the data augmentation you want applied to your inputs, your targets, or both.

Build PyTorch DataLoaders from the Datasets defined above and package them up into a fastai `DataBunch`.

    data = (ImageItemList.from_folder(path)
            .split_by_folder()
            .label_from_folder()
            .add_test_folder()
            .transform(tfms, size=64)
            .databunch())
    ## This step produces a `DataBunch` instance.
    ## A DataBunch is a collection of PyTorch DataLoaders returned when you call the databunch function. It also defines how they are created from your training, validation, and optionally test LabelList instances.

Once this is done, you’ll have everything you need to train, validate, and test any PyTorch nn.Module using the fastai library. You’ll also have everything you need to later do inference on future data.
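
To make the progression of types in the steps above concrete, here is a minimal pure-Python sketch of the same fluent pattern. Every class and method name in it (`ToyItemList`, `split_by`, `label_by`, and so on) is a hypothetical stand-in invented for illustration and is not part of fastai. It also simplifies freely: the "DataLoaders" are plain lists of batches, and the transform is applied to training inputs only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Toy stand-ins for illustration only; these are NOT fastai classes.


@dataclass
class ToyItemList:
    """Inputs only (the 'define your inputs' step)."""
    items: List[str]

    def split_by(self, is_valid: Callable[[str], bool]) -> "ToyItemLists":
        train = ToyItemList([p for p in self.items if not is_valid(p)])
        valid = ToyItemList([p for p in self.items if is_valid(p)])
        return ToyItemLists(train, valid)


@dataclass
class ToyItemLists:
    """Training and validation inputs (the 'split' step)."""
    train: ToyItemList
    valid: ToyItemList

    def label_by(self, label_fn: Callable[[str], str]) -> "ToyLabelLists":
        return ToyLabelLists(ToyLabelList(self.train.items, label_fn),
                             ToyLabelList(self.valid.items, label_fn))


class ToyLabelList:
    """Dataset-like pairing of inputs and targets (the 'label' step)."""

    def __init__(self, xs: List[str], label_fn: Callable[[str], str]):
        self.xs = xs
        self.ys = [label_fn(x) for x in xs]  # targets are computed once, at labeling time
        self.tfm: Optional[Callable[[str], str]] = None

    def __len__(self) -> int:
        return len(self.xs)

    def __getitem__(self, i: int) -> Tuple[str, str]:
        x = self.xs[i]
        if self.tfm is not None:  # transforms are applied each time an item is fetched
            x = self.tfm(x)
        return x, self.ys[i]


@dataclass
class ToyLabelLists:
    """Training and validation datasets."""
    train: ToyLabelList
    valid: ToyLabelList

    def transform(self, tfm: Callable[[str], str]) -> "ToyLabelLists":
        self.train.tfm = tfm  # simplification: augment training inputs only
        return self

    def databunch(self, bs: int = 2) -> "ToyDataBunch":
        return ToyDataBunch(_batches(self.train, bs), _batches(self.valid, bs))


@dataclass
class ToyDataBunch:
    """Bundle of (toy) loaders: here, plain lists of batches."""
    train_dl: list
    valid_dl: list


def _batches(ds: ToyLabelList, bs: int) -> list:
    # Group consecutive (x, y) pairs into batches of at most `bs` items.
    return [[ds[i] for i in range(s, min(s + bs, len(ds)))]
            for s in range(0, len(ds), bs)]


paths = ["train/cat/1.png", "train/dog/2.png", "train/cat/3.png", "valid/dog/4.png"]
data = (ToyItemList(paths)
        .split_by(lambda p: p.startswith("valid/"))
        .label_by(lambda p: p.split("/")[1])
        .transform(str.upper)
        .databunch(bs=2))
```

Each call returns a different type, so only the methods that make sense at that stage are available next. This mirrors the `ItemList`, `ItemLists`, `LabelLists`, `DataBunch` sequence described in the steps.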
Example

    class ImageTuple(ItemBase):

    class TargetTupleList(ItemList):

`_bunch` contains the name of the class that will be used to create a `DataBunch`.

`_processor` contains a class (or a list of classes) of `PreProcessor` that will then be used as the default to create the processor for this `ItemList`.

`_label_cls` contains the class that will be used to create the labels by default.

    class ImageTupleList(ImageList):
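
The role of the three class variables described above can be illustrated with a small pure-Python sketch of the "class-level defaults" pattern. This is not fastai code: `ToyItemList`, `ToyTextList`, `LowercaseProcessor`, `label_with`, and the other names are hypothetical, and the real library's constructors and method signatures differ. The sketch only shows how a subclass can change the default bunch class, pre-processor, and label class by overriding class variables.

```python
# Toy illustration of class-variable defaults; NOT fastai's implementation.


class ToyDataBunch:
    def __init__(self, items, labels):
        self.items, self.labels = items, labels


class ToyTextBunch(ToyDataBunch):
    pass


class LowercaseProcessor:
    """Hypothetical pre-processor: runs once over all items."""

    def process(self, items):
        return [item.lower() for item in items]


class ToyLabels:
    def __init__(self, values):
        self.values = list(values)


class ToyItemList:
    _bunch = ToyDataBunch    # class used to build the final bunch
    _processor = None        # default pre-processor class (or list of classes)
    _label_cls = ToyLabels   # default class used to hold the labels

    def __init__(self, items, processor=None):
        self.items = list(items)
        default = self._processor
        if processor is None and default is not None:
            # Accept either a single class or a list of classes.
            classes = default if isinstance(default, (list, tuple)) else [default]
            processor = [cls() for cls in classes]
        self.processor = processor or []

    def label_with(self, label_fn):
        for proc in self.processor:  # pre-processing runs once, at labeling time
            self.items = proc.process(self.items)
        self.labels = self._label_cls(label_fn(i) for i in self.items)
        return self

    def databunch(self):
        return self._bunch(self.items, self.labels.values)


class ToyTextList(ToyItemList):
    # Overriding the class variables changes the defaults for this subclass.
    _bunch = ToyTextBunch
    _processor = LowercaseProcessor


bunch = ToyTextList(["Hello", "WORLD"]).label_with(len).databunch()
plain = ToyItemList(["Hello"]).label_with(len).databunch()
```

Here `bunch` is a `ToyTextBunch` whose items were lowercased by the default processor, while `plain` uses the base-class defaults and leaves its items untouched.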