Essentially, the steps are:

1. Define the source of your inputs (X values).

```python
ImageItemList.from_folder(path)
```
## This step generates the `ItemList` class.
## ItemBase: The ItemBase class defines what an “item” in your inputs or your targets looks like.
## ItemList: An ItemList defines a collection of items (e.g., ItemBase objects), including how they are individually fetched and displayed.
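As a quick illustration (a minimal sketch, assuming fastai v1, where `ImageItemList` is the older name for `ImageList`, and a hypothetical `path` pointing at an image folder), indexing an ItemList fetches individual ItemBase objects:

```python
from fastai.vision import *

path = Path('data/images')              # hypothetical image folder
il = ImageItemList.from_folder(path)    # ItemList: the collection of items (image files here)
img = il[0]                             # fetching item 0 returns an ItemBase (a fastai Image)
print(len(il), img.shape)               # number of items, and the shape of this one
```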
2. Split your data into training and validation sets.

```python
(ImageItemList.from_folder(path)
        .split_by_folder())
```
## This step generates the `ItemLists` class.
## ItemLists: A collection of ItemList instances for your inputs or targets. The `split` function above returns a separate ItemList instance for each of your training and validation sets, wrapped in an `ItemLists` object.
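For example (a sketch under the same fastai v1 assumptions as above), each split is reachable as its own ItemList afterwards:

```python
src = ImageItemList.from_folder(path).split_by_folder()   # an ItemLists object
print(len(src.train), len(src.valid))                     # the training and validation ItemList instances
```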
3. Label your inputs, creating LabelList objects. LabelList subclasses the PyTorch Dataset class.

```python
(ImageItemList.from_folder(path)
        .split_by_folder()
        .label_from_folder())
```
## This step generates the `LabelLists` class.
## LabelList: A LabelList is a PyTorch Dataset that combines your input and target ItemList classes (an inputs ItemList + a targets ItemList = a LabelList).
## LabelLists: A collection of LabelList instances you get as a result of your labeling function. Again, a LabelList is a PyTorch Dataset and essentially defines the things, your inputs and optionally targets, fed into the forward function of your model.
## Pre-Processing: This is also where any PreProcessor classes you’ve passed into your ItemList run. These classes define things you want done to your data once, before they are turned into PyTorch Datasets/DataLoaders. Examples include tokenizing and numericalizing text, filling in missing values in tabular data, etc. You can define a default PreProcessor, or collection of PreProcessors, you want run by overloading the `_processor` class variable in your custom ItemList.
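As a rough sketch of that idea (the processor and list classes below are made up for illustration; only `PreProcessor` and `ItemList` are real fastai classes), a custom ItemList attaches its default processor through `_processor`:

```python
from fastai.data_block import ItemList, PreProcessor

class LowercaseProcessor(PreProcessor):      # hypothetical pre-processor
    "Runs once over the raw items before Datasets/DataLoaders are built."
    def process(self, ds):                   # ds is the ItemList being processed
        ds.items = [str(item).lower() for item in ds.items]

class MyTextList(ItemList):                  # hypothetical custom ItemList
    _processor = LowercaseProcessor          # default PreProcessor(s) for this ItemList
```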
4. Add a test set (optional).

```python
data = (ImageItemList.from_folder(path)
        .split_by_folder()
        .label_from_folder()
        .add_test_folder())
```
## If you add a test set, like we do above, the same pre-processing applied to your validation set will be applied to your test set.
5. Transform your LabelList objects (optional). Here you can apply data augmentation to either, or both, your inputs and targets.

```python
data = (ImageItemList.from_folder(path)
        .split_by_folder()
        .label_from_folder()
        .add_test_folder()
        .transform(tfms, size=64))
```
## Transforms define the data augmentation you want done to either, or both, of your inputs and targets datasets.
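The `tfms` object in the chain above is typically built beforehand; a minimal sketch, assuming fastai v1's `get_transforms` helper:

```python
from fastai.vision import get_transforms

tfms = get_transforms(flip_vert=False, max_rotate=10.0)   # returns a (train tfms, valid tfms) tuple
# .transform(tfms, size=64) applies the first list to the training set, the second to the
# validation (and test) sets, and resizes everything to 64x64.
```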
6. Create your DataBunch.

```python
data = (ImageItemList.from_folder(path)
        .split_by_folder()
        .label_from_folder()
        .add_test_folder()
        .transform(tfms, size=64)
        .databunch())
```
## This step generates the `DataBunch` class.
## A DataBunch is a collection of PyTorch DataLoaders returned when you call the `databunch` function. It also defines how they are created from your training, validation, and (optionally) test LabelList instances.
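For instance (continuing the sketch above), the DataBunch exposes the underlying DataLoaders and a quick way to eyeball a batch:

```python
xb, yb = next(iter(data.train_dl))   # one mini-batch from the training DataLoader
print(xb.shape, yb.shape)            # inputs and targets, already collated and on the right device
data.show_batch(rows=3)              # visual sanity check of a few items with their labels
```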
Once this is done, you’ll have everything you need to train, validate, and test any PyTorch nn.Module using the fastai library. You’ll also have everything you need to later do inference on future data.
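A minimal training sketch under the same assumptions (fastai v1 with `cnn_learner` available; older versions used `create_cnn` instead):

```python
from fastai.vision import *

learn = cnn_learner(data, models.resnet34, metrics=accuracy)   # any nn.Module-backed Learner works
learn.fit_one_cycle(4)                                         # train
preds, _ = learn.get_preds(ds_type=DatasetType.Test)           # inference on the test set added earlier
```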
Example

Following the custom ItemList tutorial in the fastai docs, you define a custom ItemBase for your items and, for your targets, a custom ItemList:

```python
class ImageTuple(ItemBase): ...
```

```python
class TargetTupleList(ItemList): ...
```
When subclassing ItemList, three class variables control the defaults:

- `_bunch` contains the name of the class that will be used to create a DataBunch
- `_processor` contains a class (or a list of classes) of PreProcessor that will then be used as the default to create a processor for this ItemList
- `_label_cls` contains the class that will be used to create the labels by default
```python
class ImageTupleList(ImageList): ...
```
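A rough sketch of how these pieces fit together, loosely following the image-tuple example in the fastai docs tutorial (the method bodies here are illustrative, not the tutorial's exact code):

```python
import random
from fastai.vision import *

class ImageTuple(ItemBase):
    "An item made of two images."
    def __init__(self, img1, img2):
        self.img1, self.img2 = img1, img2
        self.obj, self.data = (img1, img2), [-1 + 2 * img1.data, -1 + 2 * img2.data]

class TargetTupleList(ItemList):
    "Targets for ImageTuple items; reconstruct rebuilds an item from tensor data."
    def reconstruct(self, t):
        if len(t.size()) == 0: return t
        return ImageTuple(Image(t[0] / 2 + 0.5), Image(t[1] / 2 + 0.5))

class ImageTupleList(ImageList):
    _label_cls = TargetTupleList              # labels are built with TargetTupleList by default
    def __init__(self, items, itemsB=None, **kwargs):
        super().__init__(items, **kwargs)
        self.itemsB = itemsB
        self.copy_new.append('itemsB')        # keep itemsB when the list is split/filtered
    def get(self, i):
        img1 = super().get(i)                 # the usual ImageList behaviour for the first image
        fn = self.itemsB[random.randint(0, len(self.itemsB) - 1)]
        return ImageTuple(img1, open_image(fn))
```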
Reference
- https://blog.usejournal.com/finding-data-block-nirvana-a-journey-through-the-fastai-data-block-api-c38210537fe4
- https://docs.fast.ai/tutorial.itemlist.html