I want to build a classifier that tells apart the Akita, Shiba Inu, Alaskan Malamute, and Husky.
# Creating your own dataset from Google Images
```python
from fastai.vision import *
```
## Get a list of URLs

### Search and scroll
Go to Google Images and search for the images you are interested in. The more specific you are in your Google Search, the better the results and the less manual pruning you will have to do.
Scroll down until you've seen all the images you want to download, or until you see a button that says 'Show more results'. All the images you scrolled past are now available to download. To get more, click on the button, and continue scrolling. The maximum number of images Google Images shows is 700.
It is a good idea to put things you want to exclude into the search query, for instance if you are searching for the Eurasian wolf, "canis lupus lupus", it might be a good idea to exclude other variants:
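For instance, a query in this spirit (the exclusion terms here are illustrative, not exhaustive) drops other subspecies from the results:

```
"canis lupus lupus" -dog -arctos -baileyi -occidentalis
```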
You can also limit your results to show only photos by clicking on Tools and selecting Photos from the Type dropdown.

### Download into file

Now you must run some JavaScript code in your browser which will save the URLs of all the images you want for your dataset.
In Google Chrome press Ctrl+Shift+J on Windows/Linux or Cmd+Opt+J on macOS, and a small window, the JavaScript 'Console', will appear. In Firefox press Ctrl+Shift+K on Windows/Linux or Cmd+Opt+K on macOS. That is where you will paste the JavaScript commands.
You will need to get the URLs of each of the images. Before running the following commands, you may want to disable ad-blocking extensions (uBlock, AdBlock Plus, etc.) in Chrome; otherwise the window.open() command doesn't work. Then you can run the following commands:
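A console snippet in this spirit collects the image URLs from the results page and opens them as a CSV download. Google Images' internal markup changes over time, so the CSS selectors below are assumptions that may need updating:

```javascript
// Grab the source URL of every image result on the page
// (the .rg_di / .rg_meta selectors are assumptions -- Google's
// class names change over time and may need updating)
urls = Array.from(document.querySelectorAll('.rg_di .rg_meta'))
            .map(el => JSON.parse(el.textContent).ou);

// Open the collected URLs as a CSV download -- this is the
// window.open() call that ad blockers tend to suppress
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));
```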
## Create directory and upload urls file into your server
Choose an appropriate name for your labeled images. You can run these steps multiple times to create different labels.
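A minimal setup sketch, assuming one folder per breed under an illustrative data/dogs root (all names here are placeholders; rerun with each label):

```python
folder = 'shiba_inu'          # repeat with 'akita', 'alaskan_malamute', 'husky'
file = 'urls_shiba_inu.csv'   # the URL file you downloaded for this label

path = Path('data/dogs')      # illustrative dataset root
dest = path/folder
dest.mkdir(parents=True, exist_ok=True)
```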
```python
help(download_images)
```
```
Help on function download_images in module fastai.vision.data:

download_images(urls:Collection[str], dest:Union[pathlib.Path, str], max_pics:int=1000, max_workers:int=8, timeout=4)
    Download images listed in text file `urls` to path `dest`, at most `max_pics`
```
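The output below presumably came from a call along these lines, with path, file, and dest as in the directory sketch above (the max_pics value is an assumption):

```python
download_images(path/file, dest, max_pics=200)
```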
Running the download printed the same error many times over:

```
Error Invalid URL '': No schema supplied. Perhaps you meant http://?
```

These repeated lines come from blank entries in the URL file and can be safely ignored; the valid URLs are still downloaded.
## View data
```python
help(DataBunch)
```
```
Help on class DataBunch in module fastai.basic_data:
class DataBunch(builtins.object)
| Bind `train_dl`,`valid_dl` and `test_dl` in a data object.
|
| Methods defined here:
|
| __getattr__(self, k:int) -> Any
|
| __init__(self, train_dl:torch.utils.data.dataloader.DataLoader, valid_dl:torch.utils.data.dataloader.DataLoader, fix_dl:torch.utils.data.dataloader.DataLoader=None, test_dl:Union[torch.utils.data.dataloader.DataLoader, NoneType]=None, device:torch.device=None, dl_tfms:Union[Collection[Callable], NoneType]=None, path:Union[pathlib.Path, str]='.', collate_fn:Callable=<function data_collate at 0x7f14501736a8>, no_check:bool=False)
| Initialize self. See help(type(self)) for accurate signature.
|
| __repr__(self) -> str
| Return repr(self).
|
| __setstate__(self, data:Any)
|
| add_test(self, items:Iterator, label:Any=None, tfms=None, tfm_y=None) -> None
| Add the `items` as a test set. Pass along `label` otherwise label them with `EmptyLabel`.
|
| add_tfm(self, tfm:Callable) -> None
|
| dl(self, ds_type:fastai.basic_data.DatasetType=<DatasetType.Valid: 2>) -> fastai.basic_data.DeviceDataLoader
| Returns appropriate `Dataset` for validation, training, or test (`ds_type`).
|
| export(self, file:Union[pathlib.Path, str, _io.BufferedWriter, _io.BytesIO]='export.pkl')
| Export the minimal state of `self` for inference in `self.path/file`. `file` can be file-like (file or buffer)
|
| one_batch(self, ds_type:fastai.basic_data.DatasetType=<DatasetType.Train: 1>, detach:bool=True, denorm:bool=True, cpu:bool=True) -> Collection[torch.Tensor]
| Get one batch from the data loader of `ds_type`. Optionally `detach` and `denorm`.
|
| one_item(self, item, detach:bool=False, denorm:bool=False, cpu:bool=False)
| Get `item` into a batch. Optionally `detach` and `denorm`.
|
| pre_transform = _db_pre_transform(self, train_tfm:List[Callable], valid_tfm:List[Callable])
| Call `train_tfm` and `valid_tfm` after opening image, before converting from `PIL.Image`
|
| presize = _presize(self, size:int, val_xtra_size:int=32, scale:Tuple[float]=(0.08, 1.0), ratio:Tuple[float]=(0.75, 1.3333333333333333), interpolation:int=2)
| Resize images to `size` using `RandomResizedCrop`, passing along `kwargs` to train transform
|
| remove_tfm(self, tfm:Callable) -> None
|
| sanity_check(self)
| Check the underlying data in the training set can be properly loaded.
|
| save(self, file:Union[pathlib.Path, str, _io.BufferedWriter, _io.BytesIO]='data_save.pkl') -> None
| Save the `DataBunch` in `self.path/file`. `file` can be file-like (file or buffer)
|
| show_batch(self, rows:int=5, ds_type:fastai.basic_data.DatasetType=<DatasetType.Train: 1>, reverse:bool=False, **kwargs) -> None
| Show a batch of data in `ds_type` on a few `rows`.
|
| ----------------------------------------------------------------------
| Class methods defined here:
|
| create(train_ds:torch.utils.data.dataset.Dataset, valid_ds:torch.utils.data.dataset.Dataset, test_ds:Union[torch.utils.data.dataset.Dataset, NoneType]=None, path:Union[pathlib.Path, str]='.', bs:int=64, val_bs:int=None, num_workers:int=6, dl_tfms:Union[Collection[Callable], NoneType]=None, device:torch.device=None, collate_fn:Callable=<function data_collate at 0x7f14501736a8>, no_check:bool=False, **dl_kwargs) -> 'DataBunch' from builtins.type
| Create a `DataBunch` from `train_ds`, `valid_ds` and maybe `test_ds` with a batch size of `bs`. Passes `**dl_kwargs` to `DataLoader()`
|
| load_empty = _databunch_load_empty(path, fname:str='export.pkl') from builtins.type
| Load an empty `DataBunch` from the exported file in `path/fname` with optional `tfms`.
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
|
| batch_size
|
| dls
| Returns a list of all DeviceDataLoaders. If you need a specific DeviceDataLoader, access via the relevant property (`train_dl`, `valid_dl`, etc) as the index of DLs in this list is not guaranteed to remain constant.
|
| empty_val
|
| fix_ds
|
| is_empty
|
| loss_func
|
| single_ds
|
| test_ds
|
| train_ds
|
| valid_ds
```
```python
help(ImageDataBunch.from_folder)
```
```
Help on method from_folder in module fastai.vision.data:

from_folder(path:Union[pathlib.Path, str], train:Union[pathlib.Path, str]='train', valid:Union[pathlib.Path, str]='valid', test:Union[pathlib.Path, str, NoneType]=None, valid_pct=None, seed:int=None, classes:Collection=None, **kwargs:Any) -> 'ImageDataBunch' method of builtins.type instance
    Create from imagenet style dataset in `path` with `train`,`valid`,`test` subfolders (or provide `valid_pct`).
```
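Our downloaded images sit directly in per-label folders with no train/valid split, so a sketch of building and inspecting the data might look like this (the valid_pct, size, and seed values are assumptions):

```python
np.random.seed(42)   # fix the random validation split for reproducibility
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2,
                                  ds_tfms=get_transforms(), size=224,
                                  num_workers=4).normalize(imagenet_stats)

data.classes                              # labels inferred from folder names
data.show_batch(rows=3, figsize=(7, 8))  # eyeball a few training images
```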
Some of our top losses aren't due to bad performance by our model; there are images in our dataset that simply shouldn't be there. Using the ImageCleaner widget from fastai.widgets we can prune our top losses, removing photos that don't belong.
```python
from fastai.widgets import *
```
First we need to get the file paths from our top losses. We can do this with .from_toplosses. We then feed the top-loss indexes and the corresponding dataset to ImageCleaner, as sketched after the databunch code below.
Notice that the widget will not delete images directly from disk; instead it creates a new csv file, cleaned.csv, from which you can create a new ImageDataBunch with the corrected labels and continue training your model.
In order to clean the entire set of images, we need to create a new dataset without the split. The video lecture demonstrated the use of the ds_type param, which no longer has any effect. See the thread for more details.
```python
db = (ImageList.from_folder(path)
               .split_none()
               .label_from_folder()
               .transform(get_transforms(), size=224)
               .databunch()
     )
```
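With the unsplit databunch in place, a sketch of the cleaning flow follows; the learner setup and the 'stage-1' checkpoint name are assumptions standing in for whatever you trained earlier:

```python
# Build a learner on the unsplit data and load previously trained weights
# (the checkpoint name 'stage-1' is a placeholder)
learn_cln = cnn_learner(db, models.resnet34, metrics=error_rate)
learn_cln.load('stage-1')

# Get the dataset and the indexes of the highest-loss images...
ds, idxs = DatasetFormatter().from_toplosses(learn_cln)
# ...and open the widget to relabel or delete them interactively
ImageCleaner(ds, idxs, path)
```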
You can also find duplicates in your dataset and delete them! To do this, you need to run .from_similars to get the potential duplicates' indexes and then run ImageCleaner with duplicates=True. The API works in a similar way as with misclassified images: just choose the ones you want to delete and click 'Next Batch' until there are no more images left.
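Concretely, under the same assumptions as the sketch above:

```python
# Find visually similar images (potential duplicates)
ds, idxs = DatasetFormatter().from_similars(learn_cln)
# Review them and mark duplicates for removal
ImageCleaner(ds, idxs, path, duplicates=True)
```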
Make sure to recreate the databunch and learn_cln from the cleaned.csv file each time you rerun the cleaning steps; otherwise cleaned.csv is regenerated from scratch and you lose the results of cleaning the data from top losses.
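A sketch of rebuilding the databunch from cleaned.csv, mirroring the unsplit block above (folder='.' assumes the csv stores paths relative to path):

```python
db = (ImageList.from_csv(path, 'cleaned.csv', folder='.')
               .split_none()
               .label_from_df()       # labels now come from the csv
               .transform(get_transforms(), size=224)
               .databunch()
     )
```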