spaCy - Train Command

As the name implies, this command trains a model. It expects the training data in spaCy’s JSON format, and on every epoch the model is saved to the output directory.

Model details and accuracy scores are added to a meta.json file so that the model can be packaged with the spaCy package command.
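
Once training has run, the recorded details can be inspected by reading meta.json directly. The following is a minimal sketch, not part of spaCy: the helper name summarize_meta is hypothetical, and the keys lang, pipeline and accuracy are assumptions about what the file contains, so the code falls back to defaults when a key is missing.

```python
import json
from pathlib import Path


def summarize_meta(model_dir):
    """Return the language, pipeline and accuracy scores stored in meta.json.

    Hypothetical helper. The key names are assumed, so .get() is used
    with defaults instead of direct indexing.
    """
    meta_path = Path(model_dir) / "meta.json"
    with meta_path.open(encoding="utf8") as f:
        meta = json.load(f)
    return {
        "lang": meta.get("lang"),
        "pipeline": meta.get("pipeline", []),
        "accuracy": meta.get("accuracy", {}),
    }
```

Call it with the path of a saved model directory, that is, a directory inside output_path that contains a meta.json file.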

The Train command is as follows:

python -m spacy train [lang] [output_path] [train_path] [dev_path]
[--base-model] [--pipeline] [--vectors] [--n-iter] [--n-early-stopping]
[--n-examples] [--use-gpu] [--version] [--meta-path] [--init-tok2vec]
[--parser-multitasks] [--entity-multitasks] [--gold-preproc] [--noise-level]
[--orth-variant-level] [--learn-tokens] [--textcat-arch] [--textcat-multilabel]
[--textcat-positive-label] [--verbose]

Arguments

The table below explains its arguments −

ARGUMENT	TYPE	DESCRIPTION
lang	positional	The language of the model.
output_path	positional	The directory to store the model in. It is created if it does not already exist.
train_path	positional	The location of the JSON-formatted training data, which can be a file or a directory of files.
dev_path	positional	The location of the JSON-formatted development data for evaluation, which can be a file or a directory of files.
--base-model, -b	option	Introduced in version 2.1. The name of the base model to update. It is optional and can be any loadable spaCy model.
--pipeline, -p	option	Introduced in version 2.1. Comma-separated names of the pipeline components to train. The default value is 'tagger,parser,ner'.
--replace-components, -R	flag	Replaces the components from the base model.
--vectors, -v	option	The model from which the vectors should be loaded.
--n-iter, -n	option	The number of iterations. The default value is 30.
--n-early-stopping, -ne	option	The maximum number of training epochs without dev accuracy improvement.
--n-examples, -ns	option	The number of examples to use. The default value of 0 uses all examples.
--use-gpu, -g	option	The ID of the GPU to use. The default value of -1 uses the CPU only.
--version, -V	option	The model version.
--meta-path, -m	option	Introduced in version 2.0. An optional path to the model's meta.json. All relevant properties, such as lang, pipeline and spacy_version, will be overwritten.
--init-tok2vec, -t2v	option	Introduced in version 2.1. The path to pretrained weights for the token-to-vector parts of the models.
--parser-multitasks, -pt	option	The side objectives for the parser CNN. For example, 'dep' or 'dep,tag'.
--entity-multitasks, -et	option	The side objectives for the NER CNN. For example, 'dep' or 'dep,tag'.
--width, -cw	option	Introduced in version 2.2.4. The width of the CNN layers of the Tok2Vec component.
--conv-depth, -cd	option	Introduced in version 2.2.4. The depth of the CNN layers of the Tok2Vec component.
--cnn-window, -cW	option	Introduced in version 2.2.4. The window size for the CNN layers of the Tok2Vec component.
--cnn-pieces, -cP	option	Introduced in version 2.2.4. The maxout size for the CNN layers of the Tok2Vec component.
--bilstm-depth, -lstm	option	Introduced in version 2.2.4. The depth of the BiLSTM layers of the Tok2Vec component.
--embed-rows, -er	option	Introduced in version 2.2.4. The number of embedding rows of the Tok2Vec component.
--noise-level, -nl	option	A float indicating the amount of corruption for data augmentation.
--orth-variant-level, -ovl	option	A float indicating the orthography variation for data augmentation.
--gold-preproc, -G	flag	Uses gold preprocessing.
--learn-tokens, -T	flag	Makes the parser learn gold-standard tokenization by merging the sub-tokens. It is typically used for languages like Chinese.
--textcat-multilabel, -TML	flag	Introduced in version 2.2. Indicates that the text classification classes are not mutually exclusive (multilabel).
--textcat-arch, -ta	option	Introduced in version 2.2. The text classification model architecture. The default value is "bow".
--textcat-positive-label, -tpl	option	Introduced in version 2.2. The text classification positive label for binary classes with two labels.
--tag-map-path, -tm	option	Introduced in version 2.2.4. The location of the JSON-formatted tag map.
--verbose, -VV	flag	Introduced in version 2.0.13. Shows more detailed messages during training.
--help, -h	flag	Shows the help message and the available arguments.
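
To show how the positional arguments and the options fit together, here is a minimal sketch that assembles a train command as an argument list without running it. The helper build_train_command and the paths models/en, train.json and dev.json are hypothetical placeholders; the option names are taken from the table above.

```python
import shlex


def build_train_command(lang, output_path, train_path, dev_path, options=None, flags=()):
    """Assemble (but do not run) a `python -m spacy train` argument list.

    Hypothetical helper: `options` maps long option names from the table
    (e.g. "--n-iter") to values; `flags` lists boolean flags (e.g. "--verbose").
    """
    cmd = ["python", "-m", "spacy", "train", lang, output_path, train_path, dev_path]
    for name, value in (options or {}).items():
        cmd.extend([name, str(value)])
    cmd.extend(flags)
    return cmd


cmd = build_train_command(
    "en", "models/en", "train.json", "dev.json",  # placeholder paths
    options={"--pipeline": "tagger,parser", "--n-iter": 20, "--n-early-stopping": 5},
    flags=["--verbose"],
)
# Print the command in a form that can be pasted into a shell.
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) in an environment where a spaCy version that provides this form of the train command is installed and the data files exist.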
