spaCy - Train Command
As the name implies, this command trains a model. It expects the training data in spaCy’s JSON format, and on every epoch the model is saved out to the output directory.
Model details and accuracy scores are added to a meta.json file, so that the model can later be packaged with the spaCy package command.
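As a rough illustration of the meta.json step, the sketch below pulls a few fields out of such a file once it has been parsed. The function name `summarize_meta` and the sample content are made up for this example, and the `accuracy` key is an assumption about where the scores are stored; `lang`, `pipeline` and `spacy_version` are the properties named in the argument table.

```python
import json


def summarize_meta(meta):
    """Pick a few fields out of a parsed meta.json dictionary."""
    return {
        "lang": meta.get("lang"),
        "pipeline": meta.get("pipeline", []),
        "spacy_version": meta.get("spacy_version"),
        # Assumption: accuracy scores are stored under an "accuracy" key.
        "accuracy": meta.get("accuracy", {}),
    }


# Made-up sample content for illustration, not real training output.
sample = json.loads("""
{
  "lang": "en",
  "pipeline": ["tagger", "parser", "ner"],
  "spacy_version": ">=2.1.0",
  "accuracy": {}
}
""")

summary = summarize_meta(sample)
print(summary["lang"], summary["pipeline"])
```

For a file on disk, the dictionary would come from `json.load` on the opened meta.json instead of the inline sample.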
The Train command is as follows:
python -m spacy train [lang] [output_path] [train_path] [dev_path] [--base-model] [--pipeline] [--vectors] [--n-iter] [--n-early-stopping] [--n-examples] [--use-gpu] [--version] [--meta-path] [--init-tok2vec] [--parser-multitasks] [--entity-multitasks] [--gold-preproc] [--noise-level] [--orth-variant-level] [--learn-tokens] [--textcat-arch] [--textcat-multilabel] [--textcat-positive-label] [--verbose]
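A minimal sketch of how the four positional arguments line up in an actual call, written in Python so that the result could be handed to `subprocess`. The helper name `build_train_command` and all paths are hypothetical; only the argument order is taken from the synopsis.

```python
import sys


def build_train_command(lang, output_path, train_path, dev_path):
    """Return the argument list for a basic `spacy train` call."""
    # Positional arguments follow the order lang, output_path,
    # train_path, dev_path.
    return [
        sys.executable, "-m", "spacy", "train",
        lang, str(output_path), str(train_path), str(dev_path),
    ]


# Hypothetical paths, used only for illustration.
cmd = build_train_command(
    "en", "models/en_demo", "data/train.json", "data/dev.json"
)

# Show the command without running it.
print("python " + " ".join(cmd[1:]))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would start the training, provided a spaCy version that ships this command is installed and the data files exist.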
Arguments
The table below explains its arguments −
ARGUMENT | TYPE | DESCRIPTION |
---|---|---|
lang | positional | The language of the model. |
output_path | positional | The directory to store the model in. It will be created if it does not already exist. |
train_path | positional | It is the location of JSON-formatted training data which can be a file or a directory of files. |
dev_path | positional | It is the location of JSON-formatted development data for evaluation which can be a file or a directory of files. |
--base-model, -b | option | Introduced in version 2.1, represents the name of the base model to update. It is optional and can be any loadable spaCy model. |
--pipeline, -p | option | Introduced in version 2.1, represents the comma-separated names of the pipeline components to train. The default value is 'tagger,parser,ner'. |
--replace-components, -R | flag | This argument will replace components from the base model. |
--vectors, -v | option | It is the model from which the vectors should be loaded. |
--n-iter, -n | option | The number of iterations. The default value is 30. |
--n-early-stopping, -ne | option | The maximum number of training epochs without dev accuracy improvement. |
--n-examples, -ns | option | The number of examples to use. The default value of 0 uses all examples. |
--use-gpu, -g | option | The ID of the GPU to use. The default value of -1 means CPU only. |
--version, -V | option | The model version. |
--meta-path, -m | option | Introduced in version 2.0, represents an optional path to the model meta.json. All relevant properties like lang, pipeline and spacy_version will be overwritten. |
--init-tok2vec, -t2v | option | Introduced in version 2.1, represents the path to pretrained weights for the token-to-vector parts of the models. |
--parser-multitasks, -pt | option | The side objectives for the parser CNN, for example 'dep' or 'dep,tag'. |
--entity-multitasks, -et | option | The side objectives for the NER CNN, for example 'dep' or 'dep,tag'. |
--width, -cw | option | Introduced in version 2.2.4, represents the width of CNN layers of Tok2Vec component. |
--conv-depth, -cd | option | Introduced in version 2.2.4, represents the depth of CNN layers of Tok2Vec component. |
--cnn-window, -cW | option | Introduced in version 2.2.4, represents the window size for CNN layers of Tok2Vec component. |
--cnn-pieces, -cP | option | Introduced in version 2.2.4, represents the maxout size for CNN layers of Tok2Vec component. |
--bilstm-depth, -lstm | option | Introduced in version 2.2.4, represents the depth of BiLSTM layers of Tok2Vec component. |
--embed-rows, -er | option | The number of embedding rows of the Tok2Vec component. |
--noise-level, -nl | option | A float indicating the amount of corruption for data augmentation. |
--orth-variant-level, -ovl | option | A float indicating the orthography variation for data augmentation. |
--gold-preproc, -G | flag | This flag will use gold preprocessing. |
--learn-tokens, -T | flag | Makes the parser learn gold-standard tokenization by merging the sub-tokens. It is typically used for languages like Chinese. |
--textcat-multilabel, -TML | flag | Introduced in version 2.2, indicates that the text classification classes are not mutually exclusive (multilabel). |
--textcat-arch, -ta | option | Introduced in version 2.2, represents the text classification model architecture. Default value is "bow". |
--textcat-positive-label, -tpl | option | Introduced in version 2.2, represents the text classification positive label for binary classes with two labels. |
--tag-map-path, -tm | option | Introduced in version 2.2.4, represents the location of JSON-formatted tag map. |
--verbose, -VV | flag | Introduced in version 2.0.13, shows more detailed messages during training. |
--help, -h | flag | This argument is used to show help message and available arguments. |
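The difference between the "option" and "flag" types in the table can be sketched as follows: an option is followed by a value, while a flag stands alone. The helper `options_to_args` is hypothetical and not part of spaCy; it only turns Python keyword arguments into the long argument names listed above, and the values shown are arbitrary.

```python
def options_to_args(**options):
    """Turn keyword options into command line arguments for `spacy train`."""
    args = []
    for name, value in options.items():
        # n_iter -> --n-iter, gold_preproc -> --gold-preproc
        flag = "--" + name.replace("_", "-")
        if value is True:
            # Flags such as --gold-preproc take no value.
            args.append(flag)
        elif value is not None and value is not False:
            # Options such as --n-iter are followed by their value.
            args.extend([flag, str(value)])
    return args


# Arbitrary example values; verbose=False is skipped entirely.
extra = options_to_args(
    pipeline="tagger,parser",
    n_iter=10,
    n_early_stopping=3,
    gold_preproc=True,
    verbose=False,
)
print(" ".join(extra))
```

The resulting list can be appended to the positional arguments of the train command. The helper does not check that a name is a valid argument, so a misspelled keyword would only be reported by spaCy itself.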