- Assess and define the objective of each task. Sometimes it is relevant to label the context around the topic being labelled; at other times we are simply classifying different types of “items” in a data set. For example, when working on a data set that covers different products and trying to determine each product's share of demand, we might be tempted to label only the product itself on each record. However, adding a little context is always recommended and in most cases necessary. If, on the other hand, we are looking for more complex insight, such as whether a client likes or dislikes a service or product, we need relevant tags to reflect this, and we should also label the context to get more accurate insights out of the labelling.
- Create as many relevant labels as possible. If they aren’t used enough, they can be either merged with a similar label or deleted. Start with labels that are as granular as possible, and merge them as the need for specificity decreases.
- Exclusion reasons: Records containing no useful information whatsoever can be excluded and will then no longer affect the model in any way. Examples include records that contain only a salutation or a non-specific statement such as ‘I have a problem’.
- If the model's performance and the label issues or improvements needed are clear after labelling 1000 records, a completely new model can be created with the same data and the refined labels, so that it reaches strong performance faster.
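
The exclusion and label-merging steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the behaviour of any particular labelling tool: the sample records, the `EXCLUDED_PHRASES` set, the `merge_rare_labels` helper, the `merge_into` mapping and the `min_count` threshold are all hypothetical names and values chosen for the example.

```python
from collections import Counter

# Hypothetical labelled records as (text, label) pairs; illustrative data only.
records = [
    ("My card payment failed twice", "payment > card declined"),
    ("Card was declined at checkout", "payment > card declined"),
    ("Bank transfer never arrived", "payment > transfer missing"),
    ("Hello", None),
    ("I have a problem", None),
    ("Love the new app design", "app > positive feedback"),
    ("The new app layout is great", "app > positive feedback"),
]

# Assumed exclusion rule: a small set of phrases that carry no useful information.
EXCLUDED_PHRASES = {"hello", "hi", "i have a problem"}


def is_excluded(text):
    """Return True when the record is only a salutation or a non-specific statement."""
    return text.strip().lower() in EXCLUDED_PHRASES


def merge_rare_labels(labels, merge_into, min_count):
    """Replace labels used fewer than min_count times with a broader label.

    merge_into maps a granular label to the label it should be merged with.
    Rare labels without an entry are left unchanged for manual review.
    """
    counts = Counter(labels)
    return [
        merge_into.get(label, label) if counts[label] < min_count else label
        for label in labels
    ]


# Step 1: drop records that contain no useful information.
kept = [(text, label) for text, label in records if not is_excluded(text)]

# Step 2: merge granular labels that are not used often enough.
labels = [label for _, label in kept]
merged = merge_rare_labels(
    labels,
    merge_into={"payment > transfer missing": "payment"},
    min_count=2,
)

for (text, _), label in zip(kept, merged):
    print(f"{label:28} {text}")
```

Starting from granular labels and merging later, as in this sketch, keeps the original detail available until it is clear which labels are actually used.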