spaCy in Machine Learning

spaCy is an open-source Python library for advanced natural language processing (NLP). If you work with a lot of text, you will eventually want to know more about it. For example, what is it about? What do the words mean in context? Who does what to whom? Which companies and products are mentioned? Which texts are similar to each other? In this article, I will take you through spaCy in Machine Learning.

Why spaCy?

spaCy is specially designed for production use and helps you create applications that process and “understand” large volumes of text. It can be used to create systems for extracting information or understanding natural language, or for preprocessing text for deep learning.

Installing spaCy is as easy as installing any other Python package. Use the pip command in your terminal: pip install spacy.

You will also need at least one of spaCy's language models. spaCy can analyze text in many languages, including English, German, Spanish and French, each with its own models. We're going to work with English text for this simple analysis, so download spaCy's small English model, again via the command line: python -m spacy download en_core_web_sm.
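Before going further, it can help to confirm that the model is actually installed. The following is a minimal sketch, not part of the original workflow: the helper name load_english is hypothetical, and falling back to a blank pipeline is my own assumption about what is convenient. A blank pipeline only tokenizes, so part-of-speech tags, dependencies and entities are not available with it.

```python
import spacy
from spacy.util import is_package

MODEL_NAME = "en_core_web_sm"


def load_english(model_name=MODEL_NAME):
    """Load the named model if it is installed, else return a blank English pipeline.

    Hypothetical helper for illustration only.
    """
    if is_package(model_name):
        return spacy.load(model_name)
    # Fallback: tokenizer only, no tagger, parser, or entity recognizer.
    return spacy.blank("en")


nlp = load_english()
# pipe_names is empty for the blank fallback and lists the components otherwise.
print(nlp.lang, nlp.pipe_names)
```

If the printed component list is empty, the download step above has not been run in the current environment.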

Tokenization

Text processing now comes down to loading your language model and passing strings directly to it. Let's see what it does with a sample review:

import spacy
nlp = spacy.load("en_core_web_sm")
review = "I'am so happy I went to this awesome Vegas buffet!"
doc = nlp(review)

To see the result, loop over the tokens of the processed document:

for token in doc:
    print(token.text, token.pos_, token.lemma_, token.is_stop)
I'am PROPN I'am False
so ADV so True
happy ADJ happy False
I PRON -PRON- True
went VERB go False
to ADP to True
this DET this True
awesome ADJ awesome False
Vegas PROPN Vegas False
buffet NOUN buffet False
! PUNCT ! False
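The is_stop flag in the last column is often used to drop filler words before further analysis. Here is a small sketch of that idea. It assumes a blank English pipeline (tokenizer only), which is enough because is_stop and is_punct are lexical flags that need no trained model. The review is shortened slightly for the example.

```python
import spacy

# Assumption: a blank English pipeline is sufficient for lexical flags.
nlp = spacy.blank("en")
doc = nlp("I went to this awesome Vegas buffet!")

# Keep only tokens that are neither stop words nor punctuation.
content_words = [
    token.text for token in doc if not token.is_stop and not token.is_punct
]
print(content_words)
```

The same filter works unchanged on a document produced by the full en_core_web_sm pipeline.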

spaCy does not explicitly divide the original text into a list, but tokens are accessible by index and by slice:

print(doc[:5])

Output: I'am so happy I went
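To make the difference between a document and a plain list concrete, here is a short sketch. It again assumes a blank English pipeline and a slightly shortened review, so it runs without the downloaded model.

```python
import spacy

# Assumption: tokenizer-only pipeline, no model download required.
nlp = spacy.blank("en")
doc = nlp("I went to this awesome Vegas buffet!")

tokens = [token.text for token in doc]  # an explicit list of token strings
first_three = doc[:3]                   # slicing returns a Span, not a list

print(tokens)
print(len(doc), doc[5].text)
print(type(first_three).__name__, "->", first_three.text)
```

A Span keeps its link to the parent document, so its tokens still carry all their attributes.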

spaCy Dependencies

NLP poses many distinct challenges, both syntactic and semantic. spaCy assigns a dependency relation to each token as the text passes through the language model. Let's check the dependencies in our review:

for token in doc:
    print(token.text, token.dep_)
I'am ROOT
so advmod
happy amod
I nsubj
went ccomp
to prep
this det
awesome amod
Vegas compound
buffet pobj
! punct
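A dependency label is easier to read when you also know which word it attaches to. Each token exposes its syntactic parent as token.head and its direct dependents as token.children. The sketch below is illustrative: the helper names dependency_edges and modifiers_of are hypothetical, and the demo at the bottom only runs if en_core_web_sm is installed.

```python
import spacy
from spacy.util import is_package


def dependency_edges(doc):
    """Return (token, relation, head) triples; the root token is its own head."""
    return [(token.text, token.dep_, token.head.text) for token in doc]


def modifiers_of(doc, word):
    """Return the direct syntactic children of the first token whose text is `word`."""
    for token in doc:
        if token.text == word:
            return [child.text for child in token.children]
    return []


# Only run the demo when the small English model is available.
if is_package("en_core_web_sm"):
    nlp = spacy.load("en_core_web_sm")
    doc = nlp("I went to this awesome Vegas buffet!")
    for edge in dependency_edges(doc):
        print(edge)
    print(modifiers_of(doc, "buffet"))
```

The exact labels and attachments depend on the model version, so treat the printed edges as model output rather than ground truth.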

It looks somewhat interesting, but visualizing these relationships reveals an even fuller story. Start by loading a submodule called displaCy to help with visualization:

from spacy import displacy
displacy.serve(doc)

Running this starts a local web server. Open the address it prints in a browser to see the dependency tree for the document:

[Image: dependency tree of the review]

Named Entity Recognition with spaCy

Machine learning practitioners often need to identify key elements and individuals in unstructured text. This task, called Named Entity Recognition (NER), runs automatically as the text passes through the language model. To see which tokens it identifies as named entities in our restaurant review, loop over doc.ents:

for ent in doc.ents:
    print(ent.text, ent.label_)
Vegas GPE

It recognizes “Vegas” as a named entity, but what does the label “GPE” mean? If you don’t know what any of the abbreviations mean, just ask spaCy to explain it to you:

spacy.explain("GPE")
Countries, cities, states
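The same function also covers part-of-speech tags and dependency labels, so it can decode the earlier outputs too. A small sketch follows; the list of labels is simply picked from the outputs shown in this article.

```python
import spacy

# Labels taken from the outputs shown earlier in this article.
labels = ["PROPN", "ADP", "nsubj", "pobj", "amod", "GPE"]

for label in labels:
    # spacy.explain returns a short description, or None for unknown labels.
    print(f"{label:>6}: {spacy.explain(label)}")

# Collect the descriptions in a dictionary for later lookups.
glossary = {label: spacy.explain(label) for label in labels}
```

No language model is needed for this lookup, because the descriptions ship with the library itself.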

Additionally, displaCy can highlight named entities if the style argument is set to 'ent':

displacy.serve(doc, style='ent')

[Image: named entities highlighted in the review]

The coloured spans mark named entities by type. Consider this more complicated example with four different types of entities:

document = nlp("One year ago, I visited the Eiffel Tower with Jeff in Paris, France")
displacy.serve(document, style='ent')

[Image: named entities highlighted in the second example]
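displacy.serve keeps a web server running until you stop it. If you would rather get the markup as a string, for example to save it to a file, displacy.render is an alternative. The sketch below makes one loud assumption: the entity is set by hand on a blank pipeline so that the code runs without a downloaded model. With en_core_web_sm loaded, nlp(text) would fill doc.ents on its own.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

# Assumption: entity annotated manually so no trained model is required.
nlp = spacy.blank("en")
doc = nlp("I went to this awesome Vegas buffet!")
doc.ents = [Span(doc, 5, 6, label="GPE")]  # token 5 is "Vegas"

# jupyter=False makes render() return the markup instead of displaying it.
html = displacy.render(doc, style="ent", page=True, jupyter=False)
print(html[:80])
```

The returned string is a complete HTML page because page=True, so it can be written to a file and opened in any browser.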

I hope you liked this article on spaCy in Machine Learning.

Aman Kharwal

I'm a writer and data scientist on a mission to educate others about the incredible power of data📈.
