  • armstrongWebb

Business analysts and Architects - how about creating an NLP apprentice?

Updated: Mar 23, 2021

Today the weather was dark and cold. Yes, another day in Covid lock-down. It may be the weekend, but having scanned the various journals and walked the dog, I set myself a challenge.


Setting myself a limit of a day - well, most of a day - could I create a tool that scanned a set of text and automatically generated an associated Entity-Relationship model?


For example, take the following: 'The customer can place one or more orders.' The entities are customer and order, with the customer having a '1 to many' relationship - or cardinality - with order, ie one customer can place 'one or many orders'.
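As a rough sketch of what the target E-R model for that sentence could look like in Python: the class and field names below (Entity, Relationship, cardinality and so on) are my own assumptions for illustration, not the data structures used by the tool described in this post.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """A thing the text talks about, eg customer or order."""
    name: str
    attributes: List[str] = field(default_factory=list)


@dataclass
class Relationship:
    """A verb linking two entities, with its cardinality."""
    source: str
    verb: str
    target: str
    cardinality: str  # eg "1 to many"


# The example sentence, written out by hand as an E-R model.
customer = Entity("customer")
order = Entity("order")
places = Relationship("customer", "place", "order", "1 to many")

summary = f"{places.source} --{places.verb} ({places.cardinality})--> {places.target}"
print(summary)  # customer --place (1 to many)--> order
```

The job of the tool is then to fill in objects like these from raw text, rather than by hand.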


The challenge is that, unless treated in a special way, computer systems treat text as a collection of characters separated by spaces and punctuation marks. The tool needs to be able to extract meaning from the words and especially how they relate to one another.



Enter NLP...


Natural Language Processing, or NLP, does just that. For example, it can analyse a sentence and decompose it into its constituent parts, eg noun, verb, subject, object etc.


There are various toolkits that can be used to do this, eg NLTK (Natural Language Toolkit). However, for this task, I used spaCy, which is very powerful and becoming increasingly popular. It uses Machine Learning and can be extended, eg by 'learning' domain-specific (eg pharma) terminology. It also has a rather groovy visualiser, named displacy - more on this in a moment.


As usual, Python was my language of choice to bring it all together.
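To give a flavour of what spaCy exposes, here is a minimal sketch that lists each word's part of speech, dependency label and head word. It assumes spaCy is installed along with the small English model en_core_web_sm (my choice for illustration; the post does not say which model was used). The helper names are hypothetical, and the stand-in tokens at the end are hand-written so the helper can be tried without spaCy.

```python
from types import SimpleNamespace


def describe_tokens(doc):
    """Return (text, part of speech, dependency label, head text) per token."""
    return [(t.text, t.pos_, t.dep_, t.head.text) for t in doc]


def parse_with_spacy(text, model="en_core_web_sm"):
    """Parse text with spaCy. Needs spaCy and the named model installed."""
    import spacy  # imported lazily so describe_tokens works without spaCy

    nlp = spacy.load(model)
    return nlp(text)


# With spaCy installed:
#   doc = parse_with_spacy("The customer can place one or more orders.")
#   for row in describe_tokens(doc):
#       print(row)

# Hand-written stand-ins for two spaCy tokens (labels are assumptions).
place = SimpleNamespace(text="place", pos_="VERB", dep_="ROOT")
place.head = place  # in spaCy the root token is its own head
orders = SimpleNamespace(text="orders", pos_="NOUN", dep_="dobj", head=place)

rows = describe_tokens([place, orders])
print(rows)
```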



The test text


I used the following to test the tool:


...with the aim of extracting the entities, relationships and cardinalities. Oh, and as an added benefit, the attributes associated with each entity.



Human vs Machine


Of course, a human would be able to extract the relevant information very quickly. Certainly, a lot faster than the time it takes to write and test the relevant code. But what if the text was 200 lines in length, or 400... You get the picture.


However, my intention is to create a tool to provide an independent interpretation of the E-R information, to assist in achieving a better quality result overall. It would be my E-R apprentice so to speak.



spaCy's displacy: what you see is what you get


displacy provides a visual representation of the parts of speech of the text. For example, a displacy image of the first sentence is:




For each word, two key items of information are shown: i) its part of speech (eg noun) and ii) its relationship to another word in the sentence. For example, 'orders' is related to the 'head' verb 'place' (ie from 'to place').
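An image like this can be produced with a few lines of code. The sketch below assumes spaCy and the en_core_web_sm model are installed (the model name is my assumption), and render_dependency_svg is a hypothetical helper name. It returns the diagram as SVG markup, which can be saved to a file and opened in a browser.

```python
def render_dependency_svg(text, model="en_core_web_sm"):
    """Return displacy's dependency diagram for the text as an SVG string."""
    import spacy  # imported lazily; needs spaCy and the model installed
    from spacy import displacy

    nlp = spacy.load(model)
    doc = nlp(text)
    # style="dep" draws the parts of speech and the arcs between words;
    # jupyter=False makes render() return the markup instead of displaying it.
    return displacy.render(doc, style="dep", jupyter=False)


# With spaCy installed:
#   svg = render_dependency_svg("The customer can place one or more orders.")
#   with open("sentence.svg", "w", encoding="utf-8") as f:
#       f.write(svg)
```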



Bringing it all together


Using the NLP analysis, I wrote code to extract entities and attributes, based on the expected relationships.
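The following is not the code I wrote for the tool, just a minimal sketch of the general idea under some loud assumptions: the Token class is a simplified stand-in for a spaCy token, the parse of the example sentence is written by hand (spaCy's real output may differ), and the cardinality rule is a deliberately naive phrase match. It finds a verb with a subject and a direct object, and treats those as two entities joined by a relationship. Attribute extraction is not covered.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    lemma: str
    pos: str   # part of speech, eg NOUN
    dep: str   # dependency label, eg nsubj
    head: int  # index of the head token in the sentence


# Hand-written parse of the example sentence (an assumption, not real output).
SENTENCE = [
    Token("The", "the", "DET", "det", 1),
    Token("customer", "customer", "NOUN", "nsubj", 3),
    Token("can", "can", "AUX", "aux", 3),
    Token("place", "place", "VERB", "ROOT", 3),
    Token("one", "one", "NUM", "nummod", 7),
    Token("or", "or", "CCONJ", "cc", 4),
    Token("more", "more", "ADJ", "conj", 4),
    Token("orders", "order", "NOUN", "dobj", 3),
    Token(".", ".", "PUNCT", "punct", 3),
]

# Hypothetical phrases taken to signal a 'many' side of a relationship.
MANY_PHRASES = ("one or more", "one or many", "many", "several")


def child_index(tokens, head, dep):
    """Index of the first token attached to `head` with label `dep`, or None."""
    for i, tok in enumerate(tokens):
        if i != head and tok.head == head and tok.dep == dep:
            return i
    return None


def extract_relationships(tokens):
    """Return (subject, verb, object, cardinality) for each verb that has both."""
    found = []
    for v, tok in enumerate(tokens):
        if tok.pos != "VERB":
            continue
        s = child_index(tokens, v, "nsubj")
        o = child_index(tokens, v, "dobj")
        if s is None or o is None:
            continue
        # Naive rule: inspect the words between the verb and its object.
        between = " ".join(t.text.lower() for t in tokens[v + 1:o])
        many = any(phrase in between for phrase in MANY_PHRASES)
        cardinality = "1 to many" if many else "1 to 1"
        found.append((tokens[s].lemma, tok.lemma, tokens[o].lemma, cardinality))
    return found


relationships = extract_relationships(SENTENCE)
print(relationships)  # [('customer', 'place', 'order', '1 to many')]
```

Using lemmas rather than the raw words means 'orders' is recorded as the entity 'order'. A real tool would need many more rules, eg for passive sentences, and for phrases such as 'has a name and address' that point to attributes rather than relationships.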


The end-result was:



Input:



E-R output:















Conclusion


The above example took less than a day's effort. And I now have the makings of a useful tool that could help improve the quality of E-R analysis.


The use of ML and NLP will help create increasingly powerful tools to enable us to do a better day job. Watch this space!

