Entity and Sentiment Analysis with the Natural Language API

With the Cloud Natural Language API you can extract entities from text, measure sentiment, perform syntactic analysis, and classify text into categories.

The process is similar to the previous one.

1. As before, sign in to GCP and activate Cloud Shell.

2. Create an API key under Credentials and store it in an environment variable.

export API_KEY=<YOUR_API_KEY>

3. Make an Entity Analysis Request / create the JSON file.

We start with AnalyzeEntities, which extracts entities (people, places, events) from text.

Create request.json and edit it with nano, vim, emacs, or a similar editor.

{
  "document": {
    "type": "PLAIN_TEXT",
    "content": "Joanne Rowling, who writes under the pen names J. K. Rowling and Robert Galbraith, is a British novelist and screenwriter who wrote the Harry Potter fantasy series."
  },
  "encodingType": "UTF8"
}

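type: PLAIN_TEXT means exactly what it says, plain text. (The input does not have to be plain text; HTML is also supported.)

encodingType tells the API which text encoding to use when it processes the text, for example when it calculates offsets such as beginOffset.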
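If you would rather generate the request body than type it in an editor, a minimal Python sketch could look like the following. The helper name build_request is my own (hypothetical), not part of the API; it only reproduces the JSON structure shown above.

```python
import json


def build_request(text, doc_type="PLAIN_TEXT", encoding_type="UTF8"):
    """Return a request body with the same shape as request.json above.

    build_request is a hypothetical helper name used for illustration.
    """
    return {
        "document": {"type": doc_type, "content": text},
        "encodingType": encoding_type,
    }


if __name__ == "__main__":
    body = build_request("Harry Potter is the best book.")
    # ensure_ascii=False keeps non-ASCII text (e.g. Japanese) readable.
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

Redirecting the printed output into request.json gives a file that the curl commands in the next steps can send with --data-binary @request.json.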

4-1. Call the Natural Language API - AnalyzeEntities

curl "https://language.googleapis.com/v1/documents:analyzeEntities?key=${API_KEY}" \
  -s -X POST -H "Content-Type: application/json" --data-binary @request.json
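The same call can be made without curl. This is a rough sketch using only the Python standard library; make_request and call_api are hypothetical helper names, and it assumes the key is in the API_KEY environment variable as set in step 2.

```python
import json
import os
import urllib.request

BASE_URL = "https://language.googleapis.com/v1/documents"


def make_request(method, api_key, body):
    """Build (but do not send) the POST request that the curl command sends.

    method is the part after the colon in the URL, e.g. "analyzeEntities".
    """
    url = f"{BASE_URL}:{method}?key={api_key}"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(method, body):
    """Send the request and return the parsed JSON response (needs network)."""
    api_key = os.environ["API_KEY"]  # assumes `export API_KEY=...` was run
    with urllib.request.urlopen(make_request(method, api_key, body)) as resp:
        return json.load(resp)
```

Only the method name changes between steps (analyzeEntities, analyzeSentiment, analyzeEntitySentiment, analyzeSyntax), so one helper covers every call in this post.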

First, the result of the AnalyzeEntities analysis:

{
  "entities": [
    {
      "name": "Robert Galbraith",
      "type": "PERSON",
      "metadata": {
        "mid": "/m/042xh",
        "wikipedia_url": "https://en.wikipedia.org/wiki/J._K._Rowling"
      },
      "salience": 0.7980405,
      "mentions": [
        { "text": { "content": "Joanne Rowling", "beginOffset": 0 }, "type": "PROPER" },
        { "text": { "content": "Rowling", "beginOffset": 53 }, "type": "PROPER" },
        { "text": { "content": "novelist", "beginOffset": 96 }, "type": "COMMON" },
        { "text": { "content": "Robert Galbraith", "beginOffset": 65 }, "type": "PROPER" }
      ]
    },
    ...
  ]
}

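For each entity we get its type and, where one exists, a related Wikipedia URL.

salience is a number in the [0, 1] range that indicates how important (prominent) the entity is within the whole text.

mentions lists the places in the text where the same entity is referred to, possibly under a different name.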
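To make these fields concrete, here is a small sketch that walks such a response and lists each entity with its salience and mentions. The function name summarize_entities is hypothetical, and sample_response is a trimmed copy of the output above (the entities elided with ... are left out).

```python
def summarize_entities(response):
    """Return entities sorted by salience, most prominent first."""
    entities = sorted(
        response.get("entities", []),
        key=lambda e: e.get("salience", 0.0),
        reverse=True,
    )
    rows = []
    for entity in entities:
        rows.append({
            "name": entity["name"],
            "type": entity["type"],
            "salience": entity.get("salience"),
            # metadata may be empty, so use .get() rather than indexing.
            "wikipedia_url": entity.get("metadata", {}).get("wikipedia_url"),
            "mentions": [
                (m["text"]["content"], m["text"]["beginOffset"])
                for m in entity.get("mentions", [])
            ],
        })
    return rows


# Trimmed copy of the response shown above.
sample_response = {
    "entities": [
        {
            "name": "Robert Galbraith",
            "type": "PERSON",
            "metadata": {
                "mid": "/m/042xh",
                "wikipedia_url": "https://en.wikipedia.org/wiki/J._K._Rowling",
            },
            "salience": 0.7980405,
            "mentions": [
                {"text": {"content": "Joanne Rowling", "beginOffset": 0}, "type": "PROPER"},
                {"text": {"content": "Rowling", "beginOffset": 53}, "type": "PROPER"},
                {"text": {"content": "novelist", "beginOffset": 96}, "type": "COMMON"},
                {"text": {"content": "Robert Galbraith", "beginOffset": 65}, "type": "PROPER"},
            ],
        }
    ]
}

if __name__ == "__main__":
    for row in summarize_entities(sample_response):
        print(row["name"], row["type"], row["salience"], row["mentions"])
```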

4-2. Call the Natural Language API - AnalyzeSentiment

For sentiment analysis, first modify the JSON file from the previous step.

{
  "document": {
    "type": "PLAIN_TEXT",
    "content": "Harry Potter is the best book. I think everyone should read it."
  },
  "encodingType": "UTF8"
}

It is obvious at a glance that this text carries sentiment. Now send the request to the API.

curl "https://language.googleapis.com/v1/documents:analyzeSentiment?key=${API_KEY}" \
  -s -X POST -H "Content-Type: application/json" --data-binary @request.json

The result is as follows.

{
  "documentSentiment": { "magnitude": 0.8, "score": 0.4 },
  "language": "en",
  "sentences": [
    {
      "text": { "content": "Harry Potter is the best book.", "beginOffset": 0 },
      "sentiment": { "magnitude": 0.7, "score": 0.7 }
    },
    {
      "text": { "content": "I think everyone should read it.", "beginOffset": 31 },
      "sentiment": { "magnitude": 0.1, "score": 0.1 }
    }
  ]
}

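score is a number between -1.0 and 1.0 that indicates whether the text is positive or negative.

magnitude is a number from 0 to infinity that represents the "weight" (strength) of the sentiment expressed in the text, regardless of whether it is positive or negative.

In the result, the "best book" sentence scores 0.7 (positive), while "I think everyone should read it." scores 0.1 (close to neutral).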
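One way to turn these scores into labels is sketched below. The 0.25 cutoff is an arbitrary assumption of mine, not something the API defines, and label_sentiment / label_sentences are hypothetical helper names; sample_response repeats the sentences from the output above.

```python
def label_sentiment(score, threshold=0.25):
    """Map a score in [-1.0, 1.0] to a coarse label.

    The threshold is an assumed cutoff chosen for illustration only.
    """
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def label_sentences(response):
    """Return (sentence text, label) pairs for every sentence in a response."""
    return [
        (s["text"]["content"], label_sentiment(s["sentiment"]["score"]))
        for s in response.get("sentences", [])
    ]


# Sentence-level part of the response shown above.
sample_response = {
    "documentSentiment": {"magnitude": 0.8, "score": 0.4},
    "language": "en",
    "sentences": [
        {
            "text": {"content": "Harry Potter is the best book.", "beginOffset": 0},
            "sentiment": {"magnitude": 0.7, "score": 0.7},
        },
        {
            "text": {"content": "I think everyone should read it.", "beginOffset": 31},
            "sentiment": {"magnitude": 0.1, "score": 0.1},
        },
    ],
}

if __name__ == "__main__":
    for text, label in label_sentences(sample_response):
        print(f"{label:8s} {text}")
```

With this cutoff the first sentence comes out as positive and the second as neutral, matching the reading above; a different threshold would shift the boundary.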

4-3. Call the Natural Language API - AnalyzeEntitySentiment

Where 4-2 analyzed the sentiment of sentences, this step analyzes the sentiment attached to each entity.

To do that, first modify the JSON file, this time with a sentence that is clearly positive about one thing and negative about another.

{
  "document": {
    "type": "PLAIN_TEXT",
    "content": "I liked the sushi but the service was terrible."
  },
  "encodingType": "UTF8"
}

Send the request to the API.

curl "https://language.googleapis.com/v1/documents:analyzeEntitySentiment?key=${API_KEY}" \
  -s -X POST -H "Content-Type: application/json" --data-binary @request.json

The result is as follows.

{
  "entities": [
    {
      "name": "sushi",
      "type": "CONSUMER_GOOD",
      "metadata": {},
      "salience": 0.52716845,
      "mentions": [
        {
          "text": { "content": "sushi", "beginOffset": 12 },
          "type": "COMMON",
          "sentiment": { "magnitude": 0.9, "score": 0.9 }
        }
      ],
      "sentiment": { "magnitude": 0.9, "score": 0.9 }
    },
    {
      "name": "service",
      "type": "OTHER",
      "metadata": {},
      "salience": 0.47283158,
      "mentions": [
        {
          "text": { "content": "service", "beginOffset": 26 },
          "type": "COMMON",
          "sentiment": { "magnitude": 0.9, "score": -0.9 }
        }
      ],
      "sentiment": { "magnitude": 0.9, "score": -0.9 }
    }
  ],
  "language": "en"
}

In the result, sushi scores 0.9 (positive) while service scores -0.9 (negative), so the analysis got it right. (Had the sentence contained other opinions, it would presumably have caught those as well.)
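A short sketch of pulling the per-entity scores out of this response, sorted from most negative to most positive. entity_sentiments is a hypothetical helper name, and sample_response is a reduced copy of the output above that keeps only the fields the function reads.

```python
def entity_sentiments(response):
    """Return (name, score, magnitude) tuples, most negative first."""
    rows = [
        (e["name"], e["sentiment"]["score"], e["sentiment"]["magnitude"])
        for e in response.get("entities", [])
        if "sentiment" in e  # plain analyzeEntities responses have no sentiment
    ]
    return sorted(rows, key=lambda row: row[1])


# Reduced copy of the response shown above.
sample_response = {
    "entities": [
        {"name": "sushi", "type": "CONSUMER_GOOD",
         "sentiment": {"magnitude": 0.9, "score": 0.9}},
        {"name": "service", "type": "OTHER",
         "sentiment": {"magnitude": 0.9, "score": -0.9}},
    ],
    "language": "en",
}

if __name__ == "__main__":
    for name, score, magnitude in entity_sentiments(sample_response):
        print(f"{name}: score={score}, magnitude={magnitude}")
```

Sorting by score puts complaints first, which is handy if the goal is to find what reviewers disliked.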

4-4. Call the Natural Language API - AnalyzeSyntax

First, modify request.json.

{
  "document": {
    "type": "PLAIN_TEXT",
    "content": "Joanne Rowling is a British novelist, screenwriter and film producer."
  },
  "encodingType": "UTF8"
}

Then call the API's analyzeSyntax method.

curl "https://language.googleapis.com/v1/documents:analyzeSyntax?key=${API_KEY}" \
  -s -X POST -H "Content-Type: application/json" --data-binary @request.json

The response contains one object like this per token; the one for "is" is shown here.

{
  "text": { "content": "is", "beginOffset": 15 },
  "partOfSpeech": {
    "tag": "VERB",
    "aspect": "ASPECT_UNKNOWN",
    "case": "CASE_UNKNOWN",
    "form": "FORM_UNKNOWN",
    "gender": "GENDER_UNKNOWN",
    "mood": "INDICATIVE",
    "number": "SINGULAR",
    "person": "THIRD",
    "proper": "PROPER_UNKNOWN",
    "reciprocity": "RECIPROCITY_UNKNOWN",
    "tense": "PRESENT",
    "voice": "VOICE_UNKNOWN"
  },
  "dependencyEdge": { "headTokenIndex": 2, "label": "ROOT" },
  "lemma": "be"
},

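partOfSpeech gives the grammatical details of the token. In the response above, tag tells us that "is" is a verb; many of the other fields are *_UNKNOWN because they do not apply to this word.

dependencyEdge lets you build the dependency parse tree of the text, a diagram of how the words in the sentence relate to one another.

headTokenIndex is the index of the token that this token attaches to in the tree, counting tokens from 0. "is" is token 2 and carries the ROOT label, so its headTokenIndex of 2 points to itself. A token such as "Joanne" has a headTokenIndex of 1, which points to "Rowling".

lemma is the canonical base form of a word. For example, run is the lemma of runs, run, ran, and running, and in the response above the lemma of "is" is be. Lemmas are useful for tracking how often a word occurs in a text.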
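The sketch below shows how headTokenIndex and lemma might be used once the token list is loaded. The token list is a hypothetical, heavily trimmed stand-in: only the "is" entry comes from the response above, the head of "Joanne" follows the description above, and the head of "Rowling" is my assumption. The function names are also made up for illustration.

```python
from collections import Counter

# Hypothetical, trimmed token list (first three tokens only).
# Only "is" is copied from the real response; the rest is assumed.
tokens = [
    {"text": {"content": "Joanne"}, "dependencyEdge": {"headTokenIndex": 1}, "lemma": "Joanne"},
    {"text": {"content": "Rowling"}, "dependencyEdge": {"headTokenIndex": 2}, "lemma": "Rowling"},
    {"text": {"content": "is"}, "dependencyEdge": {"headTokenIndex": 2, "label": "ROOT"}, "lemma": "be"},
]


def find_root(tokens):
    """Return the index of the root token (the one whose head is itself)."""
    for index, token in enumerate(tokens):
        if token["dependencyEdge"]["headTokenIndex"] == index:
            return index
    return None


def head_word(tokens, index):
    """Return the text of the token that tokens[index] attaches to."""
    head = tokens[index]["dependencyEdge"]["headTokenIndex"]
    return tokens[head]["text"]["content"]


def lemma_counts(tokens):
    """Count occurrences by lemma, so 'is' and 'was' both count as 'be'."""
    return Counter(token["lemma"] for token in tokens)


if __name__ == "__main__":
    print("root:", tokens[find_root(tokens)]["text"]["content"])
    print("head of Joanne:", head_word(tokens, 0))
    print(lemma_counts(tokens))
```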

5. Multilingual natural language processing

The Natural Language API supports many languages. Let's modify request.json and analyze some Japanese text.

{
  "document": {
    "type": "PLAIN_TEXT",
    "content": "日本のグーグルのオフィスは、東京の六本木ヒルズにあります"
  }
}

We analyze it with analyzeEntities.

curl "https://language.googleapis.com/v1/documents:analyzeEntities?key=${API_KEY}" \
  -s -X POST -H "Content-Type: application/json" --data-binary @request.json

The result is as follows.

{
  "entities": [
    {
      "name": "日本",
      "type": "LOCATION",
      "metadata": {
        "mid": "/m/03_3d",
        "wikipedia_url": "https://en.wikipedia.org/wiki/Japan"
      },
      "salience": 0.23854347,
      "mentions": [
        { "text": { "content": "日本", "beginOffset": 0 }, "type": "PROPER" }
      ]
    },
    {
      "name": "グーグル",
      "type": "ORGANIZATION",
      "metadata": {
        "mid": "/m/045c7b",
        "wikipedia_url": "https://en.wikipedia.org/wiki/Google"
      },
      "salience": 0.21155767,
      "mentions": [
        { "text": { "content": "グーグル", "beginOffset": 9 }, "type": "PROPER" }
      ]
    },
    ...
  ],
  "language": "ja"
}

As with English, the response includes the related Wikipedia URLs and the type of each entity.
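One detail worth checking with non-ASCII text is what beginOffset counts. The offsets in the response above (0 and 9) are consistent with UTF-8 byte offsets rather than character positions, as this small sketch suggests. slice_by_utf8_offset is a hypothetical helper name, and reading the offsets as UTF-8 bytes is my interpretation of the numbers shown, not something the post states.

```python
TEXT = "日本のグーグルのオフィスは、東京の六本木ヒルズにあります"


def slice_by_utf8_offset(text, begin_offset, content):
    """Cut `content` out of `text`, treating begin_offset as a UTF-8 byte offset."""
    raw = text.encode("utf-8")
    end = begin_offset + len(content.encode("utf-8"))
    return raw[begin_offset:end].decode("utf-8")


if __name__ == "__main__":
    # Character position of the mention in the Python string.
    print("character index:", TEXT.index("グーグル"))
    # Byte position of the same mention in the UTF-8 encoding.
    print("byte index:", TEXT.encode("utf-8").index("グーグル".encode("utf-8")))
    print("slice at 9:", slice_by_utf8_offset(TEXT, 9, "グーグル"))
```

Each of the three characters in 日本の takes three bytes in UTF-8, so the mention that starts at character 3 starts at byte 9. When mapping offsets back onto a Python string, slice the encoded bytes rather than the str itself.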
