South Park Text Analytics Shiny App

GitHub Code Link for the R Shiny App

Shiny App of Text Analytics for the South Park TV show.

R Shiny apps are very popular, which is why I developed my first Shiny app, based on Olympic data from Kaggle; it will be updated after the 2020 Olympic Games. Here is the Link. Now I have developed something more ambitious, for the long-running TV series South Park.

The show has aired 22 seasons so far. I found the scripts of all aired episodes up to season 18 in the GitHub repository of Kaylin Pavlik. While this was more than enough information, I also wanted data for the remaining seasons (19, 20, 21 and 22). For this task I briefly studied web scraping, and thankfully that, together with a little string manipulation, was enough to obtain the data.

Finally, I was able to create one massive dataset containing the scripts of every episode from season 1 to season 22: who spoke each line and what these characters said, without unnecessary background descriptions or scene directions.
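The cleanup step can be sketched as below. This is a minimal Python illustration (the app itself is written in R), assuming a hypothetical raw format in which each line reads `Character: dialogue` and stage directions sit in square brackets or parentheses; the actual scraped pages may be structured differently, and the sample lines are made up.

```python
import re

# Assumed (hypothetical) convention: stage directions are wrapped in
# [square brackets] or (parentheses).
STAGE_DIRECTION = re.compile(r"\[[^\]]*\]|\([^)]*\)")


def parse_script_line(raw):
    """Split a raw script line into (character, dialogue).

    Returns None for pure scene descriptions (no speaker left after
    stripping stage directions) and for lines with empty dialogue.
    """
    # Remove stage directions first, so "[Scene: ...]" never looks like a speaker.
    cleaned = STAGE_DIRECTION.sub("", raw)
    if ":" not in cleaned:
        return None
    character, _, text = cleaned.partition(":")
    character = character.strip()
    text = re.sub(r"\s+", " ", text).strip()  # collapse leftover whitespace
    if not character or not text:
        return None
    return character, text


raw_lines = [
    "Cartman: [crossing his arms] I'm going home.",
    "[Scene: South Park Elementary. The boys wait outside.]",
    "Kyle: (sighs) Dude, seriously?",
]
parsed = [p for p in map(parse_script_line, raw_lines) if p is not None]
for character, text in parsed:
    print(f"{character}: {text}")
```

Stripping the bracketed text before splitting on the first colon matters: otherwise a scene heading such as `[Scene: ...]` would be mistaken for a speaker called `[Scene`.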

But this was not the end, because I was lucky enough to find the R package southparkr by Patrik Drhlik on GitHub. This package includes episode names, ratings and votes from the IMDB website. It also includes script data, more thorough and covering more recent seasons. Since I already had the scripts, I decided to use this package only for the IMDB ratings and votes.
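Combining the two sources amounts to a left join on season and episode. Below is a minimal Python sketch of that logic (in R this would typically be `dplyr::left_join()`); the field names and all numbers are placeholders, not southparkr's actual columns or real IMDB values.

```python
# Hypothetical records: field names and numbers are placeholders, not
# southparkr's actual columns or real IMDB values.
script_lines = [
    {"season": 1, "episode": 1, "character": "Stan", "text": "Hey, Kyle."},
    {"season": 1, "episode": 2, "character": "Kyle", "text": "Dude!"},
    {"season": 19, "episode": 1, "character": "Cartman", "text": "Whatever."},
]
imdb_episodes = [
    {"season": 1, "episode": 1, "rating": 7.0, "votes": 1000},
    {"season": 1, "episode": 2, "rating": 8.0, "votes": 2000},
]


def join_ratings(lines, episodes):
    """Left join: attach rating and votes to each script line by (season, episode).

    Lines whose episode has no IMDB record keep None for both fields.
    """
    lookup = {(ep["season"], ep["episode"]): ep for ep in episodes}
    joined = []
    for line in lines:
        ep = lookup.get((line["season"], line["episode"]), {})
        joined.append({**line, "rating": ep.get("rating"), "votes": ep.get("votes")})
    return joined


for row in join_ratings(script_lines, imdb_episodes):
    print(row["season"], row["episode"], row["character"], row["rating"], row["votes"])
```

A left join keeps every script line even when an episode is missing from the ratings table, so gaps show up as empty values rather than silently dropped dialogue.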

No Inputs needed to generate Plots or Results

The plots generated here summarize how South Park has changed from season 1 to season 22 in terms of swear words, stop words, all words, sentiment and more. The sentiment analysis plots are easier to interpret if you first read the documentation for the AFINN, bing and nrc lexicons.

Trivia Sub Tab

This tab presents information in plots, mostly generated with plotly, along with some South Park memes. They can take a while to load, so please be patient; meanwhile you can scroll through the page and read.

Lines Sub Tab

Summarized counts of lines by season, character and episode are plotted here.

Words Sub Tab

Summarized counts of words by season, character and episode are plotted here.
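Both the line counts and the word counts boil down to a group-and-count. A minimal Python sketch, assuming each row carries `season`, `episode`, `character` and `text` fields (hypothetical names) and that a word is any whitespace-separated token; a tokenizer such as tidytext's `unnest_tokens()` in R may count slightly differently.

```python
from collections import Counter

# Made-up sample rows; in the app these would come from the merged script data.
rows = [
    {"season": 1, "episode": 1, "character": "Cartman", "text": "I'm going home."},
    {"season": 1, "episode": 1, "character": "Cartman", "text": "See you tomorrow then."},
    {"season": 1, "episode": 2, "character": "Kyle", "text": "Dude, seriously?"},
]


def count_lines_and_words(rows, by=("season", "character")):
    """Count lines and words per group; words are whitespace-separated tokens."""
    lines, words = Counter(), Counter()
    for row in rows:
        key = tuple(row[field] for field in by)
        lines[key] += 1
        words[key] += len(row["text"].split())
    return lines, words


lines, words = count_lines_and_words(rows)
for key in sorted(lines):
    print(key, "lines:", lines[key], "words:", words[key])
```

Passing a different `by` tuple, for example `("episode",)`, gives the per-episode view from the same function.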

Special Words Sub Tab

Summarized counts of all words, words excluding stop words, and swear words by season, character and episode are plotted here.
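The three word measures differ only in which tokens are kept. A minimal Python sketch, assuming a simple lowercase tokenizer; the stop-word set is a tiny illustrative subset and the swear-word set is a placeholder, not the list the app actually uses.

```python
import re

# Tiny illustrative subsets; a real stop-word list has hundreds of entries,
# and the swear list here is a placeholder, not the app's actual list.
STOP_WORDS = {"i", "you", "the", "a", "to", "and", "is", "my", "it"}
SWEAR_WORDS = {"darn", "heck"}


def tokenize(text):
    """Lowercase and keep runs of letters and apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


def word_stats(text):
    tokens = tokenize(text)
    return {
        "all_words": len(tokens),
        "without_stop_words": sum(t not in STOP_WORDS for t in tokens),
        "swear_words": sum(t in SWEAR_WORDS for t in tokens),
    }


print(word_stats("Heck, I told you to fix the darn car!"))
```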

Ratings and Votes from IMDB Sub Tab

Plots in this tab use the IMDB ratings and votes data from the southparkr package. Two of them are animated and may take some time to generate, so please be patient.

Sentiment Analysis Sub Tab

As mentioned above, this tab contains sentiment analysis plots based on the AFINN, bing and nrc lexicons.

Bigram and Trigram Analysis Sub Tab

This analysis is rarely useful and is time-consuming, but it will still generate plots, so please wait until they are displayed.
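Bigrams and trigrams are runs of two or three consecutive words. A minimal Python sketch of extracting and counting them (in R, tidytext's `unnest_tokens(..., token = "ngrams", n = 2)` does the extraction); the sample lines are made up, and n-grams are deliberately not allowed to span two separate lines.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return consecutive n-word tuples, e.g. bigrams for n=2, trigrams for n=3."""
    return list(zip(*(tokens[i:] for i in range(n))))


def count_ngrams(lines, n):
    counts = Counter()
    for text in lines:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(ngrams(tokens, n))  # n-grams never span two lines
    return counts


sample = ["Oh my god!", "Oh my gosh.", "My god, dude."]
print(count_ngrams(sample, 2).most_common(2))
print(count_ngrams(sample, 3).most_common(1))
```

The number of distinct n-grams grows quickly with corpus size, and most of them occur only once, which is one reason this tab is slow and its plots are often not very informative.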

Inputs needed from the user to generate Plots and Results

In these tabs, the plots are generated from inputs the user chooses; the output consists of plots only. The comparisons cover the most lines, the most words, the most words excluding stop words, swear words, and sentiment analysis.

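Sentiment analysis uses the AFINN, bing and nrc lexicons. nrc assigns words to 10 categories: the eight emotions anger, anticipation, disgust, fear, joy, sadness, surprise and trust, plus positive and negative. bing, by contrast, labels words only as positive or negative, and AFINN gives each word an integer score from -5 (most negative) to +5 (most positive).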
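The three lexicons are applied in slightly different ways: AFINN scores are summed, bing labels are counted, and nrc categories are counted with one word possibly contributing to several. A minimal Python sketch using toy lexicons whose entries are invented for illustration and are not copied from the real AFINN, bing or nrc lists (in R these lexicons are commonly loaded with tidytext's `get_sentiments()`).

```python
import re
from collections import Counter

# Toy lexicons for illustration only: the real AFINN, bing and nrc lexicons
# are far larger, and these entries are not copied from them.
AFINN_TOY = {"love": 3, "great": 3, "hate": -3, "stupid": -2}  # integer scores
BING_TOY = {"love": "positive", "great": "positive",
            "hate": "negative", "stupid": "negative"}  # binary labels
NRC_TOY = {"love": {"joy", "positive"},
           "hate": {"anger", "fear", "negative"}}  # possibly several categories per word


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def afinn_score(text):
    """Sum of word scores; words not in the lexicon contribute 0."""
    return sum(AFINN_TOY.get(t, 0) for t in tokenize(text))


def bing_counts(text):
    """Number of positive and negative words."""
    counts = Counter()
    for t in tokenize(text):
        if t in BING_TOY:
            counts[BING_TOY[t]] += 1
    return counts


def nrc_counts(text):
    """Number of words in each category; one word may count toward several."""
    counts = Counter()
    for t in tokenize(text):
        counts.update(NRC_TOY.get(t, ()))
    return counts


line = "I love this great town, but I hate that stupid cat."
print(afinn_score(line), dict(bing_counts(line)), dict(nrc_counts(line)))
```

The same line can look neutral under AFINN (scores cancel out) while bing reports two positive and two negative words, which is why the three techniques can produce different-looking plots for the same data.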

Compare Two Seasons Tab.

Two seasons of the user's choice are used to generate plots covering all the characters who appeared in those two seasons. This takes some time, so please wait until the plots are drawn.

Compare Two Characters Tab.

Two characters of the user's choice are used to generate plots covering all the seasons in which they were active. This takes some time, so please wait until the plots are drawn.

Compare Two Characters but Same Season Tab.

Two characters and a single season of the user's choice are used to generate plots. This takes some time, so please wait until the plots are drawn. The character list is not restricted to characters active in the chosen season; it includes every character active in any season. Therefore, some plots may not be meaningful.

Compare Two Seasons but Same Character Tab.

Two seasons and a single character of the user's choice are used to generate plots. This takes some time, so please wait until the plots are drawn. Here too, we do not check whether the chosen character is active in the two chosen seasons, so some plots may not be meaningful.
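The comparison tabs all reduce to filtering the data by character and season before counting. A minimal Python sketch with made-up rows and hypothetical field names; it also shows why a selection can come back empty when the chosen character did not speak in the chosen season, which is exactly the case where the plots are not meaningful.

```python
# Made-up sample rows with hypothetical field names.
rows = [
    {"season": 1, "character": "Stan", "text": "Hey, Kyle."},
    {"season": 1, "character": "Kyle", "text": "Hey, Stan, what's up?"},
    {"season": 2, "character": "Kyle", "text": "No way."},
]


def character_summary(rows, character, seasons=None):
    """Line and word counts for one character; seasons=None means all seasons."""
    subset = [r for r in rows
              if r["character"] == character and (seasons is None or r["season"] in seasons)]
    return {"lines": len(subset), "words": sum(len(r["text"].split()) for r in subset)}


def compare_two(rows, first, second):
    """Each selection is a (character, seasons) pair."""
    return [character_summary(rows, c, s) for c, s in (first, second)]


# Two characters, same season: Stan has no lines in season 2 here,
# so his side of the comparison would be an empty plot.
print(compare_two(rows, ("Stan", {2}), ("Kyle", {2})))
# Same character, two seasons.
print(compare_two(rows, ("Kyle", {1}), ("Kyle", {2})))
```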

About the Author Tab.

Clicking this automatically opens a new tab with my personal website, which is this one.

THANK YOU
