Twitter hashtag analysis of movie premieres in February 2022 in the USA


Creation date: 13-06-2024



Author: Víctor Yeste. Universitat Politècnica de València.

This work is an exploratory, quantitative, non-experimental study with inductive inference and a longitudinal follow-up. It analyzes movie data and the tweets that users published with the official Twitter hashtags of movie premieres during the week before, the week of, and the week after each release date. The scope of the study is the set of movies released in the USA in February 2022. The object of study comprises those movies and the tweets that refer to them in the three weeks closest to their premiere dates.

The collected tweets were classified by the week in which they were published, a time dimension called timepoint. The week before the release date is timepoint 1, the week of the release date is timepoint 2, and the week immediately afterward is timepoint 3. A second dimension is whether the movie is a domestic production: a movie is designated as domestic if one of its countries of origin is the United States.

The chosen variables are organized in two data tables, one for the movies and one for the collected tweets.

Variables related to the movies:

id: Internal id of the movie
name: Title of the movie
hashtag: Official hashtag of the movie
countries: List of countries of the movie, separated by a semicolon
mpaa: Rating under the film rating system of the Motion Picture Association of America. It is a completely voluntary rating system, and ratings have no legal standing. The current ratings are G (general audiences), PG (parental guidance suggested), PG-13 (parents strongly cautioned), R (restricted, under 17 requires accompanying parent or adult guardian), and NC-17 (no one 17 and under admitted) (Film Ratings - Motion Picture Association, n.d.)
genres: List of genres of the movie, e.g., Action or Thriller, separated by a semicolon
release_date: Release date of the movie, in the format YYYY-MM-DD
opening_grosses: Amount in US dollars that the movie grossed during its opening (the first week after the release date)
opening_theaters: Number of US theaters that showed the movie during its opening (the first week after the release date)
rating_avg: Average rating of the movie

Variables related to the tweets:

id: Internal id of the tweet
status_id: Twitter id of the tweet
movie_id: Internal id of the movie
timepoint: Week, relative to the movie premiere, in which the tweet was published. “1” is the week before the movie release, “2” is the week of the release (the first week after the release date), and “3” is the week immediately afterward (the second week after the release date).
author_id: Twitter id of the author of the tweet
created_at: Date and time of the tweet, in the format “YYYY-MM-DD HH:MM:SS”
quote_count: Number of quotes of the tweet
reply_count: Number of replies to the tweet
retweet_count: Number of retweets of the tweet
like_count: Number of likes of the tweet
sentiment: Sentiment score of the tweet’s content, ranging from -1 (negative) to 1 (positive)

This dataset has contributed to the following book chapters:

Yeste, Víctor; Calduch-Losa, Ángeles (2022). Genre classification of movie releases in the USA: Exploring data with Twitter hashtags. In Narrativas emergentes para la comunicación digital (pp. 1012-1044). Dykinson, S. L.

Yeste, Víctor; Calduch-Losa, Ángeles (2022). Exploratory Twitter hashtag analysis of movie premieres in the USA. In Desafíos audiovisuales de la tecnología y los contenidos en la cultura digital (pp. 169-187). McGraw-Hill Interamericana de España S.L.

Yeste, Víctor; Calduch-Losa, Ángeles (2022). ANOVA to study movie premieres in the USA and online conversation on Twitter. The case of rating average using data from official Twitter hashtags. In El mapa y la brújula. Navegando por las metodologías de investigación en comunicación (pp. 151-168). Editorial Fragua.
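
The timepoint and domestic-production dimensions described above can be sketched in Python as follows. This is only an illustration, not the procedure used in the study. It assumes that each week is a 7-day window anchored on the release date, and that the United States appears in the countries field under one of a few common labels. The function names timepoint and is_domestic are hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def timepoint(created_at: str, release_date: str) -> Optional[int]:
    """Return 1, 2, or 3 for a tweet, or None outside the three weeks.

    Assumption: weeks are 7-day windows anchored on the release date,
    so the release day itself opens timepoint 2.
    """
    tweet_day = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S").date()
    release = date.fromisoformat(release_date)
    offset = (tweet_day - release).days  # negative before the premiere
    if -7 <= offset < 0:
        return 1
    if 0 <= offset < 7:
        return 2
    if 7 <= offset < 14:
        return 3
    return None


def is_domestic(countries: str) -> bool:
    """True if the United States is in the semicolon-separated list.

    Assumption: the labels below cover how the country is written.
    """
    us_labels = {"united states", "united states of america", "usa", "us"}
    return any(c.strip().lower() in us_labels for c in countries.split(";"))
```

For example, under these assumptions a tweet created on 2022-02-03 for a movie released on 2022-02-04 falls in timepoint 1, and a movie whose countries field includes the United States alongside another country counts as domestic.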


DATASET lock_opendata.xlsx 28 KB
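
Once the tweets table has been read into memory, a per-movie, per-timepoint summary can be sketched as below. The sketch does not read the spreadsheet. It assumes each tweet is available as a dictionary keyed by the variable names listed in the description, and the sample rows are invented for illustration. The "interactions" total (quotes + replies + retweets + likes) is an assumption of this example, not a measure defined by the study.

```python
from collections import defaultdict
from statistics import mean


def summarize(tweets):
    """Group tweet rows by (movie_id, timepoint) and summarize each group."""
    groups = defaultdict(list)
    for row in tweets:
        groups[(row["movie_id"], row["timepoint"])].append(row)

    summary = {}
    for key, rows in groups.items():
        summary[key] = {
            "tweets": len(rows),
            "mean_sentiment": mean([r["sentiment"] for r in rows]),
            # Illustrative total of the four count variables.
            "interactions": sum(
                r["quote_count"] + r["reply_count"]
                + r["retweet_count"] + r["like_count"]
                for r in rows
            ),
        }
    return summary


# Invented rows, only to show the expected shape of the input.
sample_tweets = [
    {"movie_id": 1, "timepoint": 1, "quote_count": 0, "reply_count": 1,
     "retweet_count": 2, "like_count": 5, "sentiment": 0.5},
    {"movie_id": 1, "timepoint": 1, "quote_count": 1, "reply_count": 0,
     "retweet_count": 0, "like_count": 3, "sentiment": -0.5},
    {"movie_id": 1, "timepoint": 2, "quote_count": 0, "reply_count": 0,
     "retweet_count": 4, "like_count": 10, "sentiment": 0.25},
]

for key, stats in sorted(summarize(sample_tweets).items()):
    print(key, stats)
```

Keying the result by (movie_id, timepoint) keeps the summary easy to join back to the movies table through its id column.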