Google Analytics & Twitter dataset from a movies, TV series and videogames website
Description
Author: Víctor Yeste. Universitat Politècnica de València.

The object of this study is the design of a cybermetric methodology that measures the success of content published in online media and explores whether the selected success variables can be predicted. Because the study integrates data from two separate areas, web publishing and the analysis of shares and related topics on Twitter, the data were collected programmatically through both the Google Analytics Reporting API v4 and the Twitter Standard API, always respecting their usage limits.

The website analyzed is hellofriki.com, an online media outlet whose primary aim is to cover topics that generate a large volume of news every day, together with analyses, reports, interviews, and many other information formats. All of this content falls under the sections of cinema, series, video games, literature, and comics.

This dataset has contributed to the elaboration of the PhD thesis: Yeste Moreno, VM. (2021). Diseño de una metodología cibermétrica de cálculo del éxito para la optimización de contenidos web [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/176009

Data were obtained for each breaking news article published online, according to the indicators described in the doctoral thesis. All related data are stored in a database, divided into the following tables:

tesis_followers: user ID list of the media account's followers.

tesis_hometimeline: data from tweets posted by the media account sharing breaking news from the web.
status_id: Tweet ID
created_at: date of publication
text: content of the tweet
path: URL extracted after processing the shortened URL in text
post_shared: ID in WordPress of the article being shared
retweet_count: number of retweets
favorite_count: number of favorites

tesis_hometimeline_other: data from tweets posted by the media account that do not share breaking news from the web: other typologies, automatic Facebook shares, custom tweets without a link to an article, etc. It has the same fields as tesis_hometimeline.

tesis_posts: data of articles published on the website and processed for some analysis.
stats_id: Analysis ID
post_id: Article ID in WordPress
post_date: article publication date in WordPress
post_title: title of the article
path: URL of the article on the media website
tags: IDs of the WordPress tags related to the article
uniquepageviews: unique page views
entrancerate: entrance rate
avgtimeonpage: average time on page
exitrate: exit rate
pageviewspersession: page views per session
adsense_adunitsviewed: number of ads viewed by users
adsense_viewableimpressionpercent: percentage of viewable ad impressions
adsense_ctr: ad click-through rate
adsense_ecpm: estimated ad revenue per 1000 page views

tesis_stats: data from a particular analysis, performed for each published breaking news item. Fields with statistical values can be computed from the data in the other tables, but totals and averages are stored for faster and easier further processing.
id: ID of the analysis
phase: phase of the thesis in which the analysis was carried out (currently always 1)
time: "0" if at the time of publication, "1" if 14 days later
start_date: date and time of the measurement on the day of publication
end_date: date and time of the measurement made 14 days later
main_post_id: ID of the published article to be analyzed
main_post_theme: main section of the published article to be analyzed
superheroes_theme: "1" if about superheroes, "0" if not
trailer_theme: "1" if trailer, "0" if not
name: empty field, available for adding a custom name manually
notes: empty field, available for adding custom notes manually, for example when a tag was removed by hand for being too generic even though the editor had assigned it
num_articles: number of articles analyzed
num_articles_with_traffic: number of articles analyzed with traffic (the ones taken into account for traffic analysis)
num_articles_with_tw_data: number of articles with data from when they were shared on the media's Twitter account
num_terms: number of terms analyzed
uniquepageviews_total: total unique page views
uniquepageviews_mean: average unique page views
entrancerate_mean: average entrance rate
avgtimeonpage_mean: average time on page
exitrate_mean: average exit rate
pageviewspersession_mean: average page views per session
total: total of ads viewed
adsense_adunitsviewed_mean: average of ads viewed
adsense_viewableimpressionpercent_mean: average percentage of viewable ad impressions
adsense_ctr_mean: average ad click-through rate
adsense_ecpm_mean: average estimated ad revenue per 1000 page views
Total: total income
retweet_count_mean: average retweets
favorite_count_total: total of favorites
favorite_count_mean: average of favorites
terms_ini_num_tweets: total tweets on the terms on the day of publication
terms_ini_retweet_count_total: total retweets on the terms on the day of publication
terms_ini_retweet_count_mean: average retweets on the terms on the day of publication
terms_ini_favorite_count_total: total of favorites on the terms on the day of publication
terms_ini_favorite_count_mean: average of favorites on the terms on the day of publication
terms_ini_followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the terms on the day of publication
terms_ini_user_num_followers_mean: average followers of users who talked about the terms on the day of publication
terms_ini_user_num_tweets_mean: average number of tweets published by users who talked about the terms on the day of publication
terms_ini_user_age_mean: average account age in days of users who talked about the terms on the day of publication
terms_ini_ur_inclusion_rate: URL inclusion ratio of tweets talking about the terms on the day of publication
terms_end_num_tweets: total tweets on the terms 14 days after publication
terms_end_retweet_count_total: total retweets on the terms 14 days after publication
terms_end_retweet_count_mean: average retweets on the terms 14 days after publication
terms_end_favorite_count_total: total of favorites on the terms 14 days after publication
terms_end_favorite_count_mean: average of favorites on the terms 14 days after publication
terms_end_followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the terms 14 days after publication
terms_end_user_num_followers_mean: average followers of users who talked about the terms 14 days after publication
terms_end_user_num_tweets_mean: average number of tweets published by users who talked about the terms 14 days after publication
terms_end_user_age_mean: average account age in days of users who talked about the terms 14 days after publication
terms_end_ur_inclusion_rate: URL inclusion ratio of tweets talking about the terms 14 days after publication

tesis_terms: data of the terms (tags) related to the processed articles.
stats_id: Analysis ID
time: "0" if at the time of publication, "1" if 14 days later
term_id: Term ID (tag) in WordPress
name: name of the term
slug: URL slug of the term
num_tweets: number of tweets
retweet_count_total: total retweets
retweet_count_mean: average retweets
favorite_count_total: total of favorites
favorite_count_mean: average of favorites
followers_talking_rate: ratio of followers of the media Twitter account who have recently published a tweet talking about the term
user_num_followers_mean: average followers of users who were talking about the term
user_num_tweets_mean: average number of tweets published by users who were talking about the term
user_age_mean: average account age in days of users who were talking about the term
url_inclusion_rate: URL inclusion ratio
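
The description states that the statistical fields in tesis_stats can be computed from the other tables. The following Python sketch illustrates one way this might be done, assuming the tables have been loaded into an SQLite database under the names and columns listed above. The helper names (build_demo_db, traffic_aggregates, twitter_aggregates) are hypothetical, the demo rows are made-up values rather than real dataset values, and the sketch averages over all articles of an analysis, which may differ from how the stored means were actually calculated (for example, if only articles with traffic were counted).

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with made-up rows (NOT real dataset values)."""
    conn = sqlite3.connect(":memory:")
    # Only a subset of the documented columns is created, to keep the sketch short.
    conn.execute(
        "CREATE TABLE tesis_posts ("
        "stats_id INTEGER, post_id INTEGER, uniquepageviews INTEGER, exitrate REAL)"
    )
    conn.execute(
        "CREATE TABLE tesis_hometimeline ("
        "status_id INTEGER, post_shared INTEGER, "
        "retweet_count INTEGER, favorite_count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO tesis_posts VALUES (?, ?, ?, ?)",
        [(1, 10, 100, 0.5), (1, 11, 300, 0.25), (2, 12, 50, 0.75)],
    )
    conn.executemany(
        "INSERT INTO tesis_hometimeline VALUES (?, ?, ?, ?)",
        [(900, 10, 4, 10), (901, 11, 2, 6)],
    )
    return conn


def traffic_aggregates(conn, stats_id):
    """Recompute traffic totals and means for one analysis from tesis_posts."""
    keys = ("num_articles", "uniquepageviews_total",
            "uniquepageviews_mean", "exitrate_mean")
    row = conn.execute(
        """
        SELECT COUNT(*), SUM(uniquepageviews), AVG(uniquepageviews), AVG(exitrate)
        FROM tesis_posts
        WHERE stats_id = ?
        """,
        (stats_id,),
    ).fetchone()
    return dict(zip(keys, row))


def twitter_aggregates(conn, stats_id):
    """Recompute tweet aggregates by joining articles to the tweets sharing them.

    Assumes post_shared in tesis_hometimeline matches post_id in tesis_posts.
    """
    keys = ("num_articles_with_tw_data", "retweet_count_mean",
            "favorite_count_total", "favorite_count_mean")
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT p.post_id), AVG(t.retweet_count),
               SUM(t.favorite_count), AVG(t.favorite_count)
        FROM tesis_posts AS p
        JOIN tesis_hometimeline AS t ON t.post_shared = p.post_id
        WHERE p.stats_id = ?
        """,
        (stats_id,),
    ).fetchone()
    return dict(zip(keys, row))


conn = build_demo_db()
print(traffic_aggregates(conn, 1))
print(twitter_aggregates(conn, 1))
```

Comparing the recomputed values with the stored tesis_stats row for the same analysis would be a simple consistency check before relying on either source.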