Notebook overview

Credits:
Data Source:
Visualization code taken from course page and slightly adapted.

About notebook

This notebook analyzes data from reddit, specifically hyperlinks in post titles linking to other subreddits. I primarily use data from 2014 from the source linked above, but also use a much smaller dataset from 2023, which I obtained using the reddit API and bs4. In this dataset, the graphs are directed. The nodes represent subreddits, while an edge represents a post whose title contains a link to the other subreddit. The weight on an edge is the number of such posts made between the 2 subreddits.

Sections
1 - Reading in dataframes, cleaning data, and building the weighted + directed graphs
2 - Visualization
3 - Measuring centrality
4 - Measuring modularity + detecting communities
5 - Analyzing negative traffic
6 - Comparing 2023 vs. 2014 data

1 - Reading in dataframes, data cleaning, creating DiGraphs

3 - Measuring centrality

In-degree centrality appears to be more relevant than out-degree centrality, which makes sense. Generally, the centrality ranks are highly correlated with each other.

4 - Measuring modularity + detecting communities

While not all communities found seem to make sense at first glance, several in the top 15 could very clearly be grouped by topic. The Louvain community detection appears to be at least somewhat effective on this graph.

5 - Analyzing negative traffic

Visual representation of negative traffic links.

Looking for the most 'toxic' communities.

Most toxic communities by ratio of negative/all posts originating from that community.

Most toxic communities by quantity of negative posts from that community (out degree).

Looking at proportion of negative traffic driven by top 10 negative communities.

Looking at relationship between outgoing and incoming negative traffic.

There is a small positive correlation between outgoing and incoming negative traffic, suggesting that subreddit communities spreading hate are also receiving it.

6 - Comparing 2023 vs. 2014 data

Need to take the same nodeset in both datasets for a worthwhile comparison.

Unfortunately, I was not able to get enough data to do a full analysis of the change in communities over time. However, simply looking at degree centrality, the same subreddits appear to be at the top (subredditdrama, iama, funny, gaming). It would be interesting to gather more data to do a more thorough comparison.
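The edge weights and the two "toxicity" rankings described in this notebook can be sketched in plain Python. This is only an illustration under stated assumptions, not the notebook's actual code: the sample rows, the `+1`/`-1` sentiment encoding, and the helper names `edge_weights`, `weighted_degrees`, and `negative_ratio` are all hypothetical, and the standard library stands in for whatever graph library the notebook uses.

```python
from collections import Counter

# Hypothetical rows: (source subreddit, target subreddit, sentiment).
# Assumption: sentiment is -1 for a negative post and +1 otherwise.
posts = [
    ("subredditdrama", "funny", -1),
    ("subredditdrama", "funny", -1),
    ("subredditdrama", "gaming", 1),
    ("iama", "funny", 1),
    ("gaming", "subredditdrama", -1),
    ("funny", "iama", 1),
]


def edge_weights(rows):
    """Count posts per directed (source, target) pair; the count is the edge weight."""
    return Counter((src, dst) for src, dst, _ in rows)


def weighted_degrees(weights):
    """Return (in_degree, out_degree) counters, each summed over edge weights."""
    in_deg, out_deg = Counter(), Counter()
    for (src, dst), w in weights.items():
        out_deg[src] += w
        in_deg[dst] += w
    return in_deg, out_deg


def negative_ratio(rows):
    """Fraction of the posts originating from each subreddit that are negative."""
    total, negative = Counter(), Counter()
    for src, _, sentiment in rows:
        total[src] += 1
        if sentiment < 0:
            negative[src] += 1
    return {src: negative[src] / total[src] for src in total}


# Weighted directed graph over all posts.
weights = edge_weights(posts)
in_deg, out_deg = weighted_degrees(weights)

# Same construction restricted to negative posts only.
neg_weights = edge_weights([row for row in posts if row[2] < 0])
neg_in, neg_out = weighted_degrees(neg_weights)

# Ranking 1: quantity of negative posts leaving each subreddit (out degree).
by_quantity = neg_out.most_common()

# Ranking 2: share of each subreddit's outgoing posts that are negative.
ratios = negative_ratio(posts)
by_ratio = sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)
```

The two rankings can disagree: a small subreddit with a single negative post has a ratio of 1.0 but a low out degree, which is one reason to look at both views.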