Hello all. I mentioned my plans to put something together along these lines a while back and there seemed to be some interest. What I will post in this guide is a summary of the PSA 8/9/10 populations of each holo/ultra rare card in specific sets, along with some analysis of those numbers. Today’s update on 6-14-16 at ~3PM contains most sets/variations from Base Set Shadowless 1st Edition up through Neo Genesis Unlimited. The basic data included is the raw count of PSA 8, PSA 9, and PSA 10 cards, as well as the total PSA 8-10 population, the % above PSA 8, and the PSA 10 % for each card and each set. I used these two metrics to try to measure the “difficulty” of grading a set, or the “manufacture quality” of each set. That will follow in the next post.
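To make the two metrics concrete, here is a minimal sketch of how they can be computed from the raw counts. This assumes both percentages are taken relative to the combined PSA 8-10 population in the sheet; the counts in the example are made up.

```python
# Minimal sketch of the two set-difficulty metrics, assuming both percentages
# are relative to the combined PSA 8-10 population (the totals in the sheet).
def set_metrics(pop8, pop9, pop10):
    total_8_to_10 = pop8 + pop9 + pop10
    pct_above_8 = (pop9 + pop10) / total_8_to_10 * 100  # "% above PSA 8"
    pct_10 = pop10 / total_8_to_10 * 100                 # "PSA 10 %"
    return total_8_to_10, pct_above_8, pct_10

# Example with made-up counts:
print(set_metrics(pop8=120, pop9=300, pop10=80))  # -> (500, 76.0, 16.0)
```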
New update 7-18-17 brings the Base, Jungle, and Fossil sets up to current. There have been a ton of new grades since the last update!
Starting with my methodology: I used PSA pop reports for the data. The date the data was pulled is in the far left column next to the set name. Updates will be performed over time and many more sets will be added. If the interest is out there, I would like to bring this right up to the current gen for all ultra rares, add more interesting metrics to analyze the numbers, and possibly add history plots for each pop report update.
Current issues/planned updates:
Updates and pulling data from PSA are manual at this time. It took only about 1.5 hours to grab all the pop data shown, but bringing this spreadsheet up to current and then performing monthly updates would be extremely tedious and beyond the time I can devote to this. If anyone tech savvy has any ideas on automating this process of updating pops, please let me know through a post here or PM. I am extremely interested in the history aspect of this and charting pops over time, but it cannot be done without automation.
I plan to turn this into a Google Doc at some point and have it free for everyone to view through that platform. Over time I can possibly add edit capabilities for anyone who wishes to help, especially towards resolving point #1. If automation cannot be achieved, perhaps I can make it open to everyone for editing, but that could lead to issues, which I guess can be resolved by keeping backups.
Next post will follow with Neo Destiny and some further analysis. Any and all questions/comments/concerns are welcomed! Let me know if you see any errors or have any suggestions on where to go with this.
Here is the rest, sorry for the attachments. Working towards google docs soon, hopefully in the next week or two.
Edited out photos as the link has the data.
To start, here is my commentary on it as well.
One thing I find very interesting is the effect of popularity on how many cards are graded. Charizard, Raichu, Gyarados, and similarly popular Pokémon are graded much more frequently than others in the set. That doesn’t come as a surprise overall, but sometimes the magnitude of the extra cards graded, and the value they continue to hold even at higher pops, is surprising (4/102 1st Edition Charizard, looking at you).
Another thing I find interesting is how easily the Gym sets grade in both the %>8 and 10% metrics compared to all other sets. Most sets rank roughly the same in the two metrics; however, Neo Revelation 1st Edition and Base Unlimited seemed to be fairly easy to get above a PSA 8 (over an 80% chance), but with a lower rate of PSA 10s, at <15% each.
Most 1st Editions were of higher quality than the Unlimited runs, except for Neo Genesis in both metrics.
This is cool. I find it interesting that Neo Revelation is the only Unlimited set that has a higher POP 10 percentage vs its 1st Edition variation.
As far as automating goes, it can definitely be done. All that would be required is web scraping, a membership to PSA (for access), and a database to store everything. The only thing that stops me from doing it (other than being lazy) is that I don’t know how PSA would respond. They have the pop report for members only, and I don’t doubt that they would notice a web scraper on their site.
Very interesting point on that emphasized part. I’ll have to look into web scraping and possibly give them a call on that before I get too far along on this. If any mods here have any issues with this being here let me know. I have wanted for a while to do this personally, but figured it would be interesting to many here and felt like it would be a good service to have it publicly available. Never thought about the potential liabilities or what have you.
Wait but I can get into the pop report without logging in. They made it free a few years ago it looks like. Hopefully this resolves the potential legality/morality side of it. @cullers
If that’s the case I don’t see an issue with it. Just so you know, web scraping is a term programmers use for building software that reads pages off the internet based on their HTML tags and web addresses. It’s not an easy thing to just pick up without prior programming knowledge (as far as I know there isn’t a user-friendly tool for it).
I’ll have some free time on my hands this week. I might try to get one built, but no promises. If you find someone else who is willing, they’ll probably be able to do it much faster than I could. (I’m not fast/good at programming.)
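For anyone curious what that actually looks like, here’s a bare-bones sketch in Python against one of PSA’s pop report pages. The row/cell handling is a guess at the markup and would need adjusting, and if the table is filled in by javascript, a plain fetch like this will come back empty.

```python
# Bare-bones web scraping sketch: fetch a page, then pull text out of its HTML tags.
# The table/row handling below is a guess at PSA's markup and would need adjusting.
import requests
from bs4 import BeautifulSoup

url = "https://www.psacard.com/pop/non-sport-cards/1999/pokemon-game/57801/"
resp = requests.get(url, headers={"User-Agent": "pop-report-tracker/0.1"}, timeout=30)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
for row in soup.find_all("tr"):
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if cells:
        print(cells)  # one list of cell values per card row
```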
An automated Pop Report is a fantastic concept. @cullers I believe the pop report is accessible for non-members. I just did a quick incognito search in which I was not logged in, and it allowed me to view and access the Pop. Regardless, I am not sure how PSA would respond. I would optimistically assume it would benefit their business, as it is increasing the level of engagement with their company.
In relation to @pokemanz’s question, demand for certain cards or sets skews the percentages. I think the Unlimited Base Set Charizard is a great example. The Unlimited Base Set was printed in the most literal sense of the word “unlimited”: there were 7-8 Unlimited print runs, so there are a ton of people with Unlimited Charizards. The lower grade statistic reflects a combination of demand and the number of people grading cards that are clearly damaged. Basically, it is difficult to say “this card grades worse than X card” when the demand and/or quantity for the Charizard is much greater than for really any other set card.
Anyway, this is an amazing source! I think by and large it will help quantify set quality and/or general demand.
It would be cool if you did some of the biggest sets in a comparison between English and Japanese =D Like, erm… sets like Clash of the Blue Sky, and Dragon Frontiers. No Rarity, etc… oh wait, no one can find enough mint copies in Japanese to care lol, and if they do, the number graded per year would most likely be very low… ENGLISH QUANTITY IS INSANE, but the quality is terrible.
Maybe I can help, but what would I do, just update the pop report on the Japanese sets or some such lol… years later… POP 2 NO RARITY CHARIZARD!!! SO MUCH WORK!
While I am not exceptionally privy to some of the finer details of web scraping, I could definitely provide more than your average amount of help regarding Google Docs/Sheets. I’d really recommend using Google Sheets over Docs as you can use the spreadsheet functionality for calculating percentages and other fun stuff, but it’s your call.
If I see anything particularly helpful on web scraping into Google Sheets, I’ll let you know. One of the biggest headaches of web scraping is how customized it needs to be to get the relevant data: if you’re pulling pop data and the name of the card is stored in a field called “card title”, and you set the scraper to pull that info, then if PSA were to change their site field to call it “title” instead, your scraper would no longer correctly pull the data. It’s a very delicate process, which is why most sites that want to make their data “accessible” to others create APIs that can be referenced in specific ways. That way they can update their site field, update the API mapping on their end, and your reference to the API doesn’t need to change. I would be incredibly surprised if PSA had an API set up though. Lol.
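To illustrate that fragility with a made-up example (the HTML and class names below are purely hypothetical, not PSA’s actual markup):

```python
# Hypothetical markup showing why scrapers break: the parse is tied to whatever
# the site happens to call its fields. None of these class names are PSA's.
from bs4 import BeautifulSoup

html = (
    '<table><tr>'
    '<td class="card-title">Example Card</td>'
    '<td class="pop-10">123</td>'
    '</tr></table>'
)

soup = BeautifulSoup(html, "html.parser")
name = soup.select_one("td.card-title")   # stops matching the moment the class is renamed
pop10 = soup.select_one("td.pop-10")
print(name.get_text(), pop10.get_text())  # -> Example Card 123
```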
edit:
Do you think you’d have a little time to communicate with me via skype/facebook/some other chat? I want to run through a few things and see if I can’t be of more help on this. Doing some research on using Google to scrape for data into a sheet.
Recently started to put together a PSA 9 1st Edition Neo Genesis Holo set. A few of them are going to be difficult to find though… Feraligatr #5, Lugia, Pichu… as there aren’t even any 9s on eBay currently…
@pokemonsyndicate English quality sucks from the BW and XY eras. Card condition from the Wizards of the Coast, old-school EX, DPP, and HGSS eras was pretty good. Most pack-fresh cards come back a Mint 9 if graded. About 10 percent come back a Gem Mint 10…
Well, this is disheartening, but I want a little more insight from those of you that have used PSA pop reports in the past: is there a static page version of the report that isn’t loaded dynamically based on the search results? I created a scrape into Google Sheets using importXML, but it isn’t pulling the data because the data loads into the page after the scrape has already run… I tested with a static copy of the page on my personal site and it worked fine, same formula entirely.
Basically what I need is a “basic” version of their website, anyone have any ideas or suggestions on that?
edit:
After digging more, I’ve determined I was correct - importXML through Google Sheets does not parse the site correctly because it does not execute javascript like a typical page load does, meaning using importXML to draw the data by reference is not going to be possible for direct import into the spreadsheet.
Unfortunately this means taking a longer route to try to get the data, such as writing actual functions in javascript to get the results I’m looking for. I’ll keep playing with it, but it won’t be a fast solution, if I can get it to work at all. A bit of a pain, really. If anyone knows of a static reference to the same data (i.e. somewhere else you can view it that isn’t the typical report page, such as www.psacard.com/pop/non-sport-cards/1999/pokemon-game/57801/ ), please do let me know.
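For what it’s worth, the usual workaround outside of Sheets for javascript-rendered pages is to let a headless browser render the page first and then parse the result. Here’s a rough sketch in Python rather than the Sheets route; the table handling is a guess and would need to be matched to PSA’s actual markup.

```python
# Rough sketch of the headless-browser workaround: render the page so its javascript
# actually runs, then parse the resulting HTML. Requires selenium and a chromedriver
# install; the table handling is a guess at PSA's real markup.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup

options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)
driver.get("https://www.psacard.com/pop/non-sport-cards/1999/pokemon-game/57801/")
html = driver.page_source  # HTML *after* the page's scripts have populated the table
driver.quit()

soup = BeautifulSoup(html, "html.parser")
for row in soup.select("table tr"):
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if cells:
        print(cells)  # one list of cell values per card row
```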
Dang @justinator, that sucks to hear. I haven’t gotten into researching it too much yet unfortunately. Fairly busy week and I just happened to have an hour or two today to put together what I did. I knew this would be the easy part, but down the line (after some research of my own too) I can definitely get in touch somehow, skype or something. I just wouldn’t want to waste your time and would rather take this a little farther and expand my knowledge more first. Fun and interesting project and it has been a few years since I have done any kind of programming or anything of this nature really. Ambitious project, but I am definitely willing to put in some time and effort. Thanks for the help.
I definitely plan to get it extended out to all sets. It could happen soon, with likely only another ~3 hours of work to get all sets done. Not a huge deal, but the biggest issue is it would be ~3 hours or so of work every time I wanted to update it. I’d rather put in some research on the scraping so that once it is done it is self-updating, needing only minimal work here and there to chart the history where it’s interesting, but we will see how that goes.
If it ends up seeming like it will be a dead end or just take several months or something, I will likely go through and finish the first pass all manually, at least have a baseline for all sets, and then just add the updating function down the line if it ever becomes possible.