The leading number.
https://www.youtube.com/watch?v=XXjlR2OK1kM
The leading number of what?
I understand Benford's law; I'm asking what the x-axis is actually denoting. Those bars are counting how many times "the number" has a 1, 2, 3, etc. as its first digit.
But what is "the number"? Vote counts per polling station? Vote counts per riding? Vote counts over time?
You have to know this because certain kinds of number sets can have artificial constraints that can give an unnatural distribution of first digits.
If we don't even know what X is supposed to be, we can hardly draw any conclusions from these graphs.
You will have to ask OP.
If it's the leading digit of the vote count in an electoral riding, then Benford's law won't work, because every electoral riding has between 60,000 and 80,000 voters. Thus the magnitude of the data doesn't vary enough. But maybe he plotted the vote counts in municipalities and townships, which should vary enough in magnitude.
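To make the range argument concrete, here is a rough sketch in Python. It takes the 60,000 to 80,000 figure above at face value; the evenly spaced `riding_totals` are made up for illustration and are not real election data.

```python
import math
from collections import Counter

def leading_digit(n):
    """Return the first digit of a positive integer."""
    return int(str(n)[0])

# Hypothetical riding totals, evenly spaced over the range quoted above.
riding_totals = list(range(60_000, 80_001, 500))

# Count how often each leading digit occurs.
counts = Counter(leading_digit(n) for n in riding_totals)

# How many orders of magnitude the data spans (log10 of max/min).
magnitude_span = math.log10(max(riding_totals) / min(riding_totals))

print(dict(sorted(counts.items())))
print(f"span: {magnitude_span:.3f} orders of magnitude")
```

Only the digits 6, 7 and 8 can ever appear, and the data covers a small fraction of one order of magnitude, so a Benford-shaped curve is not something you would expect from numbers constrained like this.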
It's the poll results data from here:
https://elections.ca/content.aspx?section=res&dir=rep/off/43gedata&document=byed&lang=e
I separated all results across the country per party and per poll, then applied Benford's law to their sets.
You might be right about not getting enough variety in magnitude though... I'll try summing by polling station first
by district: https://i.maga.host/AsnD9H7.png
by polling station: https://i.maga.host/HOLwiqE.png
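For anyone who wants to reproduce this kind of plot, a minimal sketch of the comparison might look like the following. This is not OP's actual code; the function names and the `poll_counts` list are invented for illustration, and reading the Elections Canada files is left out. The expected share for first digit d under Benford's law is log10(1 + 1/d).

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each first digit 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit_shares(values):
    """Observed share of each first digit among the positive values."""
    digits = [int(str(v)[0]) for v in values if v > 0]
    if not digits:
        raise ValueError("need at least one positive value")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

# Made-up vote counts for one party across a few polls (illustration only).
poll_counts = [12, 105, 87, 0, 143, 19, 230, 1, 56, 118]

observed = first_digit_shares(poll_counts)
expected = benford_expected()

for d in range(1, 10):
    print(f"{d}: observed {observed[d]:.3f}  expected {expected[d]:.3f}")
```

Zero counts are skipped because they have no leading digit. With only a handful of values the observed shares are very noisy, so a comparison like this only starts to mean something with a large set of counts that spans several orders of magnitude.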
Even with polling stations the magnitude variation might be too small. You would need something like US data at the county level. They have counties with only a few thousand votes and counties with a few million votes.