Invelos Forums->DVD Profiler: Contribution Discussion |
Hiroshi Kan Ikeuchi common name |
| T!M | Profiling since Dec. 2000 |
Registered: March 13, 2007 | Reputation: | Posts: 8,752 |
| Posted: | | | | Quoting Berak: Quote: I would have to go through both the names manually and add together similar titles of different localities/releases, and then get two new numbers that would be the "correct" common name?! That is the only way to get the actual, correct results, yes. Nobody ever said the CLT was a perfect tool - it's not. It offers a starting point. As I said in my very first post in this thread, it's perfectly understandable that not everyone is willing to go to these lengths, but that's no reason to bash those that ARE willing to do the work. Kluge dissected the CLT results for us, and showed that this guy is credited in six titles as Hiroshi Kan Ikeuchi, and in only two titles as Hiroshi "Kan" Ikeuchi. But rather than thanking him for his work, you bluntly told him to use the least-used name variant as the common name and to "Eat it and move on...." I'm very sorry, but he was right and you were wrong. Again: I'm not blaming you for not wanting to do the work: yes, it's tedious and time-consuming, and the tools we're given are limited at best. But the entries for Hiroshi Kan Ikeuchi & Hiroshi "Kan" Ikeuchi aren't going to link together automatically - somebody's got to do it. It doesn't have to be you, but not wanting to do the work yourself doesn't give you the right to bluntly dismiss the work done by others: Kluge did put in the extra effort, and his findings were correct, and very welcome. |
| | Berak | Bibamus morieundum est! |
Registered: May 10, 2007 | Posts: 1,059 |
| Posted: | | | | I'm sorry, but I just don't buy it.
If Ken were to explain his intent in the rules, I'd be happy to oblige - and happy to do the work, but as things stand now, I simply disagree with your interpretation.
If the CLT result is
John Rambo 140/365
John J. Rambo 233/463
I will go with John J. Rambo regardless of how many titles are the same. The CLT is useless even as a starting-point if this is not the way to do it. | | | Berak
It's better to burn out than to fade away! True love conquers all! |
| Registered: July 31, 2008 | Reputation: | Posts: 2,506 |
| Posted: | | | | Some users always will put in that little bit of extra work to accomplish a task. As T!M said, you have users here who do want to spend that bit of extra time to weed out the erroneous duplicate entries. If you don't want to, fine, let those who do want to go that extra step do it. |
| | T!M | Profiling since Dec. 2000 |
Registered: March 13, 2007 | Reputation: | Posts: 8,752 |
| Posted: | | | | Quoting Berak: Quote: If Ken were to explain his intent in the rules, I'd be happy to oblige - and happy to do the work, but as things stand now, I simply disagree with your interpretation. As others and I have pointed out repeatedly, he has done that, by specifically stating that "the lookup tool is not to be blindly trusted". That comment in itself is already the exact opposite of what you're saying: where you state you'd go with the highest number no matter what, Ken starts off by saying that it's "not to be blindly trusted". That's the exact opposite: you want to trust them blindly, Ken tells you not to. He then went on to encourage users to document errors in the database, and subsequently said that such errors "can and should be considered." And that is exactly what happened here: Kluge documented errors in the database, and his findings "can and should be considered." It's not really a question of you "buying it" or not - it's simply what the man said. If your "blind trust" was the way to go, then each and every common name-finding-thread would be a completely pointless exercise. Yet they're consistently accepted by the screeners as "proof" for adding and changing common names - even if their findings clash with the CLT numbers taken at face value. I agree with you that these things should be more clearly spelled out in the rules: for instance, I still run into users who are absolutely adamant in feeling that we should use a person's "real" / "correct" name as the common name, and flat-out refuse to believe that we're after the "most-credited form". Until I point them to Ken's comments about this. So yeah, these very basic premises really need to be spelled out in the rules, rather than being buried somewhere deep in some two-year-old forum thread. I certainly agree with that, but that issue applies to many sections of the rules. It doesn't take away from the fact that in this case, Ken did make his intent very clear.
| | | Last edited: by T!M |
| Registered: March 13, 2007 | Posts: 1,414 |
| Posted: | | | | Kluge has a point at the bottom of it all, but Ken has said the number of Titles in the CLT controls. He didn't say anything about filtering it down to see how many duplicates of the titles are in the database--if that were what was intended, I would think he'd automate the CLT program to do just that. The "title" as he's defining it seems to be what he's after for the common name. And I don't know why it's important not to just do the easier thing and use the number of titles the CLT spits out as the common name, rather than go to a lot of labor for a different result that's not as transparent.
When it comes down to it, the common name doesn't need to be "right," it just has to be something readily verifiable that the users can treat as the common name. Doing further filtering manually gets away from that principle. | | | "This movie has warped my fragile little mind." |
| | Berak | Bibamus morieundum est! |
Registered: May 10, 2007 | Posts: 1,059 |
| Posted: | | | | I do not see "duplicate entries" as errors. If hyphens and colons as well as different titles for different releases should count against the CLT, there needs to be a clarification about this from Invelos - in the rules, and not the forums.
IMDb cloned data on the other hand, is seriously screwing up the CLT, and I commend the users trying to clean this up. And it is these entries I feel Ken has made a statement about when saying not to trust the CLT blindly.
The threads started regarding this or that actor's "credited as" name in this or that movie serve this purpose IMO, and over time the CLT will correct itself. But to seriously suggest that we should go through hundreds and thousands of entries to find "duplicate" entries serves no purpose at all in correcting the CLT... | | | Berak
It's better to burn out than to fade away! True love conquers all! |
| Registered: July 31, 2008 | Reputation: | Posts: 2,506 |
| Posted: | | | | Quoting gardibolt: Quote: Kluge has a point at the bottom of it all, but Ken has said the number of Titles in the CLT controls. He didn't say anything about filtering it down to see how many duplicates of the titles are in the database--if that were what was intended, I would think he'd automate the CLT program to do just that. The "title" as he's defining it seems to be what he's after for the common name. And I don't know why it's important not to just do the easier thing and use the number of titles the CLT spits out as the common name, rather than go to a lot of labor for a different result that's not as transparent.
When it comes down to it, the common name doesn't need to be "right," it just has to be something readily verifiable that the users can treat as the common name. Doing further filtering manually gets away from that principle. Except, as has been stated a few times in the thread, Ken has said not to blindly accept the figures of the CLT & to show where it's wrong. Here it's that too many different titles are being credited to a variant of a name. If you just want to "use the number of titles the CLT spits out as the common name" we may as well stop bothering to correct all the IMDb data with the common name threads and say "anything goes". | | | Last edited: by Ardos |
| | Berak | Bibamus morieundum est! |
Registered: May 10, 2007 | Posts: 1,059 |
| Posted: | | | | Quoting Forget_the_Rest: Quote: Quoting gardibolt:
Quote: Kluge has a point at the bottom of it all, but Ken has said the number of Titles in the CLT controls. He didn't say anything about filtering it down to see how many duplicates of the titles are in the database--if that were what was intended, I would think he'd automate the CLT program to do just that. The "title" as he's defining it seems to be what he's after for the common name. And I don't know why it's important not to just do the easier thing and use the number of titles the CLT spits out as the common name, rather than go to a lot of labor for a different result that's not as transparent.
When it comes down to it, the common name doesn't need to be "right," it just has to be something readily verifiable that the users can treat as the common name. Doing further filtering manually gets away from that principle.
Except as has been stated a few times in the thread Ken has said not to blindly accept the figures of the CLT & to show where it's wrong. Here it's that too many different titles are being credited to a variant of a name. If you just want to "use the number of titles the CLT spits out as the common name" we may as well stop bothering to correct all the IMDb data with the common name threads and say "anything goes". But IMDb cloned data and duplicate entries are two different things! | | | Berak
It's better to burn out than to fade away! True love conquers all! |
| Registered: July 31, 2008 | Reputation: | Posts: 2,506 |
| Posted: | | | | The principle is the same, both are throwing the figures off and giving rogue results. |
| Registered: March 13, 2007 | Posts: 21,610 |
| Posted: | | | | Quoting Forget_the_Rest: Quote: Some users always will put in that little bit of extra work to accomplish a task. As T!M said, you have users here who do want to spend that bit of extra time to weed out the erroneous duplicate entries. If you don't want to, fine, let those who do want to go that extra step do it. Forget: It's great to put in that extra work, but when you don't explain and reveal the results of that work then it is worthless. And a waste of the user's time and that of everybody who chooses to vote on it. Sadly some users don't understand that. Skip | | | ASSUME NOTHING!!!!!! CBE, MBE, MoA and proud of it. Outta here
Billy Video |
| | Berak | Bibamus morieundum est! |
Registered: May 10, 2007 | Posts: 1,059 |
| Posted: | | | | Quoting Forget_the_Rest: Quote: The principle is the same, both are throwing the figures off and giving rogue results. IMO the principle is not the same. As I've already explained here: Quote: I do not see "duplicate entries" as errors. If hyphens and colons as well as different titles for different releases should count against the CLT, there needs to be a clarification about this from Invelos - in the rules, and not the forums.
IMDb cloned data on the other hand, is seriously screwing up the CLT, and I commend the users trying to clean this up. And it is these entries I feel Ken has made a statement about when saying not to trust the CLT blindly.
The threads started regarding this or that actor's "credited as" name in this or that movie serve this purpose IMO, and over time the CLT will correct itself. But to seriously suggest that we should go through hundreds and thousands of entries to find "duplicate" entries serves no purpose at all in correcting the CLT.... | | | Berak
It's better to burn out than to fade away! True love conquers all! |
| Registered: July 31, 2008 | Reputation: | Posts: 2,506 |
| Posted: | | | | Quoting Woola: Quote: Quoting Forget_the_Rest:
Quote: Some users always will put in that little bit of extra work to accomplish a task. As T!M said, you have users here who do want to spend that bit of extra time to weed out the erroneous duplicate entries. If you don't want to, fine, let those who do want to go that extra step do it. Forget:
It's great to put in that extra work, but when you don't explain and reveal the results of that work then it is worthless. And a waste of the user's time and that of everybody who chooses to vote on it. Sadly some users don't understand that.
Skip The results have already been given for this. Quoting Berak: Quote: Quoting Forget_the_Rest:
Quote: The principle is the same, both are throwing the figures off and giving rogue results.
IMO the principle is not the same. As I've already explained here:
Quote: I do not see "duplicate entries" as errors. If hyphens and colons as well as different titles for different releases should count against the CLT, there needs to be a clarification about this from Invelos - in the rules, and not the forums.
IMDb cloned data on the other hand, is seriously screwing up the CLT, and I commend the users trying to clean this up. And it is these entries I feel Ken has made a statement about when saying not to trust the CLT blindly.
The threads started regarding this or that actor's "credited as" name in this or that movie serve this purpose IMO, and over time the CLT will correct itself. But to seriously suggest that we should go through hundreds and thousands of entries to find "duplicate" entries serves no purpose at all in correcting the CLT.... If you can't see how having multiple entries for ONE title is an error then we'll have to agree to disagree. |
| Registered: March 13, 2007 | Posts: 21,610 |
| Posted: | | | | Quoting Forget_the_Rest: Quote: Some users always will put in that little bit of extra work to accomplish a task. As T!M said, you have users here who do want to spend that bit of extra time to weed out the erroneous duplicate entries. If you don't want to, fine, let those who do want to go that extra step do it. Forget: It's great to put in that extra work, but when you don't explain and reveal the results of that work then it is worthless. And a waste of the user's time and that of everybody who chooses to vote on it. Sadly some users don't understand that, along with those users who believe that Ken's comment allows them to not document their work, but does allow them to make a variety of claims which are ultimately not helpful for the database. I don't believe the assessment of those who think Ken meant that ALL you have to do is mention the CLT; that simply allows for the insertion of almost anything one wishes. We have to start somewhere and the CLT does give results; I am hoping for a revamped system with 3.6. Now, in fact, the way I would do it would undo all the work that has been done to this point, and that would be a very good thing, as it would also undo all of the damage. I will never buy into the arguments of those who believe they do not have to document ANYTHING. I will continue to believe that they have decided to use an absolutely bogus interpretation of Ken's comments to serve their own purposes, and those purposes are not consistent with a high quality database. Skip | | | ASSUME NOTHING!!!!!! CBE, MBE, MoA and proud of it. Outta here
Billy Video |
| | T!M | Profiling since Dec. 2000 |
Registered: March 13, 2007 | Reputation: | Posts: 8,752 |
| Posted: | | | | Quoting Berak: Quote: IMO the principle is not the same. But it is. Your "IMDb cloned data is seriously screwing up the CLT" could effortlessly be replaced by "Incorrectly entered titles (and/or missing "original titles") are seriously screwing up the CLT". It's the exact same thing. If an incorrect, non-existing IMDb-mined common name has 36 entries in our database, and the actual on-screen credit has only 24, then we can document that error, and start using the correct name right away. The exact same thing applies here: now it's not the IMDb-data throwing the numbers off, but the incorrectly entered or missing original titles. But the situation is the same: bad data is throwing off the numbers, but per Ken, we "can and should" take that bad data into account when determining the actual common name. I fail to see how you see a difference between these two things. |
| Registered: July 31, 2008 | Reputation: | Posts: 2,506 |
| Posted: | | | | Skip,
You know that I fully agree with you about needing to document what you're doing. However here it's a moot point as it's quite well documented even before submitting the change. |
| Registered: September 29, 2008 | Posts: 384 |
| Posted: | | | | I haven't posted in a while here because of threads like this but I've decided to hop in because I really don't understand the problem here. What is so hard to understand about what T!M and Kluge have been talking about here? It's very very simple.
If all of the AVP2 titles were entered correctly, either via the Original Title Field or using correct punctuation (":" instead of "-"), the CLT results would be 6 vs. 2 as they've said. That is all they are doing: taking the extra effort to really read what the CLT results are saying rather than blindly looking at the numbers given. It's the exact same as IMDb mined data. It's screwing up the CLT results.
If the above was done to the varying titles you would see the "Titles" number drop because CLT wouldn't count them as different titles. This is the number we go by, right? Not "Profiles"? At least that's how I've always done it.
Now Berak, I can totally understand where you are coming from because I too won't be doing what these 2 are doing. I just don't have the time. But if they choose to, fantastic, it is far more accurate than either of our methods of just trusting the numbers. I think Ken supports both methods of using the CLT. Of course in a perfect world everyone would take the long route and drill down into these CLT results, but honestly that will never happen. It will always be a few very patient dedicated users willing to really look into these results rather than just using the number.
Skip....oh Skip. Why did this conversation turn into a "hate on T!M" discussion yet again? This discussion has absolutely nothing to do with T!M's mass changing of data that you hate so much. I'm not sure if you can't see that or if you can and just feel like bringing it up yet again. But it brings nothing to this discussion and only brings your crazy hatred of T!M to the forums for the thousandth time. I think we all realize you don't like what T!M does for the database by now, and it's become obvious to me that the makers of the program have no problem with what he's doing or they would have done something about it by now. So it's probably time to just drop it.
This discussion is merely about whether to blindly trust CLT results or to actually see what the CLT results are telling us, nothing more. Kluge and T!M are 100% correct and I applaud them in their extra effort to make our database more accurate. | | | "The perfect is the enemy of the good." - Voltaire |