URLDownloadToFile API failing due to disclaimer page

I am using the URLDownloadToFile API to download files from
<http://192.120.13.77/Data/Public/E_AUST/Archive/>

This fails as there is an interim disclaimer page that appears first, and an "I Agree" button needs to be clicked. Consequently, the use of the API only downloads the disclaimer page and not the CSV file/s I am attempting to get.

I have tried to circumvent this by programmatically (in IE) loading this page:
<http://192.120.13.77/viewtable.aspx?region=E_AUST&report=int924>
Then I programmatically click the "I Agree" button, then capture the names of all the files in the "Recent Report" dropdown list at the bottom of the page. These filenames and the first URL above give me the path.

Unfortunately, clicking the "I Agree" button in IE still doesn't allow me to get past the disclaimer page using the API. Any suggestions on how to get past this 'roadblock' are much appreciated.

Thanks in advance

Paul Martin
Melbourne, Australia

"Paul Martin" <melbournef***@gmail.com> wrote in message news:4d0f96a8-9457-4d89-8a5e-6030f1464924@x1g2000prh.googlegroups.com...
| [Cross-posted at microsoft.public.vb.winapi]

You got it backwards, buddy. Cross-posting means you posted your question to multiple groups in the same post. What you did was multi-post, which means you asked the same question in different groups in different posts...which is bad. Anyone who replies will only be replying to one group and not the other, which is bad.

Kevin, I'm posting via Google Groups. How do you suggest I cross-post?

Paul

Paul Martin wrote:
> Kevin, I'm posting via Google Groups. How do you suggest I cross-post?

Don't use HTTP for NNTP tasks -- get a real newsreader.

> Kevin, I'm posting via Google Groups. How do you suggest I cross-post?

You've not only multi-posted. You've posted 3 times here and twice in winapi! Cross-posting is like a CC email. You send the same message to both groups. It might be easier if you get a decent usenet reader rather than trying to make use of newsgroups through Google.

Apologies for my mistakes in posting; this was my first attempt at posting on more than one newsgroup. Does anyone have a solution to the problem?

Thanks in advance

Paul

Paul Martin wrote:
> Apologies for my mistakes in posting; this was my first attempt at
> posting on more than one newsgroup. Does anyone have a solution to
> the problem?
>
> Thanks in advance

I already offered one. To quote:

Karl E. Peterson wrote:
> Paul Martin wrote:
>> Kevin, I'm posting via Google Groups. How do you suggest I cross-post?
>
> Don't use HTTP for NNTP tasks -- get a real newsreader.

HTH!

Karl, this is the problem I'm seeking a solution to:

I am using the URLDownloadToFile API to download files from
<http://192.120.13.77/Data/Public/E_AUST/Archive/>

This fails as there is an interim disclaimer page that appears first, and an "I Agree" button needs to be clicked. Consequently, the use of the API only downloads the disclaimer page and not the CSV file/s I am attempting to get.

I have tried to circumvent this by programmatically (in IE) loading this page:
<http://192.120.13.77/viewtable.aspx?region=E_AUST&report=int924>
Then I programmatically click the "I Agree" button, then capture the names of all the files in the "Recent Report" dropdown list at the bottom of the page. These filenames and the first URL above give me the path.

Unfortunately, clicking the "I Agree" button in IE still doesn't allow me to get past the disclaimer page using the API. Any suggestions on how to get past this 'roadblock' are much appreciated.

Paul Martin wrote:
> Karl, this is the problem I'm seeking a solution to:
>
> I am using the URLDownloadToFile API to download files from
> <http://192.120.13.77/Data/Public/E_AUST/Archive/>
>
> This fails as there is an interim disclaimer page that appears first, and an
> "I Agree" button needs to be clicked. Consequently, the use of the API only
> downloads the disclaimer page and not the CSV file/s I am attempting to get.
>
> I have tried to circumvent this by programmatically (in IE) loading this page:
> <http://192.120.13.77/viewtable.aspx?region=E_AUST&report=int924>
> Then I programmatically click the "I Agree" button, then capture the names
> of all the files in the "Recent Report" dropdown list at the bottom of the
> page. These filenames and the first URL above give me the path.
>
> Unfortunately, clicking the "I Agree" button in IE still doesn't allow me to
> get past the disclaimer page using the API. Any suggestions on how to get
> past this 'roadblock' are much appreciated.

That looks/sounds nasty. Is the site owner not someone who's willing to work with you?

> That looks/sounds nasty. Is the site owner not someone who's willing to
> work with you?

No, I've got to work it out at this end.

Paul Martin <pmartin1960NOSPAM@hotmail.com> wrote:
>> That looks/sounds nasty. Is the site owner not someone who's willing to
>> work with you?
>
> No, I've got to work it out at this end.

Then read up on the HTTP RFCs to see how data is transferred, and perhaps use something like Wireshark to see how things work.

--
--------- Scott Seligman <scott at <firstname> and michelle dot net> ---------
The American Republic will endure, until politicians realize they
can bribe the people with their own money.
-- Alexis de Tocqueville

I wasn't aware you could use API to click buttons in a web browser. In fact, since they lack a window handle, I'm pretty sure you can't.

In theory, I suppose you could load the page in a web browser control and use the DOM to simulate a click on that button...but it's not a button per se...it's an image with a hyperlink. I've used the DOM before to simulate a click or to post info back to the server...but as far as images with hyperlinks, I'm not sure.

> I wasn't aware you could use API to click buttons in a web browser. In
> fact, since they lack a window handle, I'm pretty sure you can't.
>
> In theory, I suppose you could load the page in a web browser control and
> use the DOM to simulate a click on that button...but it's not a button per
> se...it's an image with a hyperlink. I've used the DOM before to simulate
> a click or to post info back to the server...but as far as images with
> hyperlinks, I'm not sure.

I'm guessing he meant DOM rather than API. One can do just about anything using the DOM. (And a Document object can be retrieved from any open window of class "Internet Explorer_Server", as well as being accessed via WebBrowserControl.Document.) I actually write a lot of little utilities as HTAs using VBScript. It's fast, easy, and if you know HTML/CSS you can make GUIs that are almost as complex and responsive as compiled software.

But in this case the OP seems to be asking for someone to download the pages and write his DHTML for him. And the intentions don't seem to be honorable. If it were my site and I had a disclaimer clickthrough, I wouldn't appreciate people automating a hack of it.

I'm a bit surprised at how uninformed some of the responses have been to my
original post. I've stated that I'm using the API to download files from the given URL. It's failing because a disclaimer button needs to be clicked. There's nothing dishonorable in that intention; it's simply a technical problem I'm trying to solve. Check out the URL and you'll see what I mean.

I'm an Excel VBA developer; do a search and you'll see numerous bona fide posts by me in that context. I'm not familiar with coding for the web, hence I've come to this newsgroup. I'm not looking for anyone to write code for me; I'm looking for help the same as anyone else. If someone can point me in the right direction or show me posted code that does what I'm looking for, I'll be more than appreciative.

False and unfounded accusations are ignorant and a waste of energy. A waste of yours, mayayana, and a waste of mine, as I then need to address them. Instead of making such ignorant claims, check your facts first.

> I'm a bit surprised at how uninformed some of the responses have been to my
> original post. I've stated that I'm using the API to download files from the
> given URL. It's failing because a disclaimer button needs to be clicked.
> There's nothing dishonorable in that intention; it's simply a technical
> problem I'm trying to solve. Check out the URL and you'll see what I mean.

I'm getting a timeout on both URLs. They won't load. According to whois it's a Hewlett Packard IP. Whatever it is, you're trying to bypass the agreement function they put into place. Maybe that's OK with them. Maybe it's not. I don't know.

In either case, you're in a VB group asking for help with what's actually a DHTML problem, and you want a specific solution for a specific page. In other words, you didn't ask something like, "How can I click a button on a webpage?" You asked how to get through a particular webpage. So aren't you asking for someone to study the source of that particular page and then tell you how to achieve what you want? Isn't that really asking for someone else to do your job?

Beyond that, the only VB-related part is that you originally tried to use VB to download the desired file. But your actual question now is about DHTML/DOM in IE. So maybe I'm judging you too harshly, but it certainly appears to me that you're trying to auto-bypass a license, hoping to get someone else to actually do that for you, and looking for your Good Samaritan in the wrong newsgroup!

What newsgroup would you suggest? Bearing in mind that I'm using VB to
manipulate a website.

Apologies for the multiple posts. I kept getting a system busy message (or similar) and presumed nothing was being posted.

I have written quite a bit of code, trying from different angles to solve my problem. I'm not at work now, but can post some of it tomorrow.

So what newsgroup would you suggest for a VB/VBA solution to manipulating web pages?

Okay, okay...in all fairness I did look at the page Paul wants to use and
it's just some public works thing and the downloads are archived data. I don't see anything that pertains to bad software or the like. However, unless I'm getting paid, I'm not writing anyone's code for them either, snippets aside. And what the OP wants is no snippet.

The fact of the matter is, the URL API you are using will only download a file, and if it's the disclaimer page because you need to click the AGREE button, then you will need to find an alternative. There is no magic API that will click a button or image on a webpage. This means you're going to have to think outside the box a little bit here.

Since it appears you are going to have to programmatically click that Agree button each time you access that page after the cookie has expired, it makes sense to make your process two-step. Use the URL API to download the URL, check the file...if it contains an HTML header, then use the DOM to programmatically answer the Agree clause, then use the URL API again to download the file. Simple If/Then/Else.

If your reply is to ask me how to use the DOM to do what you want to do, save your keystrokes as that is a fairly complex answer. You'd be better served to Google terms like DOM and VB -NET to find examples of how to do it. They are out there.

- Kev

Kevin Provance wrote:
> Since it appears you are going to have to programmatically click that Agree
> button each time you access that page after the cookie has expired,

Thinking *way* outside the box... Is it possible to forge a cookie? <g>

"Karl E. Peterson" <k***@mvps.org> wrote in message news:%231pVbYXrJHA.528@TK2MSFTNGP06.phx.gbl...
| Kevin Provance wrote:
| Thinking *way* outside the box... Is it possible to forge a cookie? <g>

Maybe, but we'd have to see the code that was used to generate the cookie to know what was being saved to it. It looks like ASP was used to generate that page dynamically and I have to admit I don't know squat about it, especially if it's the .net version. I think it would be a lot more work versus how I would do it, which would be to check the downloaded file, if it's HTML load it into the DOM, do the AGREE thing and then call the download API again to get the file.

Actually, now that I look at that page again, if one attempts to download the CSV file directly and gets the disclaimer page, the file starts to download after the agree is clicked, which would take the download part out of the code altogether. Granted it would start on its own, but it takes the control away from the user. With that in mind, I think to be able to use the download API to get the file directly, the OP would be better served to just get the original index page, put it in the DOM, do the agree stuff and then make a new call to the CSV file itself.

That's the only way I see to do it, unless one wants to get into ASP code and the like, which is not my forte. :-)

- Kev

Kevin Provance wrote:
> "Karl E. Peterson" <k***@mvps.org> wrote ...
>| Kevin Provance wrote:
>| Thinking *way* outside the box... Is it possible to forge a cookie? <g>
>
> Maybe, but we'd have to see the code that was used to generate the cookie to
> know what was being saved to it. It looks like ASP was used to generate
> that page dynamically and I have to admit I don't know squat about it,
> especially if it's the .net version. I think it would be a lot more work
> versus how I would do it, which would be to check the downloaded file, if
> it's HTML load it into the DOM, do the AGREE thing and then call the
> download API again to get the file.
>
> Actually, now that I look at that page again, if one attempts to download
> the CSV file directly and gets the disclaimer page, the file starts to
> download after the agree is clicked, which would take the download part out
> of the code altogether. Granted it would start on its own, but it takes
> the control away from the user. With that in mind, I think to be able to
> use the download API to get the file directly, the OP would be better
> served to just get the original index page, put it in the DOM, do the agree
> stuff and then make a new call to the CSV file itself.
>
> That's the only way I see to do it, unless one wants to get into ASP code
> and the like, which is not my forte. :-)

No way to get into the ASP. That's all serverside. And it's definitely using cookies to make sure you've clicked that "I Agree" button. (Damnit! He's made it a puzzle! <g>) And, given it's ASP (ASPX, actually), that means the request is being evaluated and rejected in code on the server. When you first hit the site, a cookie is dropped that assigns a SessionID...
Name     ASP.NET_SessionId
Value    50oqa3zadphzrfay4axaw2v1
Host     192.120.13.77
Path     /
Secure   No
Expires  At End Of Session

Then, when you click "I Agree" you get another:

Name     .ASPXAUTH
Value    6175F7932EA5F6CBDDC37E1F4EA2AE26622D4269D947AE566546D8EAB9707B7209AFE786EAC66B179FCF3F57D32F85E3CB9D6C4218695B9E1D99A3E3B136A31781FAD44E34D6D5CFFB3979B804622EF1
Host     192.120.13.77
Path     /
Secure   No
Expires  At End Of Session

It's conceivable that given the first, you could generate the second. But that Value looks to be a hash of the first Value. They both change, every time you delete and regen the cookies for that domain.

I'm back to saying this is gonna take cooperation from the site owner. I like your ideas, but they'll be really fragile should the site design change much at all, won't they?

I'm not sure that clicking the "I Agree" button uses a cookie. I clicked the
button but found no cookie that had been updated as a result. I'm no expert, but I figured it's using some session object instead.

When I get to work tomorrow, I'll post the code I've used and where it's failing. I think it's using DOM, but the click method fails. More later...

Paul Martin <pmartin1960NOSPAM@hotmail.com> wrote:
>I'm not sure that clicking the "I Agree" button uses a cookie. I clicked the
>button but found no cookie that had been updated as a result. I'm no expert,
>but I figured it's using some session object instead.

It is setting a cookie.

Cookies are, realistically, the only session object available to HTTP.

--
--------- Scott Seligman <scott at <firstname> and michelle dot net> ---------
The number of votes I cast is simply a reflection of how firmly I
believe in his policies.
-- Blackadder in Black Adder III:"Dish and Dishonesty"

It's all irrelevant really. The solution to this will not involve the use of cookies.

Paul Martin wrote:
> I'm not sure that clicking the "I Agree" button uses a cookie. I clicked the
> button but found no cookie that had been updated as a result. I'm no expert,
> but I figured it's using some session object instead.

Yeah, you can track the session id in the cookie it writes when you first hit that page. The first request checks for a cookie that identifies that session, then checks for another with a hash of that session. Failing to find the second, it asks for agreement, then writes the second. This is pretty easy to follow if you have the Web Developer Add-In for FireFox installed.

> When I get to work tomorrow, I'll post the code I've used and where it's
> failing. I think it's using DOM, but the click method fails. More later...

I don't know much about that.

"Karl E. Peterson" <k***@mvps.org> wrote in message news:uZ%23k08krJHA.6020@TK2MSFTNGP02.phx.gbl...
|
| I don't know much about that.

Think of the DOM as one big treeview with each HTML tag being a node. Each node has properties, like any given VB object, from which data can be extracted. Some nodes have methods as well to simulate various tasks, like submitting data. The easiest way is to load a web page in a web browser control and set the data into an IHTMLDocument class via the MSHTML.TLB. Alternatively, the same can be done by passing the hWnd of an Internet Explorer_Server window, using some accessibility API.

Hi Paul,
As the data you are trying to access is a government initiative I would write to them and ask them to provide a web service interface; they may already have one.

Bill, I have contacted them and I am investigating accessing the files
by FTP but our company's login has expired, so this is on hold at present. It may work down the track.

This is the code I worked on previously but got stuck on:

Private Const HTTP_PATH As String = "http://192.120.13.77/viewtable.aspx?region=E_AUST&report=int924"
Private Const BUTTON_AGREE As String = "ctl00_ContentPlaceHolder1_imgAgree"
Private Const HTTPREQ_SUCCESS As Integer = 200

Sub GetHtmlPage()
    Dim HttpReq As New MSXML2.XMLHTTP
    Dim HtmlDoc As New MSHTML.HTMLDocument
    Dim HtmlOpt As MSHTML.HTMLOptionElement
    Dim HtmlBtnAgree ' As MSHTML.HTMLButtonElement
    Dim HtmlForm As MSHTML.HTMLFormElement
    Dim astrCsvFiles() As String

    HttpReq.Open "GET", HTTP_PATH, False
    HttpReq.send

    If HttpReq.Status = HTTPREQ_SUCCESS Then
        HtmlDoc.body.innerHTML = HttpReq.responseText
        Set HtmlForm = HtmlDoc.forms("aspnetForm")
        Debug.Print HtmlForm.Name

        Set HtmlBtnAgree = HtmlDoc.getElementsByName(BUTTON_AGREE)
        Debug.Print TypeName(HtmlBtnAgree)
        HtmlBtnAgree.Click ' <<< FAILS
    End If
End Sub

Note that Debug.Print TypeName(HtmlBtnAgree) returns DispHTMLElementCollection, but I can't work out what type HtmlBtnAgree should be declared as (so I've left the type undeclared for now). Either way, the click method fails. Any ideas on this course of action? Or whether I should be asking a different newsgroup?

Thanks in advance

Paul

Hi Paul,
The problem is you will need to allow a cookie to be set, so the entire transaction needs to be in scope of a browser context. You could probably do this yourself with API, but I would be more inclined to use a web browser control (either hidden or drawn off screen etc).

But really, the FTP approach sounds a lot better ;)
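
On the question of what type HtmlBtnAgree should be: getElementsByName returns a collection, which is why TypeName reports DispHTMLElementCollection and why calling Click on it fails. A minimal sketch of picking one element out of the document follows. It assumes a project reference to the Microsoft HTML Object Library; FindAgreeElement is a hypothetical helper name, and the guess that "ctl00_ContentPlaceHolder1_imgAgree" is the element's id rather than its name (ASP.NET usually writes ids with underscores and names with "$") has not been checked against the page.

```vb
Option Explicit

' Hypothetical helper: returns the single element to click, or Nothing.
' Tries the value as an id first, then falls back to the first element
' that carries it as a name.
Public Function FindAgreeElement(ByVal doc As MSHTML.HTMLDocument, _
                                 ByVal key As String) As MSHTML.IHTMLElement
    Dim el As MSHTML.IHTMLElement
    Dim matches As MSHTML.IHTMLElementCollection

    Set el = doc.getElementById(key)
    If el Is Nothing Then
        ' getElementsByName always yields a collection, even for one match
        Set matches = doc.getElementsByName(key)
        If matches.Length > 0 Then Set el = matches.Item(0)
    End If

    Set FindAgreeElement = el
End Function
```

In the posted routine this would stand in for the getElementsByName line, with HtmlBtnAgree declared As MSHTML.IHTMLElement and checked against Nothing before Click is called. A click inside a document built from responseText still happens outside any browser session, so this only answers the typing question and probably does not get the cookie set.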
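
Kevin's two-step suggestion earlier in the thread (download, check whether the file is really HTML, and only then deal with the agreement) can be sketched roughly as below. This is a sketch under assumptions, not a tested solution for this site: the Declare is the commonly used urlmon declaration, wrapped so it compiles on both 32-bit hosts and VBA7; LooksLikeHtml, ReadFileHead and TryDownloadCsv are hypothetical names; and the HTML test is only a heuristic on the first kilobyte. The agreement step itself is deliberately left out.

```vb
Option Explicit

#If VBA7 Then
    Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" _
        Alias "URLDownloadToFileA" (ByVal pCaller As LongPtr, _
        ByVal szURL As String, ByVal szFileName As String, _
        ByVal dwReserved As Long, ByVal lpfnCB As LongPtr) As Long
#Else
    Private Declare Function URLDownloadToFile Lib "urlmon" _
        Alias "URLDownloadToFileA" (ByVal pCaller As Long, _
        ByVal szURL As String, ByVal szFileName As String, _
        ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long
#End If

' Heuristic: True when the text looks like an HTML page rather than CSV.
Public Function LooksLikeHtml(ByVal content As String) As Boolean
    Dim head As String
    head = Left$(content, 1024)
    LooksLikeHtml = InStr(1, head, "<html", vbTextCompare) > 0 _
                 Or InStr(1, head, "<!doctype", vbTextCompare) > 0
End Function

' Reads up to maxBytes from the start of a file as ANSI text.
Public Function ReadFileHead(ByVal path As String, _
                             Optional ByVal maxBytes As Long = 1024) As String
    Dim f As Integer, n As Long, buf() As Byte
    f = FreeFile
    Open path For Binary Access Read As #f
    n = LOF(f)
    If n > maxBytes Then n = maxBytes
    If n > 0 Then
        ReDim buf(0 To n - 1)
        Get #f, , buf
        ReadFileHead = StrConv(buf, vbUnicode)
    End If
    Close #f
End Function

' Returns True only when the download succeeded and the saved file
' does not look like an HTML page (for example the disclaimer).
Public Function TryDownloadCsv(ByVal url As String, ByVal target As String) As Boolean
    If URLDownloadToFile(0, url, target, 0, 0) <> 0 Then Exit Function ' non-zero result = failure
    If LooksLikeHtml(ReadFileHead(target)) Then
        Kill target ' a web page came back instead of data
        Exit Function
    End If
    TryDownloadCsv = True
End Function
```

A caller would try TryDownloadCsv once and, on False, run whatever agreement step turns out to work before trying again. One caveat that may explain the original symptom: the cookies Karl lists expire at end of session, and WinINet generally keeps session cookies in memory per process, so a click made in a separate IE process may not carry over to a URLDownloadToFile call made from Excel or a VB program.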